Wednesday, March 20, 2013

Extending ArchC cores with TLM memory

Continuing from where I left off in the last post, I want to take a look at TLM, the ArchC TLM port, and the possibilities it brings.

About TLM

I should probably start by saying I didn't really get much out of the first things I read about TLM (either because I didn't read them through, or because the concept is too abstract to understand without seeing some examples). TLM stands for Transaction Level Modelling, and Wikipedia defines it as "a high-level approach to modeling digital systems where details of communication among modules are separated from the details of the implementation of functional units or of the communication architecture". The definition is rather vague, which is part of what confused me, but this looseness is actually what makes it powerful and interesting.

Let's take an example to get a more solid grip on it: main memory connected to a processor, no caches, no complicated memory hierarchies. A Von Neumann type processor has a rather simple expectation from the main memory it's connected to: it wants to be able to write to the memory at a given address, and similarly read from the memory at a given address. If you wanted to create a model for a memory that satisfies these requirements, you could do it in many ways on many levels. You could, for example, write a piece of C code that fills in/reads out elements of an array with read/write function calls, or a slightly more complicated version with some kind of wait() function to emulate the delays occurring while accessing the memory. Or you could write a VHDL/Verilog description at the RTL (register transfer level). Or go down a level to build this memory transistor by transistor. Alternatively, the processor and the memory may be communicating through some kind of bus instead of having a direct connection, and this connection itself can be subject to different levels of modelling.

All of these levels of modelling have their advantages and disadvantages, and each serves a specific purpose. There is, however, one thing that does not change: the interface to the memory is essentially the same at all these levels, whereas the gory (or not so gory) details of the actual implementation change from model to model. This is essentially what TLM dictates: define the interface, and what kind of data is passed back and forth. It's quite like the object-oriented programming concept of interfaces in this respect.
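To make the interface idea a bit more concrete, here is a minimal plain C++ sketch (no SystemC or ArchC involved). The names MemoryIf, ArrayMemory, SlowMemory and storeAndLoad are made up for illustration, and the "delay" is just a cycle counter rather than a real wait(); the only point is that code written against the interface does not care which model sits behind it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interface: all a processor expects from its main memory.
class MemoryIf {
public:
    virtual ~MemoryIf() {}
    virtual uint32_t read(uint32_t addr) = 0;
    virtual void write(uint32_t addr, uint32_t data) = 0;
};

// Model 1: a plain array of words with no notion of time.
class ArrayMemory : public MemoryIf {
public:
    explicit ArrayMemory(std::size_t words) : cells(words, 0) {}
    uint32_t read(uint32_t addr) { return cells.at(addr); }
    void write(uint32_t addr, uint32_t data) { cells.at(addr) = data; }
private:
    std::vector<uint32_t> cells;
};

// Model 2: same behaviour, but every access also costs a fixed number of cycles.
class SlowMemory : public MemoryIf {
public:
    SlowMemory(std::size_t words, unsigned latencyCycles)
        : cells(words, 0), latency(latencyCycles), cycles(0) {}
    uint32_t read(uint32_t addr) { cycles += latency; return cells.at(addr); }
    void write(uint32_t addr, uint32_t data) { cycles += latency; cells.at(addr) = data; }
    unsigned long elapsedCycles() const { return cycles; }
private:
    std::vector<uint32_t> cells;
    unsigned latency;
    unsigned long cycles;
};

// "Processor-side" code, written once against the interface.
uint32_t storeAndLoad(MemoryIf &mem, uint32_t addr, uint32_t value) {
    mem.write(addr, value);
    return mem.read(addr);
}
```

Calling storeAndLoad with an ArrayMemory or a SlowMemory gives back the same value; only the SlowMemory additionally records that two accesses' worth of cycles went by.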

There's of course more to TLM than this: the OSCI (Open SystemC Initiative) has standardized the approach and offers a set of well-defined building blocks for constructing TLM interfaces. It may sound scary, but TLM 1.0 in particular is quite simple to understand: it defines interfaces for either uni- or bi-directional communication for any given data type (via C++ templates), either blocking or non-blocking.
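As a rough illustration of those two axes, here is a plain C++ sketch. The templates nonblocking_put_if and transport_if are simplified stand-ins that only loosely follow the shape of the TLM 1.0 interfaces, not the real OSCI classes, and Mailbox and Doubler are hypothetical examples. Without a SystemC kernel, "blocking" here simply means the call returns only once the response is ready.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Stand-in for a uni-directional, non-blocking interface: try to hand over
// one item of type T and report whether it was accepted.
template <typename T>
class nonblocking_put_if {
public:
    virtual ~nonblocking_put_if() {}
    virtual bool nb_put(const T &item) = 0;
};

// Stand-in for a bi-directional, blocking interface: send a request and
// get the response back as the return value of the same call.
template <typename REQ, typename RSP>
class transport_if {
public:
    virtual ~transport_if() {}
    virtual RSP transport(const REQ &req) = 0;
};

// A bounded mailbox: refuses new items when it is full.
template <typename T>
class Mailbox : public nonblocking_put_if<T> {
public:
    explicit Mailbox(std::size_t cap) : capacity(cap) {}
    bool nb_put(const T &item) {
        if (items.size() >= capacity) return false;
        items.push_back(item);
        return true;
    }
    std::size_t size() const { return items.size(); }
private:
    std::size_t capacity;
    std::deque<T> items;
};

// A trivial request/response service: the response is the request doubled.
class Doubler : public transport_if<int, long> {
public:
    long transport(const int &req) { return 2L * req; }
};
```

The data types are entirely up to whoever instantiates the templates, which is the same mechanism ArchC relies on when it plugs its own request and response structs into the transport interface.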

If that muddied things instead of clearing them up, the next bit should help more: a real TLM example, from ArchC!

The ArchC TLM port

ArchC normally offers a pretty complete package for playing around with processors, but there's a great deal of flexibility possible when you want to start customizing parts of that package, and this is where TLM comes into play. Starting from ArchC 2.0 (or something close to that :)) a TLM interface is offered for connecting "memory" to ArchC-generated cores. This is how:


AC_ARCH(mips1)
{
  ac_tlm_port DM:16M; // declare a TLM port that can address 16M
  ac_regbank RB:32;
  ac_reg npc;
  ac_reg hi, lo;
  ac_tlm_intr_port inta;

  ac_wordsize 32;

  ARCH_CTOR(mips1) {
    ac_isa("mips1_isa.ac");
    set_endian("big");
  };
};

Easy! So once we have the TLM port, what do we do with it? What does the ArchC TLM interface look like? On the "internal" side (the side which the processor core itself talks to), the functions inherited from ac_inout_if are used, just like with regular memory. But it's the "external" side that concerns us: what does the TLM port do to "talk" to the outside world? It's derived from sc_port, which in turn is a port for the following interface:

typedef tlm_transport_if<ac_tlm_req, ac_tlm_rsp> ac_tlm_transport_if;

Now that's a SystemC TLM interface: the template uses ac_tlm_req for sending out requests to the external world and ac_tlm_rsp for getting responses from the external world. Those types are:


/// ArchC TLM request packet.
struct ac_tlm_req {
  ac_tlm_req_type type; // READ, WRITE, LOCK, UNLOCK...
  int dev_id;
  uint32_t addr;
  uint32_t data;
};

/// ArchC TLM response packet.
struct ac_tlm_rsp {
  ac_tlm_rsp_status status; // ERROR, SUCCESS
  ac_tlm_req_type req_type; // same as in request
  uint32_t data;
};

And that's it! That's what ArchC uses when it wants to do something with a piece of external memory. The TLM base interface itself (tlm_transport_if) needs a function called transport that carries the request and returns the response:

ac_tlm_rsp transport(const ac_tlm_req & req);

Having seen these, it shouldn't be difficult to model a simple memory that can be connected to the ArchC TLM port:


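// needs the ArchC TLM headers for ac_tlm_transport_if, ac_tlm_req and ac_tlm_rsp
class ArchCTLMMemory : public ac_tlm_transport_if {
public:
    ArchCTLMMemory(uint32_t size_bytes) { memory = new uint8_t[size_bytes]; }
    ~ArchCTLMMemory() { delete[] memory; }  // array delete to match new[]

    ac_tlm_rsp transport(const ac_tlm_req & req)
    {
        ac_tlm_rsp response;
        response.status = SUCCESS;
        response.req_type = req.type;
        response.data = 0;
        // take the address of the requested byte and access a whole word there
        // (no bounds or alignment checks, to keep the example short)
        if(req.type == READ)
            response.data = *((uint32_t *) &memory[req.addr]);
        else if(req.type == WRITE)
            *((uint32_t *) &memory[req.addr]) = req.data;

        return response;
    }

protected:
    uint8_t *memory;
};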

While it's by no means a complete example, it should be enough to illustrate how simple it is to make a memory element that can be connected to ArchC. And the real power is of course that the TLM interface dictates nothing about the memory implementation - you're free to include delays, assertions, stat counters, or to route the memory request to some other component (which is the case for the tile-based system I'm building).
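Here is a rough sketch of two of those ideas, a stat counter and a router, written as plain C++ so that it compiles without ArchC or SystemC. The request and response structs are local stand-ins that mirror the ArchC types quoted above (with only READ and WRITE), and TransportIf, WordMemory, CountingProbe, TwoWayRouter and makeReq are hypothetical names. It is not how ArchC or my tile-based system implements any of this, just an illustration of how components sharing the same transport call can be stacked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins mirroring the ArchC types, so the sketch is self-contained.
enum ac_tlm_req_type { READ, WRITE };
enum ac_tlm_rsp_status { ERROR, SUCCESS };
struct ac_tlm_req { ac_tlm_req_type type; int dev_id; uint32_t addr; uint32_t data; };
struct ac_tlm_rsp { ac_tlm_rsp_status status; ac_tlm_req_type req_type; uint32_t data; };

struct TransportIf {
    virtual ~TransportIf() {}
    virtual ac_tlm_rsp transport(const ac_tlm_req &req) = 0;
};

// Word-organised memory that rejects out-of-range accesses.
class WordMemory : public TransportIf {
public:
    explicit WordMemory(std::size_t words) : cells(words, 0) {}
    ac_tlm_rsp transport(const ac_tlm_req &req) {
        ac_tlm_rsp rsp = { SUCCESS, req.type, 0 };
        std::size_t index = req.addr / 4;  // byte address -> word index
        if (index >= cells.size()) { rsp.status = ERROR; return rsp; }
        if (req.type == READ) rsp.data = cells[index];
        else cells[index] = req.data;
        return rsp;
    }
private:
    std::vector<uint32_t> cells;
};

// Wrapper that counts accesses, then forwards the request unchanged.
class CountingProbe : public TransportIf {
public:
    explicit CountingProbe(TransportIf &target) : reads(0), writes(0), next(target) {}
    ac_tlm_rsp transport(const ac_tlm_req &req) {
        if (req.type == READ) ++reads; else ++writes;
        return next.transport(req);
    }
    unsigned long reads, writes;
private:
    TransportIf &next;
};

// Router that splits the address space in two and rebases the upper half.
class TwoWayRouter : public TransportIf {
public:
    TwoWayRouter(TransportIf &low, TransportIf &high, uint32_t split)
        : lowTarget(low), highTarget(high), splitAddr(split) {}
    ac_tlm_rsp transport(const ac_tlm_req &req) {
        if (req.addr < splitAddr) return lowTarget.transport(req);
        ac_tlm_req rebased = req;
        rebased.addr = req.addr - splitAddr;
        return highTarget.transport(rebased);
    }
private:
    TransportIf &lowTarget;
    TransportIf &highTarget;
    uint32_t splitAddr;
};

// Helper that builds a request in one call (dev_id is left at 0).
ac_tlm_req makeReq(ac_tlm_req_type type, uint32_t addr, uint32_t data) {
    ac_tlm_req req = { type, 0, addr, data };
    return req;
}
```

Since every component exposes the same transport call, the initiator cannot tell whether it is talking to a memory directly or to a chain of probe, router and memory. In a real ArchC setup these classes would derive from ac_tlm_transport_if instead of the stand-in interface.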


Monday, March 11, 2013

Playing around with ArchC: TLM and multicores

As the final goal of my MSc thesis is to create a simulation framework for the SHMAC (Single ISA Heterogeneous Multicore Architecture Computer) I've been spending some time on creating ArchC/SystemC simulations for multicores and interconnects, and it feels like I'm finally getting to a point where I have a clearer picture of the concepts involved. Or so I hope :)

A bit of a background on SHMAC: it's a tile-based heterogeneous architecture which was first realized on an FPGA last year in the form of this MSc thesis by Leif Tore Rusten and Gunnar Inge Sortland at NTNU. The simulator I'm developing will (hopefully) eventually be used to evaluate interconnect and cache matters for the architecture.

It's mostly the "Processor Design with ArchC" chapter of the "Processor Description Languages" book that motivated me to write this - it gives a little warm-up on the features of ArchC including the TLM memory and interrupt controller port, and then goes on to how these concepts could be used to construct a two-core system.

// ...includes, includes...
int sc_main(int ac, char *av[])
{
    mips1 proc1("p1");
    mips1 proc2("p2");

    someBus bus("b");
    someMemory mem("m");

    proc1.memoryPort(bus);
    proc2.memoryPort(bus);
    bus.memoryPort(mem);

    // ...even more connections, omitted since they're related to the interrupt controller
    proc1.initAndLoad(someBinary);
    proc2.initAndLoad(someBinary);

    sc_start();

    // ..print start and exit
}


Despite how promising it looks, there is unfortunately no source code accompanying the book (at least none that I'm aware of). The code example describes rather well how the top-level components are connected, but how the memory, bus and interrupt controller are actually implemented is not mentioned, and I was unable to find any further code examples using ArchC for multi-core simulation. So I decided to take the plunge into ArchC's documentation and source code to understand how I could make it work, and well, I guess I can say the results are better than expected :)

More implementation details and juicy TLM stuff (not really, it's actually quite simple) coming up in the next post!