Monday, June 28, 2010

GSoC Weekly Report #6

Weekly Report #6
Submitted on 2010-06-28
Covers 2010-06-21 to 2010-06-28
Corresponding Draft Schedule Item:
Create the RPC framework that can call functions on the GPP side from the DSP and return values back to the DSP.  Implement and unit-test the POSIX function wrappers according to the planned order.

Status and Accomplishments
  • DSP->GPP RPC code is now integrated into the C6Run sources and works fine alongside the regular C6RunApp C I/O calls. The RPC layer is compiled/linked together with the user-provided sources for now, as putting it into the C6Run library caused some strange problems (see issues section).
  • the GPP-side RPC stubs are now compiled into a dynamically linked library (.so) and the GPP-side server uses dlfcn functions (dlopen, dlsym, ...) to locate the function with the given name and invoke it. The stubs still pull the parameters from the buffer on their own; assembly code for automating this is still in the works

Plans and Tasks
  • finalize the GPP-side assembly function caller so that GPP-side stubs don't need to be provided at all
  • experiment with different combinations of parameters, including doubles/long integers and shared-memory-allocated buffers, to make sure everything works correctly with the marshaller, DSP/Link transfers, unmarshaller and GPP-side server
  • finalize the DSP-side stubs for the POSIX functions which don't take/return any buffers as parameters

Risks, issues, blockers
One strange thing that occurred: if I compile the RPC marshalling function together with the user sources given to c6runapp-cc (thus having the RPC layer in the user code, since readmsg/writemsg is accessible from there), everything works fine, but when I try to include the function in the DSP-side C6Run library, something goes wrong. The function is a variadic function whose definition looks like this:

void rpc_marshal(char *function_name, char *function_signature, ...)

When I compile the DSP-side libs with this function inside, the second and subsequent parameters are not passed correctly. I've observed that even the address of the formal parameter goes awry. To illustrate, let's say I put a printf call inside this function (thanks to C6Run CIO :)):

printf("Stack addresses: %x %x values: %s %s", &function_name, &function_signature, function_name, function_signature);

If I compile and use this function alongside the user-provided sources given to c6runapp-cc, everything works correctly, and I get output that looks like this (I've made up the stack addresses, but they were similar to these):

Stack addresses: 0x80000800 0x80000804 values: rpc_function iii@

but if used from inside the library:

Stack addresses: 0x80000800 0x80000854 values: rpc_function

Not sure why this is happening: linker settings? A problem with passing variadic parameters? (If I remove the ... at the end, the first two parameters get passed correctly, but of course then the others aren't passed at all.) Since everything works fine when the function is provided alongside the user sources, I've decided to stick with that for now.

Monday, June 21, 2010

GSoC Weekly Report #5

Weekly Report #5
Submitted on 2010-06-21
Covers 2010-06-14 to 2010-06-21
Corresponding Draft Schedule Item:
Create the RPC framework that can call functions on the GPP side from the DSP and return values back to the DSP.  Implement and unit-test the POSIX function wrappers according to the planned order.

Status and Accomplishments

  • we now have a marshaller which takes a function name, signature and a variable number of arguments and packs them neatly into a buffer to be transferred via DSP/Link (as well as the corresponding unmarshaller :))
  • all DSP-side stubs are reduced to three lines: rpc_marshal, rpc_invoke and return result, quite suitable for automation
  • looked into shared memory issues some more without any fruitful results; for now we'll require the user to pass buffer parameters allocated from shared areas only, and develop a better method later



Plans and Tasks

  • unfortunately I managed to break C6RunApp's existing RTS I/O while trying to integrate the RPC server; I'll work on fixing this. It'll be nice to have both working; see the issues section regarding a possible shift of focus to general RPC instead of POSIX
  • the static jump table design currently used for resolving RPC calls on the GPP side is awful; write an assembly function launcher similar to the one in C6RunLib


Risks, issues, blockers

  • shared memory issues continue, but for now we can overlook them since we can force the user to pass only buffers allocated from shared areas (POOL or CMEM). PROC address translation is there but we have to avoid segfaults / protection issues, this'll need a bit more time
  • as I worked more and more with C6RunApp and the existing RTS C library implementation, I've realized that the available functionality is quite sufficient for prototyping purposes. So instead of replacing it with RPC POSIX calls, we can offer both methods and leave the choice to the user. It may also be appropriate to shift the focus of the project to RPC issues, providing a library of examples which can be built up into useful code, or to focus on other ease-of-development issues.

Friday, June 18, 2010

A brief overview of RPC over multiple cores

RPC (which stands for Remote Procedure Call) is defined as follows by Wikipedia:

Remote procedure call (RPC) is an Inter-process communication technology that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction (...)

Within our context of heterogeneous OMAP3 multi-cores, it doesn't refer to exactly the same thing as it does in general, but there are still enough similarities. For example, the DSP and GPP use the same memory (i.e., the DDR RAM on the Beagle) as main memory, but this doesn't make their address spaces the same: to start with, the DSP works with physical addresses whereas the GPP works with logical ones. And the binaries are obviously not compatible, since we have two different processors and architectures (hence the "heterogeneous" in "heterogeneous multicore processing"). If we want the GPP and DSP to co-operate towards the same goal (especially in situations like video decoding, where we'd really appreciate the DSP's help), it would be useful to get them to talk to each other without much effort. The DSP should be able to easily obtain the data it needs to work with (which usually resides in the GPP file system or program memory), process it and hand it back to the GPP.

While all of this is quite feasible at the moment, doing it is far from trivial (set up OE, build DSP/Link, examine the examples, learn the DSP/Link API, set up the GPP-side structures for bringing up the DSP, try to get the DSP side working without being able to see what you're doing unless you have CCS or are using C6RunApp ;) and so on...). And DSP-side code is C code. TI's CGT6000 is a C compiler. You could essentially write the same things and compile/run them on either side. Why all this hassle? Why not be able to pass parameters and call functions from one side to the other? Why not live in harmony and peace? Then RPC is the answer! (maybe except for that last one).

Scrolling down the Wikipedia article, we see the steps involved in making RPC work, and it's rather easy to see them from a DSP-to-GPP RPC point of view. Let's say we have a function int gpp_side_function(int param1, float param2, char *param3) somewhere on the GPP side, which we want to call from the DSP.

  1. The client (DSP in our case) calls the Client stub, which has the same definition. The call is a local procedure call, with parameters pushed on to the stack in the normal way. 
  2. The client stub packs the parameters into a message (done according to some predefined structure, e.g. 2 bytes of function name length, n bytes of function name, then 2 bytes of parameter count, then each of the parameters with their lengths, etc.) and makes a system call to send the message. In our case, sending the message means putting it on the DSP->GPP MSGQ. Packing the parameters is called marshalling.
  3. The server (the GPP side in our case) receives the message, unpacks (unmarshals) it and locates the corresponding local function call stub.
  4. Finally, the server stub calls the desired procedure. The reply traces the same in other direction.
Similarly, one could do GPP->DSP RPC calls; a component of the C6Run project called C6RunLib is meant for this purpose.

There are certain implementation details that need special attention in our case:

Passing Pointers as RPC Parameters
Since the addressing mechanisms of the GPP and DSP are different, we need to make sure that any pointer parameters / buffers which are passed are accessible from both sides. When doing GPP->DSP calls, one can use the CMEM module to obtain a shared region of memory and translate the addresses back and forth to ensure mutual accessibility, but there is no CMEM interface on the DSP side, so one must resort to workarounds. If the size of the buffer parameters is always available, one can pass the contents of the buffers themselves back and forth via DSP/Link. One approach could be using the DSP/Link POOL module, which allocates shared buffers and provides address translation, though this will not be suitable for large amounts of memory usage since there is a limited number of constant-sized POOL buffers. Another could be using a special protocol to access the CMEM interface on the GPP side and doing all the allocation from there, but in both this case and the POOL buffer case we have to ensure the passed pointer parameters are allocated with our special method and not from the DSP stack region, otherwise we'll be in trouble.

Openness to Expansion
Having an RPC framework that can only call 5 (or 50, or 100) predefined functions on the other side is not very useful; one wants to be able to define and call one's own functions. To make this possible, the framework must not have too many hardcoded details, such as when to do the address translations mentioned above. To address this, I'm using a function signature system in my implementation to identify parameter types and the spots where special attention is needed. For example, for our above function

int gpp_side_function(int param1, float param2, char *param3)

the corresponding signature would be

iif@

where the first symbol identifies the return type as an integer, and the other three identify parameter types. The final @ indicates that some manner of address translation (or passing the buffer back and forth, if that's the preferred approach) is needed: the marshaller may take care of the translation before passing the message, or the server stub may do the translation itself, depending on how the protocol is defined.

Also, I've designed the marshaller itself to be as user-friendly as possible. Let's say you want to provide the DSP-side function stub for the above function. All you have to do is pass the function name, function signature (which can be extracted via a simple lookup table) and all the parameters to the marshaller, prompt the data transfer, and return the result with the desired data type. So the stub for the above function would look like this:

int gpp_side_function(int param1, float param2, char *param3)
{
    rpc_marshal("gpp_side_function", "iif@", param1, param2, param3);
    rpc_makecall();
    return RPC_GETRESULT(int);
}

The process is quite suitable for automation, and I'm planning to have a script that auto-generates the stubs given the definitions.

Finding and invoking the corresponding GPP stub
It's easier said than done - so you have a string giving you the GPP-side function name and a bunch of parameters. How do you locate the corresponding function, and how do you actually make the function call? So far, I've been thinking of these approaches:
  1. Use a static jump table. An intuitive albeit not-so-elegant solution: associate each function name with a number (perhaps via hashing?), and use a function pointer table to jump to the stub. The stub has to take care of unmarshalling and passing the parameters. A static approach, so one has to add the table entry as well as the GPP-side stubs. 
  2. Use an assembly function caller. Once we are able to resolve the function address (perhaps statically as in 1, perhaps via dynamic loading with libdl functions) we can have an assembly routine which pushes the parameters onto the stack and calls the function living at the located address. This is what C6RunLib uses, and it will eventually be the preferred method.

Monday, June 14, 2010

GSoC Weekly Report #4


Weekly Report #4
Submitted on 2010-06-14
Covers 2010-06-07 to 2010-06-14
Corresponding Draft Schedule Item:
Familiarize with DSP/BIOS Link and its various modules. Experiment with GPP-DSP communication and compilation processes to identify potential issues and useful features. Review existing RPC protocols and create one suitable for DSP-GPP communication over DSP/BIOS Link.

Status and Accomplishments
  • My exchange period in Sweden ended and I had to move back to Turkey, so I couldn't really get as much work done as I'd have liked to, but now I have access to my evil scientist laboratory headquarters :) I'll be based here for the rest of the summer.
  • I now have an RPC implementation for DSP->GPP calls, composed of four parts: DSP-side function stubs, DSP-side marshaller/unmarshaller and transport functions, GPP side marshaller/unmarshaller and transport functions, and GPP side stubs/invokers. No dynamic loading on the GPP side, the invocation is a simple static jump table and there's no support for pointer parameters but it works fine otherwise.
  • had some time to play around with kernel module loading. Didn't really discover anything groundbreaking, but noticed that C6Run doesn't work with the *.ko's that came with my Ångström. Probably version differences, which won't be an issue if one builds the C6Run libraries and the distro itself using OpenEmbedded; otherwise we could have a config file on the Beagle which points to the appropriate module filenames, and load those at startup instead.
Plans and Tasks
  • right now the stubs are doing most of the work for marshalling, which will make it difficult to add new RPC functions in the future. Have the marshaller pack a variable number of parameters into the data instead; I've already had some success with this. I'm planning to have a string "function signature" for each stub, and feed this along with all the parameters to the marshaller. Bad idea? Should each stub keep doing its own packing to increase efficiency?
  • passing pointer parameters is still a big issue due to memory sharing, we have to make sure that any direct memory pointers are mutually accessible after address translation, look into this.
  • finalize and commit my initial group of DSP-side wrappers with integer parameters and return types (mostly is*() functions from ctype.h)
Risks, issues, blockers
  • Memory issues still here: C6RunLib can utilize the CMEM interface to get DSP-GPP shareable contiguous memory regions, but there is no CMEM on the DSP side - what'll be the cure for this? force allocating everything from POOL buffers if they are to be passed as RPC parameters? set the DSP linker parameters to allocate the heap from a pre-determined shared region and do the address translation manually? copy and back-copy the DSP-side buffers manually into/out of the message buffers during marshalling and unmarshalling (but how do we know the size of a void* buffer, for example?)? 

Wednesday, June 9, 2010

C6RunApp and DSP-RPC-POSIX

Now that we've looked into how C6RunApp works in general, it's time to bring DSP-RPC-POSIX into the picture. Although the project builds largely on top of C6RunApp, there are two keywords in the name which hint in a different direction:

  • RPC: the project aims to bring a fully functional RPC (Remote Procedure Call) framework into the picture in the sense that GPP-side procedures are remote procedures when viewed from the DSP side. In other words, the DSP will be able to call any GPP-side function using this component. Note that there already exists a limited form of RPC in the existing C6RunApp structure, but only for the file system access calls.
  • POSIX: using the RPC component, the GPP-side POSIX library will be made accessible from the DSP side.
At this point, the question "but the RTS lib already provides POSIX functionality! why is an RPC POSIX layer necessary?" may come up. These were my primary reasons for going in this direction while writing my project proposal:
  • being able to offer the already stable GPP-side POSIX libraries with no extra effort required
  • have greater flexibility for expansion since it can eventually be used to call any GPP-side function, such as writing to the frame buffer or user-defined functions.
  • although the DSP is indeed a powerful processor, it is not meant for general computing, so it's not practical for, for example, the string processing involved in formatting printf strings; this is a task better done by the GPP
Right now, I'm still experimenting with RPC-related tasks such as packing ("marshalling") and extracting ("unmarshalling") a variable number of arguments into/out of messages and dynamic loading, but the first wrappers should be appearing quite soon :)

C6RunApp

In my blog post entitled (rather appropriately, in my opinion :)) Moment of Truth #1, I had briefly mentioned what C6RunApp allows you to do - you write C code in the conventional way you would for the ARM (including things like debug outputs with printf or data input with scanf), and then use the C6RunApp script to get an ARM executable which actually runs everything you wrote on the DSP (except things that require access to the ARM side itself, but we'll get to that in a moment).

First, let's mention the DSP RTS library, as it's a relevant component on which C6RunApp itself builds. The TI C6000 CGT (Code Generation Tools; essentially the DSP-side compiler, linker and other binary utilities) contains a library of standard C functions which is aptly called the Run Time Support (RTS) library. As you would expect, it contains implementations of regularly used C library functions such as printf and scanf. But of course, the problem is not writing C library function implementations on the DSP (which is quite a capable processor); it's the things that actually require GPP-side capabilities, such as file system access (which in turn can also provide console input and output). The implementations of these functions in the RTS use a number of lower-level base functions (such as open, close, read and write) to carry out the needed GPP-side communication, and these can be user-defined. For example, if one is using CCS (Code Composer Studio) for the development process, CCS provides the driver for communication between the host running the development environment and the DSP, so that you can see the terminal output and access local files.

Of course, this is not a very convenient scenario, as we are dependent on CCS for file system access. Why not just use the GPP-side OS (that is, the Ångström distribution) instead? This is one of the underlying ideas of C6RunApp: we have a host application on the GPP side which receives requests over DSP/Link, performs the necessary file system calls, and passes the results back to the DSP. Another idea is that the GPP host app takes care of setting up DSP/Link and loads the DSP with the DSP-side executable without any effort from the user. Combining these two ideas, which provide us with "verbose" DSP-side programs and abstract away the details of DSP/Link, we get easier DSP-side development: we get C6RunApp!

C6RunApp Workflow


So what happens when you want to compile hello_world.c using the C6RunApp cross-compiler script? Let's have a look here first, and then we'll examine the involved libraries in some more detail. This is what the C6RunApp readme file has to say on the subject (with some small clarifications from me):



  1. The DSP tools are used to build the supplied source and link it against the prebuilt C6RunApp DSP-side library.  The result is the complete DSP-side executable image in standard TI COFF format.
  2. The DSP executable image is minimized in size by using the symbol stripping tool, strip6x.
  3. The contents of the stripped DSP executable file are converted to a C byte array in a temporary C header file.  This header file is referenced by the main GPP-side C6RunApp loader, and thus the DSP executable image will be embedded into the final resulting GPP executable.
  4. The main C6RunApp loader program is built using the ARM cross-compiler tools, embedding the DSP-side executable inside the binary ARM ELF executable.  This ARM executable has the same name as specified on the command line of the C6RunApp cross-compiler script.
  5. Once the GPP executable is run, it sets up DSP/Link, loads the DSP with the built-in DSP executable and initializes the needed communication channels (namely, constructing the GPP->DSP message queue and locating the DSP->GPP message queue).
  6. The GPP executable waits for the DSP to send it file system call requests, performs the requested operations and sends back the results, until it receives a signal that it can terminate.
  7. Teardown is performed on the DSP/Link setup and the DSP is cleanly shut down.



C6RunApp Components

Let's have a look at the pieces of which C6RunApp consists:

  1. The DSP side library, C6RunApp_dsp.lib - the library which the user-provided code is linked against, contains entry and exit points for the DSP/BIOS, initializes the communication channels and starts running the user-defined main(). It provides implementations of writemsg and readmsg which the DSP RTS lib bases the low-level communications on. These implementations pass the requested function call to the GPP via the message queue and read back the result in the same manner.
  2. The GPP side library, C6RunApp_gpp.lib - the library that contains the functions which serve the DSP's C I/O requests. 
  3. The GPP main object, C6RunApp_load.o - once the DSP side executable is created and turned into a header file, the binary object C6RunApp_load.o is linked against the GPP side library to create the final GPP executable
  4. The kernel modules - not really components so much as dependencies. C6RunApp utilizes CMEM for the initial loading of the DSP executable, the LPM to do a clean shutdown of the DSP as is needed on the OMAP3530, and the DSP/Link module for some obscure purpose :)

Hopefully this will have provided some insight into how the magic of C6RunApp works - coming up next (but sooner this time!) is where my GSoC project DSP-RPC-POSIX fits in with all of this.

Monday, June 7, 2010

GSoC Weekly Report #3

Weekly Report #3
Submitted on 2010-06-07
Covers 2010-06-01 to 2010-06-07
Corresponding Draft Schedule Item:
Familiarize with DSP/BIOS Link and its various modules. Experiment with GPP-DSP communication and compilation processes to identify potential issues and useful features. Review existing RPC protocols and create one suitable for DSP-GPP communication over DSP/BIOS Link.


Status and Accomplishments
  • more experimentation with DSP/Link APIs and general reading especially for the DSP side arch — I feel confident and comfortable enough to seriously start working
  • studied the C6RunLib source code for inspiration on RPC calls and got some — but I still have doubts on implementing the exact same system, see issues section below
  • experimented with some GPP-side tasks which can be of use for the RPC framework such as passing a variable number of parameters to functions and methods for dynamic and static linking to dynamic link libraries
  • did some basic RPC tests using the existing C6RunApp writemsg/readmsg architecture, all went well (nothing fancy — was mainly for the purpose of testing my understanding of how things work)
  • looked for existing statistics and similar documentation on which POSIX functions are the most commonly used, without any luck... decided to follow a C standard library reference and start with the simpler functions, see plans and tasks section below
Plans and Tasks
  • finalize and make a working draft of the RPC system — possibly without dynamic loading on the GPP side at this stage, which can be added later on
  • finalize blog post on C6RunApp architecture and how DSP-RPC-POSIX fits in — hoping that increasing awareness will stir more interest
  • write the first wrapper(s) for the DSP side, starting with noncomplex functions (in terms of having a constant number of basic non-pointer parameters), e.g. putchar()
  • experiment a little with possibilities of user-friendly features such as checking and auto loading of kernel modules at program startup and catching Ctrl-C signals to perform DSP side cleanup before exiting
Risks, issues, blockers
  • I want the RPC system to eventually work with any user-defined GPP-side function (not just C I/O) without much hassle, and I believe dynamic library loading will be necessary for this (i.e., given the function name, parameter type list and library name, be able to locate the function and call it on the GPP side). I realize this can be done by preprocessing the source code and adding the function defs before compiling, but I'm curious whether it can be done at runtime as well. C6RunLib uses this preprocessing method (via perl scripts), but in that case we provide, and therefore have access to, the source code of all the possible RPC fxns anyway
  • Memory issues continued: is it really a good idea to do RPC GPP calls for functions that take pointers as parameters and modify the pointed-to data, like malloc() and memset()? They're already present and working in the DSP RTS libs, so is it necessary? If so, we could do address translation before calling the actual GPP-side function, but can we ensure that the memory is mutually accessible by the processors by doing all allocation from POOL buffers? The issue eventually extends to any function that takes a pointer parameter... I'd appreciate some broader-perspective views on this subject.

Wednesday, June 2, 2010

A bit more about DSP/Link

My original plan for this particular blog post was to do a write-up on what the OMAP3530 multicore architecture looks like, where the DSP sits in this picture, how DSP/Link comes into play and what it can do for us. But then again, I noticed that the document I learned most of these things from is an excellent source of information and doesn't really need a rewrite, so I'm just going to offer my salutations to the ETH PIXHAWK MAV project and invite you to read their excellent DSP/Link API guide :)

Coming up soon: how C6RunApp works, and how my project DSP-RPC-POSIX will fit in it to take things a bit further.