maltanar's scribbles: 2010

Monday, August 16, 2010

Finaly Weekly Report, no #13

So here comes the final weekly report... I just want to say once again what a wonderful experience it has been to work with the BeagleBoard for GSoC! As I stated in my previous week's report, I do believe I've managed to complete most of what I (and my mentors) had in mind for the end of the GSoC period. There's (as always!) more to do, of course... and I intend to continue development for C6Run in specific and BeagleBoard in general (though with the start of the academic year, it's unlikely that I'll be able to devote as much time), though right now I could really use a little break, and will be taking one :)

Weekly Report #13
Submitted on 2010-08-16
Covers 2010-08-09 to 2010-08-16

Status and Accomplishments
• documentation moved to its own eLinux wiki page (at http://elinux.org/BeagleBoard/GSoC/2010_Projects/C6Run/Documentation), divided into two sections (Usage, Architecture) and expanded
• build system modifications: support auto-copying of the generated stub files by the stub generator utility, make a folder for user sources
• code cleanup: move GPP&DSP common definitions to a single header file, work on GPP-side pointer alignment problems, partition GPP-side code into modules

Wednesday, August 11, 2010

Documentation's moved

Just writing to let my readers know that the C6Run documentation has moved to the elinux wiki and will be residing there from now on:

http://elinux.org/BeagleBoard/GSoC/2010_Projects/C6Run/Documentation

I've been working on expanding and improving it yesterday, and there's more to come.

Monday, August 9, 2010

Weekly Report #12

Weekly Report #12
Submitted on 2010-08-09
Covers 2010-08-02 to 2010-08-09

Reflections
As we're past the soft deadline (aka "the suggested pencils down date" in GSoC speak) I wanted to write a bit about how DSP-RPC-POSIX has progressed so far. I'd say that the project is now in a state that can do most of the things that me (and my mentors) had imagined it would be doing by now. C6Run's own goals of easy DSP app prototyping with console/file system access are pretty much accomplished, as is being easy to get started with (although there are debatable issues surrounding this particular goal, see below). DSP-RPX-POSIX's twofold goals, which were basically being able to call GPP side functions from the DSP, and providing a prebuilt set of POSIX (actually, "standard C Library" would be a much appropriate term) remote accessor functions, are also met. Right now, a sample DSP development cycle for a beginner developer utilizing the aid of C6Run could look like:

• obtaining the sources from SVN
• setting up (which includes a script that downloads and sets up the C6Run dependencies) and building
• using C6RunApp and DSP-RPC-POSIX to construct, test and debug the wanted DSP-side algorithm
• using C6RunLib to wrap up the finalized DSP-side algorithm with an ARM library and use this library to build DSP-aided GPP apps

The specific capabilities and properties of DSP-RPC-POSIX include:

• ability to call most C standard library functions on the GPP side without any extra effort (stubs are already generated)
• ability to call any GPP-side function, with support for all basic parameter types and buffers and obtain the return value, provided that stubs for the target function exist
• a stub generation tool to easily generate stubs for GPP-side functions (given a number of C source files, it will generate the stubs necessary for RPC access to all the functions contained)
• function signature system allows addition of new data types for parameters/return types with special treatment
• architecturally, the RPC layer is mostly seperate from C6Run's other functionality so it could be taken apart and used in other projects

There are, of course, certain shortcomings:

• C6Run still does not build as part of OE. the build system has been modified to skip rebuilding existing dependencies, but there are still problems building for the Beagle. once these are fixed it won't take long to have a OE recipe in place. to my knowledge, there are folks at TI which are already working on this.
• pointers/buffers issues: double/triple/etc. pointers, pointers inside structs, pointers hidden inside buffers (ie, any pointer that is not explicitly declared as a parameter) can't be handled automatically and need to be manually address-translated by the user.
• structs can't be passed as parameters since a different signature character would be needed for each, they need to be passed as pointers to structs instead
• pointers/buffers to be passed via RPC need to be either less than a fixed number of bytes, or be allocated using rpc_malloc instead of from the DSP heap or stack

Status and Accomplishments
• DSP-side caches are re-enabled for all platforms, stubs that use pointer/buffer parameters call the needed writeback/invalidate functions when needed to keep cache coherence
• ARM-side cache coherence code and cache coherency testing example in place
• stubs for string.h functions and POSIX low level I/O functions

Plans and Tasks
• what Google suggests after the soft pencils down date - code scrubbing (mainly fixing possible memory alignment issues, moving commonly used things to GPP/DSP shared header files etc.), improving documentation and writing more tests/examples

Risks, issues, blockers
• for some reason (probably due to the absence of a BCACHE_wbAll function on DSP-side?) enabling ARM caches still results in cache coherency problems.
• the C6Run trunk still produces problematic executables for the BeagleBoard, so a merge with the dsp-rpc-posix branch doesn't make sense at this point, which means OE integration won't make it to the GSoC deadline

Wednesday, August 4, 2010

Cache Coherency

The presence and usage of ARM and DSP side caches poses a cache coherency problem when we want to access a shared area by both processors. Let's consider the following scenario:

1. A CMEM buffer is allocated for shared usage by the GPP side and its physical pointer is passed to the DSP.
2. The DSP wants to read and then write some data into this buffer. Let's say that there are free entry slots in the DSP L2 cache - so the data actually gets written to the DSP cache, instead of making it to the DDR shared region. The DSP then signals the GPP that it's done with the buffer for the time being.
3. The GPP attempts to read the buffer, but what it reads is just the garbage values present in the buffer after initialization since the DSP-written data is in the DSP cache. This buffer also gets cached in the ARM-side now, so when the GPP tries to write some new data into it, it stays into the ARM cache and doesn't make it to the main memory either.
4. If the DSP tries to read the cache now, it won't get what the ARM has written into it most recently since it'll be reading from its own cache, and vice-versa for the ARM side.

We can see that the "same" buffer actually exists in three different locations (main memory, DSP cache, ARM cache), all of which can contain totally different data - in this case it is said that they are not coherent, and that we have a cache coherency problem.

In most cached systems, there are cache coherency protocols which prevent these situations from occuring. The TMS320C64x+ DSP Cache User's Guide states:

In the following cases, it is your responsibility to maintain cache coherence:
â€¢ DMA or other external entity writes data or code to external memory that is then read by the CPU
â€¢ CPU writes data to external memory that is then read by DMA or another external entity

thus we have to manually maintain cache coherence for mutual access to CMEM regions by the DSP and the GPP. Studying the scenario above, we can observe that there are two underlying problems:

1. If the memory block to be read already exists in the local cache, there's a risk that the local cache is outdated: we need to discard the local cache entries and fill them up with information from the main memory. This process is called cache invalidation.
2. When the memory block is to be written into, there's a risk that the info remains in the local cache and doesn't make it to the main memory: we have to make sure that the new info gets written to the main memory as well. This process is called cache writeback.

Therefore, from a RPC perspective, for a call that involves transferring buffers, the steps we have to take are as follows:

1. Before passing the marshalled info via DSP/Link, the DSP must do a cache writeback
2. Before passing the params to the GPP side stub, the GPP must do a cache invalidate
3. After the GPP side stub is finished, the GPP must do a cache writeback
4. The DSP side stub must do a cache invalidate before terminating

This is assuming that both processor caches will be active - in case the DSP cache is disabled, the steps 1 and 4 will not be necessary, and likewise with steps 2 and 3 for a disabled GPP cache.

Monday, August 2, 2010

Weekly Report #11

Weekly Report #11
Submitted on 2010-08-02
Covers 2010-07-26 to 2010-08-02

Status and Accomplishments
• an elementary version of the mechanism to be able to pass DSP-allocated (stack or heap) buffers to RPC functions is now in place. what essentially happens is that when the GPP side detects a non-CMEM-allocated buffer is passed as a parameter, it uses PROC_read to read from the DSP memory and copy this into a regular GPP buffer which the functions can access.
• bugfixes for RPC functions returning structures - but pointer parameters inside structs still remain an issue and have to be translated manually
• dsp-rpc-posix branch re-synced with c6run trunk, but to no avail (see blockers section)

Plans and Tasks
• the ARM and DSP caches are both disabled for dsp-rpc-posix to deal with cache coherency, but this impacts the memory access performance quite negatively - enable them again and use writeback/invalidate functions to keep caches in sync
• some POSIX functions are still missing from the RPC library since their corresponding header files (with type definitions and all) don't exist for the DSP side. this includes useful things like ioctls - find a way to expose these through RPC

Risks, issues, blockers
• despite all the time I spent on it, I couldn't uncover the cause of the "bus error" that occurs when I compile even the simplest things (such as hello world) with the latest C6Run trunk (so it's not just my synced branch that's troublesome). in some cases it's just "bus error" and the program stopping, in others the PROC_setup fails. the code responsible for these parts hasn't really changed so I'm suspecting it to be a build/configuration issue. due to this and the fact that my primary computer's hard drive crashed (I'm stuck with a silly little netbook!) I haven't been able to try and build C6Run inside OE. I'm not sure if this is just happening for me or the c6run trunk is broken (there are indeed some errors that prevent it from building but they are small and easy to fix)

Monday, July 26, 2010

Weekly Report #10

Weekly Report #10
Submitted on 2010-07-26
Covers 2010-07-19 to 2010-07-26

Status and Accomplishments
• the stub generation script (c6runapp-rpcgen) is now in place; provided with a number of C source files it can generate the corresponding GPP and DSP stubs for the functions defined in the given files, thus exposing them via RPC
• the documentation is expanded to include architectural details and will be maintained in the project wiki pages here
• more RPC examples to cover newly added things like stdio.h variadics and functions returning structures
• synchronized branch with the latest C6Run trunk, whose modified build system can create the C6Run libs without having to re-build the dependencies every time - but there are problems

Plans and Tasks
• investigate why the trunk-synced branch errors out on the produced executables ("Bus error", including on the non-RPC examples) and fix this
• experiment with building a bitbake recipe for C6Run to get it built inside OE
• for predefined RPC stubs whose pointer parameters are only "in" (ie, the RPC target won't modify the contents of the memory pointed) implement a mechanism that will allow these parameters to be alloc'd from the DSP stack or heap - having to allocate every little string via rpc_malloc is annoying

Risks, issues, blockers
• it's not clear why the bus errors occur with the latest trunk-synced version but hopefully it'll be something that I overlooked while merging the changes from the trunk, or possibly a problem with the version of the trunk I used

Monday, July 19, 2010

Forwarding invocation of variadic function in C

I had been brooding over how to do the RPC calls for variadic functions for some time now. Although marshalling any given variadic isn't really possible due to a lack of general method for obtaining argument count and sizes (see my weekly report #9, issues section), for commonly used stdio.h variadics such as printf and scanf, the arguments are well-defined by the format string so it should be possible to manually marshal these.

I'll be writing a seperate blog post about how I went about doing that, but for now I want to talk about a sub-problem of this: given a number of arguments, how do you forward these to a variadic function? A re-statement could be, how to "dynamically" invoke a variadic function?

Looking this up in the net, I've found these two discussions to be the most relevant:

http://stackoverflow.com/questions/150543/forward-an-invocation-of-a-variadic-function-in-c

http://stackoverflow.com/questions/1721655/passing-parameters-dynamically-to-variadic-functions

The second link mentions a library called FFCALL which can be used to pass parameters to variadics dynamically, and this probably is the ideal way of doing things.

I may have found another method for this - so far as I've seen it works on x86 and ARM. It's based on the assumption that the last mandatory parameter and all the variadic parameters reside continuously in the memory, as well as lots of terrible coding practices, but it should be illustrative enough.

What I'm doing is basically copying a fixed number of bytes from the memory region (=stack) where the variadic parameters are located into a buffer, then passing this buffer to another variadic function which is called with a fixed number of arguments (I picked doubles since they are larger and will allocate more space on the called function's stack). The function then calls memcpy to overwrite its variadic args section with the passed buffer, and afterwards can call the stdarg macros to obtain the variadic arguments, or pass them to something like vprintf.

Here's the code:

#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

void accepter(char *fmt, char *ptr, ...);

void forwarder(char *fmt, ...);

void forwarder(char *fmt, ...)

{

double d = 9.1;

char *buf = (char*) malloc(10*sizeof(double));

memcpy(buf, (void*)((unsigned int)(&fmt)+sizeof(char*)), 80);

FILE *o = fopen("tmp", "wb");

fwrite(buf, 10, sizeof(double), o);

fclose(o);

// call function with 10 double arguments to open up stack space

accepter(fmt, buf, d, d, d, d, d, d, d, d, d, d);

free(buf);

}

void accepter(char *fmt, char *ptr, ...)

{

memcpy((void*)((unsigned int)(&ptr)+sizeof(char*)), ptr, 10*sizeof(double));

va_list ap;

va_start(ap, ptr);

vprintf(fmt, ap);

va_end(ap);

}

int main()

{

printf("Testing...\n");

double d = 65.98;

forwarder("%d %d %d ermm %s and more params! %x %f %x %x \n", 1, 2, 3, "hello world", 199, d , 0xdeadbeef, 0xbeefdead);

return 0;

}

Weekly Report #9

Weekly Report #9
Submitted on 2010-07-19
Covers 2010-07-12 to 2010-07-19

Status and Accomplishments
• support for returning structures by the addition of a special return type signature ('#')
• address translations are now exposed through RPC, so the user is able to perform translations manually
• build system modifications - DSP and GPP-side stubs are now collected from inside two directories instead of two single files
• dsp-rpc-posix branch updated to contain the latest changes in the C6Run trunk
• manual marshalling for stdio.h variadics (printf, scanf, fprintf, sprintf...)

Plans and Tasks
• implement scripted stub generation (ie, the user will provide function declarations and the corresponding GPP and DSP-side stubs will be generated by a script)
• complete architectural documentation
• expand library of RPC examples to illustrate use-cases with different function signatures

Risks, issues, blockers
• since there is no general way of extracting the number and size of parameters in variadic functions (eg, one function may only specify the number of args as the first fixed parameter and accept only integer args, while another may take a zero-argument as a terminator of a number of float args, and yet another may use printf-style encoding), we can't implement a generic marshaller for variadics. for this reason, support for user-defined variadics is left out for now, although users writing their own variadic functions isn't very common and this is not expected to become an issue.

Monday, July 12, 2010

Weekly Report #8

Weekly Report #8
Submitted on 2010-07-12
Covers 2010-07-05 to 2010-07-12

Status and Accomplishments
• lots of work went to solving the long-standing buffer/pointer parameters and return types issue, and we finally have a reasonably well-working system in place. all address translation / memory space mapping is now done automatically, provided that any pointers to be passed are pointing to memory allocated using the RPC allocator (CMEM based).
• three types of pointer return types are identified and supported: no-translation pointers (such as FILE* which aren't meant to be dereferenced at all), direct-translation pointers (such as strspn whose return value points to somewhere within the passed parameter) and manual-copy pointers (such as ctime when the function allocates memory with some other method and passes that pointer, size information needs to be specified in the GPP stub)
• most C I/O stubs are now complete (the ones remaining are variadics and infeasible things like threads)

Plans and Tasks
• find a way for marshalling variadic function calls (will probably involve manual work) and complete the remaining stubs
• test completed stubs with existing code
• offer more flexibility for buffer/pointer parameters by providing the ability to do address translations manually and maybe detecting non-shared buffers in the DSP-side stubs then syncing them automatically with some shared ones
• support multiple stub input files instead of placing them only in rpc_stubs_gpp.c and rpc_stubs_dsp.c

Risks, issues, blockers
• variadic functions such as printf and scanf are problematic for marshalling - since there is no well-defined way of knowing how many parameters of which size there is, it's likely that the user will have to provide the parameter packing for these manually
• even when the stub generator tool is in place, it's likely that the user will have to provide some manual guidance for certain cases in stub generation such as identifying buffer/pointer parameters which don't need address translation, return types which need manual copying into a shared buffer and variadic functions

Monday, July 5, 2010

So what happened to the variadic marshaller?

As you may recall, I had a variadic marshaller I had been using for the RPC layer for some time now, which recently had to be dropped since it was causing trouble with passing float parameters. I wanted to talk about here a little bit since it's not really a specific problem concerning DSP or marshallers, but rather how certain arguments are passed to variadic functions.

Let's start by talking about what a variadic function is, since you may or may not have heard the term. A variadic function is a function that can take a varying number of arguments. There usually are a few "required" arguments but the sky (or rather, the bottom of the stack :)) is the limit. Sounds familiar? Yes indeed, probably the two most famous functions in the C library are variadic: printf and scanf.

Variadics in C are easily identifiable by their declarations, which looks something like this:

int summation_function(int count, ...)

notice the ellipsis ( ... ) - this is the notation used to state that there will be an arbitrary number of arguments here.

And for quick reference - accessing the variadic arguments is done via stdarg.h macros, for example:

va_list arg_list;
va_start(arg_list, count);
for(int i=0; i < count; i++)
{
sum += va_arg(int);
}
va_end(arg_list);

whose detailed usage descriptions you can find easily by Googling.

Moving back to the problem I had with the variadic marshaller: I was passing regular 4-byte floats as arguments to be marshalled, but somehow the 4-byte region corresponding in the marshalled buffer always showed up to be corrupted somehow.

The issue was eventually revealed to be about the default argument promotions that are applied to variadics, as described in:

http://www.gnu.org/s/libc/manual/html_node/Calling-Variadics.html

therefore, any short int or char arguments passed are automatically promoted to int's, and all float's are casted into double's - thus having a 8 byte representation whose first 4 bytes don't really mean all that much :)

My initial idea was to simply cast the obtained double parameter back into a float before marshalling it into the buffer, but unfortunately this leads to loss of precision during double->float conversion. Therefore, I've decided to switch to using macros and doing the parameter marshalling directly inside the stubs. Comparing what the DSP-side stubs look like in the old vs. new methods of marshalling:

void rpc_mixedprint(int a, char b, float c, double d, short e, int f)

{

rpc_marshal("rpc_mixedprint", "vicfdsi",a,b,c,d,e,f);

rpc_perform();

}

versus

void rpc_mixedprint(int a, char b, float c, double d, short e, int f)
{
   RPC_INIT("rpc_mixedprint", "vicfdsi");
   RPC_PACK(int, &a);
   RPC_PACK(char, &b);
   RPC_PACK(float, &c);
   RPC_PACK(double, &d);
   RPC_PACK(short, &e);
   RPC_PACK(int, &f);
   RPC_PERFORM();
}

I'm aware it doesn't look quite as elegant as the variadic marshaller did... but in terms of ease of stub generation, it's not all that different, and there's no loss of data precision involved. And the GPP-side stub uses macros to extract the parameters as well, once the unmarshaller unpacks the buffer into the void** param_buffer:

void rpc_mixedprint(void **param_buffer, void *result_buffer)
{
   mixedprint(RPC_CAST_PARAM(param_buffer[0], int),
   RPC_CAST_PARAM(param_buffer[1], char),
   RPC_CAST_PARAM(param_buffer[2], float),
   RPC_CAST_PARAM(param_buffer[3], double),
   RPC_CAST_PARAM(param_buffer[4], short),
   RPC_CAST_PARAM(param_buffer[5], int)
   );
}

GSoC Weekly Report #7

Weekly Report #7
Submitted on 2010-07-05
Covers 2010-06-28 to 2010-07-05
Corresponding Draft Schedule Item:
Create the RPC framework that can call functions on the GPP side from the DSP and return values back to the DSP. Implement and unit-test the POSIX function wrappers according to the planned order.

Status and Accomplishments

the RPC framework was tested with many different kinds and combinations of non-buffer parameters, a few bugs unearthed and the issue with floats issued as variadic parameters led to the deprecation of the variadic marshaller. DSP-side stubs use macros to do the marshalling now. not as elegant but works far better.
the GPP-side RPC stubs dynamic link library is now embedded directly inside the resulting executable for cleaner deployment (ie, the user doesn't have to copy the library manually to the Beagle)
rpc_malloc and rpc_free handlers using CMEM for alloc/free and address translation added into the GPP server, so basic buffer/pointer parameters support will soon be in place
implemented a simple version of the ARM function caller, but doesn't work with double arguments or 4+ params of any kind. the macro-based parameter passing method works fine for now, though, and I intend to keep it for a while longer.
all POSIX/C lib stubs for functions that don't take any buffer parameters are now in place (there isn't that many, though :))
first usable version of dsp-rpc-posix committed to the repository

Plans and Tasks

more testing and support for buffer parameters - there's still questions here, see issues section
write stubs for the POSIX functions with buffer parameters/return types
review the build system (how the user provides his/her own sources for use in RPC) and make improvements where necessary, see issues section

Risks, issues, blockers

the user needs to be able to provide source code and declarations for functions he/she intends to use with RPC - how should this be ideally done? keep them in a pre-determined directory and have the user add them there (easy but not very flexible)? pass them to the tool as command line parameters prefixed with something special?
the GPP and DSP-side stubs for custom functions need to be provided manually for now. although it's relatively straightforward to do by hand, it's very very suitable for automation and I intend to have a stub generator script for this. I'd prefer not to have to write a C parser though, anything available I can use for this?
the situation with buffer parameters issue is as follows at the moment: any buffer parameters the user wants to pass on via RPC *must* be allocated via the rpc_malloc call. this call is mapped to GPP-side CMEM allocation and the GPP server performs virtual-to-physical address translation before passing the buffer address to the DSP, so the DSP can directly work on this buffer. but there isn't any direct physical->virtual address translation available on the GPP side - what's the best way to deal with this? currently the C6Run allocator saves 16 bytes of extra info, including the virtual base address, along with the allocated buffer, and this is how we get rpc_free to work. but there'll be problems if the user doesn't directly pass the allocated buffer, but just a part of it - how will we find the virtual address then? even if we can find the virtual address for any given physical address...we can only do address translation if we're aware if it's required. what if the user puts a physical address somewhere inside a struct, or even inside another buffer (say, a void** parameter) ?
one solution to this could be adding the allocated virtual addresses to the DSP MMU TLB and have the DSP work directly with virtual addresses. but there's only 31 slots available in the table :( EDIT this is not a solution at all, the DSP MMU doesn't do any address translation at all (virtual = physical)! it's just for protection (preventing DSP from accessing places it's not supposed to)
another solution could be making this the user's problem: providing virtual addresses from rpc_malloc and making the user do virtual->physical translation on their own (via another RPC function, of course :)) if it's needed.

Saturday, July 3, 2010

DSP-RPC-POSIX initial commit is in place!

I've made the initial commit for dsp->gpp RPC calls (the development had been on my personal SVN repo so far, since I didn't feel it was ready to see daylight :P)

It's nothing spectacular yet (e.g no buffer/pointer parameters or return values allowed, so only ctype and math c library calls) and you probably won't think very highly of the coding style (or the way I do things in the makefiles...) either, but it still works :)

The SVN URL is:

http://gforge.ti.com/svn/dspeasy/branches/dsp-rpc-posix

There's a readme file under the top-level rpc/ directory which should be sufficient to get started.

possible makefile/build issue: the version of LPM I was using (local_power_manager_linux_1_24_02_09) has lpm.av5T instead of lpm_linux.av5T (as was stated in the original C6Run makefiles) so I've changed that. just find/replace it the other way around in build/gpp_libs/modules/Makefile if it complains about that.

Some notes:

I had to put away my variadic marshaller since it was causing issues with floats (why? blog post coming soon!) - all marshalling is done inside the stubs with macros. Takes more lines but it's probably a better idea in the longer run.
The main points of interest in terms of code would be: the files under the rpc/ directory (dsp and gpp side stubs, some additional dsp side functions), rpc_server.c and .h under build/gpp_libs (gpp unmarshalling, symbol location and execution), cio_ipc.c (gpp RPC server, inside the C6Run C I/O server) c6run.c under build/gpp_libs (extracting the embedded GPP stubs library and setting up RPC buffers).
I'm using the C6Run C I/O transport system to pass RPC buffers back and forth, for now.
There's some additional parts in c6run-cc to handle the --rpc switch (add the dsp stub file to the sources list, compile the gpp stubs into a dynamic link library and embed it inside the final executable)
Custom stubs have to be hand-coded, there's no stub generator tool yet

Support for buffer parameters coming soon! (actually, if you write a GPP side stub that does allocation from CMEM or POOL, use this RPC call for allocating memory on the DSP and pass data in this buffer,
everything should work).

Comments, criticism and suggestions are very, very welcome!

Monday, June 28, 2010

GSoC Weekly Report #6

Weekly Report #6
Submitted on 2010-06-28
Covers 2010-06-21 to 2010-06-28
Corresponding Draft Schedule Item:
Create the RPC framework that can call functions on the GPP side from the DSP and return values back to the DSP. Implement and unit-test the POSIX function wrappers according to the planned order.

Status and Accomplishments
DSP->GPP RPC code is now integrated into the C6Run sources and works fine alongside with the regular C6RunApp C I/O calls. The RPC layer is compiled/linked together with the user-provided sources for now as putting them into the C6Run library caused some strange problems (see issues section).
the GPP side RPC stubs are now compiled into a dynamically linked library (.so) and the GPP side server uses dlfcn functions (dlopen, dlsym..) to locate the function with the given name and invoke it. the stubs still pull the parameters from the buffer on their own - assembly code for automating this is still in the works

Plans and Tasks
finalize the GPP-side assembly function caller so that GPP side stubs don't need to be provided at all
experiment with different combinations of parameters, including doubles/long integers and shared-mem allocated buffers to make sure everything is working correctly with the marshaller, DSP/Link transfers, unmarshaller and GPP-side server
finalize the DSP-side stubs for the POSIX functions which don't take/return any buffers as parameters

Risks, issues, blockers
One strange thing that occured: if I compile the RPC marshalling function together with the user sources (thus having the RPC layer in the user code, since readmsg/writemsg is accessible from there) given to c6runapp-cc everything works fine, but when I try to include the functions in the DSP-side C6Run library, something goes wrong. The function is a variadic function whose definition looks like this:

void rpc_marshal(char *function_name, char *function_signature, ...)

When I compile the DSP side libs with this function inside, the parameters after (and including) the second parameter are not passed correctly. I've observed that even the address of the formal parameter goes awry. To illustrate, let's say I put a printf call inside this function (thanks to C6Run CIO :)):

printf("Stack addresses: %x %x values: %s %s", &function_name, &function_signature, function_name, function_signature);

If I compile and use this function alongside with the user-provided sources to c6runapp-cc, everything works correctly, so I get output that looks like this (I've made up the stack addresses but they were similar to these):

printf("Stack addresses: 0x80000800 0x8000804 values: rpc_function iii@");

but if used from inside the library:

printf("Stack addresses: 0x80000800 0x80000854 values: rpc_function ");

Not sure why this is happening - linker settings? a problem with passing variadic parameters? (since if I remove the ... at the end the two parameters get passed correctly, but of course then the others don't). Since it works fine if provided alongside user sources I decided to stick with that for now.

Monday, June 21, 2010

GSoC Weekly Report #5

Weekly Report #5
Submitted on 2010-06-21
Covers 2010-06-14 to 2010-06-21
Corresponding Draft Schedule Item:
Create the RPC framework that can call functions on the GPP side from the DSP and return values back to the DSP. Implement and unit-test the POSIX function wrappers according to the planned order.

Status and Accomplishments

we now have a marshaller which takes a function name, signature and a variable number of arguments and packs them neatly into a buffer to be transferred via DSP/Link (as well as the corresponding unmarshaller :))
all DSP-side stubs are reduced to three lines: rpc_marshal, rpc_invoke and return result, quite suitable for automation
looked into shared memory issues some more without any fruitful effort; for now we'll enforce the user to pass buffer parameters allocated from shared areas only, and develop a better method later

Plans and Tasks

unfortunately I managed to break C6RunApp's existing RTS I/O while trying to integrate the RPC server, I'll work on fixing this - it'll be nice to have both working, see issues section regarding possible shift of focus to general RPC instead of POSIX
the static jump table design used for resolving rpc calls on the GPP right now is awful, write an assembly function launcher similar to the one in C6RunLib

Risks, issues, blockers

shared memory issues continue, but for now we can overlook them since we can force the user to pass only buffers allocated from shared areas (POOL or CMEM). PROC address translation is there but we have to avoid segfaults / protection issues, this'll need a bit more time
as I worked more and more with C6RunApp and the existing RTS C library implementation, I've realized that the available functionality is quite sufficient for prototyping purposes. so instead of replacing this with RPC POSIX calls, we can have both methods and leave the choice to the user. it also may be appropriate to shift the focus of the project on RPC issues, providing a library of examples which can be built up to useful code, or focusing on other ease-of-development issues.

Friday, June 18, 2010

A brief overview of RPC over multiple cores

RPC (which stands for Remote Procedure Call) is defined as follows by Wikipedia:

Remote procedure call (RPC) is an Inter-process communication technology that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction (...)

Within our context of heterogenous OMAP3 multi-cores, it doesn't exactly refer to the same thing as it does in general, but still, there are enough familiarities. For example, the DSP and GPP use the same memory (ie, the DDR RAM on the Beagle) as main memory, but this doesn't make their address spaces the same - to start with, the DSP works with physical addresses whereas the GPP works with logical ones. And the binaries are obviously not compatible since we have two different processors and architectures (hence the "heterogenous" in "heterogenous multicore processing"). If we want the GPP and DSP to co-operate towards the same goal (especially in situations like video decoding which we'd really appreciate the DSPs help), it would be useful to get them to talk to each other without much effort. The DSP should be able to easily obtain the data it needs to work with (which usually resides in the GPP file system or program memory), process it and hand it back to the GPP.

While all of this is quite feasible as of the moment, doing something like this is far from trivial (set up OE, build DSP/Link, examine the examples, learn the DSP/Link API, set up the GPP side structures for bringing up the DSP, try to get the DSP things working without being able to see what you're doing unless you have CCS or are using C6RunApp ;) and so on...). And DSP side code is C code. TI's CGT6000 is a C compiler. You could essentially write the same things and compile/run on either side. Why all this hassle? Why not be able to pass parameters and call functions from one side to the other? Why not live in harmony and peace? Then RPC is the answer! (maybe except that last one).

Scrolling down the Wikipedia article, we see the steps involved in making RPC work; and it's rather trivial to see these from a DSP to GPP RPC point of view. Let's say we have this function int gpp_side_function(int param1, float param2, char *param3) somewhere in the GPP side, which we want to call from the DSP.

The client (DSP in our case) calls the Client stub, which has the same definition. The call is a local procedure call, with parameters pushed on to the stack in the normal way.
The client stub packs the parameters into a message (done according to some predefined structure, e.g 2 bytes function name length, n bytes function name, then 2 bytes parameter count, then each of the parameters with their lengths, etc.) and making a system call to send the message. In our case, sending the message is putting a message on the DSP->GPP MSGQ. Packing the parameters is called marshaling.
The server (the GPP side in our case) receives the message, unpacks (unmarshals) it and locates the corresponding local function call stub.
Finally, the server stub calls the desired procedure. The reply traces the same in other direction.

Similarly, one could to GPP->DSP RPC calls - a component of the C6Run project called C6RunLib is meant for this purpose.

There are certain implementation details that need special attention in our case:

Passing Pointers as RPC Parameters

Since the addressing mechanism of the GPP and DSP are different, we need to make sure that any pointer parameters / buffers which are passed are accessible from both sides. When doing GPP->DSP calls, one can use the CMEM module to obtain a shared region of memory and translate the addresses forth and back to ensure mutual accessibility, but there is no CMEM interface on the DSP side, thus one must resort to workarounds. If the size of the buffer parameters is always available, one can pass the contents of the buffers themselves back and forth via DSP/Link. One approach could be using the DSP/Link POOL module which allocates shared buffers and provides address translation, though this will not be suitable for large amounts of memory usage since there is a limited number of constant-sized POOL buffers. Another could be using a special protocol to access the CMEM interface on the GPP side and doing all the allocation from there, but both in this case and the POOL buffer case we have to ensure the passed pointer parameters are allocated with our special method and not from the DSP stack region, otherwise we'll be in trouble.

Openness to Expansion

Having a RPC framework that can only call 5 (or 50, or 100) predefined functions on the other side is not very useful, one wants to be able to define and call one's own functions. And to be able to do this, the framework must not have too many hardcoded details such as when to do the address translations mentioned above. To address this, I'm using a function signature system in my implementation to be able to identify parameter types and where special attention is needed. For example, for our above function

int gpp_side_function(int param1, float param2, char *param3)

the corresponding signature would be

iif@

where the first symbol identifies the return type as an integer, and the other three identify parameter types. The final @ indicates that some manner of address translation (or passing the buffer back/forth, if that's the preferred approach) is needed: the marshaler may take care of the translation before passing the message, or the server stub may do the translation itself, depending on how the protocol is defined.

Also, I've designed the marshaler itself to be as user-friendly as possible - let's say you want to provide the DSP side function stub for the above function. All you have to do is pass the function name, function signature (which can be extracted via a simple lookup table) and all the parameters to the marshaller, prompt the data transfer, and return the result with the desired data type. So the stub for the above function would look like this:

int gpp_side_function(int param1, float param2, char *param3)

{

rpc_marshal("gpp_side_function", "iif@", param1, param2, param3);

rpc_makecall();

return RPC_GETRESULT(int);

}

The process is quite suitable for automation, and I'm planning to have a script that auto-generates the stubs given the definitions.

Finding and invoking the corresponding GPP stub

It's easier said than done - so you have a string giving you the GPP-side function name and a bunch of parameters. How do you locate the corresponding function, and how do you actually make the function call? So far, I've been thinking of these approaches:

Use a static jump table. An intuitive albeit not-so-elegant solution. Associate each function name with a number (perhaps via hashing?), and use a function pointer table to jump to the stub. The stub has to take care of the parameter demarshaling and passing the parameters. A static approach, so one has to add the table entry as well as the GPP-side stubs.
Use an assembly function caller. Once we are able to decode the function address (perhaps statically as in 1, perhaps via dynamic loading with libdl functions) we can have an assembly routine which pushes the parameters onto the stack and call the function living at the located address. This is what C6RunLib uses, and will be the eventually preferred method.

Monday, June 14, 2010

GSoC Weekly Report #4

Weekly Report #4

Submitted on 2010-06-14

Covers 2010-06-07 to 2010-06-14

Corresponding Draft Schedule Item:

Familiarize with DSP/BIOS Link and its various modules. Experiment with GPP-DSP communication and compilation processes to identify potential issues and useful features. Review existing RPC protocols and create one suitable for DSP-GPP communication over DSP/BIOS Link.

Status and Accomplishments

My exchange period in Sweden ended and I had to move back to Turkey, so couldn't really get as much work done as I'd like to, but now I have access my evil scientist laboratory headquarters :) I'll be based here for the rest of the summer.
I now have an RPC implementation for DSP->GPP calls, composed of four parts: DSP-side function stubs, DSP-side marshaller/unmarshaller and transport functions, GPP side marshaller/unmarshaller and transport functions, and GPP side stubs/invokers. No dynamic loading on the GPP side, the invocation is a simple static jump table and there's no support for pointer parameters but it works fine otherwise.
had some time to play around with kernel module loading. didn't really discover anything groundbreaking, but noticed that C6Run doesn't work with the *.ko's that came with my Ångström. probably version differences which won't be an issue if one builds C6Run libraries and the distro itself using OpenEmbedded, but in the other case we could have a config file on the Beagle which points to the appropriate module filenames, and load these ones at startup instead.

Plans and Tasks

right now the stubs are doing most of the work for marshalling, this will make it difficult to add new RPC functions in the future. have a marshaller that can pack variable number of parameters into the data, I've already have had some success with this. I'm planning to have a string "function signature" for each stub, and feed this along with all the parameters to the marshaller. bad idea? should each stub keep doing its own packing to increase efficiency?
passing pointer parameters is still a big issue due to memory sharing, we have to make sure that any direct memory pointers are mutually accessible after address translation, look into this.
finalize and commit my initial group of DSP-side wrappers with integer parameters and return types (mostly is*() functions from ctype.h)

Risks, issues, blockers

Memory issues still here: C6RunLib can utilize the CMEM interface to get DSP-GPP shareable contiguous memory regions, but there is no CMEM on the DSP side - what'll be the cure for this? force allocating everything from POOL buffers if they are to be passed as RPC parameters? set the DSP linker parameters to allocate the heap from a pre-determined shared region and do the address translation manually? copy and back-copy the DSP-side buffers manually into/out of the message buffers during marshalling and unmarshalling (but how do we know the size of a void* buffer, for example?)?

Wednesday, June 9, 2010

C6RunApp and DSP-RPC-POSIX

Now we've looked into how C6RunApp works in general, it's time to bring DSP-RPC-POSIX into the picture. Although the project builds largely on top of C6RunApp in general, there are two keywords in the name which hint into a different direction:

RPC: the project aims to bring a fully functional RPC (Remote Procedure Call) framework into the picture in the sense that GPP-side procedures are remote procedures when viewed from the DSP side. In other words, the DSP will be able to call any GPP-side function using this component. Note that there already exists a limited form of RPC in the existing C6RunApp structure, but only for the file system access calls.
POSIX: using the RPC component, the GPP-side POSIX library will be made accessible from the DSP side.

At this point, the question "but the RTS lib already provides POSIX functionality! why's a RPC POSIX layer necessary?" may come up. These were my primary reasons for going in this direction while writing my project proposal:

being able to offer the already stable GPP-side POSIX libraries with no extra effort required
have greater flexibility for expansion since it can eventually be used to call any GPP-side function, such as writing to the frame buffer or user-defined functions.
although the DSP is an indeed powerful processor it is not meant for general computing, so it's not practical for, for example, string processing while formatting printf strings - this is a task better done by the GPP

Right now, I'm still experimenting with RPC-related tasks such as packing ("marshalling") and extracting ("unmarshalling") a variable number of arguments into/out of messages and dynamic loading, but the first wrappers should be appearing quite soon :)

C6RunApp

In my blog post entitled (rather appropriately, in my opinion :)) Moment of Truth #1, I had briefly mentioned what C6RunApp allows you to do - you write C code in the conventional way you would for the ARM (including things like debug outputs with printf or data input with scanf), and then use the C6RunApp script to get an ARM executable which actually runs everything you wrote on the DSP (except things that require access to the ARM side itself, but we'll get to that in a moment).

First, let's make a mention of the DSP RTS library as it's a relevant component on which C6RunApp itself builds on. The TI CGT (which stands for Code Generation Tools and is essentially the DSP side compiler, linker and other binary utilities) C6000 contains a library of standard C functions which is aptly called the Run Time Support (RTS) library. As you would expect from such, it contains implementations of regularly used C lib functions such as printf and scanf. But of course, the problem is not writing C lib function implementations on the DSP (which is quite a capable processor) - it's things that actually require GPP side capabilities such as file system access (which in turn can also provide console input and output). The implementations of these functions in the RTS use a number of base lower-level functions (such as open, close, read and write) to carry out the needed GPP side communication, which can be user-defined. For example, if one is using the CCS (Code Composer Studio) for the development process, CCS has the driver which provides the communication between the host which is running the development environment and the DSP, so that you can see the terminal output and access local files.

Of course, this is not a very convenient scenario as we are dependent on the CCS for file system access. Why not just the GPP side OS (that is, the Ångström distribution) instead? This is one of the underlying ideas for C6RunApp: we have a host application on the GPP side which recieves requests over DSP/Link, performs the necessary file system calls, and passes back the results over to the DSP again. Another idea is that the GPP host app takes care of setting up the DSP/Link and loads the DSP with the DSP-side executable without any effort from the user. Combining these two ideas that provide us with "verbose" DSP side programs and abstract away the details of DSP/Link, we get easier DSP side development - we get C6RunApp!

C6RunApp Workflow

So what happens when you want to compile hello_world.c using the C6RunApp cross-compiler script? Let's have a look here first, and then we'll examine the involved libraries in some more detail. This is what the C6RunApp readme file has to say on the subject (with some small clarifications from me):

The DSP tools are used to build the supplied source and link it against the prebuilt C6RunApp DSP-side library. The result is the complete DSP-side executable image in standard TI COFF format.
The DSP executable image is minimized in size by using the symbol stripping tool, strip6x.
The contents of the stripped DSP executable file are converted to a C byte array in a temporary C header file. This header file is referenced by the main GPP-side C6RunApp loader, and thus the DSP executable image will be embedded into the final resulting GPP executable.
The main C6RunApp loader program is built using the ARM cross compiler tools, including the DSP-side executable inside of the binary ARM ELF executable. This ARM executable is the same name as specified on the command line of the C6RunApp cross-compiler script.
Once the GPP executable is ran, it sets up the DSP/Link, loads the DSP with the in-built DSP executable and initializes needed communication channels (namely, constructing the GPP->DSP message queue and locating the DSP->GPP message queue).
The GPP executable waits for the DSP to send it file system call requests, performs the requested ones and sends back the results, until it receives a signal that it can terminate.
Teardown is performed on the DSP/Link setup and the DSP is cleanly shut down.

C6RunApp Components

Let's have a look at the pieces of which C6RunApp consists:

The DSP side library, C6RunApp_dsp.lib - the library which the user-provided code is linked against, contains entry and exit points for the DSP/BIOS, initializes the communication channels and starts running the user-defined main(). It provides implementations of writemsg and readmsg which the DSP RTS lib bases the low-level communications on. These implementations pass the requested function call to the GPP via the message queue and read back the result in the same manner.
The GPP side library, C6RunApp_gpp.lib - the library that contains the functions which serve the DSP's C I/O requests.
The GPP main object, C6RunApp_load.o - once the DSP side executable is created and turned into a header file, the binary object C6RunApp_load.o is linked against the GPP side library to create the final GPP executable
The kernel modules - not really components so much as dependencies. C6RunApp utilizes CMEM for the initial loading of the DSP executable, the LPM to do a clean shutdown of the DSP as is needed on OMAP3530s, and the DSP/Link module for some obscure purpose :)

Hopefully this will have provided some insight into how the magic of C6RunApp works - coming up next (but sooner this time!) is where my GSoC project DSP-RPC-POSIX fits in with all of this.

Monday, June 7, 2010

GSoC Weekly Report #3

Weekly Report #3

Submitted on 2010-06-07

Covers 2010-06-01 to 2010-06-07

Corresponding Draft Schedule Item:

Status and Accomplishments

more experimentation with DSP/Link APIs and general reading especially for the DSP side arch — I feel confident and comfortable enough to seriously start working
studied the C6RunLib source code for inspiration on RPC calls and got some — but I still have doubts on implementing the exact same system, see issues section below
experimented with some GPP-side tasks which can be of use for the RPC framework such as passing a variable number of parameters to functions and methods for dynamic and static linking to dynamic link libraries
did some basic RPC tests using the existing C6RunApp writemsg/readmsg architecture, all went well (nothing fancy — was mainly for the purpose of testing my understanding of how things work)
looked for existing statistics and similar documentation on which POSIX functions are the most commonly used without any luck...decided to follow a C standard library reference documentation and start with the simpler functions, see plans and tasks section below

Plans and Tasks

finalize and make a working draft of the RPC system — possibly without dynamic loading on the GPP side at this stage, which can be added later on
finalize blog post on C6RunApp architecture and how DSP-RPC-POSIX fits in — hoping that increasing awareness will stir more interest
write the first wrapper(s) for the DSP side, starting with noncomplex (in terms of having a constant number of basic non-pointer parameters) functions (e.g putchar())
experiment a little with possibilities of user-friendly features such as checking and auto loading of kernel modules at program startup and catching Ctrl-C signals to perform DSP side cleanup before exiting

Risks, issues, blockers

I want the RPC system to eventually work with any user-defined GPP side function (not just C I/O) without much hassle, and I believe dynamic library loading will be necessary for this (ie, given the function name, parameter type list and library name, be able to locate the function and call it on the GPP side). I realize this can be done by preprocessing the source code and adding the function defs before compiling but I'm curious if it can be done at runtime as well. C6RunLib uses this preprocessing method (via perl scripts) but in that case we provide and therefore have access to source code of all the possible RPC fxns anyway
Memory issues continued: Is it a really good idea to do RPC GPP calls for functions that take pointers as parameters and modify the pointer data, like malloc() and memset() ? They're already present and working in the DSP RTS libs, so is it necessary? If so, we could do address translation before calling the actual GPP side function, can we ensure that the memory is mutually accessible by the processors by doing all allocation from POOL buffers? The issue eventually extends to any function that takes a pointer parameter.. I'd appreciate some broader-perspective views on this subject.

Wednesday, June 2, 2010

A bit more about DSP/Link

My original plan for this particular blog post was to do a write-up on what the OMAP3530 multicore architecture looks like, where the DSP sits in this picture, how DSP/Link comes into play and what it can do for us. But then again, I noticed the document which I learned most of these thing from is an excellent source of information and doesn't really need a re-write, so I'm just going to offer my salutations to the ETH PIXHAWK MAV project and invite you to read their excellent DSP/Link API guide :)

Coming up soon: how C6RunApp works, and how my project DSP-RPC-POSIX will fit in it to take things a bit further.

Monday, May 31, 2010

GSoC Weekly Report #2

Weekly Report #2

Submitted on 2010-05-31

Covers 2010-05-23 to 2010-05-31

Corresponding Draft Schedule Item:

Status and Accomplishments

initially built DSP/Link module and examples using OpenEmbedded
modified DSP/Link examples and built them using the ti-devshell script with the OE toolchain
successfully compiled and tried out C6RunApp on the BeagleBoard, using both OE and CodeSourcery toolchains
created the dsp-rpc-posix branch at http://gforge.ti.com/svn/dspeasy/C6RunApp/branches/dsp-rpc-posix (no commits yet, though)
did some more studying of the C6RunApp source code, better understanding when one sees it in action

Plans and Tasks

more experimentation with DSP<->GPP communication and especially with DSP side code
benchmark some DSP/Link on the Beagle, mainly NOTIFY and MSGQ, may be useful to have numbers around
study how C6RunLib does RPC calls from GPP to DSP to get some inspiration for implementing the reverse
come up with a finalized list for the order in which the POSIX wrappers will be written

Risks, issues, blockers

probably not an issue but worth a mention: the speed at which DSP/Link conveys messages and notifications may have a performance impact, but this isn't expected to be a major issue since the primary goal is easy prototyping on the DSP anyway
how best to determine which C standard library functions are used the most often / are the most useful? (aside from personal experience) statistics? polls?
the original project definition contains a few too many references to POSIX - Daniel Allred pointed out that there are many fine details and many, many functions in the standard itself. full POSIX compatibility could be an eventual goal (but why full compliance is such a good idea is open to discussion) but it looks wiser and more productive to start and progress with commonly used / useful functions.
providing easier prototyping will lower the entry level and this may, in turn, mean there will be more inexperienced behavior regarding the dsp. some basic "baby-sitting" (perhaps checking/loading the needed kernel modules automatically at the app startup, and catching Ctrl-C signals so we can cleanup before the exit) may be necessary.
DSP works with physical addresses, while the GPP works with virtual ones. what if the user wants to print a string which is allocated from the DSP heap? the issue can be handled by copying buffers to POOL buffers and passing them in this manner (this is what C6RunApp currently does) but the issue itself is worth further investigation. is there a better approach? will POOL buffer copying be sufficient, or is there something it won't be able to handle correctly?

Thursday, May 27, 2010

Modifying the DSP/Link examples and utilizing OE toolchains outside OE

Now that we got OE to build the DSP/Link kernel module and examples for us, we'll probably want to do some tinkering with the sample applications to feel more comfortable with DSP/Link - one may just as well start from scratch, but it definitely won't be as easy as modifying the existing examples.

You should be able to find the dsplink source tree at

~/oe/setup-scripts/build/tmp-angstrom_2008_1/sysroots/beagleboard-angstrom-linux-gnueabi/usr/share/ti/ti-dsplink-tree/packages/dsplink

and under this directory, you'll find the two directories for the sample sources, namely gpp/src/samples for the GPP-side code of the samples, and dsp/src/samples for the DSP-side. The source codes found in these directories themselves is quite interesting to study to learn how DSP/Link and its components are used, and I've heard that actually tinkering with things allows you to learn much faster *wink wink*.

The thing is, once you have modified the sources, it's not very straightforward to re-compile them as they're part of the DSP/Link make system. Here one can utilize OpenEmbedded's environment to make this go a bit more smoothly.

The ti-devshell recipe generates a shell script which you can run to become bitbake. Well, maybe not become bitbake, but it'll export all the related environment variables for you once you run it, which will make life much easier. One thing before baking the actual recipe that you may want to do is, check the ti-devshell.bb recipe file and see if it contains this line:

PACKAGE_ARCH = "${MACHINE_ARCH}"

if it doesn't, you can just add it somewhere near the top - it'll be generating incorrect paths for the ti- thingies otherwise. Koen mentioned that this is needed due to some hack the ti- recipes are using.

After bitbake ti-devshell you'll end up with a shell script in the OE deploy/addons directory. Try running this script and typing in env to get an idea how much it does for you :)

The part that comes now was inspired by how the ti-dsplink.bb recipe does things - it's highly recommended that you examine the file yourself.

Next, go into the dsplink source directory by issuing cd $LINK_INSTALL_DIR. There's one particular environment variable that the script doesn't set, so export DSPLINK=$LINK_INSTALL_DIR/dsplink. Find the config script under config/bin in the dsplink directory, and execute this to configure the DSP/Link make system:

perl dsplinkcfg.pl --platform=OMAP3530 --nodsp=1 --dspcfg_0=OMAP3530SHMEM --dspos_0=DSPBIOS5XX --gppos=OMAPLSP --comps=ponslrmc

What was that bunch of parameters? You can run the script without any arguments and it'll describe each of them for you, along with the choices you have. Following the default settings (ie the ones I mention above) is highly recommended though.

Just two steps away from completion...there's some more bitbake-inspired environment variables to be set - beware that these represent my particular paths with the OE root located at ~/oe.

export TARGET_PREFIX=arm-angstrom-linux-gnueabi-

export TOOLCHAIN_PATH=~/oe/setup-scripts/build/tmp-angstrom_2008_1/cross/armv7a

export TARGET_SYS=arm-angstrom-linux-gnueabi

export STAGING_KERNEL_DIR=~/oe/setup-scripts/build/tmp-angstrom_2008_1/sysroots/beagleboard-angstrom-linux-gnueabi/kernel

Finally, go into the gpp/src/samples directory and...

make BASE_TOOLCHAIN="${TOOLCHAIN_PATH}" BASE_CGTOOLS="${BASE_TOOLCHAIN}/bin" OSINC_PLATFORM="${TOOLCHAIN_PATH}/lib/gcc/${TARGET_SYS}/$(${TARGET_PREFIX}gcc -dumpversion)/include" OSINC_TARGET="${BASE_TOOLCHAIN}/target/usr/include" CROSS_COMPILE="${TARGET_PREFIX}" CC="${TOOLCHAIN_PATH}/bin/${TARGET_PREFIX}gcc" LD="${TOOLCHAIN_PATH}/bin/${TARGET_PREFIX}gcc" AR="${TOOLCHAIN_PATH}/bin/${TARGET_PREFIX}ar" COMPILER="${TOOLCHAIN_PATH}/bin/${TARGET_PREFIX}gcc" LINKER="${TOOLCHAIN_PATH}/bin/${TARGET_PREFIX}gcc" ARCHIVER="${TOOLCHAIN_PATH}/bin/${TARGET_PREFIX}ar" KERNEL_DIR="${STAGING_KERNEL_DIR}" all

it's just ripped off from the OE dsp/link recipe, it should be just fine to copy-paste it into your terminal as it's pretty neutral with a lot of environment variables (which we already defined in the previous step). You can repeat the same make comand in dsp/src/samples to build the DSP-side code as well, all of which will end up in the gpp/BUILD and dsp/BUILD directories.

Moment of Truth #1: Hello World from the DSP with C6RunApp

O World, behold the C6RunApp saying Hello to Thee!

C6RunApp (formerly known as DSPEasy) is an awesome cross-compiling tool that I'll be basing my work on. What it allows you to do is basically this: you write your 5-line run-of-the-mill Hello World application in plain old C, then call the cross-compiling script on this file as you would call gcc to produce the hello_world_dsp executable you see above. The executable itself is (obviously) an ARM executable, but when you run it it magically extracts the compiled DSP binary from within itself, takes care of all the tedious work of setting up the communications via DSP/Link, and services some C I/O (such as printf) on request from the DSP. Now imagine this with the full capabilities of the standard C library we know and love, not to mention the eventual ability to connect to any ARM-side function. Wouldn't it be lovely to do DSP development like this?

Coming soon: Developing for the DSP, using the OE toolchain to build modified DSP/Link samples, setting up C6RunApp and much more!

Wednesday, May 26, 2010

Building DSP/Link with OpenEmbedded

I'll be shortly making a post about how the traditional DSP development cycle with DSP/Link works to further emphasize why C6Run and DSP-RPC-POSIX (that'd be my GSoC project's name :)) are needed. But before that, I though I'd talk a little about setting up DSP/Link -which is a vital component for doing DSP development on OMAP3530 in the preferred way- using OpenEmbedded.

Start off by building the ti-dsplink recipe:

bitbake ti-dsplink

(if, for some reason, you want only the kernel module for dsplink, you can use the ti-dsplink-module recipe instead - the ti-dsplink recipe contains both the module and the samples)

This should result in several package files in your OE deployment directory (something like ~/oe/setup-scripts/build/tmp-angstrom_2008_1/deploy/glibc/ipk), which can be transferred to your Beagle (by directly copying them to the SD, by networking, by nfs...the options are endless - I personally mount the beagle via sshfs on my host machine) and then installed with opkg install .

That's all there is to building and install DSP/Link, really! You can invoke the provided examples by cd'ing into /usr/share/ti/ti-dsplink-examples and then:

./ti-dsplink-examples-loadmodules.sh to load the kernel modules. Note that you should have your bootargs configured to leave several megabytes of memory to the DSP/Link module. You may get a warning on the newer Beagles with 256 megs of RAM even if your bootargs are configured correctly, since the script just checks if you have more than 126 megs allocated for the kernel, so no panic.
./ti-dsplink-examples-run.sh to run all the examples with some default parameters - you can examine the contents of this file to see how particular examples are invoked. Aside from the second parameter (which is the DSP executable name) and the last (which is the DSP ID), playing around with the parameters should be safe.
./ti-dsplink-examples-unloadmodules.sh to optionally unload the modules once you're done.

Monday, May 24, 2010

Getting a working gzip

I've previously mentioned that the current version of gzip that ships with Ubuntu 10.04 is broken, among some other distros. I've also mentioned a manual workaround with the gzip recovery tools, but for several reasons (such as not wanting to do the same damn sequence of actions, md5 and sha256 calculations and recipe updating while building things with OpenEmbedded!) one may want to fix those annoying crc error / unexpected end of file error's once and for all.

Download the gzip-1.4 package from here.
Extract into folder of your choice, I used ~/gzip-1.4
Open the terminal and cd into the extraction folder
Run ./configure
Run make
Run sudo make install

And now you have the gzip-1.4 binaries installed on your system, which should take care of the gzip crc errors.

GSoC Weekly Report #1

Here's my first weekly report on my GSoC work. My dear visitors will probably want to skip to parts 2 and 3 since the first part contains mostly my ramblings about OpenEmbedded, which you've already read.

I'll promise it'll be shorter and easier to read the next time - but well, here goes :)

1) Setting up OpenEmbedded

Since the GSoC coding begins officially on 24th of May, and also since
I'm required to be very familiar with the development environment if I
want to be able to make life easier for DSP developers who'll be
working in the same environment, I was suggested to concentrate mainly
on setting up the OpenEmbedded toolchain correctly and build some of
the DSP/Link-related recipes. My initial experience with the setup was
rather frustrating, and it's slightly ironic that this occured not
because of lack of information on the topic, but due to too much
outdated/irrelevant information which is spread around the net, on the
wikis, etc. I initially tried out manually building the stable/2009
branch and learned that this branch was way too outdated. I then tried
setting up the apt-get'able OE and building the dev branch, which I've
had some success with and made a blog post to talk about the issues I
ran into during the process, only to find out later by Koen's mail
that the apt-get'able OE is somehow broken by design and the Ångström
setup scripts are *the* preferred way of doing things. As of the time
of reporting, I've successfully set up my devenv using the setup
scripts, but I honestly think that the visibility of these scripts
needs to be improved in the community provided documentation.

2) Familiarization with DSP/Link and the existing C6Run code

I should first mention that I haven't been able to spend as much time
on this subject as I would have liked to, since I was practically
struggling with the OE setup during the whole week. I had already
compiled DSP/Link and the sample applications using the CodeSourcery
toolchain and gone through parts of the DSP/Link API documentation,
but I still had a long way to go towards being able to use this
knowledge practically, thus I spent some more time on reading
documentation and source code. Aside from the DSP/Link User Manual and
the provided samples, I found the tutorials and documentation on the
ETH PIXHAWK (Computer Vision on Micro Air Vehicles) very helpful.
After these I went through the C6RunApp (formerly DSPEasy) source code
and I now do feel that I have sufficient practical understanding of
the DSP/Link architecture in general and how C6RunApp works in
specific to be able to start experimenting with GPP-hosted DSP-side
calls. This week I'll be trying to get C6Run to build using the OE
toolchain and play around with the PROC, NOTIFY, MSGQ and RINGIO
DSP/Link modules on my own, including some benchmarking for the
various calls since Daniel Allred mentioned over IRC that GPP->DSP
MSGQ message passing can take as much as 0.5ms on the hawkboard and
using NOTIFY may be able to speed things up, although these numbers
are likely to differ on the Beagle.

3) Order of porting for the POSIX functions and how much work the DSP
should be doing on them

One of my goals in the preliminary period was to determine how much of
the POSIX calls should be executed on the DSP and also in which order
the porting should be carried out with respect to their internal
dependencies. Concluding from my previous talk with Daniel over IRC
and my own studies of the commonly used C library .h functions which
my opinion is that -possibly- aside from a few math functions that
involve some internal number crunching, all of the POSIX calls should
be *completely* hosted on the GPP side, ie the DSP should only send a
request with the function name and parameters without doing any
pre-processing. The reasons behind this would be:

* the DSP is not really suitable for most of the tasks involved in
the internals of POSIX functions, which usually don't involve heavy
mathemathical operations but things like string manipulation which the
GPP is much more suited and optimized for
* there isn't really any significant time to be gained by doing some
of the processing on the DSP since it appears as though the bottleneck
will be the speed at which messages are carried through DSP/Link
* this will result in cleaner and simpler to understand code since
the level at which processing is done on different processors doesn't
change from function to function
* in the case of any modifications to the GPP side code (thinking of
the RPC framework as in general GPP-side function calling, not just
for POSIX calls which are very unlikely to change :)), there will be
no extra effort involved in keeping the DSP-side code up-to-date

This also means that the internal dependencies of POSIX functions
won't have to be taken into consideration regarding the order of
porting - the POSIX side wrappers will be dependent only on the GPP
side host code which is already there anyway. Therefore, a pragmatic
approach such as porting the generally most frequently used functions
can be taken.

Thursday, May 20, 2010

OpenEmbedded on Ubuntu 10.04

NOTE: In light of the instructions received from Koen, the Angström head developer, you should avoid following the instructions below, since the deb-based OE is appearantly faulty by design and bitbake 1.10.x should be used instead of Ubuntu's one. Use these setup scripts to set-up your dev branch instead. I'm not deleting the post in case the same errors occur in some other way as well.

Now's the time to talk about...setting up OpenEmbedded on Ubuntu 10.04...and...building the Ångström distribution for the BeagleBoard! (I do realize I've created some suspense surrounding this particular subject with my previous posts, but despite the problems one may run into it's not that terrifying really :))

An advantage of setting up OE on Ubuntu (or any other .deb-based system, for that matter) is that one is able to use the apt-get'able OpenEmbedded. Following the instructions on the blog post (that is, get the repository in your sources list - sudo aptitude install openembedded-minimal - oe-setup-minimal org.openembedded.dev) the set-up part becomes a breeze - this will take care of all the necessary dependencies.

You may read from several sources that you absolutely shouldn't be using the bitbake that is in Ubuntu's own repositories (ie, things like DON'T sudo apt-get install bitbake) but from what I've seen, they've updated the repository and version 1.8.18 was directly available at the time of my writing this. You may want to check for yourself by executing bitbake --version once you have obtained the packages, anything over 1.8.16 should be fine.

Once you've set up the environment by executing oe-setup-minimal org.openembedded.dev (yes, using the dev branch is appearantly recommended, instead of the stable/2009 branch) you can make an attempt at trying out your new oven...oh, I mean, development environment, by trying out something smaller via giving this command:

MACHINE=beagleboard bitbake nano

While we're at it, we can take a short look at parts of this particular command. Not that it's that complicated but here goes anyway:

OpenEmbedded supports a lot of different target platforms, so we should specify the platform we'll be targeting. This is what the MACHINE=beagleboard part is for.
bitbake is the central make tool, used extensively in OpenEmbedded, and nano is the name of the "recipe" for the miniscule (but useful!) text editor. For a nice introduction to bitbake and its recipes system, you may want to read this.

Although nano itself isn't all that complicated, the build can take some time since the initial setting up of the toolchain and base libraries will also be done.

During the build process, you may run into several errors. I've documented the ones I've personally ran into and how I got over them.

Error #1: Extraction error while building expat

OE Message: ERROR: '/home/maltanar/oe/openembedded.org/recipes/expat/expat_2.0.1.bb' failed

Digging down a bit further reveals

NOTE: Unpacking ../../sources/expat-2.0.1.tar.gz to ../../tmp/minimal/org.openembedded.dev/work/armv7a-oe-linux-gnueabi/expat-2.0.1-r3/

gzip: stdin: invalid compressed data--crc error

This is a known bug in the current gzip which appearantly affects several distributions on i386 computers. The workaround I did is as follows:

Install the gzip recovery tools: sudo apt-get install gzrt

cd into the OE sources folder ($OE_ROOT/sources, should be ~/oe/sources if you used the deb setup)

Run gzrecover expat-2.0.1.tar.gz; mv expat-2.0.1.tar.recovered expat-2.0.1.tar; gzip expat-2.0.1.tar and say yes to the overwrite

Generate the md5 and sha256 sums of the new file by running md5sum expat-2.0.1.tar.gz; sha256sum expat-2.0.1.tar.gz and note them somewhere. Edit the expat-2.0.1.tar.gz.md5 file to reflect the new MD5 checksum.

cd into oe/openembedded.org/recipes/expat then edit the expat_2.0.1.bb file in your favorite text editor - change the MD5 and SHA256 checksums there to reflect the new ones, and save.

Done! You can run MACHINE=beagleboard bitbake nano once again.

I didn't run into any more problems while building nano, and it was eventually successfully built. And now it's the time to try the bit-baking oven on something bigger...an image of the actual Ångström distribution. Sounds fancy? Well, it starts off as simple as

MACHINE=beagleboard bitbake -v console-image

provided that you want to start out with something simple, that is, the barebones console image. Notice the extra -v passed on the command line - it can be helpful to have a verbose bitbake while building something major.

I've also kept track of what happened during the console-image build process:

Error #2: Trouble with glibc version while building gtk+

OE Message: NOTE: Task failed: /home/maltanar/oe/tmp/minimal/org.openembedded.dev/work/i686-linux/gtk+-native-2.20.0-r8.0/temp/log.do_configure.11847

In more detail:

| checking for BASE_DEPENDENCIES... configure: error: Package requirements (glib-2.0 >= 2.23.6 atk >= 1.29.2 pango >= 1.20 cairo >= 1.6) were not met:

| Requested 'glib-2.0 >= 2.23.6' but version of GLib is 2.22.1

For some reason, although many newer glib-2.0 recipes are available, OE chooses the older ones. To solve this, I went into the oe/openembedded.org/recipes/glib-2.0 and deleted all the recipe (*.bb) files with the -native suffix (for example, glib-2.0-native_2.2.3.bb), and re-started the build process.

I don't understand it completely but I think the old -native suffixed recipes are sort of left-overs, since OE seems to be using the BBCLASSEXTEND = "native" flag for this purpose now (the non-suffixed recipes all have this).

Error #3: Trouble with mtd-utils build

OE error message: NOTE: Task failed: /home/maltanar/oe/tmp/minimal/org.openembedded.dev/work/i686-linux/mtd-utils-native-1.3.1-r0/temp/log.do_compile.6960

crc32.o: could not read symbols: File in wrong format

Not sure why this happened either. I cd'd into the mtd-utils source directory (oe/tmp/minimal/org.openembedded.dev/work/i686-linux/mtd-utils-native-1.3.1-r0/git) and ran make clean followed by MACHINE=beagleboard bitbake mtd-utils which did complete successfully this time.

After these (and several hours later) the process was complete, and I ended up with the rootfs, uImage and u-boot images in the oe/tmp/minimal/org.openembedded.dev/deploy/images directory! The uImage was still causing a kernel panic for me upon boot with a proper USB cable, so one may want to use the uImage from the /boot folder from the last BeagleBoard demo image as I mentioned in my previous post. Or maybe use an USB cable which isn't working :)