A place to discuss the MulticoreBSP library and its applications, and for discussing the use of the Bulk Synchronous Parallel model on shared-memory architectures, or hybrid distributed/shared architectures.
I posted on this forum a while ago about the recursive function with a bunch of global variables. After we got our initial idea (with those globals) to work on Huygens, I tried to port it to MCBSP by passing all the parameters (as consts) to the recursive function instead. Weird things started to happen when I was done...
Sometimes the program would work just fine, but at other times it crashed, either with segfaults or after printing error messages from bsp_put() (run on 3 processors):
Error: bsp_put would go out of bounds at destination processor (offset=0, size=4, while registered memory area is 3 bytes)!
Error: bsp_put would go out of bounds at destination processor (offset=4, size=4, while registered memory area is 3 bytes)!
Error: bsp_put would go out of bounds at destination processor (offset=8, size=4, while registered memory area is 3 bytes)!
Looking at these errors, the message is clear: I'm trying to put something in the wrong place.
However, only one of the registered arrays has size P * sizeof(bool) == 3 bytes, and I have made sure that this is NOT where the processors are putting. In fact, the only time anything is put into this array, a single processor does so, and it writes one boolean value (1 byte).
I read your documentation on bsp_push_reg() which says:
"The order of variable registration must be the same across all threads in the SPMD program."
Although our application appeared to satisfy this, I tried placing bsp_sync() calls all over the place. Following each registration (there are a lot of them) with a sync fixes the problem. In the original BSP implementation, we synced only once, after registering all the variables. This can't be right... right? Is there another solution?
Some background information:
We use structs to pack some variables together, some of which are pointers (and pointers to pointers). To allow communication of these structs, including the memory the pointers point to, we register each field of the struct separately in a function regStruct(). However, this does not seem to be the cause, since the program still crashes when we remove the syncs between ordinary bsp_push_reg() calls.
Last edited by jorenheit (2012-11-23 10:49:04)
I've seen such problems before, but only when deregistrations were involved as well; that was fixed in 1.0.1, though. Are there pop_regs within supersteps in your application? If not, I'd like a (minimal) working example so that I can check what goes wrong...
I checked the logs a while back: the issue I referred to above occurred when a process registered a new variable that happened to have the same address as a previously registered variable, but not on all processors. This became evident in an application where mallocs, push_regs, pop_regs and frees were interleaved across subroutines in an incorrect fashion. It did lead to the detection of a bug related to multiple push_regs on the same pointer, which was fixed before version 1.0.0 already, not in 1.0.1.
In any case, syncing once should be all right. The only cause I can think of is that some of the push_regs register the same address on some of your threads, as described above; in that case the last registration prevails (as per the standard). Note that in code like
    x = malloc(30 * sizeof(double));
    free(x);
    y = malloc(2 * sizeof(char));
    assert(x != y);
the assertion may fail, but does not have to, and with multiple threads it may fail on some threads but not on others; this was the root cause of the bug I had.
Hope this helps.