The MulticoreBSP Forums

A place to discuss the MulticoreBSP library and its applications, and for discussing the use of the Bulk Synchronous Parallel model on shared-memory architectures, or hybrid distributed/shared architectures.

#1 2012-11-10 16:01:56

jorenheit
Member
Registered: 2012-11-10
Posts: 3

Bug? Global memory corrupted after bsp_sync() [not-a-bug]

We have a global pointer to a struct, which each processor initializes individually in the parallel part of a BSP program (after bsp_begin()). Right after initialization, each processor reports the correct values for all fields of this struct. However, once we sync after initialization (more precisely, after registering some other variable with bsp_push_reg(), although this did not seem to be related to the problem), the data in the struct is corrupted, causing segfaults in some runs of the program and random values in others.

When we port the program from MCBSP to BSP, these problems vanish.
The example below illustrates the problem:

SomeStruct* globalPointer;

void parallelPart(void)
{
    bsp_begin(P);
    globalPointer = initSomeStruct();
    
    int someOtherVar = 42;
    bsp_push_reg(&someOtherVar, sizeof(int));
    bsp_sync();

    printf("%d\n", globalPointer->someIntField);
    bsp_end();
}

Last edited by jorenheit (2012-11-10 21:52:50)

#2 2012-11-11 00:41:59

Albert-Jan Yzelman
Moderator
Registered: 2011-08-04
Posts: 32

Re: Bug? Global memory corrupted after bsp_sync() [not-a-bug]

Hi, and welcome to the forums!

I believe the issue here is the scope of globalPointer. It is defined globally, and because MulticoreBSP is a threading library, that means it is actually shared amongst all threads. This differs from the process-based BSP libraries (Oxford BSPlib, BSPonMPI), where each process has its own copy. The only way to prevent this change in behaviour would have been to make MulticoreBSP for C process-based, thus losing all the advantages of threading. If this is indeed the issue, however, it should be independent of registering someOtherVar; the problem is the threads racing to initialise globalPointer, so that after the sync every thread sees whichever value was written last.

A fix to your code would be:

void parallelPart(void)
{
    bsp_begin(P);
    /* Declared inside the function, so each thread has its own pointer. */
    SomeStruct* localPointer = initSomeStruct();
    
    int someOtherVar = 42;
    bsp_push_reg(&someOtherVar, sizeof(int));
    bsp_sync();

    printf("%d\n", localPointer->someIntField);
    bsp_end();
}

Each thread-local variable should thus be defined locally within each function. I can understand this is inconvenient (although it does give more freedom in shared-memory programming). If you strongly prefer not to declare all variables within functions, programming in C++ using the MulticoreBSP C++-wrapper might be preferable (just include mcbsp.hpp, instead of mcbsp.h, from within C++ source).

I hope this indeed explains your issue, and I apologise for not describing this issue earlier; in fact describing precisely this issue was on my to-do list.

With best regards,
Albert-Jan

#3 2012-11-11 10:20:09

jorenheit
Member
Registered: 2012-11-10
Posts: 3

Re: Bug? Global memory corrupted after bsp_sync() [not-a-bug]

Thanks for your reply! Good to know where it went wrong :-) We will try to find a solution based on this!

P.S. Normally, I'm no fan of global variables, but in this specific case we have a highly recursive tree-searching function in which some variables must be accessed from arbitrarily deep within the tree. Therefore it seems more convenient to declare the variables global instead of passing pointers (thus copying them quite a lot) to all the nodes of the tree.

#4 2012-11-11 11:36:30

Albert-Jan Yzelman
Moderator
Registered: 2011-08-04
Posts: 32

Re: Bug? Global memory corrupted after bsp_sync() [not-a-bug]

If it's searching for a position only (no changes to the tree), then passing the global variables as const parameters should already stop the compiler from pushing them onto the stack after each recursive call ;-)
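
As a rough sketch of what passing the shared data as a const parameter could look like: the former globals are bundled into one read-only struct, and every recursive call receives a pointer to it, so the data itself is never copied. The tree layout (a binary search tree) and all names here (Node, SearchContext, findDepth, maxDepth) are assumptions for the sake of the example, not the actual code discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Hypothetical bundle of values that used to be global variables. */
typedef struct {
    int target;     /* key to look for */
    int maxDepth;   /* give up below this depth */
} SearchContext;

/* Returns the depth at which ctx->target is found, or -1 if it is absent
   or lies deeper than ctx->maxDepth. Each call passes the same pointer
   along; the SearchContext itself is never copied. */
int findDepth(const Node *node, const SearchContext *ctx, int depth)
{
    assert(ctx != NULL);
    if (node == NULL || depth > ctx->maxDepth) return -1;
    if (node->key == ctx->target) return depth;

    const Node *next = (ctx->target < node->key) ? node->left : node->right;
    return findDepth(next, ctx, depth + 1);
}
```

In a threaded BSP program, each thread would build its own SearchContext as a local variable and call findDepth(root, &ctx, 0), which also avoids the shared-global problem from the first post.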

It generally also helps performance to store tree-like structures in flat arrays, and transform recursive algorithms into loop-based ones (to help data locality and prevent pushing return addresses at each recursion, respectively); if your application allows such changes, of course.
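
To give an idea of the flat-array, loop-based variant, here is a minimal sketch. It assumes a binary search tree stored in level order, with the children of slot i at slots 2*i + 1 and 2*i + 2, and non-negative keys so that -1 can mark an empty slot. The layout and the names EMPTY_SLOT and findIndex are choices made for this example only; a real application may need a different encoding.

```c
#include <assert.h>
#include <stddef.h>

#define EMPTY_SLOT (-1)   /* marks a missing node; assumes keys are >= 0 */

/* Searches a binary search tree stored in level order in tree[0..size-1].
   Returns the array index holding key, or -1 if key is not present.
   The loop replaces the recursion, so no call frames are pushed. */
long findIndex(const int *tree, size_t size, int key)
{
    assert(tree != NULL || size == 0);

    size_t i = 0;
    while (i < size && tree[i] != EMPTY_SLOT) {
        if (tree[i] == key) return (long)i;
        /* Descend to the left or right child inside the same array. */
        i = (key < tree[i]) ? 2 * i + 1 : 2 * i + 2;
    }
    return -1;
}
```

For instance, the tree with root 8, children 3 and 10, and grandchildren 1, 6 and 14 becomes the array {8, 3, 10, 1, 6, EMPTY_SLOT, 14}; findIndex on that array walks slots 0, 1, 4 to locate the key 6.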

In any case, I hope the changes are not hard to incorporate, and don't hesitate to post again if BSP issues seem to pop up! :-)
