[Jack-Devel] Race conditions in JackPosixSemaphore

PrevNext  Index
DateWed, 26 Dec 2012 18:59:59 +0400
From Max <[hidden] at i-free dot com>
To[hidden] at jackaudio dot org
Follow-UpStéphane Letz Re: [Jack-Devel] Race conditions in JackPosixSemaphore
Hi, Jackers
>
>   I found a race conditions in JackPosixSemaphore on client side 
> (method ConnectInput) that can lead to open a semaphore twice or more 
> and close it only once. As result, a semaphore descriptor leaves 
> opened after closing  and the client app can hang up ("too more open 
> files"). It is critical error for our apps. The cause is that more 
> than one thread can try to open client-side semaphore simultaneously 
> if fSemaphore is NULL, - classic race condition.
>
> I rewrite JackPosixSemaphore::ConnectInput using CAS (I use Linux and 
> gcc):
>
> // Client side : get the published semaphore from server
> bool JackPosixSemaphore::ConnectInput(const char* name, const char* 
> server_name)
> {
>     if (fSemaphore) {
>         jack_log("JackPosixSemaphore::Connect: already connected name 
> = %s", fName);
>         return true;
>     }
>
>     BuildName(name, server_name, fName, sizeof(fName));
>     jack_log("JackPosixSemaphore::Connect name = %s", fName);
>
>     while ( fSemaphore == SEM_FAILED ) {
>         sem_t * pSem = sem_open(fName, O_CREAT) ;
>         if ( pSem == SEM_FAILED ) {
>             jack_error("JackPosixSemaphore::Connect: can't connect 
> named semaphore name = %s err = %d %s", fName, errno, strerror(errno));
>             return false;
>         }
>
>         if ( __sync_bool_compare_and_swap( &fSemaphore, SEM_FAILED, 
> pSem )) {
>             // sem_getvalue for debugging purpose...
>             int val = 0 ;
>
>             sem_getvalue(fSemaphore, &val);
>             jack_log("JackPosixSemaphore::Connect: %s 
> sem_getvalue=%ld", fName, val);
>
>             return true ;
>         }
>
>         sem_close( pSem )   ;
>     }
>
>     jack_log("JackPosixSemaphore::Connect: already connected name = 
> %s", fName);
>     return true ;
> }
>
> After this patch, the race cond is disappeared and Jack stay more stable.
> Of course, instead of use gcc-specific intrinsics 
> __sync_bool_compare_and_swap it will be better to use C++11 
> std::atomic to implement CAS.
>
> JackPosixSemaphore::Disconnect should be rewritten in that way:
>
> // Client side: close semaphore without destroying
> bool JackPosixSemaphore::Disconnect()
> {
>     if ( fSemaphore != SEM_FAILED ) {
>         jack_log("JackPosixSemaphore::Disconnect name = %s", fName);
>
>         while ( fSemaphore != SEM_FAILED ) {
>             sem_t * pSem = fSemaphore ;
>             if ( __sync_bool_compare_and_swap( &fSemaphore, pSem, 
> SEM_FAILED )) {
>                 if ( sem_close( pSem ) != 0 ) {
>                     jack_error( "Disconnect: can't disconnect named 
> semaphore name = %s err = %d %s", fName, errno, strerror(errno) );
>                     return false ;
>                 }
>                 break ;
>             }
>         }
>     }
>     else {
>         jack_log("JackPosixSemaphore::Disconnect: attempt to close 
> empty semaphore %s", fName);
>     }
>     return true ;
> }
>
> Second problem is on the server side, 
> JackRequestDecoder::HandleRequest. If the socket is down on client 
> side, the HandleRequest method continues to try read a request from 
> bad socket (of course, it reads garbage data and issues a lot of error 
> messages) and ** try to execute ** such bad request. In some cases 
> this can lead to crash. I patch this problem raising "bad_request" 
> exception in read client socket and catch it in 
> JackSocketServerChannel::Execute:
>
> // Decode request
>                     } else {
>                         // Result is not needed here
>                         try {
> fDecoder->HandleRequest(socket, header.fType);
>                         }
>                         catch ( bad_request ) {
> jack_error("JackSocketServerChannel::Execute : Bad request encountered 
> for fPollTable i = %ld fd = %ld", i, fd);
>                             ClientKill(fd) ;
>                         }
>                     }
>
>
> I rewrite the CheckRead macro:
>
> #define CheckRead(req, socket)   { if (req.Read(socket) <  0) { 
> jack_error("CheckRead error %d", errno); throw bad_request() /*return 
> -1*/; } }
>
> and define bad_request exception:
>
> // exception
> struct bad_request {};
>
>
> Best regards
>     Max Khizsinsky (aka khizmax)
>     the author of libcds.sourceforge.net, the C++ template library of 
> lock-free data structs
>
> PS Bugs report (http://trac.jackaudio.org) does not work (internal 
> server error 500), so I'm writing my bug report in this dev list, sorry.
>
PrevNext  Index

1356534015.10874_0.ltw:2,a <50DB10EF.8000901 at i-free dot com>