
Mutex mistakes I made (2)

We need a mutex to protect access to a hardware block.

Two conditions:

  • The hardware block is shared by multiple processes.
  • Each operation performed on the hardware block typically lasts 2 ms.

Therefore, the mutex is created as a process-shared object in /dev/shmem, and my code looks like this:

main_thread()
{
    start a new operation;
}

dedicated_thread()
{
    wait for the “operation completion” interrupt, with a timeout;
    if the user wants to stop: if is_mutex_locked == 1, unlock the mutex; then exit;
    when the interrupt comes: if is_mutex_locked == 1, unlock the mutex;
    if there are enough buffers:
        lock the mutex and set is_mutex_locked to 1;
        set up the next operation;
}

Signal_thread()
{
    listen for signals;
    if it is a signal asking to stop:
        wait for the operation to complete (by reading a DMA status register);
        if is_mutex_locked == 1, unlock the mutex;
        tell the dedicated thread to exit (by setting the global variable g_dead);
        wait for the dedicated thread to exit;
}

There are two other changes:

  1. I defined a global variable, is_mutex_locked (done at the beginning).
  2. The mutex is unlocked after waiting for the operation to complete (done as the last step).

Since #2 was added, everything has gone wrong: another process is able to lock the mutex while the previous process is still using it, and the mutex never gets unlocked…

Where is the problem? The main thread executes a useless pthread_mutex_unlock() while waiting for completion. A mutex must be unlocked by the thread that locked it; an unlock from any other thread fails with an error-checking mutex and is undefined behavior with a default one. And because of this extra unlock, dedicated_thread() no longer does any unlocking before it exits.
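A minimal sketch, using plain POSIX threads rather than the original QNX code, of why the unlock from the non-owning thread is useless. It assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), which reports the misuse with EPERM instead of leaving it undefined; hw_mutex, init_hw_mutex(), non_owner_unlock() and demo_non_owner_unlock() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t hw_mutex;

/* Error-checking mutex: misuse is reported instead of being undefined behavior. */
static int init_hw_mutex(void) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    /* A mutex placed in /dev/shmem would also need
       pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED). */
    if (rc == 0)
        rc = pthread_mutex_init(&hw_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Runs in a thread that does NOT own hw_mutex and records its unlock result. */
static void *non_owner_unlock(void *result) {
    *(int *)result = pthread_mutex_unlock(&hw_mutex);
    return NULL;
}

/* Lock in the caller, try to unlock from another thread, then unlock from
   the owner. Returns what the non-owner's pthread_mutex_unlock() returned. */
static int demo_non_owner_unlock(void) {
    int other_rc = -1;
    pthread_t t;
    pthread_mutex_lock(&hw_mutex);                 /* caller becomes the owner */
    if (pthread_create(&t, NULL, non_owner_unlock, &other_rc) == 0)
        pthread_join(t, NULL);
    pthread_mutex_unlock(&hw_mutex);               /* only the owner can release it */
    return other_rc;
}
```

With the default mutex type the same foreign unlock is undefined behavior, which may explain why the symptoms above looked so inconsistent.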

mutex: a mistake I made

I defined a mutex shared between processes, in /dev/shmem.

The mutex protects access to a hardware block. Each operation may last 2 ms.
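A minimal sketch of creating such a mutex with standard POSIX calls (on QNX, objects created with shm_open() appear under /dev/shmem). Assumptions: hw_lock_t and hw_lock_create() are hypothetical names, and creating with O_EXCL means only the first process initializes the mutex; other processes would shm_open() the same name without O_CREAT and mmap() it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t mutex;   /* protects the hardware block */
} hw_lock_t;

/* Create the shared object and initialize a process-shared mutex in it.
   Returns the mapped object, or NULL on failure. */
static hw_lock_t *hw_lock_create(const char *name) {
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return NULL;
    if (ftruncate(fd, sizeof(hw_lock_t)) == -1) {
        close(fd);
        shm_unlink(name);
        return NULL;
    }
    hw_lock_t *lk = mmap(NULL, sizeof(hw_lock_t), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    close(fd);                                   /* the mapping stays valid */
    if (lk == MAP_FAILED) {
        shm_unlink(name);
        return NULL;
    }
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED); /* usable across processes */
    int rc = pthread_mutex_init(&lk->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(lk, sizeof(hw_lock_t));
        shm_unlink(name);
        return NULL;
    }
    return lk;
}
```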

Dedicated_thread()
{
    wait for the “operation completion” interrupt;
    when the interrupt comes, unlock the mutex;
    (if the user wants to stop, unlock the mutex);
    if required:
        lock the mutex;
        set up the next operation;
}

I wanted to unlock the mutex only if it was locked, so I added a check using a global variable, is_mutex_locked.

Lesson: don't use a global variable to indicate whether the mutex is locked, or to decide whether to unlock it. Only the thread that locked a mutex may unlock it, so that thread is the only one that can reliably know whether it holds the lock.

Stop_operation()
{
    wait for the operation to complete (by reading a DMA status register);
    tell the dedicated thread to exit (by setting the global variable g_dead);
    wait for the dedicated thread to exit;
}
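One possible way to avoid the global flag, sketched with plain POSIX threads under stated assumptions: wait_for_completion() is a hypothetical stand-in for the interrupt wait (it only sleeps about 2 ms), hw_mutex stands in for the shared mutex, and g_dead is the stop flag from the post. Ownership is tracked in a local variable of the thread that locks, and the stop path only sets the flag and joins; it never touches the mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t hw_mutex = PTHREAD_MUTEX_INITIALIZER; /* stand-in for the shared mutex */
static atomic_bool g_dead = false;                           /* stop request */

/* Hypothetical stand-in for waiting on the "operation completion" interrupt. */
static void wait_for_completion(void) {
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 2 * 1000 * 1000L };
    nanosleep(&ts, NULL);
}

static void *dedicated_thread(void *arg) {
    (void)arg;
    bool locked = false;                     /* ownership is tracked locally, not globally */
    for (;;) {
        wait_for_completion();
        if (locked) {                        /* the operation we started has finished */
            pthread_mutex_unlock(&hw_mutex);
            locked = false;
        }
        if (atomic_load(&g_dead))            /* stop: nothing is held at this point */
            break;
        pthread_mutex_lock(&hw_mutex);       /* claim the block for the next operation */
        locked = true;
        /* ... set up the next operation here ... */
    }
    return NULL;
}

/* Stop_operation(): request exit and wait for the thread; no unlocking here. */
static void stop_operation(pthread_t tid) {
    atomic_store(&g_dead, true);
    pthread_join(tid, NULL);
}
```

Because the unlock decision is made only by the thread that locked, a late stop request can never produce a foreign unlock or a skipped unlock.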

IPC: signal basics (3)

Example of signal thread/signal handler

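In the main thread and the other threads (or only once in main(), before any other thread is created, since a new thread inherits the signal mask of the thread that creates it):

sigset_t sigset;
sigfillset(&sigset);
sigdelset(&sigset, SIGSEGV);
pthread_sigmask(SIG_BLOCK, &sigset, NULL);

In the main thread:

static sigjmp_buf env;   // file scope, shared with clean_up()

signal(SIGSEGV, clean_up);
if (sigsetjmp(env, 1) != 0) {
    // do something
    exit(EXIT_FAILURE);
}

In the signal thread:

sigset_t sigset;
int signo;
sigfillset(&sigset);
sigdelset(&sigset, SIGSEGV);
pthread_sigmask(SIG_BLOCK, &sigset, NULL);
for (;;) {
    if (sigwait(&sigset, &signo) == 0) {
        // do something
    }
}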

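In clean_up(signo):

Option 1 (simple signal handler: restore the default action and re-raise the signal, so the process terminates the way it would have without the handler):

// do something
signal(signo, SIG_DFL);
raise(signo);

Option 2 (siglongjmp):

signal(signo, SIG_DFL);
siglongjmp(env, 1);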

Note:

  1. The signal thread cannot catch a SIGSEGV generated by a program error, because that signal targets the faulting thread itself.
  2. A signal handler runs asynchronously, so inspection and manipulation of shared variables must be protected.
  3. A signal handler may be executed by any thread.
  4. The code after siglongjmp() (the body of the if (sigsetjmp(...)) block) continues to be executed by the same thread that called the signal handler, even though that code is part of the main thread.
    1. I have seen an extra SIGSEGV generated when a thread other than the main thread executes the sigsetjmp() block. (POSIX leaves a longjmp() undefined when the jump buffer was not initialized in the calling thread.)
  5. Even if an internal mutex, used between threads, is locked before the SIGSEGV is generated, the signal handler or the code in the sigsetjmp() block can still acquire it without error (pthread_mutex_lock() returns 0). The mutex has default attributes (not recursive); the QNX documentation says the behavior is unpredictable. Note also that pthread_mutex_lock() is not async-signal-safe, so POSIX does not allow calling it from a signal handler at all.
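A minimal runnable sketch of the signal-thread pattern above, using plain POSIX calls. Assumptions: an explicit set (SIGINT, SIGTERM, SIGUSR1) replaces sigfillset() to keep it small, SIGTERM is treated as the stop request, and start_signal_thread(), request_stop(), g_sigset and last_signo are hypothetical names. The mask is set in the creating thread before pthread_create(), so the signal thread inherits it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

static sigset_t g_sigset;          /* signals handled by the signal thread */
static atomic_int last_signo = 0;  /* last signal accepted, for inspection */

static void *signal_thread(void *arg) {
    (void)arg;
    int signo;
    for (;;) {
        if (sigwait(&g_sigset, &signo) != 0)
            continue;                       /* error: wait again */
        atomic_store(&last_signo, signo);
        if (signo == SIGTERM)               /* stop request: leave the loop */
            break;
        /* other signals: handle them here, in normal thread context */
    }
    return NULL;
}

/* Call from main() before creating any other thread. */
static int start_signal_thread(pthread_t *tid) {
    sigemptyset(&g_sigset);
    sigaddset(&g_sigset, SIGINT);
    sigaddset(&g_sigset, SIGTERM);
    sigaddset(&g_sigset, SIGUSR1);
    /* The signals must be blocked before sigwait() is used. */
    int rc = pthread_sigmask(SIG_BLOCK, &g_sigset, NULL);
    if (rc != 0)
        return rc;
    return pthread_create(tid, NULL, signal_thread, NULL);
}

/* Send SIGTERM to our own process, as slay would from outside. */
static int request_stop(void) {
    return kill(getpid(), SIGTERM);
}
```

Since every thread has these signals blocked, a process-directed SIGTERM stays pending until the signal thread accepts it in sigwait(), even if it arrives before that thread starts waiting.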

IPC: signal basics (2)

Signal actions

Three types of actions:

  • SIG_DFL:
    • For most signals: terminate the process.
    • SIGSTOP: stop the process.
    • SIGCONT: continue the process.
    • SIGCHLD, SIGIO, SIGURG, SIGWINCH: ignore the signal.
  • SIG_IGN:
    • Discard any pending instances of the signal, whether it is blocked or not. New occurrences are discarded, too.
  • A pointer to a function (signal handler):
    • For a handler installed with signal(), the function is called in a manner equivalent to the code sequence: signal(sig_no, SIG_DFL); (*func)(sig_no);

Note:

  • Initially, before main() is entered, every signal is set to either SIG_DFL or SIG_IGN.
  • The default action for most signals is to terminate the process.

Related functions

  • int sigaction(int sig, const struct sigaction *act, struct sigaction *oact): examine or specify the action associated with a signal.
  • void (*signal(int sig, void (*func)(int)))(int): specify the action to take when the process receives the signal.
    • signal() calls sigaction().

More about signal handlers

  1. What you can do in a handler:
    • Return.
    • Call exit() or abort() to terminate the program (strictly, exit() is not async-signal-safe; _exit() is).
    • Call longjmp() or siglongjmp().
  2. After returning from the signal handler, the receiving process resumes execution at the point where it was interrupted.
  3. Signal handlers are invoked asynchronously with respect to process execution, so take into account the same things you would in a multithreaded environment when inspecting or manipulating shared resources.
  4. When a signal handler is invoked, the signal that caused it is masked before the handler is called. If the handler returns normally, the OS restores the signal mask, so any signal-mask changes made with pthread_sigmask() inside the handler are undone.
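A small sketch of point 4, using sigaction() rather than signal() so the handler stays installed after it runs. The handler only assigns volatile sig_atomic_t flags and calls pthread_sigmask() to query the mask, in line with point 3. SIGUSR1 and the names handled, masked_inside, install_handler() and usr1_blocked_now() are chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handled = 0;
static volatile sig_atomic_t masked_inside = 0;

static void on_usr1(int signo) {
    sigset_t cur;
    /* set == NULL: only query. The delivered signal should be blocked here. */
    if (pthread_sigmask(SIG_BLOCK, NULL, &cur) == 0 && sigismember(&cur, signo) == 1)
        masked_inside = 1;
    handled = 1;
}

static int install_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);   /* nothing extra; SIGUSR1 itself is masked automatically */
    sa.sa_flags = 0;            /* no SA_RESETHAND: the handler stays installed */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* After the handler returns, SIGUSR1 should be unblocked again (returns 0). */
static int usr1_blocked_now(void) {
    sigset_t cur;
    pthread_sigmask(SIG_BLOCK, NULL, &cur);
    return sigismember(&cur, SIGUSR1);
}
```

raise() does not return until the handler has run, so the flags can be checked right after the call.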

Signal Mask

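pthread_sigmask(int how, const sigset_t *set, sigset_t *oset): examine and/or change the calling thread's signal mask. If oset is not NULL, the previous mask is stored there; if set is NULL, the mask is only examined.

how:

SIG_BLOCK: block the signals in set, in addition to those already in the current mask.

SIG_UNBLOCK: unblock the signals in set.

SIG_SETMASK: replace the current mask with set (block only the signals in set).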

Or sigprocmask(), whose behavior POSIX specifies only for single-threaded processes.
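A common idiom built from these how values, as a sketch: block a signal around a critical section with SIG_BLOCK while saving the old mask in oset, then restore it with SIG_SETMASK. SIGUSR1, is_blocked(), run_without_usr1() and blocked_in_section are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

static int blocked_in_section = 0;

/* Returns 1 if sig is in the calling thread's current mask. */
static int is_blocked(int sig) {
    sigset_t cur;
    pthread_sigmask(SIG_BLOCK, NULL, &cur);   /* set == NULL: query only */
    return sigismember(&cur, sig);
}

static int run_without_usr1(void) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    int rc = pthread_sigmask(SIG_BLOCK, &block, &old);  /* add SIGUSR1, keep the old mask */
    if (rc != 0)
        return rc;
    blocked_in_section = is_blocked(SIGUSR1);   /* the critical section would go here */
    return pthread_sigmask(SIG_SETMASK, &old, NULL);    /* restore exactly the old mask */
}
```

Restoring with SIG_SETMASK rather than SIG_UNBLOCK matters when the signal was already blocked before the section: SIG_UNBLOCK would wrongly unblock it.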

Generate a signal
1) int raise(int sig), equivalent to pthread_kill(pthread_self(), sig)

2) pthread_kill(pthread_t thread, int sig)

3) kill(pid_t pid, int sig)

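Manipulate a signal set

sigfillset(*set): initialize set to contain all signals.

sigemptyset(*set): initialize set to contain no signals.

sigaddset(*set, signo): add signo to set.

sigdelset(*set, signo): remove signo from set.

sigismember(*set, signo): test whether signo is in set (returns 1 if it is, 0 if not).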
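For example, the "all signals except SIGSEGV" set used in the signal-thread example can be built and checked like this (a sketch; all_signals_except() is a hypothetical helper):

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Build a set containing every signal except `excluded`. Returns 0 on success. */
static int all_signals_except(sigset_t *set, int excluded) {
    if (sigfillset(set) != 0)
        return -1;
    return sigdelset(set, excluded);
}
```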

Wait for a signal

1) sigwait(*set, *signo)

Returns 0 on success, or an error number (EINTR, EINVAL, etc.) otherwise.

Note: the signals in set must be blocked before you call sigwait(); otherwise, the behavior is undefined.

2) sigwaitinfo(*set, *info): blocks until one of the signals in set is pending. If one is already pending, it returns immediately.

    Returns: the signal number, or -1 (with errno set).

3) sigtimedwait(*set, *info, const struct timespec *timeout): like sigwaitinfo(), but returns -1 with errno set to EAGAIN if no signal in set becomes pending within timeout.
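A sketch of sigtimedwait(): with SIGUSR1 blocked and nothing pending, it gives up after the timeout (returns -1, errno EAGAIN); once a SIGUSR1 is pending, it returns that signal number at once. The helper name wait_usr1_50ms() and the 50 ms timeout are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Wait up to 50 ms for SIGUSR1; returns the signal number, or -1 with errno set. */
static int wait_usr1_50ms(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, NULL);     /* block first, as with sigwait() */
    struct timespec timeout = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000L };
    siginfo_t info;
    return sigtimedwait(&set, &info, &timeout);
}
```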

IPC: Signal Basics

Processes can ignore, block, or catch all signals except SIGSTOP and SIGKILL.

  • Catch: if a process catches a signal, it contains code that takes appropriate action when the signal is received.
  • Signal mask: an attribute of a thread. All the signals in the signal mask are blocked by that thread. To change it, use pthread_sigmask().
    • If a masked signal occurs, it becomes pending (which can be checked with sigpending(*set)), but it doesn't affect the execution of the process.
    • When a pending signal is unmasked, it is acted upon immediately, before pthread_sigmask() returns.
  • Ignore: if a process ignores a signal (SIG_IGN), the signal is discarded when it is generated.
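A sketch of the two pending-signal rules above: while SIGUSR1 is masked, raise() only makes it pending (visible through sigpending()) and the handler does not run; unmasking it runs the handler before pthread_sigmask() returns. SIGUSR1 and the names ran, on_usr1() and demo_pending() are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t ran = 0;

static void on_usr1(int signo) { (void)signo; ran = 1; }

/* Records what was observed while the signal was masked; returns 0 on success. */
static int demo_pending(int *pending_while_masked, int *ran_while_masked) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigset_t usr1, pending;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &usr1, NULL);      /* mask it */
    raise(SIGUSR1);                               /* becomes pending; no handler yet */
    sigpending(&pending);
    *pending_while_masked = sigismember(&pending, SIGUSR1);
    *ran_while_masked = ran;
    return pthread_sigmask(SIG_UNBLOCK, &usr1, NULL); /* handler runs before this returns */
}
```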

Some principles/rules

  • The signal actions (ignore, catch) are maintained at the process level.
    • If a thread ignores or catches a signal, it affects all threads within that process.
  • The signal mask(block, unblock) is maintained at the thread level.
    • If a thread blocks a signal, it affects only that thread.
  • An unignored signal targeted at a thread will be delivered to that thread alone.
  • An unignored signal targeted at a process is delivered to the first thread that doesn’t have the signal blocked.
    • If all threads have the signal blocked, the signal will be queued on the process until one thread ignores or unblocks it.
    • A practical way is to mask the signals in all threads but one, which is dedicated to handling them.
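A sketch of the thread-directed case: pthread_kill() targets one thread, and the handler runs in that thread only. Assumptions: SIGUSR2 is used, the handler sets a thread-local flag so each thread sees only its own copy, the worker waits with sigsuspend() so the wake-up is race-free, and all names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Per-thread flag: each thread has its own copy. */
static _Thread_local volatile sig_atomic_t got_here = 0;

static void on_usr2(int signo) {
    (void)signo;
    got_here = 1;                      /* set in whichever thread runs the handler */
}

static void *worker(void *arg) {
    (void)arg;
    sigset_t wait_mask;
    sigemptyset(&wait_mask);
    /* SIGUSR2 is blocked here (inherited); sigsuspend() unblocks it and waits
       atomically, so a signal sent earlier is not lost. */
    while (!got_here)
        sigsuspend(&wait_mask);
    return NULL;
}

/* Returns 1 if the directed signal was handled in the worker and not in the caller. */
static int demo_thread_directed(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    sigset_t usr2;
    sigemptyset(&usr2);
    sigaddset(&usr2, SIGUSR2);
    if (pthread_sigmask(SIG_BLOCK, &usr2, NULL) != 0)   /* the worker inherits this mask */
        return -1;

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    pthread_kill(tid, SIGUSR2);        /* directed at the worker only */
    pthread_join(tid, NULL);           /* the worker exits only after its own handler ran */
    return got_here == 0 ? 1 : 0;      /* the caller's copy stayed clear */
}
```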

Some signals

  • CTRL-C: SIGINT (signo 2)
  • CTRL-Z: SIGTSTP (signo 24)
  • Slay: SIGTERM (signo 15)
  • Bad pointer/invalid memory address: SIGSEGV (signo 11)
    • The hardware generates a fault, and the kernel sets the SIGSEGV signal on the process.
  • Note: in QNX, a SIGSEGV generated by the program itself targets only the thread that caused it, not the process, so it is not caught by a separate signal thread!
  • SIGBUS (signo 10)

Send a signal to a process: slay -s <signal> <process name>

QNX extends the signal-delivery mechanisms of POSIX by allowing signals to be targeted at specific threads, rather than simply at the process containing the threads. The QNX kernel uses common code to manage signals and pulses (a pulse carries an 8-bit code and a 32-bit value). A signal number is mapped to a pulse priority using _SIGMAX - signo. As a result, signals are delivered in priority order, with lower signal numbers having higher priority.

An example of a mutex and a spinlock

typedef struct {
    pthread_mutex_t mutex_hw;
    intrspin_t spinlock;
    pthread_mutex_t mutex_reset;
    volatile unsigned ref_count;
    volatile unsigned reset;
    volatile unsigned enabled[2];
} share_context_t;

typedef struct {
    pthread_mutex_t mutex;
    share_context_t *share_ctx;
} private_context_t;

private_context_t *private_ctx;
  • mutex protects access to the contents of private_ctx, for example between different threads, as long as they all have access to private_ctx.
  • mutex_hw protects critical sections (e.g., shared registers, shared variables) between threads, which do not necessarily belong to the same context.
  • spinlock protects critical sections (e.g., shared registers, shared variables) between threads and interrupt handlers.
New issue: the FIFO controller, which is shared by two contexts, needs to be reset on the fly to fix some problems. We want to reset the FIFO controller when a context is being started.
Solution: resetting the FIFO controller while the other context is using it is not safe, so I introduced extra members to share_context_t:
  • ref_count counts the number of clients that use the hardware block (FIFO).
  • enabled indicates whether the other context is enabled/active.
  • reset: if we are the only user, we reset the hardware block immediately; if there is another user and it is active (enabled), we set reset to 1 and ask the other user to reset the FIFO when it is safe: in the ISR when EOF is received (1st implementation), or when waiting for EOF times out (added in the 2nd implementation).
1. 1st implementation
client_start() {
    …..
    atomic_add(&ctx->share_ctx->ref_count, 1);
    // what if someone else changes ref_count at this point? No problem.
    // what if someone else changes enabled here? No problem.

    pthread_mutex_lock(&ctx->share_ctx->mutex_reset);
    // other: index of the other context in enabled[] (hypothetical name)
    if (ctx->share_ctx->ref_count == 1 || ctx->share_ctx->enabled[other] == 0) {
        // what if the other user changes ref_count or enabled here? The reset might have an
        // unpredictable effect on the other user, so this "if" check must be done with
        // mutex_reset locked.
        do reset;
        pthread_mutex_unlock(&ctx->share_ctx->mutex_reset);
    } else {
        atomic_set(&ctx->share_ctx->reset, 1);  // the other user will do the reset when it's safe
        pthread_mutex_unlock(&ctx->share_ctx->mutex_reset);
        // we don't need the mutex to check whether reset has been cleared
        while (ctx->share_ctx->reset) {
            delay(1);
        }
    }
    …..
}

In isr_handler,
    if (EOF IRQ is set) {
        if (ctx->share_ctx->reset) {
            reset;
            atomic_clr(&ctx->share_ctx->reset, 1);
        }
    }
2. 2nd implementation
Because the EOF interrupt might never arrive in an error condition, we also need to do the reset when waiting for EOF times out. This adds complexity, as the "reset" bit might be cleared either by the ISR or by a thread when the timeout occurs.
check_n_reset()
{
    pthread_mutex_lock(&ctx->share_ctx->mutex_reset);
    InterruptLock(&ctx->share_ctx->spinlock);
    // other: index of the other context in enabled[] (hypothetical name)
    if (ctx->share_ctx->ref_count == 1 || ctx->share_ctx->enabled[other] == 0 || ctx->share_ctx->reset) {
        // what if the EOF interrupt is raised here? The reset would be done twice,
        // so the spinlock is required.
        do reset;
        atomic_clr(&ctx->share_ctx->reset, 1);        // any pending request is now satisfied
        InterruptUnlock(&ctx->share_ctx->spinlock);   // release first: interrupts are disabled while it is held
        pthread_mutex_unlock(&ctx->share_ctx->mutex_reset);
    } else {
        atomic_set(&ctx->share_ctx->reset, 1);  // the other user will do the reset when it's safe
        InterruptUnlock(&ctx->share_ctx->spinlock);
        pthread_mutex_unlock(&ctx->share_ctx->mutex_reset);
        // we don't need the mutex to check whether reset has been cleared
        while (ctx->share_ctx->reset) {
            delay(1);
        }
    }
    …..
}
client_start()
{
    atomic_add(&ctx->share_ctx->ref_count, 1);
    check_n_reset();
}
 isr_handler,….
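A portable model of the 2nd implementation's decision logic, for experimenting off-target. Everything here is a hypothetical stand-in: a pthread spinlock replaces InterruptLock() (it does not disable interrupts), on_eof_or_timeout() plays the ISR and the timeout path, a counter replaces the real FIFO reset, and the waiting loop is left out so each step can be checked on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex_reset;   /* serializes the "can I reset now?" decision */
    pthread_spinlock_t spinlock;   /* stand-in for InterruptLock()/intrspin_t */
    atomic_uint ref_count;         /* number of users of the FIFO */
    atomic_uint reset;             /* 1 = a reset has been requested */
    atomic_uint enabled[2];        /* per-context "active" flag */
    atomic_uint resets_done;       /* model only: counts FIFO resets */
} share_model_t;

static void do_reset(share_model_t *s) { atomic_fetch_add(&s->resets_done, 1); }

static int model_init(share_model_t *s) {
    atomic_init(&s->ref_count, 0);
    atomic_init(&s->reset, 0);
    atomic_init(&s->enabled[0], 0);
    atomic_init(&s->enabled[1], 0);
    atomic_init(&s->resets_done, 0);
    if (pthread_mutex_init(&s->mutex_reset, NULL) != 0)
        return -1;
    return pthread_spin_init(&s->spinlock, PTHREAD_PROCESS_PRIVATE);
}

/* Thread side: reset now if it is safe, otherwise request it from the other
   user. `other` is the other context's index. Returns true if reset here. */
static bool check_n_reset(share_model_t *s, int other) {
    bool done = false;
    pthread_mutex_lock(&s->mutex_reset);
    pthread_spin_lock(&s->spinlock);
    if (atomic_load(&s->ref_count) == 1 || atomic_load(&s->enabled[other]) == 0 ||
        atomic_load(&s->reset)) {
        do_reset(s);
        atomic_store(&s->reset, 0);     /* any pending request is now satisfied */
        done = true;
    } else {
        atomic_store(&s->reset, 1);     /* the other user resets when it is safe */
    }
    pthread_spin_unlock(&s->spinlock);  /* release in reverse order */
    pthread_mutex_unlock(&s->mutex_reset);
    return done;
}

/* ISR / timeout side: perform a pending reset exactly once. */
static void on_eof_or_timeout(share_model_t *s) {
    pthread_spin_lock(&s->spinlock);
    if (atomic_load(&s->reset)) {
        do_reset(s);
        atomic_store(&s->reset, 0);
    }
    pthread_spin_unlock(&s->spinlock);
}
```

Checking and clearing reset under the same lock on both sides is what prevents the double reset mentioned in the comment above.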
 
 

IPC: Message fundamentals

Channel: Servers receive on channels.
Connection: Clients should connect/attach to the channel first, then send to the channel using the connection.
Channel creation
int ChannelCreate(unsigned flags)
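Flags:
_NTO_CHF_PRIVATE: the channel is for use only within this process.
_NTO_CHF_FIXED_PRIORITY: receiving threads don't change their priority to that of the sending threads.
_NTO_CHF_THREAD_DEATH: deliver a pulse when a thread in the process that owns the channel dies.
_NTO_CHF_DISCONNECT: deliver a pulse when all connections from a process are detached.
_NTO_CHF_COID_DISCONNECT: deliver a pulse to this channel for each connection belonging to this process when the channel that the connection is attached to is destroyed.
_NTO_CHF_UNBLOCK: deliver a pulse when a thread that is REPLY-blocked on this channel tries to unblock (e.g., because of a signal or a timeout).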
Connection attach/detach
int coid = ConnectAttach(node_id, pid, chid, index | _NTO_SIDE_CHANNEL, flags)
_NTO_SIDE_CHANNEL should always be used, so that the returned coid is greater than any valid file descriptor.
When _NTO_SIDE_CHANNEL is used, index is ignored.
int ConnectDetach(coid)
How does the client find the server?
Before MsgSend(coid, &msg, ..), a connection attach is required: coid = ConnectAttach(nd, pid, chid…).
  • If they are in the same process, it is easy: sharing “chid” within the process works.
  • If they are not (the common case), it depends on the kind of server:
If the server is a resource manager:
  • The server: resmgr_attach(..,”/dev/mymgr/”,…);
  • The client: fd = open(“/net/nodename/dev/mymgr”, ..);
    • write(fd,..), read(fd,..), MsgSend(fd,…)
    • close(fd);
    • Note: file descriptors are a particular type of coid.
If the server is just a simple MsgReceive() loop:
  • Server: name_attach(NULL, “myname”, 0);
  • Client: coid = name_open(“myname”, 0);
    • MsgSend(coid, &msg,….);
    • name_close(coid);