Using GDB to check core file

An application crashed, and there was a core file generated.

1. Collect all the associated libraries, and run “gdb”.

(gdb) file testApp

(gdb) set solib-search-path .

(gdb) core testApp. core

it will show something like:

Program terminated with signal 11, Segmentation fault.
#0 0x781306a6 in xx_thread () from

(gdb) backtrace
#0 0x781306a6 in xxx_thread () from
#1 0x01025a8c in timer_settime () from

So you want to know what the instruction is at 0x781306a6.. go to..

2.  check the disassembly code

(gdb) disassemble hw_capture_thread

it will show you the assembly code and where the crash occurs.


(gdb) info registers


so it is clear that an invalid address is being accessed: register r3 is 0.

3. Which piece of C code?

you need have a library with symbols (-g), and run “pahole”, or run “pahole xx.o”, it will show all the structures in that library/file.

the crashing point shows that the code was trying to access a member of a structure at offset 52, so you look for the structures which has a member at 52..

Note: why does the backtrace show “imer_settime()” calls “xx_thread()”?

the reason is, it is some function in C library which calls/schedules xx_thread. However, since there are no debug symbols in libC, gdb can’t locate the exact function, and can only return a static/global function which is closer.


1. “info” is a very useful command.

  • info threads
  • info shared libraries
  • info registers

2. to check the states of other threads, ” thread apply all bt”.

3. Pohole (Poke-a-Hole) was developed to find the size of the data structures, and the holes caused due to aligning the data elements to the word-size of the CPU by the compiler. see


Mapping to device memory

1. mmap_device_io()

uintptr_t mmap_device_io( size_t len, uint64_t io );

  • maps len bytes of device I/O memory at io, and makes it accessiable via the in*() and out*() function.
  • returns a handle to the device’s I/O memory, or MAP_DEVICE_FAILED if an error occurs (errnois set).


uintptr_t vin_base;

ctx->vin_base = mmap_device_io(RCAR3_VIN_SIZE, RCAR3_VIN0_BASE); 

 if (ctx->vin_base == (uintptr_t)MAP_DEVICE_FAILED) {

                 ctx->vin_base = (uintptr_t)NULL; 

                 return errno;


Access the registers:

uint32_t reg_val;

reg_val = in32(vin_base + (off));

out32(vin_base + (off), (value));

2. mmap_device_memory()

void * mmap_device_memory( void * addr, size_t len,  int prot, int flags, uint64_t physical); 

  • maps len bytes of a device’s physical memory address into the caller’s address space at the location returned by mmap_device_memory().
  • Returns the address of the mapped-in object, or MAP_FAILED if an error occurs (errno is set).


 uint32_t *ipu_regp;   

ipu_regp = (uint32_t*)mmap_device_memory(NULL, IPU_REGSIZE,                   PROT_READ|PROT_WRITE|PROT_NOCACHE, 0, ipu_regbase);

if (ipu_regp == MAP_FAILED) {

                  ipu_regptr = NULL;

                  return err; 


#define ipu_regptr(offset)  (uint32_t volatile *) (((unsigned char volatile *) ipu_regp) + offset)

#define IPU_CONF         ipu_regptr(IPU_CONF_OFFSET + 0x0)

uint32_t ipu_conf_val;

ipu_conf_val = *IPU_CONF;           

*IPU_CONF = ipu_conf_val;

Register knowledge

Access to the shared registers

  1. configuration registers, usually being modified via “read – modify – write”.
    • use mutex to protect the register access.
    • use irqspin lock, if the registers are being accessed by ISR handler.
    • NOTE: the if condition needs to be mutex/spinlock protected too, in the this example: if(!*IPU_CONF & (1 << 3)) *IPU_CONF |= (1 << 3);
  2.  status registers
    • “write 1 to clear” register
      • if the “write” operation is atomic, no protection is needed; otherwise, protect it.
      • however, if you perform a “read & write”, you need protect the entire ‘read-write” sequence.
    • “read to clear” register
      • protect doesn’t work in this case.

Register definition

  1. MAP_CONF(x) registers, each register contains one field, for map x only.
    • #define MAP_CONF(x)  (OFFSET + (x) * 4)
      • 0 — 0
      • 1– 4
      • 2– 8
  2. MAP_CONF(x) registers, each register contains two fields: bit 0 ~ 15 for map 0/2/4; bit 16 ~ 31 for map 1/3/5.
    • #define MAP_CONF(x)  (OFFSET + ((x) & ~0x1) * 2) or
    • #define MAP_CONF(x)  (OFFSET + ((x) / 2 ) * 4)
      • 0,1 — 0
      • 2,3– 4
      • 4,5—8
    • write to the correct bit fields
      • shift = (x & 1) * 16;
      • *MAP_CONF(x) |= val << shift;
  3. MAP_CONF(x) registers, each registers contains 3 fields: bit 0 ~ 10 for map 0/3/6; bit 11 ~ 20 for map 1/4/7, bit 21~30 for map 2/5/7.
    • #define MAP_CONF(x) (OFFSET + ((x)/3 * 4)
      • 0,1,2— 0
      • 3,4,5– 4
      • 6,7,8 — 8
    • write the correct bit fields
      • shift = (x%3) * 10;
      • *MAP_CONF(x) |= val << shift;
  4. An complicated example:
    • MAP_CONF(x) contains 6 fields, and defined as (OFFSET + ((x) / 2 ) * 4).
      • 30-26: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 2 —- 5
      • 25-21: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 1. —- 4
      • 16-20: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 0. —- 3
      • 14-10: mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 2  —- 2
      • 9-5:     mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 1. —- 1
      • 4-0:     mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 0  —- 0
    • MAP_VAL(y) contains 4 fields, and defined as (OFFSET + ((y) / 2 ) * 4).
      • 28-24: offset #1 (or 3,5,7,..).
      • 23-16: mask #1
      • 12-8: offset #0 (or 2,4,6…)
      • 7-0:    mask #0
    • For a specified “map” value, e.g. 0, and 1, we want to set:
    • map   byte0         byte1         byte2
    • —————————————————
    • 0         7, 0xFF,  15, 0xFF,     23, 0xFF
    • 1         5, 0xFC,  11, 0xFC,     17, 0xFC
      • the register fields in MAP_CONF(0) will be set to the values, as marked as blue above.
      • for MAP_VAL(x), it will look like this
      •            MAP_VAL(0)      MAP_VAL(1)        MAP_VAL(2)
      • ————————————————————————————————
      • 28-24         15                      5                         17
      • 23-16          0xFF                 0xFC                  0xFC
      • 12-8            7                        23                      11
      • 7-0               0xFF                0xFF                  0xFC
    • the implementation of configure_map(map, offset_b0, mask_b0, offset_b1, mask_b1, offset_b2, mask_b2) would be:
      • shift = (map & 1) * 16;
      • pointer = map * 3;
      • *MAP_CONF(map) |=  ((pointer + 2) << 10 | (pointer +1 ) << 5 | (pointer) <<0 ) << shift;
      • // We need use “pointer”, “pointer + 1”, “pointer + 2” to find the offset of the associated MAP_VAL(y), and the shifts.
        • shift = (pointer & 1) * 16;
        • *MAP_VAL(pointer) |= ((offset_b0 << 8) | (mask_b0 << 0)) << shift;
        • shift = ((pointer +1) & 1) * 16;
        • *MAP_VAL(pointer+1) |= ((offset_b1 << 8) | (mask_b1 << 0)) << shift;
        • shift = ((pointer +2) & 1) * 16;
        • *MAP_VAL(pointer+2) |= ((offset_b2 << 8) | (mask_b2 << 0)) << shift;

Jaggy artifacts (raw yuv file is important!)

Jaggy arfifacts are introduced by the mismatching between the mismatching of neighbouring pixels. In YUV color space, the focus is on Y component.

Some jaggy artifact is unavoidable. For instance,  when the WEAVE (field combination) deinterlacing is deployed, any change between fields will result “jaggies”, as the pixels in one field do not line up with the pixels in the other.

I met one issue, where UYVY output is good, while YUYV output shows apparent jaggies. First thing coming into my mind is: do the Ys get reversed while being output? Y0U0Y1V1–>Y1UxY0Vx?

The lucky thing is we can route the output to the input interface and capture the raw data to analyse. The raw data clearly shows that Y1,Y3, are not there, but Y0, Y2, are there for twice.

Use a hex editor to open the raw YUV file,

at address 0xC50, uyvy file shows “87 59 7E 5C”; yuyv file shows “59 87 59 7E”.

screen basics

Screen is a compositing windowing system. It is able to combine multiple content sources together into a single image.

Two types of composition:

  1. Hardware composition: composes all visible(enabled) pipelines at display time.
    • In order to use this,
      • You need specify a pipeline for your window: use screen_set_window_property_iv().
      • use screen_set_window_property_iv() to set the SCREEN_USAGE_OVERLAY bit of your SCREEN_PROPERTY_USAGE window property.
    • The window is considered autonomous as no composition was performed (on the buffers, which belong to this window) by the composition manager.
    • For a window to be displayed autonomously on a pipeline, this window buffer’s format must be supported by its associated pipeline.
  2. Composition manager: Composes multiple window buffers (belong to multiple windows) into a single buffer, which is associated to a pipeline.
    • The single buffer is called /composite buffer/ screen framebuffer.
    • Used when your platform doesn’t have hardware capabilities to support a sufficient number of pipelines to compose a number of required elements, or to support a particular behavior,
    • One pipeline is involved (you don’t specify the pipeline number and OVERLAY usage).
    • Requires processing power of CPU and/or GPU to compose buffers

Note:Pipeline (in display controller) equals to layer (in composition manager), which is indexed by EGL level of app.

Pipeline ordering (Hardware property) and z-ordering (for windows)

  • Pipeline ordering and the z-ordering of windows on a layer are applied independently of each other.
  • Pipeline ordering takes precedence over z-ordering operations in Screen. Screen does not have control over the ordering of hardware pipelines. Screen windows are always arranged in the z-order that is specified by the application.
  • If your application manually assigns pipelines, you must ensure that the z-order values make sense with regard to the pipeline order of the target hardware. For example, if you assign a high z-order value to a window (meaning it is to be placed in the foreground), then you must make a corresponding assignment of this window to a top layer pipeline. Otherwise the result may not be what you expect, regardless of the z-order value.

Window: a window represents the fundamental drawing surface.

  • An application needs use multiple windows when content comes from different sources, when one or more parts of the application must be updated independently from others, or when the application tries to target multiple displays.

Pixmap: A pixmap is similar to a bitmap except that it can have multiple bits per pixel (a measurement of the depth of the pixmap) that store the intensity or color component values. Bitmaps, by contrast, have a depth of one bit per pixel.

  • You can draw directly onto a pixmap surface, outside the viewable area, and then copy the pixmap to a buffer later on.

Note: Multiple buffers can be associated with a window whereas only one buffer can be associated with a pixmap.

sleep & delay function (2)

To let the thread to suspend the exact amount of time, without being affected by thread scheduling, we can use nanospin().

int nanospin( const struct timespec *when );


The nanospin() function occupies the CPU for the amount of time specified by the argument when without blocking the calling thread. (The thread isn’t taken off the ready list.) The function is essentially a do…while loop.

The first time you call nanospin(), the C library invokes nanospin_calibrate() with an argument of 0 (interrupts enabled), if you haven’t already called it.

int nanospin_ns( unsigned long nsec );

The nanospin_ns() function busy-waits for the number of nanoseconds specified in nsec, without blocking the calling thread.

void nanospin_count( unsigned long count );

The nanospin_count() function busy-waits for the number of iterations specified in count. Use nanospin_ns_to_count() to turn a number of nanoseconds into an iteration count suitable for nanospin_count().

sleep & delay functions (1)

Quoted from QNX document.

— delay(unsigned int duration) suspends the calling thread for duration milliseconds.

— sleep(unsigned int seconds) function suspends the calling thread until the number of realtime seconds specified by the seconds argument have elapsed, or the thread receives a signal whose action is either to terminate the process or to call a signal handler.

both delay() and sleep() returns either 0 or the number of unslept time if interrupt by a signal.

— usleep(useconds_t useconds) function suspends the calling thread until useconds microseconds of realtime have elapsed, or until a signal that isn’t ignored is received.

— nanosleep( const struct timespec* rqtp, struct timespec* rmtp )  function causes the calling thread to be suspended from execution until either:

  • The time interval specified by the rqtp argument has elapsed


  • A signal is delivered to the thread, and the signal’s action is to invoke a signal-catching function or terminate the process.

usleep() and nanosleep()  returns either 0 (success) or -1 (an error occured)



With all the functions above, the suspension time may be greater than the requested amount, due to the nature of time measurement (see the Tick, Tock: Understanding the Neutrino Microkernel’s Concept of Time chapter of the QNX Neutrino Programmer’s Guide), or due to the scheduling of other, higher priority threads by the system.