Logical thinking(2)

With two headache-inducing issues solved, I can relax now. At the same time, there are a few lessons to take from them.

Summary of the 2 issues:

#1 Kernel crash or system hang if capture is started 5 ~ 10s after bootup. Most of the time the crash point is in the VPE ISR handler, and the kernel developer’s analysis indicates memory corruption.

- I narrowed it down to the VPE operation, as the issue disappears when there is no VPE operation.

#2 Display shows stripes then system resets, if there is capture activity involved right after bootup.

- EMIF registers were corrupted (seen via JTAG debug), most of the time 32 bytes of them. After setting up a firewall around the EMIF registers, the analysis showed that it was VIP or VPE that triggered the firewall exception a few times.

My action:

I tried using the VPE max_size setting to stop VPDMA from writing past the buffer boundary, but it didn’t help. After that, I had no idea.

There was an idea from someone else: VIP initialisation might be causing the issue. So most people focused on the VIP instantiation sequence, delays, coordination and so on. If that was the reason, it looked like there was nothing I could do…

Correct way to continue:

If VIP initialization doesn’t solve the problem, we need to focus on the VPE driver. It is a memory corruption, so where are we writing during a VPE VPDMA operation? My first idea was the VPE output buffer, and I didn’t check the code carefully. Actually, there is another write: the write descriptor, which is provided to VPDMA to write into during a VPDMA transaction.

Also, a register dump and the DMA configuration (extremely important, as it might be initialised on the stack before being passed to the hardware) are essential if the system is still reachable (via JTAG), as they give some clues about what might be wrong.

The DMA configuration printed in issue #2 showed that a write descriptor address held a random value (the structure is defined on the stack and never initialised to 0), which led to VPDMA writing to the EMIF registers.
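The failure mode above can be sketched in C. This is only an illustration under assumptions: `struct dma_cfg`, its fields and `make_dma_cfg()` are hypothetical names, not the real VPE/VPDMA driver structures. The point is that a configuration built on the stack should be zeroed before the fields in use are filled in, so that an unused write descriptor address is 0 rather than leftover stack content.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical DMA configuration handed to the hardware. */
struct dma_cfg {
    uint32_t in_addr;          /* source buffer */
    uint32_t out_addr;         /* destination buffer */
    uint32_t write_desc_addr;  /* where the DMA engine writes its descriptor; 0 = unused */
};

/* Build the configuration on the stack, zeroing it first. Without the
 * memset, write_desc_addr would hold whatever was on the stack before. */
static struct dma_cfg make_dma_cfg(uint32_t in_addr, uint32_t out_addr)
{
    struct dma_cfg cfg;

    memset(&cfg, 0, sizeof(cfg));
    cfg.in_addr = in_addr;
    cfg.out_addr = out_addr;
    return cfg;
}
```

Printing every field of such a structure just before it is passed to the hardware makes a stray address easy to spot.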


Issue analysis skill — logical thinking

We hit a lockup issue recently whose symptoms were quite confusing. Testing established a few facts:

  1. CVBS input works well.
  2. LVDS/MIPI with format RGB888 works well.
  3. LVDS/MIPI with format UYVY doesn’t work well.
  4. With a non-zero verbosity, it seems to work well (later, the test confirmed that there were still failures, just less frequently).

So my first impression was that the nightmare had come back. When we worked on this platform a few years ago, we had a few lockups due to the fragile DMA hardware and the interaction of DMA operations between capture and display. Another developer said it should have been fixed by two earlier changes (don’t disable DMA, don’t disable the DMA channel during capturing), and the lockup had not happened for quite a long time. However, he didn’t convince me, as I don’t trust the hardware…

I did try not touching the DMA channel, or touching it at a different point in time, but neither helped. I also asked for a test with separate IPUs for display and capture, and it was confirmed that no issues were seen in this configuration. Then I sighed: oh, the nightmare, it is the interaction of IDMaC channels. My instinct was: it is not fixable, we can probably only try to avoid it, it’s timing related…

OK… so this is where the problem lies. Since separate IPUs don’t show the issue, I should not dwell on the old nightmare, but think about the issue logically: with a single IPU in use, there is both hardware interaction and software interaction. We need to consider the issue from both aspects.

One more thing: besides the perplexing test results, there is another fact. We recently changed the code on the capture side (and some shared control in the display driver), so it is possible that the new change introduced this issue, since the lockup had not been seen for a long time.

Eventually, with the help of JTAG, the root cause was found: a kernel call between InterruptLock() and InterruptUnlock() enables interrupts early, so a deadlock forms. That piece of code only gets executed when the hardware fails to clear the ready flag (so software clears it), and the deadlock only forms when the display interrupt is raised right after InterruptLock() is called. So it is rarely seen.

Using GDB to check core file

An application crashed, and there was a core file generated.

1. Collect all the associated libraries, and run “gdb”.

(gdb) file testApp

(gdb) set solib-search-path .

(gdb) core testApp.core

it will show something like:

Program terminated with signal 11, Segmentation fault.
#0 0x781306a6 in xx_thread () from libcapture-soc-xx.so

(gdb) backtrace
#0 0x781306a6 in xx_thread () from libcapture-soc-xx.so
#1 0x01025a8c in timer_settime () from libc.so.3

So you want to know which instruction is at 0x781306a6. Go to step 2.

2. Check the disassembly code

(gdb) disassemble hw_capture_thread

it will show you the assembly code and where the crash occurs.


(gdb) info registers


so it is clear that an invalid address is being accessed: register r3 is 0.

3. Which piece of C code?

You need a library built with debug symbols (-g). Run “pahole libcapture-soc-xx.so” or “pahole xx.o”, and it will show all the structures in that library/file.

The crash point shows that the code was trying to access a structure member at offset 52, so look for the structures that have a member at offset 52.
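If pahole is not at hand, the same offset lookup can be sketched with `offsetof`. The structure below is purely hypothetical (the real capture library’s structures are not shown here); it only illustrates how an offset such as 52 maps back to a member name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context structure, standing in for one found by pahole. */
struct hypothetical_ctx {
    uint32_t regs[13];     /* bytes 0..51 */
    uint32_t frame_count;  /* offset 52 */
    uint32_t flags;        /* offset 56 */
};

struct member_info {
    const char *name;
    size_t offset;
};

/* Table of member offsets, computed by the compiler. */
static const struct member_info ctx_members[] = {
    { "regs",        offsetof(struct hypothetical_ctx, regs) },
    { "frame_count", offsetof(struct hypothetical_ctx, frame_count) },
    { "flags",       offsetof(struct hypothetical_ctx, flags) },
};

/* Return the name of the member starting at the given offset, or NULL. */
static const char *member_at(size_t offset)
{
    for (size_t i = 0; i < sizeof(ctx_members) / sizeof(ctx_members[0]); i++) {
        if (ctx_members[i].offset == offset)
            return ctx_members[i].name;
    }
    return NULL;
}
```

Here `member_at(52)` names `frame_count`, which is the kind of answer the pahole output gives for the real structures.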

Note: why does the backtrace show “timer_settime()” calling “xx_thread()”?

The reason is that some function in the C library calls/schedules xx_thread. However, since there are no debug symbols in libc, gdb can’t locate the exact function and can only report the nearest static/global symbol it knows about.


1. “info” is a very useful command.

  • info threads
  • info sharedlibrary
  • info registers

2. To check the state of the other threads, use “thread apply all bt”.

3. Pahole (Poke-a-Hole) was developed to find the size of data structures and the holes the compiler inserts when aligning data members to the CPU’s word size. See https://lwn.net/Articles/335942/

Mapping to device memory

1. mmap_device_io()

uintptr_t mmap_device_io( size_t len, uint64_t io );

  • maps len bytes of device I/O memory at io, and makes it accessible via the in*() and out*() functions.
  • returns a handle to the device’s I/O memory, or MAP_DEVICE_FAILED if an error occurs (errno is set).


uintptr_t vin_base;   /* member of the driver context, ctx */

ctx->vin_base = mmap_device_io(RCAR3_VIN_SIZE, RCAR3_VIN0_BASE);
if (ctx->vin_base == (uintptr_t)MAP_DEVICE_FAILED) {
        ctx->vin_base = (uintptr_t)NULL;
        return errno;
}


Access the registers:

uint32_t reg_val;

reg_val = in32(ctx->vin_base + (off));

out32(ctx->vin_base + (off), (value));

2. mmap_device_memory()

void * mmap_device_memory( void * addr, size_t len,  int prot, int flags, uint64_t physical); 

  • maps len bytes of a device’s physical memory address into the caller’s address space at the location returned by mmap_device_memory().
  • Returns the address of the mapped-in object, or MAP_FAILED if an error occurs (errno is set).


uint32_t *ipu_regp;

ipu_regp = (uint32_t *)mmap_device_memory(NULL, IPU_REGSIZE, PROT_READ|PROT_WRITE|PROT_NOCACHE, 0, ipu_regbase);
if (ipu_regp == MAP_FAILED) {
        ipu_regp = NULL;   /* reset the pointer itself; ipu_regptr() is a macro */
        return errno;
}


#define ipu_regptr(offset)  ((uint32_t volatile *)(((unsigned char volatile *)ipu_regp) + (offset)))

#define IPU_CONF         ipu_regptr(IPU_CONF_OFFSET + 0x0)

uint32_t ipu_conf_val;

ipu_conf_val = *IPU_CONF;           

*IPU_CONF = ipu_conf_val;

Register knowledge

Access to the shared registers

  1. Configuration registers, usually modified via “read – modify – write”.
    • use a mutex to protect the register access.
    • use an IRQ spinlock if the registers are also accessed by an ISR handler.
    • NOTE: the if condition needs to be mutex/spinlock protected too, as in this example: if (!(*IPU_CONF & (1 << 3))) *IPU_CONF |= (1 << 3); (the parentheses matter: !*IPU_CONF & (1 << 3) negates the whole register first and is always 0).
  2. Status registers
    • “write 1 to clear” register
      • if the “write” operation is atomic, no protection is needed; otherwise, protect it.
      • however, if you perform a “read & write”, you need to protect the entire “read-write” sequence.
    • “read to clear” register
      • protection doesn’t help in this case, as the read itself clears the status.
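The “check, then read-modify-write” rule for configuration registers can be sketched as follows. This is a minimal illustration with assumptions: `fake_ipu_conf` is an ordinary variable standing in for the mapped `*IPU_CONF` register, and `conf_enable_once()` and `CONF_ENABLE_BIT` are hypothetical names. A POSIX mutex is used, which only suits registers that no ISR touches.

```c
#include <pthread.h>
#include <stdint.h>

#define CONF_ENABLE_BIT (1u << 3)

/* Stand-in for the memory-mapped configuration register. */
static volatile uint32_t fake_ipu_conf;

static pthread_mutex_t conf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set the enable bit if it is not already set.
 * Returns 1 if this call changed the register, 0 otherwise.
 * The test and the read-modify-write sit inside the same critical
 * section, so no other thread can change the register in between. */
static int conf_enable_once(void)
{
    int changed = 0;

    pthread_mutex_lock(&conf_lock);
    if (!(fake_ipu_conf & CONF_ENABLE_BIT)) {
        fake_ipu_conf |= CONF_ENABLE_BIT;
        changed = 1;
    }
    pthread_mutex_unlock(&conf_lock);

    return changed;
}
```

If an ISR also modifies the register, the mutex would have to be replaced by an interrupt-safe lock, as noted in the list above.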

Register definition

  1. MAP_CONF(x) registers, each register contains one field, for map x only.
    • #define MAP_CONF(x)  (OFFSET + (x) * 4)
      • 0 — 0
      • 1– 4
      • 2– 8
  2. MAP_CONF(x) registers, each register contains two fields: bit 0 ~ 15 for map 0/2/4; bit 16 ~ 31 for map 1/3/5.
    • #define MAP_CONF(x)  (OFFSET + ((x) & ~0x1) * 2) or
    • #define MAP_CONF(x)  (OFFSET + ((x) / 2 ) * 4)
      • 0,1 — 0
      • 2,3– 4
      • 4,5—8
    • write to the correct bit fields
      • shift = (x & 1) * 16;
      • *MAP_CONF(x) |= val << shift;
  3. MAP_CONF(x) registers, each register contains 3 fields: bit 0 ~ 9 for map 0/3/6; bit 10 ~ 19 for map 1/4/7; bit 20 ~ 29 for map 2/5/8.
    • #define MAP_CONF(x) (OFFSET + ((x) / 3) * 4)
      • 0,1,2— 0
      • 3,4,5– 4
      • 6,7,8 — 8
    • write the correct bit fields
      • shift = (x%3) * 10;
      • *MAP_CONF(x) |= val << shift;
  4. A complicated example:
    • MAP_CONF(x) contains 6 fields, and is defined as (OFFSET + ((x) / 2 ) * 4). The number after the arrow on each line is the pointer value written in the example for maps 0 and 1.
      • 30-26: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 2 -> 5
      • 25-21: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 1 -> 4
      • 20-16: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 0 -> 3
      • 14-10: mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 2 -> 2
      • 9-5:   mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 1 -> 1
      • 4-0:   mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 0 -> 0
    • MAP_VAL(y) contains 4 fields, and defined as (OFFSET + ((y) / 2 ) * 4).
      • 28-24: offset #1 (or 3,5,7,..).
      • 23-16: mask #1
      • 12-8: offset #0 (or 2,4,6…)
      • 7-0:    mask #0
    • For specified “map” values, e.g. 0 and 1, we want to set the following (offset, mask) pairs:
      • map   byte0        byte1        byte2
      • ----------------------------------------
      • 0     7, 0xFF      15, 0xFF     23, 0xFF
      • 1     5, 0xFC      11, 0xFC     17, 0xFC
      • the register fields in MAP_CONF(0) will be set to the pointer values listed at the end of each field line above (0 to 5).
      • for MAP_VAL(y), the first three registers (y = 0,1 / 2,3 / 4,5) will look like this:
      • bits     register 0   register 1   register 2
      • -----------------------------------------------
      • 28-24    15           5            17
      • 23-16    0xFF         0xFC         0xFC
      • 12-8     7            23           11
      • 7-0      0xFF         0xFF         0xFC
    • the implementation of configure_map(map, offset_b0, mask_b0, offset_b1, mask_b1, offset_b2, mask_b2) would be:
      • shift = (map & 1) * 16;
      • pointer = map * 3;
      • *MAP_CONF(map) |=  ((pointer + 2) << 10 | (pointer +1 ) << 5 | (pointer) <<0 ) << shift;
      • // We need to use “pointer”, “pointer + 1”, “pointer + 2” to find the offset of the associated MAP_VAL(y), and the shifts.
        • shift = (pointer & 1) * 16;
        • *MAP_VAL(pointer) |= ((offset_b0 << 8) | (mask_b0 << 0)) << shift;
        • shift = ((pointer +1) & 1) * 16;
        • *MAP_VAL(pointer+1) |= ((offset_b1 << 8) | (mask_b1 << 0)) << shift;
        • shift = ((pointer +2) & 1) * 16;
        • *MAP_VAL(pointer+2) |= ((offset_b2 << 8) | (mask_b2 << 0)) << shift;
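The configure_map() steps above can be put together as a small self-contained sketch. It rests on a few assumptions: the register banks are plain arrays (`map_conf_regs`, `map_val_regs`) standing in for mapped device memory, the macros return array element pointers rather than byte offsets, and `NUM_MAPS` is an arbitrary limit chosen for the illustration. Unlike the bullet version, it clears each field before OR-ing in the new value, so it can be called more than once for the same map.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_MAPS 8u

/* Simulated register banks (real code would use mapped device memory). */
static uint32_t map_conf_regs[NUM_MAPS / 2];      /* two maps per register */
static uint32_t map_val_regs[NUM_MAPS * 3 / 2];   /* two (offset, mask) pairs per register */

#define MAP_CONF(x) (&map_conf_regs[(x) / 2])
#define MAP_VAL(y)  (&map_val_regs[(y) / 2])

static void configure_map(uint32_t map,
                          uint32_t offset_b0, uint32_t mask_b0,
                          uint32_t offset_b1, uint32_t mask_b1,
                          uint32_t offset_b2, uint32_t mask_b2)
{
    const uint32_t offset[3] = { offset_b0, offset_b1, offset_b2 };
    const uint32_t mask[3]   = { mask_b0, mask_b1, mask_b2 };
    uint32_t pointer = map * 3;           /* first of three consecutive pointers */
    uint32_t shift = (map & 1u) * 16u;    /* even map: low half, odd map: high half */

    assert(map < NUM_MAPS);

    /* Three 5-bit pointers: byte 0 at bits 4-0, byte 1 at 9-5, byte 2 at 14-10. */
    *MAP_CONF(map) &= ~(0x7FFFu << shift);
    *MAP_CONF(map) |= (((pointer + 2u) << 10) | ((pointer + 1u) << 5) | pointer) << shift;

    /* Each pointer selects one (offset, mask) pair in the MAP_VAL bank. */
    for (uint32_t i = 0; i < 3; i++) {
        uint32_t y = pointer + i;

        shift = (y & 1u) * 16u;
        *MAP_VAL(y) &= ~(0x1FFFu << shift);
        *MAP_VAL(y) |= (((offset[i] & 0x1Fu) << 8) | (mask[i] & 0xFFu)) << shift;
    }
}
```

Calling it for map 0 with (7, 0xFF, 15, 0xFF, 23, 0xFF) and for map 1 with (5, 0xFC, 11, 0xFC, 17, 0xFC) fills the first three MAP_VAL registers with the field values from the table above.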

Jaggy artifacts (raw yuv file is important!)

Jaggy artifacts are introduced by a mismatch between neighbouring pixels. In YUV color space, the focus is on the Y component.

Some jaggy artifacts are unavoidable. For instance, when WEAVE (field combination) deinterlacing is deployed, any change between fields will result in “jaggies”, as the pixels in one field do not line up with the pixels in the other.

I met one issue where the UYVY output was good, while the YUYV output showed obvious jaggies. The first thing that came to my mind was: do the Ys get reversed on output? Y0U0Y1V1–>Y1UxY0Vx?

Luckily, we can route the output to the input interface and capture the raw data to analyse. The raw data clearly shows that Y1, Y3, … are missing, while Y0, Y2, … each appear twice.

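Use a hex editor to open the raw YUV files.

At address 0xC50, the uyvy file shows “87 59 7E 5C”, while the yuyv file shows “59 87 59 7E”. In the UYVY data that is U=0x87, Y0=0x59, V=0x7E, Y1=0x5C; in the YUYV data it is Y0=0x59, U=0x87, Y1=0x59, V=0x7E, so Y1 is just a copy of Y0.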
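The manual check in the hex editor can be automated over a whole raw file. The sketch below is a hypothetical helper (`count_repeated_luma()` and the `packed_yuv422` enum are names made up for this illustration) that counts the 4-byte macropixels in which Y1 equals Y0. Flat image areas legitimately produce some equal pairs, so only a count close to the total number of macropixels points to duplicated luma.

```c
#include <stddef.h>
#include <stdint.h>

enum packed_yuv422 {
    PACKED_UYVY,  /* byte order U  Y0 V  Y1 */
    PACKED_YUYV   /* byte order Y0 U  Y1 V  */
};

/* Count the macropixels (4 bytes = 2 pixels) whose two luma samples are equal. */
static size_t count_repeated_luma(const uint8_t *buf, size_t len,
                                  enum packed_yuv422 fmt)
{
    size_t y0_idx = (fmt == PACKED_UYVY) ? 1 : 0;  /* Y1 follows two bytes later */
    size_t repeated = 0;

    for (size_t i = 0; i + 4 <= len; i += 4) {
        if (buf[i + y0_idx] == buf[i + y0_idx + 2])
            repeated++;
    }
    return repeated;
}
```

Applied to the four bytes quoted above, the UYVY sample gives 0 repeated pairs and the YUYV sample gives 1.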

screen basics

Screen is a compositing windowing system. It is able to combine multiple content sources together into a single image.

Two types of composition:

  1. Hardware composition: composes all visible(enabled) pipelines at display time.
    • In order to use this,
      • You need to specify a pipeline for your window: use screen_set_window_property_iv().
      • use screen_set_window_property_iv() to set the SCREEN_USAGE_OVERLAY bit of your SCREEN_PROPERTY_USAGE window property.
    • The window is considered autonomous as no composition was performed (on the buffers, which belong to this window) by the composition manager.
    • For a window to be displayed autonomously on a pipeline, this window buffer’s format must be supported by its associated pipeline.
  2. Composition manager: composes multiple window buffers (belonging to multiple windows) into a single buffer, which is associated with a pipeline.
    • The single buffer is called the composite buffer or screen framebuffer.
    • Used when your platform doesn’t have the hardware capabilities to support a sufficient number of pipelines to compose the required elements, or to support a particular behavior.
    • One pipeline is involved (you don’t specify the pipeline number and OVERLAY usage).
    • Requires processing power of CPU and/or GPU to compose buffers

Note: A pipeline (in the display controller) is equivalent to a layer (in the composition manager), which is indexed by the app’s EGL level.

Pipeline ordering (Hardware property) and z-ordering (for windows)

  • Pipeline ordering and the z-ordering of windows on a layer are applied independently of each other.
  • Pipeline ordering takes precedence over z-ordering operations in Screen. Screen does not have control over the ordering of hardware pipelines. Screen windows are always arranged in the z-order that is specified by the application.
  • If your application manually assigns pipelines, you must ensure that the z-order values make sense with regard to the pipeline order of the target hardware. For example, if you assign a high z-order value to a window (meaning it is to be placed in the foreground), then you must make a corresponding assignment of this window to a top layer pipeline. Otherwise the result may not be what you expect, regardless of the z-order value.

Window: a window represents the fundamental drawing surface.

  • An application needs to use multiple windows when content comes from different sources, when one or more parts of the application must be updated independently from others, or when the application tries to target multiple displays.

Pixmap: A pixmap is similar to a bitmap except that it can have multiple bits per pixel (a measurement of the depth of the pixmap) that store the intensity or color component values. Bitmaps, by contrast, have a depth of one bit per pixel.

  • You can draw directly onto a pixmap surface, outside the viewable area, and then copy the pixmap to a buffer later on.

Note: Multiple buffers can be associated with a window whereas only one buffer can be associated with a pixmap.