Register knowledge

Access to the shared registers

  1. configuration registers, usually being modified via “read – modify – write”.
    • use mutex to protect the register access.
    • use irqspin lock, if the registers are being accessed by ISR handler.
    • NOTE: the if condition needs to be mutex/spinlock protected too, in the this example: if(!*IPU_CONF & (1 << 3)) *IPU_CONF |= (1 << 3);
  2.  status registers
    • “write 1 to clear” register
      • if the “write” operation is atomic, no protection is needed; otherwise, protect it.
      • however, if you perform a “read & write”, you need protect the entire ‘read-write” sequence.
    • “read to clear” register
      • protect doesn’t work in this case.

Register definition

  1. MAP_CONF(x) registers, each register contains one field, for map x only.
    • #define MAP_CONF(x)  (OFFSET + (x) * 4)
      • 0 — 0
      • 1– 4
      • 2– 8
  2. MAP_CONF(x) registers, each register contains two fields: bit 0 ~ 15 for map 0/2/4; bit 16 ~ 31 for map 1/3/5.
    • #define MAP_CONF(x)  (OFFSET + ((x) & ~0x1) * 2) or
    • #define MAP_CONF(x)  (OFFSET + ((x) / 2 ) * 4)
      • 0,1 — 0
      • 2,3– 4
      • 4,5—8
    • write to the correct bit fields
      • shift = (x & 1) * 16;
      • *MAP_CONF(x) |= val << shift;
  3. MAP_CONF(x) registers, each registers contains 3 fields: bit 0 ~ 10 for map 0/3/6; bit 11 ~ 20 for map 1/4/7, bit 21~30 for map 2/5/7.
    • #define MAP_CONF(x) (OFFSET + ((x)/3 * 4)
      • 0,1,2— 0
      • 3,4,5– 4
      • 6,7,8 — 8
    • write the correct bit fields
      • shift = (x%3) * 10;
      • *MAP_CONF(x) |= val << shift;
  4. An complicated example:
    • MAP_CONF(x) contains 6 fields, and defined as (OFFSET + ((x) / 2 ) * 4).
      • 30-26: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 2 —- 5
      • 25-21: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 1. —- 4
      • 16-20: mapping pointer for map #1 (or 3, 5, 7 when x increases) byte 0. —- 3
      • 14-10: mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 2  —- 2
      • 9-5:     mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 1. —- 1
      • 4-0:     mapping pointer for map #0 (or 2, 4, 6 when x increases) byte 0  —- 0
    • MAP_VAL(y) contains 4 fields, and defined as (OFFSET + ((y) / 2 ) * 4).
      • 28-24: offset #1 (or 3,5,7,..).
      • 23-16: mask #1
      • 12-8: offset #0 (or 2,4,6…)
      • 7-0:    mask #0
    • For a specified “map” value, e.g. 0, and 1, we want to set:
    • map   byte0         byte1         byte2
    • —————————————————
    • 0         7, 0xFF,  15, 0xFF,     23, 0xFF
    • 1         5, 0xFC,  11, 0xFC,     17, 0xFC
      • the register fields in MAP_CONF(0) will be set to the values, as marked as blue above.
      • for MAP_VAL(x), it will look like this
      •            MAP_VAL(0)      MAP_VAL(1)        MAP_VAL(2)
      • ————————————————————————————————
      • 28-24         15                      5                         17
      • 23-16          0xFF                 0xFC                  0xFC
      • 12-8            7                        23                      11
      • 7-0               0xFF                0xFF                  0xFC
    • the implementation of configure_map(map, offset_b0, mask_b0, offset_b1, mask_b1, offset_b2, mask_b2) would be:
      • shift = (map & 1) * 16;
      • pointer = map * 3;
      • *MAP_CONF(map) |=  ((pointer + 2) << 10 | (pointer +1 ) << 5 | (pointer) <<0 ) << shift;
      • // We need use “pointer”, “pointer + 1”, “pointer + 2” to find the offset of the associated MAP_VAL(y), and the shifts.
        • shift = (pointer & 1) * 16;
        • *MAP_VAL(pointer) |= ((offset_b0 << 8) | (mask_b0 << 0)) << shift;
        • shift = ((pointer +1) & 1) * 16;
        • *MAP_VAL(pointer+1) |= ((offset_b1 << 8) | (mask_b1 << 0)) << shift;
        • shift = ((pointer +2) & 1) * 16;
        • *MAP_VAL(pointer+2) |= ((offset_b2 << 8) | (mask_b2 << 0)) << shift;

Jaggy artifacts (raw yuv file is important!)

Jaggy arfifacts are introduced by the mismatching between the mismatching of neighbouring pixels. In YUV color space, the focus is on Y component.

Some jaggy artifact is unavoidable. For instance,  when the WEAVE (field combination) deinterlacing is deployed, any change between fields will result “jaggies”, as the pixels in one field do not line up with the pixels in the other.

I met one issue, where UYVY output is good, while YUYV output shows apparent jaggies. First thing coming into my mind is: do the Ys get reversed while being output? Y0U0Y1V1–>Y1UxY0Vx?

The lucky thing is we can route the output to the input interface and capture the raw data to analyse. The raw data clearly shows that Y1,Y3, are not there, but Y0, Y2, are there for twice.

Use a hex editor to open the raw YUV file,

at address 0xC50, uyvy file shows “87 59 7E 5C”; yuyv file shows “59 87 59 7E”.

screen basics

Screen is a compositing windowing system. It is able to combine multiple content sources together into a single image.

Two types of composition:

  1. Hardware composition: composes all visible(enabled) pipelines at display time.
    • In order to use this,
      • You need specify a pipeline for your window: use screen_set_window_property_iv().
      • use screen_set_window_property_iv() to set the SCREEN_USAGE_OVERLAY bit of your SCREEN_PROPERTY_USAGE window property.
    • The window is considered autonomous as no composition was performed (on the buffers, which belong to this window) by the composition manager.
    • For a window to be displayed autonomously on a pipeline, this window buffer’s format must be supported by its associated pipeline.
  2. Composition manager: Composes multiple window buffers (belong to multiple windows) into a single buffer, which is associated to a pipeline.
    • The single buffer is called /composite buffer/ screen framebuffer.
    • Used when your platform doesn’t have hardware capabilities to support a sufficient number of pipelines to compose a number of required elements, or to support a particular behavior,
    • One pipeline is involved (you don’t specify the pipeline number and OVERLAY usage).
    • Requires processing power of CPU and/or GPU to compose buffers

Note:Pipeline (in display controller) equals to layer (in composition manager), which is indexed by EGL level of app.

Pipeline ordering (Hardware property) and z-ordering (for windows)

  • Pipeline ordering and the z-ordering of windows on a layer are applied independently of each other.
  • Pipeline ordering takes precedence over z-ordering operations in Screen. Screen does not have control over the ordering of hardware pipelines. Screen windows are always arranged in the z-order that is specified by the application.
  • If your application manually assigns pipelines, you must ensure that the z-order values make sense with regard to the pipeline order of the target hardware. For example, if you assign a high z-order value to a window (meaning it is to be placed in the foreground), then you must make a corresponding assignment of this window to a top layer pipeline. Otherwise the result may not be what you expect, regardless of the z-order value.

Window: a window represents the fundamental drawing surface.

  • An application needs use multiple windows when content comes from different sources, when one or more parts of the application must be updated independently from others, or when the application tries to target multiple displays.

Pixmap: A pixmap is similar to a bitmap except that it can have multiple bits per pixel (a measurement of the depth of the pixmap) that store the intensity or color component values. Bitmaps, by contrast, have a depth of one bit per pixel.

  • You can draw directly onto a pixmap surface, outside the viewable area, and then copy the pixmap to a buffer later on.

Note: Multiple buffers can be associated with a window whereas only one buffer can be associated with a pixmap.

sleep & delay function (2)

To let the thread to suspend the exact amount of time, without being affected by thread scheduling, we can use nanospin().

int nanospin( const struct timespec *when );


The nanospin() function occupies the CPU for the amount of time specified by the argument when without blocking the calling thread. (The thread isn’t taken off the ready list.) The function is essentially a do…while loop.

The first time you call nanospin(), the C library invokes nanospin_calibrate() with an argument of 0 (interrupts enabled), if you haven’t already called it.

int nanospin_ns( unsigned long nsec );

The nanospin_ns() function busy-waits for the number of nanoseconds specified in nsec, without blocking the calling thread.

void nanospin_count( unsigned long count );

The nanospin_count() function busy-waits for the number of iterations specified in count. Use nanospin_ns_to_count() to turn a number of nanoseconds into an iteration count suitable for nanospin_count().

sleep & delay functions (1)

Quoted from QNX document.

— delay(unsigned int duration) suspends the calling thread for duration milliseconds.

— sleep(unsigned int seconds) function suspends the calling thread until the number of realtime seconds specified by the seconds argument have elapsed, or the thread receives a signal whose action is either to terminate the process or to call a signal handler.

both delay() and sleep() returns either 0 or the number of unslept time if interrupt by a signal.

— usleep(useconds_t useconds) function suspends the calling thread until useconds microseconds of realtime have elapsed, or until a signal that isn’t ignored is received.

— nanosleep( const struct timespec* rqtp, struct timespec* rmtp )  function causes the calling thread to be suspended from execution until either:

  • The time interval specified by the rqtp argument has elapsed


  • A signal is delivered to the thread, and the signal’s action is to invoke a signal-catching function or terminate the process.

usleep() and nanosleep()  returns either 0 (success) or -1 (an error occured)



With all the functions above, the suspension time may be greater than the requested amount, due to the nature of time measurement (see the Tick, Tock: Understanding the Neutrino Microkernel’s Concept of Time chapter of the QNX Neutrino Programmer’s Guide), or due to the scheduling of other, higher priority threads by the system.

Interrupt(8):callout for cascaded interrupts

SDMA interrupt entry is added in init_intrinfo().

 Identify SDMA interrupt source.
 * Returns interrupt number in r4
 * -----------------------------------------------------------------------
CALLOUT_START(interrupt_id_omap4_sdma, 0, interrupt_patch_sdma)
	 * Get the interrupt controller base address (patched)
	mov		ip,     #0x000000ff
	orr		ip, ip, #0x0000ff00
	orr		ip, ip, #0x00ff0000
	orr		ip, ip, #0xff000000

	 * Read Interrupt Mask and Status
	ldr		r3, [ip, #SDMA_IRQSTATUS]		// Status
	ldr		r2, [ip, #SDMA_IRQENABLE]		// Mask
	and		r3, r3, r2

	 * Scan for first set bit
#if 0
	mov		r4, #32
	mov		r1, #1

	subs	r4, r4, #1
	blt		1f
	tst		r3, r1, lsl r4
	beq		0b
	clz		r4, r3
	rsbs	r4, r4, #31
	blt		1f
	mov		r1, #1
	 * Mask the interrupt source
	mov		r1, r1, lsl r4
	bic		r2, r2, r1
	str		r2, [ip, #SDMA_IRQENABLE]
	ldr		r2, [ip, #SDMA_IRQENABLE]

	 * Clear interrupt status
	 * 09.17.2014: clearing the staus bit is moved to the eoi-callout since the status bit related
	 * to a channel can only be claered if the channel status register of the associated
	 * channel is cleared. Clearing the csr can't be done in a generic way here because the attached
	 * isterrupt service routines need to know the interrupt reason (block, fram, drop etc. ...)
	//str		r1, [ip, #SDMA_IRQSTATUS]

 * -----------------------------------------------------------------------
 * Acknowledge specified SDMA interrupt
 * On entry:
 *	r4 contains the interrupt number
 *	r7 contains the interrupt mask count
 * -----------------------------------------------------------------------
CALLOUT_START(interrupt_eoi_omap4_sdma, 0, interrupt_patch_sdma)
	 * Get the interrupt controller base address (patched)
	mov		ip,     #0x000000ff
	orr		ip, ip, #0x0000ff00
	orr		ip, ip, #0x00ff0000
	orr		ip, ip, #0xff000000

     * Only unmask interrupt if mask count is zero
	teq		r7, #0
	bne		0f
	 * Clear interrupt status
	 * see comment in the id-callout
	mov		r2, #1
	mov		r2, r2, lsl r4
	str		r2, [ip, #SDMA_IRQSTATUS]

	ldr		r1, [ip, #SDMA_IRQENABLE]
	orr		r1, r1, r2
	str		r1, [ip, #SDMA_IRQENABLE]


ARM assembly language

Parameter passing

  1. use registers for first 4 parameters (r0, r1, r2, r3);
  2. use stack beyond;
  3. return using r0.
  4. Functions can freely modify registers R0–R3 and R12. If a function needs to use R4 through R11, it is necessary to push their current register values onto the stack, use the register, and then pop the old value off the stack before returning.

Variable declaration

.balign 4      //  ensure the next address will start a 4-byte boundary.

myvar1: .word 3  // .word directive states that the assembler tool should emit the value of the argument of the directive as a 4 byte integer. the initial value of the variable is 3.


ANDS {Rd,} Rn, Rm

ORRS {Rd,} Rn, Rm  ==> inclusive OR           Used to set bits

EORS {Rd,} Rn, Rm  ==> exclusive OR (only true if the corresponding bits differ)

BICS {Rd,} Rn, Rm  == >                           Used to clear bits

TEQ{cond} Rn, Operand2  == > same as EORs, except that the result is discarded ==> use for check if the bits of Rn is same as Operand2, result is 0 if same; result is 1 if different.

BEQ label                        ==> branch if equal to 0; used in conjunction with the previous instruction

MOV Rd, #expr

MOV Rd, Rm

MVN Rd, Rm  ==> takes the value in Rm, performs a bitwise logical NOT operation on the value, and places the result in Rd.

NEG Rd, Rm   ==> takes the value in Rm, multiplies it by –1, and places the result in Rd.

MOV r0, r0, LSL #1   ==> r0 << 1