All posts by Tianlittlecity

A mom of two boys. Decided to use this space as a display spot for Jason's LEGO projects.

Mapping to device memory

1. mmap_device_io()

uintptr_t mmap_device_io( size_t len, uint64_t io );

  • maps len bytes of device I/O memory at io, and makes it accessible via the in*() and out*() functions.
  • returns a handle to the device’s I/O memory, or MAP_DEVICE_FAILED if an error occurs (errno is set).

Example:

uintptr_t vin_base;   /* declared as a member of the driver context, ctx */

ctx->vin_base = mmap_device_io(RCAR3_VIN_SIZE, RCAR3_VIN0_BASE);

if (ctx->vin_base == (uintptr_t)MAP_DEVICE_FAILED) {
    ctx->vin_base = (uintptr_t)NULL;
    return errno;
}

Access the registers:

uint32_t reg_val;

reg_val = in32(ctx->vin_base + (off));      /* read the 32-bit register at byte offset off */

out32(ctx->vin_base + (off), (value));      /* write value to the same register */

2. mmap_device_memory()

void * mmap_device_memory( void * addr, size_t len,  int prot, int flags, uint64_t physical); 

  • maps len bytes of a device’s physical memory, starting at the physical address physical, into the caller’s address space; the mapping is placed at the address returned by mmap_device_memory().
  • returns the address of the mapped-in object, or MAP_FAILED if an error occurs (errno is set).

Example:

uint32_t *ipu_regp;

ipu_regp = (uint32_t *)mmap_device_memory(NULL, IPU_REGSIZE, PROT_READ | PROT_WRITE | PROT_NOCACHE, 0, ipu_regbase);

if (ipu_regp == MAP_FAILED) {
    ipu_regp = NULL;
    return errno;
}

/* Pointer to the 32-bit register at byte offset `offset` from the mapped base */
#define ipu_regptr(offset)  ((uint32_t volatile *)(((unsigned char volatile *)ipu_regp) + (offset)))

#define IPU_CONF         ipu_regptr(IPU_CONF_OFFSET + 0x0)

uint32_t ipu_conf_val;

ipu_conf_val = *IPU_CONF;    /* read the register */

*IPU_CONF = ipu_conf_val;    /* write the register */

Register knowledge

Access to the shared registers

  1. configuration registers, usually modified via “read-modify-write”.
    • use a mutex to protect the register access.
    • use an interrupt spin lock (InterruptLock()/InterruptUnlock() on QNX) if the registers are also accessed by an interrupt handler.
    • NOTE: the if condition needs to be protected by the mutex/spin lock too, as in this example: if (!(*IPU_CONF & (1 << 3))) *IPU_CONF |= (1 << 3);
  2. status registers
    • “write 1 to clear” register
      • if the “write” operation is atomic, no protection is needed; otherwise, protect it.
      • however, if you perform a “read & write”, you need to protect the entire “read-write” sequence.
    • “read to clear” register
      • locking does not help in this case: the read itself clears the bits, so whoever reads the register must keep the value and handle every bit that was set.
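The read-modify-write rule above can be sketched in portable C. This is only an illustration: fake_ipu_conf stands in for the real memory-mapped register, and ipu_conf_set_bit_once() and ipu_conf_lock are hypothetical names, not part of any driver. The point it shows is that the test of the bit and the update of the register sit inside the same critical section.

```c
#include <pthread.h>
#include <stdint.h>

/* Stand-in for a memory-mapped register (assumption for this sketch);
 * a real driver would use the pointer returned by mmap_device_memory(). */
volatile uint32_t fake_ipu_conf;
#define IPU_CONF (&fake_ipu_conf)

static pthread_mutex_t ipu_conf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Set `bit` in the register if it is not already set.
 * Returns 1 if this call set the bit, 0 if it was already set. */
int ipu_conf_set_bit_once(unsigned bit)
{
    int changed = 0;

    pthread_mutex_lock(&ipu_conf_lock);
    /* Both the check and the modification are inside the lock.
     * Note the parentheses: !(reg & mask), not !reg & mask. */
    if (!(*IPU_CONF & (UINT32_C(1) << bit))) {
        *IPU_CONF |= UINT32_C(1) << bit;
        changed = 1;
    }
    pthread_mutex_unlock(&ipu_conf_lock);

    return changed;
}
```

If the same register is also touched from an interrupt handler, a mutex is not usable there, and the lock/unlock pair would have to be replaced by the interrupt spin lock mentioned above.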

Register definition

  1. MAP_CONF(x) registers, each register contains one field, for map x only.
    • #define MAP_CONF(x)  (OFFSET + (x) * 4)
      • x = 0: byte offset 0
      • x = 1: byte offset 4
      • x = 2: byte offset 8
  2. MAP_CONF(x) registers, each register contains two fields: bits 0 ~ 15 for map 0/2/4; bits 16 ~ 31 for map 1/3/5.
    • #define MAP_CONF(x)  (OFFSET + ((x) & ~0x1) * 2) or
    • #define MAP_CONF(x)  (OFFSET + ((x) / 2) * 4)
      • x = 0, 1: byte offset 0
      • x = 2, 3: byte offset 4
      • x = 4, 5: byte offset 8
    • write to the correct bit field (|= only sets bits, so clear the field first if it may already hold a value)
      • shift = (x & 1) * 16;
      • *MAP_CONF(x) |= val << shift;
  3. MAP_CONF(x) registers, each register contains three 10-bit fields: bits 0 ~ 9 for map 0/3/6; bits 10 ~ 19 for map 1/4/7; bits 20 ~ 29 for map 2/5/8.
    • #define MAP_CONF(x)  (OFFSET + ((x) / 3) * 4)
      • x = 0, 1, 2: byte offset 0
      • x = 3, 4, 5: byte offset 4
      • x = 6, 7, 8: byte offset 8
    • write to the correct bit field
      • shift = (x % 3) * 10;
      • *MAP_CONF(x) |= val << shift;
  4. A more complicated example:
    • MAP_CONF(x) contains 6 fields, and is defined as (OFFSET + ((x) / 2) * 4). The number at the end of each line is the pointer value used in the example below.
      • 30-26: mapping pointer for map #1 (or 3, 5, 7 as x increases), byte 2. Pointer value: 5
      • 25-21: mapping pointer for map #1 (or 3, 5, 7 as x increases), byte 1. Pointer value: 4
      • 20-16: mapping pointer for map #1 (or 3, 5, 7 as x increases), byte 0. Pointer value: 3
      • 14-10: mapping pointer for map #0 (or 2, 4, 6 as x increases), byte 2. Pointer value: 2
      • 9-5:   mapping pointer for map #0 (or 2, 4, 6 as x increases), byte 1. Pointer value: 1
      • 4-0:   mapping pointer for map #0 (or 2, 4, 6 as x increases), byte 0. Pointer value: 0
    • MAP_VAL(y) contains 4 fields, and is defined as (OFFSET + ((y) / 2) * 4).
      • 28-24: offset #1 (or 3, 5, 7, ...)
      • 23-16: mask #1
      • 12-8:  offset #0 (or 2, 4, 6, ...)
      • 7-0:   mask #0
    • For a specified “map” value, e.g. 0 and 1, we want to set these (offset, mask) pairs:
    • map   byte0        byte1        byte2
    • ----------------------------------------
    • 0     7, 0xFF      15, 0xFF     23, 0xFF
    • 1     5, 0xFC      11, 0xFC     17, 0xFC
      • the register fields in MAP_CONF(0) are set to the pointer values 0 to 5 listed at the end of each field description above.
      • for MAP_VAL(y), two pointers share each register, so pointers 0 to 5 fill three registers:
      • bits     MAP_VAL(0)   MAP_VAL(2)   MAP_VAL(4)
      • ------------------------------------------------
      • 28-24    15           5            17
      • 23-16    0xFF         0xFC         0xFC
      • 12-8     7            23           11
      • 7-0      0xFF         0xFF         0xFC
    • the implementation of configure_map(map, offset_b0, mask_b0, offset_b1, mask_b1, offset_b2, mask_b2) would be:
      • shift = (map & 1) * 16;
      • pointer = map * 3;
      • *MAP_CONF(map) |=  ((pointer + 2) << 10 | (pointer +1 ) << 5 | (pointer) <<0 ) << shift;
      • // We need to use “pointer”, “pointer + 1” and “pointer + 2” to find the offset of the associated MAP_VAL(y) register and the shift within it.
        • shift = (pointer & 1) * 16;
        • *MAP_VAL(pointer) |= ((offset_b0 << 8) | (mask_b0 << 0)) << shift;
        • shift = ((pointer +1) & 1) * 16;
        • *MAP_VAL(pointer+1) |= ((offset_b1 << 8) | (mask_b1 << 0)) << shift;
        • shift = ((pointer +2) & 1) * 16;
        • *MAP_VAL(pointer+2) |= ((offset_b2 << 8) | (mask_b2 << 0)) << shift;
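The configure_map() steps above can be tried out without hardware. The sketch below is an assumption-laden model, not the driver code: the registers are plain arrays (map_conf_regs, map_val_regs) indexed by register number instead of memory-mapped addresses, the array sizes are hypothetical, and write_field() clears a field before setting it, which the |= form above does not do.

```c
#include <stdint.h>

#define NUM_MAP_CONF 4    /* hypothetical: enough registers for maps 0..7 */
#define NUM_MAP_VAL  12   /* hypothetical: enough registers for pointers 0..23 */

/* Simulated register banks, indexed by register number (byte offset / 4). */
uint32_t map_conf_regs[NUM_MAP_CONF];
uint32_t map_val_regs[NUM_MAP_VAL];

/* Replace the `width`-bit field starting at bit `shift` with `val`. */
static void write_field(uint32_t *reg, unsigned shift, unsigned width, uint32_t val)
{
    uint32_t mask = ((UINT32_C(1) << width) - 1) << shift;
    *reg = (*reg & ~mask) | ((val << shift) & mask);
}

/* Store one (offset, mask) pair in the MAP_VAL half selected by `pointer`. */
static void write_map_val(unsigned pointer, uint32_t offset, uint32_t mask)
{
    unsigned shift = (pointer & 1) * 16;
    uint32_t *reg = &map_val_regs[pointer / 2];

    write_field(reg, shift + 8, 5, offset);   /* bits 12-8 or 28-24 */
    write_field(reg, shift + 0, 8, mask);     /* bits 7-0 or 23-16 */
}

void configure_map(unsigned map,
                   uint32_t offset_b0, uint32_t mask_b0,
                   uint32_t offset_b1, uint32_t mask_b1,
                   uint32_t offset_b2, uint32_t mask_b2)
{
    unsigned shift = (map & 1) * 16;    /* low half for even maps, high half for odd */
    unsigned pointer = map * 3;         /* three pointers per map */
    uint32_t *conf = &map_conf_regs[map / 2];

    write_field(conf, shift + 0,  5, pointer);       /* byte 0 pointer */
    write_field(conf, shift + 5,  5, pointer + 1);   /* byte 1 pointer */
    write_field(conf, shift + 10, 5, pointer + 2);   /* byte 2 pointer */

    write_map_val(pointer,     offset_b0, mask_b0);
    write_map_val(pointer + 1, offset_b1, mask_b1);
    write_map_val(pointer + 2, offset_b2, mask_b2);
}
```

Calling configure_map(0, 7, 0xFF, 15, 0xFF, 23, 0xFF) and then configure_map(1, 5, 0xFC, 11, 0xFC, 17, 0xFC) fills the first three MAP_VAL registers with the values shown in the table above.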

Jaggy artifacts (raw yuv file is important!)

Jaggy artifacts are caused by mismatches between neighbouring pixels. In the YUV color space, the Y (luma) component matters most.

Some jaggy artifacts are unavoidable. For instance, when WEAVE (field combination) deinterlacing is used, any change between fields results in “jaggies”, because the pixels in one field do not line up with the pixels in the other.

I ran into an issue where the UYVY output looked good, while the YUYV output showed obvious jaggies. The first thing that came to mind was: do the Ys get swapped on output, i.e. Y0U0Y1V1 -> Y1UxY0Vx?

Luckily, we can route the output back to the input interface and capture the raw data for analysis. The raw data clearly shows that Y1, Y3, ... are missing, while Y0, Y2, ... each appear twice.

Open the raw YUV files in a hex editor:

at address 0xC50, the UYVY file shows “87 59 7E 5C” (U0 Y0 V0 Y1), while the YUYV file shows “59 87 59 7E” (Y0 U0 Y0 V0): the second luma sample, 5C, has been replaced by a copy of Y0.
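The same comparison can be automated instead of being read off a hex editor. The sketch below assumes both captures hold the same frame; uyvy_to_yuyv() and count_duplicated_luma() are hypothetical helper names made up for this illustration. It reorders the good UYVY capture into YUYV byte order and counts pixel pairs where the suspect capture repeats Y0 in place of Y1.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert packed UYVY (U0 Y0 V0 Y1) to packed YUYV (Y0 U0 Y1 V0).
 * len is the byte length; only whole 4-byte groups (two pixels) are converted. */
void uyvy_to_yuyv(const uint8_t *uyvy, uint8_t *yuyv, size_t len)
{
    for (size_t i = 0; i + 3 < len; i += 4) {
        yuyv[i + 0] = uyvy[i + 1];   /* Y0 */
        yuyv[i + 1] = uyvy[i + 0];   /* U0 */
        yuyv[i + 2] = uyvy[i + 3];   /* Y1 */
        yuyv[i + 3] = uyvy[i + 2];   /* V0 */
    }
}

/* Count pixel pairs in a YUYV capture whose second luma sample differs from
 * the reference and is instead a copy of the first luma sample. */
size_t count_duplicated_luma(const uint8_t *captured, const uint8_t *reference, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i + 3 < len; i += 4) {
        if (captured[i + 2] != reference[i + 2] &&
            captured[i + 2] == captured[i + 0])
            n++;
    }
    return n;
}
```

Fed with the four bytes quoted above, the reference becomes 59 87 5C 7E and the captured group 59 87 59 7E is counted as one duplicated-luma pair.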

screen basics

Screen is a compositing windowing system. It is able to combine multiple content sources together into a single image.

Two types of composition:

  1. Hardware composition: composes all visible (enabled) pipelines at display time.
    • In order to use this,
      • You need to specify a pipeline for your window: use screen_set_window_property_iv() to set the SCREEN_PROPERTY_PIPELINE window property.
      • use screen_set_window_property_iv() to set the SCREEN_USAGE_OVERLAY bit of your SCREEN_PROPERTY_USAGE window property.
    • The window is considered autonomous because the composition manager performs no composition on the buffers that belong to this window.
    • For a window to be displayed autonomously on a pipeline, this window buffer’s format must be supported by its associated pipeline.
  2. Composition manager: composes multiple window buffers (belonging to multiple windows) into a single buffer, which is associated with a pipeline.
    • The single buffer is called the composite buffer, or screen framebuffer.
    • Used when your platform doesn’t have the hardware capabilities to support a sufficient number of pipelines to compose the required elements, or to support a particular behavior.
    • One pipeline is involved (you don’t specify the pipeline number or the OVERLAY usage).
    • Requires CPU and/or GPU processing power to compose the buffers.

Note: A pipeline (in the display controller) corresponds to a layer (in the composition manager), which is indexed by the EGL level of the app.

Pipeline ordering (Hardware property) and z-ordering (for windows)

  • Pipeline ordering and the z-ordering of windows on a layer are applied independently of each other.
  • Pipeline ordering takes precedence over z-ordering operations in Screen. Screen does not have control over the ordering of hardware pipelines. Screen windows are always arranged in the z-order that is specified by the application.
  • If your application manually assigns pipelines, you must ensure that the z-order values make sense with regard to the pipeline order of the target hardware. For example, if you assign a high z-order value to a window (meaning it is to be placed in the foreground), then you must make a corresponding assignment of this window to a top layer pipeline. Otherwise the result may not be what you expect, regardless of the z-order value.

Window: a window represents the fundamental drawing surface.

  • An application needs to use multiple windows when content comes from different sources, when one or more parts of the application must be updated independently from others, or when the application targets multiple displays.
  • To use the same window, the content must have the same FORMAT, DISPLAY, BRIGHTNESS, PIPELINE, POSITION, SIZE, SOURCE_POSITION, SOURCE_SIZE, TRANSPARENCY, ZORDER, etc.

Pixmap: A pixmap is similar to a bitmap except that it can have multiple bits per pixel (a measurement of the depth of the pixmap) that store the intensity or color component values. Bitmaps, by contrast, have a depth of one bit per pixel.

  • You can draw directly onto a pixmap surface, outside the viewable area, and then copy the pixmap to a buffer later on.

Note: Multiple buffers can be associated with a window whereas only one buffer can be associated with a pixmap.

sleep & delay function (2)

To delay a thread for a short, precise amount of time without blocking it (and so without depending on when the scheduler resumes it), we can use nanospin().

int nanospin( const struct timespec *when );

 

The nanospin() function occupies the CPU for the amount of time specified by the argument when without blocking the calling thread. (The thread isn’t taken off the ready list.) The function is essentially a do…while loop.

The first time you call nanospin(), the C library invokes nanospin_calibrate() with an argument of 0 (interrupts enabled), if you haven’t already called it.

int nanospin_ns( unsigned long nsec );

The nanospin_ns() function busy-waits for the number of nanoseconds specified in nsec, without blocking the calling thread.

void nanospin_count( unsigned long count );

The nanospin_count() function busy-waits for the number of iterations specified in count. Use nanospin_ns_to_count() to turn a number of nanoseconds into an iteration count suitable for nanospin_count().
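To make the "busy-wait without blocking" idea concrete, here is a portable sketch of such a do...while loop. It is not how nanospin() is implemented (the QNX functions spin for a calibrated iteration count, while this version polls a clock); busy_wait_ns() and now_ns() are hypothetical names, and CLOCK_MONOTONIC is assumed to be available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Spin until at least `nsec` nanoseconds have passed. The thread never
 * blocks, so it stays runnable for the whole time.
 * Returns the time actually spent, which can exceed `nsec`. */
uint64_t busy_wait_ns(uint64_t nsec)
{
    uint64_t start = now_ns();
    uint64_t elapsed;

    do {
        elapsed = now_ns() - start;
    } while (elapsed < nsec);

    return elapsed;
}
```

Because the CPU is fully occupied while spinning, this kind of delay only makes sense for very short waits, such as a few microseconds between two register accesses.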

sleep & delay functions (1)

Quoted from the QNX documentation.

— delay(unsigned int duration) suspends the calling thread for duration milliseconds.

— sleep(unsigned int seconds) function suspends the calling thread until the number of realtime seconds specified by the seconds argument have elapsed, or the thread receives a signal whose action is either to terminate the process or to call a signal handler.

Both delay() and sleep() return 0 if the full time has elapsed, or the amount of unslept time if interrupted by a signal.

— usleep(useconds_t useconds) function suspends the calling thread until useconds microseconds of realtime have elapsed, or until a signal that isn’t ignored is received.

— nanosleep( const struct timespec* rqtp, struct timespec* rmtp )  function causes the calling thread to be suspended from execution until either:

  • The time interval specified by the rqtp argument has elapsed

    Or

  • A signal is delivered to the thread, and the signal’s action is to invoke a signal-catching function or terminate the process.

usleep() and nanosleep() return either 0 (success) or -1 (an error occurred; errno is set).

 

Note:

With all the functions above, the suspension time may be greater than the requested amount, due to the nature of time measurement (see the Tick, Tock: Understanding the Neutrino Microkernel’s Concept of Time chapter of the QNX Neutrino Programmer’s Guide), or due to the scheduling of other, higher priority threads by the system.
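Since a signal can cut nanosleep() short, a caller that really needs the full interval has to restart it with the remaining time. A minimal sketch of that pattern follows; sleep_full() is a hypothetical helper name, and the code relies only on the standard POSIX behaviour that nanosleep() fails with EINTR and stores the unslept time in its second argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for at least the requested interval, resuming after signals.
 * Returns 0 on success, or -1 with errno set for errors other than EINTR. */
int sleep_full(time_t sec, long nsec)
{
    struct timespec req = { .tv_sec = sec, .tv_nsec = nsec };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;   /* e.g. EINVAL when nsec is outside 0..999999999 */
        req = rem;       /* continue with the unslept remainder */
    }
    return 0;
}
```

As the note above says, the total time spent can still be longer than requested; the loop only guarantees that it is not shorter.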

Interrupt(8):callout for cascaded interrupts

SDMA interrupt entry is added in init_intrinfo().

/*
 * -----------------------------------------------------------------------
 * Identify SDMA interrupt source.
 *
 * Returns interrupt number in r4
 * -----------------------------------------------------------------------
 */
CALLOUT_START(interrupt_id_omap4_sdma, 0, interrupt_patch_sdma)
	/*
	 * Get the interrupt controller base address (patched)
	 */
	mov		ip,     #0x000000ff
	orr		ip, ip, #0x0000ff00
	orr		ip, ip, #0x00ff0000
	orr		ip, ip, #0xff000000

	/*
	 * Read Interrupt Mask and Status
	 */
	ldr		r3, [ip, #SDMA_IRQSTATUS]		// Status
	ldr		r2, [ip, #SDMA_IRQENABLE]		// Mask
	and		r3, r3, r2

	/*
	 * Scan for first set bit
	 */
#if 0
	mov		r4, #32
	mov		r1, #1

0:
	subs	r4, r4, #1
	blt		1f
	tst		r3, r1, lsl r4
	beq		0b
#else
	clz		r4, r3
	rsbs	r4, r4, #31
	blt		1f
	mov		r1, #1
#endif
    /*
	 * Mask the interrupt source
	 */
	mov		r1, r1, lsl r4
	bic		r2, r2, r1
	str		r2, [ip, #SDMA_IRQENABLE]
	ldr		r2, [ip, #SDMA_IRQENABLE]

	/*
	 * Clear interrupt status
	 * 09.17.2014: clearing the status bit is moved to the eoi-callout since the status bit related
	 * to a channel can only be cleared if the channel status register of the associated
	 * channel is cleared. Clearing the csr can't be done in a generic way here because the attached
	 * interrupt service routines need to know the interrupt reason (block, frame, drop etc. ...)
	 */
	//str		r1, [ip, #SDMA_IRQSTATUS]
1:
CALLOUT_END(interrupt_id_omap4_sdma)

/*
 * -----------------------------------------------------------------------
 * Acknowledge specified SDMA interrupt
 *
 * On entry:
 *	r4 contains the interrupt number
 *	r7 contains the interrupt mask count
 * -----------------------------------------------------------------------
 */
CALLOUT_START(interrupt_eoi_omap4_sdma, 0, interrupt_patch_sdma)
	/*
	 * Get the interrupt controller base address (patched)
	 */
	mov		ip,     #0x000000ff
	orr		ip, ip, #0x0000ff00
	orr		ip, ip, #0x00ff0000
	orr		ip, ip, #0xff000000

    /*
     * Only unmask interrupt if mask count is zero
     */
	teq		r7, #0
	bne		0f
	
	/*
	 * Clear interrupt status
	 * see comment in the id-callout
	 */
	mov		r2, #1
	mov		r2, r2, lsl r4
	str		r2, [ip, #SDMA_IRQSTATUS]

	ldr		r1, [ip, #SDMA_IRQENABLE]
	orr		r1, r1, r2
	str		r1, [ip, #SDMA_IRQENABLE]

0:
CALLOUT_END(interrupt_eoi_omap4_sdma)
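For readers less familiar with ARM assembly, the logic of the two callouts can be modelled in C. This is only a software model under stated assumptions: struct sdma_regs, sdma_id() and sdma_eoi() are hypothetical names, the two registers are plain struct members rather than hardware, and the write-1-to-clear status register is modelled by clearing the bit directly.

```c
#include <stdint.h>

/* Hypothetical software model of the two SDMA registers used by the callouts. */
struct sdma_regs {
    uint32_t irqstatus;   /* pending bits; write-1-to-clear on real hardware */
    uint32_t irqenable;   /* 1 = interrupt line enabled (unmasked) */
};

/* Model of the id callout: find the highest pending and enabled line,
 * mask it, and return its number, or -1 if nothing is pending. */
int sdma_id(struct sdma_regs *r)
{
    uint32_t pending = r->irqstatus & r->irqenable;
    int id;

    /* Same result as "31 - clz(pending)" in the assembly. */
    for (id = 31; id >= 0; id--) {
        if (pending & (UINT32_C(1) << id))
            break;
    }
    if (id < 0)
        return -1;

    r->irqenable &= ~(UINT32_C(1) << id);   /* mask the interrupt source */
    return id;
}

/* Model of the eoi callout: only when the mask count is zero,
 * clear the status bit and unmask the line again. */
void sdma_eoi(struct sdma_regs *r, int id, unsigned mask_count)
{
    uint32_t bit = UINT32_C(1) << id;

    if (mask_count != 0)
        return;

    r->irqstatus &= ~bit;   /* hardware equivalent: write 1 to IRQSTATUS */
    r->irqenable |= bit;    /* unmask */
}
```

The model keeps the design choice described in the callout comment: the status bit is cleared in the eoi step, not in the id step, so the attached interrupt service routine can still read the reason for the interrupt.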