QNX Neutrino for ARMv7 Cortex A-8 and A-9 Processors

Overview

This technote describes the procedures to follow when setting up QNX Neutrino for boards that use ARMv7 Cortex A-8 and Cortex A-9 processors. Generic ARMv7 support was included in QNX SDP 6.4.1, when the initial Cortex A-8 support was released; the CPU implementations for both Cortex A-8 and Cortex A-9 are included in the QNX SDP 6.5.0 release.

The support for ARMv7 architecture processors (Cortex) is provided by:


Note:

For an ARMle processor, you must use procnto-v6 (the QNX Neutrino microkernel) if you're using the ARMle variant, and procnto if you're using the ARMle-v7 variant.

ARMv7 has two options for handling single-precision floating point: NEON and VFPv3.

ARMv7 processors boot with the NEON engine and Floating-Point Unit (FPU) disabled. Until you enable these features, attempting to execute any NEON or VFP instructions results in an undefined instruction exception.
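
As a rough illustration of what enabling these units involves, the sketch below computes the register values that are typically written on ARMv7: full access to coprocessors CP10 and CP11 in the CP15 Coprocessor Access Control Register (CPACR), followed by the EN bit in the FPEXC register. The function and macro names are hypothetical and aren't taken from libstartup; only the bit positions come from the ARMv7-A architecture.

```c
#include <stdint.h>

/* Hypothetical names; bit positions follow the ARMv7-A architecture. */
#define CPACR_CP10_FULL  (3u << 20)   /* CP10 access field, bits 21:20 */
#define CPACR_CP11_FULL  (3u << 22)   /* CP11 access field, bits 23:22 */
#define FPEXC_EN         (1u << 30)   /* FPEXC.EN: enables VFP and NEON */

/* Value to write back to the CPACR (MCR p15, 0, <Rt>, c1, c0, 2). */
uint32_t cpacr_allow_vfp(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_FULL | CPACR_CP11_FULL;
}

/* Value to write back to FPEXC (VMSR FPEXC, <Rt>) once CP10 and CP11
 * are accessible. */
uint32_t fpexc_enable(uint32_t fpexc)
{
    return fpexc | FPEXC_EN;
}
```

The order matters: FPEXC itself is reached through CP10/CP11, so coprocessor access has to be granted before the EN bit can be written.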

At the time of release, the feature to generate NEON instructions wasn't tested on Cortex-A9 because there are currently no Cortex-A9 implementations with a NEON unit. As a result, the feature is available on an “as is” basis.


$QNX_TARGET/armle provides binaries that can run on all ARM platforms. In order to run across a variety of ARM architecture revisions and processor implementations, these binaries are built with a restricted set of options that use only the following:

This restricted set of options provides a runtime where all the user-mode binaries and libraries will run on all the supported ARM platforms. However, there are two variants of the kernel:

$QNX_TARGET/armle-v7 provides binaries that only run ARMv7 processors, and it is built with options optimized for the ARMv7 architecture:

The kernels are named procnto and procnto-instr since there's a single variant that supports only ARMv7 processors; it's functionally the same as the procnto-v6 kernel, except that it's compiled to use ARMv7 instructions.


Note: The ARMle-v7 binaries use the ARM EABI instead of the GNU-APCS ABI used by the ARMle binaries. This means that they are not binary compatible and you can't run ARMle-v7 binaries on an ARMle runtime (and vice versa).

libstartup

The libstartup CPU detection and configuration process includes the following changes in the 6.5.0 release:


Note:

The default startup/lib/arm/cstart.S uses some CP15 cache maintenance operations that aren't implemented on ARMv7 processors. This means that you'll need to create a modified copy of this file in your board startup directory, and then modify the lines that are commented with “FIXME_v7”.

For example, replace the instruction:

mcr p15, 0, ip, c7, c7, 0

(to invalidate the data and instruction caches) with the instruction:

mcr p15, 0, ip, c7, c5, 0

(to invalidate the instruction cache). The startup code typically assumes that the data cache is cleaned by the IPL before the startup program is executed, and that the startup doesn't run with the MMU enabled. These assumptions mean that there's no requirement to invalidate the data cache; however:

  • If the IPL doesn't clean and invalidate the data cache, you must explicitly do this in cstart.S in the _start() function.
  • If your startup enables the MMU, you must clean and invalidate the data cache before jumping to the kernel in vstart().

armv_chip

The armv_chip structure describes the configuration for a particular CPU.


Note:

The ARMv7 processors use the WFI instruction to enter “wait for interrupt” mode.

In ARMv7, the swap instructions (SWP and SWPB) are disabled by default, so executing them generates an undefined instruction exception. To enable them, set bit 10 (ARM_MMU_CR_F) in the MMU control register.


The members of the armv_chip structure include:

cpuid
Contains bits 15:0 of the CP15 main ID register.

The armv_list[] array defined in armv_list.c contains a list of all supported CPUs, and the arm_chip_detect() function iterates through this array to match bits 15:0 of the ID register.

A BSP can override the library's armv_list.c to provide a customized list of supported CPUs, for example to specify armv_chip structures that aren't implemented in libstartup, or to restrict the list to the processor(s) implemented by the target board.

name
The textual name of the processor.
mmu_cr
Specifies which bits to set in the MMU control register when the MMU is enabled in vstart().
mmu_cr_clr
Specifies which bits to clear in the MMU control register when the MMU is enabled in vstart().
cycles
The number of CPU cycles taken by the arm_cpuspeed.c calibration loop (which calculates loop cycles based on processor architecture from the ID register).
cache
A pointer to an armv_cache structure describing the cache configuration.
power
A pointer to the CPU-specific power callout.

If no power callout is specified, the kernel's idle loop simply busy-loops, and the sysmgr_cpumode() call fails with ENOSYS.

flush and deferred
Pointers to the CPU-specific callouts used by procnto to handle unmapping pages.

The flush callout is used to flush the cache and TLB when unmapping a page. This is called for each page in a region being unmapped.

The deferred callout is used after all pages in a region have been unmapped, and can be used to perform any actions that the flush callout didn't perform.

For example, if the MMU doesn't support flushing the instruction cache by virtual address, the deferred callout can be used to flush the instruction cache after all pages have been unmapped, to reduce the cost of flushing.

pte
A pointer to the default page table configuration.
pte_wa
A pointer to the page table configuration for write-allocate cache behavior.

If you specify the -wa option, the pte_wa configuration is used. If the CPU doesn't support write-allocate caching, set pte_wa to 0, and the default pte values will be used instead.

pte_wb
A pointer to the page table configuration for write-back cache behavior.

If you specify the -wb compile option, the pte_wb configuration is used. If the CPU doesn't support write-back caching, set pte_wb to 0, and the default pte values will be used instead.


Note: The pte_wb member isn't supported by MPCore.

pte_wt
A pointer to the page table configuration for write-through cache behavior.

If you specify the -wt compile option, the pte_wt configuration is used. If the CPU doesn't support write-through caching, set pte_wt to 0, and the default pte values will be used instead.


Note: The pte_wt member isn't supported by MPCore.

setup
A pointer to a function that performs additional CPU-specific initialization.
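
To make the roles of mmu_cr, mmu_cr_clr, and the pte fallback more concrete, here's a minimal sketch. The structures are simplified, hypothetical stand-ins (they aren't the libstartup definitions), and the order in which the clear and set masks are applied is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the libstartup types. */
struct pte_sketch {
    const char *label;
};

struct chip_sketch {
    unsigned                 mmu_cr;      /* bits to set in the control register   */
    unsigned                 mmu_cr_clr;  /* bits to clear in the control register */
    const struct pte_sketch *pte;         /* default page table configuration      */
    const struct pte_sketch *pte_wa;      /* 0 if write-allocate isn't supported   */
};

/* Compute the control register value to use when the MMU is enabled.
 * Clearing before setting is an assumption made for this sketch. */
unsigned apply_mmu_cr(const struct chip_sketch *chip, unsigned current)
{
    return (current & ~chip->mmu_cr_clr) | chip->mmu_cr;
}

/* Use the write-allocate configuration when it's requested and available;
 * otherwise fall back to the default pte values. */
const struct pte_sketch *select_pte(const struct chip_sketch *chip, int want_wa)
{
    if (want_wa && chip->pte_wa != NULL) {
        return chip->pte_wa;
    }
    return chip->pte;
}
```

The same fallback pattern would apply to pte_wb and pte_wt: a null pointer means "not supported on this CPU", and the default pte values are used instead.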

armv_cache

The armv_cache structure describes the CPU caches. The members include:

dcache_config
Describes the data cache. It's required only when a CPU doesn't implement the CP15 cache-type register.

When a CPU does implement the CP15 cache-type register, set this to 0, so that the startup library will use arm_add_cache() to determine the cache register configuration based on the CP15 cache-type register.

dcache_rtn
The callout used to manage the data cache.
icache_config
Describes the instruction cache. It's required only when a CPU doesn't implement the CP15 cache-type register.

When a CPU does implement the CP15 cache-type register, set this to 0, so that the startup library will use arm_add_cache() to determine the cache register configuration based on the CP15 cache-type register.

icache_rtn
The callout used to manage the instruction cache.

armv_pte

The armv_pte structure describes the MMU page table encodings. Its members include:

upte_ro
User-mode read-only pages.
upte_rw
User-mode read-write pages.
kpte_ro
Kernel-mode read-only pages.
kpte_rw
Kernel-mode read-write pages.
mask_nc
Non-cacheable mappings.
l1_pgtable
L1 descriptor that points to an L2 page table.
kscn_ro
Kernel-mode L1 read-only section mapping.
kscn_rw
Kernel-mode L1 read-write section mapping.
kscn_cb
Cacheable section mapping.

setup()

The setup() function performs any CPU-specific initialization.

For ARMv7, there is a generic function, armv_setup_v7(), that performs generic ARMv7 initialization:

The armv_setup_v7() function must be called by any CPU-specific setup function for an ARMv7 CPU after it has performed its CPU-specific actions.
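
The ordering requirement can be sketched as follows. Because the signature of armv_setup_v7() isn't shown in this technote, the sketch calls a hypothetical stub in its place; my_cpu_setup() and the "quirk" variable are also illustrative names only.

```c
/* Stand-in for the generic routine. The real armv_setup_v7() is part of
 * libstartup; its signature isn't shown here, so this stub is hypothetical. */
static int cpu_quirk_applied;        /* set by the CPU-specific step   */
static int generic_saw_quirk = -1;   /* what the generic step observed */

static void armv_setup_v7_stub(void)
{
    generic_saw_quirk = cpu_quirk_applied;
}

/* Hypothetical CPU-specific setup function for an ARMv7 CPU. */
void my_cpu_setup(void)
{
    /* 1. CPU-specific actions come first (placeholder). */
    cpu_quirk_applied = 1;

    /* 2. Hand over to the generic ARMv7 initialization last. */
    armv_setup_v7_stub();
}
```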

arm_chip_detect()

The arm_chip_detect() function checks for various configurations for Cortex A-8 and Cortex A-9 processors.

If a matching entry has a NULL name and a non-NULL detect function, arm_chip_detect() invokes that CPU-specific detect function, which returns the appropriate armv_chip structure.
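
The lookup can be pictured roughly as in the sketch below. This isn't the libstartup source: the chip_entry type, the chip_lookup() function, and the ID values in the example list are all hypothetical, and the list is passed with an explicit length because the real layout of armv_list[] isn't described here.

```c
#include <stddef.h>

/* Hypothetical, reduced version of an armv_chip entry. */
struct chip_entry {
    unsigned                   cpuid;          /* bits 15:0 of the main ID register  */
    const char                *name;           /* NULL if detect() must be consulted */
    const struct chip_entry *(*detect)(void);  /* CPU-specific detect function       */
};

/* Return the entry matching bits 15:0 of the ID register, or NULL. */
const struct chip_entry *chip_lookup(const struct chip_entry *list, size_t count,
                                     unsigned main_id)
{
    unsigned id = main_id & 0xffffu;

    for (size_t i = 0; i < count; i++) {
        if (list[i].cpuid != id) {
            continue;
        }
        /* A NULL name with a detect function defers to CPU-specific code. */
        if (list[i].name == NULL && list[i].detect != NULL) {
            return list[i].detect();
        }
        return &list[i];
    }
    return NULL;   /* unsupported CPU */
}

/* Example data with made-up ID values, for illustration only. */
static const struct chip_entry variant_b = { 0x2222u, "example variant B", NULL };

static const struct chip_entry *pick_variant(void)
{
    return &variant_b;
}

const struct chip_entry example_list[] = {
    { 0x1111u, "example CPU A", NULL },
    { 0x2222u, NULL, pick_variant },
};
```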

Behavior of procnto-v6 shm_ctl()

The procnto-v6 microkernel removes the 32 MB process address space limit:

The procnto-v6 microkernel

The procnto-v6 microkernel takes advantage of the ARMv7 MMU's physically-tagged cache to remove the 32 MB address space restriction imposed by the previous ARM MMU architecture.


Note:

There is no special global memory region, so the shm_ctl() function no longer has any special ARM-specific behavior when using procnto-v6.


Unlike the non-v6 kernels, the procnto-v6 microkernel doesn't implement the ARM-specific global memory region. With procnto-v6, the shm_ctl() function exhibits the following:

If code must run on both ARMv7 and non-ARMv7 processors, you must check the __cpu_flags value at runtime to select the correct implementation. For example:

if (__cpu_flags & ARM_CPU_FLAG_V7) {
     /*
     * Code for ARMv7 processor
     */
} else if (__cpu_flags & ARM_CPU_FLAG_V6) {
     /*
     * Code for ARMv6 processor
     */
} else {
     /*
     * Code for ARMv4/ARMv5 processor
     */
}

Note:

The kernel behavior is identical for ARMv6 and ARMv7. If you need to address differences in the kernel functionality only, the following check is sufficient:

if (__cpu_flags & (ARM_CPU_FLAG_V6 | ARM_CPU_FLAG_V7)) {
     /*
     * Code to deal with ARMv6/ARMv7 kernel behavior
     */
} else {
     /*
     * Code to deal with ARMv4/ARMv5 kernel behavior
     */
}

Not all ARMv7 processors implement a NEON Media Engine, so code that uses NEON instructions should also check for ARM_CPU_FLAG_NEON in __cpu_flags at runtime.


CPU flags

The runtime features of a processor are represented by the following flags in __cpu_flags:

CPU_FLAG_FPU
A VFP unit is present. The VFP functionality support is enabled when the startup program detects the presence of VFP hardware and sets the system page CPU_FLAG_FPU flag. The CPU_FLAG_FPU flag is set in the cpuinfo_entry->flags field for the CPU.
ARM_CPU_FLAG_NEON
A NEON unit is present.
ARM_CPU_FLAG_WMMX2
An iWMMX2 coprocessor is present.
ARM_CPU_FLAG_V7
The CPU implements ARMv7 architecture.
ARM_CPU_FLAG_V6
The CPU implements ARMv6 (also set when version 7 is set).
ARM_CPU_FLAG_SMP
The target is running multiple processors.
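
As an example of how these flags might be combined, the sketch below picks a floating-point code path from a flags word. The SK_* bit values are placeholders chosen for illustration; real code would test __cpu_flags against the flag definitions supplied by the QNX headers. The choose_fp_path() function and the fp_path enumeration are hypothetical names.

```c
/* Placeholder bit values for illustration only; they are NOT the values
 * defined by the QNX headers. */
#define SK_CPU_FLAG_FPU        (1u << 0)   /* stands in for CPU_FLAG_FPU      */
#define SK_ARM_CPU_FLAG_NEON   (1u << 1)   /* stands in for ARM_CPU_FLAG_NEON */

enum fp_path {
    FP_PATH_SOFT,   /* no floating-point hardware reported */
    FP_PATH_VFP,    /* VFP unit present                    */
    FP_PATH_NEON    /* NEON unit present                   */
};

/* Choose the most capable path that the flags word reports as available. */
enum fp_path choose_fp_path(unsigned flags)
{
    if (flags & SK_ARM_CPU_FLAG_NEON) {
        return FP_PATH_NEON;
    }
    if (flags & SK_CPU_FLAG_FPU) {
        return FP_PATH_VFP;
    }
    return FP_PATH_SOFT;
}
```

Checking the NEON flag first is a design choice of this sketch; an application that only needs scalar floating point could test the FPU flag alone.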

Board startup for SMP

The procnto-smp kernel relies on the startup program to manage the initialization of each CPU and to provide support for communication between processors using interprocessor interrupts (IPIs). This support is divided into two areas:

For every BSP that has SMP support, we include a board_smp.c file. This board-specific startup program is responsible for providing a number of support functions for the generic libstartup code:

In addition to the board-specific startup program file included with the software, you'll need to create a send_ipi callout. The procnto-smp kernel uses this board-specific callout routine to send an interprocessor interrupt (IPI) to a specific CPU.

You must write this callout routine by hand in assembler to ensure that it's position-independent, because the code is copied into the system page.

The kernel IPI protocol uses a bitmask of pending commands for each CPU. Use this send_ipi callout to set the command bit for the target CPU and to perform the board-specific operations required to trigger an IPI interrupt on the target CPU:

void send_ipi(struct syspage_entry *sysp, unsigned cpu, 
              unsigned cmd, unsigned *cmd_bits)
{
    if (atomic_set_value(cmd_bits, cmd) == 0) {
        // Trigger an IPI interrupt on the target CPU
    }
}
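
The protocol can be modeled in portable C for reasoning or host-side testing. The sketch below isn't a usable callout (the real one has to be position-independent assembler); it uses C11 atomic_fetch_or() in place of atomic_set_value(), and trigger_ipi() and ipi_raised are hypothetical stand-ins for the board-specific hardware access.

```c
#include <stdatomic.h>

static unsigned ipi_raised;   /* counts simulated hardware triggers */

/* Hypothetical stand-in for the board-specific register write. */
static void trigger_ipi(unsigned cpu)
{
    (void)cpu;
    ipi_raised++;
}

/* Model of the send_ipi logic: set the command bit, and raise the
 * interrupt only if no command was already pending for the target CPU. */
void send_ipi_model(unsigned cpu, unsigned cmd, atomic_uint *cmd_bits)
{
    /* atomic_fetch_or() returns the value held before the bits were set. */
    if (atomic_fetch_or(cmd_bits, cmd) == 0) {
        trigger_ipi(cpu);
    }
}
```

In this model, a second command posted while the first is still pending only sets another bit; it doesn't raise a second interrupt, since the pending one has yet to be serviced.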

board_smp_num_cpu()

The smp_init() function calls this function to find the number of CPUs physically present in the system.

Example:

unsigned 
board_smp_num_cpu()
{
    unsigned num = 1;   // Placeholder value

    // Replace the placeholder with a board-specific operation that
    // determines the number of CPUs in the system
    return num;
}

board_smp_init()

The smp_init() function calls this function to perform any board-specific SMP initialization in the system page. At a minimum, it must specify the board-specific send_ipi callout:

Example:

void 
board_smp_init(struct smp_entry *smp, unsigned num_cpus)
{
    smp->send_ipi = (void *)&my_send_ipi;
}

board_smp_start()

The start_aps() function calls this function to perform any board- and CPU-specific actions required to start the specified CPU.

Example:

int
board_smp_start(unsigned cpu, void (*start)(void))
{
    return of_smp_start(cpu, start);
}

board_smp_adjust_num()

The start_aps() function calls this function to perform any board- and CPU-specific actions required to adjust the CPU number.

Example:

unsigned 
board_smp_adjust_num(unsigned cpu)
{
    // Board- or CPU-specific actions to set CPU ID to CPU
    return cpu;
}

Using ARMv7 instructions

By default, qcc generates only ARMv4 instructions. This ensures that all compiled code will run on any supported ARM processor.

The ARMv7 architecture introduces a number of new instructions that may provide performance benefits for certain code. For example, DSP algorithms can take advantage of the new media instructions.

For ARMv7, there is an option (i.e. -Vgcc_ntoarmv7le) that makes qcc use ARMv7 instructions and generate VFPv3-d16 code for floating point. This is the default for the ARMle-v7 variant. This means that the ARMle-v7 variant can run only on ARMv7 targets with a VFP unit; it's been compiled for VFPv3-d16, which is the lowest common denominator for VFP. For example, if you use a VFPv3-d32 FPU, your compiled code will use only half the available double-precision registers.

Additional compiler flags are required to instruct the compiler to generate ARMv7 instructions. Using such code on non-ARMv7 processors may cause undefined instruction exceptions (generating a SIGILL signal).

Compiling for the ARMv7 architecture

You can compile for the ARMv7 architecture by using one of the following methods:

Using Makefiles

In a recursive Makefile structure, include le-v7 as a variant name. The ARMle-v7 code is then compiled automatically (the recursive Makefiles set this up for an arm/le.v7 variant).

For more information about Makefiles, see “Conventions for Recursive Makefiles and Directories” in the Neutrino Programmer's Guide.

Using qcc

Invoke qcc directly using the -V option with the ARMv7 target and any required options.


Note:

You must use the correct gcc and binutils versions that implement ARMv7:

  • gcc 4.4 or later
  • binutils 2.19 or later

-V [[compiler/]version,][target]
The compiler name, version number, and the target name. If you don't specify -V at all, qcc will use the default compiler, version, and target.

If you specify the target, qcc looks for the target configuration files in the following paths, according to how compiler, version, and target are specified:

If you specify:              qcc looks here:
-Vtarget                     ${QCC_CONF_PATH}/compiler/version, where compiler is inferred from target, and version is the default.
-Vversion,target             ${QCC_CONF_PATH}/compiler/version, where compiler is inferred from target; if that path doesn't exist, or if version contains a slash (/), then ${QCC_CONF_PATH}/version is used.
-Vcompiler,target            ${QCC_CONF_PATH}/compiler/version, where version is the specified compiler's default version.
-Vcompiler/version,target    ${QCC_CONF_PATH}/compiler/version

For example, to list all targets in all versions of gcc, use this command:

qcc -Vgcc,

Note:

The target type for ARMv7 is gcc_ntoarmv7le:

qcc -Vgcc_ntoarmv7le

For ARMv7, the following C++ libraries are supported in $HOME/QNX650/host/win32/x86/etc/qcc/gcc/4.4.2:

For C++ library information, see the Dinkum C++ Library. For GNU library information, see GNU C++ Library.

Using command-line compile options

To compile for an ARMle-v7 distribution and to use VFP instructions (if you need to run on any VFPv3 implementation), use the option:

-mfpu=vfpv3-d16

or, if you know that your target has a VFPv3-d32 implementation, use this option:

-mfpu=vfpv3

Note: Currently, all ARMv7 processors implement a VFP floating point unit; however, some implementations support only 16 double-precision registers instead of the full 32 double-precision registers that are allowed by the ARMv7 architecture.

To compile for the ARMv7 processor, use this option:

-march=armv7

To compile to use the ARMv7 instructions, use this option:

-march=armv7-a

To compile code to make use of NEON instructions, use this option:

-mfpu=neon

Note: Although we indicate the option to generate NEON instructions, support for ARMv7 Cortex A-9 with NEON is available on an “as is” basis.

This option tells the compiler to generate NEON instructions.


Note:

Cortex only supports either the NEON unit or the VFPv3 unit but not both.

Not all ARMv7 processors implement a NEON Media Engine. If a NEON media engine is implemented, the processor will also support VFP instructions.


The standard QNX Neutrino libraries are compiled to use a soft-float implementation for floating-point operations to ensure that the code can run on all supported ARM processors. The soft-float implementation passes floating-point arguments and results in ARM registers or on the stack. Code that uses VFP instructions for floating point must use the same argument and result mechanism to ensure that it can interoperate correctly with code compiled for soft-float.


Note:

In the $QNX_TARGET/armle distribution, all binaries are compiled to use soft-float because they're intended to run on all ARM targets (many of which don't implement any floating point hardware).

If you explicitly compile any code to use NEON or VFP instructions, you must use the soft-float ABI in order to pass floating point parameters and results in ARM registers. Using soft-float ABI ensures binary compatibility and interoperability with object files and shared objects compiled with soft-float. To use soft-float ABI, use the option:

-mfloat-abi=softfp
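
One way to confirm which of these options took effect is to test the macros that GCC predefines for ARM targets. The sketch below is an illustration under that assumption: arm_build_summary() is a hypothetical helper, and the macro names (__ARM_ARCH_7A__, __ARM_NEON__, __SOFTFP__) are the ones GCC is generally understood to define, so check them against your compiler version.

```c
/* Report how this translation unit was compiled, based on macros that
 * GCC predefines for ARM targets (assumed; verify for your toolchain). */
const char *arm_build_summary(void)
{
#if !defined(__ARM_ARCH_7A__)
    return "not built with -march=armv7-a";
#elif defined(__ARM_NEON__)
    return "ARMv7-A, NEON instructions enabled";
#elif defined(__SOFTFP__)
    return "ARMv7-A, software floating point only";
#else
    return "ARMv7-A, VFP instructions enabled";
#endif
}
```

Note that these macros describe only how the code was compiled. Whether the target actually has the hardware is still a runtime question, answered by __cpu_flags.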

To tell the compiler to recognize specific NEON opcodes, you'll need to specify the hardware floating-point support type from the command line.

To optimize your ARMv7 operations, you'll need to do the following:


Caution: The object files, libraries, and binaries that are compiled to use ARMv7 instructions can run only on a target with an ARMv7 CPU. On a non-ARMv7 CPU, this causes an undefined instruction exception (SIGILL signal) or may result in unpredictable behavior.

Generating hardware floating point instructions

To have the compiler generate hardware floating-point instructions while keeping the “soft-float” linkage (calling convention), use the following option:

-mfloat-abi=softfp

Optimizing source code

If you tune code generation for a specific Cortex processor, the compiler produces code that's optimized for that processor and runs more efficiently on it.

To tune for a specific CPU, use the following syntax:

-mtune=cpu_type

Examples:

-mtune=cortex-a9
-mtune=cortex-a8

Vector Floating Point (VFP) math library

A VFP-enabled math library, called libm-vfp.so, is provided with the ARMle distribution and can be used on targets that implement VFP hardware in two ways:

For the $QNX_TARGET/armle distribution, a version of libm compiled to use VFP is available on targets known to implement VFP hardware. This library is called libm-vfp.so, and it uses VFPv2 instructions (with 16 double-precision registers) to allow it to be used on both ARMv6 and ARMv7 targets that implement a VFP unit.

For the $QNX_TARGET/armle-v7 distribution, all code is compiled to use VFP for floating point, and no other actions are required.

BSP configuration for VFP

The startup program is responsible for detecting the presence of VFP hardware:

Processor    Detection is performed by
ARMv6        armv_setup_v6() in libstartup.a
ARMv7        armv_setup_v7() in libstartup.a
Other        Board-specific code in your board startup directory. For more information about startup routines, see Board startup for SMP.

For board-specific code for a non-ARMv6 or non-ARMv7 processor that implements a VFP unit, your code must ensure that: