ARM Cortex M4 Power Management Tutorial.
-Aviral Mittal avimit att yahu datt cam.
Connect @ https://www.linkedin.com/in/avimit/

1.1    ARM Cortex M4 Power Management

1.1.1   ARM Cortex M4 Sleep Modes

 

The Cortex-M4 power management methodology supports two sleep modes:

1.     Sleep

2.     Deep Sleep.

From the point of view of the M4F processor logic, there is little difference between these two sleep modes. The difference lies in how the modes are used and implemented in the surrounding system. For example, ‘Sleep’ may be implemented so that only the clocks to the processor and its related logic are stopped, while ‘Deep Sleep’ may be implemented so that both the clocks and the power to the processor can be switched off. Note that if a WIC is implemented, its functionality is not available in ‘Sleep’ mode on the M4F processor; the WIC’s features are available only in ‘Deep Sleep’ mode. Deep Sleep in turn can be implemented with or without state retention.

1.1.2   How to put M4F processor in Sleep/Deep Sleep mode.

The M4F processor has a register called the SCR (System Control Register). Its SLEEPDEEP bit selects between the ‘Sleep’ and ‘Deep Sleep’ modes. The subsystem software writes this bit to indicate which sleep mode is required the next time the processor goes to sleep. The processor enters the selected sleep mode by executing either the WFI (Wait For Interrupt) or the WFE (Wait For Event) instruction. Soon after executing one of these instructions, the processor asserts a hardware output signal to let the outside world know that it is OK to stop the clock and/or remove power from the processor.
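As a rough illustration of the sequence above, here is a minimal C sketch. It assumes the ARMv7-M SCR layout (SLEEPDEEP is bit 2, and the register lives at 0xE000ED10). The helper names scr_select_sleep_mode and enter_sleep are made up for this example, and the 'wait' callback stands in for whatever executes WFI or WFE on the target (for instance a wrapper around the CMSIS __WFI() intrinsic). The register is passed in as a pointer so the same code can be tried on a host PC against an ordinary variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the System Control Register (SCR, 0xE000ED10 on ARMv7-M). */
#define SCR_SLEEPONEXIT (1u << 1)
#define SCR_SLEEPDEEP   (1u << 2)

/* Return the SCR value that selects Sleep (deep == false) or Deep Sleep. */
uint32_t scr_select_sleep_mode(uint32_t scr, bool deep)
{
    return deep ? (scr | SCR_SLEEPDEEP) : (scr & ~SCR_SLEEPDEEP);
}

/*
 * Hypothetical helper: select the mode, then go to sleep.
 * 'scr' points at the SCR (or at a plain variable when testing on a host).
 * 'wait' is an assumption: any function that executes WFI or WFE on the target.
 */
void enter_sleep(volatile uint32_t *scr, bool deep, void (*wait)(void))
{
    assert(scr != NULL && wait != NULL);
    *scr = scr_select_sleep_mode(*scr, deep);
    wait();
}
```

Only the SLEEPDEEP bit is touched; the other SCR bits (such as SLEEPONEXIT) keep whatever value the software had already programmed.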

1.1.3   M4F State retention in deep-sleep.

The M4F processor may be implemented using the SRPG methodology. SRPG stands for State Retention Power Gating. It allows the internal ‘state’ of the processor to be ‘retained’ during low-power modes, so that when power is re-applied the processor resumes from where it was before entering the sleep mode. This is an alternative to copying the processor ‘state’ to RAM and then copying it back from RAM on power-up. It greatly reduces the wake-up time from the deep-sleep state; however, because it relies on ‘state retention’ flops that stay powered, it still consumes some power while in deep sleep. Usually all the sequential elements inside the processor are retained, and the state retention flops are powered from a separate power rail. An SRPG implementation requires specific control over how reset is applied, because the processor reset must not be applied while coming out of a deep-sleep mode that uses state retention.

The use of this methodology is also technology dependent, as the technology library must support ‘retention’ flops or something equivalent.

1.1.4   M4F WIC

The WIC, or Wake-up Interrupt Controller, is an optional component of the M4F processor. It is used to wake the processor from the deep-sleep power state when a valid interrupt arrives at its boundary. On entry to deep sleep, some data, e.g. interrupt masks and priorities, is copied automatically from the processor NVIC to the WIC. There are also a couple of handshake signals which must be activated to agree that the next deep sleep the processor enters will be a WIC-enabled one.

WIC is a configuration option and can be used/omitted as required.

The WIC has no programmable registers. It has nothing to do with software; it is completely transparent to software.

In principle the WIC can also be implemented without a clock, in which case it asks the power controller to make clock/power available when a valid interrupt is asserted towards it. However, this is not usually done, as it requires changes to the generated RTL, and the implementation has some limitations and side effects which must be taken care of.

1.1.5   Summary of M4F Power Options

Based upon the capabilities and the option(s) available for the M4F processor, the following is a summary of the power management modes that may be implemented:

1.     Light Sleep Mode: Where only clock(s) i.e. FCLK and HCLK to the processor are stopped.

2.     Deep Sleep Mode: Where the power domain containing the M4F processor may be switched off. This mode can be further categorized into two kinds

a.     Deep Sleep with State Retention: It allows fast wake-up at the expense of increased leakage. Following a wake-up, the processor is in the same state it was in before going into Deep Sleep.

b.     Deep Sleep without State Retention: It allows the most power savings at the expense of longer wake-up times. Before going into Deep Sleep, the processor state has to be stored in SRAM. Following wake-up, the processor has to be re-initialized from scratch and its state restored before it is back in the state it was in before going into Deep Sleep.







Some Random Info

FCLK vs HCLK
ARM describes FCLK as the 'free-running clock'. This clock also drives the WIC, so if you are using the WIC, FCLK must not be turned off. Since the WIC functionality is only available in Deep-Sleep on the Cortex-M4, this means that if you want to power gate the processor for maximum power savings during deep-sleep mode, you have to make sure that the WIC is implemented outside the power domain that contains the rest of the M4. The WIC will have a separate FCLK branch going into it. During Deep-Sleep, the FCLK branch going into the WIC keeps running, while the FCLK branch going into the rest of the M4F can be gated.
HCLK is the clock which is gated when the M4F goes into Sleep mode. The processor asserts a signal called 'SLEEPING', which can be used by hardware to gate HCLK. 'SLEEPING' in turn is asserted after the processor executes the WFI instruction.

Now if you are planning to use the SRPG, then when coming out of power collapsed state, you must NOT issue a reset. That means, control of the reset must be separated from control of the power domain collapse/wakeup function.
 

I was wondering what is the purpose of the signals
WICENREQ, WICENACK.

These signals merely agree on the sleep 'mode'.
A handshake on the above signals makes sure that any sleep which follows the handshake
will be a 'WIC' mode sleep, i.e. the WIC will be loaded automatically when the WFI instruction is executed.

The WICENREQ is input signal to the M4F Integration level RTL,
WICENACK is output signal from M4F Integration level RTL.

It is possible to tie WICENREQ to '1' permanently to indicate that all sleeps are WIC mode sleeps.
It is also possible to tie WICENREQ to '0', if WIC is not configured/implemented.





ARM Cortex M Boot Process:
The way the Cortex-M4 boots, and maybe all Cortex-Ms boot, is a bit unconventional:
1. The processor fetches the word at 0x0000_0000 and loads this 32-bit value into its stack pointer.
2. The processor then fetches the word at 0x0000_0004, treats it as a jump address, and starts executing from that address. For example:
0x0000_0000 : 0x0100_0000
0x0000_0004 : 0x0000_0201

The processor will load the value 0x0100_0000 into r13 (the stack pointer).
The processor will jump to 0x0000_0200 (not 0x0000_0201) and start execution from there. Bit 0 of every vector entry (though not of the initial stack pointer value) must be '1' to indicate Thumb code, which is the only instruction set the Cortex-M processors execute.

Hence the value at address 0x0000_0004 is also called the reset vector.
This means that any time the processor is reset, it performs the above two steps.
The meaning of the subsequent words in memory, i.e. the values at 0x0000_0008, 0x0000_000C and so on, is also pre-defined: they are the jump addresses for the different types of interrupts/exceptions (NMI at 0x0000_0008, HardFault at 0x0000_000C, and so on).
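The two reset steps can be modelled in a few lines of C. This is only a host-side sketch of what the hardware does on its own; the struct and function names are invented for the illustration, and the table contents are the example values used above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the core derives from the first two vector table words at reset. */
struct reset_state {
    uint32_t initial_sp;  /* word at offset 0x0, loaded into r13 */
    uint32_t entry_pc;    /* word at offset 0x4 with bit 0 cleared */
    bool     thumb;       /* bit 0 of the reset vector, must be 1 */
};

/* Hypothetical model of the reset fetch sequence. */
struct reset_state decode_reset_vectors(const uint32_t *vector_table)
{
    assert(vector_table != NULL);
    struct reset_state s;
    s.initial_sp = vector_table[0];
    s.entry_pc   = vector_table[1] & ~UINT32_C(1);
    s.thumb      = (vector_table[1] & 1u) != 0;
    return s;
}
```

With the example table { 0x0100_0000, 0x0000_0201 } this gives a stack pointer of 0x0100_0000, an entry address of 0x0000_0200, and the Thumb flag set.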




Cortex-M4 Resets Info:
If a power-on reset is present, it resets both the system and the debug system. The reason for separating the reset into two signals (the power-on reset PORESETn and the system reset SYSRESETn) is to allow the processor to be reset without affecting the debug system. Otherwise, debug settings like breakpoints and watchpoints, and the debug connection from the debugger to the core, would be lost each time the processor core is reset.

Cortex-M4 GATEHCLK:

GATEHCLK : This is asserted when the processor is in sleep-mode, and there is no debug connection. This signal can be used to gate off the system clock.


STKALIGNINIT:
This input can be tied off to 1 or 0. It controls 'stack pointer' alignment.
When something is pushed on the stack, the stack pointer can end up at an address which is aligned to a 32-bit (4-byte) boundary but not to a 64-bit (8-byte) boundary. This input, tied to a constant, selects between 32-bit and 64-bit alignment of the stack pointer at every exception entry. It only sets the reset value; the choice can be changed later through a register bit (STKALIGN in the Configuration and Control Register).
Usually it's good to enable 64-bit alignment.
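To make the effect of the alignment choice concrete, here is a small C model of where the exception frame lands. It is a sketch under stated assumptions: it follows the ARMv7-M exception entry behaviour for the basic 8-word frame (no floating point context), and the names exc_frame and exception_entry_frame are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* r0-r3, r12, lr, pc, xPSR: 8 words, assuming no FP context is stacked. */
#define BASIC_FRAME_BYTES 0x20u

struct exc_frame {
    uint32_t frame_ptr;  /* SP after the frame has been pushed */
    bool     padded;     /* recorded in bit 9 of the stacked xPSR */
};

/* 'stkalign' models the alignment register bit (reset value from STKALIGNINIT). */
struct exc_frame exception_entry_frame(uint32_t sp, bool stkalign)
{
    assert((sp & 3u) == 0);  /* SP is always at least word aligned */
    struct exc_frame f;
    f.padded    = stkalign && (sp & 4u) != 0;
    f.frame_ptr = sp - BASIC_FRAME_BYTES;
    if (stkalign) {
        f.frame_ptr &= ~UINT32_C(4);  /* force bit 2 to zero: 8-byte aligned */
    }
    return f;
}
```

For example, with SP at 0x2000_1004 the frame starts at 0x2000_0FE4 with 4-byte alignment, but at 0x2000_0FE0 (one padding word) with 8-byte alignment.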


Cortex-M4 System Bus:
For Cortex-M3 and Cortex-M4 processors, the internal bus interconnect has a registering stage between the instruction fetch interface and the system bus. Therefore, the performance of the system is reduced if the software image is executed from the system bus.


Cortex-M4 : Code & System Memory Aliasing:
If needed, it is possible to have an SRAM shared between the code and SRAM regions by having bus accesses from both the code and system buses (i.e., memory address aliasing). This allows the software to use a single SRAM block and execute code from SRAM without performance loss.
 


Vector Table Offset Register:
Cortex-M0+, Cortex-M3 and Cortex-M4 processors: by default the vector table is located at the start of the memory map (address 0x0).
In Cortex-M7, Cortex-M23 and Cortex-M33 processors: the default value for VTOR is defined by chip designers. Cortex-M23 and Cortex-M33 processors can have two separated vector tables for Secure and Non-secure exceptions/interrupts.
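Wherever the table is placed, a handler is found by indexing from the VTOR value with the exception number. The C sketch below shows the arithmetic. It assumes the Cortex-M3/M4 VTOR layout, where the table offset field occupies bits [31:7], and that external interrupt 0 is exception number 16; the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define VTOR_TBLOFF_MASK    0xFFFFFF80u  /* TBLOFF field, bits [31:7] (assumed M3/M4 layout) */
#define FIRST_IRQ_EXCEPTION 16u          /* exceptions 0-15 are system exceptions */

/* Address of the vector table word for a given exception number. */
uint32_t vector_slot_address(uint32_t vtor, uint32_t exception_number)
{
    assert(exception_number < 16u + 240u);  /* at most 240 external interrupts */
    return (vtor & VTOR_TBLOFF_MASK) + 4u * exception_number;
}

/* Address of the vector table word for external interrupt 'irq_number'. */
uint32_t irq_slot_address(uint32_t vtor, uint32_t irq_number)
{
    return vector_slot_address(vtor, FIRST_IRQ_EXCEPTION + irq_number);
}
```

With VTOR at its default of 0x0, exception 1 (reset) maps to 0x0000_0004, which matches the reset vector described in the boot process notes. With the table relocated to 0x2000_0000, IRQ 3 maps to 0x2000_004C.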

TCM vs AHB based SRAM:
No performance difference as such. However, TCM may reduce some integration complexity. TCM limits the address ranges and the size of the memory, whereas AHB-based SRAMs give more flexibility.
In some processor designs, TCM is required to allow deterministic interrupt response. For example, in the Cortex-M7 processor, accesses to memories on the AXI bus system can have non-deterministic timing due to cache hit/miss scenarios. Having TCM enables interrupt services to be carried out quickly and deterministically. But in small processors like Cortex-M0 to Cortex-M33, the omission of a TCM feature is not a real issue.

TnD: Trace & Debug
Reset – the debugger can request a reset of the target board, typically a system reset through the SYSRESETREQ feature.


Bare-Metal M4 power-on reset boot:
You have an M4 on an SoC, the M4 is connected to SRAM, and the SRAM does not have any code in it. How can you then bring it up?
Usually the design of the SoC supports some method for a debugger to be attached to the SoC, and that debugger then has control over the processor's SYSRESETn as well as PORESETn.
At power-up, SYSRESETn and PORESETn are both asserted.
HCLK and FCLK are then both started, via the debugger gaining access to the clock control.
DAPCLK is then started, and DAPRESETn is then de-asserted.
PORESETn is then de-asserted.
This enables a path from the debugger to the SRAMs via the Cortex-M4 processor's bus matrix, which is half in reset and half awake:
half in reset because SYSRESETn is still asserted,
half out of reset because PORESETn is de-asserted.
The path to the SRAMs will be something like this:
SWD/JTAG -> DAPBUS -> AHB-AP -> Cortex-M4 Bus Matrix -> Subsystem Bus Matrix -> SRAM.
You then download the code to SRAM.
The debugger then de-asserts SYSRESETn.
The processor starts to run.

Newer processors support a CPUWAIT signal. In multi-processor SoCs, a processor may run from RAM, and the RAM may be empty at the start. When reset to the processor is de-asserted, CPUWAIT is still asserted, so that the processor does not run. The code can now be copied to the processor's SRAM. When the code copy is complete, CPUWAIT is de-asserted, and the processor starts to run.

However, the above method is suitable only when the M4 subsystem has been designed in such a way that it is possible.
There is another method as suggested by ARM:
One of the common questions from new Cortex-M designers is: How can you bring up a microcontroller device first time without any valid program in the embedded flash? The actual sequence is no different from normal flash programming:
1. When the device starts up for the first time, since the flash does not contain a valid program image, it will quickly enter a fault exception and eventually go into LOCKUP state.
2. Even if the device is in LOCKUP state, the debugger can still establish a debug connection via JTAG/Serial Wire.
3. The debugger can then enable a reset vector catch (a debug feature in the Cortex-M processors), and use System Reset Request (by programming the Application Interrupt and Reset Control Register, AIRCR) to reset the system. When the processor comes out of system reset, it enters halt state immediately because the reset vector catch is enabled.
4. The debugger can download the flash programming algorithm and pages of the program image into SRAM and set the PC (program counter) to launch the flash programming algorithm.
5. When all the required flash pages are programmed, it can reset the system again to start the application or to debug it.
The same concept can also be applied to devices that run code from external flash (e.g., QSPI flash).
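The System Reset Request step in the sequence above boils down to a single write to AIRCR, whether it is done by the debugger through the AHB-AP or by software running on the core. The sketch below builds that write value. It assumes the ARMv7-M AIRCR layout (write key 0x05FA in bits [31:16], which reads back as 0xFA05; PRIGROUP in bits [10:8]; SYSRESETREQ in bit 2), and the function name is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define AIRCR_VECTKEY        (0x05FAu << 16)  /* key required on every write */
#define AIRCR_VECTKEYSTAT    0xFA05u          /* what the key field reads back as */
#define AIRCR_PRIGROUP_MASK  (7u << 8)
#define AIRCR_SYSRESETREQ    (1u << 2)

/* Build the AIRCR write value that requests a system reset,
 * keeping the priority grouping that is currently programmed. */
uint32_t aircr_sysreset_value(uint32_t aircr_readback)
{
    /* Sanity check that the argument really is an AIRCR read. */
    assert((aircr_readback >> 16) == AIRCR_VECTKEYSTAT);
    return AIRCR_VECTKEY
         | (aircr_readback & AIRCR_PRIGROUP_MASK)
         | AIRCR_SYSRESETREQ;
}
```

On a real target the result would be written to the AIRCR at 0xE000ED0C; a write without the key in the upper half-word is ignored by the processor.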

What is DAP:
DAP stands for Debug Access Port. It's actually a collection of a few components:
The Serial Wire Debug Port -> Debug Bus
Debug Bus -> AHB Access Port, APB Access Port or AXI Access Port.
The whole of the above infrastructure, I guess, is collectively termed the DAP.

Debug Authentication:
In systems that need debug authentication support, the CoreSight debug authentication signals are connected to a debug authentication control unit (not a part of the Cortex-M processor) that authenticates debug connections. The authentication process is typically based on the product’s life cycle state and the user’s input, such as a debug certificate or password. Based on guidelines from the Platform Security Architecture (PSA), certificate-based debug authentication is generally preferred over password-based authentication for products that can contain sensitive information.

Other Debug thingies:
Power down is normally disabled when a debugger is attached. This is because debuggers require access to the processor even when the processor core is in sleep modes. In many such cases, the power down FSM is automatically disabled by the WIC interface inside the processor. As a result, testing of deep sleep can show a different set of behaviors and interrupt latency when a debugger is connected.

Detect an attached debugger? How to check if a debugger is attached on ARM Cortex-M processors?
C_DEBUGEN (bit 0 of the Debug Halting Control and Status Register, DHCSR) can only be written by (the) AHB-AP and not by the core. So if this bit is '1', the debugger is attached.
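A minimal C sketch of this check follows. It assumes the ARMv7-M debug register map, where the DHCSR sits at 0xE000EDF0 and C_DEBUGEN is bit 0. The function name is invented for the example, and the register is passed as a pointer so that the check can be exercised on a host against a plain variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DHCSR_ADDRESS    0xE000EDF0u  /* Debug Halting Control and Status Register */
#define DHCSR_C_DEBUGEN  (1u << 0)

/* Returns true when C_DEBUGEN is set in the DHCSR value that 'dhcsr' points at. */
bool debugger_attached(const volatile uint32_t *dhcsr)
{
    assert(dhcsr != NULL);
    return (*dhcsr & DHCSR_C_DEBUGEN) != 0;
}
```

On a target, the call would look something like debugger_attached((const volatile uint32_t *)DHCSR_ADDRESS). With the CMSIS headers, the same bit should be reachable as CoreDebug->DHCSR masked with CoreDebug_DHCSR_C_DEBUGEN_Msk.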

Debug Power Management: the debug interface modules provide handshaking signals to indicate whether there is a debugger connection, which allows system designers to implement power management for the debug system of the processors if needed. For example, in Cortex-M0, Cortex-M0+, Cortex-M7, Cortex-M23, Cortex-M33, and Cortex-M35P processors, there is a separate debug power domain that can be powered down if there is no debug connection.

In most cases, the debug interface pins (JTAG or Serial Wire Debug) need to be accessible at the device’s top-level by default. For Cortex-M3, Cortex-M4 and Cortex-M33 processors, the debug interface module supports dynamic protocol switching, so it is possible to expose just two pins of the SWD debug by default. If there is a need to switch over to JTAG, then you can program a device-specific pin multiplexer (mux) control register to expose the other pins for JTAG, and then apply a switchover sequence to start JTAG operations.


More Debug Requirements/Ideas
Debug access port (DAP) – You might optionally move the SWJ-DP to the always-on power domain to allow the debugger to wake up the system with a debug connection. An alternative solution is to use another hardware mechanism to wake up the system so that the debugger can connect to the processor to start the debug session. I would prefer the latter, as putting the DAP in the AON domain will just burn power in mission mode.

Potentially some peripherals like watchdog timers might need to suspend their operations when the processor is halted. Otherwise, a reset could be triggered unexpectedly during debugging. Some timers (e.g., SysTick timers inside the Cortex-M processors) also stop counting automatically when the processor is halted to allow single-stepping of application code. 

Interrupts:
The allocation of interrupt signals affects the C header files for software development, including the vector table definitions and interrupt numbers, which are both visible to the software.

On all current Cortex-M processors, the interrupt signals:

Possible NMI uses:
In common embedded systems the NMI could be connected to:

Faults in normal interrupt handlers allow the Hard Fault handler (or other configurable fault handlers) to be triggered and executed.
A fault generated within the NMI handler can cause the processor to enter lockup state.

If a peripheral generates an interrupt request in the form of a level signal, the interrupt handler must clear the request at the peripheral.
The key advantage of a pulsed interrupt is that it saves a few clock cycles in the ISR, because there is no need to clear the interrupt request at the peripheral.

However, in many cases, a level-triggered interrupt is preferred because:

Event Interface:
RXEV, TXEV
The event interface is typically used in multi-core systems to allow one processor to wake up another during spinlocks. In RTOS semaphores, if a processor is waiting for a spinlock, it can enter sleep mode using WFE to save power and wake up if there is an interrupt to serve or if there is an event from another processor.
For single-processor systems, it is fine to tie RXEV to 0 and leave TXEV unconnected.

Resets:

All of the Cortex-M processors use an asynchronous active-low reset signal, which must be de-asserted synchronously to the system and debug clocks to prevent timing violations.
Most of the Cortex-M processors require the reset to last at least two clock cycles.

SYSRESETREQ o/p
This is controlled by a register bit in the Application Interrupt and Reset Control Register (AIRCR) inside System Control Space.
This allows:

Designers must make sure that:

We can also design the reset generator so that it can optionally reset the system if it enters lock-up state. To make this behavior controllable, a programmable register would be needed in your FPGA/system design to specify whether a lock-up state can cause a reset. This register is not provided in the Cortex-M processor core, as such a requirement is application dependent. During software development, the control bit in this external reset control register can be set to 0 to disable the automatic reset. In a production system, the reset control register can be set to 1 so that when the system enters lockup state, SYSRESETn is activated automatically.
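The decision logic of such a reset generator can be modelled in C as below. This is only a behavioural sketch: a real reset generator would also synchronize and stretch the reset, which is not shown. The struct, its field names and the function name are all hypothetical; the source only tells us that SYSRESETREQ and the lock-up status leave the processor and that SYSRESETn is active low.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs of the reset generator. */
struct reset_inputs {
    bool poreset_n;            /* power-on reset, active low */
    bool sysresetreq;          /* SYSRESETREQ output of the processor */
    bool lockup;               /* processor is in lock-up state */
    bool lockup_reset_enable;  /* bit of the external reset control register */
};

/* Returns the level to drive on SYSRESETn (false means reset is asserted). */
bool sysreset_n_level(const struct reset_inputs *in)
{
    assert(in != NULL);
    bool reset_request = !in->poreset_n
                      || in->sysresetreq
                      || (in->lockup && in->lockup_reset_enable);
    return !reset_request;
}
```

With lockup_reset_enable at 0 (software development), a lock-up leaves SYSRESETn de-asserted so the debugger can inspect the fault; with it at 1 (production), the same lock-up drives SYSRESETn low.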


Debug:
Interface for debug connection (JTAG or Serial Wire Debug) – for connecting a debugger to the hardware target to carry out halting, stepping, restart, resume, setting breakpoints/watchpoints, and access to memories and peripherals. The debug connection is also used for downloading code and flash programming.

Debugger authentication:
By default, the JTAG instruction register is locked and a debugger has no access to the debug infrastructure. To initiate debugging, a debugger must first be authenticated, and its tag ID is returned if successful. More on how the debugger is authenticated can be found in the SoC Security article.

Secure Asset Tagging:
The SoC integrator tags each asset with an ID of the asset owner. The ID itself has no confidentiality requirements; it is simply used to indicate to which debugger(s) the asset can be exposed via debug traces.

Asset filtering:
During debug, if an asset is being traced, its tag is compared to the ID of the authenticated debugger. If the comparison matches, the asset is traced; otherwise, it is obfuscated.

How to secure SoC assets? BTW, SoC assets are nothing but the software on the SoC; for example, an asset can be an MP3 decoder code or some other algorithm.
The easiest way is to permanently disable the debug infrastructure after silicon validation (i.e., by blowing fuses). However, this cannot be done where there is a need to have debug features throughout the product lifecycle.

In ARM-based SoCs, the debug infrastructure provides two debug modes: secure and nonsecure [ARM 2013]. If authenticated in secure mode, the debugger can trace instructions of secure and privileged software (potential assets in our case). Otherwise, only user-level software can be traced. The drawback here is the following:
If authenticated in secure mode, this debugger can trace its own secure software as well as all other secure software. If authenticated in nonsecure mode, this debugger cannot analyze its own code. So in the systems where the SoC assets (i.e. software) are provided by multiple parties, this approach fails.


The ICODE & DCODE Buses in Cortex M4
Some of the Cortex-M processors have separate ICODE and DCODE buses. These are 2 AHB buses from the processor which in certain circumstances can boost performance of the system by exploiting parallel operation of both the buses. Some say that these support Harvard Architecture, where the Instructions are separated from data. However, ARM cortex M3 and M4 do not have true-Harvard architecture. They only have what is called "Harvard Bus Architecture", this means, the code and data buses are 2 different buses, and code & data accesses can happen in parallel, but the code & data memory space is unified.
Moreover the type of data the DCODE bus will fetch will be the 'literals' embedded in the program.

The Cortex-M processors which support separate ICODE and DCODE buses provide an option to merge the operation of these as if they were a single bus, if so desired.
The input pin DNOTITRANS when tied to '1', will effectively merge the two buses, so that they would no longer operate in parallel. This means that they will no longer generate parallel accesses. However there will be still 2 physical buses on the processor boundary.

However, if the parallelism of these buses is to be exploited:
1. DNOTITRANS input to the CORTEX-M must be tied to '0'.
2. The design would also make sure that there is a parallel path from these 2 buses to 2 different memory instances or memory banks which can be accessed in parallel.
3. The compiler must be directed to keep the 'literals' in separate memory instance/bank so that the execution would be able to do parallel access for ICODE and DCODE.

Literals:
Literals are the constant data which are embedded into code. For example, for a for loop, the maximum loop count will be a literal.
Other examples of literals could be string values. In the famous "Hello World" example, the string "Hello World" is a literal. These literals must be stored somewhere as data.




If we don't have a 'literal' cache attached to the DCODE bus, then I think these can be unified.

For example, in the Cortex-M1 processor, there are two TCM interfaces: the ITCM interface is primarily for instruction memory (including literal data access inside a program), and the DTCM is primarily for data transfers.

The Cortex-M35P processor supports an optional built-in program cache (sometimes referred to as instruction cache but technically it is a unified cache that can cache both instruction and read-only data).

Debug Certificate Authentication Flow - Directly from ARM.

The high-level description of the debug authentication flow involves the following steps:

  1. Establishing link from external agent to the security enclave

This step essentially ensures correct provisioning of the CoreSight SDC-600 for transmitting protocol messages. This includes power-up of the internal block if unpowered, and performing a protocol discovery check.

  2. SoC ID discovery

Debugger requests the SoC ID from the Secure CPU. The relevant SoC ID is derived through CryptoCell and is delivered back to external debugger.

  3. Introducing debug certificate

Debugger requests the secure CPU to authenticate its debug certificate which includes the requested DCU values for this debug session. The debug certificate must be based on the provided SoC ID and must include Root of Trust (ROT) permissions to debug this specific platform.

  4. Debug access authentication

Secure CPU with the help of CryptoCell verifies the debug certificate and applies corresponding settings to the DCU before responding to debugger’s Introduce Debug Certificate command with a message which includes the current values of the DCU, indicating what capabilities are now available for use.