The M4F power management methodology supports 2 sleep modes: 'Sleep'
and 'Deep Sleep'. From the M4F processor logic point of view, there is
little difference between these two sleep modes. What can make a
difference is the way they are utilized and/or implemented. For
example, the first, i.e. the 'Sleep' mode, may be implemented such
that only the clocks to the processor and any related logic are
stopped, while 'Deep Sleep' may be implemented such that both the
clocks and the power to the processor can be switched off. Note that
if a WIC is implemented, its functionality is not available in 'Sleep'
mode on the M4F processor, i.e. the WIC's functionality and features
are only available in 'Deep Sleep' mode. Deep Sleep in turn can be
implemented with or without state retention.
The M4F processor has a register called SCR, the 'System Control
Register'. This register has a bit (SLEEPDEEP) which selects between
the 'Sleep' and 'Deep Sleep' modes. The subsystem software writes to
this bit to indicate which sleep mode is required the next time the
processor enters sleep. The processor enters sleep by executing a WFI
(Wait For Interrupt) or WFE (Wait For Event) instruction. Soon after
executing one of these instructions, the processor asserts a hardware
output signal (SLEEPING, plus SLEEPDEEP for deep sleep) to let the
outside world know that it is OK to stop the clock and/or power to the
processor.
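A minimal host-side sketch of the software side of this, assuming the ARMv7-M SCR layout (SCR at 0xE000ED10, SLEEPONEXIT = bit 1, SLEEPDEEP = bit 2). The register is modelled as a plain variable (`scr_model`, a hypothetical name) so the sketch runs off-target; on a real part you would use CMSIS `SCB->SCR` followed by `__DSB(); __WFI();`.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M System Control Register (SCR, 0xE000ED10) bit positions. */
#define SCR_SLEEPONEXIT (1u << 1)
#define SCR_SLEEPDEEP   (1u << 2)

typedef enum { MODE_SLEEP, MODE_DEEP_SLEEP } sleep_mode_t;

/* Host-side stand-in for the memory-mapped SCR (hypothetical); on target
 * this would be SCB->SCR from the CMSIS device header. */
static volatile uint32_t scr_model;

/* Return the SCR value that selects `mode` for the next WFI/WFE,
 * leaving the other SCR bits (e.g. SLEEPONEXIT) untouched. */
uint32_t scr_for_mode(uint32_t scr, sleep_mode_t mode)
{
    assert(mode == MODE_SLEEP || mode == MODE_DEEP_SLEEP);
    if (mode == MODE_DEEP_SLEEP)
        return scr | SCR_SLEEPDEEP;
    return scr & ~SCR_SLEEPDEEP;
}

/* Select the sleep mode, then sleep. */
void enter_sleep(sleep_mode_t mode)
{
    scr_model = scr_for_mode(scr_model, mode);
    /* On target: __DSB(); __WFI();  (omitted so the sketch runs on a host) */
}
```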
The M4F processor may
be implemented using what is called the SRPG methodology. SRPG
stands for State Retention Power Gating. It allows the internal
'state' of the processor to be 'retained' during low power modes,
so that when power is re-applied to the processor, it continues
from where it was before entering the sleep mode. This is an
alternative to copying the 'state' of the processor to RAM and then
copying it back from RAM into the processor on power-up. This is
very helpful in reducing the time to wake the processor from the
deep-sleep state; however, since it is implemented using 'state
retention' flops, it still consumes some power while in deep sleep.
Usually, all the sequential elements inside the processor are
'retained', and these state retention flops are powered from a
separate power rail. Implementing SRPG requires specific control
over how reset is applied, because the processor reset must not be
applied when coming out of a deep-sleep mode that uses state
retention.
The use of this
methodology is also technology dependent, as the technology
library must support ‘retention’ flops or something equivalent.
The WIC or Wake-up
Interrupt Controller is an optional component of the M4F
processor. The WIC is used to wake the processor from the
deep-sleep power state when a valid interrupt is received at its
boundary. When going into deep sleep, some of the data, e.g.
interrupt masks and priorities, is copied automatically from the
processor's NVIC to the WIC. There is also a pair of handshake
signals which must be activated to agree that the next deep sleep
the processor enters will be a WIC-mode sleep.
WIC is a configuration
option and can be used/omitted as required.
WIC has no programmable
registers. It has nothing to do with software; it is completely
transparent to software.
In principle the WIC
can be implemented without a clock as well, in which case it
requests the power controller to make its clock/power available
when a valid interrupt is asserted towards it. However, this is not
usually done, as it requires changes to the generated RTL, and the
implementation has some limitations and side effects which must
be taken care of.
Based upon the
capabilities and the option(s) available for the M4F processor,
the following power management modes may be implemented:
1. Light Sleep Mode: only the clock(s) to the processor, i.e. FCLK
and HCLK, are stopped.
2. Deep Sleep Mode: the power domain containing the M4F processor
may be switched off. This mode can be further divided into two kinds:
a. Deep Sleep with State Retention: allows fast wake-up at the
expense of increased leakage. Following wake-up, the processor is in
the same state it was in before going into Deep Sleep.
b. Deep Sleep without State Retention: allows the most power savings
at the expense of longer wake-up time. Before going into Deep Sleep,
the processor state has to be stored in SRAM. Following wake-up, the
processor has to be re-initialized from scratch and its state
restored before it is back in the state it was in before going into
Deep Sleep.
Some Random Info
FCLK vs HCLK
ARM describes FCLK as the 'free running clock'. This clock also
drives the WIC, so if you are using the WIC, FCLK must not be turned
off. Since the WIC functionality is only available in Deep Sleep on
the Cortex-M4, if you want to power gate the processor for maximum
power savings during deep sleep, you have to make sure that the WIC
is implemented outside the power domain you put the rest of the M4
in. The WIC will then have a separate FCLK branch going into it.
During Deep Sleep, the FCLK branch going into the WIC keeps running,
while the FCLK branch going into the rest of the M4F can be gated.
HCLK is the one which is gated when the M4F goes into Sleep mode. The
processor asserts a signal called 'SLEEPING' which can be used by
hardware to gate HCLK. 'SLEEPING' in turn is asserted when the
processor enters sleep after executing a WFI (or WFE) instruction.
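As a sketch of the clock-enable logic this implies, assuming separate HCLK, core FCLK and WIC FCLK gates in the subsystem (the struct and field names are hypothetical; SLEEPING and SLEEPDEEP are the processor outputs):

```c
#include <assert.h>
#include <stdbool.h>

/* Processor sleep outputs as seen by the subsystem clock controller. */
typedef struct {
    bool sleeping;   /* SLEEPING: core is asleep after WFI/WFE */
    bool sleepdeep;  /* SLEEPDEEP: the current sleep is a deep sleep */
} m4_sleep_status_t;

/* Clock-gate enables for the three branches (names are illustrative). */
typedef struct {
    bool hclk_core;  /* HCLK into the M4F */
    bool fclk_core;  /* FCLK branch into the rest of the M4F */
    bool fclk_wic;   /* separate FCLK branch into the WIC */
} clk_enables_t;

clk_enables_t clock_enables(m4_sleep_status_t s)
{
    clk_enables_t e;
    e.hclk_core = !(s.sleeping || s.sleepdeep); /* gated in any sleep */
    e.fclk_core = !s.sleepdeep;                 /* gated only in deep sleep */
    e.fclk_wic  = true;                         /* WIC needs FCLK to wake the core */
    return e;
}
```

In a real subsystem you would probably qualify the HCLK gate with GATEHCLK rather than SLEEPING alone, so that HCLK keeps running while a debugger is connected.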
Now, if you are planning to use SRPG, then when coming out of the
power-collapsed state you must NOT issue a reset. That means control
of the reset must be separated from control of the power domain
collapse/wake-up function.
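A rough sketch of that separation, written as the action list a (hypothetical) power controller might walk through on wake-up. The action names and their ordering are assumptions for illustration only; the point it captures is that the retained wake-up path never touches the processor reset, while the non-retained path holds reset across power-up and releases it last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical power-controller actions; names are illustrative only. */
typedef enum {
    ACT_ASSERT_RESET,      /* cold boot only */
    ACT_POWER_ON,
    ACT_RESTORE_STATE,     /* retention flops reload the saved state */
    ACT_RELEASE_ISOLATION,
    ACT_ENABLE_CLOCKS,
    ACT_RELEASE_RESET      /* cold boot only */
} pwr_action_t;

#define MAX_WAKE_ACTIONS 5

/* Fill `seq` with the wake-up sequence for the M4F power domain and
 * return the number of actions. */
size_t wake_sequence(bool retained, pwr_action_t seq[MAX_WAKE_ACTIONS])
{
    size_t n = 0;
    if (!retained)
        seq[n++] = ACT_ASSERT_RESET;   /* hold the core in reset while power ramps */
    seq[n++] = ACT_POWER_ON;
    if (retained)
        seq[n++] = ACT_RESTORE_STATE;  /* no reset: state comes back from retention */
    seq[n++] = ACT_RELEASE_ISOLATION;
    seq[n++] = ACT_ENABLE_CLOCKS;
    if (!retained)
        seq[n++] = ACT_RELEASE_RESET;  /* cold boot: core fetches SP and reset vector */
    assert(n <= MAX_WAKE_ACTIONS);
    return n;
}

/* True if the sequence touches the processor reset at all. */
bool sequence_uses_reset(const pwr_action_t *seq, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (seq[i] == ACT_ASSERT_RESET || seq[i] == ACT_RELEASE_RESET)
            return true;
    return false;
}
```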
I was wondering what the purpose of the signals WICENREQ and
WICENACK is.
These signals merely agree the sleep 'mode'.
A handshake on the above signals makes sure that any sleep which
happens after this handshake
will be a 'WIC' mode sleep, i.e. the WIC will be loaded automatically
upon the WFI instruction.
WICENREQ is an input signal to the M4F integration-level RTL;
WICENACK is an output signal from the M4F integration-level RTL.
It is possible to tie WICENREQ to '1' permanently to indicate that
all sleeps are WIC-mode sleeps.
It is also possible to tie WICENREQ to '0' if the WIC is not used.
ARM Cortex M Boot Process:
The way the Cortex-M4 (and probably the other Cortex-M processors)
boots is as follows:
1. The processor fetches the word at 0x0000_0000 and loads this 32-bit
value into its stack pointer.
2. The processor then fetches the word at 0x0000_0004, treats it as a
jump address, jumps to this address and starts executing from
there. For example:
0x0000_0000 : 0x0100_0000
0x0000_0004 : 0x0000_0201
The processor will load the value 0x0100_0000 into r13 (the stack
pointer).
The processor will jump to 0x0000_0200
(not 0x0000_0201) and start execution
from there. The LSB of the reset vector (and of every other exception
vector) must be '1', indicating Thumb code. Cortex-M processors only
execute the Thumb instruction set, so a vector with LSB '0' causes a
fault.
Hence the value at address 0x0000_0004 is also called the reset
vector.
This means that any time the processor is reset, it performs the
above two steps.
The meaning of the subsequent values in the memory space, i.e. the
values at 0x0000_0008, 0x0000_000C and so on, is also pre-defined:
they are the jump locations (vectors) for the different types of
interrupts/exceptions.
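The two reset steps can be modelled in a few lines of C. This is a host-side sketch operating on an array that stands in for the first words of memory; `decode_reset` and `vector_address` are illustrative names, not an ARM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t initial_sp;  /* word at offset 0x0, loaded into SP (r13) */
    uint32_t reset_pc;    /* word at offset 0x4 with bit 0 cleared */
    bool     thumb;       /* bit 0 of the reset vector, must be 1 */
} reset_state_t;

/* Model what the core does on reset with the first two vector-table words. */
reset_state_t decode_reset(const uint32_t *vector_table)
{
    assert(vector_table != NULL);
    reset_state_t r;
    r.initial_sp = vector_table[0];
    r.thumb      = (vector_table[1] & 1u) != 0;
    r.reset_pc   = vector_table[1] & ~1u;
    return r;
}

/* Address of the vector for exception number `exc`
 * (1 = Reset, 2 = NMI, 3 = HardFault, ..., IRQn is exception 16 + n). */
uint32_t vector_address(uint32_t table_base, uint32_t exc)
{
    return table_base + 4u * exc;
}
```

Using the example above, `decode_reset` on `{0x0100_0000, 0x0000_0201}` gives an initial SP of 0x0100_0000, a start address of 0x0000_0200 and the Thumb bit set.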
Cortex-M4 Resets Info:
If power-on reset is present, it resets both the system and the debug
system. The reason the reset is separated into two signals (PORESETn
and SYSRESETn) is to allow the processor to be reset without
affecting the debug system. Otherwise, the debug settings like
breakpoints and watchpoints, and the debug connection from the
debugger to the core, would be lost every time the processor core is
reset.
GATEHCLK : This is asserted when the processor
is in sleep-mode, and there is no debug connection. This signal
can be used to gate off the system clock.
This input can be tied off to 1 or 0. It is for 'stack pointer'
alignment. When something is pushed on the stack, the stack can end
up at an address which is aligned to a 32-bit location but not to a
64-bit location. This input, tied to a constant, selects between
32-bit and 64-bit alignment of the stack pointer at every exception
entry. The setting can later be changed through a register.
Usually it is good to enable 64-bit alignment.
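A sketch of what the 64-bit alignment option does on exception entry, assuming a basic 8-word frame (no floating-point context). On ARMv7-M parts such as the Cortex-M4 the runtime control is the STKALIGN bit (bit 9) of the Configuration and Control Register (CCR, 0xE000ED14), and the processor records any inserted padding in bit 9 of the stacked xPSR so it can be undone on exception return. The function and type names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BASIC_FRAME_BYTES 32u  /* r0-r3, r12, lr, return address, xPSR */

typedef struct {
    uint32_t new_sp;  /* SP after the exception frame has been pushed */
    bool     padded;  /* 4 bytes of padding inserted (stacked xPSR bit 9) */
} stack_entry_t;

/* Model of basic-frame stacking with optional 8-byte alignment. */
stack_entry_t exception_entry_sp(uint32_t sp, bool align8)
{
    assert((sp & 3u) == 0);  /* SP is always word aligned on Cortex-M */
    bool pad = align8 && (sp & 4u) != 0;
    stack_entry_t r;
    /* Push the frame, then round down to 8 bytes if padding is needed. */
    r.new_sp = (sp - BASIC_FRAME_BYTES) & ~(pad ? 4u : 0u);
    r.padded = pad;
    return r;
}
```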
Cortex-M4 System Bus:
For Cortex-M3 and Cortex-M4 processors, the internal bus
interconnect has a registering stage between the instruction fetch
interface and the system bus. Therefore, the performance of the
system is reduced if the software image is executed from the system
bus.
Cortex-M4 : Code & System Memory Aliasing:
If needed, it is possible to have an SRAM shared between code and
SRAM regions by having bus accesses from both code and system
buses (i.e., memory address aliasing). This allows the
software to use a single SRAM block and execute code from
SRAM without performance loss.
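A minimal sketch of the address decode behind such aliasing, assuming a hypothetical 64 KB SRAM visible at 0x1000_0000 in the Code region (reached via ICODE/DCODE) and at 0x2000_0000 in the SRAM region (reached via the System bus). Both aliases resolve to the same physical offset:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placement of one 64 KB SRAM visible in two regions. */
#define SRAM_SIZE        0x00010000u
#define SRAM_CODE_ALIAS  0x10000000u  /* Code region: ICODE/DCODE accesses */
#define SRAM_SYS_ALIAS   0x20000000u  /* SRAM region: System bus accesses */

/* Decode a bus address to an offset inside the shared SRAM.
 * Returns true and writes *offset if either alias hits. The unsigned
 * subtraction wraps for addresses below the base, so one compare
 * checks both ends of each window. */
bool sram_decode(uint32_t addr, uint32_t *offset)
{
    assert(offset != NULL);
    if (addr - SRAM_CODE_ALIAS < SRAM_SIZE) {
        *offset = addr - SRAM_CODE_ALIAS;
        return true;
    }
    if (addr - SRAM_SYS_ALIAS < SRAM_SIZE) {
        *offset = addr - SRAM_SYS_ALIAS;
        return true;
    }
    return false;
}
```

A real design also needs an arbiter in front of the SRAM, since code-bus and system-bus accesses can arrive in the same cycle.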
Vector Table Offset Register:
Cortex-M0+, Cortex-M3 and Cortex-M4 processors: by default the
vector table is located at the start of the memory map (address
0x0000_0000).
In Cortex-M7, Cortex-M23 and Cortex-M33 processors: the default
value for VTOR is defined by chip designers. Cortex-M23 and
Cortex-M33 processors can have two separated vector tables for
Secure and Non-secure exceptions/interrupts.
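If the vector table is relocated (for example copied to SRAM and VTOR at 0xE000ED08 updated), the new address must be suitably aligned. On ARMv7-M the table must be aligned to its size rounded up to a power of two, with a minimum of 128 bytes. A small sketch of that check, assuming 16 system entries plus one entry per device IRQ (function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Required VTOR alignment on ARMv7-M (e.g. Cortex-M4): table size
 * (16 system entries + one per IRQ, 4 bytes each) rounded up to a
 * power of two, minimum 128 bytes. */
uint32_t vtor_alignment(uint32_t num_irqs)
{
    assert(num_irqs <= 240);
    uint32_t bytes = (16u + num_irqs) * 4u;
    uint32_t align = 128u;
    while (align < bytes)
        align <<= 1;
    return align;
}

/* True if `table_addr` is a legal VTOR value for this many IRQs. */
bool vtor_is_valid(uint32_t table_addr, uint32_t num_irqs)
{
    return (table_addr & (vtor_alignment(num_irqs) - 1u)) == 0;
}
```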
TCM vs AHB based SRAM:
No performance difference as such. However, TCM may reduce some
integration complexity. TCM limits the address ranges and the size
of memory, whereas AHB-based SRAMs give us more flexibility.
In some processor designs, the use of TCM is required to allow
deterministic interrupt responses. For example, in the Cortex-M7
processor, access to memories on the AXI bus system can have
non-deterministic timing due to cache hit/miss scenarios. Having TCM
enables interrupt services to be carried out quickly and in a
deterministic manner. But in small processors like Cortex-M0 to
Cortex-M33, the omission of a TCM feature is not a real issue.
TnD: Trace & Debug
Reset – debugger can request a reset of the target board,
typically a system reset through the SYSRESETREQ feature.
Bare-Metal M4 power-on reset boot: You have an M4 on a SoC, the M4 is
connected to SRAM, and the SRAM does not have any code in it. How can
you then power it on?
Usually the design of the SoC would support some method for a
debugger to be attached to the SoC, and that debugger then will have
control over the processor SYSRESETn, also it will have control over
1. At power-up, SYSRESETn and PORESETn are both asserted.
2. HCLK and FCLK are then started, via the debugger gaining access
to the clock control.
3. DAPCLK is then started, and DAPRESETn is de-asserted.
4. PORESETn is then de-asserted. This enables a path from the
debugger to the SRAMs via the CORTEXM4 processor's Bus Matrix, which
is half in reset and half out of reset: half in reset because
SYSRESETn is still asserted, and half out of reset because PORESETn
is de-asserted. The path to the SRAMs will be something like this:
SWD/JTAG -> DAPBUS -> AHB-AP -> CORTEXM4 Bus Matrix ->
Subsystem Bus Matrix -> SRAM
5. You download the code to SRAM.
6. The debugger then de-asserts SYSRESETn.
7. The processor starts to run.
Newer processors support a CPUWAIT signal. In multi-processor SoCs,
a processor may run from RAM, and the RAM may be empty at the start.
When the reset to the processor is de-asserted, CPUWAIT is still
asserted, so that the processor does not run. The code can then be
copied to the processor's SRAM. When the copy is complete, CPUWAIT
is de-asserted, and the processor then starts to run.
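The CPUWAIT sequence can be sketched as a host-side model. The struct, field names and SRAM size are hypothetical; the only point it shows is the ordering: release reset with CPUWAIT still asserted, copy the image, then release CPUWAIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Host model of a secondary Cortex-M whose start is gated by CPUWAIT. */
typedef struct {
    bool     reset_asserted;
    bool     cpuwait;       /* while true the core stays stalled */
    uint32_t sram[64];      /* the core's (initially empty) code RAM */
} secondary_core_t;

/* True once the core would actually start fetching its vector table. */
bool core_running(const secondary_core_t *c)
{
    return !c->reset_asserted && !c->cpuwait;
}

/* Boot sequence run by the primary processor or a boot controller. */
void boot_secondary(secondary_core_t *c, const uint32_t *image, size_t words)
{
    assert(words <= sizeof c->sram / sizeof c->sram[0]);
    c->cpuwait = true;                             /* keep the core stalled ...   */
    c->reset_asserted = false;                     /* ... after reset is released */
    memcpy(c->sram, image, words * sizeof *image); /* load code into its SRAM     */
    c->cpuwait = false;                            /* now it fetches SP and PC    */
}
```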
However, the above method only works when the M4 Subsystem has been
designed in such a way that it is possible. There is another method,
as suggested by ARM:
One of the common questions from new Cortex-M designers is: How can
you bring up a microcontroller device first time without any valid
program in the embedded flash? The actual sequence is no different
from normal flash programming:
When the device starts up for the first time, since the flash does
not contain a valid program image, it quickly enters a fault
exception and eventually goes into LOCKUP state.
Even if the device is in LOCKUP state, the debugger can still
establish a debug connection via JTAG/Serial Wire.
The debugger can then enable a reset vector catch (a debug
feature in the Cortex-M processors), and use System Reset Request
(by programming Application Interrupt and Reset Control Register,
AIRCR) to reset the system. When the processor comes out from system
reset, it enters halt state immediately because the reset vector
catch is enabled.
The debugger can download the flash programming algorithm and
pages of program image into SRAM and set the PC (program counter) to
launch the flash programming algorithm.
When all the required flash pages are programmed, it can reset
the system again to start the application or to debug it.
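The register writes a debugger issues for the reset-vector-catch step can be sketched as below, using the ARMv7-M addresses and keys for DHCSR (0xE000EDF0, key 0xA05F), DEMCR (0xE000EDFC, VC_CORERESET = bit 0) and AIRCR (0xE000ED0C, key 0x05FA, SYSRESETREQ = bit 2). The function and type names are hypothetical; a real debugger issues these writes through the AHB-AP and then polls DHCSR until the core reports it has halted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DHCSR_ADDR         0xE000EDF0u
#define DEMCR_ADDR         0xE000EDFCu
#define AIRCR_ADDR         0xE000ED0Cu

#define DHCSR_DBGKEY       (0xA05Fu << 16)  /* required on every DHCSR write */
#define DHCSR_C_DEBUGEN    (1u << 0)
#define DEMCR_VC_CORERESET (1u << 0)
#define AIRCR_VECTKEY      (0x05FAu << 16)  /* required on every AIRCR write */
#define AIRCR_SYSRESETREQ  (1u << 2)

typedef struct { uint32_t addr, value; } mem_write_t;

/* Writes that enable halting debug, arm the reset vector catch and
 * request a system reset. `demcr` is the value previously read from
 * DEMCR, so the other DEMCR bits are preserved (read-modify-write). */
size_t reset_and_halt_writes(uint32_t demcr, mem_write_t out[3])
{
    assert(out != NULL);
    size_t n = 0;
    out[n++] = (mem_write_t){ DHCSR_ADDR, DHCSR_DBGKEY | DHCSR_C_DEBUGEN };
    out[n++] = (mem_write_t){ DEMCR_ADDR, demcr | DEMCR_VC_CORERESET };
    out[n++] = (mem_write_t){ AIRCR_ADDR, AIRCR_VECTKEY | AIRCR_SYSRESETREQ };
    return n;
}
```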
The same concept can also be applied to devices that run code from
external flash (e.g., QSPI flash).
What is DAP:
DAP stands for Debug Access Port. It is actually a collection of a
few components:
The Serial Wire Debug Port -> Debug Bus
Debug Bus -> AHB Access Port / or APB Access Port / or AXI Access
Port
This whole infrastructure, I guess, is collectively termed the DAP.
In systems that need debug authentication support, the CoreSight
debug authentication signals are connected to a debug authentication
control unit (not a part of the Cortex-M processor) that
authenticates debug connections. The authentication process is
typically based on the product's life cycle state and the user's input,
such as a debug certificate or password. Based on guidelines from the
Platform Security Architecture (PSA), certificate-based debug
authentication is generally preferred over password-based authentication
for products that can contain sensitive information.
Other Debug thingies:
Power down is normally disabled when a debugger is attached. This is
because debuggers require access to the processor even when the
processor core is in sleep modes. In many such cases, the power down
FSM is automatically disabled by the WIC interface inside the
processor. As a result, testing of deep sleep can show a different
set of behaviors and interrupt latency when a debugger is connected.
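Detecting an attached debugger: how do you check whether a debugger
is attached to an ARM Cortex-M processor? The C_DEBUGEN bit (bit 0
of the Debug Halting Control and Status Register, DHCSR) can only be
written by the debugger through the DAP (AHB-AP), not by software
running on the core. So if this bit is '1', a debugger is attached.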
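A minimal sketch of that check; the DHCSR address (0xE000EDF0) and C_DEBUGEN bit position are from ARMv7-M, while the function name is hypothetical and the register value is passed in so the code runs on a host:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DHCSR_ADDR      0xE000EDF0u
#define DHCSR_C_DEBUGEN (1u << 0)

/* Decide from a DHCSR value whether halting debug has been enabled,
 * which only a debugger (through the DAP) can do. */
bool debugger_attached(uint32_t dhcsr)
{
    return (dhcsr & DHCSR_C_DEBUGEN) != 0;
}

/* On target: debugger_attached(*(volatile const uint32_t *)DHCSR_ADDR),
 * or CoreDebug->DHCSR with CMSIS. */
```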
Power Management: the debug interface modules provide
handshaking signals to indicate whether there is a debugger
connection, which allows system designers to implement power
management for the debug system of the processors if needed. For example, in
Cortex-M0, Cortex-M0+, Cortex-M7, Cortex-M23, Cortex-M33, and
Cortex-M35P processors, there is a separate debug power domain
that can be powered down if there is no debug connection.
In most cases,
the debug interface pins (JTAG or Serial Wire Debug) need to be
accessible at the device’s top-level by default. For Cortex-M3,
Cortex-M4 and Cortex-M33 processors, the debug interface module
supports dynamic protocol switching, so it is possible to expose
just two pins of the SWD debug by default. If there is a need to
switch over to JTAG, then you can program a device-specific pin
multiplexer (mux) control register to expose the other pins for
JTAG, and then apply a switchover sequence to start JTAG operation.
More Debug Requirements/Ideas
Debug access port (DAP) – You might optionally move the
SWJ-DP to the always-on power domain to allow the debugger to wake
up the system with a debug connection. An alternative solution is to
use another hardware mechanism to wake up the system so that the
debugger can connect to the processor to start debug sessions. I
would prefer the latter, as putting the DAP in the AON domain will
just burn power in mission mode.
Potentially some peripherals like watchdog timers might need to
suspend their operations when the processor is halted. Otherwise, a
reset could be triggered unexpectedly during debugging. Some timers
(e.g., SysTick timers inside the Cortex-M processors) also stop
counting automatically when the processor is halted to allow
single-stepping of application code.
The allocation of interrupt signals affects the C header files for
software development, including the vector table definitions and
interrupt numbers, which are both visible to the software.
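For example, a device header in the CMSIS style might look like the fragment below. The system exception numbers follow the CMSIS convention (negative values, with the vector table index equal to IRQn + 16); the peripheral names and their IRQ assignments are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* System exceptions are negative; device IRQs start at 0 and follow the
 * order in which the interrupt lines are wired to the NVIC inputs. */
typedef enum {
    NonMaskableInt_IRQn   = -14,
    HardFault_IRQn        = -13,
    MemoryManagement_IRQn = -12,
    BusFault_IRQn         = -11,
    UsageFault_IRQn       = -10,
    SVCall_IRQn           = -5,
    DebugMonitor_IRQn     = -4,
    PendSV_IRQn           = -2,
    SysTick_IRQn          = -1,
    UART0_IRQn            = 0,  /* wired to IRQ[0] (hypothetical) */
    TIMER0_IRQn           = 1,  /* wired to IRQ[1] (hypothetical) */
    GPIO_IRQn             = 2   /* wired to IRQ[2] (hypothetical) */
} IRQn_Type;

/* Index of an interrupt's entry in the vector table
 * (entry 0 is the initial SP, entry 1 the reset vector). */
uint32_t vector_index(IRQn_Type irq)
{
    assert(irq >= NonMaskableInt_IRQn);
    return (uint32_t)((int32_t)irq + 16);
}
```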
On all current Cortex-M processors, the interrupt signals:
Are active high and must be synchronous to the processor’s
system clock signal;
Can be level triggered or pulse triggered. If pulse triggered,
the duration of the pulse must be at least one clock cycle.
Possible NMI uses:
In common embedded systems the NMI could be connected to:
Voltage monitoring logic (also known as a brownout detector) to
ensure that the system is shut down correctly when the supply
voltage drops to a certain value, or
The NMI could be connected to a watchdog timer to carry out
remedial actions if the system has stopped normal operation.
Faults in normal interrupt handlers allow the Hard Fault handler (or
other configurable fault handlers) to be triggered and executed.
A fault generated within the NMI handler can cause the processor to
enter lockup state.
If a peripheral generates an interrupt request in the form of a
level signal, the interrupt handler must clear the request at the
peripheral.
The key advantage of a pulsed interrupt is that it saves a few clock
cycles in the ISR, because there is no need to clear the interrupt
requests at the peripherals.
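A host-side sketch of the level-triggered case, assuming a hypothetical peripheral with a write-1-to-clear (W1C) status register. The interrupt line stays high while any enabled status bit is set, so the ISR has to clear exactly what it serviced:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical peripheral registers. */
typedef struct {
    uint32_t status;  /* event flags, set by hardware */
    uint32_t enable;  /* interrupt enable mask */
} periph_t;

/* Level interrupt output: high while any enabled event is pending. */
bool irq_line(const periph_t *p)
{
    return (p->status & p->enable) != 0;
}

/* Write-1-to-clear: bits written as 1 are cleared, others untouched. */
void periph_clear(periph_t *p, uint32_t w1c)
{
    p->status &= ~w1c;
}

/* ISR body: service and clear exactly the events that were pending.
 * Returning without the clear would leave the line high and the
 * interrupt would be pended again straight away. */
void periph_isr(periph_t *p)
{
    uint32_t pending = p->status & p->enable;
    if (pending == 0)
        return;  /* spurious entry: nothing to service */
    /* ... service each bit set in `pending` ... */
    periph_clear(p, pending);
}
```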
However, in many cases, a level-triggered interrupt is preferred because:
Cross clock domain synchronization of level-triggered
interrupts is simpler than pulsed interrupts. In the case where
pulse interrupt synchronization logic is used, two successive
interrupt request pulses could be merged into one after the
synchronizer due to the latency of the synchronization, which
can be confusing.
If a pulsed interrupt event occurs while the processor is in reset,
the interrupt event could be lost.
Level trigger interrupts can remain at a high level to
indicate an additional service is needed by the peripheral
(e.g., when additional data is available in a receiver’s FIFO).
Easier for debugging (e.g., in Verilog simulation, where it is
hard to tell if there has been an interrupt event unless the
event information is kept by, for example, a waveform database).
The peripheral design can be reused for other processors that
do not support pulsed trigger interrupts.
The event interface is typically used in multi-core systems to allow
one processor to wake up another during spinlocks. For example, in RTOS
semaphores, if a processor is waiting for a spinlock, it can enter
sleep mode using WFE to save power, and wake up if there is an
interrupt to serve or if there is an event from another processor.
For single-processor systems, it is fine to tie RXEV to 0 and leave
TXEV unconnected.
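A sketch of a WFE-based spinlock using C11 atomics. On a Cortex-M target `wait_for_event()` and `send_event()` would be the CMSIS intrinsics `__WFE()` and `__DSB(); __SEV();`; here they are empty stubs (hypothetical helper names) so the code runs on a host. Because SEV on one core sets the event register of the waiting core (via TXEV/RXEV), an unlock that lands between a failed exchange and the WFE is not lost: the WFE simply returns immediately.

```c
#include <assert.h>
#include <stdatomic.h>

/* Host stubs; on target: __WFE() and __DSB(); __SEV(); */
static inline void wait_for_event(void) { }
static inline void send_event(void)     { }

typedef struct { atomic_int locked; } ev_spinlock_t;

void spin_lock(ev_spinlock_t *l)
{
    /* Try to take the lock; while another core holds it, sleep in WFE
     * instead of busy-spinning, and retry each time an event arrives. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        wait_for_event();
}

void spin_unlock(ev_spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
    send_event();  /* TXEV -> other core's RXEV wakes it from WFE */
}
```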
All of the Cortex-M processors use asynchronous active-low reset
signals, which must be de-asserted synchronously to the system and
debug clocks to prevent timing violations.
Most of the Cortex-M processors require the reset to last at least
two clock cycles.
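A cycle-level sketch of the usual two-flop reset synchronizer (asynchronous assertion, synchronous de-assertion) for an active-low reset. This is a behavioural model in C, not RTL, and the names are hypothetical. Its output goes low as soon as the raw reset is asserted, and de-asserts only on the second rising clock edge after the raw reset is released:

```c
#include <assert.h>
#include <stdbool.h>

/* ff2 drives the synchronized, active-low reset output. */
typedef struct {
    bool ff1, ff2;
} rst_sync_t;

/* Asynchronous path: both flops are cleared as soon as reset_n goes low. */
void rst_sync_assert(rst_sync_t *s)
{
    s->ff1 = false;
    s->ff2 = false;
}

/* One rising clock edge with raw reset input `rst_n_in`. */
void rst_sync_clock(rst_sync_t *s, bool rst_n_in)
{
    if (!rst_n_in) {        /* async clear dominates */
        rst_sync_assert(s);
        return;
    }
    s->ff2 = s->ff1;        /* second stage samples the first */
    s->ff1 = true;          /* first flop's D input is tied high */
}

bool rst_sync_out(const rst_sync_t *s) { return s->ff2; }
```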
The system reset request (SYSRESETREQ) is controlled by a register
bit in the Application Interrupt and Reset Control Register (AIRCR)
inside the System Control Space. It allows:
Software to request a system reset, for example in the case
of fault error handling;
The debugger to request a system reset. This is essential to allow
the debugger to request a reset of the targeted processor.
Designers must make sure that:
SYSRESETREQ only generates a system reset, not a debug reset
or power-on reset;
SYSRESETREQ does not generate a system reset in a combinatorial
path.
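For the software case, the AIRCR write can be composed the same way CMSIS `NVIC_SystemReset()` does it: the 0x05FA write key in bits [31:16], the current PRIGROUP field (bits [10:8]) preserved, and SYSRESETREQ (bit 2) set. The sketch below only builds the value (function name hypothetical); on target you would write it to AIRCR (0xE000ED0C), issue a DSB and wait for the reset to take effect. Note that AIRCR reads back 0xFA05 in its upper half.

```c
#include <assert.h>
#include <stdint.h>

#define AIRCR_VECTKEY      (0x05FAu << 16)  /* write key, bits [31:16] */
#define AIRCR_PRIGROUP_MSK (7u << 8)        /* priority grouping, bits [10:8] */
#define AIRCR_SYSRESETREQ  (1u << 2)

/* Value to write to AIRCR to request a system reset, given the
 * current AIRCR read-back value. */
uint32_t aircr_sysreset_value(uint32_t aircr_now)
{
    uint32_t v = AIRCR_VECTKEY
               | (aircr_now & AIRCR_PRIGROUP_MSK)
               | AIRCR_SYSRESETREQ;
    assert((v >> 16) == 0x05FAu);
    return v;
}
```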
We can also design the reset generator so that it can optionally
reset the system if it enters lock-up state. To make this
behavior controllable, a programmable register would be needed in
your FPGA/ system design to specify if a lock-up state can cause a
reset. This register is not provided in the Cortex-M processor core
as such requirement is application dependent. During software
development, the control signal at this external reset control
register can be set to 0 to disable the automatic reset. In a
production system, the reset control register can be set to 1 so
that when the system enters lockup state, SYSRESETn is activated.
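A sketch of the resulting reset-request logic, assuming a hypothetical external register bit `lockup_reset_en` as described above. PORESETn would be driven from the power-on source only, so a SYSRESETREQ- or lockup-triggered reset leaves the debug system intact:

```c
#include <assert.h>
#include <stdbool.h>

/* Inputs to a hypothetical system reset generator. */
typedef struct {
    bool poreset_req;      /* power-on / external reset source */
    bool sysresetreq;      /* SYSRESETREQ output of the processor (registered) */
    bool lockup;           /* LOCKUP output of the processor */
    bool lockup_reset_en;  /* external control bit: 0 in development, 1 in production */
} rst_inputs_t;

/* True when SYSRESETn (active low) should be asserted. */
bool sysreset_request(rst_inputs_t in)
{
    return in.poreset_req || in.sysresetreq || (in.lockup && in.lockup_reset_en);
}
```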
Interface for debug connection (JTAG or Serial Wire Debug) – for
connecting a debugger to the hardware target to carry out halting,
stepping, restart, resume, setting breakpoints/watchpoints, access
to memories and peripherals. The debug connection is also used for
downloading code and flash programming.
By default, the JTAG instruction register is locked and a debugger
has no access to the debug infrastructure. To initiate debugging, a
debugger must first be authenticated, and its tag ID is returned if
successful. More on how the debugger is authenticated can be found
in the SoC Security article.
Secure Asset Tagging:
The SoC integrator tags each asset with an ID of the asset owner.
The ID itself has no confidentiality requirements; it is simply used
to indicate to which debugger(s) the asset can be exposed via debug.
During debug, if an asset is being traced, its tag is compared to
the ID of the authenticated debugger. If the comparison matches, the
asset is traced; otherwise, it is obfuscated.
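A minimal sketch of that tag comparison; the struct, the 16-bit tag width and using zero as the obfuscated value are all assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trace item: payload plus the owner tag assigned by the
 * SoC integrator. */
typedef struct {
    uint16_t owner_tag;
    uint32_t payload;
} trace_item_t;

#define OBFUSCATED 0x00000000u  /* placeholder emitted instead of the data */

/* Expose the payload only to the debugger whose authenticated ID
 * matches the asset's owner tag. */
uint32_t trace_filter(trace_item_t item, uint16_t debugger_id)
{
    return item.owner_tag == debugger_id ? item.payload : OBFUSCATED;
}
```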
How to secure SoC assets? By the way, SoC assets here are simply the
software on the SoC; for example, MP3 decoder code or some other
algorithm.
The easiest way is to permanently disable the debug infrastructure
after silicon validation (e.g., by blowing fuses). However, this may
not be an option where debug features are needed throughout the
product life cycle.
In ARM-based SoCs, the debug infrastructure provides two debug
modes: secure and nonsecure [ARM 2013]. If authenticated in secure
mode, the debugger can trace instructions of secure and privileged
software (potential assets in our case). Otherwise, only user-level
software can be traced. The drawback here is the following:
If authenticated in secure mode, this debugger can trace its own
secure software as well as all other secure software. If
authenticated in nonsecure mode, this debugger cannot analyze its
own code. So in the systems where the SoC assets (i.e. software) are
provided by multiple parties, this approach fails.
The ICODE & DCODE Buses in Cortex M4
Some of the Cortex-M processors have separate ICODE and DCODE buses.
These are 2 AHB buses from the processor which in certain
circumstances can boost performance of the system by exploiting
parallel operation of both the buses. Some say that these support
Harvard Architecture, where the Instructions are separated from
data. However, ARM cortex M3 and M4 do not have true-Harvard
architecture. They only have what is called "Harvard Bus
Architecture", this means, the code and data buses are 2 different
buses, and code & data accesses can happen in parallel, but the
code & data memory space is unified.
Moreover, the data fetched over the DCODE bus is typically the
'literal' data embedded in the program.
The Cortex-M processors which support separate ICODE and DCODE
buses provide an option to merge the operation of these as if they
were a single bus, if so desired.
The input pin DNOTITRANS when tied to '1', will effectively merge
the two buses, so that they would no longer operate in parallel.
This means that they will no longer generate parallel accesses.
However, there will still be 2 physical buses on the processor.
If the parallelism of these buses is to be exploited:
1. DNOTITRANS input to the CORTEX-M must be tied to '0'.
2. The design must also make sure that there is a parallel path
from these 2 buses to 2 different memory instances or memory banks
which can be accessed in parallel.
3. The compiler/linker must be directed to keep the 'literals' in a
separate memory instance/bank, so that execution can perform
parallel ICODE and DCODE accesses.
Literals are the constant data which are embedded into code. For
example, in a for loop, a large maximum loop count may be stored as
a literal. Other examples of literals are string values. In the
famous "Hello World" example, the string "Hello World" is a literal.
These literals must be stored somewhere as data.
If we don't have a 'literal' cache attached to the D-CODE bus, then I
think these can be unified.
For example, in the Cortex-M1 processor, there are two TCM
interfaces: the ITCM interface is primarily for instruction memory
(including literal data access inside a program), and the DTCM is
primarily for data transfers.
The Cortex-M35P processor supports an optional built-in program
cache (sometimes referred to as instruction cache but technically it
is a unified cache that can cache both instructions and read-only
data).
The debug authentication flow involves the following steps:
1. Establishing link from external agent to the security
This step essentially ensures correct provisioning of the CoreSight
SDC-600 for transmitting protocol messages. This includes powering up
the internal block if it is unpowered, and performing a protocol
2. SoC ID discovery
The external agent (debugger) requests the SoC ID from the Secure
CPU. The relevant SoC ID is derived through CryptoCell and is
delivered back to the external agent.
3. Introducing debug certificate
The external agent requests the Secure CPU to authenticate its debug
certificate, which includes the requested DCU values for this debug
session. The debug certificate must be based on the provided SoC ID
and must include Root of Trust (ROT) permissions to debug this device.
4. Debug access authentication
The Secure CPU, with the help of CryptoCell, verifies the debug
certificate and applies the corresponding settings to the DCU before
responding to the debugger's Introduce Debug Certificate command with
a message which includes the current values of the DCU, indicating
what capabilities are now available for use.