System on Chip Architecture Tutorial
Memory Architecture for ARM Cortex-M based SoC
-Aviral Mittal

Connect @ https://www.linkedin.com/in/avimit/


Memory Architecture for a Cortex-M based System on Chip.
Once you are done with the processor selection (i.e. among the ARM Cortex-M family), the memory architecture is perhaps the second most important aspect of the SoC architecture. The memory architecture depends upon the processor selection.
For example, if you select the Cortex-M7, it offers optional built-in Instruction Cache and Data Cache memories, so your memory system may not need any caches of its own.
The Cortex-M7 also has Tightly Coupled Memories (TCMs), which offer very fast code execution (from the Instruction TCM) and very fast data access (from the Data TCMs).
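
For illustration, below is a minimal sketch of how time-critical code and data are typically pinned into the TCMs, using GCC-style section attributes. The ".itcm"/".dtcm" section names and the routine itself are assumptions for illustration only; the actual section names, and the mapping of those sections onto the ITCM/DTCM address ranges, come from your linker script and toolchain.

/* Sketch: placing hot code/data into Cortex-M7 TCMs via GCC section attributes.
 * The ".itcm"/".dtcm" names are illustrative; the linker script must map them
 * to the ITCM/DTCM address ranges for this to take effect. */
#include <stdint.h>

/* Latency-critical routine intended to execute from the Instruction TCM. */
__attribute__((section(".itcm"), noinline))
void fast_filter(int32_t *samples, uint32_t n)
{
    for (uint32_t i = 1; i < n; i++) {
        samples[i] = (samples[i] + samples[i - 1]) / 2;  /* simple smoothing step */
    }
}

/* Frequently accessed buffer kept in the Data TCM for fast, deterministic access. */
__attribute__((section(".dtcm")))
int32_t sample_buffer[1024];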

NVM choice for code storage.
Since this tutorial is about a hostless SoC, i.e. a SoC which is self-sufficient and is the main SoC in the system, it ought to have Non-Volatile Memory (NVM) for code storage.

Flash or R-RAM as NVM for code storage.
A popular choice is Flash NVM. For geometries above 28 nm you can also have e-Flash, i.e. embedded Flash: Flash memory integrated within the SoC. For geometries below 28 nm, e-Flash is usually not available due to technology limitations, so you may want to choose an external Flash device, which is typically accessed via an SPI interface, a Quad SPI interface (QSPI) or an Octa SPI interface (OSPI). However, if you want NVM on the SoC itself at finer geometries, you can go for R-RAM (Resistive RAM) or M-RAM (Magnetic RAM). Bear in mind, though, that R-RAM is a costly affair and can bump up the cost of the SoC significantly.

ROM for code storage
You can also have ROM, which is very fast, very cost effective and very low power when compared to Flash or R-RAM, but the problem with ROM is that it isn't very flexible. It cannot be overwritten, so if you want to update your system at a later stage, ROM won't let that happen. However, ROM is the most 'secure' memory, and generally does not need authentication or decryption. So if your system is stable enough, and your code does not need updates during the system's lifetime, ROM is the way to go. Sometimes a SoC has Flash or R-RAM during development, the initial versions of the SoC are released with Flash/R-RAM, and as the code matures the Flash/R-RAM is replaced by ROM.

On-Chip R-RAM vs Flash:

Cost: On-chip R-RAM is costly; NAND Flash and NOR Flash are cost effective.
Read speed: On-chip R-RAM is fast, with an access time of ~30 ns. NAND Flash is very slow, with an initial latency of ~10,000 ns (10 us) and a sequential latency of ~50 ns. NOR Flash is slower than R-RAM but not as slow as NAND, with an initial latency of ~50 ns and a sequential latency of ~10 ns.
Power: R-RAM has low power consumption; Flash has high power consumption.
Density: R-RAM has a higher density (more bits/sq mm); Flash has a lower density.
Reliability: R-RAM is more reliable than Flash; Flash is less reliable than R-RAM.
Scaling: R-RAM is scalable to below 10 nm; Flash has technology limitations below 22 nm.
Write voltage: R-RAM has no special voltage requirements; Flash needs a special high voltage for write operations.
Program speed: NAND Flash programs at ~2.5 Mbytes/s; NOR Flash programs at ~0.3 Mbytes/s.
Erase speed: NAND Flash erases at ~8 Mbytes/s; NOR Flash erases at ~0.2 Mbytes/s.
Density range: NAND Flash ~64 Mbit to 16 Gbit; NOR Flash ~1 Mbit to 1 Gbit.
Access method: NAND Flash is sequential; NOR Flash allows random access.

It is important to consider that XIP code storage in Flash will typically be in NOR Flash, as it offers random access, unlike NAND Flash, where random access is not possible.
NOR Flash read initial access times are typically around 50 ns, with sequential read access times of around 10 ns.
NAND Flash read initial access times are huge, typically in the microsecond range (e.g. 10 us), and its sequential read access times are typically around 50 ns.
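
As a rough illustration of what these latencies mean for throughput, the sketch below estimates the effective sequential read bandwidth of NOR and NAND Flash for a 1 KB burst. The 16-bit word width and burst length are assumptions chosen only to make the arithmetic concrete; real figures depend on the device and interface.

/* Illustrative estimate of effective read bandwidth from the latency figures
 * above, assuming 16-bit-wide sequential transfers; numbers are approximate. */
#include <stdio.h>

static double effective_mb_per_s(double initial_ns, double seq_ns,
                                 unsigned words, double bytes_per_word)
{
    double total_ns = initial_ns + (double)words * seq_ns;
    double bytes    = (double)words * bytes_per_word;
    return (bytes / total_ns) * 1000.0;   /* bytes/ns -> Mbytes/s */
}

int main(void)
{
    /* 1 KB burst = 512 x 16-bit words. */
    printf("NOR : %.0f Mbytes/s\n", effective_mb_per_s(50.0, 10.0, 512, 2.0));    /* ~198 */
    printf("NAND: %.0f Mbytes/s\n", effective_mb_per_s(10000.0, 50.0, 512, 2.0)); /* ~29  */
    return 0;
}

Even with comparable sequential rates, NOR's tiny initial latency is what makes it usable for XIP, while NAND's ~10 us first-access penalty makes it suitable mainly for block transfers.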

It is quite obvious from the above table that R-RAM wins on most points; hence, if the cost permits and your SoC needs more performance at lower power, R-RAM will be the choice for NVM.
However, R-RAM is a very new technology at the time of this writing (Feb 2020) and is quite expensive at the moment, hence most SoCs that use ARM Cortex-M class processors still have either e-Flash or off-chip Flash.

Note: It is important to note that R-RAM is not a replacement for on-chip RAM. It still has a finite write endurance of ~10,000 cycles, hence it cannot be used the way normal RAM is used on the SoC.
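As an illustrative calculation using that figure: a memory location that is rewritten just 1,000 times per second would exhaust a 10,000-cycle endurance budget in about 10 seconds, whereas on-chip SRAM can be rewritten indefinitely.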

On-Chip vs Off-Chip NVM:
On-chip NVM can be considered more secure than off-chip NVM: the stored code never leaves the die, so it is much harder to probe, intercept or replace than the contents of an external Flash device sitting on exposed board traces.

XIP vs No XIP.
While considering the NVM, if it is Flash or R-RAM (and not ROM; ROM is almost always XIP), you may also want to consider whether you need XIP. XIP is eXecute-In-Place. You can find more on XIP here.
If your system does not have high performance requirements, XIP can be a really good proposition. It is very cost effective, because the NVM is a lot cheaper than on-chip RAM and your code is executed directly from it. Using cache memories can bridge the performance gap and provide more than adequate performance for a variety of applications, even if the NVM is off-chip Flash.
However, the downside of XIP from off-chip Flash is high power consumption. You will end up consuming far more power executing in place from off-chip Flash than if you copy the code once into on-chip RAM and execute it from there. These are the kinds of trade-offs you have to make as a SoC architect: cost/power/performance. There is no right or wrong way; it depends upon the use cases of your SoC.
Not using XIP, on the other hand, means that the code needs to be copied from NVM to RAM, which means you need more memory to store the code, as the code is replicated. In these cases the code is usually stored compressed in the NVM and decompressed into the RAM, which saves some amount of memory, but no-XIP still means more system memory than XIP. A minimal copy-to-RAM boot sketch is shown below.
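
The following is a minimal sketch of such a copy-to-RAM boot step, assuming the NVM is memory mapped. All addresses, sizes and names are illustrative assumptions; in a real project they come from the linker script and the vendor's boot/Flash driver, and a decompression routine would replace the plain memcpy if the image is stored compressed.

/* Sketch of a no-XIP boot sequence: copy the code image from memory-mapped NVM
 * into on-chip RAM and jump to it. Addresses/sizes below are assumptions. */
#include <stdint.h>
#include <string.h>

#define FLASH_CODE_ADDR  0x60000000u    /* assumed memory-mapped NVM location   */
#define RAM_CODE_ADDR    0x20010000u    /* assumed on-chip RAM execution region */
#define CODE_SIZE_BYTES  (64u * 1024u)  /* assumed size of the code image       */

typedef void (*entry_point_t)(void);

void boot_copy_and_run(void)
{
    /* Copy the code image from NVM into RAM (a decompression step would
     * replace this memcpy if the image is stored compressed in the NVM). */
    memcpy((void *)RAM_CODE_ADDR, (const void *)FLASH_CODE_ADDR, CODE_SIZE_BYTES);

    /* Jump to the entry point of the relocated image (bit 0 set for Thumb). */
    entry_point_t entry = (entry_point_t)(uintptr_t)(RAM_CODE_ADDR | 1u);
    entry();
}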


Distributed Local Memories Vs Shared Memory:
Another aspect of the memory architecture is to compare and contrast local vs shared memory architectures, i.e. should there be one single large pool of shared memory, or local memories in the subsystems?
While shared memory provides flexibility in terms of memory usage, it cannot provide the speed of a local memory.
A large pool of shared memory is readily available to any subsystem that wants to use it, whereas local memory is typically dedicated to one subsystem.
Local memory, if not fully used, may not be usable by other subsystems: e.g. local memory belonging to subsystem A may not be available to subsystem B, even when subsystem A is not using all of its own local memory.
Shared memory is available to a subsystem's processor via some kind of SoC bus or SoC interconnect, and the interconnect will have latencies.
On top of that there can be arbitration latencies, as the shared memory will be accessed by multiple subsystems at the same time.
Hence the performance of shared-memory based systems won't match the performance of local-memory based systems.
The performance impact of these latencies can be mitigated by using cache memories locally in the subsystems.
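
For example, on a Cortex-M7 based subsystem the local L1 caches can be enabled early in boot with the standard CMSIS-Core calls, as sketched below. The device header name is an assumption, and the MPU/memory-attribute configuration that decides which shared-memory regions are actually cacheable is omitted here.

/* Sketch: enabling the Cortex-M7 L1 caches via CMSIS-Core so that repeated
 * accesses to shared memory over the interconnect are served locally. */
#include "device.h"   /* vendor device header that pulls in core_cm7.h (assumed name) */

void enable_l1_caches(void)
{
    SCB_EnableICache();   /* instruction cache: hides interconnect latency on code fetches  */
    SCB_EnableDCache();   /* data cache: hides interconnect/arbitration latency on data     */
}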



<= PREV Which ARM Cortex-M processor                                     Next => SoC: Clock Sources
