Teensy 4.1 Memory Configuration and Use
Overview
In this section we cover the basic memory architecture of the Teensy 4.1, the commands for working with that memory, and how the optional PSRAM and Flash chips that can be added to the board can be used.
If you ordered a Teensy 4.1 with the Prototyping System for Teensy 4.1, you probably opted for the additional memory (95% do), so this will give you an overview of the capabilities of that additional memory.
If you are moving up from a typical Arduino, you basically either had enough memory or you did not. There was little control you could exert over the amount of memory or its usage, other than switching to another Arduino board with more memory, such as moving from an Uno to a Mega 2560.
The much more powerful Teensy 4.1 not only has more memory to start with, but also has different types of memory along with the ability to add additional memory to the module. It also supports powerful commands for the user to manually intervene in how that memory is used.
When working with the Teensy 4.1 and modest-size programs, including any example programs you may come across, you can generally ignore the memory architecture, as the Teensy software does a good job of optimizing the allocation of memory to provide maximum performance.
If you start working with very large programs or need to use large data arrays and things of that nature, then you may run into the need to exert more manual control of the memory.
The Teensy 4.1 has 1MB of very fast RAM built into the NXP iMXRT1062 microcontroller and 8MB of Flash memory is located in a small IC on the Teensy module. There are also two pad locations on the bottom of the Teensy 4.1 for adding up to 16MB PSRAM and up to 256MB Flash memory. We’ll take a quick look at the built-in memory first and then look at how the optional memory fits in with everything else.
The download window of the IDE shows the basic memory usage of the RAM and Flash when a program is compiled.
Built-in Volatile RAM
The 1MB of high-performance RAM on the Teensy 4.1 is divided into two 512kB blocks called RAM1 and RAM2. RAM is volatile memory, so its contents are lost during a power cycle.
RAM1
RAM1, which NXP refers to as FlexRAM, is accessed as tightly coupled memory with a 64-bit data bus for maximum performance. It is broken into ITCM (Instruction Tightly Coupled Memory) for code and DTCM (Data Tightly Coupled Memory) for data.
Code which is labeled FASTRUN in the program is automatically loaded from Flash into the ITCM at run-time for fastest execution. In fact, all code is automatically copied into the ITCM if there is space for it unless it is marked as FLASHMEM, which keeps it stored in Flash. For that reason, marking code as FASTRUN in a typical program will have no impact on performance since the compiler is already doing that for you.
The initialized data variables are similarly copied from Flash into the DTCM at run-time.
RAM1 is allocated in 32kB chunks of either instructions (program code) or data. If a program is <32kB, it uses one chunk (32kB) and if the program is >32kB and <64kB, it uses 2 chunks (64kB) and so on. This can mean that just adding a small amount of code to a program may make it consume another 32kB of RAM1 if it causes it to cross one of the 32kB boundaries. Similarly, removing a small amount of code may save 32kB.
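The rounding described above can be sketched as a small helper. This only illustrates the arithmetic. The names `kBankSize`, `ram1BanksNeeded` and `ram1BytesReserved` are hypothetical and are not part of the Teensy core.

```cpp
#include <stdint.h>

// RAM1 is handed out in 32 kB chunks, as described in the text above.
constexpr uint32_t kBankSize = 32 * 1024;

// Number of 32 kB chunks needed to hold `bytes` of code (or of data).
// Hypothetical helper for illustration; not a Teensy core function.
constexpr uint32_t ram1BanksNeeded(uint32_t bytes) {
  return (bytes + kBankSize - 1) / kBankSize;  // round up to a whole chunk
}

// RAM1 actually consumed once the size is rounded up to whole chunks.
constexpr uint32_t ram1BytesReserved(uint32_t bytes) {
  return ram1BanksNeeded(bytes) * kBankSize;
}
```

For example, 32,000 bytes of code fits in one chunk, while 33,000 bytes crosses the boundary and reserves two chunks (65,536 bytes).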
RAM2
RAM2, which NXP refers to as OCRAM (On-Chip RAM), is optimized for access by DMA (Direct Memory Access). It also has a 64-bit data bus but operates at a slower clock rate.
Normally, large arrays and data buffers are placed into RAM2 to save the faster RAM1 for normal variables and program code to speed up program execution. RAM1 clock is 4X faster than RAM2, but RAM2 can leverage a built-in data cache, so typical performance can be similar depending on how much the cache is used.
DMA refers to setting up a data path between RAM2 and some other data source which allows the data to flow without requiring the help of the CPU to move the data, thus freeing it up to do other tasks. The Teensy Audio Library uses DMA which allows it to continue to stream audio data no matter what the main program happens to be doing at the moment, like sitting in a delay() loop.
Built-in Non-Volatile Flash Memory
Flash is non-volatile memory and the contents are preserved during a power cycle.
The Teensy 4.1 built-in 8MB of Flash memory is mainly used for storing the program code. It is the small IC located next to the Program button.
If not needed for code, a portion of the Flash can be used for general file storage using the LittleFS (Little File System) library.
The top 256kB of this memory is reserved for EEPROM emulation and the LED Blink restore program that is used to recover a Teensy that has been bricked by a non-responsive program.
This Flash memory is accessed via a dedicated 4-bit QSPI bus. The speed of access to Flash is much slower than RAM1 or RAM2 which is why the program is automatically moved to RAM1 at runtime if there is space for it.
Optional PSRAM and Flash Memory
For many applications the built-in RAM and Flash memory is more than enough to meet requirements, but the Teensy 4.1 also has the option to add up to two additional memory chips on the bottom of the PCB to add PSRAM and/or additional Flash storage.
This extra storage can be handy for applications such as datalogging, storing large amounts of static data or working with large arrays and data structures that won’t fit into RAM1 or RAM2.
Optional PSRAM
PSRAM (Pseudo-SRAM) is dynamic memory and will lose its contents after a power cycle. It is typically used as scratchpad memory to hold temporary data.
The PSRAM chips are 8MB serial devices organized as 8M x 8 bits (64Mbit). The Teensy 4.1 can support up to two PSRAM chips, which together provide 16M x 8 bits (128Mbit) of storage.
They are accessed using a dedicated QSPI (Quad SPI) bus that can move 4 bits of data at a time instead of just 1 like normal SPI. The bus is configured to run at an 88MHz clock by default, so has a raw burst speed of up to 44Mbyte/sec not including miscellaneous overhead.
This speed is relatively slow in comparison to the built-in RAM1 or RAM2, but still fast enough for many applications and offers significantly more storage capacity for large arrays, data buffers and similar applications. PSRAM can also benefit from a built-in 32KB cache where cached accesses are very fast.
If higher performance is needed, this bus speed can be increased from 88MHz to a maximum of 132MHz by including this code snippet in setup(). We currently test all of our memory at this higher bus speed without noting any speed-related failures.
//**************************
// Reset QSPI clock from 88 MHz to 132 MHz.
CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_OFF);
CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_FLEXSPI2_PODF_MASK | CCM_CBCMR_FLEXSPI2_CLK_SEL_MASK))
    | CCM_CBCMR_FLEXSPI2_PODF(4) | CCM_CBCMR_FLEXSPI2_CLK_SEL(2); // PODF(4) divides by 5, giving about 132 MHz
CCM_CCGR7 |= CCM_CCGR7_FLEXSPI2(CCM_CCGR_ON);
//**************************
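The raw burst figure quoted above follows directly from the bus width and clock. As a rough sketch of that arithmetic, the helper below computes it. The name `qspiBytesPerSecond` is hypothetical, and the result ignores command, address and other overhead.

```cpp
#include <stdint.h>

// A quad-SPI bus moves 4 bits per clock on its 4 data lines.
constexpr uint32_t kQspiDataLines = 4;

// Raw burst throughput in bytes per second for a given bus clock.
// Hypothetical helper for illustration; real transfers include overhead.
constexpr uint32_t qspiBytesPerSecond(uint32_t clockHz) {
  return clockHz * kQspiDataLines / 8;  // bits per second divided by 8
}
```

At the default 88MHz clock this gives 44Mbyte/sec, matching the figure in the text, and at 132MHz the same arithmetic gives 66Mbyte/sec.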
PSRAM is memory mapped in its own dedicated address space. With 8MB of PSRAM it occupies the address range 0x70000000 to 0x707FFFFF, and with 16MB of PSRAM it occupies the range 0x70000000 to 0x70FFFFFF.
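The address ranges above can be derived from the installed size. The following is a minimal sketch of that calculation. The names `kPsramBase`, `psramLastAddress` and `isPsramAddress` are hypothetical helpers, not Teensy core functions.

```cpp
#include <stdint.h>

constexpr uint32_t kPsramBase = 0x70000000u;  // start of the PSRAM range

// Last valid PSRAM address for the installed size in MB (8 or 16).
constexpr uint32_t psramLastAddress(uint32_t sizeMB) {
  return kPsramBase + sizeMB * 1024u * 1024u - 1u;
}

// True when `addr` falls inside the installed PSRAM.
constexpr bool isPsramAddress(uint32_t addr, uint32_t sizeMB) {
  return sizeMB != 0 && addr >= kPsramBase && addr <= psramLastAddress(sizeMB);
}
```

A check like this can be handy when debugging to confirm whether a pointer refers to PSRAM or to internal RAM.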
Keep in mind that there is no initialization done on the PSRAM space, so the memory cannot be assigned data values at compile time. That means the data needs to be added at run time. This also means that the initial data in the PSRAM will be random at power up and should be initialized to zeros or filled with the data you need before using the memory.
Optional Flash Memory
The optional Flash memory is also QSPI like the Teensy 4.1 built-in Flash with similar performance. Both smaller NOR and larger NAND Flash chips are supported in sizes as large as 256MB/2Gb using the built-in LittleFS library.
The main thing to know about the optional Flash memory is that it cannot be used to directly increase the built-in 8MB program space. It can, however, sometimes be used to stretch the program space by off-loading large data structures and the like from program space into the optional Flash chip.
Most commonly, the optional Flash memory is used for datalogging or storage of image or sound files. Since the Flash memory is non-volatile, any data stored will remain until it is intentionally erased. This usage mirrors using the SD card reader on the Teensy 4.1 and there is often a tradeoff to consider whether to use Flash memory or SD card storage in a given application.
Flash memory is available in several technologies and sizes for both cost and performance considerations.
NAND vs NOR Flash
Both NAND and NOR Flash chips provide non-volatile memory that saves its contents during a power cycle. Each has advantages and disadvantages, so the best one to use will depend on the intended application.
NAND Flash
NAND Flash uses very small memory cells arranged in a series architecture. The chip has enough address lines to access blocks of data within the chip but can’t access every data location.
Advantages of NAND Flash compared to NOR Flash
- Higher storage density, so more memory fits in the same size chip package and larger capacities are available.
- Lower cost.
- Higher write and erase speeds.
- Data is accessed serially in larger blocks and can be read at high speed, such as when streaming a single file.
These characteristics make it the technology of choice for USB drives, SSD Drives and SD cards where maximum storage space for lowest price is the main driver. Data is also generally read or written sequentially in large chunks such as when saving a picture to an SD card on a camera.
NOR Flash
NOR Flash uses larger memory cells arranged in a parallel architecture. The chip has enough address lines to directly access every memory location on the chip, similar to how an SRAM works.
Advantages of NOR Flash compared to NAND Flash
- NOR Flash is true random-access memory, so it can reach different data more quickly, making it better suited for applications which need to read and write frequently and in smaller chunks.
- Faster read speeds for smaller and random data access such as when reading multiple files simultaneously.
- Longer potential life. Typically, 100,000 program / erase cycles and 20 years data retention for NOR compared to 60,000 program / erase cycles and 10 years data retention for NAND technology.
These characteristics make it the technology of choice when streaming multiple files, such as several channels of music simultaneously, or when running code directly from Flash where the program needs to bounce around and access different chunks of data.
LittleFS Library
The Flash memory on the Teensy 4.1 uses the LittleFS library that is included in Teensyduino. Note that this library supports both standard SPI and QSPI chips, but only QSPI chips can be used on the bottom of the Teensy 4.1. The full list of currently supported chips can be found in the documentation for the library at https://github.com/PaulStoffregen/LittleFS
To initialize the library, declare a LittleFS_QSPIFlash object for the smaller NOR Flash chips, such as the 16MB/128Mb part, or a LittleFS_QPINAND object for the larger NAND chips, usually in the 128MB/1Gb or 256MB/2Gb sizes.
LittleFS_QSPIFlash myfs; // NOR FLASH
LittleFS_QPINAND myfs; // NAND FLASH 1Gb
The library automatically formats the Flash chip if needed the first time it is used.
This program simply looks for optional PSRAM and Flash memory chips that may be installed and reports what it finds. It uses built-in functions to return the sizes of these devices.
//===============================================================================
// Find Optional Memory Chips on Teensy 4.1
//===============================================================================
#include "LittleFS.h"
extern "C" uint8_t external_psram_size;
//===============================================================================
// Initialization
//===============================================================================
void setup() {
  Serial.begin(115200); // Initialize USB serial port to computer

  // Check for PSRAM chip(s) installed
  uint8_t size = external_psram_size;
  Serial.printf("PSRAM Memory Size = %d Mbyte\n", size);
  if (size == 0) {
    Serial.println("No PSRAM Installed");
  }

  // Check for either NOR or NAND Flash chip installed
  LittleFS_QSPIFlash myfs_NOR; // NOR FLASH
  LittleFS_QPINAND myfs_NAND;  // NAND FLASH 1Gb

  // totalSize() returns a 64-bit value, so cast before printing with %lu
  // Check for NOR Flash chip installed
  if (myfs_NOR.begin()) {
    Serial.printf("NOR Flash Memory Size = %lu Mbyte / ", (uint32_t)(myfs_NOR.totalSize() / 1048576));
    Serial.printf("%lu Mbit\n", (uint32_t)(myfs_NOR.totalSize() / 131072));
  }
  // Check for NAND Flash chip installed
  else if (myfs_NAND.begin()) {
    Serial.printf("NAND Flash Memory Size = %lu bytes / ", (uint32_t)myfs_NAND.totalSize());
    Serial.printf("%lu Mbyte / ", (uint32_t)(myfs_NAND.totalSize() / 1048576));
    Serial.printf("%lu Gbit\n", (uint32_t)(myfs_NAND.totalSize() * 8 / 1000000000));
  }
  else {
    Serial.printf("No Flash Installed\n");
  }
}

void loop() {
  // put your main code here, to run repeatedly:
}
Memory Map
The graphic below is a spin on the PJRC graphic that shows the basic memory architecture of the Teensy 4.1 including the two optional memory chips on the right side of the diagram.
Next we will get into more detail on how this memory can be managed by the user.
Static Allocation of Memory
When the compiler builds your program, all global variables, static variables and compiled code are assigned to dedicated locations in memory. This is called static allocation because the memory addresses are fixed at compile time. By default, allocation tries to use the ultra-fast DTCM and ITCM memory for all this storage.
The following keywords allow you to have some control over where the compiler will place variables and code within the memory. While this can be ignored for typical example programs or other smaller programs, it becomes more important and can become critical as the programs get larger and start to tax the available memory. The programmer can then decide which parts of the program need to remain in fast RAM1 or RAM2 and which can be run from slower memory locations like Flash or PSRAM.
- DMAMEM – Variables defined with DMAMEM (DMA Memory) are placed at the beginning of RAM2. Normally large buffers and arrays are placed here. These variables cannot be initialized at compile time. Your program must write their initial values.
- EXTMEM – Variables defined with EXTMEM (External Memory) are placed into the optional PSRAM chip or chips soldered to the bottom of the Teensy 4.1. These variables also cannot be initialized so your program must write their initial values if needed.
- PROGMEM – Variables defined with PROGMEM (Program Memory) are placed only in the Flash memory and are not copied to RAM1 at run-time. They are accessed normally despite residing in Flash.
- FLASHMEM – Functions defined with FLASHMEM (Flash Memory) are executed directly from Flash. FLASHMEM should be used on startup code and other functions where speed is not important so they don’t take up space in RAM1 if space is getting tight.
- F() – Strings surrounded by F() are also placed only in the Flash memory.
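As a rough illustration of how these keywords appear in a sketch, the example below declares one variable or function with each of DMAMEM, EXTMEM, PROGMEM and FLASHMEM. The names `audioBuffer`, `logBuffer`, `gammaTable` and `initBuffers` are hypothetical. It assumes a PSRAM chip is installed for the EXTMEM variable, and the empty stand-in macros are an assumption added only so the code also compiles on a PC.

```cpp
#if defined(ARDUINO)
#include <Arduino.h>   // provides DMAMEM, EXTMEM, PROGMEM and FLASHMEM on Teensy
#else
// Assumption: empty stand-ins so the example also builds off-target.
#define DMAMEM
#define EXTMEM
#define PROGMEM
#define FLASHMEM
#endif
#include <stdint.h>
#include <string.h>

DMAMEM uint8_t audioBuffer[16 * 1024];                   // large buffer in RAM2
EXTMEM uint8_t logBuffer[256 * 1024];                    // optional PSRAM
PROGMEM const uint8_t gammaTable[4] = {0, 28, 97, 255};  // stays in Flash

// Startup-only code: runs from Flash so it does not take up RAM1.
FLASHMEM void initBuffers() {
  // DMAMEM and EXTMEM variables have no compile-time initial values,
  // so the program writes them before first use.
  memset(audioBuffer, 0, sizeof(audioBuffer));
  memset(logBuffer, 0, sizeof(logBuffer));
  audioBuffer[0] = gammaTable[3];  // PROGMEM data is read like a normal array
}
```

Calling `initBuffers()` once from `setup()` would be the typical use, leaving RAM1 free for time-critical code and variables.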
Allocating Static Variables in PSRAM
You can statically allocate variables in PSRAM at compile time by declaring them with EXTMEM, which places them in the PSRAM address space. The following allocates roughly 1MB of space in the PSRAM chip for holding character data.
EXTMEM char bigBuffer[1000000];
Once PSRAM memory is allocated, it appears as ordinary memory and standard functions can be used to work with it. For instance, strcpy() can be used to copy characters into our PSRAM buffer.
strcpy(bigBuffer, "Hello World");
Similarly, to copy memory from a buffer in RAM1 or RAM2 to bigBuffer in PSRAM, we can use memcpy():
memcpy(bigBuffer, buffer, len);
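Putting these pieces together, a minimal sketch might look like the following. The function `fillBigBuffer` is a hypothetical helper. It clears the buffer first because PSRAM is not initialized at power up, and it clamps the copy length so the buffer is never overrun. The empty EXTMEM stand-in is an assumption that only lets the code build on a PC.

```cpp
#if defined(ARDUINO)
#include <Arduino.h>   // provides EXTMEM on Teensy
#else
#define EXTMEM         // assumption: stand-in so the example builds off-target
#endif
#include <stddef.h>
#include <string.h>

EXTMEM char bigBuffer[1000000];

// Clear the buffer, store a greeting, then append a copy of `data`.
// Returns the number of bytes of `data` that were actually copied.
size_t fillBigBuffer(const void *data, size_t len) {
  memset(bigBuffer, 0, sizeof(bigBuffer));      // PSRAM starts with random data
  strcpy(bigBuffer, "Hello World");
  const size_t offset = strlen(bigBuffer) + 1;  // skip string and terminator
  if (len > sizeof(bigBuffer) - offset) {
    len = sizeof(bigBuffer) - offset;           // clamp to the space remaining
  }
  memcpy(bigBuffer + offset, data, len);
  return len;
}
```

The source pointer can refer to RAM1, RAM2 or Flash. Once the buffer is declared with EXTMEM, the standard library calls treat it like any other array.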
Dynamic Allocation of Memory
As your program runs, it may use all of the RAM which was not reserved by static allocation. Because the specific address for each variable is computed as your program runs, this is called dynamic memory allocation.
Local Variables – Local variables, return addresses from function calls and the saved state from interrupts are placed on a stack which starts from the top of RAM1 and grows downward. The amount of space available for local variables is the portion of RAM1 not used by FASTRUN code and initialized and zeroed variables.
Heap – Memory allocated by using C++ new and C malloc(), as well as Arduino String variables, is placed into RAM2, starting immediately after the DMAMEM variables.
External Heap – If PSRAM has been added, extmem_malloc() can be used to allocate this memory, starting immediately after the EXTMEM variables. When no PSRAM is present, extmem_malloc() automatically allocates memory from the normal heap in RAM2.
Allocating Dynamic Variables in PSRAM
If you want to dynamically allocate PSRAM memory at run time, you can use extmem_malloc() to allocate the PSRAM memory which isn’t already consumed by EXTMEM variables.
extmem_malloc() will attempt to use PSRAM memory, but will automatically fall back to internal RAM2 when a PSRAM chip either isn’t available or it has already been fully allocated.
The function extmem_realloc() resizes a previously allocated block, moving it to a new location if necessary. extmem_free() frees up the previously allocated memory.
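The three calls can be combined as in the sketch below, which allocates a block, grows it and releases it again. The function `demoExternalHeap` is a hypothetical example. The PC stand-ins in the `#else` branch are an assumption that simply maps the calls to the normal heap, much like the fallback described above when no PSRAM is present.

```cpp
#if defined(ARDUINO)
#include <Arduino.h>   // declares extmem_malloc(), extmem_realloc(), extmem_free()
#else
#include <stdlib.h>
// Assumption: PC stand-ins that use the normal heap, for off-target builds.
static void *extmem_malloc(size_t n) { return malloc(n); }
static void *extmem_realloc(void *p, size_t n) { return realloc(p, n); }
static void extmem_free(void *p) { free(p); }
#endif
#include <stdint.h>
#include <string.h>

// Allocate `count` samples, zero them, grow to twice the size, then release.
// Returns true if both allocations succeeded.
bool demoExternalHeap(size_t count) {
  if (count == 0) return false;
  int16_t *samples =
      static_cast<int16_t *>(extmem_malloc(count * sizeof(int16_t)));
  if (samples == nullptr) return false;          // no memory available
  memset(samples, 0, count * sizeof(int16_t));   // contents start undefined

  // The block may move, so keep the old pointer until the resize succeeds.
  void *bigger = extmem_realloc(samples, 2 * count * sizeof(int16_t));
  if (bigger == nullptr) {
    extmem_free(samples);                        // old block is still valid
    return false;
  }
  samples = static_cast<int16_t *>(bigger);
  samples[2 * count - 1] = 42;                   // the new tail is now usable

  extmem_free(samples);
  return true;
}
```

Checking every returned pointer matters here because the allocation can fail once both PSRAM and RAM2 are exhausted.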
How to Get the Optional Memory
Order Teensy 4.1 with Memory Preinstalled
The easy way is to use our Fully Loaded series of Teensy 4.1 modules, which are modified with various memory configurations preinstalled and tested. These include:
- 16MB PSRAM
- 8MB PSRAM and 16MB/128Mbit NOR Flash
- 8MB PSRAM and 128MB/1Gbit NAND Flash
- 8MB PSRAM and 256MB/2Gbit NAND Flash
These memory configurations are available in two different flavors of the Teensy 4.1 depending on planned usage.
The Teensy 4.1 Fully Stuffed version is configured for use with a standard solderless breadboard and similar applications. It includes the additional memory, a male Ethernet header installed on top, and the standard pin headers installed.
The Teensy 4.1 Fully Stuffed for Prototyping System version has all the I/O, such as Ethernet, USB Host, VBAT and VUSB, coming down from the bottom of the Teensy for use with our Prototyping System for Teensy 4.1 baseboard. It also includes a reverse current Schottky diode between VIN/VUSB for isolating power supplies. This version will work with most PCB baseboards where the I/O is being brought down from the Teensy to the baseboard. This version will not work with a solderless breadboard due to the extra pins on the bottom.
DIY Installation
The user can of course also buy the memory chips and solder them on by hand. We offer the chips separately for users that want to go that route. The PSRAM chips are also available from PJRC and the Flash memory can usually be found at distributors like Digikey.
Manually soldering the chips is moderately difficult, mainly because of the need to ensure good solder flow between the pads and the pins of the IC. Using a liberal amount of flux is generally key to getting good solder flow and avoiding solder bridges. Also, don’t try to use a tiny soldering iron tip with a needle point, which is probably the most common mistake. You want something with enough thermal mass to heat the joint quickly and allow the solder to flow well. The ideal tip to use is the Bevel or C-series tip, which looks like a round conical tip with the end cut off at a 45 degree angle, but a standard conical or smaller chisel tip will also work OK.
To solder the IC, first apply a little solder to one corner pad. Then apply liquid flux to all the pads, place the IC on the pads, and reheat that soldered pad and pin to tack the IC in place. Be sure to observe proper orientation of the IC. The opposite corner pin can then be soldered to hold the IC solidly in place. Reheat these two pins and reposition the IC if needed to get the IC reasonably well centered on the pads. A magnifier of some type makes this operation much easier. The other pins can then be soldered to their pads. Applying another dose of liquid flux to the IC pins will help the solder flow well.
The larger capacity NAND Flash memory chips in the large flat WSON package with leads tucked under the body require a little extra care. The package must be centered well on the pads to prevent the metal slug on the bottom of the IC from shorting across the pads under the body of the chip. This metal pad is not electrically tied to anything but can possibly short across adjacent pads. The solder should also be kept to the outer edges of the pads only to avoid the possibility of a solder short to the IC metal pad. This is mainly a concern if using solder paste and hot air reflow.
If hot air reflow is an option, use a solder paste with a low temperature melting point so that the solder paste will reflow before possibly reflowing the solder on the other components on the module.
If also installing the header pins on the module, the memory chips should be soldered first for easiest access and then tested before the pins are installed in case any rework is required.
The flux residue can be cleaned with isopropyl alcohol and a small brush. Pure 99%+ isopropyl is best for cleaning, but standard 70% drugstore stuff will also work OK. Standard flux cleaners will also work.
Going Further
In this section we just touched on the main points of how the Teensy 4.1 memory is organized and can be used in programs. Optimizing the use of memory can be a complex topic beyond the scope of this tutorial and the best source of information and help on this topic can be found on the Teensy Forums.