This project proposes to restructure and improve the existing Bela software to allow compatibility with, and an easier transition to, newer Texas Instruments Sitara processors (like the AM5729 in the BeagleBone AI).
As given on the official website, Bela is a hardware and software system for creating beautiful interaction with sensors and sound.
Bela has a lot of analog and digital inputs and outputs for hooking up sensors and controlling other devices, and most importantly Bela has stereo audio i/o allowing you to interact with the world of sound.
All Bela systems so far use the same Bela software. It uses a customized Debian distribution which, most notably, uses a Xenomai kernel instead of a stock kernel. Xenomai is a co-kernel for Linux that makes it possible to achieve hard real-time performance on Linux machines (ref: xenomai.org). It thus takes advantage of features of the BeagleBone computers and can achieve extremely low-latency audio and sensor processing times.
Although the proposal title mentions support for the BeagleBone AI (BBAI), I have developed a standardized setup that allows an easier jump across TI chips.
Bela and BB
Bela systems have used BeagleBoard computers from the very beginning. Bela uses the BeagleBone Black, and Bela Mini uses the PocketBeagle.
The BeagleBone Black, the PocketBeagle, and also the BBAI feature programmable real-time units, or PRUs, which are central to the way Bela works. These PRUs enable Bela's ultra-low latency processing: they are fast (200MHz, 32-bit) processors with single-cycle I/O access to a number of the board's pins, as well as full access to the internal memory and peripherals.
Applications of Bela:
Bela is ideal for creating anything interactive that uses sensors and sound. So far, Bela has been used to create:
Why add support for BBAI/newer TI chips?
The BeagleBone Black was launched over 7 years ago, in 2013, and newer and better TI Sitara processors have been launched since. It would be better to have a more standardized setup that allows an easier jump across TI chips. Newer boards with different and more efficient chips, like the AM5X and the TI C66x digital-signal-processor (DSP) cores in the BBAI, are coming up and will need to be compatible with the Bela software and hardware.
Programming languages and tools to be used:
C, C++, PRU, dtb, GNU Make, ARM Assembly
The hardware was partially working on the BBAI using only ALSA (Advanced Linux Sound Architecture) and the SPI driver. However, the Bela real-time code on ARM and PRU was not running on the BBAI yet.
This project involved dealing with pinmuxing (using overlays), PRU assembly, and C and C++ for Linux user space applications. I also had to study the Technical Reference Manuals of the Sitara family of SoCs (the AM5729 and the AM335x).
What is RProc?
The remoteproc framework allows different platforms/architectures to control (power on, load firmware, power off) those remote processors while abstracting the hardware differences, so the entire driver doesn't need to be duplicated. In addition, this framework also adds rpmsg virtio devices for remote processors that support this kind of communication. This way, platform-specific remoteproc drivers only need to provide a few low-level handlers.
Reference: kernel.org
What is a Device Tree Overlay?
Sometimes it is not convenient to describe an entire system with a single FDT (Flattened Device Tree). For example, processor modules that are plugged into one or more modules (a la the BeagleBone), or systems with an FPGA peripheral that is programmed after the system is booted. For these cases it is proposed to implement an overlay feature so that the initial device tree data can be modified by userspace at runtime by loading additional device tree overlays that amend the original data.
Pinmuxing
A pin diagram from a MathWorks forum helped greatly in visualizing and comparing the pins on the BeagleBone Black versus the BeagleBone AI.
Also, in order to write a DTO (Device Tree Overlay) using the CCL (Cape Compatibility Layer), I referred to am572x-bone-common-univ.dtsi, which helps one understand the names and references to the pinmuxes.
How is an overlay compiled?
`dtc` (Device Tree Compiler) converts between the human-editable device tree source `dts` format and the compact device tree blob `dtb` representation usable by the kernel or assembler source.
Once an overlay is compiled, it generates a `.dtbo` file which we can then use in the next stage to load the overlay.
To know more on how to compile and load the overlay, just head over to overlay-instructions.
XENOMAI kernel
Xenomai is a Free Software project in which engineers from a wide background collaborate to build a robust and resource-efficient real-time core for Linux© following the dual kernel approach, for applications with stringent latency requirements.
The Xenomai kernel (`v4.19.94-ti-xenomai-r64`) has been built and tested on the BBAI with a few minor bugs. I installed the Xenomai kernel through the default procedure for updating the kernel and libraries, which I have documented here. I have successfully built the entire Bela core code without needing to modify any Xenomai-dependent syntax.
Hardware used
Hardware required: The hardware listed below was used for testing if my code implementation works correctly.
The places within the Bela core code that required intervention are:
Addition of an `IS_AM572x` flag that is automatically set to `1` on the BBAI and not set on the BBB. Updated the workflow to build the PRU code for remoteproc. Also implemented auto-detection of which processor the code is being compiled on, which is passed as a compile-time flag to the Bela PRU and core code.
New flags and their brief description:
Flag Name | Values | Description |
---|---|---|
IS_AM572x | 1 and 0 | Set to 1 on BBAI; set to 0 on BBB |
ENABLE_PRU_UIO | 1 and 0 | Tells PruManager to use the UIO+libprussdrv implementation. Set to 0 on BBAI; set to 1 on BBB |
ENABLE_PRU_RPROC | 1 and 0 | Tells PruManager to use the RProc+Mmap implementation. Set to 1 on BBAI; set to 0 on BBB |
BOARD_COMMON_FLAGS | -DIS_AM572x | Only gets set on BBAI |
firmwareBelaRProcNoMcaspxxxx | build/pru/pru_rtaudio.out | Mainly useful with RProc, for passing the .out file path of the non-McASP-IRQ PRU code to PruManager |
firmwareBelaRProcMcaspxxxx | build/pru/pru_rtaudio_irq.out | Mainly useful with RProc, for passing the .out file path of the McASP-IRQ PRU code to PruManager |
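As a rough illustration of how the flags in the table could interact, the sketch below derives the two `ENABLE_PRU_*` flags from `IS_AM572x` when they are not given on the command line. The default values and the helper `selectedPruBackend()` are assumptions made for this example; the actual Bela build passes these flags from its Makefile.

```cpp
#include <string>

// Assumption for this sketch: a missing flag means "BBB" (IS_AM572x == 0).
// Passing -DIS_AM572x on the compiler command line defines it as 1.
#ifndef IS_AM572x
#define IS_AM572x 0
#endif

// Assumed defaults mirroring the table: RProc on the BBAI, UIO on the BBB.
#ifndef ENABLE_PRU_RPROC
#define ENABLE_PRU_RPROC IS_AM572x
#endif
#ifndef ENABLE_PRU_UIO
#define ENABLE_PRU_UIO (!IS_AM572x)
#endif

// Hypothetical helper: reports which PRU backend this build would use.
std::string selectedPruBackend() {
#if ENABLE_PRU_RPROC
    return "RProc+Mmap";
#elif ENABLE_PRU_UIO
    return "UIO+libprussdrv";
#else
    return "none";
#endif
}
```

Compiled without any flags, this sketch picks the UIO backend; compiled with `-DIS_AM572x`, it picks RProc.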
Workflow to build pasm code
Although pasm is outdated, the binary it generates is still valid for the PRU. The issue is that it does not generate a file ready to be packaged up as a valid `.out` ELF which remoteproc would recognise and load as firmware into the PRU.
To solve this issue, we came up with the following workflow to build the PRU firmware to be used by RProc:
1. Assemble the PRU code with `pasm -V2 -L -c -b`.
2. Disassemble the result and include it in `.resources/rproc-build/rproc-template.c` as an `__asm__` directive (i.e.: add quotes and prepend a space at the beginning of each line): `prudis $< | sed 's/^\(.*\)$$/" \1\\n"/' > $(RPROC_INCLUDED_ASSEMBLY)`
3. Build the `.c` file mentioned above with the regular clpru toolchain: `clpru -fe $(RPROC_TMP_FILE).o $(RPROC_TEMPLATE) -v3 --endian=little --include_path=$(RPROC_BUILD_DIR) --include_path=$(RPROC_INCLUDE) --include_path=/usr/lib/ti/pru-software-support-package/include`, followed by `lnkpru -o $(RPROC_TMP_FILE).out $(RPROC_TMP_FILE).o --stack_size=0x0 --heap_size=0x0 -m $(RPROC_TMP_FILE).map $(RPROC_CMD)`
4. Some care is needed with the `QBA`/`QBBx` instructions in there, so in the `dd` step, `dd if=$< of=$(RPROC_TMP_FILE).out bs=1 obs=1 seek=52 conv=notrunc status=none`, the `.out` file is overwritten from byte offset 52 with the `.bin` that had been created by `pasm`.

PruManager brings the RProc and UIO PRUSS (using the libprussdrv API) implementations together under one roof.
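To make the `dd` step of the workflow above more concrete, here is a minimal C++ sketch of the same byte-patching idea: the contents of one file are written into another starting at a given offset, without truncating the target. The function name `patchFile` is hypothetical. Offset 52 comes from the `dd` command and happens to match the size of a 32-bit ELF header. Unlike `dd`, this sketch expects the target file to exist already.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper mirroring `dd if=bin of=out bs=1 seek=<offset> conv=notrunc`:
// overwrite bytes of outPath starting at `offset` with the contents of binPath.
bool patchFile(const std::string& binPath, const std::string& outPath,
               std::streamoff offset) {
    assert(offset >= 0);

    // Read the whole payload (the pasm-generated .bin in the workflow).
    std::ifstream bin(binPath, std::ios::binary);
    if (!bin) return false;
    std::vector<char> payload((std::istreambuf_iterator<char>(bin)),
                              std::istreambuf_iterator<char>());

    // Opening with in|out keeps the existing contents (no truncation).
    std::fstream out(outPath, std::ios::in | std::ios::out | std::ios::binary);
    if (!out) return false;

    out.seekp(offset);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    return static_cast<bool>(out);
}
```

Bytes before the offset and after the end of the payload are left untouched, which is what `conv=notrunc` achieves in the original command.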
Transitioning from libprussdrv to rproc:
I initially believed that I needed to change the initialization code in `PRU.cpp`, which currently relies on `libprussdrv`, and move to using `rproc`. I was not sure whether `rproc` provides some functionality to access the PRU's RAM the way `prussdrv_map_prumem()` used to, which essentially gives access to a previously mmap'ed area of memory.
On the latest Bela code there's a `Mmap` class which can make this somewhat simpler (ref. here).
For this transition, maintaining backward compatibility was also quite essential. This is what the `PruManager` class achieves. Below is a rough structure of the class `PruManager`:
`class PruManager` is an abstract base class.
- `void stop();` as the name suggests, stops the PRU.
- `int start(bool useMcaspIrq);` The first start is called by the `PRU.cpp` code where required, where `useMcaspIrq` is a flag that decides which PRU code is to be used out of `pru_rtaudio.p` and `pru_rtaudio_irq.p`.
- `int start(const std::string& path);` The second start is called within the first one after the choice of PRU code is made. This function then does the job of loading the firmware file and starting the PRU.
- `void* getOwnMemory();` Each PRU has its own 8-KiB data RAM per PRU CPU (signified RAM0 for PRU0 and RAM1 for PRU1) and 32-KiB general purpose memory RAM (signified RAM2) shared between PRU0 and PRU1. (ref. prucookbook)
- `void* getSharedMemory();` Refer to the DATA RAM2 (shared) block in the diagram from the AM572x Ref. Manual.

The classes below are children of the above `PruManager` virtual base class.
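The structure described above can be sketched roughly as follows. This is not the actual Bela header: the class names `PruManagerSketch` and `FakePruManager` are hypothetical, the fake backend only records what it was asked to do, and the `.out` firmware paths are the ones listed in the flags table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of an abstract base class with the five methods listed above.
class PruManagerSketch {
public:
    virtual ~PruManagerSketch() = default;
    virtual void stop() = 0;
    // First start: pick the firmware, then delegate to the second start.
    int start(bool useMcaspIrq) {
        const std::string path = useMcaspIrq ? "build/pru/pru_rtaudio_irq.out"
                                             : "build/pru/pru_rtaudio.out";
        return start(path);
    }
    // Second start: load the firmware file and start the PRU.
    virtual int start(const std::string& path) = 0;
    virtual void* getOwnMemory() = 0;
    virtual void* getSharedMemory() = 0;
};

// Hypothetical in-memory backend, standing in for the RProc or UIO children.
class FakePruManager : public PruManagerSketch {
public:
    using PruManagerSketch::start;  // keep start(bool) visible in this class
    void stop() override { running = false; }
    int start(const std::string& path) override {
        assert(!path.empty());
        loadedFirmware = path;
        running = true;
        return 0;
    }
    void* getOwnMemory() override { return ownRam.data(); }
    void* getSharedMemory() override { return sharedRam.data(); }

    std::string loadedFirmware;
    bool running = false;

private:
    std::vector<unsigned char> ownRam = std::vector<unsigned char>(8 * 1024);
    std::vector<unsigned char> sharedRam = std::vector<unsigned char>(32 * 1024);
};
```

Keeping the firmware choice in the base class means each backend only has to implement loading and starting, which is one plausible reason for having the two `start` overloads.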
`class PruManagerRprocMmap` is responsible for the RProc implementation. The RProc Mmap class is named so because we are also using the `Mmap.h` header mentioned above to access `/dev/mem` on the BeagleBones to read or write to desired global memory locations.
- It has `prussAddresses`, which is a vector of type `<uint32_t>` and is responsible for keeping the base addresses of the 2 PRU subsystems.
- `getOwnMemory()` then accesses the DATA RAM using an object of class `Mmap` called `ownMemory`. The size of `dataram0` and `1` (shown in the PRU-ICSS block diagram) is `8-KiB`.
- The `getSharedMemory` function then accesses the shared RAM using an object of class `Mmap` called `sharedMemory`. The size of the shared RAM is `32-KiB`.
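The memory access described above can be sketched as follows. This is not Bela's `Mmap` class: the helper names are hypothetical, and the offsets of the data RAMs inside a PRU subsystem (0x0000, 0x2000 and 0x10000) are assumptions taken from TI's PRU-ICSS memory map that should be checked against the Technical Reference Manual. Mapping `/dev/mem` needs root privileges on the board.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Assumed offsets inside one PRU subsystem; sizes are the ones quoted above.
const uint32_t kDataRam0Offset = 0x00000;
const uint32_t kDataRam1Offset = 0x02000;
const uint32_t kSharedRamOffset = 0x10000;
const std::size_t kOwnRamSize = 8 * 1024;
const std::size_t kSharedRamSize = 32 * 1024;

// Physical address of the data RAM belonging to PRU 0 or PRU 1.
inline uint32_t ownRamAddress(uint32_t prussBase, unsigned pruNum) {
    assert(pruNum < 2);
    return prussBase + (pruNum == 0 ? kDataRam0Offset : kDataRam1Offset);
}

// Physical address of the RAM shared between both PRUs of a subsystem.
inline uint32_t sharedRamAddress(uint32_t prussBase) {
    return prussBase + kSharedRamOffset;
}

// Map `size` bytes of physical memory through /dev/mem.
// Returns nullptr on failure. The mapping is never unmapped in this sketch.
inline void* mapPhysical(uint32_t physAddr, std::size_t size) {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return nullptr;
    // mmap offsets must be page aligned, so map from the enclosing page.
    const uint32_t pageSize = static_cast<uint32_t>(sysconf(_SC_PAGESIZE));
    const uint32_t pageBase = physAddr & ~(pageSize - 1);
    const std::size_t delta = physAddr - pageBase;
    void* p = mmap(nullptr, size + delta, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, static_cast<off_t>(pageBase));
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;
    return static_cast<char*>(p) + delta;
}
```

With a subsystem base address taken from `prussAddresses`, a call such as `mapPhysical(ownRamAddress(base, 0), kOwnRamSize)` would then play the role of `getOwnMemory()`.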
`class PruManagerUio` is basically a ditto implementation of the `libprussdrv` approach that was being used earlier. This class is mainly useful to maintain backward compatibility with v4.14 on the BBB+Bela. It does not support the AM572x processor.
The choice between the two is made by `ENABLE_PRU_RPROC`, which tells the core code to use the RProc implementation, or else `ENABLE_PRU_UIO`, which tells the core code to keep using the old `libprussdrv` implementation.
In `pru/pru_rtaudio.p` the hard-coded McASP, SPI and GPIO constants were replaced with board-dependent ones using `board_specific.h`.
`pru/board_specific.h` uses the `IS_AM572x` flag to set the proper base constants depending on which board it is compiling on. The GPIO, `CLOCK_MCASP1`, MCASP1, `CLOCK_SPI2`, SPI2, and a few other base addresses needed changing for the AM572x. For more details visit my PR.
Other places like `Gpio.cpp`, `bela_hw_settings.h`, and a few other files also needed their base addresses updated to include the new AM572x constants. Those changes can also be viewed all at once here.
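A C++ analogue of the board-dependent constants might look like the sketch below. The names `kGpio1Base` and `gpioRegister` are hypothetical, and the two GPIO1 base addresses and the `0x13C` data-out offset are quoted from memory of the AM335x and AM572x memory maps, so they should be verified against the Technical Reference Manuals before use.

```cpp
#include <cstdint>

// Assumption for this sketch: a missing flag means "not an AM572x".
#ifndef IS_AM572x
#define IS_AM572x 0
#endif

#if IS_AM572x
// AM572x (BBAI) GPIO1 base address (assumed, verify in the TRM).
constexpr uint32_t kGpio1Base = 0x4AE10000u;
#else
// AM335x (BBB) GPIO1 base address (assumed, verify in the TRM).
constexpr uint32_t kGpio1Base = 0x4804C000u;
#endif

// Hypothetical helper: absolute address of a GPIO1 register.
constexpr uint32_t gpioRegister(uint32_t offset) {
    return kGpio1Base + offset;
}
```

Code that previously used a hard-coded address can then call `gpioRegister(0x13C)` and pick up the right base for the board it is compiled on.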
2 device tree overlays were also created using the CCL,
aplay
.and, a debugger for PRU called PRUDebug was ported to the BBAI.
Created a device tree overlay using Cape Compatibility layer to port BB-BONE-AUDI overlay to the BBAI.
The overlay I wrote has been accepted by BeagleBone maintainer Robert Nelson, and you can find it here.
Created a BBAI-BELA-00A1 device tree overlay which helps in setting the right pinmux for BELA.
Installed a Xenomai patched kernel and ran the full Bela stack.
beagleboard/BeagleBoard-DeviceTrees BBAI-AUDI-02-00A0 overlay using the CCL PR#33
BBAI-AUDI-02-00A0.dts: Solved the output audio frequency issue in PR#36
Bela: PruManager Rproc + MMap/ prussdrv+UIO implementation PR#1
MarkAYoder/BeagleBoard-exercises: prudebug: Add BBAI support PR#7
Due to the introduction of a new concept called `IRQ_CROSSBAR` for handling interrupts from peripherals in the AM572x chips, porting the existing Bela code that uses interrupts proved to be a bit complicated.
After going through the AM572x manual, a workflow was suggested. However, on testing this workflow, a few steps still seem to be missing.
Essentially, what we were trying to achieve was McASP-to-PRU interrupts like those in the pru_rtaudio_irq.p code.
Materials referred were: AM572x sitara manual and PRU-ICSS Migration Guide
TLDR of shortcomings:
- McASP-to-PRU interrupts through `IRQ_CROSSBARS`, which the Bela team and I have been working on for a few days. This may take some more time to implement. (However, this shouldn't be much of a deal-breaker for most users.)

This project adds support for the Bela cape + Xenomai + PRU on the BeagleBone AI, and the code will now also be easier to port to other Texas Instruments systems-on-chip.
By going through the steps needed to have the Bela environment running on BBAI, we will go through refactoring and rationalization, using mainline drivers and APIs where possible. This will make Bela easier to maintain and to port to new platforms, benefiting the project's longevity and allowing it to expand its user base.
-Giulio Moro
Just ordered a Bela cape. The platform seems really cool and exactly what I was looking for :) Just thought I would add that, I would also be very interested in having a bit more processing power under the hood and the ai would definitely be enough for my purposes.
-A User on Bela Forum