Building on my exploration of text generation with the NVIDIA Jetson Orin NX, this post delves into the audio and speech capabilities of the Jetson platform.
Transcribing Audio with Whisper
Following the Whisper tutorial, after starting the container with the command below, you can access JupyterLab at https://192.168.68.100:8888 (password: nvidia):
jetson-containers run $(autotag whisper)

Instead of recording my own audio, I used the Free Transcription Example Files.
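Inside JupyterLab, the transcription itself only takes a few lines of Python with the openai-whisper API. Here is a minimal sketch of the kind of cell I ran; the model size and the file name example.mp3 are placeholders, not the exact values from the tutorial notebook:
import whisper

# Load a Whisper checkpoint; the weights are downloaded on first use
model = whisper.load_model("small")

# Load the example file and pad/trim it to Whisper's 30-second window
audio = whisper.load_audio("example.mp3")
audio = whisper.pad_or_trim(audio)

# Transcribe and print the recognized text
result = model.transcribe(audio)
print(result["text"])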
After downloading the necessary models, here is the transcription output for the first 30 seconds:

Text LLM and ASR/TTS with Llamaspeak
To start Llamaspeak with text LLM and ASR/TTS enabled, use the following command. Make sure your Hugging Face token is correctly set; I do this by adding HF_TOKEN=hf_xyz123abc456 to my .bashrc file.
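For reference, the token setup is just an export line in ~/.bashrc, reloaded so the current shell picks it up; the token value below is a placeholder:
# Append to ~/.bashrc (replace the value with your own Hugging Face token)
export HF_TOKEN=hf_xyz123abc456
# Reload the file in the current shell
source ~/.bashrc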
Start the nano_llm with:
jetson-containers run --env HUGGINGFACE_TOKEN=$HF_TOKEN \
$(autotag nano_llm) \
python3 -m nano_llm.agents.web_chat --api=mlc \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--asr=riva --tts=piper
That’s all for this post. I’ll continue my journey with the Two Days to a Demo series, an introductory set of deep learning tutorials for deploying AI and computer vision in the field using NVIDIA Jetson Orin!
Optional - Preparing RIVA Server
Following the Speech AI Tutorial, you will need to sign up for NGC, generate an API key, and configure the NGC setup. Here are the commands to get started:
sudo gedit /etc/docker/daemon.json
# Add the line:
"default-runtime": "nvidia"
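# After the edit, /etc/docker/daemon.json typically looks like this on JetPack
# (the "runtimes" block is usually already present; only "default-runtime" is new):
# {
#     "runtimes": {
#         "nvidia": {
#             "path": "nvidia-container-runtime",
#             "runtimeArgs": []
#         }
#     },
#     "default-runtime": "nvidia"
# }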
sudo systemctl restart docker
# Add your user to the Docker group
sudo usermod -aG docker $USER
newgrp docker
# Download the NGC CLI
wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_arm64.zip && unzip ngccli_arm64.zip && chmod u+x ngc-cli/ngc
find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5
# Add the NGC CLI to your PATH in .bashrc and source it
export PATH=$PATH:/home/pi/ngc-cli
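# Reload .bashrc so the ngc command is on PATH in the current shell
# (assuming the export line above was appended to ~/.bashrc)
source ~/.bashrc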
# Configure NGC
ngc config set
# Download RIVA Quickstart
ngc registry resource download-version nvidia/riva/riva_quickstart_arm64:2.12.0
cd riva_quickstart_arm64_v2.12.0
sudo bash riva_init.sh
# Start the RIVA server in a Docker container
bash riva_start.sh
This RIVA server setup was intended for use with Llamaspeak, but due to a CUDA driver version error I was unable to proceed with the speech testing. The error appeared in the output of docker logs riva-speech:
