Building on my exploration of text generation with NVIDIA Jetson Orin NX, this post delves into the audio generation capabilities of the Jetson platform.
Transcribing Audio with Whisper
Following the Tutorial Whisper, after starting the container with the command below, you can access Jupyter Lab at https://192.168.68.100:8888 (password: nvidia):
jetson-containers run $(autotag whisper)
Instead of recording my own audio, I used the Free Transcription Example Files.
After downloading the necessary models, here is the transcription output for the first 30 seconds:
Text LLM and ASR/TTS with Llamaspeak
To start Llamaspeak with text LLM and ASR/TTS enabled, use the following command. Make sure your Hugging Face token is correctly set; I do this by adding HF_TOKEN=hf_xyz123abc456 to my .bashrc file.
Start the nano_llm with:
jetson-containers run --env HUGGINGFACE_TOKEN=$HF_TOKEN \
$(autotag nano_llm) \
python3 -m nano_llm.agents.web_chat --api=mlc \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--asr=riva --tts=piper
That’s all for this post. I’ll continue my journey with the Two Days to a demo series, an introductory set of deep learning tutorials for deploying AI and computer vision in the field using NVIDIA Jetson Orin!
Optional - Preparing RIVA Server
Following the Speech AI Tutorial, you will need to sign up for NGC, generate an API key, and configure the NGC setup. Here are the commands to get started:
sudo gedit /etc/docker/daemon.json
# Add the line:
“default-runtime”: “nvidia”
sudo systemctl restart docker
# Add your user to the Docker group
sudo usermod -aG docker $USER
newgrp docker
# Download the NGC CLI
wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_arm64.zip && unzip ngccli_arm64.zip && chmod u+x ngc-cli/ngc
find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5
# Add the NGC CLI to your PATH in .bashrc and source it
export PATH=$PATH:/home/pi/ngc-cli
# Configure NGC
ngc config set
# Download RIVA Quickstart
ngc registry resource download-version nvidia/riva/riva_quickstart_arm64:2.12.0
cd riva_quickstart_arm64_v2.12.0
sudo bash riva_init.sh
# Start the RIVA server in a Docker container
bash riva_start.sh
This section was intended for Llamaspeak, but due to a CUDA driver version error, I was unable to proceed with the speech testing. The error occurred when running the command docker logs riva-speech: