Porting Llama3.java to Micronaut

- 3 mins read
This project ports the original single-file llama3.java by Alfonso² Peterssen into a modular Micronaut application, transforming it from a console app into a stream-based, production-ready API. The inference internals and performance of the original GGUF-based engine remain unchanged. The work focuses on restructuring the codebase into logical packages and adapting it to Micronaut’s ecosystem, enabling easier integration with Java microservices. This streamlined design retains the original’s simplicity while improving scalability and usability for modern AI applications.

This post explores porting the single-file llama2.java into a robust Micronaut application, comparing performance in JDK mode and GraalVM native mode. While GraalVM offers faster startup (56ms), its serving throughput is lower (210 tokens/sec). For high-performance inference, JDK mode is preferred, but GraalVM shines in lightweight, startup-critical scenarios. The journey highlights Micronaut’s ease of integration for AI applications.

Learn how to build an AI-powered knowledge assistant using Python. This guide covers data ingestion from sources like PDFs, deploying a FastAPI-based API, and querying the assistant for detailed answers. Using advanced retrieval mechanisms, the assistant provides contextually relevant responses. You’ll also explore a working example with the CQRS design pattern. The complete project code is available on GitHub.

In this post, I dive into the powerful capabilities of the Raspberry Pi 5 paired with the Hailo-8L AI Kit, a neural network accelerator offering 13 TOPS. After setting up the system and camera, I explore object detection, pose estimation, and instance segmentation, showcasing the Pi’s impressive AI potential. Whether using pre-trained models or custom configurations, this compact setup proves to be a versatile tool for AI enthusiasts and developers alike. It’s a fun, hands-on introduction to the world of edge AI on a budget-friendly platform.

GPT-2 Training Guide

- 18 mins read
This post documents my journey training GPT-2 on the Tiny Shakespeare dataset, inspired by Andrej Karpathy’s instructional video and nanoGPT repository. Following each step, I detail the setup process, from encoding data and optimizing the model with AdamW, to improving training stability with mixed precision and flash attention. The post includes practical insights on using pretrained weights, weight sharing, and efficient data handling, concluding with sample outputs from training and sampling a “baby” GPT model.
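The “encoding data” step mentioned above can be sketched in a few lines. This is a minimal character-level tokenizer over Tiny Shakespeare, as in nanoGPT’s shakespeare_char preparation; the post may instead use GPT-2’s BPE tokenizer, so treat this purely as an illustration:

```python
# Hypothetical sketch: character-level encoding of Tiny Shakespeare,
# in the style of nanoGPT's shakespeare_char data prep (not the post's exact code).
text = "First Citizen:\nBefore we proceed any further, hear me speak."

chars = sorted(set(text))                      # vocabulary: the unique characters
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

assert decode(encode("hear me speak")) == "hear me speak"  # round-trip check
```

The resulting integer ids are what the training loop batches into `(x, y)` pairs, with `y` shifted one position ahead of `x`.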

GPT-2 Setup and Pretraining Guide

- 12 mins read
This guide explores reproducing GPT-2 (124M) using Andrej Karpathy’s video walkthrough. It begins with an overview of the GPT-2 architecture, a decoder-only transformer model inspired by “Attention Is All You Need.” Using pretrained GPT-2 weights, we analyze and initialize a custom GPT class, with detailed steps to handle token embeddings, causal attention, and layer normalization. The guide includes code for generating text from pretrained weights. In the next segment, we’ll continue with a deeper dive into dataset preparation and training from scratch, moving from small samples to large-scale training.
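To make the generation step concrete, here is a stdlib-only sketch of temperature plus top-k sampling over a logits vector. The post works with PyTorch tensors and the real GPT-2 head; the function name `top_k_sample` and the plain-list representation are mine, for illustration only:

```python
import math
import random

def top_k_sample(logits: list[float], k: int, temperature: float = 1.0,
                 rng: random.Random = random) -> int:
    """Sample a token index: keep the k largest logits, softmax, then draw.

    Hypothetical stand-in for the tensor version used when generating text
    from pretrained GPT-2 weights (the post uses torch.topk/multinomial).
    """
    # Indices of the k largest logits.
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax over the kept logits (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in idx]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF draw.
    r = rng.random()
    acc = 0.0
    for i, p in zip(idx, probs):
        acc += p
        if r <= acc:
            return i
    return idx[-1]  # guard against floating-point rounding

assert top_k_sample([0.1, 5.0, 0.2], k=1) == 1  # k=1 degenerates to argmax
```

Lower temperatures sharpen the distribution toward the argmax; larger k admits more of the tail.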

This post details the installation of MLflow and Kubeflow on a Talos HomeLab cluster. It covers the setup process, including Talos configuration, local-path and NFS provisioning, and MetalLB installation. Step-by-step instructions are provided for deploying Kubeflow, followed by the installation of MLflow for managing the machine learning lifecycle. Finally, the post illustrates how to log experiments and models in MLflow and perform inference, demonstrating a seamless integration of these tools for enhanced machine learning operations.

In this guide, we walk through deploying the Google Cloud Microservices Demo locally using Talos Linux in a VirtualBox VM. This step-by-step tutorial replicates a GKE environment, enabling seamless integration of microservices in a local Kubernetes cluster. We cover Talos installation, setting up the microservices demo, integrating MetalLB for load balancing, and optional Istio service mesh and Kiali observability for advanced monitoring. This approach is perfect for developers looking to simulate cloud environments locally and ensure their microservices run smoothly before deploying to production.

Learn how to set up and control the RoArm-M2-S robotic arm using ROS2 in a virtual environment. This guide walks you through installing VirtualBox, creating a custom Ubuntu VM, setting up ROS2 with MoveIt2, and running driver nodes to control the arm through RViz, MoveIt, keyboard, and a web interface.

This post documents using a Raspberry Pi 4 as the main controller for the Wave Rover project. It covers installing Raspberry Pi OS, setting up the UGV_RPI package, enabling VNC, and configuring the Pi camera. Additionally, it details how to control the rover’s movement with Python and stream camera feeds via Flask. A sample HTML interface is provided for controlling the rover with arrow keys or buttons. Troubleshooting steps are included for resolving issues related to Picamera2 and V4L2 errors during the setup.
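As a rough illustration of the Flask streaming mechanics mentioned above: a camera route typically yields `multipart/x-mixed-replace` parts, one per JPEG frame. This stdlib-only sketch shows only the frame-wrapping logic; the helper name `mjpeg_part` is mine, not from the post, and the real route would pull frames from Picamera2:

```python
def mjpeg_part(frame: bytes, boundary: bytes = b"frame") -> bytes:
    """Wrap one encoded JPEG frame as a multipart/x-mixed-replace part.

    A Flask /video_feed generator usually yields bytes shaped like this,
    with the response mimetype set to
    'multipart/x-mixed-replace; boundary=frame'.
    """
    return (b"--" + boundary + b"\r\n"
            b"Content-Type: image/jpeg\r\n"
            b"Content-Length: " + str(len(frame)).encode() + b"\r\n\r\n"
            + frame + b"\r\n")

part = mjpeg_part(b"\xff\xd8fake-jpeg-bytes\xff\xd9")
assert part.startswith(b"--frame\r\n")
```

The browser replaces the displayed image each time a new part arrives, which is what makes the `<img>` tag in the sample HTML interface behave like a live video feed.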