2023

[Artificial Intelligence] Boosting Inference Speed: SSD and GPU Acceleration

I document migrating my Windows 11 installation to a Lexar NM790 SSD and speeding up Langchain4j inference with GPU acceleration. Using Clonezilla, the move to the new SSD went through without problems and gave a noticeable speed boost. For GPU acceleration, I installed CUDA and the NVIDIA Container Toolkit, then launched the LocalAI image in a GPU-enabled Docker container, which markedly increased Langchain4j's inference speed.

2023

[Artificial Intelligence] RAG over Java code with Langchain4j

In this post, I integrate Retrieval-Augmented Generation (RAG) with Java code using Langchain4j. Taking inspiration from RAG over code, I use JavaParser to analyze a codebase. Two services, JavaParsingService and EmbeddingStoreService, drive the integration and let users load Java projects and query them. The updated controller exposes endpoints for interacting with the loaded code. The post covers the full path from codebase ingestion to querying with models such as gpt4all-j, WizardLM, and OpenAI.
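To make the retrieve-then-prompt idea concrete, here is a minimal, language-agnostic sketch in Python. It is not the Langchain4j code from the post: the `embed` function is a toy token-count stand-in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helper names chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lower-cased token counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, snippets: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Return the top_k snippets ranked by similarity to the question."""
    q = embed(question)
    scored = [(cosine(q, embed(s)), s) for s in snippets]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


def build_prompt(question: str, snippets: List[str], top_k: int = 2) -> str:
    """Augment the question with the retrieved snippets before sending it to a model."""
    context = "\n\n".join(s for _, s in retrieve(question, snippets, top_k))
    return f"Use the following Java code to answer.\n\n{context}\n\nQuestion: {question}"
```

In a real setup the snippets would be the parsed methods or classes of the project, and the prompt would be sent to whichever model is configured.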

2023

[Artificial Intelligence] Building an AI Application with Langchain4j

I build an AI application in Java with Langchain4j and a local language model. Using Spring Boot, Postman, and several Langchain4j components, I cover project setup, a chat service, custom tools, embeddings with Chroma, and translation, persistence, retrieval, and streaming services. The post is meant as a guide to building your own AI applications with Langchain4j in Java.

2023

[Artificial Intelligence] Unlocking the Power of Machine Learning with MLC LLM

I look at MLC LLM, a universal deployment solution for large language models. The post walks through the setup and its key components, TVM and Conda: installing TVM via pip, setting up Conda on WSL, and installing the Vulkan SDK for optimal performance. For MLC Chat, I create a Conda environment, run the CLI version of MLC LLM, and try it out with a sample question. 🚀🧠

2023

[Artificial Intelligence] Utilizing vLLM for Efficient Language Model Serving

I explore vLLM for Large Language Model (LLM) inference and serving. The post describes my experience setting up vLLM on a Windows Subsystem for Linux (WSL) instance running Ubuntu 22.04, including installing WSL, the NVIDIA GPU drivers, the CUDA Toolkit, and Docker. I then install vLLM inside the NVIDIA PyTorch Docker image and launch the API server. Finally, I show how to query the model and create a Docker image snapshot.
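As a rough illustration of the querying step, the sketch below builds an HTTP request for a locally running server. It assumes the server exposes an OpenAI-compatible `/v1/completions` endpoint on `http://localhost:8000`; the endpoint, port, model name, and the helper `build_completion_request` are assumptions for illustration, not details taken from the post.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    model: str,
    base_url: str = "http://localhost:8000",  # assumed host and port
    max_tokens: int = 64,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a text completion."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",  # assumed OpenAI-compatible path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response; needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `send(build_completion_request("Hello", model="my-model"))` would return the decoded response, where `my-model` is a placeholder for whatever model the server was started with.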

2023

[Home Lab] Setting up K3s

I set up K3s, a lightweight Kubernetes distribution, in my home lab. Step by step, I install K3s on Ubuntu Server 22.04.2 LTS, create aliases that simplify working with K3s, and verify the installation. I also introduce Portainer to manage Docker and Kubernetes in my home lab: setting it up, adding a Kubernetes environment, and connecting it to the K3s cluster. Finally, I set up a local Docker registry and show optional steps for pushing and deploying Docker images within the K3s cluster.

2023

[Artificial Intelligence] Unleashing the Power of LLaMA Server in Docker Container

After completing the Generative AI with Large Language Models course, I share my experience running the LLaMA model in Docker. The guide covers setting up the project structure, creating a FastAPI application, and Dockerizing it. I also build an AI chatbot that combines FastAPI, HuggingFace embeddings, and LLaMA. The Docker environment loads the LLM and lets you interact with PDFs. I conclude by improving performance with OpenBLAS, which significantly reduces inference time. 🚀

2023

[Artificial Intelligence] Unlocking the Power of GPT4All: How to summarize YouTube Videos in Minutes (Part 2)

In this guide, I extract and summarize YouTube videos using Whisper.cpp, GPT4All, LLaMA.cpp, and OpenAI models. I go step by step from setting up the environment to transcribing audio and summarizing the result. I ran into accuracy issues with GPT4All, so I also show alternative approaches using LLaMA.cpp and OpenAI models. The tutorial is aimed at researchers, content creators, and anyone who wants to analyze and summarize YouTube content efficiently.

2023

[Artificial Intelligence] Unlocking the Power of GPT4All: How to summarize YouTube Videos in Minutes (Part 1)

Hey folks! Today I'm introducing GPT4All for summarizing YouTube videos. We'll set everything up in Python, load transcripts, chunk them for processing, and then summarize them with GPT4All. We'll also look at an optional OpenAI approach for comparison. Stay tuned for the next blog post on summarizing video content without embedded transcripts! ✨🚀
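To give a feel for the chunking step, here is a minimal sketch that splits a transcript into overlapping word-based chunks. The function name `chunk_text` and the default sizes are assumptions for illustration; the post itself may well use a library text splitter instead.

```python
from typing import List


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_words words.

    Neighbouring chunks share `overlap` words so that a sentence cut at a
    boundary still appears with some context in the next chunk.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk can then be summarized on its own, and the partial summaries combined into a final one.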