Below you will find pages that utilize the taxonomy term “LLaMA”
2024
[Artificial Intelligence] Deploying LLMs with WasmEdge in HomeLab
In this post, we explored deploying Lightweight Language Models (LLMs) using WasmEdge, a high-performance WebAssembly runtime, within a HomeLab environment. The process involved preparing an OpenAI-compatible API server, configuring the Wasi-NN plugin, and deploying the setup to HomeLab using Kubernetes (K3s). The post also detailed the steps for testing the API server and integrating it into a Java application. Overall, the guide provides a comprehensive walkthrough of hosting and utilizing LLMs with WasmEdge in a local environment.
2023
[Home Lab] Deploying OpenAI-Compatible LLAMA CPP Server with K3S
In this post, I expand my Home Lab by adding a perpetual LLAMA model for on-demand inferencing. The steps involve crafting a Dockerfile, packaging Microsoft's Phi2 model, and deploying it with K3S. This ensures a continuously accessible LLAMA server for seamless integration into various inferencing projects.
2023
[Artificial Intelligence] Unleashing the Power of LLaMA Server in Docker Container
After completing the Generative AI with Large Language Models course, I'm thrilled to share my Dockerized experience running the LLaMA model. The guide covers setting up the project structure, creating a FastAPI application, and Dockerizing it. Additionally, I showcase building an AI chatbot, integrating it with FastAPI, HuggingFace embeddings, and LLaMA. The Docker environment loads the LLM and allows seamless interactions with PDFs. I conclude by enhancing performance with OpenBLAS, significantly reducing inferencing time. Explore the power of LLaMA Server in a Docker container for transformative AI experiences! 🚀
2023
[Artificial Intelligence] Unlocking the Power of GPT4All: How to summarize YouTube Videos in Minutes (Part 2)
In this comprehensive guide, I explore AI-powered techniques to extract and summarize YouTube videos using tools like Whisper.cpp, GPT4All, LLaMA.cpp, and OpenAI models. I detail the step-by-step process, from setting up the environment to transcribing audio and leveraging AI for summarization. Despite encountering issues with GPT4All's accuracy, alternative approaches using LLaMA.cpp and OpenAI models provide versatile summarization options. The tutorial aims to empower researchers, content creators, and information enthusiasts to efficiently analyze and summarize YouTube content using cutting-edge AI technologies.
2023
[Artificial Intelligence] Autofill PDF with LangChain and LangFlow
In this journey, I explore automating PDF autofill using LangChain and LangFlow. Leveraging LangFlow and OpenAI, I streamline the employment form completion process, demonstrating steps to install LangFlow and set up a PostgreSQL table. Despite encountering challenges in prototyping with LangFlow, the exploration progresses to auto-fill PDFs. After extracting form fields and LLaMA model setup, I employ LangChain to fetch PostgreSQL data. Concluding with Python manipulation to interpolate and update the PDF, the process achieves seamless auto-fill. Dive into the details, overcome challenges, and witness the power of LangChain and LangFlow in revolutionizing PDF automation.
2023
[Artificial Intelligence] Running LLaMA server in local machine
In continuation from my previous post, I prepared the environment using Pipenv and installed the OpenAI-like web server with specific CMAKE arguments. Running the server with a provided model was straightforward. To create an SSH tunnel to the remote Ubuntu machine from my Windows PC, I used PuTTY, configuring it to forward port 8888. Connecting from BYO-GPT involved adjusting the server endpoint in the Dart file. This seamless integration allowed me to access the Open API for the LLAMA CPP server and successfully connect BYO-GPT to the specified server.
2023
[Artificial Intelligence] Building ChatBot for your PDF files with LangChain
In this post, I extend the use case from my previous post to demonstrate building a ChatBot for PDF files using LangChain. In the preparation phase, I install Chroma, an open-source embedding database, and ingest a PDF file using PyPDFLoader. I then split the document into chunks and use Chroma's default embeddings. Due to a potential issue, I provide an alternative embedding approach. Next, I load a local LLaMA model, prepare for question-answering, and run queries using RetrievalQAWithSourcesChain. I also touch on running with OpenBLAS for optimization. The guide empowers users to explore personalized question-answering over their PDF documents.
2023
[Artificial Intelligence] Building a basic Chain with LangChain
With the LangChain framework and a setup from a previous post, I delve into building a basic chain using Llama.cpp within LangChain. Following preparations, I install required packages and run interactive Python code to set up the LLM model. The process involves formatting a prompt template and creating a chain. I explore memory integration, adding a conversation buffer for context. The conversation with AI is initiated and continued through user inputs. Stay tuned for more explorations in upcoming posts!
2023
[Artificial Intelligence] Running LLaMA model locally
In this thorough guide, I prepared my Ubuntu machine (32GB) for the LLaMA (Language Model) build. Following Georgi Gergano's llama.cpp, I executed CMake commands, ensuring the correct tag and building the model successfully. I downloaded Microsoft's Phi2 model in GGUF format, enabling local execution without exposing prompts or data. Running the Phi2 model showcased its capabilities in a few-shot interaction, providing accurate responses. Additionally, I explored optional OpenBLAS integration for improved speed, offering insights into the installation and rebuild process.