Below you will find pages that utilize the taxonomy term “Uvicorn”
2023
[Home Lab] Deploying OpenAI-Compatible LLAMA CPP Server with K3S
In this post, I expand my Home Lab by adding an always-on LLAMA server for on-demand inferencing. The steps involve writing a Dockerfile, packaging Microsoft's Phi-2 model, and deploying it with K3S. The result is a continuously available LLAMA server that other inferencing projects can call.
2023
[Artificial Intelligence] Utilizing vLLM for Efficient Language Model Serving
In this post, I explore vLLM's potential for streamlined Large Language Model (LLM) inference and deployment. I describe my experience setting up vLLM on a Windows Subsystem for Linux (WSL) instance running Ubuntu 22.04, and walk through installing WSL, the NVIDIA GPU drivers, the CUDA Toolkit, and Docker. I then set up vLLM inside the NVIDIA PyTorch Docker image, cover the installation process, and launch the API server. The post closes with querying the model and creating a Docker image snapshot.
2023
[Artificial Intelligence] Unleashing the Power of LLaMA Server in Docker Container
After completing the Generative AI with Large Language Models course, I'm thrilled to share my experience running the LLaMA model in Docker. The guide covers setting up the project structure, creating a FastAPI application, and Dockerizing it. I also showcase building an AI chatbot that combines FastAPI, HuggingFace embeddings, and LLaMA. The Docker environment loads the LLM and lets you interact with PDFs. I conclude by enabling OpenBLAS, which significantly reduces inferencing time. Explore the power of LLaMA Server in a Docker container! 🚀