Course Outline
AI Sovereignty and Local LLM Deployment
- Risks associated with cloud LLMs: data retention, use of submitted inputs for model training, and exposure to foreign jurisdictions.
- Overview of Ollama architecture: model server, registry, and OpenAI-compatible API interface.
- Comparative analysis with vLLM, llama.cpp, and Text Generation Inference.
- Model licensing terms for Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Configuration
- Installing Ollama on Linux with CUDA and ROCm support.
- CPU-only fallback options and AVX/AVX2 optimization techniques.
- Deploying via Docker with persistent volume mapping.
- Setting up multi-GPU environments and managing VRAM allocation.
Model Management
- Downloading models from the Ollama registry (e.g., running 'ollama pull llama3').
- Importing GGUF models from Hugging Face repositories (e.g., those published by TheBloke).
- Understanding quantization levels: trade-offs between Q4_K_M, Q5_K_M, and Q8_0.
- Managing model switching and understanding limits on concurrent model loading.
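As a rough illustration of the quantization trade-off, the sketch below estimates the size of a model's weights at each level. The bits-per-weight figures for Q4_K_M and Q5_K_M are approximate averages assumed for illustration (they vary by model); Q8_0 is 8.5 bits per weight because each block stores 32 int8 weights plus one fp16 scale. The function name `estimate_weight_gb` is hypothetical.

```python
# Approximate bits per weight for common GGUF quantization levels.
# Q8_0 is 8.5 by construction (32 int8 weights plus one fp16 scale per block);
# the K-quant values are rough averages and vary by model (assumption).
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}


def estimate_weight_gb(n_params: float, quant: str) -> float:
    """Return the approximate size of the weights in decimal gigabytes."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        size = estimate_weight_gb(8e9, quant)
        print(f"{quant}: roughly {size:.1f} GB for an 8B-parameter model")
```

The estimate covers weights only; the KV cache and runtime buffers need additional memory, so the real footprint is larger.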
Custom Modelfiles
- Writing Modelfile syntax including FROM, PARAMETER, SYSTEM, and TEMPLATE directives.
- Tuning parameters such as temperature, top_p, and repeat_penalty.
- Engineering system prompts to define role-specific behaviors.
- Creating and publishing custom models to the local registry.
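The directives above can be sketched with a small helper that renders a Modelfile as text. This is a minimal sketch, not a complete treatment of the syntax: the helper name `render_modelfile`, the base model, and the parameter values are assumptions chosen for illustration, and the TEMPLATE directive is omitted.

```python
def render_modelfile(base: str, system: str, params: dict) -> str:
    """Render a minimal Modelfile with FROM, PARAMETER, and SYSTEM directives."""
    lines = [f"FROM {base}"]
    for name, value in params.items():
        # One PARAMETER line per sampling option, e.g. "PARAMETER temperature 0.2".
        lines.append(f"PARAMETER {name} {value}")
    # Triple quotes allow the system prompt to span several lines.
    lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_modelfile(
        base="llama3",
        system="You are a concise assistant for internal IT support.",
        params={"temperature": 0.2, "top_p": 0.9, "repeat_penalty": 1.1},
    )
    print(text)
```

Saving the output as `Modelfile` and running `ollama create support-bot -f Modelfile` would register it locally; `support-bot` is a placeholder name.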
API Integration
- Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
- Implementing streaming responses and JSON mode.
- Integrating with LangChain, LlamaIndex, and custom applications.
- Setting up authentication and rate limiting using a reverse proxy.
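A request to the OpenAI-compatible endpoint might look like the sketch below, which uses only the Python standard library. It assumes Ollama is listening on its default address, `http://localhost:11434`; the helper names and the optional bearer token (which would be checked by a reverse proxy, not by Ollama itself) are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Default local Ollama address (assumption; adjust for a proxied deployment).
BASE_URL = "http://localhost:11434"


def build_chat_request(model: str, messages: list, json_mode: bool = False,
                       stream: bool = False,
                       token: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": messages, "stream": stream}
    if json_mode:
        # OpenAI-style JSON mode; the prompt should also ask for JSON output.
        payload["response_format"] = {"type": "json_object"}
    headers = {"Content-Type": "application/json"}
    if token:
        # Hypothetical: a reverse proxy in front of Ollama validates this token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> dict:
    """Send a non-streaming request and return the decoded JSON response."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the payload easy to inspect and test without a running server.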
Performance Optimization
- Configuring context window sizes and managing KV cache.
- Handling batch inference and parallel requests.
- Allocating CPU threads and ensuring NUMA awareness.
- Monitoring GPU utilization and memory pressure.
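To show why context window size drives memory use, the sketch below estimates KV cache size from a model's shape. The formula counts one key and one value vector per layer, per KV head, per token. The example figures (32 layers, 8 KV heads, head dimension 128, fp16 cache) are assumed to match a Llama 3 8B-class model, and the function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes for a single sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


if __name__ == "__main__":
    # Assumed shape for a Llama 3 8B-class model with an fp16 cache.
    for ctx in (2048, 8192, 32768):
        gib = kv_cache_bytes(32, 8, 128, ctx) / 2**30
        print(f"context {ctx}: about {gib:.2f} GiB of KV cache")
```

The cache grows linearly with context length and with the number of parallel sequences, so both settings should be sized against available VRAM.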
Security and Compliance
- Establishing network isolation for model serving endpoints.
- Implementing input filtering and output moderation pipelines.
- Enabling audit logging for prompts and completions.
- Verifying model provenance and hash integrity.
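Hash verification can be sketched as follows: compute the SHA-256 digest of a downloaded model file in chunks and compare it with a digest published by a trusted source. The function names and the idea of a separately published digest are assumptions for illustration.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat for multi-gigabyte model files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the expected one, ignoring case."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

In an air-gapped setting, the expected digest would typically be recorded when the file is first obtained from its source and checked again after transfer.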
Requirements
- Intermediate knowledge of Linux administration and container management.
- A high-level understanding of machine learning concepts and transformer architectures.
- Familiarity with REST APIs and JSON data formats.
Audience
- AI engineers and developers looking to replace cloud LLM APIs with local solutions.
- Organizations handling sensitive data that prohibits the use of cloud-based models.
- Government and defense teams requiring fully air-gapped language model infrastructure.
14 Hours