DeepZig V3 Implementation - Setup Guide
This guide will help you set up the development environment and understand the project structure.
Prerequisites
1. Install Zig 0.15.0-dev
Download the latest development build from ziglang.org/download:
# macOS (using Homebrew)
brew install zig --HEAD
# Linux (manual installation)
wget https://ziglang.org/builds/zig-linux-x86_64-0.15.0-dev.xxx.tar.xz
tar -xf zig-linux-x86_64-0.15.0-dev.xxx.tar.xz
export PATH=$PATH:/path/to/zig
# Verify installation
zig version
# Should show: 0.15.0-dev.xxx
2. Platform-Specific Setup
macOS (for Metal backend)
# Install Xcode Command Line Tools
xcode-select --install
# Verify Metal support
system_profiler SPDisplaysDataType | grep Metal
Linux (for CUDA backend, optional)
# Install CUDA Toolkit (optional)
# Follow: https://developer.nvidia.com/cuda-downloads
# For Ubuntu/Debian:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
Project Overview
Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Web Layer │ │ Core Engine │ │ Backends │
│ │ │ │ │ │
│ ├─ HTTP API │◄──►│ ├─ Transformer │◄──►│ ├─ CPU (SIMD) │
│ ├─ WebSocket │ │ ├─ Attention │ │ ├─ Metal (macOS)│
│ ├─ Rate Limit │ │ ├─ MoE Routing │ │ ├─ CUDA (Linux) │
│ └─ Auth │ │ └─ Tokenizer │ │ └─ WebGPU │
└─────────────────┘ └──────────────────┘ └─────────────────┘
Key Components
Core Module (`src/core/`)
- Tensor Operations: SIMD-optimized tensor math with AVX2/NEON support
- Model Architecture: DeepSeek V3 implementation with MLA and MoE
- Memory Management: Arena allocators and memory pools
- Backend Abstraction: Unified interface for CPU/GPU computation
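As a rough sketch of how these components compose, the snippet below combines the request-scoped arena pattern with the `Tensor.init`/`deinit` calls shown later in this guide. The `deepseek_core` import name, the `shape` value, and the exact `init` signature are illustrative assumptions; the real API lives in `src/core/` and may differ.

```zig
const std = @import("std");
// Assumed module name exposed by build.zig; adjust to the actual project setup.
const deepseek_core = @import("deepseek_core");

pub fn main() !void {
    // Request-scoped arena, as recommended in the memory management notes below.
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // Hypothetical tensor usage mirroring the "Memory Management" snippet.
    const shape = [_]usize{ 2, 4 };
    var tensor = try deepseek_core.Tensor.init(arena.allocator(), &shape, .f32);
    defer tensor.deinit();
}
```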
Web Layer (`src/web/`)
- HTTP Server: Built on `std.http.Server` (Zig 0.15.0 compatible)
- OpenAI API: Compatible `/v1/chat/completions` endpoint
- Middleware: CORS, authentication, rate limiting
- WebSocket: Streaming inference support (planned)
Backends (`src/backends/`)
- CPU: Multi-threaded with SIMD optimizations
- Metal: Apple Silicon GPU acceleration (macOS)
- CUDA: NVIDIA GPU support with Tensor Cores (Linux/Windows)
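To illustrate the unified backend abstraction, here is a hedged sketch of how a default backend might be chosen per platform; the `Backend` enum and `defaultBackend` helper are illustrative, not the actual `src/backends/` interface.

```zig
const builtin = @import("builtin");

// Illustrative backend tag; the real abstraction lives in src/backends/.
const Backend = enum { cpu, metal, cuda, webgpu };

/// Pick a sensible default backend for the host platform, falling back to CPU.
fn defaultBackend() Backend {
    return switch (builtin.os.tag) {
        .macos => .metal, // Apple Silicon GPU path
        .linux => .cpu, // choose .cuda only after probing for an NVIDIA GPU
        else => .cpu,
    };
}
```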
Development Workflow
1. Initial Setup
# Clone the repository and enter the experimental directory
cd experimental/
# Build the project
zig build
# Run tests to verify setup
zig build test
# Run benchmarks
zig build bench
2. Development Commands
# Format code
zig fmt src/
# Run tests with watch mode (in development)
zig build test
# Build optimized release
zig build -Doptimize=ReleaseFast
# Cross-compile for different targets
zig build -Dtarget=aarch64-macos # Apple Silicon
zig build -Dtarget=x86_64-linux # Linux x64
zig build -Dtarget=wasm32-freestanding # WebAssembly
3. Running the Server
# Default configuration (CPU backend, port 8080)
zig build run
# Custom configuration
zig build run -- --port 3000 --backend metal
# With model path (when implemented)
zig build run -- --model ./models/deepseek-v3.bin --backend cuda
4. Testing the API
# Health check
curl http://localhost:8080/health
# Model information
curl http://localhost:8080/v1/models
# Chat completion (placeholder response)
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-v3",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 100
}'
Implementation Status
✅ Ready for Development
- Build system and project structure
- Core tensor operations with SIMD
- HTTP server with basic routing
- OpenAI API compatibility layer
- Memory management utilities
- Benchmark framework
🚧 Needs Implementation
- DeepSeek V3 Model: Transformer architecture
- Attention Mechanism: Multi-Head Latent Attention (MLA)
- MoE Implementation: Expert routing and selection
- Tokenizer: BPE tokenization (currently placeholder)
- Model Loading: Weight file parsing and loading
- GPU Backends: Metal and CUDA kernel implementations
📋 Future Enhancements
- Model quantization (INT8, FP16)
- Flash Attention optimization
- WebSocket streaming
- Distributed inference
- Model sharding
Code Style and Conventions
Zig Best Practices
- Use `snake_case` for functions and variables
- Use `PascalCase` for types and structs
- Prefer explicit error handling with `!` and `catch`
- Use arena allocators for request-scoped memory
- Leverage comptime for zero-cost abstractions
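For the last point, here is a small, hedged example of the kind of comptime-parameterized helper this style encourages (the `dot` function is illustrative, not part of the codebase):

```zig
const std = @import("std");

/// Generic dot product specialized at compile time for element type T;
/// the specialization carries no runtime dispatch cost.
fn dot(comptime T: type, a: []const T, b: []const T) T {
    std.debug.assert(a.len == b.len);
    var sum: T = 0;
    for (a, b) |x, y| sum += x * y;
    return sum;
}

test "dot product" {
    const a = [_]f32{ 1.0, 2.0, 3.0 };
    const b = [_]f32{ 4.0, 5.0, 6.0 };
    try std.testing.expectApproxEqAbs(@as(f32, 32.0), dot(f32, &a, &b), 1e-6);
}
```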
Error Handling
// Preferred: explicit error handling
const result = someFunction() catch |err| switch (err) {
error.OutOfMemory => return err,
error.InvalidInput => {
std.log.err("Invalid input provided", .{});
return err;
},
else => unreachable,
};
// Use defer for cleanup
var tensor = try Tensor.init(allocator, shape, .f32);
defer tensor.deinit();
Memory Management
// Use arena allocators for request scope
var arena = std.heap.ArenaAllocator.init(allocator);
defer arena.deinit();
const request_allocator = arena.allocator();
// Use memory pools for tensors
var tensor_pool = TensorPool.init(allocator);
defer tensor_pool.deinit();
Performance Considerations
SIMD Optimization
- Use `@Vector` types for SIMD operations (see the sketch below)
- Align data to cache line boundaries (64 bytes)
- Prefer blocked algorithms for better cache locality
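A minimal sketch of the `@Vector` approach (the 8-lane width is an illustrative choice, not the project's tuned value):

```zig
const Vec = @Vector(8, f32);

/// Elementwise add with 8-lane SIMD and a scalar tail loop.
fn addF32(dst: []f32, a: []const f32, b: []const f32) void {
    var i: usize = 0;
    while (i + 8 <= dst.len) : (i += 8) {
        const va: Vec = a[i..][0..8].*;
        const vb: Vec = b[i..][0..8].*;
        dst[i..][0..8].* = va + vb;
    }
    while (i < dst.len) : (i += 1) dst[i] = a[i] + b[i]; // scalar tail
}
```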
Backend Selection
- CPU: Best for smaller models, development
- Metal: Optimal for Apple Silicon (M1/M2/M3)
- CUDA: Best for NVIDIA GPUs with Tensor Cores
Memory Layout
- Use structure-of-arrays (SoA) for better vectorization
- Minimize memory allocations in hot paths
- Leverage unified memory on Apple Silicon
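A brief sketch of the structure-of-arrays idea (field names are illustrative); Zig's `std.MultiArrayList` can derive this layout from a struct automatically.

```zig
// Array-of-structs: the fields of each element are interleaved in memory.
const TokenAoS = struct { id: u32, position: u32, logit: f32 };

// Structure-of-arrays: each field is contiguous, so vector loads touch only
// the field they need and cache lines are used fully.
const TokenBatchSoA = struct {
    ids: []u32,
    positions: []u32,
    logits: []f32,
};
```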
Debugging and Profiling
Debug Build
# Build with debug symbols
zig build -Doptimize=Debug
# Run with verbose logging (Debug builds default to std.log's debug level)
zig build run
Performance Profiling
# Run benchmarks
zig build bench
# Profile with system tools
# macOS: Instruments.app
# Linux: perf, valgrind
# Windows: Visual Studio Diagnostics
Next Steps
- Choose an area to implement:
  - Core ML components (transformer, attention, MoE)
  - Backend optimizations (Metal shaders, CUDA kernels)
  - Web features (streaming, authentication)
- Read the code:
  - Start with `src/core/root.zig` for module structure
  - Check `src/main.zig` for the server entry point
  - Look at `bench/main.zig` for performance testing
- Run and experiment:
  - Build and run the server
  - Try the API endpoints
  - Run benchmarks to understand performance
  - Read the TODOs in the code for implementation ideas
- Contribute:
  - Pick a TODO item
  - Implement and test
  - Submit improvements
Resources
- Zig Language Reference
- DeepSeek V3 Paper
- Zig SIMD Guide
- Metal Programming Guide
- CUDA Programming Guide
Ready to build the future of high-performance LLM inference! 🚀