# DeepZig V3 Implementation 🚀
A high-performance implementation of DeepSeek V3 in [Zig](https://ziglang.org/) for blazingly fast inference.
> **⚠️ Status: Experimental Foundation**
>
> This project provides a **base foundation** for DeepSeek V3 in Zig with:
> - ✅ **Working HTTP server** with OpenAI-compatible API
> - ✅ **SIMD-optimized tensor operations** (AVX2, NEON)
> - ✅ **Cross-platform build system** (Zig 0.15.0-dev)
> - ✅ **Memory management** and backend architecture
>
> **Not yet implemented**: Full DeepSeek V3 model architecture, attention mechanisms, MoE routing.
> See [Development Status](#development-status) for details.
## Overview
This experimental implementation aims to leverage Zig's unique advantages for systems programming to create a high-performance LLM inference engine:
- **Zero-cost abstractions** with compile-time optimization
- **Direct hardware access** for SIMD and platform-specific optimizations
- **Manual memory management** without garbage collection pauses
- **Single binary deployment** with no runtime dependencies
- **Cross-platform compilation** for multiple architectures
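For example, cross-compiling for another target is a single flag away (assuming the build script wires up Zig's standard target option, as most `build.zig` files do; exact option names in this repo may differ):
```bash
# Hypothetical cross-compilation invocations
zig build -Dtarget=aarch64-macos      # Apple Silicon from any host
zig build -Dtarget=x86_64-linux-gnu   # Linux x86_64
zig build -Dtarget=wasm32-wasi        # WebAssembly
```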
## Project Structure
```
experimental/
├── build.zig                  # Build system configuration
├── build.zig.zon              # Package dependencies
├── src/
│   ├── main.zig               # HTTP server entry point
│   ├── core/                  # Core ML components
│   │   ├── root.zig           # Module exports
│   │   ├── tensor.zig         # SIMD-optimized tensors
│   │   ├── model.zig          # DeepSeek V3 model
│   │   ├── attention.zig      # MLA attention mechanism
│   │   ├── moe.zig            # Mixture of Experts
│   │   ├── tokenizer.zig      # Text tokenization
│   │   ├── backend.zig        # Backend abstraction
│   │   ├── memory.zig         # Memory management
│   │   └── math/              # Math utilities
│   │       ├── root.zig       # Math module exports
│   │       ├── simd.zig       # SIMD operations
│   │       ├── activation.zig # Activation functions
│   │       └── rms_norm.zig   # RMS normalization
│   ├── web/                   # HTTP API layer
│   │   ├── root.zig           # Web module exports
│   │   ├── server.zig         # HTTP server (std.http)
│   │   ├── handlers.zig       # Request handlers
│   │   ├── middleware.zig     # CORS, auth, rate limiting
│   │   ├── websocket.zig      # WebSocket support
│   │   ├── openai.zig         # OpenAI API compatibility
│   │   ├── request.zig        # Request wrapper
│   │   └── response.zig       # Response wrapper
│   ├── backends/              # Compute backends
│   │   ├── cpu/               # CPU with SIMD
│   │   ├── metal/             # Apple Silicon
│   │   └── cuda/              # NVIDIA GPUs
│   └── wasm/
│       └── main.zig           # WebAssembly entry point
├── bench/
│   └── main.zig               # Performance benchmarks
└── README.md                  # This file
```
## Requirements
- **Zig 0.15.0-dev** or later
- Platform-specific requirements:
  - **macOS**: Xcode Command Line Tools (for Metal backend)
  - **Linux**: CUDA Toolkit (for CUDA backend, optional)
  - **Windows**: CUDA Toolkit (for CUDA backend, optional)
## Quick Start
### Building
```bash
# Clone and navigate to experimental directory
cd experimental/

# Build the project
zig build

# Run the server
zig build run

# Run tests
zig build test

# Run benchmarks
zig build bench

# Build WebAssembly
zig build wasm
```
### Running the Server
```bash
# Start server on default port (8080)
./zig-out/bin/deepseek-v3-zig

# Custom configuration
./zig-out/bin/deepseek-v3-zig --port 3000 --backend metal --model ./path/to/model
```
### API Usage
The server exposes OpenAI-compatible endpoints:
```bash
# Chat completion
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'

# Health check
curl http://localhost:8080/health

# Model info
curl http://localhost:8080/v1/models
```
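Because the endpoints follow OpenAI's schema, a chat completion returns the familiar response shape (illustrative values shown; exact field coverage depends on the current implementation):
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "deepseek-v3",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 9, "total_tokens": 18 }
}
```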
## Performance Features
### SIMD Optimizations
- **x86_64**: AVX2/AVX-512 vectorization for matrix operations
- **ARM64**: NEON SIMD for Apple Silicon optimization
- **Auto-vectorization**: Compiler-optimized loops with `@Vector` types
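As a rough illustration of the approach (not the actual `tensor.zig` code), element-wise kernels can be written once with `@Vector` and lowered by the compiler to AVX2 or NEON:
```zig
/// Sketch of a SIMD element-wise add using Zig's @Vector types.
/// A @Vector(8, f32) is 256 bits wide, matching one AVX2 register;
/// on ARM64 the compiler splits it across NEON registers.
fn vecAdd(a: []const f32, b: []const f32, out: []f32) void {
    const width = 8;
    const Lane = @Vector(width, f32);
    var i: usize = 0;
    while (i + width <= out.len) : (i += width) {
        const va: Lane = a[i..][0..width].*;
        const vb: Lane = b[i..][0..width].*;
        out[i..][0..width].* = va + vb;
    }
    // Scalar tail for lengths not divisible by the vector width.
    while (i < out.len) : (i += 1) {
        out[i] = a[i] + b[i];
    }
}
```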
### Backend Support
| Backend | Status | Features |
|---------|--------|----------|
| **CPU** | ✅ Implemented | Multi-threaded, SIMD, cache-optimized |
| **Metal** | 🚧 In Progress | Apple Silicon GPU, unified memory |
| **CUDA** | 🚧 Planned | NVIDIA GPU, Tensor Cores |
| **WebGPU** | 📋 Future | Browser GPU acceleration |
### Memory Management
- **Arena allocators** for request-scoped memory
- **Memory pools** for tensor allocations
- **Zero-copy operations** where possible
- **Cache-friendly** data layouts
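A minimal sketch of the request-scoped arena pattern (hypothetical handler, not the project's actual code):
```zig
const std = @import("std");

/// Everything allocated while serving one request comes from the arena
/// and is freed in a single deinit, avoiding per-object bookkeeping.
fn handleRequest(gpa: std.mem.Allocator) !void {
    var arena = std.heap.ArenaAllocator.init(gpa);
    defer arena.deinit(); // one bulk free at the end of the request

    const alloc = arena.allocator();
    const scratch = try alloc.alloc(f32, 1024); // request-scoped buffer
    @memset(scratch, 0);
    // ... build the response with `alloc`; no individual frees needed.
}
```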
## Development Status
### ✅ Drafted
- [x] Project structure and build system
- [x] Core tensor operations with SIMD
- [x] HTTP server with OpenAI API compatibility
- [x] CPU backend with optimizations
- [x] Memory management utilities
- [x] Benchmark suite
### 🚧 In Progress
- [ ] DeepSeek V3 model architecture
- [ ] Multi-Head Latent Attention (MLA)
- [ ] Mixture of Experts (MoE) implementation
- [ ] Metal backend for Apple Silicon
- [ ] Model loading and weight management
### 📋 Planned
- [ ] CUDA backend for NVIDIA GPUs
- [ ] WebSocket streaming
- [ ] Model quantization (INT8, FP16)
- [ ] Flash Attention optimization
- [ ] Distributed inference
- [ ] Advanced sampling strategies
## Architecture Decisions
### Why Zig?
1. **Performance**: Zero-cost abstractions without runtime overhead
2. **Memory Control**: Explicit allocators and deterministic lifetimes, with no garbage collector
3. **Simplicity**: Single binary deployment, cross-compilation
4. **Control**: Direct hardware access for optimization
### Design Principles
- **Modularity**: Clean separation between core, web, and backend layers
- **Performance**: SIMD-first design with cache-friendly algorithms
- **Compatibility**: OpenAI API compatibility for easy adoption
- **Extensibility**: Plugin architecture for new backends
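One plausible shape for that plugin architecture is a vtable-based backend interface. This is a hedged sketch; the actual API in `src/core/backend.zig` may differ:
```zig
/// Hypothetical backend interface: each backend (CPU, Metal, CUDA)
/// registers its kernels through a vtable of function pointers.
pub const Backend = struct {
    name: []const u8,
    vtable: *const VTable,

    pub const VTable = struct {
        matmul: *const fn (a: []const f32, b: []const f32, out: []f32, m: usize, n: usize, k: usize) void,
    };

    pub fn matmul(self: Backend, a: []const f32, b: []const f32, out: []f32, m: usize, n: usize, k: usize) void {
        self.vtable.matmul(a, b, out, m, n, k);
    }
};

// Example: a naive CPU implementation plugging into the interface.
fn cpuMatmul(a: []const f32, b: []const f32, out: []f32, m: usize, n: usize, k: usize) void {
    for (0..m) |i| {
        for (0..n) |j| {
            var acc: f32 = 0;
            for (0..k) |p| acc += a[i * k + p] * b[p * n + j];
            out[i * n + j] = acc;
        }
    }
}

pub const cpu_backend = Backend{
    .name = "cpu",
    .vtable = &.{ .matmul = cpuMatmul },
};
```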
## Contributing
This is an experimental project! Contributions are welcome:
1. **Core ML**: Implement transformer layers, attention mechanisms
2. **Backends**: Optimize CUDA/Metal compute kernels
3. **Performance**: Profile and optimize bottlenecks
4. **Testing**: Add comprehensive test coverage
5. **Documentation**: Improve setup and usage guides
### Development Setup
```bash
# Install Zig 0.15.0-dev
# https://ziglang.org/download/

# Clone repository
git clone [repository-url]
cd experimental/

# Run tests during development
zig build test --watch

# Format code
zig fmt src/
```
## Benchmarks
Run benchmarks to measure performance:
```bash
zig build bench
```
Example output:
```
🚀 DeepZig V3 Performance Benchmarks
==========================================
Backend: CPU (SIMD optimized)
Architecture: x86_64
Thread count: 16

Operation                      | Iterations | Avg Time  | Operations/s      | Memory
-------------------------------|------------|-----------|-------------------|--------
Tensor Creation (1024x1024)    | 1000 iter  | 0.05 ms   | 20000000 ops/s    | 4.0 MB
Tensor Addition (SIMD)         | 100 iter   | 0.12 ms   | 35000000000 ops/s | 48.0 MB
Matrix Multiplication          | 10 iter    | 125.30 ms | 17.2 GFLOPS       | 12.0 MB
```
## Known Issues
- **Model Loading**: Currently creates dummy models - real weight loading not implemented
- **Tokenizer**: Placeholder implementation - needs proper BPE tokenizer
- **WebSocket**: Basic structure only - streaming not implemented
- **Metal/CUDA**: Backend stubs only - GPU kernels not implemented
## License
This experimental implementation follows the same license as the original DeepSeek V3 project.
## Resources
- [Original DeepSeek V3 Paper](https://arxiv.org/abs/2412.19437)
- [Zig Language Documentation](https://ziglang.org/documentation/master/)
- [Zig Performance Guide](https://github.com/ziglang/zig/wiki/Performance)
- [SIMD in Zig](https://ziglang.org/documentation/master/#Vectors)
## Is This Ready for Production?
**No** - this is a research/development foundation. The groundwork is real, though: it **compiles cleanly and runs**:
- **What works now**: ✅ Compiles with Zig 0.15.0-dev, tensor math, SIMD operations, benchmarks, backend architecture
- **What's missing**: The actual DeepSeek V3 model implementation (attention, MoE routing, weight loading)
- **Timeline**: The foundation compiles; the model implementation is the next major milestone
## Comparison to Other Projects
| Project | Language | Status | Focus |
|---------|----------|--------|-------|
| **This** | Zig | Foundation + API | Web-first inference |
| llama.cpp | C++ | Production | CLI/library |
| Candle | Rust | Production | ML framework |
| ZML | Zig | Research | Low-level ML ops |
**Unique advantages**: Built-in web server, Zig's zero-cost abstractions, single binary deployment.
---
**⚡ Built with Zig for blazing fast LLM inference!**