docs: Add clear device notes

Triex 2025-06-11 19:47:35 +10:00
parent 618ecfb0c9
commit 973933d974
2 changed files with 5 additions and 5 deletions

@@ -53,7 +53,7 @@ Current LLM inference is dominated by Python/PyTorch, which introduces:
 | Memory usage | 20-40GB | **< 16GB** | *16GB+ for basic ops* |
 | Dependencies | ~2GB runtime | **Single binary** | ✅ **Single binary** |
 | Deployment | Complex | **Copy & run** | ✅ **Copy & run** |
-| Matrix Mul (1024×1024) | ~1ms (optimized) | **< 1ms** | **2.1ms (1000+ GFLOPS)** |
+| Matrix Mul (1024×1024) | ~1ms (optimized) | **< 1ms** | **2.1ms (1000+ GFLOPS/M1 Macbook)** |
 *See [experimental benchmarks](experimental/README.md#benchmarks) for current performance measurements.*
@@ -103,7 +103,7 @@ Current LLM inference is dominated by Python/PyTorch, which introduces:
 - [x] **Updated to Zig 0.15.0-dev - compiles cleanly**
 - [x] **Benchmark suite** showing current performance
 - [x] **BLAS integration working** - Apple Accelerate backend functional
-- [x] **Improved matrix performance** - 1000+ GFLOPS operations
+- [x] **Improved matrix performance** - 1000+ GFLOPS operations on an M1 Macbook
 *📈 Performance improvement achieved - BLAS acceleration now working*

@@ -13,7 +13,7 @@ A high-performance implementation of DeepSeek V3 in [Zig](https://ziglang.org/)
 > - ✅ **Functional matrix operations** (significant performance improvement)
 >
 > **Recent Progress**: Matrix operations now use BLAS acceleration<br/>
-> **Performance Status**: 1160+ GFLOPS with Apple Accelerate backend working (measured on Apple M1)<br/>
+> **Performance Status**: 1160+ GFLOPS with Apple Accelerate backend working (measured on Apple M1 Macbook)<br/>
 >
 > See [Performance Results](#performance-notes) for detailed benchmarks.
@@ -27,7 +27,7 @@ This experimental implementation aims to leverage Zig's unique advantages for sy
 - **Single binary deployment** with no runtime dependencies
 - **Cross-platform compilation** for multiple architectures
-**🚀 BLAS Acceleration Achieved!** We've successfully integrated Apple Accelerate backend delivering **1000+ GFLOPS** performance - a **3000x speedup** over the initial naive implementation.
+**🚀 BLAS Acceleration Achieved!** We've successfully integrated Apple Accelerate backend delivering **1000+ GFLOPS** performance - a **3000x speedup** over the initial naive implementation. Measured on an M1 Macbook.
 **🔗 Related**: See the [main project README](../README.md) for architecture overview and vision.
@@ -309,7 +309,7 @@ This experimental implementation follows the same license as the original DeepSe
 - **Matrix 1024×1024**: 2.1ms/iter, **1004 GFLOPS** (38.6% efficiency)
 - **Matrix 2048×2048**: 21.5ms/iter, **799 GFLOPS** (30.7% efficiency)
-**Performance Improvement**: From **6418ms naive** → **2.1ms BLAS** = significant speedup for matrix operations
+**Performance Improvement**: From **6418ms naive** → **2.1ms BLAS** = significant speedup for matrix operations. Measured on an M1 Macbook.
 **System Status**:
 - ✅ **BLAS Backend**: Apple Accelerate integration working
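
The benchmark figures quoted in the last hunk can be sanity-checked with the usual dense-matmul operation count. The following is a minimal sketch, not the project's benchmark code: it assumes an n×n multiply costs 2·n³ floating-point operations, and it assumes a peak of roughly 2600 GFLOPS, a value inferred here from the reported efficiency percentages rather than stated anywhere in the commit. All function names are hypothetical.

```python
# Sanity-check sketch for the benchmark figures quoted in the diff.
# Assumption: an n x n matrix multiply costs 2 * n**3 floating-point operations.
# Assumption: PEAK_GFLOPS is inferred from the reported efficiency figures;
# it is not stated in the commit itself.

PEAK_GFLOPS = 2600.0  # hypothetical peak, chosen so 1004 GFLOPS is about 38.6%


def matmul_gflops(n, ms_per_iter):
    """Throughput in GFLOPS for one n x n matmul taking ms_per_iter milliseconds."""
    flops = 2.0 * n ** 3
    return flops / (ms_per_iter / 1000.0) / 1e9


def efficiency_pct(gflops, peak=PEAK_GFLOPS):
    """Achieved throughput as a percentage of the assumed peak."""
    return 100.0 * gflops / peak


def speedup(naive_ms, blas_ms):
    """Ratio of the naive time to the BLAS time."""
    return naive_ms / blas_ms


if __name__ == "__main__":
    # Timings taken from the README lines in the diff (rounded as printed there).
    for n, ms in [(1024, 2.1), (2048, 21.5)]:
        g = matmul_gflops(n, ms)
        print(f"{n}x{n}: {g:.0f} GFLOPS, {efficiency_pct(g):.1f}% of assumed peak")
    print(f"speedup: {speedup(6418.0, 2.1):.0f}x")
```

With the rounded 2.1 ms timing, the 1024×1024 case comes out slightly above the reported 1004 GFLOPS, which is consistent with the printed time being rounded to one decimal place. The 2048×2048 case lands at about 799 GFLOPS, and 6418 ms / 2.1 ms is a little over 3000, in line with the "3000x speedup" wording elsewhere in the README.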