mirror of https://github.com/deepseek-ai/DeepSeek-V3.git
synced 2025-07-05 07:51:38 -04:00

docs: Add clear device notes

This commit is contained in:
parent 618ecfb0c9
commit 973933d974
@@ -53,7 +53,7 @@ Current LLM inference is dominated by Python/PyTorch, which introduces:
 | Memory usage | 20-40GB | **< 16GB** | *16GB+ for basic ops* |
 | Dependencies | ~2GB runtime | **Single binary** | ✅ **Single binary** |
 | Deployment | Complex | **Copy & run** | ✅ **Copy & run** |
-| Matrix Mul (1024×1024) | ~1ms (optimized) | **< 1ms** | ✅ **2.1ms (1000+ GFLOPS)** |
+| Matrix Mul (1024×1024) | ~1ms (optimized) | **< 1ms** | ✅ **2.1ms (1000+ GFLOPS/M1 Macbook)** |

 *See [experimental benchmarks](experimental/README.md#benchmarks) for current performance measurements.*
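The GFLOPS figure in the changed table row can be sanity-checked from first principles: an N×N matrix multiply performs 2·N³ floating-point operations, so 2.1 ms per iteration at N = 1024 works out to roughly 1000 GFLOPS. A minimal check in Python (timings taken from the row above):

```python
# Sanity-check the "1000+ GFLOPS" figure for a 1024x1024 matmul at 2.1 ms/iter.
def gflops(n: int, seconds: float) -> float:
    """Throughput of an n x n matrix multiply: 2*n^3 flops over `seconds`."""
    return 2 * n**3 / seconds / 1e9

print(f"{gflops(1024, 2.1e-3):.0f} GFLOPS")  # ~1023 GFLOPS, matching "1000+"
```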
@@ -103,7 +103,7 @@ Current LLM inference is dominated by Python/PyTorch, which introduces:
 - [x] **Updated to Zig 0.15.0-dev - compiles cleanly**
 - [x] **Benchmark suite** showing current performance
 - [x] **BLAS integration working** - Apple Accelerate backend functional
-- [x] **Improved matrix performance** - 1000+ GFLOPS operations
+- [x] **Improved matrix performance** - 1000+ GFLOPS operations on an M1 Macbook

 *📈 Performance improvement achieved - BLAS acceleration now working*
@@ -13,7 +13,7 @@ A high-performance implementation of DeepSeek V3 in [Zig](https://ziglang.org/)
 > - ✅ **Functional matrix operations** (significant performance improvement)
 >
 > **Recent Progress**: Matrix operations now use BLAS acceleration<br/>
-> **Performance Status**: 1160+ GFLOPS with Apple Accelerate backend working (measured on Apple M1)<br/>
+> **Performance Status**: 1160+ GFLOPS with Apple Accelerate backend working (measured on Apple M1 Macbook)<br/>
 >
 > See [Performance Results](#performance-notes) for detailed benchmarks.
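The GFLOPS numbers quoted in the status note come from repeated timed runs; the measurement itself reduces to a small harness like the following. This is a Python sketch of the idea, not the project's Zig benchmark code, and `bench_gflops` plus the toy workload are illustrative names:

```python
import time

def bench_gflops(fn, flops_per_call: int, iters: int = 10) -> float:
    """Average throughput of fn in GFLOPS, given its flop count per call."""
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    avg_seconds = (time.perf_counter() - start) / iters
    return flops_per_call / avg_seconds / 1e9

# Toy workload; a real matmul benchmark would pass flops_per_call = 2 * n**3.
rate = bench_gflops(lambda: sum(x * x for x in range(10_000)), flops_per_call=2 * 10_000)
print(f"{rate:.3f} GFLOPS")
```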
@@ -27,7 +27,7 @@ This experimental implementation aims to leverage Zig's unique advantages for sy
 - **Single binary deployment** with no runtime dependencies
 - **Cross-platform compilation** for multiple architectures

-**🚀 BLAS Acceleration Achieved!** We've successfully integrated Apple Accelerate backend delivering **1000+ GFLOPS** performance - a **3000x speedup** over the initial naive implementation.
+**🚀 BLAS Acceleration Achieved!** We've successfully integrated Apple Accelerate backend delivering **1000+ GFLOPS** performance - a **3000x speedup** over the initial naive implementation. Measured on an M1 Macbook.

 **🔗 Related**: See the [main project README](../README.md) for architecture overview and vision.
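The "3000x speedup" in the note follows directly from the two timings the docs cite, 6418 ms for the naive implementation versus 2.1 ms with BLAS:

```python
# Speedup implied by the quoted timings: naive 6418 ms vs BLAS 2.1 ms per iteration.
naive_ms, blas_ms = 6418.0, 2.1
speedup = naive_ms / blas_ms
print(f"{speedup:.0f}x")  # ~3056x, i.e. the "3000x" quoted above
```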
@@ -309,7 +309,7 @@ This experimental implementation follows the same license as the original DeepSe
 - **Matrix 1024×1024**: 2.1ms/iter, **1004 GFLOPS** (38.6% efficiency)
 - **Matrix 2048×2048**: 21.5ms/iter, **799 GFLOPS** (30.7% efficiency)

-**Performance Improvement**: From **6418ms naive** → **2.1ms BLAS** = significant speedup for matrix operations
+**Performance Improvement**: From **6418ms naive** → **2.1ms BLAS** = significant speedup for matrix operations. Measured on an M1 Macbook.

 **System Status**:
 - ✅ **BLAS Backend**: Apple Accelerate integration working
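The efficiency percentages in the two benchmark rows are measured GFLOPS divided by a peak figure the docs do not state; back-solving from both rows gives a consistent implied peak of roughly 2.6 TFLOPS. The exact peak is an inference here, not given in the source:

```python
# Back-solve the peak throughput implied by the benchmark rows above:
# efficiency = measured / peak  =>  peak = measured / efficiency.
rows = [(1004, 0.386), (799, 0.307)]   # (GFLOPS, efficiency) per row
peaks = [g / e for g, e in rows]
print([round(p) for p in peaks])  # both near 2600 GFLOPS: one consistent peak
```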