diff --git a/README.md b/README.md
index 3137cb7..15ca9be 100644
--- a/README.md
+++ b/README.md
@@ -53,7 +53,7 @@ Current LLM inference is dominated by Python/PyTorch, which introduces:
| Memory usage | 20-40GB | **< 16GB** | *16GB+ for basic ops* |
| Dependencies | ~2GB runtime | **Single binary** | ✅ **Single binary** |
| Deployment | Complex | **Copy & run** | ✅ **Copy & run** |
-| Matrix Mul (1024×1024) | ~1ms (optimized) | **< 1ms** | ✅ **2.1ms (1000+ GFLOPS)** |
+| Matrix Mul (1024×1024) | ~1ms (optimized) | **< 1ms** | ✅ **2.1ms (1000+ GFLOPS on M1 MacBook)** |
*See [experimental benchmarks](experimental/README.md#benchmarks) for current performance measurements.*
@@ -103,7 +103,7 @@ Current LLM inference is dominated by Python/PyTorch, which introduces:
- [x] **Updated to Zig 0.15.0-dev - compiles cleanly**
- [x] **Benchmark suite** showing current performance
- [x] **BLAS integration working** - Apple Accelerate backend functional
-- [x] **Improved matrix performance** - 1000+ GFLOPS operations
+- [x] **Improved matrix performance** - 1000+ GFLOPS operations on an M1 MacBook
*📈 Performance improvement achieved - BLAS acceleration now working*
diff --git a/experimental/README.md b/experimental/README.md
index 380a63d..9acde95 100644
--- a/experimental/README.md
+++ b/experimental/README.md
@@ -13,7 +13,7 @@ A high-performance implementation of DeepSeek V3 in [Zig](https://ziglang.org/)
> - ✅ **Functional matrix operations** (significant performance improvement)
>
> **Recent Progress**: Matrix operations now use BLAS acceleration
-> **Performance Status**: 1160+ GFLOPS with Apple Accelerate backend working (measured on Apple M1)
+> **Performance Status**: 1160+ GFLOPS with Apple Accelerate backend working (measured on an Apple M1 MacBook)
>
> See [Performance Results](#performance-notes) for detailed benchmarks.
@@ -27,7 +27,7 @@ This experimental implementation aims to leverage Zig's unique advantages for sy
- **Single binary deployment** with no runtime dependencies
- **Cross-platform compilation** for multiple architectures
-**🚀 BLAS Acceleration Achieved!** We've successfully integrated Apple Accelerate backend delivering **1000+ GFLOPS** performance - a **3000x speedup** over the initial naive implementation.
+**🚀 BLAS Acceleration Achieved!** We've successfully integrated the Apple Accelerate backend, delivering **1000+ GFLOPS** performance - a **3000x speedup** over the initial naive implementation, measured on an M1 MacBook.
**🔗 Related**: See the [main project README](../README.md) for architecture overview and vision.
@@ -309,7 +309,7 @@ This experimental implementation follows the same license as the original DeepSe
- **Matrix 1024×1024**: 2.1ms/iter, **1004 GFLOPS** (38.6% efficiency)
- **Matrix 2048×2048**: 21.5ms/iter, **799 GFLOPS** (30.7% efficiency)
-**Performance Improvement**: From **6418ms naive** → **2.1ms BLAS** = significant speedup for matrix operations
+**Performance Improvement**: From **6418ms naive** → **2.1ms BLAS** = a ~3000x speedup for matrix operations, measured on an M1 MacBook.
**System Status**:
- ✅ **BLAS Backend**: Apple Accelerate integration working