Commit Graph

8 Commits

Author SHA1 Message Date
Triex
c8eefc8865 feat: BLAS integration working - significant matrix operation improvements
Matrix Performance Improvements:
-  Apple Accelerate backend integrated and functional
-  Matrix ops: 1004 GFLOPS (38.6% efficiency) on 1024×1024
-  Significant speedup: 6418ms naive → 2.1ms BLAS
-  Draft implementation with working acceleration

Performance Results (Apple M1, debug build):
- Matrix 256×256: 0.1ms, 561 GFLOPS (21.6% efficiency)
- Matrix 512×512: 0.2ms, 1129 GFLOPS (43.4% efficiency)
- Matrix 1024×1024: 2.1ms, 1004 GFLOPS (38.6% efficiency)
- Matrix 2048×2048: 21.5ms, 799 GFLOPS (30.7% efficiency)

System Integration:
-  Memory bandwidth: 23.5 GB/s
-  Access latency: 1.8ns
-  Apple Silicon detection working
-  BLAS backend selection functional

Web Layer Updates:
- Enhanced /health endpoint with BLAS status
- New /performance endpoint with benchmark data
- Module dependency conflicts resolved
- Hardware acceleration reporting

Implementation Status:
- Matrix operations now use BLAS acceleration
- Foundation ready for transformer development
- DeepSeek V3 model implementation next priority
- Experimental/draft status maintained

This represents significant progress in the experimental foundation - matrix operations now deliver good performance while maintaining the zero-deployment-complexity advantage of Zig.
2025-06-11 19:30:33 +10:00
Triex
7b81ea27d7 docs: Tidy root README, add hardware notes to experimental/README.md 2025-06-11 17:48:38 +10:00
Triex
0f980354f8 feat: Enhanced device detection handling, added metal initial draft, theoretically-reliable metal mac detection -> experimental implementation
 Implemented initial Apple Silicon detection using sysctl system calls
 Added proper M1/M2/M3/M4 generation detection via CPU brand string
 Fixed memory leaks that occured during dev with proper allocator cleanup
 Enhanced Metal backend foundation with device capabilities
 Added `test_m_series.zig` for hardware verification

🔧 Key Technical Improvements:
- Real hardware detection via `hw.model` (eg; `MacBookPro17,1`)
- CPU brand string parsing for accurate M-series identification
- Unified memory strategy detection (even under Rosetta)
- Apple Neural Engine capability detection
- Memory-safe device info structures

🧪 Verified on Apple Silicon:
- M1 correctly detected (generation 1, no variant)
- 16GB unified memory properly identified
- Builds cleanly with Zig `0.15.0-dev.703+597dd328e`
- No false positives for M1 Pro/Max/Ultra variants

📋 Updated README status to reflect experimental draft implementation
⚠️  Clearly marked as research/development foundation, not production ready
2025-06-11 17:43:04 +10:00
Triex
bcee49badf docs: Tidy experimental README 2025-06-06 16:03:51 +10:00
Triex
8aa2785fad docs: Tidy experimental README status section 2025-06-06 16:00:24 +10:00
Triex
16fec1d4e9 docs: Update experimental README to reflect current state / performance 2025-06-06 15:58:39 +10:00
Triex
24b5fcfd02 docs: Tidy experimental/README.md 2025-06-06 15:48:21 +10:00
Triex
31ef81000f feat: Migrate experimental implementation to modern Zig, achieve clean compilation (private repo dump -> /experimental)
- Port HTTP server, and appropriate points across core etc from old API to Zig `0.15.0-dev` patterns
- Fix mutability, unused variables, and API compatibility issues
- Validate SIMD tensor operations and backend architecture
- Foundation now compiles cleanly and produces working binary
2025-06-06 15:31:21 +10:00