Commit Graph

58 Commits

Author SHA1 Message Date
Triex
12b517bfb7 feat: Implement Multi-Head Latent Attention (MLA) - Core DeepSeek V3 Innovation, update -> dual license
🧠 MAJOR MILESTONE: Complete architectural implementation of Multi-Head Latent Attention,
the key innovation that makes DeepSeek V3 inference more memory-efficient than standard multi-head attention.

What's New:
• Multi-Head Latent Attention (MLA) with latent space projections
• Complete transformer architecture (RMS norm, SwiGLU, residual connections)
• RoPE (Rotary Position Embedding) with pre-computed embeddings (see the sketch after this list)
• KV Cache for efficient autoregressive inference
• Full BLAS acceleration delivering 1000+ GFLOPS on Apple Silicon (Apple M1 MacBook Pro under heavy load: 250+ Chrome tabs, 30+ VS Code instances)
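
For illustration, a minimal sketch of pre-computing RoPE cos/sin tables; dimensions and values are made up, not taken from the repo:

```zig
const std = @import("std");

// Illustrative only: precompute RoPE cos/sin tables for a tiny config so that
// applying the rotation at inference time is a cheap table lookup.
const head_dim = 8; // must be even; dimensions are rotated in pairs
const max_seq_len = 4;
const rope_theta = 10000.0;

pub fn main() void {
    var cos_table: [max_seq_len][head_dim / 2]f32 = undefined;
    var sin_table: [max_seq_len][head_dim / 2]f32 = undefined;

    for (0..max_seq_len) |pos| {
        for (0..head_dim / 2) |i| {
            // freq_i = theta^(-2i/d); angle = pos * freq_i
            const exponent = -2.0 * @as(f64, @floatFromInt(i)) / @as(f64, head_dim);
            const freq = std.math.pow(f64, rope_theta, exponent);
            const angle = @as(f64, @floatFromInt(pos)) * freq;
            cos_table[pos][i] = @as(f32, @floatCast(@cos(angle)));
            sin_table[pos][i] = @as(f32, @floatCast(@sin(angle)));
        }
    }

    // Rotating a (q0, q1) pair at position p is then just:
    //   q0' = q0 * cos - q1 * sin;  q1' = q0 * sin + q1 * cos
    std.debug.print("cos[1] = {any}\nsin[1] = {any}\n", .{ cos_table[1], sin_table[1] });
}
```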

🏗️ Architecture Highlights:
• Latent projections (kv_a_proj_with_mqa, kv_b_proj) for efficient KV computation (sketched after this list)
• Separate handling of positional vs non-positional components
• LayerNorm in latent space for training stability
• BLAS-accelerated scaled dot-product attention
• MoE integration architecture ready for expert routing
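
Below is a shape-level sketch of the latent KV path, using made-up dimensions and plain arrays rather than the repo's actual tensor types; it omits the decoupled RoPE key and the latent-space norm. The point is that only the small latent vector needs to be cached per token:

```zig
const std = @import("std");

const d_model = 16; // hidden size (illustrative)
const d_latent = 4; // compressed KV latent dimension (what the KV cache stores)
const d_head = 8; // per-head dimension for K and V

// Naive matrix-vector product, enough to show the data flow.
fn matVec(comptime rows: usize, comptime cols: usize, w: [rows][cols]f32, x: [cols]f32) [rows]f32 {
    var y = [_]f32{0} ** rows;
    for (0..rows) |i| {
        for (0..cols) |j| y[i] += w[i][j] * x[j];
    }
    return y;
}

pub fn main() void {
    const h = [_]f32{0.1} ** d_model; // one token's hidden state

    // Down-projection (the "kv_a" step): d_model -> d_latent.
    const w_kv_a = [_][d_model]f32{[_]f32{0.01} ** d_model} ** d_latent;
    // Up-projection (the "kv_b" step): d_latent -> concatenated [K | V].
    const w_kv_b = [_][d_latent]f32{[_]f32{0.02} ** d_latent} ** (2 * d_head);

    const c_kv = matVec(d_latent, d_model, w_kv_a, h); // cache this (4 floats) ...
    const kv = matVec(2 * d_head, d_latent, w_kv_b, c_kv); // ... instead of full K+V (16 floats)

    std.debug.print("latent c_kv = {any}\n", .{c_kv});
    std.debug.print("K = {any}\nV = {any}\n", .{ kv[0..d_head].*, kv[d_head..].* });
}
```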

Performance:
• 1164 GFLOPS peak performance (Apple M1 MacBook Pro)
• ~3000x speedup over naive implementations via BLAS integration
• First architectural implementation of the MLA mechanism

🧪 Status:
• Theoretical implementation following DeepSeek V3 paper specifications
• Compiles cleanly with Zig 0.15.0-dev, passes all tests
• Architecturally complete but requires validation with real model weights

🎯 Next Steps:
• Load real DeepSeek V3 weights (safetensors/HuggingFace format)
• Validate outputs against reference PyTorch implementation
• Complete MoE expert routing and tokenization
• End-to-end inference pipeline

Updated -> dual LICENSE; added it to the headers of relevant files.

This makes us the first project to architecturally implement DeepSeek V3's Multi-Head Latent Attention innovation in a systems programming language.
2025-06-11 22:15:00 +10:00
Triex
c24c4dc1eb docs: Update benchmarks 2025-06-11 21:24:34 +10:00
Triex
973933d974 docs: Add clear device notes 2025-06-11 19:47:35 +10:00
Alex Zarov
618ecfb0c9 docs: Update README.md 2025-06-11 19:45:50 +10:00
Triex
c8eefc8865 feat: BLAS integration working - significant matrix operation improvements
Matrix Performance Improvements:
- Apple Accelerate backend integrated and functional (see the sketch after this list)
- Matrix ops: 1004 GFLOPS (38.6% efficiency) on 1024×1024
- Significant speedup: 6418ms naive → 2.1ms BLAS
- Draft implementation with working acceleration
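
For context, a minimal sketch of what calling Apple Accelerate's CBLAS sgemm from Zig can look like; the binding and build flags here are assumptions, not the repo's actual wrapper:

```zig
const std = @import("std");

// Assumed direct binding to Accelerate's CBLAS; build on macOS with something like:
//   zig build-exe sgemm_sketch.zig -lc -framework Accelerate
const CblasRowMajor: c_int = 101;
const CblasNoTrans: c_int = 111;

extern "c" fn cblas_sgemm(
    order: c_int, trans_a: c_int, trans_b: c_int,
    m: c_int, n: c_int, k: c_int,
    alpha: f32, a: [*]const f32, lda: c_int,
    b: [*]const f32, ldb: c_int,
    beta: f32, c: [*]f32, ldc: c_int,
) void;

pub fn main() void {
    const n = 2;
    const a = [_]f32{ 1, 2, 3, 4 }; // 2x2, row-major
    const b = [_]f32{ 5, 6, 7, 8 };
    var c = [_]f32{ 0, 0, 0, 0 };
    // C = 1.0 * A * B + 0.0 * C
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, &a, n, &b, n, 0.0, &c, n);
    std.debug.print("C = {any}\n", .{c}); // expect values 19, 22, 43, 50
}
```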

Performance Results (Apple M1, debug build; cross-checked in the sketch after this list):
- Matrix 256×256: 0.1ms, 561 GFLOPS (21.6% efficiency)
- Matrix 512×512: 0.2ms, 1129 GFLOPS (43.4% efficiency)
- Matrix 1024×1024: 2.1ms, 1004 GFLOPS (38.6% efficiency)
- Matrix 2048×2048: 21.5ms, 799 GFLOPS (30.7% efficiency)
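
A quick arithmetic cross-check of the figures above: a 1024×1024 matmul is 2·N³ ≈ 2.15 GFLOP, so 2.1 ms works out to roughly 1 TFLOPS, and 6418 ms / 2.1 ms ≈ 3000× is where the speedup figure comes from:

```zig
const std = @import("std");

// Values copied from this commit message; only the arithmetic is new.
pub fn main() void {
    const n: f64 = 1024.0;
    const flops = 2.0 * n * n * n; // ~2.15e9 FLOPs for a 1024x1024 matmul
    const blas_ms = 2.1;
    const naive_ms = 6418.0;

    const gflops = flops / (blas_ms / 1000.0) / 1e9; // ~1022, in line with the ~1004 reported
    const speedup = naive_ms / blas_ms; // ~3056x naive -> BLAS
    const implied_peak = 1004.0 / 0.386; // ~2600 GFLOPS reference behind the "efficiency" column

    std.debug.print("gflops={d:.0} speedup={d:.0}x implied_peak={d:.0}\n", .{ gflops, speedup, implied_peak });
}
```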

System Integration:
- Memory bandwidth: 23.5 GB/s
- Access latency: 1.8ns
- Apple Silicon detection working
- BLAS backend selection functional

Web Layer Updates:
- Enhanced /health endpoint with BLAS status
- New /performance endpoint with benchmark data
- Module dependency conflicts resolved
- Hardware acceleration reporting

Implementation Status:
- Matrix operations now use BLAS acceleration
- Foundation ready for transformer development
- DeepSeek V3 model implementation next priority
- Experimental/draft status maintained

This represents significant progress in the experimental foundation - matrix operations now deliver good performance while maintaining the zero-deployment-complexity advantage of Zig.
2025-06-11 19:30:33 +10:00
Triex
7b81ea27d7 docs: Tidy root README, add hardware notes to experimental/README.md 2025-06-11 17:48:38 +10:00
Triex
0f980354f8 feat: Enhanced device detection handling, added initial Metal draft, theoretically reliable Metal Mac detection -> experimental implementation
Implemented initial Apple Silicon detection using sysctl system calls
Added proper M1/M2/M3/M4 generation detection via CPU brand string
Fixed memory leaks that occurred during development with proper allocator cleanup
Enhanced Metal backend foundation with device capabilities
Added `test_m_series.zig` for hardware verification

🔧 Key Technical Improvements:
- Real hardware detection via `hw.model` (e.g. `MacBookPro17,1`); see the sketch after this list
- CPU brand string parsing for accurate M-series identification
- Unified memory strategy detection (even under Rosetta)
- Apple Neural Engine capability detection
- Memory-safe device info structures
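
A minimal standalone sketch of the sysctl approach described above (hypothetical helper, not the repo's `test_m_series.zig`); build on macOS with `zig build-exe detect_sketch.zig -lc`:

```zig
const std = @import("std");

// sysctlbyname's real prototype uses void pointers; the types here are
// specialized for string values, which is all this sketch reads.
extern "c" fn sysctlbyname(
    name: [*:0]const u8,
    oldp: ?[*]u8,
    oldlenp: ?*usize,
    newp: ?[*]const u8,
    newlen: usize,
) c_int;

fn readSysctlString(name: [*:0]const u8, buf: []u8) ![]const u8 {
    var len: usize = buf.len;
    if (sysctlbyname(name, buf.ptr, &len, null, 0) != 0) return error.SysctlFailed;
    // For string values the returned length includes the trailing NUL.
    return buf[0 .. len -| 1];
}

pub fn main() !void {
    var model_buf: [256]u8 = undefined;
    const model = try readSysctlString("hw.model", &model_buf);
    std.debug.print("hw.model = {s}\n", .{model});

    var brand_buf: [256]u8 = undefined;
    const brand = try readSysctlString("machdep.cpu.brand_string", &brand_buf);
    // e.g. "Apple M1" / "Apple M1 Pro" -> parse generation and variant from this string.
    std.debug.print("cpu brand = {s}\n", .{brand});
}
```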

🧪 Verified on Apple Silicon:
- M1 correctly detected (generation 1, no variant)
- 16GB unified memory properly identified
- Builds cleanly with Zig `0.15.0-dev.703+597dd328e`
- No false positives for M1 Pro/Max/Ultra variants

📋 Updated README status to reflect experimental draft implementation
⚠️  Clearly marked as research/development foundation, not production ready
2025-06-11 17:43:04 +10:00
Alex Zarov
68c4c77600 docs: Update README.md 2025-06-08 22:46:46 +10:00
Triex
b1a862818a docs: Tidy README 2025-06-06 16:39:22 +10:00
Triex
31ef81000f feat: Migrate experimental implementation to modern Zig, achieve clean compilation (private repo dump -> /experimental)
- Port the HTTP server and relevant parts of core, etc., from the old API to Zig `0.15.0-dev` patterns
- Fix mutability, unused variables, and API compatibility issues
- Validate SIMD tensor operations and backend architecture
- Foundation now compiles cleanly and produces working binary
2025-06-06 15:31:21 +10:00
Triex
5ff856c018 docs: Tidy 2025-06-05 05:17:21 +10:00
Triex
90627c6d54 docs: Tidy spacing/line breaks 2025-06-05 04:11:55 +10:00
Triex
a7f0a5391b docs: Tidy 2025-06-05 04:10:56 +10:00
Triex
9c25e23773 docs: Improved README, add additional references 2025-06-05 04:08:59 +10:00
Triex
9aedaae1d5 docs: Tidy list items @ README 2025-06-04 11:41:41 +10:00
Triex
69c1bab49e docs: Further tidy initial proposal idea 2025-06-04 11:38:26 +10:00
Triex
e480e15e5f docs: Tidy README 2025-06-04 11:36:38 +10:00
Triex
5c57ab1f8c docs: Move why section up to top of root README proposal/architecture notes - more cohesive flow 2025-05-23 04:23:39 +10:00
Alex Zarov
21654d7511 docs: Tidy title 2025-05-23 04:09:48 +10:00
Triex
ef39c76b2d docs: Tidy title 2025-05-23 04:08:57 +10:00
Triex
59c80bf948 docs: Enhanced draft code, table of contents + copy 2025-05-23 03:54:47 +10:00
Triex
715d0d2e6d docs: Tidy introduction 2025-05-23 03:33:03 +10:00
Triex
3af7848785 docs: Initial architecture notes for Zig implementation 2025-05-23 03:29:53 +10:00
Huang Panpan
57d7bd45df Merge pull request #736 from shihaobai/main
Docs: add LightLLM as supported engine
2025-04-08 22:18:33 +08:00
Shixian Sheng
a5d2ad229e Update README.md 2025-03-26 08:58:35 -04:00
DeepSeekDDM
98e67a71f4 Update paper link 2025-03-16 23:41:52 +08:00
shihaobai
408e6e188a Update README.md
polish
2025-03-03 20:16:37 +08:00
shihaobai
73f2954fa8 polish 2025-03-03 20:10:18 +08:00
shihaobai
1ab09c8780 Docs: add LightLLM as supported engine 2025-03-03 19:23:08 +08:00
DeepSeekDDM
c9353aba6c Update bib info 2025-02-24 11:25:44 +08:00
Konano
f07bccc49e fix: resolve center alignment issue in preview 2025-02-14 12:12:16 +08:00
Konano
0866cab5f9 chore: update README.md to improve layout and image attributes 2025-02-14 12:02:10 +08:00
Konano
e15f67af1c chore: update README.md to improve layout and image attributes 2025-02-08 18:28:40 +08:00
Xingkai Yu
1d7d440461 Merge pull request #432 from luislh-dev/main
remove redundant asterisks in README
2025-02-05 16:53:53 +08:00
Xingkai Yu
09d108620a Merge pull request #440 from spenserblack/main
Add syntax highlighting to requirements code block
2025-02-05 16:50:03 +08:00
Xingkai Yu
d0f8c4fca3 Merge pull request #528 from WSL0809/main
Fix table bold formatting in TriviaQA EM comparison
2025-02-05 16:33:18 +08:00
luislopez-developer
97b35f1fca docs: remove redundant asterisks in note 2025-02-03 15:02:04 -05:00
wangsl
d5c08b384b Update README.md
fix(table): correct bold formatting for TriviaQA EM comparison

- Remove redundant bolding on LLaMA3.1 405B (82.7)
- Retain single bold style for DeepSeek-V3's highest score (82.9)
- Aligns with evaluation convention of highlighting only the best performance
2025-02-02 02:34:59 +08:00
Spenser Black
760d22821f Add syntax highlighting to requirements code block 2025-01-28 18:07:15 -05:00
Dhieu
6784e1976d Fix TOC links to correctly link to headings in Markdown 2025-01-28 17:14:35 +03:00
Dhieu
ddc501b80e Add table of contents to README 2025-01-27 14:18:17 +03:00
enoch kan
53d8dc9966 docs: Update system requirements with GitHub Markdown callout 2025-01-25 22:29:54 +00:00
enoch kan
722e6885ef docs: Improve system requirements section formatting 2025-01-25 22:26:48 +00:00
enoch kan
53b055bc1e docs: Add system requirements for DeepSeek-Infer demo 2025-01-25 22:21:51 +00:00
wangfuchun-fc
3779a89770 fix: fix readme doc typo. 2025-01-06 22:00:32 +08:00
Xingkai Yu
9b288b86cc Update README.md 2025-01-03 15:30:48 +08:00
kutt
21bc231f32 use alert formatting for notes in readme 2025-01-02 15:02:52 +01:00
Huang Panpan
1b8e18cc29 Merge pull request #21 from eltociear/patch-1
docs: update README.md
2024-12-30 15:03:30 +08:00
zhyncs
68d0061937 upd 2024-12-30 14:25:28 +08:00
zhyncs
2fc98d1cdf upd 2024-12-30 14:21:00 +08:00