Commit Graph

97 Commits

Author SHA1 Message Date
Alex Zarov
618ecfb0c9
docs: Update README.md 2025-06-11 19:45:50 +10:00
Alex Zarov
c4ca746a60
Merge pull request #4 from Triex:feat--Implement-dynamic-benchmark-summary-with-real-performance-metrics
feat: Implement dynamic benchmark summary with real performance metrics
2025-06-11 19:43:10 +10:00
Triex
18097ee5d3 feat: implement dynamic benchmark summary with real performance metrics
- Replace mocked performance estimates with actual measured results
- Add `BenchmarkResults` struct to collect live performance data during execution
- Implement an honest dynamic summary showing real GFLOPS, timing, and bandwidth
- Add a transparent performance assessment based only on measured values
- Identify and display peak performance (1160 GFLOPS measured at 512×512)
- Include real memory bandwidth (20.3 GB/s) and latency (1.8 ns) measurements
- Replace misleading static efficiency percentages with a live measurement system
- Clearly distinguish measured performance from theoretical estimates
- Provide actionable insights from Apple Accelerate backend performance

Result: a measured peak of 1160 GFLOPS with an honest assessment, replacing
misleading hardcoded comparisons with real benchmark data.
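A minimal Python sketch of the idea behind this change: record each measured run as it happens, then derive the summary (including the peak) only from those records, never from hardcoded estimates. The class, field, and method names here are hypothetical and do not reproduce the project's Zig `BenchmarkResults` struct.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MatrixResult:
    """One measured matrix-multiply run (hypothetical record layout)."""
    size: int        # square matrix dimension n (n x n)
    time_ms: float   # measured wall-clock time in milliseconds
    gflops: float    # measured throughput


@dataclass
class BenchmarkResults:
    """Collects live measurements; the summary is derived only from them."""
    matrix: list[MatrixResult] = field(default_factory=list)

    def record(self, size: int, time_ms: float, gflops: float) -> None:
        self.matrix.append(MatrixResult(size, time_ms, gflops))

    def peak(self) -> MatrixResult | None:
        # Highest measured throughput; None if nothing was recorded,
        # so no placeholder figure is ever reported.
        return max(self.matrix, key=lambda r: r.gflops, default=None)

    def summary(self) -> list[str]:
        lines = [
            f"{r.size}x{r.size}: {r.time_ms:.1f} ms, {r.gflops:.0f} GFLOPS (measured)"
            for r in self.matrix
        ]
        best = self.peak()
        if best is not None:
            lines.append(
                f"Peak: {best.gflops:.0f} GFLOPS at {best.size}x{best.size} (measured)"
            )
        return lines
```

For example, feeding in the debug-build rows from commit c8eefc8865 (`results.record(512, 0.2, 1129)` and so on) makes `peak()` select the 512×512 run, because the peak is computed from whatever was actually recorded.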
2025-06-11 19:41:51 +10:00
Alex Zarov
2c0ad7fe97
Merge pull request #3 from Triex:feat--BLAS-integration-working---significant-matrix-operation-improvements
feat: BLAS integration working - significant matrix operation improvements
2025-06-11 19:33:50 +10:00
Triex
c8eefc8865 feat: BLAS integration working - significant matrix operation improvements
Matrix Performance Improvements:
- Apple Accelerate backend integrated and functional
- Matrix ops: 1004 GFLOPS (38.6% efficiency) on 1024×1024
- Significant speedup: 6418ms naive → 2.1ms BLAS
- Draft implementation with working acceleration

Performance Results (Apple M1, debug build):
- Matrix 256×256: 0.1ms, 561 GFLOPS (21.6% efficiency)
- Matrix 512×512: 0.2ms, 1129 GFLOPS (43.4% efficiency)
- Matrix 1024×1024: 2.1ms, 1004 GFLOPS (38.6% efficiency)
- Matrix 2048×2048: 21.5ms, 799 GFLOPS (30.7% efficiency)
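The GFLOPS figures appear to follow the conventional dense matrix-multiply count of 2n³ floating-point operations divided by elapsed time, and the efficiency percentages are consistent with a reference peak of about 2600 GFLOPS (for example 1004 / 0.386 ≈ 2601). That peak value is inferred from these numbers, not stated in the commit. A small Python sketch of the arithmetic, with hypothetical function names:

```python
def matmul_gflops(n: int, time_ms: float) -> float:
    """Throughput of an n x n by n x n matrix multiply, assuming 2*n^3 flops."""
    flops = 2 * n**3                      # conventional count: n^2 outputs, ~n multiply-adds each
    return flops / (time_ms / 1e3) / 1e9  # ms -> s, then flop/s -> GFLOPS


def efficiency_pct(gflops: float, peak_gflops: float = 2600.0) -> float:
    """Percentage of an assumed theoretical peak (2600 GFLOPS is inferred, not documented)."""
    return 100.0 * gflops / peak_gflops


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms
```

`matmul_gflops(2048, 21.5)` gives about 799 GFLOPS and `efficiency_pct(799)` about 30.7%, matching the 2048×2048 row. For 1024×1024, the displayed 2.1ms yields roughly 1023 GFLOPS rather than 1004, most likely because the printed time is rounded. `speedup(6418, 2.1)` is about 3056×.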

System Integration:
- Memory bandwidth: 23.5 GB/s
- Access latency: 1.8ns
- Apple Silicon detection working
- BLAS backend selection functional

Web Layer Updates:
- Enhanced /health endpoint with BLAS status
- New /performance endpoint with benchmark data
- Module dependency conflicts resolved
- Hardware acceleration reporting

Implementation Status:
- Matrix operations now use BLAS acceleration
- Foundation ready for transformer development
- DeepSeek V3 model implementation next priority
- Experimental/draft status maintained

This represents significant progress on the experimental foundation: matrix operations now deliver good performance while keeping Zig's zero-deployment-complexity advantage.
2025-06-11 19:30:33 +10:00
Alex Zarov
24d94f7c21
Merge pull request #2 from Triex:feat-Enhanced-device-detection-handling-initial-metal
Feat-Enhanced-device-detection-handling-initial-metal
2025-06-11 17:50:54 +10:00
Triex
7b81ea27d7 docs: Tidy root README, add hardware notes to experimental/README.md 2025-06-11 17:48:38 +10:00
Triex
0f980354f8 feat: Enhanced device detection handling, added initial Metal draft, theoretically reliable Metal Mac detection -> experimental implementation
- Implemented initial Apple Silicon detection using sysctl system calls
- Added proper M1/M2/M3/M4 generation detection via CPU brand string
- Fixed memory leaks that occurred during development with proper allocator cleanup
- Enhanced Metal backend foundation with device capabilities
- Added `test_m_series.zig` for hardware verification

🔧 Key Technical Improvements:
- Real hardware detection via `hw.model` (e.g. `MacBookPro17,1`)
- CPU brand string parsing for accurate M-series identification
- Unified memory strategy detection (even under Rosetta)
- Apple Neural Engine capability detection
- Memory-safe device info structures
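To illustrate the two detection inputs listed above, here is a Python sketch that parses a model identifier such as `MacBookPro17,1` (the `hw.model` value) and an M-series CPU brand string. On macOS these strings can be read with `sysctl -n hw.model` and `sysctl -n machdep.cpu.brand_string`. The function names, the assumed `Apple M<gen>[ <variant>]` brand-string shape, and the variant list are assumptions for illustration, not the project's Zig code.

```python
import re
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    family: str   # e.g. "MacBookPro"
    major: int
    minor: int


class MSeriesChip(NamedTuple):
    generation: int          # 1 for M1, 2 for M2, ...
    variant: Optional[str]   # "Pro", "Max", "Ultra", or None for the base chip


# "MacBookPro17,1" -> family letters, major number, comma, minor number.
_MODEL_RE = re.compile(r"^([A-Za-z]+)(\d+),(\d+)$")
# Assumed brand-string shape: "Apple M<generation>" plus an optional variant.
_BRAND_RE = re.compile(r"^Apple M(\d+)(?: (Pro|Max|Ultra))?$")


def parse_hw_model(value: str) -> Optional[ModelId]:
    """Split a hw.model identifier into family and version numbers."""
    m = _MODEL_RE.match(value.strip())
    if m is None:
        return None
    return ModelId(m.group(1), int(m.group(2)), int(m.group(3)))


def parse_brand_string(value: str) -> Optional[MSeriesChip]:
    """Return None for anything that is not an exact M-series match,
    so a non-Apple brand string never yields a false positive."""
    m = _BRAND_RE.match(value.strip())
    if m is None:
        return None
    return MSeriesChip(int(m.group(1)), m.group(2))
```

Anchoring the pattern at both ends means a base `Apple M1` is never reported as a Pro/Max/Ultra variant and vice versa.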

🧪 Verified on Apple Silicon:
- M1 correctly detected (generation 1, no variant)
- 16GB unified memory properly identified
- Builds cleanly with Zig `0.15.0-dev.703+597dd328e`
- No false positives for M1 Pro/Max/Ultra variants

📋 Updated README status to reflect experimental draft implementation
⚠️ Clearly marked as research/development foundation, not production ready
2025-06-11 17:43:04 +10:00
Alex Zarov
68c4c77600
docs: Update README.md 2025-06-08 22:46:46 +10:00
Triex
b1a862818a docs: Tidy README 2025-06-06 16:39:22 +10:00
Triex
bcee49badf docs: Tidy experimental README 2025-06-06 16:03:51 +10:00
Triex
8aa2785fad docs: Tidy experimental README status section 2025-06-06 16:00:24 +10:00
Triex
16fec1d4e9 docs: Update experimental README to reflect current state / performance 2025-06-06 15:58:39 +10:00
Triex
b1c1f2c07f feat: Update bench + core/root -> latest dev Zig ver 2025-06-06 15:55:07 +10:00
Triex
24b5fcfd02 docs: Tidy experimental/README.md 2025-06-06 15:48:21 +10:00
Alex Zarov
1d7855c807
Merge pull request #1 from Triex:feat--Migrate-experimental-implementation-to-modern-Zig,-achieve-clean-compilation
Feat--Migrate-experimental-implementation-to-modern-Zig,-achieve-clean-compilation
2025-06-06 15:34:31 +10:00
Triex
31ef81000f feat: Migrate experimental implementation to modern Zig, achieve clean compilation (private repo dump -> /experimental)
- Port the HTTP server and relevant parts of core from the old API to Zig `0.15.0-dev` patterns
- Fix mutability, unused variables, and API compatibility issues
- Validate SIMD tensor operations and backend architecture
- Foundation now compiles cleanly and produces working binary
2025-06-06 15:31:21 +10:00
Triex
5ff856c018 docs: Tidy 2025-06-05 05:17:21 +10:00
Triex
90627c6d54 docs: Tidy spacing/line breaks 2025-06-05 04:11:55 +10:00
Triex
a7f0a5391b docs: Tidy 2025-06-05 04:10:56 +10:00
Triex
9c25e23773 docs: Improved README, add additional references 2025-06-05 04:08:59 +10:00
Triex
9aedaae1d5 docs: Tidy list items @ README 2025-06-04 11:41:41 +10:00
Triex
69c1bab49e docs: Further tidy initial proposal idea 2025-06-04 11:38:26 +10:00
Triex
e480e15e5f docs: Tidy README 2025-06-04 11:36:38 +10:00
Triex
5c57ab1f8c docs: Move why section up to top of root README proposal/architecture notes - more cohesive flow 2025-05-23 04:23:39 +10:00
Alex Zarov
21654d7511
docs: Tidy title 2025-05-23 04:09:48 +10:00
Triex
ef39c76b2d docs: Tidy title 2025-05-23 04:08:57 +10:00
Triex
59c80bf948 docs: Enhanced draft code, table of contents + copy 2025-05-23 03:54:47 +10:00
Triex
715d0d2e6d docs: Tidy introduction 2025-05-23 03:33:03 +10:00
Triex
3af7848785 docs: Initial architecture notes for Zig implementation 2025-05-23 03:29:53 +10:00
Triex
a1895012dd feat: Initial MacBook optimisation draft for DeepSeek V3 inference > moving to Zig instead 2025-05-23 01:53:02 +10:00
Xingkai Yu
4cc6253d5c
Merge pull request #666 from codinglover222/deepseek-doc-fix
fix an args description.
2025-04-09 09:50:40 +08:00
Huang Panpan
57d7bd45df
Merge pull request #736 from shihaobai/main
Docs: add LightLLM as supported engine
2025-04-08 22:18:33 +08:00
Xingkai Yu
88d6547df2
Merge pull request #816 from KPCOFGS/main
Update README.md
2025-04-08 17:27:09 +08:00
Xingkai Yu
741b06ebca
Merge pull request #720 from xiaokongkong/main
modify the explanation of MLA
2025-04-08 17:20:37 +08:00
Shixian Sheng
a5d2ad229e
Update README.md 2025-03-26 08:58:35 -04:00
DeepSeekDDM
a878eada08
Delete DeepSeek_V3.pdf 2025-03-16 23:42:21 +08:00
DeepSeekDDM
98e67a71f4
Update paper link 2025-03-16 23:41:52 +08:00
shihaobai
408e6e188a
Update README.md
polish
2025-03-03 20:16:37 +08:00
shihaobai
73f2954fa8 polish 2025-03-03 20:10:18 +08:00
shihaobai
1ab09c8780 Docs: add LightLLM as supported engine 2025-03-03 19:23:08 +08:00
huxuedan
d29a967601 modify the explanation of MLA 2025-02-26 17:07:39 +08:00
DeepSeekDDM
592fd5daf8
Delete CITATION.cff 2025-02-24 11:50:20 +08:00
DeepSeekDDM
c9353aba6c
Update bib info 2025-02-24 11:25:44 +08:00
Huang Panpan
f09f5fa321
Merge pull request #616 from Konano/chore-readme
chore: update README.md to improve layout
2025-02-18 18:04:06 +08:00
oyzh
4a65fd9221 fix an args description. 2025-02-15 11:02:28 +08:00
Xingkai Yu
1398800ebf
fix scores mask 2025-02-14 20:26:45 +08:00
Konano
f07bccc49e
fix: resolve center alignment issue in preview 2025-02-14 12:12:16 +08:00
Konano
0866cab5f9
chore: update README.md to improve layout and image attributes 2025-02-14 12:02:10 +08:00
Konano
e15f67af1c
chore: update README.md to improve layout and image attributes 2025-02-08 18:28:40 +08:00