DeepSeek-V3

Mirror of https://github.com/deepseek-ai/DeepSeek-V3.git, synced 2025-02-23 14:18:57 -05:00

Author	SHA1	Message	Date
XxAlonexX	79d72ecd8d	Optimize Multi-head Latent Attention (MLA) with Fast Path for Short Sequences	2025-02-19 10:35:28 +05:30
XxAlonexX	f8b7c3b6e7	Merge branch 'main' of github.com:XxAlonexX/DeepSeek-V3	2025-02-19 10:32:29 +05:30
XxAlonexX	cc66d60c67	Optimize Multi-head Latent Attention (MLA) for Short Sequences	2025-02-19 10:31:28 +05:30
Xingkai Yu	1398800ebf	fix scores mask	2025-02-14 20:26:45 +08:00
Xingkai Yu	5ee97a83f0	fix comment	2025-02-07 16:42:55 +08:00
Xingkai Yu	87a01053e4	Merge pull request #556 from XxAlonexX/main Fix Linear Layer Bias Initialization	2025-02-05 16:23:02 +08:00
XxAlonexX	6a30b43249	Fix Linear Layer Bias Initialization	2025-02-04 10:38:45 +05:30
Roman Fitzjalen	2756e130c2	clarify assertion error	2025-01-28 13:16:54 +01:00
enoch kan	bc77f22afc	Updated model.py docstrings	2025-01-05 18:24:31 +00:00
GeeeekExplorer	fd011c11aa	torch rmsnorm	2025-01-05 14:33:48 +08:00
stack-heap-overflow	4c2fdb8f55	Release DeepSeek-V3	2024-12-26 19:01:57 +08:00