Commit Graph

45 Commits

Author SHA1 Message Date
Benjamin Winkler
43cffb1ae3
Minor grammatical tense corrections to README.md
Minor changes to correct grammatical tense for activities that took place in the past.
2025-02-07 01:01:40 -05:00
Xingkai Yu
1d7d440461
Merge pull request #432 from luislh-dev/main
remove redundant asterisks in README
2025-02-05 16:53:53 +08:00
Xingkai Yu
09d108620a
Merge pull request #440 from spenserblack/main
Add syntax highlighting to requirements code block
2025-02-05 16:50:03 +08:00
Xingkai Yu
d0f8c4fca3
Merge pull request #528 from WSL0809/main
Fix table bold formatting in TriviaQA EM comparison
2025-02-05 16:33:18 +08:00
Xingkai Yu
87a01053e4
Merge pull request #556 from XxAlonexX/main
Fix Linear Layer Bias Initialization
2025-02-05 16:23:02 +08:00
Huang Panpan
a157077c61
Merge pull request #408 from fitzjalen/refactor
Clarify assertion errors
2025-02-05 12:03:02 +08:00
Huang Panpan
c32c957fb0
Merge pull request #364 from Dhie-boop/feature/table-of-content
Add table of contents to README for better navigation
2025-02-05 11:39:08 +08:00
XxAlonexX
6a30b43249
Fix Linear Layer Bias Initialization
2025-02-04 10:38:45 +05:30
luislopez-developer
97b35f1fca
docs: remove redundant asterisks in note
2025-02-03 15:02:04 -05:00
wangsl
d5c08b384b
Update README.md
fix(table): correct bold formatting for TriviaQA EM comparison

- Remove redundant bolding on LLaMA3.1 405B (82.7)
- Retain single bold style for DeepSeek-V3's highest score (82.9)
- Aligns with evaluation convention of highlighting only the best performance
2025-02-02 02:34:59 +08:00
Spenser Black
760d22821f
Add syntax highlighting to requirements code block
2025-01-28 18:07:15 -05:00
Dhieu
6784e1976d
Fix TOC links to correctly link to headings in Markdown
2025-01-28 17:14:35 +03:00
Roman Fitzjalen
2756e130c2
clarify assertion error
2025-01-28 13:16:54 +01:00
Dhieu
ddc501b80e
Add table of contents to README
2025-01-27 14:18:17 +03:00
Huang Panpan
b5d872ead0
Merge pull request #341 from enochkan/main
docs: Add system requirements for DeepSeek-Infer demo
2025-01-26 09:29:50 +08:00
enoch kan
53d8dc9966
docs: Update system requirements with GitHub Markdown callout
2025-01-25 22:29:54 +00:00
enoch kan
722e6885ef
docs: Improve system requirements section formatting
2025-01-25 22:26:48 +00:00
enoch kan
53b055bc1e
docs: Add system requirements for DeepSeek-Infer demo
2025-01-25 22:21:51 +00:00
Xingkai Yu
ee4c4ea32b
Merge pull request #234 from wangfuchun-fc/patch-1
fix: fix readme doc typo.
2025-01-07 17:53:28 +08:00
Huang Panpan
25109d2ccd
Merge pull request #230 from jacksonpradolima/main
Add CITATION.cff to provide citation metadata
2025-01-07 14:05:15 +08:00
Huang Panpan
fdbd5be754
Merge pull request #193 from enochkan/main
Add docstrings to functions in inference modules for better clarity
2025-01-07 14:02:11 +08:00
wangfuchun-fc
3779a89770
fix: fix readme doc typo.
2025-01-06 22:00:32 +08:00
Jackson Antonio do Prado Lima
c070549279
Add CITATION.cff to provide citation metadata
This file includes detailed citation information for the DeepSeek-V3 project, such as authors, DOI, license, and key project details. It enables users to properly cite the work and promotes better academic and professional attribution.
2025-01-05 21:46:37 -03:00
enoch kan
bc77f22afc
Updated model.py docstrings
2025-01-05 18:24:31 +00:00
enoch kan
a1296f099e
Enhance documentation and update .gitignore for model conversion scripts
2025-01-05 18:18:18 +00:00
GeeeekExplorer
fd011c11aa
torch rmsnorm
2025-01-05 14:33:48 +08:00
Xingkai Yu
9b288b86cc
Update README.md
2025-01-03 15:30:48 +08:00
Huang Panpan
0d16ea24c8
Merge pull request #206 from kutt/patch-1
use alert formatting for notes in readme
2025-01-03 09:48:03 +08:00
kutt
21bc231f32
use alert formatting for notes in readme
2025-01-02 15:02:52 +01:00
Xingkai Yu
8710ec2ecb
require model-parallel in convert.py
2024-12-31 18:05:55 +08:00
Huang Panpan
7c2466b310
Update issue templates
2024-12-31 14:49:05 +08:00
Huang Panpan
1b8e18cc29
Merge pull request #21 from eltociear/patch-1
docs: update README.md
2024-12-30 15:03:30 +08:00
Haswell Iris
94410f8d58
Merge pull request #33 from zhyncs/main
docs: update SGLang usage
2024-12-30 14:37:38 +08:00
zhyncs
68d0061937
upd
2024-12-30 14:25:28 +08:00
zhyncs
2fc98d1cdf
upd
2024-12-30 14:21:00 +08:00
zhyncs
a1edf4138e
upd
2024-12-30 14:18:00 +08:00
zhyncs
8638950ec2
docs: update SGLang usage
2024-12-30 14:13:27 +08:00
DeepSeekDDM
83dd18eda4
Update README.md
add citation format to the arxiv-version paper
2024-12-30 11:04:14 +08:00
Ikko Eltociear Ashimine
710c8b8b6e
docs: update README.md
HuggingFace -> Hugging Face
2024-12-29 00:43:11 +09:00
Yang Wang
8f1c9488b5
handle missing scale_inv_name (#2)
* handle missing scale_inv_name

Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not in the same SafeTensor, causing an assertion error due to scale_inv_name not being in the state_dict.

* sort filename to reduce memory costs

* Add CUDA cache clearing in memory management

Added torch.cuda.empty_cache() to free up unused memory on the GPU,
2024-12-27 09:34:38 +08:00
Huang Panpan
c8087bd8b8
Merge pull request #9 from simon-mo/vllm
Docs: add vLLM as supported engine
2024-12-27 09:16:09 +08:00
simon-mo
e2c15caf04
add version
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:11:31 -08:00
simon-mo
cf47874d8e
Docs: add vLLM as supported engine
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:10:33 -08:00
stack-heap-overflow
4c2fdb8f55
Release DeepSeek-V3
2024-12-26 19:01:57 +08:00
stack-heap-overflow
4b58dc6bfc
Initial commit
2024-12-26 17:52:41 +08:00