Commit Graph

47 Commits

Author SHA1 Message Date
Konano
e15f67af1c
chore: update README.md to improve layout and image attributes 2025-02-08 18:28:40 +08:00
Konano
76d8d39560
chore: add stale issue management configuration 2025-02-08 15:12:09 +08:00
Xingkai Yu
5ee97a83f0
fix comment 2025-02-07 16:42:55 +08:00
Xingkai Yu
1d7d440461
Merge pull request #432 from luislh-dev/main
remove redundant asterisks in README
2025-02-05 16:53:53 +08:00
Xingkai Yu
09d108620a
Merge pull request #440 from spenserblack/main
Add syntax highlighting to requirements code block
2025-02-05 16:50:03 +08:00
Xingkai Yu
d0f8c4fca3
Merge pull request #528 from WSL0809/main
Fix table bold formatting in TriviaQA EM comparison
2025-02-05 16:33:18 +08:00
Xingkai Yu
87a01053e4
Merge pull request #556 from XxAlonexX/main
Fix Linear Layer Bias Initialization
2025-02-05 16:23:02 +08:00
Huang Panpan
a157077c61
Merge pull request #408 from fitzjalen/refactor
Clarify assertion errors
2025-02-05 12:03:02 +08:00
Huang Panpan
c32c957fb0
Merge pull request #364 from Dhie-boop/feature/table-of-content
Add table of contents to README for better navigation
2025-02-05 11:39:08 +08:00
XxAlonexX
6a30b43249 Fix Linear Layer Bias Initialization 2025-02-04 10:38:45 +05:30
luislopez-developer
97b35f1fca docs: remove redundant asterisks in note 2025-02-03 15:02:04 -05:00
wangsl
d5c08b384b
Update README.md
fix(table): correct bold formatting for TriviaQA EM comparison

- Remove redundant bolding on LLaMA3.1 405B (82.7)
- Retain single bold style for DeepSeek-V3's highest score (82.9)
- Aligns with evaluation convention of highlighting only the best performance
2025-02-02 02:34:59 +08:00
Spenser Black
760d22821f
Add syntax highlighting to requirements code block 2025-01-28 18:07:15 -05:00
Dhieu
6784e1976d Fix TOC links to correctly link to headings in Markdown 2025-01-28 17:14:35 +03:00
Roman Fitzjalen
2756e130c2 clarify assertion error 2025-01-28 13:16:54 +01:00
Dhieu
ddc501b80e Add table of contents to README 2025-01-27 14:18:17 +03:00
Huang Panpan
b5d872ead0
Merge pull request #341 from enochkan/main
docs: Add system requirements for DeepSeek-Infer demo
2025-01-26 09:29:50 +08:00
enoch kan
53d8dc9966 docs: Update system requirements with GitHub Markdown callout 2025-01-25 22:29:54 +00:00
enoch kan
722e6885ef docs: Improve system requirements section formatting 2025-01-25 22:26:48 +00:00
enoch kan
53b055bc1e docs: Add system requirements for DeepSeek-Infer demo 2025-01-25 22:21:51 +00:00
Xingkai Yu
ee4c4ea32b
Merge pull request #234 from wangfuchun-fc/patch-1
fix: fix readme doc typo.
2025-01-07 17:53:28 +08:00
Huang Panpan
25109d2ccd
Merge pull request #230 from jacksonpradolima/main
Add CITATION.cff to provide citation metadata
2025-01-07 14:05:15 +08:00
Huang Panpan
fdbd5be754
Merge pull request #193 from enochkan/main
Add docstrings to functions in inference modules for better clarity
2025-01-07 14:02:11 +08:00
wangfuchun-fc
3779a89770
fix: fix readme doc typo. 2025-01-06 22:00:32 +08:00
Jackson Antonio do Prado Lima
c070549279 Add CITATION.cff to provide citation metadata
This file includes detailed citation information for the DeepSeek-V3 project, such as authors, DOI, license, and key project details. It enables users to properly cite the work and promotes better academic and professional attribution.
2025-01-05 21:46:37 -03:00
enoch kan
bc77f22afc Updated model.py docstrings 2025-01-05 18:24:31 +00:00
enoch kan
a1296f099e Enhance documentation and update .gitignore for model conversion scripts 2025-01-05 18:18:18 +00:00
GeeeekExplorer
fd011c11aa torch rmsnorm 2025-01-05 14:33:48 +08:00
Xingkai Yu
9b288b86cc
Update README.md 2025-01-03 15:30:48 +08:00
Huang Panpan
0d16ea24c8
Merge pull request #206 from kutt/patch-1
use alert formatting for notes in readme
2025-01-03 09:48:03 +08:00
kutt
21bc231f32
use alert formatting for notes in readme 2025-01-02 15:02:52 +01:00
Xingkai Yu
8710ec2ecb
require model-parallel in convert.py 2024-12-31 18:05:55 +08:00
Huang Panpan
7c2466b310
Update issue templates 2024-12-31 14:49:05 +08:00
Huang Panpan
1b8e18cc29
Merge pull request #21 from eltociear/patch-1
docs: update README.md
2024-12-30 15:03:30 +08:00
Haswell Iris
94410f8d58
Merge pull request #33 from zhyncs/main
docs: update SGLang usage
2024-12-30 14:37:38 +08:00
zhyncs
68d0061937 upd 2024-12-30 14:25:28 +08:00
zhyncs
2fc98d1cdf upd 2024-12-30 14:21:00 +08:00
zhyncs
a1edf4138e upd 2024-12-30 14:18:00 +08:00
zhyncs
8638950ec2 docs: update SGLang usage 2024-12-30 14:13:27 +08:00
DeepSeekDDM
83dd18eda4
Update README.md
add citation format to the arxiv-version paper
2024-12-30 11:04:14 +08:00
Ikko Eltociear Ashimine
710c8b8b6e
docs: update README.md
HuggingFace -> Hugging Face
2024-12-29 00:43:11 +09:00
Yang Wang
8f1c9488b5
handle missing scale_inv_name (#2)
* handle missing scale_inv_name

Fixed an issue where `weight` and `weight_scale_inv` (e.g. `model.layers.39.mlp.experts.92.gate_proj.weight` and `model.layers.39.mlp.experts.92.gate_proj.weight_scale_inv`) were not in the same SafeTensor, causing an assertion error due to scale_inv_name not being in the state_dict.

* sort filename to reduce memory costs

* Add CUDA cache clearing in memory management

Added torch.cuda.empty_cache() to free up unused memory on the GPU.
2024-12-27 09:34:38 +08:00
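The fix described in commit 8f1c9488b5 can be illustrated with a small sketch. This is not the code from that commit. It is a minimal pure-Python illustration that assumes each shard is a plain dict of name to value, that a `weight_map` index (name to shard file) is available, and that a caller-supplied `load_shard` function reads a shard. The names `find_tensor`, `pair_weights_with_scales`, `weight_map`, `load_shard`, and `max_cached` are all hypothetical.

```python
from collections import OrderedDict


def find_tensor(name, current_shard, weight_map, load_shard, cache, max_cached=2):
    """Return the entry called `name`, falling back to other shards if needed."""
    if name in current_shard:
        return current_shard[name]
    shard_file = weight_map.get(name)
    if shard_file is None:
        # No scale exists for this weight; the caller decides what to do.
        return None
    if shard_file not in cache:
        cache[shard_file] = load_shard(shard_file)
        # Keep only the most recently loaded shards to bound memory use.
        # Real GPU code might also call torch.cuda.empty_cache() here.
        while len(cache) > max_cached:
            cache.popitem(last=False)
    return cache[shard_file][name]


def pair_weights_with_scales(shard_files, weight_map, load_shard):
    """Pair every weight with its `_scale_inv` entry, even across shards."""
    cache = OrderedDict()
    pairs = {}
    # Sorted order means a weight and its scale usually sit in the same or a
    # neighbouring shard, so few extra shards need to be held at once.
    for shard_file in sorted(shard_files):
        shard = load_shard(shard_file)
        for name, value in shard.items():
            if name.endswith("_scale_inv"):
                continue  # scales are consumed together with their weight
            scale = find_tensor(name + "_scale_inv", shard, weight_map,
                                load_shard, cache)
            pairs[name] = (value, scale)
    return pairs
```

The point of the sketch is the fallback lookup: instead of asserting that the scale lives in the same shard as its weight, the lookup consults the index and loads the other shard on demand.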
Huang Panpan
c8087bd8b8
Merge pull request #9 from simon-mo/vllm
Docs: add vLLM as supported engine
2024-12-27 09:16:09 +08:00
simon-mo
e2c15caf04 add version
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:11:31 -08:00
simon-mo
cf47874d8e Docs: add vLLM as supported engine
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-12-26 17:10:33 -08:00
stack-heap-overflow
4c2fdb8f55 Release DeepSeek-V3 2024-12-26 19:01:57 +08:00
stack-heap-overflow
4b58dc6bfc
Initial commit 2024-12-26 17:52:41 +08:00