Commit Graph

69 Commits

Author SHA1 Message Date
Gabriel Caetano
7b07cc985c
Merge a3f30dc59c into 57d7bd45df 2025-04-08 22:46:27 -03:00
Gabriel Caetano
a3f30dc59c
Merge branch 'main' into main 2025-04-08 22:46:24 -03:00
Gabriel Caetano
a7bab5c920 Clean up and optimize Triton FP8 kernels
- Improved readability and structure of Triton kernels for FP8 weight dequantization and matrix multiplication (GEMM)
- Added comments for clarity
- Replaced hardcoded block sizes with configurable parameters
- Improved safety using tl.cdiv and masking
- Renamed variables and ensured consistency in naming
2025-04-08 22:33:48 -03:00
Huang Panpan
57d7bd45df
Merge pull request #736 from shihaobai/main
Docs: add LightLLM as supported engine
2025-04-08 22:18:33 +08:00
Xingkai Yu
88d6547df2
Merge pull request #816 from KPCOFGS/main
Update README.md
2025-04-08 17:27:09 +08:00
Xingkai Yu
741b06ebca
Merge pull request #720 from xiaokongkong/main
modify the explanation of MLA
2025-04-08 17:20:37 +08:00
Shixian Sheng
a5d2ad229e
Update README.md 2025-03-26 08:58:35 -04:00
DeepSeekDDM
a878eada08
Delete DeepSeek_V3.pdf 2025-03-16 23:42:21 +08:00
DeepSeekDDM
98e67a71f4
Update paper link 2025-03-16 23:41:52 +08:00
shihaobai
408e6e188a
Update README.md
polish
2025-03-03 20:16:37 +08:00
shihaobai
73f2954fa8 polish 2025-03-03 20:10:18 +08:00
shihaobai
1ab09c8780 Docs: add LightLLM as supported engine 2025-03-03 19:23:08 +08:00
huxuedan
d29a967601 modify the explanation of MLA 2025-02-26 17:07:39 +08:00
DeepSeekDDM
592fd5daf8
Delete CITATION.cff 2025-02-24 11:50:20 +08:00
DeepSeekDDM
c9353aba6c
Update bib info 2025-02-24 11:25:44 +08:00
Huang Panpan
f09f5fa321
Merge pull request #616 from Konano/chore-readme
chore: update README.md to improve layout
2025-02-18 18:04:06 +08:00
Xingkai Yu
1398800ebf
fix scores mask 2025-02-14 20:26:45 +08:00
Konano
f07bccc49e
fix: resolve center alignment issue in preview 2025-02-14 12:12:16 +08:00
Konano
0866cab5f9
chore: update README.md to improve layout and image attributes 2025-02-14 12:02:10 +08:00
Konano
e15f67af1c
chore: update README.md to improve layout and image attributes 2025-02-08 18:28:40 +08:00
Huang Panpan
2f7b80eece
Merge pull request #611 from Konano/chore-stale
chore: add stale issue management configuration
2025-02-08 16:10:06 +08:00
Konano
76d8d39560
chore: add stale issue management configuration 2025-02-08 15:12:09 +08:00
Xingkai Yu
5ee97a83f0
fix comment 2025-02-07 16:42:55 +08:00
Xingkai Yu
1d7d440461
Merge pull request #432 from luislh-dev/main
remove redundant asterisks in README
2025-02-05 16:53:53 +08:00
Xingkai Yu
09d108620a
Merge pull request #440 from spenserblack/main
Add syntax highlighting to requirements code block
2025-02-05 16:50:03 +08:00
Xingkai Yu
d0f8c4fca3
Merge pull request #528 from WSL0809/main
Fix table bold formatting in TriviaQA EM comparison
2025-02-05 16:33:18 +08:00
Xingkai Yu
87a01053e4
Merge pull request #556 from XxAlonexX/main
Fix Linear Layer Bias Initialization
2025-02-05 16:23:02 +08:00
Huang Panpan
a157077c61
Merge pull request #408 from fitzjalen/refactor
Clarify assertion errors
2025-02-05 12:03:02 +08:00
Huang Panpan
c32c957fb0
Merge pull request #364 from Dhie-boop/feature/table-of-content
Add table of contents to README for better navigation
2025-02-05 11:39:08 +08:00
XxAlonexX
6a30b43249 Fix Linear Layer Bias Initialization 2025-02-04 10:38:45 +05:30
luislopez-developer
97b35f1fca docs: remove redundant asterisks in note 2025-02-03 15:02:04 -05:00
wangsl
d5c08b384b
Update README.md
fix(table): correct bold formatting for TriviaQA EM comparison

- Remove redundant bolding on LLaMA3.1 405B (82.7)
- Retain single bold style for DeepSeek-V3's highest score (82.9)
- Aligns with evaluation convention of highlighting only the best performance
2025-02-02 02:34:59 +08:00
Gabriel Caetano
61790e1653 Update 2
Here are the improvements made to the code for your commit message:

Refactored init_distributed function: Extracted distributed setup logic into a separate function.
Updated sample function: Replaced exponential approach with torch.multinomial for sampling.
Improved argument validation: Replaced assert with a more user-friendly validation in main to ensure at least one parameter (input-file or interactive) is provided.
Refactored interactive mode logic: Maintained user interaction logic but moved init_distributed call to the beginning of main.
2025-01-31 19:33:00 -03:00
Gabriel Caetano
89882a94f6 Change
Changes:

init_distributed function: Extracted the distributed setup logic into a separate function.
sample function: Modified it to use torch.multinomial instead of an exponentiation-based approach for sampling.
Argument Validation: Replaced the assert with a more user-friendly validation in main to ensure that at least one of the parameters (input-file or interactive) is provided.
Interactive Code Refactoring: The user interaction logic was kept, but the init_distributed function is now called separately at the beginning of main.
2025-01-30 22:47:39 -03:00
Spenser Black
760d22821f
Add syntax highlighting to requirements code block 2025-01-28 18:07:15 -05:00
Dhieu
6784e1976d Fix TOC links to correctly link to headings in Markdown 2025-01-28 17:14:35 +03:00
Roman Fitzjalen
2756e130c2 clarify assertion error 2025-01-28 13:16:54 +01:00
Dhieu
ddc501b80e Add table of contents to README 2025-01-27 14:18:17 +03:00
Huang Panpan
b5d872ead0
Merge pull request #341 from enochkan/main
docs: Add system requirements for DeepSeek-Infer demo
2025-01-26 09:29:50 +08:00
enoch kan
53d8dc9966 docs: Update system requirements with GitHub Markdown callout 2025-01-25 22:29:54 +00:00
enoch kan
722e6885ef docs: Improve system requirements section formatting 2025-01-25 22:26:48 +00:00
enoch kan
53b055bc1e docs: Add system requirements for DeepSeek-Infer demo 2025-01-25 22:21:51 +00:00
Xingkai Yu
ee4c4ea32b
Merge pull request #234 from wangfuchun-fc/patch-1
fix: fix readme doc typo.
2025-01-07 17:53:28 +08:00
Huang Panpan
25109d2ccd
Merge pull request #230 from jacksonpradolima/main
Add CITATION.cff to provide citation metadata
2025-01-07 14:05:15 +08:00
Huang Panpan
fdbd5be754
Merge pull request #193 from enochkan/main
Add docstrings to functions in inference modules for better clarity
2025-01-07 14:02:11 +08:00
wangfuchun-fc
3779a89770
fix: fix readme doc typo. 2025-01-06 22:00:32 +08:00
Jackson Antonio do Prado Lima
c070549279 Add CITATION.cff to provide citation metadata
This file includes detailed citation information for the DeepSeek-V3 project, such as authors, DOI, license, and key project details. It enables users to properly cite the work and promotes better academic and professional attribution.
2025-01-05 21:46:37 -03:00
enoch kan
bc77f22afc Updated model.py docstrings 2025-01-05 18:24:31 +00:00
enoch kan
a1296f099e Enhance documentation and update .gitignore for model conversion scripts 2025-01-05 18:18:18 +00:00
GeeeekExplorer
fd011c11aa torch rmsnorm 2025-01-05 14:33:48 +08:00
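
Note on commit a7bab5c920: its message mentions improved safety "using tl.cdiv and masking" in the Triton FP8 kernels. The sketch below illustrates that general pattern in plain Python rather than Triton, so it runs without a GPU. It is not the code from that commit. The names cdiv, scale_blockwise, and block_size are hypothetical and chosen only for illustration. The idea: launch enough fixed-size blocks to cover the input using ceiling division, then mask off the lanes of the last block that fall past the end of the data.

```python
def cdiv(n: int, block: int) -> int:
    """Ceiling division: how many blocks of size `block` are needed to cover n items."""
    return (n + block - 1) // block


def scale_blockwise(values, scale, block_size=4):
    """Multiply every element by `scale`, processing one fixed-size block at a time.

    Plain-Python stand-in for a blockwise GPU kernel (hypothetical helper,
    not taken from the repository).
    """
    n = len(values)
    out = [0.0] * n
    # Ceiling division guarantees the final, partially filled block is launched too.
    for pid in range(cdiv(n, block_size)):
        offsets = [pid * block_size + i for i in range(block_size)]
        # The mask is False for lanes that would read or write past the end.
        mask = [off < n for off in offsets]
        for off, valid in zip(offsets, mask):
            if valid:  # masked-off lanes are skipped entirely
                out[off] = values[off] * scale
    return out


if __name__ == "__main__":
    print(cdiv(10, 4))
    print(scale_blockwise([1.0, 2.0, 3.0, 4.0, 5.0], 2.0, block_size=4))
```

With plain floor division, 10 elements and a block size of 4 would give only 2 blocks and leave the last 2 elements unprocessed. Without the mask, the third block would index past the end of the input.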