This commit is contained in:
Juan Pablo Valencia 2025-04-09 09:51:10 +08:00 committed by GitHub
commit 9108724bca

@@ -143,8 +143,8 @@ def linear(x: torch.Tensor, weight: torch.Tensor, bias: Optional[torch.Tensor] =
     quantization-aware computations depending on the input parameters.
     Notes:
-        - If `weight` is quantized (e.g., `element_size() == 1`), a dequantized version
-          is used for computation.
+        - If `weight` is in a higher precision format (e.g., float32 or bfloat16), then `element_size() > 1`, and the original
+          weight tensor is used for computation.
         - If `gemm_impl == "bf16"`, dequantization and a `bf16` GEMM operation are applied.
         - For other cases, the function applies quantization to `x` and uses `fp8_gemm` for computation.
     """