fix comment

2025-07-18 15:19:09 -04:00 · 2025-02-07 16:42:55 +08:00 · 5ee97a83f0
commit 5ee97a83f0
parent 1d7d440461
1 changed file with 1 addition and 1 deletion
--- a/inference/model.py
+++ b/inference/model.py
@@ -143,7 +143,7 @@ def linear(x: torch.Tensor, weight: torch.Tensor, bias: Optional[torch.Tensor] =
        quantization-aware computations depending on the input parameters.

    Notes:
-        - If `weight` is quantized (e.g., `element_size() > 1`), a dequantized version 
+        - If `weight` is quantized (e.g., `element_size() == 1`), a dequantized version 
          is used for computation.
        - If `gemm_impl == "bf16"`, dequantization and a `bf16` GEMM operation are applied.
        - For other cases, the function applies quantization to `x` and uses `fp8_gemm` for computation.
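
Why the corrected condition matters: in PyTorch, `Tensor.element_size()` returns the number of bytes per element, so an FP8 weight reports 1 while BF16 reports 2 and FP32 reports 4. The old comment (`> 1`) therefore described the unquantized case. The sketch below mirrors the three notes using plain Python so it runs without a GPU. The `choose_path` helper, the `ELEMENT_SIZE` table, and the returned labels are hypothetical names for illustration, and the unquantized branch is an assumption, since the notes only describe the quantized case.

```python
# Hypothetical stand-in for the branch selection described in the notes.
# Bytes per element: FP8 = 1, BF16 = 2, FP32 = 4.
ELEMENT_SIZE = {"fp8": 1, "bf16": 2, "fp32": 4}


def choose_path(weight_dtype: str, gemm_impl: str) -> str:
    """Return a label for the computation path the notes describe."""
    if ELEMENT_SIZE[weight_dtype] > 1:
        # Assumption: an unquantized weight goes through an ordinary linear op.
        return "plain_linear"
    if gemm_impl == "bf16":
        # Quantized weight (element size == 1) with a bf16 GEMM requested.
        return "dequantize_then_bf16_gemm"
    # Quantized weight, any other gemm_impl: quantize x and use fp8_gemm.
    return "quantize_x_then_fp8_gemm"
```

For example, `choose_path("fp8", "bf16")` selects the dequantize path, while `choose_path("bf16", "bf16")` never reaches the quantization branches at all.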