DeepSeek V3 for MacBook

This was the initial approach; the project is now migrating to Zig for better performance and support for any architecture.

This guide explains how to run DeepSeek V3 efficiently on MacBooks, which have limited resources compared to high-end GPU servers.

Optimizations Made

The optimized version includes several improvements:

  1. CPU and MPS (Apple Silicon) Support: Implementation of CPU-compatible kernels and Apple Silicon GPU acceleration.
  2. Memory Efficiency: Chunked processing and sliding window attention to reduce memory usage.
  3. Quantization: Optional int8 quantization for CPU to improve inference speed while maintaining reasonable quality.
  4. Reduced Model Size: Configuration options to load smaller, more efficient model variants.
  5. Dynamic Device Selection: Automatic selection of the best available device (MPS or CPU); see the sketch after this list.
  6. Progressive Generation: Ability to see generation progress for longer outputs.
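
To make items 3 and 5 concrete, here is a minimal sketch (not the project's actual code; the function names are placeholders) of how device selection and optional CPU int8 quantization are typically wired up in PyTorch:

```python
import torch
import torch.nn as nn

def pick_device(force_cpu: bool = False) -> torch.device:
    # Prefer the Apple Silicon GPU (MPS) when the PyTorch build supports it,
    # otherwise fall back to the CPU.
    if not force_cpu and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

def maybe_quantize(model: nn.Module, device: torch.device, quantize: bool = True) -> nn.Module:
    # Dynamic int8 quantization is a CPU-side optimization: it replaces
    # nn.Linear weights with int8, trading a little quality for speed.
    if quantize and device.type == "cpu":
        return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
    return model
```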

System Requirements

Minimum Requirements

  • MacBook with Intel CPU (8GB RAM minimum)
  • macOS 11 (Big Sur) or newer
  • 10GB disk space for model weights

Recommended Requirements

  • MacBook with Apple Silicon (M1/M2/M3)
  • 16GB RAM or more
  • macOS 12 (Monterey) or newer
  • 20GB disk space for model weights

Installation

  1. Install dependencies (a quick verification snippet follows these steps):

     pip install -r inference/requirements_macbook.txt

  2. Download the model weights following the instructions in README_WEIGHTS.md.
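
Before moving on, it can help to confirm that the installed PyTorch build actually exposes the MPS backend on Apple Silicon; this quick check is not part of the repository:

```python
import torch

# Report the PyTorch version and whether the Apple Silicon (MPS) backend
# is both compiled in and usable on this machine.
print("torch", torch.__version__)
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())
```

If "MPS available" prints False on an M1/M2/M3 machine, generation will fall back to the slower CPU path.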

Usage

The optimized script provides several options to control performance:

python inference/optimized_generate.py \
  --ckpt-path /path/to/model/weights \
  --config inference/configs/config_macbook.json \
  --interactive \
  --max-new-tokens 200 \
  --temperature 0.2

Additional Options

  • --force-cpu: Force CPU usage even if GPU is available
  • --no-quantize: Disable quantization (higher quality but slower)
  • --chunk-size: Size of processing chunks (default: 512; lower values use less memory; see the sketch after this list)
  • --reduced-model: Use reduced model parameters (fewer experts/layers for lower resource usage)
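
To illustrate what --chunk-size controls, here is a hypothetical sketch of chunked prompt prefill, assuming a transformers-style causal LM with a key/value cache (this is not the repository's actual implementation):

```python
import torch

def prefill_in_chunks(model, input_ids: torch.Tensor, chunk_size: int = 512):
    # Push the prompt through the model in slices of `chunk_size` tokens,
    # carrying the key/value cache forward, instead of one large forward pass.
    past = None
    for start in range(0, input_ids.shape[1], chunk_size):
        chunk = input_ids[:, start:start + chunk_size]
        out = model(input_ids=chunk, past_key_values=past, use_cache=True)
        past = out.past_key_values
    # The returned cache covers the whole prompt and is ready for
    # token-by-token generation.
    return past
```

Smaller chunks lower peak activation memory at the cost of more forward passes, which is why reducing the value to 256 or 128 is suggested when memory is tight.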

Performance Tips

  1. Context Length: Keep prompt length short (under 1024 tokens) for better performance.
  2. Batch Size: Always use a batch size of 1 on MacBooks.
  3. Apple Silicon: M1/M2/M3 MacBooks can use the MPS backend for significantly better performance.
  4. Memory Management: Close other applications when running the model.
  5. Temperature: Using temperature=0 (greedy decoding) is faster but less creative; see the sketch below.
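
As a rough illustration of tip 5 (a generic sketch, not the repository's sampling code): with temperature=0 the next token is simply the argmax of the logits, while a positive temperature rescales the logits and samples from the resulting distribution:

```python
import torch

def pick_next_token(logits: torch.Tensor, temperature: float) -> int:
    # logits: scores over the vocabulary for the next position (1-D tensor).
    if temperature == 0:
        return int(torch.argmax(logits, dim=-1))          # greedy: deterministic
    probs = torch.softmax(logits / temperature, dim=-1)   # lower temperature -> sharper distribution
    return int(torch.multinomial(probs, num_samples=1))   # stochastic sampling
```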

Troubleshooting

"Out of Memory" Errors

  • Try using --reduced-model flag
  • Reduce --chunk-size to 256 or 128
  • Use --force-cpu if MPS memory is limited

Slow Generation

  • Ensure you're using the Apple Silicon-optimized build of PyTorch
  • Check Activity Monitor to verify GPU utilization
  • Try a smaller config file (edit parameters in config_macbook.json)

Model Loading Errors

  • Verify model weights are downloaded correctly
  • Ensure safetensors files are in the expected location
  • Check that your torch/transformers versions match the requirements (the quick check below can help)
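
A small sanity check for the points above (the path is a placeholder; adjust it to your --ckpt-path):

```python
from pathlib import Path
import torch
import transformers

ckpt_dir = Path("/path/to/model/weights")         # placeholder: your --ckpt-path
shards = sorted(ckpt_dir.glob("*.safetensors"))   # weight files the loader expects
print(f"{len(shards)} safetensors file(s) found in {ckpt_dir}")
print("torch", torch.__version__, "| transformers", transformers.__version__)
```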