macOS xterm-ghostty zsh 10 views

A demo of zai-org/GLM-5.2-FP8 running on 8xH200 and pointing Claude Code at it to do some code reviews and stress-test streaming tool calling.

vLLM main (as of June 22nd, 2026) started via:

vllm serve zai-org/GLM-5.2-FP8 \
  --tensor-parallel-size 8 \
  --kv-cache-dtype fp8 \
  --tool-call-parser glm47 \
  --enable-auto-tool-choice \
  --reasoning-parser glm45 \
  --max-num-seqs 1024 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 5 \
  --max-model-len 512000

Claude Code started via:

CLAUDE_MODEL="zai-org/GLM-5.2-FP8" \
CLAUDE_CODE_USE_VERTEX=0 \
CLAUDE_CODE_AUTO_COMPACT_WINDOW=400000 \
ANTHROPIC_BASE_URL="http://localhost:8000" \
ANTHROPIC_DEFAULT_OPUS_MODEL="$CLAUDE_MODEL" \
ANTHROPIC_DEFAULT_SONNET_MODEL="$CLAUDE_MODEL" \
ANTHROPIC_DEFAULT_HAIKU_MODEL="$CLAUDE_MODEL" \
ANTHROPIC_AUTH_TOKEN="dummy" \
claude \
  --model "sonnet[1m]"