Distributed Systems Engineer
Platform · Palo Alto, CA · Full-time · Hybrid
$180k – $250k + Equity
You’ll scale our CUDA / Triton inference fleet across regions and providers, keeping costs low and QoS high.
Responsibilities
- Design autoscaling policies for GPU and CPU inference clusters.
- Implement canary & shadow traffic pipelines for new model versions.
- Drive infra-as-code and chaos testing culture across engineering.
Minimum Qualifications
- 4+ years of Go, Rust, or a similar language in production microservices.
- Strong grasp of container orchestration (K8s, Nomad) and service discovery.
- BS in CS, CE, or equivalent practical experience.
Preferred Qualifications
- Experience with GPU scheduling or network-heavy realtime systems.