Inference Systems Engineer

Serve frontier models fast and cheap — the engine behind Vikasit Inference and 2M free tokens/day.

Engineering · Pune · Full-time

About the role

You'll build and optimize the serving stack that makes Vikasit Inference fast, reliable, and affordable at India scale — kernels, batching, caching, and the OpenAI-compatible API surface developers love.

What you'll do

  • Optimize throughput and latency across our model fleet (vLLM / SGLang / TensorRT-LLM)
  • Build batching, KV-cache, quantization, and routing for MoE models
  • Own reliability, autoscaling, and cost of the inference platform
  • Keep the OpenAI-compatible API rock-solid for streaming and tool calls

What we're looking for

  • Deep systems engineering experience (GPU, CUDA-adjacent, or high-performance serving)
  • Experience with an LLM serving framework in production
  • Comfort owning latency, throughput, and cost SLOs

Nice to have

  • CUDA / Triton kernels
  • Quantization (GGUF, AWQ, FP8)
  • Multi-region infra

Sound like you?

We hire for skill over credentials. Tell us why you're a fit — links and projects welcome.

Apply for this role