Name: Vikasit AI

About the role

You'll build and optimize the serving stack that makes Vikasit Inference fast, reliable, and affordable at India scale: kernels, batching, caching, and the OpenAI-compatible API surface that developers love.

What you'll do

- Optimize throughput and latency across our model fleet (vLLM / SGLang / TensorRT-LLM)
- Build batching, KV-cache, quantization, and routing for MoE models
- Own reliability, autoscaling, and cost of the inference platform
- Keep the OpenAI-compatible API rock-solid for streaming and tool calls

What we're looking for

- Deep systems engineering (GPU, CUDA-adjacent, or high-performance serving)
- Experience with an LLM serving framework in production
- Comfort owning latency, throughput, and cost SLOs

Nice to have

- CUDA / Triton kernels
- Quantization (GGUF, AWQ, FP8)
- Multi-region infra

Sound like you?

We hire for skill over credentials. Tell us why you're a fit; links and projects are welcome.
