vikasit-omni
Full multimodal model: text, image, and audio in; text and speech out, in real time.
Parameters: 30B (3B active)
Context window: 256K
Best for: Full multimodal
Benchmarks
MMLU-Pro: 61.6%
GPQA Diamond: 69.6%
AIME 2025: 65.0%
MMMU (val): 69.1%
Scores are reported in thinking mode, without tools where applicable. See full comparisons on the benchmarks page.
Coming soon
This model is in development. Want early access? Get in touch.