vikasit-omni

Full multimodal: text, image, and audio in; text and speech out, in real time.

Parameters: 30B (3B active)
Context window: 256K
Best for: Full multimodal

Benchmarks

MMLU-Pro: 61.6%
GPQA Diamond: 69.6%
AIME 2025: 65.0%
MMMU (val): 69.1%

Scores are reported in thinking mode, without tools where applicable. See full comparisons on the benchmarks page.

Coming soon

This model is in development. Want early access? Get in touch.