Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Google's open Gemma 4 12B feeds audio and vision inputs into the model directly, cutting latency and memory use compared with a separate-encoder design.

LLM Infra, Multi-Model
Read original on Google DeepMind