Introducing Gemma 4 12B: a unified, encoder-free multimodal model
Google's open Gemma 4 12B feeds audio and vision into the model directly, cutting latency and memory versus a separate encoder.