
Google Releases Gemini Embedding 2: A Multimodal Embedding Model

Google has released a model that translates text, images, video, and audio into a unified vector space.

<p>Embeddings are a powerful tool. They convert text into a vector that represents its meaning and place it in a multidimensional vector space. This enables many useful things: for example, measuring how close two sentences are in meaning, or doing arithmetic on meanings, such as 'thought A, but without thought B' (vector A minus vector B).</p>
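<p>A minimal sketch of both ideas, closeness and subtraction, using tiny hand-written vectors. The three-dimensional vectors and their axis labels are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.</p>

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def subtract(a, b):
    """Element-wise 'vector A minus vector B'."""
    return [x - y for x, y in zip(a, b)]

# Hypothetical toy vectors; axes are [advertising, greeting, pets].
offer = [0.9, 0.1, 0.0]
promo = [0.8, 0.2, 0.1]
cat_photo_caption = [0.0, 0.1, 0.9]

# Two advertising-like sentences are closer to each other than to an unrelated one.
close = cosine_similarity(offer, promo)
far = cosine_similarity(offer, cat_photo_caption)

# "Thought A, but without thought B": remove the advertising part of a greeting-plus-ad.
greeting_with_ad = [0.7, 0.7, 0.0]
ad_only = [0.7, 0.0, 0.0]
pure_greeting = [0.0, 1.0, 0.0]
remainder = subtract(greeting_with_ad, ad_only)
remainder_vs_greeting = cosine_similarity(remainder, pure_greeting)
```

<p>Cosine similarity is only one common choice of distance here; the essay does not say which measure is used in practice.</p>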

<p>I actively use this in my projects. My entire anti-spam bot works by analyzing meaning, not keywords: if the meaning of a message is advertising, we delete it, even if it's written in an obfuscated form like '3араб00Т00к' (a disguised spelling of the Russian word for 'earnings').</p>
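<p>One possible way to build such a filter is to compare a message's embedding against embeddings of known spam and delete on a high match. This is a sketch under assumptions, not the bot's actual code: the vectors, the <code>is_spam</code> helper, and the 0.8 threshold are all hypothetical, and the step that turns a message into a vector (a call to an embedding model) is left out.</p>

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def spam_score(message_vec, spam_examples):
    """Highest similarity between the message and any known spam example."""
    return max(cosine_similarity(message_vec, example) for example in spam_examples)

def is_spam(message_vec, spam_examples, threshold=0.8):
    """Flag the message when it is close enough in meaning to known spam.

    The threshold is an assumed value; in practice it would be tuned on real data.
    """
    return spam_score(message_vec, spam_examples) >= threshold

# Hypothetical vectors standing in for real model output.
known_spam = [[0.9, 0.1, 0.1], [0.8, 0.3, 0.0]]
obfuscated_ad = [0.85, 0.2, 0.05]   # the spelling is disguised, the meaning is not
normal_chat = [0.1, 0.2, 0.9]

flag_ad = is_spam(obfuscated_ad, known_spam)
flag_chat = is_spam(normal_chat, known_spam)
```

<p>The comparison happens on vectors, so the filter never looks at the characters of the message; how well it handles disguised spellings depends on the embedding model.</p>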

<p>A long-standing limitation was that all of this worked only on text. Google has now released a model that maps text, images, video, audio, and documents into a single unified vector space. It supports 100+ languages, up to 8192 tokens of text, up to 6 images, and up to 120 seconds of video.</p>
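<p>To illustrate what a unified vector space allows, here is a small sketch in which one query vector is ranked against items of different modalities. The file names, the vectors, and the <code>search</code> helper are hypothetical; no real API is called, and the vectors stand in for what an embedding model would return.</p>

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Return the top_k items most similar to the query, whatever their modality."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item["vector"]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical index: every item lives in the same space, so they are directly comparable.
index = [
    {"id": "promo_banner.png", "modality": "image", "vector": [0.9, 0.1, 0.0]},
    {"id": "holiday_clip.mp4", "modality": "video", "vector": [0.1, 0.9, 0.1]},
    {"id": "voice_note.ogg", "modality": "audio", "vector": [0.0, 0.2, 0.9]},
    {"id": "ad_text.txt", "modality": "text", "vector": [0.8, 0.2, 0.1]},
]

# Hypothetical embedding of a text query about advertising.
query = [1.0, 0.0, 0.0]
results = search(query, index)
result_ids = [item["id"] for item in results]
```

<p>Because the image and the text share one space, a single similarity function and a single index are enough; no per-modality pipeline is needed in this sketch.</p>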

<p>This enables all the same things, but now with video and images as well. For example, we have been seeing a lot of spam in the form of auto-generated images, and now we can catch that too.</p>

<p>It's technical news, but very cool.</p>

<p>#google #gemini #ai #embeddings #ml</p>

<p>—————————<br>Мысли Рвачева<br>—————————</p>