Google Releases Gemini Embedding 2 - A Multimodal Embeddings Model

Embeddings are a very powerful tool. They translate text into 'meaning' by placing it as a point in a multidimensional vector space. This allows for many cool things - for example, measuring how close two different sentences are in meaning, or performing mathematical operations on meanings: thought A, but without thought B (vector A minus vector B).
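
To make the 'closeness' and 'A minus B' ideas concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and their 'axes' are made up for illustration; real embeddings have hundreds or thousands of dimensions that are not individually interpretable, and the names cat, kitten, invoice, and cat_food_ad are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def subtract(a, b):
    """Element-wise a - b: 'thought A, but without thought B'."""
    return [x - y for x, y in zip(a, b)]

# Toy vectors (assumption: axis 0 ~ animals, axis 1 ~ advertising, axis 2 ~ finance).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related meanings score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))

# Removing the 'advertising' direction from a cat food ad moves it closer to 'cat'.
cat_food_ad = [0.6, 0.6, 0.0]
advertising = [0.0, 0.6, 0.0]
without_ads = subtract(cat_food_ad, advertising)
print(cosine_similarity(without_ads, cat) > cosine_similarity(cat_food_ad, cat))
```

Cosine similarity is used here because it compares direction rather than length, which is the usual choice for embedding vectors.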

I actively use this in my projects. My entire anti-spam bot works by analyzing meaning, not keywords - if the meaning of a message is advertising, we delete it, even if it's written in a format like '3араб00Т00к' (an obfuscated spelling of the Russian word for 'earnings').
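
A rough sketch of how meaning-based filtering can work, not the bot's actual code: average the embeddings of known spam into a centroid, then flag any message whose embedding points in nearly the same direction. The two-dimensional vectors, the centroid approach, and the 0.8 threshold are all assumptions for illustration; in practice the vectors would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def is_spam(message_vec, spam_centroid, threshold=0.8):
    """Flag a message whose meaning is close to the average known spam.

    The threshold is a hypothetical value and would need tuning on real data.
    """
    return cosine_similarity(message_vec, spam_centroid) >= threshold

# Toy embeddings of previously seen advertising messages (made-up numbers).
known_spam = [[0.9, 0.1], [0.8, 0.3]]
spam_centroid = centroid(known_spam)

# An obfuscated ad still lands near the spam centroid, because the model
# embeds what the message means rather than how it is spelled.
obfuscated_ad = [0.85, 0.25]
normal_chat = [0.1, 0.9]

print(is_spam(obfuscated_ad, spam_centroid))
print(is_spam(normal_chat, spam_centroid))
```

Because the check happens in embedding space, keyword tricks like character substitution only help the spammer if they also change what the model thinks the message means.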

A constant limitation was that all of this only worked on text. Now Google has released a model that translates more than text into 'meaning': it maps text, images, video, audio, and documents into a single unified vector space. It supports 100+ languages and accepts up to 8192 tokens of text, up to 6 images, and up to 120 seconds of video.

This makes all the same tricks possible, but now with video and images too. For example, we have seen a lot of spam arriving as auto-generated images - now we can catch that too.

It's technical news, but very cool.

#google #gemini #ai #embeddings #ml

————————— Мысли Рвачева —————————