Google Launches Its Multimodal GPT - Google Gemini

⚡️ Google has launched Gemini, a new family of multimodal AI models developed by Google DeepMind, Google Research, and other teams across Google. Gemini handles images, audio, video, and text. The most capable model in the family, Gemini Ultra, sets a new state of the art on 30 of 32 benchmarks, spanning text and reasoning, image understanding, video understanding, and speech recognition. Gemini Ultra is the first model to reach human-expert performance on MMLU, a benchmark covering 57 subjects, with a score above 90%. It also set a new record of 62.4% on MMMU, beating the previous best model by more than 5 percentage points.

Gemini's capabilities apply across many domains, education among them. For example, the model can read messy handwriting, understand how a problem is posed, convert it into mathematical equations, identify errors, and suggest a correct solution. Gemini is integrated into several Google products, including Google Bard, and will soon be available via API in Google AI Studio and Google Cloud Vertex AI.

According to Google's reported benchmark results, Gemini Ultra outperforms OpenAI's GPT-4 on most tests.

📝 Main blog post: https://blog.google/technology/ai/google-gemini-ai
📚 Technical report: https://deepmind.google/gemini/gemini_1_report.pdf
🧪 Website: https://deepmind.google/gemini
🧪 Demo: https://bard.google.com

МР.