Google’s Gemini AI Beats GPT & Human Experts Across 57 Subjects

Google’s Gemini AI represents the next step-change in a wildly accelerating field (Image: Google)

Google has introduced its impressive next-generation Gemini AI, asserting its superiority over OpenAI’s GPT-4 and human experts in nearly all significant evaluations. Gemini AI demonstrates proficiency in comprehending images, video, audio, text, and code, with plans to acquire additional senses in the future.

Scoring 90.0% on the Massive Multitask Language Understanding (MMLU) benchmark, Gemini is the first model to surpass human experts (89.8%) and outperform GPT-4 (86.4%) on knowledge and problem-solving tasks across 57 subjects, including math, physics, history, law, medicine, and ethics. It’s worth noting that these experts are not representative of the average human.

Gemini’s Training Diversity and Nuanced Comprehension

Gemini is natively multimodal: its training data included a substantial amount of media beyond text, so it understands visual and auditory information as readily as it does written language. Where other language models typically reduce video and images to textual descriptions, Gemini preserves the tone and nuance of the original video, audio, and image sources.
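To make “natively multimodal” concrete, here is a minimal sketch of what a single multimodal call might look like using Google’s google-generativeai Python client. The model name, the API key setup, and the image file are assumptions for illustration, not code from Google’s demos.

```python
# A minimal sketch of a single multimodal call, assuming Google's
# google-generativeai Python client and its "gemini-pro-vision" model.
# The API key setup and image file are placeholders for illustration.
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# One model, one call: the image and the text prompt go in together,
# rather than the image being flattened into a text description first.
model = genai.GenerativeModel("gemini-pro-vision")
image = Image.open("whiteboard_sketch.png")  # hypothetical input image

response = model.generate_content(
    [image, "Describe what is drawn here and what it might be for."]
)
print(response.text)
```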

While the video below serves as a polished product demo and should be viewed with a degree of skepticism, it provides a valuable glimpse into the practical implications of Gemini’s true multimodal capabilities.

Hands-on with Gemini: Interacting with multimodal AI

What’s the key takeaway? AIs are being trained on ever-richer sensory data to learn the way humans do: by interacting with their surroundings. With better visual and auditory comprehension, Gemini’s perception and reasoning improve, and once integrated into Google devices, starting with the upcoming Pixel phones, it will be able to assist with all sorts of daily tasks.

According to Google DeepMind CEO Demis Hassabis, this progression is set to extend into the next logical sense: touch and tactile feedback. Google is already a prominent player in AI robotics, and embedding a highly knowledgeable model like Gemini with the ability to understand the world through touch would propel robotics, humanoid and otherwise, into unexplored territory.

Gemini Generates Its Own Code to Extract Meta-Knowledge from Vast Datasets

Multimodality is just one feature among many; like GPT-4, Gemini is such an all-encompassing tool that it’s hard to know where to begin. Its potential contributions to science are a good place to start. In the video below, DeepMind scientists show how Gemini can generate its own code for reading and understanding 200,000 scientific studies: it filters the studies for relevance using its built-in reasoning, extracts the data, and effectively produces new meta-knowledge. The team claims to have done this over a lunch break, and notes the approach applies equally to other domains, such as law, where vast document sets need thorough examination.

Gemini: Unlocking insights in scientific literature
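Google has not released the code from that demo, and in the demo Gemini wrote the code itself. Still, a hand-rolled sketch of the same filter-then-compile workflow helps fix the shape of the idea. The client, model name, folder layout, and research question below are all assumptions for illustration.

```python
# A hand-written sketch of the filter-then-compile workflow from the
# demo (in the demo, Gemini generated code like this itself). The
# client, model name, folder of abstracts, and research question are
# all assumptions for illustration.
import os
import pathlib

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-pro")

QUESTION = "Does the study measure the effect of X on Y?"  # hypothetical

relevant = []
for path in pathlib.Path("abstracts").glob("*.txt"):  # hypothetical layout
    abstract = path.read_text()
    # Step 1: use the model's reasoning to filter for relevance.
    verdict = model.generate_content(
        "Answer YES or NO only. Is this abstract relevant to the question "
        f"'{QUESTION}'?\n\n{abstract}"
    )
    if verdict.text.strip().upper().startswith("YES"):
        relevant.append(abstract)

# Step 2: compile the surviving studies into new meta-knowledge.
summary = model.generate_content(
    "Extract the key finding from each abstract below, then summarize "
    "the overall weight of evidence:\n\n" + "\n---\n".join(relevant)
)
print(summary.text)
```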

On the coding front, Gemini is proficient in Python, Java, C++, and Go. Google is already showcasing its ability to build websites that generate code on the fly in response to user interactions, adapting to users’ needs as they become apparent. It’s a novel approach to the web: a single page that evolves to meet your requirements once it discerns them.
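Google hasn’t detailed how that adaptive-website demo is wired up, but the underlying pattern can be sketched: ask the model for UI code matching a stated need, then serve the result. The Flask wrapper, route, prompt wording, and model name below are my assumptions, not Google’s implementation.

```python
# An illustrative sketch of a page that writes itself: the server asks
# the model for UI code matching the user's stated need and serves the
# result. Flask, the route, and the prompt are assumptions, not
# Google's demo code.
import os

import google.generativeai as genai
from flask import Flask, request

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-pro")
app = Flask(__name__)

@app.route("/")
def bespoke_page():
    need = request.args.get("need", "a simple packing checklist")
    response = model.generate_content(
        "Return only a self-contained HTML page (inline CSS and JS, no "
        f"explanations, no markdown fences) that implements: {need}"
    )
    # A real deployment would validate/sanitize generated markup first.
    return response.text

if __name__ == "__main__":
    app.run(debug=True)
```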

Gemini’s Extraordinary Power in Creating Dynamic Graphical User Interfaces for Daily Tasks

The demonstration video below focuses on a simple scenario, planning a child’s birthday party, but it hints at how Gemini could generate graphical user interfaces for almost any task imaginable. That is something only AI can offer: like having a web-app programmer working alongside you, but at vastly accelerated speed.

Like any AI tool, Gemini is highly interactive. If it doesn’t deliver exactly what you want, you can tell it so, and it will adjust accordingly or talk through the best course of action with you. It’s a transformative shift in how we interact with technology.

Gemini: Reasoning about user intent to generate bespoke experiences
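Mechanically, that back-and-forth is just a multi-turn conversation. A minimal sketch, assuming the google-generativeai client’s chat interface, might look like this:

```python
# A minimal sketch of iterative refinement as a multi-turn chat,
# assuming the google-generativeai client's chat interface.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-pro")

chat = model.start_chat()  # conversation history is kept across turns
first = chat.send_message("Plan a dinosaur-themed birthday party for ten kids.")
print(first.text)

# Not quite right? Say so, and the model adjusts with full context.
revised = chat.send_message("Make it outdoors, and keep the budget under $200.")
print(revised.text)
```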

In coding, DeepMind’s AlphaCode 2 project trains several Gemini models for distinct parts of the programming process. It deploys a swarm of programming agents to generate up to a million code samples for a given problem, and a separate Gemini model then evaluates those samples, discarding around 95% based on whether they compile and work.

AlphaCode 2’s Coding Triumph

Another Gemini model builds a code-testing framework, runs thorough tests, and ranks the remaining samples for correctness. DeepMind has effectively turned Gemini into a multifunctional software team, and it excelled in a coding competition, outperforming 87% of participants and landing between the ‘Expert’ and ‘Candidate Master’ tiers on Codeforces. Competitions like these demand exceptional reasoning and creative use of software tools.

Gemini: Excelling at competitive programming
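AlphaCode 2’s actual pipeline isn’t public, but its generate-filter-rank shape can be illustrated at toy scale. In the sketch below, the sample count, prompt, test cases, and scoring are all placeholders; the real system uses fine-tuned Gemini models and vastly more samples.

```python
# A toy illustration of the generate/filter/rank shape described above.
# AlphaCode 2's real pipeline is not public: the sample count, prompt,
# test cases, and scoring below are all placeholders.
import contextlib
import io
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-pro")

PROBLEM = "Read an integer n from stdin and print the sum 1 + 2 + ... + n."
TESTS = [("5", "15"), ("1", "1"), ("10", "55")]  # (stdin, expected stdout)

def compiles(src: str) -> bool:
    """Stage-1 filter: discard samples that don't even parse."""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def score(src: str) -> int:
    """Stage-2 ranking: count how many input/output tests pass."""
    passed = 0
    for stdin, expected in TESTS:
        lines = iter(stdin.splitlines())
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(src, {"input": lambda _=None: next(lines)})
        except Exception:
            continue  # crashing candidates simply score zero on this test
        if buffer.getvalue().strip() == expected:
            passed += 1
    return passed

# Generate many candidates (a million in AlphaCode 2; a handful here).
candidates = [
    model.generate_content(f"Write a Python program, code only: {PROBLEM}").text
    for _ in range(8)
]

survivors = [c for c in candidates if compiles(c)]  # ~95% are cut in AlphaCode 2
best = max(survivors, key=score, default=None)      # rank by tests passed
print(best)
```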

Despite its impressive performance, AlphaCode 2 won’t be publicly accessible any time soon, and its current form is unlikely to be released at all, given the enormous computing power needed to generate a million code samples per problem. The success rate was still rising at the million-sample mark and could improve further with billions or trillions of samples, but the approach is plainly inefficient. DeepMind is considering a streamlined version for public release, and the rapid progress in this field suggests a more efficient method will emerge soon.

Gemini’s Varied Offerings

Google, meanwhile, is launching Gemini in three sizes: Gemini Nano for mobile devices; Gemini Pro, a GPT-3.5-class model suited to a broad range of tasks; and Gemini Ultra, the largest of the three, which beats GPT-4 on benchmark tests. Gemini Ultra is expected to launch publicly next year after safety evaluations, Gemini Nano is already running on the Pixel 8 Pro, and Gemini Pro is available for free through Google Bard. Google plans to weave Gemini throughout its product line, so expect plenty more developments to come.

Gemini: Google’s newest and most capable AI model

Read the original article on: New Atlas
