Tag: Human Experts

  • AI Generates More Innovative Research Ideas than Human Experts

    The first statistically significant findings are in: Large Language Model (LLM) AIs are not only capable of generating expert-level scientific research ideas, but their ideas are also deemed more innovative and exciting than those of human experts, according to expert evaluations.
    Is it possible that AI, pictured here using generative tools, might be better at coming up with new ideas than humans?

    Recent advancements in LLMs have sparked enthusiasm among researchers about their potential to transform scientific discovery. Models like ChatGPT and Anthropic’s Claude have demonstrated the ability to autonomously produce and validate new research concepts.

    This was one of the many tasks people believed AI could never surpass humans in—generating new knowledge and making scientific discoveries, rather than simply combining existing information from their training data.

    Multimodal AI: A New Frontier in Original Research

    Yet, much like artistic creation, music composition, coding, and even understanding subtext or body language, today’s multimodal AIs seem capable of producing truly original research—often more innovative than human-generated ideas.

    Until recently, no research had explored this potential. In a new study, over 100 natural language processing (NLP) experts (PhDs and postdocs from 36 respected institutions) competed against AI-driven “ideation agents.” Their research ideas were judged on originality, excitement, and feasibility by human experts.

    Natural Language Processing (NLP) is a branch of AI focused on facilitating communication between humans and machines, allowing both to “understand” each other—not just in terms of basic syntax, but also with regard to nuance, tone, and emotional inflection.

    In the study, 49 human experts developed ideas on seven NLP topics, while an AI model trained by researchers generated ideas on the same subjects. To motivate the participants, the study offered $300 per idea and a $1,000 bonus for the top five human-generated concepts, encouraging clear, actionable proposals.

    Standardizing Submissions

    After all entries were submitted, an LLM was employed to standardize the writing style of each submission while maintaining the original content. This approach aimed to create a more level playing field and ensure the study remained as blind as possible.

    A panel of 79 human experts reviewed all the submissions in a blind assessment, providing a total of 298 evaluations, with each idea receiving between two and four independent reviews. The results showed that AI-generated ideas outperformed those of human researchers in terms of novelty and excitement. Although AIs ranked slightly lower in feasibility and slightly higher in effectiveness compared to humans, these differences were not statistically significant.
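    As a rough illustration of what a claim of statistical significance rests on, a two-sample comparison of review scores can be sketched as follows. The ratings, the scale, and the helper function are all invented for illustration; this is not the study's data or its actual analysis code:

    ```python
    # Hypothetical novelty ratings for illustration only -- not the study's data.
    from math import sqrt
    from statistics import mean, variance

    human_scores = [4.8, 5.1, 4.5, 5.0, 4.7, 4.9, 5.2, 4.6]
    ai_scores = [5.6, 5.9, 5.4, 5.8, 5.5, 6.0, 5.7, 5.3]

    def welch_t(a, b):
        """Welch's t-statistic: the difference in means divided by its
        standard error, without assuming the samples share a variance."""
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        return (mean(b) - mean(a)) / se

    t = welch_t(human_scores, ai_scores)
    print(f"human mean {mean(human_scores):.2f}, AI mean {mean(ai_scores):.2f}, t = {t:.2f}")
    ```

    A t-statistic well above roughly 2 on samples like these is what lets researchers call a gap significant rather than noise; tests along these lines, applied to the 298 real evaluations, are what back the study's novelty and excitement findings.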

    An overall look at how human papers scored against LLM-generated ideas
    Chenglei Si

    Identifying Limitations

    The study revealed some weaknesses in LLMs, such as a lack of idea diversity and difficulties with self-assessment. Despite being instructed not to repeat itself, the LLM often did so. Additionally, LLMs struggled with consistency when reviewing and scoring ideas, showing low agreement with human judgments.

    The study also notes that judging the “originality” of an idea is subjective, even among experts. To further explore whether LLMs are truly better suited for autonomous scientific discovery, the researchers plan to involve more expert participants in a follow-up study. This time, the ideas from both AI and humans will be fully developed into projects to assess their real-world impact.

    The Unreliability of Advanced Language Models

    These initial findings are certainly eye-opening. Humanity now faces an unusual challenge from highly capable language model AIs. While these models can accomplish remarkable feats, they remain unreliable and prone to what AI companies term “hallucinations”—or what others might call fabrications.

    Though AIs can handle vast amounts of paperwork, the scientific method requires rigor, and there’s no place for “hallucinations.” It’s already concerning that estimates suggest AIs are co-authoring at least 10% of research papers.

    On the flip side, we can’t ignore AI’s potential to accelerate progress, as seen with DeepMind’s GNoME system, which condensed 800 years’ worth of materials discovery into months, producing recipes for 380,000 new inorganic crystals with potential for revolutionary applications.

    AI is the fastest-evolving technology around, and many of its current flaws could be fixed in the coming years. Some researchers even believe we’re nearing general superintelligence, where AIs will surpass expert knowledge in most fields.

    Watching AIs quickly master skills we once thought defined human uniqueness, including the generation of novel ideas, is a strange experience. Machine ingenuity seems to be edging humans out, but for now, the best path forward is a partnership between organic and artificial intelligence, as long as we align our goals.

    If this were a competition, it would be AI: 1, humans: 0 for this round.


    Read the original article on: New Atlas

    Read more: Who’d Have Thought Robotic Bee Swarms Could be so Captivating?

  • Google’s Gemini AI Beats GPT & Human Experts Across 57 Subjects

    Google has introduced its impressive next-generation Gemini AI, asserting its superiority over OpenAI's GPT-4 and human experts in nearly all significant evaluations. Gemini AI demonstrates proficiency in comprehending images, video, audio, text, and code, with plans to acquire additional senses in the future.
    Google’s Gemini AI represents the next step-change in a wildly accelerating field
    Google

    Scoring 90.0% on the Massive Multitask Language Understanding (MMLU) test, Gemini AI is the first model to surpass human experts (89.8%) and outperform GPT-4 (86.4%) in diverse knowledge and problem-solving tasks spanning 57 subjects, including math, physics, history, law, medicine, and ethics. It’s worth noting that these experts are not representative of the average human.

    Gemini’s Training Diversity and Nuanced Comprehension

    Gemini is inherently multimodal, meaning its initial training dataset included a substantial amount of diverse media beyond text. Consequently, it exhibits proficiency in comprehending visual and auditory information as effectively as it does with text. In contrast to other language models that often interpret video and images primarily in textual terms, Gemini preserves the full tone and nuance of the original video, audio, and image sources.

    While the video below serves as a polished product demo and should be viewed with a degree of skepticism, it provides a valuable glimpse into the practical implications of Gemini’s true multimodal capabilities.

    Hands-on with Gemini: Interacting with multimodal AI

    What’s the key takeaway? AIs are undergoing training with increasingly extensive sensory datasets to emulate the learning processes employed by humans in interacting with their surroundings. With enhanced visual and auditory comprehension, Gemini advances in perception and reasoning. Once integrated into Google devices, starting with the upcoming Pixel phones, it will be capable of assisting with various daily tasks.

    According to Google DeepMind CEO Demis Hassabis, this progression is poised to extend into the next logical sensory dimension: touch and tactile feedback. While Google is already a prominent player in AI robotics, embedding a highly knowledgeable model like Gemini with the ability to comprehend the world through touch will propel robotics, both humanoid and otherwise, into unexplored territories.

    Gemini’s Proficiency in Generating Code for Meta-Knowledge from Vast Datasets

    Multimodality is just one notable feature among many, but akin to GPT-4, Gemini is an all-encompassing tool, making it challenging to pinpoint where to begin. Perhaps its potential contributions to science are worth highlighting? In the showcased video, DeepMind scientists illustrate how Gemini has the capacity to generate its own code for reading and comprehending 200,000 scientific studies. It filters the studies for relevance using its intrinsic reasoning capabilities, compiles data, and effectively generates new meta-knowledge. The team claims to have accomplished this during their lunch break, emphasizing its applicability to other domains such as law, where extensive datasets require thorough examination.

    Gemini: Unlocking insights in scientific literature

    Regarding coding, Gemini exhibits proficiency in Python, Java, C++, and Go programming languages. Google is already showcasing its ability to create websites that dynamically generate code based on user interactions, adapting to users’ needs as they become apparent. This marks a novel approach to the internet, where a single page evolves to meet your requirements once it discerns them.

    Gemini’s Extraordinary Power in Creating Dynamic Graphical User Interfaces for Daily Tasks

    The demonstration video focuses on a relatively straightforward scenario—planning a child’s birthday party. However, it exemplifies the remarkable capabilities Gemini possesses, envisioning how it could generate graphical user interfaces for almost any conceivable task. This is a unique capability achievable only through AI, akin to having a web app programmer working alongside you but with the ability to operate at a significantly accelerated pace.

    Like any AI tool, Gemini is highly interactive. If it doesn’t precisely deliver what you want, you can communicate your preferences, and it will adjust itself accordingly or engage in a conversation to determine the best course of action. This showcases the transformative shift in our interactions with technology.

    Gemini: Reasoning about user intent to generate bespoke experiences

    In coding, DeepMind’s AlphaCode 2 project involves training various Gemini models for distinct aspects of the programming process. The initiative deploys a swarm of programming agents to generate up to a million code snippets to solve a problem. A separate Gemini model evaluates these samples, discarding around 95% based on compilation and effectiveness.

    AlphaCode 2’s Coding Triumph

    Another Gemini model develops a code-testing framework, conducts thorough testing, and ranks the remaining code samples for correctness. DeepMind effectively transformed Gemini into a multifunctional software team, which excelled in a coding competition, outperforming 87% of participants and placing between the ‘Expert’ and ‘Candidate Master’ categories on Codeforces. Such competitions demand exceptional reasoning and creative use of software tools.
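    The generate, filter, and rank loop described above can be sketched in miniature. Everything here (the toy problem of computing an absolute value, the candidate pool, the two filter stages) is invented for illustration and is not DeepMind's actual pipeline:

    ```python
    import random

    random.seed(0)  # make the toy run deterministic

    def generate_candidates(n):
        """Stand-in for a swarm of agents emitting code snippets: each
        'snippet' is a candidate implementation of abs()."""
        pool = [
            lambda x: x if x >= 0 else -x,  # correct
            lambda x: -x,                   # wrong for positive inputs
            lambda x: x,                    # wrong for negative inputs
            lambda x: 1 // 0,               # crashes ("fails to compile")
        ]
        return [random.choice(pool) for _ in range(n)]

    def runs_at_all(fn):
        """Stage 1 filter: discard candidates that crash outright."""
        try:
            fn(1)
            return True
        except Exception:
            return False

    def passed(fn, tests):
        """Stage 2: score each survivor by the number of test cases it passes."""
        return sum(fn(x) == expected for x, expected in tests)

    tests = [(3, 3), (-4, 4), (0, 0)]
    survivors = [c for c in generate_candidates(1000) if runs_at_all(c)]
    best = max(survivors, key=lambda c: passed(c, tests))
    print(passed(best, tests))  # prints 3: the top-ranked candidate passes every test
    ```

    The real system replaces the random pool with model-generated code and the toy checks with compilation and a generated test framework, but the shape (massive generation, aggressive filtering, then ranking) is the same.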

    Gemini: Excelling at competitive programming

    AlphaCode 2, despite its impressive performance, is not expected to be immediately accessible to the public, and its current form is unlikely to be released given the extensive computing power needed to generate a million code snippets. Although the success rate holds up at a million snippets and could potentially improve further with billions or trillions, the current approach is inefficient. Nonetheless, the rapid progress in this field suggests a more efficient method will likely emerge soon.

    Gemini’s Varied Offerings

    DeepMind is also considering a streamlined version of AlphaCode 2 for public release. Meanwhile, Google is set to launch Gemini in three sizes: Gemini Nano for mobile devices; Gemini Pro, comparable to GPT-3.5, for a broad range of tasks; and Gemini Ultra, the largest model, which surpasses GPT-4 in benchmark tests. Gemini Ultra is expected to launch publicly next year after safety evaluations. Gemini Nano is already on the Pixel 8 Pro, and Gemini Pro is available for free through Google Bard. Google plans to integrate Gemini across its products, with more developments to come.

    Gemini: Google’s newest and most capable AI model

    Read the original article on: New Atlas

    Read more: Video-to-Sound Tech Helps Visually Impaired Recognize Faces