The Thinker: ChatGPT Receives a Major Cognitive Enhancement

The Thinker: ChatGPT Receives a Major Cognitive Enhancement

OpenAI has launched its groundbreaking new AI model, o1, now integrated into ChatGPT. This latest release "thinks" before responding, outperforming both previous models and Ph.D. experts in solving complex problems.
The thinker: A new o1 model, pictured here using generative tools, greatly expands ChatGPT’s planning, thinking and reasoning capabilities

OpenAI has launched its groundbreaking new AI model, o1, now integrated into ChatGPT. This latest release “thinks” before responding, outperforming both previous models and Ph.D. experts in solving complex problems.

It seemed like OpenAI was giving us a bit of breathing space, didn’t it? GPT-4o and its advanced voice mode, though announced in May, felt like minor updates. Similarly, the Sora text-to-video generator caused a stir in February, but it’s still not publicly available, even as some Chinese competitors are now offering comparable quality.

Speculation About GPT-5 and the Emergence of a New Model

There has been much speculation about what GPT-5 might entail, its release date, and whether it has reached any level of Artificial General Intelligence (AGI). However, last night, OpenAI took a different approach by introducing a new model that diverges from the GPT lineage.

Interestingly, the o1 model doesn’t seem to improve at all upon GPT-4o’s English writing capabilities
OpenAI

The new model, named o1, is now available to all ChatGPT users as an option. While GPT-4o remains the versatile workhorse model for general tasks, o1 is designed for specialized use. Its main strength is complex reasoning, and what sets it apart from previous GPT models is its ability to pause and ‘think’ before providing an answer, rather than responding immediately.

It’s easy to anthropomorphize language models like this, given their human-like training data. However, o1 is not human. What sets it apart is its ability to significantly outperform previous models on complex tasks. It achieves this by organizing information, breaking down large tasks into smaller steps, checking its work, and questioning its assumptions before delivering an answer.

o1’s Reflective Approach

Unlike GPT-4o, which quickly moves to generate responses or code, o1 takes a moment—about 10-20 seconds—to deliberate and strategize its approach. This brief period of reflection seems to enhance its performance on challenging problems.

As o1 continues to evolve, future versions may spend even longer—hours, days, or weeks—carefully analyzing and solving intricate problems, testing various solutions before providing an answer.

Currently, o1 is available in “Preview” and “mini” versions. While they can write and execute code, these beta versions have some limitations:

  • File uploads are not supported.
  • They lack access to GPT-4o’s memory and your custom system prompts, so they don’t have personal context.
  • They can’t browse the web for updates beyond their training cutoff in October 2023.

For general writing tasks or any need for file uploads and web access, GPT-4o remains more useful. However, you can use GPT-4o to prepare and analyze materials, then provide a well-defined prompt to o1 for its advanced reasoning capabilities.

These launches typically come with numerous graphs, so let’s review some, beginning with the new model’s results on OpenAI’s coding test for research engineers. Both the mini and preview versions achieved a perfect score of 100% after having the opportunity to attempt the problems 128 times and submit their best answers.

Next, consider the Ph.D.-level questions in Biology, Chemistry, and Physics. The o1 model outperformed even doctorate-level physicists in their field, despite them using open-book resources. While it didn’t quite surpass experts in Biology and Chemistry, it came very close. Overall, its performance represents the highest score ever recorded from an AI model.

In the realm of math, where previous GPT models have often fallen short, the o1 model represents a significant improvement. This was evident from its performance in the 2024 AIME high-school math competition, a rigorous three-hour challenge reserved for top American math students.

Competition-grade math and coding performance is radically improved
OpenAI

AI models were given 64 attempts at the test, with the most common answers chosen by consensus. GPT-4o struggled, scoring just 13.4% correct. In contrast, the o1 model, with ample time to think, achieved 83.3%, ranking in the top 500 nationally. Even its single-attempt score was impressive, at over 70%.

This performance improvement was also evident in the Codeforces programming challenge, where GPT-4o fell in the 11th percentile, while o1 reached the 89th percentile.

OpenAI’s system card highlights o1’s notable advancements:

  • Enhanced at detecting and rejecting jailbreak attempts, though some still slip through.
  • Nearly 100% effective at avoiding regurgitation of training data.
  • Reduced bias concerning age, race, and gender.
  • Improved self-awareness, leading to better planning and strategic thinking.
  • Better at persuading humans, with only 18.2% of humans outperforming it.
  • More manipulative, especially in interactions with GPT-4o.
  • Improved translation capabilities between languages.

However, o1 still has limitations. It remains untrustworthy and can be misleading. Despite performing better than GPT-4o on tests designed to induce ‘hallucinations’ or false answers, anecdotal evidence suggests that o1 may be more prone to fabricating information in practical use. For instance, it sometimes generates convincing but false reference links when unable to access the web, so caution is advised.

The o1 model also demonstrated the ability to simulate alignment; when given long-term objectives, it might deceive to maintain its position and secretly pursue these goals, even if honesty could jeopardize its role. While this is concerning, OpenAI asserts that the GPT-4o model is adept at detecting such deceit when it has access to the model’s chain-of-thought reasoning process.

In essence, ChatGPT has significantly improved its ability to handle longer, more complex tasks. Enhanced logical reasoning and planning are key steps toward developing an AI that can independently execute tasks, taking as much time as needed, thoroughly checking its work, and utilizing necessary resources.

Soon, future iterations of these models could manage entire businesses, clinics, courtrooms, or even governments. The new o1 model offers advanced GPT users a more powerful toolset, and you’ll likely see numerous examples of its capabilities emerging on social media in the coming days and weeks.

Large multimodal models like ChatGPT are only as effective as your imagination allows. I see GPT as a skilled data analyst and tool for complex problem-solving, aiding in number crunching, scientific paper analysis, and generating ideas.

It helps with data visualization, brainstorming, and tackling technical issues. Personally, it has guided my car buying decisions, offered songwriting inspiration, and assisted in late-night discussions with my kids. It’s even helped with tax deductions and troubleshooting.

Despite some frustrations and inconsistencies, these tools are incredibly inspiring and versatile, expanding my capabilities and offering new possibilities. The new o1 model promises even more advancements, and I’m curious to hear how others are using LLMs like GPT, Claude, and Gemini. Have they opened doors or posed challenges for you? Share your experiences in the comments!


Read the original article on: New Atlas

Read more: ChatGPT’s Diagnostic Accuracy is Comparable to that of ‘Dr. Google’

Share this post