April 13, 2024

Next iteration of Elon Musk's AI will prioritize processing "real-world" images. Grok-1.5 will soon be available to testers and existing product customers.

Grok-1.5 vision by Elon Musk: Focus on real-world spatial understanding

The hotly anticipated Grok-1.5 release of Elon Musk's AI chatbot will focus on working with visual information: documents, diagrams, charts, screenshots and photos. These ambitious goals were shared in the "Grok-1.5 Vision Preview" announcement by Elon Musk on X today, April 13, 2024.

According to the announcement, the new version of the chatbot will be equipped with a powerful image processing module for understanding real-world events and processes. Its performance in this area is measured with a new benchmark dataset dubbed RealWorldQA:

We are particularly excited about Grok’s capabilities in understanding our physical world

As U.Today previously reported, Elon Musk said that Grok 1.5 would be good at reading and summarizing X posts and even at helping X users create them.

The initial release of RealWorldQA consists of over 700 images, each paired with a question and an easily verifiable answer. The dataset is publicly available to enthusiasts under the CC BY-ND 4.0 license.
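For readers curious how a dataset of this shape can be used, below is a minimal sketch of scoring a model on image, question and answer triples. The field names, the sample records and the exact-match scoring rule are illustrative assumptions; they are not the published RealWorldQA schema or xAI's evaluation method.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VqaItem:
    # Hypothetical field names; the real dataset may use a different layout.
    image_path: str
    question: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase, drop a trailing period and collapse whitespace."""
    return " ".join(text.strip().lower().rstrip(".").split())


def exact_match_accuracy(
    items: Sequence[VqaItem],
    predict: Callable[[VqaItem], str],
) -> float:
    """Share of items where the normalized prediction equals the answer.

    Exact match is an assumed scoring rule chosen because each answer
    is described as easily verifiable.
    """
    if not items:
        return 0.0
    correct = sum(
        normalize(predict(item)) == normalize(item.answer) for item in items
    )
    return correct / len(items)


# Made-up records for illustration only; not taken from the dataset.
sample_items = [
    VqaItem("a.jpg", "Which object is larger, A or B?", "B"),
    VqaItem("b.jpg", "Can the car turn left here?", "No"),
]


def toy_predict(item: VqaItem) -> str:
    # Stand-in for a real vision model call.
    return "b." if item.image_path == "a.jpg" else "Yes"


accuracy = exact_match_accuracy(sample_items, toy_predict)
```

In this toy run the stand-in model gets one of the two made-up questions right. A real evaluation would replace `toy_predict` with a call to a vision model and load the actual image files.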

Grok-1.5V to outperform GPT4 and Gemini Pro 1.5: Data

The pioneering dataset consists largely of anonymized images taken from vehicles, along with other real-world images.

In a series of attached samples, Grok-1.5 turns a flowchart into Python code, produces a bedtime story based on a child's drawing, creates a CSV dataset from a screenshot, "expands" a meme and so on.

The xAI team also shared a comparison of Grok-1.5's performance with that of its main rivals: OpenAI's GPT, Google's Gemini Pro 1.5 and Claude 3 by Anthropic.

Grok-1.5 outperformed all competitors in math tasks, text reading and real-world understanding, xAI's report says.