I demoed Google's Project Astra and it felt like the future of generative AI (until it didn't)

Kerry Wan/ZDNET

As I waited through a queue of journalists and walked into the small demo room, my eyes were glued to a wall-mounted monitor and the Pixel 8 Pro in one of two Google product experts’ hands. The pre-recorded showcase of Project Astra, featured during the company’s I/O keynote an hour earlier, was well received — and a hard act to follow. Now, with my phone stashed in my breast pocket, the real-world demo was about to begin.

Project Astra is the brainchild of Google DeepMind: the company’s vision of a multimodal, super-charged AI assistant that can process visual information, show reasoning, and remember what it’s been told or shown. It won’t be as readily available as the new Gemini features coming to Android devices, but the end goal, at least for now, is to embed the technology into phones and possibly wearables, becoming an everyday assistant for everything we do.

For the demo, I was presented with four use cases: Storyteller, Pictionary, Alliteration, and Free-form. They’re all fairly self-explanatory and nothing existing generative AI models can’t do, but the depth, speed, and adaptability of answers are where Project Astra truly shone.

First, I placed a pepper in view of Astra’s camera and asked it to create an alliteration. “Golden groupings gleam gloriously,” it responded confidently, though incorrectly. “Wait, it’s a pepper,” I told Astra. “Perhaps polished peppers pose peacefully.” Much better.

I then added a toy ice cream cone and a banana into the mix and asked Astra if they would make for a good lunch. “Perhaps packing protein provides pep,” it suggested, understanding the nutritional imbalance among the three foods and, to my surprise, sticking with alliteration. Astra’s answers were also fast enough to discourage me from pulling out my Rabbit R1 to compare.

Perhaps more notable was how natural the AI sounded, with a tone similar to that of OpenAI’s GPT-4o, as I panned the Pixel 8 Pro camera around and asked random questions about various objects in the room. The natural-sounding voice goes hand in hand with the Storyteller and Pictionary capabilities, both of which can keep children, students, and people who have time to spare entertained.

One issue I encountered during my roughly five-minute demo was how Astra would frequently pause mid-response, possibly interpreting the sounds of external chatter and the nearby soccer activation (where Google demoed how its AI could judge your kicking form) as me interrupting it. The ability to interrupt a voice assistant is the latest step toward achieving more natural conversations.

However, in this case, the high sensitivity of the head-worn microphone on one of the staff members may have worked against the demo. That leads me to believe that in more bustling environments, like when I’m navigating through the NYC subway or at a trade show, communicating with Astra may be more difficult than talking to an actual person beside me.

The other issue with Project Astra is its memory. At the moment, the AI only remembers and tracks the location of objects shown to it within the chat session, which lasts only a few minutes. While the AI was able to recall that I had placed my phone in the breast pocket of my jacket at the start of the demo, it theoretically wouldn’t be able to tell me where I left the TV remote the night before, which is when such a feature would be most beneficial.

One of the researchers told me that extending the memory capacity of Astra, which runs in the cloud and not on-device, is certainly possible. The tradeoff for such a performance feat would likely be battery life, especially if the goal is to fit the technology within a wearable as thin and lightweight as glasses.

Ultimately, Google DeepMind gave me a strong vision of what the future of AI interactions could look like. The technology just has some wrinkles that need to be smoothed out before I’m ready to introduce another voice assistant into my life.
