5-ish Things on AI: Fake James Bond Trailer Goes Viral, an Inside Look at Secretive Training Data – CNET


One of the biggest issues hanging over generative AI companies has to do with training data. What data (information of all kinds, from words to images to audio) have these companies collected from online and offline sources to feed into their large language models and train them so the chatbots they power can have a natural-language conversation with you? 

Training data is the lifeblood of AI systems — it’s the data “culled from books, Wikipedia articles, news stories and other sources across the internet,” The New York Times’ Cade Metz and Stuart A. Thompson reported recently. “These chatbots learn their skills by analyzing enormous amounts of digital data,” Metz added in a podcast about an investigation he led for the paper.

The thing is, AI companies haven’t shared what’s in their sets of training data or how they obtained that information, a fact that’s set off numerous copyright lawsuits by authors, publishers and others who say the big developers of gen AI tools have scraped the internet to grab their content without permission or compensation. And that’s why, Reuters reported, AI companies are now talking with copyright holders and quietly inking licensing deals for their content.

“The data land grab comes as makers of big generative AI ‘foundation’ models face increasing pressure to account for the massive amounts of content they feed into their systems, a process known as ‘training’ that requires intensive computing power and often takes months to complete,” Reuters said.

Against this backdrop comes the investigation by the Times, released this month, which alleges that the biggest tech companies building gen AI engines “bent and broke” their own rules to train their gen AI systems. 

“We found that three major players in this race: OpenAI, Google and Meta — as they were locked into this competition to develop better and better artificial intelligence, they were willing to do almost anything to get their hands on this data, including ignoring, and in some cases, violating corporate rules and wading into a legal gray area as they gathered this data,” Metz said in the podcast. 

(An important note: In December the Times sued OpenAI and Microsoft, alleging they used the paper’s copyrighted material without permission to train their AI systems. OpenAI and Microsoft are trying to have parts of the NYT lawsuit dismissed.)

In late 2021, OpenAI essentially “ran out of data,” Metz said in the podcast. “They had used just about all the respectable English language text on the internet to build this system … Wikipedia articles by the thousands, news articles, Reddit threads, digital books by the millions. We’re talking about hundreds of billions, even trillions, of words.” That, he added, includes copyrighted material.

After digesting printed words, the Times found, OpenAI collected audio files — books, podcasts and as many as 1 million hours of YouTube video — and then converted those files into text and fed the transcripts into its system, “going against YouTube’s terms of service,” unnamed sources told the paper.

The Times also examined how Meta and Google trained their systems, and all three companies responded to the paper’s investigation:

“OpenAI said each one of its A.I. models ‘has a unique data set that we curate to help their understanding of the world and remain globally competitive in research,'” the Times reported. “Google said that its A.I. models ‘are trained on some YouTube content,’ which was allowed under agreements with YouTube creators, and that the company did not use data from office apps outside of an experimental program. Meta said it had ‘made aggressive investments’ to integrate A.I. into its services and had billions of publicly shared images and videos from Instagram and Facebook for training its models.”

If you think this whole discussion is a bit wonky or too insider for you, think again. Copyright issues aside (and that’s a big aside), understanding what’s in the training data used by these popular AI chatbots is important to understanding what biases or misinformation might be baked into those large language models, or LLMs. Since companies so far haven’t shared what training data they use, legislators are starting to propose bills that would require AI companies to disclose information about what’s in their training sets.

On a different note, if you’re interested in getting CNET’s expert take on AI products already on the market, including reviews of Microsoft’s Copilot, OpenAI’s ChatGPT and Google’s Gemini, check out CNET’s AI Atlas, a new consumer hub that also offers news, how-tos, explainers and other resources to get you up to speed on gen AI. Plus, you can sign up at AI Atlas to get this column via email every week. 

Here are the other doings in AI worth your attention.

Faking it with AI: A James Bond trailer and an AI beauty pageant

You know that saying, “If it sounds too good to be true, it probably is”? This week’s example of that at play involves James Bond, actors Henry Cavill and Margot Robbie, and a little AI deepfakery.

I’m talking about a bogus trailer for a made-up James Bond film starring Cavill as 007 and Robbie as the latest “formidable Bond girl.” Called “Bond 26 – First Trailer” and posted on YouTube, the 90-second teaser features scenes from prior Cavill and Robbie flicks, including “The Man from U.N.C.L.E.” and “Focus.” It’s racked up over 2.6 million views in five days, with The Hollywood Reporter saying the views are being “driven by a mix of fans enjoying it as a ‘what if’ effort, along with some being fooled by it.”

“Please note that this video is a concept trailer created solely for artistic and entertainment purposes,” creator KH Studio calls out to viewers in the YouTube notes.

“I have meticulously incorporated various effects, sound design, AI technologies, movie analytics, and other elements to bring my vision to life. Its purpose is purely artistic, aiming to entertain and engage with the YouTube community. My goal is to showcase my creativity and storytelling skills through this trailer. Thank you for your support, and let’s dive into the world of imagination.”

For what it’s worth, the comments suggest a huge appetite for Cavill to succeed Daniel Craig as Bond, James Bond. Cavill played Superman in the DC Comics series of films, as well as The Witcher in Netflix’s popular series. The Hollywood Reporter says there are unconfirmed reports that Aaron Taylor-Johnson is the latest actor being considered for the role of the British spy, along with Cavill, though Cavill may be “too old” (he’s 40).


Bogus Bond isn’t the only AI movie fakery that’s garnered attention recently. This summer, tech company TCL is planning to release its first original feature, a short romance movie called “Next Stop Paris,” on its TCLTV Plus streaming platform.

“There’s just one slight hitch: TCL is using generative AI to make original content for its platform, and early signs do not bode well,” Engadget noted after watching the trailer for what it says TCL is calling “the first AI-powered love story.”

I watched the minute-long trailer on YouTube, too (only 120,000 views, so it seems like James Bond wins), and I have to agree with Engadget that, “While it’s not entirely fair to judge a film based on a trailer, the Next Stop Paris clip gives a terrible first impression” of both the flick and TCLTV Plus. “The look of the characters changes throughout … and they project all of the emotion of a pair of dead fish.” Watch and decide for yourself.


And there’s one more example of AI fakery that you might think is good, bad or just another sign that it’s the end of civilization as we know it: Organizers are working on the first beauty pageant featuring AI-generated contestants competing for the title of “Miss AI.”  

“The competition is the first installment in a program of awards presented by the World AI Creator Awards (WAICA) and its inaugural partner, Fanvue, a subscription platform that hosts AI content,” People reported.

The awards are “dedicated to recognizing the achievements of AI creators around the world,” the WAICA says on its website. It adds that “contestants will be judged on their beauty, tech, and clout” — a reference to the engagement time each attracts. The creator who gets first place will receive $13,000. 

OpenAI is the ‘most funded’ AI company in the world

OpenAI, creator of text-to-image generator Dall-E and the world’s most popular chatbot, ChatGPT, is also “the most funded AI company in the world, with $14 billion raised in funding rounds so far,” according to CB Insights and data presented by Stocklytics. 

“CB Insight’s analysis of the 100 most promising AI startups shows that OpenAI raised over $14 billion in capital through partnerships with Microsoft and other investments, pushing valuation to a whopping $80 billion,” reads a post on the Stocklytics site. “This figure is even more impressive when compared to the capital raised by other most-funded AI companies. Statistics show that OpenAI alone raised more money than the seven other companies on the list, including Anthropic, Databricks, and Shield AI.”

No. 2 on the CB Insights–Stocklytics list is Anthropic, the creator of Claude, with $4.2 billion in funding. Big-data analysis platform Databricks ranked No. 3, with $4 billion, and Shield AI took fourth place, with $1 billion in funding. All the other most-funded AI companies have so far raised less than $1 billion, CB Insights said.

Its $80 billion valuation makes OpenAI the third most-valuable unicorn (a unicorn is any startup with a valuation of over $1 billion), behind TikTok owner ByteDance (which is valued around $268 billion) and Elon Musk’s SpaceX (valued at $180 billion as of December). In 2023, there were 95 companies on the global unicorn list — with 20% of them from the AI industry, CB Insights added. 

Google’s $100 billion AI bet as it consolidates its teams

While I’m talking about AI and money, it’s worth noting that Google’s AI chief, Demis Hassabis, who runs the company’s DeepMind research arm, said this month that he expects the company to spend more than $100 billion to develop its AI technology.

Hassabis was responding to a question about Stargate, a US-based data center for AI that’s reportedly being built by OpenAI and Microsoft that would house “a supercomputer made up of millions of AI chips and cost up to $100 billion,” Quartz said, citing Bloomberg and others who’ve been speculating on Stargate.

Hassabis’ remark came just a few days before Google announced that all its AI teams will now report to DeepMind and Hassabis, according to an April 18 memo to the company written by Google CEO Sundar Pichai.

Pichai, in a memo called “Building for our AI future” and posted to Google’s blog, said the change is aimed at simplifying “our structure” and improving “velocity and execution.” That language is generally considered code for, “We need to move faster to beat our competitors.”

In the memo, Pichai also made a reference to the recent firing, noted by CNN, of 28 Google employees who criticized the company’s contract for cloud technology with Israel by protesting in its offices.

Google needs “to be more focused in how we work, collaborate, discuss and even disagree,” Pichai wrote. “We have a culture of vibrant, open discussion that enables us to create amazing products and turn great ideas into action. That’s important to preserve. But ultimately we are a workplace and our policies and expectations are clear: this is a business, and not a place to act in a way that disrupts coworkers or makes them feel unsafe, to attempt to use the company as a personal platform, or to fight over disruptive issues or debate politics.” 

Expert vs. AI: What’s the future of phones?

In our new short-video series pitting CNET’s expert reviewers against ChatGPT 3.5, Mobile Editor Patrick Holland asks about the future of smartphones. Holland believes the future of smartphones includes having these devices “become our personal platform for a truly smart personal assistant that can predict our needs, what we want it to do and be far more helpful than any phone today.” 

ChatGPT also offers up some thoughts on AI, applications and sustainability, but Holland notes that ChatGPT “does feel like it’s offering a lot of jargon,” with the AI sounding more like what you might read in a phone maker’s press release. Gotta say, I agree. 

By the way, Holland also took on ChatGPT in regard to foldable phones and whether they’re worth buying now — in case you’re in the market for a new phone.

Stanford AI Index says AI beats humans on some tasks, but not all

Stanford University released the seventh edition of its AI Index Report, and while it may seem daunting at 502 pages, researchers have summarized the top 10 takeaways (starting on page five) and they’re worth a read. 

You’ll find data points on the challenges of AI, including an increase in AI awareness — and AI-related nervousness — in people around the world. There’s also an assessment that the top AI makers, including OpenAI, Google and Anthropic, need to do a better job of reporting on the risks of their systems. But there are some good things happening too. 

Here are my top three takeaways from their findings:

Humans matter
“AI beats humans on some tasks, but not on all. AI has surpassed human performance on several benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning.”

AI can be your copilot
“The data is in: AI makes workers more productive and leads to higher quality work. In 2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge the skill gap between low- and high-skilled workers. Still, other studies caution that using AI without proper oversight can lead to diminished performance.”

The US is ahead in AI innovation
“The United States leads China, the EU, and the U.K. as the leading source of top AI models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European Union’s 21 and China’s 15.”

In other research news worth noting, check out a succinct, seven-page paper on “The Recessionary Pressures of Generative AI: A Threat to Wellbeing.” It examines how gen AI will affect the economy, job prospects and ultimately societal well-being and offers suggestions for the policies we should consider to mitigate the negative effects of gen AI.

TL;DR: Let’s remember how to take care of humans as we move ahead into a world with gen AI, so people aren’t left behind.

“Governments need to act now to ensure that the march of innovation does not trample the livelihoods of the people; the backbone of our economy and prosperity of our nations,” said Jo-An Occhipinti, a professor at the University of Sydney who helped author the paper. “Secure, quality employment is the bedrock of societal strength, providing not just economic stability but also a source of shared purpose, connectedness, and psychological fulfillment that are important to our mental health and collective wellbeing.”   

Editors’ note: CNET used an AI engine to help create several dozen stories, which are labeled accordingly. The note you’re reading is attached to articles that deal substantively with the topic of AI but are created entirely by our expert editors and writers. For more, see our AI policy.

