OpenAI Used Over 1 Million Hours of YouTube Data: Report

Today, we will talk about AI models like ChatGPT and Bard. These models learn from vast amounts of information: articles, videos, and audio recordings. But where does that information come from? Sometimes it comes from you. A recent report claims that big tech companies are training AI models on huge numbers of YouTube videos, and that even though YouTube's terms of service forbid this, the companies are finding ways to work around the rules. Does this mean creators' rights are being ignored? Can you take legal action against these companies for using your content? Copyright laws around the world may need to change to answer these questions.

Rise of AI Models and Their Data Sources

A central question is whether the leading AI models respect the rights of the people who create the content they learn from. Many have wondered about this before, and until now there was no clear answer, but a recent report offers new details. According to the report, big tech companies trained their AI models on transcripts of huge numbers of YouTube videos. Let's take OpenAI as an example.

The report says that OpenAI, the company behind ChatGPT, needed more high-quality training data for its models. To get it, the company built a speech-recognition tool called Whisper, which it used to transcribe more than a million hours of YouTube videos. Those transcripts were then used to train GPT-4, its most advanced language model.
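For context, Whisper is also available as an open-source speech-recognition library that anyone can run. Below is a minimal sketch of how a single audio file could be transcribed with the public openai-whisper Python package; the file name is a placeholder for illustration, and this is not a description of OpenAI's internal pipeline.

```python
# Minimal sketch: transcribing one audio file with the open-source
# "openai-whisper" package (pip install openai-whisper).
# "video_audio.mp3" is a placeholder file name, not real data.
import whisper

# Load a pretrained checkpoint ("tiny", "base", "small", ...)
model = whisper.load_model("base")

# Transcribe the audio track and print the recognized text
result = model.transcribe("video_audio.mp3")
print(result["text"])
```

Applied to video after video, a tool like this turns spoken audio into text that can be fed into a language model's training data, which is exactly the practice the report describes.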



Allegations: OpenAI and Google in Spotlight

Both OpenAI and Google have been accused of using YouTube videos to improve their AI systems. This is awkward for Google, because YouTube's own rules say videos may not be scraped or downloaded without permission, and YouTube's CEO has voiced concern about the platform being used to train AI. Some observers also believe OpenAI used YouTube videos to build its Sora video model. According to the report, Google knew what OpenAI was doing but did not act against it, reportedly because Google was training its own models on YouTube videos as well.

The Copyright Dilemma: AI and Data Usage

What is Google's position? Google says it did train some of its models on YouTube videos, but only with permission from the creators. This raises the broader issue of AI and copyright. AI models need enormous amounts of data to learn, and many creators are unhappy about how their material is being used. Several media companies have already sued AI firms for copyright infringement.

Copyright is the legal protection given to creative works: it gives the creator the right to decide how their work is used. Computers have to copy data in order to process it, which is why copyright matters so much here. Fair use allows certain kinds of copying, but it is not a blanket permission: what counts as fair in one context, or one country, may be infringement in another. Artificial intelligence makes these questions even trickier.

Some companies argue that AI simply cannot work without large amounts of data, while others insist that AI firms should pay for the data they use. Either way, copyright rules may need to change. That might not happen soon, but as AI becomes more common and demands ever more data, companies will have to work out what they can legally use and what they cannot.

In Conclusion

AI, data usage, and copyright law together create a difficult problem for big tech companies and content creators alike. As the technology improves, the challenge is to keep innovating while still respecting creators' rights to their work. Critics argue that AI companies are in the wrong when they use content without permission, which raises hard questions about how data should be used and what rules the digital world needs.