February 20, 2025

NBCUniversal + Coactive: A discussion on how multimodal AI is unlocking visual content discovery in media & entertainment

An overview of how a leading media & entertainment company is thinking about AI.
[Hero image: a man looking at his phone, with the text overlay "How Media & Entertainment companies are unlocking video content with multimodal AI"]

On January 16th, Kevin Hill, General Manager of Media & Entertainment at Coactive AI, joined Augusto Morena-Beltran, SVP Corporate Decision Sciences at NBCUniversal, for a conversation about the impact of multimodal AI in media & entertainment. This blog post summarizes their conversation.

Media & entertainment companies are accustomed to dealing with massive volumes of visual content. The challenges of managing and understanding that content show up throughout the media supply chain – from production, to personalization and experience, to measuring the impact and performance of a given asset.


Every use case requires an understanding of the “content DNA”: Who are the people on screen? What is the emotion? Are any advertisers being shown? What sport is being played, and in what location? Traditionally, answering these questions has required tagging content with metadata.

For example, let’s say you’re working on content optimization for a talk show. You need to understand what’s on the screen at what time and match that to audience ratings to understand the segments that are resonating. For a major sporting event, the ability to turn around tagging and content analysis quickly is critically important. For an ad engine, you again need to know what the audience was seeing and how it performed in terms of memorability and brand messaging. The number of use cases is unlimited, and while media & entertainment companies have been finding ways to solve these challenges, they are often left working with incomplete metadata.
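To make the content-optimization example concrete, here is a minimal sketch of pairing segment-level tags with minute-by-minute audience ratings. All names and data are hypothetical; the `segments` and `ratings` tables stand in for whatever tagging pipeline and audience-measurement feed a network actually uses:

```python
import pandas as pd

# Hypothetical segment metadata for one talk-show episode: start/end
# offsets (in minutes) plus the content tag attached to each segment.
segments = pd.DataFrame({
    "segment_id": [1, 2, 3],
    "start_min":  [0, 12, 25],
    "end_min":    [12, 25, 40],
    "tags":       ["monologue", "celebrity interview", "musical guest"],
})

# Hypothetical minute-by-minute audience ratings for the same broadcast.
ratings = pd.DataFrame({
    "minute": range(40),
    "rating": [1.0 + 0.3 * (12 <= m < 25) for m in range(40)],
})

# Assign each minute to its segment, then average ratings per segment.
ratings["segment_id"] = pd.cut(
    ratings["minute"], bins=[0, 12, 25, 40], labels=[1, 2, 3], right=False
).astype(int)

performance = (
    ratings.groupby("segment_id")["rating"].mean()
    .rename("avg_rating")
    .to_frame()
    .join(segments.set_index("segment_id")["tags"])
    .sort_values("avg_rating", ascending=False)
)
print(performance)  # which segments (and which tags) resonated most
```

The join itself is trivial; the bottleneck is producing the segment tags in the first place, which is exactly the step the rest of this post is about.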

This brings us to the advent of AI.

Why AI now? The convergence of technology and need

The recent surge of AI in media & entertainment isn't merely a trend; it's a response to a pressing need. Powerful new models and technology, coupled with the growing urgency of content challenges, have created a tipping point. These breakthroughs make it possible to go beyond both manual human tagging and traditional machine learning approaches. In the session, our speakers outlined four big changes in AI that make it possible to solve the challenges of visual content discovery and management:

  1. Breakthrough AI models: Advancements in LLMs and multimodal AI are unlocking capabilities beyond traditional human and ML-based approaches
  2. Adaptive learning & iteration: AI enables continuous improvement and makes it possible to expand to new use cases
  3. Democratized access: The shift to the cloud and falling costs are making powerful AI accessible to all businesses
  4. Real-time scalability: Massive content libraries can be analyzed with unprecedented efficiency

These capabilities together reveal a path to the holy grail: moving beyond basic metadata tagging to truly understand the DNA of your content.

Choosing an AI partner

When it comes to choosing an AI partner to accelerate visual content discovery, our speakers discussed three important criteria:

  • Infinite customization: Media companies don’t have the luxury of a single use case. They need a platform that can work across all content genres, models that can be tuned with domain-specific context, and an interface where non-technical business users can further customize metadata and labels.
  • Scalability and cost: With millions of minutes of content to analyze, a scalable and cost-effective solution is a requirement. AI infrastructure is expensive to deploy and maintain; is that something your organization needs to take on?
  • Fast adaptation: Media & entertainment companies have a wide range of use cases across sporting events, news, and always-on recommendation engines. The AI foundation needs to adapt to all of them while also keeping up with the rapidly changing AI landscape.

Getting started with multimodal AI

Our speakers came back often to the wide array of use cases in media & entertainment. This is an industry where content is the product. Companies have decades of footage of important moments, shows that were on-air for years, and archives of breaking news. If you’re able to understand the DNA of that content, what becomes possible? 

Our speakers discussed a few exciting use cases:

  • Unlocking media archives with semantic search. With multimodal AI, it’s possible to search a show archive using a natural language query like “[celebrity] introducing [another celebrity]”, with no tags or labels required. Multimodal AI can analyze both visual and audio cues to deliver highly accurate results, allowing marketers to find the perfect segment for clip compilations faster (see the sketch after this list).

  • Enabling advertising engines. Effective advertising resonates emotionally. A human can quickly sense whether an image conveys “motivation,” but that is exactly the kind of judgment computers have historically been very bad at. Multimodal AI now makes it possible to analyze creative for nuanced concepts like “patriotism” or “motivation,” powering ad engines with a much richer understanding of the emotional content of an image.

  • Optimizing content performance. Media companies want to understand video content at multiple levels, from the asset level down to the segment and frame level, and then pair that data with viewer ratings. This has been done for years, but it is labor-intensive. The ability of multimodal AI to automatically identify segments and their content DNA makes it possible to optimize content performance across a much wider portfolio of shows.
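As a concrete (and heavily simplified) illustration of the semantic search use case above, the sketch below embeds archive frames and a natural-language query into the same vector space using an off-the-shelf open-source CLIP model via sentence-transformers. This is a generic approach, not Coactive's implementation, and the `archive_frames` directory and the query text are invented for the example:

```python
from pathlib import Path

from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Open-source CLIP model that embeds images and text into a shared space.
model = SentenceTransformer("clip-ViT-B-32")

# Hypothetical directory of frames sampled from the show archive.
frame_paths = sorted(Path("archive_frames").glob("*.jpg"))
frame_embeddings = model.encode(
    [Image.open(p) for p in frame_paths], convert_to_tensor=True
)

# Natural-language query -- no tags or labels required.
query = "a host introducing a celebrity guest on stage"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank frames by cosine similarity and print the top matches.
hits = util.semantic_search(query_embedding, frame_embeddings, top_k=5)[0]
for hit in hits:
    print(frame_paths[hit["corpus_id"]], round(hit["score"], 3))
```

A production system would presumably also index audio and work at the segment level rather than on individual frames, but the core idea of matching embeddings is the same.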

Overview of Coactive

Coactive’s Multimodal AI Platform is how media companies are beginning to unlock their visual content with AI. Coactive integrates with cloud storage and is model-agnostic, allowing customers to choose the best model for their particular needs. In preprocessing, Coactive makes video content “AI ready” by managing file sizes and creating vector embeddings. The Semantic Engine is where customers get the “customization” capabilities. This allows users to prompt Coactive and provide simple “yes / no” feedback on whether a concept matches their expectations. It’s powerful, iterative, and simple to use.
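One way to picture that “yes / no” feedback loop is as a lightweight classifier trained on top of precomputed frame embeddings. The sketch below is an assumption-laden illustration of that general idea, not Coactive's actual Semantic Engine; `refine_concept` and all of the data are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def refine_concept(frame_embeddings: np.ndarray,
                   labeled_ids: list[int],
                   labels: list[int]) -> np.ndarray:
    """Fit a classifier on user feedback and rescore every frame.

    frame_embeddings: (n_frames, dim) precomputed multimodal embeddings
    labeled_ids:      indices of frames the user has reviewed
    labels:           1 = "yes, matches the concept", 0 = "no"
    """
    clf = LogisticRegression(max_iter=1000)
    clf.fit(frame_embeddings[labeled_ids], labels)
    return clf.predict_proba(frame_embeddings)[:, 1]

# Hypothetical loop: surface the most uncertain frames for the next round
# of feedback, so each "yes / no" answer sharpens the concept the most.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))  # stand-in for real embeddings
scores = refine_concept(embeddings, [0, 1, 2, 3], [1, 0, 1, 0])
uncertain = np.argsort(np.abs(scores - 0.5))[:10]
print("frames to review next:", uncertain)
```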

The last mile is where these capabilities plug into the end user’s workflow. With Coactive APIs, it’s easy to bring multimodal AI directly into existing systems. In addition, Coactive’s interface lets users search, generate metadata, and even write SQL to analyze visual content.
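To give a flavor of what SQL over visual content can look like, here is a hedged example using an in-memory DuckDB table as a stand-in for whatever store actually backs the interface; the `frame_metadata` table and its columns are invented for illustration and are not Coactive's schema:

```python
import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE frame_metadata (
        asset_id   TEXT,
        frame_time DOUBLE,   -- seconds from the start of the asset
        label      TEXT,     -- AI-generated tag, e.g. 'celebrity interview'
        confidence DOUBLE
    )
""")
con.execute("""
    INSERT INTO frame_metadata VALUES
        ('show_001',  12.0, 'monologue',           0.94),
        ('show_001', 310.5, 'celebrity interview', 0.88),
        ('show_002',  45.2, 'musical guest',       0.91)
""")

# Example question: which labels appear most often, and how confidently?
print(con.execute("""
    SELECT label, COUNT(*) AS n, AVG(confidence) AS avg_conf
    FROM frame_metadata
    GROUP BY label
    ORDER BY n DESC
""").fetchdf())
```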


2025 is shaping up to be the year when companies ship big, impactful AI projects, and we’re excited to share these stories.