How AI is Simplifying Video

In close-up by gesture mode, Sony’s Edge Analytics Appliance can ingest a 4K camera feed and then automatically digitally zoom into a speaker who is gesturing in the shot, creating a 1080p HD video close-up in the process. (Image credit: Sony)

Artificial intelligence (AI) is the next big thing for video. Companies such as Pexip and Sony are harnessing AI-enabled video platforms to make presentations and videoconferences easier to execute and better to watch.

Here is what both companies have come up with in this area, plus insights into how AI-enriched video analytics can make life simpler for everyone, how this technology can protect participant privacy, and where artificial intelligence for presentations and videoconferences is headed next.

Pexip’s AI Approach

Pexip’s entry into AI-enabled video analytics is called Adaptive Composition, which the company bills as “the first AI-powered technology designed to put people, not systems, at the heart of the meeting experience,” according to Jordan Owens, Pexip’s VP of architecture. In plain language, Adaptive Composition’s ability to autonomously manage key elements of meetings allows participants to focus on themselves and their content, rather than staying within the limits of their technology.

Adaptive Composition’s auto-framing function “automatically frames the camera shots around the participants’ faces, ensuring that no one appears off-center or distant in the presenting room,” Owens said. This gives multi-site meetings a more natural look and avoids the awkwardness that can occur when presenters are half in frame or too far away to be clearly seen onscreen. (Pexip’s AI-enabled auto-framing is also device-agnostic.)
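To make the idea of auto-framing concrete, here is a minimal Python sketch of the geometry involved. It is not Pexip’s implementation, which is not described in this article. It assumes a separate face detector has already returned bounding boxes, and the function name `frame_faces` and its `margin` and `aspect` parameters are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height) in pixels


def frame_faces(faces: List[Box], frame_w: int, frame_h: int,
                margin: float = 0.5, aspect: float = 16 / 9) -> Box:
    """Return a crop window that centers every detected face (illustrative only)."""
    if not faces:
        # Nobody detected: fall back to the full wide shot.
        return (0.0, 0.0, float(frame_w), float(frame_h))

    # Smallest box that contains every face.
    left = min(x for x, _, _, _ in faces)
    top = min(y for _, y, _, _ in faces)
    right = max(x + w for x, _, w, _ in faces)
    bottom = max(y + h for _, y, _, h in faces)

    # Add padding so faces do not sit against the edge of the shot.
    w = (right - left) * (1 + 2 * margin)
    h = (bottom - top) * (1 + 2 * margin)

    # Grow the shorter side until the crop matches the target aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect

    # Shrink (keeping the aspect ratio) if the crop would exceed the source frame.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale

    # Center the crop on the group, then clamp it inside the frame.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x = min(max(cx - w / 2, 0.0), frame_w - w)
    y = min(max(cy - h / 2, 0.0), frame_h - h)
    return (x, y, w, h)


# Two people sitting left of center in a 4K room camera view.
crop = frame_faces([(1400, 400, 200, 200), (1800, 450, 200, 200)], 3840, 2160)
```

A real system would also smooth the crop over time so the shot does not jitter as people move; that step is left out here.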

Meanwhile, Adaptive Composition’s intelligent layout function makes sure that the most active rooms in a multi-site videoconference are given priority, along with the most active speakers. This is in contrast to basic voice/sound detection technology, which can maximize the view of a room with two people in it above all others, just because one of them happened to cough. Again, the result of using AI-enabled video analytics is a more natural flow to multi-site meetings, with improved AV management allowing the technology to recede into the background.
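The cough problem can be illustrated with a small Python sketch. This is not how Adaptive Composition works internally, which the article does not detail; it is a simple smoothing heuristic (an exponential moving average) standing in for the general idea of ranking rooms by sustained activity rather than by the latest sound. The names `update_scores`, `rank_rooms`, and `alpha` are hypothetical.

```python
from typing import Dict, List


def update_scores(scores: Dict[str, float], loud_now: Dict[str, bool],
                  alpha: float = 0.1) -> Dict[str, float]:
    """Blend this tick's sound detection into each room's running activity score."""
    rooms = set(scores) | set(loud_now)
    return {
        room: (1 - alpha) * scores.get(room, 0.0)
        + alpha * (1.0 if loud_now.get(room, False) else 0.0)
        for room in rooms
    }


def rank_rooms(scores: Dict[str, float]) -> List[str]:
    """Most active room first; ties are broken alphabetically for stability."""
    return sorted(scores, key=lambda room: (-scores[room], room))


# Room A talks for 20 ticks, then pauses while someone in room B coughs once.
scores: Dict[str, float] = {}
for _ in range(20):
    scores = update_scores(scores, {"A": True, "B": False})
scores = update_scores(scores, {"A": False, "B": True})

# A naive "loudest right now" rule would switch to B on the cough;
# the smoothed score still ranks A first.
print(rank_rooms(scores))
```

A smaller `alpha` makes the layout slower to react, which trades responsiveness for stability.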

Pexip will introduce Adaptive Composition this year as a tech preview in version 23 of the company’s self-hosted software, Pexip Infinity.

Sony’s Take on AI

Sony’s Edge Analytics Appliance is designed to improve many aspects of distance learning, multi-site presentations, and videoconferences. (Image credit: Sony)

Sony’s Edge Analytics Appliance (REA-C1000) takes a different approach to AI. It is a physical device that can be licensed to perform one of five AI-enabled tasks: handwriting extraction, PTZ camera auto tracking, close-up by gesture, chromakey-less CG overlays on backgrounds, and focus area cropping. To add a second function, a second appliance and license are required.

The Edge Analytics Appliance is designed to improve many aspects of distance learning, multi-site presentations, and videoconferences. For instance, “In handwriting extraction mode, the Edge Analytics Appliance can capture what a professor is writing on a whiteboard, and then add it to a continuously updated digital overlay,” said Sony product manager Drew Buttress. If need be, the professor can be made semi-transparent on screen so that the writing is never hidden from viewers. They can see everything as the professor writes it, even when the professor is standing in front of the board.

In close-up by gesture mode, the Edge Analytics Appliance displays a 4K camera view of the entire audience. “When someone in the audience stands up, we produce a 1080p close-up of that person until they sit down,” Buttress said. “Once seated, the Edge goes back to the wide 4K view of the audience.”

Sony’s chromakey-less CG overlay mode extracts the presenter’s live image from the shot and places it over any CG background that the operator selects, with no green or blue screen required. This makes it possible to create more visually compelling presentations with a minimum of production equipment. Finally, the Edge Analytics Appliance’s focus area cropping mode allows a single camera feed to be used for two shots: a 4K wide shot of the speaker and his or her surroundings, plus a 1080p close-up of the speaker.
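The reason a 4K feed can yield a clean 1080p close-up is simple arithmetic: a 3840x2160 frame contains four 1920x1080 frames’ worth of pixels, so a 1080p window can be cut out of it without upscaling. The Python sketch below shows that windowing step only. It is an illustration, not Sony’s implementation; the function name `closeup_window` is hypothetical, and the subject’s center point is assumed to come from a separate detection step.

```python
from typing import Tuple


def closeup_window(cx: int, cy: int,
                   src_w: int = 3840, src_h: int = 2160,
                   out_w: int = 1920, out_h: int = 1080) -> Tuple[int, int, int, int]:
    """Return (left, top, width, height) of a 1080p window centered on (cx, cy).

    The window is clamped so it never extends past the edge of the 4K source.
    """
    left = min(max(cx - out_w // 2, 0), src_w - out_w)
    top = min(max(cy - out_h // 2, 0), src_h - out_h)
    return (left, top, out_w, out_h)


# Subject in the middle of the frame: the window is centered.
print(closeup_window(1920, 1080))  # (960, 540, 1920, 1080)

# Subject near the top-left corner: the window is pushed back inside the frame.
print(closeup_window(100, 100))    # (0, 0, 1920, 1080)
```

Because the window is the same size as the output, the wide 4K shot and the 1080p close-up can both be produced from one camera feed.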

Making Life Better for Everyone

Both Pexip’s and Sony’s AI solutions enhance the visual quality of distance communication while simplifying the production process. For example, using artificial intelligence to keep the participants in frame eliminates the need for human camera operators and the “amateur hour” appearance when non-AI-enabled systems fail to track participants accurately and smoothly.

Somewhat paradoxically, the use of artificial intelligence in multi-site presentations and videoconferences can make the experience feel less artificial. “With our handwriting extraction feature, for instance, the students can see what’s being written on the whiteboard clearly in addition to the teacher’s face and body language,” Buttress said. This makes the presentation experience more true-to-life and less mechanical; AI helps the technology get out of the way of the human interaction.

“With AI-enabled video analytics, participants across multiple sites have a more seamless and natural experience,” Owens said. “They don’t have their experience disrupted by worrying about being in the frame, or having someone interrupt their presentation to ask, ‘Could you please zoom in because I can’t see you?’”

AV technology managers who use AI-enabled video platforms will find their working lives less stressful, because their meetings will run more smoothly as this technology takes over many duties previously assigned to humans or to less-capable mechanical systems. The result is happier users, fewer presentation issues, and a reduced workload for AV staff.

Protecting Privacy

By its very nature, AI-controlled technology is highly programmable. This means that AV managers can set whatever parameters are necessary to protect participant privacy, subject to the organization’s rules that govern these matters.

Granted, some of these limits are a function of equipment deployment rather than AI. “In schools, we can’t show students’ faces due to privacy issues,” Buttress said. “This is why the cameras are set up to shoot the teachers at the front of the room and the back of students’ heads.” 

Still, there are times when everybody’s faces are shown and privacy remains an issue, such as a corporate multi-site town hall meeting that isn’t open to the general public.

In these cases, an AI-enabled video system can automatically impose whatever level of encryption and routing is required to keep everything private. “This includes graphical information that is being shared inside the meeting itself, such as facial recognition in aid of auto-tracking,” Owens said. “This data is protected within the system itself.”

What’s to Come

Today’s AI-powered video platforms are just the start of what is possible with AI-enabled AV equipment. 

Sony’s Drew Buttress can foresee a time when “video analytics are incorporated into the processing that goes into augmented reality and virtual reality to provide more realistic experiences.” Meanwhile, Pexip’s Jordan Owens is more taken with the idea of AI making all aspects of AV multi-site presentations more naturalistic and seamless to participants. “We’re looking at that now: trying not just to create better face-to-face communications, but driving better overall meeting experiences across the board,” he said.

One thing is certain: Today’s AI-enabled video platforms are just scratching the surface of what is possible with this technology. Someday, AI may be managing all aspects of multi-site meetings so efficiently and smoothly that no one will even notice the AV equipment that makes these meetings possible.

James Careless

James Careless is an award-winning freelance journalist with extensive experience in audio-visual equipment, AV system design, and AV integration. His credits include numerous articles for Systems Contractor News, AV Technology, Radio World, and TV Tech, among others. Careless comes from a broadcasting background, with credits at CBC Radio, NPR, and NBC News. He currently co-produces/co-hosts the CDR Radio podcast, which covers the Canadian defense industry. Careless is a two-time winner of the PBI Media Award for Excellence.