Using AI for Audio at a Video Production Company

We’ve been inundated with talk of artificial intelligence, and I hesitate to jump on the bandwagon. Still, AI-based audio tools have created tremendous efficiencies in video production and post-production. These tools aren’t brand new; they’ve evolved over the last few years. Early versions were helpful but sometimes frustrating. Today’s tools are more reliable and more useful in both production and post-production.

In video production, the first place Elephant Productions uses AI is to create transcriptions following the video shoot. We do a lot of video production that involves unscripted interviews. Feeding the raw video to a service or software that uses artificial intelligence to create transcriptions means the writer can get transcriptions from those interviews within hours, not days. Not only do we get the transcriptions, but they are timestamped with their location in the video. Sometimes, things look better on paper than they sound. Timestamps allow us to quickly watch the soundbite and ensure it’s a good take for the video production.

The AI-generated transcripts are imperfect, and their accuracy can vary by product, but they are good enough to convey the idea. AI can struggle with words that sound alike, such as “for” and “four.” AI can also struggle with proper punctuation. If accuracy is essential, you can pay the transcript service extra to have a person review and refine the AI transcript. For instance, I would pay extra if I were creating transcripts for closed captions or subtitles. Otherwise, I think AI does just fine.

The video script is completed using AI-generated transcriptions with timestamps. Once the video script is complete, the timestamps increase the speed at which the editor can assemble the first cut of the video. Having the transcripts loaded into the edit system and synced with the video can further increase efficiency. You can assemble an edit by highlighting the desired part of the transcript instead of the typical mark “in” and “out” of the video clip. Editing is more complicated than copying and pasting, but an initial rough assembly is a jumping-off point, and using AI-generated transcriptions is a huge timesaver.

Transcripts synced with the video in the edit system allow searching for certain words or phrases to find alternate takes. For instance, a clip’s content may be great, but one word or phrase may be unclear because a cell phone went off in the background or the speaker mumbled. You can search for that word or phrase and possibly find a seamless replacement.
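For readers curious about what that search looks like under the hood, here is a minimal Python sketch. It is not how any particular edit system works; the `Segment` structure, the file names, and the sample lines are all made up for illustration. It assumes the transcript has already been exported as a list of timestamped segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped chunk of a transcript (hypothetical structure)."""
    start: float  # seconds from the start of the clip
    end: float
    clip: str     # source clip name (illustrative)
    text: str


def find_phrase(segments, phrase):
    """Return segments whose text contains the phrase, ignoring case and extra spaces."""
    needle = " ".join(phrase.lower().split())
    return [s for s in segments if needle in " ".join(s.text.lower().split())]


def timecode(seconds):
    """Format seconds as HH:MM:SS so the take is easy to locate."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


# Made-up sample data standing in for an exported transcript.
segments = [
    Segment(12.0, 17.5, "interview_a.mov", "We opened the new plant in four months."),
    Segment(95.2, 101.0, "interview_a.mov", "The new plant changed everything for us."),
    Segment(30.0, 34.0, "interview_b.mov", "Our team grew quickly."),
]

# List every take where the speaker says "new plant".
for hit in find_phrase(segments, "new plant"):
    print(hit.clip, timecode(hit.start), hit.text)
```

This is a plain substring match, so searching for “plant” would also match “plants.” A real tool may match on whole words or on word-level timestamps instead.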

During the post-production process, artificial intelligence is helping speed up audio sweetening. Removing background noise has always been an essential part of audio mixing; now AI technology makes it easier and faster. It does have to be used with a “less is more” philosophy. It’s not yet a set-it-and-forget-it technology: too much leads to an electronic-sounding voice, and too little leaves too much background noise. Adobe Premiere’s audio remixing tool is also a great use of AI. The hard part about using music from a library is that the track rarely times out just as you want unless you are doing a 15-, 30-, or 60-second commercial. Adobe’s AI tool will analyze and adjust the music track to fit the time you need. It finds places that can be looped or cut. You could do this manually, but the tool is a real timesaver.

AI can help generate captions or subtitles when the video edit is done. Some closed caption software has a built-in transcription function. This will take the finished show, transcribe it, and create the closed captions or subtitles synced to the show and broken into text blocks depending on your preferences, such as how many lines, characters, and pop-on or roll-up formatting. Since most of our video projects have transcriptions created at the start of the post-production process, we use a closed caption program to take the finished script and use artificial intelligence to time and make the captions or subtitles. AI can struggle to determine a logical point to break the text and create a new text block. You may have to edit them manually in some places, but it’s faster than watching the video and manually entering start and stop times for each block.
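To show why the break points can come out awkward, here is a rough Python sketch of the simplest possible approach: wrapping text by character count and grouping the lines into blocks. It is not how any caption program actually works. The function name is hypothetical, and the 32-character, two-line limits are illustrative values standing in for your own preferences.

```python
import textwrap


def caption_blocks(text, max_chars=32, max_lines=2):
    """Split text into caption blocks by line length and line count only.

    Breaks fall wherever the character limit is reached, not at
    natural pauses in the sentence.
    """
    lines = textwrap.wrap(text, width=max_chars)
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


# Made-up sample sentence.
script = "We opened the new plant in four months and it changed everything for us."

for number, block in enumerate(caption_blocks(script), start=1):
    print(f"Block {number}:")
    print(block)
    print()
```

Because this version only counts characters, it will happily split a phrase in the middle. That is the kind of break you may still need to fix by hand, even when a smarter tool does the first pass.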

AI audio tools don’t necessarily yield better results during video production and post-production. These tasks could all be done manually, but the tools save tremendous time, and that time can be dedicated to making the finished video production even better.

Tagged Artificial intelligence, Elephant Productions, Post-Production, Video production