War: What is badly wrong with Google Live Captions?
  • Google has been testing Live Captions for Chrome on desktop for a while in Chrome Canary. Now all Chrome users on PC can take advantage of real-time captioning, as long as Chrome has already auto-updated to version 89. The latest version of the browser has a new Accessibility section under Advanced in Settings that lets users toggle on Live Captions. Once it is switched on, they'll see subtitles for any English-language audio or video they play in the browser. It works even if the video is muted.

    Note that this means that all audio will be sent to Google and stored indefinitely (together with the recognized text, of course!), permanently linked to you. And note that a video or audio clip that is "acceptable" today can be deemed unacceptable next month or next year by plenty of authorities or corporations. Some dumb NN (so-called "AI") will then decide from the text whether you are to blame. And yes, retroactive responsibility can become a new feature.

    Quite soon, using a non-Microsoft, non-Alphabet browser will become a requirement.

  • Having worked in the broadcast industry for well over a decade, I can say the term "captions" is used loosely... much like people "filming" with their video cameras. In any case, Chrome Live Captions is better described as live transcription, and the resulting subtitles are really just a layer of text drawn over the video/browser frame.

    To leverage AI computational power, learned phonetics, user-defined word databases, and so on, the audio must be processed remotely, where the massive and powerful hardware that performs the transcription lives. Virtually every online AI "captioning" service works like this (see the first sketch at the end of this post). Any video uploaded to YouTube also goes through this process, just not in real time.

    The alternative is to buy an extremely limited and basic program that lives locally, learns nothing, and has very low accuracy. Does anyone remember the early-2000s Dragon NaturallySpeaking fiasco? Good times.

    In the US, for accessibility compliance, a transcription must reach a certain percentage of accuracy in order to be legally acceptable (the second sketch below shows how such a figure is typically computed).

    From having to deal with this nonsense constantly, I can say that the actual captioning industry (human steno operators, hardware caption encoders) is an extremely closed ecosystem. From its inception, the industry made sure to close all the loops, so that anyone who needs real captions (CEA-608 and CEA-708) has to pay a very hefty price, either for equipment that lives in-house or for remote services performed by real people (the third sketch below hints at why).

    In closing, live transcription is used by IBM, AWS, Facebook, Microsoft Teams, Zoom, and third-party services like Otter and Adobe, and it is even available in Google Docs to let users "type" with their voice. If I had to guess, I'd say they have been transcribing in the background for a long time (because the AI needs to learn), and only now is the service being deployed to the public.

    I also see this as a strategic move, because in the US there are many individuals and groups who actively search for reasons to sue small, medium, and large companies for not offering accessibility features... even if they themselves don't need them.
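
    First sketch: a minimal example of the remote-processing flow described above, using Google's Cloud Speech-to-Text Python client. This is not Chrome's internal pipeline; the file name, the 16 kHz LINEAR16 format, and the `google-cloud-speech` setup (credentials via GOOGLE_APPLICATION_CREDENTIALS) are illustrative assumptions, just to show what a typical cloud transcription call looks like:

    ```python
    # Hedged sketch: send local audio to Google's servers for recognition.
    # Assumes `pip install google-cloud-speech` and a configured
    # GOOGLE_APPLICATION_CREDENTIALS service-account key.
    from google.cloud import speech

    def transcribe(path: str) -> str:
        client = speech.SpeechClient()

        # This is the step the thread is worried about: the raw audio
        # leaves the machine and is processed on Google's hardware.
        with open(path, "rb") as f:
            audio = speech.RecognitionAudio(content=f.read())

        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,   # assumed input format
            language_code="en-US",
        )

        response = client.recognize(config=config, audio=audio)
        return " ".join(r.alternatives[0].transcript for r in response.results)

    print(transcribe("clip.wav"))  # "clip.wav" is a placeholder file name
    ```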
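
    Second sketch: the "percentage of accuracy" mentioned above is usually derived from word error rate (WER). This minimal computation is illustrative only and does not cite any specific regulation's threshold:

    ```python
    # Hedged sketch: word-level accuracy as (1 - WER), with WER computed
    # as the word-level Levenshtein distance divided by reference length.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # deletions to reach empty
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # insertions from empty
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over the lazy dog"
    accuracy = (1.0 - wer(ref, hyp)) * 100
    print(f"accuracy: {accuracy:.1f}%")  # one substitution in nine words -> 88.9%
    ```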
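
    Third sketch: a hint at why real CEA-608 captions are more than a text overlay. Characters travel as byte pairs, each byte carrying 7 data bits plus an odd-parity bit. This toy encoder assumes plain ASCII only; real 608/708 streams also need control codes, timing, and DTVCC packets, which is where the hardware encoders mentioned above come in:

    ```python
    # Hedged sketch: pack ASCII text into CEA-608-style byte pairs.
    # The 608 character set only roughly matches ASCII; this toy ignores
    # the substituted characters and all control/preamble codes.
    def with_odd_parity(b: int) -> int:
        ones = bin(b & 0x7F).count("1")
        # Set bit 7 so the total number of 1 bits in the byte is odd.
        return (b & 0x7F) | (0x80 if ones % 2 == 0 else 0x00)

    def encode_608_text(text: str) -> bytes:
        data = [with_odd_parity(ord(c)) for c in text]
        if len(data) % 2:                       # 608 data is sent in pairs
            data.append(with_odd_parity(0x00))  # pad with a parity-set null
        return bytes(data)

    print(encode_608_text("HI").hex())  # c849: 'H' gets its parity bit set
    ```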

  • @theusualeditor

    I also work in broadcast and completely agree, especially with the last part. The FCC mandate has also expanded, so short-form content and certain web content are no longer exempt from captioning. Our understanding (more like our anticipation at this point) is that going forward anything, even ads, will need captioning.

    As a person who has had to edit captions, trust me, everyone: you want these automated transcribers to improve. Once the accuracy requirement goes up, it's either automated captioning or paying a significant chunk of cash to have someone do it, as theusualeditor says.

  • @tfinn

    > As a person who has had to edit captions, trust me, everyone: you want these automated transcribers to improve. Once the accuracy requirement goes up, it's either automated captioning or paying a significant chunk of cash to have someone do it, as theusualeditor says.

    The Google NNs that do captioning are improving at a staggering pace. Google trains them on huge amounts of material and also has thousands of people working 24/7 to feed them audio-text pairs.

    NN size increases 2 to 5 times each year, and in some years 10 to 15 times.

    Also, I think most people miss the point: Google will now have cover to silently send any local video or audio to its servers.