Adobe VoCo - make voiceovers from text

Vitaliy_Kiselev

You need around 20 minutes of recorded speech for the engine to be able to accurately add new words to the audio clip.

matt_gh2

This is good.

Vitaliy_Kiselev

Just note that the demo examples were chosen for a reason. Speech synthesis built on a voice base is an extremely hard field, and one that still lacks the progress made in voice recognition. Voice recognition has been improved by brute force: each of your requests is recorded, stored, and used by the system, and the system runs on huge datacenters.

Voice tampering techniques that already exist in government agencies, paired with the huge voice bases collected by Google and Apple, make it possible to put anyone in jail, if you are small enough to stay under the public's radar.

matt_gh2

Good point. I was thinking in terms of ADR that a film editor could do without needing to bring the actors back for a session.

inqb8tr

How does this work? Is there an audio recording that has been processed by some decent voice recognition, with textual metadata added to it, so that you can type text that is already in the database and the software blends it in? So basically it just very smoothly rearranges the waveform under textual control, like a word-processor interface that edits waves in the background? Is that it, or is there more to it?

EDIT: To sum it up, it can't create words that have not been recorded, or can it?

Vitaliy_Kiselev

@inqb8tr

Synthesis systems normally do not work at the word level; they use phonemes.
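
A toy sketch of why the phoneme level matters for the "words that have not been recorded" question. This is only an illustration, not how VoCo itself is implemented: the tiny lexicon and the helper names (`LEXICON`, `phoneme_inventory`, `can_assemble`) are made up, and it ignores everything that makes real synthesis hard (coarticulation, prosody, smooth joins between units).

```python
# Hypothetical mini pronunciation lexicon (ARPAbet-like symbols).
# A real system would use a full pronunciation dictionary.
LEXICON = {
    "cat": ["K", "AE", "T"],
    "sun": ["S", "AH", "N"],
    "tan": ["T", "AE", "N"],
    "dog": ["D", "AO", "G"],
}


def phoneme_inventory(recorded_words):
    """Collect every phoneme that occurs somewhere in the recorded words."""
    inventory = set()
    for word in recorded_words:
        inventory.update(LEXICON[word])
    return inventory


def can_assemble(word, recorded_words):
    """True if every phoneme of `word` appears in the recordings,
    even when the word itself was never spoken."""
    return set(LEXICON[word]) <= phoneme_inventory(recorded_words)


recorded = ["cat", "sun"]
print(can_assemble("tan", recorded))  # "tan" was never recorded, but T, AE, N were
print(can_assemble("dog", recorded))  # D, AO, G are missing from the recordings
```

So in principle a word that was never spoken can be built, as long as its sounds occur somewhere in the recorded material.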

Vitaliy_Kiselev

Paper by the same guy:

CUTE: A CONCATENATIVE METHOD FOR VOICE CONVERSION USING EXEMPLAR-BASED UNIT SELECTION

http://gfx.cs.princeton.edu/pubs/Jin_2016_CAC/CUTE-icassp_2016.pdf

Isaac_B

I saw this used in person at Adobe Max last week. It works very well, and the audience was impressed but also worried about the implications for how easily this can be misused. They are going to figure out how to watermark this, but I'm still very worried about this being reverse engineered and abused. It worked seamlessly.

Vitaliy_Kiselev

@Isaac_B

Again, the samples and the whole presentation were arranged in advance. It is not voice synthesis software, so it does not solve any of the problems of that field.

inqb8tr

Why do these events look like some new age sect meetings?
