Personal View
Adobe VoCo - make voiceovers from text
  • You need around 20 minutes of recorded speech for the engine to be able to accurately add new words to the audio clip.

  • 9 Replies
  • This is good.

  • @matt_gh2

    Just note that the demo examples are the way they are for a reason. Voice-based speech synthesis is an extremely hard field, and it still lacks the progress that voice recognition has made. Voice recognition has been improved by brute force: every request you make is recorded, stored and used by the system, and the system runs on huge datacenters.

    Voice tampering techniques that already exist in government agencies, paired with the huge voice databases collected by Google and Apple, make it possible to put anyone in jail, if you are small enough to stay under the public attention radar.

  • Good point. I was thinking in terms of ADR (automated dialogue replacement) that can be done by the film editor without needing to bring actors back for a session.

  • How does this work? Is there an audio recording that has been processed by decent voice recognition, with textual metadata added to it, so that you can type text that is already in the database and the software blends it in? Basically, does it just very smoothly rearrange the waveform under textual control, like a word-processor interface that edits waves in the background? Is that it, or is there more to it?

    EDIT: To sum it up: it can't create words that have not been recorded, or can it?

  • @inqb8tr

    Synthesis normally does not work at the word level; it uses phonemes.

  • A paper by the same guy:

    CUTE: A CONCATENATIVE METHOD FOR VOICE CONVERSION USING EXEMPLAR-BASED UNIT SELECTION

    http://gfx.cs.princeton.edu/pubs/Jin_2016_CAC/CUTE-icassp_2016.pdf

  • I saw this used in person at Adobe MAX last week. It worked seamlessly, and the audience was impressed but also worried about how easily it could be misused. They are going to figure out how to watermark it, but I'm still very worried about it being reverse engineered and abused.

  • @Isaac_B

    Again, the samples and the whole presentation were staged. It is not voice synthesis software, so it does not solve any of the problems of voice synthesis.

  • Why do these events look like some New Age cult meetings?
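On the question of whether new words can be built from an existing recording: the replies point to phoneme-level synthesis and to concatenative unit selection. Below is a toy sketch of generic unit selection, assuming each recorded phoneme snippet is described only by a label, mean pitch, duration and its position in the original recording. It is not Adobe's VoCo implementation or the method of the linked paper; the `Unit` type, the `select_units` function, the features and the cost weights are all hypothetical. A Viterbi search picks one snippet per target phoneme, trading off how well each snippet fits (target cost) against how audible the splice between neighbours would be (join cost).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded phoneme snippet (hypothetical feature set)."""
    phoneme: str
    pitch: float       # mean pitch in Hz
    duration: float    # seconds
    source_index: int  # position in the original recording


def target_cost(unit: Unit, want_pitch: float, want_duration: float) -> float:
    # Mismatch between the snippet's prosody and what the new text needs
    # (weights are arbitrary assumptions).
    return abs(unit.pitch - want_pitch) / 50.0 + abs(unit.duration - want_duration) / 0.05


def join_cost(prev: Unit, cur: Unit) -> float:
    # Snippets that were adjacent in the original recording join for free.
    if cur.source_index == prev.source_index + 1:
        return 0.0
    # Otherwise: a fixed splice penalty plus a penalty for the pitch jump at the seam.
    return 1.0 + abs(prev.pitch - cur.pitch) / 50.0


def select_units(targets, database):
    """Viterbi search for the cheapest sequence of recorded units.

    targets: list of (phoneme, pitch, duration) for the text to be spoken.
    database: list of Unit cut from the speaker's recordings.
    Returns (list of chosen Units, total cost).
    """
    candidates = []
    for phoneme, _, _ in targets:
        cands = [u for u in database if u.phoneme == phoneme]
        if not cands:
            raise ValueError(f"no recorded unit for phoneme {phoneme!r}")
        candidates.append(cands)

    # best[i][j] = (cheapest cost of a path ending at candidates[i][j], backpointer)
    _, p0, d0 = targets[0]
    best = [[(target_cost(u, p0, d0), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        _, p, d = targets[i]
        row = []
        for u in candidates[i]:
            tc = target_cost(u, p, d)
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u) + tc, k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost, back))
        best.append(row)

    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

With a database cut from a recording of "cat" (k, ae, t), asking for the phonemes of "tack" (t, ae, k) reuses the same three snippets in reverse order at the cost of two splice penalties, i.e. it produces a word that was never recorded as a word. Real systems use far richer acoustic features and much larger databases, which is one reason so much recorded speech is needed.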