Exciting And Completely Terrifying

I can’t decide whether it’s more exciting or more terrifying how fast technology is evolving right now, especially when it comes to artificial intelligence. Some of you might have read my short informational text about how easy it is to create your own animated, talking 3D avatar. It doesn’t really matter anyway, since that’s old news now. But let me take it from the start, or at least from the start as I experienced it.

A couple of days ago I was browsing around on GitHub, as you do. I was mainly searching for resources to fix my installation of my generative AI (Stable Diffusion), since I had done something that broke it. While searching for a file or package that I suspected was broken, I stumbled on a repository for something called Ultimate RVC.

I had not heard of it before and didn’t know what it was, but I recognized some of the files as well as the UI (Gradio), so I figured it might have something I needed. It didn’t, but when I checked the README file I was instantly intrigued and downloaded the program. Soon I had forgotten everything about why I was on GitHub to begin with.

Now I’m no expert, so I won’t be able to explain exactly what it is or how it works, but I can give you my understanding of it, based on the few tests I’ve had the chance to do so far.

Exciting, terrifying or maybe “Excitifying”

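As far as I understand, Ultimate RVC is built around an AI model, a bit like ChatGPT is. The difference is that it is a voice conversion model rather than a language model (RVC stands for Retrieval-based Voice Conversion), and as I understand it, it can be trained on voices. Any voice, really. Yours, mine, your girlfriend’s or boyfriend’s, your parents’, your dead uncle’s, and Elvis’s.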

This is super cool!

And terrifying!

Let’s say that you couldn’t sing a clean note if your life depended on it, but you have a really deep passion for music and have always wanted to learn how to sing. If you train a model on your voice, then the AI can sing with your voice and make it sound good. I also assume that this is a giant leap forward for text-to-speech models, which could be really useful for those who can’t speak at all. Super cool!

It also means that if your brother calls you up and asks if you can lend him some money, you can’t take for granted that it really is your brother just because it sounds like him. Phone scams were already pretty common before all this, and I imagine that these types of crimes will explode for some time to come. Terrifying!

This is what the UI looks like.

In this case I’m using a model that has been pre-trained on Kurt Cobain’s voice, and I’ve sped up the video a bit. It took a few minutes until I could get Kurt Cobain to sing songs that weren’t even written when he passed away.

I’m going to give a short example. I also made some AI figures for this, mostly to give an idea of what it could look like if it’s done well. It should be noted that I haven’t spent that much time on the actual video, so it’s pretty obvious that these are AI animations. But I bet that someone who spends a bit more time creating a video could make it realistic enough that no one would notice. Except for the fact that Cobain has been dead for 30 years.

So remember, it’s not the visuals that are important here. The female voice is the real one, but Cobain’s voice is generated entirely by a voice model that has been trained specifically on his voice.

All I did was download the voice model that was trained on Kurt Cobain’s voice and provide a link to the YouTube video of the song I wanted him to sing. The voice of Britney Spears I added afterward, because I thought it would sound nice.

And now you might want to argue that anyone with at least one eye in their skull can see that this is AI and not the real Kurt Cobain and Britney Spears, even if they both were alive today. While that is true, I did say that I had not spent very much time and energy on the visuals in this video.

However, imagine that you pair up this voice AI with this other AI software called Live Portrait.

With Live Portrait you can look like anyone you want, in live video. And now you also have the opportunity to sound like anyone you want, more or less live (I bet it will be fully live very soon).

Source: Live Portrait on GitHub

I will write more about both of these programs over the weekend. To make sure you don’t miss anything, you can sign up for my newsletter.
