As technology advances, social media has become an essential part of daily life, and visual content plays a crucial role in meeting marketing goals. According to WordStream research, marketers who use video grow revenue 49% faster than those who do not. But because producing videos manually is difficult, many people are turning to AI-generated video instead.
-How deep learning changed the current NFT market- If you spend time online every day, the term NFT is probably not unfamiliar, but have you ever considered the connection between artificial intelligence and NFTs? Can you imagine the next popular artist being an AI? Let's get started!
The virtual person on screen, created by DeepBrain AI, can work as an announcer delivering news 24/7 or as a YouTube creator communicating with viewers in real time. Sometimes we imitate the people who are most familiar to us.
CTO Kyung-Soo Chae introduced DeepBrain AI's lip-sync video synthesis technology, which enables real-time conversation between AI humans and people. Beyond generating high-resolution lip-sync video at four times real-time speed through its own artificial neural network architecture, the company also announced research results showing synthesis time cut to one third by applying NVIDIA's deep learning inference optimization SDK.
Like the AdaSpeech model we looked at last time, existing TTS adaptation methods use paired text-speech data to synthesize a specific speaker's voice. In practice, however, paired data is difficult to prepare, so it would be far more efficient to adapt the TTS model using only untranscribed speech. The most accessible approach is to transcribe the speech with an automatic speech recognition (ASR) system, but ASR is hard to apply in some situations and its accuracy is not always high enough, which can degrade final adaptation quality. There have also been attempts to solve this by jointly training the TTS pipeline with an adaptation module, but such models have the disadvantage that they cannot easily be combined with other commercial TTS systems.
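The core idea behind adapting with untranscribed speech can be sketched as follows: if a speech-side encoder is trained so that its output matches the frozen text encoder's latent space, then at adaptation time only audio is needed. The toy below is a minimal NumPy sketch of that alignment idea, not the actual AdaSpeech implementation; the dimensions, the linear "mel encoder", and the synthetic features are all stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8  # toy latent dimension; real models use phoneme encoders and mel frames

# Frozen "text encoder" latents for one utterance (stand-in for a trained TTS).
text_latents = rng.normal(size=(20, D))

# Speech-side features for the same utterance (stand-in for mel-spectrogram
# frames, here synthesized as a noisy linear transform of the latents).
true_W = rng.normal(size=(D, D))
mel_features = text_latents @ true_W + 0.01 * rng.normal(size=(20, D))

# "Mel encoder": a single linear map we adapt so its output matches the
# text-encoder latents via an L2 alignment loss -- no transcript required.
W = 0.1 * rng.normal(size=(D, D))

def alignment_loss(W):
    pred = mel_features @ W
    return float(np.mean((pred - text_latents) ** 2))

lr = 0.01
losses = [alignment_loss(W)]
for _ in range(200):
    pred = mel_features @ W
    grad = 2.0 * mel_features.T @ (pred - text_latents) / mel_features.size
    W -= lr * grad           # plain gradient descent on the alignment loss
    losses.append(alignment_loss(W))

print(f"alignment loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

After training, speech features alone map into the latent space the decoder already understands, which is why no text-speech pairs are needed for the new speaker.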
AdaSpeech is a TTS model that can adapt to new speakers while retaining the advantages of FastSpeech, which earlier improved synthesis speed through parallel (non-autoregressive) speech generation.
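What makes FastSpeech's parallel generation possible is its length regulator: a duration predictor estimates how many mel frames each phoneme should last, the phoneme hidden states are expanded accordingly, and then all frames can be decoded at once instead of one by one. Below is a minimal sketch of that expansion step with made-up toy values; the function name and the sample durations are illustrative, not FastSpeech's actual code.

```python
import numpy as np

def length_regulate(hidden, durations):
    """Expand phoneme-level hidden states to frame level.

    hidden:    (num_phonemes, dim) array of encoder outputs
    durations: predicted number of mel frames per phoneme
    """
    # Repeat each phoneme's hidden state `durations[i]` times along axis 0,
    # producing one row per mel frame for the parallel decoder.
    return np.repeat(hidden, durations, axis=0)

hidden = np.arange(6, dtype=float).reshape(3, 2)  # 3 phonemes, dim 2
durations = np.array([2, 1, 3])                   # toy predicted frame counts

frames = length_regulate(hidden, durations)
print(frames.shape)  # (6, 2): 2 + 1 + 3 frames, ready to decode in parallel
```

Because the frame count is fixed up front by the durations, the decoder has no frame-by-frame dependency, which is where the speedup over autoregressive TTS comes from.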
Lip-sync synthesis models based on deep learning are expected to develop further and reach people as ever richer services.
In conclusion, we have presented a new large-scale dataset to help researchers develop and evaluate deepfake detection methods.
Deepfake is a compound of "deep learning" and "fake", referring to synthetic images or video whose authenticity is difficult to determine because they are produced with deep learning. Although the word "fake" gives deepfakes a negative connotation, the underlying method is simply one application of deep learning technology.