8 August, 2018
Animesh Karnewar, a programmer who likes to experiment with AI and machine learning, wanted to see if he could build an AI that was able to generate images of characters in books based only on the book’s description, according to a Medium post he wrote. Drawing on previous work on creating images from text, including a model called Face2Text from researchers at the University of Copenhagen, he built an AI and trained it on a dataset of 400 random faces with accompanying descriptions.
Eventually, the AI became more refined and was able to generate fairly accurate, if somewhat abstract, portraits when fed a description in plain language. The results are low-resolution portraits that tend to get the main features right, though the AI sometimes gets details, like hair color, wrong. Take this portrait for example, which the AI created based on the description “man in his late 50s has an elongated face with a prominent nose, short mustache with a receding hairline and brown eyes.”
Karnewar shared the program on GitHub for anyone else who wants to play around with it, though experimenting with it requires some technical expertise and existing software. Karnewar wrote that training on a larger dataset will allow the AI to get even better, and that it could potentially be used to help police sketch artists create images of a suspect.
“Basically, [it can be used] for any application where we need some head-start to jog our imagination,” Karnewar wrote.
“I found that the generated samples at higher resolutions (32 x 32 and 64 x 64) has more background noise compared to the samples generated at lower resolutions,” Karnewar explains. “I perceive it due to the insufficient amount of data (only 400 images).”
The technique used to train the adversarial networks is called “Progressive Growing of GANs,” which is designed to improve the quality and stability of training. As the video shows, the image generator starts from an extremely low resolution. New layers are then gradually added to the model, increasing the level of detail as training progresses.
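The idea of starting small and adding layers can be sketched in a few lines of Python. This is only an illustration of the general scheme, not Karnewar's code: the function names are hypothetical, the 4 x 4 starting size and the linear fade-in are assumptions, and the 64 x 64 ceiling is taken from the resolutions quoted above. Each stage doubles the resolution, and the output of a newly added layer is blended with an upsampled copy of the previous stage's image while a weight rises from 0 to 1.

```python
def resolution_schedule(start=4, final=64):
    """List the resolutions visited as layers are added; each stage doubles the last.

    The starting size of 4 is an assumption for illustration.
    """
    sizes = []
    size = start
    while size <= final:
        sizes.append(size)
        size *= 2
    return sizes


def fade_in_alpha(step, fade_steps):
    """Blend weight for a newly added layer, rising linearly from 0 to 1 (assumed schedule)."""
    return min(1.0, step / fade_steps)


def upsample_nearest(image):
    """Double a 2-D grid of pixel values by repeating each pixel in a 2 x 2 block."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def blend(old_upsampled, new_output, alpha):
    """Mix the upsampled old image with the new layer's output, pixel by pixel."""
    return [
        [(1 - alpha) * old + alpha * new for old, new in zip(old_row, new_row)]
        for old_row, new_row in zip(old_upsampled, new_output)
    ]


if __name__ == "__main__":
    print(resolution_schedule())
    coarse = [[0.0, 1.0], [1.0, 0.0]]              # stand-in for a 2 x 2 image
    detailed = [[0.5] * 4 for _ in range(4)]       # stand-in for the new layer's 4 x 4 output
    alpha = fade_in_alpha(step=25, fade_steps=100)
    print(blend(upsample_nearest(coarse), detailed, alpha))
```

Early in a stage the blended image is almost entirely the upsampled coarse picture, so the new layer cannot disrupt what has already been learned; by the end of the stage the new layer has taken over completely.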
“The Progressive Growing of GANs is a phenomenal technique for training GANs faster and in a more stable manner,” he adds. “This can be coupled with various novel contributions from other papers.”
In one example, the text description depicts a woman in her late 20s with long brown hair swept over to one side, gentle facial features and no make-up. She’s “casual” and “relaxed.” Another description depicts a man in his 40s with an elongated face, a prominent nose, brown eyes, a receding hairline and a short mustache. Although the end results are extremely pixelated, the final renders show great progress in how A.I. can generate faces from scratch.
Karnewar says he plans to scale up the project by integrating additional datasets such as Flickr8k and COCO Captions. Eventually, the project, T2F, could be used in law enforcement to identify victims and/or criminals based on text descriptions, among other applications. He’s open to suggestions and contributions to the project.