Despite promising results, style transfer methods that require preparing style images in advance can limit creativity and accessibility. Following human instructions, by contrast, is the most natural way to perform artistic style transfer, and it can significantly improve controllability in visual-effects applications. We introduce a new task, language-driven artistic style transfer (LDAST), which manipulates the style of a content image guided by a text instruction. We propose the contrastive language visual artist (CLVA), which learns to extract visual semantics from style instructions to accomplish LDAST.
Tsu-Jui (Ray) Fu is a Ph.D. candidate at UC Santa Barbara, advised by William Wang. His research lies in vision+language and mainly focuses on text-guided visual editing. He is also interested in language grounding and information extraction. He has done research internships at Academia Sinica, Preferred Networks, Microsoft Research, and Meta AI.