
Institute of Information Science, Academia Sinica

Events


Seminar


Guiding Instruction-based Image Editing via Multimodal Large Language Models

  • Speaker: Mr. Tsu-Jui Fu (University of California, Santa Barbara)
    Host: Wei-Yun Ma
  • Time: 2024-03-26 (Tue.) 10:30 ~ 12:30
  • Location: Conference Room 106, New Building, Institute of Information Science
Abstract
Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks.
However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation via LMs. In this talk, we investigate how MLLMs facilitate edit instructions and present MLLM-Guided Image Editing (MGIE).
The talk will also include a background review of MLLMs and diffusion models for visual generation, so everyone is welcome to join!
BIO
Tsu-Jui (https://tsujuifu.github.io) is a Ph.D. candidate at UCSB and an incoming research scientist at Apple. His research lies in vision+language and text-guided visual editing.
He is also interested in language grounding and information extraction. He has done research internships at Apple AI/ML, Meta AI, Microsoft Azure AI, and Microsoft Research.