Taegyeong Lee

Hi, I'm Taegyeong Lee. I'm passionate about novel research on generating images and videos from audio and text, integrating multiple modalities. I enjoy conducting research that is simple yet effective, leveraging multimodal and generative models to make a strong impact in the real world.

I am currently working as a research intern on the FnGuide LLM TFT, focusing on LLMs and RAG. Previously, I earned my Master's degree from the UNIST Artificial Intelligence Graduate School (AIGS). I interned at ETRI and completed the 8th cohort of the Software Maestro program. I also served as a software developer in the Promotion Data Management Division at the Republic of Korea Army Headquarters. I hold a Bachelor's degree in Computer Engineering from Pukyong National University.

Email  /  Scholar  /  Github


Research

My current research interests include Generative AI, LLMs, and RAG.

Zero-shot Prompt Guard for Multi-modal LLM Safety
Taegyeong Lee, Jeonghwa Yoo, Hyoungseo Cho, Soo Yong Kim, and Yunho Maeng
ACL 2025 Workshop (The 9th Workshop on Online Abuse and Harms)

We propose a simple yet effective question-prompting method that blocks harmful prompts, including multi-modal ones, in a zero-shot and robust manner.

Multi-aspect Knowledge Distillation with Large Language Model
Taegyeong Lee, Jinsik Bang, Soyeong Kwon, and Taehwan Kim
CVPR 2025 Workshop (The 12th Workshop on Fine-Grained Visual Categorization)
github / arXiv

We introduce a multi-aspect knowledge distillation method using MLLMs to enhance vision models by learning both visual and abstract aspects, improving performance across tasks.

Generating Realistic Images from In-the-wild Sounds
Taegyeong Lee, Jeonghun Kang, Hyeonyu Kim, and Taehwan Kim
ICCV 2023
github / arXiv

We propose a diffusion-based model that generates images from in-the-wild sounds using audio captioning, attention mechanisms, and CLIP-based optimization, achieving superior results over baselines.

Generating Emotional Face Images using Audio Information for Sensory Substitution
Taegyeong Lee, Hyerin Uhm, Chi Yoon Jeong, and Chae-Kyu Kim
Journal of Korea Multimedia Society, 2023

We propose a method that generates images optimized for sound intensity, improving face image generation with V2A models.


Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.