Le Thien Phuc Nguyen
Affiliated with the University of North Carolina at Chapel Hill
CS PhD Student

Hi, my name is Le Thien Phuc Nguyen.

I am from Vietnam, and I am a CS PhD student at the University of North Carolina at Chapel Hill, advised by Professor Zhongzheng (Jason) Ren.

Previously, I was an Undergraduate Researcher at the Wisconsin AI Vision Lab (WAIV), University of Wisconsin-Madison, working with Professor Yong Jae Lee. At WAIV, I was fortunate to work with my mentor, Dr. Zhuoran Yu, who taught me a lot.

I received my B.S. in Computer Science, Data Science, Math, and Statistics (4 majors) from the University of Wisconsin-Madison in 2026.

My research interests focus on multimodal models, with a particular emphasis on video, audio, image, and large language models (LLMs).


Education
  • University of Wisconsin - Madison
    B.S. in Computer Science, Data Science, Math, and Statistics
    Sep. 2022 - May 2026
  • University of North Carolina at Chapel Hill
    Ph.D. Student
    Aug. 2026 - present
Honors & Awards
  • WACV 2026 Oral Presentation
    2026
  • Gold medal in the ICPC North Central North America (NCNA)
    2023
  • Silver medal in the ICPC North Central North America (NCNA)
    2022
  • Third prize in the Vietnam National Olympiad in Informatics
    2022
  • Second prize in the ICPC Vietnam National Round
    2021
  • Second prize in the Vietnam National University Olympiad in Informatics
    2021
News
2026
I will be joining the University of North Carolina at Chapel Hill as a PhD student working with Professor Jason Ren in Fall 2026!
Apr 23
AV-SpeakerBench has been recommended for CVPR Findings 2026!
Feb 20
I am honored that my paper LASER has been selected for an oral presentation at WACV 2026!
Jan 22
2025
My paper LASER has been accepted to WACV 2026!
Sep 05
2024
I became a mentee of Zhuoran Yu, a PhD student in Professor Lee's lab
Sep 05
I got accepted into Professor Yong Jae Lee's lab
Jun 01
2022
I have just landed in the United States to start my educational journey at the University of Wisconsin-Madison
Aug 19
Selected Publications
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Le Thien Phuc Nguyen*, Zhuoran Yu*, Samuel Low Yu Hang, Subin An, Jeongik Lee, Yohan Ban, SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae Lee (* equal contribution)

Findings of The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR Findings), 2026

GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification

Ngoc Bui Lam Quang, Nam Le Nguyen Binh, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Quan Nguyen, Ulas Bagci

ELAMI Workshop @ the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

Describe Anything Model for Visual Question Answering on Text-rich Images

Yen-Linh Vu, Dinh-Thang Duong, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li, Tianyang Wang, Ulas Bagci, Min Xu

VisionDocs Workshop @ the International Conference on Computer Vision (ICCV), 2025

UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios

Le Thien Phuc Nguyen*, Zhuoran Yu*, Khoa Quang Nhat Cao, Yuwei Guo, Tu Ho Manh Pham, Tuan Tai Nguyen, Toan Ngo Duc Vo, Lucas Poon, Soochahn Lee, Yong Jae Lee (* equal contribution)

arXiv, 2025

LASER: Lip Landmark Assisted Speaker Detection for Robustness

Le Thien Phuc Nguyen*, Zhuoran Yu*, Yong Jae Lee (* equal contribution)

The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026 Oral
