Homepage - Le Thien Phuc Nguyen

Le Thien Phuc Nguyen

Affiliated with the University of North Carolina at Chapel Hill
CS PhD Student

Hi, my name is Le Thien Phuc Nguyen.

I am from Vietnam, and I am currently a CS PhD student at the University of North Carolina at Chapel Hill, advised by Professor Zhongzheng (Jason) Ren. I am also honored to be co-advised by Professor Yong Jae Lee at the University of Wisconsin - Madison.

Previously, I was an Undergraduate Researcher at the Wisconsin AI Vision Lab (WAIV), University of Wisconsin-Madison, working with Professor Yong Jae Lee. At WAIV, I was fortunate to work with my mentor, Dr. Zhuoran Yu, who taught me a lot.

I received my B.S. in Computer Science, Data Science, Math, and Statistics (4 majors) from the University of Wisconsin-Madison in 2026.

My research interests focus on multimodal models, with a particular emphasis on video, audio, image, and large language models (LLMs). Specifically, I have expertise in multimodal learning and representation. Currently, I am interested in Multimodal Agentic AI and Embodied AI.

plnguyen6(at)wisc.edu Google Scholar GitHub LinkedIn

Education

University of North Carolina at Chapel Hill

CS Ph.D. Student

Aug. 2026 - present
University of Wisconsin - Madison

B.S. in Computer Science, Data Science, Math, and Statistics

Sep. 2022 - May 2026

Honors & Awards

WACV 2026 Oral Presentation

2026
Gold medal in the ICPC North Central North America (NCNA)

2023
Silver medal in the ICPC North Central North America (NCNA)

2022
Third prize in the Vietnam National Olympiad in Informatics

2022
Second prize in the ICPC Vietnam National Round

2021
Second prize in the Vietnam National University Olympiad in Informatics

2021

News

2026

I will be joining the University of North Carolina at Chapel Hill as a CS PhD student working with Professor Jason Ren in Fall 2026!

Apr 23

AV-SpeakerBench has been recommended for CVPR Findings 2026!

Feb 20

I am honored that my paper LASER has been selected for an Oral presentation at WACV 2026!

Jan 22

2025

My paper LASER has been accepted to WACV 2026!

Sep 05

2024

I became a mentee of Zhuoran Yu, a PhD student in Professor Lee's lab

Sep 05

I got accepted into Professor Yong Jae Lee's lab

Jun 01

2022

I have just landed in the United States to start my educational journey at the University of Wisconsin - Madison

Aug 19

Selected Publications (view all)

DocHop: Benchmarking Out-of-domain Multi-hop Reasoning in Information-Dense Documents

Zhuoran Yu, Le Thien Phuc Nguyen, Jaden Park, Xinyi Gu, Zexue He, Soochahn Lee, Rogerio Feris, Yong Jae Lee

Proceedings of the International Conference on Machine Learning (ICML), 2026

[Project Page]

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Le Thien Phuc Nguyen*, Zhuoran Yu*, Samuel Low Yu Hang, Subin An, Jeongik Lee, Yohan Ban, SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae Lee (* equal contribution)

Findings of The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR Findings), 2026
Workshop on Emerging Directions in Data for Multimodal Foundation Models (DataMFM) @ CVPR 2026 Oral
Interactive Physical AI Workshop (IPA) @ CVPR 2026

[Project Page] [Paper] [Code] [Data] [Leaderboard]

GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification

Ngoc Bui Lam Quang, Nam Le Nguyen Binh, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Quan Nguyen, Ulas Bagci

ELAMI Workshop @ the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

[Paper]

Describe Anything Model for Visual Question Answering on Text-rich Images

Yen-Linh Vu, Dinh-Thang Duong, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li, Tianyang Wang, Ulas Bagci, Min Xu

VisionDocs Workshop @ the International Conference on Computer Vision (ICCV), 2025

[Paper] [Code]

UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios

Le Thien Phuc Nguyen*, Zhuoran Yu*, Khoa Quang Nhat Cao, Yuwei Guo, Tu Ho Manh Pham, Tuan Tai Nguyen, Toan Ngo Duc Vo, Lucas Poon, Soochahn Lee, Yong Jae Lee (* equal contribution)

arXiv, 2025

[Project Page] [Paper] [Code] [Data]

LASER: Lip Landmark Assisted Speaker Detection for Robustness

Le Thien Phuc Nguyen*, Zhuoran Yu*, Yong Jae Lee (* equal contribution)

The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026 Oral

[Project Page] [Paper] [Code] [Data]
