Le Thien Phuc Nguyen
Affiliated with the University of North Carolina at Chapel Hill
CS PhD Student

Hi, my name is Le Thien Phuc Nguyen.

I am from Vietnam, and I am currently a CS PhD student at the University of North Carolina at Chapel Hill, advised by Professor Zhongzheng (Jason) Ren. I am also honored to be co-advised by Professor Yong Jae Lee at the University of Wisconsin-Madison.

Previously, I was an Undergraduate Researcher at the Wisconsin AI Vision Lab (WAIV), University of Wisconsin-Madison, working with Professor Yong Jae Lee. At WAIV, I was fortunate to work with my mentor, Dr. Zhuoran Yu, who taught me a lot.

I received my B.S. in Computer Science, Data Science, Math, and Statistics (4 majors) from the University of Wisconsin-Madison in 2026.

My research interests focus on multimodal models, with a particular emphasis on video, audio, image, and large language models (LLMs).


Education
  • University of North Carolina at Chapel Hill
    CS Ph.D. Student
    Aug. 2026 - present
  • University of Wisconsin - Madison
    B.S. in Computer Science, Data Science, Math, and Statistics
    Sep. 2022 - May 2026
Honors & Awards
  • WACV 2026 Oral Presentation
    2026
  • Gold medal in the ICPC North Central North America (NCNA)
    2023
  • Silver medal in the ICPC North Central North America (NCNA)
    2022
  • Third prize in the Vietnam National Olympiad in Informatics
    2022
  • Second prize in the ICPC Vietnam National Round
    2021
  • Second prize in the Vietnam National University Olympiad in Informatics
    2021
News
2026
I will be joining the University of North Carolina at Chapel Hill as a CS PhD student working with Professor Jason Ren in Fall 2026!
Apr 23
AV-SpeakerBench has been recommended for CVPR Findings 2026!
Feb 20
I am honored that my paper LASER has been selected for an oral presentation at WACV 2026!
Jan 22
2025
My paper LASER has been accepted to WACV 2026!
Sep 05
2024
I became a mentee of Zhuoran Yu, a PhD student in Professor Lee's lab
Sep 05
I got accepted into Professor Yong Jae Lee's lab
Jun 01
2022
I have just landed in the United States to start my education journey at the University of Wisconsin-Madison
Aug 19
Selected Publications
DocHop: Benchmarking Out-of-domain Multi-hop Reasoning in Information-Dense Documents

Zhuoran Yu, Le Thien Phuc Nguyen, Jaden Park, Xinyi Gu, Zuxue He, Soochahn Lee, Rogerio Feris, Yong Jae Lee

Proceedings of the International Conference on Machine Learning (ICML), 2026

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Le Thien Phuc Nguyen*, Zhuoran Yu*, Samuel Low Yu Hang, Subin An, Jeongik Lee, Yohan Ban, SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae Lee (* equal contribution)

Findings of The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR Findings), 2026

GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification

Ngoc Bui Lam Quang, Nam Le Nguyen Binh, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Quan Nguyen, Ulas Bagci

ELAMI Workshop @ the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

Describe Anything Model for Visual Question Answering on Text-rich Images

Yen-Linh Vu, Dinh-Thang Duong, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh-Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li, Tianyang Wang, Ulas Bagci, Min Xu

VisionDocs Workshop @ the International Conference on Computer Vision (ICCV), 2025

UniTalk: Towards Universal Active Speaker Detection in Real World Scenarios

Le Thien Phuc Nguyen*, Zhuoran Yu*, Khoa Quang Nhat Cao, Yuwei Guo, Tu Ho Manh Pham, Tuan Tai Nguyen, Toan Ngo Duc Vo, Lucas Poon, Soochahn Lee, Yong Jae Lee (* equal contribution)

arXiv, 2025

LASER: Lip Landmark Assisted Speaker Detection for Robustness

Le Thien Phuc Nguyen*, Zhuoran Yu*, Yong Jae Lee (* equal contribution)

The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026 (Oral)
