Jen-Tse (Jay) Huang 黃任澤
My first name sounds like: Yen-Zuh
Email: jhuan236@jh.edu
Towards Evaluating Proactive Risk Awareness of Multimodal Language Models
Youliang Yuan, Wenxiang Jiao, Yuejin Xie, Chihao Shen, Menghan Tian, Wenxuan Wang, Jen-tse Huang, Pinjia He
NeurIPS, 2025
| arXiv | dataset |
CodeCrash: Stress-Testing LLM Code Reasoning under Misleading Natural Language Perturbations
Man Ho Lam, Chaozheng Wang, Jen-tse Huang, Michael R. Lyu
NeurIPS, 2025
| arXiv | code | homepage | dataset |
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
Jen-tse Huang, Jiantong Qin, Jianping Zhang, Youliang Yuan, Wenxuan Wang, Jieyu Zhao
EMNLP Main, 2025
| arXiv | code |
AI Sees Your Location---But With A Bias Toward The Wealthy World
Jingyuan Huang, Jen-tse Huang, Ziyi Liu, Xiaoyuan Liu, Wenxuan Wang, Jieyu Zhao
EMNLP Main, 2025
| arXiv | code |
Where Fact Ends and Fairness Begins: Redefining AI Bias Evaluation through Cognitive Biases
Jen-tse Huang, Yuhang Yan, Linqi Liu, Yixin Wan, Wenxuan Wang, Kai-Wei Chang, Michael R. Lyu
EMNLP Findings, 2025
| arXiv | code |
Learning to Ask: When LLM Agents Meet Unclear Instruction
Wenxuan Wang, Juluan Shi, Zixuan Ling, Yuk-Kit Chan, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu
EMNLP Main, 2025
| arXiv | code |
UniDebugger: Hierarchical Multi-Agent Framework for Unified Software Debugging
Cheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, Michael R. Lyu
EMNLP Main, 2025
| arXiv | code |
Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare
Zhengliang Shi, Ruotian Ma, Jen-tse Huang, Xinbei Ma, Xingyu Chen, Mengru Wang, Qu Yang, Yue Wang, Fanghua Ye, Ziyang Chen, Shanyi Wang, Cixing Li, Wenxuan Wang, Zhaopeng Tu, Xiaolong Li, Zhaochun Ren, Linus
arXiv, 2025
| arXiv | code |
The Hunger Game Debate: On the Emergence of Over-Competition in Multi-Agent Systems
Xinbei Ma, Ruotian Ma, Xingyu Chen, Zhengliang Shi, Mengru Wang, Jen-tse Huang, Qu Yang, Wenxuan Wang, Fanghua Ye, Qingxuan Jiang, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Hai Zhao, Zhaopeng Tu, Xiaolong Li, Linus
arXiv, 2025
| arXiv | code |
The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies
Jiaxu Zhou, Jen-tse Huang, Xuhui Zhou, Man Ho Lam, Xintao Wang, Hao Zhu, Wenxuan Wang, Maarten Sap
arXiv, 2025
| arXiv |
FairGamer: Evaluating Biases in the Application of Large Language Models to Video Games
Bingkang Shi, Jen-tse Huang, Guoyi Li, Xiaodan Zhang, Zhongjiang Yao
arXiv, 2025
| arXiv | code |
Diversity-Enhanced Reasoning for Subjective Questions
Yumeng Wang, Zhiyuan Fan, Jiayu Liu, Jen-tse Huang, Yi R. Fung
arXiv, 2025
| arXiv | code |
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents
Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Michael R. Lyu, Maarten Sap
ICML, 2025
| arXiv | code | poster | slides | video |
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
Xintao Wang, Heng Wang, Yifei Zhang, Xinfeng Yuan, Rui Xu, Jen-tse Huang, Siyu Yuan, Haoran Guo, Jiangjie Chen, Shuchang Zhou, Wei Wang, Yanghua Xiao
ICML, 2025
| arXiv | code | homepage | dataset | model | poster | slides | video |
JARVIS or Ultron? A Survey on the Safety and Security Threats of Computer-Using Agents
Ada Chen, Yongjiang Wu, Junyuan Zhang, Jingyu Xiao, Shu Yang, Jen-tse Huang, Kun Wang, Wenxuan Wang, Shuai Wang
arXiv, 2025
| arXiv |
Language Models Do Not Have Human-Like Working Memory
Jen-tse Huang, Kaiser Sun, Wenxuan Wang, Mark Dredze
arXiv, 2025
| arXiv | code |
SOTOPIA-S4: A User-Friendly System for Flexible, Customizable, and Large-Scale Social Simulation
Xuhui Zhou, Zhe Su, Sophie Feng, Jiaxu Zhou, Jen-tse Huang, Hsien-Te Kao, Spencer Lynch, Svitlana Volkova, Tongshuang Wu, Anita Woolley, Hao Zhu, Maarten Sap
NAACL Demo, 2025
| arXiv | code | demo |
Competing Large Language Models in Multi-Agent Gaming Environments
Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu
ICLR, 2025
| arXiv | code | poster | slides | video |
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Shicheng Xu, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Kai Wang, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Xiaolong Jin, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Liang Pang, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, Xiaofeng Wang, Dacheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu
arXiv, 2025
| arXiv |
BiasInspector: Detecting Bias in Structured Data through LLM Agents
Haoxuan Li, Mingyu Derek Ma, Jen-tse Huang, Zhaotian Weng, Wei Wang, Jieyu Zhao
arXiv, 2025
| arXiv | code |
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench
Ziyi Liu, Priyanka Dey, Zhenyu Zhao, Jen-tse Huang, Rahul Gupta, Yang Liu, Jieyu Zhao
arXiv, 2025
| arXiv | code |
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition
Xiaoying Zhang, Da Peng, Yipeng Zhang, Zonghao Guo, Chengyue Wu, Jen-tse Huang, Chi Chen, Wei Ke, Helen Meng, Maosong Sun
arXiv, 2025
| arXiv | code |
Human Cognitive Benchmarks Reveal Foundational Visual Gaps in MLLMs
Jen-tse Huang, Dasen Dai, Jen-Yuan Huang, Youliang Yuan, Xiaoyuan Liu, Wenxuan Wang, Wenxiang Jiao, Pinjia He, Zhaopeng Tu, Haodong Duan
arXiv, 2025
| arXiv | code |
FairCoder: Evaluating Social Bias of LLMs in Code Generation
Yongkang Du, Jen-tse Huang, Jieyu Zhao, Lu Lin
arXiv, 2025
| arXiv | code |
InstantIR: Blind Image Restoration with Instant Generative Reference
Jen-yuan Huang, Haofan Wang, Qixun Wang, Xu Bai, Hao Ai, Peng Xing, Jen-tse Huang
arXiv, 2024
| arXiv | code | demo |
InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews
Xintao Wang, Yunze Xiao, Jen-tse Huang, Siyu Yuan, Rui Xu, Haoran Guo, Quan Tu, Yaying Fei, Ziang Leng, Wei Wang, Jiangjie Chen, Cheng Li, Yanghua Xiao
ACL Main, 2024
| arXiv | code | homepage | poster | video |
How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO
Man Tik Ng, Hui Tung Tse, Jen-tse Huang, Jingjing Li, Wenxuan Wang, Michael R. Lyu
arXiv, 2024
| arXiv | code |
The Earth is Flat? Unveiling Factual Errors in Large Language Models
Wenxuan Wang, Juluan Shi, Zhaopeng Tu, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu
arXiv, 2024
| arXiv |
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models
Tian Liang, Zhiwei He, Jen-tse Huang, Wenxuan Wang, Wenxiang Jiao, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi, Xing Wang
arXiv, 2023
| arXiv | code |
Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine
Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Shuming Shi, Zhaopeng Tu
arXiv, 2023
| arXiv | code |