About
I am Xuhong He (贺旭鸿), a Master’s student in Intelligent Information Systems (MIIS) at Carnegie Mellon University, where I currently work with Prof. Fernando Diaz on information retrieval, LLM agents, and query simulation/evaluation.
Before CMU, I completed dual undergraduate degrees at Zhejiang University and the University of Illinois Urbana-Champaign. I previously collaborated with Prof. Zhiting Hu at the University of California San Diego on SimWorld, a UE5-based simulator for autonomous agents in physical and social worlds. I also worked as a software engineer intern at ByteDance on large-scale search infrastructure and neural sparse retrieval.
Experience
Internship Experience
ByteDance
Integrated a production-level neural sparse retrieval pipeline (SPLADE, BGE) into ByteES (an OpenSearch-based search stack), covering the full path from indexing to serving under large-scale constraints. Also explored and prototyped Learning-to-Rank reranking integration to improve relevance in the existing production system.
Research Experience
Multilingual Simulation of Tip-of-the-Tongue (ToT) Queries
Created a multilingual Tip-of-the-Tongue query simulation pipeline from Wikipedia corpora, with a focus on Chinese, Japanese, and Korean for the NTCIR-19 Tip-of-the-Tongue (ToT) Track. Verified quality by measuring retrieval-behavior consistency with system-rank correlation, checking that simulated queries behave similarly to human queries in ranking.
SimWorld: Open-ended Simulator for Agents in Physical and Social Worlds
Developed an Unreal Engine 5-based simulator for building and evaluating autonomous agents (including LLM/VLM agents) in rich physical and social worlds. It features open-ended realistic simulation with language-controllable world generation; a rich gym-like LLM/VLM agent interface with multimodal observations and grounded natural-language actions across multiple abstraction levels; and diverse long-horizon physical/social reasoning scenarios for systematic agent training and evaluation.
Publications
Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation
Submitted to SIGIR 2026
SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
Technical report, 2025
SimWorld: An Open-ended Simulator for Agents in Physical and Social Worlds
NeurIPS 2025 Spotlight
Blog
Under Construction
I plan to use this section for research notes, engineering writeups, and short posts on information retrieval, LLM agents, and systems. The blog structure is kept in the repo, but public posts will be added later.
Service
- Co-organizer, NTCIR-19 Tip-of-the-Tongue (ToT) Track (2026)
- Program Committee Member / Reviewer, SIGIR 2026 (Resources Paper Track)
Education
- Carnegie Mellon University (Dec 2026, expected)
  M.S. in Intelligent Information Systems (MIIS)
- University of Illinois Urbana-Champaign (May 2025)
  B.S. in Computer Engineering (double degree)
- Zhejiang University (May 2025)
  B.E. in Electronic and Computer Engineering (double degree)