About

I am Xuhong He (贺旭鸿), a Master’s student in Intelligent Information Systems (MIIS) at Carnegie Mellon University, where I currently work with Prof. Fernando Diaz on information retrieval, LLM agents, and query simulation/evaluation.

Before CMU, I completed dual undergraduate degrees at Zhejiang University and the University of Illinois Urbana-Champaign. I previously collaborated with Prof. Zhiting Hu at the University of California San Diego on SimWorld, a UE5-based simulator for autonomous agents in physical and social worlds. I also worked as a software engineer intern at ByteDance on large-scale search infrastructure and neural sparse retrieval.

Experience

Internship Experience

ByteDance

Software Engineer Intern, Search Infrastructure (Information Retrieval)
Jul 2024 - Mar 2025

ByteDance internship project diagram

Built a production-grade neural sparse retrieval pipeline (SPLADE, BGE) and integrated it into ByteES, an OpenSearch-based search stack. The work covered the full path from indexing to serving under large-scale constraints. I also prototyped a Learning-to-Rank reranking stage to improve relevance in the existing production system.
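Neural sparse retrievers such as SPLADE represent each query and document as a sparse set of weighted vocabulary terms, so relevance reduces to a dot product over shared terms that an inverted index can serve. The sketch below illustrates only that scoring step, using hand-written toy weights; it is not the ByteES implementation, and the documents, weights, and the helpers `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy term weights. A real SPLADE model derives these from a
# masked-LM head over the vocabulary; these values are made up.
DOCS = {
    "d1": {"sparse": 1.2, "retrieval": 1.5, "index": 0.4},
    "d2": {"dense": 1.1, "retrieval": 0.9, "embedding": 1.3},
    "d3": {"opensearch": 1.4, "index": 1.0, "serving": 0.7},
}


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term, weight in terms.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query, k=2):
    """Score documents by the sparse dot product over terms shared with the query."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        # Only postings for query terms are touched, which is what makes
        # sparse retrieval cheap to serve from an inverted index.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


index = build_inverted_index(DOCS)
result = search(index, {"sparse": 1.0, "retrieval": 0.8, "index": 0.5})
print(result)  # d1 ranks first, then d2
```

Because scoring only walks the postings of query terms, the same structure maps naturally onto an existing inverted-index engine, which is one reason sparse retrieval fits an OpenSearch-style stack.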

Research Experience

Multilingual Simulation of Tip-of-the-Tongue (ToT) Queries

Aug 2025 - Present
Areas: LLMs, Information Retrieval, Evaluation
Advisor: Prof. Fernando Diaz (CMU & Google)

Multilingual Tip-of-the-Tongue (ToT) simulation overview

Created a multilingual ToT query simulation pipeline from Wikipedia corpora, focusing on Chinese, Japanese, and Korean for the NTCIR-19 Tip-of-the-Tongue (ToT) Track. Validated query quality with system-rank correlation: retrieval systems should be ranked similarly whether they are evaluated on simulated or human queries, indicating that the simulated queries induce retrieval behavior consistent with real users.
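System-rank correlation compares the ordering of a set of retrieval systems under human queries with their ordering under simulated queries; a Kendall's τ close to 1 suggests the simulated queries preserve the relative ranking of systems. The sketch below computes Kendall's τ-a with a plain pairwise count, assuming no tied scores. The system names and nDCG@10 values are made up for illustration, and the actual project may use a different correlation measure or a library routine such as `scipy.stats.kendalltau`.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} dicts (assumes no ties)."""
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s1, s2 in combinations(systems, 2):
        # A pair is concordant if both evaluations order s1 and s2 the same way.
        product = (scores_a[s1] - scores_a[s2]) * (scores_b[s1] - scores_b[s2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical nDCG@10 scores for four retrieval systems.
human = {"bm25": 0.21, "splade": 0.34, "dense": 0.30, "rerank": 0.41}
simulated = {"bm25": 0.25, "splade": 0.33, "dense": 0.36, "rerank": 0.45}

# Only the splade/dense pair swaps order: tau = (5 - 1) / 6
print(kendall_tau(human, simulated))
```

Here five of the six system pairs keep their order, so the simulated queries would rank systems much like the human queries do, even though the absolute scores differ.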

SimWorld: Open-ended Simulator for Agents in Physical and Social Worlds

Jun 2024 - May 2025
Areas: LLMs, Agents
Advisor: Prof. Zhiting Hu (UCSD)

SimWorld platform overview

Developed an Unreal Engine 5-based simulator for building and evaluating autonomous agents (including LLM/VLM agents) in rich physical and social worlds. Key features include open-ended, realistic simulation with language-controllable world generation; a gym-like LLM/VLM agent interface with multimodal observations and grounded natural-language actions at multiple levels of abstraction; and diverse long-horizon physical and social reasoning scenarios for systematic agent training and evaluation.

Publications

Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation

Submitted to SIGIR 2026

SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds

Technical report, 2025

SimWorld: An Open-ended Simulator for Agents in Physical and Social Worlds

NeurIPS 2025 Spotlight

Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration

NeurIPS 2025 Poster

Blog

Under Construction

I plan to use this section for research notes, engineering write-ups, and short posts on information retrieval, LLM agents, and systems. Public posts will be added later.

Service

Education

  • Carnegie Mellon University (Dec 2026, expected)
    M.S. in Intelligent Information Systems (MIIS)
  • University of Illinois Urbana-Champaign (May 2025)
    B.S. in Computer Engineering (double degree)
  • Zhejiang University (May 2025)
    B.E. in Electronic and Computer Engineering (double degree)