Xiaofeng Lin

I am a Ph.D. student in Systems Engineering at Boston University, advised by Prof. Xuezhou Zhang. I work on reinforcement learning and LLMs.

I earned my M.S. in Robotics from the University of Michigan, Ann Arbor, where I was fortunate to work with Prof. Vasileios Tzoumas, and my B.S.E. in Engineering Mechanics from Tianjin University.

I'm actively looking for research internship positions during spring/summer 2026. Feel free to contact me!

Google Scholar / GitHub / Resume / Email / X


Research

I'm broadly interested in RL, post-training of LLMs, and LLM agents. My goal is to help LLMs and LLM agents make better decisions across diverse domains.

LLMs and RL

Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
Xiaofeng Lin*, Sirou Zhu*, Yilei Chen, Mingyu Chen, Hejian Sang, Ioannis Paschalidis, Zhipeng Wang, Aldo Pacchiano, Xuezhou Zhang
Preprint, 2026.
arXiv / code

We introduce ORBIT, a multi-task, multi-episode meta-RL framework that trains LLMs to learn from interaction in context and improve online decision-making on unseen environments.

Debunk the Myth of SFT Generalization
Xiaofeng Lin*, Hejian Sang*, Zhipeng Wang, Xuezhou Zhang
Preprint, 2025.
arXiv / code

This paper challenges the view that SFT fails to generalize and shows that with prompt diversity and chain-of-thought supervision, SFT can match or surpass RL baselines on decision-making tasks.

Efficient Reinforcement Learning in Probabilistic Reward Machines
Xiaofeng Lin, Xuezhou Zhang
AAAI, 2025. (Oral)
arXiv / code / slides

This paper studies how to explore efficiently given knowledge of Probabilistic Reward Machines (PRMs) and extends the reward-free framework to the generic non-Markovian reward setting.

Robotics

A Real-to-Sim-to-Real Approach for Vision-Based Autonomous MAV-Catching-MAV
Zian Ning, Yin Zhang, Xiaofeng Lin, Shiyu Zhao
Unmanned Systems, 2024.
World Scientific / PDF

Real-to-sim-to-real pipeline for fully autonomous vision-based MAV catching.

Leveraging Untrustworthy Commands for Multi-Robot Coordination in Unpredictable Environments: A Bandit Submodular Maximization Approach
Zirui Xu*, Xiaofeng Lin*, Vasileios Tzoumas
American Control Conference (ACC), 2024.
arXiv / code

Meta-algorithm that selects between unreliable commands and bandit submodular coordination.

Bandit Submodular Maximization for Multi-Robot Coordination in Unpredictable and Partially Observable Environments
Zirui Xu, Xiaofeng Lin, Vasileios Tzoumas
Robotics: Science and Systems (RSS), 2023.
arXiv / code / simulation videos / presentation

Bandit submodular coordination via tracking the best action, validated in multi-target tracking.

Teaching Assistant

  • BU SE 524/674: Optimization Theory and Methods (Fall 2024)
  • BU EK 125: Introduction to Programming for Engineers (Fall 2025)

Service

  • Reviewer: ACC, ICLR, NeurIPS, AISTATS, ICML.

Thanks to Jon Barron for this amazing template.