Research
I'm broadly interested in reinforcement learning (RL), post-training of LLMs, and LLM agents. My goal is to help LLMs and LLM agents make better decisions across diverse domains.
|
Scaling In-Context Online Learning Capability of LLMs via Cross-Episode Meta-RL
Xiaofeng Lin*, Sirou Zhu*, Yilei Chen, Mingyu Chen, Hejian Sang, Ioannis Paschalidis, Zhipeng Wang, Aldo Pacchiano, Xuezhou Zhang
Preprint, 2026.
arXiv / code
We introduce ORBIT, a multi-task, multi-episode meta-RL framework that trains LLMs to learn from interaction in context and improve online decision-making on unseen environments.
|
Debunk the Myth of SFT Generalization
Xiaofeng Lin*, Hejian Sang*, Zhipeng Wang, Xuezhou Zhang
Preprint, 2025.
arXiv / code
This paper challenges the view that SFT fails to generalize and shows that with prompt diversity and chain-of-thought supervision, SFT can match or surpass RL baselines on decision-making tasks.
|
Efficient Reinforcement Learning in Probabilistic Reward Machines
Xiaofeng Lin, Xuezhou Zhang
AAAI, 2025. (Oral)
arXiv / code / slides
This paper studies how to explore efficiently given knowledge of a Probabilistic Reward Machine (PRM), and extends the reward-free exploration framework to generic non-Markovian reward settings.
|
A Real-to-Sim-to-Real Approach for Vision-Based Autonomous MAV-Catching-MAV
Zian Ning, Yin Zhang, Xiaofeng Lin, Shiyu Zhao
Unmanned Systems, 2024.
World Scientific / PDF
Real-to-sim-to-real pipeline for fully autonomous vision-based MAV catching.
|
Leveraging Untrustworthy Commands for Multi-Robot Coordination in Unpredictable Environments: A Bandit Submodular Maximization Approach
Zirui Xu*, Xiaofeng Lin*, Vasileios Tzoumas
American Control Conference (ACC), 2024.
arXiv / code
Meta-algorithm that selects between unreliable commands and bandit submodular coordination.
|
Bandit Submodular Maximization for Multi-Robot Coordination in Unpredictable and Partially Observable Environments
Zirui Xu, Xiaofeng Lin, Vasileios Tzoumas
Robotics: Science and Systems (RSS), 2023.
arXiv / code / simulation videos / presentation
Bandit submodular coordination via tracking the best action, validated in multi-target tracking.
|
Teaching Assistant
- BU SE 524/674: Optimization Theory and Methods (Fall 2024)
- BU EK 125: Introduction to Programming for Engineers (Fall 2025)
|
Service
- Reviewer: ACC, ICLR, NeurIPS, AISTATS, ICML.