Jiayi Weng

Jiayi Weng 翁家翌

trinkle23897 [at] gmail [dot] com

I am a research engineer at OpenAI, where I build and optimize reinforcement learning systems to align foundation models. My work spans high‑throughput RLHF pipelines and tooling that lets researchers iterate quickly and easily.

Previously, I received my bachelor's degree from Tsinghua University and my master's degree from Carnegie Mellon University. I was a research engineer at Sea AI Lab in Singapore, advised by Min Lin, from May 2021 to September 2021. I spent a wonderful time at TSAIL, working with Professors Hang Su and Jun Zhu on reinforcement learning, from March 2018 to June 2020.

I created Tianshou (PyTorch RL library) and EnvPool (ultrafast RL environment executor). I enjoy open‑source work and systems that make RL research easier and faster.

CV / GitHub / Google Scholar / LinkedIn / Ins / Zhihu

Experience

Since July 2022, I have been a research engineer on OpenAI’s Post‑Training team, designing and scaling RLHF pipelines. My contributions to OpenAI's published work include:

ChatGPT – initial release effort (6th author)
GPT‑4 – RL infra author
GPT‑4V – multimodal RL
GPT‑4 Turbo – post‑training
GPT‑4o – post‑training infra lead
o‑series models – post‑training infra
Operator – early RL effort
GPT‑4.5 – post‑training
RFT (reinforcement fine-tuning) – core RL infra author

In the summer of 2021, I was a full-time research engineer at Sea AI Lab in Singapore, where I worked with Min Lin and Professor Shuicheng Yan. We created EnvPool, an ultrafast vectorized RL environment that can train Atari Pong and MuJoCo Ant in 5 minutes on a laptop, and which has been released on GitHub.

In the summer of 2019, I was a visiting student researcher at the Montreal Institute for Learning Algorithms (MILA), where I worked with Min Lin and Professor Yoshua Bengio on the Consciousness Prior based on the Transformer architecture. [Photo]

Prior to that, I worked on reinforcement learning algorithms on the VizDoom platform with Professors Hang Su and Jun Zhu at TSAIL. We proposed an environment-aware hierarchical reinforcement learning architecture and achieved first place in the VizDoom AI Competition 2018 Single Player Track (1). I am also the main contributor to Tianshou, an elegant, flexible, and superfast PyTorch deep reinforcement learning library.

Publications

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, Shuicheng Yan
Advances in Neural Information Processing Systems, Volume 35, 22409–22421, 2022.

Tianshou: A Highly Modularized Deep Reinforcement Learning Library
Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, Jun Zhu
Journal of Machine Learning Research 23 (267), 1–6, 2022.

Deep Reinforcement Learning with Credit Assignment for Combinatorial Optimization
Dong Yan, Jiayi Weng, Shiyu Huang, Chongxuan Li, Yichi Zhou, Hang Su, Jun Zhu
Pattern Recognition, Volume 124, 2022, 108466, ISSN 0031-3203.

Playing FPS Game with Environment-aware Hierarchical Reinforcement Learning
Shihong Song*, Jiayi Weng*, Hang Su, Dong Yan, Haosheng Zou, Jun Zhu
The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019). Oral Presentation.

URBER: Ultrafast Rule-Based Escape Routing Method for Large-Scale Sample Delivery Biochips
Jiayi Weng, Tsung-Yi Ho, Weiqing Ji, Peng Liu, Mengdi Bao, Hailong Yao
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018.

Miscellaneous

I'm also an amateur enthusiast of Photography [1] [2] / Computer Graphics / Web Security.

Credits