Jiayi Weng 翁家翌
trinkle23897 [at] gmail [dot] com
I am a research engineer at OpenAI, where I build and optimize reinforcement learning
systems to align foundation models. My work spans high‑throughput RLHF pipelines and
tooling that lets researchers iterate easily and quickly.
Previously, I received my bachelor's degree from
Tsinghua University and my master's degree from Carnegie Mellon University. I was a
research engineer at Sea AI Lab in Singapore, advised by Min Lin, from
May 2021 to September 2021. From March 2018 to June 2020, I had a wonderful time at TSAIL, working with Professors Hang Su and Jun Zhu on
reinforcement learning.
I created Tianshou (a PyTorch RL library) and EnvPool (an ultrafast RL environment
executor). I enjoy open‑source work and
systems that make RL research easier and faster.
CV / GitHub / Google Scholar / LinkedIn / Ins / Zhihu
Experience
Since July 2022, I have been a research engineer on OpenAI’s Post‑Training team,
designing and scaling RLHF pipelines. My contributions to OpenAI's published work
include:
- ChatGPT – initial release effort (6th author)
- GPT‑4 – RL infra author
- GPT‑4V – multimodal RL
- GPT‑4 Turbo – post‑training
- GPT‑4o – post‑training infra lead
- o‑series models – post‑training infra
- Operator – early RL effort
- GPT‑4.5 – post‑training
- RFT (reinforcement fine-tuning) – core RL infra author
In the summer of 2021, I was a full-time research engineer at Sea AI Lab in Singapore, where I worked with Min Lin and
Professor Shuicheng Yan. We created EnvPool, an ultrafast vectorized
RL environment executor that can train Atari Pong and MuJoCo Ant in 5 minutes on
a laptop. It has been released on GitHub.
In the summer of 2019, I was a visiting student researcher at the Montreal Institute for Learning Algorithms (MILA),
where I worked with Min Lin and
Professor Yoshua Bengio on the Consciousness Prior based on
the Transformer architecture. [Photo]
Prior to that, I worked on reinforcement learning algorithms on the VizDoom platform with Professors Hang Su and Jun Zhu at TSAIL. We proposed an environment-aware
hierarchical reinforcement learning architecture and achieved first place in the VizDoom
AI Competition 2018 Single Player Track (1). I am also the main contributor to Tianshou, an elegant, flexible, and
superfast PyTorch deep reinforcement learning library.
Publications
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, Shuicheng Yan
Advances in Neural Information Processing Systems, Volume 35, 22409–22421, 2022.
Tianshou: A Highly Modularized Deep Reinforcement Learning Library
Jiayi Weng, Huayu Chen, Dong Yan, Kaichao You, Alexis Duburcq, Minghao Zhang, Yi Su, Hang Su, Jun Zhu
Journal of Machine Learning Research 23 (267), 1–6.
Deep Reinforcement Learning with Credit Assignment for Combinatorial Optimization
Dong Yan, Jiayi Weng, Shiyu Huang, Chongxuan Li, Yichi Zhou, Hang Su, Jun Zhu
Pattern Recognition, Volume 124, 2022, 108466, ISSN 0031-3203.
Playing FPS Game with Environment-aware Hierarchical Reinforcement Learning
Shihong Song*, Jiayi Weng*, Hang Su, Dong Yan, Haosheng Zou, Jun Zhu
The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019). Oral Presentation.
URBER: Ultrafast Rule-Based Escape Routing Method for Large-Scale Sample Delivery Biochips
Jiayi Weng, Tsung-Yi Ho, Weiqing Ji, Peng Liu, Mengdi Bao, Hailong Yao
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018.
Miscellaneous
As an amateur, I'm also into photography [1] [2], computer
graphics, and web security.