Jiamin He

I recently received my M.Sc. (Thesis) degree in Computer Science from the University of Alberta, where I was supervised by Rupam Mahmood. My current research interests lie in artificial intelligence, especially reinforcement learning.

I received my Bachelor's degree in Information and Computing Science from Sun Yat-sen University under the supervision of Hankz Hankui Zhuo. I was previously a research intern in the Machine Intelligence Group at Tsinghua University, led by Chongjie Zhang.

email  /  GitHub  /  blog



Loosely Consistent Emphatic Temporal-Difference Learning.
Jiamin He, Fengdi Che, Wan Yi, A. Rupam Mahmood.
Conference on Uncertainty in Artificial Intelligence (UAI), 2023.
paper | code

The Emphatic Approach to Average-Reward Policy Evaluation.
Jiamin He, Wan Yi, A. Rupam Mahmood.
Deep Reinforcement Learning Workshop at NeurIPS, 2022.

Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration.
Lulu Zheng*, Jiarui Chen*, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang.
Conference on Neural Information Processing Systems (NeurIPS), 2021.
paper | code

Other Projects

Hindsight Multi-agent Credit Assignment
Vanilla policy gradient methods for MARL suffer from both high variance and a lack of counterfactual reasoning, making the learning process data-inefficient and unstable. To address these issues, we proposed Hindsight Multi-Agent Credit Assignment (HMACA), a unified credit assignment framework that uses both hindsight likelihood and counterfactual reasoning to assign credit over timesteps and agents simultaneously.

Emergent Tool Use in Multi-agent Reinforcement Learning
We used PPO and adversarial training to train two teams of agents in a mixed cooperative-competitive asymmetric game environment that we developed. During training, the agents learned to use ladders to cross walls or block adversaries, and, surprisingly, they also learned to use ladders to speed up even when there were no walls or adversaries nearby. To make learning more efficient, we also adopted Hindsight Credit Assignment to address temporal credit assignment.

Sparkle Planning Challenge 2019 Entry: SYSU-Planner
We combined two planners into a two-phase planner that performs a fast best-first width search followed by a refinement hill-climbing search, and obtained better overall performance. We achieved state-of-the-art classical planning performance in one third of the tested domains and outperformed all previous methods in a few domains. Our planner placed third on the leaderboard and fourth in the final results.
description | leaderboard


  • 2022.05 ~ Present: Off-Policy Policy Evaluation (University of Alberta)
    See my publications Loosely Consistent Emphatic Temporal-Difference Learning and The Emphatic Approach to Average-Reward Policy Evaluation above.
  • 2021.01 ~ 2021.08: Multi-agent Reinforcement Learning Research (Tsinghua University)
    I focused on multi-agent reinforcement learning during my stay at Tsinghua University. More specifically, I worked on multi-agent exploration and multi-agent credit assignment (see my publication Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration and my project Hindsight Multi-agent Credit Assignment above).
  • 2020.09 ~ 2020.12: Multi-agent Reinforcement Learning Research (Parametrix.ai)
    I worked on game AI for a mixed cooperative-competitive asymmetric game environment, training agents to cooperate and defeat the other team. Interesting tool-use behaviors emerged in our agents during training. I also tried using Hindsight Credit Assignment to address temporal credit assignment, and later developed the idea into Hindsight Multi-agent Credit Assignment (see Other Projects).
  • 2019.10 ~ 2020.07: Image Matting and Video Matting (SenseTime)
    During my time at SenseTime, I was supervised by Kai Chen and worked on image matting and video matting. I developed a pipeline (segmentation, morphological transformation, and matting) for industrial green-screen video matting. The technique was integrated into SenseNeo, SenseTime's AI-generated content advertising platform. I also developed MMEditing, an open-source image and video editing toolbox, with Xintao Wang and Rui Xu. It now has over 3.8k stars on GitHub. Check it out!

Stolen from Jon Barron