Effective and Efficient Reinforcement Learning Based on Structural Information Principles (SIDM)

Xianghua Zeng, School of Computer Science and Engineering, Beihang University
Hao Peng, School of Cyber Science and Technology, Beihang University
Dingli Su, School of Computer Science and Engineering, Beihang University
Angsheng Li, School of Computer Science and Engineering, Beihang University

TL;DR  We propose SIDM, a new unsupervised and adaptive decision-making framework for Reinforcement Learning. It handles high-complexity environments without manual intervention and improves both sample efficiency and policy effectiveness.

Paper | Code


Abstract  Although Reinforcement Learning (RL) algorithms acquire sequential behavioral patterns through interactions with the environment, their effectiveness in noisy and high-dimensional scenarios typically relies on specific structural priors. This paper proposes an unsupervised and adaptive Decision-Making framework called SIDM for RL, which uses action and state abstractions to address this issue. SIDM improves policy quality, stability, and sample efficiency by up to 32.70%, 88.26%, and 64.86%, respectively.


Framework

Approach  SIDM processes environmental observations and rewards, retains historical trajectories, and outputs the state, abstract action, role set, and skill set to specific downstream algorithms. First, we employ encoder-decoder architectures to embed environmental observations and actions, measure feature similarities, and eliminate trivial edges to construct state and action graphs. Next, we initialize an encoding tree for each graph, minimize its structural entropy to obtain a community partitioning of states and actions, and design an aggregation function that uses assigned entropy as weights to achieve hierarchical abstractions. Finally, we extract abstract elements to construct a transition graph, define and optimize the structural entropy of this directed graph, calculate common path entropy to quantify each transition's sampling probability, and introduce an adaptive skill-based learning mechanism.
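The first two steps above (graph construction and entropy-minimizing partitioning) can be illustrated with a minimal sketch. It is not the SIDM implementation: it assumes cosine similarity with a fixed `threshold` for dropping trivial edges, uses the standard two-level structural entropy of an undirected weighted graph, and replaces the encoding-tree optimization with a simple greedy merge. The function names `similarity_graph`, `structural_entropy`, and `greedy_partition` are hypothetical.

```python
import numpy as np

def similarity_graph(features, threshold=0.5):
    """Weighted adjacency from cosine similarity (assumed measure).
    Edges weaker than `threshold` are treated as trivial and removed."""
    features = np.asarray(features, dtype=float)
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    adj = unit @ unit.T
    np.fill_diagonal(adj, 0.0)      # no self-loops
    adj[adj < threshold] = 0.0      # eliminate trivial edges
    return adj

def structural_entropy(adj, communities):
    """Two-level structural entropy of an undirected graph under a partition."""
    degree = adj.sum(axis=1)
    total = degree.sum()            # graph volume (2m for unit weights)
    if total == 0:
        return 0.0
    entropy = 0.0
    for members in communities:
        idx = np.array(members)
        vol = degree[idx].sum()     # community volume
        if vol == 0:
            continue
        inside = adj[np.ix_(idx, idx)].sum()   # internal edges, counted twice
        cut = max(vol - inside, 0.0)           # weight leaving the community
        d = degree[idx]
        d = d[d > 0]
        entropy -= np.sum(d / total * np.log2(d / vol))   # node-level term
        entropy -= cut / total * np.log2(vol / total)     # community-level term
    return entropy

def greedy_partition(adj):
    """Merge the pair of communities that lowers entropy most, until none does.
    A stand-in for encoding-tree optimization, not the paper's procedure."""
    communities = [[i] for i in range(len(adj))]
    current = structural_entropy(adj, communities)
    while len(communities) > 1:
        best_h, best_candidate = current, None
        for a in range(len(communities)):
            for b in range(a + 1, len(communities)):
                rest = [c for k, c in enumerate(communities) if k not in (a, b)]
                candidate = rest + [communities[a] + communities[b]]
                h = structural_entropy(adj, candidate)
                if h < best_h - 1e-9:
                    best_h, best_candidate = h, candidate
        if best_candidate is None:
            break
        communities, current = best_candidate, best_h
    return communities, current

if __name__ == "__main__":
    # Toy embeddings: two groups pointing in roughly orthogonal directions.
    feats = [[1, 0.1], [1, 0], [1, -0.1], [0.1, 1], [0, 1], [-0.1, 1]]
    parts, h = greedy_partition(similarity_graph(feats))
    print(parts, round(h, 3))
```

On the toy embeddings, the two groups end up as separate communities, since merging nodes that share no edge never lowers the entropy. The pairwise search re-evaluates the full entropy for every candidate merge, so it is only suitable for small graphs.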


Videos

Video grid: methods SIDM, SAC, SE, HIRO, and HSD (columns) on the tasks Hurdles, Limbo, Pole Balance, Hurdles Limbo, and Stairs (rows).