
Patent classification: G05B2219/39376

SYSTEMS AND METHODS FOR SKILL LEARNING WITH MULTIPLE CRITICS

20250370432 · 2025-12-04

Naver Corporation

David Emukpere
Bingbing Wu
Julien Perez

Systems and methods are disclosed for determining a policy that recommends transitions in a position-representing space for a robotic device, using a multi-critic architecture. To learn the policy, a set of critics is defined over the position-representing space, with each critic corresponding to a different objective function, such as a reach reward, a discovery reward, or a safety reward. For each critic in the set, a learned value function over the position-representing space is determined. The policy is then learned from the weighted feedback of these learned value functions so that it recommends transitions that are safe in the position-representing space. By keeping the reward functions in separate critics, the multi-critic architecture minimizes interference between them and learns a safe and stable policy for the robotic device.
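To make the multi-critic idea concrete, here is a minimal tabular sketch, not the claimed method. Every name and value in it is a hypothetical assumption: a 4x4 grid stands in for the position-representing space, `HAZARDS` are unsafe cells, the three reward functions are toy stand-ins (the "discovery" reward simply pays for actually moving), and `WEIGHTS` is an assumed linear weighting of the critics. Each critic keeps its own Q-table trained only on its own reward, and all critics bootstrap from the action that the shared policy would pick, namely the greedy action on the weighted sum of critic values. For simplicity, the TD updates are applied to every state-action pair of a known, deterministic grid model rather than to sampled robot experience, and tables replace neural networks.

```python
import numpy as np

# Hypothetical position-representing space: a 4x4 grid.
SIZE = 4
START, GOAL = (0, 0), (3, 3)
HAZARDS = {(1, 1), (2, 2)}                      # hypothetical unsafe cells
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]    # right, down, left, up (row, col)
CRITICS = ["reach", "discovery", "safety"]
WEIGHTS = np.array([1.0, 0.1, 1.0])             # assumed critic weights
GAMMA = 0.9


def step(pos, a):
    """Deterministic transition: move one cell, staying put at the border."""
    r = min(max(pos[0] + ACTIONS[a][0], 0), SIZE - 1)
    c = min(max(pos[1] + ACTIONS[a][1], 0), SIZE - 1)
    return (r, c)


def rewards(pos, nxt):
    """One toy reward per critic: reach, discovery, safety (all hypothetical)."""
    reach = 1.0 if nxt == GOAL else 0.0
    discovery = 0.1 if nxt != pos else 0.0      # toy proxy: reward moving somewhere
    safety = -1.0 if nxt in HAZARDS else 0.0
    return np.array([reach, discovery, safety])


def policy(q, pos):
    """Shared policy: greedy action on the weighted sum of critic values."""
    return int(np.argmax(WEIGHTS @ q[:, pos[0], pos[1], :]))


def train_critics(sweeps=200):
    """Train one Q-table per critic, each on its own reward only."""
    q = np.zeros((len(CRITICS), SIZE, SIZE, len(ACTIONS)))
    for _ in range(sweeps):
        new_q = q.copy()
        for r in range(SIZE):
            for c in range(SIZE):
                if (r, c) == GOAL:
                    continue                    # terminal cell: values stay 0
                for a in range(len(ACTIONS)):
                    nxt = step((r, c), a)
                    if nxt == GOAL:
                        bootstrap = np.zeros(len(CRITICS))
                    else:
                        # Every critic bootstraps from the shared policy's action.
                        bootstrap = q[:, nxt[0], nxt[1], policy(q, nxt)]
                    new_q[:, r, c, a] = rewards((r, c), nxt) + GAMMA * bootstrap
        q = new_q
    return q


def rollout(q, max_steps=20):
    """Follow the shared greedy policy from START and record visited cells."""
    pos, path = START, [START]
    for _ in range(max_steps):
        if pos == GOAL:
            break
        pos = step(pos, policy(q, pos))
        path.append(pos)
    return path


if __name__ == "__main__":
    q = train_critics()
    print(rollout(q))
```

Because the weighted sum of the per-critic updates equals a Q-value-iteration update on the weighted reward, the combined values converge. The resulting policy takes a shortest path that avoids the hazard cells, while each critic's table remains separately inspectable, for example to read off the safety value of a transition on its own.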
