Patent classifications
G06N5/043
Information provision device, information provision method, and program
To enable provision of appropriate information for a user query even in a case where there are multiple information provision modules that differ in answer generation processing. A query sending unit 212 sends a user query to each one of a plurality of information provision module units 220 that differ in answer generation processing and that each generate an answer candidate for the user query. An output control unit 214 performs control such that the answer candidate acquired from each of the plurality of information provision module units 220 is displayed on a display unit 300 on a per-agent basis, together with information on the agent associated with that information provision module unit 220.
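The dispatch-and-display flow described in this abstract can be sketched as follows; the module and agent names here are hypothetical illustrations, not taken from the patent:

```python
# Minimal sketch of the query-dispatch scheme: a query is sent to every
# information provision module, and each answer candidate is paired with
# its agent's information for per-agent display.

class InfoModule:
    """An information provision module with its own answer generation logic."""
    def __init__(self, agent_name, answer_fn):
        self.agent_name = agent_name
        self.answer_fn = answer_fn

    def generate_candidate(self, query):
        return self.answer_fn(query)

def dispatch_query(query, modules):
    """Send the user query to every module; return (agent, candidate) pairs."""
    return [(m.agent_name, m.generate_candidate(query)) for m in modules]

modules = [
    InfoModule("weather-agent", lambda q: f"Weather answer for: {q}"),
    InfoModule("faq-agent", lambda q: f"FAQ answer for: {q}"),
]
for agent, candidate in dispatch_query("Will it rain?", modules):
    print(f"[{agent}] {candidate}")
```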
Apparatus for Q-learning for continuous actions with cross-entropy guided policies and method thereof
An apparatus for performing continuous actions includes a memory storing instructions, and a processor configured to execute the instructions to obtain a first action of an agent, based on a current state of the agent, using a cross-entropy guided policy (CGP) neural network, and control to perform the obtained first action. The CGP neural network is trained using a cross-entropy method (CEM) policy neural network for obtaining a second action of the agent based on an input state of the agent, and the CEM policy neural network is trained using a CEM and trained separately from the training of the CGP neural network.
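The CEM action search that the abstract describes can be illustrated with a toy sketch: sample candidate actions, score them with Q(s, a), refit the sampling distribution to the elites, and repeat. The quadratic Q function below is a stand-in, not the patent's learned model:

```python
import numpy as np

def q_value(state, action):
    # Hypothetical Q function: highest for actions near a = -state
    return -np.sum((action + state) ** 2, axis=-1)

def cem_action(state, action_dim=2, pop=64, elites=8, iters=5):
    """Cross-entropy method (CEM) action search over a continuous space."""
    mean, std = np.zeros(action_dim), np.ones(action_dim)
    for _ in range(iters):
        samples = mean + std * np.random.randn(pop, action_dim)
        scores = q_value(state, samples)
        best = samples[np.argsort(scores)[-elites:]]  # keep the elites
        mean, std = best.mean(axis=0), best.std(axis=0) + 1e-6
    return mean  # the CEM policy's action; per the abstract, a CGP
                 # network would be trained separately to imitate this

state = np.array([0.5, -0.3])
action = cem_action(state)
```

The separation the abstract emphasizes is that the iterative CEM search is only used during training; the CGP network learns to output the CEM policy's action directly, avoiding the sampling loop at inference time.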
MANAGING INFORMATION FOR MODEL TRAINING USING DISTRIBUTED BLOCKCHAIN LEDGER
Embodiments are directed to generating and training a distributed machine learning model using data received from a plurality of third parties using a distributed ledger system, such as a blockchain. As each third party submits data suitable for model training, the data submissions are recorded onto the distributed ledger. By traversing the ledger, the learning platform identifies what data has been submitted and by which parties, and trains a model using the submitted data. Each party is also able to remove their data from the learning platform, which is likewise reflected in the distributed ledger. The distributed ledger thus maintains a record of which parties submitted data and which parties removed their data from the learning platform, allowing different third parties to contribute data for model training while retaining control over their submitted data through the ability to remove it from the learning platform.
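The ledger traversal described here can be sketched minimally: each submit or remove event is appended as a hash-linked block, and the set of data currently usable for training is recovered by replaying the chain. The event field names are illustrative, not from the patent:

```python
import hashlib, json

def make_block(prev_hash, event):
    """Create a block linking to the previous block's hash."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return {"prev": prev_hash, "event": event,
            "hash": hashlib.sha256(payload.encode()).hexdigest()}

def append(chain, event):
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    chain.append(make_block(prev_hash, event))

def active_datasets(chain):
    """Traverse the ledger to find which parties' data is still usable."""
    active = set()
    for block in chain:
        e = block["event"]
        if e["type"] == "submit":
            active.add((e["party"], e["dataset"]))
        elif e["type"] == "remove":
            active.discard((e["party"], e["dataset"]))
    return active

chain = []
append(chain, {"type": "submit", "party": "A", "dataset": "d1"})
append(chain, {"type": "submit", "party": "B", "dataset": "d2"})
append(chain, {"type": "remove", "party": "A", "dataset": "d1"})
```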
Determining control actions of decision modules
Techniques are described for implementing automated control systems that manipulate operations of specified target systems, such as by modifying or otherwise manipulating inputs or other control elements of the target system that affect its operation (e.g., affect output of the target system). An automated control system may in some situations have a distributed architecture with multiple decision modules that each control a portion of a target system and operate in a partially decoupled manner with respect to each other, such as by each decision module operating to synchronize its local solutions and proposed control actions with those of one or more other decision modules, in order to reach a consensus with those other decision modules. Such inter-module synchronizations may occur repeatedly to determine one or more control actions for each decision module at a particular time, and may be repeated over multiple times for ongoing control.
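The repeated inter-module synchronization described above can be illustrated with a simple averaging scheme: each decision module proposes a control action for its portion of the target system, then repeatedly blends its proposal with its neighbors' until the proposals agree. The averaging rule is a plain stand-in for the patent's consensus mechanism:

```python
# Each module repeatedly averages its proposed control action with those
# of its neighbor modules; after enough rounds the proposals converge to
# a shared consensus value.

def synchronize(proposals, neighbors, rounds=50):
    values = dict(proposals)
    for _ in range(rounds):
        values = {
            m: (values[m] + sum(values[n] for n in neighbors[m]))
               / (1 + len(neighbors[m]))
            for m in values
        }
    return values

proposals = {"module_a": 10.0, "module_b": 4.0, "module_c": 1.0}
neighbors = {"module_a": ["module_b"],
             "module_b": ["module_a", "module_c"],
             "module_c": ["module_b"]}
consensus = synchronize(proposals, neighbors)
```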
System and methods for creation of learning agents in simulated environments
A system and methods for generating and applying learning agents in simulated environments, in which an agent simulation is selected, one or more agent goals are received, and agents are created which are individual instances of the agent simulation with each agent having at least one of the agent goals, wherein the agents are used in the execution of an environment simulation which dynamically changes based on the collective behavior of the agents.
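The scheme in this abstract can be sketched briefly: agents are individual instances of a selected agent simulation, each carrying at least one received goal, and the environment changes based on their collective behavior. The "resource" environment and "consume" goal below are illustrative inventions, not from the patent:

```python
import random

class Agent:
    """An individual instance of the agent simulation, holding one goal."""
    def __init__(self, goal):
        self.goal = goal

    def act(self, env_state):
        # Each agent acts according to its own goal
        return 1 if self.goal == "consume" else -1

def create_agents(goals, count):
    """Instantiate agents, each assigned at least one of the given goals."""
    return [Agent(random.choice(goals)) for _ in range(count)]

def run_simulation(agents, steps=10, resource=100):
    for _ in range(steps):
        # The environment changes based on the agents' collective behavior
        resource -= sum(a.act(resource) for a in agents)
    return resource
```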
SYSTEMS AND METHODS FOR END-TO-END MULTI-AGENT REINFORCEMENT LEARNING ON A GRAPHICS PROCESSING UNIT
Embodiments provide a fast multi-agent reinforcement learning (RL) pipeline that runs the full RL workflow end-to-end on a single GPU, using a single store of data for simulation roll-outs, inference, and training. Specifically, the simulations, and the agents in each simulation, are run in tandem, taking advantage of the parallel capabilities of the GPU. As a result, costly GPU-CPU communication and copying is significantly reduced, and simulation sampling and learning rates are in turn improved. A large number of simulations may thus be run concurrently on the GPU, greatly improving the efficiency of RL training.
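The single-data-store idea can be sketched as follows: every simulation steps in one batched array, and the same buffer feeds both inference and training, so nothing is shuttled between a "simulation" host and a "learning" device. NumPy stands in for GPU tensors here, and the policy and step functions are toy placeholders:

```python
import numpy as np

n_sims, n_agents, obs_dim = 1024, 4, 8
# One batched state array holding every agent in every simulation
states = np.random.randn(n_sims, n_agents, obs_dim).astype(np.float32)

def policy(states):
    # Batched inference over all simulations and agents at once
    return np.tanh(states.mean(axis=-1))

def step(states, actions):
    # Batched environment step: all simulations advance in tandem
    return states + actions[..., None] * 0.1

rollout = []
for _ in range(16):
    actions = policy(states)
    states = step(states, actions)
    rollout.append((states, actions))  # same store later reused for training
```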
EXTENSIBLE DIGITAL ASSISTANT INTERFACE USING NATURAL LANGUAGE PROCESSING TO RESPOND TO USER INTENT
A platform, method, and system for a digital assistant are disclosed. The digital assistant uses natural language processing to respond to user queries. The digital assistant is extensible, because it can interface with a variety of devices, with a variety of natural language understanding providers, and with a variety of handlers. One method includes receiving a query from a user device, standardizing the query, and transmitting it to one of a plurality of natural language understanding providers. The digital assistant then receives intent data related to the intent of the query. The digital assistant then transmits the intent data to one of a plurality of handlers. The handler processes the intent data and returns content to the digital assistant. The digital assistant then adapts the content and sends it to the user device.
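The pipeline in this abstract (receive query, standardize, get intent data from an NLU provider, route to a handler, adapt the content) can be sketched schematically. The provider and handler implementations below are hypothetical placeholders:

```python
def standardize(query):
    """Normalize the raw query from the user device."""
    return query.strip().lower()

def mock_nlu_provider(query):
    # A real NLU provider would return much richer intent data
    intent = "weather" if "weather" in query else "unknown"
    return {"intent": intent, "query": query}

# One handler per intent; the digital assistant routes intent data here
HANDLERS = {
    "weather": lambda intent_data: "It looks sunny today.",
    "unknown": lambda intent_data: "Sorry, I didn't understand that.",
}

def handle_query(raw_query, nlu_provider=mock_nlu_provider):
    query = standardize(raw_query)
    intent_data = nlu_provider(query)          # intent from NLU provider
    content = HANDLERS[intent_data["intent"]](intent_data)
    return {"device_payload": content}         # adapted for the user device
```

The extensibility the abstract claims comes from the indirection at each stage: the NLU provider and the handler are both pluggable, so new devices, providers, or intents can be supported without changing the core pipeline.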