Methods and Device for Autonomous Missile Control

20260009622 · 2026-01-08

    Inventors

    CPC classification

    International classification

    Abstract

    The present disclosure provides methods for controlling a guided missile to account for environmental uncertainties, maintain optimal mission performance, and minimize error in striking a defined target anywhere on Earth. First, sensors collect data about the missile's environment and pass the information to the missile's on-board database and processor. Second, the missile's processor applies a deep reinforcement learning algorithm to the database, producing instructions. Third, the instructions command the missile's control system for optimal control, target engagement, and impact by manipulating the missile's thrust vectors for guidance. In short, the disclosure provides methods for autonomous missile control which command the missile from launch to target regardless of weather conditions, environment dynamics, or defensive missile interference.

    Claims

    1. A method for autonomous missile control, the method comprising: loading a missile with a radiation-resistant heat shield using niobium alloy onto a satellite with a missile docking and launch mechanism, undocking the missile upon a launch command, igniting a missile propulsion and control system by the launch command, guiding the missile using an embedded and trained deep reinforcement learning algorithm, optimizing missile control and guidance by the deep reinforcement learning algorithm, and otherwise following an optimized missile trajectory, collecting visual data and calculating decisions using a defined policy generated by a proximal policy optimization algorithm, controlling action selection associated with thrust commands by the policy, controlling thrust vector valves by the thrust commands, minimizing time and distance to a defined target by the thrust commands, and striking the defined target.

    2. The method of claim 1, wherein a defined target is dynamic and moving.

    3. The method of claim 1, wherein a deep reinforcement learning algorithm is a proximal policy optimization algorithm.

    4. The method of claim 1, wherein a deep reinforcement learning algorithm is a deep Q-network algorithm.

    5. The method of claim 1, wherein a deep reinforcement learning algorithm is a deep deterministic policy gradient algorithm.

    6. The method of claim 1, wherein a satellite is in geostationary orbit.

    7. The method of claim 1, wherein a satellite is in low-Earth orbit.

    8. A device for autonomous missile control, the device comprising: a missile sensing data using a mounted data sensor, collecting, storing, and processing data in an on-board radiation-hardened field programmable gate array, the radiation-hardened field programmable gate array further comprising an embedded deep reinforcement learning software program processing sensor data, calculating control commands in real-time, creating a point-cloud environment modeling the real world, and generating commands for thrust vector controls, the thrust vector controls commanding missile thrust outputs, optimizing the missile during a powered flight path by minimizing distance from a defined target in real time.

    9. The device of claim 8, wherein a defined target is moving.

    10. The device of claim 8, wherein a deep reinforcement learning algorithm manipulates a missile control system to manipulate thruster output via a direct hardwired network connecting the data sensor to the thrust controls.

    11. The device of claim 8, wherein a deep reinforcement learning software program is a proximal policy optimization algorithm.

    12. The device of claim 8, wherein a deep reinforcement learning algorithm is a deep Q-network algorithm.

    13. The device of claim 8, wherein a deep reinforcement learning algorithm is a deep deterministic policy gradient algorithm.

    14. The device of claim 8, wherein a defined target is a moving enemy missile.

    15. A method for autonomous missile control, the method comprising: engaging in a trajectory toward a target by a missile, using data sensors, receiving data about the trajectory, processing the data in a radiation-hardened field programmable gate array, generating a visual mechanism for action value calculation, a reinforcement learning algorithm further receiving the action value calculation in real-time, generating instructions for commanding thrust vector controls by the reinforcement learning algorithm, manipulating the missile body in attitude, roll, pitch, and yaw by the thrust vector controls, optimizing guidance and enabling collision avoidance using artificial intelligence technology, the artificial intelligence technology further comprising a neural network and a reinforcement learning computer program, combining the neural network and reinforcement learning algorithm using a deep Q-network, controlling the missile during powered flight by the deep Q-network, minimizing distance and time from the missile target by thrust vector controls optimized by the reinforcement learning algorithm, and colliding with the missile target directly.

    16. The method of claim 15 wherein a reinforcement learning algorithm utilizes two convolutional neural networks for computer vision.

    17. The method of claim 15 wherein a reinforcement learning algorithm utilizes a deep neural network for action selection corresponding to thruster control commands optimizing thruster output for target engagement and impact.

    18. The method of claim 15 wherein a reinforcement learning algorithm utilizes an artificial neural network for thruster output control by manipulating thrust valves corresponding to controlled propellant release.

    19. The method of claim 15 wherein data sensors are inertial navigation and tracking systems.

    20. The method of claim 15 wherein the data sensors include LiDAR, camera, and video sensors, the data being aggregated and processed on board the missile in a field programmable gate array, processed with one convolutional neural network generating an environment model, and passed to a trained reinforcement learning agent taking actions corresponding to optimal control commands.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0052] FIG. 1 is an information flow diagram for missile thrust vector manipulation.

    [0053] FIG. 2 is an information flow diagram for missile guidance optimization.

    [0054] FIG. 3 is a diagram of a missile.

    [0055] FIG. 4 is a diagram of simplified internal missile components.

    [0056] FIG. 5 is a diagram of missile trajectory.

    [0057] FIG. 6 is a diagram of a satellite missile system.

    DETAILED DESCRIPTION OF THE INVENTION

    [0058] FIG. 1 is an information flow diagram for missile thrust vector manipulation. In certain embodiments, LiDAR sensors 100 receive information about the missile's environment. A hardwired network 101 receives and relays the LiDAR sensor data. A radiation-hardened field programmable gate array 102 stores and processes the LiDAR data, performing computer vision functions. An embedded deep reinforcement learning algorithm 103 then processes the computer vision functions and identifies actions to take. The actions correspond to an optimized missile control system for guidance 104 by selecting the best action to take at each moment in the missile's trajectory. In turn, these optimized actions manipulate thrust vector controls for the missile guidance until target impact 105.

    [0059] FIG. 2 is an information flow diagram for missile guidance optimization. In certain embodiments, various data sensors 200 receive data about the missile's environment. The data is conveyed to a radiation-hardened field programmable gate array 201, operating as the missile's on-board database and processor. An embedded reinforcement learning control algorithm 202 then manipulates the received data to develop optimal actions for computer decision making using artificial intelligence. This, in turn, generates instructions for optimized guidance and thrust vector control 203. As a result, the missile's trajectory is controlled to minimize time and distance to the defined target 204. Ultimately, the missile collides with the target for impact 205.

    [0060] FIG. 3 is a diagram of a missile. In certain embodiments, the present disclosure includes a missile body with a radiation-resistant heat shield using niobium alloy 300. The missile's thrust output controls guidance 301, allowing the missile to operate in space environments, above the Karman Line, and in orbit.

    [0061] FIG. 4 is a diagram of simplified internal missile components. In certain embodiments, the present disclosure includes a missile with a LIDAR sensor 400, performing computer vision and data acquisition functions. The data may be conveyed to a radiation-hardened field programmable gate array 401, which serves as an on-board database and processor for the missile. A hardwired network 402 may transfer visual data and decision instructions to intelligent thrust vector controls 403. The intelligent thrust vector controls may then control the missile for optimal guidance.

    [0062] FIG. 5 is a diagram of missile trajectory. In certain embodiments, the present disclosure includes a missile launch pad 500, where a missile starts at the launch pad 501. The missile enters a trajectory past the Karman line 502, continues in a trajectory in low-Earth orbit 503, then ignites a guided trajectory in low-Earth orbit, and begins re-entry from low-Earth orbit 504. The missile continues in a guided trajectory via powered flight above the Karman line 505 and in a guided trajectory 506 after re-entry. The missile then strikes the target 507. In such embodiments, the missile may pass the Karman line 508 and enter low-Earth orbit 509 for the purpose of long-range capability.

    [0063] FIG. 6 is a diagram of a satellite missile system. In certain embodiments, the present disclosure includes a left satellite side panel 600 and a right satellite side panel 601. A satellite with a missile docking and launch mechanism 602 holds the missile in low-Earth orbit, medium Earth, or geostationary orbit. In such embodiments, the satellite includes a loaded and locked guided missile using deep reinforcement learning control 603. The satellite may also include a Space LiDAR sensor 604, which may be attached to a smart satellite body 605.

    [0064] In certain embodiments, the disclosure is a method for autonomous missile control that involves a missile equipped with a radiation-resistant heat shield using niobium alloy. The missile is loaded onto a satellite featuring a missile docking and launch mechanism. Upon receiving a launch command, the missile undocks from the satellite, and the launch command ignites the missile's propulsion and control system. The missile is then guided using an embedded and trained deep reinforcement learning algorithm, which optimizes missile control and guidance. Throughout the flight, the missile follows an optimized trajectory, guided by a deep reinforcement learning software program. This program collects visual data and calculates decisions based on a defined policy. The policy controls thrust commands, manipulating the thrust vector valves to minimize time and distance to the defined target, ultimately striking the target accurately.

    [0065] In certain embodiments, the disclosure is a device for autonomous missile control. The device comprises a missile equipped with a data sensor mounted for sensing data. This data is collected, stored, and processed onboard using a radiation-hardened field programmable gate array. An embedded deep reinforcement learning software program processes the sensor data in real-time, calculating control commands. Additionally, the program creates a point-cloud environment modeling the real world and generates commands for thrust vector controls. These thrust vector controls command the missile's thrust outputs, optimizing the missile's powered flight path by minimizing the distance from a defined target in real-time.

    [0066] In certain embodiments, the disclosure is a method for autonomous missile control. The method involves a missile engaging in a trajectory toward a target, utilizing data sensors to receive data about the trajectory. This data is processed onboard using a radiation-hardened field programmable gate array, which generates a visual mechanism for action value calculation. A reinforcement learning algorithm then receives the action value calculation in real-time and generates instructions for commanding thrust vector controls, manipulating the missile body, optimizing guidance, and enabling collision avoidance. Artificial intelligence technology, comprising a neural network and a reinforcement learning computer program, is utilized for this purpose. These components are combined using a deep Q-network. Throughout powered flight, the missile is controlled to minimize distance and time from the target, ultimately colliding directly with the target.

    [0073] In certain embodiments, LiDAR sensors 100 receive information about the missile's environment. A hardwired network 101 receives and relays the LiDAR sensor data. A radiation-hardened field programmable gate array 102 then stores and processes the LiDAR data, performing computer vision functions. An embedded deep reinforcement learning algorithm 103 then processes the computer vision functions and identifies actions to take. In turn, this generates instructions for optimized guidance and thrust vector control 203. As a result, the missile's trajectory is controlled to follow an optimized trajectory minimizing time and distance to the defined target 204. Ultimately, the missile collides with the target for impact 205.

    [0074] In certain embodiments, various data sensors 200 receive data about the missile's environment. The data is conveyed to a radiation-hardened field programmable gate array 201, operating as the missile's on-board database and processor. An embedded reinforcement learning control algorithm 202 then manipulates the received data to develop optimal actions for computer decision making using artificial intelligence. In such embodiments, the missile body includes a radiation-resistant heat shield using niobium alloy 300, and the missile's thrust output controls guidance 301. As a result, the missile's trajectory is controlled to follow an optimized trajectory minimizing time and distance to the defined target 204. Ultimately, the missile collides with the target for impact 205.

    [0075] In certain embodiments, the present disclosure includes a missile body with a radiation-resistant heat shield using niobium alloy 300. The missile is locked and loaded on a satellite with a missile docking and launch mechanism 602. The missile includes an embedded and trained deep reinforcement learning algorithm 103 for optimizing missile control and guidance 104. In such embodiments, the missile's thrust output controls guidance 301. The missile follows an optimized trajectory minimizing time and distance to the defined target 204, until the missile strikes the defined target 507.

    [0076] In certain embodiments, the present disclosure includes a missile with a LiDAR sensor 400, performing computer vision and data acquisition functions. In such embodiments, the missile body may include a radiation-resistant heat shield using niobium alloy 300. In such embodiments, a radiation-hardened field programmable gate array 102 includes an embedded reinforcement learning control algorithm 202, producing instructions for optimized guidance and thrust vector control 203. The thrust vector control instructions power the guided missile system until it hits the defined target 507.

    [0077] In certain embodiments, the present disclosure includes a missile launch pad 500, where a missile starts at the launch pad 501. LiDAR data sensors 100 receive data about the missile's environment. The data is conveyed to a radiation-hardened field programmable gate array 401, operating as the missile's on-board database and processor. An embedded reinforcement learning control algorithm 202 manipulates the received data to develop optimal actions for computer decision making using artificial intelligence. In such embodiments, the missile body includes a radiation-resistant heat shield using niobium alloy 300, and the missile's thrust output controls guidance 301. The missile then strikes the target 507.

    [0078] In certain embodiments, the present disclosure includes a left satellite side panel 600 and a right satellite side panel 601. A satellite with a missile docking and launch mechanism 602 holds the missile in Earth orbit. In such embodiments, the satellite includes a loaded and locked guided missile using deep reinforcement learning control 603. The missile ignites a guided trajectory in low-Earth orbit and begins re-entry from orbit 504. The missile continues in a guided trajectory via powered flight above the Karman line 505 and in a guided trajectory 506 after re-entry. The missile then strikes the target 507.

    [0079] In embodiments, the present disclosure is a three-step process for intelligent missile control. First, sensors collect data about the missile's environment, passing the information to storage in the missile's database. Second, the missile's processor manipulates the database with a deep reinforcement learning algorithm producing instructions. Third, the instructions command the missile's control system for optimal control, target engagement, and impact.
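
    For illustration only, the three-step sense/decide/actuate process of paragraph [0079] can be sketched as a control loop. The `sensor_readings`, `policy`, and `database` objects below are hypothetical stand-ins, not part of the claimed subject matter.

```python
# Hypothetical sketch of the three-step pipeline in [0079]: store sensor
# data, run the learned policy, emit a control instruction. All names
# here are illustrative assumptions.

def control_step(sensor_readings, policy, database):
    # Step 1: store incoming sensor data in the on-board database.
    database.append(sensor_readings)
    # Step 2: the processor runs the learned policy over the latest
    # stored observation to produce a control instruction.
    instruction = policy(database[-1])
    # Step 3: the instruction is returned for the control system.
    return instruction

# Minimal stand-ins to exercise the loop once.
database = []
policy = lambda obs: {"thrust_vector": [0.0, 0.0, 1.0]} if obs else None
cmd = control_step({"lidar": [1.2, 3.4]}, policy, database)
```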

    [0080] In embodiments, the present disclosure includes an autonomous missile system for trajectory optimization. The methods unify the two elements of artificial intelligence robotics control: perception and decision making. For perception, the methods use deep learning; for decision making, the methods use reinforcement learning. These two methodologies are integrated into a single deep reinforcement learning algorithm. The unified methods control missile trajectory for various purposes, including point-to-point travel. The method's goal is ensuring the missile connects with the target 507 by optimizing trajectory control mechanics 104.

    [0081] In embodiments, the present disclosure provides methods for deep learning enabling intelligent missile perception. The present disclosure utilizes data from various robotics control sensors, including GPS, LiDAR, cameras, and video 200. The data is collected in various databases and information silos, representing the missile's position in continuous state space. The deep learning system's model is the part of the system that analyzes this information.

    [0082] In embodiments, the present disclosure utilizes one or more deep neural networks to identify, select, and engage with enemy targets. In the present disclosure, the reward is associated with metrics minimizing time and distance from an engaged target. The reward is a method of teaching the agent what it should do and is meant to formalize the idea of a goal. In the present disclosure the reward may be defined according to optimal trajectory metrics, including location, attitude, or velocity.
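
    For illustration only, a reward of the kind described in paragraph [0082] — penalizing both distance from the engaged target and elapsed time — might be sketched as follows. The weights `w_dist` and `w_time` are assumptions of this sketch; the disclosure does not specify them.

```python
import math

# Illustrative reward: negative cost proportional to distance from the
# target and to elapsed time, per [0082]. Weights are assumed.
def reward(missile_pos, target_pos, dt, w_dist=1.0, w_time=0.1):
    dist = math.dist(missile_pos, target_pos)  # Euclidean distance
    return -(w_dist * dist + w_time * dt)

# A 3-4-5 triangle gives distance 5.0, so the reward is -(5.0 + 0.1).
r = reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), dt=1.0)
```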

    [0083] In embodiments, the present disclosure identifies and selects a policy which maximizes expected reward for an agent controlling a guided missile using reinforcement learning. In such embodiments, the disclosure includes methods for selecting the optimal policy using machine learning techniques. Such machine learning techniques may utilize artificial neural networks or deep neural networks to accurately predict the optimal policy from a collection of policies.

    [0084] In embodiments, in the present disclosure the environment is made up of two types of space: state spaces and action spaces. The state space is made up of a virtual data model of the target. The state space updates in real time in a database and processing system on board the missile. The missile state space is fully observable because the missile's trajectory is governed by the laws of physics. Within each state space, the action space contains the decisions available to the agent in that state. The action space is continuous because the actions available for attitude control manipulation are real-valued vectors.
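
    For illustration only, a continuous action space of the kind described in paragraph [0084] can be sketched as a real-valued thrust-vector command clipped to actuator limits. The normalized bounds below are assumptions of this sketch.

```python
# Sketch of a continuous action space for attitude control: each action
# is a real-valued thrust-vector command clipped to actuator limits.
# The normalized bounds are illustrative assumptions.
ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # normalized valve deflection

def clip_action(action):
    # Clamp each component of the command into the valid range.
    return [max(ACTION_LOW, min(ACTION_HIGH, a)) for a in action]

a = clip_action([0.5, -2.0, 1.5])  # -> [0.5, -1.0, 1.0]
```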

    [0085] In embodiments, initially, the agent is presented with a state of the environment. The agent then takes an action in the present state, advancing to the next state of the environment, where a reward associated with the chosen action is returned. The agent acts according to a policy. Generally, an optimal policy is developed to maximize value. In the present disclosure, the optimal policy corresponds with taking the action maximizing value to the agent. The model continues to the next state, where the agent receives a reward and a set of actions from which to choose; the agent selects an action, and the environment returns a reward and the next state. This process continues until the environment's final state, which for the present disclosure is target impact.
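
    For illustration only, the agent-environment interaction loop of paragraph [0085] can be sketched with a conventional reset/step interface. The `env` and `agent` objects below, and the toy environment used to exercise the loop, are assumptions of this sketch.

```python
# Generic agent-environment interaction loop per [0085]. The reset/step
# and act interfaces follow a common RL convention and are assumptions.
def run_episode(env, agent, max_steps=1000):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # act according to policy
        state, reward, done = env.step(action)  # next state, reward, terminal flag
        total_reward += reward
        if done:                                # final state, e.g. target impact
            break
    return total_reward

# Toy stand-ins: episode ends after 3 steps, reward 1 per step.
class CountdownEnv:
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3

class FixedAgent:
    def act(self, state):
        return 0

total = run_episode(CountdownEnv(), FixedAgent())
```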

    [0086] In certain embodiments, the present disclosure includes a missile with a LiDAR sensor 100, performing computer vision and data acquisition functions. In such embodiments, the missile body may include a radiation-resistant heat shield using niobium alloy 300. In such embodiments, a radiation-hardened field programmable gate array 401 includes a pre-trained deep reinforcement learning control algorithm 202, producing instructions for optimized guidance and thrust vector controls 403. The thrust vector control instructions power the guided missile system during trajectory 506 and until connection with the defined target 507.

    [0087] In certain embodiments, the present disclosure may be adapted to multiple domains of warfare. For example, a space satellite sensor 300 may be in communication with a land sensor 303 regarding offensive hypersonic missile threats. In other embodiments, the present disclosure may include additional air sensors and communications networks, as well as sea-based communication networks, such as submarine sensing. In such embodiments, a multi-domain approach may be taken to ensure defensive hypersonic missile systems 302 engage with and physically deter any threatening offensive hypersonic missile 301 by following optimized flight path trajectories according to deep reinforcement learning control software 103.

    [0088] In certain embodiments, a convolutional neural network may compute an approximation of value for each state-action pair.

    [00009] Q(s, a; θ) ≈ Q*(s, a).  (9)

    [0089] In Equation 9, θ denotes the parameters of the function approximator. This algorithm may be trained on simulation data for the purpose of real-world testing and deployment. The state and action variables may be defined according to data labels in a point-cloud, which may result from data sensor fusion.

    [0090] In certain embodiments, the behavioral algorithm may be determined by a trained agent. The neural network may iterate until the Q-function converges to the fixed point of the Bellman Equation.

    [00010] Q*(s, a) = E_{s′}[ r + γ max_{a′} Q*(s′, a′) | s, a ].  (10)

    [0091] In Equation 10, the expectation over next states s′ is taken of the reward r plus the discounted maximum future value; the discount factor γ allows present rewards to have higher value than future rewards. Equation 10 defines the optimal Q-function and allows the agent to consider the reward from its present state as greater relative to similar rewards in future states.
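
    For illustration only, a single tabular Q-learning backup toward the Bellman target of Equation 10 can be sketched as follows. The learning rate `alpha` is an assumption of this sketch; the disclosure does not specify training hyperparameters.

```python
# One tabular Q-learning backup toward the Bellman target of Equation 10:
# target = r + gamma * max_a' Q(s', a'). alpha and gamma values are
# illustrative assumptions.
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

# Two states, two actions, all values initialized to zero.
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}
# Backup: target = 1.0 + 0.9 * 0 = 1.0, so Q[(0, 0)] moves halfway to 1.0.
v = q_update(Q, s=0, a=0, r=1.0, s_next=1, actions=[0, 1])
```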

    [0092] In certain embodiments, an optimal policy may be pre-trained and developed in a simulation environment. The optimal policy is then defined according to Equation 11.

    [00011] π*(s) = argmax_a Q*(s, a).  (11)

    [0093] The optimized policy may be embedded in a radiation-hardened processor, which sits on board a defensive hypersonic missile. After launch, the agent maximizes its reward, defined by engagement with the offensive hypersonic missile, by making decisions according to the optimal policy.
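
    For illustration only, greedy policy extraction per Equation 11 — acting as π*(s) = argmax over actions of Q*(s, a) — can be sketched as follows. The table entries are illustrative.

```python
# Greedy policy extraction per Equation 11: pick the action with the
# highest learned Q-value in the current state.
def greedy_policy(Q, s, actions):
    return max(actions, key=lambda a: Q[(s, a)])

# Illustrative Q-table for a single state with two candidate actions.
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
best = greedy_policy(Q, "s0", ["left", "right"])
```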

    [0095] It is to be understood that while certain embodiments and examples of the disclosure are illustrated herein, the disclosure is not limited to the specific embodiments or forms described and set forth herein. It will be apparent to those skilled in the art that various changes and substitutions may be made without departing from the scope or spirit of the disclosure, and the invention is not considered to be limited to what is shown and described in the specification and the embodiments and examples that are set forth therein. Moreover, several details describing structures and processes that are well known to those skilled in the art and often associated with aerospace technologies and missiles or other aerospace weapons are not set forth in the following description to better focus on the various embodiments and novel features of the disclosure of the present invention. One skilled in the art would readily appreciate that such structures and processes are at least inherent in the invention and in the specific embodiments and examples set forth herein.

    [0096] One skilled in the art will readily appreciate that the present disclosure is well adapted to carry out the objectives and obtain the ends and advantages mentioned herein, as well as those that are inherent in the invention and in the specific embodiments and examples set forth herein. The embodiments, examples, methods, and compositions described or set forth herein are representative of certain preferred embodiments and are intended to be exemplary and not limitations on the scope of the invention. Those skilled in the art will understand that changes to the embodiments, examples, methods, and uses set forth herein may be made that will still be encompassed within the scope and spirit of the invention. Indeed, various embodiments and modifications of the described compositions and methods herein which are obvious to those skilled in the art are intended to be within the scope of the invention disclosed herein. Moreover, although the embodiments of the present invention are described in reference to use in connection with rockets or missiles, those of ordinary skill in the art will understand that the principles of the present inventions could be applied to other types of missiles or apparatus in a wide variety of environments, including environments in the atmosphere, in space, on the ground, and underwater.