SYSTEM, METHOD, AND COMPUTER READABLE MEDIUM FOR AFFINE FORMATION MANEUVERING OF NONLINEAR MULTI-AGENT SYSTEMS WITH FAULT-TOLERANT SECURE OPTIMIZED BACKSTEPPING CONTROL USING REINFORCEMENT LEARNING
20260023395 · 2026-01-22
Assignee
Inventors
CPC classification
G05D1/86
PHYSICS
G05D1/695
PHYSICS
International classification
G05D1/695
PHYSICS
Abstract
A system, computer readable storage medium and method for controlling a trajectory of coordinated time-varying maneuvers of a geometric formation of unmanned vehicles is disclosed. The system includes unmanned vehicles, each configured with communication circuitry to communicate between the vehicles. A subset of the unmanned vehicles function as leader vehicles, with the remaining vehicles functioning as follower vehicles for leader-follower maneuvering. The system further includes an actuator suite configured to adjust the direction and orientation of each vehicle, a sensor suite for stabilization and navigation, and a flight controller for maintaining stable maneuvering, even in the presence of actuator faults and sensor deception attacks. Processing circuitry is configured with a reinforcement learning neural network that includes identifier, actor, and critic radial basis function neural networks to estimate movement, adjust control actions, and assess vehicle performance based on feedback signals, including corrupted signals from the sensor suite due to deception attacks.
Claims
1. A system for controlling a trajectory of coordinated time-varying maneuvers of a geometric formation of unmanned vehicles, the system comprising: a plurality of the unmanned vehicles, each having communication circuitry configured to communicate between each unmanned vehicle of the plurality of the unmanned vehicles, wherein a subset of the plurality of the unmanned vehicles are leader vehicles and remaining unmanned vehicles are follower vehicles, for leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles to target positions of the trajectory; an actuator suite to maintain and adjust direction and orientation of a respective unmanned vehicle, a sensor suite to stabilize and navigate the respective unmanned vehicle, a flight controller configured to send a control signal to the actuator suite and receive a feedback signal from the sensor suite, wherein the flight controller maintains stable maneuvering of the respective unmanned vehicle while the actuator suite is subject to an actuator fault and the sensor suite is subject to a deception attack; and processing circuitry configured to control the leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles, wherein the maneuvering of the plurality of unmanned vehicles is controlled with a reinforcement learning neural network that includes an identifier radial basis function neural network to estimate nonlinear movement of the plurality of unmanned vehicles, an actor radial basis function neural network to adjust direction and orientation of the respective unmanned vehicle by the respective actuator suite based on the estimated nonlinear movement, and a critic radial basis function neural network to assess the adjusted direction and orientation of each of the unmanned vehicles based on a feedback signal by the sensor suite, wherein the feedback signal includes corrupted signals from the sensor suite due to the deception attack.
2. The system of claim 1, wherein the processing circuitry is further configured to train the reinforcement learning neural network to learn a performance function that resets a preassigned convergence time whenever a target formation maneuver changes to maintain transient states of each leader-follower tracking error within a predefined range.
3. The system of claim 2, wherein the processing circuitry is further configured to control the leader vehicles to maneuver in a coordinated time-varying formation including one or more of shape, direction, rotation, scaling, and translation, wherein the follower vehicles track positions of the leader vehicles to achieve the target formation maneuver.
4. The system of claim 1, wherein the communication circuitry uses WiFi for communication with others of the plurality of unmanned vehicles.
5. The system of claim 2, wherein the processing circuitry is further configured for controlling the leader-follower maneuvering in an affine formation of the plurality of unmanned vehicles.
6. The system of claim 5, wherein the processing circuitry is further configured to train the reinforcement learning neural network to learn the performance function using performance-constrained backstepping control, wherein an initial control by the reinforcement learning neural network is used as an intermediate control input, and wherein optimal laws for the backstepping control are obtained from an approximate solution of a Hamilton-Jacobi-Bellman equation using the reinforcement learning.
7. The system of claim 1, wherein the plurality of the unmanned vehicles are unmanned aerial vehicles, each having a plurality of top-mounted rotors to move the unmanned aerial vehicle forward, backward, left, and right by adjusting speed of each rotor, wherein the plurality of top-mounted rotors are driven by the actuator suite.
8. The system of claim 1, wherein the plurality of unmanned vehicles are unmanned aerial vehicles, each having a single rotor and a plurality of movable fins, wherein the single rotor and the plurality of movable fins are driven by the actuator suite.
9. The system of claim 1, further comprising a ground-based controller configured with the processing circuitry, for centralized control of the leader-follower maneuvering of the geometric formation.
10. The system of claim 1, wherein the flight controller of each of the unmanned vehicles executes program instructions to obtain sensor suite data and adjust the unmanned vehicle positioning and rotor speeds based on the sensor suite data.
11. The system of claim 1, wherein the sensor suite in each of the plurality of unmanned vehicles includes a gyroscope, an accelerometer, and a magnetometer.
12. The system of claim 1, wherein the sensor suite in the leader vehicles includes one or more sensors for detection of obstacles.
13. A non-transitory computer-readable storage medium including computer executable instructions, wherein the instructions, when executed by a computer, cause the computer to perform a method for controlling a trajectory of coordinated time-varying maneuvers of a geometric formation of unmanned vehicles, the method comprising: controlling a leader-follower maneuvering of the geometric formation of a plurality of unmanned vehicles with a reinforcement learning neural network, wherein a subset of the unmanned vehicles are leader vehicles and remaining unmanned vehicles are follower vehicles, for the leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles to target positions of a trajectory, including estimating, by an identifier radial basis function neural network, nonlinear movement of the plurality of unmanned vehicles, adjusting, by an actor radial basis function neural network, direction and orientation of the respective unmanned vehicle by actions that control a respective actuator suite based on the estimated nonlinear movement, wherein the actuator suite is subject to an actuator fault, and assessing, by a critic radial basis function neural network, the adjusted direction and orientation of each of the unmanned vehicles based on a feedback signal from a sensor suite, wherein the feedback signal includes corrupted signals from the sensor suite due to a deception attack.
14. The computer-readable storage medium of claim 13, further comprising resetting, by a performance function, a preassigned convergence time whenever a target formation maneuver changes to maintain transient states of each leader-follower tracking error within a predefined range.
15. The computer-readable storage medium of claim 14, further comprising: controlling the leader vehicles to maneuver in a coordinated time-varying formation including one or more of shape, direction, rotation, scaling, and translation; and controlling the follower vehicles to track positions of the leader vehicles to achieve the target formation maneuver.
16. The computer-readable storage medium of claim 14, further comprising controlling the leader-follower maneuvering in an affine formation of the plurality of unmanned vehicles.
17. The computer-readable storage medium of claim 14, further comprising: executing the performance function using performance-constrained backstepping control, wherein an initial control by the reinforcement learning neural network is used as an intermediate control input, and wherein optimal laws for the backstepping control are obtained from an approximate solution of a Hamilton-Jacobi-Bellman equation using the reinforcement learning.
18. A method for controlling a trajectory of coordinated time-varying maneuvers of a geometric formation of unmanned vehicles, the method comprising: controlling a leader-follower maneuvering of the geometric formation of a plurality of unmanned vehicles with a reinforcement learning neural network, wherein a subset of the unmanned vehicles are leader vehicles and remaining unmanned vehicles are follower vehicles, for the leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles to target positions of a trajectory, including estimating, by an identifier radial basis function neural network, nonlinear movement of the plurality of unmanned vehicles, adjusting, by an actor radial basis function neural network, direction and orientation of the respective unmanned vehicle by actions that control a respective actuator suite based on the estimated nonlinear movement, wherein the actuator suite is subject to an actuator fault, and assessing, by a critic radial basis function neural network, the adjusted direction and orientation of each of the unmanned vehicles based on a feedback signal from a sensor suite, wherein the feedback signal includes corrupted signals from the sensor suite due to a deception attack.
19. The method of claim 18, further comprising resetting, by a performance function, a preassigned convergence time whenever a target formation maneuver changes to maintain transient states of each leader-follower tracking error within a predefined range.
20. The method of claim 19, further comprising: executing the performance function using performance-constrained backstepping control, wherein an initial control by the reinforcement learning neural network is used as an intermediate control input, and wherein optimal laws for the backstepping control are obtained from an approximate solution of a Hamilton-Jacobi-Bellman equation using the reinforcement learning.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] A more complete appreciation of this disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
DETAILED DESCRIPTION
[0040] In the drawings, like reference numerals designate identical or corresponding parts throughout the several views. Further, as used herein, the words "a," "an," and the like generally carry a meaning of one or more, unless stated otherwise.
[0041] Furthermore, the terms "approximately," "approximate," "about," and similar terms generally refer to ranges that include the identified value within a margin of 20%, 10%, or preferably 5%, and any values therebetween.
[0042] Aspects of the present disclosure are directed to a system and method for affine formation maneuver control of nonlinear multi-agent systems, focusing particularly on resilience against actuator faults and deception attacks while maintaining prescribed performance during leader-follower maneuvers. Known formation maneuver control methods for multi-agent systems are limited by their inability to effectively address security threats such as deception attacks and physical faults in actuators, which significantly affect the stability and performance of the overall system.
[0043] For purposes of this disclosure, a multi-agent system can include an unmanned vehicle, particularly an unmanned aerial vehicle (UAV), as well as fleet vehicles that move in a coordinated fashion. The unmanned vehicle is not limited to an aerial vehicle, but can be an unmanned vehicle that travels under water, or in outer space, in a coordinated fashion. Also, the unmanned vehicles can include a combination of different types of unmanned vehicles, limited only by their capability to communicate with each other and perform maneuvering operations using an embedded control mechanism. Hereinafter, multi-agent systems will be referred to as unmanned vehicles.
[0044] The present disclosure provides a system for controlling the trajectory of coordinated, time-varying maneuvers of a geometric formation of unmanned vehicles. The system comprises multiple unmanned vehicles, including at least one leader vehicle and follower vehicles, wherein each unmanned vehicle is equipped with communication circuitry, an actuator suite, a sensor suite, a flight controller, and processing circuitry. The leader vehicles define the trajectory, while the follower vehicles track the leader to maintain the desired geometric formation. The flight controller is configured to maintain stable maneuvering, even when the actuator suite is subjected to faults and the sensor suite experiences a deception attack.
[0045] The system includes a reinforcement learning framework configured to estimate the nonlinear movement of the unmanned vehicles, adjust the movement based on the estimated dynamics, and evaluate the adjusted movement based on feedback from the sensor suite, which may include corrupted signals due to deception attacks. By implementing the reinforcement learning framework, the system dynamically adapts to disturbances, ensuring stable formation maneuvers under adverse conditions.
[0047] Unmanned vehicles, including unmanned aerial vehicles (UAVs), drones, or autonomous vehicles, are robotic systems that operate without human intervention. These vehicles can perform a wide range of tasks, from surveillance and reconnaissance to package delivery and agricultural monitoring. In the present system, the unmanned vehicles 102 are designed to operate in a coordinated geometric formation, which may involve a variety of maneuvers such as translation, rotation, scaling, and complex trajectory following. Each unmanned vehicle 102 can be equipped with multiple rotors, such as top-mounted rotors for aerial vehicles, to move the unmanned aerial vehicle forward, backward, left, and right by adjusting the speed of each rotor. The top-mounted rotors are driven by the actuator suite; other propulsion systems may be used depending on the type of vehicle and application. Examples include quadcopters, fixed-wing UAVs, or ground-based autonomous rovers.
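As a non-limiting illustration of how rotor-speed adjustment produces directional movement, the following sketch shows a conventional "X"-configuration motor-mixing rule for a quadcopter. The function name, rotor ordering, and sign convention are illustrative assumptions, not part of the disclosed system:

```python
def mix_quad_x(thrust, roll, pitch, yaw):
    """Map a collective thrust command and roll/pitch/yaw commands to four
    rotor-speed commands for an X-configuration quadcopter.
    Rotor order: front-left, front-right, rear-left, rear-right.
    Signs follow one common convention and are illustrative only."""
    return [
        thrust + roll + pitch - yaw,  # front-left  (clockwise)
        thrust - roll + pitch + yaw,  # front-right (counter-clockwise)
        thrust + roll - pitch + yaw,  # rear-left   (counter-clockwise)
        thrust - roll - pitch - yaw,  # rear-right  (clockwise)
    ]
```

A pure roll command raises the speed of the two left rotors and lowers the two right rotors by the same amount, so total thrust (and thus altitude) is unchanged while the vehicle banks.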
[0048] The communication circuitry in each unmanned vehicle 102 is configured for maintaining coordinated movement. Communication circuitry refers to the hardware and software components that facilitate data exchange between vehicles, ensuring all vehicles are aware of each other's positions, speed, and trajectory plans. This data exchange is essential for leader-follower coordination, where leader vehicles define the trajectory, and follower vehicles maintain their relative positions. Different types of communication can be implemented, including wireless communication protocols such as Wi-Fi, Zigbee, LoRa, and 5G cellular networks. For example, Wi-Fi may be employed for short-range, high-speed communication, whereas LoRa may be used for long-range communication with lower data rates in environments with limited infrastructure.
[0049] The user devices 104 and 106, depicted in
[0050] The system 100A also includes a database 108, which is responsible for storing flight-related data, including mission parameters, sensor data, actuator performance metrics, and historical flight records. This data can be used for post-flight analysis, performance optimization, and improving the resilience of the system against faults or cyber-attacks. For example, data stored in database 108 may be used to analyze the effects of actuator faults on vehicle stability or to assess the effectiveness of deception attack countermeasures. The database 108 can be implemented using various types of storage, including cloud-based storage solutions, local servers, or distributed storage systems. Different types of memory that can be utilized for storing the data include non-volatile memory such as solid-state drives (SSD), hard disk drives (HDD), flash memory, and magnetic tape storage, as well as volatile memory like random access memory (RAM) for temporary data processing.
[0051] Each unmanned vehicle 102 is equipped with an actuator suite and a sensor suite. The actuator suite is used to maintain and adjust the direction and orientation of the respective unmanned vehicle. For aerial vehicles, this may involve adjusting the speed of rotors to change altitude or direction, while for ground vehicles, it could involve controlling wheel motors or steering mechanisms. The sensor suite includes various sensors such as gyroscopes, accelerometers, and magnetometers, which provide real-time feedback on the vehicle's position, orientation, and movement. This sensor data is critical for maintaining stability and ensuring that each vehicle accurately follows the intended trajectory, particularly during complex maneuvers or when subjected to external disturbances.
[0052] The system 100A further employs processing circuitry configured for controlling the leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles 102. In a preferred embodiment, the processing circuitry integrates a reinforcement learning neural network, comprising an identifier radial basis function neural network, an actor radial basis function neural network, and a critic radial basis function neural network. The identifier neural network estimates the nonlinear movement dynamics of the unmanned vehicles, while the actor neural network adjusts the direction and orientation of the vehicles based on these estimates. The critic neural network evaluates the performance of the adjustments using feedback from the sensor suite, which may include corrupted signals due to deception attacks. It should be understood that the reinforcement learning framework can be configured using other types of neural networks or machine learning algorithms, depending in part on limitations of the hardware implementation. In some embodiments, reinforcement learning can be performed using an optimization control function.
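For illustration only, the radial basis function approximators underlying the identifier, actor, and critic networks can be sketched as Gaussian features with a linear-in-the-weights output and a gradient-style weight update. The function names, feature width, and learning rate below are assumptions made for the sketch, not the disclosed design:

```python
import math

def rbf_features(x, centers, width):
    """Gaussian radial basis features: phi_i(x) = exp(-||x - c_i||^2 / width^2)."""
    return [math.exp(-sum((xj - cj) ** 2 for xj, cj in zip(x, c)) / width ** 2)
            for c in centers]

def rbf_output(weights, features):
    """Linear-in-the-weights approximation W^T phi(x), as used by each of the
    identifier, actor, and critic networks."""
    return sum(w * p for w, p in zip(weights, features))

def update_weights(weights, features, error, rate=0.1):
    """One gradient-style adaptation step driven by a scalar approximation
    error (illustrative learning rule)."""
    return [w - rate * error * p for w, p in zip(weights, features)]
```

Repeatedly applying `update_weights` with the error between the network output and a target value drives the output toward the target, which is the basic mechanism the identifier, actor, and critic weight-tuning laws share.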
[0053] The leader-follower coordination ensures that the unmanned vehicles 102 maintain a cohesive formation during maneuvers, even in the presence of faults or attacks. The processing circuitry's use of reinforcement learning allows the system to dynamically adapt to changing environmental conditions or unforeseen disturbances, such as actuator faults or sensor deception attacks, thereby maintaining the stability of the formation. For example, if a deception attack alters the sensor feedback for one of the follower vehicles, the critic neural network can identify inconsistencies in the sensor data, allowing the processing circuitry to adjust the control signals to maintain proper formation.
[0054] The communication between unmanned vehicles 102, user devices 104 and 106, and database 108, along with the actuator and sensor suites, provides a comprehensive control system for coordinated time-varying maneuvers of unmanned vehicles. The system 100A, through its combination of leader-follower maneuvering, communication circuitry, reinforcement learning, and adaptive processing, ensures effective control of the geometric formation of unmanned vehicles in dynamic environments.
[0056] The system 100B includes multiple unmanned vehicles 102. For purposes of this disclosure, the unmanned vehicles are unmanned aerial vehicles (UAV) that do not carry persons. As noted above, the unmanned vehicles can also include vehicles that travel under water or in outer space. Each of the unmanned vehicles is configured with a communication circuitry 110, a sensor suite 112, an actuator suite 114, a flight controller 116, and a processing circuitry 118. The communication circuitry 110 is configured to facilitate communication between each unmanned vehicle in the system. The communication circuitry 110 is essential for the leader-follower configuration, where leader vehicles coordinate the movements of follower vehicles to maintain the geometric formation and achieve target positions along the desired trajectory. Communication between the unmanned vehicles 102 enables the unmanned vehicles 102 to adjust their positions in real-time based on flight control signals and maintain formation cohesion. The communication circuitry 110 may use wireless communication protocols such as Wi-Fi for short-range, high-speed communication, or LoRa for long-range communication in low-data-rate environments. The ability of the communication circuitry 110 to switch between different protocols based on environmental conditions ensures uninterrupted communication even in challenging terrains or when infrastructure is limited. For example, in an urban environment, Wi-Fi might be used to enable high-speed data exchange, whereas in rural or remote areas, LoRa can provide long-range connectivity with lower power consumption. Other communication technologies that may be used include Zigbee for mesh networking or 5G/6G cellular networks for high bandwidth and low latency.
[0057] The sensor suite 112 is integrated into the unmanned vehicle 102-1 and is configured for stabilizing and navigating the unmanned vehicle 102. The sensor suite 112 includes one or more sensors, such as gyroscopes, accelerometers, and magnetometers that gather real-time data about the vehicle's orientation, speed, and position. Gyroscopes provide data on rotational movements, accelerometers measure linear acceleration, and magnetometers help determine the direction relative to Earth's magnetic field. This sensor data is utilized for maintaining stable flight, particularly in environments where the vehicle may be subjected to actuator faults or deception attacks on the sensor inputs. Additionally, the sensor suite 112 may include obstacle detection sensors, such as LiDAR or ultrasonic sensors, to detect and avoid obstacles in the flight path, enhancing the safety and reliability of the formation. For example, a LiDAR sensor can create a 3D map of the surrounding environment to help the vehicle navigate complex terrains, while ultrasonic sensors can detect obstacles at short ranges, making them suitable for low-speed maneuvers or landing operations. Visual sensors, such as cameras, can also be part of the sensor suite, providing image data that can be used for visual navigation, obstacle avoidance, or target recognition.
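As one example of how gyroscope and accelerometer data may be combined for stabilization, a standard complementary filter is sketched below. The blend factor `alpha`, units, and function name are illustrative assumptions; the disclosure is not limited to this fusion method:

```python
def complementary_filter(angle_prev, gyro_rate, accel_angle, dt, alpha=0.98):
    """Fuse a gyroscope rate (deg/s) with an accelerometer-derived tilt
    angle (deg): integrate the gyro for short-term accuracy, and pull the
    estimate toward the accelerometer reading to cancel long-term drift."""
    return alpha * (angle_prev + gyro_rate * dt) + (1 - alpha) * accel_angle
```

With the gyro at rest, the estimate converges geometrically to the accelerometer-derived angle; during fast maneuvers, the gyro term dominates over any single update step.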
[0058] The actuator suite 114 in the unmanned vehicle 102-1 is configured to maintain and adjust direction and orientation of a respective unmanned vehicle 102. The actuator suite 114 adjusts the movement mechanisms of the vehicle, such as its rotors or movable fins, to execute the maneuvers for maintaining formation and trajectory. For aerial vehicles, the actuator suite 114 may include multiple top-mounted rotors, which can adjust the vehicle's altitude and direction by varying the rotor speeds. For example, if the unmanned vehicle 102 is a quadcopter, the actuator suite controls each of the four rotors independently to achieve the desired maneuver. In another example, a fixed-wing UAV may have an actuator suite 114 that includes movable control surfaces such as ailerons, rudders, and elevators to control the roll, yaw, and pitch of the vehicle. The actuator suite 114 works in conjunction with the flight controller 116 to ensure the unmanned vehicle 102 performs as expected, even when actuator faults occur, such as a rotor failure or reduced thrust due to mechanical issues. In the case of ground vehicles, the actuator suite 114 may include motors for wheel control and steering mechanisms to navigate across different terrains.
[0059] The flight controller 116 is configured for sending control signals to the actuator suite 114 and receiving feedback signals from the sensor suite 112. By configuring the flight controller 116, the unmanned vehicle 102 maintains stable maneuvering under challenging conditions, such as when the sensor suite 112 is subjected to deception attacks, or when the actuator suite 114 experiences malfunctions. The flight controller 116 processes real-time data from both the sensor suite 112 and the actuator suite 114 to generate appropriate control signals that ensure stable flight. For instance, if the sensor suite 112 detects a sudden deviation from the intended trajectory, the flight controller 116 immediately adjusts the actuator suite 114 to compensate for the deviation and bring the vehicle back on course. This closed-loop control mechanism is crucial for maintaining stability in dynamic and unpredictable environments. For example, if the vehicle encounters a sudden gust of wind that pushes it off course, the flight controller 116 adjusts the rotor speeds to counteract the disturbance and restore the intended trajectory. The flight controller 116 of each of the unmanned vehicles 102 executes program instructions to use sensor suite data to adjust the unmanned vehicle positioning and rotor speeds.
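The closed-loop correction described above can be illustrated with a minimal per-axis PID loop of the kind a flight controller might execute. The gains and class name are illustrative assumptions, and the disclosed controller is not limited to PID:

```python
class PidController:
    """Minimal per-axis PID loop: the tracking error drives proportional,
    integral, and derivative correction terms applied to the actuators."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt                      # accumulated error
        derivative = (error - self.prev_error) / dt      # error rate
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Driving a simple integrator plant with this loop pulls the measured state onto the setpoint, mirroring how the flight controller 116 counteracts a disturbance such as a wind gust.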
[0060] The processing circuitry 118 is configured for controlling the leader-follower maneuvering of the geometric formation. The processing circuitry 118 is configured to implement a reinforcement learning neural network that includes an identifier radial basis function neural network, an actor radial basis function neural network, and a critic radial basis function neural network. The identifier radial basis function neural network estimates nonlinear movements of the unmanned vehicle 102 by analyzing sensor data and predicting the vehicle's dynamic behavior. With such estimation, the processing circuitry 118 understands the current state of the vehicle and makes informed decisions regarding its movement. For example, the identifier neural network may predict the vehicle's response to a sudden change in wind speed, allowing the processing circuitry to pre-emptively adjust the control signals to maintain stability.
[0061] The actor radial basis function neural network adjusts the direction and orientation of the vehicle using the actuator suite 114, based on the estimated dynamics provided by the identifier radial basis function neural network. The actor radial basis function neural network determines the optimal control actions needed to maintain the desired trajectory and formation. For example, if the vehicle needs to change altitude to avoid an obstacle, the actor radial basis function neural network will compute the necessary rotor speed adjustments and send these commands to the actuator suite 114. In another instance, if the vehicle is part of a formation that needs to rotate to change its orientation, the actor radial basis function neural network calculates the adjustments required for each rotor or control surface to achieve the coordinated maneuver.
[0062] The critic radial basis function neural network assesses the adjusted direction and orientation of each of the unmanned vehicles based on a feedback signal from the sensor suite 112. This feedback may include corrupted signals resulting from deception attacks, where adversaries inject false data to disrupt the vehicle's control system. The critic radial basis function neural network analyzes discrepancies between expected and actual sensor readings to determine if the vehicle is deviating from its intended path. If inconsistencies are detected, the critic radial basis function neural network prompts the processing circuitry to adjust the control strategy, ensuring the vehicle continues to perform as required despite interference or faults. For example, if the sensor data suggests that the vehicle is drifting off course due to a deception attack, the critic radial basis function neural network identifies the anomaly and instructs the processing circuitry to apply corrective measures, such as recalibrating the control inputs.
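A simple stand-in for the critic's consistency check on possibly corrupted feedback is a residual test between the model-predicted state and the reported sensor value. The threshold and function name below are illustrative assumptions, not the disclosed detection logic:

```python
def detect_deception(predicted, reported, threshold=0.5):
    """Flag a measurement as suspect when the residual between the
    model-predicted state and the reported sensor value exceeds a
    threshold. Returns (is_suspect, residual)."""
    residual = abs(predicted - reported)
    return residual > threshold, residual
```

When a reading is flagged, the control strategy would fall back on the identifier network's prediction rather than the reported value, in the spirit of the corrective measures described above.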
[0063] The reinforcement learning neural network integrated within the processing circuitry 118 allows the unmanned vehicle 102 to adapt its behavior dynamically in response to changing environmental conditions. Adaptation includes learning a performance function that resets a preassigned convergence time whenever a target formation maneuver changes, thereby maintaining the transient states of each leader-follower tracking error within a predefined range. Such adaptability is particularly useful during complex maneuvers in a coordinated time-varying formation, such as changing the formation shape, direction, rotation, scaling, or translation, where the leader vehicles dictate the maneuver and the follower vehicles track their positions to achieve the target formation. For instance, if the leader vehicle initiates a scaling maneuver to expand the formation, the processing circuitry in each follower vehicle adjusts its movement to maintain the new distances between vehicles, ensuring the geometric formation is preserved.
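As an illustrative sketch (not the disclosed performance function), a prescribed-performance envelope with a preassigned convergence time that is reset whenever the target maneuver changes might look like the following. The envelope shape, exponent, and constants are assumptions made for the sketch:

```python
def performance_bound(t, t_reset, rho0=2.0, rho_inf=0.1, T=5.0):
    """Prescribed-performance envelope with a preassigned convergence time T:
    the bound on the tracking error shrinks from rho0 to rho_inf exactly
    T seconds after the most recent reset, then stays at rho_inf.
    t_reset is updated whenever the target formation maneuver changes,
    re-widening the envelope for the new transient."""
    tau = t - t_reset
    if tau >= T:
        return rho_inf
    return (rho0 - rho_inf) * ((T - tau) / T) ** 2 + rho_inf
```

Resetting `t_reset` at a maneuver change re-opens the envelope so the new transient is not penalized against the old, already-converged bound, which is how the transient states stay within the predefined range.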
[0064] The reinforcement learning neural network is also configured to learn a performance function through a performance-constrained backstepping control methodology. The reinforcement learning neural network initially generates an intermediate control input, which serves as a primary input for achieving stable control within the system in which backstepping control is applied.
[0065] In the backstepping control method, the system builds on the stability provided by the intermediate control input to systematically stabilize each outer control input in turn. Specifically, the reinforcement learning neural network assesses the stability requirements of the intermediate control. By following this structured stabilization process, the system progressively aligns each outer control input to achieve overall system stability and enhanced performance across the control layers.
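For a concrete, simplified illustration of backstepping with an intermediate (virtual) control input, consider the two-state chain x1' = x2, x2' = u. The gains and the quadratic-Lyapunov design below are textbook assumptions, not the disclosed RL-optimized controller:

```python
def backstepping_control(x1, x2, x1_ref, k1=2.0, k2=3.0):
    """Two-step backstepping for the chain  x1' = x2,  x2' = u.
    Step 1: a virtual (intermediate) control alpha = -k1*e1 would
    stabilize the outer state x1 if x2 could be assigned directly.
    Step 2: the actual input u drives x2 onto alpha, cancelling the
    cross-term that appears in the Lyapunov analysis."""
    e1 = x1 - x1_ref
    alpha = -k1 * e1           # intermediate control input
    z2 = x2 - alpha            # deviation of the inner state from alpha
    alpha_dot = -k1 * x2       # d(alpha)/dt for a constant reference
    u = alpha_dot - e1 - k2 * z2
    return u
```

In the disclosed method, the reinforcement learning network supplies the intermediate control (here the fixed law `alpha`) and the remaining laws are obtained from an approximate Hamilton-Jacobi-Bellman solution rather than fixed gains; the sketch shows only the layered structure.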
[0066] In addition to the onboard components, the system 100B may further comprise a ground-based controller configured with the processing circuitry 118 for centralized control of the leader-follower maneuvering. The ground-based controller can be used to set mission objectives, define trajectories, and provide real-time monitoring of the unmanned vehicles 102. Communication between the ground-based controller and the unmanned vehicles 102 may be implemented using wireless communication protocols, ensuring that mission-critical data is exchanged reliably. For example, the ground-based controller may be a laptop equipped with specialized software that allows the operator to input mission parameters, view real-time telemetry data, and make adjustments to the flight plan as needed. In scenarios where multiple unmanned vehicles 102 are deployed, the ground-based controller can also serve as a coordination hub, ensuring that all vehicles operate in sync to achieve the mission goals.
[0067] The unmanned vehicle 102 is also capable of storing flight-related data, such as sensor readings, actuator performance, and mission parameters, in onboard memory. This memory may include both volatile memory, such as RAM, for temporary data processing, and non-volatile memory, such as flash storage or SSDs, for long-term data retention. This stored data can be used for post-mission analysis, system diagnostics, and improving future mission performance through machine learning techniques. For example, after a mission, the data stored in non-volatile memory can be analyzed to identify patterns in actuator performance, which can then be used to optimize future control strategies. Additionally, flight data can be uploaded to a central database for further analysis, enabling the development of predictive maintenance schedules to reduce the likelihood of actuator or sensor failures during missions.
[0068] With integration of communication circuitry 110, sensor suite 112, actuator suite 114, flight controller 116, and processing circuitry 118, the unmanned vehicle 102 operates autonomously and maintains the intended formation, even under challenging conditions such as actuator faults, sensor deception attacks, or environmental disturbances. The use of reinforcement learning and adaptive control strategies further enhances the resilience and reliability of the unmanned vehicle system 100B, making it suitable for a wide range of applications, including surveillance, reconnaissance, and cooperative missions in dynamic environments. For example, in a surveillance mission, the unmanned vehicle can autonomously navigate through an urban area, avoiding obstacles and maintaining formation with other vehicles, while continuously transmitting live video feed to the ground-based controller for real-time monitoring.
[0069]
[0070] Each rotor is attached to a motor that forms part of the actuator suite. When actuated, the motors spin the rotors, producing thrust that enables the UAV to execute precise maneuvers and maintain stability. The motors are controlled via an electronic speed controller (ESC), which regulates the power delivered to each motor, ensuring smooth operation and consistent performance during flight. The rotors are essential for stabilizing the UAV in various flight conditions, including sudden shifts in wind or other environmental factors.
[0071] The UAV depicted in
[0072] Additionally, the UAV is equipped with a camera module that captures high-resolution visual data used for navigation, obstacle detection, or mission-specific tasks, such as surveillance or terrain mapping. The captured video or image data can be transmitted in real-time to a ground control station or user device for further analysis and control. This camera is integral for autonomous operations in complex environments, enabling the UAV to detect obstacles, adjust its flight path, and execute mission tasks without manual intervention.
[0073]
[0074] In this configuration, the sensor suite within the UAV, including devices such as gyroscopes, accelerometers, and magnetometers, continuously monitors the UAV's orientation, altitude, and motion. This sensor data is used to adjust the rotor speed and fin positions, ensuring stable flight and precise maneuvering, even in the presence of actuator faults or external disturbances.
[0075]
[0076] The curve 302 represents the initial transient behavior of the tracking error .sub.i at t=0. Initially, the error .sub.i is at .sub.i(0)=.sub.i0=6. As time progresses, the error decays and converges to the steady-state value .sub.is=0.3 at t=T.sub.s=0.5 seconds. The black curve shows the system's ability to stabilize the error within the specified settling time, demonstrating the initial performance of the proposed function in regulating the transient state.
[0077] The curve 304 shows the system's performance when a sudden decrease in the target trajectory occurs at t=4 seconds. At this point, the finite-time performance function resets the transient state to .sub.i(0)=.sub.i0 and begins to decay toward the steady-state value .sub.is. The red curve demonstrates the decay of the transient state after the reset, with the system settling within T.sub.s=0.5 seconds. This behavior shows the ability of the proposed function to confine the new transient state within a predefined range and ensure convergence within the prescribed finite time.
[0078] The curve 306 illustrates the system's behavior when another sudden change in the target trajectory occurs at t=8 seconds. Once again, the performance function resets the transient state to .sub.i(0)=.sub.i0, and the error decays over time. The system then settles at the steady-state value .sub.is=0.3 within T.sub.s=0.5 seconds after the transient state begins. This resetting behavior is analyzed to ensure that the system can handle multiple transient states in quick succession while maintaining accuracy and stability.
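The reset behavior described in these curves can be sketched numerically. The exact finite-time performance function of the disclosure (Eq. (18)) is not reproduced here; the polynomial decay below is an assumed illustrative form using the stated values (initial bound 6, steady-state bound 0.3, T.sub.s=0.5 s, resets at t=0, 4, and 8 seconds):

```python
def ftppf(t, t_n, rho0=6.0, rho_inf=0.3, Ts=0.5, k=2):
    """Illustrative finite-time performance bound with reset.

    tau is the time elapsed since the most recent transient at t_n;
    the bound decays from rho0 to rho_inf, settles exactly at
    tau = Ts, and then stays at rho_inf.
    """
    tau = t - t_n
    if tau >= Ts:
        return rho_inf
    return (rho0 - rho_inf) * ((Ts - tau) / Ts) ** k + rho_inf

resets = [0.0, 4.0, 8.0]   # new transients at t = 0, 4 s, 8 s

def bound(t):
    """Bound with the function reset at the most recent transient."""
    t_n = max(r for r in resets if r <= t)
    return ftppf(t, t_n)

assert bound(0.0) == 6.0   # bound starts at its initial value
assert bound(0.5) == 0.3   # settles at Ts = 0.5 s
assert bound(4.0) == 6.0   # reset at the sudden change at t = 4 s
assert bound(8.6) == 0.3   # settled again 0.6 s after the t = 8 s reset
```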
[0079]
[0080] The curve 308 represents the state trajectory p.sub.i as it tracks its target trajectory when there is a sudden change in amplitude at t=4 seconds. At this point, the system resets the transient state, and the proposed finite-time performance function ensures that the trajectory p.sub.i converges to the new steady-state value within the predefined settling time T.sub.s=0.5 seconds. The curve shows how the system reacts to the first change in the target trajectory and how the performance function handles this transition effectively.
[0081] The curve 310 illustrates the state trajectory p.sub.i when the target trajectory undergoes another sudden change at t=8 seconds. The performance function resets the transient state at this point, and the trajectory converges to the new steady state within the finite time T.sub.s=0.5 seconds.
[0082] In the embodiment described, a graph G=(V, E) is utilized to illustrate the communication among agents, which comprises a set of nodes V={v.sub.1, . . . , v.sub.N} and a set of edges E⊆V×V. The set of neighbors of an i.sup.th agent is indicated by N.sub.i={j: (i, j)∈E}. For an MAS of N agents in two-dimensional space whose positions are represented by p.sub.i∈R.sup.2, i=1, 2, . . . , N, the formation of the agents is denoted by (G, p) with p=[p.sub.1.sup.T, . . . , p.sub.N.sup.T].sup.T being the configuration of the formation.
[0083] Define (G, q) as the nominal formation of the agents, where q=[q.sub.1.sup.T, . . . , q.sub.N.sup.T].sup.T is the constant nominal configuration that the agents desire to form. The target configuration p* of the agents with various maneuvers is defined by:

p*(t)=[I.sub.N⊗A(t)]q+1.sub.N⊗b(t),  (1)

where A(t) realizes the time-varying scaling and rotation maneuvers of the whole formation while b(t) realizes the translation maneuver of the whole formation with respect to the nominal configuration q. Then, the target position of each agent from (1) is thus:

p.sub.i*(t)=A(t)q.sub.i+b(t), i=1, 2, . . . , N.  (2)
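The per-agent affine targets p.sub.i*=A(t)q.sub.i+b(t) implied by (1) can be sketched numerically; the nominal configuration, rotation angle, scale, and translation below are hypothetical values chosen only to illustrate the transform:

```python
import numpy as np

def affine_targets(q, A, b):
    """Per-agent targets p_i* = A q_i + b.

    q : (N, 2) nominal configuration
    A : (2, 2) scaling/rotation of the whole formation
    b : (2,)   translation of the whole formation
    """
    return q @ A.T + b

# Nominal square formation, rotated 90 degrees, scaled by 2, shifted.
q = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
theta = np.pi / 2
A = 2.0 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
b = np.array([5.0, 0.0])
p = affine_targets(q, A, b)
assert np.allclose(p[0], [5.0, 2.0])   # A @ [1, 0] + b = [0, 2] + [5, 0]
```

Varying A(t) and b(t) over time produces the scaling, rotation, and translation maneuvers of the whole formation described above.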
[0084] To realize the affine transformation of the formation (G, p), a stress .sub.ij=.sub.ji is assigned to the corresponding edge (i, j). If the stresses applied to the configuration are balanced, then the following relation holds:

Σ.sub.j∈N.sub.i .sub.ij(p.sub.j−p.sub.i)=0, i=1, 2, . . . , N.  (3)

[0085] Therefore, the stress is called an equilibrium stress. The stress matrix, denoted by Ω, is defined entrywise by [Ω].sub.ij=−.sub.ij for i≠j and [Ω].sub.ii=Σ.sub.j∈N.sub.i .sub.ij.  (4)
[0086] The configuration of the agents can be divided into leaders and followers as p=[p.sub.l.sup.T, p.sub.f.sup.T].sup.T, where p.sub.l is the group of leaders, p.sub.f is the group of followers, and N.sub.l and N.sub.f=N−N.sub.l are the numbers of leaders and followers, respectively.
[0087] The stress matrix can be partitioned according to the followers and leaders groups as:

Ω=[Ω.sub.ll, Ω.sub.lf; Ω.sub.fl, Ω.sub.ff],  (5)

where Ω.sub.ll, Ω.sub.ff, and Ω.sub.fl denote the leader-leader, follower-follower, and follower-leader blocks, respectively.
[0088] Definition 1: The nominal formation (G, q) with q affinely spanning R.sup.2 is said to be localizable if the target position p.sub.f* of the followers can be uniquely obtained from p.sub.l* as follows:

p.sub.f*=−Ω.sub.ff.sup.−1Ω.sub.fl p.sub.l*.  (6)

[0089] For Definition 1 to be valid, the nominal formation (G, q) is set such that its stress matrix Ω is positive semi-definite and satisfies rank(Ω)=N−d−1, and Ω.sub.ff is positive definite, with d=2 being the dimension of the agents in the Euclidean space.
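In the affine formation literature, the unique follower targets referenced in Definition 1 follow from the balanced-stress relation as p.sub.f*=−Ω.sub.ff.sup.−1Ω.sub.fl p.sub.l*; the sketch below assumes that form, with a toy one-follower/two-leader stress partition chosen for illustration:

```python
import numpy as np

def follower_targets(Omega_ff, Omega_fl, p_l):
    """Unique follower targets implied by the balanced-stress relation.

    With the stress matrix partitioned into leader/follower blocks and
    Omega_ff positive definite, Omega_ff p_f + Omega_fl p_l = 0 gives
    p_f = -Omega_ff^{-1} Omega_fl p_l.
    """
    return -np.linalg.solve(Omega_ff, Omega_fl @ p_l)

# Toy partition: one follower held at the midpoint of two leaders
# (stresses 1, 1 on the leader edges, diagonal entry 2).
Omega_ff = np.array([[2.0]])
Omega_fl = np.array([[-1.0, -1.0]])
p_l = np.array([[0.0, 0.0], [4.0, 2.0]])   # two leader positions in R^2
p_f = follower_targets(Omega_ff, Omega_fl, p_l)
assert np.allclose(p_f, [[2.0, 1.0]])      # midpoint of the leaders
```

Because the follower position is a linear function of the leader positions, any affine maneuver applied to the leaders propagates exactly to the follower targets.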
[0090] In a first assumption,
is the vector of the target formation maneuvers of the agents, with
In this disclosure, it can be assumed that the leaders have obtained the desired formation maneuvers i.e.
Therefore, the control design for the leaders will be ignored. The control aim is now to realize
as t.fwdarw.T.sub.s, with T.sub.s being a finite settling time.
[0091] The leader-follower MAS under consideration consists of N.sub.l leaders and N.sub.f followers. The leaders are governed by the following dynamic equations:

{dot over (p)}.sub.i=v.sub.i, {dot over (v)}.sub.i=u.sub.i,  (7)

where p.sub.i, v.sub.i, and u.sub.i, i=1, 2, . . . , N.sub.l, represent the positions, velocities, and control inputs of the leaders, respectively.
[0092] The followers are described by second-order nonlinear dynamic equations as follows:
where p.sub.i and v.sub.i
represent the positions and velocities of the agents, respectively, p.sub.i
and v.sub.i
are the positions and velocities of the agents under deception attacks, .sub.i(t, p.sub.i) and .sub.i(t, v.sub.i) are state-dependent deception attacks satisfying .sub.i(t, p.sub.i)=w.sub.p.sub.
, .sub.i
represent the faulty actuator, h.sub.i(p.sub.i, v.sub.i)
is an unknown continuous nonlinear function, g.sub.i
.sup.2 is a diagonal matrix of unknown input gains, .sub.i(t)
are the external disturbances.
[0093] In a second assumption, the deception attack coefficients are .sub.p.sub.
[0094] Let .sub.p.sub.
[0095] The model of the actuator fault .sub.i is described as:
where u.sub.i is the control signal of each agent, and .sub.i=[.sub.i1, .sub.i2].sup.T
is the vector of bias fault, m.sub.i=diag{m.sub.i1, m.sub.i2}
is a diagonal matrix of control effectiveness factors, and 0<m.sub.i1, m.sub.i2≤1.
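Per this fault model, the delivered actuation combines a loss-of-effectiveness factor and a bias term; a minimal numerical sketch (the numeric values are illustrative, in the style of the simulation's m.sub.i1=0.9, m.sub.i2=0.8):

```python
import numpy as np

def faulty_actuation(u, m, eta):
    """Actuator fault model: delivered input = m * u + eta.

    m   : diagonal control-effectiveness factors, 0 < m_k <= 1
    eta : bias-fault vector (m = I, eta = 0 means a healthy actuator)
    """
    return np.diag(m) @ u + eta

u = np.array([1.0, -2.0])
healthy = faulty_actuation(u, m=[1.0, 1.0], eta=np.zeros(2))
faulty = faulty_actuation(u, m=[0.9, 0.8], eta=np.array([0.1, 0.0]))
assert np.allclose(healthy, u)
assert np.allclose(faulty, [1.0, -1.6])   # 0.9*1 + 0.1, 0.8*(-2) + 0
```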
[0096] Considering (9) and (10), (8) can be expressed as follows:
[0097] Defining:
[0098] The global form of (11) is thus:
[0099] Define the following error variables for the followers:
where
and
as in (6).
[0101] Neural network approximations, specifically Radial Basis Function Neural Networks (RBFNNs), are implemented to manage the nonlinear functions that emerge from the design of the disclosed control strategy. In real-world multi-agent systems, it is challenging to represent complex nonlinear dynamics in closed-form equations suitable for real-time computation. RBFNNs approximate these functions, enabling accurate and efficient control.
[0102] The smooth and continuous nonlinear functions derived from the design of the disclosed control strategy are approximated with radial basis function neural networks (RBFNNs) as follows:
where W.sub.i=[w.sub.i1 w.sub.i2 . . . w.sub.im].sup.T is the weight vector, m is the number of nodes, .sub.i(X.sub.i) is the RBFNN reconstruction error satisfying .sub.i(X.sub.i).sub.l, .sub.i(X.sub.i)=[.sub.i1(X.sub.i) .sub.i2(X.sub.i) . . . .sub.im(X.sub.i)].sup.T is the vector of basis functions with:
where .sub.1k is the receptive center of the Gaussian function, .sub.2k is the Gaussian function width.
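A Gaussian basis of this kind can be sketched as follows; the exact exponent convention (e.g., dividing by the squared width versus twice the squared width) is an assumption, since only the center and width parameters are named above:

```python
import numpy as np

def rbf_basis(X, centers, width):
    """Gaussian basis vector phi(X) of an RBFNN.

    phi_k(X) = exp(-||X - c_k||^2 / width^2); the network output is
    W^T phi(X) for a weight vector W of length m = len(centers).
    """
    d2 = np.sum((centers - X) ** 2, axis=1)
    return np.exp(-d2 / width ** 2)

# m = 5 nodes with centers equally spaced on a line, width 1.5.
centers = np.linspace(-3, 3, 5).reshape(-1, 1)
W = np.zeros(5)
X = np.array([0.0])
phi = rbf_basis(X, centers, width=1.5)
assert phi.shape == (5,)
assert np.isclose(phi[2], 1.0)    # basis peaks at its own center
assert np.isclose(W @ phi, 0.0)   # zero weights give zero network output
```

The weight vector W is what the identifier, actor, and critic update laws adapt online; the basis itself stays fixed.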
[0103] The specified performance is attained by confining each leader-follower tracking error .sub.i, i=N.sub.l+1, . . . , N.sub.l+N.sub.f within the following preassigned boundary.
where .sub.i>0 and .fwdarw.
is the prescribed performance function characterized as a positive, smooth, and decreasing function.
[0104] In a second definition, a smooth function .sub.i(t): .fwdarw.
is said to be a finite time prescribed performance function if it is characterized by a) .sub.i(t)>0; b) {dot over ()}.sub.i(t)<0; c) lim.sub.t.fwdarw.t.sub.
[0105] The finite time prescribed performance function proposed in this study is defined by:
[0107] The inequality (17) can be transformed into an equality form using the following error transformation:
where e.sub.i is the transformed error, () is a smooth function and :(.sub.i,
[0108] The time derivative of the transformed error yields
where
[0109] To better demonstrate the important features of the performance function, a simple example is presented with sudden changes in amplitude at t.sub.n=4 s and t.sub.n=8 s. The trajectory p.sub.i experiences new transient states at the times t=t.sub.n=4 s and t=t.sub.n=8 s. It is required that p.sub.i settles within T.sub.s seconds after every new transient state. The finite time performance function (18) can be reset, and the new transient states of the tracking error can be constrained, whenever the amplitude of the target trajectory changes.
[0110] Initially, t=t.sub.n=0 and .sub.i(0)=.sub.i0=6. It is expected that as t grows, .sub.i(t) converges to .sub.is=0.3 at t=T.sub.s=0.5 s. Then, .sub.i(t)=.sub.is for t≥T.sub.s.
[0111] When there is a sudden decrease in the target trajectory at t=t.sub.n=4 s, t−t.sub.n=0 and the finite time performance function is reset to .sub.i(0)=.sub.i0 to confine the new transient state within the predefined range. As t grows (i.e., t−4>0), the finite time performance function decays and settles at t−4=0.5 s. For t−4≥0.5 s, .sub.i(t)=.sub.is.
[0112] On the other hand, when the target trajectory suddenly increases at t=t.sub.n=8 s, t−t.sub.n=0. At this moment, the finite time performance function is reset such that .sub.i(0)=.sub.i0 and begins decaying for t−8>0 until t−8=0.5 s, where it settles at .sub.i(t)=.sub.is. Subsequently, for t−8≥0.5 s, .sub.i(t)=.sub.is.
[0113]
[0114] Curve 402 represents the initial behavior of the tracking error .sub.i when the performance function is unable to reset the transient state after a sudden change in the target trajectory. Initially, the tracking error .sub.i follows the predefined trajectory, but at t=4 seconds, the system becomes unstable due to the inability of the performance function to adjust to the new transient state. Curve 402 shows the system's failure to maintain stability when faced with sudden reference signal changes.
[0115] Curve 404 shows the tracking error .sub.i experiencing instability after the sudden change at t=4 seconds. The lack of a reset mechanism in the existing performance functions prevents the system from controlling the new transient state, resulting in an erratic response. Curve 404 shows that the tracking error fails to converge to the desired steady-state after the target trajectory shifts, reinforcing the need for a modified performance function that can reset and accommodate multiple transient states.
[0116] Curve 406 further demonstrates the inability of the system to handle the sudden change in the target trajectory. The error remains outside the desired range, and the system cannot stabilize the transient state, leading to prolonged instability. Curve 406 indicates the limitations of existing performance functions (22)-(29) in handling sudden shifts in the target trajectory and tracking errors over time.
[0117]
[0118] Curve 408 shows the behavior of the state trajectory pi as it initially tracks the target trajectory p.sub.i without issues. However, at t=4 seconds, the target trajectory p.sub.i changes abruptly. Due to the limitations of performance functions (22)-(29), the system cannot reset the transient state of the tracking error, leading to instability in the state trajectory. The inability to reset and stabilize is clearly demonstrated in this curve, as the trajectory pi becomes erratic after the sudden change.
[0119] Curve 410 represents the expected trajectory of p.sub.i had the system been able to properly handle the sudden change in the target trajectory at t=4 seconds. Curve 410 shows that the state trajectory would have stabilized if the performance function could reset to accommodate the new transient state. However, the existing performance functions fail to achieve this reset, resulting in divergence from the target trajectory. The divergence shows the core limitation of performance functions (22)-(29) in applications where mobile agents experience multiple transient states.
[0120] Optimal backstepping control laws are obtained from the approximate solution of the Hamilton-Jacobi-Bellman equation using reinforcement learning under the identifier-actor-critic neural networks. Subsequently, reinforcement learning-based optimized secure backstepping control can be realized for nonlinear leader-follower multi-agent systems with deception attacks and actuator faults to be resilient and realize various affine formation maneuvers. The objective of the controller is to ensure that the closed-loop system is bounded despite the actuator faults and the deceptive state signals injected by cyber-attackers.
Defining:
The global form of (21) is given by:
According to the backstepping approach, v.sub.f will be treated as the intermediate control input. Let .sub.f be the virtual control. Then, define the error z.sub.f=v.sub.f−.sub.f, and its time-derivative is calculated as:
[0121] For the purpose of a control scheme, the error dynamics (30) and (31) are transformed as follows:
where
A performance index function associated with e.sub.f and .sub.f is defined as follows:
where
is the cost function. The optimal performance index I*(e.sub.f) associated with the optimal virtual controller
is given by:
where is the set of admissible control inputs. From (35), the following Hamilton-Jacobi-Bellman (HJB) equation can be derived:
where
is the gradient of
along e.sub.f. Suppose the solution of (36) exists and is unique, the optimal control policy
can be achieved by computing
Considering (36) and (37), it is clear that
is required to obtain the solution of (36). Nevertheless, due to the unknown deception attack signals and strong nonlinearities, it is impossible to solve (36) analytically. To achieve the control objective, an RL actor-critic framework is employed to obtain the approximate solution online.
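As a closed-form sanity check of the HJB machinery (an illustrative scalar toy, not the disclosed multi-agent system): for the system x' = u with cost ∫(x² + u²)dt, the value function V(x)=x² with policy u*(x)=−x=−(1/2)dV/dx makes the HJB residual vanish identically, which is exactly the structure the actor-critic networks approximate when no closed form exists:

```python
import numpy as np

# HJB residual for x' = u, running cost x^2 + u^2:
#   H(x) = x^2 + u*(x)^2 + dV/dx * u*(x)
# With V(x) = x^2 the minimizing control is u* = -(1/2) dV/dx = -x,
# and H(x) = x^2 + x^2 - 2x^2 = 0 for every x.
for x in np.linspace(-2.0, 2.0, 9):
    dV = 2.0 * x          # gradient of V(x) = x^2
    u = -0.5 * dV         # minimizing control u* = -x
    H = x**2 + u**2 + dV * u
    assert abs(H) < 1e-12
```

When the dynamics are unknown and attacked, no such closed-form V exists, which is why the identifier, critic, and actor RBFNNs are used to drive an approximate residual toward zero online.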
By using some mathematical manipulations,
can be expressed as follows:
where
Inserting (38) into (37) gives:
Since the continuous functions Y and
are unknown, RBFNNs are employed to approximate them in the following form:
where
are the ideal weights, .sub.(X.sub.) and .sub.lp(X.sub.lp) are the vectors of basis functions, and .sub.(X.sub.) and .sub.lp(X.sub.lp) are the RBFNN approximation errors.
Substituting (40) and (41) into (38) and (39), respectively:
where .sub.p=.sub.(X.sub.)+.sub.lp(X.sub.lp).
The optimal control input (43) cannot be used because of the unknown weights
The identifier RBFNN is used to estimate the unknown deceptive signals as:
where .sub. is the weight of the identifier, and {circumflex over ()} is the output of the identifier. The RBFNN weight of the identifier
is updated online by:
where .sub.p is a positive-definite matrix and .sub.p>0 is a small constant.
To obtain the optimized controller, the actor, critic, and identifier framework is developed:
where
is the estimate of
are the estimates of the critic and actor weights, respectively.
Substituting (46) and (47) into (36), the approximated HJB equation is derived as:
The Bellman's residual error {tilde over (H)}.sub.p is expressed as:
Considering (36), (49) becomes:
With regards to (48), it is desired that
realizes.
If
is ensured, the following also hold:
To derive the weight updating laws for .sub.a.sub.
Clearly, S.sub.p=0 is equivalent to (51). Noting that S.sub.p/.sub.a.sub.
The RBFNN weight of the critic is updated as follows:
where .sub.c.sub.
The RBFNN weight of the actor is updated as follows:
where .sub.a.sub.
Using the approximate optimal controller (47), the error dynamics (32) can be rewritten as:
A Lyapunov candidate function is selected for the e.sub.f subsystem as:
where
are the RBFNN weight errors for the identifier, actor, and critic networks, respectively.
[0122] Differentiating L.sub.1 with respect to time and considering (45), (54), and (55), one gets:
Inserting (40) and (47) into (58), one gets:
Equation (59) can be re-expressed as follows:
By utilizing Young's inequality, we have:
Substituting the Young's inequality (61) into (60) gives:
Noting that
the following facts are valid:
Combining (63)-(65) with (62) yields:
According to Young's inequality, the following relationship is valid
then
The parameters K.sub.p, .sub.c.sub.
Therefore, based on the selected parameters, (68) becomes:
where
which satisfies and
>0 is a constant. After designing the optimized virtual control
the next step is designing the actual control u. Equation (33) can be rewritten with the approximate
as follows:
A performance index function associated with z.sub.f and u is defined as follows:
where
is the cost function. Let u* be the optimal control and the corresponding optimal performance index I*(z.sub.f) is constructed as:
where is the set of admissible control inputs. The HJB equation associated with (73) is given by:
Following the same procedure as in step 1, solving
gives
To realize the control objective, the term
is expressed as follows:
where
Inserting (76) into (75) gives:
The unknown continuous functions F and
can be approximated by RBFNNs as follows:
where
are the ideal weights, .sub.F(X.sub.F) and .sub.Iv(X.sub.Iv) are the vectors of basis functions, and .sub.F(X.sub.F) and .sub.Iv(X.sub.Iv) are the RBFNN approximation errors.
Inserting (78) and (79) into (76) and (77), respectively gives:
where .sub.v=2.sub.F(X.sub.F)+.sub.Iv(X.sub.Iv).
The optimal control law (81) is unavailable since the weights
are unknown. To obtain a usable optimized control, an RL architecture using the identifier, critic, and actor networks is constructed as:
where
are the estimates of the identifier, critic, and actor weights, respectively, {circumflex over (F)} and
are the estimates of F and
respectively.
The RBFNN weights of the identifier, critic, and actor are updated by the following update laws:
where .sub.v is a positive-definite matrix, .sub.a.sub.
A Lyapunov candidate function is selected for the z.sub.v subsystem as:
where
are the RBFNN weight errors for the identifier, actor, and critic networks, respectively.
Taking the time derivative of L.sub.2 and using (85), (86), and (87), one has:
Inserting (82) and (84) into (89), one gets:
Evaluating (90) yields:
From Young's inequality, one gets:
Substituting the Young's inequality (92) into (91) yields:
Using
the following facts are valid:
Combining (94)-(96) with (93) yields:
Based on Young's inequality, one gets:
By utilizing (98), one gets:
The parameters C.sub.v, K.sub.v, .sub.c.sub.
Therefore, based on the selected parameters, (99) becomes:
where
which satisfies , and
>0 is a constant.
The inequality (100) can be rewritten as:
where
is the minimum eigenvalue of (K.sub.p−C.sub.v/2−1.25),
is the minimum eigenvalue of (K.sub.v−C.sub.v/2−1.75),
is the minimum eigenvalue of .sub.Ip.sup.T.sub.f.sub.ff.sub.Ip,
is the minimum eigenvalue of
is the maximum eigenvalue of
is the maximum eigenvalue of
From (101), one can obtain:
where
[0123] In a first theorem, consider the second-order nonlinear multi-agent system (12) with unknown nonlinear dynamics under deception attacks and actuator faults. By using the prescribed performance function (22), the error dynamics (30) and (31), the reinforcement learning-based optimized virtual controller (47), and the optimized overall controller (84) with identifier update laws (45) and (85), critic update laws (54) and (86), and actor update laws (55) and (87), the tracking errors in the closed-loop system are bounded and the leader-follower affine formation maneuvers are realized.
Integrating both sides of (102) gives:
For all t≥T.sub.s, the inequality (103) shows that all the error signals e.sub.f, z.sub.f, and {tilde over (W)}.sub.a.sub. remain bounded in a compact set.
[0124] It is obvious that increasing or decreasing
in (103) will aid in reducing
and subsequently makes the compact set smaller. This means that one may select the parameter
large enough to make e.sub.f, z.sub.f, {tilde over (W)}.sub.a.sub.
[0125] Compared to the work of affine formation maneuver control of linear multi-agent systems, this disclosure considers the issue of affine formation maneuver control of nonlinear multi-agent systems with preassigned performance. In addition, the disclosed technique is able to counter deception attacks and actuator faults.
[0126] Multiple adaptive laws are used to estimate the upper bounds of the
However, the attack signals are usually time-varying and adaptive laws can only estimate constants. By transforming the multi-agent system to the form in (28) and (29), neural networks approximate the lumped functions (p.sub.f, v.sub.f, .sub.p, .sub.v) and F(p.sub.f, v.sub.f, .sub.p, .sub.v, u), which account for the time-varying attack signals and uncertain dynamics.
[0127]
[0128] The topology 502 consists of seven agents, where agents 1, 2, and 3 represent the leaders, while agents 4, 5, 6, and 7 are designated as the followers. The directed connections between the agents are depicted by lines, representing the nominal formation structure used to maintain coordination during the system's maneuvers.
[0129] Agent 1 is one of the leaders in the system, located on the right side of the topology. It is in communication with agents 2, 3, 4, 5, 6, and 7 via directed edges. These communication connections indicate that Agent 1 is configured for controlling the formation and influencing the motion of all followers, as well as the other leaders. Agent 1, in one implementation, serves as a primary controller in the leader-follower dynamic.
[0130] Agent 2 is another leader in the topology, in communication with agents 1, 4, 5, and 7. These connections demonstrate its role in assisting Agent 1 in maintaining the overall formation, controlling several followers and interacting with Agent 1. The connectivity of Agent 2 ensures that it contributes to the stability and coordination of the system during motion.
[0131] Agent 3, also a leader, is in communication with agents 1, 4, and 6. Similar to Agents 1 and 2, Agent 3 influences the motion of certain followers and maintains direct communication with Agent 1 to ensure proper coordination of the leader-follower structure. The connectivity of Agent 3 provides additional redundancy and robustness to the system's control scheme.
[0132] Agent 4 is a follower in communication with leaders 1, 2, and 3. The directed edges between Agent 4 and the leaders indicate that it receives control signals and adjusts its position within the formation accordingly. The interactions of Agent 4 with all three leaders show its contribution to maintaining the geometric structure of the formation.
[0133] Agent 5, another follower, is in communication with leaders 1 and 2. The communication pathways between Agent 5 and the leaders ensure that it remains coordinated within the formation, following the control signals sent by Agents 1 and 2. The connections also indicate that Agent 5 is influenced by multiple leaders, contributing to the overall stability of the system.
[0134] Agent 6, a follower, is in communication with leaders 1 and 3. These connections indicate that Agent 6 is guided by the control inputs from both leaders to maintain its position and trajectory within the formation. The presence of multiple connections to leaders ensures that Agent 6 can respond appropriately to changes in the leader trajectories.
[0135] Agent 7, the final follower, is in communication with leaders 1 and 2. Like the other followers, Agent 7 receives control signals from multiple leaders, ensuring that it remains properly aligned within the formation during maneuvers of the system. The directed edges between Agent 7 and the leaders demonstrate its dependency on the leaders for maintaining its position in the topology.
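The communication pattern described in paragraphs [0128]-[0135] can be captured as neighbor sets and sanity-checked programmatically; this is a direct transcription of the stated links, not part of the disclosed method:

```python
# Neighbor sets N_i for the seven-agent topology described above
# (agents 1-3 are leaders, agents 4-7 are followers).
neighbors = {
    1: {2, 3, 4, 5, 6, 7},
    2: {1, 4, 5, 7},
    3: {1, 4, 6},
    4: {1, 2, 3},
    5: {1, 2},
    6: {1, 3},
    7: {1, 2},
}

# The stated links are mutual: j in N_i iff i in N_j.
for i, N_i in neighbors.items():
    for j in N_i:
        assert i in neighbors[j]

# Every follower is connected to at least two leaders, providing the
# redundancy noted in the description.
leaders, followers = {1, 2, 3}, {4, 5, 6, 7}
assert all(len(neighbors[f] & leaders) >= 2 for f in followers)
```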
[0136] A numerical example is illustrated herein to show the efficacy of the various embodiments of the present disclosure. The simulations are carried out on MATLAB/Simulink. The multi-agent system in this study consists of three leaders (N.sub.l=3) and four followers (N.sub.f=4) interacting over the nominal formation topology depicted in
where
[0137] The target formation maneuvers of the leaders are described by the trajectories in (104). Various maneuvers of the leaders' nominal configurations can be obtained by manipulating A(t) and b(t).
[0138] As per the first assumption,
[0139] The motion of the followers is described by the following second-order nonlinear multi-agent systems.
where
the loss of effectiveness faults are given by m.sub.i1=0.9, m.sub.i2=0.8, and the bias faults are given by .sub.i1=0.1 sin (2t)exp(−0.67t), .sub.i2=0.1 cos (2t)exp(−0.02t).
[0141] The compromised sensor signals of the followers are given as:
[0143] The prescribed performance function is designed as:
Whenever t=t.sub.n, the prescribed performance function resets to .sub.i(t)=2.
[0145] The stress matrix is computed using the approach as follows:
[0146] A Gaussian function is chosen as the activation function of the radial basis function neural networks. Each of the identifier, actor, and critic neural networks contains five nodes. The centers of the identifier, actor, and critic neural networks are equally spaced within [−3, 3], [−5, 5], and [−5, 5], respectively. The widths of the identifier, actor, and critic neural networks are 1.5, 2.5, and 2.5, respectively. The initial weights of the identifier, actor, and critic neural networks are chosen as .sub.i(0)=0.1·1.sub.5×1 and .sub.iF(0)=0.1·1.sub.5×1, .sub.ia.sub.
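This network setup can be sketched directly; the center intervals are assumed to be [−3, 3] and [−5, 5] (the signs are garbled in the extracted text), and the initial-weight value 0.1 on a 5-by-1 vector follows the description above:

```python
import numpy as np

def make_rbfnn(interval, width, m=5, w0=0.1):
    """Build one RBFNN configuration per the setup above: m equally
    spaced Gaussian centers on the given interval, a shared width,
    and initial weights w0 * ones(m)."""
    lo, hi = interval
    centers = np.linspace(lo, hi, m)
    weights = np.full(m, w0)
    return centers, width, weights

# Assumed intervals: identifier on [-3, 3], actor/critic on [-5, 5].
identifier = make_rbfnn((-3.0, 3.0), 1.5)
actor = make_rbfnn((-5.0, 5.0), 2.5)
critic = make_rbfnn((-5.0, 5.0), 2.5)
assert np.allclose(identifier[0], [-3.0, -1.5, 0.0, 1.5, 3.0])
assert np.allclose(actor[2], 0.1)   # initial weights 0.1 * ones(5)
```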
[0147] The parameters of the optimized virtual and real controllers are given as
[0148] The simulation results are illustrated with reference to subsequent figures.
[0149]
[0150] Curve 602 illustrates the initial response of the virtual controller for the first follower. It shows that the virtual controller experiences minor oscillations before stabilizing around t=5 seconds. Curve 602 reflects how a virtual controller of the first follower effectively suppresses the disturbances and maintains stability. Curve 604 represents a virtual controller of the second follower, which follows a similar pattern to curve 602. After initial fluctuations, a virtual controller of the second follower stabilizes and remains steady for the remainder of the time period. Curve 606 demonstrates the virtual controller for the third follower. Curve 606 shows a series of transient states before the controller achieves stability after t=10 seconds, indicating the robustness of the system in handling fluctuations. Curve 608 depicts a virtual controller of the fourth follower, which also encounters initial disturbances but stabilizes shortly thereafter, demonstrating the efficiency of the proposed control mechanism in mitigating transient disturbances.
[0151]
[0152] Curve 610 illustrates the virtual controller for the first follower, demonstrating minor oscillations before reaching stability around t=5 seconds, indicating the controller's effective performance in the y-axis. Curve 612 represents the second follower's virtual controller, stabilizing after a similar period of transient disturbances as seen in curve 610. The virtual controller settles into a steady state, showing the system's ability to handle initial fluctuations. Curve 614 tracks the virtual controller for the third follower, which exhibits larger oscillations but stabilizes after approximately 10 seconds. Curve 614 shows the resilience of the controller in the y-axis under the influence of disturbances. Curve 616 depicts a virtual controller of the fourth follower in the y-axis, showing that, despite transient fluctuations, the controller stabilizes after a brief period, mirroring the patterns seen in the other followers.
[0153]
[0154] Curve 702 represents a control input of the first follower in the x-axis. The curve indicates a rapid correction after the initial disturbance, achieving stability around t=5 seconds and showing effective control input handling. Curve 704 tracks a control input of the second follower, which follows a similar trend as curve 702, stabilizing after initial fluctuations and demonstrating the efficiency of the control system. Curve 706 represents a control input of the third follower, showing significant disturbances between t=20 seconds and t=30 seconds, but eventually stabilizing, indicating how the control input corrects for larger fluctuations. Curve 708 depicts a control input of the fourth follower, which follows a similar pattern of early disturbances and subsequent stabilization, reinforcing an ability of the system to manage control inputs effectively.
[0155]
[0156] Curve 710 shows a control input of the first follower in the y-axis, stabilizing quickly after initial disturbances around t=5 seconds, indicating robustness of the system in the y-axis. Curve 712 represents a control input of the second follower in the y-axis, following a similar pattern of stabilization after early fluctuations, confirming the effectiveness of the control mechanism. Curve 714 tracks a control input of the third follower, which experiences larger fluctuations but stabilizes over time, demonstrating an ability of the system to correct significant disturbances. Curve 716 reflects a control input of the fourth follower, following the pattern of initial disturbances followed by stabilization, showing consistent control input behavior across multiple agents.
[0157]
[0158] Curve 802 represents a trajectory of the first leader, showing smooth transitions after initial disturbances and tracking a stable path. Curve 804 represents a trajectory of the second leader, which closely follows a path of the first leader, showing minor deviations but maintaining stability. Curve 806 depicts a trajectory of the third leader, which also stabilizes after initial fluctuations, maintaining a formation of the leader. Curve 808 tracks a trajectory of the first follower, demonstrating effective tracking of the leaders, with only minor deviations corrected over time.
[0159] Curve 810 represents a trajectory of the second follower, showing similar tracking performance with stable behavior after early disturbances. Curve 812 shows a trajectory of the third follower, which mirrors the performance of the other followers, stabilizing quickly after initial fluctuations. Curve 814 reflects a trajectory of the fourth follower, demonstrating consistent tracking of the leaders with minor deviations corrected smoothly.
[0160]
[0161] Curve 816 shows a trajectory of the first leader in the y-axis, indicating smooth transitions after initial disturbances, following a stable path. Curve 818 represents a trajectory of the second leader, which follows a similar pattern of stability after early fluctuations. Curve 820 depicts a trajectory of the third leader, which stabilizes after early disturbances, maintaining alignment with the other leaders. Curve 822 represents a trajectory of the first follower, demonstrating effective tracking of the leaders with minor deviations that are corrected over time. Curve 824 tracks a trajectory of the second follower, showing similar tracking performance and stability after initial disturbances. Curve 826 reflects a trajectory of the third follower, which quickly stabilizes after minor deviations. Curve 828 demonstrates a trajectory of the fourth follower, showing consistent tracking of the leaders with small corrections.
[0162]
[0163] Curve 902 represents the tracking error for the first leader. Initially, the system encounters minor fluctuations in tracking, but the tracking error stabilizes after t=10 seconds, indicating efficient control over the x-axis for the first leader. Curve 904 shows the tracking error for the second leader. Similar to curve 902, the tracking error of the second leader experiences slight disturbances, followed by stabilization, showing the ability of the system to mitigate transient states in the x-axis.
[0164] Curve 906 represents a tracking error of the third leader. After experiencing more significant deviations around t=20 seconds, the tracking error stabilizes as the system adjusts to maneuvers of the leader. Curve 908 represents the first follower's tracking error in the x-axis. The curve reflects slight instability before achieving a steady state after t=15 seconds, demonstrating effective control over a position of the first follower. Curve 910 shows the tracking error for the second follower. The system corrects significant deviations in a trajectory of the second follower, stabilizing the error around t=25 seconds. Curve 912 depicts a tracking error of the third follower, which undergoes transient fluctuations before stabilizing after t=30 seconds, showing the system's success in managing the tracking errors of the followers.
[0165]
[0166] Curve 918 represents the third leader's tracking error, which experiences fluctuations around t=15 seconds but stabilizes over time, showing the system's ability to manage transient disturbances in the y-axis. Curve 920 depicts the first follower's tracking error, showing initial oscillations before stabilizing around t=20 seconds, reflecting the system's control over the y-axis for the first follower. Curve 922 tracks the second follower's tracking error in the y-axis, showing more substantial deviations but eventually stabilizing after t=25 seconds, indicating the system's capacity to correct errors. Curve 924 represents the third follower's tracking error, stabilizing after t=30 seconds, demonstrating that the system effectively manages transient errors in the y-axis.
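The bounded transient behavior described above can be illustrated with a brief sketch of a resetting prescribed performance envelope of the kind this disclosure employs. The specific functional form, parameter values, and class name below are illustrative assumptions, not the claimed function: the tracking error is kept within |e(t)| < ρ(t), where ρ(t) shrinks from an initial bound to a steady-state bound within a preassigned convergence time T and re-opens whenever the target maneuver changes.

```python
import numpy as np


class ResettingPPF:
    """Illustrative prescribed performance function (PPF) with a
    preassigned convergence time T: rho decays from rho0 to rho_inf
    by t = T, and can be reset whenever the target maneuver changes,
    re-opening the error envelope |e(t)| < rho(t).  This polynomial
    form is an assumption for illustration only."""

    def __init__(self, rho0=2.0, rho_inf=0.05, T=10.0, power=3):
        self.rho0, self.rho_inf, self.T, self.power = rho0, rho_inf, T, power
        self.t_reset = 0.0          # time of the most recent maneuver change

    def reset(self, t):
        """Restart the envelope when the formation maneuver alters."""
        self.t_reset = t

    def rho(self, t):
        tau = t - self.t_reset
        if tau >= self.T:
            return self.rho_inf     # steady-state bound after convergence
        frac = (self.T - tau) / self.T
        return (self.rho0 - self.rho_inf) * frac ** self.power + self.rho_inf


ppf = ResettingPPF()
# envelope shrinks to its steady-state value by t = 10 s,
# then re-opens after a maneuver change at t = 12 s
bounds = [ppf.rho(0.0), ppf.rho(10.0)]
ppf.reset(12.0)
bounds.append(ppf.rho(12.0))
```

Resetting the envelope at each maneuver change is what keeps the new transient tracking errors, such as those shown in curves 918 through 924, within predefined bounds.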
[0167]
[0168] Curve 1002 represents the trajectories of the leader agents during the affine formation maneuvers. Curve 1002 shows that despite the presence of external disturbances, such as deception attacks and actuator faults, the leaders maintain their predefined formation shape, indicated by the consistent path shown in the graph. Curve 1004 illustrates the trajectories of the follower agents. While the followers experience slight deviations, particularly when subjected to actuator faults and deception attacks, curve 1004 shows that the system's control strategy enables the followers to track the leaders effectively, maintaining the desired formation shape.
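The affine formation maneuvers referenced above can be sketched briefly as follows. The concrete formation, transform values, and function name are hypothetical illustrations; the sketch shows only how a single affine pair (A, b) simultaneously rotates, scales, and translates every agent's nominal position, which is how the formation shape is maneuvered as a whole.

```python
import numpy as np


def affine_targets(nominal, A, b):
    """Map nominal 2-D formation positions through the affine transform
    p_i(t) = A(t) @ p_i* + b(t); rotation, scaling, shearing, and
    translation of the whole formation are encoded by the pair (A, b)."""
    return nominal @ A.T + b


# Hypothetical nominal square formation of four agents.
nominal = np.array([[1.0, 1.0],
                    [-1.0, 1.0],
                    [-1.0, -1.0],
                    [1.0, -1.0]])

theta = np.pi / 4                        # rotate the formation 45 degrees
scale = 0.5                              # shrink it to half size
A = scale * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
b = np.array([10.0, 5.0])                # translate the formation center

targets = affine_targets(nominal, A, b)
```

In the disclosed leader-follower scheme, the leaders realize the time-varying pair (A(t), b(t)), and the followers track the resulting target positions even under actuator faults and deception attacks.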
[0169]
[0170] Curve 1102 depicts the first leader's trajectory in the x-axis. The curve shows a smooth decline over time, indicating controlled movements of the leader along the x-axis as the system maintains stability. Curve 1104 represents a trajectory of the second leader, which follows a similar path to the first leader, demonstrating coordinated control between the leaders in the x-axis. Curve 1106 illustrates a trajectory of the third leader. The curve reflects consistent tracking of the formation maneuver, with minor deviations corrected over time. Curve 1108 represents a trajectory of the first follower, demonstrating effective tracking of the leaders and maintaining proximity to the predefined formation. Curve 1110 shows a trajectory of the second follower, which aligns closely with the other followers, indicating effective control. Curve 1112 represents a trajectory of the third follower, which also follows a stable path after initial deviations. Curve 1114 depicts a trajectory of the fourth follower, which maintains consistent tracking of the leaders with minor adjustments over time.
[0171]
[0172]
[0173]
[0174] In
[0175] Further, the claimed advancements are not limited by the form of the computer-readable media on which the instructions of the inventive process are stored. For example, the instructions may be stored in FLASH memory, Synchronous Dynamic Random Access Memory (SDRAM), Random Access Memory (RAM), Read Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), solid-state hard disk or any other information processing device with which the processing circuit 1326 communicates, such as a server or computer.
[0176] Further, the claimed advancements may be provided as a utility application, background daemon, or component of an operating system, or combination thereof, executing in conjunction with MPU 1300 and a mobile operating system such as Android, Microsoft Windows 10 Mobile, Apple iOS and other systems known to those skilled in the art.
[0177] To implement the processing circuit 1326, the hardware elements may be realized by various circuitry elements known to those skilled in the art. For example, MPU 1300 may be a Qualcomm mobile processor, an Nvidia mobile processor, an Atom processor from Intel Corporation of America, a Samsung mobile processor, or an Apple A7 mobile processor, or may be other processor types that would be recognized by one of ordinary skill in the art. Alternatively, the MPU 1300 may be implemented on a Field-Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), Programmable Logic Device (PLD) or using discrete logic circuits, as one of ordinary skill in the art would recognize. Further, MPU 1300 may be implemented as multiple processors cooperatively working in parallel to perform the instructions of the inventive processes described above.
[0178] The processing circuit 1326 in
[0179] The processing circuit 1326 includes a Universal Serial Bus (USB) controller 1325 which may be managed by the MPU 1300.
[0180] The processing circuit 1326 further includes a display controller 1308, such as an NVIDIA GeForce GTX or Quadro graphics adaptor from NVIDIA Corporation of America for interfacing with display 1310. An I/O interface 1312 interfaces with buttons 1314, such as for volume control. In addition to the I/O interface 1312 and the display 1310, the processing circuit 1326 may further include a microphone 1341 and one or more cameras 1331. The microphone 1341 may have associated circuitry 1340 for processing the sound into digital signals. Similarly, the camera 1331 may include a camera controller 1330 for controlling image capture operation of the camera 1331. In an exemplary aspect, the camera 1331 may include a Charge Coupled Device (CCD). The processing circuit 1326 may include an audio circuit 1342 for generating sound output signals and may include an optional sound output port.
[0181] The power management and touch screen controller 1320 manages power used by the processing circuit 1326 and touch control. The communication bus 1322 may be an Industry Standard Architecture (ISA), Extended Industry Standard Architecture (EISA), Video Electronics Standards Association (VESA), Peripheral Component Interconnect (PCI), or similar bus for interconnecting all of the components of the processing circuit 1326. A description of the general features and functionality of the display 1310, buttons 1314, as well as the display controller 1308, power management controller 1320, network controller 1306, and I/O interface 1312 is omitted herein for brevity as these features are known.
[0182]
[0183] In some embodiments, the computer system 1400 may include a server CPU and a graphics card by NVIDIA, in which the GPUs have multiple CUDA cores. In some embodiments, the computer system 1400 may include a machine learning engine 1412.
[0184] The present disclosure introduces an optimized, secure, fault-tolerant control strategy with prescribed performance for affine formation maneuvers in nonlinear, second-order, leader-follower multi-agent systems subject to actuator faults and deception attacks. A novel prescribed performance function is proposed, characterized by a preassigned convergence time, capable of resetting whenever the target formation maneuver alters, thereby maintaining the new transient states of leader-follower tracking errors within predefined bounds. Subsequently, an optimized backstepping control approach is developed for the system, leveraging a streamlined identifier-actor-critic reinforcement learning framework. Within this scheme, the identifier network estimates the system's nonlinear dynamics, the actor network executes the control actions, and the critic network evaluates the control performance.
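As a non-limiting illustration of the identifier-actor-critic structure described above, the sketch below builds a shared Gaussian radial basis function feature layer and adapts only the identifier weights with an LMS-style rule. The centers, width, learning rate, and target nonlinearity are assumptions for illustration; the full actor and critic adaptation laws of the optimized backstepping scheme are not reproduced.

```python
import numpy as np


def rbf_features(x, centers, width=1.0):
    """Gaussian radial basis functions phi_j(x) = exp(-||x - c_j||^2 / width^2),
    used here as the shared feature layer of the identifier, actor, and
    critic networks (centers and width are illustrative choices)."""
    d2 = np.sum((x - centers) ** 2, axis=1)
    return np.exp(-d2 / width ** 2)


rng = np.random.default_rng(0)
centers = rng.uniform(-2, 2, size=(16, 2))   # 16 RBF centers in state space

# Each network is a linear combination W^T phi(x) of the same features:
W_identifier = np.zeros((16, 2))  # estimates the unknown nonlinear dynamics f(x)
W_actor = np.zeros((16, 2))       # would produce the control correction u(x)
W_critic = np.zeros(16)           # would score long-run tracking performance


def identifier_step(x, f_true, lr=0.1):
    """One illustrative gradient-style update: the identifier fits f(x).
    In the full scheme the actor and critic weights would also be adapted
    from a Bellman-style error; only the identifier update is sketched."""
    global W_identifier
    phi = rbf_features(x, centers)
    err = f_true - phi @ W_identifier          # identifier estimation error
    W_identifier += lr * np.outer(phi, err)    # LMS-style weight adaptation


# fit a simple stand-in nonlinearity f(x) = [sin(x1), cos(x2)]
for _ in range(500):
    x = rng.uniform(-1, 1, size=2)
    identifier_step(x, np.array([np.sin(x[0]), np.cos(x[1])]))
```

The streamlined structure of the disclosure arises because all three networks share this single feature evaluation per state, so estimating the dynamics, executing the control action, and evaluating performance add only linear-in-the-weights computation.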
[0185] The above-described hardware description is a non-limiting example of corresponding structure for performing the functionality described herein.
[0186] Numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that the invention may be practiced otherwise than as specifically described herein.