SYSTEM, METHOD, AND COMPUTER READABLE MEDIUM FOR AFFINE FORMATION MANEUVERING OF NONLINEAR MULTI-AGENT SYSTEMS WITH FAULT-TOLERANT SECURE OPTIMIZED BACKSTEPPING CONTROL USING REINFORCEMENT LEARNING
20260023395 · 2026-01-22
Assignee
Inventors
CPC classification
G05D1/86
PHYSICS
G05D1/695
PHYSICS
International classification
G05D1/695
PHYSICS
Abstract
A system, computer readable storage medium and method for controlling a trajectory of coordinated time-varying maneuvers of a geometric formation of unmanned vehicles is disclosed. The system includes unmanned vehicles, each configured with communication circuitry to communicate between the vehicles. A subset of the unmanned vehicles function as leader vehicles, with the remaining vehicles functioning as follower vehicles for leader-follower maneuvering. The system further includes an actuator suite configured to adjust the direction and orientation of each vehicle, a sensor suite for stabilization and navigation, and a flight controller for maintaining stable maneuvering, even in the presence of actuator faults and sensor deception attacks. Processing circuitry is configured with a reinforcement learning neural network that includes identifier, actor, and critic radial basis function neural networks to estimate movement, adjust control actions, and assess vehicle performance based on feedback signals, including corrupted signals from the sensor suite due to deception attacks.
Claims
1. A system for controlling a trajectory of coordinated time-varying maneuvers of a geometric formation of unmanned vehicles, the system comprising: a plurality of the unmanned vehicles, each having communication circuitry configured to communicate between each unmanned vehicle of the plurality of the unmanned vehicles, wherein a subset of the plurality of the unmanned vehicles are leader vehicles and remaining unmanned vehicles are follower vehicles, for leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles to target positions of the trajectory; an actuator suite to maintain and adjust direction and orientation of a respective unmanned vehicle, a sensor suite to stabilize and navigate the respective unmanned vehicle, a flight controller configured to send a control signal to the actuator suite and receive a feedback signal from the sensor suite, wherein the flight controller maintains stable maneuvering of the respective unmanned vehicle while the actuator suite is subject to an actuator fault and the sensor suite is subject to a deception attack; and processing circuitry configured to control the leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles, wherein the maneuvering of the plurality of unmanned vehicles is controlled with a reinforcement learning neural network that includes an identifier radial basis function neural network to estimate nonlinear movement of the plurality of unmanned vehicles, an actor radial basis function neural network to adjust direction and orientation of the respective unmanned vehicle by the respective actuator suite based on the estimated nonlinear movement, and a critic radial basis function neural network to assess the adjusted direction and orientation of each of the unmanned vehicles based on a feedback signal by the sensor suite, wherein the feedback signal includes corrupted signals from the sensor suite due to the deception attack.
2. The system of claim 1, wherein the processing circuitry is further configured to train the reinforcement learning neural network to learn a performance function that resets a preassigned convergence time whenever a target formation maneuver changes to maintain transient states of each leader-follower tracking error within a predefined range.
3. The system of claim 2, wherein the processing circuitry is further configured to control the leader vehicles to maneuver in a coordinated time-varying formation including one or more of shape, direction, rotation, scaling, and translation, wherein the follower vehicles track positions of the leader vehicles to achieve the target formation maneuver.
4. The system of claim 1, wherein the communication circuitry uses WiFi for communication with others of the plurality of unmanned vehicles.
5. The system of claim 2, wherein the processing circuitry is further configured for controlling the leader-follower maneuvering in an affine formation of the plurality of unmanned vehicles.
6. The system of claim 5, wherein the processing circuitry is further configured to train the reinforcement learning neural network to learn the performance function using performance-constrained backstepping control, wherein an initial control by the reinforcement learning neural network is used as an intermediate control input, and wherein optimal laws for the backstepping control are obtained from an approximate solution of a Hamilton-Jacobi-Bellman equation using the reinforcement learning.
7. The system of claim 1, wherein the plurality of the unmanned vehicles are unmanned aerial vehicles, each having a plurality of top-mounted rotors to move the unmanned aerial vehicle forward, backward, left, and right by adjusting speed of each rotor, wherein the plurality of top-mounted rotors are driven by the actuator suite.
8. The system of claim 1, wherein the plurality of unmanned vehicles are unmanned aerial vehicles, each having a single rotor and a plurality of movable fins, wherein the single rotor and the plurality of movable fins are driven by the actuator suite.
9. The system of claim 1, further comprising a ground-based controller configured with the processing circuitry, for centralized control of the leader-follower maneuvering of the geometric formation.
10. The system of claim 1, wherein the flight controller of each of the unmanned vehicles executes program instructions to obtain sensor suite data and adjust the unmanned vehicle positioning and rotor speeds based on the sensor suite data.
11. The system of claim 1, wherein the sensor suite in each of the plurality of unmanned vehicles includes a gyroscope, an accelerometer, and a magnetometer.
12. The system of claim 1, wherein the sensor suite in the leader vehicles includes one or more sensors for detection of obstacles.
13. A non-transitory computer-readable storage medium including computer executable instructions, wherein the instructions, when executed by a computer, cause the computer to perform a method for controlling a trajectory of coordinated time-varying maneuvers of a geometric formation of unmanned vehicles, the method comprising: controlling a leader-follower maneuvering of the geometric formation of a plurality of unmanned vehicles with a reinforcement learning neural network, wherein a subset of the unmanned vehicles are leader vehicles and remaining unmanned vehicles are follower vehicles, for the leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles to target positions of a trajectory, including estimating, by an identifier radial basis function neural network, nonlinear movement of the plurality of unmanned vehicles, adjusting, by an actor radial basis function neural network, direction and orientation of the respective unmanned vehicle by actions that control a respective actuator suite based on the estimated nonlinear movement, wherein the actuator suite is subject to an actuator fault, and assessing, by a critic radial basis function neural network, the adjusted direction and orientation of each of the unmanned vehicles based on a feedback signal from a sensor suite, wherein the feedback signal includes corrupted signals from the sensor suite due to a deception attack.
14. The computer-readable storage medium of claim 13, further comprising resetting, by a performance function, a preassigned convergence time whenever a target formation maneuver changes to maintain transient states of each leader-follower tracking error within a predefined range.
15. The computer-readable storage medium of claim 14, further comprising: controlling the leader vehicles to maneuver in a coordinated time-varying formation including one or more of shape, direction, rotation, scaling, and translation; and controlling the follower vehicles to track positions of the leader vehicles to achieve the target formation maneuver.
16. The computer-readable storage medium of claim 14, further comprising controlling the leader-follower maneuvering in an affine formation of the plurality of unmanned vehicles.
17. The computer-readable storage medium of claim 14, further comprising: executing the performance function using performance-constrained backstepping control, wherein an initial control by the reinforcement learning neural network is used as an intermediate control input, and wherein optimal laws for the backstepping control are obtained from an approximate solution of a Hamilton-Jacobi-Bellman equation using the reinforcement learning.
18. A method for controlling a trajectory of coordinated time-varying maneuvers of a geometric formation of unmanned vehicles, the method comprising: controlling a leader-follower maneuvering of the geometric formation of a plurality of unmanned vehicles with a reinforcement learning neural network, wherein a subset of the unmanned vehicles are leader vehicles and remaining unmanned vehicles are follower vehicles, for the leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles to target positions of a trajectory, including estimating, by an identifier radial basis function neural network, nonlinear movement of the plurality of unmanned vehicles, adjusting, by an actor radial basis function neural network, direction and orientation of the respective unmanned vehicle by actions that control a respective actuator suite based on the estimated nonlinear movement, wherein the actuator suite is subject to an actuator fault, and assessing, by a critic radial basis function neural network, the adjusted direction and orientation of each of the unmanned vehicles based on a feedback signal from a sensor suite, wherein the feedback signal includes corrupted signals from the sensor suite due to a deception attack.
19. The method of claim 18, further comprising resetting, by a performance function, a preassigned convergence time whenever a target formation maneuver changes to maintain transient states of each leader-follower tracking error within a predefined range.
20. The method of claim 19, further comprising: executing the performance function using performance-constrained backstepping control, wherein an initial control by the reinforcement learning neural network is used as an intermediate control input, and wherein optimal laws for the backstepping control are obtained from an approximate solution of a Hamilton-Jacobi-Bellman equation using the reinforcement learning.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] A more complete appreciation of this disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
DETAILED DESCRIPTION
[0040] In the drawings, like reference numerals designate identical or corresponding parts throughout the several views. Further, as used herein, the words "a," "an," and the like generally carry a meaning of one or more, unless stated otherwise.
[0041] Furthermore, the terms "approximately," "approximate," "about," and similar terms generally refer to ranges that include the identified value within a margin of 20%, 10%, or preferably 5%, and any values therebetween.
[0042] Aspects of the present disclosure are directed to a system and method for affine formation maneuver control of nonlinear multi-agent systems, focusing particularly on resilience against actuator faults and deception attacks while maintaining prescribed performance during leader-follower maneuvers. Known formation maneuver control methods for multi-agent systems are limited by their inability to effectively address security threats such as deception attacks and physical faults in actuators, which significantly affect the stability and performance of the overall system.
[0043] For purposes of this disclosure, a multi-agent system can include an unmanned vehicle, particularly an unmanned aerial vehicle (UAV), as well as fleet vehicles that move in a coordinated fashion. The unmanned vehicle is not limited to an aerial vehicle, but can be an unmanned vehicle that travels under water, or in outer space, in a coordinated fashion. Also, the unmanned vehicles can include a combination of different types of unmanned vehicles, limited only by their capability to communicate with each other and perform maneuvering operations using an embedded control mechanism. Hereinafter, multi-agent systems will be referred to as unmanned vehicles.
[0044] The present disclosure provides a system for controlling the trajectory of coordinated, time-varying maneuvers of a geometric formation of unmanned vehicles. The system comprises multiple unmanned vehicles, including at least one leader vehicle and follower vehicles, wherein each unmanned vehicle is equipped with communication circuitry, an actuator suite, a sensor suite, a flight controller, and processing circuitry. The leader vehicles define the trajectory, while the follower vehicles track the leader to maintain the desired geometric formation. The flight controller is configured to maintain stable maneuvering, even when the actuator suite is subjected to faults and the sensor suite experiences a deception attack.
[0045] The system includes a reinforcement learning framework configured to estimate the nonlinear movement of the unmanned vehicles, adjust the movement based on the estimated dynamics, and evaluate the adjusted movement based on feedback from the sensor suite, which may include corrupted signals due to deception attacks. By implementing the reinforcement learning framework, the system dynamically adapts to disturbances, ensuring stable formation maneuvers under adverse conditions.
[0047] Unmanned vehicles, including unmanned aerial vehicles (UAVs), drones, or autonomous vehicles, are robotic systems that operate without human intervention. These vehicles can perform a wide range of tasks, from surveillance and reconnaissance to package delivery and agricultural monitoring. In the present system, the unmanned vehicles 102 are designed to operate in a coordinated geometric formation, which may involve a variety of maneuvers such as translation, rotation, scaling, and complex trajectory following. Each unmanned vehicle 102 can be equipped with multiple rotors, such as top-mounted rotors for aerial vehicles, to move the unmanned aerial vehicle forward, backward, left, and right by adjusting the speed of each rotor. The top-mounted rotors are driven by the actuator suite; other propulsion systems may be used depending on the type of vehicle and application. Examples include quadcopters, fixed-wing UAVs, or ground-based autonomous rovers.
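As a non-limiting illustration of how rotor-speed adjustment produces directional movement, the following sketch shows a conventional "X"-configuration motor-mixing rule for a quadcopter. The function name, rotor ordering, and sign convention are illustrative assumptions, not part of the disclosed system:

```python
def mix_quad_x(thrust, roll, pitch, yaw):
    """Map a collective thrust command and roll/pitch/yaw commands to four
    rotor-speed commands for an X-configuration quadcopter.
    Rotor order: front-left, front-right, rear-left, rear-right.
    Signs follow one common convention and are illustrative only."""
    return [
        thrust + roll + pitch - yaw,  # front-left  (clockwise)
        thrust - roll + pitch + yaw,  # front-right (counter-clockwise)
        thrust + roll - pitch + yaw,  # rear-left   (counter-clockwise)
        thrust - roll - pitch - yaw,  # rear-right  (clockwise)
    ]
```

A pure roll command raises the speed of the two left rotors and lowers the two right rotors by the same amount, so total thrust (and thus altitude) is unchanged while the vehicle banks.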
[0048] The communication circuitry in each unmanned vehicle 102 is configured for maintaining coordinated movement. Communication circuitry refers to the hardware and software components that facilitate data exchange between vehicles, ensuring all vehicles are aware of each other's positions, speed, and trajectory plans. This data exchange is essential for leader-follower coordination, where leader vehicles define the trajectory, and follower vehicles maintain their relative positions. Different types of communication can be implemented, including wireless communication protocols such as Wi-Fi, Zigbee, LoRa, and 5G cellular networks. For example, Wi-Fi may be employed for short-range, high-speed communication, whereas LoRa may be used for long-range communication with lower data rates in environments with limited infrastructure.
[0049] The user devices 104 and 106, depicted in
[0050] The system 100A also includes a database 108, which is responsible for storing flight-related data, including mission parameters, sensor data, actuator performance metrics, and historical flight records. This data can be used for post-flight analysis, performance optimization, and improving the resilience of the system against faults or cyber-attacks. For example, data stored in database 108 may be used to analyze the effects of actuator faults on vehicle stability or to assess the effectiveness of deception attack countermeasures. The database 108 can be implemented using various types of storage, including cloud-based storage solutions, local servers, or distributed storage systems. Different types of memory that can be utilized for storing the data include non-volatile memory such as solid-state drives (SSD), hard disk drives (HDD), flash memory, and magnetic tape storage, as well as volatile memory like random access memory (RAM) for temporary data processing.
[0051] Each unmanned vehicle 102 is equipped with an actuator suite and a sensor suite. The actuator suite is used to maintain and adjust the direction and orientation of the respective unmanned vehicle. For aerial vehicles, this may involve adjusting the speed of rotors to change altitude or direction, while for ground vehicles, it could involve controlling wheel motors or steering mechanisms. The sensor suite includes various sensors such as gyroscopes, accelerometers, and magnetometers, which provide real-time feedback on the vehicle's position, orientation, and movement. This sensor data is critical for maintaining stability and ensuring that each vehicle accurately follows the intended trajectory, particularly during complex maneuvers or when subjected to external disturbances.
[0052] The system 100A further employs processing circuitry configured for controlling the leader-follower maneuvering of the geometric formation of the plurality of unmanned vehicles 102. In a preferred embodiment, the processing circuitry integrates a reinforcement learning neural network, comprising an identifier radial basis function neural network, an actor radial basis function neural network, and a critic radial basis function neural network. The identifier neural network estimates the nonlinear movement dynamics of the unmanned vehicles, while the actor neural network adjusts the direction and orientation of the vehicles based on these estimates. The critic neural network evaluates the performance of the adjustments using feedback from the sensor suite, which may include corrupted signals due to deception attacks. It should be understood that the reinforcement learning framework can be configured using other types of neural networks or machine learning algorithms, depending in part on limitations of the hardware implementation. In some embodiments, reinforcement learning can be performed using an optimization control function.
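For illustration only, the radial basis function approximators underlying the identifier, actor, and critic networks can be sketched as Gaussian features with a linear-in-the-weights output and a gradient-style weight update. The function names, feature width, and learning rate below are assumptions made for the sketch, not the disclosed design:

```python
import math

def rbf_features(x, centers, width):
    """Gaussian radial basis features: phi_i(x) = exp(-||x - c_i||^2 / width^2)."""
    return [math.exp(-sum((xj - cj) ** 2 for xj, cj in zip(x, c)) / width ** 2)
            for c in centers]

def rbf_output(weights, features):
    """Linear-in-the-weights approximation W^T phi(x), as used by each of the
    identifier, actor, and critic networks."""
    return sum(w * p for w, p in zip(weights, features))

def update_weights(weights, features, error, rate=0.1):
    """One gradient-style adaptation step driven by a scalar approximation
    error (illustrative learning rule)."""
    return [w - rate * error * p for w, p in zip(weights, features)]
```

Repeatedly applying `update_weights` with the error between the network output and a target value drives the output toward the target, which is the basic mechanism the identifier, actor, and critic weight-tuning laws share.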
[0053] The leader-follower coordination ensures that the unmanned vehicles 102 maintain a cohesive formation during maneuvers, even in the presence of faults or attacks. The processing circuitry's use of reinforcement learning allows the system to dynamically adapt to changing environmental conditions or unforeseen disturbances, such as actuator faults or sensor deception attacks, thereby maintaining the stability of the formation. For example, if a deception attack alters the sensor feedback for one of the follower vehicles, the critic neural network can identify inconsistencies in the sensor data, allowing the processing circuitry to adjust the control signals to maintain proper formation.
[0054] The communication between unmanned vehicles 102, user devices 104 and 106, and database 108, along with the actuator and sensor suites, provides a comprehensive control system for coordinated time-varying maneuvers of unmanned vehicles. The system 100A, through its combination of leader-follower maneuvering, communication circuitry, reinforcement learning, and adaptive processing, ensures effective control of the geometric formation of unmanned vehicles in dynamic environments.
[0056] The system 100B includes multiple unmanned vehicles 102. For purposes of this disclosure, the unmanned vehicles are unmanned aerial vehicles (UAV) that do not carry persons. As noted above, the unmanned vehicles can also include vehicles that travel under water or in outer space. Each of the unmanned vehicles is configured with a communication circuitry 110, a sensor suite 112, an actuator suite 114, a flight controller 116, and a processing circuitry 118. The communication circuitry 110 is configured to facilitate communication between each unmanned vehicle in the system. The communication circuitry 110 is essential for the leader-follower configuration, where leader vehicles coordinate the movements of follower vehicles to maintain the geometric formation and achieve target positions along the desired trajectory. Communication between the unmanned vehicles 102 enables the unmanned vehicles 102 to adjust their positions in real-time based on flight control signals and maintain formation cohesion. The communication circuitry 110 may use wireless communication protocols such as Wi-Fi for short-range, high-speed communication, or LoRa for long-range communication in low-data-rate environments. The ability of the communication circuitry 110 to switch between different protocols based on environmental conditions ensures uninterrupted communication even in challenging terrains or when infrastructure is limited. For example, in an urban environment, Wi-Fi might be used to enable high-speed data exchange, whereas in rural or remote areas, LoRa can provide long-range connectivity with lower power consumption. Other communication technologies that may be used include Zigbee for mesh networking or 5G/6G cellular networks for high bandwidth and low latency.
[0057] The sensor suite 112 is integrated into the unmanned vehicle 102-1 and is configured for stabilizing and navigating the unmanned vehicle 102. The sensor suite 112 includes one or more sensors, such as gyroscopes, accelerometers, and magnetometers that gather real-time data about the vehicle's orientation, speed, and position. Gyroscopes provide data on rotational movements, accelerometers measure linear acceleration, and magnetometers help determine the direction relative to Earth's magnetic field. This sensor data is utilized for maintaining stable flight, particularly in environments where the vehicle may be subjected to actuator faults or deception attacks on the sensor inputs. Additionally, the sensor suite 112 may include obstacle detection sensors, such as LiDAR or ultrasonic sensors, to detect and avoid obstacles in the flight path, enhancing the safety and reliability of the formation. For example, a LiDAR sensor can create a 3D map of the surrounding environment to help the vehicle navigate complex terrains, while ultrasonic sensors can detect obstacles at short ranges, making them suitable for low-speed maneuvers or landing operations. Visual sensors, such as cameras, can also be part of the sensor suite, providing image data that can be used for visual navigation, obstacle avoidance, or target recognition.
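As one example of how gyroscope and accelerometer data may be combined for stabilization, a standard complementary filter is sketched below. The blend factor `alpha`, units, and function name are illustrative assumptions; the disclosure is not limited to this fusion method:

```python
def complementary_filter(angle_prev, gyro_rate, accel_angle, dt, alpha=0.98):
    """Fuse a gyroscope rate (deg/s) with an accelerometer-derived tilt
    angle (deg): integrate the gyro for short-term accuracy, and pull the
    estimate toward the accelerometer reading to cancel long-term drift."""
    return alpha * (angle_prev + gyro_rate * dt) + (1 - alpha) * accel_angle
```

With the gyro at rest, the estimate converges geometrically to the accelerometer-derived angle; during fast maneuvers, the gyro term dominates over any single update step.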
[0058] The actuator suite 114 in the unmanned vehicle 102-1 is configured to maintain and adjust direction and orientation of a respective unmanned vehicle 102. The actuator suite 114 adjusts the movement mechanisms of the vehicle, such as its rotors or movable fins, to execute the maneuvers for maintaining formation and trajectory. For aerial vehicles, the actuator suite 114 may include multiple top-mounted rotors, which can adjust the vehicle's altitude and direction by varying the rotor speeds. For example, if the unmanned vehicle 102 is a quadcopter, the actuator suite controls each of the four rotors independently to achieve the desired maneuver. In another example, a fixed-wing UAV may have an actuator suite 114 that includes movable control surfaces such as ailerons, rudders, and elevators to control the roll, yaw, and pitch of the vehicle. The actuator suite 114 works in conjunction with the flight controller 116 to ensure the unmanned vehicle 102 performs as expected, even when actuator faults occur, such as a rotor failure or reduced thrust due to mechanical issues. In the case of ground vehicles, the actuator suite 114 may include motors for wheel control and steering mechanisms to navigate across different terrains.
[0059] The flight controller 116 is configured for sending control signals to the actuator suite 114 and receiving feedback signals from the sensor suite 112. By configuring the flight controller 116, the unmanned vehicle 102 maintains stable maneuvering under challenging conditions, such as when the sensor suite 112 is subjected to deception attacks, or when the actuator suite 114 experiences malfunctions. The flight controller 116 processes real-time data from both the sensor suite 112 and the actuator suite 114 to generate appropriate control signals that ensure stable flight. For instance, if the sensor suite 112 detects a sudden deviation from the intended trajectory, the flight controller 116 immediately adjusts the actuator suite 114 to compensate for the deviation and bring the vehicle back on course. This closed-loop control mechanism is crucial for maintaining stability in dynamic and unpredictable environments. For example, if the vehicle encounters a sudden gust of wind that pushes it off course, the flight controller 116 adjusts the rotor speeds to counteract the disturbance and restore the intended trajectory. The flight controller 116 of each of the unmanned vehicles 102 executes program instructions to use sensor suite data to adjust the unmanned vehicle positioning and rotor speeds.
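The closed-loop correction described above can be illustrated with a minimal per-axis PID loop of the kind a flight controller might execute. The gains and class name are illustrative assumptions, and the disclosed controller is not limited to PID:

```python
class PidController:
    """Minimal per-axis PID loop: the tracking error drives proportional,
    integral, and derivative correction terms applied to the actuators."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt                      # accumulated error
        derivative = (error - self.prev_error) / dt      # error rate
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Driving a simple integrator plant with this loop pulls the measured state onto the setpoint, mirroring how the flight controller 116 counteracts a disturbance such as a wind gust.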
[0060] The processing circuitry 118 is configured for controlling the leader-follower maneuvering of the geometric formation. The processing circuitry 118 is configured to implement a reinforcement learning neural network that includes an identifier radial basis function neural network, an actor radial basis function neural network, and a critic radial basis function neural network. The identifier radial basis function neural network estimates nonlinear movements of the unmanned vehicle 102 by analyzing sensor data and predicting the vehicle's dynamic behavior. With such estimation, the processing circuitry 118 understands the current state of the vehicle and makes informed decisions regarding its movement. For example, the identifier neural network may predict the vehicle's response to a sudden change in wind speed, allowing the processing circuitry to pre-emptively adjust the control signals to maintain stability.
[0061] The actor radial basis function neural network adjusts the direction and orientation of the vehicle using the actuator suite 114, based on the estimated dynamics provided by the identifier radial basis function neural network. The actor radial basis function neural network determines the optimal control actions needed to maintain the desired trajectory and formation. For example, if the vehicle needs to change altitude to avoid an obstacle, the actor radial basis function neural network will compute the necessary rotor speed adjustments and send these commands to the actuator suite 114. In another instance, if the vehicle is part of a formation that needs to rotate to change its orientation, the actor radial basis function neural network calculates the adjustments required for each rotor or control surface to achieve the coordinated maneuver.
[0062] The critic radial basis function neural network assesses the adjusted direction and orientation of each of the unmanned vehicles based on a feedback signal from the sensor suite 112. This feedback may include corrupted signals resulting from deception attacks, where adversaries inject false data to disrupt the vehicle's control system. The critic radial basis function neural network analyzes discrepancies between expected and actual sensor readings to determine if the vehicle is deviating from its intended path. If inconsistencies are detected, the critic radial basis function neural network prompts the processing circuitry to adjust the control strategy, ensuring the vehicle continues to perform as required despite interference or faults. For example, if the sensor data suggests that the vehicle is drifting off course due to a deception attack, the critic radial basis function neural network identifies the anomaly and instructs the processing circuitry to apply corrective measures, such as recalibrating the control inputs.
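A simple stand-in for the critic's consistency check on possibly corrupted feedback is a residual test between the model-predicted state and the reported sensor value. The threshold and function name below are illustrative assumptions, not the disclosed detection logic:

```python
def detect_deception(predicted, reported, threshold=0.5):
    """Flag a measurement as suspect when the residual between the
    model-predicted state and the reported sensor value exceeds a
    threshold. Returns (is_suspect, residual)."""
    residual = abs(predicted - reported)
    return residual > threshold, residual
```

When a reading is flagged, the control strategy would fall back on the identifier network's prediction rather than the reported value, in the spirit of the corrective measures described above.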
[0063] The reinforcement learning neural network integrated within the processing circuitry 118 allows the unmanned vehicle 102 to adapt its behavior dynamically in response to changing environmental conditions. Adaptation includes learning a performance function that resets a preassigned convergence time whenever a target formation maneuver changes, thereby maintaining the transient states of each leader-follower tracking error within a predefined range. Such adaptability is particularly useful during complex maneuvers in a coordinated time-varying formation, such as changing the formation shape, direction, rotation, scaling, or translation, where the leader vehicles dictate the maneuver and the follower vehicles track their positions to achieve the target formation. For instance, if the leader vehicle initiates a scaling maneuver to expand the formation, the processing circuitry in each follower vehicle adjusts its movement to maintain the new distances between vehicles, ensuring the geometric formation is preserved.
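As an illustrative sketch (not the disclosed performance function), a prescribed-performance envelope with a preassigned convergence time that is reset whenever the target maneuver changes might look like the following. The envelope shape, exponent, and constants are assumptions made for the sketch:

```python
def performance_bound(t, t_reset, rho0=2.0, rho_inf=0.1, T=5.0):
    """Prescribed-performance envelope with a preassigned convergence time T:
    the bound on the tracking error shrinks from rho0 to rho_inf exactly
    T seconds after the most recent reset, then stays at rho_inf.
    t_reset is updated whenever the target formation maneuver changes,
    re-widening the envelope for the new transient."""
    tau = t - t_reset
    if tau >= T:
        return rho_inf
    return (rho0 - rho_inf) * ((T - tau) / T) ** 2 + rho_inf
```

Resetting `t_reset` at a maneuver change re-opens the envelope so the new transient is not penalized against the old, already-converged bound, which is how the transient states stay within the predefined range.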
[0064] The reinforcement learning neural network is also configured to learn a performance function through a performance-constrained backstepping control methodology. The reinforcement learning neural network initially generates an intermediate control input, which serves as a primary input for achieving stable control within the system in which backstepping control is applied.
[0065] In the backstepping control method, the system builds on the stability provided by the intermediate control input to systematically stabilize each outer control input in turn. Specifically, the reinforcement learning neural network assesses the stability requirements of the intermediate control. By following this structured stabilization process, the system progressively aligns each outer control input to achieve overall system stability and enhanced performance across the control layers.
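For a concrete, simplified illustration of backstepping with an intermediate (virtual) control input, consider the two-state chain x1' = x2, x2' = u. The gains and the quadratic-Lyapunov design below are textbook assumptions, not the disclosed RL-optimized controller:

```python
def backstepping_control(x1, x2, x1_ref, k1=2.0, k2=3.0):
    """Two-step backstepping for the chain  x1' = x2,  x2' = u.
    Step 1: a virtual (intermediate) control alpha = -k1*e1 would
    stabilize the outer state x1 if x2 could be assigned directly.
    Step 2: the actual input u drives x2 onto alpha, cancelling the
    cross-term that appears in the Lyapunov analysis."""
    e1 = x1 - x1_ref
    alpha = -k1 * e1           # intermediate control input
    z2 = x2 - alpha            # deviation of the inner state from alpha
    alpha_dot = -k1 * x2       # d(alpha)/dt for a constant reference
    u = alpha_dot - e1 - k2 * z2
    return u
```

In the disclosed method, the reinforcement learning network supplies the intermediate control (here the fixed law `alpha`) and the remaining laws are obtained from an approximate Hamilton-Jacobi-Bellman solution rather than fixed gains; the sketch shows only the layered structure.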
[0066] In addition to the onboard components, the system 100B may further comprise a ground-based controller configured with the processing circuitry 118 for centralized control of the leader-follower maneuvering. The ground-based controller can be used to set mission objectives, define trajectories, and provide real-time monitoring of the unmanned vehicles 102. Communication between the ground-based controller and the unmanned vehicles 102 may be implemented using wireless communication protocols, ensuring that mission-critical data is exchanged reliably. For example, the ground-based controller may be a laptop equipped with specialized software that allows the operator to input mission parameters, view real-time telemetry data, and make adjustments to the flight plan as needed. In scenarios where multiple unmanned vehicles 102 are deployed, the ground-based controller can also serve as a coordination hub, ensuring that all vehicles operate in sync to achieve the mission goals.
[0067] The unmanned vehicle 102 is also capable of storing flight-related data, such as sensor readings, actuator performance, and mission parameters, in onboard memory. This memory may include both volatile memory, such as RAM, for temporary data processing, and non-volatile memory, such as flash storage or SSDs, for long-term data retention. This stored data can be used for post-mission analysis, system diagnostics, and improving future mission performance through machine learning techniques. For example, after a mission, the data stored in non-volatile memory can be analyzed to identify patterns in actuator performance, which can then be used to optimize future control strategies. Additionally, flight data can be uploaded to a central database for further analysis, enabling the development of predictive maintenance schedules to reduce the likelihood of actuator or sensor failures during missions.
[0068] With integration of communication circuitry 110, sensor suite 112, actuator suite 114, flight controller 116, and processing circuitry 118, the unmanned vehicle 102 operates autonomously and maintains the intended formation, even under challenging conditions such as actuator faults, sensor deception attacks, or environmental disturbances. The use of reinforcement learning and adaptive control strategies further enhances the resilience and reliability of the unmanned vehicle system 100B, making it suitable for a wide range of applications, including surveillance, reconnaissance, and cooperative missions in dynamic environments. For example, in a surveillance mission, the unmanned vehicle can autonomously navigate through an urban area, avoiding obstacles and maintaining formation with other vehicles, while continuously transmitting live video feed to the ground-based controller for real-time monitoring.
[0069]
[0070] Each rotor is attached to a motor that forms part of the actuator suite. When actuated, the motors spin the rotors, producing thrust that enables the UAV to execute precise maneuvers and maintain stability. The motors are controlled via an electronic speed controller (ESC), which regulates the power delivered to each motor, ensuring smooth operation and consistent performance during flight. The rotors are essential for stabilizing the UAV in various flight conditions, including sudden shifts in wind or other environmental factors.
[0071] The UAV depicted in
[0072] Additionally, the UAV is equipped with a camera module that captures high-resolution visual data used for navigation, obstacle detection, or mission-specific tasks, such as surveillance or terrain mapping. The captured video or image data can be transmitted in real-time to a ground control station or user device for further analysis and control. This camera is integral for autonomous operations in complex environments, enabling the UAV to detect obstacles, adjust its flight path, and execute mission tasks without manual intervention.
[0073]
[0074] In this configuration, the sensor suite within the UAV, including devices such as gyroscopes, accelerometers, and magnetometers, continuously monitors the UAV's orientation, altitude, and motion. This sensor data is used to adjust the rotor speed and fin positions, ensuring stable flight and precise maneuvering, even in the presence of actuator faults or external disturbances.
[0075]
[0076] The curve 302 represents the initial transient behavior of the tracking error .sub.i at t=0. Initially, the error .sub.i is at .sub.i(0)=.sub.i0=6. As time progresses, the error decays and converges to the steady-state value .sub.is=0.3 at t=T.sub.s=0.5 seconds. The black curve shows the system's ability to stabilize the error within the specified settling time, demonstrating the initial performance of the proposed function in regulating the transient state.
[0077] The curve 304 shows the system's performance when a sudden decrease in the target trajectory occurs at t=4 seconds. At this point, the finite-time performance function resets the transient state to .sub.i(0)=.sub.i0 and begins to decay toward the steady-state value .sub.is. The red curve demonstrates the decay of the transient state after the reset, with the system settling within T.sub.s=0.5 seconds. This behavior shows the ability of the proposed function to confine the new transient state within a predefined range and ensure convergence within the prescribed finite time.
[0078] The curve 306 illustrates the system's behavior when another sudden change in the target trajectory occurs at t=8 seconds. Once again, the performance function resets the transient state to .sub.i(0)=.sub.i0, and the error decays over time. The system then settles at the steady-state value .sub.is=0.3 within T.sub.s=0.5 seconds after the transient state begins. This resetting behavior is analyzed to ensure that the system can handle multiple transient states in quick succession while maintaining accuracy and stability.
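The reset behavior described in these curves can be sketched numerically. The exact finite-time performance function of the disclosure (Eq. (18)) is not reproduced here; the polynomial decay below is an assumed illustrative form using the stated values (initial bound 6, steady-state bound 0.3, T.sub.s=0.5 s, resets at t=0, 4, and 8 seconds):

```python
def ftppf(t, t_n, rho0=6.0, rho_inf=0.3, Ts=0.5, k=2):
    """Illustrative finite-time performance bound with reset.

    tau is the time elapsed since the most recent transient at t_n;
    the bound decays from rho0 to rho_inf, settles exactly at
    tau = Ts, and then stays at rho_inf.
    """
    tau = t - t_n
    if tau >= Ts:
        return rho_inf
    return (rho0 - rho_inf) * ((Ts - tau) / Ts) ** k + rho_inf

resets = [0.0, 4.0, 8.0]   # new transients at t = 0, 4 s, 8 s

def bound(t):
    """Bound with the function reset at the most recent transient."""
    t_n = max(r for r in resets if r <= t)
    return ftppf(t, t_n)

assert bound(0.0) == 6.0   # bound starts at its initial value
assert bound(0.5) == 0.3   # settles at Ts = 0.5 s
assert bound(4.0) == 6.0   # reset at the sudden change at t = 4 s
assert bound(8.6) == 0.3   # settled again 0.6 s after the t = 8 s reset
```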
[0079]
[0080] The curve 308 represents the state trajectory p.sub.i as it tracks its target trajectory when there is a sudden change in amplitude at t=4 seconds. At this point, the system resets the transient state, and the proposed finite-time performance function ensures that the trajectory p.sub.i converges to the new steady-state value within the predefined settling time T.sub.s=0.5 seconds. The curve shows how the system reacts to the first change in the target trajectory and how the performance function handles this transition effectively.
[0081] The curve 310 illustrates the state trajectory p.sub.i when the target trajectory undergoes another sudden change at t=8 seconds. The performance function resets the transient state at this point, and the trajectory converges to the new steady state within the finite time T.sub.s=0.5 seconds.
[0082] In the embodiment described, a graph G=(V, E) is utilized to illustrate the communication among agents, which comprises a set of nodes V={v.sub.1, . . . , v.sub.N} and a set of edges E⊆V×V. The set of neighbors of an i.sup.th agent is indicated by N.sub.i={j: (i, j)∈E}. For an MAS of N agents in two-dimensional space whose positions are represented by p.sub.i∈R.sup.2, i=1, 2, . . . , N, the formation of the agents is denoted by (G, p) with p=[p.sub.1.sup.T, . . . , p.sub.N.sup.T].sup.T being the configuration of the formation.
[0083] Define (G, q) as the nominal formation of the agents, where q=[q.sub.1.sup.T, . . . , q.sub.N.sup.T].sup.T is the constant nominal configuration that the agents desire to form. The target configuration p* of the agents with various maneuvers is defined by:

p*(t)=[I.sub.N⊗A(t)]q+1.sub.N⊗b(t),  (1)

where A(t) realizes the time-varying scaling and rotation maneuvers of the whole formation while b(t) realizes the translation maneuver of the whole formation with respect to the nominal configuration q. Then, the target position of each agent from (1) is thus:

p.sub.i*(t)=A(t)q.sub.i+b(t), i=1, 2, . . . , N.  (2)
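The per-agent affine targets p.sub.i*=A(t)q.sub.i+b(t) implied by (1) can be sketched numerically; the nominal configuration, rotation angle, scale, and translation below are hypothetical values chosen only to illustrate the transform:

```python
import numpy as np

def affine_targets(q, A, b):
    """Per-agent targets p_i* = A q_i + b.

    q : (N, 2) nominal configuration
    A : (2, 2) scaling/rotation of the whole formation
    b : (2,)   translation of the whole formation
    """
    return q @ A.T + b

# Nominal square formation, rotated 90 degrees, scaled by 2, shifted.
q = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
theta = np.pi / 2
A = 2.0 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
b = np.array([5.0, 0.0])
p = affine_targets(q, A, b)
assert np.allclose(p[0], [5.0, 2.0])   # A @ [1, 0] + b = [0, 2] + [5, 0]
```

Varying A(t) and b(t) over time produces the scaling, rotation, and translation maneuvers of the whole formation described above.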
[0084] To realize the affine transformation of the formation (G, p), a stress .sub.ij=.sub.ji is assigned to the corresponding edge (i, j). If the stresses applied to the configuration are balanced, then the following relation holds:

Σ.sub.j∈N.sub.i .sub.ij(p.sub.j−p.sub.i)=0, i=1, 2, . . . , N.  (3)

[0085] Therefore, the stress is called an equilibrium stress. The stress matrix, denoted by Ω, is defined entrywise by [Ω].sub.ij=−.sub.ij for i≠j and [Ω].sub.ii=Σ.sub.j∈N.sub.i .sub.ij.  (4)
[0086] The configuration of the agents can be divided into leaders and followers as p=[p.sub.l.sup.T, p.sub.f.sup.T].sup.T, where p.sub.l is the group of leaders, p.sub.f is the group of followers, and N.sub.l and N.sub.f=N−N.sub.l are the numbers of leaders and followers, respectively.
[0087] The stress matrix can be partitioned according to the followers and leaders groups as:

Ω=[Ω.sub.ll, Ω.sub.lf; Ω.sub.fl, Ω.sub.ff],  (5)

where Ω.sub.ll, Ω.sub.ff, and Ω.sub.fl denote the leader-leader, follower-follower, and follower-leader blocks, respectively.
[0088] Definition 1: The nominal formation (G, q) with q affinely spanning R.sup.2 is said to be localizable if the target position p.sub.f* of the followers can be uniquely obtained from p.sub.l* as follows:

p.sub.f*=−Ω.sub.ff.sup.−1Ω.sub.fl p.sub.l*.  (6)

[0089] For Definition 1 to be valid, the nominal formation (G, q) is set such that its stress matrix Ω is positive semi-definite and satisfies rank(Ω)=N−d−1, and Ω.sub.ff is positive definite, with d=2 being the dimension of the agents in the Euclidean space.
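In the affine formation literature, the unique follower targets referenced in Definition 1 follow from the balanced-stress relation as p.sub.f*=−Ω.sub.ff.sup.−1Ω.sub.fl p.sub.l*; the sketch below assumes that form, with a toy one-follower/two-leader stress partition chosen for illustration:

```python
import numpy as np

def follower_targets(Omega_ff, Omega_fl, p_l):
    """Unique follower targets implied by the balanced-stress relation.

    With the stress matrix partitioned into leader/follower blocks and
    Omega_ff positive definite, Omega_ff p_f + Omega_fl p_l = 0 gives
    p_f = -Omega_ff^{-1} Omega_fl p_l.
    """
    return -np.linalg.solve(Omega_ff, Omega_fl @ p_l)

# Toy partition: one follower held at the midpoint of two leaders
# (stresses 1, 1 on the leader edges, diagonal entry 2).
Omega_ff = np.array([[2.0]])
Omega_fl = np.array([[-1.0, -1.0]])
p_l = np.array([[0.0, 0.0], [4.0, 2.0]])   # two leader positions in R^2
p_f = follower_targets(Omega_ff, Omega_fl, p_l)
assert np.allclose(p_f, [[2.0, 1.0]])      # midpoint of the leaders
```

Because the follower position is a linear function of the leader positions, any affine maneuver applied to the leaders propagates exactly to the follower targets.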
[0090] In a first assumption,
is the vector of the target formation maneuvers of the agents, with
In this disclosure, it can be assumed that the leaders have obtained the desired formation maneuvers i.e.
Therefore, the control design for the leaders will be ignored. The control aim is now to realize
as t.fwdarw.T.sub.s, with T.sub.s being a finite settling time.
[0091] The leader-follower MAS under consideration consists of N.sub.l leaders and N.sub.f followers. The leaders are governed by the following dynamic equations:

{dot over (p)}.sub.i=v.sub.i, {dot over (v)}.sub.i=u.sub.i,  (7)

where p.sub.i, v.sub.i, and u.sub.i, i=1, 2, . . . , N.sub.l, represent the positions, velocities, and control inputs of the leaders, respectively.
[0092] The followers are described by second-order nonlinear dynamic equations as follows:
where p.sub.i and v.sub.i
represent the positions and velocities of the agents, respectively, p.sub.i
and v.sub.i
are the positions and velocities of the agents under deception attacks, .sub.i(t, p.sub.i) and .sub.i(t, v.sub.i) are state-dependent deception attacks satisfying .sub.i(t, p.sub.i)=w.sub.p.sub.
, .sub.i
represent the faulty actuator, h.sub.i(p.sub.i, v.sub.i)
is an unknown continuous nonlinear function, g.sub.i
.sup.2 is a diagonal matrix of unknown input gains, .sub.i(t)
are the external disturbances.
[0093] In a second assumption, the deception attack coefficients are .sub.p.sub.
[0094] Let .sub.p.sub.
[0095] The model of the actuator fault .sub.i is described as:
where u.sub.i is the control signal of each agent, and .sub.i=[.sub.i1, .sub.i2].sup.T
is the vector of bias fault, m.sub.i=diag{m.sub.i1, m.sub.i2}
is a diagonal matrix of control effectiveness factors, and 0<m.sub.i1, m.sub.i2≤1.
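Per this fault model, the delivered actuation combines a loss-of-effectiveness factor and a bias term; a minimal numerical sketch (the numeric values are illustrative, in the style of the simulation's m.sub.i1=0.9, m.sub.i2=0.8):

```python
import numpy as np

def faulty_actuation(u, m, eta):
    """Actuator fault model: delivered input = m * u + eta.

    m   : diagonal control-effectiveness factors, 0 < m_k <= 1
    eta : bias-fault vector (m = I, eta = 0 means a healthy actuator)
    """
    return np.diag(m) @ u + eta

u = np.array([1.0, -2.0])
healthy = faulty_actuation(u, m=[1.0, 1.0], eta=np.zeros(2))
faulty = faulty_actuation(u, m=[0.9, 0.8], eta=np.array([0.1, 0.0]))
assert np.allclose(healthy, u)
assert np.allclose(faulty, [1.0, -1.6])   # 0.9*1 + 0.1, 0.8*(-2) + 0
```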
[0096] Considering (9) and (10), (8) can be expressed as follows:
[0097] Defining:
[0098] The global form of (11) is thus:
[0099] Define the following error variables for the followers:
where
and
as in (6).
[0101] Neural network approximations, specifically Radial Basis Function Neural Networks (RBFNNs), are implemented to manage the nonlinear functions that emerge from the design of the disclosed control strategy. In real-world multi-agent systems, it is challenging to represent complex nonlinear dynamics in closed-form equations suitable for real-time computation. RBFNNs approximate these functions, enabling accurate and efficient control.
[0102] The smooth and continuous nonlinear functions derived from the design of the disclosed control strategy are approximated with radial basis function neural networks (RBFNNs) as follows:
where W.sub.i=[w.sub.i1 w.sub.i2 . . . w.sub.im].sup.T is the weight vector, m is the number of nodes, .sub.i(X.sub.i) is the RBFNN reconstruction error satisfying .sub.i(X.sub.i).sub.l, .sub.i(X.sub.i)=[.sub.i1(X.sub.i) .sub.i2(X.sub.i) . . . .sub.im(X.sub.i)].sup.T is the vector of basis functions with:
where .sub.1k is the receptive center of the Gaussian function, .sub.2k is the Gaussian function width.
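A Gaussian basis of this kind can be sketched as follows; the exact exponent convention (e.g., dividing by the squared width versus twice the squared width) is an assumption, since only the center and width parameters are named above:

```python
import numpy as np

def rbf_basis(X, centers, width):
    """Gaussian basis vector phi(X) of an RBFNN.

    phi_k(X) = exp(-||X - c_k||^2 / width^2); the network output is
    W^T phi(X) for a weight vector W of length m = len(centers).
    """
    d2 = np.sum((centers - X) ** 2, axis=1)
    return np.exp(-d2 / width ** 2)

# m = 5 nodes with centers equally spaced on a line, width 1.5.
centers = np.linspace(-3, 3, 5).reshape(-1, 1)
W = np.zeros(5)
X = np.array([0.0])
phi = rbf_basis(X, centers, width=1.5)
assert phi.shape == (5,)
assert np.isclose(phi[2], 1.0)    # basis peaks at its own center
assert np.isclose(W @ phi, 0.0)   # zero weights give zero network output
```

The weight vector W is what the identifier, actor, and critic update laws adapt online; the basis itself stays fixed.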
[0103] The specified performance is attained by confining each leader-follower tracking error .sub.i, i=N.sub.l+1, . . . , N.sub.l+N.sub.f within the following preassigned boundary.
where .sub.i>0 and .fwdarw.
is the prescribed performance function characterized as a positive, smooth, and decreasing function.
[0104] In a second definition, a smooth function .sub.i(t): .fwdarw.
is said to be a finite time prescribed performance function if it is characterized by a) .sub.i(t)>0; b) {dot over ()}.sub.i(t)<0; c) lim.sub.t.fwdarw.t.sub.
[0105] The finite time prescribed performance function proposed in this study is defined by:
[0107] The inequality (17) can be transformed into an equality form using the following error transformation:
where e.sub.i is the transformed error, () is a smooth function and :(.sub.i,
[0108] The time derivative of the transformed error yields
where
[0109] To better demonstrate the important features of the performance function, a simple example is presented with sudden changes in amplitude at t.sub.n=4 s and t.sub.n=8 s. The trajectory p.sub.i experiences new transient states at the times t=t.sub.n=4 s and t=t.sub.n=8 s. It is required that p.sub.i settles within T.sub.s seconds after every new transient state. The finite time performance function (18) can be reset, and the new transient states of the tracking error can be constrained, whenever the amplitude of the target trajectory changes.
[0110] Initially, t=t.sub.n=0 and .sub.i(0)=.sub.i0=6. It is expected that as t grows, .sub.i(t) converges to .sub.is=0.3 at t=T.sub.s=0.5 s. Then, .sub.i(t)=.sub.is for t≥T.sub.s.
[0111] When there is a sudden decrease in the target trajectory at t=t.sub.n=4 s, t−t.sub.n=0 and the finite time performance function is reset to .sub.i(0)=.sub.i0 to confine the new transient state within the predefined range. As t grows (i.e., t−4>0), the finite time performance function decays and settles at t−4=0.5 s. For t−4≥0.5 s, .sub.i(t)=.sub.is.
[0112] On the other hand, when the target trajectory suddenly increases at t=t.sub.n=8 s, t−t.sub.n=0. At this moment, the finite time performance function is reset such that .sub.i(0)=.sub.i0 and begins decaying for t−8>0 until t−8=0.5 s, where it settles at .sub.i(t)=.sub.is. Subsequently, for t−8≥0.5 s, .sub.i(t)=.sub.is.
[0113]
[0114] Curve 402 represents the initial behavior of the tracking error .sub.i when the performance function is unable to reset the transient state after a sudden change in the target trajectory. Initially, the tracking error .sub.i follows the predefined trajectory, but at t=4 seconds, the system becomes unstable due to the inability of the performance function to adjust to the new transient state. Curve 402 shows the system's failure to maintain stability when faced with sudden reference signal changes.
[0115] Curve 404 shows the tracking error .sub.i experiencing instability after the sudden change at t=4 seconds. The lack of a reset mechanism in the existing performance functions prevents the system from controlling the new transient state, resulting in an erratic response. Curve 404 shows that the tracking error fails to converge to the desired steady-state after the target trajectory shifts, reinforcing the need for a modified performance function that can reset and accommodate multiple transient states.
[0116] Curve 406 further demonstrates the inability of the system to handle the sudden change in the target trajectory. The error remains outside the desired range, and the system cannot stabilize the transient state, leading to prolonged instability. Curve 406 indicates the limitations of existing performance functions (22)-(29) in handling sudden shifts in the target trajectory and tracking errors over time.
[0117]
[0118] Curve 408 shows the behavior of the state trajectory pi as it initially tracks the target trajectory p.sub.i without issues. However, at t=4 seconds, the target trajectory p.sub.i changes abruptly. Due to the limitations of performance functions (22)-(29), the system cannot reset the transient state of the tracking error, leading to instability in the state trajectory. The inability to reset and stabilize is clearly demonstrated in this curve, as the trajectory pi becomes erratic after the sudden change.
[0119] Curve 410 represents the expected trajectory of p.sub.i had the system been able to properly handle the sudden change in the target trajectory at t=4 seconds. Curve 410 shows that the state trajectory would have stabilized if the performance function could reset to accommodate the new transient state. However, the existing performance functions fail to achieve this reset, resulting in divergence from the target trajectory. The divergence shows the core limitation of performance functions (22)-(29) in applications where mobile agents experience multiple transient states.
[0120] Optimal backstepping control laws are obtained from the approximate solution of the Hamilton-Jacobi-Bellman equation using reinforcement learning under the identifier-actor-critic neural networks. Subsequently, reinforcement learning-based optimized secure backstepping control can be realized for nonlinear leader-follower multi-agent systems with deception attacks and actuator faults to be resilient and realize various affine formation maneuvers. The objective of the controller is to ensure that the closed-loop system is bounded despite the actuator faults and the deceptive state signals injected by cyber-attackers.
Defining:
The global form of (21) is given by:
According to the backstepping approach, v.sub.f will be treated as the intermediate control input. Let .sub.f be the virtual control. Then, define the error z.sub.f=v.sub.f−.sub.f, and its time-derivative is calculated as:
[0121] For the purpose of a control scheme, the error dynamics (30) and (31) are transformed as follows:
where
A performance index function associated with e.sub.f and .sub.f is defined as follows:
where
is the cost function. The optimal performance index I*(e.sub.f) associated with the optimal virtual controller
is given by:
where is the set of admissible control inputs. From (35), the following Hamilton-Jacobi-Bellman (HJB) equation can be derived:
where
is the gradient of
along e.sub.f. Suppose the solution of (36) exists and is unique, the optimal control policy
can be achieved by computing
Considering (36) and (37), it is clear that
is required to obtain the solution of (36). Nevertheless, due to the unknown deception attack signals and strong nonlinearities, it is impossible to solve (36) analytically. To achieve the control objective, an RL actor-critic framework is employed to obtain the approximate solution online.
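As a closed-form sanity check of the HJB machinery (an illustrative scalar toy, not the disclosed multi-agent system): for the system x' = u with cost ∫(x² + u²)dt, the value function V(x)=x² with policy u*(x)=−x=−(1/2)dV/dx makes the HJB residual vanish identically, which is exactly the structure the actor-critic networks approximate when no closed form exists:

```python
import numpy as np

# HJB residual for x' = u, running cost x^2 + u^2:
#   H(x) = x^2 + u*(x)^2 + dV/dx * u*(x)
# With V(x) = x^2 the minimizing control is u* = -(1/2) dV/dx = -x,
# and H(x) = x^2 + x^2 - 2x^2 = 0 for every x.
for x in np.linspace(-2.0, 2.0, 9):
    dV = 2.0 * x          # gradient of V(x) = x^2
    u = -0.5 * dV         # minimizing control u* = -x
    H = x**2 + u**2 + dV * u
    assert abs(H) < 1e-12
```

When the dynamics are unknown and attacked, no such closed-form V exists, which is why the identifier, critic, and actor RBFNNs are used to drive an approximate residual toward zero online.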
By using some mathematical manipulations,
can be expressed as follows:
where
Inserting (38) into (37) gives:
Since the continuous functions Y and
are unknown, RBFNNs are employed to approximate them in the following form:
where
are the ideal weights, .sub.(X.sub.) and .sub.lp(X.sub.lp) are the vectors of basis functions, and .sub.(X.sub.) and .sub.lp(X.sub.lp) are the RBFNN approximation errors.
Substituting (40) and (41) into (38) and (39), respectively:
where .sub.p=.sub.(X.sub.)+.sub.lp(X.sub.lp).
The optimal control input (43) cannot be used because of the unknown weights
The identifier RBFNN is used to estimate the unknown deceptive signals as:
where .sub. is the weight of the identifier, and {circumflex over ()} is the output of the identifier. The RBFNN weight of the identifier
is updated online by:
where .sub.p is a positive-definite matrix and .sub.p>0 is a small constant.
To obtain the optimized controller, the actor, critic, and identifier framework is developed:
where
is the estimate of
are the estimates of the critic and actor weights, respectively.
Substituting (46) and (47) into (36), the approximated HJB equation is derived as:
The Bellman's residual error {tilde over (H)}.sub.p is expressed as:
Considering (36), (49) becomes:
With regards to (48), it is desired that
realizes.
If
is ensured, the following also hold:
To derive the weight updating laws for .sub.a.sub.
Clearly, S.sub.p=0 is equivalent to (51). Noting that S.sub.p/.sub.a.sub.
The RBFNN weight of the critic is updated as follows:
where .sub.c.sub.
The RBFNN weight of the actor is updated as follows:
where .sub.a.sub.
Using the approximate optimal controller (47), the error dynamics (32) can be rewritten as:
A Lyapunov candidate function is selected for the e.sub.f subsystem as:
where
are the RBFNN weight errors for the identifier, actor, and critic networks, respectively.
[0122] Differentiating L.sub.1 with respect to time and considering (45), (54), and (55), one gets:
Inserting (40) and (47) into (58), one gets:
Equation (59) can be re-expressed as follows:
By utilizing Young's inequality, we have:
Substituting the Young's inequality (61) into (60) gives:
Noting that
the following facts are valid:
Combining (63)-(65) with (62) yields:
According to Young's inequality, the following relationship is valid
then
The parameters K.sub.p, .sub.c.sub.
Therefore, based on the selected parameters, (68) becomes:
where
which satisfies and
>0 is a constant. After designing the optimized virtual control
the next step is designing the actual control u. Equation (33) can be rewritten with the approximate
as follows:
A performance index function associated with z.sub.f and u is defined as follows:
where
is the cost function. Let u* be the optimal control and the corresponding optimal performance index I*(z.sub.f) is constructed as:
where is the set of admissible control inputs. The HJB equation associated with (73) is given by:
Following the same procedure as in step 1, solving
gives
To realize the control objective, the term
is expressed as follows:
where
Inserting (76) into (75) gives:
The unknown continuous functions F and
can be approximated by RBFNNs as follows:
where
are the ideal weights, .sub.F(X.sub.F) and .sub.Iv(X.sub.Iv) are the vectors of basis functions, and .sub.F(X.sub.F) and .sub.Iv(X.sub.Iv) are the RBFNN approximation errors.
Inserting (78) and (79) into (76) and (77), respectively gives:
where .sub.v=2.sub.F(X.sub.F)+.sub.Iv(X.sub.Iv).
The optimal control law (81) is unavailable since the weights
are unknown. To obtain a usable optimized control, an RL architecture using the identifier, critic, and actor networks is constructed as:
where
are the estimates of the identifier, critic, and actor weights, respectively, {circumflex over (F)} and
are the estimates of F and
respectively.
The RBFNN weights of the identifier, critic, and actor are updated by the following update laws:
where .sub.v is a positive-definite matrix, .sub.a.sub.
A Lyapunov candidate function is selected for the z.sub.v subsystem as:
where
are the RBFNN weight errors for the identifier, actor, and critic networks, respectively.
Taking the time derivative of L.sub.2 and using (85), (86), and (87), one has:
Inserting (82) and (84) into (89), one gets:
Evaluating (90) yields:
From Young's inequality, one gets:
Substituting the Young's inequality (92) into (91) yields:
Using
the following facts are valid:
Combining (94)-(96) with (93) yields:
Based on Young's inequality, one gets:
By utilizing (98), one gets:
The parameters C.sub.v, K.sub.v, .sub.c.sub.
Therefore, based on the selected parameters, (99) becomes:
where
which satisfies , and
>0 is a constant.
The inequality (100) can be rewritten as:
where
is the minimum eigenvalue of (K.sub.p−C.sub.v/2−1.25),
is the minimum eigenvalue of (K.sub.v−C.sub.v/2−1.75),
is the minimum eigenvalue of .sub.Ip.sup.T.sub.f.sub.ff.sub.Ip,
is the minimum eigenvalue of
is the maximum eigenvalue of
is the maximum eigenvalue of
From (101), one can obtain:
where
[0123] In a first theorem, consider the second-order nonlinear multi-agent system (12) with unknown nonlinear dynamics under deception attacks and actuator faults. By using the prescribed performance function (22), the error dynamics (30) and (31), the reinforcement learning-based optimized virtual controller (47), and the optimized overall controller (84) with identifier update laws (45) and (85), critic update laws (54) and (86), and actor update laws (55) and (87), the tracking errors in the closed-loop system are bounded and the leader-follower affine formation maneuvers are realized.
Integrating both sides of (102) gives:
For all t≥T.sub.s, the inequality (103) shows that all the error signals e.sub.f, z.sub.f, and {tilde over (W)}.sub.a.sub. remain bounded in a compact set.
[0124] It is obvious that increasing or decreasing
in (103) will aid in reducing
and subsequently makes the compact set smaller. This means that one may select the parameter
large enough to make e.sub.f, z.sub.f, {tilde over (W)}.sub.a.sub.
[0125] Compared to the work of affine formation maneuver control of linear multi-agent systems, this disclosure considers the issue of affine formation maneuver control of nonlinear multi-agent systems with preassigned performance. In addition, the disclosed technique is able to counter deception attacks and actuator faults.
[0126] Multiple adaptive laws are used to estimate the upper bounds of the
However, the attack signals are usually time-varying and adaptive laws can only estimate constants. By transforming the multi-agent system to the form in (28) and (29), neural networks approximate the lumped functions (p.sub.f, v.sub.f, .sub.p, .sub.v) and F(p.sub.f, v.sub.f, .sub.p, .sub.v, u), which account for the time-varying attack signals and uncertain dynamics.
[0127]
[0128] The topology 502 consists of seven agents, where agents 1, 2, and 3 represent the leaders, while agents 4, 5, 6, and 7 are designated as the followers. The directed connections between the agents are depicted by lines, representing the nominal formation structure used to maintain coordination during the system's maneuvers.
[0129] Agent 1 is one of the leaders in the system, located on the right side of the topology. It is in communication with agents 2, 3, 4, 5, 6, and 7 via directed edges. These communication connections indicate that Agent 1 is configured for controlling the formation and influencing the motion of all followers, as well as the other leaders. Agent 1, in one implementation, serves as a primary controller in the leader-follower dynamic.
[0130] Agent 2 is another leader in the topology, in communication with agents 1, 4, 5, and 7. These connections demonstrate its role in assisting Agent 1 in maintaining the overall formation, controlling several followers and interacting with Agent 1. The connectivity of Agent 2 ensures that it contributes to the stability and coordination of the system during motion.
[0131] Agent 3, also a leader, is in communication with agents 1, 4, and 6. Similar to Agents 1 and 2, Agent 3 influences the motion of certain followers and maintains direct communication with Agent 1 to ensure proper coordination of the leader-follower structure. The connectivity of Agent 3 provides additional redundancy and robustness to the system's control scheme.
[0132] Agent 4 is a follower in communication with leaders 1, 2, and 3. The directed edges between Agent 4 and the leaders indicate that it receives control signals and adjusts its position within the formation accordingly. The interactions of Agent 4 with all three leaders show its contribution to maintaining the geometric structure of the formation.
[0133] Agent 5, another follower, is in communication with leaders 1 and 2. The communication pathways between Agent 5 and the leaders ensure that it remains coordinated within the formation, following the control signals sent by Agents 1 and 2. The connections also indicate that Agent 5 is influenced by multiple leaders, contributing to the overall stability of the system.
[0134] Agent 6, a follower, is in communication with leaders 1 and 3. These connections indicate that Agent 6 is guided by the control inputs from both leaders to maintain its position and trajectory within the formation. The presence of multiple connections to leaders ensures that Agent 6 can respond appropriately to changes in the leader trajectories.
[0135] Agent 7, the final follower, is in communication with leaders 1 and 2. Like the other followers, Agent 7 receives control signals from multiple leaders, ensuring that it remains properly aligned within the formation during maneuvers of the system. The directed edges between Agent 7 and the leaders demonstrate its dependency on the leaders for maintaining its position in the topology.
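The communication pattern described in paragraphs [0128]-[0135] can be captured as neighbor sets and sanity-checked programmatically; this is a direct transcription of the stated links, not part of the disclosed method:

```python
# Neighbor sets N_i for the seven-agent topology described above
# (agents 1-3 are leaders, agents 4-7 are followers).
neighbors = {
    1: {2, 3, 4, 5, 6, 7},
    2: {1, 4, 5, 7},
    3: {1, 4, 6},
    4: {1, 2, 3},
    5: {1, 2},
    6: {1, 3},
    7: {1, 2},
}

# The stated links are mutual: j in N_i iff i in N_j.
for i, N_i in neighbors.items():
    for j in N_i:
        assert i in neighbors[j]

# Every follower is connected to at least two leaders, providing the
# redundancy noted in the description.
leaders, followers = {1, 2, 3}, {4, 5, 6, 7}
assert all(len(neighbors[f] & leaders) >= 2 for f in followers)
```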
[0136] A numerical example is illustrated herein to show the efficacy of the various embodiments of the present disclosure. The simulations are carried out on MATLAB/Simulink. The multi-agent system in this study consists of three leaders (N.sub.l=3) and four followers (N.sub.f=4) interacting over the nominal formation topology depicted in
where
[0137] The target formation maneuvers of the leaders are described by the trajectories in (104). Various maneuvers of the leaders' nominal configurations can be obtained by manipulating A(t) and b(t).
[0138] As per the first assumption,
[0139] The motion of the followers is described by the following second-order nonlinear multi-agent systems.
where
the loss of effectiveness faults are given by m.sub.i1=0.9, m.sub.i2=0.8, and the bias faults are given by .sub.i1=0.1 sin (2t)exp(−0.67t), .sub.i2=0.1 cos (2t)exp(−0.02t).
[0141] The compromised sensor signals of the followers are given as:
[0143] The prescribed performance function is designed as:
Whenever t=t.sub.n, the prescribed performance function resets to .sub.i(t)=2.
[0145] The stress matrix is computed using the approach as follows:
[0146] A Gaussian function is chosen as the activation function of the radial basis function neural networks. Each of the identifier, actor, and critic neural networks contains five nodes. The centers of the identifier, actor, and critic neural networks are equally spaced within [−3, 3], [−5, 5], and [−5, 5], respectively. The widths of the identifier, actor, and critic neural networks are 1.5, 2.5, and 2.5, respectively. The initial weights of the identifier, actor, and critic neural networks are chosen as .sub.i(0)=0.1·1.sub.5×1 and .sub.iF(0)=0.1·1.sub.5×1, .sub.ia.sub.
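This network setup can be sketched directly; the center intervals are assumed to be [−3, 3] and [−5, 5] (the signs are garbled in the extracted text), and the initial-weight value 0.1 on a 5-by-1 vector follows the description above:

```python
import numpy as np

def make_rbfnn(interval, width, m=5, w0=0.1):
    """Build one RBFNN configuration per the setup above: m equally
    spaced Gaussian centers on the given interval, a shared width,
    and initial weights w0 * ones(m)."""
    lo, hi = interval
    centers = np.linspace(lo, hi, m)
    weights = np.full(m, w0)
    return centers, width, weights

# Assumed intervals: identifier on [-3, 3], actor/critic on [-5, 5].
identifier = make_rbfnn((-3.0, 3.0), 1.5)
actor = make_rbfnn((-5.0, 5.0), 2.5)
critic = make_rbfnn((-5.0, 5.0), 2.5)
assert np.allclose(identifier[0], [-3.0, -1.5, 0.0, 1.5, 3.0])
assert np.allclose(actor[2], 0.1)   # initial weights 0.1 * ones(5)
```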
[0147] The parameters of the optimized virtual and real controllers are given as
[0148] The simulation results are illustrated with reference to subsequent figures.
[0149]
[0150] Curve 602 illustrates the initial response of the virtual controller for the first follower. It shows that the virtual controller experiences minor oscillations before stabilizing around t=5 seconds. Curve 602 reflects how a virtual controller of the first follower effectively suppresses the disturbances and maintains stability. Curve 604 represents a virtual controller of the second follower, which follows a similar pattern to curve 602. After initial fluctuations, a virtual controller of the second follower stabilizes and remains steady for the remainder of the time period. Curve 606 demonstrates the virtual controller for the third follower. Curve 606 shows a series of transient states before the controller achieves stability after t=10 seconds, indicating the robustness of the system in handling fluctuations. Curve 608 depicts a virtual controller of the fourth follower, which also encounters initial disturbances but stabilizes shortly thereafter, demonstrating the efficiency of the proposed control mechanism in mitigating transient disturbances.
[0151]
[0152] Curve 610 illustrates the virtual controller for the first follower, demonstrating minor oscillations before reaching stability around t=5 seconds, indicating the controller's effective performance in the y-axis. Curve 612 represents the second follower's virtual controller, stabilizing after a similar period of transient disturbances as seen in curve 610. The virtual controller settles into a steady state, showing the system's ability to handle initial fluctuations. Curve 614 tracks the virtual controller for the third follower, which exhibits larger oscillations but stabilizes after approximately 10 seconds. Curve 614 shows the resilience of the controller in the y-axis under the influence of disturbances. Curve 616 depicts a virtual controller of the fourth follower in the y-axis, showing that, despite transient fluctuations, the controller stabilizes after a brief period, mirroring the patterns seen in the other followers.
[0153]
[0154] Curve 702 represents a control input of the first follower in the x-axis. The curve indicates a rapid correction after the initial disturbance, achieving stability around t=5 seconds and showing effective control input handling. Curve 704 tracks a control input of the second follower, which follows a similar trend as curve 702, stabilizing after initial fluctuations and demonstrating the efficiency of the control system. Curve 706 represents a control input of the third follower, showing significant disturbances between t=20 seconds and t=30 seconds, but eventually stabilizing, indicating how the control input corrects for larger fluctuations. Curve 708 depicts a control input of the fourth follower, which follows a similar pattern of early disturbances and subsequent stabilization, reinforcing an ability of the system to manage control inputs effectively.
[0155]
[0156] Curve 710 shows a control input of the first follower in the y-axis, stabilizing quickly after initial disturbances around t=5 seconds, indicating robustness of the system in the y-axis. Curve 712 represents a control input of the second follower in the y-axis, following a similar pattern of stabilization after early fluctuations, confirming the effectiveness of the control mechanism. Curve 714 tracks a control input of the third follower, which experiences larger fluctuations but stabilizes over time, demonstrating an ability of the system to correct significant disturbances. Curve 716 reflects a control input of the fourth follower, following the pattern of initial disturbances followed by stabilization, showing consistent control input behavior across multiple agents.
[0157]
[0158] Curve 802 represents a trajectory of the first leader, showing smooth transitions after initial disturbances and tracking a stable path. Curve 804 represents a trajectory of the second leader, which closely follows a path of the first leader, showing minor deviations but maintaining stability. Curve 806 depicts a trajectory of the third leader, which also stabilizes after initial fluctuations, maintaining a formation of the leader. Curve 808 tracks a trajectory of the first follower, demonstrating effective tracking of the leaders, with only minor deviations corrected over time.
[0159] Curve 810 represents a trajectory of the second follower, showing similar tracking performance with stable behavior after early disturbances. Curve 812 shows a trajectory of the third follower, which mirrors the performance of the other followers, stabilizing quickly after initial fluctuations. Curve 814 reflects a trajectory of the fourth follower, demonstrating consistent tracking of the leaders with minor deviations corrected smoothly.
[0160]
[0161] Curve 816 shows a trajectory of the first leader in the y-axis, indicating smooth transitions after initial disturbances, following a stable path. Curve 818 represents a trajectory of the second leader, which follows a similar pattern of stability after early fluctuations. Curve 820 depicts a trajectory of the third leader, which stabilizes after early disturbances, maintaining alignment with the other leaders. Curve 822 represents a trajectory of the first follower, demonstrating effective tracking of the leaders with minor deviations that are corrected over time. Curve 824 tracks a trajectory of the second follower, showing similar tracking performance and stability after initial disturbances. Curve 826 reflects a trajectory of the third follower, which quickly stabilizes after minor deviations. Curve 828 demonstrates a trajectory of the fourth follower, showing consistent tracking of the leaders with small corrections.
[0162]
[0163] Curve 902 represents the tracking error for the first leader. Initially, the system encounters minor fluctuations in tracking, but the tracking error stabilizes after t=10 seconds, indicating efficient control over the x-axis for the first leader. Curve 904 shows the tracking error for the second leader. Similar to curve 902, the tracking error of the second leader experiences slight disturbances, followed by stabilization, showing the ability of the system to mitigate transient states in the x-axis.
[0164] Curve 906 represents a tracking error of the third leader. After experiencing more significant deviations around t=20 seconds, the tracking error stabilizes as the system adjusts to maneuvers of the leader. Curve 908 represents the first follower's tracking error in the x-axis. The curve reflects slight instability before achieving a steady state after t=15 seconds, demonstrating effective control over a position of the first follower. Curve 910 shows the tracking error for the second follower. The system corrects significant deviations in a trajectory of the second follower, stabilizing the error around t=25 seconds. Curve 912 depicts a tracking error of the third follower, which undergoes transient fluctuations before stabilizing after t=30 seconds, showing the system's success in managing the tracking errors of the followers.
[0165]
[0166] Curve 918 represents the third leader's tracking error, which experiences fluctuations around t=15 seconds but stabilizes over time, showing the system's ability to manage transient disturbances in the y-axis. Curve 920 depicts the first follower's tracking error, showing initial oscillations before stabilizing around t=20 seconds, reflecting the system's control over the y-axis for the first follower. Curve 922 tracks the second follower's tracking error in the y-axis, showing more substantial deviations but eventually stabilizing after t=25 seconds, indicating the system's capacity to correct errors. Curve 924 represents the third follower's tracking error, stabilizing after t=30 seconds, demonstrating that the system effectively manages transient errors in the y-axis.
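The bounded transient behavior described above can be illustrated with a brief sketch of a resetting prescribed performance envelope of the kind this disclosure employs. The specific functional form, parameter values, and class name below are illustrative assumptions, not the claimed function: the tracking error is kept within |e(t)| < ρ(t), where ρ(t) shrinks from an initial bound to a steady-state bound within a preassigned convergence time T and re-opens whenever the target maneuver changes.

```python
import numpy as np


class ResettingPPF:
    """Illustrative prescribed performance function (PPF) with a
    preassigned convergence time T: rho decays from rho0 to rho_inf
    by t = T, and can be reset whenever the target maneuver changes,
    re-opening the error envelope |e(t)| < rho(t).  This polynomial
    form is an assumption for illustration only."""

    def __init__(self, rho0=2.0, rho_inf=0.05, T=10.0, power=3):
        self.rho0, self.rho_inf, self.T, self.power = rho0, rho_inf, T, power
        self.t_reset = 0.0          # time of the most recent maneuver change

    def reset(self, t):
        """Restart the envelope when the formation maneuver alters."""
        self.t_reset = t

    def rho(self, t):
        tau = t - self.t_reset
        if tau >= self.T:
            return self.rho_inf     # steady-state bound after convergence
        frac = (self.T - tau) / self.T
        return (self.rho0 - self.rho_inf) * frac ** self.power + self.rho_inf


ppf = ResettingPPF()
# envelope shrinks to its steady-state value by t = 10 s,
# then re-opens after a maneuver change at t = 12 s
bounds = [ppf.rho(0.0), ppf.rho(10.0)]
ppf.reset(12.0)
bounds.append(ppf.rho(12.0))
```

Resetting the envelope at each maneuver change is what keeps the new transient tracking errors, such as those shown in curves 918 through 924, within predefined bounds.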
[0167]
[0168] Curve 1002 represents the trajectories of the leader agents during the affine formation maneuvers. Curve 1002 shows that despite the presence of external disturbances, such as deception attacks and actuator faults, the leaders maintain their predefined formation shape, indicated by the consistent path shown in the graph. Curve 1004 illustrates the trajectories of the follower agents. While the followers experience slight deviations, particularly when subjected to actuator faults and deception attacks, curve 1004 shows that the system's control strategy enables the followers to track the leaders effectively, maintaining the desired formation shape.
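The affine formation maneuvers referenced above can be sketched briefly as follows. The concrete formation, transform values, and function name are hypothetical illustrations; the sketch shows only how a single affine pair (A, b) simultaneously rotates, scales, and translates every agent's nominal position, which is how the formation shape is maneuvered as a whole.

```python
import numpy as np


def affine_targets(nominal, A, b):
    """Map nominal 2-D formation positions through the affine transform
    p_i(t) = A(t) @ p_i* + b(t); rotation, scaling, shearing, and
    translation of the whole formation are encoded by the pair (A, b)."""
    return nominal @ A.T + b


# Hypothetical nominal square formation of four agents.
nominal = np.array([[1.0, 1.0],
                    [-1.0, 1.0],
                    [-1.0, -1.0],
                    [1.0, -1.0]])

theta = np.pi / 4                        # rotate the formation 45 degrees
scale = 0.5                              # shrink it to half size
A = scale * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
b = np.array([10.0, 5.0])                # translate the formation center

targets = affine_targets(nominal, A, b)
```

In the disclosed leader-follower scheme, the leaders realize the time-varying pair (A(t), b(t)), and the followers track the resulting target positions even under actuator faults and deception attacks.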
[0169]
[0170] Curve 1102 depicts the first leader's trajectory in the x-axis. The curve shows a smooth decline over time, indicating controlled movements of the leader along the x-axis as the system maintains stability. Curve 1104 represents a trajectory of the second leader, which follows a similar path to the first leader, demonstrating coordinated control between the leaders in the x-axis. Curve 1106 illustrates a trajectory of the third leader. The curve reflects consistent tracking of the formation maneuver, with minor deviations corrected over time. Curve 1108 represents a trajectory of the first follower, demonstrating effective tracking of the leaders and maintaining proximity to the predefined formation. Curve 1110 shows a trajectory of the second follower, which aligns closely with the other followers, indicating effective control. Curve 1112 represents a trajectory of the third follower, which also follows a stable path after initial deviations. Curve 1114 depicts a trajectory of the fourth follower, which maintains consistent tracking of the leaders with minor adjustments over time.
[0171]
[0172]
[0173]
[0174] In
[0175] Further, the claimed advancements are not limited by the form of the computer-readable media on which the instructions of the inventive process are stored. For example, the instructions may be stored in FLASH memory, Synchronous Dynamic Random Access Memory (SDRAM), Random Access Memory (RAM), Read Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), solid-state hard disk or any other information processing device with which the processing circuit 1326 communicates, such as a server or computer.
[0176] Further, the claimed advancements may be provided as a utility application, background daemon, or component of an operating system, or combination thereof, executing in conjunction with MPU 1300 and a mobile operating system such as Android, Microsoft Windows 10 Mobile, Apple iOS and other systems known to those skilled in the art.
[0177] To implement the processing circuit 1326, the hardware elements may be realized by various circuitry elements known to those skilled in the art. For example, MPU 1300 may be a Qualcomm mobile processor, an Nvidia mobile processor, an Atom processor from Intel Corporation of America, a Samsung mobile processor, or an Apple A7 mobile processor, or may be other processor types that would be recognized by one of ordinary skill in the art. Alternatively, the MPU 1300 may be implemented on a Field-Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), Programmable Logic Device (PLD) or using discrete logic circuits, as one of ordinary skill in the art would recognize. Further, MPU 1300 may be implemented as multiple processors cooperatively working in parallel to perform the instructions of the inventive processes described above.
[0178] The processing circuit 1326 in
[0179] The processing circuit 1326 includes a Universal Serial Bus (USB) controller 1325 which may be managed by the MPU 1300.
[0180] The processing circuit 1326 further includes a display controller 1308, such as an NVIDIA GeForce GTX or Quadro graphics adaptor from NVIDIA Corporation of America for interfacing with display 1310. An I/O interface 1312 interfaces with buttons 1314, such as for volume control. In addition to the I/O interface 1312 and the display 1310, the processing circuit 1326 may further include a microphone 1341 and one or more cameras 1331. The microphone 1341 may have associated circuitry 1340 for processing the sound into digital signals. Similarly, the camera 1331 may include a camera controller 1330 for controlling image capture operation of the camera 1331. In an exemplary aspect, the camera 1331 may include a Charge Coupled Device (CCD). The processing circuit 1326 may include an audio circuit 1342 for generating sound output signals and may include an optional sound output port.
[0181] The power management and touch screen controller 1320 manages power used by the processing circuit 1326 and touch control. The communication bus 1322 may be an Industry Standard Architecture (ISA), Extended Industry Standard Architecture (EISA), Video Electronics Standards Association (VESA), Peripheral Component Interconnect (PCI), or similar bus for interconnecting all of the components of the processing circuit 1326. A description of the general features and functionality of the display 1310, buttons 1314, as well as the display controller 1308, power management controller 1320, network controller 1306, and I/O interface 1312 is omitted herein for brevity as these features are known.
[0182]
[0183] In some embodiments, the computer system 1400 may include a server CPU and a graphics card by NVIDIA, in which the GPUs have multiple CUDA cores. In some embodiments, the computer system 1400 may include a machine learning engine 1412.
[0184] The present disclosure introduces an optimized, secure, fault-tolerant control strategy with prescribed performance for affine formation maneuvers in nonlinear, second-order, leader-follower multi-agent systems subject to actuator faults and deception attacks. A novel prescribed performance function is proposed, characterized by a preassigned convergence time, capable of resetting whenever the target formation maneuver alters, thereby maintaining the new transient states of leader-follower tracking errors within predefined bounds. Subsequently, an optimized backstepping control approach is developed for the system, leveraging a streamlined identifier-actor-critic reinforcement learning framework. Within this scheme, the identifier network estimates the system's nonlinear dynamics, the actor network executes the control actions, and the critic network evaluates the control performance.
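As a non-limiting illustration of the identifier-actor-critic structure described above, the sketch below builds a shared Gaussian radial basis function feature layer and adapts only the identifier weights with an LMS-style rule. The centers, width, learning rate, and target nonlinearity are assumptions for illustration; the full actor and critic adaptation laws of the optimized backstepping scheme are not reproduced.

```python
import numpy as np


def rbf_features(x, centers, width=1.0):
    """Gaussian radial basis functions phi_j(x) = exp(-||x - c_j||^2 / width^2),
    used here as the shared feature layer of the identifier, actor, and
    critic networks (centers and width are illustrative choices)."""
    d2 = np.sum((x - centers) ** 2, axis=1)
    return np.exp(-d2 / width ** 2)


rng = np.random.default_rng(0)
centers = rng.uniform(-2, 2, size=(16, 2))   # 16 RBF centers in state space

# Each network is a linear combination W^T phi(x) of the same features:
W_identifier = np.zeros((16, 2))  # estimates the unknown nonlinear dynamics f(x)
W_actor = np.zeros((16, 2))       # would produce the control correction u(x)
W_critic = np.zeros(16)           # would score long-run tracking performance


def identifier_step(x, f_true, lr=0.1):
    """One illustrative gradient-style update: the identifier fits f(x).
    In the full scheme the actor and critic weights would also be adapted
    from a Bellman-style error; only the identifier update is sketched."""
    global W_identifier
    phi = rbf_features(x, centers)
    err = f_true - phi @ W_identifier          # identifier estimation error
    W_identifier += lr * np.outer(phi, err)    # LMS-style weight adaptation


# fit a simple stand-in nonlinearity f(x) = [sin(x1), cos(x2)]
for _ in range(500):
    x = rng.uniform(-1, 1, size=2)
    identifier_step(x, np.array([np.sin(x[0]), np.cos(x[1])]))
```

The streamlined structure of the disclosure arises because all three networks share this single feature evaluation per state, so estimating the dynamics, executing the control action, and evaluating performance add only linear-in-the-weights computation.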
[0185] The above-described hardware description is a non-limiting example of corresponding structure for performing the functionality described herein.
[0186] Numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that the invention may be practiced otherwise than as specifically described herein.