SPIN-STABILIZED STEERABLE PROJECTILE CONTROL

20240280352 · 2024-08-22

Abstract

A computer-implemented method of training a machine learning, ML, algorithm to control spin-stabilized steerable projectiles is described. The method comprises: obtaining training data including respective policies and corresponding trajectories of a set of spin-stabilized steerable projectiles including a first projectile, wherein each policy relates to steering a projectile of the set thereof towards a target and wherein each corresponding trajectory comprises a series of states in a state space of the projectile (S2001); and training the ML algorithm comprising determining relationships between the respective policies and corresponding trajectories of the projectiles of the set thereof based on respective results of comparing the trajectories and the targets (S2002).

Claims

1. A computer-implemented method of training a machine learning, ML, algorithm to control spin-stabilized steerable projectiles, the method comprising: receiving training data including respective policies and corresponding trajectories of a set of spin-stabilized steerable projectiles, wherein each respective policy relates to steering a projectile of the set towards a target of one or more targets, and wherein each corresponding trajectory comprises a series of states in a state space of the projectile; and training the ML algorithm comprising determining relationships between the respective policies and corresponding trajectories of the projectiles of the set based on respective results of comparing the trajectories and the one or more targets.

2. The method according to claim 1, wherein the series of states includes finite states of the projectile.

3. The method according to claim 1, wherein steering the projectile towards the target comprises actioning state transitions of the projectile.

4. The method according to claim 1, wherein each corresponding trajectory comprises a series of portions correlating with the series of states.

5. The method according to claim 1, wherein comparing the trajectories and the targets comprises determining accuracies and/or precisions of the trajectories with respect to the targets.

6. The method according to claim 1, comprising programmatically generating the projectiles.

7. The method according to claim 1, comprising measuring the trajectories.

8. The method according to claim 1: wherein the ML algorithm comprises and/or is a reinforcement learning, RL, agent; and wherein training the ML algorithm comprises training the agent, comprising: (a) actioning, by the agent, a given projectile of the set according to a respective policy, wherein the policy is of an action space of the agent, comprising steering the given projectile towards a target, thereby defining a corresponding trajectory comprising a series of states in a state space of the given projectile and thereby obtaining respective training data; (b) determining a relationship between the policy and the trajectory based on a result of comparing the trajectory and the target and updating the policy based on the result; and (c) repeating steps (a) and (b) for the set of projectiles, using the updated policy.

9. A computer-implemented method of controlling a fired spin-stabilized steerable projectile, the method comprising: controlling, by a machine learning, ML, algorithm trained according to claim 1, the fired spin-stabilized steerable projectile according to a policy, comprising steering the fired spin-stabilized steerable projectile towards a target.

10. The method according to claim 8, wherein the given projectile comprises a front ogive section, an aft section and a command module in communication with the agent; wherein the front ogive section is rotatably connected to the aft section by a coupling device and wherein the front ogive section comprises an asymmetric surface; and wherein angular rotation of the front ogive section is selectively adjustable relative to the aft section by commands from the command module, responsive to the agent, to the coupling device, whereby the asymmetric surface exerts an imbalance upon the given projectile to control the trajectory of the given projectile.

11. The method according to claim 10, wherein the given projectile is arrangeable in: a first arrangement, wherein the coupling device is coupled, whereby the front ogive section spins at the same angular rotation as the aft section and wherein the given projectile travels in a first helical trajectory; and a second arrangement, wherein the coupling device is decoupled, whereby the front ogive section spins at a different angular rotation relative to the aft section and wherein the given projectile travels in a second helical trajectory, wherein the first helical trajectory comprises a smaller radius than the second helical trajectory; wherein the first arrangement and the second arrangement are respectively represented by a first state and a second state in the state space of the given projectile.

12. A system comprising a steerable projectile and a computer, the system comprising: a processor and a memory; and a machine learning, ML, algorithm trained according to claim 1, and stored in the memory and executable by the processor: wherein the projectile comprises a front ogive section, an aft section and a command module communicable with a trained machine learning, ML, algorithm; wherein the front ogive section is rotatably connected to the aft section by a coupling device and wherein the front ogive section comprises an asymmetric surface; and wherein angular rotation of the front ogive section is selectively adjustable relative to the aft section by commands from the command module, responsive to the trained ML algorithm, to the coupling device, whereby the asymmetric surface exerts an imbalance upon the projectile to control the trajectory of the projectile.

13. The system according to claim 12, comprising a targeting system.

14. A non-transient computer-readable storage medium comprising instructions which, when executed by a computer comprising a processor and a memory, cause the computer to perform a process for training a machine learning, ML algorithm to control spin-stabilized steerable projectiles, the process comprising: receiving training data including respective policies and corresponding trajectories of a set of spin-stabilized steerable projectiles, wherein each respective policy relates to steering a projectile of the set towards a target of one or more targets, and wherein each corresponding trajectory comprises a series of states in a state space of the projectile; and training the ML algorithm comprising determining relationships between the respective policies and corresponding trajectories of the projectiles of the set based on respective results of comparing the trajectories and the one or more targets.

15. The non-transient computer-readable storage medium according to claim 14, wherein: the series of states includes finite states of the projectile; steering the projectile towards the target comprises actioning state transitions of the projectile; each corresponding trajectory comprises a series of portions correlating with the series of states; and/or comparing the trajectories and the targets comprises determining accuracies and/or precisions of the trajectories with respect to the targets.

16. The non-transient computer-readable storage medium according to claim 14, the process comprising: programmatically generating the projectiles; and/or measuring the trajectories.

17. The non-transient computer-readable storage medium according to claim 14, wherein the ML algorithm comprises and/or is a reinforcement learning, RL, agent, and wherein training the ML algorithm comprises training the agent, the process comprising: (a) actioning, by the agent, a given projectile of the set according to a respective policy, wherein the policy is of an action space of the agent, comprising steering the given projectile towards a target, thereby defining a corresponding trajectory comprising a series of states in a state space of the given projectile and thereby obtaining respective training data; (b) determining a relationship between the policy and the trajectory based on a result of comparing the trajectory and the target and updating the policy based on the result; and (c) repeating (a) and (b) for the set of projectiles, using the updated policy.

18. A system for controlling a fired spin-stabilized steerable projectile, the system comprising: a machine learning, ML, algorithm trained according to claim 14, wherein: the fired projectile comprises a front ogive section, an aft section and a command module in communication with the trained ML algorithm; the front ogive section is rotatably connected to the aft section by a coupling device and wherein the front ogive section comprises an asymmetric surface; and angular rotation of the front ogive section is selectively adjustable relative to the aft section by commands from the command module, responsive to the ML algorithm, to the coupling device, whereby the asymmetric surface exerts an imbalance upon the fired projectile to control the trajectory of the fired projectile.

19. The system according to claim 18, wherein: the fired projectile is arrangeable in a first arrangement, wherein the coupling device is coupled, whereby the front ogive section spins at the same angular rotation as the aft section and wherein the fired projectile travels in a first helical trajectory; the fired projectile is arrangeable in a second arrangement, wherein the coupling device is decoupled, whereby the front ogive section spins at a different angular rotation relative to the aft section and wherein the fired projectile travels in a second helical trajectory, wherein the first helical trajectory comprises a smaller radius than the second helical trajectory; and the first arrangement and the second arrangement are respectively represented by a first state and a second state in the state space of the fired projectile.

20. A non-transient computer-readable storage medium comprising instructions which, when executed by a computer comprising a processor and a memory, cause the computer to perform a process for training a machine learning, ML algorithm to control spin-stabilized steerable projectiles, the process comprising: receiving training data including respective policies and corresponding trajectories of a set of spin-stabilized steerable projectiles, wherein each respective policy relates to an action for steering a given projectile of the set towards a target, and wherein each corresponding trajectory comprises a series of states in a state space of the given projectile; actioning, by the ML algorithm, the given projectile of the set according to a respective policy, thereby defining a corresponding trajectory comprising a series of states in a state space of the given projectile and thereby obtaining respective training data; determining a relationship between the policy and the trajectory based on a result of comparing the trajectory and the target and updating the policy based on the result; and repeating the actioning and determining for the set of projectiles, using the updated policy.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0143] For a better understanding of the invention, and to show how exemplary embodiments of the same may be brought into effect, reference will be made, by way of example only, to the accompanying diagrammatic Figures, in which:

[0144] FIG. 1 schematically depicts a reinforcement learning (RL) agent;

[0145] FIG. 2 schematically depicts a RL agent using an actor/critic RL algorithm;

[0146] FIGS. 3A and 3B schematically depict effects of adding a reward proportional to distance;

[0147] FIGS. 4A and 4B schematically depict effects of adding a time penalty; FIG. 5 schematically depicts a reward function;

[0148] FIGS. 6A and 6B schematically depict reward function dependency on time t and error rate ḋ;

[0149] FIG. 7 schematically depicts a Simulink block diagram for velocity based actuation mechanism and Guidance Laws;

[0150] FIG. 8 shows training for the first 1×10^5 episodes of the implementation test using a DQN agent;

[0151] FIG. 9 schematically depicts a 1D latax problem;

[0152] FIG. 10 is a graph of reward as a function of episode for training for basic 1D control model;

[0153] FIG. 11 is a graph of reward as a function of episode for training for single channel environment and actuator lags;

[0154] FIG. 12 is a graph of positional error as a function of time, showing performance trajectory for trained agent with 0.02 s actuator lag;

[0155] FIG. 13 schematically depicts a projectile according to an exemplary embodiment;

[0156] FIG. 14 schematically depicts a force diagram of the projectile of FIG. 13;

[0157] FIGS. 15A and 15B schematically depict a helix trajectory plot of a rifled projectile;

[0158] FIG. 16 schematically depicts a system of a rifled projectile fired from an artillery gun according to an exemplary embodiment;

[0159] FIG. 17 shows a system of a rifled projectile fired from a hand held weapon according to an exemplary embodiment;

[0160] FIG. 18A is a graph of terminal dispersion for projectiles according to a conventional ballistic guidance law; and FIG. 18B is a graph of terminal dispersion for projectiles controlled according to an exemplary embodiment;

[0161] FIG. 19 schematically depicts a method according to an exemplary embodiment; and

[0162] FIG. 20 schematically depicts a method according to an exemplary embodiment.

DETAILED DESCRIPTION OF THE DRAWINGS

[0163] Generally, machine learning (ML) is concerned with the development of algorithms which a computer can use to complete a task optimally, without being given explicit instructions on how to complete said task. Reinforcement learning (RL) is a specific type of ML, where an agent derives an optimal policy to maximise the future cumulative reward of every possible action.

[0164] FIG. 1 shows a diagram for operation of a typical RL agent. The agent uses a certain policy π(s, a) to evaluate which action a it should take given the current state s of the system, in order to maximise its expected reward r. To determine what the optimum policy should be, it uses a reinforcement learning algorithm to change the policy by comparing the actual to the expected reward. RL algorithms include model-based RL and model-free RL.

[0165] In model-based RL, the agent learns or is provided with a function which maps state transitions. In the majority of cases, a ground truth model will not be available to the agent; i.e. the model used to represent the system may not perfectly represent the real-world environment. Where the model is not a perfect representation of ground-truth, there are usually biases in the model, which the agent may exploit to maximise rewards, but which may not translate to real-world performance.

[0166] Q-learning and policy optimisation are both types of model-free reinforcement learning algorithms. Policy optimisation methods represent the policy in terms of neural network parameters θ, i.e. policy π_θ(a|s). The policy π_θ is then maximised against the neural network parameters θ using, for example, either gradient ascent or local maximisation. This allows optimisation for any chosen set of θ, but may be hindered if the end performance of the model cannot be quantified in terms of the chosen neural network parameters θ. In Q-learning, the agent uses a Q-value Q(s, a) in addition to the policy. The Q-value of a given action represents the expected reward from all successive actions in the current state. The action with the highest Q-value Q(s, a) is indicated to lead to the highest cumulative reward. We define the optimal action-value function Q*(s, a) as a function that returns the highest average Q-value Q(s, a) of every action given the current state. Q-learning methods learn an approximator function Q_θ(s, a) which is updated during training so that the Q-value Q(s, a) more accurately represents the reward and approaches the optimal action-value function. This update process may use the Bellman equation:

[00001] $Q(s, a)_{new} = Q(s, a) + \alpha\big[r(s, a) + \gamma\,\max_a Q^*(s, a) - Q(s, a)\big]$  (1)

with learning rate α, reward r(s, a), discount factor γ and updated Q-value Q(s, a)_new.
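
For illustration, a minimal MATLAB sketch of this tabular update is given below; the state/action sizes, the sampled transition, and the learning-rate and discount values are placeholders, not figures from the specification.

% Illustrative tabular Q-learning update implementing Equation (1).
% Q is a lookup table mapping state-action pairs to Q-values; s, a, sNext
% are integer indices and r is the observed reward (all placeholder values).
numStates  = 10;
numActions = 2;
Q = zeros(numStates, numActions);

alpha = 0.1;                        % learning rate (assumed value)
gamma = 0.99;                       % discount factor (assumed value)

s = 1; a = 2; sNext = 3; r = -0.5;  % one observed transition (placeholders)

% Bellman update: move Q(s,a) towards the reward plus the discounted best
% value obtainable from the next state.
target  = r + gamma * max(Q(sNext, :));
Q(s, a) = Q(s, a) + alpha * (target - Q(s, a));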

TABLE 1: Terminology for the Q-learning environment

Action (a ∈ A): A parameter of the environment the agent can change in order to cause a desirable change.

Actor: A neural network responsible in place of the operational policy. In Deep Q-learning the agent can be comprised of an actor and critic network.

Agent: The operator which takes an action to change the state of the environment.

Critic: A neural network which evaluates the performance of the actor based on the new reward and is capable of adjusting the actor to optimise the new reward.

Discount factor (γ): A scalar factor between 0 and 1 which sets the importance of immediate (γ → 0) or long-term (γ → 1) reward.

Environment: The external place where the system evolves.

Learning rate (α): A parameter which determines the step size of each iteration when minimising a loss function.

Loss function: A function which maps an event to a real number characterising the cost associated with that particular event.

Observation: Parameters or measurements of the system which are taken from the environment and passed to the agent.

Policy (π(s|a)): The decision-making function used by the agent which calculates the action that provides the maximum reward.

Reward: A scalar value representing the desirability of a state.

RL Algorithm: The method used by the agent to optimise the policy according to the received reward.

Q-value (Q(s, a)): The cumulative reward which would be expected if an agent in a state s performed an action a and continued to operate under the current policy π.

Reward Function: An equation which calculates the reward for a given state/action combination.

State (s ∈ S): A particular combination of a given observation and actions.

[0167] In relatively simple systems, there may be a computationally manageable number of states and actions within the environment. A common approach for the Q-function is to use a lookup table, which maps every state-action pair to a Q-value and which is then updated with every iteration of the loop. If the system is significantly more complicated, or the states are continuous, then a lookup table is no longer practical and a new function must be used. Deep Q-Network (DQN) learning is a variant of Q-learning which uses a neural network to approximate the Q-function for a given state-action pair. There are also many other variants of Q-learning, such as Fuzzy Q-learning.

[0168] Another common structure in RL algorithms is the actor-critic method. Here, the actor can be thought of as the traditional policy π(s, a), which determines the optimal action to maximise the expected reward given the current state. In general, the critic will in some way evaluate how well the actor is performing and will provide the actor with feedback to adjust its performance; i.e. the critic will compute a value function, which assists the actor in learning the optimal policy. Q-learning is an example of this, where the Q-value is what encapsulates the actor performance information. The critic takes the current state and the action from the actor and uses these to compute an expected reward. It then compares the expected value to the actual reward once the action output by the actor has been fed to the environment. Common algorithms using actor-critic methods include A2C and the A3C algorithm used by DeepMind. FIG. 2 shows a DQN agent using an algorithm where neural networks are used for both the actor and critic. Using neural networks instead of traditional functions allows the agent to handle a very large domain.

[0169] A general problem with reinforcement learning is that an agent which is perfectly trained in the virtual environment will completely fail to perform when it is implemented in a real-world system. This is because even the most accurate model is still not accurate enough to portray the stochastic nature of the real world. To combat this, a methodology is being used where the observations are intentionally perturbed during the training process to emulate real-world noise. This is done by means of an adversary, which introduces perturbations according to its own policy. It has been shown that, under such circumstances, algorithms can be written which are able to successfully mitigate the impact the perturbations have on the training procedure.

[0170] When machine learning is being used for control, it is advantageous to make the system it must control as simple as possible. A complex system requires a large neural network to be able to process the different system states and interpret the correlation between desirable actions and the specific parameter set which caused them. Also, the environment the agent is trained in should be as similar as possible to the environment it will operate in.

[0171] This problem is well suited to the application of an AI controller. Deep Q-learning agents have been demonstrated to perform at least as well as, if not considerably better than, humans in a variety of arcade-style games. Deep Deterministic Policy Gradient (DDPG) methods allow continuous control of multiple actions, which can be used here for a GL implementation.

[0172] The concept of reinforcement learning is that the agent will, for a given system state, use a policy to determine which action it should take to maximise a reward. This reward is calculated from the reward function R. The reward function does not have to contain any of the observations the agent makes of the environment, or any states of the system. Since the reward is computed externally, the reward may be completely arbitrary: the purpose of the reward function is to characterise the required behaviour of the agent. It can use a reward to reinforce good behaviour or a penalty to penalise undesirable behaviour. In general, rewards incentivise the agent to keep doing what it is doing to accumulate reward, while penalties cause the agent to attempt to reach a terminal state as quickly as possible to minimise loss.

[0173] By design, the policy used by the agent should maximise the expected reward by any means necessary. Quite characteristic of machine learning is the concept of a local minimum, where the agent has learnt to exploit a particular aspect of the environment to increase its short-term reward. It is possible for the agent to continue exploration and navigate out of this local minimum, but the agent may continue the exploitation if the training does not contain sufficient episodes. Alternatively, the gradient between the local minimum and the global maximum may be so great that the chance of the agent exploring through it is very low, even with sufficient episodes. As such, the reward function should be chosen very carefully and may even require different iterations after observing the results of agent training.

[0174] If rewards are only given for achieving a goal, the agent may never fully explore the environment to attain the reward, and even if it does, it may happen very slowly. To rectify this, additional reward can be given for behaviour which tends towards the final goal, but even this must be carefully chosen. If the reward is given in finite chunks then the same problem will arise as with only rewarding success: the agent will learn much more slowly. As such, the given reward for good behaviour should be continuous where possible, with a bonus given for success. This is the same for penalties, where bad behaviour should be penalised continuously, with a substantial negative reward accompanying a failure. A penalty henceforth refers to a reward penalty, i.e. a penalty of -5 does not equate to a reward of +5; rather, a penalty of -5 is the same as a reward of -5 but with the associated negative connotations. A common idea is to reduce the scope of the search by prematurely terminating an episode if the parameters stray outside a certain range, where a large penalty will accompany the termination. This should be tested during implementation, as a successfully trained agent should still achieve its goal when operating outside of the given range.

[0175] Take for example a simplified robot golf, where an agent must move a ball around a field with the aim of dropping it into a target hole. Initially, a reward will be given for achieving the goal of getting the ball in the hole, which is a success. Equally, there is no point exploring millions of miles away from the hole. If, for example, the ball strays further than 10 m away, then the episode can be terminated along with a substantial penalty.

[0176] One could provide a reward directly proportional to the distance from the hole in addition to a lump-sum reward for achieving the goal. This incentivises the ball to move closer toward the hole. Unfortunately, the agent is able to exploit this system in two ways. Firstly, the agent could control the ball to orbit the hole, to indefinitely accumulate a mediocre reward (FIG. 3A). Alternatively, the agent could move the ball straight past the hole, to maximise the reward in one quick burst but never actually achieve success (FIG. 3B).

[0177] Hence, a temporal aspect may be added to the reward by penalising the agent for the time it takes to complete the episode. If the ball continues to orbit the hole, the agent will be penalized (FIG. 4A). There is a larger probability the agent will explore alternative options and find a more ideal path toward the hole. If the goal of an agent is to avoid failure as opposed to achieving a goal, as in the inverted pendulum system, then a reward might be given for each consecutive time-step the system does not fail.

[0178] A notable case of exploitation is where the penalty for terminating an episode early is small compared to the reward for moving straight past the hole. This, combined with a penalty for taking a long time to achieve the objective, causes the agent to move the ball past the hole and fly outside the search range as fast as possible. This is referred to as a dive-bomb (FIG. 4B). It maximises the reward and terminates the episode early to stop the agent being penalised. From this, it can be deduced that the reward magnitude for moving towards success should be significantly smaller than the magnitude of the penalty for premature termination, which should in turn be significantly smaller than the reward for achieving the goal.

[0179] Following the justification described above, the reward function may be chosen to be:

[00002] $R(d, \dot{d}, t) = \begin{cases} k_t\,t & \text{for } d < d_L \\ -d & \text{for } d_L \le d < d_T \\ k_T & \text{for } d_T \le d \end{cases} \; - \; \dot{d}\,d$  (2)

[0180] where k_t = 10 is the time-dependent reward coefficient, k_T = -1000 is the early termination penalty, d_T = 12 is the early termination distance and d_L is the lower accuracy threshold. This reward function is shown graphically in FIG. 5. The first term of Equation (2) provides the associated reward for the distance around the target. If the projectile is more than d_T away from the target at any point in time, the episode is immediately terminated and a 1000 point penalty is incurred. For d_L < d < d_T the penalty is directly proportional to distance. Lastly, a reward of k_t is given for any value of d < d_L. Note that d cannot be negative, since it is a radial distance. This reward is then scaled by the time duration of the episode; if d < d_L at the end of the episode it will receive a much higher reward than travelling through the target initially. While this may cause the agent to attempt to arrive at the target right at the end of the episode, it will achieve substantially more reward by hovering over the target throughout the episode (FIG. 6A).

[0181] The second term includes the error rate, ḋ. This dynamic system has a constant force vector. It is not unreasonable that, to guide the projectile to the target, the agent will keep the force vector pointed in the direction of the target for as long as possible to achieve the goal. However, since the episode doesn't terminate when the target is reached, the projectile will likely fly straight past the target in a scenario similar to the sling-shot shown in FIG. 3B. By observing ḋ, the agent will be able to deduce that when both ḋ and d are small, the reward is maximised; i.e. if the projectile moves more slowly when in close proximity to the target, it will maximise the reward over time. Note that a negative ḋ indicates the projectile is moving towards the target. The -ḋd term in Equation (2) punishes the agent for moving away from the target and is scaled proportionally to d. This is still consistent with the goal of being less than d_L away; a perfectly circular orbit with a radius of less than d_L will have ḋ = 0, which does not punish the system while still rewarding it for the close proximity (FIG. 6B).
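
As a concrete, non-limiting sketch, the reward function of Equation (2) can be written in MATLAB as below; the function and variable names are illustrative, and the value chosen for d_L is a placeholder since the specification leaves it arbitrary.

function R = rewardFcn(d, dDot, t)
% Reward of Equation (2): a time-scaled reward inside the accuracy bound d_L,
% a penalty proportional to distance between d_L and d_T, a large penalty
% beyond d_T (the environment also terminates the episode there), plus the
% -dDot*d term penalising motion away from the target.
    kt = 10;      % time-dependent reward coefficient
    kT = -1000;   % early termination penalty
    dT = 12;      % early termination distance
    dL = 1;       % lower accuracy threshold (placeholder value)

    if d < dL
        R = kt * t;
    elseif d < dT
        R = -d;
    else
        R = kT;
    end
    R = R - dDot * d;
end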

[0182] The boundary of d_L used during training is arbitrary, but the reasoning is justified. For the creation of a direct-fire guided projectile to be worthwhile, it must deliver dispersion characteristics that are at least as good as, or better than, the equivalent unguided projectile. As such, this d_L boundary, in a more complete training environment, will represent the accuracy level required by the round at that range. This also leads on to the justification for not terminating the episode when the projectile arrives at the target. The purpose of the guided weapon is to account for variation in target location caused by factors only introduced during the projectile's flight. This includes range, which would affect the time of impact, even if that is accounted for at launch. Since the prototype is designed to be a beam rider, this logic for the agent is used to keep the projectile on the beam.

Lateral Acceleration

[0183] Without being bound by theory, one example of guidance is to determine the projectile lateral acceleration (latax) as a function of the size of the angle through which the front ogive section is slowed (2φ_a) and the direction about which the bias manoeuvre is centred (θ_B). Starting from the fundamental laws of motion, it can be shown that the latax of the projectile ã can be written as:

[00003] $\tilde{a} = \begin{bmatrix} a_x \\ a_y \end{bmatrix} = \frac{\int_0^{2\pi} F(\phi, \phi_a)\,dt}{m \int_0^{2\pi} \omega^{-1}(\phi, \phi_a)\,d\phi} \begin{bmatrix} \cos(\theta_B) \\ \sin(\theta_B) \end{bmatrix}$

[0184] where a_x and a_y are the horizontal and vertical projectile latax respectively, F is the control force acting on the projectile, m is the projectile mass, and ω is the rotational speed of the front ogive section (and thus the control force). These terms can either be solved analytically or numerically, under different assumptions. In either case, this latax equation can then be used in conjunction with any existing or novel guidance law (such as proportional navigation) to control the projectile.

[0185] One simple assumption that may be made is to model the asymmetric surface as exerting a constant force F_c through a roll angle φ with rate ω_0 or ω_1, where ω_0 < ω_1. The term φ ∈ [0, 2π] describes the roll orientation of F_c with respect to the normal axis of the projectile. The model uses fixed magnitude F_c rolling at speed ω_1. The roll rate is slowed to ω_0 through favourable roll angles when F_c is aligned with the desired correction axis, then accelerated back to ω_1 through the remaining unfavourable roll angles. The act of slowing F_c when sweeping through favourable roll angles is henceforth referred to as bias. The switching between spin speeds is instantaneous.

[0186] The integral of Newton's second law relates the impulse of an object, J, to its change in velocity Δv:

[00004] $J(\Delta t) = m\,\Delta v(\Delta t)$

wherein the mass m is assumed to be constant since there are no on-board resources being consumed.

[0187] A generalised decomposition of F_c onto any orthonormal axes i, j in the plan view plane of the projectile, herein denoted YZ, has the corresponding forces F_i, F_j. Let the desired decomposition axis i be at an angle θ_B from the normal axis ẑ (where φ = 0). Let φ_i be a particular angle between F_c and the arbitrary decomposition axis i. Let φ_a be the angle through which F_c sweeps at a given rate ω such that the sweep begins at the angle (θ_B - φ_a) and ends at θ_B.

[0188] The range of angles during which F_c is slowed is defined as the bias angle. Let the mid-point of the bias angle coincide with the decomposition axis i, such that the symmetrical angle on either side of the midpoint is φ_a. The bias angle thus starts at (θ_B - φ_a) and ends at (θ_B + φ_a) with a midpoint of θ_B. F_c will continue to rotate through the rest of the angle φ, eventually sweeping another angular range (θ_B + π) ± φ_a (wrapped so φ ∈ [0, 2π]). During this time the resulting change in velocity is directed along the negative i-th axis.

[0189] ΔV is defined as the total change in velocity over one whole roll rotation in sweeping through equal but opposing angles of size 2φ_a at the different rates ω_0 and ω_1. Assuming F_c, m and ω are constant, it can be shown that:

[00005] $\Delta V = \frac{2 F_c}{m} \sin(\phi_a) \left( \frac{\omega_0 - \omega_1}{\omega_0\,\omega_1} \right)$

[0190] The maximum bias angle is half of a roll rotation, φ_a,max = π/2. The maximum ΔV per rotation is thus given by:

[00006] $\Delta V_{max} = \Delta V \big|_{\phi_a = \pi/2}$

which is evaluated for a given system.
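
As a numerical illustration of the two expressions above (a sketch only; the parameter values are assumed, not taken from the specification), ΔV per rotation and its maximum can be evaluated as:

% Delta-V delivered per roll rotation for a bias half-angle phi_a, and its
% magnitude at the maximum bias angle phi_a = pi/2. All values are illustrative.
Fc = 5;        % control force magnitude (N), assumed
m  = 1;        % projectile mass (kg), assumed
w0 = pi/2;     % slowed (bias) roll rate (rad/s), assumed
w1 = 2*pi;     % natural roll rate (rad/s), assumed

deltaV    = @(phi_a) (2*Fc/m) .* sin(phi_a) .* (w0 - w1) ./ (w0*w1);
deltaVmax = abs(deltaV(pi/2));   % magnitude of the largest correction per rotation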

[0191] One example of a novel guidance law is the following Quasi-dynamic Guidance Law (QDGL). The QDGL calculates a desired change in speed when φ = 0, then calculates the bias angles from the above equation. The projectile will then continue to roll, whereby the asymmetric surface will slow the roll if the current roll angle lies within the bias range previously calculated.

[0192] In practice, the desired speed change and resulting bias angles are calculated when φ lies in a small range, φ ∈ [0, 0.001], to account for the control module inaccuracy. While this calculation could be conducted and updated continuously, the relative speeds would have to be transformed to the φ = 0 reference frame, which adds another layer of computational complexity. In addition, this finite computation of speeds at the beginning of each rotation accommodates the bandwidth of hardware with respect to the roll rate of the projectile.

[0193] The current relative velocity of the projectile to the target is the difference between the projectile and target velocities:

[00007] $V_R = \begin{bmatrix} u_R \\ v_R \end{bmatrix} = \begin{bmatrix} u - u_T \\ v - v_T \end{bmatrix}$

[0194] To achieve a circular trajectory in the resting state, the horizontal velocity at the beginning of the bias calculation must assume the control force has already rotated through one quarter rotation. Taking this into consideration, we define V_DR0 as the ΔV correction necessary to bring the projectile to a stable circular orbit relative to the target, including the current relative velocity:

[00008] $V_{DR0} = \begin{bmatrix} u_R + \Delta V\big|_{\phi = \pi/4} \\ v_R \end{bmatrix}$

[0195] This only allows the control module to bring the projectile to relative rest; the desired closing speed V_PT(d) describes the chosen approach speed as a function of d. The total demanded velocity change from the velocity control module, V_Dem, is then a linear combination of the relative speed correction necessary to bring the system to an orbit, V_DR0, and the closing velocity V_PT(d) dictated by the QDGL:

[00009] $V_{Dem} = V_{DR0} + V_{PT}(d)$

[0196] V_PT(d) must only demand speeds which can be delivered by the asymmetric surface, given that ΔV can never exceed ΔV_max. Let the function V_lim(d) be the maximum relative speed the projectile can have at a distance d ≥ 0 such that it is still able to decelerate in time to be at relative rest when d = 0. This function can be calculated by starting with a stationary projectile and applying consecutive ΔV_max biases, since the process is reversible.

[0197] An effective acceleration value, a_eff, is measured from simulations for consecutive ΔV_max biases. Using this, it can be shown that:

[00010] $V_{lim}(d) = (2\,a_{eff}\,d)^{\frac{1}{2}}$

[0198] Since the function V_PT(d) is calculated when φ = 0 at a particular distance d_1, the desired ΔV will not be achieved until after the bias manoeuvre has been executed, one full rotation later. Hence, the process is discontinuous. By this point the projectile will have moved to some new distance d_2, under its residual velocity. This delay causes the system to exceed V_lim(d), resulting in an overshoot. To account for the delay, the demanded speed is modified by a factor ε which ensures the relative speed never exceeds V_lim(d). The delay does not directly scale with distance but rather with V_PT(d), as it is the result of dynamic system evolution. Hence the closing speed function is written as:

[00011] $V_{PT}(d) = V_{lim}(d) - \epsilon, \qquad \epsilon \ge 0$

where ε is a constant to be optimised.

[0199] In one example, the radial velocity of the projectile to the target may be governed by the QDGL equation:

[00012] $V_{PT}(d) = \begin{cases} V_{lim}(d) - \epsilon & d_1 \le d \\ V_k & d_2 \le d < d_1 \\ 0 & 0 \le d < d_2 \end{cases}$

wherein:
[0200] V_PT(d) is the lateral speed at which the projectile closes on the target (to make the miss distance, i.e. the distance between the target and where the projectile impacts, equal to 0);
[0201] V_lim(d) is the maximum lateral speed correction the projectile is capable of making at full, saturated actuator effort;
[0202] ε is the delay modification factor;
[0203] V_k is a chosen constant speed to enable a quicker dynamic response;
[0204] d is the lateral distance to the target (miss distance);
[0205] d_1 is the desired distance at which to switch from V_lim(d) - ε to V_k, to minimise actuator effort and conserve resources;
[0206] d_2 is the desired level of accuracy of the projectile, e.g. if the acceptable miss distance is within 2 m of the target, this is satisfactory and no further corrections are necessary.

[0207] The above equation determines what the lateral speed of the projectile should be, depending on the lateral distance d. If there is a large discrepancy between the target and the estimated trajectory, i.e. the projectile is on course to miss the target by a significant distance, the control module will correct its trajectory as quickly as possible without overshoot (V_PT(d) = V_lim(d) - ε); if the distance is small, the control module will calculate guidance such that the radial velocity of the projectile is low and be ready for a change, to conserve resources (V_PT(d) = V_k). Finally, if the projectile is on course to hit the target or is within an acceptable miss distance, the control module will not make any further commands and the projectile will stay on course (V_PT(d) = 0).
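
For illustration only, the closing-speed schedule of Equation [00012] can be sketched as a small MATLAB function; the function name, argument list and any values passed to it are assumptions for the sketch rather than part of the specification.

function Vpt = closingSpeedQDGL(d, aEff, epsDelay, Vk, d1, d2)
% Sketch of the QDGL closing-speed schedule of Equation [00012].
%   d        - current lateral miss distance to the target (m)
%   aEff     - effective acceleration measured from consecutive max biases (m/s^2)
%   epsDelay - delay modification factor
%   Vk       - chosen constant closing speed for small miss distances (m/s)
%   d1       - distance at which to switch from Vlim(d)-epsDelay to Vk (m)
%   d2       - acceptable miss distance; no further correction inside this (m)
    Vlim = sqrt(2 * aEff * d);       % maximum correctable speed at distance d
    if d >= d1
        Vpt = Vlim - epsDelay;       % large error: close as fast as possible without overshoot
    elseif d >= d2
        Vpt = Vk;                    % small error: low, constant closing speed
    else
        Vpt = 0;                     % within acceptable miss distance: no further correction
    end
end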

Implementing RL Agents into Simulink Dynamic Model

[0208] MATLAB has a Reinforcement Learning Toolbox which can be used to create a variety of RL agents, as well as a Deep Learning Toolbox which can be used for the implementation of neural networks. The Simulink model for the AI controller is shown in FIG. 7. Where applicable, the AI agent will either replace the actuator autopilot or the Guidance Law (GL) block. Instead of having a MATLAB function responsible for executing the controller logic, the MATLAB function takes parameters from the system and uses them to compute the inputs for the RL agent training: observation assignment, the reward function and the criteria for determining the end of an episode. The output of the MATLAB function is then forwarded to a pre-made Simulink block which is responsible for feeding the observations, reward and completion check to an RL agent which has been created in the workspace.

[0209] The environment is set to be the Simulink model. The model is a non-linear, dynamic model (projectile dynamics) of the form $\dot{\vec{x}} = f(\vec{x}(t), \vec{u}(t), t)$, with system motion $\dot{\vec{x}}$ in terms of system states $\vec{x}(t)$ and measurable inputs $\vec{u}(t)$, as described by M. Costello and A. Peterson, Linear Theory of a Dual-Spin Projectile in Atmospheric Flight, Journal of Guidance, Control, and Dynamics, vol. 23, no. 5, September-October 2000, incorporated in its entirety herein by reference. See also R. L. McCoy, Modern Exterior Ballistics: The Launch and Flight Dynamics of Symmetric Projectiles, Schiffer Publishing, 1999, and S. Theodoulis, V. Gassmann, P. Wernert, L. Dritsas, I. Kitsios, and A. Tzes, Guidance and Control Design for a Class of Spin-Stabilized Fin-Controlled Projectiles, Journal of Guidance, Control, and Dynamics, vol. 36, no. 2, 2013, incorporated in their entirety herein by reference. The model includes a set of equations describing the kinematic and dynamic motion of the projectile, as understood by the skilled person. The model includes various aerodynamic coefficients corresponding to the external forces acting upon the projectile in flight, which may either be obtained from existing databases or simulated using computational fluid dynamics (CFD) analysis. The number of observations, with their upper and lower bounds, is set. The number of actions is defined with the allowed values, which are taken from the output of the dynamics block. A reset function is also defined, which sets the initial conditions of the observations for the simulation; these can either be randomised or fixed. Before the training begins, the parameters of the actor and critic neural networks are defined, with the number of hidden and active layers, their types (e.g. ReLU layer), and the paths between them. DQN agent parameters are configured, including the discount factor γ. In any implementation described in the coming analysis, a full description of the neural network parameters will be given.
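
By way of a minimal sketch only, such an environment and agent might be assembled with the Reinforcement Learning Toolbox along the following lines; the model and block names, the observation/action definitions and the default networks are placeholders, and the exact toolbox calls and defaults may differ between MATLAB releases.

% Wiring a Simulink model to a DQN agent (illustrative names and values).
mdl      = 'projectileModel';                 % assumed Simulink model name
agentBlk = [mdl '/RL Agent'];                 % assumed path to the RL Agent block

obsInfo = rlNumericSpec([3 1], ...            % e.g. [d; dDot; phi]
    'LowerLimit', [0; -inf; 0], 'UpperLimit', [inf; inf; 2*pi]);
actInfo = rlFiniteSetSpec([1 2]);             % two discrete actions: natural or biased spin

env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);

agentOpts = rlDQNAgentOptions('DiscountFactor', 0.99, 'MiniBatchSize', 256, ...
    'ExperienceBufferLength', 1e5, 'TargetUpdateFrequency', 4);
agent = rlDQNAgent(obsInfo, actInfo);         % default critic network
agent.AgentOptions = agentOpts;

trainOpts = rlTrainingOptions('MaxEpisodes', 200000, 'MaxStepsPerEpisode', 1000, ...
    'StopTrainingCriteria', 'EpisodeReward', 'StopTrainingValue', 20000);
trainingStats = train(agent, env, trainOpts); % returns training statistics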

[0210] The agent training options are configured, such as the maximum number of episodes, steps per episode and the reward threshold at which the training is terminated. The agent is trained in the defined environment using the set parameters, and the resulting trained agent is saved to be implemented by a controller in any environment.

[0211] In more detail, Algorithm 1 shows how MATLAB updates the neural networks for each episode:

Algorithm 1: Step algorithm for actor/critic update

Ensure: the critic Q(s, a) is initialised with parameter values θ_Q, then the target critic is initialised with the same parameter values: θ_Q' = θ_Q.

At each time step:
1: if RAND > ε then
2:     Given the current observation S, select a random action A with probability ε
3: else
4:     Select the action for which the critic value function is greatest, e.g. A = max_A [Q(S, A | θ_Q)]
5: end if
6: Execute action A and observe the reward R and the next observation S'
7: Store the combined experience S, A, R, S' in the buffer
8: Sample a random batch of M experiences, S_i, A_i, R_i, S_i'
9: if S_i' is a terminal state then
10:     Set the value function target to be the current reward: y_i = R_i
11: else
12:     Set the value function target to be: y_i = R_i + γ max_A' [Q(S_i', A' | θ_Q')]
13: end if
14: Update the critic parameters by minimising the loss L across the M sampled experiences: $L = \frac{1}{M}\sum_{i=1}^{M}\big(y_i - Q(S_i, A_i \,|\, \theta_Q)\big)^2$
15: Update the target critic periodically: θ_Q' = θ_Q
16: Update ε for selecting a random action.
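
The value-function targets and the critic loss of steps 9 to 14 can be illustrated with the following self-contained MATLAB fragment, shown here for a tabular critic so it runs without any toolbox; all sizes and sampled values are placeholders.

% Targets and loss for one sampled minibatch (Algorithm 1, steps 9-14).
gamma = 0.99;                       % discount factor
Q = rand(5, 2);                     % tabular critic: 5 states x 2 actions (placeholder)

M      = 3;                         % minibatch size (placeholder)
S      = [1; 2; 4];                 % sampled states
A      = [1; 2; 1];                 % sampled actions
R      = [0.5; -1.0; 2.0];          % sampled rewards
Snext  = [2; 3; 5];                 % sampled next states
isDone = [false; false; true];      % terminal flags

y = zeros(M, 1);
for i = 1:M
    if isDone(i)
        y(i) = R(i);                                % terminal: target is the reward alone
    else
        y(i) = R(i) + gamma * max(Q(Snext(i), :));  % bootstrapped target
    end
end

predicted = arrayfun(@(i) Q(S(i), A(i)), (1:M)');   % current critic estimates
L = mean((y - predicted).^2);                       % loss minimised in step 14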

DQN Direct Control

[0212] A DQN agent may be used to control every aspect of the system. The actuation mechanisms described above, responsible for converting the bias points φ_ON, φ_OFF to either a latax or ΔV, will be combined with the GL. The DQN agent will have direct control over whether the projectile is in the biased or natural state and will be responsible for taking all simulation parameters into account to produce a desirable approach to the target. In essence, the DQN agent will be controlling a fixed-magnitude force vector rotating clockwise at the two selectable speeds, ω_0 and ω_B. This is the most complex application of AI to the considered system.

[0213] A full list of the training parameters for the neural network, simulation and training is shown in Table 2. Both the target and the projectile have no initial velocity and they are initialised at the same point every episode. The positions can then be randomised to continue the training if the agent shows improvement. The observations are distance d, closing velocity ḋ, target bearing from projectile θ_T and current roll angle of the control force φ. FIG. 8 shows the training results for the first batch of training episodes.

TABLE 2: Training parameters for the neural network, simulation and training.

Training parameters:
Maximum episodes: 200,000
Max steps per episode: 1000
Episode termination: t > 100 or d > 12
Reward value to terminate: 20,000
Observations: [d, ḋ, φ]
Actions: ω = ω_0 or ω_B

Neural network configuration:
Discount factor γ: 0.99
MiniBatchSize: 256
ExperienceBufferLength: 10^5
TargetUpdateFrequency: 4

Episode initial conditions:
ω_0: 2π
ω_B: π/2
x_0, y_0: (2, 2)
u_0, v_0: (0, 0)
φ_0: 0
xT_0, yT_0: (0, 0)
uT_0, vT_0: (0, 0)

[0214] The agent did not show any significant development in controlling both the actuator mechanism and the GL as a whole. There was a significant improvement in reward at episodes 5×10^4 and 8×10^4, but the agent did not retain significant knowledge of the gained experience to capitalise on this reward increase. The fact that the increase in reward was only temporary, and that it did not lead to any permanent performance increase, indicates the surge was likely caused by the agent exploring the action space. In addition, this is a characteristic trait of under-training, where there is not a sufficient action space to map all possible system states, which in the environment considered above is very large due to it being near-continuous. Since the initial conditions for this simulation were held constant, it is likely that in this configuration the agent was unable to learn the system to a degree that it could effectively enact control. It may be possible for the agent to learn the system if the agent is trained for longer, using a neural network with more nodes and layers. This allows the agent to explore a larger action space, mapping the actions to desirable outcomes. The larger number of nodes and layers in the neural network also means the agent will not be under-trained.

[0215] Another possible change that could be made to improve training success and times is to discretise the observation space. Consider that the current target bearing θ_T is continuous in [0, 2π], at least to within the bounds of machine and rounding errors in MATLAB. Instead of feeding this raw data to the agent, it could be categorised such that θ_T is binned in 10-degree increments, as in the sketch below. This reduces the observation space from being effectively continuous to having 36 finite possibilities, making it much more efficient to map every possible system state to an action. While this will reduce the precision and fidelity of the action system, it will return some performance by the agent to ascertain whether this method of complete control is viable. There could be either some secondary architecture which allows further control fidelity within these bins, or the agent could be retrained with a more continuous or less discretised environment.
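
A short sketch of such binning (variable names are illustrative) is:

% Discretise the continuous target bearing into 10-degree bins (36 possibilities).
theta_T  = 3.7;                                         % current bearing in [0, 2*pi), placeholder
binWidth = deg2rad(10);
binIndex = floor(mod(theta_T, 2*pi) / binWidth) + 1;    % integer in 1..36 passed to the agent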

DDPG Guidance Law

[0216] While the DQN agent described in the previous sections is capable only of finite actions, a Deep Deterministic Policy Gradient (DDPG) has a continuous action space. Different implementation methods must be used to accommodate the continuous action space of the DDPG agent. Whereas the DQN agent used above was responsible for both the actuation mechanism and the GL, the DDPG implementation will be constructed so it is responsible for only one or the other. In this implementation, a DDPG agent is used to create a GL which dictates the trajectory of the projectile on approach to the target, by demanding a latax.

[0217] A key difference must be made to the neural network when using a DDPG agent as opposed to a DQN. The output of the action layer in the DQN network was a binary 0 or 1, depending on what the weighted activation of the layer decided.

[0218] The output of a DDPG action layer is continuous in the range A ∈ [-∞, ∞], but this is well outside the range of the latax which can be demanded of the projectile, due to saturation of the actuation mechanism. To account for this, a tanh layer is used to map the action range to A ∈ [-1, 1]. This is then also passed through a scaling layer, so that the action which is actually fed to the actuation mechanism is A ∈ [-a_max, a_max].
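
A sketch of the final layers of such an actor network is given below; the layer names, sizes and the saturation value a_max are assumptions (scalingLayer is provided by the Reinforcement Learning Toolbox), and the rest of the network is reduced to a single hidden layer for brevity.

% Final layers of a DDPG actor: tanh bounds the raw action to [-1, 1] and the
% scaling layer rescales it to [-a_max, a_max] before it reaches the actuator.
a_max = 20;                                        % actuator saturation limit (assumed)
actorLayers = [
    featureInputLayer(2, 'Name', 'observation')    % e.g. [x; xDot]
    fullyConnectedLayer(64, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(1, 'Name', 'fc_action')    % raw, unbounded action
    tanhLayer('Name', 'tanh')                      % map to [-1, 1]
    scalingLayer('Name', 'scale', 'Scale', a_max)  % map to [-a_max, a_max]
    ];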

[0219] Guidance laws usually demand latax both horizontal and normal to the projectile travel, though sometimes they may demand purely a lateral acceleration. In this sense, they are dual-channel, where each channel represents the acceleration normal and lateral to the longitudinal velocity of the direct-fire projectile. While the implementation of the DQN agent above encompassed actuator control and dual-channel latax, the operation and output of the agent doesn't necessarily have to cover both channels. Much like conventional Cartesian control, the agent can have full control over a single channel and two of them can be used in combination to generate the final signal sent to the projectile. In this sense the agent can be trained in a 1D environment, which is shown in FIG. 9. Two point masses of m = 1, the projectile and the target, are free to move along a generic 1D axis with distances from the origin of x_P and x_T respectively. Both have respective speeds of v_P and v_T directed solely along this axis. The DDPG agent is responsible for issuing an acceleration command to the point-mass projectile.
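
One integration step of this 1D point-mass environment can be sketched as follows; the time step, masses and commanded latax are placeholder values.

% One forward-Euler step of the 1D environment of FIG. 9: the agent's
% acceleration command acts on the projectile only (illustrative values).
dt = 0.1;                  % agent sample time T_A (s)
m  = 1;                    % point mass (kg)
ax = 2.0;                  % latax demanded by the DDPG agent (m/s^2), placeholder

xP = 0;  vP = 0;           % projectile position and speed along the axis
xT = 5;  vT = 0;           % target position and speed (unforced here)

vP = vP + (ax / m) * dt;   % integrate the commanded acceleration
xP = xP + vP * dt;
xT = xT + vT * dt;

x    = xT - xP;            % single-axis miss distance observation
xDot = vT - vP;            % its rate, the second observation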

[0220] Table 3 shows the model parameters for the training. The agent is trained using the reward function described previously. Notable differences are that the episode will be prematurely terminated depending on the single-axis distance x_T - x, not the 2D radial distance d. This means the termination distance in the reward function becomes d_T = x_T = 50. Likewise the observations have been reduced to simply the 1D equivalents, (d, ḋ) → (x, ẋ). As mentioned, the agent action is no longer controlling the bias points, but the acceleration demand of the projectile. The action space is a single latax demand A = a_x ∈ [-a_max, a_max], mapped into this range from [-∞, ∞] using the tanh layer.

TABLE 3: Training parameters for the 1D latax control model.

Training parameters:
Maximum episodes: 200,000
Max steps per episode: 1000
Episode termination: t > 50 or x > 50
Reward value to terminate: 20,000
Observations: x and ẋ = u_x, both ∈ [-∞, ∞]
Actions: a_x ∈ [-a_max, a_max]

Neural network configuration:
Discount factor γ: 0.99
MiniBatchSize: 256
ExperienceBufferLength: 10^5
TargetUpdateFrequency: 4

Episode initial conditions:
x_0: [-10, 10]
u_0: [-10, 10]
xT_0: [-10, 10]
uT_0: [-10, 10]

[0221] FIG. 10 shows the training data for the DDPG agent in the 1D latax control system. There is an obvious disparity between the episodes which were terminated early due to too great a miss distance and the episodes which merely had poor performance. Around the 3000th episode, the agent was capable of preventing the early termination of the episodes. A reward of between -300 and 0 indicates an episode wasn't a failure, but also that the agent was not sufficiently reducing the error to increase the reward. By episode 7000, the agent could reliably reduce the error to within the accuracy bound demanded. In the last 3000 episodes, the agent steadily improved performance to a mean reward of 3×10^4.

[0222] Since the agent was able to control the 1D dynamic system with this neural network configuration, additional complications can be introduced. The primary change the novel actuation mechanism faces in comparison to a traditional point-mass model is the lag of the system in achieving the desired control variable. As such, an actuator lag is added to emulate the delay in system response caused by the projectile needing to complete one full rotation before enacting the command of the GL. The delay is modelled by a simple time-based signal delay block in Simulink, which holds a given signal for a predetermined amount of time before passing it along. In this way, the agent is still receiving valid information about the state of the system; it merely must learn that the actions it takes are not immediately executed. There is also no dynamic noise which goes unobserved, causing perturbations which could not be perceived by the agent. The signal delay, or actuator lag, is set to 0.1, 0.02 and 0.01 seconds; since the agent sample time T_A is 0.1 s, these actuator lags correlate to T_A, T_A/5 and T_A/10 respectively. FIG. 11 shows the training data for agents with different levels of actuator lag.

[0223] FIG. 11 shows that for both delay times of 0.01 s and 0.02 s, the agent is capable of learning actions which substantially increase the episode reward. For a 0.1 s delay, the agent was unable to learn desirable actions as defined by the reward function. It should be noted that the agent may eventually have been able to learn the environment given sufficiently more episodes with the current network configuration. However, project constraints must be set somewhere and the results presented in the figure show that the capability is present within a suitable training time. Such neural-network optimisations should be considered during any higher TRL implementation, but are outside the scope of this project.

[0224] FIG. 12 shows the performance of the agent with a 0.02 s actuator lag after the 10^5 episodes of training from FIG. 11. The DDPG-based GL visibly reduces the miss distance of the projectile and is able to hold it there in an effort to reduce the steady-state error.

[0225] FIG. 18A is a graph of terminal dispersion for projectiles according to a conventional ballistic guidance law (in which the controller is switched off and the projectile trajectory evolves according to the initial conditions); and FIG. 18B is a graph of terminal dispersion for projectiles controlled according to an exemplary embodiment. In more detail, FIGS. 18A and 18B show 100 impacts from each of the guidance laws, selected at random from the 10^5 runs. N.B. no impact points are covered by the legend. All 100 impacts that were selected from the set are shown on the figures; there were no impact points extreme enough to warrant exclusion from the plots. There are few highly dense impact points, which would indicate a bias in the GLs towards specific locations in space. The AI GL appears to show a circular dense region of impacts centred directly over the target at (0, 0), in addition to a semicircular cluster along the right side of the CEP and DRMS circumferences. Further statistical data analysis will confirm whether or not the bias is significant. All guidance laws implemented under these nominal simulation conditions provided satisfactory levels of dispersion correction. Table 4 summarises dispersion measurements for ballistic and AI control for the 10^5 runs. CEP is the most commonly used metric across both the academic and industrial fields of ballistics. The CEP is reduced from 19.25 m for the ballistic model to just 0.404 m for the AI control.

TABLE 4: Dispersion measurements for the nominal system: circular error probability (CEP); distance root mean squared (DRMS), which includes 63.21% (or 2σ) of impacts; and R95, which includes 95%.

Guidance Law | CEP (m) | DRMS (m) | R95 (m)
Ballistic    | 19.25   | 22.13    | 40.52
AI           | 0.404   | 0.457    | 0.874
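
For reference, the three dispersion metrics can be computed from a set of impact points with a short MATLAB fragment such as the one below; the random impact data is a placeholder, not the data behind Table 4, and the CEP is taken here as the median radial miss.

% Dispersion metrics from impact points (y, z) relative to the target at the origin.
impacts = randn(1e5, 2);                   % [y z] miss distances (m), illustrative only
r  = sqrt(sum(impacts.^2, 2));             % radial miss distance of each impact
rs = sort(r);

CEP  = median(r);                          % radius containing 50% of impacts
DRMS = sqrt(mean(r.^2));                   % distance root mean squared (~63% of impacts)
R95  = rs(ceil(0.95 * numel(rs)));         % radius containing 95% of impacts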

[0226] FIG. 19 schematically depicts a method according to an exemplary embodiment. The method is a computer-implemented method of training a machine learning, ML algorithm to control spin-stabilized steerable projectiles. At S2001, the method comprises obtaining training data including respective policies and corresponding trajectories of a set of spin-stabilized steerable projectiles including a first projectile, wherein each policy relates to steering a projectile of the set thereof towards a target and wherein each corresponding trajectory comprises a series of states in a state space of the projectile. At S2002, the method comprises training the ML algorithm comprising determining relationships between the respective policies and corresponding trajectories of the projectiles of the set thereof based on respective results of comparing the trajectories and the targets. The method may include any of the steps described herein, for example with respect to the first aspect, the third aspect and/or the exemplary embodiments.

[0227] FIG. 20 schematically depicts a method according to an exemplary embodiment. The method is a computer-implemented method of controlling a spin-stabilized steerable projectile. At S2101, the method comprises controlling, by a trained machine learning, ML, algorithm, the projectile according to a policy, comprising steering the projectile towards a target. The method may include any of the steps described herein, for example with respect to the first aspect, the second aspect, the third aspect, the fourth aspect and/or the exemplary embodiments.
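Step S2101 amounts to running the trained policy in closed loop over the projectile's state until impact, for example as in the sketch below; `trained_policy`, `env` and `miss_distance` are the same kind of hypothetical interfaces assumed in the training sketch above and are not defined by the specification.

```python
def control_fired_projectile(trained_policy, env) -> float:
    """Sketch of the control method of FIG. 20 (S2101): apply the trained
    policy to a fired projectile until the episode terminates."""
    state = env.reset()
    done = False
    while not done:
        action = trained_policy.act(state)   # no exploration noise at inference time
        state, _, done = env.step(action)
    return env.miss_distance()               # assumed diagnostic, not part of the method
```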

Projectile

[0228] FIG. 13 schematically depicts a projectile 100 comprising: a front ogive section 102; an aft section 104; and a command module 106; wherein the front ogive section 102 is rotatably connected to the aft section 104 by a coupling device 108, the front ogive section 102 further comprising an asymmetric surface 110, wherein, in use, the angular rotation of the front ogive section 102 can be selectively adjusted relative to the aft section 104 by commands from the command module 106 to the coupling device 108, such that the asymmetric surface 110 exerts an imbalance upon the projectile to selectively alter the trajectory of said projectile, and thereby steer and course correct the projectile.

[0229] In this example, the projectile is a gun launched projectile, such as a medium calibre shell wherein the front ogive section 102 and aft section 104 are made from steel. For simplicity, features such as fuzes, driving bands, and other typical features are not shown.

[0230] In this example, the coupling device 108 is an active coupling device in the form of a servo motor. The servo motor allows both clockwise and anticlockwise rotation of the front ogive section 102 with respect to the aft section 104.

[0231] In this example, the projectile rotates about axis X.

[0232] In this example, the projectile comprises an electrical slip ring (not shown) between the front ogive section 102 and the aft section 104.

[0233] In this example, the asymmetric surface 110 is an aerodynamic lifting surface, specifically a truncated ogive. Said asymmetric surface extends through an angle, in this example 90°, around the plane face of the projectile as seen in Section A-A.

[0234] In this example, the projectile 100 comprises a continuous surface such that the outer profile of the projectile 100 is a smooth, blended surface free from protruding fins or protruding control surfaces.

[0235] In this example, the projectile may comprise a receiver for receiving guidance instructions from an external targeting system, in the form of an optical receiver 112. Said optical receiver 112 is in communication with the command module 106 and is a beam rider receiver, such that the optical receiver senses the intensity of a guidance laser (not shown); the command module 106 is configured to detect drift of the laser focus from the optical receiver 112 and issues commands to the coupling device 108 in order for the projectile to remain on the laser path.
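As a loose illustration of the beam-rider behaviour described above, the command module can be thought of as measuring how the laser spot has drifted across the receiver and commanding the ogive so that the control force acts back towards the beam centre. The sketch below assumes a four-quadrant intensity receiver, a simple proportional correction and a particular sign convention; all names and the quadrant model are assumptions for illustration only, not details from the specification.

```python
import math


def beam_rider_command(intensity: dict, gain: float = 1.0):
    """Compute an illustrative steering command from quadrant intensities.

    `intensity` maps the quadrants "up", "down", "left", "right" to measured
    laser intensity. The returned roll angle orients the ogive's lift vector
    towards the apparent beam centre (assumed sign convention), and the
    magnitude is proportional to the measured drift.
    """
    vertical = intensity["up"] - intensity["down"]
    lateral = intensity["right"] - intensity["left"]
    drift = math.hypot(lateral, vertical)
    desired_roll_angle = math.atan2(vertical, lateral)  # direction of the beam centre
    return desired_roll_angle, gain * drift


if __name__ == "__main__":
    # Beam spot drifted towards the upper-right of the receiver.
    print(beam_rider_command({"up": 0.8, "down": 0.2, "left": 0.3, "right": 0.7}))
```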

[0236] FIG. 14 schematically depicts the projectile of FIG. 13 as a force diagram. The projectile 200, comprising a front ogive section 202 and an aft section 204, travels at velocity v. In this arrangement the projectile is fired from a rifled barrel; the aft section 204 and the ogive 202 both rotate at the same clockwise angular rate, ω1 and ω2 respectively, against oncoming airflow A. The oncoming airflow A is deflected by the asymmetric surface 210 to create a first imbalanced force vector Fc on the projectile.

[0237] On command of the command module (not shown), the servo motor changes the rate of angular rotation of the ogive 202, to either a reduced clockwise angular rotation rate ω2 or an anticlockwise rate ω3, with respect to the aft section 204, which continues to rotate at angular speed ω1, thereby creating a second imbalanced force vector F.sub.c on the projectile, i.e. altering the angle of the force vector Fc about the axis X.

[0238] Alternatively, the coupling device may be a passive coupling device in the form of a brake. The brake can be selectively braked and un-braked to uncouple the front ogive section from the aft section thus allowing the front ogive section to slow due to an aerodynamic roll damping moment.

[0239] FIGS. 15A and 15B schematically depict a projectile 300, as described for the projectile 100 of FIG. 13, travelling in a helical path substantially along the axis X after firing from a rifled barrel.

[0240] In FIG. 15A, the front ogive section and the aft section are in the coupled mode, i.e. both sections spin at the same angular rate; the helix radius is r1 on the superimposed YZ plane.

[0241] In FIG. 15B, the front ogive section and the aft section are in the decoupled mode, i.e. the front ogive section is spinning at a different angular rate compared to the aft section; the helix radius is r2 on the superimposed YZ plane, wherein radius r2 is greater than radius r1. The control force from the aerodynamic surfaces on the ogive acts in a tangential direction for longer, resulting in a larger radial acceleration. The projectile thus travels further radially before the control force rotates to oppose the motion. The result is that, in the decoupled state, the trajectory forms a helix of larger radius r2 than the coupled-mode radius r1. When the command module calculates that the projectile is on a trajectory to hit the intended target, the front ogive section and aft section re-couple such that the front ogive section is restored to the spin rate of the faster-spinning aft section, thus returning to a helix radius r1 as shown in FIG. 15A.
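In its simplest form, the mode-switching behaviour just described reduces to a decision rule of the following kind, shown only as a toy sketch; the threshold value and the function name are illustrative and are not specified by the embodiment.

```python
def coupling_mode(predicted_miss_m: float, threshold_m: float = 1.0) -> str:
    """Select the coupled or decoupled mode of FIGS. 15A/15B.

    Decouple (larger helix radius r2) while the predicted miss distance
    exceeds the threshold, and re-couple (smaller radius r1) once the
    projectile is calculated to be on a hit trajectory.
    """
    return "decoupled" if predicted_miss_m > threshold_m else "coupled"


if __name__ == "__main__":
    for miss in (8.0, 2.5, 0.4):
        print(miss, coupling_mode(miss))
```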

[0242] FIG. 16 schematically depicts a system 400 for controlling a projectile, the system comprising a projectile 402 as shown in FIG. 13 fired from a rifled artillery gun 404 towards a target 406 along a nominal trajectory 408. After firing, the coupling device of the projectile 402 is coupled such that the front section spins at the same angular rotation as the aft section, the projectile travelling in a first helical trajectory with radius r1. Later in flight, the coupling device of the projectile 402 is decoupled, so that the front section spins at a different angular rotation relative to the aft section and the projectile travels in a second helical trajectory with radius r2, wherein the first helical radius r1 is smaller than the second helical radius r2, thereby enabling the projectile 402 to be steered to the target 406.

[0243] In this example, there is provided an external targeting system in the form of a laser designator 410. Said laser designator is trained on the target 406 by beam 412. The laser designator is in optical communication, via optical signals 414, with the projectile 402, which comprises an optical receiver.

[0244] FIG. 17 schematically depicts a system 500 for controlling a projectile, the system comprising a projectile 502. In this example, said projectile 502 is a small arms calibre bullet fired from a rifle 504 towards a target 506 along a nominal trajectory 508. After firing, the coupling device of the projectile 502 is coupled such that the front section spins at the same angular rotation as the aft section, the projectile travelling in a first helical trajectory with radius r1.

[0245] Later in flight, the coupling device of the projectile 502 is decoupled, so that the front section spins at a different angular rotation relative to the aft section and the projectile travels in a second helical trajectory with radius r2, wherein the first helical radius r1 is smaller than the second helical radius r2. The second helical trajectory corrects the projectile flightpath such that the projectile is on a trajectory which will hit the target 506, whereupon the front ogive section couples with the aft section so that the projectile travels in a third helical trajectory with radius r3, wherein the third helical radius is smaller than radius r2, thereby enabling the projectile 502 to be steered to the target 506. The projectile is further able to couple and decouple multiple times during flight to switch between larger and smaller helical trajectories in order to correct the trajectory to the target 506.

[0246] In this example, there is provided an internal guidance system within the command module (not shown) of the projectile 502, in the form of an accelerometer and a gyroscope, whereby the projectile can inherently calculate its position and issue instructions to the coupling device to guide the projectile 502 to the target 506 without reference to an external targeting system.
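Such inertial self-guidance typically amounts to strapdown dead reckoning: integrating the gyroscope rates to track attitude and the accelerometer's specific force to track velocity and position. The following is a minimal sketch of a single integration step, assuming a z-up world frame, small-angle Euler integration and ideal sensors; a practical implementation would use quaternions and sensor error models, and none of these names are taken from the specification.

```python
import numpy as np


def dead_reckon(accel_body, gyro, dt, position, velocity, attitude):
    """Single strapdown dead-reckoning step.

    Integrate body-frame gyroscope rates to update attitude (a 3x3 body-to-world
    rotation matrix), rotate the body-frame specific force into the world frame,
    add gravity, and integrate to velocity and position.
    """
    wx, wy, wz = gyro
    omega_skew = np.array([[0.0, -wz, wy],
                           [wz, 0.0, -wx],
                           [-wy, wx, 0.0]])
    attitude = attitude @ (np.eye(3) + omega_skew * dt)           # first-order attitude update
    accel_world = attitude @ np.asarray(accel_body) + np.array([0.0, 0.0, -9.81])
    velocity = velocity + accel_world * dt
    position = position + velocity * dt
    return position, velocity, attitude


if __name__ == "__main__":
    p, v, R = np.zeros(3), np.array([300.0, 0.0, 50.0]), np.eye(3)
    # Spinning about the body z-axis, specific force balancing gravity.
    p, v, R = dead_reckon([0.0, 0.0, 9.81], [0.0, 0.0, 100.0], 0.001, p, v, R)
    print(p, v)
```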

[0247] Although a preferred embodiment has been shown and described, it will be appreciated by those skilled in the art that various changes and modifications might be made without departing from the scope of the invention, as defined in the appended claims and as described above.

[0248] Attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.

[0249] All of the features disclosed in this specification (including any accompanying claims and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.

[0250] Each feature disclosed in this specification (including any accompanying claims, and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

[0251] The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.