SYSTEM AND METHOD FOR OPTIMIZING PATH EXPLORATION PARAMETERS BASED ON DEEP REINFORCEMENT LEARNING
20260093264 · 2026-04-02
Assignee
Inventors
- Yafei Wang (Shanghai, CN)
- Xulei LIU (Shanghai, CN)
- Yichen ZHANG (Shanghai, CN)
- Zhisong ZHOU (Shanghai, CN)
- Zexing LI (Shanghai, CN)
- Bowen Wang (Shanghai, CN)
CPC classification
G05D2105/05
PHYSICS
International classification
Abstract
The present invention relates to the technical field of path planning, and provides a deep reinforcement learning-based path exploration parameter optimization system. The system comprises: a variable parameter path planning module, configured to perform node exploration based on a deep reinforcement learning network, conduct collision detection on child nodes in a child node set, calculate cost values for all child nodes, and finally generate a loading and parking path using a Reeds-Shepp curve; an environmental state space modeling module, configured to perform regional division of obstacles around a current node and conduct environmental state space modeling; and a deep learning parameter optimization module, configured to construct a deep learning network to compute an optimal step size and an optimal steering angle, build a reward function to optimize the deep learning network, and simultaneously execute a training process of the deep learning network.
Claims
1. A deep reinforcement learning-based path exploration parameter optimization method, comprising a non-transitory computer readable medium operable on a computer with memory for the deep reinforcement learning-based path exploration parameter optimization method, and comprising program instructions for executing the following steps of: S1: generating an optimal step length and an optimal steering angle based on a deep reinforcement learning network according to a current node and environmental information, and constructing a fixed steering angle set; performing node exploration by combining the optimal step length with the fixed steering angle set to generate a child node set, and combining the optimal step length with the optimal steering angle to generate a steering angle-optimized child node which is added to the child node set; S2: performing collision detection on the child nodes in the child node set and calculating cost values of all the child nodes; S3: obtaining the child node with the lowest cost value in each iteration round of the search process as the finally selected next node of the current node; S4: when the distance from the current node to the target point is less than a set threshold, generating a loading and parking path using a Reeds-Shepp curve, and generating a planned path through node backtracking; and S5: operating a mine with reduced costs and improved performance efficiency based on results of the deep reinforcement learning-based path exploration parameter optimization method.
2. A deep reinforcement learning-based path exploration parameter optimization system based on the deep reinforcement learning-based path exploration parameter optimization method of claim 1, characterized by comprising: a variable parameter path planning module, configured to generate an optimal step length and an optimal steering angle based on a deep reinforcement learning network according to a current node and environmental information, construct a fixed steering angle set, perform node exploration by combining the optimal step length with the fixed steering angle set to generate a child node set and by combining the optimal step length with the optimal steering angle to generate a steering angle-optimized child node which is added to the child node set, perform collision detection on child nodes in the child node set and calculate cost values of all child nodes, and finally generate a loading and parking path using a Reeds-Shepp curve; and assuming the current node is N.sub.c(x.sub.c, y.sub.c, φ.sub.c), where x.sub.c, y.sub.c are the coordinates and φ.sub.c is the heading angle, the following is established based on the motion characteristics of the unmanned mining truck:
3. The deep reinforcement learning-based path exploration parameter optimization system according to claim 2, characterized in that, in the variable parameter path planning module, performing collision detection on the child nodes in the child node set and calculating cost values of all the child nodes specifically comprises: performing collision detection on all child nodes in the child node set N by covering the mining truck with two enveloping circles, sampling along the path from the current node to the explored child nodes, determining whether the distance to any obstacle grid is smaller than the radius of the enveloping circles; if so, the child node is considered infeasible and is removed from the child node set N, and the cost value of all explored child nodes is calculated using f(N.sub.s)=g(N.sub.s)+w.sub.hh(N.sub.s), where g(N.sub.s) represents the actual cost consumed during the movement of the mining truck from the starting point to the explored child node, h(N.sub.s) represents the estimated cost from the explored child node to the target point, and w.sub.h is the weight of the estimated cost, and wherein the actual consumption cost g(N.sub.s) is:
4. The deep reinforcement learning-based path exploration parameter optimization system according to claim 2, characterized in that, in the variable parameter path planning module, generating the loading and parking path using a Reeds-Shepp curve specifically comprises: when the distance between the current node N.sub.c(x.sub.c, y.sub.c, φ.sub.c) and the target point N.sub.g is less than a threshold L.sub.t, a plurality of candidate loading and parking path curves from the current node N.sub.c(x.sub.c, y.sub.c, φ.sub.c) to the target point N.sub.g are generated using the Reeds-Shepp curve, and the node costs along the curves are calculated by Formula (3), and the curves are sorted based on their costs, and the path with the minimum cost is selected, and the global path is obtained through backtracking, and if all candidate loading and parking path curves are in collision, the process proceeds to the node exploration step.
5. The deep reinforcement learning-based path exploration parameter optimization system according to claim 2, characterized in that, in the environmental state space modeling module, performing regional division of obstacles surrounding the current node specifically comprises: the space surrounding the current node N.sub.c(x.sub.c, y.sub.c, φ.sub.c) is divided into 8 sectors D={D.sub.1, . . . , D.sub.8} by angular divisions, and within each sector, let d.sub.obs.sub.i denote the minimum distance from the current node to any obstacle within sector D.sub.i.
6. The deep reinforcement learning-based path exploration parameter optimization system according to claim 5, characterized in that, in the environmental state space modeling module, conducting environmental state space modeling specifically comprises: the state space S is defined as follows:
7. The deep reinforcement learning-based path exploration parameter optimization system according to claim 2, characterized in that, in the deep learning parameter optimization module, constructing the deep learning network to calculate the optimal step length and the optimal steering angle specifically comprises: the DQN algorithm is employed to train the deep learning network, with the action space consisting of combinations of candidate optimal steering angles δ.sub.rl and candidate optimal step lengths l.sub.rl during expansion, i.e., the action space comprises all possible combinations of (δ.sub.rl, l.sub.rl), and the DQN algorithm utilizes two networks with identical structures but different parameters for training: a training network Q used to compute the Q-value for policy selection and iteratively update the Q-values; and a target network Q′ used to compute the Q-value of the next state in the temporal difference target (TD Target), and the loss function Loss of the DQN algorithm is designed as follows:
8. The deep reinforcement learning-based path exploration parameter optimization system according to claim 2, characterized in that, in the deep learning parameter optimization module, constructing a reward function to optimize the deep learning network specifically comprises: the reward function involves a target approach reward r.sub.g, an obstacle avoidance reward r.sub.o, an exploration cost r.sub.t, and a smoothness reward r.sub.s: the target approach reward r.sub.g is defined as follows:
9. The deep reinforcement learning-based path exploration parameter optimization system according to claim 7, characterized in that, in the deep learning parameter optimization module, executing the training process of the deep learning network specifically comprises: first, randomly selecting appropriate starting and target points on the map based on actual production data and performing path planning; during planning, optimizing path planning parameters through reinforcement learning, thereby forming multiple sets of state transition sample data and adding them to a replay buffer; during the training process, randomly selecting batches of data from the replay buffer and updating the parameters of the training network Q according to the loss function; and after a certain number of iterations, copying the parameters of the training network Q to the target network Q′, thereby completing one learning process.
Description
DESCRIPTION OF DRAWINGS
DETAILED DESCRIPTION
[0071] To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all of them. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without making creative efforts shall fall within the scope of protection of the present application.
[0072] Those skilled in the art will appreciate that unless specifically stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include plural forms. It should be further understood that the term "comprising" used in the description of the present invention indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0073] To achieve efficient unmanned mining truck path planning, the present invention proposes a deep reinforcement learning-based path exploration parameter optimization system and method. Firstly, a Hybrid A* path planning framework with variable exploration parameters is constructed, an environment representation model considering obstacle regional division is established, and on this basis, a deep reinforcement learning-based exploration parameter optimization strategy is developed. The specifics are as follows:
1. Path Planning Framework with Variable Exploration Parameters
1.1 Node Exploration Rules with Variable Parameters
[0074] By analyzing the kinematic characteristics of the unmanned mining truck, iterative node exploration rules for path nodes are established. Key exploration parameters are extracted to construct the node exploration process with variable parameters.
1.2 Node Evaluation Method
[0075] Based on the child nodes generated through node exploration in section 1.1, the validity of the child nodes is analyzed via collision detection. A targeted evaluation method is established to assign a cost value to each child node, thereby obtaining the child node set.
1.3 Loading and Parking Path Generation Method Based on Reeds-Shepp Curve
[0076] During the iterative exploration process, terminal constraints must be considered. Multiple candidate parking paths are generated based on the Reeds-Shepp curve, screened and sorted using an evaluation function, and finally an appropriate loading and parking path is selected to conclude the search.
2. Environment Representation Model Considering Obstacle Regional Division
2.1 Obstacle Regional Division
[0077] Based on the information of the current node, the surrounding space is divided according to the mining truck model to determine the occupancy status of obstacles, thereby modeling the distribution characteristics of the obstacles.
2.2 Environmental State Space Modeling
[0078] Based on the regional division in section 2.1, an environmental state space for deep reinforcement learning is constructed. This state space is used to represent the state obtained by the agent from the environment and is input into the deep reinforcement learning neural network.
3. Exploration Parameter Optimization Method Based on Deep Reinforcement Learning
3.1 Deep Learning Network Construction
[0079] Based on the state space designed in section 2.2 and the exploration parameter rules in section 1.1, a neural network is constructed to achieve the mapping from the state space to the exploration parameters.
3.2 Reward Function Construction
[0080] Necessary indicators for path planning are analyzed, and a reward function for the agent during iterative training is constructed to guide the training of the deep reinforcement learning strategy.
3.3 Deep Reinforcement Learning Training Process
[0081] Based on the design of the above deep reinforcement learning modules, an offline training process is constructed to optimize the exploration parameter network.
[0082] The following is explained through specific embodiments:
First Embodiment
[0083] As shown in
I. Variable Parameter Path Planning Module 1 for the Path Planning Framework with Variable Exploration Parameters
[0084] A variable parameter path planning module 1, configured to: generate an optimal step length and an optimal steering angle based on a deep reinforcement learning network according to a current node and environmental information; construct a fixed steering angle set; perform node exploration by generating a child node set by combining the optimal step length with the fixed steering angle set and generating a steering angle-optimized child node by combining the optimal step length with the optimal steering angle and adding it to the child node set; perform collision detection on the child nodes in the child node set and calculate cost values of all the child nodes; and finally generate a loading and parking path using a Reeds-Shepp curve.
[0085] In this embodiment, the variable parameter path planning module 1 is specifically as follows:
Variable Parameter Exploration Rules
[0086] Node exploration must satisfy the motion characteristics constraints of the unmanned mining truck; otherwise, the mining truck cannot track the generated path, leading to significant risks. Therefore, it is first necessary to model the mining truck's motion characteristics. In the working scenario of the mining truck in this project, since the mining truck typically operates at low speeds, a two-degree-of-freedom vehicle kinematics model can be used to characterize the motion characteristics of the unmanned mining truck.
[0087] Specifically, the vehicle pose state at any given time can be represented as q=(x, y, φ), where the coordinate origin is located at the center of the rear axle, and the coordinate axes are parallel to the vehicle body. v denotes the vehicle speed, φ denotes the vehicle heading angle, δ denotes the vehicle steering angle, and L.sub.w denotes the wheelbase of the vehicle. The kinematic model of the vehicle can be expressed as follows:
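The model formula itself does not survive in this text. A reconstruction consistent with the variables just defined (and with the standard low-speed two-degree-of-freedom bicycle model) would be:

```latex
\dot{x} = v\cos\varphi, \qquad \dot{y} = v\sin\varphi, \qquad \dot{\varphi} = \frac{v\tan\delta}{L_w}
```

where φ is the heading angle, δ the steering angle, and L_w the wheelbase; this is a sketch from the surrounding definitions, not the verbatim patent formula.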
[0088] Assuming the current node is N.sub.c(x.sub.c, y.sub.c, φ.sub.c), where x.sub.c, y.sub.c are the coordinates and φ.sub.c is the heading angle, the following is established based on the motion characteristics of the unmanned mining truck:
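Formula (1) is likewise missing from this text; a plausible reconstruction from the quantities defined in the next paragraph (expansion direction d_s, steering angle δ, step length l, wheelbase L_w) is:

```latex
\varphi_s = \varphi_c + d_s\,\frac{l}{L_w}\tan\delta, \qquad x_s = x_c + d_s\, l\cos\varphi_s, \qquad y_s = y_c + d_s\, l\sin\varphi_s
```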
[0089] Where N.sub.s(x.sub.s, y.sub.s, φ.sub.s) is the next child node explored from the current node, x.sub.s, y.sub.s are the position coordinates, φ.sub.s is the heading angle, d.sub.s∈{-1, 1} represents the expansion direction of the current node (backward or forward), δ and l represent the steering angle and step length of node expansion respectively, and L.sub.w is the wheelbase of the unmanned mining truck.
[0090] It can be observed that the position and orientation of the child nodes are determined by the expansion direction, steering angle, and step length. Conventional algorithms often employ fixed steering angles and step lengths, which makes it difficult for them to adapt to complex and dynamic mining operating environments.
[0091] To achieve variable steering angles and step lengths, one can sample steering angles and step lengths to form corresponding candidate sets, e.g. a steering angle set A={δ.sub.1, . . . , δ.sub.N.sub.3}.
[0092] Specifically, as shown in
[0093] Where δ.sub.max is the maximum steering angle that the mining truck can execute, and N.sub.3 is the number of steering angles constructed;
[0094] Based on the above parameters, node exploration is performed through a two-step process comprising step length optimization and steering angle optimization. In the first step, the optimal step length L.sub.best and all sampled fixed steering angles in the fixed steering angle set A are substituted into Formula (1), thereby generating the child node set N for fixed steering angle exploration. The number of child nodes in the child node set N is the same as the number of sampled angles in the fixed steering angle set A, which is N.sub.3. In the second step, the optimal step length L.sub.best and the optimal steering angle δ.sub.best are substituted into Formula (1) to generate the steering angle-optimized child node N.sub.best, which is then added to the child node set N.
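The two-step exploration described above can be sketched as follows. The function names and the expansion rule are illustrative: the rule implements the reconstruction of Formula (1) given earlier (heading update from the bicycle model, then position update), not the verbatim patent formula.

```python
import math

def expand_node(x_c, y_c, phi_c, d_s, step, steer, wheelbase):
    """One application of the (reconstructed) expansion rule: heading change
    from the bicycle model, then position update along the new heading."""
    phi_s = phi_c + d_s * step / wheelbase * math.tan(steer)
    x_s = x_c + d_s * step * math.cos(phi_s)
    y_s = y_c + d_s * step * math.sin(phi_s)
    return (x_s, y_s, phi_s)

def explore(current, fixed_steer_set, best_step, best_steer,
            d_s=1, wheelbase=5.0):
    """Two-step exploration: the optimal step length paired with every fixed
    steering angle, plus one steering-angle-optimized child appended last."""
    x_c, y_c, phi_c = current
    children = [expand_node(x_c, y_c, phi_c, d_s, best_step, steer, wheelbase)
                for steer in fixed_steer_set]
    # second step: optimal step length combined with the optimal steering angle
    children.append(expand_node(x_c, y_c, phi_c, d_s, best_step, best_steer,
                                wheelbase))
    return children
```

With N.sub.3 fixed angles, the resulting set holds N.sub.3 + 1 children, matching the description above.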
Node Evaluation Method
[0095] After obtaining the child node set N, the child nodes within the set need to be evaluated. Specifically, collision detection is first performed on all child nodes in the child node set N by covering the mining truck with two enveloping circles and sampling along the path from the current node to the explored child node (a smooth circular arc generated by the turning radius corresponding to the steering angle). It is determined whether the distance to any obstacle grid is smaller than the radius of the enveloping circles; if so, the child node is considered infeasible and is removed from the child node set N;
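The two-circle collision test can be sketched as below; the circle offsets and the obstacle-grid representation (a set of occupied cell centers) are assumptions for illustration, and the arc sampling mentioned above is simplified to checking the child pose itself.

```python
import math

def envelope_centers(x, y, phi, offsets=(0.5, 3.0)):
    """Centers of the two enveloping circles, placed along the heading at the
    given offsets from the rear axle (offsets are illustrative)."""
    return [(x + d * math.cos(phi), y + d * math.sin(phi)) for d in offsets]

def collides(pose, obstacle_cells, radius):
    """True if either enveloping circle comes closer than `radius` to any
    occupied grid cell center."""
    x, y, phi = pose
    return any(math.hypot(cx - ox, cy - oy) < radius
               for cx, cy in envelope_centers(x, y, phi)
               for ox, oy in obstacle_cells)

def prune_infeasible(children, obstacle_cells, radius):
    """Drop infeasible children. (The patent additionally samples intermediate
    poses along the connecting arc; that sampling is omitted here.)"""
    return [c for c in children if not collides(c, obstacle_cells, radius)]
```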
[0096] On this basis, the cost value of all explored child nodes is calculated using f(N.sub.s)=g(N.sub.s)+w.sub.hh(N.sub.s), where g(N.sub.s) represents the actual consumption cost of the mining truck moving from the starting point to the explored child node N.sub.s, h(N.sub.s) denotes the predicted cost from the explored child node to the target point, and w.sub.h is the weight of the predicted cost. When designing the cost function g(N.sub.s), considerations are given to operations such as reversing and direction changes during the movement of the mining truck, which typically consume more time and energy. Therefore, this paper comprehensively incorporates factors such as reversing penalty, direction-switching penalty, and path length into the cost function to evaluate the quality of the nodes.
[0097] Wherein the actual consumption cost g(N.sub.s) is:
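The formula for g(N.sub.s) is not preserved in this text; assembling the five weighted metrics enumerated in the next paragraph, it presumably takes the form:

```latex
g(N_s) = g(N_c) + w_1\, g_{dis}(N_s) + w_2\, g_{back}(N_s) + w_3\, g_{switch}(N_s) + w_4\, g_{steer}(N_s) + w_5\, g_{change}(N_s)
```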
[0098] In the above formula, g(N.sub.s) incorporates five metrics based on the cost of the current node g(N.sub.c): g.sub.dis(N.sub.s) denotes the distance from the current node N.sub.c to the child node N.sub.s in the iterative search; g.sub.back(N.sub.s) represents the reversing cost; g.sub.switch(N.sub.s) indicates the mode switch cost; g.sub.steer(N.sub.s) denotes the steering cost; g.sub.change(N.sub.s) represents the steering change cost; and w.sub.i, where i=1, . . . ,5, are the weight coefficients.
[0099] If the child node is obtained through vehicle reversing exploration, a reversing cost g.sub.back(N.sub.s) is added to the cost function, typically as a relatively large constant cost; when the vehicle's movement direction is opposite to that in the previous search round, a mode switch cost g.sub.switch(N.sub.s) is added to the cost function, generally as a relatively large constant; if the steering angle used in the current exploration is non-zero, a steering cost g.sub.steer (N.sub.s) is added, the magnitude of which is proportional to the absolute value of the steering angle applied; when the steering angle used in the current search differs from that in the previous round, a steering change cost g.sub.change(N.sub.s) is added, the magnitude of which is proportional to the absolute value of the change in the steering angle. The heuristic function h(N.sub.s) is the estimated cost from the current node to the destination. In this paper, a heuristic function considering obstacles is used, specifically employing the A* method to compute the distance between the current node and the destination.
(3) Loading and Parking Path Generation Method Based on Reeds-Shepp Curve
[0100] When the distance between the current node N.sub.c(x.sub.c, y.sub.c, φ.sub.c) and the target point N.sub.g is less than a threshold L.sub.t, a plurality of candidate loading and parking path curves from the current node N.sub.c(x.sub.c, y.sub.c, φ.sub.c) to the target point N.sub.g are generated using the Reeds-Shepp curve. The node costs along the curves are calculated using Formula (3), and the curves are sorted based on their costs. The path with the minimum cost is selected, and the global path is obtained through backtracking. If all candidate loading and parking path curves result in collisions, the process returns to the node exploration step.
II. Environmental State Space Modeling Module 2 for Environment State Space Modeling Considering Obstacle Regional Division
[0101] The environmental state space modeling module 2 is configured to perform regional division of obstacles surrounding the current node and conduct environmental state space modeling.
[0102] In this embodiment, the environmental state space modeling module 2 is specifically as follows:
Obstacle Regional Division Method
[0103] As shown in
Environmental State Space Modeling
[0104] Deep reinforcement learning determines the optimal action based on the input state space. Therefore, to enhance the generalization capability of the deep reinforcement learning model, it is necessary to consider the distance information between the current node and surrounding obstacles, as well as the relative positional information between the current node, the starting point, and the target point. Specifically, the state space S is designed as follows:
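The definition of S is missing from this text; concatenating the components enumerated in the next paragraph gives, presumably:

```latex
S = \big[\, S_{position},\; d_{start},\; \theta_{start},\; d_{goal},\; \theta_{goal},\; \varphi_{goal},\; N_{obs},\; d_{obs_1}, \ldots, d_{obs_8} \,\big]
```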
[0105] Wherein S.sub.position represents the coordinates of the current node, d.sub.start denotes the distance from the starting point relative to the current node, θ.sub.start indicates the relative angular orientation of the starting point in a coordinate system with the current node as the origin and the heading direction as the x-axis, d.sub.goal denotes the distance from the target point relative to the current node, θ.sub.goal indicates the relative angular orientation of the target point in that coordinate system, φ.sub.goal represents the direction of the target loading position in that coordinate system, N.sub.obs denotes the number of obstacles within a given range of the current node, and d.sub.obs.sub.i denotes the minimum distance from the current node to obstacles within sector D.sub.i, for i=1, . . . , 8.
III. Deep Learning Parameter Optimization Module 3 for Deep Reinforcement Learning-Based Optimization
[0106] The deep learning parameter optimization module 3 is configured to construct a deep learning network to calculate the optimal step length and the optimal steering angle, build a reward function to optimize the deep learning network, and simultaneously execute the training process.
[0107] In this embodiment, the deep learning parameter optimization module 3 is specifically as follows:
(1) Deep Learning Network Construction
[0108] The DQN algorithm is employed to train the deep learning network. The action space consists of combinations of candidate optimal steering angles δ.sub.rl and candidate optimal step lengths l.sub.rl during expansion, i.e., the action space comprises all possible combinations of (δ.sub.rl, l.sub.rl).
[0109] For example,
[0110] Where l.sub.min and l.sub.max are the minimum and maximum exploration step lengths, respectively, and can be adjusted. The action space consists of all possible combinations of δ.sub.rl and l.sub.rl, resulting in a total of 17×4=68 possible actions.
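Only the counts (17 steering angles, 4 step lengths, 68 actions) are stated in the text; the numeric ranges below are illustrative assumptions used to reproduce the enumeration:

```python
import itertools

def build_action_space(n_steer=17, n_step=4,
                       steer_max=0.6, l_min=2.0, l_max=5.0):
    """All (steering angle, step length) combinations. Angles are sampled
    uniformly in [-steer_max, steer_max]; step lengths in [l_min, l_max].
    The ranges are assumptions, only the counts come from the text."""
    steers = [-steer_max + 2.0 * steer_max * i / (n_steer - 1)
              for i in range(n_steer)]
    steps = [l_min + (l_max - l_min) * j / (n_step - 1)
             for j in range(n_step)]
    return list(itertools.product(steers, steps))

actions = build_action_space()
```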
[0111] The DQN algorithm utilizes two networks with identical structures but different parameters for training: a training network Q used to compute the Q-value for policy selection and iteratively update the Q-values, and a target network Q′ used to compute the Q-value of the next state in the temporal difference target (TD Target). The loss function Loss of the DQN algorithm is designed as follows:
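The loss formula is not preserved here; the standard DQN loss consistent with the description (training network Q, target network Q′, discount factor γ) is:

```latex
Loss = \mathbb{E}\Big[\big(r_i + \gamma \max_{a'} Q'(s'_i, a') - Q(s_i, a_i)\big)^2\Big]
```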
[0112] Wherein (s.sub.i, a.sub.i, r.sub.i, s′.sub.i) represents a set of state transition data obtained during training, including the current state s.sub.i, the current action a.sub.i, the reward r.sub.i obtained after taking the action a.sub.i, and the next state s′.sub.i obtained after interacting with the environment by taking the action; γ is an adjustable discount factor;
[0113] Both the target network Q′ and the training network Q are constructed using three fully connected layers, each containing 32 neurons. The outputs of the first two fully connected layers are fed into an activation function before being passed to the next fully connected layer, with the PReLU activation function being employed. The final fully connected layer directly outputs the Q-value for each action, i.e., for each combination of steering angle and step length. The steering angle and step length with the highest Q-value are ultimately selected as the optimized exploration parameters, namely the optimal step length and the optimal steering angle.
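The network structure just described can be sketched in pure Python as below. The layer sizes follow the text (two hidden layers of 32 neurons with PReLU, then a linear output per action); the input dimension, weight initialization, and PReLU slope are illustrative assumptions.

```python
import random

def prelu(x, a=0.25):
    """PReLU activation; the slope a=0.25 is an assumed default."""
    return x if x >= 0 else a * x

def linear(inputs, weights, biases):
    """One fully connected layer; weights is a [out][in] matrix."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

class QNetwork:
    """Three fully connected layers (32, 32, n_actions), PReLU after the
    first two, linear Q-value output -- mirroring the structure described
    above. Sizes and initialization are illustrative."""
    def __init__(self, state_dim, n_actions, hidden=32, seed=0):
        rng = random.Random(seed)
        dims = [state_dim, hidden, hidden, n_actions]
        self.layers = []
        for din, dout in zip(dims, dims[1:]):
            w = [[rng.uniform(-0.1, 0.1) for _ in range(din)]
                 for _ in range(dout)]
            b = [0.0] * dout
            self.layers.append((w, b))

    def q_values(self, state):
        h = state
        for i, (w, b) in enumerate(self.layers):
            h = linear(h, w, b)
            if i < len(self.layers) - 1:   # PReLU on the first two layers only
                h = [prelu(v) for v in h]
        return h

    def best_action(self, state):
        """Index of the (steering angle, step length) pair with the highest Q."""
        q = self.q_values(state)
        return max(range(len(q)), key=lambda i: q[i])
```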
(2) Reward Function Construction
[0114] To train and optimize the deep reinforcement learning network, it is essential to design a reasonable reward function and refine the strategy through rewards. Specifically, since path planning is an iterative search process, the deep reinforcement learning network outputs an action and receives a corresponding reward during each exploration round. The design of this reward function primarily considers guiding the mining truck to reach the target point quickly, reducing the number of iteration rounds, and maintaining a safe distance from obstacles. The designed reward function includes a target approach reward r.sub.g, an obstacle avoidance reward r.sub.o, an exploration cost r.sub.t, and a smoothness reward r.sub.s.
[0115] r.sub.g is the target approach reward, designed to guide the mining truck toward the destination. Accordingly, a positive reward is given when the mining truck moves closer to the target point, while a penalty is imposed when it moves farther away. Furthermore, when the Reeds-Shepp curve in the current round successfully connects to the destination, the mining truck is considered to have reached the target point, and a fixed arrival reward r.sub.success is granted.
[0116] The target approach reward r.sub.g is defined as follows:
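The formula for r.sub.g does not survive in this text; from the description in the surrounding paragraphs it presumably reads:

```latex
r_g = \begin{cases} r_{success}, & \text{if the Reeds-Shepp curve connects to } N_g \\ w_g\,(l_c - l_s), & \text{otherwise} \end{cases}
```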
[0117] Wherein w.sub.g is an adjustable weight, l.sub.c is the Euclidean distance from the current node N.sub.c to the target point N.sub.g in the current iteration round, l.sub.s is the Euclidean distance from the steering angle-optimized child node N.sub.best to the target point N.sub.g, and r.sub.success is a fixed reward granted when the Reeds-Shepp curve in the current round successfully connects to the destination, indicating that the mining truck has reached the target point. Triggering the Reeds-Shepp curve signifies successful path generation and thus yields a relatively large reward. Conversely, if the Reeds-Shepp curve fails to trigger, further exploration is still required. During node exploration, it is desirable for the nodes generated by the optimized exploration parameters to be as close as possible to the target point. Therefore, a reward component is introduced based on the difference between the distance from the current node to the target point l.sub.c and the distance from the child node to the target point l.sub.s. If the child node generated by the optimized exploration parameters moves farther from the target point, this reward component is negative; otherwise, it is positive.
[0118] r.sub.o is the obstacle avoidance reward, designed to prevent the mining truck from getting too close to surrounding obstacles and causing collisions. When designing the obstacle avoidance reward function, the safety status of the mining truck is classified into four conditions based on the distance d.sub.obs.sub.i between the mining truck and the obstacles in each sector.
[0119] The obstacle avoidance reward r.sub.o is defined as follows:
[0120] Wherein r.sub.o.sub.i represents the obstacle avoidance reward in the i-th sector, and w.sub.1 and w.sub.2 are adjustable weight coefficients. A distance threshold d.sub.c is designed, where d.sub.obs.sub.
[0121] The exploration cost r.sub.t is defined as follows:
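The formula is missing here; per the following paragraph it is presumably a fixed negative per-step cost:

```latex
r_t = -\,TimeConstant
```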
[0122] Wherein TimeConstant is a fixed penalty cost constant set for each step, guiding the mining truck to approach the destination more rapidly and preventing meaningless exploration, with the cost set as a negative value;
[0123] Since steering changes in the mining truck incur additional travel costs, a smoothness reward r.sub.s is set in this invention to encourage minimizing steering wheel adjustments. The smoothness reward r.sub.s is defined as follows:
[0124] Wherein δ.sub.c represents the steering angle corresponding to the current node N.sub.c generated in the current search iteration round, δ.sub.rl corresponds to the optimal steering angle generated by the deep reinforcement learning network in the current search iteration round, l.sub.rl represents the optimal step length generated by the deep reinforcement learning network in the current search iteration round, and w.sub.3 and w.sub.4 are adjustable coefficients;
[0125] The final reward function is as follows:
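The combined formula is not preserved; summing the four components defined above (each already carrying its own weights) gives, presumably:

```latex
r = r_g + r_o + r_t + r_s
```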
(3) Deep Reinforcement Learning Training Process
[0126] As shown in
Table 1: DQN Algorithm
1. Initialize the training network Q with random parameters.
2.
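The listing in Table 1 is truncated in this text, but the training loop it describes (interaction, replay buffer, minibatch updates of the training network, periodic copying to the target network) can be sketched as follows. A tabular Q-function stands in for the neural network, and all hyperparameter values are illustrative assumptions.

```python
import random
from collections import deque

def dqn_train(reset, env_step, n_states, n_actions, episodes=30,
              max_steps=100, gamma=0.9, eps=0.5, lr=0.1, batch=16,
              copy_every=20, buffer_size=1000, seed=0):
    """Skeleton of the described training process: collect transitions into a
    replay buffer, sample random minibatches to update the training Q-function
    toward the TD target, and periodically sync the target Q-function."""
    rng = random.Random(seed)
    q_train = [[0.0] * n_actions for _ in range(n_states)]
    q_target = [row[:] for row in q_train]
    buffer = deque(maxlen=buffer_size)
    steps = 0
    for _ in range(episodes):
        s = reset()
        for _ in range(max_steps):
            if rng.random() < eps:                      # epsilon-greedy action
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q_train[s][i])
            s2, r, done = env_step(s, a)
            buffer.append((s, a, r, s2, done))          # store transition
            s = s2
            if len(buffer) >= batch:                    # minibatch TD update
                for si, ai, ri, s2i, di in rng.sample(list(buffer), batch):
                    td = ri if di else ri + gamma * max(q_target[s2i])
                    q_train[si][ai] += lr * (td - q_train[si][ai])
            steps += 1
            if steps % copy_every == 0:                 # sync target network
                q_target = [row[:] for row in q_train]
            if done:
                break
    return q_train
```

The `reset`/`env_step` callbacks abstract the path-planning environment; in the patent's setting a "step" is one node-expansion round and the reward is the function defined in the previous section.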
Second Embodiment
[0127] As shown in
[0132] A computer-readable storage medium storing computer code, wherein when the computer code is executed, the method as described above is performed. Those of ordinary skill in the art may understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical discs, etc.
[0133] The foregoing descriptions are merely preferred embodiments of the present invention, and the scope of protection of the present invention is not limited to the above embodiments. All technical solutions under the concept of the present invention shall fall within the scope of protection of the present invention. It should be noted that for those skilled in the art, several improvements and modifications made without departing from the principles of the present invention should also be considered as within the scope of protection of the present invention.
[0134] The technical features of the embodiments described above can be arbitrarily combined. For the sake of brevity, not all possible combinations of the technical features in the above embodiments have been described. However, as long as there is no contradiction in the combination of these technical features, they should be considered as falling within the scope of this specification.
[0135] It should be noted that the above embodiments can be freely combined as needed.