Method and system to count movements of persons from vibrations in a floor
10810481 · 2020-10-20
Inventors
Cpc classification
F41A17/066
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
G06K7/10297
PHYSICS
F41A17/063
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
F41A17/06
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
G06V40/25
PHYSICS
G06V40/10
PHYSICS
G06V20/53
PHYSICS
G06K17/0022
PHYSICS
International classification
G06M1/27
PHYSICS
G06K17/00
PHYSICS
G06K7/10
PHYSICS
Abstract
A system and method for counting persons using passages to an area by analyzing vibrations in the floor or the air above the floor with sensors and a machine learning system. The machine learning system uses a model, usually implemented as a neural network on a processor. The network is trained in levels and implemented in layers. Different levels classify and analyze vibrations by timing and frequency, by movements of persons, and by identity of persons. The same person is identified by patterns in the vibrations and the vibrations are correlated to determine and count when a person uses a combination of passages. Location information for the person is used to identify persons in places and doing activities of interest. The model may be trained on one processor and downloaded to another processor for evaluation. Additional sensors and levels of training may be implemented on the latter processor.
Claims
1. A system for counting persons comprising: (a) a sensor to produce signals from vibrations in at least one of a floor and the air above the floor with the floor connecting a set of at least three areas each of which a person may use to at least one of enter the floor and exit the floor; (b) a processor to evaluate a machine learning model wherein: the model comprises a first layer wherein outputs of the layer are generated from an evaluation of the sensor signals on the basis of training to classify events based on timing and frequency determinations, the model comprises a second layer to evaluate outputs of the first layer to produce an output identifying a multiplicity of locations of origin of signals produced by a particular person on the basis of training from signals collected from the sensor, the processor produces an output of the model on the basis of the locations which signifies that a person has entered using a particular one of the areas of the set and the person has exited using a particular one of the areas of the set; and (c) an output device for at least one of signaling a count produced on the basis of the output of persons which have entered by a particular one of the areas of the set and exited by a particular one of the areas of the set and of displaying a count of persons which have entered by a particular one of the areas of the set and exited by a particular one of the areas of the set.
2. The system of claim 1 wherein: the count is restricted to persons who have been identified on the basis of signals from the sensor as being in a portion of the floor during a time range.
3. The method of claim 1 wherein: the time range is determined by an event other than vibrations in the floor.
4. The system of claim 1 wherein: the second layer of the model comprises two sub-layers with one sub-layer of the second layer receiving outputs signifying locations of persons from a second sub-layer of the second layer of the model and producing outputs signifying that the same person has produced vibrations at a multiplicity of the signified locations.
5. The system of claim 1 wherein: a second processor trains the model, the model is downloaded to said first processor, and said first processor evaluates inputs from the sensor and produces the output of the model.
6. The system of claim 5 wherein: said first processor trains the model subsequent to the training of the second processor.
7. The system of claim 1 wherein: a second sensor to detect vibrations in at least one of the floor and the air above the floor provides data used to train the model.
8. A system for counting persons comprising: (a) a sensor to produce signals from vibrations in at least one of a floor and the air above the floor connecting a set of at least two areas each of which a person may use to at least one of enter the floor and exit the floor; (b) a processor to evaluate a machine learning model wherein: the model comprises a first layer wherein outputs of the layer are generated from an evaluation of the sensor signals on the basis of training to classify events based on timing and frequency determinations, the model comprises a second layer to evaluate outputs of the first layer to produce an output identifying a multiplicity of locations of origin of signals produced by a particular person on the basis of training from signals collected from the sensor, and the processor produces an output of the model on the basis of the locations which signifies that a person has entered using a particular one of the areas of the set and the person has exited using a particular one of the areas of the set; and (c) an output device for at least one of signaling a count of a set of persons produced on the basis of the output wherein the set of persons consists of persons who have entered by a particular one of the areas of the set of areas and exited by a particular one of the areas of the set of areas and of displaying a count of persons which have entered by a particular one of the areas of the set of areas and exited by a particular one of the areas of the set of areas, and wherein the set of persons is limited to persons who have been identified as having been in a specific area of the floor in a specific time range.
9. The system of claim 8 wherein: the time range is determined by an event other than the vibrations.
10. The system of claim 8 wherein: the second layer of the model comprises two sub-layers with one sub-layer of the second layer receiving outputs signifying locations of persons from a second sub-layer of the second layer of the model and producing outputs signifying that the same person has produced vibrations at a multiplicity of the signified locations.
11. The system of claim 8 wherein: a second processor trains the model, the model is downloaded to said first processor, and said first processor evaluates inputs from the sensor and produces the output of the model.
12. The system of claim 11 wherein: said first processor trains the model subsequent to the training of the second processor.
13. The system of claim 11 wherein: a second sensor to detect vibrations in at least one of the floor and the air above the floor provides data used to train the model.
14. A system for locating persons comprising: (a) a sensor to produce signals from vibrations in at least one of a floor and the air above the floor connecting a set of at least two areas at least one of which a person may use to at least one of enter the floor and exit the floor; (b) a processor to evaluate a machine learning model wherein: the model comprises a first layer wherein outputs of the layer are generated from an evaluation of the sensor signals on the basis of training to classify events based on timing and frequency determinations, the model comprises a second layer to evaluate outputs of the first layer to produce an output identifying a multiplicity of locations of origin of signals produced by a particular person on the basis of training from signals collected from the sensor, and the processor produces an output of the model on the basis of the locations which signifies that a person has entered using a particular one of the areas of the set and the person has exited using a particular one of the areas of the set; and (c) an output device for indicating the presence of the person in one of the areas in a specific time range.
15. The system of claim 14 wherein: the time range is determined by an event other than the vibrations.
16. The system of claim 14 wherein: the second layer of the model comprises two sub-layers with one sub-layer of the second layer receiving outputs signifying locations of persons from a second sub-layer of the second layer of the model and producing outputs signifying that the same person has produced vibrations at a multiplicity of the signified locations.
17. The system of claim 14 wherein: a second processor trains the model, the model is downloaded to said first processor, and said first processor evaluates inputs from the sensor and produces the output of the model.
18. The system of claim 17 wherein: said first processor trains the model subsequent to the training of the second processor.
19. The system of claim 17 wherein: a second sensor to detect vibrations in at least one of the floor and the air above the floor provides data used to train the model.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING
(1) The features and advantages of the various embodiments disclosed herein will be better understood with respect to the drawing in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
DETAILED DESCRIPTION OF THE INVENTION AND EMBODIMENTS
(12) Definitions
(13) The definitions given in this section are intended to apply throughout the specification and in the claims.
(14) A machine learning model is a data structure such as a neural net which has been trained to process inputs to recognize significant patterns.
(15) A layer of a machine learning model is a portion of the model with inputs from input data to be evaluated and/or from previous layers of the model and with outputs from the model or to later layers of the model.
(16) A level of a machine learning system is a group of training operations which enable one or more layers to produce outputs that have a pattern correlated to the inputs and that serve as inputs for further processing or as output from the system. A subsequent level of training would use these outputs as inputs to one or more layers, so that the layers trained by the subsequent level produce outputs that provide further analysis.
(17) Sensors
(18) There are many kinds of sensors available on the marketplace which can provide information to train a pattern recognition system or to be examined for patterns. Some of these examine the immediate situation at the identity token and measure factors such as acceleration including the direction of gravity, rotation, and even temperature. Others are active or passive devices to measure available information about the environment. They can work by a wide variety of information transmission modes such as infrared, radio, capacitance, visible light, sound or other means.
(19) Small, fast and cheap three dimensional accelerometers are widely used and available in the marketplace. They can provide a rich amount of data as a function of movements in each plane and about orientation relative to gravity. In an application where one hundred percent accuracy is not necessary, they will allow simple designs for identification devices. An accelerometer can also be used to communicate with a device by means of moving the device in predetermined patterns to change modes, set parameters, etc.
(20) Gyroscopes which measure the rate of rotation in one or more planes are also available and provide a substantial amount of additional information. They are able to work in a sealed device, as are accelerometers and various other types of sensors. Working in combination with accelerometers, gyroscopes can distinguish angular rotations in vertical and horizontal planes. The patterns of movements in various planes are a rich source of patterns which can be used to distinguish transfers of an object between different persons and mere movements by a single object possessor.
(21) Sound and vibration sensors are helpful in many embodiments. They can detect patterns such as existence of conversations between persons who might transfer a token. They can detect background noise which is correlated with location in many venues. Patterns in background or other vibration can be correlated to many categories of useful information. Various categories of vibration generation and modification are important in providing such information. One category is vibrations which are generated by objects of interest, another category is vibrations which have been modified by environmental conditions to yield additional information and a third category is background vibrations which can provide information for comparison and can be modified by objects of interest to provide information about these objects. An example of the second category is the comparison of direct and reflected receipt of a vibration by a sensor to indicate the location of an object. An example of the third category is object location by detection of background vibrations scattered by that object.
(22) Pattern Matching Software
(23) Various types of systems for pattern matching have been developed, but implementations as neural nets are rapidly replacing most older methods because of the straightforward method of development and because of effectiveness. Neural nets require substantial amounts of processing for training but once trained are easily implemented in devices for use and very quick to evaluate particular cases.
(24) The problem to be solved in most embodiments of the current invention is to classify at each time the inputs of the available sensors which have been gathered over a portion of the preceding time periods into two classes based on whether or not a suspected transfer of the token has occurred. Because of this structure of the problem an LSTM or GRU recurrent layer is appropriate. This allows for learning to take into account both short and long-term time-based features of the sensor input.
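As a rough illustration of the recurrent approach described above, the following minimal sketch steps a single GRU cell over a window of sensor samples and reads out one probability that the event of interest has occurred. The class name TinyGRU, the layer sizes, the random untrained weights and the omission of bias terms are assumptions made for illustration only and are not taken from this specification; a practical system would obtain the weights by training.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyGRU:
    """Hypothetical single GRU cell with a logistic read-out (bias terms omitted)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        self.Wz, self.Uz = mat(n_hidden, n_in), mat(n_hidden, n_hidden)  # update gate
        self.Wr, self.Ur = mat(n_hidden, n_in), mat(n_hidden, n_hidden)  # reset gate
        self.Wh, self.Uh = mat(n_hidden, n_in), mat(n_hidden, n_hidden)  # candidate state
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]       # read-out weights

    def step(self, x, h):
        """Advance the hidden state h by one sensor sample x."""
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, gated))]
        # Blend the old state and the candidate state using the update gate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def classify(self, window):
        """Return a value in (0, 1) for the whole window of samples."""
        h = [0.0] * self.n_hidden
        for x in window:
            h = self.step(x, h)
        return sigmoid(sum(vi * hi for vi, hi in zip(self.v, h)))


# Three time steps of a made-up three-channel sensor reading.
window = [[0.1, -0.2, 0.05], [0.4, 0.1, -0.3], [0.0, 0.0, 0.2]]
probability = TinyGRU(n_in=3, n_hidden=4).classify(window)
```

Because the hidden state is carried from sample to sample, the read-out can depend on both recent and older samples in the window, which is the property the recurrent layer is chosen for.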
(25) Implementation of neural networks and other structures for pattern matching is now a well-known art. Courses for teaching the methods needed are available online to be audited by anyone at Stanford University and MIT. Course assignments are comparable in complexity to the required effort to implement most embodiments of the methods herein.
(26) An important advantage of pattern matching by neural network is that it is not necessary for the implementer to understand or find patterns. The task in designing such a system is to provide a rich source of inputs that are correlated with the desired states to be distinguished. The correlation does not have to be with each input but can be with an unknown function of many inputs.
(27) Pattern Matching For Acoustics
(28) Methods for detection of acoustic scenes and events have become well known in the artificial intelligence community. Many papers and explanations of such methods are available from the Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016) held by the Tampere University of Technology of Finland. The kinds of events to have patterns recognized in the tracking of possession of tokens are of the same structure as acoustic events and can be handled by the same methods. It would often be helpful to make simple adjustments in the methods such as adopting an appropriate time scale and adapting the preprocessing to the sensors used.
(29) Training in Levels
(30) Models can be arranged in levels both for training and for evaluation of inputs. The application of the model to a set of inputs generates outputs that describe in a higher level of generality the meaning of the inputs. Those outputs can become inputs to further structure which is a model for a more general transformation of the original inputs toward meaningful outputs.
(31) In this specification and in the claims, a level of training is the training of a portion of the parameters of a model to produce outputs that are trained until a state of convergence is attained and made available for input to the next portion of the model. That is, distinct levels are made distinct by separate training to convergence. It is possible to simultaneously train multiple levels, but they are distinct levels when they are separately tested for convergence. A level that is not tested for convergence, but which uses inputs from a level that has been brought to convergence, is a distinct level from the level providing the inputs.
(32) Typical models are in at least four levels. The first which here is called the Basic level takes raw sensor input and describes it in terms directly definable based on the input data. Examples would be detection of edges from visual data and of tones, harmonics and burst timings for audible data. The second level which is here called the General level is to identify objects and events from the output of the first level. Examples would be to detect a person crossing the path of the sensor or identifying a sound as a gunshot or crowd noise. The third level, herein called the Specific level is to allow the model to identify actions and objects appropriate to the purpose of use of the model. Examples of this level include model layers to implement steering or acceleration of a vehicle or determination of compliance with a standard in a specific type of situation. There is also a fourth level called the In-Use level in many implementations. This level incorporates data collected while a model is in use which modifies the model to allow evaluations at a later time to take into account earlier inputs or evaluations where a series of evaluations is made.
(33) Implementation of Training on a Processor With a Memory
(34) Training requires a very large amount of processing to apply the large amount of data in the training set repeatedly to incrementally cause the model to converge on the desired behavior. If the adjustments from one pass through the data are too large, then the model may not converge or may not allow the effects of all of the inputs to diffuse through the model structure and correctly operate. For this reason, specialized very powerful processors are used for training. They are not appropriate for incorporation in portable devices because of considerations of size and expense.
(35) Basic Training
(36) In this specification and in the claims, basic training refers to training which is used to interpret inputs from sensors or raw data from data sources to identify aspects of objects and actions treated as objects that are implied by the data and too general in nature to identify the potentially detected objects at this stage. Examples include edge detection, categorization of sounds by location of the source, face detection, orientation in space, counting objects, elimination of backgrounds and many other general tasks of interpretation.
(37) A portion of a machine learning model with this training can be used for many applications and could be supplied by a specialized developer. Its training would be brought to convergence and the outputs supplied to the next level of training when the model is used to evaluate inputs either for further training of other levels or in actual use.
(38) Data For General Training Describing the Area of Application of a Model
(39) In this specification and in the claims, general training refers to training which is accomplished after a model has received basic training. The general training uses data that is representative of members of classes of objects which are to be identified by the use of the generally trained model. The focus of general training is to identify specific objects and actions in the general classes. Examples of data and training for the current application include identifying footsteps, persons, location patterns and details of such entities that are part of a signature of a specific member of these classes.
(40) Transferring a Trained Model
(41) It is well known in the art of developing machine learning models that training the model requires much more time and processing power than using a model to evaluate inputs. Because training only has to be done once or a limited number of times and many evaluations can be performed by the same trained model, it is practical to use specialized powerful processors over a long period of time to do the training, and then to download the model to a compact processor that can be taken to the place where evaluations are wanted and to perform evaluations to be used in real or limited time.
(42) Data for Detecting Movements and Identity of a Person with a Trained Model
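The transfer described above can be pictured as moving a set of trained parameters from one processor to another. The sketch below is only an illustration under stated assumptions: the JSON layout, the function names export_model, import_model and evaluate, and the single linear unit standing in for a model are all hypothetical and are not prescribed by this specification.

```python
import json


def export_model(parameters, hyperparameters):
    """Serialize a trained model on the training processor (hypothetical format)."""
    return json.dumps({"hyperparameters": hyperparameters, "parameters": parameters})


def import_model(payload):
    """Rebuild the parameter set on the evaluating processor."""
    model = json.loads(payload)
    return model["parameters"], model["hyperparameters"]


def evaluate(parameters, features):
    """Evaluate one linear unit; this costs far less than the training that produced it."""
    return parameters["bias"] + sum(w * x for w, x in zip(parameters["weights"], features))


# Values that would come out of training (made up for the example).
trained = {"weights": [0.5, -0.25], "bias": 0.125}
payload = export_model(trained, {"layers": 1})

# On the compact processor: load once, then evaluate as often as needed.
parameters, hyper = import_model(payload)
score = evaluate(parameters, [2.0, 4.0])
```

Only the parameter values and the structural settings cross between processors; the training data and the training procedure stay on the training side.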
(43) Data for detecting specific activities or objects can be gathered by a sensor after the model is downloaded to an evaluating processor. The data can be preprocessed by means other than machine learning models. Both analog and digital preprocessing based on understanding of the nature of specific collected signals will produce inputs that are more easily processed by the machine learning model. Even the early layers of the model can be so simplified. As the data is processed into inputs for detection of complex and subtle patterns, use of machine learning models becomes the most practical, or the only practical, way to continue toward the desired outputs. This is especially true when human understanding of the details of pattern identification is not available, so that specific algorithms cannot be manually created.
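One simple form of digital preprocessing by means other than machine learning is to cut the sampled signal into short frames and describe each frame by a few numbers. The sketch below computes mean energy and zero-crossing rate per frame. The choice of these two features, the function name frame_features and the sample values are assumptions for illustration; the specification does not name particular preprocessing features.

```python
def frame_features(samples, frame_len):
    """Split a sampled signal into non-overlapping frames and summarize each one.

    Returns a list of (energy, zero_crossing_rate) pairs, one per full frame.
    frame_len must be at least 2.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # A crossing is a change of sign between two neighboring samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
        features.append((energy, crossings / (frame_len - 1)))
    return features


# A quiet floor followed by a short oscillating burst (made-up sample values).
signal = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
features = frame_features(signal, frame_len=4)
```

The quiet frame yields zero energy and no crossings while the burst yields high values for both, so a later model layer receives two numbers per frame instead of every raw sample.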
(44) Acquisition of In-Use Data With a Sensor For Further Training.
(45) While training as described above may require special processors, the fact that training can be organized in levels and the model in layers allows additional training of some layers in a level implemented after the model is transferred to the evaluation processor. The training can be done with data acquired at a later time. This data is referred to as in-use data. This allows training for patterns associated with specific persons. This limited amount of training may be practical for implementation on a processor sized primarily for the evaluation function.
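A minimal sketch of such in-use training follows, assuming that the earlier layers are frozen and that only one small final layer is adjusted on the evaluation processor. The function frozen_features, the class InUseLayer, the choice of a single logistic unit, the learning rate and the sample values are all hypothetical and serve only to show how light this last level of training can be.

```python
import math


def frozen_features(raw):
    """Stand-in for the earlier, already converged layers; never modified here."""
    step_interval, stride = raw
    return [1.0, step_interval, stride]


class InUseLayer:
    """Hypothetical final layer: one logistic unit trained on the evaluation processor."""

    def __init__(self, n_inputs, learning_rate=0.5):
        self.weights = [0.0] * n_inputs
        self.learning_rate = learning_rate

    def predict(self, features):
        z = sum(w * f for w, f in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        """One stochastic-gradient step on the log loss for a single labeled sample."""
        error = self.predict(features) - label
        self.weights = [w - self.learning_rate * error * f
                        for w, f in zip(self.weights, features)]


layer = InUseLayer(n_inputs=3)
# In-use samples: (step interval in seconds, stride in meters); label 1 marks
# the specific person of interest, label 0 somebody else. Values are made up.
in_use_data = [((0.45, 0.90), 1), ((0.75, 0.55), 0)] * 200
for raw, label in in_use_data:
    layer.update(frozen_features(raw), label)

p_same = layer.predict(frozen_features((0.45, 0.90)))
p_other = layer.predict(frozen_features((0.75, 0.55)))
```

Each update touches only three weights, which is the kind of workload that fits a processor sized primarily for evaluation.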
DETAILED DESCRIPTION OF THE DRAWING AND CERTAIN EMBODIMENTS
(46) Referring to
(47)
(48) Pattern analysis by means of a neural net does not require understanding by the implementer of the particular patterns being recognized, but understanding of the type of patterns can be useful. In the current case preprocessing in early layers of the neural net or in dedicated circuitry can make developing an accurately converging neural net much easier. Inputs at a certain layer of the net can represent signals with defined delays between different pairs of sensors.
(49) In the depicted embodiment vibrations or acoustic signals 13 are shown from the footstep of a person 12 radiating to the three depicted sensors. One signal in particular 14 is shown reflecting in two bounces to join with a direct signal at a particular sensor. Complexity in the reflecting surfaces enriches the patterns in the sensed vibrations, which assists the neural network in finding the needed patterns.
(50)
(51) Referring to
(52)
(53) Referring to
(54) In
(55) A hall 20 similar to that of
(56) Referring to
(57) An area 20 similar to
(58) The purpose of having the depicted embodiment in the current case is to track the patterns of travel of customers buying tickets. In particular, the aim is to distinguish customers who enter from the lot 38, pass by path 33 to the ticket machine and then return to the parking lot from passengers who enter from the parking lot, buy a ticket and then use path 34 and portal 32 to take a train immediately. It may not be sufficient to correlate tickets sold to tickets used because the tickets of interest may be for future travel.
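Once the model has attributed a sequence of locations to each person, the counting itself can be ordinary non machine learning code. The sketch below counts persons by the pair of areas used to enter and to exit, restricted to persons who were located at the ticket machine. The function name count_routes, the track format and the area labels are hypothetical stand-ins for the outputs of the model and for the lot, ticket machine and portal of the depicted embodiment.

```python
from collections import Counter


def count_routes(tracks, required_zone=None):
    """Count (entry area, exit area) pairs over per-person location tracks.

    Each track is the time-ordered list of area names reported for one person.
    If required_zone is given, only persons who visited that zone are counted.
    """
    counts = Counter()
    for track in tracks:
        if len(track) < 2:
            continue  # a single sighting gives no entry and exit pair
        if required_zone is not None and required_zone not in track:
            continue
        counts[(track[0], track[-1])] += 1
    return counts


tracks = [
    ["parking_lot", "ticket_machine", "parking_lot"],   # bought a ticket, then left
    ["parking_lot", "ticket_machine", "train_portal"],  # bought a ticket, then boarded
    ["parking_lot", "train_portal"],                    # went straight to the train
    ["parking_lot", "ticket_machine", "train_portal"],
]
routes = count_routes(tracks, required_zone="ticket_machine")
```

In this made-up data one ticket buyer returned to the lot and two went on to the train, while the person who never approached the ticket machine is left out of the count.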
(59) The sensors 11 of
(60) In another embodiment depicted in
(61) Referring to
(62) In
(63) Referring to
(64) In
(65) The first step of the development is to accumulate 100 a data set for training and testing. While unlabeled data can be used for most or all of the development of a model for a machine learning system, it is much more accurate and efficient to use labeled data, where each section of sensor data is accompanied by data pre-interpreting the meaning of the particular piece of data. In
(66) In the depicted training scenario, the camera observes a large number of persons crossing the area to be monitored by the system. The data collected by the sensors (
(67) In this and related embodiments, a step in the development which might be started in parallel with data collection is the design of an appropriate neural network. The sizing of the layers and the setting of various factors in the neural net which are in addition to the factors and values (parameters) that are adjusted in training are collectively referred to as hyperparameters to distinguish them from the parameters which are adjusted in training the neural network. The hyperparameters are initialized 101 to appropriate values. In some systems that are taught, hyperparameters are adjusted during the course of training but are distinct from trainable parameters because the adjustments are on the basis of the progress of the training rather than being direct functions of the data.
(68) The next step is to initialize 102 the parameters which are to be trained. Appropriate initialization is necessary for reasonably rapid convergence of the neural net. A number of techniques are taught to produce an initial set of values which produces good training progress.
(69) The network is then trained 103 by passing data set items through the network as implemented on a training processor. Because training requires larger processing power and time than use of the network after training, special powerful processors are used for this step. The training process adjusts the parameters incrementally on the basis of the output of the neural network. The hyperparameters specify the methods of calculating the adjustment to parameters. Generally the output of the network is used to back propagate through the network to provide further input to the adjustments. The items in the training portion of the dataset are used repeatedly while the convergence of the network is observed 54 by processes in the training data processor.
(70) If the convergence is judged 105 not to be adequate, the training is stopped, the hyperparameters are adjusted 106, the neural network is reinitialized and the training process is repeated until satisfactory convergence is obtained. The smaller portion of the data set which has been retained and not used for training is then passed 107 through the neural network (classified) and the output is checked 108 for accuracy. If accuracy is not sufficient for the goals of the particular system being developed, then the net structure is made larger 109 and the training process is repeated until satisfactory accuracy is obtained.
(71) The trained neural network is then downloaded 110 to the target device, which is then ready for system testing 111.
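The development loop described in the preceding paragraphs can be sketched in a few lines. The example below is a toy under stated assumptions, not the method of any particular embodiment: a small logistic model on polynomial features stands in for the neural network, the polynomial degree stands in for the size of the net, the learning rate is the only adjusted hyperparameter, and the data, thresholds and function names are invented for illustration.

```python
import math


def features(x, degree):
    return [x ** k for k in range(degree + 1)]


def predict(weights, x):
    z = sum(w * f for w, f in zip(weights, features(x, len(weights) - 1)))
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_loss(weights, data):
    eps = 1e-12
    return -sum(y * math.log(predict(weights, x) + eps)
                + (1 - y) * math.log(1.0 - predict(weights, x) + eps)
                for x, y in data) / len(data)


def train(data, degree, learning_rate, epochs=1000):
    """Gradient descent; returns the weights and the loss after every epoch."""
    weights = [0.0] * (degree + 1)            # initialize the trainable parameters
    history = []
    for _ in range(epochs):                   # repeated passes over the training items
        grad = [0.0] * len(weights)
        for x, y in data:
            err = predict(weights, x) - y
            for k, f in enumerate(features(x, degree)):
                grad[k] += err * f / len(data)
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
        history.append(mean_loss(weights, data))
    return weights, history


def accuracy(weights, data):
    return sum((predict(weights, x) >= 0.5) == (y == 1) for x, y in data) / len(data)


def develop(train_set, test_set, target=0.95, tol=1e-3, max_degree=4):
    degree, learning_rate = 1, 0.5            # initial hyperparameters
    while degree <= max_degree:
        for _ in range(6):                    # bounded number of retraining attempts
            weights, history = train(train_set, degree, learning_rate)
            if abs(history[-1] - history[-2]) < tol:   # convergence judged adequate
                break
            learning_rate /= 2                # adjust hyperparameter, reinitialize, retrain
        else:
            raise RuntimeError("training did not converge")
        acc = accuracy(weights, test_set)     # classify the held-out portion
        if acc >= target:
            return weights, degree, acc       # ready to download to the target device
        degree += 1                           # make the model structure larger
    raise RuntimeError("accuracy target not reached")


# Toy labeled data: label 1 when |x| >= 1.2, label 0 when |x| <= 0.8.
samples = [(i / 10, 1 if abs(i) >= 12 else 0)
           for i in range(-20, 21) if not 9 <= abs(i) <= 11]
test_set = samples[2::5]                      # smaller portion retained for testing
train_set = [s for k, s in enumerate(samples) if k % 5 != 2]
weights, degree, acc = develop(train_set, test_set)
```

In this toy a degree of 1 cannot separate the two outer regions from the middle, so the accuracy check fails and the structure is enlarged once before the target is met.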
(72) Referring to
(73) In
(74) Vibrations caused by the impact of the person's feet may travel directly to sensors 11 over straight paths 120. Simple calculations from the time of arrival of the vibrations at the various sensors allow determination of the location of the impact. Where a vibration travels a reflected path 121 to the sensor, then a more complex calculation is required but, with the exception of degenerate points, source location is still possible. It should be noted that a machine learning program would be able to distinguish the various modes after training with varied examples.
(75) Where multiple vibrations 122 from the same source in time and space arrive with varying delays, even at a single sensor, location in multiple dimensions may be possible. Training may enable the machine learning system to distinguish the various paths based on not only signal delay but other factors such as attenuation, spread of signal details and effects of the reflection process.
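For the direct-path case, the calculation from arrival times can be sketched as a search over candidate floor positions. The example below assumes a known, uniform propagation speed, four sensors at the corners of a rectangular floor and noise-free arrival times. These conditions, the speed value and the function name locate_impact are illustrative assumptions only; a grid search is used here for clarity rather than efficiency.

```python
import math


def locate_impact(sensors, arrival_times, speed, width, depth, step=0.1):
    """Grid search for the floor position that best explains the arrival times.

    The unknown instant of the impact is removed by comparing arrival times
    relative to their mean, so only differences between sensors matter.
    """
    mean_obs = sum(arrival_times) / len(arrival_times)
    observed = [t - mean_obs for t in arrival_times]
    best, best_err = None, float("inf")
    for ix in range(int(round(width / step)) + 1):
        for iy in range(int(round(depth / step)) + 1):
            x, y = ix * step, iy * step
            travel = [math.hypot(x - sx, y - sy) / speed for sx, sy in sensors]
            mean_pred = sum(travel) / len(travel)
            err = sum((o - (t - mean_pred)) ** 2 for o, t in zip(observed, travel))
            if err < best_err:
                best, best_err = (x, y), err
    return best


SPEED = 500.0  # assumed vibration speed in the floor in m/s (illustrative value)
sensors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]

# Simulate one footstep at a known place and an arbitrary time.
true_xy, t0 = (2.5, 1.0), 12.0
times = [t0 + math.hypot(true_xy[0] - sx, true_xy[1] - sy) / SPEED
         for sx, sy in sensors]

estimate = locate_impact(sensors, times, SPEED, width=4.0, depth=3.0)
```

Reflected paths would add further candidate travel times for each sensor, which is where a trained model is expected to do better than a fixed calculation of this kind.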
(76) Where a person 12 moves to another position 123 with one or more steps there may be characteristics such as distance traveled in a step, timing of steps or specific details of the sounds as processed from inputs to a machine learning system. If that person's identity is lost to the system by complex movements, leaving a monitored area or by interference it may be regained by identification of such characteristics. In the figure the person makes a similar move in another area 124 at a later time and the machine learning system reidentifies the person.
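The re-identification described above can be pictured as comparing a short run of footsteps against stored per-person characteristics. The sketch below uses only mean step interval and mean stride length with an unweighted distance and a fixed threshold. Those choices, the names gait_signature and reidentify, and all numbers are assumptions for illustration; a trained model would be expected to use richer characteristics.

```python
import math


def gait_signature(step_times, step_positions):
    """Summarize a short run of footsteps as (mean step interval, mean stride length)."""
    intervals = [b - a for a, b in zip(step_times, step_times[1:])]
    strides = [math.dist(p, q) for p, q in zip(step_positions, step_positions[1:])]
    return (sum(intervals) / len(intervals), sum(strides) / len(strides))


def reidentify(signature, known, max_distance=0.1):
    """Return the id of the closest stored signature, or None if none is close enough."""
    best_id, best_d = None, max_distance
    for person_id, stored in known.items():
        d = math.dist(signature, stored)
        if d <= best_d:
            best_id, best_d = person_id, d
    return best_id


# Signatures stored while each person was being tracked (made-up values).
known = {"person_A": (0.50, 0.75), "person_B": (0.65, 0.60)}

# Footsteps detected later in another area: times in seconds, positions in meters.
sig = gait_signature([10.0, 10.5, 11.0], [(0.0, 0.0), (0.75, 0.0), (1.5, 0.0)])
match = reidentify(sig, known)
```

Returning None when nothing is close enough lets the caller treat the footsteps as belonging to a person not yet known to the system.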
(77) Referring to
(78) In
(79) The model is initialized 141 with suitable values in a trainable parameter set. A basic data set 142 with basic information is used to perform the first level of training 143 of the model. The model would generally already have multiple layers and the basic data set would be used to train the earliest layers of the model. It would use data to allow these layers to recognize or react to features such as edges in pictorial data and sound impulses for audio data. This training would be applicable to many applications of a machine learning system. It would provide a layer of the model with an ability to recognize features such as a burst of acoustic noise and determine acoustic features and localization. Higher level interpretation to distinguish physical causes such as being a footstep would come at a later layer which would be trained at a later level. This basic training may be provided by a supplier of implementation and hardware systems and these layers may be acquired in an already trained condition by implementers of applications. In the embodiment of
(80) The second level of training 144 in the depicted embodiment is done with a second General data set 145. This data is selected to allow the model to use inputs to recognize objects and entities relevant to the application of the model. The general data set in the depicted embodiment is generated by a combination of data generation from a simulation 146 of general applications of the model and specific data gathered 147 for such applications. The applications at this level include recognition of objects and events such as persons moving, footsteps, groups moving together, and items used to define a signature for specific persons and other objects of interest as individuals rather than members of types. Prior to the training at this level, layers are typically added 148 to the model to allow the training to take effect in facilitating analysis with the aid of the model based on inputs processed by preceding levels of trained model. In the embodiment depicted in
(81) The third level of training 149 in the depicted embodiment is done with a third Specific data set 150. In the depicted system this level and the second level are separate levels, each with its own layer. In some systems the two layers may be combined and the training done concurrently. Multiple adjacent levels can be treated as a single level and trained concurrently or sequentially. The specific data is selected to allow the model to use inputs to recognize further details and activities of the objects identified in the second level. The specific data set in the depicted embodiment is generated by a combination of data generation from a simulation 41 of specific applications of the model and specific data gathered 42 for such applications. Typical information used to generate a simulation at this level includes many variations of relevant objects for the purpose of applying standards similar to the one to be implemented. Prior to the training at this level, layers are typically added 43 to the model to allow the training to take effect in facilitating analysis with the aid of the model based on inputs processed by preceding levels of trained model. In the embodiment of
(82) After the model is trained through several levels, it is usually downloaded 44 from high powered training processors, which are only used to prepare the model, to a smaller portable processor that executes the model in actual use. To use the model to evaluate a situation, data is gathered from the situation 45 by means of appropriate sensors and prepared to serve as an input 46 for the model. The model, on the basis of (evaluating) the inputs, generates 47 outputs 48 corresponding to the action of the training on the parameters of the model.
(83) In some more advanced implementations of the system, inputs and outputs are used to select 49 additional training for the model. The information in the inputs and outputs can cause the download of sets of parameters which can be added to the model or a limited training process similar to that used to develop the original model can be accomplished by the evaluation processor.
(84) Referring to
(85) The layers of the model are sequenced as in the arrow 180. In the depicted embodiment there are 4 layers. Sensors 181 detect vibrations to be analyzed and provide signals to a preprocessing unit 182 which applies analog and digital methods to simplify and quantify them for input for evaluation by the machine learning system. Outputs 183 of the preprocessing are supplied as inputs to the first layer 184 of the model. The first layer is implemented in two sublayers 185 which are completely interconnected 186. Typical neural network models have multiple sublayers in each layer and often have complete interconnections. Each interconnection consists of a parameter which determines the strength of the interconnection. Each layer and sublayer consists of a number of neurons. Training adjusts the parameters in small increments to cause the model to converge on the desired behavior. A level of training works on a layer or group of layers to produce convergence to the desired behavior for that level. Connections between major layer structures 187 are often much more sparse and are designed to transfer information which is correlated to patterns detected by the earlier layer. This layer is trained to do a very low level of pattern analysis such as identifying groups of related vibrations and statistical representations of vibrations.
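The structure of a layer made of two completely interconnected sublayers can be written out directly, with one weight per interconnection. The sketch below is only illustrative: the sublayer sizes, the weight values, the tanh activation and the function name dense are assumptions and are not taken from the depicted embodiment.

```python
import math


def dense(inputs, weights, biases):
    """One completely interconnected sublayer: every input feeds every neuron.

    weights[j][i] is the strength of the interconnection from input i to neuron j.
    """
    return [math.tanh(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


# First sublayer: 2 inputs feeding 3 neurons (illustrative parameter values).
sublayer_1 = ([[0.5, -0.5], [1.0, 1.0], [0.0, 2.0]], [0.0, 0.0, 0.0])
# Second sublayer: 3 neurons feeding 2 neurons.
sublayer_2 = ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, 0.0])

preprocessed = [0.2, -0.1]   # stand-in for the outputs of the preprocessing unit
hidden = dense(preprocessed, *sublayer_1)
outputs = dense(hidden, *sublayer_2)
```

Training would adjust the numbers in the weight and bias lists in small increments; the code that evaluates the layer stays the same.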
(86) The second layer of the model 188 has a single sublayer. This layer has recursive connections 189 between outputs of neurons of the model which allows the model to represent time sequences. In practice this layer would have other sublayers with much more complete connections between the neurons of the layer. These sublayers are omitted to simplify the figure. This layer could be trained to work on the output of the first layer to identify time structures of vibrations.
(87) The third layer of the model 190 is similar in structure to the first. It would receive training on much higher level data. In a typical embodiment it would identify movements of people from the patterns developed by earlier layers. In some embodiments it would identify particular people from patterns of vibrations and in other embodiments an additional layer would be added for that purpose.
(88) The fourth depicted layer 191 is shown as being trained in an in-use training level. Data from a sensor is processed by a training program to allow more effective machine learning methods to be applied at that late stage by a training module 192 on the evaluating processor. Because of the limited time and processing power available for real time training, this is limited in scope, but because of the extensive analysis already done on the data by earlier layers of the model, a very simple layer with simple training can make a major contribution to the results.
(89) The outputs of the last layer are available then for non machine learning processing, counting and use or display 193.
(90) Referring to
(91)