SYSTEM AND METHOD FOR ANALYSING RAILWAY RELATED DATA

20230073361 · 2023-03-09

    Abstract

    The present invention relates to a method and a system that use multiple data sources with unsupervised and/or semi-supervised algorithms to derive features such as the speed of the train, the length of the train, the type of wagons, etc., thereby classifying train categories. The invention provides a method and a system configured for analysing railway related vibration data. The invention is configured for collecting at least a first dataset from a sensor applied to the railway infrastructure. Further, it is configured for collecting at least a second dataset from a scheduling component. At least one subset of the first dataset is curated with the second dataset to obtain a first training database. The invention further discloses a method comprising the step of predicting at least a likelihood of one train belonging to at least one train-type.

    Claims

    1. A method for analysing railway related vibration data, the method comprising the steps of: collecting at least a first dataset from a sensor applied to railway infrastructure, collecting at least a second dataset from a scheduling component, curating at least one subset of the first dataset with the second dataset to obtain a first training database, predicting at least a likelihood of one train belonging to at least one train-type.

    2. The method according to claim 1 further comprising the step of connecting the at least one sensor to at least one server, wherein the server comprises at least one processing component.

    3. The method according to claim 1 wherein the processing component comprises a memory component configured to store at least one of at least the first dataset and the at least second dataset.

    4. (canceled)

    5. The method according to claim 1 comprising the step of pre-processing the first dataset, in the processing component.

    6. The method according to claim 1 comprising the step of automatically converting the at least one first dataset to at least one time-frequency spectrogram.

    7. The method according to claim 1 further comprising the step of unsupervised encoding of the at least one spectrogram to at least one feature map.

    8. The method according to claim 1 comprising the step of facilitating the processing component with a neural network (NN) component, wherein the NN component is configured to automatically learn at least one lower-dimensional feature map.

    9. The method according to claim 1 further comprising the step of teaching the NN component the at least one lower-dimensional feature map.

    10. The method according to claim 1 wherein the method comprises the step of automatically calculating at least one nearest sample neighbour in the lower-dimensional feature map.

    11. The method according to claim 1 further comprising the step of using the at least one feature map and the second dataset to label the at least one subset of the first dataset.

    12. The method according to claim 1 further comprising the step of iteratively extending the label from the at least one subset of the first dataset to the at least one nearest sample neighbour.

    13. The method according to claim 1 further comprising the step of predicting a likelihood of a train being of a certain type using the lower dimensional feature map.

    14. The method according to claim 1 further comprising the step of predicting a likelihood of a train being of a certain type using the first training database.

    15. A train classification system, the system comprising: a sensor configured to provide at least a first dataset and configured to be applied to railway infrastructure, a scheduling component configured to provide at least a second dataset, a server configured to curate at least one subset of the first dataset with the second dataset to obtain a first training database, a processing component configured to classify at least one train type, wherein the system is configured to execute the method according to any of the method claims.

    16. The method according to claim 1 comprising the step of further associating at least one weight with at least one distinctive feature of the train.

    17. The method according to claim 1 comprising generating the first training database.

    18. The method according to claim 5 wherein the step of pre-processing further comprises at least one of the steps of: flagging at least one noisy component of the first dataset, removing at least one exponential wakeup, cutting off the edge of the at least one acceleration trace, stretching the at least one first dataset to a pre-determined size, and representing the at least one first dataset as a time-frequency spectrogram.

    19. The method according to claim 1 comprising the step of scaling the at least one spectrogram value within a pre-determined region.

    20. The method according to claim 19 comprising the step of generating at least one spectrogram value using hyperparameter optimization on at least one pre-determined truth dataset.

    21. The system according to claim 15 wherein the first dataset comprises vibration signal associated with a motion of a rail vehicle, wherein the vibration signal comprises at least one of: at least frequency data; at least displacement data; at least velocity data; at least acceleration data.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0120] FIG. 1 depicts an example of a set-up of several data collecting sensors to a railway infrastructure in accordance with the present invention;

    [0121] FIG. 2 shows a schematic flowchart of analysing railway related vibration data according to one embodiment of the present invention;

    [0122] FIG. 3 exemplifies a distribution of classes of trains in latent space according to one embodiment;

    [0123] FIG. 3a shows a tabular representation of a classification of trains according to FIG. 3;

    [0124] FIG. 4 is an exemplifying trace of acceleration recorded with long wakeup curve;

    [0125] FIG. 5a is an exemplifying trace of acceleration recorded with a signal tapering off to zero according to the present invention; [0126] FIG. 5b is an exemplifying trace of acceleration recorded with a flagged edge;

    [0127] FIG. 6a constitutes an exemplifying trace of acceleration recorded corresponding to a specific kind of train; [0128] FIG. 6b constitutes an exemplifying representation of the acceleration trace of the train in accordance with the present invention;

    [0129] FIG. 7a constitutes an exemplifying trace of acceleration recorded corresponding to a specific kind of train; [0130] FIG. 7b constitutes an exemplifying representation of the acceleration trace of the train in accordance with the present invention;

    [0131] FIG. 8a is a visualization of a classifier architecture according to one embodiment; [0132] FIG. 8b is a visualization of an exemplary filter bank architecture;

    [0133] FIG. 9 is an illustration of a classifier and a reconstruction architecture according to the present invention;

    [0134] FIG. 10 exemplifies a distribution of classes of trains in latent space according to one embodiment;

    [0135] FIG. 11 shows an example of a train classifying system in accordance with the present invention.

    DETAILED DESCRIPTION OF THE DRAWINGS

    [0136] It is noted that not all the drawings carry all the reference signs. Instead, in some of the drawings, some of the reference signs have been omitted for sake of brevity and simplicity of illustration. Embodiments of the present invention will now be described with reference to the accompanying drawings.

    [0137] FIG. 1 provides a schematic description of a system configured for a railway infrastructure. There is shown an example of a railway section with the railway 1 itself, comprising rails 2 and sleepers 3. Instead of the sleepers 3 also a solid bed for the rails 2 can be provided. Moreover, a mast 4 is shown that is just one further example of constructional elements that are usually arranged at or in the vicinity of railways. Also, a tunnel 5 is shown. It is needless to say that other constructions, buildings etc. can be present and also used for the present invention as described before and below. A first sensor 8 can be arranged on one or more of the sleepers 3. The sensor 8 can be an acceleration sensor and/or any other kind of railway specific sensor. Examples have been mentioned before.

    [0138] A second sensor 9 is also arranged on another sleeper distant from the first sensor 8. Although it seems just a small distance in the present example, those distances can range from the distance to the neighbouring sleeper to one or more kilometres. Other sensors can be used for attachment to the sleepers as well.

    [0139] Another kind of sensor 6 can be attached to the mast 4 or any other structure. This could be another sensor, such as an optical, temperature, even acceleration sensor etc. A further kind of sensor 7 can be arranged above the railway as at the beginning or within the tunnel 5. This could be height sensor for determining the height of a train, an optical sensor, a doppler sensor etc. All those sensors mentioned here and before are non-limiting examples.

    [0140] FIG. 2 is intended to provide an example of a method for analysing railway related vibration data. Sensors 8 and 9 can be connected to a common component such as a server. The server can be a remote server. The server can be a part of an edge device. The server can further comprise a plurality of servers, cloud computing, cloud storage etc. The server can be transmitting, storing and/or processing etc. The server can be pulling or pushing sensor data from at least one of the sensors 8 and 9. The sensor-to-server connection can be hard-wired and/or wireless, depending on the needs and the further infrastructure. In S1 the server can be collecting and/or pulling in unlabelled sensor data. Unlabelled sensor data can consist of samples of acceleration traces. The acceleration traces can consist of vibration measurements. The unlabelled sensor data can comprise acceleration traces that have not been tagged with labels identifying characteristics, properties, or classifications, for example an acceleration trace of a train recorded by the sensor without any information about the train.

    [0141] In S2, the method can comprise a step of processing the sensor data and converting it to a spectrogram. The method can comprise providing a processing unit configured to process the sensor data. Processing sensor data can comprise converting all the acceleration traces to the same length, which can be achieved by cutting off the edges. Cutting off the edges can comprise flagging or cropping the trace where the RMS value is lower than a pre-determined acceleration value. The process is further described in the later embodiments.

    [0142] The processing can also comprise discarding traces. If a trace is longer than a pre-determined time or lower than a pre-determined RMS value, it can be discarded. For example, traces longer than 14 seconds can be cargo trains and can be excluded from classification. Further, the method comprises converting the traces to spectrograms. A spectrogram can be a time-frequency representation of an acceleration trace. The spectrogram can split an acceleration trace or a signal into overlapping windows. Further, a power spectral density (PSD) of the Fourier transform of each window can be calculated. To obtain constant energy per channel, a Slaney-style Mel scale can be used. The power spectral density can then be mapped onto the Mel scale. The next step can be to take the logs of the PSDs at each of the Mel frequencies.
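    As an illustration only, the windowing and log-PSD steps above can be sketched in Python with NumPy. The sampling rate, synthetic trace and PSD normalization are invented for the example; the Mel mapping (64 Slaney-style bands, as described later) would be applied on top of these frames:

```python
import numpy as np

def hann(n):
    # Hann window, matching the "Window type: Hann" parameter
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def log_psd_spectrogram(trace, n_fft=1024, hop=256):
    """Split a trace into overlapping Hann windows and return log-PSD frames.

    n_fft=1024 (2^10) and hop=256 (2^8) mirror the hyperparameters listed
    for this embodiment; the normalization constant is illustrative.
    """
    w = hann(n_fft)
    frames = []
    for start in range(0, len(trace) - n_fft + 1, hop):
        seg = trace[start:start + n_fft] * w
        psd = (np.abs(np.fft.rfft(seg)) ** 2) / np.sum(w ** 2)
        frames.append(np.log(psd + 1e-12))   # log of the PSD per window
    return np.stack(frames, axis=1)          # shape: (freq_bins, n_windows)

# toy usage: a synthetic "acceleration trace" with a 50 Hz component
t = np.arange(0, 4096) / 1000.0
trace = np.sin(2 * np.pi * 50 * t)
S = log_psd_spectrogram(trace)               # 513 frequency bins, 13 windows
```

    In practice a library routine with a built-in Slaney mel filter bank could replace the hand-rolled mapping step.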

    [0143] The processing can also comprise constraining the features of the input within a finite region. This is important because the classifier (as described later) can calculate the distance between two points by the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by this particular feature. Therefore, a finite region can be specified so that all features can be normalized such that each feature contributes approximately proportionately to the final distance. A global maximum and/or a global minimum can be calculated from a first dataset.
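    A minimal sketch of such min-max normalization, with the global extrema taken from a first (training) dataset; the data values here are purely illustrative:

```python
import numpy as np

def normalize(spectrograms, lo=None, hi=None):
    """Scale values into [0, 1] so each feature contributes
    roughly proportionately to a Euclidean distance.

    lo/hi default to the global minimum/maximum of the given
    (training) data and can be reused for later datasets.
    """
    lo = np.min(spectrograms) if lo is None else lo
    hi = np.max(spectrograms) if hi is None else hi
    return (spectrograms - lo) / (hi - lo), lo, hi

data = np.array([[0.0, 5.0], [10.0, 2.5]])   # toy "spectrogram" values
scaled, lo, hi = normalize(data)             # all values now in [0, 1]
```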

    [0144] The method can further comprise step S3 for extracting features in an unsupervised manner. The method can comprise learning an embedding of the first dataset for dimensionality reduction by training the method to ignore noise. This can be used to group, or segment, datasets with shared attributes in order to extrapolate algorithmic relationships. This can identify commonalities in the data and react based on the presence or absence of such commonalities in each new piece of data. Each group can be called a cluster.

    [0145] In some embodiments the method comprises providing a scheduling database. The scheduling database can comprise at least one timetable of a train. In step S6, the processing unit can receive the schedule data from the scheduling database. The scheduling database can be configured to be updated automatically or semi-automatically. In one exemplary embodiment the method can comprise a step, shown in S7, of curating a small subset of the unlabelled data from S2 with the schedule data. Data curation can comprise integration of a subset of the data collected from the sensor with the schedule data. A subset of the unlabelled data collected from the sensor can further be labelled using the information from the schedule data.

    [0146] In some embodiments step S4 can comprise labelling the clusters created in S3 automatically using the information from the labelled subsets from S7. Cluster labelling can further comprise examining the features of the labelled dataset per cluster to find a labelling that summarizes the class of each cluster and further distinguishes the clusters from each other.
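    One simple way to propagate labels from the schedule-curated subset to whole clusters is a majority vote per cluster. The function and data below are hypothetical illustrations, not the claimed implementation:

```python
from collections import Counter

def label_clusters(cluster_ids, known_labels):
    """Assign each cluster the majority label among its labelled members.

    cluster_ids: cluster index per sample (from the unsupervised step S3).
    known_labels: sparse mapping {sample_index: label} obtained by
    curating a small subset with the schedule data (step S7).
    """
    votes = {}
    for idx, label in known_labels.items():
        votes.setdefault(cluster_ids[idx], Counter())[label] += 1
    # each cluster inherits its most common known label
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

clusters = [0, 0, 1, 1, 1, 2]                        # toy cluster assignment
labelled_subset = {0: "passenger", 2: "cargo", 3: "cargo"}
cluster_labels = label_clusters(clusters, labelled_subset)
```

    Cluster 2 has no labelled member in this toy example and stays unlabelled, mirroring the semi-supervised setting.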

    [0147] The method can further comprise S5 of training a neural network to classify the train type on the basis of the labelled and/or unlabelled clusters. The training can be done such that when a ‘new’ unlabelled acceleration trace is fed into the classifier, it can predict a likelihood for a train being a certain type. The training can be done in a weakly supervised manner, such that the noise in the labels can be adjusted for.

    [0148] FIG. 3 exemplifies a distribution of classes of trains in latent space according to one of the embodiments of the present invention. The method can comprise distributing the types of trains in a latent space. The visualization in FIG. 3 is shown simply as an indication and should not be taken literally, as the actual embedding space is a higher-dimensional tensor which is compressed to two dimensions for representation in FIG. 3. However, as can be seen in the distribution, some train types are better separated than others, and the encoding is better in some sensors than in others.

    [0149] The latent space, feature space and embedding space representations can each comprise a compressed representation of multi-dimensional data, and the terms can be used interchangeably. The latent space representation can be a representation of variables that are inferred through an algorithm from other variables that are observed directly. The visualization in FIG. 3 can be created by mapping a high-dimensional space to a 2D space while keeping the distances between the features (data points) the same.

    [0150] FIG. 3a is a tabular representation of the classification report of the information derived from FIG. 3 and shows the performance results of an autoencoder used. The precision can be a measure of how accurate the prediction is. The precision can be a number describing the ability of the classifier not to label as positive a sample that is negative (false positive).

    [0151] The recall or sensitivity can be the ability of the classifier to find all the positive samples. For example, suppose the classifier recognises 4 samples as Train 1 in a trace containing 12 Train 1 samples and 2 Train 6 samples. Of the 4 identified as Train 1, 3 actually are Train 1, while the remaining one is a Train 6. The classifier's precision in this case can be 0.75 (3 of 4 predictions correct), while its recall can be 0.25 (3 of 12 actual Train 1 samples found).

    [0152] The f1-score can be a weighted harmonic mean of the precision and recall. The support can be the number of occurrences of each class. A class can be a type of train. For example, Train 1 can be one type or class, Train 2 can be another type or class, etc.
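    The precision, recall and f1-score definitions above can be checked with a short sketch using the Train 1 counts from the preceding paragraph (3 true positives, 1 false positive, 9 missed Train 1 samples):

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from true positives, false positives
    and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Train 1 example: 12 actual Train 1 samples, 4 flagged as Train 1,
# of which 3 are correct
p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=9)
# p = 0.75, r = 0.25
```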

    [0153] FIG. 4 is an exemplifying trace of acceleration recorded at one of the sensors. In some embodiments the sensors can go to an inactive state in the absence of any activity or passing train. The wakeup mode can be a limited-bandwidth mode of operation. In this mode, acceleration can be measured a few times per second. When the sensor senses the presence of motion, it can automatically switch to a full-bandwidth measurement mode, which can result in a long wakeup curve. In the following embodiment an exponential curve can be fitted to the first 50 points of the acceleration trace. The weight of the regression can be inversely proportional to the index of a point, such that the first point in the regression can be weighted 50 times higher than the last point. In some examples the measured acceleration can be a mixture of the wakeup curve and the ‘real signal’. The weight of the real signal increases with time. If there is no real signal or presence of a passing train/motion, the exponential can simply get a very large negative exponent. This negative exponent and/or the exponential wakeup can be removed. The removing can be done automatically by a processing unit.
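    The weighted exponential fit described above can be sketched as follows; fitting the log-magnitudes of the first 50 samples with weights falling off as 1/(index+1) gives the first point 50 times the weight of the last. The synthetic trace is illustrative, and this is a sketch of the idea rather than the claimed implementation:

```python
import numpy as np

def remove_wakeup(trace, n=50):
    """Fit and subtract an exponential wakeup from a trace.

    A weighted linear fit of log(|y|) against sample index approximates
    y ~ a * exp(b * i); weights fall off as 1/(i + 1), so the first
    point counts n times more than the last.
    """
    i = np.arange(n)
    w = 1.0 / (i + 1.0)                     # first point weighted n x the last
    y = np.log(np.abs(trace[:n]) + 1e-12)
    b, log_a = np.polyfit(i, y, 1, w=w)     # linear fit in log space
    fitted = np.exp(log_a) * np.exp(b * np.arange(len(trace)))
    return trace - fitted

# synthetic trace: decaying wakeup plus a small constant "real" signal
idx = np.arange(200)
trace = 5.0 * np.exp(-0.1 * idx) + 0.01
cleaned = remove_wakeup(trace)              # wakeup largely removed
```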

    [0154] FIG. 5a is an exemplifying trace of acceleration recorded with a signal tapering off to zero after the removal of the exponential wakeup as discussed in FIG. 4 according to the present invention. The sensor can continue to record for a further 0-10 seconds in order to determine that the train has passed. This can result in a dead signal at the end. This dead signal or trail can be removed to obtain more precise or accurate data.

    [0155] FIG. 5b is an exemplifying trace of acceleration after the trail (as discussed in FIG. 5a) is flagged or removed. It is important to alter, replace or delete the ‘irrelevant’ or inadequate data to increase the efficiency of the classifier. This can be done by comparing a moving average of the root mean square (RMS) value of the trace, starting from the edge of the trail, with a pre-determined threshold value. Further, scanning inwards from that edge, everything before the first point where the threshold is breached can be discarded.
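    The moving-RMS trail removal can be sketched as below; the window length and threshold are illustrative placeholders for the pre-determined values, and the scan runs inwards from the end of the trace:

```python
import numpy as np

def trim_trail(trace, window=100, threshold=0.05):
    """Drop the dead tail of a trace using a moving RMS.

    Walking backwards from the edge, everything after the first window
    whose RMS reaches `threshold` is discarded.
    """
    for end in range(len(trace), window - 1, -1):
        rms = np.sqrt(np.mean(trace[end - window:end] ** 2))
        if rms >= threshold:
            return trace[:end]
    return trace[:0]  # whole trace is below the threshold

# toy trace: a passing train followed by silence
signal = np.concatenate([np.ones(300), np.zeros(200)])
trimmed = trim_trail(signal)  # the dead tail is removed
```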

    [0156] As discussed in FIG. 4 and FIG. 5, the traces can be cleaned. The cleaning implies altering, replacing or deleting at least one signal from the trace. The data cleaning can remove or alter any errors, outliers, or duplicates. This can also remove ‘irrelevant’ or noisy data.

    [0157] After the data has been cleaned or pre-processed, it can now be stretched to a standard size. In FIG. 6 and FIG. 7 an exemplifying representation of acceleration traces of two trains can be seen. FIG. 6a and FIG. 7a show an acceleration trace of Train 1 recorded on sensor ID FGE WK 704 and Train 3 recorded on the same sensor in WK 706, respectively.

    [0158] FIG. 6b and FIG. 7b show a visual representation of the spectrum of Mel frequencies of the acceleration trace of the two trains (Train 1 and Train 3) as it varies with time. The Mel spectrogram can be a representation of the power spectrum based on a linear cosine transform of a log power spectrum on a nonlinear Mel scale. The Mel scale can be a quasi-logarithmic spacing of frequencies resembling the resolution of the human auditory system. In particular, a Slaney-style Mel scale can be used to obtain constant energy per channel. The parameters for plotting the spectrogram can be found from hyperparameter optimization on at least one truth set. For example, the parameters used in this embodiment are as follows:

    [0159] Number of Mel bands: 2^6 = 64

    [0160] Fast Fourier transform window size: 2^10 = 1024

    [0161] Window forward skip: 2^8 = 256

    [0162] Window type: Hann

    [0163] The data cleaned or removed at this step can be stored and be used in other embodiments.

    [0164] FIG. 8 shows visualizations of a classifier architecture according to the present invention.

    [0165] FIG. 8a is an exemplary classifier according to an embodiment of the present invention. It shows an exemplary down-sampling via a convolution with stride two 10, 11, 12, 13 and an additional residual filter bank 20, 21, 22, 23. The Mel spectrum can be a matrix of pixel values. Further, a ‘filter’ or ‘kernel’ or ‘feature detector’ can be used to create an ‘activation map’ or a ‘feature map’. The different values of the different matrices can produce different feature maps for the same input Mel spectrum. The convolution of a second filter on the same image can create a second feature map. The method can learn the values of these filters on its own during the training process. The number of filters can be proportional to the number of features extracted which can facilitate pattern recognition. The depth of a feature map can correspond to the number of filters. The stride can be the number of pixels by which a filter can be shifted across an input matrix. The activation layer applied can be the softmax activation layer 60. Further, a batch normalization layer 40 and a dropout layer 70 can be applied. The dropout layer can be configured with a rate of 0.5. The softmax function can squash the output of each neuron to be between 0 and 1, just like a sigmoid function. The softmax function can further divide each output by a factor such that the total sum of the outputs can be equal to 1. Mathematically, a softmax function can be defined as:

    [00002] σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k),

    where z can be a vector of the inputs to the output layer, j can be the index of an output unit, and K can be the number of output units. A full classifier can comprise at least one of the at least one classifier architecture and at least one fully connected layer, which can result in the classification. The fully connected layers can connect every neuron in one layer to every neuron in another layer.
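    The softmax function above can be implemented directly; the subtraction of the maximum is a standard numerical-stability trick not mentioned in the text, and the input vector is illustrative:

```python
import math

def softmax(z):
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k); outputs lie in (0, 1)
    and sum to 1."""
    m = max(z)                            # shift by the max for stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])          # larger logits get larger shares
```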

    [0166] FIG. 8b is an exemplary filter bank architecture comprising a sigmoid activation layer 50. It shows an exemplary filter bank architecture composed of a 1×1 convolution. An additional operation called ReLU (Rectified Linear Unit) 30 can be applied. ReLU can adjust activation values to be within a pre-determined range. The range can be 0-1 or 0 to +∞, etc. The ReLU can be configured to replace all negative pixel values in the feature map by zero. The split can be applied, and the input matrix can be split into two identical matrices. One of the feature layers can comprise the sigmoid activation 50. Sigmoid activation can operate on a sigmoid function. The sigmoid function can be:

    [00003] σ(z) = 1 / (1 + e^(-z)),

    where z can be a vector of the inputs to the output layer.
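    For comparison, the sigmoid function as defined above can be written as a minimal sketch:

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

mid = sigmoid(0.0)   # the midpoint of the curve is 0.5
```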

    [0167] The input layer or input matrix can comprise pixels which can also be interpreted as neuron activations. These neurons can be scaled or normalized by a batch normalization layer 40. The normalization can be such that no activation deviates more than a ‘standard deviation’ of the activation strength. It can further allow each layer of a network to learn by itself a little bit more independently of other layers.

    [0168] FIG. 9 is a diagram illustrating the process through which the classifier architecture creates an embedding, and the subsequent reconstruction of the image from the embedding. The method can comprise providing a decoder. The decoder can be configured with up-sampling with stride 2. This reconstruction can further be used to test the information contained in the embedding.

    [0169] FIG. 10 shows the distribution of train types in three different switches. The classification results can be plotted in FIG. 10 after the embedding. It may be noted that the two-dimensional display is for visualization purpose only and is not the real picture. The real latent space can be a 4×4×32=512-dimensional tensor.

    [0170] FIG. 11 shows an example of a train classifying system on the basis of the railway related vibrational data. The system can comprise sensor data 100. The sensor data can be vibrational patterns recorded from the acceleration of the train. The system can further comprise a scheduling database 200. The scheduling database can comprise automated or semi-automated extraction of the train schedule information. The system can further comprise a processing unit 300. The processing unit can be configured to pull the sensor data and convert it into a format acceptable to the classifier 400. The classifier-acceptable format of the schedule database and/or the sensor data can comprise acceleration traces converted to spectrograms at a pre-determined frequency. The processing unit can additionally be configured to curate at least some part of the sensor data with the scheduling information. The system can further comprise pushing the processed data from the processing unit 300 to the classifier 400. The classifier can be configured with machine learning methods to extract at least one ‘feature’ from the processed data. The classifier according to the method described in the present invention can also be configured to predict a likelihood of a train being of a certain type.