Intelligent Computer Aided Decision Support System
20250053783 · 2025-02-13
Inventors
CPC classification
H04M11/04
ELECTRICITY
G16H50/20
PHYSICS
G16H80/00
PHYSICS
H04M2203/555
ELECTRICITY
G10L13/027
PHYSICS
G06N7/01
PHYSICS
A61B5/00
HUMAN NECESSITIES
G08G5/26
PHYSICS
A61B5/4803
HUMAN NECESSITIES
A61B5/747
HUMAN NECESSITIES
H04W4/90
ELECTRICITY
G16H40/20
PHYSICS
H04M3/527
ELECTRICITY
International classification
G16H40/20
PHYSICS
G16H50/20
PHYSICS
G16H80/00
PHYSICS
A61B5/00
HUMAN NECESSITIES
Abstract
The present invention relates to a method for assisting an interviewing party in deciding a response action in response to an interview between said interviewing party and an interviewee party. The method comprises providing a processing unit and inputting the voice of the interviewee party into the processing unit as an electronic signal, and processing the electronic signal by means of said processing unit in parallel with the interview taking place. The method further includes an anomaly routine comprising a statistically learned model, and by means of said statistically learned model determining a respective number of samples of said sequence of samples being an anomaly of said statistically learned model and returning to said anomaly routine for processing a subsequent number of samples of said sequence of samples by said anomaly routine.
Claims
1. A method for generating information such as an image or speech, comprising: providing a storage unit storing a set of instructions, and a processing unit for executing said set of instructions, said set of instructions including a generator routine having a decoder constituted by a statistically learned model, said decoder being defined by an observable variable and a decoder hierarchy of a set of random variables, said decoder hierarchy constituted by layers having at least one random variable from said set of random variables in each layer, said observable variable, and said set of random variables being jointly distributed according to a prior probability distribution, said prior probability distribution being factorized having: a first factor defined as a first probability distribution of said observable variable conditioned on at least one random variable from said set of random variables, a second factor defined as a second probability distribution of the random variable of the top layer of said decoder, a third factor defined as the product of sequence of the probability distributions for the random variables of said set of random variables, the random variable of each respective element in said product of sequence being conditioned on at least two of the random variables in the higher layers, said method further comprising sampling a value of the random variable of the top layer, and processing said value through said hierarchy such that said information is generated.
2. The method according to claim 1, the random variable of each respective element in said product of sequence being conditioned on the random variables in the higher layers.
3. The method according to claim 1, the random variables of said set of random variables being divided into a bottom-up path and a top-down path.
4. The method according to claim 1, said decoder comprising a deterministic variable for summarizing information from random variables higher in said decoder hierarchy.
Description
BRIEF DESCRIPTION OF THE FIGURES
[0123] The invention will now be explained in more detail below by means of examples of embodiments with reference to the very schematic drawing, in which
[0131] The invention may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like reference numerals refer to like elements throughout. Like elements will thus, not be described in detail with respect to the description of each figure.
[0133] An injured person 10 is experiencing an acute injury or illness and is illustrated as lying on the ground.
[0134] A caller 12 is a bystander and is in need of assistance in order to help the injured person 10 experiencing the medical emergency in the form of an acute injury or illness.
[0135] The caller (interviewee party) 12 is illustrated using a mobile phone 18 for calling the emergency telephone number (emergency call).
[0136] The emergency call is transmitted through a public switched telephone network having a cell tower 16, and routed to a public safety answering point 26.
[0137] The emergency call is received at the public safety answering point as an electronic signal (electric current that represents information). The electronic signal may either be analog or digital.
[0138] A medical call taker (interviewing party), which is termed a dispatcher 14 in the following, answers the emergency call and a conversation between the caller and the dispatcher begins. A receiver 20 (such as a telephone, possibly including a handset for the dispatcher) may be used to answer the emergency call and transform the electronic signal to sound by means of a loudspeaker.
[0139] During the emergency call, the dispatcher follows the protocol for a systematized caller interrogation.
[0140] The protocol for a systematized interview may include questions in sequential order in order to determine the type of assistance needed. The protocol may be stored by the processing unit, and the sequence of questions of the protocol may be rearranged by the processing unit based on the information identified in the processing. For example, certain cues or background noise may have a correlation with, or be associated with, certain emergency situations, which means that the processing unit may present questions relating to such an emergency situation sooner than the interviewing party may realize he or she should ask questions relating to such an emergency situation.
[0141] At a point in time during the conversation, the dispatcher decides on a response action, which typically includes dispatching emergency medical services such as an ambulance and providing a set of pre-arrival instructions to the caller.
[0142] The electronic signal is also routed to a processing unit 22, i.e. a computer associated with the public safety answering point. The processing unit includes a set of instructions (algorithm) for processing the electronic signal.
[0144] During the interview, the electronic signal is continuously routed or passed to the processing unit as long as the call/interview takes place, and the processing of the electronic signal lasts for the duration of the interview.
[0145] However, the processing may be aborted when the dispatcher/interviewing party has decided on a response action such as dispatching an ambulance to the scene of the accident, or when the caller/interviewee party has hung up.
[0146] The first number of samples is selected and passed to an anomaly routine, which is further described in connection with
[0148] The anomaly routine determines if an interval is an anomaly or not. If it is not an anomaly, it is said to be a normality.
[0149] The anomaly routine comprises a statistically learned model, which may be a hidden Markov model or a neural network.
[0150] The statistically learned model is trained with training data in order to find a predictive function, which associates an input and an output.
[0151] The training data are samples from the probability distribution according to which the input and output are distributed.
[0152] The statistically learned model is trained such that the predictive function associates the input and output with the smallest error as possible, i.e. a loss function is defined and minimised.
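As a minimal illustration of this training principle (a hypothetical sketch, not the model disclosed in the embodiments), a linear predictive function can be fitted by gradient descent on a mean squared-error loss:

```python
import numpy as np

# Hypothetical sketch: fit a linear predictive function f(x) = w*x + b to
# training samples by minimising a squared-error loss with gradient descent.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 200)
y_train = 2.0 * x_train + 0.5 + rng.normal(0, 0.01, 200)  # noisy linear data

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = w * x_train + b - y_train
    # Gradients of the mean squared-error loss with respect to w and b.
    w -= lr * 2 * np.mean(err * x_train)
    b -= lr * 2 * np.mean(err)

loss = np.mean((w * x_train + b - y_train) ** 2)
print(w, b, loss)
```

The fitted parameters converge towards the values used to generate the training data, i.e. the loss function is minimised.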
[0154] Between the input layer and the output layer of the left-hand-side neural network, a single layer (one hidden layer) is illustrated. The inputs and outputs are connected to each other via the single layer. Each node in the single layer may have a function f, for example a tanh function with a certain amplitude/constant in front of the function.
[0155] The right hand side neural network illustrated in
[0156] The two neural networks illustrated in
[0157] Thus, in the processing of the first number of samples by the anomaly routine, a first neural network (encoder/inference model/recognizing model) and a second neural network (decoder/generator) may be used in order to determine if a respective number of samples are an anomaly or a normality.
[0158] The first neural network may constitute an audio recognition neural network, and the second neural network may constitute a reversing neural network, i.e. a neural network that may reverse the prediction of the first neural network, predicting the input to the first neural network from the output of the first neural network.
[0159] The audio recognition neural network may be constituted by a recurrent neural network, which may use previously recognized letters in a word to estimate the following letter in the word.
[0160] The input of the first neural network is defined by the input vector x.
[0161] If a respective first number of samples represents 20 ms of audio sampled at 16 kHz, the respective number of samples is 320, and the first neural network may have 320 inputs (320 input nodes, one for each sample).
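The arithmetic above can be verified directly (the sampling rate and window length are taken from the example in the paragraph above):

```python
# Number of samples in a 20 ms window of audio sampled at 16 kHz.
sample_rate_hz = 16_000
window_s = 0.020
n_samples = int(sample_rate_hz * window_s)
print(n_samples)  # -> 320, i.e. 320 input nodes for the first neural network
```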
[0162] The number of hidden layers and the number of hidden nodes in total should be chosen such that the neural network is not under or over fitted. These numbers may be chosen as a function of the number of samples used to train the neural network, and the number of input and output nodes.
[0163] The first neural network may be defined by a matrix with up to 50 million elements.
[0164] The output of the first neural network is defined by the output vector
[0165] The outputs of the first neural network in the anomaly routine may constitute the inputs of the second neural network, i.e. the input vector
[0166] The second neural network may be defined by a matrix with up to 130 million elements.
[0167] The output of the second neural network is defined by the output vector {circumflex over (x)}.
[0168] Anomaly routine:
[0169] The anomaly routine may compare the input vector x to the output vector {circumflex over (x)}.
[0170] In other words, it is tested whether the training samples (training data) used to train the statistically learned model are also representative of the respective number of samples (real/production data), i.e. whether the respective number of samples is an outlier of the distribution that the statistically learned model has been trained from. For example, the signal-to-noise ratio in the real data may be too low, or the pronunciation may deviate to a degree that it cannot be recognized in view of the data that the statistically learned model was trained with.
[0171] If the respective number of samples is an outlier, the statistically learned model will not result in a useful output; it can be said that the statistically learned model has then been trained with too few training samples, which are not representative of the respective number of samples, so that a useful output cannot be expected.
[0172] The comparison of vectors may comprise determining the correlation between the input vector x and the output vector {circumflex over (x)}.
[0173] The correlation or difference between the input vector x and the output vector {circumflex over (x)} of the statistically learned model may be compared to a threshold, i.e. depending on whether the correlation or difference is above or below the threshold value, the respective number of samples is a normality or an anomaly.
[0174] If the correlation is, for example, between 0 and 0.5, it is determined that the respective number of samples is an anomaly. If the correlation is, for example, greater than 0.5, it is determined that the respective number of samples is a normality.
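A minimal sketch of such a correlation-threshold test (illustrative only; the vector names and the use of Pearson correlation are assumptions, with the 0.5 threshold taken from the paragraph above):

```python
import numpy as np

# Illustrative sketch: decide anomaly/normality by correlating the input
# window x with the model's reconstruction x_hat (threshold 0.5 as in the
# description; Pearson correlation is an assumption).
def is_anomaly(x, x_hat, threshold=0.5):
    corr = np.corrcoef(x, x_hat)[0, 1]
    return corr <= threshold  # low correlation: reconstruction failed

rng = np.random.default_rng(1)
x = rng.normal(size=320)               # one 20 ms window at 16 kHz
good = x + 0.1 * rng.normal(size=320)  # faithful reconstruction (normality)
bad = rng.normal(size=320)             # unrelated reconstruction (anomaly)
print(is_anomaly(x, good), is_anomaly(x, bad))
```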
[0175] As an alternative or addition to the above, the anomaly test/routine may include comparing the probability distribution of the training data with the probability distribution of the real data (the respective number of samples).
[0176] In case the probability function of the training data is unknown or of high dimensionality, the training data may undergo a transformation in order to reduce the dimensionality such that the probability function of the training data may be represented for example by a one-dimensional probability distribution (such as an exponential function in one dimension).
[0177] The transformation may be linear or nonlinear. A neural network may be used to reduce the dimensionality, i.e. to perform the transformation.
[0178] The probability distribution of lower dimensionality may be a prior given distribution (such as a Gaussian or exponential function in for example one dimension with defined parameters), and the transformation function may be trained to transform the probability function of the training data into the defined probability distribution of lower dimensionality. The same transformation function may be used as the basis for transforming the probability distribution of the respective number of samples. The lower dimensionality probability distribution of the respective number of samples may then be compared to the lower dimensionality probability distribution of the training data.
[0179] The comparison of probability distributions (whether in high or low dimensionality) may be done with an integration over the two probability distributions and comparing the integral to a threshold. If the integral is below the threshold, it may be decided that the respective number of samples represents an anomaly.
[0180] As another alternative or in addition, the test for a respective number of samples being an outlier/anomaly or not may comprise determining how far the respective number of samples is from the average of the probability distribution of the training data, i.e. the probability distribution defining the statistically learned model. For example, if the distance is more than 0.5 standard deviation away, it may be decided that the respective number of samples is an anomaly.
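The standard-deviation test above may be sketched as follows (illustrative; the one-dimensional statistic and the training distribution are assumptions, with the 0.5 standard-deviation threshold taken from the paragraph above):

```python
import numpy as np

# Illustrative sketch: flag a window as an anomaly when its (reduced,
# one-dimensional) statistic lies more than 0.5 standard deviations from
# the mean of the training-data distribution.
train = np.random.default_rng(2).normal(loc=10.0, scale=2.0, size=10_000)
mu, sigma = train.mean(), train.std()

def is_outlier(value, mu, sigma, k=0.5):
    return abs(value - mu) > k * sigma

print(is_outlier(10.1, mu, sigma), is_outlier(13.0, mu, sigma))
```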
[0181] The respective interval may be Fourier transformed to the frequency domain, and the frequencies of the signal represented by the respective number of samples may constitute the input to the statistically learned model used by the anomaly routine, i.e. the signal represented by the respective number of samples may be divided into frequencies according to the energy at the frequencies of the respective interval.
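A brief sketch of this Fourier-transform step (illustrative; the window length and sampling rate are taken from the earlier example, and the 440 Hz test tone is an assumption):

```python
import numpy as np

# Illustrative sketch: Fourier-transform a 20 ms window (320 samples at
# 16 kHz) and use the per-frequency energy as input to the model.
fs, n = 16_000, 320
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 440 * t)         # a 440 Hz tone
spectrum = np.abs(np.fft.rfft(signal)) ** 2  # energy at each frequency bin
freqs = np.fft.rfftfreq(n, d=1 / fs)         # bin spacing is 50 Hz here
peak = freqs[np.argmax(spectrum)]
print(peak)
```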
[0182] Returning to the exemplary flowchart of
[0183] The feedback may constitute negative feedback in the sense that the interviewing party is informed that the processing is ongoing. The feedback may be by sound or displayed on a display 24.
[0184] Alternatively, the method may not provide feedback to the interviewing party until either it returns a suggestion to a response action or until the processing reaches a number of samples that represent a signal, which is not an anomaly.
[0185] The processing may then continue with the next number of samples (window) representing the part of the interviewee's voice following the previous number of samples, which was an anomaly. Thus, the method loops back to the anomaly routine, possibly while the condition that the emergency call is still ongoing is fulfilled.
[0186] In case the respective number of samples, which have been processed by the anomaly routine, turns out not to be an anomaly, the processing may then continue with that respective number of samples, which may be passed on to an audio recognition routine for recognizing audio in the respective interval.
[0187] The audio recognition routine may already have been performed in connection with the anomaly routine, i.e. the first neural network of the anomaly routine is a neural network for audio recognition. The audio recognition routine may comprise an audio recognition neural network corresponding to the first neural network also used by the anomaly routine, so that there is compliance between the test performed by the anomaly routine and the audio recognition routine.
[0188] The audio recognition routine determines the letters spoken by the interviewee party, and combines the letters into words.
[0189] The recognized audio, the word(s), of the respective number of samples are passed to a diagnosis routine for diagnosing the emergency.
[0190] The diagnosis routine may comprise a diagnosis neural network.
[0191] Audio from more than one set of samples may be necessary in order to arrive at a diagnosis and for the method to propose a response action.
[0192] However, only a few sets of samples, such as a single set of samples, that have been through the audio recognition and diagnosis routines may lead to positive feedback in the form of one or more hints or suggestions to the interviewing party. Alternatively, the feedback to the interviewing party may take the form of rearranging the protocol for a systematized caller interrogation, i.e. rearranging the order of the list of questions in the protocol such that the interviewing party asks the right question(s) with respect to a certain emergency situation sooner.
[0193] When enough audio has been recognized in order for the diagnosis routine to diagnose based on the recognized words, a response action is proposed to the dispatcher.
[0194] The processing may include a test similar to the anomaly routine in which it is tested if there is enough information available to the diagnosis routine in order to diagnose, i.e. that it can be expected that the diagnosis routine returns a diagnosis and corresponding response action that is correct/probable.
[0195] Background noise and cues.
[0196] The audio recognition and diagnosis routines may also analyse and/or respond to background noise, i.e. a specific background noise may be indicative of a certain emergency situation. The processing may identify such a specific background noise before the interviewing party, and thereby decrease the response time.
[0197] A neural network for audio event classification may be used in order to identify background noise, such as a breathing pattern and diagnose based on the identified background noise. It may be a dedicated neural network in that the training data may represent a specific audio event, i.e. the training data may be constituted by recordings of the sound of breathing of a number of people (such as 1000 people) experiencing a heart attack.
[0198] The diagnosis routines may also analyse and/or respond to special cues such as specific words, which may have been observed as having a high correlation with certain emergencies or physical conditions. A special word may be "help", "heart attack", "engine failure", etc. A dedicated neural network may be used for diagnosing special cues, and the dedicated neural network may be trained with data representing special cues.
[0199] Furthermore, the amplitude and/or frequency of the voice of the interviewee party may also be analysed in the processing. For example, a high amplitude may be indicative of the need for quickly dispatching an ambulance to the scene of the accident.
[0200] The processing may include a language routine for determining the language of the caller. The language routine may comprise a language neural network. The language routine may initialise immediately as the emergency call is received at the public safety answering point.
[0202] The anomaly routine is omitted in the method, and the sets of samples are processed one after the other by the audio recognition routine and the diagnosis routine, respectively.
[0205] Both the decoder and encoder may be used for the processing routines illustrated in both
[0206] In
[0207] More than three layers may be used and in the following, the number of layers is arbitrary, and a specific layer may be referred to using an index number.
[0208] For the layer below the top layer, i.e. the second layer, as well as the bottom layer, i.e. the first layer, it can be seen that the random variable of the respective layer is split in two components z.sub.i.sup.BU and z.sub.i.sup.TD. The superscript BU refers to a bottom-up encoder path, and the superscript TD refers to a top-down encoder path (the encoder will be explained in connection with
[0209] The decoder may have a deterministic top-down path d.sub.L-1, . . . , d.sub.1 (which may be parameterized with neural networks), and receives as input at each layer i of the hierarchy the random variable z.sub.i+1.
[0210] This may be done by defining a fully convolutional model and concatenating z.sub.i+1 and d.sub.i+1 along the features' dimension.
[0211] d.sub.i can therefore be seen as a deterministic variable that summarizes all the relevant information coming from the random variables higher in the hierarchy, z.sub.>i.
[0212] In
[0213] The random variables z.sub.i.sup.TD and z.sub.i.sup.BU are conditioned on all the information in the higher layers, and are conditionally independent given z.sub.>i.
[0214] The joint distribution (prior probability distribution) p.sub.θ(x, z) of the decoder is given by:
p.sub.θ(x, z)=p.sub.θ(x|z)p.sub.θ(z.sub.L)Π.sub.i=1.sup.L-1 p.sub.θ(z.sub.i|z.sub.>i)
where θ refers to the parameters of the decoder, i.e. in the case neural networks are used to define the random variables (or the probability distributions for the random variables in the hierarchy), the parameters may be the weights of the neural networks.
[0215] p.sub.θ(x|z) is a first factor defined as a first probability distribution of the observable variable x conditioned on the set of random variables, i.e. z.
[0216] p.sub.θ(z.sub.L) is a second factor defined as a second probability distribution of the random variable of the top layer z.sub.L (with index i=L) of the decoder.
Π.sub.i=1.sup.L-1 p.sub.θ(z.sub.i|z.sub.>i) is a third factor defined as the product of sequence of the probability distributions for the random variables of the set of random variables for the decoder. The random variable of each respective element in said product of sequence is conditioned on the random variable of one or more of the higher layers. For example, for index i=2, the random variable z.sub.2 may be conditioned on the random variable z.sub.3 or z.sub.4 or higher. The condition may also be on several of the higher-lying random variables, i.e. z.sub.3 and z.sub.4 for example. For index or element i=L-1, the random variable z.sub.L-1 is only conditioned on the top random variable z.sub.L.
[0217] The elements in the product sequence may be factored out as p.sub.θ(z.sub.i|z.sub.>i)=p.sub.θ(z.sub.i.sup.BU|z.sub.>i)p.sub.θ(z.sub.i.sup.TD|z.sub.>i), i.e. a first factor defined as the conditional probability distributions for the bottom-up random variables (where information/data goes from the bottom towards the top of the hierarchy), and a second factor defined as the conditional probability distributions for the top-down random variables (where information/data goes from the top towards the bottom of the hierarchy).
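A minimal numerical sketch of this factorized top-down prior (an illustrative stand-in, not the patented model: linear maps replace the neural networks, L=3, and the BU/TD split is omitted):

```python
import numpy as np

# Illustrative three-layer top-down prior
#   p(x, z) = p(x|z1) p(z3) p(z2|z3) p(z1|z2, z3),
# where linear maps stand in for the neural networks that would
# parameterize the conditional means.
rng = np.random.default_rng(0)
dim = 4
W2, W31, W21, Wx = (rng.normal(scale=0.5, size=(dim, dim)) for _ in range(4))

z3 = rng.normal(size=dim)                        # top layer: z3 ~ N(0, I)
z2 = W2 @ z3 + rng.normal(size=dim)              # z2 | z3
z1 = W31 @ z3 + W21 @ z2 + rng.normal(size=dim)  # z1 | z2, z3 (two higher layers)
x = Wx @ z1 + 0.1 * rng.normal(size=dim)         # observable x | z

print(x.shape)
```

Sampling the top-layer value and processing it down through the hierarchy in this way generates the information x, as recited in claim 1.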
[0218] The probability distributions may have diagonal covariance, with one neural network for the mean and another neural network for the variance.
[0219] Since the z.sub.i.sup.BU and z.sub.i.sup.TD variables are on the same level in the decoder and of the same dimensionality, all the deterministic parameters going to the layer below are shared.
[0220] Specifically, the decoder has a top-down path going from z.sub.L through the intermediary random variables to x. Between each layer there is a ResNet block with M layers set up. Weight normalization is applied in all neural network layers.
[0221] The neural network function (a function of the decoder parameters θ) of ResNet layer j associated with layer i is denoted f.sub.i,j.
[0222] The feature maps are written as d.sub.i,j. The decoder routine can then be iterated as
[0223] In
[0224] Due to the nonlinearities in the neural networks that parameterize the decoder, the exact posterior distribution p.sub.θ(z|x) is intractable and needs to be approximated. A variational distribution (probability distribution for approximating a posterior probability distribution) q.sub.θ,φ(z|x) may be defined for this.
[0225] A bottom-up (BU) and a top-down (TD) encoder path are defined, which are computed sequentially when constructing the posterior approximation for each data point x.
[0226] The variational distribution over the BU random variables depends on the data x and on all BU variables lower in the hierarchy, i.e. q.sub.φ(z.sub.i.sup.BU|x, z.sub.<i.sup.BU); alternatively, the condition may be on a fewer number of BU variables lower in the hierarchy, such as only the BU variable in the layer below the layer of the respective index i. φ denotes all the parameters of the BU path. z.sub.i.sup.BU may have a direct dependency only on the BU variable below, i.e. z.sub.i-1.sup.BU.
[0227] The dependency on the data and the lower BU variables may be achieved, similarly to the decoder, through a deterministic bottom-up path {tilde over (d)}.sub.1, . . . , {tilde over (d)}.sub.L-1.
[0228] The TD variables likewise depend on the data and the BU variables lower in the hierarchy through the BU encoder path, but also on all variables above in the hierarchy through the TD encoder path in
[0229] All the parameters of the TD path may be shared with the decoder, and are therefore denoted θ, whereas the parameters of the encoder are denoted φ. The encoder may be factorized as follows:
q.sub.θ,φ(z|x)=q.sub.φ(z.sub.L|x, z.sub.<L.sup.BU)Π.sub.i=1.sup.L-1 q.sub.φ(z.sub.i.sup.BU|x, z.sub.<i.sup.BU)q.sub.θ(z.sub.i.sup.TD|x, z.sub.<i.sup.BU, z.sub.>i.sup.TD)
i.e. the random variables z are conditionally distributed on the observable variable x according to a probability distribution q.sub.θ,φ(z|x) for approximating a posterior probability distribution.
[0230] The variational distributions over the BU and TD random variables may be Gaussians whose mean and diagonal covariance may be parameterized with neural networks that take as input the concatenation over the feature dimension of the conditioning variables.
[0231] The first factor q.sub.φ(z.sub.L|x, z.sub.<L.sup.BU) may be defined as a first probability distribution of the random variable of the top layer of said encoder conditioned on the observable variable and the respective random variables of the bottom-up path below the top layer. The conditioning may further be on all of the respective random variables of the bottom-up path below the top layer, or it may exclude some of the random variables closest to the bottom of the hierarchy.
[0232] The second factor:
Π.sub.i=1.sup.L-1 q.sub.φ(z.sub.i.sup.BU|x, z.sub.<i.sup.BU)q.sub.θ(z.sub.i.sup.TD|x, z.sub.<i.sup.BU, z.sub.>i.sup.TD)
may be defined as the product of sequence of the products between the probability distributions q.sub.φ(z.sub.i.sup.BU|x, z.sub.<i.sup.BU) for the random variables of the bottom-up path and the probability distributions q.sub.θ(z.sub.i.sup.TD|x, z.sub.<i.sup.BU, z.sub.>i.sup.TD) for the random variables of the top-down path.
[0233] The respective random variable of the bottom-up path for a given index (or element) in the product of sequence is conditioned on the observable variable and the respective random variable of the bottom-up path for a lower index than the given index.
[0234] The respective random variable of the top-down path for a given index in the product of sequence is conditioned on [0235] the observable variable, [0236] at least one respective random variable of the bottom-up path, and [0237] the respective random variable of the top-down path for a higher index than the given index.
[0238] The respective random variable of the top-down path for a given index in the product of sequence may be conditioned on a random variable of the bottom-up path for a higher index than said given index and a random variable of the bottom-up path for a lower index than said given index, such that the conditioning covers all of the random variables of the bottom-up path except the one of the given index.
[0239] Training of the encoder and decoder may be performed, as for variational autoencoders, by maximizing the evidence lower bound (ELBO) with stochastic backpropagation and the reparameterization trick:
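For reference, the standard ELBO for a variational autoencoder can be written out as follows (this is the textbook form, reconstructed here from the surrounding definitions rather than quoted from the drawings):

```latex
\mathcal{L}(x;\theta,\phi)
  = \mathbb{E}_{q_{\theta,\phi}(z\mid x)}\bigl[\log p_\theta(x\mid z)\bigr]
  - \mathrm{KL}\bigl(q_{\theta,\phi}(z\mid x)\,\|\,p_\theta(z)\bigr),
\qquad
\log p_\theta(x)
  = \mathcal{L}(x;\theta,\phi)
  + \mathrm{KL}\bigl(q_{\theta,\phi}(z\mid x)\,\|\,p_\theta(z\mid x)\bigr)
  \ge \mathcal{L}(x;\theta,\phi).
```

The bound holds because the KL divergence between the variational distribution and the true posterior is non-negative.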
[0240] In the standard ELBO from the above, the main contribution to the expected log-likelihood term is coming from averaging over the variational distribution of the lower level random variables. This will thus emphasize low-level statistics.
[0241] When performing anomaly detection with the specific encoder and decoder shown in
[0242] The evidence lower bound for the anomaly detection is a function of the posterior probability distribution (or the approximation thereof) q.sub.θ,φ(z.sub.>k|x) as well as the prior probability distribution p.sub.θ(z.sub.>k).
[0243] With respect to the prior probability distribution, it is for the random variables of the layers higher than k, i.e. the random variables of the lower layers are excluded. The choice of k may for example be 3 such that, for a hierarchy with 6 layers, the prior probability distribution is for the random variables of the fourth, fifth and sixth layers. Thus, k may be seen as a layer number defining a specific layer between the top and bottom of the hierarchy. The layer number may correspond to the middle layer or a layer closer to the middle layer than to the top or bottom.
[0244] It may be so that only one variable in the layer at or lower than k is excluded, i.e. for example only the random variable of the first or second layer is excluded and the distribution is for the other random variables of the hierarchy. In general, the prior probability distribution is for the random variables of the hierarchy excluding at least one random variable from one of the bottom two layers or the bottom three layers or the bottom four layers.
[0245] With respect to the posterior probability distribution, it is for the random variables for the layers higher than k. The random variables are conditioned on the observable variable x. The choice and function of k may be the same as for the prior probability distribution.
[0246] The random variables may be said to belong to a set of random variables defined as z=z.sub.1, z.sub.2, z.sub.3, . . . , z.sub.L with z.sub.i=(z.sub.i.sup.BU,z.sub.i.sup.TD).
[0247] The computation of the evidence lower bound for the layers higher than k is approximated with Monte Carlo integration.
[0248] Sampling from p.sub.θ(z.sub.k|z.sub.>k)q.sub.θ,φ(z.sub.>k|x) can be performed by obtaining samples {circumflex over (z)}.sub.>k from the encoder that are then used to sample {circumflex over (z)}.sub.k from the conditional prior p.sub.θ(z.sub.k|{circumflex over (z)}.sub.>k).
[0249] By only sampling the top L-k variables from the variational approximation, the metric relies only on the high-level semantics encoded in the highest variables of the hierarchy, and not on the low-level statistics encoded in the lower variables.
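As a generic illustration of the Monte Carlo integration step (a sketch, not the patented scoring routine): an expectation is approximated by averaging a function over draws from the sampling distribution, in the same way the bound above is approximated by averaging over samples {circumflex over (z)}.sub.>k from the encoder.

```python
import numpy as np

# Illustrative sketch of Monte Carlo integration: approximate an
# expectation under a sampling distribution by averaging over draws.
rng = np.random.default_rng(3)

def mc_expectation(f, sampler, n=100_000):
    return np.mean(f(sampler(n)))

# E[z^2] under z ~ N(0, 1) is exactly 1; the estimate converges to it.
estimate = mc_expectation(lambda z: z ** 2, lambda n: rng.normal(size=n))
print(estimate)
```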
[0250] Below is a list of reference signs used in the detailed description of the invention and in the drawings referred to in the detailed description of the invention. [0251] 10 Injured person [0252] 12 Caller [0253] 14 Medical call taker [0254] 16 Cell tower [0255] 18 Mobile phone [0256] 20 Receiver [0257] 22 Processing unit [0258] 24 Display [0259] 26 Public safety answering point.