FEEDBACK LOOP FOR EMOTION RECOGNITION SYSTEM

20220409113 · 2022-12-29

    Abstract

    The present invention relates to a system and method of emotion recognition. An emotion recognition system may utilize a Valence-Arousal factor along with training data. The training data may exist as emotions assigned to actual measurements of user inputs. The actual measurements of user inputs may be assigned to a plurality of points on the Valence-Arousal model. A user input acquisition device may be used to collect actual measurements of user inputs. A processor may utilize an algorithm to assign user emotions based on the training data. A user may provide feedback on the assigned user emotions, and the training data may be updated based on the user feedback, depending on whether the user feedback is considered an outlier to the training data.

    Claims

    1. An emotion recognition system comprising: a. a Valence-Arousal model comprising: i. a Valence factor comprising two endpoints; ii. an Arousal factor comprising two endpoints; iii. a plurality of points, one of said plurality of points being an origin; b. an algorithm; c. a user input acquisition device; d. a database comprising training data, said training data comprising: i. actual measurements of user inputs assigned to the endpoints of each of the Valence factor and the Arousal factor; ii. actual measurements of user inputs assigned to some of the plurality of points, said some of the plurality of points being in addition to the endpoints of the Valence factor and the Arousal factor; iii. emotions assigned to each actual measurement of user inputs; e. a processor; and f. a user device, wherein the user input acquisition device collects actual measurements of user inputs, and wherein the actual measurements of user inputs are transmitted to the database in the form of non-transitory computer-readable media, and wherein the processor retrieves the actual measurements of user inputs from the database, and wherein the processor uses the algorithm to assign the actual measurements of user inputs to one or more of the plurality of points of the Valence-Arousal model, and wherein the processor uses the algorithm to recognize the closest corresponding emotions to the one or more of the plurality of points, and wherein the processor uses the algorithm to assign user emotions based on the closest corresponding emotions to said one or more of the plurality of points, and wherein the user emotions are transmitted to the user device in the form of non-transitory computer-readable media, and wherein the user emotions are displayed on the user device in the form of human-readable information, and wherein a user uses the user device to provide the user's emotion feedback.

    2. The emotion recognition system of claim 1, further comprising a credibility algorithm, wherein the credibility algorithm determines that the user's emotion feedback is an outlier, and wherein the user's emotion feedback is discarded.

    3. The emotion recognition system of claim 1, further comprising a credibility algorithm, wherein the credibility algorithm determines that the user's emotion feedback is not an outlier, and wherein the processor re-assigns the emotions to re-assigned points on the Valence-Arousal model based on the user's emotion feedback, and wherein the processor updates the algorithm based on the re-assigned points.

    4. The emotion recognition system of claim 1, wherein the user input acquisition device is an EEG device.

    5. The emotion recognition system of claim 1, wherein the user input acquisition device is an ECG device.

    6. The emotion recognition system of claim 1, wherein the actual measurements of user inputs are transformed into Hjorth parameters for further processing.

    7. The emotion recognition system of claim 1, wherein the Fourier transform is applied to the actual measurements of user inputs to obtain other measurements of user inputs.

    8. The emotion recognition system of claim 1, wherein the processor uses the algorithm to assign the actual measurements of user inputs to one or more of the plurality of points of the Valence-Arousal model by converting the actual measurements of user inputs into one or more scalograms or other continuous wavelet transformation coefficient(s) as the input of machine learning/deep learning.

    9. The emotion recognition system of claim 8, wherein one or more pre-trained algorithms such as VGG16 are used within the convolutional neural network.

    10. A method of recognizing emotions comprising: a. providing a Valence-Arousal model, the Valence-Arousal model comprising: i. a Valence factor comprising two endpoints; ii. an Arousal factor comprising two endpoints; iii. a plurality of points, one of said plurality of points being an origin; b. assigning training data to the Valence-Arousal model, comprising: i. assigning actual measurements of user inputs to the endpoints of each of the Valence factor and the Arousal factor; ii. assigning actual measurements of user inputs to some of the plurality of points, said some of the plurality of points being in addition to the endpoints of the Valence factor and the Arousal factor; iii. assigning an emotion to each actual measurement of user inputs; c. providing an algorithm; d. providing a user input acquisition device; e. collecting actual measurements of user inputs with the user input acquisition device; f. providing a processor; g. using the processor to denoise the actual measurements of user inputs; h. transmitting the actual measurements of user inputs to a database in the form of non-transitory computer-readable media; i. using the processor to retrieve the actual measurements of user inputs from the database; j. using the processor to recognize one or more user emotions by: i. using the algorithm to assign the actual measurements of user inputs to one or more of the plurality of points of the Valence-Arousal model; ii. using the algorithm to recognize the closest corresponding emotions to the one or more of the plurality of points; iii. assigning user emotions based on the closest corresponding emotions; k. transmitting the user emotions to a user device in the form of non-transitory computer-readable media; l. displaying the user emotions on the user device in the form of human-readable text; and m. using the user device to provide the user's emotion feedback.

    11. The method of claim 10, further comprising providing a credibility algorithm, wherein after using the user device to provide the user's emotion feedback, the credibility algorithm is used to determine that the user's emotion feedback is an outlier, and wherein the user's emotion feedback is discarded.

    12. The method of claim 10, further comprising providing a credibility algorithm, wherein after using the user device to provide the user's emotion feedback, the credibility algorithm is used to determine that the user's emotion feedback is not an outlier, and wherein the emotions are re-assigned to re-assigned points on the Valence-Arousal model, and wherein the algorithm is updated based on the re-assigned points.

    13. The method of claim 12, wherein said method is continuously repeated.

    14. The method of claim 10, wherein the user input acquisition device is an EEG device.

    15. The method of claim 10, wherein the user input acquisition device is an ECG device.

    16. The method of claim 10, wherein the actual measurements of user inputs are transformed into Hjorth parameters for further processing.

    17. The method of claim 10, wherein the Fourier transform is applied to the actual measurements of user inputs to obtain other measurements of user inputs.

    18. The method of claim 17, further comprising denoising the other measurements of user inputs.

    19. The emotion recognition system of claim 1, wherein the processor uses the algorithm to assign the actual measurements of user inputs to one or more of the plurality of points of the Valence-Arousal model by converting the actual measurements of user inputs into one or more scalograms.

    20. The method of claim 10, wherein one or more pre-trained algorithms such as VGG16 are used within the convolutional neural network.

    Description

    BRIEF DESCRIPTION OF THE FIGURES

    [0039] FIG. 1 is a first example Valence-Arousal model utilized by some embodiments of the present invention.

    [0040] FIG. 2 is a second example Valence-Arousal model utilized by some embodiments of the present invention.

    [0041] FIG. 3 is an example process flow of some embodiments of the present invention.

    [0042] FIG. 4 is the example process flow of FIG. 3 wherein Emotion Recognition is further subdivided into 4 different processes.

    [0043] FIG. 5 is an example logic flow utilized by some embodiments of the present invention for determining whether to accept or override manual user feedback.

    [0044] FIG. 6 is an example process flow of a test period of an emotion recognition system.

    [0045] FIG. 7 is an example process flow of creation of a user database.

    [0046] FIG. 8 is an example process flow of denoising a user input such as an EEG signal.

    [0047] FIG. 9 is an example process flow of validation of user credibility.

    [0048] FIG. 10 is an example process flow of determination of a best AI model for emotion recognition.

    [0049] FIG. 11 is an example Valence-Arousal model showing a distance between two emotions.

    [0050] FIGS. 12-14 are an example process flow of a running period of an emotion recognition system.

    [0051] FIG. 15 is an example process flow of emotion recognition by an AI model.

    DETAILED DESCRIPTION

    [0052] The description provided herein describes example embodiments of the present invention and is not to be interpreted as limiting the invention to any particular embodiment, feature, step, or property. The figures provided and described herein also illustrate example embodiments of the invention, and are not to be interpreted as limiting the invention to any particular embodiment, feature, step, or property.

    [0053] As shown in FIG. 1, a Valence-Arousal model has two axes: a Valence factor and an Arousal factor. Each axis has two endpoints. The endpoints of the Valence factor are labeled “Pleasant” and “Unpleasant”. The opposite ends of the Arousal factor are labeled “Activated” and “Deactivated”. The Valence-Arousal model allows emotions to be categorized into quadrants based on the Valence and Arousal axes. For example, in FIG. 1, the emotion “relaxed” falls in the quadrant defined by the Pleasant end of the Valence factor and the Deactivated end of the Arousal factor. This quadrant may be referred to as the “Pleasant-Deactivated” quadrant.

    [0054] The Pleasant end of the Valence factor may be considered positive Valence, and the Unpleasant end of the Valence factor may be considered negative Valence. The Activated end of the Arousal factor may be considered positive Arousal, and the Deactivated end of the Arousal factor may be considered negative Arousal. The intersection of the Valence and Arousal axes may be referred to as the origin of the Valence-Arousal model, wherein the magnitudes of both Valence and Arousal are zero. The halves of a Valence-Arousal model defined by each axis may be referred to as “hemispheres”. For example, in FIG. 1, tense and alert both fall into the Activated hemisphere, and happiness and content both fall into the Pleasant hemisphere.
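The quadrant and hemisphere terminology above can be sketched in code. This is an illustrative sketch only; the sign convention (positive Valence is Pleasant, positive Arousal is Activated) follows the description, while the function names and the treatment of axis points are assumptions:

```python
# Hypothetical sketch: mapping a (valence, arousal) point to the quadrant
# and hemisphere names used in the description.

def quadrant(valence, arousal):
    """Return the quadrant name for a point on the Valence-Arousal model."""
    if valence == 0 or arousal == 0:
        return "On axis"  # the origin and axis points belong to no quadrant
    v = "Pleasant" if valence > 0 else "Unpleasant"
    a = "Activated" if arousal > 0 else "Deactivated"
    return f"{v}-{a}"

def hemispheres(valence, arousal):
    """Return the hemisphere(s) a point falls into."""
    names = []
    if valence > 0:
        names.append("Pleasant")
    elif valence < 0:
        names.append("Unpleasant")
    if arousal > 0:
        names.append("Activated")
    elif arousal < 0:
        names.append("Deactivated")
    return names
```

For example, the emotion “relaxed” of FIG. 1 (positive Valence, negative Arousal) would map to the “Pleasant-Deactivated” quadrant.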

    [0055] Because the axes of a Valence-Arousal model have finite endpoints, Valence-Arousal models may be depicted as circles, as shown in FIG. 1. FIG. 1 shows each emotion placed along the inner edge of the circle. This is one example of a Valence-Arousal model. Another, more detailed example of a Valence-Arousal model is shown in FIG. 2.

    [0056] As shown in FIG. 2, different emotions are placed at different points within their respective quadrants in order to show the magnitude of both Valence and Arousal that each emotion comprises. Emotions located towards the origin of the graph may be considered to be more neutral emotions than the emotions located towards the edge of the graph. The Valence-Arousal model shown in FIG. 2 may be more informative and therefore better suited for use in certain emotion recognition applications.

    [0057] The Valence-Arousal models illustrated in FIGS. 1 and 2 are examples of Valence-Arousal models and are not intended to limit the invention to utilizing any particular Valence-Arousal model or any particular emotion mapping. The locations of the emotions on the Valence-Arousal models utilized by the invention are subject to change based on the understanding at any given time of emotions in the art and how they should be graphed on a Valence-Arousal model. Furthermore, the present invention may modify a Valence-Arousal model utilized by the invention based on the invention's understanding at any given time of how emotions should be graphed on a Valence-Arousal model. The invention may perform said modifications as the invention's understanding of emotional graphing changes based on the “learning” of the AI feature of the invention.

    [0058] As shown in FIG. 3, a method of recognizing emotions begins with neurophysiological signal acquisition 300, in which one or more user input acquisition devices are used to obtain actual measurements of user inputs. The actual measurements of user inputs are gathered as analog data and are then converted to digital data through analog-digital conversion of signals 301, the signals being the analog form of the actual measurements of user inputs. The method shown in FIG. 3 utilizes one or more EEG devices as the one or more user input acquisition devices. Since EEG devices provide multiple channels of actual measurements of user inputs, the method of recognizing emotions comprises primary signal processing of each channel 302, wherein each channel provided by the one or more EEG devices is processed separately by a processor.

    [0059] Emotion recognition 303 is carried out by the processor utilizing the actual measurements of user inputs, a Valence-Arousal model, training data, and an algorithm. The output of emotion recognition 303 is user emotions that are provided to a user via emotion analysis output display to user 304. Emotion analysis output display to user 304 includes transmitting the user emotions to a user device in the form of non-transitory computer-readable media. The user may provide emotion labels 305 by using the user device to provide the user's emotion feedback, which may be based on the user's own understanding of their own emotions. The method may further comprise utilizing a credibility algorithm for user emotion feedback evaluation considering user credibility 306, in which the credibility algorithm determines whether the user's emotion feedback is considered an outlier to the training data. Incoming emotion entry into database 307 includes entering either the user emotions or the user's emotion feedback into a database.

    [0060] As shown in FIG. 4, emotion recognition 303 as used in the method of emotion recognition shown in FIG. 3 may be carried out by pattern distinction 401, cross-channel coherence 402, Valence-Arousal analysis 403, and overall rating 404. During pattern distinction 401, patterns within the actual measurements of user inputs obtained by the one or more EEG devices are recognized. During cross-channel coherence 402, the actual measurements of user inputs of each of the multiple channels provided by the one or more EEG devices are compared to one another to determine if the actual measurements of user inputs from each channel align with one another.

    [0061] During Valence-Arousal analysis 403, a processor uses an algorithm to assign the actual measurements of user inputs to one or more of a plurality of points of a Valence-Arousal model. The processor then uses the algorithm to recognize the closest corresponding emotions, said emotions having been assigned to some of the plurality of points as part of the training data. The processor then assigns user emotions based on the closest corresponding emotions during overall rating 404.

    [0062] As shown in FIG. 5, a logic flow may be utilized by a credibility algorithm in order to determine whether the user's emotion feedback provided by the user is an outlier to the training data. First, it is determined whether the user's emotion feedback (user's emotion label) is an exact match to the user emotions recognized (equals to the prediction) 500. If so, the user's emotion feedback is stored in a database 508 and the user's credibility, “C”, is increased 510.

    [0063] If the user's emotion feedback is not an exact match to the user emotions recognized 500, it is determined whether the user's emotion feedback and the user emotions provided belong to the same hemisphere of Valence 501. If not, the user's emotion feedback is subject to manual review by experts 507, said experts being humans trained in the art of emotion recognition such as psychiatrists. If the manual review by experts 507 determines that the user's emotion feedback is valid, then the user's emotion feedback is stored in the database 508 and the user's credibility is increased 510. If the manual review by experts 507 determines that the user's emotion feedback is invalid, then the user's emotion feedback is not entered into the database and the user's credibility is decreased 509.

    [0064] If the user's emotion feedback and the user emotions provided belong to the same hemisphere of Valence 501, it is then determined if the user's emotion feedback and user emotions provided belong to the same hemisphere of Arousal 502. If so, it is then determined if the user's credibility “C” is higher than Cv 503. If C is higher than Cv, the emotions of the training data are re-assigned to re-assigned points on the Valence-Arousal model and the algorithm is updated based on the re-assigned points 504. The user's emotion feedback is entered into the database 508 and C is increased 510.

    [0065] If, when determining whether C is higher than Cv 503, it is determined that C is not higher than Cv, the user's emotion feedback is subject to manual review by experts 507. If the manual review by experts 507 determines that the user's emotion feedback is valid, then the user's emotion feedback is stored in the database 508 and C is increased 510. If the manual review by experts 507 determines that the user's emotion feedback is invalid, then the user's emotion feedback is not entered into the database and the user's credibility is decreased 509.

    [0066] If the user's emotion feedback and the user emotions provided belong to the same hemisphere of Valence 501 but not the same hemisphere of Arousal 502, the difference in Arousal level between the user's emotion feedback and the user emotions provided is calculated and compared to Ax 505. If said difference is greater than Ax, the user's emotion feedback is subject to manual review by experts 507. If the manual review by experts 507 determines that the user's emotion feedback is valid, then the user's emotion feedback is stored in the database 508 and C is increased 510. If the manual review by experts 507 determines that the user's emotion feedback is invalid, then the user's emotion feedback is not entered into the database and the user's credibility is decreased 509.

    [0067] If the difference in Arousal level between the user's emotion feedback and the user emotions provided 505 is less than Ax, C is compared to Ca 506. If C is greater than Ca, the emotions of the training data are re-assigned to re-assigned points on the Valence-Arousal model and the algorithm is updated based on the re-assigned points 504. The user's emotion feedback is entered into the database 508 and C is increased 510. If C is less than Ca, the user's emotion feedback is subject to manual review by experts 507. If the manual review by experts 507 determines that the user's emotion feedback is valid, then the user's emotion feedback is stored in the database 508 and C is increased 510. If the manual review determines that the user's emotion feedback is invalid, then the user's emotion feedback is not entered into the database and the user's credibility is decreased 509.
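The decision flow of FIG. 5 can be summarized in a short sketch. The threshold names Cv, Ca, and Ax follow the description, but their default values, the return labels, and the hemisphere test (sign agreement of Valence or Arousal values) are illustrative assumptions, not part of the specification:

```python
# Hedged sketch of the FIG. 5 feedback-evaluation logic. Thresholds cv, ca,
# and ax stand in for Cv, Ca, and Ax; values are placeholders.

def evaluate_feedback(feedback, prediction, credibility, cv=0.7, ca=0.5, ax=0.4):
    """Decide what to do with the user's emotion feedback.

    feedback, prediction: (valence, arousal) tuples.
    Returns "store", "retrain_and_store", or "expert_review".
    """
    fv, fa = feedback
    pv, pa = prediction
    if feedback == prediction:            # exact match (500): store, raise C
        return "store"
    if fv * pv <= 0:                      # different Valence hemisphere (501)
        return "expert_review"            # manual review by experts (507)
    if fa * pa > 0:                       # same Arousal hemisphere too (502)
        # re-assign points and update the algorithm (504) only if C > Cv (503)
        return "retrain_and_store" if credibility > cv else "expert_review"
    if abs(fa - pa) > ax:                 # Arousal difference exceeds Ax (505)
        return "expert_review"
    # small Arousal difference: accept if C > Ca (506), else manual review
    return "retrain_and_store" if credibility > ca else "expert_review"
```

The “store” and “retrain_and_store” outcomes would both enter the feedback into the database 508 and increase C 510; “expert_review” defers that decision to the manual review 507.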

    [0068] The terms Cv, C, Ca, and Ax used in this description are known statistical properties in the art of statistics, and shall be interpreted by their meaning in that art for purposes of this description.

    [0069] The emotion recognition system of the present invention may comprise a test period (also referred to as a “reference period”) and a running period. As shown in FIG. 6, the test period may start with the creation of a user database 601. The user database may be tested (“validation of user database credibility”) 602, and an appropriate AI model for emotion recognition may be determined (“determination of best AI model”) 603. Validation of user database credibility 602 may consist of steps such as but not limited to checking the consistency of the user's emotion feedback for the same emotion or the same signal characteristics, and verifying the user's emotion feedback against objective standards of emotion derived from scientific and/or clinical studies. The test period may be repeated any number of times in order to provide the emotion recognition system with enough training data to use during the running period. During the test period, a database may be created for each user of the emotion recognition system, thereby creating a “subject dependent” emotion recognition system. Alternatively, a single database may be created for a plurality of users, thereby creating a “subject independent” emotion recognition system.

    [0070] As shown in FIG. 7, creation of the user database 601 may start with raw user input 701. Raw user input 701 may exist as unfiltered or otherwise unaltered user input such as but not limited to EEG signals, fNIRS signals, ECG signals, and body temperature. The raw user input 701 is denoised through denoising of user input 702, which produces clean user input 703. A user may provide initial values of Valence and Arousal 704 (also referred to as “initial user annotations”) based on the values of Valence and Arousal that the user thinks correlate with their emotions. The clean user input 703 and initial user annotations 704 are then added to the user database 705.

    [0071] As shown in FIG. 8, denoising of user input 702 may comprise environmental noise removal 801 and biosignals noise removal 805. FIG. 8 shows biosignals noise removal 805 being completed after environmental noise removal 801. However, various embodiments of the invention may utilize various denoising techniques. Some embodiments of the invention may only utilize environmental noise removal 801 or biosignals noise removal 805. Other embodiments of the invention may complete biosignals noise removal 805 before environmental noise removal 801.

    [0072] During environmental noise removal 801, 50 Hz or 60 Hz power line noise removal 802 may occur, during which portions of the raw user input that have frequencies of exactly 50 Hz or 60 Hz are removed from the user input. These specific frequency values are chosen since noise created by EEG devices is generally at the frequency of 50 Hz in European EEG devices and 60 Hz in American EEG devices. High pass and low pass band filtering 803 may also occur as part of environmental noise removal 801. During high pass and low pass band filtering 803, portions of the raw user input in the range of 0-4 Hz inclusive (removed by the high pass filter) and portions of the raw user input in the range of 45-128 Hz inclusive (removed by the low pass filter) are removed from the user input. These frequency ranges are chosen since they correlate with common environmental noise generated by events such as but not limited to the user moving or the user touching the user input acquisition device. Removal of other environmental noise 804 may further occur as part of environmental noise removal 801.
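A minimal numpy-only sketch of this frequency-domain cleanup follows. It masks FFT bins rather than applying proper notch and band filters (which a production system would more likely implement with IIR filters); the band edges follow the description, while the masking approach and notch width are assumptions:

```python
import numpy as np

def remove_bands(signal, fs, bands=((0.0, 4.0), (45.0, 128.0)),
                 notches=(50.0, 60.0), notch_width=0.5):
    """Zero out the given frequency bands and power-line notch frequencies.

    signal: 1D array of samples; fs: sampling rate in Hz.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = np.ones_like(freqs, dtype=bool)
    for lo, hi in bands:          # drop the 0-4 Hz and 45-128 Hz bands
        mask &= ~((freqs >= lo) & (freqs <= hi))
    for f0 in notches:            # drop 50/60 Hz power-line components
        mask &= ~(np.abs(freqs - f0) <= notch_width)
    return np.fft.irfft(spectrum * mask, n=len(signal))
```

With a 256 Hz sampling rate, a 10 Hz component survives this cleanup while a 50 Hz power-line component is removed.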

    [0073] During biosignals noise removal 805, unwanted user input may be removed from the raw user input by CNN for EOG, EMG, and ECG denoising 806. In emotion recognition systems that utilize EEG as the preferred user input, other user inputs such as eye blink artifacts (EOG), muscle artifacts (EMG), and heartbeat artifacts (ECG) may be considered noise, and therefore should be removed from the user input. This may be achieved using a convolutional neural network (CNN) which is trained to recognize EEG signals when mixed with other biosignals such as EOG, EMG, and ECG. A convolutional neural network is a class of AI algorithms configured to recognize two-dimensional (2D) images. The “algorithm” referred to herein may be a convolutional neural network. The “credibility algorithm” referred to herein may also be a convolutional neural network. Examples of CNNs that exist in the art are GoogLeNet, VGG16, and VGG19. During CNN for EOG, EMG, and ECG denoising 806, the various biosignals such as EEG, EOG, EMG, and ECG may be converted into 2D images that may be recognized using the CNN. The CNN may then separate the EEG signal from the rest of the biosignals, thereby removing the rest of the biosignals from the user input.

    [0074] Removal of other biosignals noise 807 may also occur as part of biosignals noise removal 805.

    [0075] As shown in FIG. 9, a user provides values of Valence and Arousal (initial user annotations) 704, which are compared to the clean user input 703, as well as existing user input in the user database. The emotion recognition system finds a number (N) of entries within the user database with the smallest distances to the initial user annotations 901. An algorithm, which may be a CNN, determines the correlation between the user input and the initial user annotations 902. If a high correlation is determined by the algorithm, the initial user annotations are said to be credible 904. If a negative correlation is determined by the algorithm, the initial user annotations are said to be not credible 905.

    [0076] To determine the correlation between the user input and initial user annotations 902, the algorithm may calculate a correlation coefficient between the clean user input 703 and initial user annotations, as well as correlation coefficients between the initial user annotations and other user input entries that exist in the user database. Of the correlation coefficients between the initial user annotations and the other user input entries, an average correlation coefficient is calculated. A correlation criterion is then used to determine if the average correlation coefficient between the initial user annotations and existing user input entries is consistent with the correlation coefficient of the initial user annotations and the new, clean user input 703.
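One way to sketch this correlation criterion is shown below. The description does not specify the criterion itself, so the comparison used here (the new correlation must reach at least `criterion` times the historical average) is purely an assumption, as is treating inputs and annotations as equal-length feature vectors:

```python
import numpy as np

def annotation_is_credible(new_input, annotation, db_entries, criterion=0.5):
    """Compare the new input/annotation correlation to the database average.

    new_input, annotation: equal-length feature vectors.
    db_entries: list of existing user-input feature vectors in the database.
    """
    r_new = np.corrcoef(new_input, annotation)[0, 1]   # new pairing
    r_db = np.mean([np.corrcoef(e, annotation)[0, 1] for e in db_entries])
    # hypothetical criterion: new correlation must be at least a fraction
    # of the average correlation with existing entries
    return bool(r_new >= criterion * r_db)
```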

    [0077] As shown in FIG. 10, determination of best AI model 603 may use a plurality of emotion recognition models 1001. The algorithm compares the distances between the predicted emotions of each of the plurality of emotion recognition models to the initial user annotations 1002. The minimum distance is then found over all of the emotion recognition models 1003, and the emotion recognition model with the smallest distance is chosen as the best AI model.

    [0078] Determination of best AI model 603 is particularly useful for subject dependent emotion recognition systems, since different AI models may work better for different users. Therefore, the same emotion recognition system may be used by multiple different users even though different AI models are used for each user.

    [0079] The “AI model” described herein may comprise the algorithm and credibility algorithm described herein. Either, or both, of the algorithm and credibility algorithm may be a convolutional neural network.

    [0080] As shown in FIG. 11, a 2D Valence-Arousal model is used with an Arousal factor 1101 and a Valence factor 1102. A first emotion 1103 and second emotion 1104 are assigned two different points within the Valence-Arousal model. The distance 1105 between the first and second emotions is calculated using the Euclidean distance formula:


    d = √(α(V2 − V1)² + β(A2 − A1)²)

    wherein d is the distance between the first and second emotions, V1 is the Valence value of the first emotion, V2 is the Valence value of the second emotion, A1 is the Arousal value of the first emotion, A2 is the Arousal value of the second emotion, α is a Valence constant, and β is an Arousal constant. The Valence and Arousal constants may be used to weight either Valence or Arousal. For example, if Valence is determined to be twice as important for emotion recognition, the Valence constant may be set to twice the value of the Arousal constant.
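The formula translates directly into code; the default weights of 1.0 for both constants are an assumption:

```python
import math

def emotion_distance(e1, e2, alpha=1.0, beta=1.0):
    """Weighted Euclidean distance between two (valence, arousal) points.

    alpha and beta are the Valence and Arousal constants.
    """
    (v1, a1), (v2, a2) = e1, e2
    return math.sqrt(alpha * (v2 - v1) ** 2 + beta * (a2 - a1) ** 2)
```

Setting alpha to twice beta, as in the example above, makes Valence differences count twice as much as Arousal differences in the squared distance.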

    [0081] The first and second emotions may be two emotions of the same user in the database. For example, a User A may provide two readings of a happiness emotion to the database, which may be the first and second emotions. The first and second emotions may alternatively be two emotions from different users in the database. For example, a User A may provide a reading of happiness and a User B may also provide a reading of happiness, which may be the first and second emotions. The first and second emotions may alternatively be an initial user annotation and a user emotion. For example, a User A may provide a reading of happiness (the first emotion), and may also provide an annotation of User A's Valence and Arousal values (the second emotion).

    [0082] As shown in FIG. 12, the running period may start with raw user input 1201. Denoising 1202 may occur, which may function exactly like the denoising of the test period, resulting in a clean user input 1203. If user annotation 1206 has been provided, a credibility check 1204 occurs. The credibility check 1204 may function exactly like the credibility check (correlation determination) of the test period. If no user annotation 1206 has been provided, the running period proceeds to emotion recognition 1401. The results of emotion recognition 1401 are displayed to the user 1205. If the user agrees with the results of emotion recognition 1401, the running period continues as a new cycle with new raw user input 1201. If the user does not agree with the results of emotion recognition 1401, user annotation 1206 is provided, and a credibility check 1204 is performed on the user annotation 1206. User annotation 1206 may be values of Valence and Arousal provided by a user, said values being the values of Valence and Arousal that the user thinks correspond with their emotions.

    [0083] As shown in FIG. 13, if there is no correlation/credibility between the clean user input 1203 and the user annotation 1206, then the user is asked if the user annotation is correct 1301. Asking the user to confirm or re-annotate 1301 may prevent mistakes in user annotation from contaminating the database with incorrect data. If the user states that their previous annotation is not correct, the user re-annotates 1306, and their re-annotation is subjected to a credibility check 1204. If the user states that their previous annotation is correct, the user input and user annotation are discarded 1303. One count is then added to a counter of consecutive non-credible annotations 1304. If the total count is found to be greater than a number N, then human intervention 1305 is required, and the running period starts over from obtaining raw user input 1201.

    [0084] The number N may be any positive integer. The number N may be chosen to represent a number of consecutive non-credible user annotations that would determine a user to be incapable of providing accurate user annotations, therefore requiring human intervention 1305. Human intervention 1305 may be performed by medical experts in the art such as psychologists or psychiatrists.

    [0085] Also as shown in FIG. 13, if there is correlation/credibility between the user input and user annotation, the user input and user annotation are entered into the database 1302, and the running period proceeds to emotion recognition 1401.

    [0086] As shown in FIG. 14, the emotion recognized by the emotion recognition system is compared to the user annotation 1402. This is done in order to determine the best AI model for emotion recognition. Though an AI model was previously used during emotion recognition 1401, calculating the distance between the recognized emotion and the user annotation 1402 may determine whether said AI model is the best AI model to be used for a particular user. A set criterion may be used to determine if the distance between the recognized emotion and the user annotation 1402 is “large” or “small”. If the distance is small, the recognized emotion is considered valid, and the emotion recognition system may recognize further emotions by obtaining new raw user input 1201. If the distance is large, the AI model for emotion recognition is changed 1403 based on the AI model that provides the smallest distance between the recognized emotion and the user annotation. Upon changing the AI model for emotion recognition 1403, the running period resumes from emotion recognition 1401.

    [0087] As shown in FIG. 15, emotion recognition 1401 may be completed using clean user input 1203. Since emotion recognition uses an algorithm that may be a convolutional neural network, continuous wavelet transformation 1501 is used to convert the 1-dimensional (1D) clean user input 1203 into a 2D scalogram 1502 that can be read by a CNN. The scalogram 1502 may be a time-frequency representation of the clean user input 1203, and is processed by a pre-trained CNN 1503 (a CNN that has been trained on data such as that acquired during the test period of the emotion recognition system). Within the pre-trained CNN 1503, a tensor of a predetermined size encapsulates the relevant features of the scalogram necessary for recognizing an emotion. Said tensor may be the input to a second CNN (CNN classifier 1505) that is dedicated to using the image provided by said tensor to output an emotion category 1506, thereby recognizing an emotion using clean user input 1203. The CNN classifier 1505 may be part of the overall pre-trained CNN 1503.
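A numpy-only sketch of this conversion step follows, using Morlet wavelets so that convolving the 1D input at several scales yields a 2D scalogram. The scale set, wavelet length, and w0 parameter are illustrative assumptions; a deployed system might use a dedicated wavelet library instead:

```python
import numpy as np

def morlet(length, scale, w0=5.0):
    """Complex Morlet wavelet sampled at `length` points for a given scale."""
    t = np.arange(length) - (length - 1) / 2.0
    x = t / scale
    return np.exp(1j * w0 * x) * np.exp(-x ** 2 / 2.0) / np.sqrt(scale)

def scalogram(signal, scales, wavelet_len=101):
    """Continuous wavelet transform magnitudes: one row per scale.

    Returns a 2D array of shape (len(scales), len(signal)) -- the
    time-frequency image that would be fed to the CNN.
    """
    rows = [np.abs(np.convolve(signal, morlet(wavelet_len, s), mode="same"))
            for s in scales]
    return np.stack(rows)
```

Each row of the resulting image corresponds to one scale (roughly, one frequency band), and each column to one time sample, matching the time-frequency representation described above.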