METHOD FOR RECOGNIZING MULTIPLE USER ACTIONS ON BASIS OF SOUND INFORMATION
20170371418 · 2017-12-28
CPC classification (Physics): G01H17/00, G06F3/017, G01N29/36, G06F3/167
International classification (Physics): G01N29/36
Abstract
The present invention relates to a method for recognizing multiple user actions and, more particularly, to a method capable of recognizing multiple user actions from a collected sound source when multiple actions are performed in a specific space and of accurately determining a user situation from the recognized multiple user actions.
Claims
1. A method of recognizing multiple user actions, the method comprising: collecting sounds in a place in which a user is located; calculating starting similarities between a starting sound pattern of the collected sounds and reference sound patterns stored in a database and ending similarities between an ending sound pattern of the collected sounds and the reference sound patterns stored in the database; selecting starting candidate reference sound patterns and ending candidate reference sound patterns, the same as the starting sound pattern and the ending sound pattern of the collected sounds, from among the reference sound patterns, based on the starting similarities and the ending similarities; and recognizing multiple user actions based on the starting candidate reference sound patterns, the ending candidate reference sound patterns, and user location information.
2. The method according to claim 1, further comprising: determining increasing zones, increasing by a size equal to or greater than a threshold size, in the collected sounds; and determining the number of multiple actions that produce the collected sounds, based on the number of the increasing zones.
3. The method according to claim 2, wherein selecting the starting candidate reference sound patterns and the ending candidate reference sound patterns comprises: determining exclusive reference sound patterns, not occurring in the place, from among the starting candidate reference sound patterns or the ending candidate reference sound patterns, based on the user location information; and determining final candidate reference sound patterns by removing the exclusive reference sound patterns from the starting candidate reference sound patterns or the ending candidate reference sound patterns, wherein the multiple user actions are recognized based on the final candidate reference sound patterns and the user location information.
4. The method according to claim 3, wherein, when the number of the increasing zones or the decreasing zones is determined to be 2, recognizing the multiple user actions comprises: generating a candidate combination sound by combining a single starting candidate reference sound pattern from among the final candidate reference sound patterns and a single ending candidate reference sound pattern from among the final candidate reference sound patterns; determining a final candidate combination sound, most similar to the collected sounds, by comparing similarities between the candidate combination sound and the collected sounds; and recognizing multiple actions mapped to the starting candidate reference sound pattern and the ending candidate reference sound pattern of the final candidate combination sound as the multiple user actions.
5. The method according to claim 3, wherein, when the number of the increasing zones is determined to be 2, recognizing the multiple user actions comprises: determining whether or not a final candidate reference sound pattern from among the final candidate reference sound patterns of the starting candidate reference sound patterns is the same as a final candidate reference sound pattern from among the final candidate reference sound patterns of the ending candidate reference sound patterns; when the same final candidate reference sound pattern is present, determining the same final candidate reference sound pattern as a first final sound pattern; determining a second final sound pattern by comparing similarities between subtracted sounds, produced by removing the first final sound pattern from the collected sounds, and the reference sound patterns stored in the database; and recognizing actions mapped to the first final sound pattern and the second final sound pattern as the multiple user actions.
6. A method of recognizing multiple user actions, the method comprising: collecting sounds in a place in which a user is located; calculating starting similarities between a starting sound pattern of the collected sounds and reference sound patterns stored in a database and ending similarities between an ending sound pattern of the collected sounds and the reference sound patterns stored in the database; determining starting candidate reference sound patterns, the same as the starting sound pattern, from among the reference sound patterns, based on the starting similarities, and ending candidate reference sound patterns, the same as the ending sound pattern, from among the reference sound patterns, based on the ending similarities; determining whether or not a candidate reference sound pattern from among the starting candidate reference sound patterns is the same as a candidate reference sound pattern from among the ending candidate reference sound patterns; when the same candidate reference sound pattern is present, determining the same candidate reference sound pattern as a first final sound pattern and determining remaining final sound patterns using the first final sound pattern; and recognizing user actions mapped to the first final sound pattern and the remaining final sound patterns as multiple user actions.
7. The method according to claim 6, further comprising: determining increasing zones, increasing by a size equal to or greater than a threshold size, in the collected sounds; and determining the number of multiple actions that produce the collected sounds, based on the number of the increasing zones.
8. The method according to claim 7, wherein, when the number of the increasing zones is determined to be 2, recognizing the multiple user actions comprises: when the same candidate reference sound pattern is present, determining the same candidate reference sound pattern as the first final sound pattern; determining a second final sound pattern by comparing similarities between subtracted sounds, produced by removing the first final sound pattern from the collected sounds, and the reference sound patterns stored in the database; and recognizing actions mapped to the first final sound pattern and the second final sound pattern as the multiple user actions.
9. The method according to claim 7, wherein, when the same candidate reference sound pattern is not present and the number of the increasing zones is determined to be 2, recognizing the multiple user actions comprises: generating candidate combination sounds by combining the starting candidate reference sound patterns and the ending candidate reference sound patterns; determining a final candidate combination sound, most similar to the collected sounds, from among the candidate combination sounds by comparing similarities between the candidate combination sounds and the collected sounds; and recognizing actions mapped to the starting candidate reference sound pattern and the ending candidate reference sound pattern of the final candidate combination sound as the multiple user actions.
10. The method according to claim 8, wherein determining the starting candidate reference sound patterns and the ending candidate reference sound patterns comprises: determining exclusive reference sound patterns, not occurring in the place, from among the candidate reference sound patterns, based on the user location information; and determining final candidate reference sound patterns by removing the exclusive reference sound patterns from the starting candidate reference sound patterns or the ending candidate reference sound patterns.
11. A method of determining a user situation, the method comprising: collecting sounds and user location information in a place in which a user is located; calculating starting similarities between a starting sound pattern of the collected sounds and reference sound patterns stored in a database and ending similarities between an ending sound pattern of the collected sounds and the reference sound patterns stored in the database; selecting starting candidate reference sound patterns and ending candidate reference sound patterns, the same as the starting sound pattern and the ending sound pattern, from among the reference sound patterns, based on the starting similarities and the ending similarities; determining a first final sound pattern and a second final sound pattern, producing the collected sounds, from among the starting candidate reference sound patterns or the ending candidate reference sound patterns, by comparing combined sound patterns, produced from the starting candidate reference sound patterns and the ending candidate reference sound patterns, with the collected sounds; and determining a user situation based on a combination of sound patterns, produced from the first final sound pattern and the second final sound pattern, and the user location information.
12. The method according to claim 11, further comprising: determining increasing zones, increasing by a size equal to or greater than a threshold size, in the collected sounds; and determining the number of multiple actions that produce the collected sounds, based on the number of the increasing zones.
13. The method according to claim 12, wherein selecting the starting candidate reference sound patterns and the ending candidate reference sound patterns comprises: determining exclusive reference sound patterns, not occurring in the place, from among the starting candidate reference sound patterns or the ending candidate reference sound patterns, based on the user location information; and removing the exclusive reference sound patterns from the starting candidate reference sound patterns or the ending candidate reference sound patterns.
14. The method according to claim 13, wherein, when the number of the increasing zones is determined to be 2, determining the user situation comprises: generating candidate combination sounds by combining a single candidate reference sound pattern from among the starting candidate reference sound patterns and a single candidate reference sound pattern from among the ending candidate reference sound patterns; determining a final candidate combination sound, most similar to the collected sounds, from among the candidate combination sounds by comparing similarities between the candidate combination sounds and the collected sounds; and determining the user situation based on the multiple actions corresponding to a combination of the first final sound pattern and the second final sound pattern of the final candidate combination sound.
15. The method according to claim 13, wherein, when the number of the increasing zones is determined to be 2, determining the user situation comprises: determining whether or not a final candidate reference sound pattern from among the starting candidate reference sound patterns is the same as a final candidate reference sound pattern from among the ending candidate reference sound patterns; determining the same final candidate reference sound pattern as a first final sound pattern; determining a second final sound pattern by comparing similarities between subtracted sounds, produced by removing the first final sound pattern from the collected sounds, and the reference sound patterns stored in the database; and determining the user situation based on the multiple actions corresponding to a combination of the first final sound pattern and the second final sound pattern.
16. The method according to claim 9, wherein determining the starting candidate reference sound patterns and the ending candidate reference sound patterns comprises: determining exclusive reference sound patterns, not occurring in the place, from among the candidate reference sound patterns, based on the user location information; and determining final candidate reference sound patterns by removing the exclusive reference sound patterns from the starting candidate reference sound patterns or the ending candidate reference sound patterns.
Description
MODE FOR INVENTION
[0046] Hereinafter, a method of recognizing multiple user actions according to the present disclosure will be described in detail with reference to the accompanying drawings.
[0048] Described in detail with reference to the drawings, an information collector 110 collects sounds, together with user location information, in the place in which the user is located.
[0049] An action number determiner 120 measures the sizes of the collected sounds, determines increasing zones or decreasing zones that increase or decrease by a size equal to or greater than a threshold size, and determines the number of actions producing the collected sounds based on the number of the increasing zones or the decreasing zones. In addition, the action number determiner 120 designates the first increasing zone of the collected sounds as a starting sound pattern (PRE-P) and the last decreasing zone of the collected sounds as an ending sound pattern (POST-P).
[0050] A similarity calculator 130 calculates similarities between the starting sound pattern and the reference sound patterns and between the ending sound pattern and the reference sound patterns by comparing the starting sound pattern and the ending sound pattern with the reference sound patterns stored in a database 140. The similarities may be calculated by comparing sound information, corresponding to at least one of the formant, pitch, and intensity of the starting sound pattern or the ending sound pattern, with sound information, corresponding to at least one of the formant, pitch, and intensity of each of the reference sound patterns.
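By way of illustration, the following Python sketch shows one way the sound information compared by the similarity calculator 130 could be extracted from a frame of collected sound. It is a minimal sketch rather than the disclosed implementation: the pitch estimator is a plain autocorrelation peak, the spectral centroid stands in for a true formant estimate, and all names and parameter values are hypothetical.

```python
import numpy as np

def extract_features(frame: np.ndarray, sr: int = 16000) -> dict:
    """Estimate the three cue types named above: intensity, pitch, formant.
    Each estimator is a deliberately simple stand-in."""
    # Intensity: RMS level of the frame.
    intensity = float(np.sqrt(np.mean(frame ** 2)))
    # Pitch: autocorrelation peak within a plausible 50-400 Hz range.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // 400, sr // 50
    pitch = sr / (lo + int(np.argmax(ac[lo:hi])))
    # "Formant": spectral centroid as a crude proxy for formant location.
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    formant = float(np.sum(freqs * spec) / (np.sum(spec) + 1e-12))
    return {"formant": formant, "pitch": float(pitch), "intensity": intensity}
```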
[0051] A candidate reference sound selector 150 selects reference sound patterns, the same as the starting sound pattern and the ending sound pattern, as candidate reference sound patterns, based on the similarities between the starting sound pattern and the reference sound patterns or between the ending sound pattern and the reference sound patterns. The candidate reference sound patterns, the same as the starting sound pattern, are referred to as starting candidate reference sound patterns, while the candidate reference sound patterns, the same as the ending sound pattern, are referred to as ending candidate reference sound patterns.
[0052] An exclusive reference sound remover 160 determines exclusive reference sound patterns, not occurring in the place in which the user is located, from among the selected candidate reference sound patterns, based on the collected user location information, and determines final candidate reference sound patterns by removing the determined exclusive reference sound patterns from the selected candidate reference sound patterns. For example, the exclusive reference sound remover 160 determines the final candidate reference sound patterns of the starting candidate reference sound patterns by removing the exclusive reference sound patterns from the starting candidate reference sound patterns, and determines the final candidate reference sound patterns of the ending candidate reference sound patterns by removing the exclusive reference sound patterns from the ending candidate reference sound patterns. The database 140 may contain the reference sound patterns, together with user action information and place information mapped to the reference sound patterns. Here, the user action information is information regarding the user actions corresponding to the reference sound patterns, and the place information is information regarding the places in which the reference sound patterns may occur.
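The database records and the exclusive-pattern removal described in this paragraph could be modeled as below. This is a hedged sketch: the record fields, pattern names, and action and place values are illustrative assumptions, not contents of the actual database 140.

```python
from dataclasses import dataclass

@dataclass
class ReferencePattern:
    """One database record: a reference sound pattern together with the
    user action information and place information mapped to it."""
    name: str
    features: dict      # sound information, e.g. output of an extractor
    action: str         # user action mapped to this pattern
    places: frozenset   # places in which this pattern may occur

# Illustrative database contents (all values are assumptions).
DATABASE = [
    ReferencePattern("pattern1", {"intensity": 0.30}, "running water",
                     frozenset({"kitchen", "bathroom", "dining room"})),
    ReferencePattern("pattern7", {"intensity": 0.10}, "turning a page",
                     frozenset({"living room", "library"})),
]

def remove_exclusive(candidates, user_place):
    """Drop exclusive reference sound patterns, i.e. candidates whose place
    information does not include the place in which the user is located."""
    return [p for p in candidates if user_place in p.places]

# With the user located in the dining room, pattern7 is removed as exclusive.
final_candidates = remove_exclusive(DATABASE, "dining room")
```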
[0053] A multiple action recognizer 170 recognizes multiple user actions based on the final candidate reference sound patterns of the starting candidate reference sound patterns and the final candidate reference sound patterns of the ending candidate reference sound patterns.
[0055] An information collector 210, an action number determiner 220, a similarity calculator 230, a database 240, a candidate reference sound selector 250, and an exclusive reference sound remover 260, illustrated in the drawing, perform the same functions as the corresponding components described above, and repeated descriptions thereof are omitted.
[0056] A multiple action recognizer 270 determines, from among the starting candidate reference sound patterns or the ending candidate reference sound patterns, a final starting sound pattern and a final ending sound pattern of which the collected sounds are composed, by comparing combined sound patterns, generated from the starting candidate reference sound patterns and the ending candidate reference sound patterns, with the collected sounds.
[0057] A user situation determiner 280 searches the database 240 for a user situation corresponding to a combination of sound patterns and user location information, based on the combination of sound patterns generated from the final starting sound pattern and the final ending sound pattern and on the collected user location information, and determines the found user situation as the current situation of the user. The database 240 may contain user situations mapped to combinations of sound patterns.
[0059] Described in greater detail with reference to the drawings, a divider 123 measures the sizes of the collected sounds and divides the collected sounds into increasing zones or decreasing zones, increasing or decreasing by a size equal to or greater than a threshold size.
[0060] A determiner 125 determines the number of user actions that produce the collected sounds, based on the number of the increasing zones or the number of the decreasing zones determined by the divider 123.
[0062] Described in greater detail with reference to the drawings, candidate combination sounds are first generated by combining a single final candidate reference sound pattern from among the starting candidate reference sound patterns with a single final candidate reference sound pattern from among the ending candidate reference sound patterns.
[0063] A final candidate combination sound determiner 173 determines the candidate combination sound most similar to the collected sounds, from among the candidate combination sounds, to be a final candidate combination sound, by comparing similarities between the candidate combination sounds and the collected sounds.
[0064] An action recognizer 175 searches the databases 140 and 240 for actions mapped to the starting candidate reference sound pattern and the ending candidate reference sound pattern of the final candidate combination sound and recognizes the found actions as multiple user actions.
[0066] Described in greater detail with reference to the drawings, it is first determined whether or not a candidate reference sound pattern from among the starting candidate reference sound patterns is the same as a candidate reference sound pattern from among the ending candidate reference sound patterns.
[0067] When the same candidate reference sound pattern is present, a first final sound determiner 183 determines the same candidate reference sound pattern to be a first final sound pattern, and a second final sound determiner 185 determines the reference sound pattern having the highest similarity to be a second final sound pattern by comparing similarities between subtracted sounds, produced by subtracting the first final sound pattern from the collected sounds, and the reference sound patterns stored in the database 140 or 240.
[0068] An action recognizer 187 recognizes actions mapped to the first final sound pattern and the second final sound pattern in the database 140 or 240 as multiple user actions.
[0070] Described in greater detail with reference to the drawings, sounds are first collected in the place in which the user is located, and increasing zones or decreasing zones, increasing or decreasing by a size equal to or greater than a threshold size, are determined in the collected sounds.
[0071] In S30, the number of multiple actions producing the collected sounds is determined based on the number of the increasing zones or the decreasing zones. Typically, when the user starts an additional action while already performing another action, the size of the collected sounds suddenly increases, and when the user stops one action while performing multiple actions, the size of the collected sounds suddenly decreases. Based on this fact, the number of multiple actions producing the collected sounds is determined from the number of the increasing zones or the decreasing zones.
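By way of illustration, this counting step could be sketched as below, under the assumption that the size of the collected sounds is measured as a per-frame RMS level and that a fixed jump threshold marks an increasing or a decreasing zone; the frame length and threshold values are hypothetical.

```python
import numpy as np

def count_concurrent_actions(samples: np.ndarray, frame_len: int = 1024,
                             threshold: float = 0.05):
    """Detect increasing and decreasing zones (S30): a jump in per-frame
    level of at least `threshold` marks one action starting, and a
    comparable drop marks one action stopping."""
    n = len(samples) // frame_len
    frames = samples[: n * frame_len].reshape(n, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))        # per-frame size (level)
    delta = np.diff(rms)
    increasing = np.flatnonzero(delta >= threshold)   # increasing zones
    decreasing = np.flatnonzero(delta <= -threshold)  # decreasing zones
    # The number of multiple actions follows from the zone counts.
    return len(increasing), len(decreasing)
```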
[0075] Returning to the flowchart, in S40, starting similarities between the starting sound pattern of the collected sounds and the reference sound patterns stored in the database and ending similarities between the ending sound pattern of the collected sounds and the reference sound patterns are calculated.
[0076] The types of information regarding the reference sound patterns stored in the database are of the same types as the information regarding the collected sounds. Similarities between the collected sounds and the information regarding the reference sound patterns are calculated according to the types of information, such as formant, pitch, and intensity. An example of a method of calculating the similarity S_SI may be represented by Formula 1.
[0077] In Formula 1, SI_i denotes information of type i regarding a reference sound pattern, GI_i denotes information of type i regarding the collected sounds, of the same type as the information regarding the reference sound pattern, and n denotes the number of information types regarding the reference sound patterns or the collected sounds.
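Since Formula 1 itself is not reproduced in this text, the sketch below encodes one plausible reading consistent with the definitions in paragraph [0077]: a per-type similarity between SI_i and GI_i, averaged over the n shared information types. The normalized-difference combination rule is an assumption.

```python
def similarity_s_si(ref_info: dict, col_info: dict) -> float:
    """Combine per-type similarities between reference-pattern information
    SI_i and collected-sound information GI_i over the n shared types
    (formant, pitch, intensity, ...). The combination rule is assumed."""
    types = sorted(set(ref_info) & set(col_info))
    if not types:
        return 0.0
    total = 0.0
    for t in types:
        si, gi = ref_info[t], col_info[t]
        scale = max(abs(si), abs(gi)) or 1.0   # avoid division by zero
        total += 1.0 - abs(si - gi) / scale    # 1 when equal, 0 when disjoint
    return total / len(types)                  # S_SI in [0, 1]
```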
[0078] In S50, starting candidate reference sound patterns and ending candidate reference sound patterns are selected from among the reference sound patterns based on the calculated similarities S_SI. Specifically, reference sound patterns whose similarities to the starting sound pattern are equal to or higher than a threshold similarity are selected as the starting candidate reference sound patterns, and reference sound patterns whose similarities to the ending sound pattern are equal to or higher than a threshold similarity are selected as the ending candidate reference sound patterns. Alternatively, based on the calculated similarities S_SI, up to a threshold number of the reference sound patterns most similar to the starting sound pattern may be selected as the starting candidate reference sound patterns, and up to a threshold number of the reference sound patterns most similar to the ending sound pattern may be selected as the ending candidate reference sound patterns.
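Both selection rules described for S50, the threshold rule and the top-N alternative, could look like the following, reusing similarity_s_si() and the ReferencePattern records from the sketches above; the threshold values are assumptions.

```python
def select_candidates(query_features: dict, database,
                      sim_threshold: float = 0.8, top_n: int = 3):
    """Select candidate reference sound patterns for one sound pattern
    (starting or ending), per S50."""
    scored = [(similarity_s_si(p.features, query_features), p)
              for p in database]
    # Rule 1: keep every pattern at or above the threshold similarity.
    above = [p for s, p in scored if s >= sim_threshold]
    if above:
        return above
    # Rule 2 (alternative): keep the top-N most similar patterns.
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_n]]
```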
[0079] In S60, multiple user actions are recognized from the collected sounds based on the starting candidate reference sound patterns, the ending candidate reference sound patterns, and user location information.
[0081] Described in greater detail with reference to the drawings, candidate reference sound patterns, the same as the starting sound pattern or the ending sound pattern, are first selected from among the reference sound patterns, based on the calculated similarities.
[0082] In S53, reference sound patterns not occurring in the place in which the user is located are determined, from among the starting candidate reference sound patterns or the ending candidate reference sound patterns, to be exclusive reference sound patterns, based on the user location information and the place information of the reference sound patterns stored in the database. For example, suppose that pattern 1, pattern 2, pattern 3, and pattern 7 are selected as the starting candidate reference sound patterns and that the user location information indicates a dining room. In this case, pattern 7 is determined to be an exclusive reference sound pattern, not occurring in the place in which the user is located, since the place information mapped to pattern 7 indicates a living room and a library.
[0083] In S55, final candidate reference sound patterns are determined by removing the exclusive reference sound patterns from the starting candidate reference sound patterns or the ending candidate reference sound patterns.
[0084] Preferably, in the step of recognizing the multiple user actions, the multiple user actions are recognized based on the final candidate reference sound patterns, produced by removing the exclusive reference sound patterns from the candidate reference sound patterns, and the user location information.
[0086] Described in greater detail with reference to the drawings, candidate combination sounds are first generated by combining a single candidate reference sound pattern from among the starting candidate reference sound patterns with a single candidate reference sound pattern from among the ending candidate reference sound patterns.
[0087] In S115, a final candidate combination sound most similar to the collected sounds is determined by comparing similarities between the candidate combination sounds and the collected sounds. Here, the similarities between a candidate combination sound and the collected sounds are calculated by combining the similarities of the pieces of information regarding the candidate combination sound, according to the types of information regarding the collected sounds, as described above with reference to Formula 1.
[0088] In S117, the database is searched for multiple actions mapped to the starting candidate reference sound pattern and the ending candidate reference sound pattern of the final candidate combination sound, and the found actions are recognized as multiple user actions.
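The generate-and-compare loop of paragraphs [0086] through [0088] could be sketched as below, under assumptions the text does not state: candidate patterns and the collected sounds are held as NumPy waveforms, combination is sample-wise addition, and similarity is a negative mean-squared error. All names are hypothetical.

```python
from itertools import product
import numpy as np

def mix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Assumed combination rule: overlapping sounds superpose additively."""
    n = min(len(a), len(b))
    return a[:n] + b[:n]

def waveform_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Assumed score: negative mean-squared error (higher = more similar)."""
    n = min(len(x), len(y))
    return -float(np.mean((x[:n] - y[:n]) ** 2))

def best_combination(starting, ending, collected):
    """Form every (starting, ending) candidate pair, synthesize the candidate
    combination sound, and keep the pair most similar to the collected sounds."""
    return max(product(starting, ending),
               key=lambda ab: waveform_similarity(mix(*ab), collected))
```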
[0090] Described in greater detail with reference to the drawings, it is first determined whether or not a candidate reference sound pattern from among the starting candidate reference sound patterns is the same as a candidate reference sound pattern from among the ending candidate reference sound patterns and, when the same candidate reference sound pattern is present, the same candidate reference sound pattern is determined to be a first final sound pattern.
[0091] In S127, a second final sound pattern is determined by comparing similarities between subtracted sounds, produced by subtracting the first final sound pattern from the collected sounds, and reference sound patterns stored in the database. The similarities between the subtracted sounds and the reference sound patterns may be calculated by combining the similarities of pieces of information regarding the reference sound patterns, according to the types of information regarding the subtracted sounds, as described above with reference to Formula 1.
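S127 could be sketched as follows, again assuming waveform-domain patterns; time-domain subtraction is the simplest reading, though subtraction in a spectral domain would be a common alternative. Names and the scoring rule are assumptions.

```python
import numpy as np

def second_final_pattern(collected: np.ndarray, first_final: np.ndarray,
                         reference_waveforms: dict) -> str:
    """Remove the first final sound pattern from the collected sounds, then
    return the name of the reference pattern most similar to the remainder."""
    n = min(len(collected), len(first_final))
    residual = collected[:n] - first_final[:n]      # the subtracted sound
    def score(name):
        ref = reference_waveforms[name]
        m = min(len(ref), len(residual))
        return -float(np.mean((ref[:m] - residual[:m]) ** 2))
    return max(reference_waveforms, key=score)
```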
[0092] In S129, the database is searched for actions mapped to the first final sound pattern and the second final sound pattern, and the found actions are recognized as multiple user actions.
[0094] First, referring to the drawings, suppose that starting candidate reference sound patterns (a1, a2) and ending candidate reference sound patterns (b1, b2) are selected, and candidate combination sounds are generated by combining each starting candidate reference sound pattern with each ending candidate reference sound pattern.
[0095] The final candidate combination sound (a1, b2) most similar to the collected sounds is determined by comparing similarities between the candidate combination sounds and the combined sound pattern of the collected sounds. Actions mapped to (a1, b2) are recognized as the multiple user actions.
[0096] Referring to the drawings, in another example, it is determined whether or not any one of the starting candidate reference sound patterns is the same as any one of the ending candidate reference sound patterns.
[0097] When the same reference sound pattern (a1) is present, the same reference sound pattern (a1) is determined to be a first final sound pattern. A subtracted sound is generated by subtracting the first final sound pattern from the combined sound pattern of the collected sounds, and the database is searched for the reference sound pattern most similar to the subtracted sound. When the most similar reference sound pattern (b1) is found, it is determined to be a second final sound pattern. Actions mapped to (a1, b1) are recognized as multiple user actions.
[0099] Referring to the drawings, a case in which three user actions overlap in the collected sounds will now be described.
[0100] First, reference sound patterns similar to the starting sound pattern are selected as first candidate reference sound patterns (a1, a2), and reference sound patterns similar to the ending sound pattern are selected as second candidate reference sound patterns (a1, c2). When any one of the second candidate reference sound patterns is the same as any one of the first candidate reference sound patterns, the same candidate reference sound pattern (a1) is determined to be a first final sound.
[0101] Reference sound patterns similar to subtracted sounds, produced by subtracting the first final sound (a1) from the unit increasing zone 2, are selected as third candidate reference sound patterns (b1, b2), while reference sound patterns similar to subtracted sounds, produced by subtracting the first final sound (a1) from the unit increasing zone 4, are selected as fourth candidate reference sound patterns (b1, d2). Since (b1) is common to the third candidate reference sound patterns and the fourth candidate reference sound patterns, (b1) is determined to be a second final sound. A subtracted sound is then produced by subtracting a combined sound, produced by combining the first final sound and the second final sound, from the unit increasing zone 3 corresponding to the combined sound pattern. The similarities between the subtracted sound and the reference sound patterns are calculated, and the reference sound pattern having the highest similarity is selected as a third final sound.
[0102] Actions mapped to the first final sound, the second final sound, and the third final sound in the database are recognized as multiple user actions.
[0103] However, when none of the second candidate reference sound patterns (c1, c2) is the same as any one of the first candidate reference sound patterns, reference sound patterns similar to subtracted sounds, produced by subtracting any one of the first candidate reference sound patterns (a1, a2) from the unit increasing zone 2, are selected as third candidate reference sound patterns (b2, b3). In addition, reference sound patterns similar to subtracted sounds, produced by subtracting any one of the second candidate reference sound patterns (c1, c2) from the unit decreasing zone 4, are selected as fourth candidate reference sound patterns (d1, d2).
[0104] When any one of the third candidate reference sound patterns is the same as any one of the fourth candidate reference sound patterns, the same candidate reference sound pattern is selected as a final sound, as described above. However, when the same candidate reference sound pattern is not present, fifth candidate reference sound patterns (e1, e2) are selected by calculating the similarities between subtracted sounds and the reference sound patterns. Here, the subtracted sounds are produced by subtracting combined sounds, composed of combinations of the first candidate reference sound patterns and the third candidate reference sound patterns, from the unit increasing zone 3.
[0105] A final combined sound having a highest similarity is selected by comparing similarities between final combined sounds, respectively produced by combining one of the first candidate reference sound patterns, one of the third candidate reference sound patterns, and one of the fifth candidate reference sound patterns, and the collected sounds in the unit increasing zone 3. Actions corresponding to the first candidate reference sound pattern, the third candidate reference sound pattern, and the fifth candidate reference sound pattern of the final combined sound are recognized as multiple user actions.
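The three-action selection in this paragraph generalizes the pairwise search sketched earlier to triples. A minimal sketch reusing mix() and waveform_similarity() from the block above, with the same waveform assumptions carried over:

```python
from itertools import product

def best_triple(first_cands, third_cands, fifth_cands, zone3_sound):
    """Score every (first, third, fifth) candidate combination against the
    collected sounds of unit increasing zone 3 and keep the most similar."""
    return max(product(first_cands, third_cands, fifth_cands),
               key=lambda abc: waveform_similarity(
                   mix(mix(abc[0], abc[1]), abc[2]), zone3_sound))
```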
[0107] Described in greater detail with reference to the drawings, sounds and user location information are first collected in the place in which the user is located, starting similarities and ending similarities between the starting sound pattern and the ending sound pattern of the collected sounds and the reference sound patterns stored in the database are calculated, and starting candidate reference sound patterns and ending candidate reference sound patterns are selected based on the calculated similarities.
[0108] In S260, combined sound patterns generated from the starting candidate reference sound patterns and the ending candidate reference sound patterns are compared with the collected sounds, and a first final sound pattern and a second final sound pattern, which together produce the collected sounds, are determined from among the starting candidate reference sound patterns or the ending candidate reference sound patterns.
[0109] In S270, a user situation is determined based on a combination of sound patterns, generated from the first final sound pattern and the second final sound pattern, and the user location information. Combinations of sound patterns, and the user situations mapped to those combinations, may be stored in the database.
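The mapping of S270 reduces to a table lookup keyed by the recognized action combination and the user location. A minimal sketch with illustrative entries; every situation, action, and place value here is an assumption rather than content of the disclosed database.

```python
# Illustrative situation table; keys pair an action combination with a place.
SITUATIONS = {
    (frozenset({"running water", "chopping"}), "kitchen"): "preparing a meal",
    (frozenset({"running water", "brushing"}), "bathroom"): "brushing teeth",
}

def determine_situation(actions, user_place):
    """S270: look up the user situation mapped to the combination of sound
    patterns (here, their mapped actions) and the user location information."""
    return SITUATIONS.get((frozenset(actions), user_place))

print(determine_situation({"running water", "chopping"}, "kitchen"))
# -> preparing a meal
```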
[0110] As described above, a plurality of final sound patterns are determined from the collected sounds, and user actions are mapped to the final sound patterns. Since the situation mapped to the combination of sound patterns consisting of the plurality of final sound patterns is recognized as the user situation, a user situation corresponding to multiple user actions can be accurately determined.
[0111] The above-described embodiments of the present disclosure can be written as computer-executable programs and can be implemented in a general-purpose computer that executes the programs using a computer-readable recording medium.
[0112] Examples of the computer-readable recording medium include a magnetic storage medium (e.g., a floppy disk or a hard disk), an optical recording medium (e.g., a compact disc read-only memory (CD-ROM) or a digital versatile disc (DVD)), and a carrier wave (e.g., transmission through the Internet).
[0113] While the present disclosure has been described with reference to certain exemplary embodiments shown in the drawings, these embodiments are illustrative only, and it will be understood by a person skilled in the art that various modifications and equivalent other embodiments may be derived therefrom. Therefore, the true scope of the present disclosure shall be defined by the appended claims.