Patent classifications
G10L15/00
PROCESSING ACCELERATOR ARCHITECTURES
In various embodiments, this application provides an audio information processing method, an audio information processing apparatus, an electronic device, and a storage medium. An audio information processing method in an embodiment includes: obtaining a first audio feature corresponding to audio information; performing, based on an audio feature at a specified moment in the first audio feature and audio features adjacent to the audio feature at the specified moment, an encoding on the audio feature at the specified moment to obtain a second audio feature corresponding to the audio information; obtaining decoded text information corresponding to the audio information; and obtaining, based on the second audio feature and the decoded text information, text information corresponding to the audio information. According to this method, fewer parameters are used in the process of obtaining the second audio feature and obtaining, based on the second audio feature and the decoded text information, the text information corresponding to the audio information, thereby reducing computational complexity in the audio information processing process and improving audio information processing efficiency.
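The abstract leaves the encoding itself unspecified. As a minimal sketch of the idea of encoding the feature at a specified moment together with its adjacent features, the following uses an element-wise mean over a small window of frames; the function name, the window radius, and the mean itself are illustrative assumptions, not the patent's method:

```python
# Hypothetical sketch: encode the feature at time t using a window of
# adjacent frames. The mean stands in for the unspecified encoding.

def encode_with_context(features, t, radius=1):
    """Encode features[t] by averaging it with its adjacent frames.

    `features` is a list of per-frame feature vectors; `radius` is the
    number of neighboring frames taken into account on each side.
    """
    lo = max(0, t - radius)
    hi = min(len(features), t + radius + 1)
    window = features[lo:hi]
    dim = len(features[t])
    return [sum(vec[d] for vec in window) / len(window) for d in range(dim)]

# A "first audio feature" of three 2-dimensional frames:
first_features = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
# The "second audio feature" at the specified moment t=1:
second_feature = encode_with_context(first_features, t=1)
```

Because the window only touches a handful of adjacent frames, the encoding uses few parameters, which is consistent with the abstract's claim of reduced computational complexity.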
SPEECH RECOGNITION APPARATUS, METHOD AND PROGRAM
A score integration unit 7 obtains a new score Score(l_{1:n}^b, c) that integrates a score Score(l_{1:n}^b, c) and a score Score(w_{1:o}^b, c). This new score Score(l_{1:n}^b, c) becomes a score Score(l_{1:n}^b) in a hypothesis selection unit 8. Thus, the score Score(l_{1:n}^b) can be said to take the score Score(w_{1:o}^b, c) into account. In the speech recognition apparatus, first information is extracted on the basis of the score Score(l_{1:n}^b) that takes the score Score(w_{1:o}^b, c) into account. Thus, speech recognition with higher performance than in the related art can be achieved.
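The abstract does not say how the two scores are combined. A minimal sketch, assuming a simple weighted sum over log-probability-style scores (the weight, the function names, and the example hypotheses are all illustrative assumptions):

```python
# Hypothetical sketch of the score-integration step: combine a
# symbol-sequence score with a word-sequence score and pick the
# best-scoring hypothesis.

def integrate_scores(symbol_score, word_score, weight=0.5):
    # New score = symbol score plus a weighted word-level score,
    # standing in for the patent's (unspecified) integration.
    return symbol_score + weight * word_score

def select_hypothesis(hypotheses):
    # Each hypothesis: (text, symbol_score, word_score).
    best = max(hypotheses, key=lambda h: integrate_scores(h[1], h[2]))
    return best[0]

best_text = select_hypothesis([
    ("recognize speech", -4.0, -2.0),     # integrated: -5.0
    ("wreck a nice beach", -3.5, -5.0),   # integrated: -6.0
])
```

Selecting on the integrated score means the chosen hypothesis takes the word-level score into account, which is the point the abstract makes about the selection in unit 8.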
Generating Computer Augmented Maps from Physical Maps
A method performed by a computing device obtains a digital image of a physical map, identifies features in the digital image, and obtains map augmentation information based on the identified features. The method then generates an augmented map based on the map augmentation information and provides the augmented map for display. Related mobile devices and computer program products are also disclosed.
Task resumption in a natural understanding system
A speech-processing system may provide access to one or more skills via spoken commands and/or responses in the form of synthesized speech. The system may be capable of keeping one or more skills active in the background while a user interacts with a skill running in the foreground (e.g., provides inputs to and/or receives outputs from it). A background skill may receive some trigger data, and determine to request the system to return the background skill to the foreground to, for example, request a user input regarding an action previously requested by the user. In some cases, the user may invoke a background skill to continue a previous interaction. The system may return the background skill to the foreground. The resumed skill may continue a previous interaction to, for example, query the user for instructions, provide an update or alert, or continue a previous output.
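As a minimal sketch of the foreground/background skill handling described above, the following tracks one foreground skill and a list of background skills, with a resume operation that returns a background skill to the foreground; the class and method names are illustrative assumptions, not from the patent:

```python
# Hypothetical sketch of foreground/background skill tracking.

class SkillManager:
    def __init__(self):
        self.foreground = None   # skill the user is interacting with
        self.background = []     # skills kept active in the background

    def start(self, skill):
        # Starting a new skill pushes the current one to the background.
        if self.foreground is not None:
            self.background.append(self.foreground)
        self.foreground = skill

    def resume(self, skill):
        # Return a background skill to the foreground, e.g. after a
        # trigger or an explicit user invocation.
        if skill in self.background:
            self.background.remove(skill)
            self.start(skill)

mgr = SkillManager()
mgr.start("timer")     # timer runs in the foreground
mgr.start("music")     # music takes over; timer moves to the background
mgr.resume("timer")    # timer is returned to the foreground
```

After the resume, the previously foregrounded skill is itself kept active in the background rather than terminated, matching the interaction pattern the abstract describes.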
Electronic device configured to perform action using speech recognition function and method for providing notification related to action using same
A method includes receiving a designated event related to a second application while an execution screen of a first application is displayed on a display. The method also includes executing an artificial intelligent application in response to the designated event. The method further includes transmitting data related to the designated event to an external server, based on the executed artificial intelligent application. Additionally, the method includes sensing a user utterance related to the designated event for a designated period of time. The method also includes transmitting the user utterance to the external server. The method further includes receiving an action order for performing a function related to the user utterance from the external server. The method also includes executing the second application at least based on the received action order. The method further includes outputting a result of performing the function by using the second application.
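The sequence of steps above can be sketched as a single handler; the server interaction is stubbed out, and every name here (the handler, the fake server, the action order) is an illustrative assumption rather than the patent's interface:

```python
# Hypothetical sketch of the notification flow: event -> assistant ->
# server -> utterance -> action order -> second application.

def handle_designated_event(event, server):
    steps = []
    steps.append("launch_assistant")        # AI app runs in response to the event
    server.send(event)                      # event data goes to the external server
    utterance = "reply to the message"      # user utterance sensed within the window
    server.send(utterance)                  # utterance goes to the server too
    action_order = server.receive()         # server returns an action order
    steps.append(f"execute_second_app:{action_order}")
    return steps

class FakeServer:
    """Stand-in for the external server in the abstract."""
    def send(self, data):
        pass
    def receive(self):
        return "open_chat"

result = handle_designated_event("new_message", FakeServer())
```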
Keyword determinations from conversational data
Topics of potential interest to a user, useful for purposes such as targeted advertising and product recommendations, can be extracted from voice content produced by a user. A computing device can capture voice content, such as when a user speaks into or near the device. One or more sniffer algorithms or processes can attempt to identify trigger words in the voice content, which can indicate a level of interest of the user. For each identified potential trigger word, the device can capture adjacent audio that can be analyzed, on the device or remotely, to attempt to determine one or more keywords associated with that trigger word. The identified keywords can be stored and/or transmitted to an appropriate location accessible to entities such as advertisers or content providers who can use the keywords to attempt to select or customize content that is likely relevant to the user.
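Working over a transcript rather than raw audio, the sniffer step above can be sketched as follows; the trigger list, window size, and function name are illustrative assumptions, not the patent's algorithm:

```python
# Hypothetical sketch of the sniffer: scan transcribed voice content
# for trigger words and capture adjacent words as keyword candidates.

TRIGGERS = {"love", "want", "need"}  # words suggesting user interest

def sniff_keywords(transcript, radius=2):
    words = transcript.lower().split()
    keywords = []
    for i, word in enumerate(words):
        if word in TRIGGERS:
            # Capture the words adjacent to the identified trigger,
            # standing in for the "adjacent audio" in the abstract.
            lo, hi = max(0, i - radius), min(len(words), i + radius + 1)
            keywords.extend(w for j, w in enumerate(words[lo:hi], lo)
                            if j != i)
    return keywords

found = sniff_keywords("i really want a new camera")
```

In the patent's setting, the analysis of the adjacent content could happen on the device or remotely; this sketch keeps everything local for simplicity.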
Device and method for activating with voice input
An information processing apparatus detects a voice command via a microphone in order to activate the device and execute certain applications. The apparatus comprises a digital signal processor (DSP) and a host controller, which are responsible for processing the voice commands. The DSP recognizes and processes voice commands intermittently while the host processor is in a sleep state, thereby reducing the overall power consumption of the apparatus. Further, when the DSP is configured to recognize only voice commands intended to activate the device, a memory with a much lower storage capacity suffices.
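The two-stage design can be sketched as follows: a small always-on matcher (standing in for the DSP) knows only the activation phrase, so the host stays asleep until it fires. The wake phrase, class, and method names are illustrative assumptions:

```python
# Hypothetical sketch of the DSP/host split: the small matcher handles
# only the activation command while the host processor sleeps.

WAKE_COMMAND = "wake up"  # the only phrase the small matcher knows,
                          # so a small memory suffices

class Device:
    def __init__(self):
        self.host_awake = False

    def dsp_listen(self, utterance):
        # DSP-side check, run intermittently while the host sleeps.
        if utterance.strip().lower() == WAKE_COMMAND:
            self.host_awake = True  # hand off to the host processor
        return self.host_awake

dev = Device()
dev.dsp_listen("play music")   # ignored; host remains asleep
dev.dsp_listen("wake up")      # matches; device activates
```

Everything other than the wake phrase is discarded at the DSP stage, which is what keeps both the power draw and the required memory small.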