Patent classifications
G10L2025/783
METHODS AND SYSTEMS FOR CORRECTING, BASED ON SPEECH, INPUT GENERATED USING AUTOMATIC SPEECH RECOGNITION
Methods and systems for correcting, based on subsequent second speech, an error in an input generated from first speech using automatic speech recognition, without an explicit indication in the second speech that a user intended to correct the input with the second speech, include determining that a time difference between when search results in response to the input were displayed and when the second speech was received is less than a threshold time, and based on the determination, correcting the input based on the second speech. The methods and systems also include determining that a difference in acceleration of a user input device, used to input the first speech and second speech, between when the search results in response to the input were displayed and when the second speech was received is less than a threshold acceleration, and based on the determination, correcting the input based on the second speech.
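The two threshold tests described in this abstract can be sketched as simple predicates. This is a minimal illustration, not the claimed implementation: the `DeviceSnapshot` type, the function names, and the default threshold values are all hypothetical, and the abstract does not state numeric thresholds.

```python
from dataclasses import dataclass


@dataclass
class DeviceSnapshot:
    """State captured at one moment (hypothetical structure)."""
    timestamp_s: float    # seconds on a shared clock
    acceleration: float   # acceleration magnitude of the input device


def within_time_threshold(results_shown, second_speech, max_gap_s=5.0):
    """True if the second speech arrived soon after the results were displayed."""
    gap = second_speech.timestamp_s - results_shown.timestamp_s
    return 0 <= gap < max_gap_s


def within_acceleration_threshold(results_shown, second_speech, max_delta=0.5):
    """True if the device acceleration barely changed between the two moments."""
    delta = abs(second_speech.acceleration - results_shown.acceleration)
    return delta < max_delta


shown = DeviceSnapshot(timestamp_s=10.0, acceleration=0.1)
spoken = DeviceSnapshot(timestamp_s=12.5, acceleration=0.2)

# Either test may justify treating the second speech as a correction.
treat_as_correction = (
    within_time_threshold(shown, spoken)
    or within_acceleration_threshold(shown, spoken)
)
```

The abstract presents the two determinations separately, so combining them with `or` is an assumption made for this example.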
Meeting inclusion and hybrid workplace insights
The disclosure herein describes a system for calculating meeting inclusion metrics, including insights and recommendations. Meeting data associated with one or more meetings attended remotely by at least one participant is converted into anonymized meeting data for inclusivity metric analysis. An inclusivity insights manager generates inclusivity metrics associated with inclusive behavior and language occurring during meetings to measure the level of inclusivity. The inclusivity metrics include attendee participation metrics measuring the amount of participation by each meeting attendee, in-person versus remote participation, and concurrent speech indicating that attendees may be talking over one another or that other interruptions are occurring during meetings. Inclusivity metric data includes insights and actionable recommendations to improve inclusivity at future meetings, provided at an individual level, group level, or organizational level. The inclusivity insights can also include percentage metric values, graphs, feedback, and other metric-related information for improving participation by meeting attendees.
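Two of the metrics named above, per-attendee participation and concurrent speech, can be sketched from speaker-labelled time segments. The segment format `(speaker, start_s, end_s)` and both function names are assumptions for illustration; the abstract does not describe how the metrics are computed.

```python
from collections import defaultdict


def participation_share(segments):
    """Fraction of total speaking time per attendee.

    segments: iterable of (speaker, start_s, end_s) tuples (assumed format).
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {speaker: t / grand_total for speaker, t in totals.items()}


def concurrent_speech_seconds(segments):
    """Total time during which two or more segments are active at once.

    Assumes segments from the same speaker do not overlap each other.
    """
    events = []
    for _, start, end in segments:
        events.append((start, 1))    # a segment begins
        events.append((end, -1))     # a segment ends
    events.sort()                    # at equal times, ends sort before starts

    active, previous_time, overlap = 0, 0.0, 0.0
    for time, delta in events:
        if active >= 2:
            overlap += time - previous_time
        active += delta
        previous_time = time
    return overlap


meeting = [("alice", 0, 10), ("bob", 8, 12)]
shares = participation_share(meeting)
overlap = concurrent_speech_seconds(meeting)
```

The same share calculation could be grouped by an in-person or remote label instead of by speaker to compare the two modes of attendance.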
METHOD AND USER DEVICE FOR PROVIDING CONTEXT AWARENESS SERVICE USING SPEECH RECOGNITION
A method for providing a context awareness service is provided. The method includes defining a control command for the context awareness service depending on a user input, triggering a playback mode and the context awareness service in response to a user selection, receiving external audio through a microphone in the playback mode, determining whether the received audio corresponds to the control command, and executing a particular action assigned to the control command when the received audio corresponds to the control command.
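The define, trigger, match, and execute flow above can be sketched as a small class. This sketch assumes the external audio has already been converted to text by a recognizer, which is outside its scope; the class name, method names, and the example command are hypothetical.

```python
def normalize(text):
    """Lower-case and collapse whitespace so matching is forgiving."""
    return " ".join(text.lower().split())


class ContextAwarenessService:
    """Maps user-defined control commands to actions (illustrative only)."""

    def __init__(self):
        self._actions = {}
        self.active = False

    def define_command(self, phrase, action):
        """Register a control command and the action assigned to it."""
        self._actions[normalize(phrase)] = action

    def start(self):
        """Trigger the service, for example together with playback mode."""
        self.active = True

    def on_recognized_audio(self, recognized_text):
        """Run the assigned action if the recognized text matches a command."""
        if not self.active:
            return None
        action = self._actions.get(normalize(recognized_text))
        return action() if action is not None else None


service = ContextAwarenessService()
service.define_command("Michael", lambda: "pause playback")
service.start()
result = service.on_recognized_audio("  michael ")
```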
SOUND SIGNAL DETECTOR
One example discloses an apparatus for sound signal detection, comprising: a first wireless device including a first pressure sensor having a first acoustical profile and configured to capture a first set of acoustic energy within a time window; wherein the first wireless device includes a wireless signal input; wherein the first wireless device includes a processing element configured to: receive, through the wireless signal input, a second set of acoustic energy captured by a second pressure sensor, having a second acoustical profile, within a second wireless device and within the time window; apply a signal enhancement technique to the first and second sets of acoustic energy based on the first and second acoustical profiles; search for a predefined sound signal within the enhanced sets of acoustic energy; and initiate a subsequent set of sound signal detection actions if the search finds the sound signal.
Methods and apparatus for identifying fraudulent callers
The methods, apparatus, and systems described herein are designed to identify fraudulent callers. A voice print of a call is created and compared to known voice prints to determine if it matches one or more of the known voice prints, and to transaction data associated with a database of voice prints. The methods include a pre-processing step to separate speech from non-speech, selecting a number of elements that affect the voice print the most, and/or computing an adjustment factor based on the scores of each received voice print against known voice prints.
COMMUNICATION APPARATUS MOUNTED WITH SPEECH SPEED CONVERSION DEVICE
In a communication apparatus, an encoder compresses telephone call voice that is transmitted from another communication apparatus. A voice accumulator preserves the telephone call voice, compressed by the encoder, as a message. A decoder expands the telephone call voice preserved in the voice accumulator. A signal memory temporarily holds the telephone call voice expanded by the decoder. A speech speed convertor performs speech speed conversion on the telephone call voice read from the signal memory and outputs the resulting voice from a speaker. A memory monitor temporarily stops the decoder from expanding the telephone call voice when the memory monitor determines that the idle capacity of the signal memory is approaching a predetermined lower limit value.
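The memory monitor's role can be illustrated with a tick-based simulation in which the decoder fills the signal memory faster than slowed-down playback drains it. All names and numbers here (`simulate_playback`, frame counts, rates) are hypothetical; the abstract gives no concrete values.

```python
def simulate_playback(ticks, capacity=8, lower_limit=2,
                      decode_per_tick=2, play_per_tick=1):
    """Simulate a decoder feeding a bounded signal memory.

    Returns (peak_frames_buffered, ticks_decoder_was_paused).
    """
    buffered = 0
    paused_ticks = 0
    peak = 0
    for _ in range(ticks):
        idle_capacity = capacity - buffered
        if idle_capacity <= lower_limit:
            # Memory monitor: idle capacity has reached the lower limit,
            # so the decoder skips expansion for this tick.
            paused_ticks += 1
        else:
            buffered += decode_per_tick
        peak = max(peak, buffered)
        # The speech speed convertor reads frames out for playback.
        buffered -= min(play_per_tick, buffered)
    return peak, paused_ticks


peak, paused = simulate_playback(ticks=20)
```

With these assumed rates the decoder is paused on some ticks, and the buffered amount never exceeds the capacity, because decoding only proceeds while more idle capacity remains than one decode step consumes.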
Hybrid decoding using hardware and software for automatic speech recognition systems
Embodiments describe a method for decoding speech including: receiving speech input at an audio input device; generating speech data that is a digital representation of the speech input; extracting acoustic features of the speech data; assigning acoustic scores to the acoustic features; receiving data representing the acoustic features and the acoustic scores; decoding the data representing the acoustic features into a word, having a word score, by referencing a WFST language model; modifying the word score into a new word score based on a personalized grammar model stored in an external memory device, wherein the processor is separate from and external to the WFST accelerator; and determining an intent represented by a plurality of words outputted by the WFST accelerator, where the plurality of words include the word and the new word score.
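The rescoring step, in which a word score from the decoder is modified using a personalized grammar model, can be sketched as follows. This assumes log-domain scores and an additive per-word boost, neither of which the abstract specifies; `rescore` and `best_word` are hypothetical names, and the WFST decoding itself is not modelled.

```python
def rescore(word, word_score, personal_grammar):
    """Return the new word score after applying a personalized boost.

    personal_grammar maps words to additive log-score boosts (assumed form).
    """
    return word_score + personal_grammar.get(word, 0.0)


def best_word(hypotheses, personal_grammar):
    """Pick the highest-scoring word after rescoring.

    hypotheses: list of (word, word_score) pairs from the decoder.
    """
    rescored = [(w, rescore(w, s, personal_grammar)) for w, s in hypotheses]
    return max(rescored, key=lambda pair: pair[1])


# Decoder slightly prefers "cull", but the user's grammar favours "call".
hypotheses = [("call", -1.0), ("cull", -0.8)]
grammar = {"call": 0.5}
word, new_score = best_word(hypotheses, grammar)
```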
SYSTEMS AND METHODS FOR IMPROVING AUDIO CONFERENCING SERVICES
Systems and methods are disclosed herein for improving audio conferencing services. One aspect relates to processing audio content of a conference. A first audio signal is received from a first conference participant, and a start and an end of a first utterance by the first conference participant are detected from the first audio signal. A second audio signal is received from a second conference participant, and a start and an end of a second utterance by the second conference participant are detected from the second audio signal. The second conference participant is provided with at least a portion of the first utterance, wherein at least one of start time, start point, and duration is determined based at least in part on the start, end, or both, of the second utterance.
SYSTEM AND METHOD FOR ENCOURAGING GROUP DISCUSSION PARTICIPATION
Techniques for encouraging group discussion participation are provided. A conversational analytics system monitors a discussion within a Push to Talk (PTT) radio talkgroup. A topic of discussion within the PTT radio talkgroup is identified. A participation level of each member of the PTT radio talkgroup is identified. It is determined that a member of the PTT radio talkgroup may have information relevant to the topic of the discussion within the PTT radio talkgroup. It is determined that the participation level of the member determined to have information relevant to the topic of discussion within the PTT radio talkgroup is below a threshold. The member determined to have information relevant to the topic of discussion within the PTT radio talkgroup is prompted to participate in the discussion.
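The final selection step, prompting members who are both relevant to the topic and below the participation threshold, can be sketched as a filter. How topic relevance and participation level are measured is not described in the abstract, so both are taken as given inputs here; the function name and the sample data are hypothetical.

```python
def members_to_prompt(participation, relevant_members, threshold):
    """Return relevant talkgroup members whose participation is below threshold.

    participation: dict mapping member -> participation level (assumed numeric).
    relevant_members: members judged likely to hold topic-relevant information.
    """
    return [
        member for member in relevant_members
        if participation.get(member, 0) < threshold
    ]


# Number of transmissions per member in the current discussion (made-up data).
participation = {"unit-1": 12, "unit-2": 1, "unit-3": 0}
relevant = ["unit-2", "unit-1"]
to_prompt = members_to_prompt(participation, relevant, threshold=3)
```

Members missing from the participation record are treated as having a level of zero, which is a design choice made for this sketch.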
MULTILAYERED DETERMINATION OF HEALTH EVENTS USING RESOURCE-CONSTRAINED PLATFORMS
Detecting and identifying a predetermined health event can include detecting a potential occurrence of the predetermined health event for a user by processing in real-time motion signals corresponding to motion of the user. A likelihood that the potential occurrence is an actual occurrence of the predetermined health event can be determined based on template matching of the motion signals. In response to determining that the likelihood exceeds a predetermined threshold, audio signals coinciding in time with the motion of the user can be processed using one or more layers of a multilayered audio event classifier.
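The gating structure, where audio is only processed once motion template matching exceeds a threshold, can be sketched as below. Normalized correlation is used here as one plausible template-matching score; the abstract does not name a specific measure. The function names, the threshold value, and the stand-in audio classifier are all hypothetical.

```python
import math


def template_match_score(signal, template):
    """Normalized correlation between a motion window and a template (-1 to 1)."""
    if len(signal) != len(template) or not signal:
        raise ValueError("signal and template must be non-empty and equal length")
    mean_s = sum(signal) / len(signal)
    mean_t = sum(template) / len(template)
    ds = [x - mean_s for x in signal]
    dt = [y - mean_t for y in template]
    denom = math.sqrt(sum(x * x for x in ds) * sum(y * y for y in dt))
    if denom == 0:
        return 0.0  # a flat signal carries no shape to compare
    return sum(a * b for a, b in zip(ds, dt)) / denom


def detect_event(motion, template, audio, classify_audio, threshold=0.8):
    """Run the costlier audio stage only when the motion score exceeds threshold."""
    if template_match_score(motion, template) <= threshold:
        return None
    return classify_audio(audio)


template = [0, 1, 4, 1, 0]           # made-up motion signature
motion = [0, 2, 8, 2, 0]             # same shape, larger amplitude
label = detect_event(motion, template, "audio-clip", lambda clip: "event")
```

Gating on the cheap motion check first keeps the audio classifier idle most of the time, which suits the resource-constrained platforms named in the title.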