Patent classifications
G10L21/18
Accent detection method and accent detection device, and non-transitory storage medium
Disclosed are an accent detection method, an accent detection device and a non-transitory storage medium. The accent detection method includes: obtaining audio data of a word; extracting a prosodic feature of the audio data to obtain a prosodic feature vector; generating a spectrogram based on the audio data to obtain a speech spectrum feature matrix; performing a concatenate operation on the prosodic feature vector and the speech spectrum feature matrix to obtain a first feature matrix, and performing a redundancy removal operation on the first feature matrix to obtain a second feature matrix; and classifying the second feature matrix by a classifier to obtain an accent detection result of the audio data.
Information processing apparatus and information processing method for generating and processing a file including speech waveform data and vibration waveform data
Provided is an information processing apparatus including a file generation unit that generates a file including speech waveform data and vibration waveform data. The file generation unit cuts out waveform data in a to-be-synthesized band from first speech data, synthesizes waveform data extracted from a synthesizing band of vibration data with the to-be-synthesized band to generate second speech data, and encodes the second speech data to generate the file.
Transcription summary presentation
A method to present a summary of a transcription may include obtaining, at a first device, audio directed to the first device from a second device during a communication session between the first device and the second device. Additionally, the method may include sending, from the first device, the audio to a transcription system. The method may include obtaining, at the first device, a transcription during the communication session from the transcription system based on the audio. Additionally, the method may include obtaining, at the first device, a summary of the transcription during the communication session. Additionally, the method may include presenting, on a display, both the summary and the transcription simultaneously during the communication session.
METHODS AND SYSTEMS FOR SIGN LANGUAGE INTERPRETATION OF MEDIA STREAM DATA
Techniques are described by which set-top boxes receive closed-captioning data streams as input to a Sign Language Interpretation (SLI) library. Depending on the demographics, different SLIs are provided. Additionally, input audio stems, e.g., for video programs without closed captioning, are sent to a speech-to-text processor before the SLI library. The text stream is then converted into sign language view mode in a PIP window for single view mode or in a multiview window for dual view mode. The current accessibility setup menu holds the ‘SLI’ option on/off button. The SLI library contains vocabulary videos, which are sequenced in the SLI mode view window based on input text from the closed-captioning stream. If a word has no matching video in the SLI library, the word itself is displayed in the SLI window. Such words are reported to a server for possible inclusion in a future package release.
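The lookup-with-fallback behavior described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SliLibrary` class, the clip file names, and the punctuation handling are all hypothetical, and the "report to a server" step is modeled only as a list of collected words.

```python
from dataclasses import dataclass, field


@dataclass
class SliLibrary:
    """Hypothetical stand-in for the SLI library: maps a word to a video clip."""
    clips: dict
    # Words with no matching clip; a real system would report these to a server.
    missing: list = field(default_factory=list)

    def sequence(self, caption_text):
        """Turn caption text into an ordered playlist for the SLI window.

        Each item is ("video", clip_name) when the word has a clip,
        or ("text", word) when the word itself must be displayed.
        """
        playlist = []
        for raw in caption_text.split():
            # Assumed normalization: drop surrounding punctuation, lowercase.
            word = raw.strip(".,!?;:\"'").lower()
            if not word:
                continue
            clip = self.clips.get(word)
            if clip is not None:
                playlist.append(("video", clip))
            else:
                playlist.append(("text", word))
                if word not in self.missing:
                    self.missing.append(word)
        return playlist


library = SliLibrary(clips={"hello": "clip_001.mp4", "world": "clip_002.mp4"})
playlist = library.sequence("Hello, brave world!")
```

Here `playlist` interleaves video items with a text fallback for "brave", and `library.missing` holds the words that would be candidates for a future package release.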
Audio Channel Monitoring By Voice to Keyword Matching With Notification
Systems and methods of monitoring radio channels and automatically providing selective notifications through a network that messages containing useful information, transmitted in the form of voice content, have been received. Keywords are compared with textual data transcribed from voice messages received on a radio channel. The textual data and the keywords are compared, and upon identifying a correlation therebetween, a notification is automatically generated that indicates receipt of a given message, the existence of the correlation with the keywords, and an identity of the channel, so that client terminals can receive the message and also receive subsequent or related messages.
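The keyword comparison and notification step can be sketched roughly as below. The abstract does not define what counts as a "correlation", so this sketch assumes a simple case-insensitive whole-word match; the `Notification` type, the `match_keywords` function, and the channel name are hypothetical, and transcription is assumed to have already happened upstream.

```python
import re
from dataclasses import dataclass


@dataclass
class Notification:
    """Hypothetical notification payload sent to client terminals."""
    channel_id: str        # identity of the radio channel
    matched_keywords: list  # keywords found in the transcribed message
    message_text: str      # the transcribed message itself


def match_keywords(channel_id, transcript, keywords):
    """Return a Notification if any keyword appears in the transcript, else None.

    Assumption: "correlation" means a case-insensitive whole-word match.
    """
    tokens = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    matched = [kw for kw in keywords if kw.lower() in tokens]
    if not matched:
        return None
    return Notification(channel_id, matched, transcript)


note = match_keywords(
    "channel-7",
    "Engine 4 responding to structure fire on Main Street",
    ["fire", "flood"],
)
quiet = match_keywords("channel-7", "Routine radio check", ["fire", "flood"])
```

A message with no matching keyword yields `None`, so no notification is generated for it. A fuzzier notion of correlation (stemming, phrase matching) would replace only the body of `match_keywords`.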
METHODS AND SYSTEMS FOR SPEECH SIGNAL PROCESSING
Methods and systems for speech signal processing of interactive speech are described. Digitized audio data comprising a user query from a user is received over a network in association with a user identifier. A protocol associated with the user identifier is accessed. A personalized interaction model associated with the user identifier is accessed. A response is generated using the personalized interaction model and the protocol. The response is audibly reproduced by a voice assistance device.