Patent classifications
G10L21/18
Methods and systems for sign language interpretation of media stream data
Techniques are described by which set-top boxes receive closed-captioning data streams as input to a Sign Language Interpretation (SLI) library. Different SLIs are provided depending on the demographics. Additionally, input audio streams, e.g., for video programs without closed captioning, are sent to a speech-to-text processor before the SLI library. The text stream is then converted into a sign language view in a PIP window for single-view mode or a multiview window for dual-view mode. The accessibility setup menu includes an on/off button for the ‘SLI’ option. The SLI library contains vocabulary videos that are sequenced in the SLI-mode view window based on input text from the closed-captioning stream. If a word has no matching video in the SLI library, the word itself is displayed in the SLI window; such words are reported to a server for possible inclusion in a future package release.
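The caption-to-sign sequencing step described above might be sketched as follows. This is a hypothetical illustration, not the patented implementation; the `SLI_LIBRARY` mapping and function names are assumptions:

```python
# Assumed word-to-video mapping shipped in an SLI library package.
SLI_LIBRARY = {"hello": "hello.mp4", "world": "world.mp4"}

def sequence_sli(caption_text):
    """Build a playback sequence for the SLI window from caption text.

    Words with a matching video are queued for playback; unmatched words
    are shown as text and collected for reporting to a server, so they
    can be added in a future package release.
    """
    playlist, missing = [], []
    for word in caption_text.lower().split():
        if word in SLI_LIBRARY:
            playlist.append(("video", SLI_LIBRARY[word]))
        else:
            playlist.append(("text", word))   # display the word itself
            missing.append(word)              # report for a future release
    return playlist, missing
```

For example, the caption "hello there world" would queue two sign videos and display the unmatched word "there" as text.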
Method and device for audio signal processing, and storage medium
A method and device for audio signal processing are provided. The method includes steps of: obtaining an inputted audio signal; parsing the audio signal to obtain at least one audio feature; determining at least one vibration feature corresponding to the at least one audio feature; and generating a vibration signal corresponding to the audio signal according to the at least one vibration feature. The inputted audio signal is automatically converted into a vibration signal by the vibration feature corresponding to the audio feature of the inputted audio signal, which avoids errors caused by manual operation and makes the vibration signal highly versatile.
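The parse-then-map steps above could be sketched as below. The choice of per-frame RMS energy as the audio feature and a linear intensity mapping as the vibration feature are illustrative assumptions, not details from the abstract:

```python
import math

def parse_audio_features(samples, frame=4):
    # Audio feature: RMS energy per fixed-size frame (assumed choice).
    return [math.sqrt(sum(s * s for s in samples[i:i + frame]) / frame)
            for i in range(0, len(samples), frame)]

def to_vibration(features, max_intensity=255):
    # Vibration feature: each audio feature mapped linearly to a
    # vibration intensity in [0, max_intensity].
    peak = max(features) or 1.0  # guard against an all-silent signal
    return [round(f / peak * max_intensity) for f in features]
```

A silent frame maps to zero intensity and the loudest frame to full intensity, so the vibration track automatically follows the audio envelope with no manual tuning.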
TRANSCRIPTION SUMMARY PRESENTATION
A method to present a summary of a transcription may include obtaining, at a first device, audio directed to the first device from a second device during a communication session between the first device and the second device. Additionally, the method may include sending, from the first device, the audio to a transcription system. The method may include obtaining, at the first device, a transcription during the communication session from the transcription system based on the audio. Additionally, the method may include obtaining, at the first device, a summary of the transcription during the communication session. Additionally, the method may include presenting, on a display, both the summary and the transcription simultaneously during the communication session.
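A minimal sketch of the summary-and-transcription presentation, under the assumption that the summary is a simple extractive one (the abstract does not specify how the summary is produced):

```python
def summarize(transcription, max_sentences=2):
    # Naive extractive summary: keep the first few sentences (assumed
    # heuristic; the transcription system could supply any summary).
    sentences = [s.strip() for s in transcription.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def present(display, transcription):
    # Show the summary and the full transcription simultaneously
    # on the first device's display during the communication session.
    display["summary"] = summarize(transcription)
    display["transcription"] = transcription
```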
Artificial intelligence based virtual agent trainer
The present disclosure relates to a system, a method, and a product for an artificial intelligence based virtual agent trainer. The system includes a processor in communication with a memory storing instructions. The instructions, when executed, cause the processor to obtain input data and generate a preliminary set of utterances based on the input data, process the preliminary set of utterances to generate a set of utterance training data, generate a set of conversations based on the set of utterance training data, simulate the set of conversations on a virtual agent to obtain a conversation result, verify an intent and a response based on the conversation result, verify a use case flow and flow hops based on the conversation result, and generate recommendation information and a maturity report based on the verification results.
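The simulate-and-verify loop might look like the following sketch. The agent is modeled as a plain callable and the maturity report as a simple accuracy summary; both are illustrative assumptions:

```python
def simulate(agent, conversations):
    # Run each scripted conversation against the virtual agent and
    # record whether the detected intent matches the expected one.
    results = []
    for convo in conversations:
        for utterance, expected_intent in convo:
            intent = agent(utterance)
            results.append({"utterance": utterance,
                            "expected": expected_intent,
                            "actual": intent,
                            "ok": intent == expected_intent})
    return results

def maturity_report(results):
    # Aggregate verification results into a simple maturity summary.
    passed = sum(r["ok"] for r in results)
    return {"accuracy": passed / len(results),
            "failures": [r["utterance"] for r in results if not r["ok"]]}
```

Failed utterances in the report are the natural input for the recommendation step, e.g., suggesting new training utterances for the intents that were misclassified.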
Generating video information representative of audio clips
A service for automatically generating video representations of audio content is provided. A video representation generation component receives a request with search criteria related to processing audio content to generate video representations of the content. The video representation generation component then identifies one or more audio clips or segments from the audio content responsive to the search criteria. The video representation generation component can then generate or obtain video representations of the audio clips without requiring generation of representations of the full audio content. The processing result can be used to publish to social media sites or electronic communications as video content.
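The clip-selection step could be sketched as below. Matching on per-segment transcripts is an assumed interpretation of "responsive to the search criteria", and `renderer` stands in for whatever produces the video representation:

```python
def find_clips(segments, criteria):
    # Select only the audio segments whose transcript matches the
    # search criteria (assumed matching strategy: substring search).
    return [s for s in segments if criteria.lower() in s["transcript"].lower()]

def render_videos(clips, renderer):
    # Render a video representation for each matched clip only,
    # avoiding a representation of the full audio content.
    return [renderer(c) for c in clips]
```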
Accent detection method and accent detection device, and non-transitory storage medium
Disclosed are an accent detection method, an accent detection device and a non-transitory storage medium. The accent detection method includes: obtaining audio data of a word; extracting a prosodic feature of the audio data to obtain a prosodic feature vector; generating a spectrogram based on the audio data to obtain a speech spectrum feature matrix; performing a concatenation operation on the prosodic feature vector and the speech spectrum feature matrix to obtain a first feature matrix, and performing a redundancy removal operation on the first feature matrix to obtain a second feature matrix; and classifying the second feature matrix by a classifier to obtain an accent detection result of the audio data.
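The concatenation and redundancy-removal steps could be sketched as follows. Appending the prosodic vector as an extra row and dropping duplicate rows are toy stand-ins for whatever operations the method actually uses:

```python
def concatenate(prosodic_vec, spectrum_matrix):
    # One assumed way to concatenate: append the prosodic feature vector
    # as an extra row of the speech-spectrum feature matrix, padding or
    # truncating the vector to the matrix width.
    width = len(spectrum_matrix[0])
    row = (prosodic_vec + [0.0] * width)[:width]
    return spectrum_matrix + [row]

def remove_redundancy(matrix):
    # Toy stand-in for redundancy removal: drop duplicate rows while
    # preserving row order.
    seen, out = set(), []
    for row in matrix:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```

The resulting second feature matrix would then be passed to the classifier to obtain the accent detection result.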