Patent classifications
G10L15/005
CONTENT ACCESS DEVICES THAT USE LOCAL AUDIO TRANSLATION FOR CONTENT PRESENTATION
A content access device uses local audio translation for content presentation. The content access device receives video and first audio data associated with a first language. The content access device uses translation software and/or other automated translation services to translate the first audio data to second audio data associated with a second language. The content access device synchronizes the video with the second audio data and outputs the video and the second audio data for presentation. The first audio data may be audio, text, and so on. The second audio data may be output as audio, text, and so on.
VIRTUAL RECEPTIONIST VIA VIDEOCONFERENCING
One disclosed example system includes a reception room meeting device configured for establishing a video conference with a device associated with a remote receptionist. The reception room meeting device sends a request for a video meeting with one of a plurality of candidate remote receptionists in response to receiving an activation signal triggered by a visitor to a reception area, and establishes the video meeting with a device associated with one remote receptionist selected based on the request. The system further includes a virtual receptionist system configured to access visitor data obtained by various input devices at the reception area, and determine the status of the visitor based on the visitor data. The virtual receptionist system further transmits the status of the visitor to the device associated with the selected remote receptionist to facilitate the check-in process.
SYSTEM, METHOD, OR PROGRAM FOR EVALUATING CONTENT SPOKEN AT MEETING OR BRIEFING
A system that evaluates spoken content includes: an acquirer that acquires content spoken by each of a plurality of participants in a meeting or a briefing; an identifier that identifies the speaker of each item of spoken content; and an evaluator that evaluates each item of spoken content. The identifier determines whether each item of spoken content was spoken by a first speaker, or by a second speaker or apparatus that interprets content spoken by the first speaker, and the evaluator evaluates the content spoken by the identified speaker, whether that is the first speaker or the second speaker or apparatus.
Method for receiving emergency information, method for signaling emergency information, and receiver for receiving emergency information
A device may be configured to parse a syntax element specifying the number of available languages within a presentation associated with an audio stream. A device may be configured to parse one or more syntax elements identifying each of the available languages and parse an accessibility syntax element for each language within the presentation.
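The parsing order described above (a count of languages, then one language identifier and one accessibility element per language) can be sketched as follows. The byte layout is purely hypothetical and is not taken from the source or from any broadcast standard: one count byte, followed by a 3-character ASCII language code and one accessibility byte per language. The names `LanguageEntry` and `parse_presentation` are likewise illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LanguageEntry:
    code: str           # 3-letter language code, e.g. "eng" (assumed encoding)
    accessibility: int  # hypothetical 8-bit accessibility field


def parse_presentation(payload: bytes) -> List[LanguageEntry]:
    """Parse a hypothetical layout: [count][code(3) + accessibility(1)] * count."""
    if not payload:
        raise ValueError("empty payload")
    num_languages = payload[0]  # syntax element: number of available languages
    entries = []
    offset = 1
    for _ in range(num_languages):
        chunk = payload[offset:offset + 4]
        if len(chunk) < 4:
            raise ValueError("truncated language entry")
        # Per-language elements: the language identifier, then accessibility.
        entries.append(LanguageEntry(chunk[:3].decode("ascii"), chunk[3]))
        offset += 4
    return entries


example = bytes([2]) + b"eng" + bytes([1]) + b"spa" + bytes([0])
parsed = parse_presentation(example)
```

Reading the count first lets the parser know how many per-language entries to expect before it touches them, which is why a truncated payload can be rejected early.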
System and method for dialog modeling
Disclosed herein are systems, computer-implemented methods, and computer-readable media for dialog modeling. The method includes receiving spoken dialogs annotated to indicate dialog acts and task/subtask information, parsing the spoken dialogs with a hierarchical, parse-based dialog model which operates incrementally from left to right and which only analyzes a preceding dialog context to generate parsed spoken dialogs, and constructing a functional task structure of the parsed spoken dialogs. The method can further either interpret user utterances with the functional task structure of the parsed spoken dialogs or plan system responses to user utterances with the functional task structure of the parsed spoken dialogs. The parse-based dialog model can be a shift-reduce model, a start-complete model, or a connection path model.
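As a rough illustration of the shift-reduce variant, the toy parser below reads annotated utterances from left to right and looks only at what it has already seen. It is a simplified sketch, not the disclosed model: the shift and reduce decisions here follow directly from the annotated subtask labels rather than from a learned model, and the names `SubtaskNode` and `parse_dialog` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubtaskNode:
    label: str
    dialog_acts: List[str] = field(default_factory=list)


def parse_dialog(utterances: List[Tuple[str, str]]) -> List[SubtaskNode]:
    """utterances: (dialog_act, subtask) pairs in spoken order."""
    stack: List[SubtaskNode] = []  # the currently open subtask, if any
    task: List[SubtaskNode] = []   # completed subtasks under the task root
    for dialog_act, subtask in utterances:
        # Reduce: close the open subtask when the subtask label changes.
        if stack and stack[-1].label != subtask:
            task.append(stack.pop())
        # Shift: open a new subtask if none is open, then attach the act.
        if not stack:
            stack.append(SubtaskNode(subtask))
        stack[-1].dialog_acts.append(dialog_act)
    # Final reduce for whatever is still open at the end of the dialog.
    while stack:
        task.append(stack.pop())
    return task


dialog = [
    ("greet", "opening"),
    ("request_item", "order_item"),
    ("confirm", "order_item"),
    ("ask_address", "shipping"),
]
structure = parse_dialog(dialog)
```

Because each step depends only on the stack and the current utterance, the structure can be built while the dialog is still in progress.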
Electronic apparatus and controlling method thereof
An electronic apparatus includes a display, a voice receiver configured to receive a user voice input, and a processor. The processor obtains a first text from the user voice input received through the voice receiver, using a function corresponding to a first voice recognition related to a first language. When the first text obtained in this way does not include an entity name, the processor obtains a second text corresponding to the entity name from the user voice input, using a function corresponding to a second voice recognition related to a second language. The processor then controls the display to display a voice recognition result corresponding to the user voice input based on the first text and the second text.
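The fallback between the two recognizers can be sketched as below. Everything here is a stand-in: the recognizers are stub functions returning fixed strings, the entity check is a lookup in a small hypothetical set, and `recognize_with_fallback` is an illustrative name rather than anything defined in the source.

```python
from typing import Callable, Optional, Tuple

Recognizer = Callable[[bytes], str]
EntityFinder = Callable[[str], Optional[str]]


def recognize_with_fallback(
    audio: bytes,
    first: Recognizer,
    second: Recognizer,
    find_entity: EntityFinder,
) -> Tuple[str, Optional[str]]:
    """Return (first_text, second_text); second_text is None if not needed."""
    first_text = first(audio)
    if find_entity(first_text) is not None:
        # The first-language result already contains an entity name.
        return first_text, None
    # Otherwise run the second-language recognizer on the same input
    # and keep only the entity name it yields.
    return first_text, find_entity(second(audio))


KNOWN_ENTITIES = {"Despacito"}  # hypothetical entity list


def find_entity(text: str) -> Optional[str]:
    for name in KNOWN_ENTITIES:
        if name in text:
            return name
    return None


def first_language_asr(audio: bytes) -> str:
    return "play [unknown]"  # stub: first-language model misses the name


def second_language_asr(audio: bytes) -> str:
    return "Despacito"  # stub: second-language model recovers the name


result = recognize_with_fallback(
    b"\x00\x01", first_language_asr, second_language_asr, find_entity
)
```

A display layer would then combine both parts of `result` into the recognition result shown to the user.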
Language-agnostic multilingual modeling using effective script normalization
A method includes obtaining a plurality of training data sets, each associated with a respective native language and including a plurality of respective training data samples. For each respective training data sample of each training data set in the respective native language, the method includes transliterating the corresponding transcription in the respective native script into corresponding transliterated text representing the respective native language of the corresponding audio in a target script, and associating the corresponding transliterated text in the target script with the corresponding audio in the respective native language to generate a respective normalized training data sample. The method also includes training, using the normalized training data samples, a multilingual end-to-end speech recognition model to predict speech recognition results in the target script for corresponding speech utterances spoken in any of the different native languages associated with the plurality of training data sets.
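The normalization step can be sketched as a pass over every sample that rewrites the transcription into the target script while leaving the audio reference untouched. The character table below is a toy assumption covering only the letters used in the example (a real system would use a full transliteration model), and `Sample`, `transliterate`, and `normalize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Toy Cyrillic and Greek to Latin table; illustrative only.
TOY_TABLE = {
    "д": "d", "а": "a", "н": "n", "е": "e", "т": "t",
    "ν": "n", "α": "a", "ι": "i",
}


@dataclass
class Sample:
    audio_id: str       # reference to audio in the native language
    language: str
    transcription: str  # native script before, target script after


def transliterate(text: str, table: Dict[str, str]) -> str:
    # Characters missing from the table pass through unchanged.
    return "".join(table.get(ch, ch) for ch in text)


def normalize(datasets: Dict[str, List[Sample]]) -> List[Sample]:
    normalized = []
    for language, samples in datasets.items():
        for s in samples:
            # Pair the unchanged audio with the transliterated text.
            normalized.append(
                Sample(s.audio_id, language, transliterate(s.transcription, TOY_TABLE))
            )
    return normalized


datasets = {
    "ru": [Sample("ru-001", "ru", "да"), Sample("ru-002", "ru", "нет")],
    "el": [Sample("el-001", "el", "ναι")],
}
training_samples = normalize(datasets)
```

After this pass every transcription shares one script, so a single model can be trained on the pooled samples with one output vocabulary.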
AUTOMATED ACTIONS IN A CONFERENCING SERVICE
Disclosed are various approaches for performing automated actions in a conferencing service. Distractions can be detected and users can be muted. Breakout rooms can be suggested to attendees based upon the user's identity. Additionally, event summaries and recaps can be generated for users who are late-arriving or who depart and return to the event.
AUDIO CONTENT RECOGNITION METHOD AND APPARATUS, AND DEVICE AND COMPUTER-READABLE MEDIUM
Embodiments of the present disclosure disclose an audio content recognition method and apparatus, an electronic device and a non-transitory computer-readable medium. A specific implementation of the method includes: obtaining a voice fragment collection and a non-voice fragment collection by segmenting audio; determining a type and language information of each voice fragment in the voice fragment collection; and obtaining, for each voice fragment in the voice fragment collection, a first recognition result by performing voice recognition on the voice fragment based on the type and the language information of the voice fragment. In the implementation, speech and music fragments in the audio are recognized by different models, so that both kinds of audio content can be recognized more accurately. Moreover, audio in different languages is recognized by different models, which further improves the voice recognition result.
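The routing idea, one model per combination of fragment type and language, can be sketched with a lookup table. The fragments are assumed to be already segmented and labeled, the models are stub functions, and the names `Fragment` and `recognize_all` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Fragment:
    samples: bytes
    kind: str      # assumed labels: "speech" or "music"
    language: str  # assumed language tag, e.g. "en"


Model = Callable[[bytes], str]


def recognize_all(
    fragments: List[Fragment],
    models: Dict[Tuple[str, str], Model],
) -> List[str]:
    results = []
    for frag in fragments:
        # Select the model that matches both fragment type and language.
        model = models[(frag.kind, frag.language)]
        results.append(model(frag.samples))
    return results


# Stub models standing in for real recognizers.
models = {
    ("speech", "en"): lambda audio: "speech/en result",
    ("music", "en"): lambda audio: "music/en result",
    ("speech", "zh"): lambda audio: "speech/zh result",
}

voice_fragments = [
    Fragment(b"\x01", "speech", "en"),
    Fragment(b"\x02", "music", "en"),
    Fragment(b"\x03", "speech", "zh"),
]
recognition_results = recognize_all(voice_fragments, models)
```

Keying the table on both attributes keeps the dispatch in one place, so supporting another language or fragment type only means registering another entry.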
AUGMENTED REALITY HOLOGRAM VIRTUAL AQUARIUM SYSTEM
An augmented reality hologram virtual aquarium system includes a control unit which includes a virtual fish video control unit for generating and controlling a virtual fish video, a display which displays the virtual fish video generated and transmitted by the virtual fish video control unit, an aquarium management unit which is connected with the control unit and includes a history management unit related to the growth of the virtual fish and an equipment supply unit supplying equipment related to the growth of the virtual fish, and a user input unit which inputs the selection of the virtual fish and the growth activity through the control unit.