G10L15/25

Sign language information processing method and apparatus, electronic device and readable storage medium

The sign language information processing method and apparatus, electronic device, and readable storage medium provided by the present disclosure collect language data from a user's current communication in real time by obtaining voice information and video information captured by a user terminal; match each speaker with his or her speech content by determining, in the video information, the speaking object corresponding to the voice information; and superimpose an augmented reality (AR) sign language animation corresponding to the voice information on a gesture area associated with the speaking object to obtain a sign language video, so that the user can identify the corresponding speaker when viewing the AR sign language animation in the sign language video. This improves the user experience.
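As a rough illustration of the "determine the speaking object in the video" step, one plausible realization is to score each detected person by lip-movement activity during the voice segment and attach the AR animation to the best match. All names, fields, and the threshold below are invented for this sketch, not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    person_id: str
    gesture_area: tuple   # (x, y, w, h) region where the AR animation would be overlaid
    lip_activity: float   # 0..1 score of lip movement during the voice segment

def attribute_speaker(detections, activity_threshold=0.5):
    """Pick the on-screen person most likely to be the speaker of a voice
    segment, using lip-movement activity as the matching signal."""
    candidates = [d for d in detections if d.lip_activity >= activity_threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.lip_activity)

people = [
    Detection("A", (10, 10, 40, 40), 0.2),
    Detection("B", (60, 10, 40, 40), 0.9),
]
speaker = attribute_speaker(people)
print(speaker.person_id)  # "B"; the AR sign-language animation would then be
                          # anchored to speaker.gesture_area
```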

WEARABLE APPARATUS FOR ACTIVE SUBSTITUTION
20230045237 · 2023-02-09

A hearing aid and related systems and methods. In one implementation, a hearing aid system may comprise a wearable camera configured to capture images from an environment of a user, a microphone configured to capture sounds from the environment of the user, and a processor. The processor may be programmed to receive images captured by the camera; receive audio signals representative of sounds captured by the microphone; operate in a first mode to cause a first selective conditioning of a first audio signal; determine, based on analysis of at least one of the images or the audio signals, to switch to a second mode to cause a second selective conditioning of the first audio signal; and cause transmission of the first audio signal selectively conditioned in the second mode to a hearing interface device configured to provide sound to an ear of the user.
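The mode-switching logic described above can be sketched as a small decision rule driven by image and audio analysis. The specific signals (number of faces speaking, noise level), thresholds, gains, and mode names below are invented for illustration; the abstract does not specify them.

```python
def choose_mode(num_faces_speaking, noise_level, current_mode):
    """Decide whether to switch conditioning modes based on analysis of
    the images and audio signals (thresholds are assumptions)."""
    if num_faces_speaking > 1 or noise_level > 0.7:
        return "directional"   # second mode: isolate one talker
    return "broadband"         # first mode: general amplification

def condition(audio, mode):
    # Stand-in for the two selective-conditioning operations:
    # here, just different gains applied to the same audio signal.
    gain = 1.5 if mode == "broadband" else 2.5
    return [s * gain for s in audio]

mode = choose_mode(num_faces_speaking=2, noise_level=0.3, current_mode="broadband")
out = condition([0.2, -0.4], mode)  # conditioned signal sent to the hearing interface
```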

Systems and Methods for Assisted Translation and Lip Matching for Voice Dubbing
20230039248 · 2023-02-09

Systems and methods for generating candidate translations for use in creating synthetic or human-acted voice dubbings, aiding human translators in generating translations that match the corresponding video, automatically grading how well a candidate translation matches the corresponding video, suggesting modifications to the speed and/or timing of the translated text to improve the grading of a candidate translation, and suggesting modifications to the voice dubbing and/or video to improve the grading of a candidate translation. In that regard, the present technology may be used to fully automate the process of generating lip-matched translations and associated voice dubbings, or as an aid for human-in-the-loop processes that may reduce or eliminate the time and effort required from translators, adapters, voice actors, and/or audio editors to generate voice dubbings.
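A much-simplified version of the "automatically grading how well a candidate translation matches the video" step is to score duration fit: how closely the time needed to speak the candidate matches the on-screen speech duration. The word-count proxy and constant speaking rate below are assumptions for the sketch, far cruder than the lip matching the abstract describes.

```python
def grade_translation(candidate_text, speech_duration_s, words_per_second=5.0):
    """Grade a candidate translation by duration fit alone:
    1.0 means the candidate takes exactly as long as the original speech;
    lower scores penalize candidates that run too long or too short."""
    words = max(1, len(candidate_text.split()))
    needed = words / words_per_second          # estimated speaking time
    ratio = needed / speech_duration_s
    return round(1.0 - abs(1.0 - ratio), 3)

# A four-word candidate against 0.8 s of speech at 5 words/s fits exactly.
print(grade_translation("hello there my friend", 0.8))  # 1.0
```

A grading function like this also suggests a fix directly: if `needed / speech_duration_s` is 1.2, the system could propose speeding the dubbing up by 20% or shortening the text, which mirrors the "suggesting modifications to speed and/or timing" step.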

AUDIO MATCHING METHOD AND RELATED DEVICE

Embodiments of the present application disclose an audio matching method and a related device. The audio matching method includes: obtaining audio data and video data; extracting to-be-recognized audio information from the audio data; extracting lip movement information of N users from the video data, where N is an integer greater than 1; inputting the to-be-recognized audio information and the lip movement information of the N users into a target feature matching model, to obtain a matching degree between each of the lip movement information of the N users and the to-be-recognized audio information; and determining a user corresponding to the lip movement information of the user with the highest matching degree as the target user to which the to-be-recognized audio information belongs.
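The final two steps above amount to scoring each user's lip-movement features against the audio features and taking the argmax. In this sketch, cosine similarity stands in for the target feature matching model (which the abstract leaves unspecified); the embeddings and user names are invented.

```python
def match_audio_to_user(audio_feat, lip_feats):
    """Given an embedding for the to-be-recognized audio and one lip-movement
    embedding per user, return the user with the highest matching degree.
    Cosine similarity is a stand-in for the target feature matching model."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    scores = {user: cosine(audio_feat, f) for user, f in lip_feats.items()}
    return max(scores, key=scores.get), scores

target, scores = match_audio_to_user(
    [1.0, 0.0],
    {"user1": [0.9, 0.1], "user2": [0.0, 1.0]},
)
print(target)  # "user1" — its lip features align with the audio features
```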

Electronic device and method for providing conversational service

A method, performed by an electronic device, of providing a conversational service includes: receiving an utterance input; identifying a temporal expression representing a time in a text obtained from the utterance input; determining a time point related to the utterance input based on the temporal expression; selecting a database corresponding to the determined time point from among a plurality of databases storing information about a conversation history of a user using the conversational service; interpreting the text based on information about the conversation history of the user, the conversation history information being acquired from the selected database; generating a response message to the utterance input based on a result of the interpreting; and outputting the generated response message.
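The "identify temporal expression → determine time point → select database" chain can be sketched with a tiny resolver over a per-period database mapping. The patterns handled, database names, and 7-day cutoff are all invented for illustration; a real system would use a full temporal-expression tagger.

```python
import re
from datetime import date, timedelta

# Hypothetical mapping from time period to conversation-history databases.
DATABASES = {"recent": "history_7d.db", "older": "history_archive.db"}

def resolve_time_point(text, today=date(2023, 2, 9)):
    """Minimal temporal-expression resolver: handles 'yesterday' and
    'N days ago', else falls back to today."""
    if re.search(r"\byesterday\b", text):
        return today - timedelta(days=1)
    m = re.search(r"\b(\d+) days ago\b", text)
    if m:
        return today - timedelta(days=int(m.group(1)))
    return today

def select_database(time_point, today=date(2023, 2, 9)):
    """Pick the history database covering the determined time point."""
    return DATABASES["recent" if (today - time_point).days <= 7 else "older"]

tp = resolve_time_point("what did I order 30 days ago?")
print(select_database(tp))  # history_archive.db
```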

VIDEO IMAGE COMPOSITION METHOD AND ELECTRONIC DEVICE
20230237838 · 2023-07-27

The present disclosure provides a video image composition method including the following steps. A priority level list is obtained, which includes multiple priority levels for multiple person identities. Multiple video streams are received. Identity labels corresponding to the human face frame images in the video streams are determined. Display levels of the human face frame images are determined according to the identity labels and the priority level list. The human face frame images that are in speaking status are detected. According to the display levels, at least one of the human face frame images in speaking status is composed into the main display area of the video image.
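The selection step at the end reduces to: filter face frames to those in speaking status, then pick by display level from the priority list. The data layout below and the convention that a smaller number means higher priority are assumptions of this sketch.

```python
def compose_main_area(face_frames, priority_levels):
    """Among face-frame images detected as speaking, choose the one whose
    identity label has the highest priority (smaller number = higher
    priority, by assumption) to occupy the main display area."""
    speaking = [f for f in face_frames if f["speaking"]]
    if not speaking:
        return None
    return min(speaking, key=lambda f: priority_levels.get(f["identity"], 99))

frames = [
    {"identity": "host", "speaking": False},      # high priority, but silent
    {"identity": "guest", "speaking": True},
    {"identity": "moderator", "speaking": True},
]
levels = {"host": 0, "moderator": 1, "guest": 2}
main = compose_main_area(frames, levels)
print(main["identity"])  # "moderator": highest-priority speaking face
```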