Patent classifications
G10L17/02
DISPLAY APPARATUS AND PROCESSING METHOD FOR DISPLAY APPARATUS WITH CAMERA
Disclosed are a display apparatus and a processing method for the display apparatus with a camera. The display apparatus includes a camera, a sound collector, and a controller. The controller is configured for: starting to shoot at least one image through the camera; in response to the at least one image not including a portrait of a user, starting to obtain a first test audio signal input from the user through the sound collector; in response to the first test audio signal, determining a target azimuth corresponding to the user; generating a rotation instruction for the camera according to the target azimuth of the user; and sending the rotation instruction to the camera to adjust a shooting direction of the camera to the target azimuth.
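The abstract does not say how the target azimuth is derived from the audio signal. As a minimal sketch only, the following assumes a two-microphone sound collector and a far-field time-difference-of-arrival (TDOA) estimate, then turns the azimuth into a rotation instruction. The function names, the dictionary layout of the instruction, the speed-of-sound constant, and the ±60° mechanical range are all hypothetical and are not taken from the patent.

```python
import math

# Assumed speed of sound in dry air at roughly 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def azimuth_from_tdoa(tdoa_s, mic_spacing_m):
    """Far-field azimuth in degrees, measured from the array broadside.

    tdoa_s is the arrival-time difference between the two microphones.
    """
    ratio = SPEED_OF_SOUND_M_S * tdoa_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


def rotation_instruction(current_deg, target_deg, min_deg=-60.0, max_deg=60.0):
    """Build a hypothetical rotation instruction, clamped to the camera's range."""
    target = max(min_deg, min(max_deg, target_deg))
    return {
        "command": "rotate",
        "target_deg": target,
        "delta_deg": target - current_deg,
    }


if __name__ == "__main__":
    azimuth = azimuth_from_tdoa(tdoa_s=0.0001, mic_spacing_m=0.1)
    print(rotation_instruction(current_deg=0.0, target_deg=azimuth))
```

Clamping the target before computing the delta keeps the instruction within what the camera can physically reach, even when the user stands outside that range.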
Machine learning for improving quality of voice biometrics
Methods and systems are disclosed herein for improving the quality of audio for use in a biometric. A biometric system may use machine learning to determine whether audio or a portion of the audio should be used as a biometric for a user. A sample of the user's voice may be used to generate a voice signature of the user. Portions of the audio that do not meet a similarity threshold when compared with the voice signature may be removed from the audio. Additionally or alternatively, interfering noises may be detected and removed from the audio to improve the quality of a voice biometric generated from the audio.
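The similarity-threshold step can be illustrated with a small sketch. It assumes each portion of the audio has already been converted to an embedding vector and that similarity is measured with cosine similarity; the abstract specifies neither, and the function names and the 0.8 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def keep_matching_segments(segment_embeddings, signature, threshold=0.8):
    """Return indices of segments whose similarity to the signature meets the threshold."""
    return [
        index
        for index, embedding in enumerate(segment_embeddings)
        if cosine_similarity(embedding, signature) >= threshold
    ]


if __name__ == "__main__":
    signature = [1.0, 0.0]
    segments = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(keep_matching_segments(segments, signature))
```

Segments whose indices are not returned would be the portions removed from the audio before the voice biometric is generated.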
Communication method between different electronic devices, server and electronic device supporting same
Disclosed is a server for supporting a communication environment between different electronic devices. The server includes a communication circuit, a memory, and a processor. The processor is electrically connected to the communication circuit and the memory. The processor is configured to receive a first voice signal transmitted from a second electronic device to a first electronic device through the communication circuit. The processor is also configured to allow the first electronic device to transmit network connection information for connecting with the server to the second electronic device based on whether the first voice signal corresponds to a second voice signal stored in the memory.
PAYMENT METHOD, CLIENT, ELECTRONIC DEVICE, STORAGE MEDIUM, AND SERVER
Embodiments of this application disclose a payment method, a client, an electronic device, a storage medium, and a server. The method includes: receiving a payment instruction of a user; generating, according to audio information in a voice input of the user, a voice feature vector of the audio information; performing matching between the voice feature vector and a user feature vector; and when the matching succeeds, sending personal information associated with the user feature vector to a server, so that the server performs a payment operation for a resource account associated with the personal information. The method can make shopping more convenient for a consumer.
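The matching step between the voice feature vector and enrolled user feature vectors might look like the sketch below. It assumes matching means finding the nearest enrolled vector by Euclidean distance and accepting it only within a distance limit; the abstract does not define the matching criterion, and `match_user`, the `enrolled` mapping, and the 0.5 limit are hypothetical.

```python
import math


def match_user(voice_vector, enrolled, max_distance=0.5):
    """Return the id of the closest enrolled user, or None if no one is close enough.

    enrolled maps a user id to that user's stored feature vector.
    """
    best_id = None
    best_distance = math.inf
    for user_id, user_vector in enrolled.items():
        distance = math.dist(voice_vector, user_vector)
        if distance < best_distance:
            best_id, best_distance = user_id, distance
    return best_id if best_distance <= max_distance else None


if __name__ == "__main__":
    enrolled = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
    print(match_user([0.1, 0.9], enrolled))
    print(match_user([5.0, 5.0], enrolled))
```

Only when a user id is returned would the client send the associated personal information to the server; a `None` result corresponds to a failed match, in which case nothing is sent.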
SPEAKER RECOGNITION METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM
The present disclosure provides a speaker recognition method, an electronic device, and a storage medium. An implementation includes: segmenting a target audio file and a to-be-recognized audio file into a plurality of audio units respectively; extracting an audio feature from each of the audio units to obtain an audio feature sequence of the target audio file and an audio feature sequence of the to-be-recognized audio file; performing feature learning on the audio feature sequence of the target audio file and the audio feature sequence of the to-be-recognized audio file by using a Siamese neural network, to obtain a feature vector corresponding to the target audio file and feature vectors respectively corresponding to the plurality of audio units in the to-be-recognized audio file; and recognizing, by using an attention mechanism-based machine learning model, the audio units belonging to a target speaker in the to-be-recognized audio file based on the feature vectors.
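The first step, segmenting an audio file into audio units, can be sketched as fixed-length windowing over a sequence of samples. The abstract does not state how units are delimited, so the fixed unit length, the optional overlap via `hop`, and the choice to drop a trailing partial unit are assumptions made only for illustration; `segment_audio` is a hypothetical name.

```python
def segment_audio(samples, unit_len, hop=None):
    """Split samples into fixed-length audio units.

    hop is the step between unit starts; it defaults to unit_len (no overlap).
    A trailing partial unit is dropped.
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    if hop is None:
        hop = unit_len
    if hop <= 0:
        raise ValueError("hop must be positive")
    last_start = len(samples) - unit_len
    return [samples[start:start + unit_len] for start in range(0, last_start + 1, hop)]


if __name__ == "__main__":
    samples = list(range(10))
    print(segment_audio(samples, unit_len=4))         # non-overlapping units
    print(segment_audio(samples, unit_len=4, hop=2))  # 50% overlap
```

Each returned unit would then go through feature extraction, giving one feature per unit and so an audio feature sequence per file.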
SYSTEMS AND METHODS TO ANALYZE AUDIO DATA TO IDENTIFY DIFFERENT SPEAKERS
A computing system may receive data representing dialog between persons, the data representing words spoken by at least first and second speakers; determine an intent of a speaker for a first portion of the data, the intent being indicative of an identity of the first or second speaker for the first portion of the data or for another portion of the data different from the first portion; determine a name of the first or second speaker represented in the first portion of the data based at least in part on the determined intent; and output an indication of the determined name so that the indication identifies the first portion of the data or the other portion of the data with the first or second speaker.