Patent classifications
G10L17/00
Payment method, client, electronic device, storage medium, and server
Embodiments of this application disclose a payment method, a client, an electronic device, a storage medium, and a server. The method includes: receiving a payment instruction of a user; generating, according to audio information in a voice input of the user, a voice feature vector of the audio information; performing matching between the voice feature vector and a user feature vector; and when the matching succeeds, sending personal information associated with the user feature vector to a server, so that the server performs a payment operation for a resource account associated with the personal information. The method can make shopping more convenient for consumers.
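The matching step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the voice feature vector is compared with the user feature vector, so cosine similarity, the 0.8 threshold, and the names `cosine_similarity` and `match_user` are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_user(voice_vector, enrolled, threshold=0.8):
    """Return the personal information of the best-matching enrolled user.

    `enrolled` is a hypothetical list of (user_feature_vector, personal_info)
    pairs. Returns None when no score reaches the assumed threshold.
    """
    best_info, best_score = None, threshold
    for user_vector, personal_info in enrolled:
        score = cosine_similarity(voice_vector, user_vector)
        if score >= best_score:
            best_info, best_score = personal_info, score
    return best_info
```

In this sketch, a non-None result is what the client would send to the server; a None result means the matching failed and no payment is triggered.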
TRAINING AND USING A TRANSCRIPT GENERATION MODEL ON A MULTI-SPEAKER AUDIO STREAM
The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained and a set of frame embeddings is generated from audio data frames of the obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols are generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols are transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on the CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
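The sorting of words into transcript lines based on CC symbols might look like the sketch below. It assumes two channels and that each CC symbol switches to the next channel; the abstract does not state either detail, and the token `<cc>` and the function `split_channels` are hypothetical names.

```python
CC = "<cc>"  # hypothetical spelling of the channel change symbol

def split_channels(tokens, num_channels=2):
    """Sort a serialized token stream into one transcript line per channel.

    Assumption: every CC symbol moves to the next channel (wrapping around),
    and every other token is a word belonging to the current channel.
    """
    channels = [[] for _ in range(num_channels)]
    current = 0
    for token in tokens:
        if token == CC:
            current = (current + 1) % num_channels
        else:
            channels[current].append(token)
    return [" ".join(words) for words in channels]

tokens = ["hello", CC, "hi", CC, "how", "are", CC, "good", CC, "you"]
lines = split_channels(tokens)
# lines[0] holds the first speaker's words, lines[1] the overlapping speaker's
```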
SYSTEM AND METHOD FOR REAL-TIME FRAUD DETECTION IN VOICE BIOMETRIC SYSTEMS USING PHONEMES IN FRAUDSTER VOICE PRINTS
A system and method for real-time fraud detection with a social engineering phoneme (SEP) watchlist of phoneme sequences may perform real-time fraud prevention operations including receiving incoming call interactions and grouping the call interactions into one or more clusters, each cluster associated with a speaker's voice based on voiceprints. For a pair of voiceprints in a cluster, a phoneme sequence is extracted for each voiceprint. From the extracted phoneme sequences, a similarity score is then calculated to determine, based on a threshold, whether a match exists between the extracted phoneme sequences. If a match is determined to exist, the phoneme sequence may be added to a SEP watchlist.
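The similarity check and watchlist update can be sketched as follows. The abstract does not name a similarity measure, so this example substitutes the ratio from Python's standard `difflib.SequenceMatcher`; the 0.8 threshold, the choice to store the first sequence of the pair, and the function names are assumptions.

```python
from difflib import SequenceMatcher

def phoneme_similarity(seq_a, seq_b):
    """Similarity in [0, 1] between two phoneme sequences (assumed measure)."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()

def update_watchlist(seq_a, seq_b, watchlist, threshold=0.8):
    """Add seq_a to the watchlist when the two sequences match.

    `watchlist` is a hypothetical set of phoneme tuples. Returns True when
    the similarity score reaches the assumed threshold.
    """
    if phoneme_similarity(seq_a, seq_b) >= threshold:
        watchlist.add(tuple(seq_a))
        return True
    return False
```

Sequences are stored as tuples so that they can be held in a set and looked up quickly during later calls.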
Voice recognition device and method for learning voice data
A voice recognition device and a method for learning voice data using the same are disclosed. The voice recognition device combines feature information for various speakers with a text-to-speech function to generate voice data recognized by a voice recognition unit, and can improve voice recognition efficiency by allowing the voice recognition unit itself to learn various voice data. The voice recognition device can be associated with an artificial intelligence module, a robot, an augmented reality (AR) device, a virtual reality (VR) device, devices related to 5G services, and the like.
Speaker identification
A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled speaker, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.
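The two-stage arrangement can be sketched as a simple cascade. The scoring functions `first_process` and `second_process`, and both thresholds, are hypothetical placeholders; the abstract only says that the second process runs after a positive initial determination and is more discriminative.

```python
def identify_speaker(audio, first_process, second_process,
                     first_threshold=0.5, second_threshold=0.9):
    """Run a two-stage voice biometric check.

    `first_process` and `second_process` are hypothetical callables that
    return a score in [0, 1] for the enrolled speaker. The second process
    is invoked only when the first makes a positive initial determination.
    """
    if first_process(audio) < first_threshold:
        return False  # rejected by the first stage; second stage is skipped
    return second_process(audio) >= second_threshold
```

One plausible reason for such a design is cost: the second process runs only on audio that already passed the first check.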
Auto-completion for gesture-input in assistant systems
In one embodiment, a method includes receiving an initial input in a first modality from a first user from a client system associated with the first user, determining one or more intents corresponding to the initial input by an intent-understanding module, generating one or more candidate continuation-inputs based on the one or more intents, where the one or more candidate continuation-inputs are in one or more candidate modalities, respectively, and wherein the candidate modalities are different from the first modality, and sending instructions for presenting one or more suggested inputs corresponding to one or more of the candidate continuation-inputs to the client system.
Automatic active noise reduction (ANR) control to improve user interaction
A method performed by a wearable audio output device worn by a user is provided for controlling external noise attenuated by the wearable audio output device. Speech is detected from a user wearing the wearable audio output device, wherein the audio output device has active noise reduction turned on. It is determined, based on the detecting, that the user desires to speak to a subject in the vicinity of the user. In response to the determining, a level of noise reduction is reduced to enable the user to hear sounds external to the audio output device. It is determined that the user desires to speak to the subject by detecting at least one condition of a plurality of conditions.
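The control decision described here can be sketched in a few lines. The numeric noise reduction levels, the `reduced_level` value, and the function name are assumptions; the abstract does not list the conditions, so they are passed in as plain booleans.

```python
def next_anr_level(current_level, speech_detected, conditions,
                   reduced_level=0.2):
    """Return the noise reduction level to apply next.

    Levels are hypothetical values in [0, 1], where 1 is full reduction.
    The level is lowered only when the user's speech is detected and at
    least one of the supplied conditions holds.
    """
    if speech_detected and any(conditions):
        return min(current_level, reduced_level)
    return current_level
```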
Wakeword detection
Techniques for processing incoming audio using multiple wakeword detectors are described. Audio data representing an utterance may be processed by different wakeword detectors that can detect different wakewords and are associated with different speech processing components. When a wakeword is detected by one of the wakeword detectors, it may be processed by the corresponding speech processing component.
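The pairing of wakeword detectors with speech processing components can be sketched as a dispatch loop. Here a text string stands in for audio data, and the detector and component callables are hypothetical; a real detector would operate on audio frames.

```python
def route_utterance(utterance, detectors):
    """Send an utterance to the component paired with the first detector
    that fires.

    `detectors` is a hypothetical list of (detect, component) pairs, where
    `detect` returns True when its wakeword is present and `component`
    processes the utterance. Returns None when no wakeword is detected.
    """
    for detect, component in detectors:
        if detect(utterance):
            return component(utterance)
    return None

# Hypothetical wakewords and components, for illustration only.
detectors = [
    (lambda u: u.startswith("computer"), lambda u: ("assistant_a", u)),
    (lambda u: u.startswith("hey car"), lambda u: ("assistant_b", u)),
]
```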
Intelligent Test Cases Generation Based on Voice Conversation
Aspects of the disclosure relate to generating test cases based on voice conversation. In some embodiments, a computing platform may receive voice data associated with an agile development meeting. Subsequently, the computing platform may identify, using a natural language processing engine, context of one or more requirements being discussed during the agile development meeting. Based on identifying the context of the one or more requirements being discussed during the agile development meeting, the computing platform may store context data into a database. Next, the computing platform may map the context data to a corresponding task item of a software development project. Thereafter, the computing platform may identify one or more test cases to be generated. Then, the computing platform may cause the identified test cases to be executed.