IPIQ

G10L15/083

AUTOMATIC GENERATION OF LECTURES DERIVED FROM GENERIC, EDUCATIONAL OR SCIENTIFIC CONTENTS, FITTING SPECIFIED PARAMETERS

20220406210 · 2022-12-22 ·

A method of generating an educational output unit includes analyzing, using a machine learning module, content based on a logic tree, generating a plurality of blocks, associating tags with each block of the plurality of blocks, and assembling the plurality of blocks into an output unit based on one or more parameters and the tags. The logic tree comprises a structural hierarchy for the content.
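The assembly step in this abstract can be illustrated with a minimal sketch. It assumes each block carries a duration and a set of tags, and that the parameters are a required tag set and a time budget. The `Block` class, `assemble_unit` function, and sample data are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One content block with illustrative metadata (hypothetical fields)."""
    text: str
    minutes: int
    tags: set = field(default_factory=set)


def assemble_unit(blocks, required_tags, max_minutes):
    """Keep blocks, in order, that carry every required tag and fit the time budget."""
    unit, used = [], 0
    for block in blocks:
        if required_tags <= block.tags and used + block.minutes <= max_minutes:
            unit.append(block)
            used += block.minutes
    return unit


blocks = [
    Block("Intro to phonemes", 5, {"speech", "beginner"}),
    Block("HMM decoding", 10, {"speech", "advanced"}),
    Block("Listening exercise", 4, {"speech", "beginner"}),
]
lecture = assemble_unit(blocks, {"speech", "beginner"}, max_minutes=10)
```

Here the greedy, in-order selection is only one possible policy; the abstract does not specify how parameters and tags are weighed.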

ANNOTATION OF MEDIA FILES WITH CONVENIENT PAUSE POINTS

20220399010 · 2022-12-15 ·

A computer-implemented method, a computer system and a computer program product annotate media files with convenient pause points. The method includes acquiring a text file version of an audio narration file. The text file version includes a pause point history of a plurality of prior users. The method also includes generating a list of pause points based on the pause point history. In addition, the method includes determining a tone of voice being used by a speaker at each pause point using natural language processing algorithms. The method further includes determining a set of convenient pause points based on the list of pause points and the determined tone of voice. Lastly, the method includes inserting the determined set of convenient pause points into the audio narration file.
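As a rough sketch of the two filtering steps described above, the following assumes the pause point history is a list of per-user timestamps (in seconds) and that a tone label has already been determined for each candidate by some separate natural language processing step. The function names, the 5-second bucket, and the tone labels are all illustrative assumptions.

```python
from collections import Counter


def candidate_pause_points(pause_history, bucket_seconds=5, min_users=2):
    """Bucket prior users' pause timestamps; keep buckets where enough users paused."""
    counts = Counter()
    for user_pauses in pause_history:
        # Count each user at most once per bucket.
        for bucket in {int(t // bucket_seconds) for t in user_pauses}:
            counts[bucket] += 1
    return sorted(b * bucket_seconds for b, n in counts.items() if n >= min_users)


def convenient_pause_points(candidates, tone_at,
                            calm_tones=frozenset({"neutral", "concluding"})):
    """Keep candidates whose (precomputed, hypothetical) tone label is calm."""
    return [t for t in candidates if tone_at.get(t) in calm_tones]


history = [[12.0, 61.5], [13.2, 95.0], [60.1, 96.4]]
candidates = candidate_pause_points(history)
tones = {10: "neutral", 60: "excited", 95: "concluding"}
points = convenient_pause_points(candidates, tones)
```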

Rescoring Automatic Speech Recognition Hypotheses Using Audio-Visual Matching

20220392439 · 2022-12-08 ·

Google Llc

A method (400) includes receiving audio data (112) corresponding to an utterance (101) spoken by a user (10), receiving video data (114) representing motion of lips of the user while the user was speaking the utterance, and obtaining multiple candidate transcriptions (135) for the utterance based on the audio data. For each candidate transcription of the multiple candidate transcriptions, the method also includes generating a synthesized speech representation (145) of the corresponding candidate transcription and determining an agreement score (155) indicating a likelihood that the synthesized speech representation matches the motion of the lips of the user while the user speaks the utterance. The method also includes selecting one of the multiple candidate transcriptions for the utterance as a speech recognition output (175) based on the agreement scores determined for the multiple candidate transcriptions for the utterance.
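The selection logic can be sketched as follows. This is a toy illustration only: the synthesized speech representation is replaced by one mouth shape per word, and the agreement score is a simple match ratio. Neither stand-in is claimed to be the method of the application, and all names are hypothetical.

```python
BILABIALS = set("bmp")  # sounds made with closed lips


def synthesize_visemes(transcription):
    """Toy stand-in for a synthesized speech representation: one mouth shape per word."""
    return ["closed" if word[0].lower() in BILABIALS else "open"
            for word in transcription.split()]


def agreement_score(visemes, lip_motion):
    """Fraction of positions where predicted and observed mouth shapes agree."""
    if not lip_motion or len(visemes) != len(lip_motion):
        return 0.0
    matches = sum(a == b for a, b in zip(visemes, lip_motion))
    return matches / len(lip_motion)


def rescore(candidates, lip_motion):
    """Return the candidate transcription with the highest agreement score."""
    scored = [(agreement_score(synthesize_visemes(c), lip_motion), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]


lip_motion = ["closed", "open", "closed"]
candidates = ["pack the bag", "tack the bag", "pack the tag"]
best = rescore(candidates, lip_motion)
```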

Online Marketing Campaign Platform

20220391937 · 2022-12-08 ·

Devin Kumar Nath

A computer-implemented method includes the operations of receiving marketing campaign data over a network from an originator computing device to establish a marketing campaign, wherein the marketing campaign data identifies one or more keywords and/or key phrases. The operations may further include causing the marketing campaign data to be accessible over a network to a participant computing device and receiving one or more audio files over the network from the participant computing device, wherein each received audio file includes spoken words of the participant recorded by a microphone on the participant computing device. The operations may additionally include determining, for each received audio file, a number of instances that the participant says the identified one or more keywords and/or key phrases in the received audio file, identifying a reward for the participant based on the determined number of instances, and crediting an account of the participant with the identified reward.
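The counting and crediting steps might look like the sketch below. It assumes the audio file has already been transcribed to text by a separate speech-to-text step, and the per-mention rate, cap, and account structure are invented for illustration.

```python
import re


def count_mentions(transcript, phrases):
    """Count whole-word, case-insensitive occurrences of each keyword or key phrase."""
    total = 0
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += len(re.findall(pattern, transcript, flags=re.IGNORECASE))
    return total


def reward_for(mentions, cents_per_mention=25, cap_cents=200):
    """Hypothetical reward rule: a flat rate per mention, up to a cap."""
    return min(mentions * cents_per_mention, cap_cents)


accounts = {"participant-1": 0}
transcript = "I love Acme Cola. Acme Cola is the best, and acme cola is cheap."
mentions = count_mentions(transcript, ["Acme Cola", "best"])
accounts["participant-1"] += reward_for(mentions)
```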

Neural network accelerator with compact instruct set

11520561 · 2022-12-06 ·

Amazon Technologies, Inc.

Tariq Afzal

Described herein is a neural network accelerator with a set of neural processing units and an instruction set for execution on the neural processing units. The instruction set is a compact instruction set including various compute and data move instructions for implementing a neural network. Among the compute instructions are an instruction for performing a fused operation comprising sequential computations, one of which involves matrix multiplication, and an instruction for performing an elementwise vector operation. The instructions in the instruction set are highly configurable and can handle data elements of variable size. The instructions also implement a synchronization mechanism that allows asynchronous execution of data move and compute operations across different components of the neural network accelerator as well as between multiple instances of the neural network accelerator.
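To make the two compute instructions concrete, here is a plain-Python sketch of what a fused operation and an elementwise vector operation compute. It assumes ReLU as the second computation in the fused sequence, which the abstract does not state, and it says nothing about how the accelerator encodes or executes its instructions.

```python
def fused_matmul_relu(a, b):
    """Multiply a (m x k) by b (k x n) and apply ReLU to each result as it is produced."""
    cols = list(zip(*b))  # columns of b
    return [[max(0, sum(x * y for x, y in zip(row, col))) for col in cols]
            for row in a]


def elementwise_add(u, v):
    """Elementwise vector operation: add matching positions of two vectors."""
    return [x + y for x, y in zip(u, v)]


out = fused_matmul_relu([[1, -2], [3, 4]], [[1, 0], [0, 1]])
vec = elementwise_add([1, 2], [3, 4])
```

Fusing the activation into the multiply avoids writing the intermediate matrix out and reading it back, which is the usual motivation for such an instruction.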

Systems and methods for rapid analysis of call audio data using a stream-processing platform

11522993 · 2022-12-06 ·

Marchex, Inc.

A call analytics system and associated methods can be used to rapidly analyze call data and provide conversational insights. The call analytics system receives audio call data of a phone call between a customer and an agent of a business, and converts the call data into one or more messages for handling by a distributed stream-processing platform. In some embodiments, the stream-processing platform is the Apache Kafka platform. The distributed platform processes the messages and communicates with various software modules to generate a variety of conversational insights. When processed by a stream-processing platform, certain analyses can occur in parallel, which allows conversational insights to be provided to the business shortly (e.g., within seconds) after the call data is received.
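The idea of running independent analyses on one message in parallel can be sketched with the Python standard library alone. This example does not use Apache Kafka; a thread pool stands in for the stream-processing platform, and the message shape and the two analysis functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_booking(message):
    """Hypothetical insight: did the caller mention an appointment?"""
    return "appointment" in message["transcript"].lower()


def word_count(message):
    """Hypothetical insight: number of words in the call transcript."""
    return len(message["transcript"].split())


ANALYSES = {"booked": detect_booking, "words": word_count}


def analyze(message):
    """Submit every analysis for the same message at once, then collect the results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, message) for name, fn in ANALYSES.items()}
        return {name: future.result() for name, future in futures.items()}


message = {"call_id": "c-1", "transcript": "I would like to book an appointment"}
insights = analyze(message)
```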

Interfacing with applications via dynamically updating natural language processing

11514896 · 2022-11-29 ·

Google Llc

Dynamic interfacing with applications is provided. For example, a system receives a first input audio signal. The system processes, via a natural language processing technique, the first input audio signal to identify an application. The system activates the application for execution on the client computing device. The application declares a function the application is configured to perform. The system modifies the natural language processing technique responsive to the function declared by the application. The system receives a second input audio signal. The system processes, via the modified natural language processing technique, the second input audio signal to detect one or more parameters. The system determines that the one or more parameters are compatible for input into an input field of the application. The system generates an action data structure for the application. The system inputs the action data structure into the application, which executes the action data structure.

DECODING NETWORK CONSTRUCTION METHOD, VOICE RECOGNITION METHOD, DEVICE AND APPARATUS, AND STORAGE MEDIUM

20220375459 · 2022-11-24 ·

IFLYTEK CO., LTD.

A method for constructing a decoding network, a speech recognition method, a device, an apparatus, and a storage medium are provided. The method for constructing a decoding network includes: acquiring a general language model, a domain language model, and a general decoding network generated based on the general language model; generating a domain decoding network based on the domain language model and the general language model; and integrating the domain decoding network with the general decoding network to obtain a target decoding network. The speech recognition method includes: decoding to-be-recognized speech data by using a target decoding network to obtain a decoding path for the to-be-recognized speech data; and determining a speech recognition result for the to-be-recognized speech data based on the decoding path for the to-be-recognized speech data.

METHOD AND APPARATUS FOR GENERATING INTERACTION RECORD, AND DEVICE AND MEDIUM

20220375460 · 2022-11-24 ·

A method and apparatus for generating an interaction record, and a device and a medium are provided. The method includes: first, collecting, from a multimedia data stream, behavior data of a user represented by the multimedia data stream, wherein the behavior data includes voice information and/or operation information; and then generating, on the basis of the behavior data, interaction record data corresponding to the behavior data. According to the technical solution, by collecting voice information and/or operation information from a multimedia data stream, and generating interaction record data on the basis of the voice information and/or the operation information, an interacting user can determine interaction information by using the interaction record data, and the interaction efficiency of the interacting user is improved, thereby also improving the user experience.

METHOD AND SYSTEM FOR PROTECTING USER PRIVACY DURING AUDIO CONTENT PROCESSING

20220375458 · 2022-11-24 ·

A method and system for protecting user privacy in audio content is disclosed. Audio content including private information related to at least one user is received. The audio content is segmented to generate a plurality of audio blocks. Each audio block is associated with a sequence number based on a respective chronological position in the audio content. A random key of predefined length is generated for each audio block. The plurality of audio blocks are randomly distributed to a plurality of agents for audio-to-text transcription. The random distribution is configured to scramble a data context for protecting the user privacy of the at least one user during the audio-to-text transcription. A textual transcript corresponding to the audio content is generated based on the audio-to-text transcription, the sequence number and the random key generated for each audio block.
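The segment, scatter, and reassemble flow can be sketched as below. It is a simplified illustration under several assumptions: a list of words stands in for audio samples, "transcription" is just joining those words, and the block layout, key length, and function names are hypothetical.

```python
import random
import secrets


def split_into_blocks(samples, block_size):
    """Cut the content into blocks, each with a sequence number and a random key."""
    blocks = []
    for seq, start in enumerate(range(0, len(samples), block_size)):
        blocks.append({
            "seq": seq,
            "key": secrets.token_hex(8),  # 8 random bytes, hex encoded
            "audio": samples[start:start + block_size],
        })
    return blocks


def distribute(blocks, agents, rng=random):
    """Shuffle the blocks and hand each one to a randomly chosen agent."""
    shuffled = blocks[:]
    rng.shuffle(shuffled)
    assignments = {agent: [] for agent in agents}
    for block in shuffled:
        assignments[rng.choice(agents)].append(block)
    return assignments


def reassemble(results, blocks):
    """Look up each block's text by its key and join the texts in sequence order."""
    ordered = sorted(blocks, key=lambda b: b["seq"])
    return " ".join(results[b["key"]] for b in ordered)


samples = "my card number is four two four two".split()
blocks = split_into_blocks(samples, block_size=2)
assignments = distribute(blocks, ["agent-a", "agent-b", "agent-c"])
# Each agent returns text tagged only with the block's key, not its position.
results = {b["key"]: " ".join(b["audio"])
           for work in assignments.values() for b in work}
transcript = reassemble(results, blocks)
```

Because agents see only isolated blocks identified by opaque keys, no single agent can reconstruct the order or the full context; only the coordinator holding the key-to-sequence mapping can.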
