H04N21/2335

SIMULTANEOUS RECORDING AND UPLOADING OF MULTIPLE AUDIO FILES OF THE SAME CONVERSATION AND AUDIO DRIFT NORMALIZATION SYSTEMS AND METHODS
20230156298 · 2023-05-18 ·

The invention relates to simultaneous recording and uploading systems and methods and, more particularly, to the simultaneous recording and uploading of multiple files from the same conversation.

Remotely generated encoding metadata for local content encoding

Embodiments are directed towards remotely generating encoding metadata at a remote content distributor for use by a local user computing device. The remote content distributor receives and encodes content. During or after the encoding process, the remote content distributor generates encoding metadata that indicates how the content was encoded by the remote content distributor. The remote content distributor provides the encoding metadata to the user computing device. The user computing device receives the content and the encoding metadata and encodes the content based on the encoding metadata. The user computing device can then provide the encoded content to another computing device for decoding and presentation to a user.

METHOD AND APPARATUS FOR EFFICIENT DELIVERY AND USAGE OF AUDIO MESSAGES FOR HIGH QUALITY OF EXPERIENCE
20230370684 · 2023-11-16 ·

A method and a system for a virtual reality, augmented reality, mixed reality, or 360-degree video environment are disclosed. The system receives Video Streams and Audio Streams associated with the audio and video scenes to be reproduced. A Video decoder decodes a signal from the Video Stream for the representation of the audio and video scene; an Audio decoder decodes a signal from the Audio Stream for the representation of the audio and video scene to the user; and a region of interest processor decides, based e.g. on the user’s viewport, head orientation, movement data, or metadata, whether an Audio information message is to be reproduced. If so, reproduction of the Audio information message is triggered.
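
The decision made by the region of interest processor can be illustrated with a minimal sketch. The abstract does not state the actual decision rule, so the rule used here (reproduce the message when the region of interest lies outside the user's horizontal viewport) is an assumption, as are the function names and the use of yaw angles in degrees.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Signed smallest difference a - b in degrees, wrapped to [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def should_play_message(head_yaw_deg: float, roi_yaw_deg: float, fov_deg: float) -> bool:
    """Hypothetical rule: play the audio information message when the
    region of interest falls outside the horizontal field of view."""
    offset = abs(angular_difference(roi_yaw_deg, head_yaw_deg))
    return offset > fov_deg / 2.0


if __name__ == "__main__":
    # ROI almost behind the user: message is played under the assumed rule.
    print(should_play_message(0.0, 170.0, 90.0))
    # ROI 20 degrees away across the 0/360 wrap: inside the viewport.
    print(should_play_message(350.0, 10.0, 90.0))
```

A real system could also weigh movement data or stream metadata, which the abstract names as further inputs.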

CHEERING SUPPORT METHOD, CHEERING SUPPORT APPARATUS, AND PROGRAM

Users communicate reactions to the distribution source without inputting character strings, and the parties to an event perceive the users' reactions naturally, without the progress of the event being disturbed. A cheering support device (2) reproduces, in a space of a distribution source, cheering composed of a video signal and an acoustic signal based on sounds generated by viewers in a plurality of spaces different from the space of the distribution source. A voice acquisition unit (22) acquires a voice material from a voice material storage unit (24). A video acquisition unit (25) acquires a video material from a video material storage unit (27) according to the voice material acquired by the voice acquisition unit (22). A voice reproduction unit (23) emphasizes and reproduces the voice material corresponding to the sound type that many viewers have uttered. A video reproduction unit (26) reproduces the video material acquired by the video acquisition unit (25).
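
One way to picture the emphasis step is the sketch below. It assumes each viewer's sound has already been classified into a sound type label, and it uses a playback gain proportional to the share of viewers who produced that type. Both the labels and the gain rule are illustrative assumptions, not details from the abstract.

```python
from collections import Counter


def gains_by_sound_type(viewer_sound_types: list[str]) -> dict[str, float]:
    """Map each sound type to a gain equal to its share of all viewer sounds.

    Hypothetical emphasis rule: the more viewers uttered a type, the
    louder the matching voice material is reproduced.
    """
    counts = Counter(viewer_sound_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sound_type: n / total for sound_type, n in counts.items()}


if __name__ == "__main__":
    print(gains_by_sound_type(["cheer", "clap", "cheer", "cheer"]))
```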

FRAGMENT-ALIGNED AUDIO CODING

Audio-video synchronization and alignment, or alignment of audio to some other external clock, is rendered more effective or easier by treating the fragment grid and the frame grid as independent values while nevertheless aligning, for each fragment, the frame grid to the respective fragment's beginning. The loss in compression effectiveness may be kept low by appropriately selecting the fragment size. On the other hand, the alignment of the frame grid with respect to the fragments' beginnings allows for an easy and fragment-synchronized way of handling the fragments in connection with, for example, parallel audio video streaming, bitrate adaptive streaming or the like.
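
The relationship between the two grids can be sketched with simple arithmetic. The example assumes that fragment length and frame length are both given in samples and that the frame grid restarts at every fragment boundary, so the last frame of a fragment may extend past the fragment's end. The function names and the use of that overshoot as a rough proxy for lost compression effectiveness are assumptions made for illustration.

```python
def frames_for_fragment(fragment_samples: int, frame_samples: int) -> tuple[int, int]:
    """Return (frame_count, overshoot_samples) for one fragment.

    The frame grid restarts at the fragment's beginning, so the final
    frame may extend beyond the fragment by `overshoot_samples`.
    """
    frame_count = -(-fragment_samples // frame_samples)  # integer ceiling
    overshoot = frame_count * frame_samples - fragment_samples
    return frame_count, overshoot


def frame_starts(fragment_index: int, fragment_samples: int, frame_samples: int) -> list[int]:
    """Absolute start positions (in samples) of the frames in one fragment."""
    base = fragment_index * fragment_samples  # fragment grid position
    count, _ = frames_for_fragment(fragment_samples, frame_samples)
    return [base + i * frame_samples for i in range(count)]


if __name__ == "__main__":
    # Assumed values: 2-second fragments at 48 kHz, 1024-sample frames.
    count, overshoot = frames_for_fragment(96_000, 1024)
    print(count, overshoot, overshoot / 96_000)
    print(frame_starts(1, 96_000, 1024)[:3])
```

With these assumed values the overshoot is a small fraction of the fragment, which is in line with the abstract's remark that a suitable fragment size keeps the loss low.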

Audio customization in streaming environment

A live stream of a presenter, which includes a video stream and an audio stream, is monitored. The live stream is attended by an audience that includes one or more audience members. One or more stream content features of the live stream at a first window of time are transmitted to a multimodal machine learning model. One or more audience content features of the audience at the first window of time are transferred to the multimodal model. One or more feature results for the first window of time, based on the stream content features and on the audience content features, are obtained from the multimodal model. The feature results are sent to an auditory machine learning model. A first audio signal is received from the auditory machine learning model. An augmented stream of the first window of time is generated based on the first audio signal.

Simultaneous recording and uploading of multiple audio files of the same conversation and audio drift normalization systems and methods
11540030 · 2022-12-27 ·

The invention relates to simultaneous recording and uploading systems and methods and, more particularly, to the simultaneous recording and uploading of multiple files from the same conversation.

Method and system for implementing an elastic cloud-based voice search utilized by set-top box (STB) clients

Systems and methods are described to provide voice search in an elastic cloud environment communicating with a set-top box (STB) by: receiving, by a voice cloud search server, pulse-code modulation (PCM) audio packets transmitted from the STB; sending the PCM audio packets to a natural language processing (NLP) service for conversion to text; sending the text sets to an elastic voice cloud search server for querying an electronic program guide (EPG) service for channel and program data associated with the text sets, wherein the EPG service at least returns identified channel and program data; and, in response to an identified return of channel and television program data, sending sets of text to a search service for performing an elastic search for related data from a plurality of different search sources and returning search results and error codes to a requester.

DYNAMIC INSERTION OF SUPPLEMENTAL AUDIO CONTENT INTO AUDIO RECORDINGS AT REQUEST TIME
20220286732 · 2022-09-08 ·

The present disclosure is generally related to inserting supplemental audio content into primary audio content via digital assistant applications. A data processing system can maintain an audio recording of a content publisher and a content spot marker to specify a content spot that defines a time at which to insert supplemental audio content. The data processing system can receive an input audio signal from a client device. The data processing system can parse the input audio signal to determine that the input audio signal corresponds to a request and can identify the audio recording of the content publisher. The data processing system can identify, responsive to the determination, a content selection parameter. The data processing system can select an audio content item using the content selection parameter. The data processing system can generate and transmit an action data structure including the audio recording with the audio content item inserted.
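
The role of the content spot marker can be shown with a minimal sketch. It assumes the audio recording and the supplemental item are plain lists of samples at the same sample rate and that the marker is a time in seconds. The function name and this representation are hypothetical and do not come from the disclosure.

```python
def insert_at_content_spot(
    primary: list[float],
    supplemental: list[float],
    spot_seconds: float,
    sample_rate: int,
) -> list[float]:
    """Return a new sample list with `supplemental` spliced in at the marker."""
    index = round(spot_seconds * sample_rate)
    index = max(0, min(index, len(primary)))  # clamp to the recording's bounds
    return primary[:index] + supplemental + primary[index:]


if __name__ == "__main__":
    recording = [1.0, 2.0, 3.0, 4.0]   # 1 second at an assumed 4 Hz rate
    item = [9.0, 9.0]
    print(insert_at_content_spot(recording, item, 0.5, 4))
```

Because the splice happens when the request is served, a different audio content item can be chosen for each request without changing the stored recording.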

WATERMARKING WITH PHASE SHIFTING
20220303594 · 2022-09-22 ·

Apparatus, devices, systems, methods, and articles of manufacture are disclosed for watermarking with phase shifting. Example watermark decoding apparatus disclosed herein are to identify watermark components in a media signal, determine a phase shift pattern associated with the watermark components in the media signal, the phase shift pattern based on one or more phase references, and detect a symbol of a watermark based on the phase shift pattern, the watermark associated with the watermark components in the media signal.
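
The decoding idea can be sketched under strong simplifying assumptions. In this sketch the watermark components are pure tones at known DFT bins, the phase references are known in advance, and each component carries one bit encoded as a phase shift of either 0 or pi relative to its reference. None of these specifics come from the abstract, and the function names are hypothetical.

```python
import cmath
import math


def bin_phase(samples: list[float], k: int) -> float:
    """Phase (radians) of DFT bin k, computed directly from the definition."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return cmath.phase(acc)


def detect_symbol(samples: list[float], bins: list[int], reference_phases: list[float]) -> int:
    """Read one bit per component from its phase shift and pack the bits into a symbol."""
    symbol = 0
    for k, ref in zip(bins, reference_phases):
        shift = bin_phase(samples, k) - ref
        shift = (shift + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        bit = 1 if abs(shift) > math.pi / 2 else 0           # nearer to pi than to 0
        symbol = (symbol << 1) | bit
    return symbol


if __name__ == "__main__":
    n = 64
    # Two assumed components: bin 5 unshifted, bin 9 shifted by pi.
    signal = [
        math.cos(2 * math.pi * 5 * i / n) + math.cos(2 * math.pi * 9 * i / n + math.pi)
        for i in range(n)
    ]
    print(detect_symbol(signal, [5, 9], [0.0, 0.0]))
```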