H04N21/4666

Systems and methods for improved content accessibility scoring

Provided herein are methods and systems for improved accessibility scoring for content items. A predicted accessibility score may be based on a plurality of multimodal features present within a content item. The plurality of multimodal features may include video features (e.g., based on video/image analysis), audio features (e.g., based on audio analysis), text-based features (e.g., based on closed-captioning analysis), features indicated by metadata (e.g., duration, genre, etc.), a combination thereof, and/or the like. A predicted accessibility score for a content item may indicate how accessible the content item may be for persons who are visually impaired, hearing impaired, cognitively impaired, etc., as well as for persons who desire to view content that requires less visual attention and/or audio attention as the case may be.

ARTIFICIAL INTELLIGENCE INFORMATION PROCESSING APPARATUS, ARTIFICIAL INTELLIGENCE INFORMATION PROCESSING METHOD, AND ARTIFICIAL-INTELLIGENCE-FUNCTION-EQUIPPED DISPLAY APPARATUS

An information processing apparatus that performs an automatic operation of equipment by artificial intelligence is provided. An artificial intelligence information processing apparatus includes a control section that estimates and controls the operation of the equipment by artificial intelligence on the basis of sensor information, and a presentation section that estimates and presents a reason that the control section has performed the operation of the equipment by artificial intelligence on the basis of the sensor information. As part of estimating the operation by the artificial intelligence, the presentation section estimates the reason that the operation of the equipment has been performed by using a first neural network that has learnt a correlation between the sensor information and the operation of the equipment and the reason that the operation of the equipment has been performed.

Video streaming method and system

A method for streaming a video. The method includes determining a total bitrate for a segment of a video to be received and streamed; predicting a viewpoint of a user for the segment; and determining bitrates for one or more tiles in the segment based on the determined total bitrate and the predicted viewpoint.
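
The final step of this abstract can be pictured as a weighted split of the total bitrate across tiles. The following is a minimal sketch, not the claimed method: the abstract does not say how bitrates are derived from the viewpoint, so the distance-based weighting, the function name `allocate_tile_bitrates`, and the parameters `sigma` and `min_kbps` are all illustrative assumptions.

```python
import math


def allocate_tile_bitrates(total_kbps, tile_centers, viewpoint, sigma=1.0, min_kbps=0.0):
    """Split total_kbps across tiles, favoring tiles near the predicted viewpoint.

    tile_centers: list of (x, y) tile centers; viewpoint: predicted (x, y).
    The exponential distance weighting is an assumption made for illustration.
    """
    n = len(tile_centers)
    if n == 0:
        return []
    if total_kbps < min_kbps * n:
        raise ValueError("total bitrate is below the sum of per-tile floors")

    # Weight each tile by its squared distance to the predicted viewpoint.
    weights = []
    for cx, cy in tile_centers:
        d2 = (cx - viewpoint[0]) ** 2 + (cy - viewpoint[1]) ** 2
        weights.append(math.exp(-d2 / (2 * sigma ** 2)))

    weight_sum = sum(weights)
    if weight_sum == 0.0:
        # All weights underflowed (every tile is very far away): fall back to uniform.
        weights = [1.0] * n
        weight_sum = float(n)

    # Every tile gets the floor; the remainder is shared in proportion to weight.
    spare = total_kbps - min_kbps * n
    return [min_kbps + spare * w / weight_sum for w in weights]
```

For example, `allocate_tile_bitrates(3000, [(0, 0), (1, 0), (2, 0)], viewpoint=(0, 0), min_kbps=100)` returns three rates that sum to the total and decrease with distance from the viewpoint.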

Video data processing method and apparatus, and readable storage medium

The present disclosure provides a video data processing method and device, and a readable storage medium. The technical means adopted include: processing input video data according to a preset trained deep learning algorithm model to obtain a label vector of the video data; determining, according to a label vector of each music data in a music library and a preset recommendation algorithm, a recommendation score of the label vector of each music data with respect to the label vector of the video data; and taking, according to each recommendation score, the music data matching the video data as background music. By using a deep learning algorithm model together with a recommendation algorithm, the data processing efficiency of finding background music for the video data in the music library is effectively improved, and the labor cost is reduced.
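
The scoring and selection steps can be sketched as follows. The abstract only refers to "a preset recommendation algorithm", so cosine similarity is used here purely as a stand-in; the function names and the dictionary layout of the music library are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length label vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_background_music(video_labels, music_library):
    """Score every track against the video's label vector and return the best match.

    music_library: dict mapping track name -> label vector (assumed layout).
    Returns (best_track_name, scores_by_track).
    """
    if not music_library:
        raise ValueError("music library is empty")
    scores = {
        name: cosine_similarity(video_labels, labels)
        for name, labels in music_library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

In this sketch the label vectors are taken as given; in the abstract they come from the trained deep learning model.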

METHOD OF DETECTING ACTION, ELECTRONIC DEVICE, AND STORAGE MEDIUM

A method of detecting an action, an electronic device, and a storage medium. A method can include: performing a temporal action proposal on at least one target feature data obtained by a feature extraction on a plurality of target frame data of a target resource, so as to obtain at least one first candidate action proposal information; classifying target feature data corresponding to at least one first candidate action proposal interval included in the first candidate action proposal information, so as to obtain at least one classification confidence level corresponding to the at least one first candidate action proposal interval; and determining an action detection result for at least one action segment contained in the target resource according to the at least one classification confidence level corresponding to the at least one first candidate action proposal interval, wherein the action detection result includes an action category and an action period.
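
The last step, turning per-interval classification confidences into an action category and an action period, might look like the sketch below. The threshold rule, the function name `detect_actions`, and the input layout are assumptions; the abstract does not specify how confidences are converted into a detection result.

```python
def detect_actions(proposals, confidences, threshold=0.5):
    """Keep proposal intervals whose top class confidence reaches the threshold.

    proposals: list of (start_seconds, end_seconds) candidate intervals.
    confidences: list of dicts mapping category -> confidence, one per proposal.
    The threshold-based selection is an illustrative assumption.
    """
    results = []
    for (start, end), conf in zip(proposals, confidences):
        if not conf:
            continue  # no classification available for this interval
        category = max(conf, key=conf.get)
        if conf[category] >= threshold:
            results.append(
                {"category": category, "period": (start, end), "confidence": conf[category]}
            )
    return results
```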

Account behavior prediction using prediction network
11601718 · 2023-03-07

In some embodiments, a method inputs a sequence of historical behaviors for a plurality of instances of content into a prediction network to generate a sequence of values that model the sequence of historical behaviors. A restriction on an operation performed by the prediction network is based on a characteristic of a viewing behavior. A sequence of attention scores is generated based on a similarity of a current behavior for a first instance of content to respective instances of historical behaviors in the sequence of historical behaviors. The method adjusts respective values based on corresponding attention scores to generate an adjusted sequence of values. The adjusted sequence of values is sampled to generate an output from the prediction network that models the sequence of historical behaviors based on the current behavior. The output is used to determine a prediction of whether the current behavior is indicative of the viewing behavior.
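
The attention step described here can be sketched in a few lines. The abstract does not state how similarity is measured or how the adjusted sequence is sampled, so the dot-product similarity, the softmax normalization, and the element-wise sum used as the pooling step are all assumptions, as is the function name `attention_adjust`.

```python
import math


def attention_adjust(current, history, values):
    """Weight each historical value by its similarity to the current behavior.

    current: feature vector for the current behavior.
    history: list of feature vectors, one per historical behavior.
    values: list of value vectors produced for the historical behaviors.
    Returns (attention_scores, adjusted_values, pooled_output).
    """
    if not history or len(history) != len(values):
        raise ValueError("history and values must be non-empty and the same length")

    # Dot-product similarity between the current behavior and each historical one.
    sims = [sum(c * h for c, h in zip(current, past)) for past in history]

    # Softmax turns similarities into attention scores that sum to 1.
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    scores = [e / total for e in exps]

    # Scale each value vector by its attention score.
    adjusted = [[s * v for v in vec] for s, vec in zip(scores, values)]

    # Stand-in for the sampling step: sum the adjusted vectors element-wise.
    pooled = [sum(column) for column in zip(*adjusted)]
    return scores, adjusted, pooled
```

A downstream classifier would then take `pooled` as input to predict whether the current behavior is indicative of the viewing behavior; that part is omitted here.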

VOICE PACKET RECOMMENDATION METHOD AND APPARATUS, DEVICE AND STORAGE MEDIUM

Provided are a voice packet recommendation method and apparatus, a device and a storage medium, relating to intelligent search technologies. The solution includes constructing a first video training sample according to first user behavior data of a first sample user in a video recommendation scenario and first video data associated with the first user behavior data; constructing a user training sample according to sample search data of the first sample user and historical interaction data about a first sample voice packet; pretraining a neural network model according to the first video training sample and the user training sample; and retraining the pretrained neural network model by using a sample video and sample tag data which are associated with a second sample voice packet to obtain a voice packet recommendation model. With the solution, the neural network model can be trained in the case of cold start so that the neural network model can recommend a voice packet automatically in the case of cold start.

Light Field Display System for Adult Applications

A light field (LF) display system for displaying holographic content within an adult entertainment context is disclosed. The LF display system includes a plurality of LF displays that, in one embodiment, are tiled to form an array of LF displays within an environment. The LF display system may customize a viewer's experience using artificial intelligence (AI) and machine learning (ML) models that track and respond to each viewer's movements and/or requests in the environment and their behaviors (e.g., body language, facial expressions, tone of voice, etc.) through various sensors (e.g., cameras, microphones, LF display sensors, etc.). Accordingly, the result is an adult entertainment environment customized for each viewer, including AI holographic performers that engage viewers within the environment.

Computer-implemented system and method for determining attentiveness of user
11632590 · 2023-04-18

Disclosed herein is a method and system for collecting attentiveness information associated with a user's response to consuming a piece of media content. The attentiveness information is used to create attentiveness-labelled behavioural data for the user's response. A computer-implemented attentiveness model may be generated by applying machine learning techniques to a set of attentiveness-labelled behavioural data from multiple users. The system may comprise an annotation tool that facilitates human labelling of the user's response with attentiveness data. The resulting attentiveness model is therefore based on correlations indicative of attentiveness within the attentiveness-labelled behavioural data and/or physiological data that are based on real human cognition rather than a predetermined feature or combination of features.

ADJUSTING VIDEO CONTENT BASED ON AUDIENCE EXPERIENCES

Computer technology for making sure that no viewer in an audience of multiple co-viewers will effectively be presented with audiovisual content portions that are inappropriate and/or irrelevant for that co-viewer. Avoiding the effective presentation of inappropriate and/or irrelevant content to one or more of the co-viewers may involve currently conventional techniques such as blurring, scrambling, obscuring, distracting, etc.