Patent classifications
H04N21/234336
System and method to summarize one or more videos based on user priorities
A system and method to summarize one or more videos are provided. The system includes a data receiving module configured to receive the videos; a video analysis module configured to analyse the one or more videos to generate one or more transcription text outputs; a building block data module configured to create a building block model and to apply the building block model to the analysed videos; a video presentation module configured to present contents of the videos using elements and to present the one or more transcription texts; a video prioritization module configured to generate one or more ranking formulas for the videos and to prioritize building block models, upon receiving feedback from users, based on the contents and transcription texts; a video summarization module configured to generate a video summary; and a video action module configured to choose an action to be performed on the videos based on the feedback received from the corresponding users.
A METHOD AND SYSTEM FOR CONTENT INTERNATIONALIZATION & LOCALISATION
A method of processing a video file to generate a modified video file, the modified video file including a translated audio content of the video file, the method comprising: receiving the video file; accessing a facial model or a speech model for a specific speaker, wherein the facial model maps speech to facial expressions, and the speech model maps text to speech; receiving a reference content for the originating video file for the specific speaker; generating modified audio content for the specific speaker and/or modified facial expression for the specific speaker; and modifying the video file in accordance with the modified content and/or the modified expression to generate the modified video file.
Captions for audio content
This disclosure describes, in part, techniques for providing captions with audio content. For instance, an electronic device may receive first data representing audio content and second data representing captions that are available for the audio content. The electronic device may then select portions of the captions for display while outputting the audio content. In some instances, the electronic device selects the portions using timestamps represented by the second data. For instance, the electronic device may select a portion of the captions such that the portion of the captions begins at a first pause within the audio content and/or ends at a second pause within the audio content. In some instances, the electronic device may also display graphical elements that indicate the current location within the captions.
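As an illustration of the pause-based selection described above, the following is a minimal sketch, not the disclosed implementation. It assumes the caption data is a list of words with start and end timestamps and treats any gap of at least `min_pause` seconds between consecutive words as a pause. The names `split_at_pauses`, `portion_at`, and the 0.5-second default are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed caption format: (text, start_seconds, end_seconds) per word.
Word = Tuple[str, float, float]


def split_at_pauses(words: List[Word], min_pause: float = 0.5) -> List[List[Word]]:
    """Group timestamped words into portions that begin and end at pauses."""
    portions: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        # A gap between the previous word's end and this word's start
        # that reaches min_pause closes the current portion.
        if current and word[1] - current[-1][2] >= min_pause:
            portions.append(current)
            current = []
        current.append(word)
    if current:
        portions.append(current)
    return portions


def portion_at(portions: List[List[Word]], t: float) -> Optional[int]:
    """Return the index of the portion covering playback time t, if any."""
    for index, portion in enumerate(portions):
        if portion[0][1] <= t <= portion[-1][2]:
            return index
    return None


words = [
    ("Hello", 0.0, 0.4), ("world", 0.45, 0.9),
    ("How", 1.6, 1.8), ("are", 1.85, 2.0), ("you", 2.05, 2.3),
]
portions = split_at_pauses(words)
```

With the sample data, the 0.7-second gap after "world" splits the captions into two portions, and `portion_at` reports which one to display at a given playback time.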
Method and system of presenting moving images or videos corresponding to still images
The present application discloses a method of presenting moving images or videos corresponding to still images. The method includes: storing a still image and a moving image or video corresponding to the still image into a cloud storage; extracting feature points of the still image stored in the cloud storage, and storing the feature points in the cloud storage in a manner which associates the feature points with the still image; when a device obtains a first still image through scanning, extracting feature points from the first still image, comparing and judging whether the extracted feature points match feature points of each still image stored in the cloud storage to determine a second still image whose feature points match the feature points of the first still image; rendering a moving image or video corresponding to the second still image stored in the cloud storage at the position of the first still image. The present application can facilitate presenting a moving image corresponding to a still image, and increase the information and entertainment provided by a still image.
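The matching step above can be sketched as follows, under strong simplifying assumptions. Feature points are represented here as small binary descriptors (plain integers) compared by Hamming distance. The abstract does not name a descriptor type or matching rule, so the thresholds, the dictionary standing in for cloud storage, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def count_matches(query: List[int], candidate: List[int], max_distance: int = 2) -> int:
    """Count query descriptors that have a close descriptor in the candidate."""
    return sum(
        1 for q in query
        if any(hamming(q, c) <= max_distance for c in candidate)
    )


def find_matching_image(
    query: List[int],
    stored: Dict[str, List[int]],
    min_matches: int = 2,
    max_distance: int = 2,
) -> Optional[str]:
    """Return the stored image id whose descriptors best match the scan."""
    best_id, best_count = None, 0
    for image_id, descriptors in stored.items():
        n = count_matches(query, descriptors, max_distance)
        if n > best_count:
            best_id, best_count = image_id, n
    # Reject weak matches so an unrelated scan maps to no stored image.
    return best_id if best_count >= min_matches else None


# Stand-in for descriptors kept in cloud storage, keyed by still-image id.
stored = {
    "poster": [0b10110010, 0b01101100, 0b11110000],
    "card": [0b00001111, 0b01010101],
}
scanned = [0b10110011, 0b01101100, 0b00000001]
matched = find_matching_image(scanned, stored)
```

Once a stored still image is identified, a client could look up the associated video by the same id and render it at the position of the scanned image.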
System and method for context aware detection of objectionable speech in video
Embodiments provide a system and method for filtering speech in a video. Speech in video may contain objectionable or profane words that need to be filtered. To ascertain whether a word or phrase is objectionable, the contextual information from surrounding words and the contextual information from detected objects and scenes in the video are used. Unwanted words may be filtered or collected and presented to the user.
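A toy sketch of the idea, assuming a rule-based stand-in for whatever model the embodiments actually use: an ambiguous word is masked unless the surrounding words or the labels of detected objects and scenes contain a cue that makes it benign. The lexicon `BENIGN_CUES`, the window size, and the function name are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical lexicon: ambiguous word -> context cues that make it benign.
BENIGN_CUES: Dict[str, Set[str]] = {
    "shoot": {"camera", "photo", "film", "basketball"},
}


def filter_transcript(
    words: List[str],
    scene_labels: Set[str],
    window: int = 2,
    mask: str = "***",
) -> List[str]:
    """Mask ambiguous words when neither text nor visual context is benign."""
    lowered = [w.lower() for w in words]
    visual = {label.lower() for label in scene_labels}
    result = []
    for i, word in enumerate(lowered):
        cues = BENIGN_CUES.get(word)
        if cues is not None:
            # Context = nearby words plus objects/scenes detected in the video.
            neighbours = set(lowered[max(0, i - window):i] + lowered[i + 1:i + 1 + window])
            if not (neighbours | visual) & cues:
                result.append(mask)
                continue
        result.append(words[i])
    return result


kept = filter_transcript("let us shoot the film".split(), set())
masked = filter_transcript("they will shoot him".split(), {"street"})
kept_by_scene = filter_transcript("they will shoot him".split(), {"camera"})
```

The third call shows the visual context overriding the text: the same sentence is left intact when a camera is detected in the scene.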
Video caption generating method and apparatus, device, and storage medium
A video caption generating method is provided for a computer device. The method includes encoding a target video by using an encoder of a video caption generating model, to obtain a target visual feature of the target video, decoding the target visual feature by using a basic decoder of the video caption generating model, to obtain a first selection probability corresponding to a candidate word, decoding the target visual feature by using an auxiliary decoder of the video caption generating model, to obtain a second selection probability corresponding to the candidate word, a memory structure of the auxiliary decoder including reference visual context information corresponding to the candidate word, determining a decoded word in the candidate word according to the first selection probability and the second selection probability, and generating a video caption according to the decoded word.
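The final step, choosing a decoded word from the two selection probabilities, might look like the sketch below. The abstract does not say how the probabilities are combined, so the weighted sum and the `aux_weight` parameter are assumptions made for illustration, and the decoders themselves are replaced by hand-written probability tables.

```python
from typing import Dict


def pick_word(
    p_basic: Dict[str, float],
    p_aux: Dict[str, float],
    aux_weight: float = 0.3,
) -> str:
    """Choose the candidate word with the highest combined probability.

    The weighted sum is an assumed combination rule, not one stated
    in the abstract.
    """
    candidates = set(p_basic) | set(p_aux)
    combined = {
        word: (1 - aux_weight) * p_basic.get(word, 0.0)
        + aux_weight * p_aux.get(word, 0.0)
        for word in candidates
    }
    return max(combined, key=combined.get)


# Stand-ins for one decoding step of the basic and auxiliary decoders.
p_basic = {"dog": 0.5, "puppy": 0.4, "cat": 0.1}
p_aux = {"dog": 0.2, "puppy": 0.7, "cat": 0.1}

chosen = pick_word(p_basic, p_aux)
chosen_basic_only = pick_word(p_basic, p_aux, aux_weight=0.0)
```

Here the auxiliary decoder's preference shifts the choice from "dog" to "puppy"; setting `aux_weight` to zero falls back to the basic decoder alone.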
Reference of neural network model for adaptation of 2D video for streaming to heterogeneous client end-points
A method, computer program, and computer system are provided for streaming immersive media. The method includes ingesting content in a two-dimensional format, the 2D format referencing at least one neural network; converting the ingested content to a three-dimensional format based on the referenced at least one neural network; and streaming the converted content to a client end-point.
Caption service system for remote speech recognition
The present invention provides a caption service system for remote speech recognition, which provides a caption service for the hearing impaired. The system includes a speaker and live broadcast equipment at A, a listener-typist and a computer at B, a hearing-impaired viewer and a live screen at C, and an automatic speech recognition (ASR) caption server at D. The live broadcast equipment, the computer, the live screen and the ASR caption server are connected over a network. The speaker's audio is sent to the ASR caption server to be converted into text, which is corrected by the listener-typist, and the text caption is then sent to the live screen of the hearing-impaired viewer together with the speaker's video and audio, so that the viewer can read a text caption of what the speaker says.
Media streaming
There is disclosed a system for providing streaming services, comprising: a plurality of capture devices, each for capturing data and providing a captured data stream; and a server, for receiving the plurality of captured data streams; wherein each capture device is configured to generate metadata for the captured data, and transmit said metadata to the server.
System and method for providing descriptive video
A system and method for providing described video for media content generates a plurality of individual audio files, possibly using text-to-speech, one for each line of a described video script. The described video script provides an indication of the timing, for example the start time and length, of the individual described video lines. The described video script can then be used to combine the individual audio files into a single audio file for inclusion with the media content.
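The combining step can be sketched as placing each line's audio on a silent timeline at its scripted start time. This is a simplified illustration: audio is modelled as lists of integer samples rather than real audio files, and the function name, sample rate, and clip format are hypothetical.

```python
from typing import List, Tuple

# Assumed script entry: (start_seconds, samples for that line's audio).
Clip = Tuple[float, List[int]]


def assemble_track(clips: List[Clip], total_seconds: float, sample_rate: int = 8000) -> List[int]:
    """Place each line's samples at its start time on a silent track."""
    track = [0] * int(round(total_seconds * sample_rate))
    for start_seconds, samples in clips:
        offset = int(round(start_seconds * sample_rate))
        if offset >= len(track):
            continue  # the line starts after the media ends; skip it
        # Truncate any clip that would run past the end of the track.
        end = min(offset + len(samples), len(track))
        track[offset:end] = samples[: end - offset]
    return track


# Two described-video lines on a 2-second track at a toy rate of 4 samples/s.
clips = [(0.5, [1, 1]), (1.5, [2, 2, 2])]
track = assemble_track(clips, total_seconds=2.0, sample_rate=4)
```

The second clip is cut to fit the track length, which mirrors the need to keep described video within the bounds of the media content.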