Controlling play out of advertisement content during live video streaming at an end-user terminal

11089367 · 2021-08-10

Abstract

A method (100) of controlling playout of advertisement content during live video streaming at an end-user terminal comprising steps of: receiving (110) advertisement content from an advertisement server; receiving (112) live streamed video content from a content delivery network and playing out the video content; obtaining (114) at least one of image features and audio features of the video content during playout; calculating (116) a content importance rating of video content to be played out during a prediction time window based on said features; and postponing (118) playout of advertisement content scheduled to be played out during the prediction time window if the calculated content importance rating for the prediction time window is above a threshold value.

Claims

1. A method of controlling playout of advertisement content during live video streaming at an end-user terminal, the method comprising: receiving advertisement content from an advertisement server; receiving live streamed video content from a content delivery network and playing out the video content; obtaining at least one of image features and audio features of the received video content during playout; calculating a content importance rating of received video content to be played out during a prediction time window when a pre-set playout time for received advertisement content is reached, the prediction time window following the pre-set playout time; and postponing playout of the received advertisement content scheduled to be played out during the prediction time window if the calculated content importance rating of received video content to be played out during the prediction time window is above a threshold value.

2. The method as claimed in claim 1, further comprising at least one of obtaining image features by identifying objects within the video content, and obtaining audio features by obtaining representations of power spectra of sounds within the received video content.

3. The method as claimed in claim 1, further comprising: obtaining additional features comprising at least one of a video frame rate, a video bit rate, a video contrast, a spatial complexity, a temporal complexity, a sound level, proximity of a user to the end-user terminal, number of running applications on the end-user terminal, and a current time of day; and calculating the content importance rating of received video content based additionally on the obtained additional features.

4. The method as claimed in claim 3, wherein calculating the content importance rating of video content to be played out comprises: determining a first criticality probability of the obtained image features and applying a first weight to the first criticality probability; determining second criticality probability of the obtained audio features and applying a second weight to the second criticality probability; determining a third criticality probability of the additional features and applying a third weight to the third criticality probability; and summing the weighted first, second and third probabilities.

5. The method as claimed in claim 1, further comprising obtaining the image features, audio features and additional features from received video content within a measurement time window, wherein the prediction time window follows the measurement time window, and wherein the measurement and prediction time windows are rolling time windows.

6. An end-user terminal arranged to control playout of advertisement content during live video streaming, the end-user terminal comprising a receiver, a processor and a memory containing instructions executable by the processor thereby to configure the processor: to receive, via the receiver, advertisement content from an advertisement server and live streamed video content from a content delivery network to obtain at least one of image features and audio features of the received video content during playout; to calculate, when a pre-set playout time for the received advertisement content is reached, a content importance rating of received video content to be played out during a prediction time window, the prediction time window following the pre-set playout time; and to postpone playout of received advertisement content scheduled to be played out during the prediction time window if the calculated content importance rating of received video content to be played out during the prediction time window is above a threshold value.

7. The end-user terminal as claimed in claim 6, wherein the processor is thereby further configured to obtain at least one of image features by identifying objects within the received video content and audio features by obtaining representations of power spectra of sounds within the received video content.

8. The end-user terminal as claimed in claim 6, wherein the processor is thereby further configured: to obtain additional features comprising at least one of a video frame rate, a video bit rate, a video contrast, a spatial complexity, a temporal complexity, a sound level, proximity of a user to the end-user terminal, number of running applications on the end-user terminal, and a current time of day; and to calculate the content importance rating of received video content based additionally on the obtained additional features.

9. The end-user terminal as claimed in claim 8, wherein the processor is thereby further configured: to predict the content importance rating by: determining a first criticality probability of the obtained image features and applying a first weight to the first criticality probability; determining a second criticality probability of the obtained audio features and applying a second weight to the second criticality probability; determining a third criticality probability of the additional features and applying a third weight to the third criticality probability; and summing the weighted first, second and third criticality probabilities.

10. The end-user terminal as claimed in claim 6, wherein the processor is thereby further configured to obtain the image features, audio features and additional features from received video content within a measurement time window and the processor is arranged to predict a content importance rating of received video content to be played out during a prediction time window following the measurement time window, and wherein the measurement and prediction time windows are rolling time windows.

11. An end-user terminal comprising: advertisement content processing circuitry arranged to receive advertisement content from an advertisement server; a playback buffer processing circuitry arranged to receive video content from a content delivery network; video player processing circuitry arranged to playout video content from the playback buffer and advertisement content from the advertisement content processing circuitry, and arranged to obtain at least one of image features and audio features of the video content during playout; and calculation processing circuitry arranged to calculate a content importance rating of video content to be played out during a prediction time window when a pre-set playout time for received advertisement content is reached, the prediction time window following the pre-set playout time, and wherein the video player processing circuitry is arranged to postpone playout of advertisement content scheduled to be played out during the prediction time window if the calculated content importance rating of video content to be played out during the prediction time window is above a threshold value.

12. A non-transitory computer-readable storage medium having stored therein a computer program which when loaded into a processor and executed, causes the processor to implement the method according to claim 1.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 illustrates prior art replacement of original video content with advertisement content during live video streaming;

(2) FIGS. 2 to 8 illustrate methods of controlling playout of advertisement content during live video streaming at an end-user terminal according to embodiments of the invention; and

(3) FIGS. 9 and 10 are schematic representations of end-user terminals according to embodiments of the invention.

DETAILED DESCRIPTION

(4) The same reference numbers will be used for corresponding features in different embodiments.

(5) Referring to FIG. 2, an embodiment of the invention provides a method 100 of controlling playout of advertisement content during live video streaming at an end-user terminal. The method comprises steps of:

(6) receiving 110 advertisement content from an advertisement server;

(7) receiving 112 live streamed video content from a content delivery network and playing out the video content;

(8) obtaining 114 at least one of image features and audio features of the video content during playout;

(9) calculating 116 a content importance rating of video content to be played out during a prediction time window based on said features; and

(10) postponing 118 playout of advertisement content scheduled to be played out during the prediction time window if the calculated content importance rating for the prediction time window is above a threshold value.

(11) In an embodiment, the end-user terminal is a video player on a mobile device, such as a smartphone or a tablet.

(12) Pre-fetching advertisement content at the beginning of live video streaming is preferred to just-in-time, JIT, downloading of advertisement content. Since advertisement content is often not long, pre-fetching is not expected to impact network requirements significantly.

(13) An embodiment of the invention therefore provides a method of controlling playout of advertisement content during live video streaming at an end-user terminal comprising: downloading the advertisement content (as predefined by a service provider) in advance, before live video streaming starts; calculating a real-time content importance rating of the video content that the user is watching; and, if advertisement content has been scheduled to play out during the presentation of important content, postponing the advertisement playout slightly, to a later time which is not too different from its original location. The method may therefore advantageously avoid replacement of important/critical live information/video content with advertisement content.

(14) In an embodiment, as illustrated in FIG. 3, when a pre-set playout time for advertisement content is reached 210, a content importance rating is calculated for a prediction time window following the pre-set playout time.

(15) A pre-set playout time for advertisement content is typically indicated by an advertisement marker. In the method 300 of the embodiment illustrated in FIG. 4, when an advertisement marker is reached 310 an assessment 312 is made as to whether the video content is critical within a prediction time window that is L seconds long, i.e. is the content critical within the next L seconds? Video content is considered to be critical if the calculated content importance rating is above a threshold value. If the content is critical, playout of the advertisement content is postponed 314 by L seconds. If the content is not critical, the advertisement content is played out 316.
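The decision made at an advertisement marker can be illustrated with a minimal sketch. The names `decide_at_marker`, `AdDecision` and `rate_window` are hypothetical and do not appear in the embodiments; `rate_window` stands in for whatever routine calculates the content importance rating for the prediction time window.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AdDecision:
    action: str          # "play" or "postpone"
    playout_time: float  # position on the stream timeline, in seconds


def decide_at_marker(marker_time: float,
                     window_len: float,
                     rate_window: Callable[[float, float], float],
                     threshold: float) -> AdDecision:
    """Assess the next `window_len` seconds (L in the text) at an ad marker.

    `rate_window(start, end)` is an assumed callback that returns the
    content importance rating for the prediction time window.
    """
    rating = rate_window(marker_time, marker_time + window_len)
    if rating > threshold:
        # Content is critical: postpone the advertisement by L seconds.
        return AdDecision("postpone", marker_time + window_len)
    # Content is not critical: play the advertisement at the marker.
    return AdDecision("play", marker_time)
```

In this sketch a rating exactly equal to the threshold leads to playout, since the text postpones only when the rating is above the threshold.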

(16) In an embodiment, obtaining image features comprises identifying objects within the video content and obtaining audio features comprises obtaining representations of power spectra of sounds within the video content.

(17) Calculating the content importance rating may include analysing audio features of the video content to obtain Mel-frequency cepstrum coefficients of the audio features or may include performing sentiment analysis on the audio features.

(18) In the method 400 of the embodiment illustrated in FIG. 5, the content importance rating is additionally calculated based on one or more additional features. The additional features include features of the video content, such as a video frame rate, a video bit rate, a video contrast, a spatial complexity, a temporal complexity and a sound level, and features of the end-user terminal, such as proximity of a user to the end-user terminal, number of running applications on the end-user terminal, and a current time of day.

(19) The method 400 comprises obtaining at least one additional feature 412, in addition to obtaining image features and audio features of the video content, and calculating 414 a content importance rating based on all the obtained features.

(20) FIG. 6 illustrates steps 500 of a method of controlling playout of advertisement content during live video streaming at an end-user terminal, according to an embodiment of the invention.

(21) In this embodiment, the method comprises obtaining image features and audio features of the video content and additional features of the video content and the end-user terminal during the previous 30 s of video streaming.

(22) Calculating a content importance rating of video content to be played out during a prediction time window comprises:

(23) determining 512 a first criticality probability of the image features and applying a first weight 520 to the first criticality probability;

(24) determining 514 a second criticality probability of the audio features and applying a second weight 522 to the second criticality probability;

(25) determining 516 a third criticality probability of the additional features and applying a third weight 524 to the third criticality probability;

(26) and summing the first, second and third weighted criticality probabilities.

(27) The content importance rating is compared to a threshold value to determine 530 a binary rating, Yes/No. Playout of advertisement content scheduled to be played out during the prediction time window is postponed if the calculated content importance rating for the prediction time window is a ‘Yes’.
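The weighted sum and threshold comparison described above may be sketched as follows. The function names and the example weights and threshold are assumptions for illustration only; the embodiments do not specify particular values.

```python
def content_importance_rating(p_image: float, p_audio: float, p_additional: float,
                              w1: float, w2: float, w3: float) -> float:
    """Sum the three criticality probabilities after applying their weights."""
    return w1 * p_image + w2 * p_audio + w3 * p_additional


def binary_rating(rating: float, threshold: float) -> bool:
    """Return True ('Yes') when the rating is above the threshold."""
    return rating > threshold


# Example with assumed values: weights 0.5, 0.3, 0.2 and threshold 0.5.
rating = content_importance_rating(0.8, 0.6, 0.1, w1=0.5, w2=0.3, w3=0.2)
postpone = binary_rating(rating, threshold=0.5)
```

With these assumed inputs the rating is approximately 0.6, so the binary rating is 'Yes' and playout would be postponed.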

(28) FIG. 7 illustrates a method 600 of controlling playout of advertisement content during live video streaming at an end-user terminal, according to an embodiment of the invention.

(29) The image features, audio features and additional features are obtained from video content 610 within a 30 s long measurement time window 510. The prediction time window 620 follows the measurement time window, and the measurement and prediction time windows are rolling time windows.

(30) A content importance rating is calculated from the features obtained within the measurement time window, in the manner illustrated in FIG. 6. Playout of advertisement content scheduled to be played out during the prediction time window 620 is postponed by L seconds if the calculated content importance rating for the prediction time window is a ‘Yes’.
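One possible way to maintain a rolling measurement time window, with a prediction time window that follows it, is sketched below. The class `RollingMeasurementWindow` and its methods are hypothetical; the 30 s length is taken from the text, while the sampling of one value per timestamp is an assumption.

```python
from collections import deque
from typing import Deque, List, Tuple


class RollingMeasurementWindow:
    """Keep only the feature samples from the most recent `length_s` seconds."""

    def __init__(self, length_s: float = 30.0) -> None:
        self.length_s = length_s
        self._samples: Deque[Tuple[float, float]] = deque()  # (timestamp, value)

    def add(self, timestamp: float, value: float) -> None:
        self._samples.append((timestamp, value))
        # Discard samples that have rolled out of the measurement window.
        cutoff = timestamp - self.length_s
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def values(self) -> List[float]:
        return [value for _, value in self._samples]

    @staticmethod
    def prediction_window(now: float, length_s: float) -> Tuple[float, float]:
        """The prediction window starts where the measurement window ends."""
        return (now, now + length_s)
```

Because both windows are defined relative to the current playout position, they advance together as playout proceeds.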

(31) In an embodiment, image features and audio features are obtained and a content importance rating calculated as follows.

(32) If, for instance, the video content includes people having a conversation, the content importance rating can be mainly based on changes in the audio frequency, including detection of who is talking, importance detection based on the volume of the speaker's voice, and detection of a question being asked by a person. This enables the advertisement content to be relocated from its originally planned location, as the audience might be interested in hearing the answer to the question, which makes the content critical to the user.

(33) With the help of sentiment analysis, the audio features of a person speaking and the speaker's attitude towards a particular topic, as being positive or negative, can be identified. If there are multiple people in the same scene, and if the attitudes continuously alternate between positive and negative and are mapped to different human voices, then the content of the video can be identified as an important discussion. In addition, the volume of each speaker also helps to detect whether two people are arguing.
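The alternation of attitudes between speakers can be illustrated with a simple heuristic. This is only a sketch under stated assumptions: each utterance is assumed to have already been assigned a speaker label and a sentiment polarity in the range -1 to 1 by some upstream analysis, and the function name and the `min_flips` parameter are hypothetical.

```python
from typing import List, Tuple


def is_important_discussion(utterances: List[Tuple[str, float]],
                            min_flips: int = 3) -> bool:
    """Flag a scene in which sentiment alternates between different speakers.

    `utterances` is a time-ordered list of (speaker_id, polarity) pairs.
    """
    speakers = {speaker for speaker, _ in utterances}
    if len(speakers) < 2:
        return False  # the text requires multiple people in the scene
    flips = 0
    for (s1, p1), (s2, p2) in zip(utterances, utterances[1:]):
        # Count a flip when the speaker changes and the polarity changes sign.
        if s1 != s2 and p1 * p2 < 0:
            flips += 1
    return flips >= min_flips
```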

(34) Numerous context/content/activity detection/awareness algorithms from sound, video and other sensor data are also known, as reported in Cavallaro et al., “Semantic video analysis for adaptive content delivery and automatic description”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, No. 10, October 2005, which enable sentiment analysis of the audio features.

(35) Deep learning algorithms such as Long Short-Term Memory, LSTM, algorithms enable powerful and accurate future sequence predictions from a given input sequence, and can map many sequences in the past to many sequences in the future.

(36) Image features may, for example, be obtained during streaming of a live soccer match, and the calculation of a content importance rating can include detection of the speed and the location of the ball in the soccer field. For example, if the ball is within the penalty zone, an advertisement can be delayed. This can be performed using, for example, the Google Cloud Video Intelligence API that identifies what is in the image during a video stream.
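The penalty-zone check in the soccer example could look like the following sketch. It assumes that an object detector (not shown) has already mapped the ball to pitch coordinates in metres, with x along the length of the pitch and y across its width. The 105 m by 68 m pitch size is an assumed default, the penalty area dimensions (16.5 m deep, 40.32 m wide) follow the standard pitch markings, and the function name is hypothetical.

```python
def ball_in_penalty_area(x: float, y: float,
                         pitch_length: float = 105.0,
                         pitch_width: float = 68.0) -> bool:
    """Return True if pitch position (x, y) lies inside either penalty area."""
    depth = 16.5              # distance the area extends from the goal line
    half_width = 40.32 / 2.0  # half of the penalty area width
    on_pitch = 0.0 <= x <= pitch_length and 0.0 <= y <= pitch_width
    in_band = abs(y - pitch_width / 2.0) <= half_width
    near_goal = x <= depth or x >= pitch_length - depth
    return on_pitch and in_band and near_goal
```

A positive result here could contribute to the criticality of the image features, so that an advertisement due at that moment is delayed.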

(37) The features that support the calculation of the content importance rating are the overall statistics (average, mean, median, standard deviation, skewness, kurtosis) of the following features, measured within the last 30 seconds of the video stream, as shown in FIG. 7:

(38) Video content: The video content is processed as follows: Video content (image) → identify objects (who/what/where is in the video) → criticality (Boolean)

(39) Identify objects: In the case of a sports event, detect the distance between the ball and the goal, and detect the score of the match. In the case of news, detect who or what is on the screen and detect whether the person or object is important.

(40) Audio content: Audio content is analysed using well-known Mel-frequency cepstrum coefficients, MFCC. The Mel-frequency cepstrum, MFC, represents the power spectrum of a sound, and it consists of MFC coefficients. It is typically used in audio similarity and music information retrieval. The audio content is processed as follows: Audio content → audio features → phonemes → words → sentiment analysis → criticality (Boolean)

(41) Other features: Video frame rate, video bitrate, proximity of the user to the end-user terminal, number of running apps, hour of the day, video contrast, spatial complexity, temporal complexity, sound level.
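The overall statistics of a single feature over the measurement window may be computed as in the sketch below. The function name is hypothetical, "average" and "mean" are treated as the same quantity, and population (rather than sample) moments with non-excess kurtosis are an assumption, since the text does not say which definitions are used. The input must be non-empty.

```python
import statistics
from typing import Dict, Sequence


def summarise_feature(samples: Sequence[float]) -> Dict[str, float]:
    """Summary statistics of one feature measured over the last 30 seconds."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)  # population standard deviation
    if std == 0:
        # A constant signal has no defined skewness or kurtosis; report 0.0.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = sum((x - mean) ** 3 for x in samples) / (n * std ** 3)
        kurtosis = sum((x - mean) ** 4 for x in samples) / (n * std ** 4)
    return {
        "mean": mean,
        "median": statistics.median(samples),
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

Applied to each listed feature in turn (for example the sound level sampled once per second), this would yield a fixed-length vector that a model could take as input.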

(42) Using these features, the machine learning model 500 detects 530 whether the content is critical or not, i.e., a binary classification problem. The prediction is done via end-to-end machine learning using the above features, with three different models, as shown in FIG. 6. A criticality probability 512, 514, 516 is calculated for each of video/image content, audio content and additional content by the three models. The criticality probabilities are multiplied by respective weights W1, W2, and W3, and summed to obtain the overall content importance rating.

(43) During playout of live streaming video content, if an advertisement marker is reached, the machine learning model will make a decision to either postpone or display the advertisement content. The method is therefore event based, in response to an advertisement marker being reached, rather than being performed periodically.

(44) FIG. 8 illustrates signalling 700 between various nodes in a cloud server implementation of a method of controlling playout of advertisement content 632 during live video streaming on an end-user terminal.

(45) The end-user terminal comprises a video player application 710.

(46) The method comprises creating 720 a media selected player, i.e. the video player application 710, and sending a play request 702 to a video service authentication server 732, which sends back a media URL (CDN link), an advertisement content URL, and available bitrates 704. The video player application then sends a request 706 to an analytics server 718 for the machine-learning models for calculating a content importance rating and deciding whether to play or postpone advertisement content. The machine-learning models are then downloaded 708 to the video player 710.

(47) The video player sends a request 710 for advertisement content to an advertisement server 714, and advertisement content is downloaded 712 to the video player. A manifest file is created 722 containing the received advertisement content, including advertisement playout markers.
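A manifest holding advertisement playout markers could be represented as in the following sketch. The classes `AdMarker` and `AdManifest`, their fields and their methods are all hypothetical; the embodiments do not define a manifest format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AdMarker:
    ad_id: str
    playout_time: float  # pre-set playout time on the stream timeline, seconds


@dataclass
class AdManifest:
    markers: List[AdMarker] = field(default_factory=list)

    def next_due(self, position: float) -> Optional[AdMarker]:
        """Return the earliest marker whose playout time has been reached."""
        due = [m for m in self.markers if m.playout_time <= position]
        return min(due, key=lambda m: m.playout_time) if due else None

    def postpone(self, marker: AdMarker, delay_s: float) -> None:
        """Move a marker later, e.g. by L seconds when content is critical."""
        marker.playout_time += delay_s

    def mark_played(self, marker: AdMarker) -> None:
        self.markers.remove(marker)
```

Keeping the markers in the manifest until they are played means that a postponed advertisement is reconsidered when its new playout time is reached.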

(48) Video content 740 is then downloaded from a content delivery network, CDN, 716, to the video player. The video content is then played out by the video player and the video player obtains image features and audio features of the video content within a rolling 30 s measurement window 612, as shown in FIG. 7.

(49) When an advertisement marker 730, indicating a pre-set playout time for advertisement content, is reached, the video player decides whether to play the advertisement content associated with the marker (if a calculated content importance rating is below a threshold value) or to postpone playout of the advertisement content (if a calculated content importance rating is above a threshold value).

(50) The process of receiving 740 video content, obtaining image features and audio features within a 30 s rolling measurement window, and calculating 724 a content importance rating is repeated for the duration of the live streaming session.

(51) Corresponding embodiments apply to the end-user terminals 800, 900 described below.

(52) Referring to FIG. 9, an embodiment of the invention provides an end-user terminal 800 arranged to control playout of advertisement content during live video streaming. The end-user terminal comprises a receiver 810 and a processor 812.

(53) The receiver 810 is arranged to receive advertisement content from an advertisement server 820 and to receive live streamed video content from a content delivery network 830. The receiver is also arranged to play out the received video content and advertisement content.

(54) The processor 812 is arranged to obtain at least one of image features and audio features of the video content during playout. The processor is further arranged to calculate a content importance rating of video content to be played out during a prediction time window based on said features. The processor is also arranged to postpone playout of advertisement content scheduled to be played out during the prediction time window if the calculated content importance rating for the prediction time window is above a threshold value.

(55) In an embodiment, the processor 812 is arranged to determine, when a pre-set playout time for advertisement content is reached, whether the video content to be played out during a prediction time window following the pre-set playout time has a content importance rating above the threshold value. The processor is further arranged to postpone playout of advertisement content if the video content to be played out during said prediction time window has a content importance rating that is above the threshold value.

(56) In an embodiment, the processor 812 is arranged to obtain image features by identifying objects within the video content. The processor is also arranged to obtain audio features by obtaining representations of power spectra of sounds within the video content.

(57) In an embodiment, the processor 812 is arranged to obtain additional features comprising at least one of a video frame rate, a video bit rate, a video contrast, a spatial complexity, a temporal complexity, a sound level, proximity of a user to the end-user terminal, number of running applications on the end-user terminal, and a current time of day. The processor is arranged to calculate the content importance rating based on the image features, the audio features and the additional features.

(58) In an embodiment, the processor 812 is arranged to predict the content importance rating by: determining a first criticality probability of the image features and applying a first weight to the first criticality probability; determining a second criticality probability of the audio features and applying a second weight to the second criticality probability; determining a third criticality probability of the additional features and applying a third weight to the third criticality probability; and summing the first, second and third weighted probabilities.

(59) In an embodiment, the processor 812 is arranged to obtain the image features, audio features and additional features from video content within a measurement time window. The processor is arranged to predict a content importance rating of video content to be played out during a prediction time window following the measurement time window. The measurement and prediction time windows are rolling time windows.

(60) Referring to FIG. 10, an embodiment of the invention provides an end-user terminal 900 arranged to control playout of advertisement content during live video streaming. The end-user terminal comprises advertisement content processing circuitry 910, playback buffer processing circuitry 912, video player processing circuitry 916 and calculation processing circuitry 914.

(61) The advertisement content processing circuitry is arranged to receive advertisement content from an advertisement server 820. The playback buffer processing circuitry is arranged to receive video content from a content delivery network, CDN, 830. The video player processing circuitry is arranged to play out video content from the playback buffer and to play out advertisement content from the advertisement content processing circuitry. The video player is arranged to obtain at least one of image features and audio features of the video content during playout. The calculation processing circuitry is arranged to calculate a content importance rating of video content to be played out during a prediction time window based on said features. The video player processing circuitry is arranged to postpone playout of advertisement content scheduled to be played out during the prediction time window if the calculated content importance rating for the prediction time window is above a threshold value.

(62) An embodiment of the invention provides a computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out any of the above steps of the method of controlling playout of advertisement content during live video streaming at an end-user terminal.

(63) An embodiment of the invention provides a data carrier having computer readable instructions embodied therein. The said computer readable instructions are for providing access to resources available on a processor. The computer readable instructions comprise instructions to cause the processor to perform any of the above steps of the method of controlling playout of advertisement content during live video streaming at an end-user terminal.

(64) In an embodiment, the data carrier is a non-transitory data carrier.