METHOD FOR RECOMMENDING VIDEO CONTENT

20220107978 · 2022-04-07

Abstract

A method of recommending video content using a computer-based system, the method including providing an initial set including a plurality of videos; extracting a digital audio signal from each of the plurality of videos; determining at least one temporal sequence of low-level audio features for each digital audio signal of the plurality of videos by analyzing the digital audio signals; calculating an audio similarity index between each of the plurality of videos by comparing their respective at least one temporal sequence of low-level audio features; receiving a query Q comprising reference to a seed video; the seed video being one of the plurality of videos; determining, for the seed video, a ranking of the rest of the initial set of videos based on their audio similarity index with respect to the seed video; and returning, as a reply to the query Q, an ordered set of video references according to the ranking.

Claims

1-17. (canceled)

18. A method of recommending video content using a computer-based system, the method comprising: providing an initial set of a plurality of videos; extracting a digital audio signal from each of the plurality of videos; determining at least one temporal sequence of low-level audio features for each digital audio signal of the plurality of videos by analyzing the digital audio signal from each of the plurality of videos; calculating an audio similarity index between each of the plurality of videos by comparing respective ones of the at least one temporal sequence of low-level audio features; receiving, from an input device of the computer-based system, a query Q comprising a reference to a seed video, the seed video being one of the plurality of videos; determining a ranking for the seed video, the ranking consisting of a comparison of the audio similarity index of a rest of the initial set of the plurality of videos to the audio similarity index of the seed video; and returning to a display device of the computer-based system, as a reply to the query Q, an ordered set of video references according to the ranking.

19. The method according to claim 18, wherein a duration of each digital audio signal corresponds to a duration of a video of the plurality of videos it was extracted from, wherein the method further comprises: dividing each digital audio signal into a plurality of audio segments; and determining at least one of a temporal sequence of low-level audio features and at least one high-level feature vector V.sub.f for at least one of the plurality of audio segments; and wherein calculating the audio similarity index between each of the plurality of videos comprises comparing at least one of a temporal sequence of low-level audio features or at least one high-level feature vector V.sub.f of respective ones of the plurality of audio segments of each video of the plurality of videos.

20. The method according to claim 19, wherein the plurality of audio segments cover a whole duration of the respective digital audio signal, and wherein the plurality of audio segments have equal segment duration L.sub.s, wherein the segment duration is between 1 s<L.sub.s<60 s.

21. The method according to claim 19, further comprising: determining a temporal arrangement of the plurality of audio segments for each digital audio signal; and wherein calculating the audio similarity index between each of the plurality of videos further comprises taking into account the temporal arrangement of respective ones of the plurality of audio segments.

22. A method of recommending video content using a computer-based system, the method comprising: providing an initial set of a plurality of videos; extracting a digital audio signal from each of the plurality of videos; determining at least one temporal sequence of low-level audio features for the digital audio signal of each of the plurality of videos by analyzing the digital audio signal; calculating at least one high-level feature vector V.sub.f for the digital audio signal of each of the plurality of videos by analyzing the at least one temporal sequence of low-level audio features, wherein elements of the high-level feature vector V.sub.f each represent a high-level audio feature associated with the digital audio signal; calculating an audio similarity index between each of the plurality of videos by calculating a respective pairwise distance D.sub.p between the high-level feature vectors V.sub.f in the vector space, wherein a shorter pairwise distance D.sub.p represents a higher degree of similarity between respective ones of the digital audio signal of the plurality of videos; receiving, from an input device of the computer-based system, a query Q comprising a reference to a seed video, the seed video being one of the plurality of videos; determining a ranking for the seed video, the ranking comprising a comparison of the audio similarity index of the seed video to a rest of the initial set of videos; and returning to a display device of the computer-based system, as a reply to the query Q, an ordered set of video references according to the ranking.

23. The method according to claim 22, wherein each of the at least one high-level feature vectors V.sub.f comprises a number n.sub.f of elements, wherein each of the elements is a real or integer number, and represents one of a perceived musical characteristic corresponding to one or more of a musical style, musical genre, musical sub-genre, rhythm, tempo, vocals or instrumentation; or a perceived emotional characteristic corresponding to a mood of a respective one of the digital audio signal, and wherein 1≤n.sub.f≤256.

24. The method according to claim 22, wherein calculating the respective pairwise distance D.sub.p between the high-level feature vectors V.sub.f comprises: applying Dynamic Time Warping (DTW) between the high-level feature vectors V.sub.f, wherein the shorter pairwise distance D.sub.p between respective ones of the digital audio signal in the vector space represents a higher degree of similarity.

25. The method according to claim 22, wherein calculating the at least one high-level feature vector V.sub.f for each digital audio signal further comprises: calculating at least one 2-dimensional low-level audio feature matrix for each digital audio signal based on their respective at least one temporal sequence of low-level audio features; feeding at least one of the low-level audio feature matrices or the digital audio signal into a Machine Learning, ML, engine; and calculating, using a respective output of the ML engine, at least one high-level feature vector V.sub.f for each digital audio signal; wherein at least one of the low-level audio features is a Mel Frequency Cepstrum Coefficient (MFCC) vector, a Mel-spectrogram, a Constant-Q transform, a Variable-Q transform, or a Short Time Fourier Transform (STFT).

26. The method according to claim 22, wherein a duration of the digital audio signal corresponds to a duration of a corresponding one of the plurality of videos the digital audio signal was extracted from, wherein the method further comprises: dividing each digital audio signal into a plurality of audio segments; and determining at least one of a temporal sequence of low-level audio features and at least one high-level feature vector V.sub.f for at least one of the plurality of audio segments; and wherein calculating the audio similarity index between each of the plurality of videos comprises comparing at least one of a temporal sequence of low-level audio features or at least one high-level feature vector V.sub.f of respective ones of the plurality of audio segments of each video of the plurality of videos.

27. The method according to claim 26, wherein the plurality of audio segments cover a whole duration of the respective digital audio signal, and wherein the plurality of audio segments have equal segment duration L.sub.s, wherein the segment duration L.sub.s is between 1 s<L.sub.s<60 s.

28. The method according to claim 26, further comprising: determining the temporal arrangement of the plurality of audio segments for each digital audio signal; and wherein calculating the audio similarity index between each of the plurality of videos further comprises taking into account the temporal arrangement of respective ones of the plurality of audio segments.

29. The method according to claim 22, wherein the videos in the initial set comprise pieces of metadata, each piece of the metadata comprising textual information associated with the respective video such as title, description, tags, keywords, or MPEG-7 metadata, the method further comprising: extracting metadata from each of the plurality of videos; calculating a metadata similarity index between each of the plurality of videos based on the degree of similarity between their respective metadata; wherein the ranking of the rest of the initial set of videos is further adjusted by ensembling the calculations of the respective similarity indexes of each video with respect to the seed video.

30. The method according to claim 22, the method further comprising: collecting online data by analyzing online sources referring to the plurality of videos, the online data representing similarities between the plurality of videos based on at least one of Collaborative Filtering, CF, and associated editorial content; calculating an online similarity index between each of the plurality of videos based on the online data; wherein the ranking of the rest of the initial set of videos is further adjusted by ensembling the calculations of the respective similarity indexes of each video with respect to the seed video.

31. The method according to claim 22, the method further comprising receiving the query Q from a user; extracting user preference data associated with the user from a user profile database, the user preference data representing the given user's preferences regarding the ranking of the plurality of videos based on at least one of: previously recorded user interactions with at least one of the plurality of videos, the user interactions comprising at least one of playing, skipping, rewinding, repeating, adding to a playlist or liking a video, information on any video played by the given user before receiving the query Q, date and time of receiving the query Q, and location of the given user when receiving the query Q; adjusting the ranking of the rest of the initial set of videos according to the user preference data; returning to the user, as a reply to the query Q, an ordered set of videos according to the adjusted ranking.

32. The method according to claim 22, the method further comprising displaying on the display device, as part of a user interface, a seed video selector area comprising a plurality of visual representations T.sub.1 . . . n, each visual representation T representing one video from the initial set of videos, determining through an input device when a user selects one of the visual representations T of the videos from the seed video selector area, determining for the query Q the reference to a seed video according to the selected visual representation, and displaying, as part of the user interface, a video recommendation area comprising a plurality of visual representations T.sub.1 . . . m, wherein each visual representation T represents one video from the initial set of videos, and wherein the plurality of visual representations T.sub.1 . . . m are ordered according to the ranking.

33. The method according to claim 32, the method further comprising displaying, as part of the user interface, a recommendation adjustment area comprising visual means for dynamically adjusting the order of the visual representations T.sub.1 . . . m in the video recommendation area, determining, according to user interaction with the visual means, an adjusted order of the visual representations T.sub.1 . . . m, by one of adjusting the weight with which a user preference data is taken into account when calculating the ranking, or adjusting the weight with which different similarity indexes, such as a metadata similarity index, an online similarity index, or a visual similarity index is taken into account during ensembling calculations for determining the ranking, wherein the visual means comprise at least one of a graphical element, such as a slider, or a numerical input field, and displaying the visual representations T.sub.1 . . . m on the user interface according to the adjusted order.

34. A non-transitory computer-readable storage medium having stored thereon a computer program product operable to cause a computer to perform the method of claim 18.

35. A non-transitory computer-readable storage medium having stored thereon a computer program product operable to cause a computer to perform the method of claim 22.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0064] In the following detailed portion of the present disclosure, the aspects, embodiments and implementations will be explained in more detail with reference to the example embodiments shown in the drawings, in which:

[0065] FIG. 1 shows a flow diagram of a method of recommending video content using a computer-based system in accordance with the first aspect;

[0066] FIG. 2 shows a flow diagram of a method in accordance with a possible implementation form of the first or second aspect, wherein each audio signal is divided into a plurality of audio segments;

[0067] FIG. 3 shows a flow diagram illustrating the steps of calculating the audio similarity index in accordance with the second aspect;

[0068] FIG. 4 shows a flow diagram illustrating the steps of calculating a high-level feature vector V.sub.f for a digital audio signal in accordance with a possible implementation form of the second aspect;

[0069] FIG. 5 shows a flow diagram illustrating the steps of determining the ranking of the initial set of videos, taking into account metadata extracted from each video, according to a possible implementation form of the first or second aspect.

[0070] FIG. 6 shows a flow diagram illustrating the steps of determining the ranking of the initial set of videos, taking into account online data from online sources, according to a possible implementation form of the first or second aspect.

[0071] FIG. 7 shows a flow diagram illustrating the steps of determining the ranking of the initial set of videos, taking into account user preference data extracted from a user profile database, according to a possible implementation form of the first or second aspect.

[0072] FIG. 8 shows a flow diagram illustrating the steps of determining the ranking of the initial set of videos, taking into account digital visual signals extracted from each video, according to a possible implementation form of the first or second aspect.

[0073] FIG. 9 shows a block diagram of a computer-based system in accordance with a possible implementation form of the third aspect.

[0074] FIGS. 10A and 10B show illustrative representations of a user interface to be displayed by the display device illustrated on FIG. 9, in accordance with a possible implementation form of the third aspect.

DETAILED DESCRIPTION

[0075] FIG. 1 shows a flow diagram of a method for recommending video content in accordance with the present disclosure, using a computer or computer-based system such as, for example, the system shown on FIG. 9.

[0076] In the first step 101 there is provided an initial set of a plurality of videos 1.

[0077] In this disclosure, the term ‘video’ refers to a digital medium, such as a video file or record, for the recording, copying, playback, broadcasting, and display of moving visual media. The videos can be of any duration and encoded by any known video encoding method or standard. A video file normally consists of a container holding a digital visual signal in a visual coding format alongside a digital audio signal in an audio coding format. The container may also contain synchronization information, subtitles, and metadata such as a title. Videos are generally stored using lossy compression to reduce the video file size.

[0078] The initial set of videos contains at least three videos (so that a ranking can be established, based on a selected seed video, among the remaining plurality of videos), more preferably at least ten videos. In some embodiments, especially when using a Machine Learning engine, a larger set of videos is preferred to improve the accuracy of the method. The set of videos can be provided locally on a storage medium 31 of the computer-based system 30, or can be stored on a remote data server.

[0079] In a next step 102 a digital audio signal 2 is extracted from each of the plurality of videos 1.

[0080] In this disclosure “digital audio signal” refers to any sound that has been recorded or converted into digital form, where the sound wave is encoded as numerical samples in a continuous sequence. The number of samples obtained in one second is called the sampling frequency. An example of an encoding format for digital audio signals, generally referred to as “CD audio quality”, uses a sampling rate of 44,100 samples per second; however, it should be understood that any suitable sampling rate can be used for storing the digital audio signal. In a particularly advantageous implementation the audio signal is sampled at 22050 Hz and converted to mono by averaging the left and right audio channels.
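
As an illustration of this downmix, the following sketch averages the two channels of a hypothetical stereo clip sampled at 22050 Hz; the array layout and the test tones are assumptions for illustration only, not part of the disclosure:

```python
import numpy as np

def to_mono(stereo: np.ndarray) -> np.ndarray:
    """Downmix a (n_samples, 2) stereo array by averaging left and right."""
    return stereo.mean(axis=1)

# Hypothetical one-second stereo clip at 22050 Hz (two sine tones).
sr = 22050
t = np.arange(sr) / sr
stereo = np.stack([np.sin(2 * np.pi * 440 * t),
                   np.sin(2 * np.pi * 660 * t)], axis=1)
mono = to_mono(stereo)  # shape (22050,)
```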

[0081] More specifically, in this disclosure “digital audio signal” refers to the audio part stored in the container of a video file in an audio coding format as described above.

[0082] The digital audio signals can be extracted from the containers of the video files using any known digital audio extraction method and can be stored on a storage medium 31 of the computer-based system 30, from where they can be further processed.

[0083] In a next step 103 at least one temporal sequence 4 of low-level audio features is determined for each digital audio signal 2.

[0084] The term ‘low-level audio feature’ in this context refers to numerical values describing the contents of an audio signal on a signal level (as opposed to high-level features, which refer to an abstracted, symbolic level), which are determined according to different kinds of inspections such as temporal, spectral, etc. In particular, the temporal sequence of low-level audio features in this context may refer to a Mel-spectrogram, a Mel Frequency Cepstrum Coefficient (MFCC) vector, a Constant-Q transform, a Variable-Q transform, or a Short Time Fourier Transform (STFT). Further examples may include, but are not limited to, fast Fourier transforms (FFTs), digital Fourier transforms (DFTs), Modified Discrete Cosine Transforms (MDCTs), Modified Discrete Sine Transforms (MDSTs), Quadrature Mirror Filters (QMFs), Complex QMFs (CQMFs), discrete wavelet transforms (DWTs), or wavelet coefficients.
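
As a minimal sketch of one such low-level feature, an STFT magnitude can be computed by framing the signal, applying a Hann window, and taking an FFT per frame; the frame length, hop size, and 1 kHz test tone below are illustrative assumptions:

```python
import numpy as np

def stft_magnitude(x: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Short Time Fourier Transform magnitude: one spectrum per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)

# One second of a 1 kHz tone at 22050 Hz.
x = np.sin(2 * np.pi * 1000 * np.arange(22050) / 22050)
S = stft_magnitude(x)
```

Each row of `S` is then one step of a temporal sequence of low-level audio features.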

[0085] The object of this feature extraction step is to transform the input audio signal into a new space of variables that simplifies further analysis and processing.

[0086] In a next step 104 an audio similarity index 5 between each of the plurality of videos 1 is calculated by comparing their respective temporal sequence(s) 4 of low-level audio features. The audio similarity index 5 represents the similarity of the audio signals extracted from and associated with the videos. The details of the calculation are explained below in detail. The audio similarity indexes 5 may be stored (and optionally visualized) in the form of an audio similarity matrix 6, wherein each row and column represents one of the plurality of videos, and each value in the matrix 6 represents the audio similarity index 5 between the two videos that its column and row represent. Thus, the diagonal values of the matrix 6 will always be the highest values, as they show the highest possible degree of (self-)similarity.
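
One way such an audio similarity matrix 6 could be assembled, assuming each video has already been reduced to a single feature vector, is cosine similarity between rows; the random features below stand in for real audio descriptors:

```python
import numpy as np

def audio_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Cosine similarity between per-video feature vectors.

    features: (n_videos, n_dims), one row per video. The result is
    symmetric, with the diagonal equal to 1 (maximal self-similarity).
    """
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    return unit @ unit.T

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # four hypothetical videos
M = audio_similarity_matrix(features)
```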

[0087] In a next step 105 a query Q comprising a reference to a seed video is received, wherein the seed video is one of the plurality of videos 1. The query Q may originate from a user 16 and be received via an input device 36. The user 16 may input the query Q using a dedicated user interface such as the user interface 380 explained below in detail.

[0088] In a next step 106 a ranking 7 of the rest of the initial set of videos 1 is determined based on their audio similarity index 5 with respect to the seed video. Possible implementations of determining the ranking 7 are explained below in detail. The ranking 7 can be temporarily stored in a memory 33 of the computer-based system 30.

[0089] In a next, final step 107 an ordered set of video references according to the ranking 7 is returned as a reply to the query Q. The ranking can be read from the memory 33 and displayed to the user 16 via the display device 38 with the help of a dedicated user interface such as the user interface 380 explained below in detail.
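
Steps 105 to 107 can be sketched as a single lookup into a precomputed similarity matrix; the 4×4 matrix below is a hypothetical stand-in for the audio similarity matrix 6:

```python
import numpy as np

def recommend(similarity: np.ndarray, seed: int) -> list[int]:
    """Return the remaining video indices ordered by similarity to the seed."""
    scores = similarity[seed].copy()
    scores[seed] = -np.inf              # exclude the seed video itself
    order = np.argsort(scores)[::-1]    # highest similarity first
    return [int(i) for i in order if i != seed]

M = np.array([[1.0, 0.2, 0.9, 0.5],
              [0.2, 1.0, 0.1, 0.4],
              [0.9, 0.1, 1.0, 0.3],
              [0.5, 0.4, 0.3, 1.0]])
ranking = recommend(M, seed=0)  # video 2 is most similar to video 0
```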

[0090] FIG. 2 shows a flow diagram illustrating a possible implementation of the method, wherein each audio signal is divided into a plurality of audio segments.

[0091] In this implementation, steps and features that are the same or similar to corresponding steps and features previously described or shown herein are denoted by the same reference numeral as previously used for simplicity.

[0092] In a first step 1031 each digital audio signal 2 is divided into a plurality of audio segments 3. These audio segments 3 can be overlapping or non-overlapping, and have differing or equal segment durations between 1 s<L.sub.s<60 s, more preferably between 5 s<L.sub.s<30 s. In a particularly advantageous example the audio segments 3 are non-overlapping and have equal segment duration L.sub.s=15 s.
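
A sketch of this segmentation step, assuming non-overlapping 15 s segments and dropping any trailing partial segment (the disclosure does not specify how remainders are handled):

```python
import numpy as np

def segment(signal: np.ndarray, sr: int, seg_seconds: float = 15.0) -> list:
    """Split a mono signal into non-overlapping segments of seg_seconds."""
    seg_len = int(seg_seconds * sr)
    n_full = len(signal) // seg_len  # trailing partial segment is dropped
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]

sr = 22050
one_minute = np.zeros(60 * sr)       # hypothetical 60-second audio signal
segments = segment(one_minute, sr)   # four 15-second segments
```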

[0093] In a next step 1032 at least one temporal sequence 4 of low-level audio features for each audio segment 3 is determined by analyzing the audio segments 3 in accordance with the feature extraction step 103 explained above.

[0094] In a next step 1033 an audio similarity index 5 between each of the plurality of videos 1 is calculated by comparing the respective at least one temporal sequence 4 of low-level audio features of at least one of their audio segments 3. The calculation and storage of data is similar to the audio similarity index calculation step 104 explained above, only with more complex calculations due to the increased granularity.

[0095] In a next, optional step 1034 the temporal arrangement of the plurality of audio segments 3 for each digital audio signal 2 is also determined, and taken into account when calculating 1033 the audio similarity index 5 between each of the plurality of videos 1.

[0096] FIG. 3 shows a flow diagram illustrating core steps of calculating the audio similarity index 104 in accordance with a possible implementation of the method. In this implementation, steps and features that are the same or similar to corresponding steps and features previously described or shown herein are denoted by the same reference numeral as previously used for simplicity.

[0097] In a first step 1043 at least one high-level feature vector V.sub.f for each digital audio signal 2 or segment 3 is calculated by analyzing its at least one temporal sequence 4 of low-level audio features. The elements of the high-level feature vector V.sub.f each represent a high-level audio feature associated with the digital audio signal 2 or segment 3.

[0098] As explained above the term ‘low-level audio feature’ in this context refers to numerical values describing the contents of an audio signal on a signal level and are determined according to different kinds of inspections (such as temporal, spectral, etc.), whereas ‘high-level audio feature’ refers to numerical values on an abstracted, symbolic level determined from those low-level audio features and may represent a perceived musical characteristic corresponding to the style, genre, rhythm, tempo, or instrumentation; or a perceived emotional characteristic corresponding to the mood of the respective digital audio signal 2 or segment 3.

[0099] Each of the high-level feature vectors V.sub.f comprises n.sub.f elements, and each of the elements is a real or integer number. The number of elements may range between 1≤n.sub.f≤256, more preferably between 1≤n.sub.f≤100. In an advantageous example the number of elements is between 1≤n.sub.f≤34.

[0100] In a next step 1044 the respective pairwise (Euclidean) distance D.sub.p between the high-level feature vectors V.sub.f is calculated in the vector space. Here, the shorter pairwise (Euclidean) distance D.sub.p represents a higher degree of similarity between the respective digital audio signals 2 or segments 3.
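
The pairwise Euclidean distances D.sub.p can be computed for all vector pairs at once via broadcasting, as in this sketch (the three 2-element vectors are illustrative only):

```python
import numpy as np

def pairwise_distances(vectors: np.ndarray) -> np.ndarray:
    """Euclidean distance D_p between every pair of high-level vectors V_f."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

V = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 1.0]])
D = pairwise_distances(V)  # D[0, 1] == 5.0 (a 3-4-5 triangle)
```

A shorter entry in `D` corresponds to a higher degree of similarity between the respective signals.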

[0101] In an embodiment the respective pairwise distances D.sub.p between the high-level feature vectors V.sub.f are calculated with the inclusion of an optional step 1045 whereby Dynamic Time Warping (DTW) is applied between the high-level feature vectors V.sub.f. As above, the shorter pairwise (Euclidean) distance D.sub.p between the respective digital audio signals 2 or segments 3 in the vector space represents a higher degree of similarity.
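
A minimal textbook implementation of DTW over sequences of feature vectors could look as follows; this is a sketch of the standard dynamic-programming recurrence, not the disclosure's specific implementation:

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic Time Warping distance between two sequences of vectors.

    a: (n, d) and b: (m, d); the local cost is the Euclidean distance
    between aligned frames, accumulated along the cheapest warping path.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [1.0], [1.0], [2.0]])  # same shape, stretched in time
```

Because DTW warps the time axis, the stretched sequence `b` still matches `a` perfectly.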

[0102] FIG. 4 shows a flow diagram illustrating the remaining steps of calculating a high-level feature vector V.sub.f for a digital audio signal in accordance with a possible implementation of the method. In this implementation, steps and features that are the same or similar to corresponding steps and features previously described or shown herein are denoted by the same reference numeral as previously used for simplicity.

[0103] In a first step 1041 at least one 2-dimensional low-level audio feature matrix 8 for each digital audio signal 2 or segment 3 is calculated based on their respective at least one temporal sequence 4 of low-level audio features.

[0104] In a next step 1042 at least one of the low-level audio feature matrices 8 or the digital audio signal 2 or segment 3 is fed into a Machine Learning (ML) engine.

[0105] In a next step 1043 at least one high-level feature vector V.sub.f is calculated for each digital audio signal 2 or segment 3 using the respective output of the ML engine.
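
The disclosure leaves the ML engine itself unspecified. As a hedged stand-in, simple statistical pooling over the time axis of a low-level feature matrix 8 also yields a fixed-length vector V.sub.f, which is what the engine's output is used for here:

```python
import numpy as np

def embed(feature_matrix: np.ndarray) -> np.ndarray:
    """Stand-in for the ML engine: pool a (n_frames, n_bins) low-level
    feature matrix into a fixed-length vector V_f by concatenating the
    per-bin mean and standard deviation over time."""
    return np.concatenate([feature_matrix.mean(axis=0),
                           feature_matrix.std(axis=0)])

rng = np.random.default_rng(1)
matrix = rng.normal(size=(85, 20))  # e.g. 85 feature frames, 20 bins
V_f = embed(matrix)                 # fixed length regardless of n_frames
```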

[0106] FIG. 5 shows a flow diagram illustrating the steps of determining the ranking of the initial set of videos taking into account metadata extracted from each video in accordance with a possible implementation of the method. In this implementation, steps and features that are the same or similar to corresponding steps and features previously described or shown herein are denoted by the same reference numeral as previously used for simplicity.

[0107] In a first step 201, after providing 101 an initial set of a plurality of videos 1, metadata 10 is extracted from each of the plurality of videos 1. As explained above, the containers of the videos 1 in the initial set may comprise pieces of metadata 10. A piece of the metadata 10 may refer to any kind of textual information about the content of the respective video such as title, description, tags, keywords, or MPEG-7 metadata, as well as, in the case of e.g. a movie, the transcripts of conversations or text descriptions of its scenes. There are two sources from which video metadata is usually derived: operationally gathered metadata, which contains information about the content produced, such as the type of equipment, software, date, and location; and human-authored metadata, which may improve search engine visibility, discoverability, and audience engagement, and may provide advertising opportunities to video publishers. Thus, video metadata can be created either by automated information processing or by manual work.

[0108] In a next step 202 a metadata similarity index 11 between each of the plurality of videos 1 is calculated based on the degree of similarity between their respective metadata 10. The calculation of the metadata similarity index 11 may happen in a similar fashion as the calculations explained above in connection with calculating the audio similarity index 5 or using any other known method for calculating similarities between pieces of textual information. The resulting metadata similarity index 11 may be stored in a metadata similarity matrix 12. The metadata similarity matrix 12 may then be physically stored on a storage medium 31 of the computer-based system 30.
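
One elementary way to score similarity between pieces of textual metadata, shown here purely as an illustration of the idea, is the Jaccard index over token sets:

```python
def metadata_similarity(meta_a: str, meta_b: str) -> float:
    """Jaccard similarity between two pieces of textual metadata:
    shared tokens divided by the union of tokens."""
    tokens_a = set(meta_a.lower().split())
    tokens_b = set(meta_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Hypothetical tag strings for two videos.
s = metadata_similarity("space drama thriller", "space thriller comedy")
```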

[0109] In a next step 203 the calculations of the respective similarity indexes of each video are ensembled to determine 106 the ranking 7 of the rest of the initial set of videos 1 with respect to the seed video. The similarity indexes may refer to any similarity index other than the metadata similarity index that has been calculated in the manners explained in the present disclosure, such as: an audio similarity index, an online similarity index, or a visual similarity index.
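
The ensembling of the respective similarity indexes could, for example, be a weighted average of the corresponding similarity matrices; the weights below are illustrative tuning parameters, not values from the disclosure:

```python
import numpy as np

def ensemble(indexes: list, weights: list) -> np.ndarray:
    """Combine several similarity matrices into one by weighted average."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()  # normalize so the result stays on the same scale
    return sum(wi * m for wi, m in zip(w, indexes))

audio = np.array([[1.0, 0.8], [0.8, 1.0]])  # hypothetical audio similarity
meta = np.array([[1.0, 0.2], [0.2, 1.0]])   # hypothetical metadata similarity
combined = ensemble([audio, meta], weights=[3.0, 1.0])
```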

[0110] FIG. 6 shows a flow diagram illustrating the steps of determining the ranking of the initial set of videos taking into account online data from online sources in accordance with a possible implementation of the method. In this implementation, steps and features that are the same or similar to corresponding steps and features previously described or shown herein are denoted by the same reference numeral as previously used for simplicity.

[0111] In a first step 301 after providing 101 an initial set of a plurality of videos 1 online data 13 is collected by analyzing online sources referring to the plurality of videos 1. The online data 13 may represent similarities between the plurality of videos 1 based on Collaborative Filtering (CF), or associated editorial content.

[0112] Collaborative Filtering (CF) generally refers to a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A likes the same type or piece of content as a person B in a particular context (such as music), A is more likely to have a taste similar to B's in a different context (such as movies) than that of a randomly chosen person. For example, a collaborative filtering recommendation system for television tastes could make predictions about which television show a user should like given a partial list of that user's tastes (likes or dislikes). This means that these predictions are specific to the user, but use information collected from many other users.
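
A minimal item-item collaborative-filtering sketch, assuming a binary user-video interaction matrix (the matrix below is hypothetical): videos liked by the same users receive a high similarity score.

```python
import numpy as np

def item_similarity(interactions: np.ndarray) -> np.ndarray:
    """Item-item CF: cosine similarity between the columns of a
    (n_users, n_videos) 0/1 interaction matrix."""
    norms = np.linalg.norm(interactions, axis=0, keepdims=True)
    unit = interactions / np.where(norms == 0, 1, norms)
    return unit.T @ unit

# Rows are users, columns are videos; 1 means the user liked the video.
R = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)
S = item_similarity(R)  # videos 0 and 1 are liked by the same users
```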

[0113] Associated editorial content refers to official or user-generated content that can be associated with a video, such as reviews, articles, or other kinds of textual content to be found online that may serve as a basis for deriving similarities (e.g. based on classification, review score, etc.). The two preferred databases for collecting such information for the method are IMDB and OMDB.

[0114] In a next step 302 an online similarity index 14 between each of the plurality of videos 1 is calculated based on the degree of similarity between their respective online data 13. The calculation of the online similarity index 14 may happen in a similar fashion as the calculations explained above in connection with calculating the audio similarity index 5 or using any other known method for calculating similarities between pieces of textual information. The resulting online similarity index 14 may be stored in an online similarity matrix 15. The online similarity matrix 15 may then be physically stored on a storage medium 31 of the computer-based system 30.

[0115] In a next step 303 the calculations of the respective similarity indexes of each video are ensembled to determine 106 the ranking 7 of the rest of the initial set of videos 1 with respect to the seed video. The similarity indexes may refer to any similarity index other than the online similarity index that has been calculated in the manners explained in the present disclosure, such as: an audio similarity index, a metadata similarity index, or a visual similarity index.

[0116] FIG. 7 shows a flow diagram illustrating the steps of determining the ranking of the initial set of videos taking into account user preference data extracted from a user profile database in accordance with a possible implementation of the method. In this implementation, steps and features that are the same or similar to corresponding steps and features previously described or shown herein are denoted by the same reference numeral as previously used for simplicity.

[0117] In a first step 401 after receiving 105 a query Q from a user 16, user preference data 17 associated with the user 16 is extracted from a user profile database 18. In the present example the user profile database 18 is stored on a storage medium 31 of the computer-based system 30 and can thus be directly accessed upon receiving the query Q. However, in other possible embodiments the user profile database 18 can also be stored on a server that is configured to store and provide data to a client device such as the computer-based system 30. The user preference data 17 refers to any kind of information that represents the given user's taste that can be translated into preferences regarding the ranking 7 of the plurality of videos 1. In an embodiment the user preference data 17 is based on at least one of

[0118] previously recorded user interactions with at least one of the plurality of videos 1, the user interactions comprising at least one of playing, skipping, rewinding, repeating, adding to a playlist or liking a video 1,

[0119] information on any video 1 played by the given user before receiving the query Q,

[0120] date and time of receiving the query Q, and location of the given user when receiving the query Q.

Alternatively or in addition, the preference data 17 can also be derived from online profiles that the user created on other 3rd-party services, such as a music streaming service.

[0121] In a next step 402 the ranking 7 of the rest of the initial set of videos 1 is adjusted according to the user preference data 17. This adjustment can be done automatically by averaging or assigning weights to certain sources of information. The adjustment can also be done manually via a user interface 380 in a manner explained below. The result of the adjustment is an adjusted ranking 7A of the videos that can be stored on a memory 33 of the computer-based device 30.
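The automatic adjustment of step 402 can be sketched as blending each video's content-similarity score with a per-video preference affinity. The affinity values and the blending weight below are illustrative assumptions; the disclosure only requires that the adjustment be done by averaging or assigning weights.

```python
def adjust_ranking(base_scores, preference_scores, preference_weight=0.3):
    """Blend similarity-based scores with user-preference affinities.

    `preference_scores` is a hypothetical per-video affinity in [0, 1]
    derived from interactions such as playing, liking, or skipping.
    Videos without preference data keep their base score unchanged.
    """
    adjusted = {}
    for vid, base in base_scores.items():
        pref = preference_scores.get(vid)
        if pref is None:
            adjusted[vid] = base
        else:
            adjusted[vid] = ((1 - preference_weight) * base
                             + preference_weight * pref)
    # Adjusted ranking 7A: highest blended score first.
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

With a large enough weight, a strongly preferred video can overtake one that is merely more similar to the seed, which is exactly the effect the adjusted ranking 7A is meant to capture.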

[0122] In a next step 107 an ordered set of videos 1 according to the adjusted ranking 7A is returned to the user 16, as a reply to the query Q. The ranking can be read from the memory 33 and displayed to the user 16 via the display device 38 with the help of a dedicated user interface such as the user interface 380 explained below in detail.

[0123] FIG. 8 shows a flow diagram illustrating the steps of determining the ranking of the initial set of videos taking into account digital visual signals extracted from each video in accordance with a possible implementation of the method. In this implementation, steps and features that are the same or similar to corresponding steps and features previously described or shown herein are denoted by the same reference numeral as previously used for simplicity.

[0124] In a first step 501 after providing 101 an initial set of a plurality of videos 1 a digital visual signal 19 is extracted from each of the plurality of videos 1. As explained above the container of the video files in the initial set comprises digital visual signals 19 alongside digital audio signals 2 and other optional content such as synchronization information, subtitles, and metadata. These digital visual signals 19 can be extracted from the video file containers by any commonly known method.

[0125] In a next optional step 502 each digital visual signal 19 is divided into a plurality of visual segments 20. The division is determined in accordance with the segmentation of the respective digital audio signal 2 of the video as explained in detail above, with a resulting visual segment duration L.sub.s. These visual segments 20 can thus also be overlapping or non-overlapping, and have differing or equal segment durations between 1 s<L.sub.s<60 s, more preferably between 5 s<L.sub.s<30 s. In a particularly advantageous example the visual segments 20 are also non-overlapping and have equal segment durations of L.sub.s=15 s.
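The segmentation of step 502 can be sketched as computing (start, end) sample ranges from a segment length and an optional overlap. The sample rate and signal length below are illustrative; the function mirrors the non-overlapping 15 s example while allowing the overlapping variants the disclosure also covers.

```python
def segment_signal(num_samples, sample_rate, segment_s=15, overlap_s=0):
    """Split a signal of `num_samples` samples into fixed-length segments.

    With overlap_s=0 the segments are non-overlapping (the preferred
    15 s example); overlap_s > 0 yields overlapping segments. Returns
    (start, end) sample-index pairs; a shorter trailing segment is kept.
    """
    seg = int(segment_s * sample_rate)
    hop = int((segment_s - overlap_s) * sample_rate)
    if hop <= 0:
        raise ValueError("overlap must be shorter than the segment")
    return [(start, min(start + seg, num_samples))
            for start in range(0, num_samples, hop)]
```

For a one-minute signal this produces four non-overlapping 15 s segments, or six segments when a 5 s overlap is requested.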

[0126] In a next step 503 the digital visual signals 19 are processed to calculate 503 at least one visual feature vector V.sub.fv for each digital visual signal 19 or segment 20. The calculation of the visual feature vector V.sub.fv may happen in a similar fashion as the calculations explained above in connection with calculating the feature vectors V.sub.f.

[0127] In a next step 504 a visual similarity index 21 is calculated between each of the plurality of videos 1 based on the respective pairwise distance D.sub.pv between their associated visual feature vectors V.sub.fv in the vector space. Here the shorter pairwise distance D.sub.pv results in a higher visual similarity index 21 between the respective videos 1. The resulting visual similarity index 21 may be stored in a visual similarity matrix 22. The visual similarity matrix 22 may then be physically stored on a storage medium 31 of the computer-based system 30.
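The mapping of step 504 can be sketched as follows. The disclosure only requires that a shorter pairwise distance D.sub.pv yield a higher visual similarity index 21; the Euclidean distance and the 1/(1+d) mapping below are one common choice and are an assumption, not a requirement of the method.

```python
import math

def pairwise_distance(u, v):
    """Euclidean distance between two feature vectors in the vector space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity_from_distance(d):
    """Map a distance to a similarity index in (0, 1]: shorter distance,
    higher similarity, as required by step 504."""
    return 1.0 / (1.0 + d)

def visual_similarity_matrix(feature_vectors):
    """Square matrix of visual similarity indexes from per-video
    visual feature vectors (the visual similarity matrix 22)."""
    n = len(feature_vectors)
    return [[similarity_from_distance(
                 pairwise_distance(feature_vectors[i], feature_vectors[j]))
             for j in range(n)] for i in range(n)]
```

A video compared with itself has distance zero and thus the maximal similarity index 1.0, and the matrix is symmetric, matching the pairwise construction described above.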

[0128] In a next step 505 the calculations of the respective similarity indexes of each video are ensembled to determine 106 the ranking 7 of the rest of the initial set of videos 1 with respect to the seed video. The similarity indexes may refer to any similarity index other than the visual similarity index that has been calculated in the manners explained in the present disclosure, such as: an audio similarity index, a metadata similarity index, or an online similarity index.

[0129] FIG. 9 shows a schematic view of an illustrative computer-based system 30 in accordance with the present disclosure.

[0130] The computer-based system 30 can include a storage medium 31, a processor 32, a memory 33, a communications circuitry 34, a bus 35, an input interface 36, an audio output 37, and a display 38. The computer-based system 30 can include other components not shown in FIG. 9, such as a power supply for providing power to the components of the computer-based system. Also, while only one of each component is illustrated, the computer-based system 30 can include more than one of some or all of the components.

[0131] A storage medium 31 stores information and instructions to be executed by the processor 32. The storage medium 31 can be any suitable type of storage medium offering permanent or semi-permanent memory. For example, the storage medium 31 can include one or more storage mediums, including for example, a hard drive, Flash, or other EPROM or EEPROM. As described in detail above, the storage medium 31 can be configured to store digital video signals of a plurality of videos 1 and digital audio signals 2 that may originate from being extracted from the videos 1 using the computer-based system 30, in accordance with the present disclosure.

[0132] A processor 32 controls the operation and various functions of system 30. As described in detail above, the processor 32 can control the components of the computer-based system 30 to execute a method of recommending video content in accordance with the present disclosure. The processor 32 can include any components, circuitry, or logic operative to drive the functionality of the computer-based system 30. For example, the processor 32 can include one or more processors acting under the control of an application.

[0133] In some embodiments, the application can be stored in a memory 33. The memory 33 can include cache memory, Flash memory, read only memory, random access memory, or any other suitable type of memory. In some embodiments, the memory 33 can be dedicated specifically to storing firmware for a processor 32. For example, the memory 33 can store firmware for device applications.

[0134] A bus 35 may provide a data transfer path for transferring data to, from, or between a storage medium 31, a processor 32, a memory 33, a communications circuitry 34, and some or all of the other components of the computer-based system 30.

[0135] A communications circuitry 34 enables the computer-based system 30 to communicate with other devices, such as a server. For example, communications circuitry 34 can include Wi-Fi enabling circuitry that permits wireless communication according to one of the 802.11 standards or a private network. Other wired or wireless protocol standards, such as Bluetooth, can be used in addition or instead.

[0136] An input interface 36, an audio output 37, and a display 38 provide a user interface for a user to interact with the computer-based system 30.

[0137] The input interface 36 may enable a user to provide input and feedback to the computer-based system 30. The input interface 36 can take any of a variety of forms, such as one or more of a button, keypad, keyboard, mouse, dial, click wheel, touch screen, or accelerometer.

[0138] An audio output 37 provides an interface by which the computer-based system 30 can provide music and other audio elements to a user. The audio output 37 can include any type of speaker, such as computer speakers or headphones.

[0139] A display 38 can present visual media and can be configured to show a user interface 380 to the user. A display 38 can include, for example, a liquid crystal display, a touchscreen display, or any other type of display.

[0140] The computer-based system 30 can communicate with, or further comprise a user profile database 18. The user profile database 18 can be stored on a server that is configured to store and provide data to a client device such as the computer-based system 30. The server can store media and other data, such as user preference data 17, and the server can receive data download requests from a client device. The server can communicate with the client device over a communications link which can include any suitable wired or wireless communications link, or combinations thereof, by which data may be exchanged. For example, the communications link can include a satellite link, a fiber-optic link, a cable link, an Internet link, or any other suitable wired or wireless link.

[0141] FIGS. 10A and 10B show schematic views of an illustrative user interface 380 implemented to be displayed by the display device 38 of the computer-based system 30 in accordance with the present disclosure.

[0142] FIG. 10A shows an exemplary embodiment of the user interface 380 configured to be shown on the display device 38.

[0143] The user interface 380 may comprise a header menu 381 comprising menu buttons with text displayed on them such as ‘MOVIES’, ‘VIDEOS’, ‘TV’ that represent different initial sets of videos 1 to choose from.

[0144] The user interface 380 may further comprise a search bar 382 that enables a user 16 to manually enter a search string when looking for a particular seed video to start with.

[0145] The user interface 380 may further comprise a profile area 383 which can lead to an information page about the user 16 comprising their settings, personal video preferences, etc.

[0146] The user interface 380 comprises a seed video selector area 384 comprising a plurality of visual representations T.sub.1 . . . n, each visual representation T representing one video from an initial set of videos 1. The visual representations T.sub.1 . . . n may have equal or different sizes according to initial recommendation values that may be associated with them as preset values. Each visual representation T may be in the form of a cover image that generally is associated with the represented video to make a selection by a user 16 easy and straightforward without the need of any further information (such as metadata). The user 16 can select one of the visual representations T of the videos 1 from the seed video selector area 384, which results in sending a query Q to the computer-based system 30 comprising a reference to a seed video according to the selected visual representation.

[0147] The user interface 380 may further comprise a genre collection area 385 that may comprise menu buttons with text displayed on them such as ‘COMEDY’, ‘DRAMA’, ‘ACTION’ that represent different initial groupings of videos 1 to choose from according to pre-assigned genre values.

[0148] FIG. 10B shows an exemplary embodiment of the user interface 380 that is configured to display, as a result of the user 16 selecting a selected visual representation 386 of a seed video, a video recommendation area 389 comprising a plurality of visual representations T.sub.1 . . . m, wherein each visual representation T represents one video 1 from the initial set of videos 1, and wherein the plurality of visual representations T.sub.1 . . . m are ordered according to a ranking 7 determined by executing the steps of a method for recommending video content in accordance with the present disclosure.

[0149] The user interface 380 may further comprise the selected visual representation 386 of a seed video in the form of a cover image that is associated with the seed video, a metadata display area 387 for showing to the user 16 information from the metadata of the seed video (such as title, duration, etc.), and a summary display area 388 for showing to the user 16 a summary of the seed video content or other relevant content-related information.

[0150] The user interface 380 may further comprise a recommendation adjustment area 390 comprising visual means for dynamically adjusting the order of the visual representations T.sub.1 . . . m in the video recommendation area 389. This adjustment can be achieved by adjusting the weight with which user preference data 17 is taken into account when calculating the ranking 7, or by adjusting the weights with which different similarity indexes, such as a metadata similarity index, an online similarity index, or a visual similarity index, are taken into account during ensembling calculations for determining the ranking 7. The visual means may comprise a graphical element, such as a slider, for allowing a user to horizontally move a marker and thereby execute the adjustment. The visual means may also comprise a numerical input field allowing a user 16 to input a number between 1 and 100 representing the proportion in percentages between an original ranking 7 and an adjusted ranking 7A of the video recommendations.
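The numerical input field described above can be sketched as a linear interpolation between the scores underlying the original ranking 7 and the adjusted ranking 7A. This is one plausible reading of the 1-100 proportion value; the exact blending rule is an assumption, as are the score dictionaries below.

```python
def blend_rankings(original_scores, adjusted_scores, percent):
    """Blend original (7) and adjusted (7A) ranking scores by a 1-100
    slider value: percent=100 is fully adjusted, percent=1 is almost
    fully original. Videos absent from the adjusted scores keep their
    original score.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    w = percent / 100.0
    blended = {
        vid: (1 - w) * score + w * adjusted_scores.get(vid, score)
        for vid, score in original_scores.items()
    }
    return sorted(blended, key=blended.get, reverse=True)
```

Moving the marker (or entering a new percentage) simply re-runs the blend and reorders the visual representations T.sub.1 . . . m accordingly.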

[0151] The various aspects and implementations have been described in conjunction with various embodiments herein. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject-matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.

[0152] The reference signs used in the claims shall not be construed as limiting the scope.