METHOD AND DEVICE FOR DISPLAYING A PLURALITY OF VIDEOS
20170289485 · 2017-10-05
Inventors
CPC classification
H04N21/44
ELECTRICITY
H04N21/4316
ELECTRICITY
G09G2340/10
PHYSICS
H04N21/44209
ELECTRICITY
H04N5/44504
ELECTRICITY
G06F3/14
PHYSICS
H04N21/4312
ELECTRICITY
International classification
H04N21/442
ELECTRICITY
H04N21/44
ELECTRICITY
Abstract
A method for displaying a plurality of videos is disclosed. The method comprises: displaying a main video in a main graphical unit; and displaying a secondary video among said plurality of videos in each of at least one secondary graphical unit; wherein the size, structure, position, transparency, overlap or shape of the at least one secondary graphical unit depends on information representative of spatio-temporal connectivity between a currently displayed segment of the main video and a currently displayed segment of the secondary video.
Claims
1. A method, performed by a computer, for generating a graphical interface displaying a plurality of videos, comprising: generating a main graphical unit displaying a main video; generating at least one secondary graphical unit displaying a secondary video among said plurality of videos; wherein a relative position of said at least one secondary graphical unit to said main graphical unit corresponds to a relative spatial position of a segment currently displayed of said main video to a segment currently displayed of said secondary video.
2. The method of claim 1 wherein the size or structure or position or transparency or overlap or shape of said at least one secondary graphical unit further depends on a maximum number of graphical units.
3. The method of claim 1, wherein information representative of spatio-temporal connectivity comprises a connectivity score and wherein the size of said secondary graphical unit increases when said connectivity score increases.
4. (canceled)
5. The method of claim 1, wherein information representative of spatio-temporal connectivity comprises a connectivity score and wherein the transparency of said secondary graphical unit decreases when said connectivity score increases.
6. The method of claim 1, wherein information representative of spatio-temporal connectivity comprises a connectivity score and wherein the overlap of said secondary graphical unit decreases when said connectivity score increases.
7. The method of claim 1, wherein said information representative of spatio-temporal connectivity comprises a connectivity score and wherein said secondary graphical unit is placed closer to the main graphical unit when said connectivity score increases.
8. The method of claim 2, wherein the size or structure or position of said at least one secondary graphical unit is quantified.
9. The method of claim 2, wherein the size, structure, position, transparency, overlap of said at least one secondary graphical unit continuously varies from a first size, structure, position, transparency, overlap corresponding to a first information to a second size, structure, position, transparency, overlap corresponding to a second information.
10. The method of claim 2, wherein the main graphical unit is centered in the display or wherein the main graphical unit has a higher size than said at least one secondary graphical unit or wherein the main graphical unit is placed in the front of said at least one secondary graphical unit.
11. A graphical interface for displaying a plurality of videos, the graphical interface comprising: a main graphical unit displaying a main video; at least one secondary graphical unit, each secondary graphical unit displaying a secondary video among said plurality of videos; wherein a relative position of said at least one secondary graphical unit to said main graphical unit corresponds to a relative spatial position of a segment currently displayed of said main video to a segment currently displayed of said secondary video.
12. (canceled)
13. A device for generating a graphical interface displaying a plurality of videos comprising at least one processor configured to: generate a main graphical unit displaying a main video; generate at least one secondary graphical unit displaying a secondary video among said plurality of videos; wherein the at least one processor is further configured to determine a relative position of said at least one secondary graphical unit to said main graphical unit, said relative position corresponding to a relative spatial position of a segment currently displayed of said main video to a segment currently displayed of said secondary video.
14-15. (canceled)
16. The device of claim 13 wherein a size or structure or transparency or overlap or shape of said at least one secondary graphical unit further depends on information representative of spatio-temporal connectivity.
17. The device of claim 16 wherein the size or structure or position or transparency or overlap or shape of said at least one secondary graphical unit further depends on a maximum number of graphical units.
18. A processor readable medium having stored therein instructions for causing a processor to perform at least the steps of the method according to claim 1.
19. The method of claim 1 wherein a size or structure or transparency or overlap or shape of said at least one secondary graphical unit further depends on information representative of spatio-temporal connectivity.
Description
4. BRIEF DESCRIPTION OF THE DRAWINGS
[0031] In the drawings, an embodiment of the present invention is illustrated.
5. DETAILED DESCRIPTION OF THE INVENTION
The input 10 is linked to a first video decoder 12 configured to display a main video in a main graphical unit, wherein the main video MV is selected by a user for reproduction among a collection of videos. The input 10 is further linked to an analysis module 14 configured to generate information relative to the spatial and/or temporal relationship between videos. According to a variant, the analysis module 14 is external to the processing device 1 and the information is input to the processing device through the input 10. The outputs of the modules 12 and 14 are connected to a module 16 configured to obtain information representative of the spatio-temporal connectivity between the temporal segment currently displayed of the main video and a temporal segment of a secondary video, and to determine the sizing, structuring, positioning, transparency and overlapping of at least one secondary graphical unit. A second video decoder 18 (or a plurality of second decoders) is configured to display the secondary video(s) SV obtained by the module 16 in a secondary graphical unit according to the characteristics determined by the module 16. The first video decoder 12 and the at least one second video decoder 18 are linked to an output 22 so that the graphical units can be sent to a video display. In the variant where the analysis module 14 is internal, the information relative to the spatial and/or temporal relationship between videos determined by the analysis module 14 can be stored in a memory or sent to a destination. As an example, such information is stored in a remote or local memory, e.g. a video memory, a RAM or a hard disk. In a variant, the information is sent to a storage interface, e.g. an interface with a mass storage, a ROM, a flash memory, an optical disc or a magnetic support, and/or transmitted over a communication interface, e.g. an interface to a point-to-point link, a communication bus, a point-to-multipoint link or a broadcast network.
[0046] According to an exemplary and non-limitative embodiment of the invention, the processing device 1 further comprises a computer program stored in the memory 120. The computer program comprises instructions which, when executed by the processing device 1, in particular by the processor 110, make the processing device 1 carry out the processing method described herein.
[0047] According to exemplary and non-limitative embodiments, the processing device 1 is a device which belongs to a set comprising: [0048] a mobile device; [0049] a communication device; [0050] a game device; [0051] a tablet (or tablet computer); [0052] a laptop; [0053] a still picture camera; [0054] a video camera; [0055] an encoding chip; [0056] a decoding chip; [0057] a still picture server; [0058] a video server (e.g. a broadcast server, a video-on-demand server or a web server); and [0059] a video sharing platform.
[0061] The method for displaying a collection of videos in an intuitive and user-friendly way, enhancing video browsing, relies on two main steps: [0062] firstly, finding relations between video clips and recording that information in a structure so that the video collection can be easily explored; [0063] secondly, displaying the information by integrating directly on the video display some means to jump from one video to others.
Thus, in a step S10, information relative to the spatial and/or temporal relationship between videos is generated, e.g. by the analysis module 14. The terms link, connectivity and relationship are used interchangeably in the description. Accordingly, a graph is built in which a node corresponds to a temporal segment of a video and a link carries a score representative, for instance, of the quality of the spatial relationship, such as, in a non-limitative example, 80% of matched pixels between the main and secondary videos with sufficient confidence.
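The graph of step S10 can be sketched as follows; this is a minimal illustration, assuming a Python adjacency-map representation in which nodes are (video id, segment index) pairs and each undirected link stores its connectivity score. The class and all names are hypothetical, not part of the disclosure.

```python
# Sketch of the step-S10 connectivity graph: nodes are temporal segments,
# links carry a connectivity score (e.g. a fraction of matched pixels).
from collections import defaultdict

class ConnectivityGraph:
    def __init__(self):
        # adjacency map: node -> {neighbour: score}
        self._adj = defaultdict(dict)

    def add_link(self, seg_a, seg_b, score):
        """Record a bidirectional link between two segments with its score."""
        self._adj[seg_a][seg_b] = score
        self._adj[seg_b][seg_a] = score

    def links_from(self, seg):
        """Links leaving a segment, sorted by decreasing connectivity score."""
        return sorted(self._adj[seg].items(), key=lambda kv: -kv[1])

graph = ConnectivityGraph()
graph.add_link(("main", 0), ("clip_a", 2), 0.80)   # 80% matched pixels
graph.add_link(("main", 0), ("clip_b", 1), 0.35)
```

Such a structure supports both the per-segment link selection of step S23 and pathfinding through the graph for transitivity.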
[0064] Then, in a step S20, a video is presented to a user in a main graphical unit along with link information displayed, in a first embodiment, as a graphical unit (arrows) or, in a second embodiment, as properties of at least one secondary graphical unit displaying a secondary video related to the main video through the link.
[0065] Those skilled in the art will appreciate that the disclosed method may be shared between a screen and a processing device, the processing device generating a user interface with first and second graphical units and sending information to a screen for displaying the generated user interface.
[0068] The step S10 of time-space link generation is explained according to a particularly interesting embodiment wherein the links are described as a homography (including an affine transformation) between videos.
[0069] In a step S11 of key frame selection, the videos of the collection are represented as sequences of key frames. Since an exhaustive computation of the relative position of all the frames of each video within the dataset is not tractable, the first step is to detect some key frames in order to reduce the cardinality of the input frame set. Those skilled in the art know that step S11 is common in many video processing applications (scene summary, copy detection, etc.) and many algorithmic solutions exist to achieve it, based on clustering plus election of a representative frame, shot detection plus stable frames, motion analysis, etc. Unlike in copy detection, in the context of user-generated content of a same event, images within a video that are very different from the previous ones mean that the point of view or the filmed object has changed. Consequently, the problem of finding a homography between one video and another is solved by firstly finding temporal segments of the first video and of the second one, and secondly by selecting a meaningful key frame for each segment, the homography being computed between pairs of key frames of the videos.
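Step S11 can be sketched as follows; a minimal illustration assuming a simple frame-difference criterion (the threshold and the flat "frames" are hypothetical — any of the clustering, shot-detection or motion-analysis techniques mentioned above could be substituted).

```python
# Sketch of step S11: a new key frame starts a new temporal segment whenever
# the current frame differs enough from the last key frame (viewpoint change).
def mean_abs_diff(frame_a, frame_b):
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def select_key_frames(frames, threshold=30.0):
    """Return the indices of key frames, one per detected temporal segment."""
    key_indices = [0]                      # the first frame starts a segment
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[key_indices[-1]]) > threshold:
            key_indices.append(i)          # content/point-of-view change
    return key_indices

# toy "frames": flat grey images whose level jumps mid-sequence
frames = [[10] * 16] * 3 + [[200] * 16] * 3
```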
[0070] In a step S12 of key point selection, some points of interest are detected and described on each key frame selected in the previous step.
[0071] In a step S13 of key point matching, key points of pairs of key frames of the videos are matched. This is also a common step in image processing (stitching, retrieval, etc.) and one well-known technique is the SIFT algorithm. This step is performed on each pair of key frames.
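Step S13 can be sketched as follows; a minimal nearest-neighbour matcher with Lowe's ratio test, commonly used with SIFT descriptors. The tiny 2-D descriptors and the 0.8 ratio are assumptions for illustration only.

```python
# Sketch of step S13: each descriptor of one key frame is matched to its
# nearest neighbour in the other frame; the ratio test rejects ambiguous
# matches whose best distance is not clearly below the second-best one.
def euclidean(d1, d2):
    return sum((a - b) ** 2 for a, b in zip(d1, d2)) ** 0.5

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (index_a, index_b) pairs passing the nearest-neighbour ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        order = sorted(range(len(desc_b)), key=lambda j: euclidean(d, desc_b[j]))
        best, second = order[0], order[1]
        if euclidean(d, desc_b[best]) < ratio * euclidean(d, desc_b[second]):
            matches.append((i, best))
    return matches

desc_a = [(1.0, 0.0), (0.0, 1.0)]
desc_b = [(0.9, 0.1), (5.0, 5.0), (0.1, 0.9)]
```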
[0072] In a step S14 of affine transform calculation, an affine transform is computed for each pair of key frames in different videos of the collection using the matched key points. As those skilled in the art know, such an affine transform is represented by a 2×2 transformation matrix and a constant displacement vector.
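The affine model of step S14 can be sketched as follows; a minimal illustration of the "2×2 matrix plus displacement vector" representation, showing how a transform is applied to a point and how two transforms compose (the helper names are hypothetical).

```python
# Sketch of the step-S14 affine model: a transform is a 2x2 matrix A plus a
# displacement vector t, mapping a point p to A . p + t.
def apply_affine(A, t, p):
    x, y = p
    return (A[0][0] * x + A[0][1] * y + t[0],
            A[1][0] * x + A[1][1] * y + t[1])

def compose(A2, t2, A1, t1):
    """Composition (A2, t2) o (A1, t1): apply (A1, t1) first, then (A2, t2)."""
    A = [[sum(A2[i][k] * A1[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    t = apply_affine(A2, t2, t1)
    return A, t

# example: a 90-degree rotation followed by a translation of (5, 0)
rot = [[0.0, -1.0], [1.0, 0.0]]
shift = (5.0, 0.0)
```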
[0073] In a step S15 of affine transform interpretation, an advantageous processing is disclosed to estimate the relative position in space of the pair of key frames using the homography. The problem to solve is to find the relative position of two images (i.e., key frames) whereas the coefficients of the homography matrix stand for some translation and some rotation information. The step proposed here allows a fast geometric interpretation of the relative position between two images.
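The geometric interpretation of step S15 can be sketched for the special case of a similarity transform, where the rotation angle, scale and translation between the two key frames can be read directly from the matrix coefficients. This is a simplified assumption for illustration; the disclosure covers more general homographies.

```python
# Sketch of step S15 for a similarity transform
#   A = [[s*cos(th), -s*sin(th)], [s*sin(th), s*cos(th)]], plus translation t:
# the angle, scale and translation are recovered from the coefficients.
import math

def interpret_similarity(A, t):
    """Return (angle_degrees, scale, translation) of a similarity transform."""
    angle = math.degrees(math.atan2(A[1][0], A[0][0]))
    scale = math.hypot(A[0][0], A[1][0])
    return angle, scale, t

# frame B seen as frame A rotated by 30 degrees at scale 2, shifted by (10, 0)
s, th = 2.0, math.radians(30.0)
A = [[s * math.cos(th), -s * math.sin(th)],
     [s * math.sin(th),  s * math.cos(th)]]
angle, scale, translation = interpret_similarity(A, (10.0, 0.0))
```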
[0074] In a step S16 of link generation, information relative to the spatial positioning of two videos is defined. That information is stored for each pair of key frames as metadata or recorded as a graph. Indeed, the previous process is iterated over all the pairs of key frames, hence all the pairs of video segments. A complete connection graph can then be constructed, giving the relationship between all the video segments. Even segments with no direct relation can be related by estimation through transitivity, i.e., pathfinding through the graph. This graph can be appended to the videos as a metadata file and interpreted by the video player.
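The transitivity of step S16 can be sketched as follows; a minimal breadth-first search relating two segments that have no direct link through intermediate segments (the adjacency-map representation and segment labels are assumptions).

```python
# Sketch of step-S16 transitivity: segments with no direct relation are
# related through a path in the connection graph (breadth-first search).
from collections import deque

def find_path(adj, start, goal):
    """Shortest chain of segments relating start to goal, or None if none."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# "A1" and "C0" are only related through "B2"
adj = {"A1": ["B2"], "B2": ["A1", "C0"], "C0": ["B2"]}
```

Along such a path, the per-link affine transforms can be composed to estimate the relative position of the two end segments.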
[0076] In a step S21, a video MV is displayed in a graphical unit, e.g. by the module 12, i.e. the main video decoder. This video is called the main video MV and the graphical unit is called the main graphical unit. The main graphical unit is advantageously larger than the secondary graphical units and placed in the center of the display window. According to a variant, the main video MV is selected by a user among a collection of videos. According to another variant, the main video is automatically selected for reproduction by an application, for instance by selecting the last video viewed or the most viewed video. The main video MV is split into temporal segments. Thus, when the main video MV is displayed in the main graphical unit, the module 12 is further configured to determine the temporal segment of the main video that is currently displayed.
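The segment tracking of step S21 can be sketched as follows; a minimal illustration assuming the temporal segments are given by their start times, so the currently displayed segment follows from the playback position (the boundary values are hypothetical).

```python
# Sketch of step-S21 segment tracking: given the start times of the main
# video's temporal segments, the segment currently displayed is found by a
# binary search on the playback position.
import bisect

def current_segment(segment_starts, playback_time):
    """Index of the temporal segment containing playback_time (seconds)."""
    return bisect.bisect_right(segment_starts, playback_time) - 1

starts = [0.0, 4.5, 12.0, 20.0]   # assumed segment boundaries
```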
[0077] In a step S22, a secondary video SVi is displayed in a secondary graphical unit, e.g. by the module 18, i.e. the secondary video decoder. In the following, a secondary video and the graphical unit presenting it are both denoted SV. According to a particularly advantageous characteristic, the features of the secondary graphical unit depend on information representative of the spatio-temporal connectivity between the main video segment currently displayed and the secondary video segment currently displayed. As detailed in the various embodiments below, the features of the graphical units include the size, the structure of the arrangement of graphical units, their relative position, their transparency and their overlap. According to a preferred embodiment, there is more than one secondary graphical unit, for instance from 3 to 12.
[0078] Thus, in a substep S23, information representative of spatio-temporal connectivity from the main video is selected. Indeed, as explained for the graph, a plurality of links connects the temporal segment of the main video (corresponding to a node in the graph) that is currently displayed with segments of other videos (corresponding to other nodes) and, in a variant, with segments of the main video itself (intra-video relationships). Advantageously, a score, called the connectivity score, is attached to each link according to its relevance. The method is compatible with any metric for estimating the connectivity score. All methods share the hypothesis that a relational graph has been established between the videos of the collection. A graph may determine when, where and in what amount (if this can be quantified) two videos are related (by appearance similarity, action, point of view, semantics, etc.).
[0079] The links are sorted from the highest connectivity score to the lowest, and the secondary videos SVi associated with the links of the highest scores are selected for visualization by the viewer of the main video MV. A first characteristic is represented in the drawings.
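The selection and styling described in substep S23 and above can be sketched as follows; a minimal illustration in which the top-N links are kept and each secondary unit's size grows while its transparency shrinks as the connectivity score increases (as in the claims). The particular mappings, the unit count and the base size are assumptions.

```python
# Sketch of link selection and unit styling: links from the current segment
# are ranked by decreasing connectivity score, the top-N secondary videos are
# kept, and each unit's size and opacity are driven by the score.
def style_secondary_units(links, max_units=4, base_size=100):
    """links: (video_id, score) pairs -> list of (video_id, size, alpha)."""
    ranked = sorted(links, key=lambda kv: -kv[1])[:max_units]
    units = []
    for video_id, score in ranked:
        size = base_size * (0.5 + 0.5 * score)  # larger for stronger links
        alpha = score                           # less transparent at high score
        units.append((video_id, size, alpha))
    return units

links = [("clip_b", 0.35), ("clip_a", 0.80), ("clip_c", 0.10)]
```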
[0080] The links are sorted anew for each current temporal segment of the main video, i.e. each time a new temporal segment is reached while the main video is decoded. Accordingly, from a first segment of the main video to the following second segment, the links are sorted again, a first and a second information are selected, and the dependent graphical unit characteristics and associated secondary videos are obtained for each of the first and second segments, for instance by the module 16. According to a sixth characteristic, the characteristics such as the size, structure, position, transparency and overlap of the secondary graphical unit vary continuously from a first size, structure, position, transparency and overlap corresponding to the first information to a second size, structure, position, transparency and overlap corresponding to the second information, thus allowing a smooth transition between the two settings of graphical units. In a variant of the sixth characteristic, an abrupt change occurs from the first to the second graphical unit setting.
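The smooth transition of the sixth characteristic can be sketched as follows; a minimal illustration that linearly interpolates a unit's numeric properties between two settings (linear easing and the property names are assumptions).

```python
# Sketch of the sixth characteristic: graphical-unit settings are blended
# between two segments so size, position and transparency vary smoothly.
def lerp(a, b, u):
    return a + (b - a) * u

def interpolate_unit(first, second, u):
    """Blend two unit settings; u in [0, 1]: 0 -> first, 1 -> second."""
    return {key: lerp(first[key], second[key], u) for key in first}

first  = {"size": 90.0, "x": 0.0,  "alpha": 0.8}   # setting for segment 1
second = {"size": 60.0, "x": 50.0, "alpha": 0.4}   # setting for segment 2
```

Setting u = 1 directly reproduces the abrupt-change variant.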
[0081] A seventh characteristic is represented in the drawings.
[0082] A further step, not represented in the drawings, may be performed.
[0087] The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
[0088] Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
[0089] Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
[0090] As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
[0091] A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.