Video processing apparatus and method
10014029 · 2018-07-03
Assignee
Inventors
- Gye-wook Jo (Uiwang-si, KR)
- Won-cheul Kim (Yongin-si, KR)
- Sang-hoon LEE (Seoul, KR)
- Jong-woo Kim (Seoul, KR)
- Min Lee (Seoul, KR)
CPC classification
H04N5/775
ELECTRICITY
G11B27/28
PHYSICS
G06V20/46
PHYSICS
H04N5/93
ELECTRICITY
G11B27/10
PHYSICS
G06V20/49
PHYSICS
H04N5/765
ELECTRICITY
International classification
H04N5/775
ELECTRICITY
G11B27/031
PHYSICS
Abstract
Provided are a video processing method and apparatus. The video processing method includes acquiring an input video including a plurality of video frames and audio frames; dividing the input video into one or more sections; determining a representative video frame from among the plurality of video frames with respect to each of the one or more sections; and acquiring a slide video that includes the representative video frames.
Claims
1. A video processing method, comprising: acquiring an input video including a plurality of video frames and audio frames; dividing the input video into one or more sections; determining a representative video frame from among the plurality of video frames with respect to each of the one or more sections; and acquiring a slide video that includes each of the representative video frames, wherein each of the representative video frames is displayed as long as a play time of a section corresponding to each of the representative video frames, and each of the representative video frames is synchronized with audio frames corresponding to the section of each of the representative video frames, and wherein when a certain object provided from the input video is included in a video frame, the video frame including the certain object is determined as the representative video frame.
2. The video processing method of claim 1, wherein the dividing of the input video is performed so that the one or more sections have equal lengths.
3. The video processing method of claim 1, wherein the determining of the representative video frame is performed based on information about viewer's preferences when the information about the viewer's preferences with respect to the plurality of video frames included in the input video is provided.
4. A video processing method, comprising: acquiring an input video including a plurality of video frames and audio frames; dividing the input video into one or more sections; determining a representative video frame from among the plurality of video frames with respect to each of the one or more sections; and acquiring a slide video that includes each of the representative video frames, wherein each of the representative video frames is displayed as long as a play time of a section corresponding to each of the representative video frames, and each of the representative video frames is synchronized with audio frames corresponding to the section of each of the representative video frames, wherein the determining of the representative video frame is performed based on analysis of at least one of the audio frames and the plurality of video frames, and wherein when a certain sound is detected from the audio frames, the representative video frame is determined based on a location where the certain sound is reproduced.
5. A video processing method, comprising: acquiring an input video including a plurality of video frames and audio frames; dividing the input video into one or more sections; determining a representative video frame from among the plurality of video frames with respect to each of the one or more sections; acquiring a slide video that includes each of the representative video frames; extracting a representative text corresponding to each of the representative video frames, when text information accompanied with the input video is provided, wherein each of the representative video frames is displayed as long as a play time of a section corresponding to each of the representative video frames, and each of the representative video frames is synchronized with audio frames corresponding to the section of each of the representative video frames, and wherein the acquiring of the slide video comprises acquiring a slide video that includes the representative text corresponding to each of the representative video frames.
6. A video processing apparatus, comprising: an input unit configured to receive an input video including a plurality of video frames and audio frames; and a video processing unit configured to divide the input video into one or more sections, determine a representative video frame from among the plurality of video frames included in each section with respect to each of the one or more sections, and acquire a slide video that includes each of the representative video frames, wherein each of the representative video frames is displayed as long as a play time of a section corresponding to each of the representative video frames, wherein each of the representative video frames is synchronized with the audio frames corresponding to the section of each of the representative video frames, and wherein when a certain object provided from the input video is included in a video frame, the video processing unit is further configured to determine the video frame including the certain object as the representative video frame.
7. The video processing apparatus of claim 6, wherein the video processing unit divides the input video so that the one or more sections have equal lengths.
8. The video processing apparatus of claim 6, wherein when information about viewer's preferences with respect to the plurality of video frames included in the input video is provided with the input video, the video processing unit is further configured to determine the representative video frame based on the information about the viewer's preferences.
9. A video processing apparatus, comprising: an input unit configured to receive an input video including a plurality of video frames and audio frames; and a video processing unit configured to divide the input video into one or more sections, determine a representative video frame from among the plurality of video frames included in each section with respect to each of the one or more sections, and acquire a slide video that includes each of the representative video frames, wherein each of the representative video frames is displayed as long as a play time of a section corresponding to each of the representative video frames, wherein each of the representative video frames is synchronized with the audio frames corresponding to the section of each of the representative video frames, wherein the video processing unit is further configured to determine the representative video frame based on analysis of at least one of the audio frames and the plurality of video frames, and wherein when a certain sound is detected from the audio frames, the video processing unit is further configured to determine the representative video frame based on a location where the certain sound is reproduced.
10. A video processing apparatus, comprising: an input unit configured to receive an input video including a plurality of video frames and audio frames; and a video processing unit configured to divide the input video into one or more sections, determine a representative video frame from among the plurality of video frames included in each section with respect to each of the one or more sections, and acquire a slide video that includes each of the representative video frames, wherein each of the representative video frames is displayed as long as a play time of a section corresponding to each of the representative video frames, and each of the representative video frames is synchronized with the audio frames corresponding to the section of each of the representative video frames, wherein the input unit is further configured to acquire text information when the text information accompanied with the input video is provided, and the video processing unit is configured to extract a representative text corresponding to each of the representative video frames from the text information and acquire the slide video in which the representative text corresponding to each of the representative video frames is included.
11. A non-transitory computer-readable recording medium having embodied thereon a computer program for executing the video processing method according to claim 1.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) These and/or other aspects will become apparent and more readily appreciated from the following description of the exemplary embodiments, taken in conjunction with the accompanying drawings in which:
BEST MODE FOR CARRYING OUT THE INVENTION
(10) According to one or more exemplary embodiments, a video processing method includes: acquiring an input video including a plurality of video frames and audio frames; dividing the input video into one or more sections; determining a representative video frame from among the plurality of video frames with respect to each of the one or more sections; and acquiring a slide video that includes the representative video frames, wherein each of the representative video frames is displayed as long as a play time of a section corresponding to each of the representative video frames, and each of the representative video frames is synchronized with audio frames corresponding to the section of each of the representative video frames.
(11) According to one or more exemplary embodiments, a video processing apparatus includes: an input unit configured to receive an input video including a plurality of video frames and audio frames; and a video processing unit configured to divide the input video into one or more sections, determine a representative video frame from among the plurality of video frames included in each section with respect to each of the one or more sections, and acquire a slide video that includes the representative video frames, wherein each of the representative video frames is displayed as long as a play time of a section corresponding to each of the representative video frames, and each of the representative video frames is synchronized with the audio frames corresponding to the section of the representative video frames.
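The method summarized above can be sketched as a short program. This is a minimal illustration, not the patented implementation: the equal-length division and the middle-frame choice are simple defaults assumed for the example, and all names (`Slide`, `make_slide_plan`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Slide:
    frame_index: int  # index of the representative video frame
    start: float      # section start time in seconds
    end: float        # section end time; the slide is displayed for (end - start)

def make_slide_plan(num_frames: int, fps: float, num_sections: int) -> list:
    """Divide the input video into equal-length sections and pick the middle
    frame of each section as its representative frame (a simple default)."""
    per_section = num_frames // num_sections
    slides = []
    for s in range(num_sections):
        first = s * per_section
        # the last section absorbs any remainder frames
        last = num_frames if s == num_sections - 1 else first + per_section
        rep = (first + last) // 2  # middle frame stands in for the section
        slides.append(Slide(rep, first / fps, last / fps))
    return slides
```

With six frames at 1 fps divided into two sections (as in the first drawing), each slide covers three seconds of the original play time.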
MODE FOR THE INVENTION
(12) Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present exemplary embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the exemplary embodiments are merely described below, by referring to the figures, to explain aspects of the present description. As used herein, the term and/or includes any and all combinations of one or more of the associated listed items. Expressions such as at least one of, when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list.
(14) An upper block 100 displays audio data 110 and a plurality of video frames 120, 122, 124, 126, 128, and 130 of an input video, before being processed by a video processing apparatus, according to passage of time. Although only the six video frames 120, 122, 124, 126, 128, and 130 are shown in the upper block 100 for convenience of description, the input video may include more video frames. A video frame at a left side is earlier than a video frame at a right side. Therefore, temporally, the video frame 120 at a leftmost side is the earliest video frame, and the video frame 130 at a rightmost side is the last video frame. The audio data 110 includes a plurality of audio frames that correspond to the video frames (not shown in the drawing).
(15) The upper block 100 may be divided into two sections based on a center dotted line 125. The video frames 120, 122, and 124 are included in a section at a left side of the dotted line 125. The video frames 126, 128, and 130 are included in a section at a right side of the dotted line 125. A representative video frame is determined in each section.
(16) A lower block 150 includes audio data 160 and representative video frames 170 and 172 of a slide video. The audio data 160 of the lower block 150 may be the same as the audio data 110 of the upper block 100.
(17) In the slide video, the representative video frame 170 is displayed during a period of reproducing the video frames 120, 122, and 124 in the input video. Likewise, the representative video frame 172 is displayed during a period of reproducing the video frames 126, 128, and 130 in the input video.
(20) The video processing apparatus 200 may include an input unit 210, a video processing unit 220, a storage unit 230, and an output unit 240.
(21) The input unit 210 may acquire an input video including a plurality of video frames and audio. The input unit 210 may acquire various kinds of data accompanied with the input video. For example, if the input video includes text information such as subtitles, the input unit 210 may also acquire the text information. As another example, if the input video includes metadata about a producer, a play time, and an encoding type of the input video, the input unit 210 may also acquire the metadata.
(22) The input unit 210 may acquire input videos from various sources. For example, the input unit 210 may acquire input videos at constant intervals from a broadcasting signal in real time. As another example, the input unit 210 may acquire input videos from an external input source 214 such as a CD, a DVD, a USB device, or an HDMI connection. The input unit 210 may be implemented as a receiver, a CD reader, a DVD reader, a processor, or another device that can read video and audio data. Also, the input unit 210 may be implemented as software that inputs video and audio data to the video processing unit 220.
(23) The video processing unit 220 may include a video processor 222, an audio processor 228, and a slide video generator 229. The video processing unit 220 may be implemented as one or more processors.
(24) The video processor 222 may include an input video division unit 224 and a representative video frame determiner 226, and determines a representative video frame from the input video.
(25) The input video division unit 224 may divide the input video into at least one section. The input video division unit 224 may divide the input video so that each divided section has a constant length. For example, the input video division unit 224 may divide the input video so that the divided sections have an equal number of video frames.
(26) The input video division unit 224 may divide the input video so that the divided sections have different lengths from each other, according to some embodiments. For example, if the input video includes text regarding dialogues between persons, the input video division unit 224 may divide the input video so that the divided sections have different lengths that vary depending on the amount of text.
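Variable-length division as described above can be sketched as follows. The proportional rule (section length proportional to the amount of dialogue text) is an assumption made for illustration; the patent does not specify a formula.

```python
def divide_by_text_amount(total_frames: int, texts: list) -> list:
    """Return per-section frame counts proportional to the amount of
    dialogue text attributed to each section (illustrative rule only)."""
    weights = [max(len(t), 1) for t in texts]  # avoid zero-length sections
    total_w = sum(weights)
    lengths = [total_frames * w // total_w for w in weights]
    lengths[-1] += total_frames - sum(lengths)  # absorb rounding remainder
    return lengths
```

For a 100-frame video whose second section carries twice as much dialogue as the first, the split is roughly one third to two thirds.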
(27) The representative video frame determiner 226 may determine a representative video frame from among the video frames included in each of the sections divided by the input video division unit 224. The representative video frame determiner 226 may determine the representative video frame based on various criteria. For example, if the input unit 210 acquires information about viewer preferences with respect to the frames included in the input video, the representative video frame determiner 226 may determine the representative video frame based on the viewer preference information. As another example, the representative video frame determiner 226 may determine the representative video frame based on analysis of at least one of the audio and the video frames. Examples of determining the representative video frame based on such analysis are illustrated in the accompanying drawings.
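Preference-based selection can be sketched as picking the highest-scoring frame in a section. The score format (a mapping from frame index to a preference value) is an assumption for illustration, not the patent's data model.

```python
def pick_by_preference(section_frames: list, prefs: dict) -> int:
    """Pick the frame with the highest viewer-preference score in a section;
    frames without a score default to 0.0, so the earliest frame wins ties."""
    return max(section_frames, key=lambda f: prefs.get(f, 0.0))
```

For example, if frame 4 drew the strongest viewer reaction in its section, it becomes the representative frame.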
(28) The representative video frame determiner 226 may encode and output the determined representative video frame. For example, the representative video frame determiner 226 may encode the representative video frame in a file format such as jpg, gif, or png, and output the encoded representative video frame.
(29) The audio processor 228 may extract audio data from the input video. The audio processor 228 may encode the extracted audio data and output the encoded audio data. For example, the audio processor 228 may encode the audio data in a file format such as mp3 or wma, and may output the encoded audio data.
(30) The slide video generator 229 may generate a slide video by synchronizing the representative video frame output from the video processor 222 with the audio data output from the audio processor 228. For example, if a representative video frame A represents the video frames included in a time period from 00:00 to 00:05 of the input video, the representative video frame A may be synchronized with the part of the audio data corresponding to the time period 00:00 to 00:05. The slide video generated as above plays the part of the audio data corresponding to the time period 00:00 to 00:05 in the section of 00:00 to 00:05 and, at the same time, displays the representative video frame A.
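The synchronization step above reduces to converting a section's time range into a span of audio samples, so that exactly that portion of audio plays while the representative frame is shown. The helper name and the sample-index representation are illustrative assumptions.

```python
def audio_span(start_s: float, end_s: float, sample_rate: int) -> tuple:
    """Convert a section's time range (seconds) into audio-sample indices;
    the representative frame is displayed while samples in this span play."""
    return int(start_s * sample_rate), int(end_s * sample_rate)
```

For the 00:00 to 00:05 example at a 48 kHz sample rate, representative frame A is paired with the first 240,000 audio samples.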
(31) The slide video generator 229 may include, in the slide video, metadata provided with the input video. For example, the slide video generator 229 may include, in the slide video, metadata about a producer, a play time, and an encoding type of the input video.
(32) In addition, if the input unit 210 acquires text information accompanied with the input video, the slide video generator 229 may extract representative text corresponding to the representative video frame from the text information and add the representative text to the video frame that is determined as the representative video frame. An example of adding the representative text to the representative video frame is shown in the accompanying drawings.
(33) The storage unit 230 may store the slide video acquired by the video processing unit 220. The storage unit 230 may be a volatile memory, a non-volatile memory, an internal memory, an external memory, or a combination thereof. For example, the storage unit 230 may be a non-volatile storage device, such as a non-volatile random access memory (NVRAM), a flash memory, or a disk storage device, or a volatile storage device such as a static random access memory (SRAM).
(34) The storage unit 230 may store the representative video frames and the audio that constitute the slide video, as well as the slide video itself. Also, if the slide video includes text information such as subtitles or the metadata of the input video, the storage unit 230 may store the text information or the metadata.
(35) The output unit 240 may output the slide video acquired by the video processing unit 220, or the slide video stored in the storage unit 230, to a device external to the video processing apparatus 200. The output unit 240 may be connected to the external device in various ways. The output unit 240 may be implemented as a transmitter or any of a variety of devices that transmit data to the external device. Also, the output unit 240 may be implemented as a display that reproduces the slide video.
(38) An upper block 300 displays a plurality of video frames 310, 320, 330, and 340 of the input video according to lapse of time. The upper block 300 includes only four video frames for convenience of description, but more video frames may be included in the input video. A video frame located at a left side is temporally earlier than a video frame located at a right side. Therefore, the video frame 310 that is located at the leftmost side is the earliest video frame, and the video frame 340 that is located at the rightmost side is the latest video frame.
(39) The upper block 300 is divided into two sections based on a dotted line on a center portion thereof. A left section of the dotted line includes the video frames 310 and 320. A right section of the dotted line includes the video frames 330 and 340. Each of the sections has a representative video frame.
(40) A video frame including a certain object in the section may be determined as the representative video frame.
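Object-based selection can be sketched as scanning a section for the first frame reported to contain the target object. The `contains_object` callback stands in for a real detector (e.g., a face or logo classifier); it and the fallback rule are assumptions made for illustration.

```python
def pick_frame_with_object(section_frames: list, contains_object) -> int:
    """Return the first frame in the section that contains the target
    object; if no frame does, fall back to the first frame."""
    for f in section_frames:
        if contains_object(f):
            return f
    return section_frames[0]
```

Usage with a stub detector that flags frame 2 as containing the object: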
(43) An upper block 400 displays a plurality of video frames 410, 420, 430, and 440 of an input video according to lapse of time. The upper block 400 includes only four video frames 410, 420, 430, and 440 for convenience of description, but more video frames may be included in the input video.
(44) A video frame located at a left side is temporally earlier than a video frame located at a right side. Therefore, the video frame 410 that is located at the leftmost side is the earliest video frame, and the video frame 440 that is located at the rightmost side is the latest video frame.
(45) The upper block 400 is divided into two sections based on a dotted line at a center portion thereof. A left section of the dotted line includes the video frames 410 and 420. A right section of the dotted line includes the video frames 430 and 440. Each of the sections has a representative video frame.
(46) If a certain sound is detected from the audio in each section, the representative video frame may be determined based on a location where the certain sound is detected.
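Sound-based selection amounts to mapping the detection time back to the nearest video frame. The helper below is a minimal sketch under that assumption; how the sound itself is detected is outside its scope.

```python
def frame_at_time(t_s: float, fps: float) -> int:
    """Nearest frame index to time t_s (seconds) at the given frame rate —
    used to pick the representative frame where a certain sound occurred."""
    return round(t_s * fps)
```

For example, a sound detected at 2.5 seconds in a 24 fps video maps to frame 60.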
(49) An upper block 500 displays a plurality of video frames 510, 520, 530, and 540 of an input video according to lapse of time.
(50) The upper block 500 is divided into two sections based on a dotted line on a center portion thereof. A left section of the dotted line includes the video frames 510 and 520. A right section of the dotted line includes the video frames 530 and 540. One representative video frame is determined in each of the sections.
(51) When the video frame 510 at the leftmost side is displayed, the voice "hello" is detected. When the video frame 520 is displayed, the voice "how are you doing?" is detected. When the video frame 530 is displayed, the voice "not bad" is detected. When the video frame 540 is displayed, the voice "and you?" is detected.
(52) For the slide video, the voice detected in the section corresponding to a representative video frame may be recognized, and text corresponding to the recognized voice may be displayed with that representative video frame. For example, a representative video frame 560 is displayed with a text "hello, how are you doing?" 565, a combination of the voices detected from the video frames 510 and 520, on a lower portion thereof. A representative video frame 570 is displayed with a text "not bad, and you?" 575, a combination of the voices detected from the video frames 530 and 540, on a lower portion thereof.
(54) If text information about the audio of the input video is acquired with the input video, text corresponding to the detected voice may be extracted from the text information and displayed, instead of recognizing the detected voice.
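Combining the per-frame voice strings of a section into the caption shown under its representative frame can be sketched as a simple join. The comma separator matches the example captions in the text; the function name is an assumption.

```python
def section_caption(voices: list) -> str:
    """Combine the voice strings recognized within one section into the
    caption displayed under that section's representative frame."""
    return ", ".join(v.strip() for v in voices if v.strip())
```

Applied to the example above, the first section's caption becomes "hello, how are you doing?".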
(56) In operation S610, an input video including a plurality of video frames and audio frames is acquired. Information about viewers' preferences with respect to the video frames of the input video may be acquired with the input video. Also, text information about the input video may be acquired.
(57) In operation S620, the input video is divided into one or more sections. The input video may be divided so that the sections have equal lengths.
(58) In operation S630, a representative video frame from among the video frames included in each of the sections is determined with respect to each of the one or more sections. If viewer preference information including reactions of viewers with respect to the video frames included in the input video is provided with the input video, the representative video frame may be determined based on the viewer preference information.
(59) Also, the representative video frame may be determined by analyzing at least one of the audio and the video frames. For example, if a certain sound is detected from the audio, the representative video frame may be determined based on a location where the certain sound is reproduced. As another example, if a video frame in the input video includes a certain object, the video frame including the certain object may be determined as the representative video frame.
(60) In operation S640, each of the representative video frames is synchronized with the audio, and a slide video in which each representative video frame is displayed for as long as the reproduction time of its corresponding section is acquired.
(62) The storage unit 230 may store representative video frame images 710, audio 714, and a slide video 718.
(63) The representative video frame images 710 may be compressed in file formats such as jpg or gif. File names of the representative video frame images 710 may be determined to be related to the file name of the slide video 718.
(64) The audio 714 included in the input video may be stored as an audio file having an extension such as wav, mp3, or wma. A file name of the audio 714 may be determined to be related to the file name of the slide video 718.
(65) The slide video 718 may be stored as a file having a predetermined extension.
(66) Alternatively, the slide video file may include information about a location where the data of the representative video frames and the audio is stored, instead of including the data of the representative video frames and the audio itself. For example, A.PLT 750 may include information about the locations of the files A_001.jpg, A_002.jpg . . . 734 and A.wav 744.
(67) The representative video frame images 710, the audio 714, and the slide video 718 are not necessarily stored in the same storage device, but may be distributed to be stored.
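The related-file-name convention described above can be sketched as deriving all names from the slide video's base name. The extensions follow the examples in the text (jpg, wav, PLT); the zero-padded three-digit numbering is an assumption based on the names A_001.jpg and A_002.jpg.

```python
def slide_video_filenames(base: str, num_frames: int) -> dict:
    """Derive the related file names for storage: numbered representative-
    frame images, one audio file, and the slide-video file."""
    return {
        "frames": [f"{base}_{i:03d}.jpg" for i in range(1, num_frames + 1)],
        "audio": f"{base}.wav",
        "slide": f"{base}.PLT",
    }
```

For a slide video named A with two representative frames, this yields A_001.jpg, A_002.jpg, A.wav, and A.PLT.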
(73) The exemplary embodiments can be written as computer programs and can be implemented in general-use digital computers that execute the programs using a computer readable recording medium. Examples of the computer readable recording medium include magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.), optical recording media (e.g., CD-ROMs, or DVDs), etc.
(74) It should be understood that the exemplary embodiments described herein should be considered in a descriptive sense only and not for purposes of limitation. Descriptions of features or aspects within each exemplary embodiment should typically be considered as available for other similar features or aspects in other exemplary embodiments.
(75) While one or more exemplary embodiments have been described with reference to the figures, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope as defined by the following claims.