METHOD AND DEVICE FOR EMULATING CONTINUOUSLY VARYING FRAME RATES
20180025686 · 2018-01-25
Inventors
- Krzysztof TEMPLIN (Saarbrücken, DE)
- Karol MYSZKOWSKI (Saarbrücken, DE)
- Hans-Peter Seidel (St. Ingbert, DE)
- Piotr DIDYK (Homburg, DE)
Cpc classification
G09G2320/0247
PHYSICS
H04N21/440281
ELECTRICITY
H04N21/440245
ELECTRICITY
G09G3/2092
PHYSICS
H04N5/2625
ELECTRICITY
G09G3/20
PHYSICS
H04N5/262
ELECTRICITY
G09G2340/0435
PHYSICS
H03H17/0621
ELECTRICITY
H04N7/0127
ELECTRICITY
International classification
G09G3/20
PHYSICS
H04N7/01
ELECTRICITY
H04N21/4402
ELECTRICITY
G06T3/40
PHYSICS
H04N5/262
ELECTRICITY
Abstract
The present invention relates to a method and a device for emulating frame rates in video or motion picture.
Claims
1. A method for emulating frame rates in a video, comprising the step of: obtaining a sequence of frames to be displayed at a presentation frame rate to a human viewer, characterized in that the sequence of frames is obtained such that an emulated frame rate of at least a region within a frame of the displayed sequence is perceived to be lower than the presentation frame rate by the human viewer.
2. The method of claim 1, wherein the emulated frame rate can be varied.
3. The method of claim 2, wherein the emulated frame rate can be varied between different regions of a frame and/or between frames.
4. The method of claim 1, wherein the emulated frame rate can be varied continuously.
5. The method of claim 1, wherein a difference between sampling times of some region of consecutive frames varies periodically.
6. The method of claim 5, wherein said difference either equals zero or belongs to a set of at least two, strictly greater than zero, pair-wise different parameters D and within said period all parameters from D are used at least once.
7. The method of claim 6, wherein D contains exactly two parameters, wherein each parameter is used exactly once within said period, and the distance between the two occurrences of parameters from D is equal to half the period length.
8. The method of claim 7, wherein said period has length exactly 2, 4 or 8 frames.
9. The method of claim 6, wherein said difference within said period on average equals the inverse of said presentation frame rate.
10. The method of claim 1, wherein a frame is obtained based on a shutter angle of a camera.
11. The method of claim 1, wherein a frame is obtained by sampling from a sequence of input frames.
12. The method of claim 1, wherein sampling a frame from the sequence of input frames comprises interpolating between two subsequent input frames.
13. The method of claim 1, wherein the frames are obtained by controlling capture times of a video camera.
14. The method of claim 1, wherein the frames are obtained by rendering.
15. The method of claim 1, wherein a frame is obtained based on a displacement parameter (δ).
16. The method of claim 9, wherein the displacement parameter (δ) is set automatically.
17. The method of claim 9, wherein the displacement parameter (δ) is set by a user.
18. The method of claim 1, wherein the veridical frame rate is 48 fps, 60 fps, 96 fps, 120 fps or 144 fps.
19. The method of claim 1, implemented on a computer.
20. The method of claim 1, wherein the sequence of frames corresponds to a film shot.
21. A non-volatile medium, storing a video generated by a method according to claim 1.
22. A computer program product, comprising instructions that, when executed by a computer, implement a method according to claim 1.
23. A video camera, wherein a capture time of a frame is controlled in order to obtain a sequence of frames to be displayed at a presentation frame rate to a human viewer, characterized in that the sequence of frames is obtained by controlling the capture time such that an emulated frame rate of at least a region within a frame of the displayed sequence is perceived to be lower than the presentation frame rate by the human viewer.
Description
[0011] These and other aspects of the present invention will be more readily understood when studying the following detailed description of the invention in relation to the annexed drawings.
[0023] The acquisition (i.e., sampling) of a given motion picture frame can be modeled as a convolution of a continuous, time-dependent signal $S$ with a rectangular filter. The temporal support of the filter is proportional to the normalized shutter $w$ (the shutter angle in degrees divided by 360) and inversely proportional to the frame rate $f$, so that the filter, denoted $\mathrm{rect}_{f,w}$, spans $w/f$ seconds.
[0024] The temporal sampling positions are always distributed uniformly: for a given frame rate $f$, the sampling time of frame $I_k$ is described by the function $T_f : \mathbb{N} \to \mathbb{R}$, $T_f(k) = t_0 + k/f$, where $t_0$ is the sampling time of $I_0$. Using the above definitions, the sampled frame sequence is given by:
$$I_k = \int_{-\infty}^{\infty} S(t) \cdot \mathrm{rect}_{f,w}\bigl(t - T_f(k)\bigr)\,dt.$$
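The sampling model of paragraphs [0023] and [0024] can be illustrated with a short Python sketch. It assumes that the rectangular filter is centered on the sampling time and normalized to unit area; neither is stated in the text, and the function names and the midpoint-rule integration are illustrative choices only.

```python
def sampling_time(k, f, t0=0.0):
    """Uniform sampling time T_f(k) = t0 + k / f of frame k."""
    return t0 + k / f


def sample_frame(signal, k, f, w, t0=0.0, steps=1000):
    """Approximate I_k as the average of signal(t) over a box of length w / f.

    Assumption: the box is centered on T_f(k) and has unit area.
    """
    width = w / f
    start = sampling_time(k, f, t0) - width / 2.0
    dt = width / steps
    # Midpoint rule: evaluate the signal at the center of each sub-interval.
    total = sum(signal(start + (n + 0.5) * dt) for n in range(steps))
    return total / steps
```

For a signal that is linear in time, such as `lambda t: t`, the sampled value equals the sampling time itself, which makes the sketch easy to check by hand.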
[0026] Given a display which operates at f frames per second, a sequence corresponding to the signal $S$ sampled at rate f can be presented directly. It is also straightforward to present content at frame rates lower than f that result from dividing the presentation frame rate by a positive integer (i.e., f/2, f/3, f/4, ...). To this end, it is enough to repeat every frame a fixed number of times, which formally means that for a number of consecutive frames the sampling position of signal $S$ does not change. For instance, to emulate the (f/2)-fps rate, every sampling position is used twice, which corresponds to a modification of $T_f$ in which each odd frame reuses the sampling time of the preceding even frame.
[0027] Note that this leads to a situation in which the acquisition times of odd frames do not exactly correspond to their presentation times (see the annexed drawings).
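A minimal Python sketch of frame repetition as described in [0026] follows. The function name and the `repeats` parameter are hypothetical; the sketch assumes that repeated frames reuse the sampling time of the first frame in their group.

```python
def repeated_sampling_time(k, f, repeats=2, t0=0.0):
    """Sampling time when each sampling position is used `repeats` times."""
    # Integer division snaps frame k back to the first frame of its group.
    return t0 + (k // repeats) * repeats / f


# Emulating 24 fps on a 48-fps display: frames 0 and 1 share one sampling
# time, frames 2 and 3 share the next one, and so on.
times = [repeated_sampling_time(k, 48.0) for k in range(4)]
```

With `repeats=2`, the odd frames are acquired one display period earlier than they are presented, which is the mismatch noted in [0027].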
[0028] The above example is a special case of the more general solution that repeats some, but not all, sampling positions. Such a technique can be used to emulate arbitrary frame rates, and in fact, it is routinely used by most video players, which repeat certain frames when required to play content of a lower frame rate on a display with a higher frame rate. This approach, however, introduces additional, unwanted temporal frequencies, causing non-smooth motion (video stutter), which is easily spotted by the observer. For example, one can emulate a 40-fps display at the 48-fps playback rate by repeating every fifth sampling position, but this results in objectionable 8 Hz stutter.
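The 8 Hz figure in [0028] can be checked with a few lines of Python. The sketch assumes the player always shows the latest content frame available at each display refresh; the function names are illustrative.

```python
def source_indices(display_fps, content_fps, n_frames):
    """Index of the content frame shown at each display frame."""
    return [(k * content_fps) // display_fps for k in range(n_frames)]


def repeats_per_second(display_fps, content_fps):
    """Count how often a content frame is shown twice in a row within one second."""
    idx = source_indices(display_fps, content_fps, display_fps + 1)
    return sum(1 for a, b in zip(idx, idx[1:]) if a == b)


stutter_hz = repeats_per_second(48, 40)
```

Here `stutter_hz` evaluates to 8: one of every six displayed frames is a repeat, so the repetition pattern recurs at 48 - 40 = 8 Hz.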
[0030] The inventive method overcomes the above limitations and enables emulation of arbitrary frame rates below the display frame rate. An important feature of the solution is that the frame rate can be smoothly varied over the spatial and temporal domain without introducing visible artifacts. For clarity of exposition, it is described how to interpolate between f/2 and f frames per second, where f is the display frame rate. The generalization of the technique to lower frame rates is discussed later.
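[0031] The key observation is that the difference between the extreme cases of f fps and f/2 fps is the position of the odd sampling kernels (see the annexed drawings). One can therefore interpolate between the two cases by displacing the odd kernels according to a displacement parameter $\delta \in [0,1]$, which yields a displaced sampling function $T_f^{\delta}$.
[0032] Note that $\delta = 0$ and $\delta = 1$ provide the sampling for the f-fps and the (f/2)-fps case, respectively, i.e., $T_f^{0} \equiv T_f$, and $T_f^{1}$ coincides with the frame-repeating modification of $T_f$.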
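The displaced sampling function is not reproduced in this text, so the following Python sketch is a reconstruction from the two boundary cases stated in [0032]. It assumes that the displacement parameter (written `delta` in the code) shifts each odd kernel linearly towards the preceding even kernel.

```python
def displaced_sampling_time(k, f, delta, t0=0.0):
    """Sampling time with odd kernels pulled back by delta frame periods.

    Assumption: odd frames move towards the preceding even frame, so that
    delta = 1 reproduces frame doubling and delta = 0 the uniform sampling.
    """
    return t0 + (k - delta * (k % 2)) / f
```

Intermediate values of `delta` place the odd kernels between their f-fps and (f/2)-fps positions, which is what allows in-between frame rates to be emulated.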
[0033] Although displacing kernel positions interpolates between two frame rates, the exposure time in terms of the shutter angle is not preserved, because the kernels do not change their width. To solve this problem, one may also interpolate the width of the sampling kernels using a generalized version of the sampling function, $\mathrm{rect}_{f,w}^{\gamma}$, where $\gamma \in [0,1]$ is an interpolation parameter.
[0036] Given the above definitions, one may define a new interpolated sampling with parameters $\delta$ and $\gamma$ as follows:
$$I_k^{(\delta,\gamma)} = \int_{-\infty}^{\infty} S(t) \cdot \mathrm{rect}_{f,w}^{\gamma}\bigl(t - T_f^{\delta}(k)\bigr)\,dt.$$
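As a rough illustration of the interpolated sampling, the Python sketch below combines a displaced kernel position with an interpolated kernel width. Two assumptions are made that the text does not confirm: the kernel width is interpolated linearly between w/f (width parameter 0) and 2w/f (width parameter 1), and the kernel is centered on the displaced sampling time. All names are hypothetical.

```python
def interpolated_kernel(k, f, w, delta, gamma, t0=0.0):
    """Return (center, width) of the sampling kernel of frame k.

    Assumptions: odd kernels are displaced by delta / f, and the width is
    interpolated linearly between w / f (gamma = 0) and 2 * w / f (gamma = 1).
    """
    center = t0 + (k - delta * (k % 2)) / f
    width = (1.0 + gamma) * w / f
    return center, width


def sample_interpolated(signal, k, f, w, delta, gamma, steps=1000):
    """Midpoint-rule average of signal(t) over the kernel of frame k."""
    center, width = interpolated_kernel(k, f, w, delta, gamma)
    dt = width / steps
    start = center - width / 2.0
    return sum(signal(start + (n + 0.5) * dt) for n in range(steps)) / steps
```

Under these assumptions, setting both parameters to 1 at f = 96 gives the kernel position and exposure of a 48-fps sequence with the same shutter angle.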
[0037] This interpolation technique enables smooth transition between frame rate f/2 and f fps at shutter angle w.
[0038] The construction described above does not impose any constraints on frame rate f, and in particular the same technique can be applied to a (f/2) Hz display, resulting in interpolation between the rates of (f/4) and (f/2) frames per second. The overlapping kernels of the (f/2)-fps emulation (
[0039] In the above construction, only odd sampling kernels were moved, while keeping even kernels unchanged. This results in a slight positioning error of moving objects along the motion direction, and can cause distortion of the image, particularly visible as slanting of vertical lines. To avoid this effect, an alternative implementation may displace both kernels symmetrically in opposite directions, which is achieved by modifying the function $T_f^{\delta}$ accordingly.
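The symmetric variant of [0039] might be sketched in Python as follows. The modified function is not reproduced in this text, so the split of the displacement into two equal halves is an assumption, as is the function name.

```python
def symmetric_sampling_time(k, f, delta, t0=0.0):
    """Sampling time with even and odd kernels displaced towards each other.

    Assumption: each kernel moves by delta / 2 frame periods, even frames
    forward in time and odd frames backward.
    """
    shift = delta / 2.0 if k % 2 == 0 else -delta / 2.0
    return t0 + (k + shift) / f
```

At full displacement, each even/odd pair meets halfway between the two original positions, so neither frame carries the whole positioning error.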
[0040] Although the interpolation parameters $\delta$ and $\gamma$ have been defined globally for the whole image, the above equation can be generalized to allow for spatial variation by letting each pixel assume its own $\delta$ and $\gamma$. This requires that each pixel be sampled at arbitrary time-points with a kernel of arbitrary size. In the case of rendered content, such a sampling could be incorporated directly in the renderer. Modern renderers can efficiently simulate finite-time exposure, and the only additional feature we require is that instead of using a single global temporal sampling kernel, many local sampling kernels are used. However, when only an input video is available, one needs to resample it in order to obtain the required sampling kernels. The invention proposes two solutions to this problem: an accurate but costly filtering of a densely-sampled video, or an optic-flow-based warping of a regular video.
[0041] If the temporal resolution of the input video is high (hundreds of frames per second), the re-sampling is straight-forward and can be implemented by simple temporal filtering of the input video. Each pixel of each video frame is considered independently, and its value is obtained by averaging pixel values at the corresponding position in all frames that fall within the time interval defined by the kernel. This approach introduces some temporal quantization of the sampling kernel; however, given a sufficiently high input frame rate, this error becomes negligible. The disadvantage of this approach is that generating a densely-sampled video is a costly process.
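The per-pixel temporal filtering of [0041] can be sketched in Python for a single pixel position. The function name, the rounding of the kernel to whole input frames, and the clamping at the ends of the sequence are illustrative assumptions.

```python
def filter_pixel(samples, in_fps, center_time, width):
    """Average one pixel's samples that fall inside a temporal box kernel.

    `samples[i]` is the pixel value of input frame i, taken at time i / in_fps.
    The kernel is snapped to whole input frames, which is the temporal
    quantization mentioned above. `samples` must not be empty.
    """
    center = min(max(round(center_time * in_fps), 0), len(samples) - 1)
    half = max(0, round(width * in_fps / 2.0))
    lo = max(0, center - half)
    hi = min(len(samples) - 1, center + half)
    window = samples[lo:hi + 1]
    return sum(window) / len(window)
```

For a full frame, the same averaging would be applied independently at every pixel position, with `center_time` and `width` taken from that pixel's own kernel.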
[0042] When sampling a dense input video is not possible, determining the value of a given pixel at an arbitrary time-point is not trivial. In this case, one may approximate arbitrary, spatially varying sampling kernels using frame blending followed by optic-flow-based frame warping, as described below. The preferred format of the input video for this method is a near-360-degree shutter at a relatively high f (e.g., 96). Such high-frame-rate videos are an emerging standard in the film industry, enabling synthesis of various frame rate and shutter combinations, which is achieved by dropping some of the frames of the original video and blending the remaining ones. For instance, by averaging one, two, three, or four consecutive frames, one obtains the corresponding frame of a 90-, 180-, 270-, or 360-degree, (f/4)-fps video, respectively. In-between shutter angles can be approximated by blending between those outputs. The sequences used in the experiment (below) were generated assuming such input. Applying this method is also possible for lower-frame-rate videos: for instance, when the input video is a 24-fps, 90-degree one, it can be temporally up-sampled to 96 fps, 360 degrees, using frame interpolation. Depending on the initial frame rate and shutter angle combination, different kernel sizes can be reproduced with varying degrees of accuracy. At the very least, the input video can be temporally up-sampled ignoring the shutter angle, and a simplified version of the procedure below can be implemented, with the first step (frame blending) omitted.
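The frame-averaging example in [0042] can be written as a small Python sketch. Frames are represented as flat lists of pixel values for simplicity, and the function name and the `step` parameter are hypothetical.

```python
def synthesize_shutter(frames, n_avg, step=4):
    """Build an (f / step)-fps sequence from an f-fps, 360-degree sequence.

    Every `step`-th frame starts an output frame, which is the average of
    `n_avg` consecutive input frames (shutter angle = 360 * n_avg / step).
    Each frame is a flat list of pixel values.
    """
    if not 1 <= n_avg <= step:
        raise ValueError("n_avg must be between 1 and step")
    output = []
    for start in range(0, len(frames) - step + 1, step):
        group = frames[start:start + n_avg]
        # Average the pixel values position by position.
        output.append([sum(px) / n_avg for px in zip(*group)])
    return output
```

With `step=4`, choosing `n_avg` of 1, 2, 3, or 4 corresponds to the 90-, 180-, 270-, and 360-degree cases named in the paragraph.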
[0043] Let $V_k$ denote the k-th frame of the f-fps, 360-degree input video, $K_k : \mathbb{N}^2 \to \mathbb{R}^{+}$ and $D_k : \mathbb{N}^2 \to [0,1]$ the maps of kernel sizes and displacements, respectively, and $F_k, B_k : \mathbb{N}^2 \to \mathbb{Z}^2$ the corresponding forward and backward optic flow maps (in our experiments we used the technique by Brox et al. [2004] to estimate these). The value $K_k(i, j)$ is the integration time for frame k and the pixel position $(i, j)$ in seconds multiplied by $1/f$, and the value $D_k(i, j)$ is the displacement parameter $\delta$ for that pixel.
[0044] The method proceeds in two steps. First, one takes an input frame corresponding to the desired presentation time, and locally blends it with neighboring frames to approximate the required kernel size (pixel indexing is omitted for clarity; all operations are performed pixel-wise), using the helper function $\mathrm{clamp}(x, a, b) = \min(\max(a, x), b)$.
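The blending formula itself is not reproduced in this text, so the Python sketch below shows only one plausible way to use the clamp helper for this step. It assumes a one-sided blend in which each successive frame contributes the part of the kernel that overlaps it; the weighting scheme and the name `blend_pixel` are hypothetical and may differ from the inventors' formula.

```python
def clamp(x, a, b):
    """Limit x to the interval [a, b]."""
    return min(max(a, x), b)


def blend_pixel(values, kernel_size):
    """Blend one pixel across consecutive frames to mimic a longer exposure.

    `values[n]` is the pixel in the n-th frame counted from the current one,
    and `kernel_size` is the exposure in units of one input frame.
    Assumption: frame n gets weight clamp(kernel_size - n, 0, 1).
    """
    kernel_size = max(kernel_size, 1.0)  # a single frame is the shortest exposure
    weights = [clamp(kernel_size - n, 0.0, 1.0) for n in range(len(values))]
    total = sum(weights)
    return sum(wt * v for wt, v in zip(weights, values)) / total
```

A kernel size of 2.5, for example, weights the current and the next frame fully and the frame after that by one half.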
[0045] Second, one warps the frame by re-projecting each pixel to its position in the past or in the future (depending on whether the frame is even or odd), with the time-point being determined by the desired kernel displacement at the given pixel.
[0046] The arrow notation $\hat{V}_k(i, j) \rightarrow \hat{V}_k(i', j')$ means that the pixel in the input image at the position $(i, j)$ is warped to the position $(i', j')$ in the output image.
[0047] After the warping, the actual kernel at any given position in $\hat{V}_k$ is not exactly equal to that given by $K_k$ and $D_k$ for that position, but under the assumption that the kernel displacement/size and optical flow are locally constant, the outcome is equivalent to the filtering solution. Since this method blends few frames to approximate different kernel sizes, its accuracy in this respect is admittedly lower when compared to the dense video approach. However, it has the advantage of a relatively low computation cost, enabling a real-time implementation, e.g., in TV sets or computer games.
[0048] In order to investigate the perceptual effect of the inventive interpolation technique, one may establish a mapping between combinations of actual frame rates and shutter angles and the interpolation parameters $\delta$ and $\gamma$ in the range 24-96 fps. Although the inventive technique is not limited to f=96, it is believed that this is the most interesting scenario for the method, because it allows for an exact emulation of both standard 24 fps and HFR 48 fps. The mapping was derived in the following calibration experiment.
[0049] Ten subjects, including two authors, took part in the experiment. An Asus PG278Q display (27 inch diagonal, native resolution 2560×1440 px, maximum refresh rate 144 Hz) and an Nvidia GeForce GTX 970 graphics card were used. This configuration supports Nvidia G-Sync technology, which enables the system to refresh the display as soon as the frame has been rendered, without waiting for the next refresh cycle of the display. Thus, by putting the process to sleep for an appropriate number of milliseconds, the display could be set programmatically to any frame rate below 144 Hz on the fly. The subjects were seated ca. 50 cm from the display, but were allowed to freely change their position. The experiment was conducted in controlled office lighting conditions.
[0050] The stimulus was a vertical 100×1440 px light-gray bar moving left-to-right on a dark-gray background. When the bar reached the right end of the display, the motion was restarted from the left end of the display. The subjects could alternate between the reference bar and the test bar by pressing the left and the right arrow key, respectively. Both bars were moving with velocity $v \in \{256\ \text{px/s}, 512\ \text{px/s}, 1024\ \text{px/s}\}$. The reference bar was displayed with veridical frame rate $f_r \in \{29, 34, 40, 68\}$ and normalized shutter angle $s_r \in \{0.25, 0.5, 0.75\}$. The test bar was always displayed using our technique at frame rate $f_t = 96$ fps. Kernel displacement of the test bar could be adjusted via parameter $d \in [1,4]$ by pressing the plus and the minus key, and shutter angle $s_t$ could be adjusted in the range of $[0,4]$ by pressing the [ and ] keys. Values of $d \in [1,2]$ corresponded to $\delta \in [0,1]$, whereas values of $d \in [2,4]$ corresponded to $\delta \in [0,1]$ assuming a virtual frame rate of f/2 = 48 fps achieved by joint displacement of overlapping kernels. In a single trial, the participant was asked to adjust the kernel displacement $d$ and shutter angle $s_t$ of the test bar so that its appearance matched the appearance of the reference bar as closely as possible, and confirm the settings with the Enter key. The whole session consisted of all $3 \cdot 4 \cdot 3 = 36$ possible trials in random order, and the time to perform the task was not limited. No test was done for $f_r \in \{24, 48, 96\}$ since the method can emulate these rates exactly.
[0052] As can be seen, $d$ is approximately inversely proportional to the reference frame rate; however, for 34 and 40 fps this value tends to be lower. This is accompanied by significantly increased blur in comparison to what would be predicted by simple matching of the absolute exposure time. In our experience, the most important factor determining the similarity of the two bars for frequencies between 24 and 48 fps was the perceived intensity of judder at the bar edges.
[0055] In other words, the displacement values at the black solid line in
[0056] When the frame rate of the stimulus exceeds the critical flicker frequency, the changing signal is averaged by the visual system, and the bar appears blurred (so-called hold-type blur). Thus, for the highest frame rate (68 fps), the dominant parameter is the amount of blurring at the edges, since virtually no judder is visible in this case.
[0057] The obtained data points can be interpolated and used to define an improved correspondence between the intended frame rate and the interpolation parameters $\delta$ and $\gamma$.
[0058] In order to show that the inventive frame rate emulation leads to a similar appearance for real-world content, a perceptual evaluation experiment is presented in which the proposed technique is compared against two baseline methods. Sixteen naive, non-expert, paid subjects took part in the experiment. All had normal or corrected-to-normal vision. The experimental setup was the same as in the calibration experiment.
[0059] Three real-world video sequences were used as stimuli. The reference sequence was rendered using veridical frame rates $f_r \in \{29, 34, 40, 68\}$ and shutter $s_r \in \{f_r/96,\ 2 \cdot f_r/96\}$ (except for $f_r = 68$, where only $s_r = 68/96$ was used). The rendering of different frame rates and shutter angles was achieved by interpolation and averaging of consecutive frames of the original 96 fps, near-360-degree videos. The test sequences were synthesized using our technique at frame rate $f_t = 96$ fps, with displacement $d$ and shutter $s_t$ locally adjusted according to the velocities in the video, as determined in the calibration experiment (see the annexed drawings).
[0060] The subjects could switch between the reference, test, and the comparison sequence using the arrow keys, with the Up key corresponding to the reference bar, and the Left/Right keys corresponding to the test and comparison sequence in random arrangement. In a single trial, the subject was asked to select one of the two sequences that looked more similar to the reference sequence and confirm the choice with the Enter key. One session consisted of all 42 possible trials in random order. The subjects had unlimited time to complete the experiment.
[0061] Before the experiment, a control session was performed in which the frame rate of the reference and the test sequence was set to either 24, 48, or 96 fps and the comparison sequence was set to one of the remaining two frame rates (thus the test sequence was identical to the reference, while the comparison sequence had a significantly different frame rate). Two of the subjects were unable to perform above the chance level in this setting and were subsequently excluded from our analysis.
[0063] In general, the inventive technique turned out to be more similar to the reference than the baseline sequences. The baseline methods used the nearest standard cinematic frame rates and had a matching amount of blur, which can be considered the state of the art in terms of matching the film look. There were only two cases where our method performed significantly worse than the baseline, both at higher frame rates, and one of them at 68 fps, where judder is practically invisible, and the only difference in appearance can be attributed to the blur profile. The results of this experiment prove that our technique provides a very good approximation of the look of other frame rates.
[0064] The inventive technique requires sampling the scene at arbitrary times with a kernel of arbitrary size. In the case of real-world content, an emerging standard is to film the scene at 120 Hz with a nearly 360-degree shutter to enable synthesis of several frame rate and shutter combinations. This temporal resolution might not be sufficient to smoothly interpolate between various sampling kernels; however, it is high enough to estimate optical flow quite reliably and thus to obtain the required level of precision via frame interpolation. If required, varying shutter size can be obtained by adding appropriate amounts of blur along the motion direction. In the case of rendered content, achieving such sampling is straightforward and could be incorporated directly in the renderer. Alternatively, content can be rendered with a very high frame rate and the required frames can be synthesized in a post-process.
[0065] The invention can be applied by an artist to apply accurate, manual tweaks to the video, based on his or her artistic vision. With standard techniques, the artist is forced to choose from a very limited set of possible frame rates. The benefits of smooth spatial frame rate variation compared to simple combination of two frame rates are clear: In the two-frame-rates approach, one needs to carefully decompose the scene into layers (figure-background) to avoid artifacts at the locations of the framerate seams. Such a solution, however, may lead to significant artifacts when the decomposition is imperfect. In contrast, in our approach it is enough to scribble a mask with a soft brush, and the interpolation will produce seamless results. Similarly, smooth temporal variation of the frame rate can help make the moment of transition unnoticeable when an abrupt frame-rate change is not desired.
[0066] In another application, the velocities within the frame can be automatically analyzed and the appropriate frame rate can be applied locally. For instance, depending on the camera parameters such as focal length and frame rate there are certain recommendations as to the maximum comfortable on-screen speed of any object in the scene [Hummel 2002, p. 887]. The rule of thumb is that at 24 frames per second no object should cross the entire screen in under 7 seconds, and that the maximum allowable speed is proportional to the frame rate [Samuelson 2014, p. 314]. Using these guidelines, the inventive technique can automatically minimize the frame rates across the screen in order to maximize the cinematic look, yet without introducing objectionable artifacts. Conversely, by emulating higher frame rates more dynamic scene changes can be locally allowed, while overall 24 frames per second are maintained.
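The rule of thumb quoted in [0066] translates into a simple formula, sketched here in Python. The function names are hypothetical, and the 24-96 fps clamping range is an assumption taken from the range discussed in the calibration experiment.

```python
def min_comfortable_fps(crossing_time_s, base_fps=24.0, base_crossing_s=7.0):
    """Lowest frame rate at which an object may cross the screen in the given time.

    Uses the rule of thumb quoted above: at 24 fps a full-screen crossing should
    take at least 7 s, and the allowable speed scales linearly with frame rate.
    """
    return base_fps * base_crossing_s / crossing_time_s


def choose_emulated_fps(crossing_time_s, lowest=24.0, highest=96.0):
    """Clamp the minimum comfortable frame rate to the range the display can emulate."""
    return min(max(lowest, min_comfortable_fps(crossing_time_s)), highest)
```

An object that crosses the screen in 3.5 seconds would thus call for at least 48 fps, while anything slower than a 7-second crossing can stay at 24 fps.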
[0067] In a further embodiment of the invention, the method may also be used for stereoscopic presentation. The image separation protocols between eyes, for example in time-sequential shutter glasses, which might cause additional motion perception artifacts, are taken into consideration.
[0068] Appendix A is a Matlab program implementing a method according to claim 1.
TABLE-US-00001
% Input frame rate - the temporal resolution of the input sequence that has
% been pre-interpolated from a regular sequence (24fps, 48fps, etc.)
% or pre-rendered. This frame rate is assumed to be high enough to
% approximate fully continuous temporal sampling.
% Alternatively one could interpolate frames on-the-fly within the script
% using optic flow to obtain arbitrary precision.
infr = 480;

% Intended frame rate - the frame rate of the display system.
% We will emulate all frame rates between outfr/2 and outfr
% but the real output will be always at frame rate outfr
outfr = 48;

skip = infr/outfr;
assert(mod(skip, 2) == 0) % (infr / outfr must be divisible by 2)

% Input sequence
startframe = 0;
endframe = 5759;
framesdir = '.\results\tos\interpolated1\';

% Frame rate masks - kernel displacement for given time and location.
% Black means full displacement (frames are doubled; frame rate outfr/2),
% white means no displacement (frames are at correct positions; frame rate outfr).
% Grey levels - emulation of fractional displacements (in-between frame rates).
maskdir = '.\tos1_mask\';

% Output directory
outdir = '.\tos1_out_test\';

% Current output frame number - we start from 2 to have some margin for
% sampling the past.
ff = 2;

% In each iteration we output 2 frames
for f = startframe+ff*skip:2*skip:endframe-2*skip+1

    % Read a chunk of frames (f-skip/2+1, ..., f+skip/2)
    C = {};
    for i=1:skip
        C{i} = im2double(imread(sprintf('.\\%s\\%04d.jpg', framesdir, f+i-skip/2)));
    end

    % At first both frames are the same (frame rate is outfr/2)
    F1 = C{skip/2};
    F2 = C{skip/2};

    % Read frame rate mask for the current time
    M = im2double(imread(sprintf('.\\%s\\%04d.jpg', maskdir, ff/2-1)));

    % Progressively replace parts of the output frames with
    % less displaced kernels according to the frame rate masks.
    for i=2:2:skip-2
        frac = i/skip;
        B = (M >= frac);
        % We assume that we keep fixed absolute exposure time
        % (as in the input sequence), hence we assign values from a single
        % image in C. If interpolation between different exposures is also
        % needed one needs to average multiple images from C, add blur
        % on-the-fly according to optic flow, or provide
        % an input sequence that has already additional blur factored in
        F1(B) = C{skip/2-i/2}(B);
        F2(B) = C{skip/2+i/2}(B);
    end

    % Output the two frames
    imwrite(F1, sprintf('.\\%s\\%04d.jpg', outdir, ff), 'Quality', 98);
    ff = ff + 1;
    imwrite(F2, sprintf('.\\%s\\%04d.jpg', outdir, ff), 'Quality', 98);
    ff = ff + 1;
end