Computer-implemented method for reducing video latency of a computer video processing system and computer program product thereto

10593299 · 2020-03-17

Abstract

The invention relates to a computer-implemented method for reducing video latency and computer program product thereto for a computer video processing system. Two separate threads, an input thread and an output thread, are created and configured to run simultaneously and independently from each other. The input thread is configured to process video input frames that may be split into a plurality of input slices. The output thread is configured to process video output frames. The video output frames may also be split into a plurality of output slices.

Claims

1. A computer-implemented method for reducing video latency of a computer video processing system comprising at least one video input source, at least one processor, at least one memory including a computer program code, at least one video input card, at least one graphics processing unit known as GPU and at least one video output display, the method comprising: creating an input thread and an output thread by the at least one processor; configuring the input thread and the output thread to run simultaneously and independently from each other by the at least one processor; choosing a manner of splitting at least one video input frame received from the at least one video input source via the at least one video input card into a plurality of input slices from S.sub.i1 to S.sub.in where a single input slice is known as S.sub.ix by the at least one processor on the input thread; and choosing a manner of splitting at least one video output frame into a plurality of output slices from S.sub.O1 to S.sub.On where a single output slice is known as S.sub.ox via the at least one GPU by the at least one processor on the output thread, wherein the method further comprises on the input thread calculating a start time and an end time for each single input slice S.sub.ix of the input slices S.sub.i1 to S.sub.in of the at least one input frame received from the at least one video input card by the at least one processor, locating vertical blanking interval for the at least one video input card by the at least one processor, receiving at least one single input slice S.sub.ix of the plurality of input slices from S.sub.i1 to S.sub.in from the at least one video input source via the at least one video input card by the at least one processor until all input slices from S.sub.i1 to S.sub.in have been received, and sending the received at least one single input slice S.sub.ix to the output thread by the at least one processor until all input slices from S.sub.i1 to S.sub.in have been 
sent to the output thread; and simultaneously on the output thread calculating a start time and an end time for each output slice S.sub.ox of S.sub.O1 to S.sub.On for the at least one GPU by the at least one processor, configuring the at least one GPU to draw directly to a front buffer by the at least one processor, calculating a required latency on the basis of positioning the at least one video input frame within the at least one video output frame by the at least one processor, locating vertical blanking interval for the at least one GPU by the at least one processor, receiving the at least one single input slice S.sub.ix of the input slices S.sub.i1 to S.sub.in sent from the input thread by the at least one processor, calculating a required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising at least one of the input slices from S.sub.i1 to S.sub.in for drawing output slices S.sub.O1 to S.sub.On for the at least one GPU by the at least one processor on the basis of positioning of the at least one video input frame within the at least one video output frame, waiting until the input thread has received all the plurality of input slices from S.sub.i1 to S.sub.in from the at least one video input card by the at least one processor; and drawing by the at least one processor the required input slices S.sub.i1 to S.sub.in for the output slices S.sub.O1 to S.sub.On for the at least one GPU where a single output slice S.sub.o(x+1) consisting of the required input slices from S.sub.i1 to S.sub.in is drawn before the at least one GPU completes sending a single output slice S.sub.ox of output slices from S.sub.O1 to S.sub.On corresponding to the required at least one input slice S.sub.iy to the at least one video output display until the last output slice S.sub.on corresponding to the required at least one input slice S.sub.iy is sent by the at least one GPU to the at least one video output display.

2. The method according to claim 1, wherein the method further comprises calculating on the input thread the start time and the end time for each input slice from S.sub.i1 to S.sub.in of the at least one input frame received from the at least one video input source via the at least one video input card by the at least one processor within a period of the at least one video input frame.

3. The method according to claim 2, wherein the method further comprises calculating on the output thread the start time and an end time for each output slice from S.sub.o1 to S.sub.on of the at least one output frame by the at least one processor for the at least one GPU within a period of the at least one video output frame.

4. The method according to claim 1, wherein the method further comprises calculating on the input thread the start time and an end time for each input slice from S.sub.i1 to S.sub.in of the at least one input frame received from the at least one video input source via the at least one video input card by the at least one processor on the basis of information obtained from the at least one video input card.

5. The method according to claim 4, wherein the method further comprises calculating on the output thread the start time and an end time for each output slice from S.sub.o1 to S.sub.on of the at least one output frame by the at least one processor for the at least one GPU on the basis of information obtained from the at least one GPU.

6. The method according to claim 1, wherein the method further comprises on the output thread drawing a background content for each output slice S.sub.O1 to S.sub.On by the at least one processor on the front buffer via the at least one GPU.

7. The method according to claim 6, wherein the method further comprises on the output thread drawing a foreground content for each output slice S.sub.O1 to S.sub.On by the at least one processor on the front buffer via the at least one GPU.

8. The method according to claim 6, wherein the method further comprises on the output thread verifying that drawing is finished within an allowable time limit by the at least one processor wherein the allowable time limit is defined so that if the at least one GPU starts sending the single output slice S.sub.ox of the output slices from S.sub.o1 to S.sub.on to the at least one video output display before drawing the background content, the foreground content and/or the required at least one input slice S.sub.iy of the plurality of input slices from S.sub.i1 to S.sub.in corresponding to the single output slice S.sub.ox of the output slices from S.sub.o1 to S.sub.on is completed, then the allowable time limit has been exceeded.

9. A computer program product on a non-transitory media for reducing video latency of a computer video processing system comprising at least one video input source, at least one processor, at least one memory including a computer program code, at least one video input card, at least one graphics processing unit known as GPU and at least one video output display, the computer program product comprising: a computer readable code for creating an input thread and an output thread by the at least one processor; a computer readable code for configuring the input thread and the output thread to run simultaneously and independently from each other by the at least one processor; a computer readable code for choosing a manner of splitting at least one video input frame received from the at least one video input source via the at least one video input card into a plurality of input slices from S.sub.i1 to S.sub.in where a single input slice is known as S.sub.ix by the at least one processor on the input thread; and a computer readable code for choosing a manner of splitting at least one video output frame into a plurality of output slices from S.sub.O1 to S.sub.On where a single output slice is known as S.sub.ox via the at least one GPU by the at least one processor on the output thread, wherein the computer program product further comprises a computer readable code for calculating a start time and an end time for each single input slice S.sub.ix of the input slices S.sub.i1 to S.sub.in of the at least one input frame received from the at least one video input card by the at least one processor on the input thread, a computer readable code for locating vertical blanking interval for the at least one video input card by the at least one processor on the input thread, a computer readable code for receiving at least one single input slice S.sub.ix of the plurality of input slices from S.sub.i1 to S.sub.in from the at least one video input source via the at least one video input 
card by the at least one processor until all input slices from S.sub.i1 to S.sub.in have been received on the input thread, a computer readable code for sending the received at least one single input slice S.sub.ix to the output thread by the at least one processor until all input slices from S.sub.i1 to S.sub.in have been sent to the output thread on the input thread, a computer readable code for calculating a start time and an end time for each output slice S.sub.ox of S.sub.O1 to S.sub.On for the at least one GPU by the at least one processor on the output thread, a computer readable code for configuring the at least one GPU to draw directly to a front buffer by the at least one processor on the output thread, a computer readable code for calculating a required latency on the basis of positioning the at least one video input frame within the at least one video output frame by the at least one processor on the output thread, a computer readable code for locating vertical blanking interval for the at least one GPU by the at least one processor on the output thread, a computer readable code for receiving the at least one single input slice S.sub.ix of the input slices S.sub.i1 to S.sub.in sent from the input thread by the at least one processor on the output thread, a computer readable code for calculating a required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising at least one of the input slices from S.sub.i1 to S.sub.in for drawing output slices S.sub.O1 to S.sub.On for the at least one GPU by the at least one processor on the basis of positioning of the at least one video input frame within the at least one video output frame on the output thread, a computer readable code for waiting until the input thread has received all the plurality of input slices from S.sub.i1 to S.sub.in from the at least one video input card by the at least one processor on the output thread, and a computer readable code for drawing by the at least one processor the
required input slices S.sub.i1 to S.sub.in for the output slices S.sub.O1 to S.sub.On for the at least one GPU where a single output slice S.sub.o(x+1) consisting of the required input slices from S.sub.i1 to S.sub.in is drawn before the at least one GPU completes sending a single output slice S.sub.ox of output slices from S.sub.O1 to S.sub.On corresponding to the required at least one input slice S.sub.iy to the at least one video output display until the last output slice S.sub.on corresponding to the required at least one input slice S.sub.iy is sent by the at least one GPU to the at least one video output display on the output thread.

10. The computer program product on a non-transitory media according to claim 9, wherein the computer program product further comprises a computer readable code for calculating on the input thread the start time and the end time for each input slice from S.sub.i1 to S.sub.in of the at least one input frame received from the at least one video input source via the at least one video input card by the at least one processor within a period of the at least one video input frame.

11. The computer program product on a non-transitory media according to claim 10, wherein the computer program product further comprises a computer readable code for calculating on the output thread the start time and an end time for each output slice from S.sub.o1 to S.sub.on of the at least one output frame by the at least one processor for the at least one GPU within a period of the at least one video output frame.

12. The computer program product on a non-transitory media according to claim 9, wherein the computer program product further comprises a computer readable code for calculating on the input thread the start time and an end time for each input slice from S.sub.i1 to S.sub.in of the at least one input frame received from the at least one video input source via the at least one video input card by the at least one processor on the basis of information obtained from the at least one video input card.

13. The computer program product on a non-transitory media according to claim 12, wherein the computer program product further comprises a computer readable code for calculating on the output thread the start time and an end time for each output slice from S.sub.o1 to S.sub.on of the at least one output frame by the at least one processor for the at least one GPU on the basis of information obtained from the at least one GPU.

14. The computer program product on a non-transitory media according to claim 9, wherein the computer program product further comprises a computer readable code for drawing on the output thread a background content for each output slice S.sub.O1 to S.sub.On by the at least one processor on the front buffer via the at least one GPU.

15. The computer program product on a non-transitory media according to claim 14, wherein the computer program product further comprises a computer readable code for drawing on the output thread a foreground content for each output slice S.sub.O1 to S.sub.On by the at least one processor on the front buffer via the at least one GPU.

16. The computer program product on a non-transitory media according to claim 14, wherein the computer program product further comprises a computer readable code for verifying on the output thread that drawing is finished within an allowable time limit by the at least one processor wherein the allowable time limit is defined so that if the at least one GPU starts sending the single output slice S.sub.ox of the output slices from S.sub.o1 to S.sub.on to the at least one video output display before drawing the background content, the foreground content and/or the required at least one input slice S.sub.iy of the plurality of input slices from S.sub.i1 to S.sub.in corresponding to the single output slice S.sub.ox of the output slices from S.sub.o1 to S.sub.on is completed, then the allowable time limit has been exceeded.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The present invention will become more fully understood from the detailed description given herein below and the accompanying drawings, which are given by way of illustration only and thus are not limitative of the present invention, and wherein:

(2) FIG. 1 shows an exemplary schematic representation of a computer video processing system in the context of the present invention;

(3) FIG. 2a shows an exemplary flow chart representing basic method steps according to the invention;

(4) FIG. 2b shows an exemplary flow chart representing a sub-set of method steps according to the invention;

(5) FIG. 2c shows an exemplary flow chart representing another sub-set of method steps according to the invention.

DETAILED DESCRIPTION

(6) In the following description, the considered embodiments are merely exemplary, and one skilled in the art may find other ways to implement the invention. Although the specification may refer to "an", "one" or "some" embodiment(s) in several locations, this does not necessarily mean that each such reference is made to the same embodiment(s), or that the feature only applies to a single embodiment. Single features of different embodiments may also be combined to provide other embodiments.

(7) FIG. 1 shows an exemplary schematic representation of a computer video processing system 10 with at least one video input source 11 and at least one video output display 12 in the context of the present invention. The computer video processing system in the context of the present invention comprises the at least one video input source 11, at least one processor 103, at least one memory 104 comprising a computer program code 1041 for the computer program product according to the invention, at least one video input card 105, at least one graphics processing unit known as GPU 106 and at least one video output display 12. According to the present invention an input thread 101 and an output thread 102 are created for video content processing by the at least one processor 103 with the computer program code 1041.

(8) The at least one video input source 11 may be, for example, a camera, a video camera, a mobile device, a computer, a portable device or other such apparatus capable of sending video content over a data communication channel to the computer video processing system 10. The video content is received through the at least one video input card 105. The at least one video input card 105 sends the video content via the at least one memory 104 to the at least one GPU 106. The at least one video input card 105 may also send the video content directly to the at least one GPU 106 by the at least one processor 103. Advantageously, the at least one GPU 106 may be configured to draw directly to a front buffer containing the video content that is being sent and presented on the at least one video output display 12 at a certain point of time.

(9) The at least one video input card 105 operates on the input thread 101 side of the computer video processing system 10. The at least one GPU 106 operates on the output thread 102 side of the computer video processing system 10. The at least one memory 104 and the computer program code 1041, configured together with the at least one processor 103, may cause the at least one video input card 105 to send the video content via the at least one memory 104 to the at least one GPU 106. Alternatively, they may cause the at least one video input card 105 to send the video content directly to the at least one GPU 106 by the at least one processor 103. They may also cause the at least one GPU 106 to draw directly to a front buffer containing the video content that is being sent and presented on the at least one video output display 12 at a certain point of time.

(10) The at least one memory 104 and the computer program code 1041 are configured to, with the at least one processor 103, create the input thread 101 to receive video content from the at least one video input source 11, to process that video content, and to send it to the output thread 102. The video content is understood as a video image and may also comprise sounds and text, for example. The video content is received as one or more video input frames. The one or more video input frames may be split into a plurality of input slices from S.sub.i1 to S.sub.in where a single input slice is known as S.sub.ix by the at least one processor 103 on the input thread 101. Correspondingly, on the output thread 102 the video content is processed as one or more video output frames. The one or more video output frames may be split into a plurality of output slices from S.sub.o1 to S.sub.on where a single output slice is known as S.sub.ox by the at least one processor 103 on the output thread 102.

(11) Video latency in the computer video processing system 10 is understood as the time that is required to present video content received from the at least one video input source 11 on the at least one video output display 12. The at least one video output display 12 may be, for example, a single computer display, a television system, an arrangement of displays or other such display system.

(12) FIG. 2a shows an exemplary flow chart representing the basic method steps according to the invention. References to the components of the set-up according to FIG. 1 are made.

(13) The computer-implemented method according to the present invention is started in step 20. The computer-implemented method for reducing video latency of the computer video processing system 10 comprising the at least one video input source 11, the at least one processor 103, the at least one memory 104 including the computer program code 1041, the at least one video input card 105, the at least one GPU 106 and the at least one video output display 12 comprises at least the method steps herein described.

(14) According to step 21, two threads, an input thread 201 and an output thread 202, are created by the at least one processor 103. The input thread 201 and the output thread 202 are configured to run simultaneously and independently from each other by the at least one processor 103. The input thread 201 is configured to receive and read at least one video input frame from the at least one video input source 11 via the at least one video input card 105 by the at least one processor 103. The input thread 201 is configured by the at least one processor 103 to send the content of the at least one video input frame to the output thread 202 for drawing to the at least one video output display 12 as output frames. Correspondingly, the output thread 202 is configured by the at least one processor 103 to receive the content of the at least one video input frame from the input thread 201 and draw at least one video output frame to the at least one video output display 12 by the at least one GPU 106.

(15) According to step 2011 choosing a manner of splitting the at least one video input frame received from the at least one video input source 11 via the at least one video input card 105 into a plurality of input slices from S.sub.i1 to S.sub.in where a single input slice is known as S.sub.ix is executed on the input thread 201 by the at least one processor 103. There are different ways to split the at least one video input frame into a plurality of input slices from S.sub.i1 to S.sub.in where a single input slice is known as S.sub.ix. For example, the at least one input frame may be split into video input slices from S.sub.i1 to S.sub.in where each single input slice S.sub.ix contains M horizontal lines of video. However, it is not required for the plurality of input slices from S.sub.i1 to S.sub.in to be of the same size. Further, the video input slices from S.sub.i1 to S.sub.in do not need to consist of complete horizontal lines of video. It is assumed here that the video input slices from S.sub.i1 to S.sub.in are numbered in the order they are transmitted over a wire: the topmost single video input slice S.sub.ix is transmitted first and it will get slice number S.sub.i1 of S.sub.i1 to S.sub.in.
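The example split into slices of M horizontal lines can be illustrated with a minimal sketch. The function name split_into_slices and its parameters are hypothetical and are not part of the specification; the sketch assumes slices of complete horizontal lines, with only the last slice allowed to be shorter.

```python
from typing import List, Tuple


def split_into_slices(frame_height: int, lines_per_slice: int) -> List[Tuple[int, int]]:
    """Return (first_line, end_line_exclusive) pairs, topmost slice first.

    Hypothetical helper: slice k in the returned list corresponds to S_i(k+1),
    numbered in the order the lines are transmitted over the wire.
    """
    if frame_height <= 0 or lines_per_slice <= 0:
        raise ValueError("frame_height and lines_per_slice must be positive")
    slices = []
    for first in range(0, frame_height, lines_per_slice):
        # The last slice is clipped to the frame, so slices need not be equal in size.
        slices.append((first, min(first + lines_per_slice, frame_height)))
    return slices
```

For instance, a 1080-line frame with M = 270 yields four slices of equal size, while a height that is not a multiple of M yields a shorter final slice.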

(16) According to step 2021 choosing a manner of splitting the at least one video output frame into a plurality of output slices from S.sub.o1 to S.sub.on where a single output slice is known as S.sub.ox is executed on the output thread 202 by the at least one processor 103 via the at least one GPU 106. There are different ways to split the at least one video output frame into a plurality of output slices from S.sub.o1 to S.sub.on where a single output slice is known as S.sub.ox. For example, the at least one output frame may be split into video output slices from S.sub.o1 to S.sub.on where each single output slice S.sub.ox contains M horizontal lines of video. However, it is not required for the output slices from S.sub.o1 to S.sub.on to be of the same size. Further, the video output slices from S.sub.o1 to S.sub.on do not need to consist of complete horizontal lines of video. Further, the manner of splitting into slices may be different on the output thread 202 than on the input thread 201.

(17) The process on the input thread 201 is further described in steps 2012-2015. According to step 2012 a start time and an end time for each single input slice S.sub.ix of input slices S.sub.i1 to S.sub.in of the at least one input frame received from the at least one video input card 105 are calculated by the at least one processor 103 on the input thread 201. This calculation further comprises calculating the start time and the end time of each single input slice S.sub.ix of input slices S.sub.i1 to S.sub.in of the at least one input frame within a period of the at least one video input frame. Advantageously, some video input cards 105 also provide exact information on which image line the input signal is currently at, and this can be used to wait until a single video input slice S.sub.ix is complete.
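One possible way to place slice start and end times within a frame period is sketched below. It is an illustration under stated assumptions, not the calculation prescribed by the specification: lines are assumed to arrive at a uniform rate, total_lines is assumed to include blanking lines, and frame_start_s is assumed to be the moment the first active line begins. The name slice_times and all parameters are hypothetical.

```python
from typing import List, Tuple


def slice_times(
    slices: List[Tuple[int, int]],
    total_lines: int,
    frame_period_s: float,
    frame_start_s: float = 0.0,
) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) in seconds for each (first, end_exclusive) slice.

    Assumption: every line, active or blanking, takes the same time on the wire.
    """
    if total_lines <= 0 or frame_period_s <= 0:
        raise ValueError("total_lines and frame_period_s must be positive")
    line_time = frame_period_s / total_lines
    return [
        (frame_start_s + first * line_time, frame_start_s + end * line_time)
        for first, end in slices
    ]
```

A video input card that reports the current image line, as mentioned above, could replace this estimate with a direct wait on the line counter.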

(18) According to step 2013 the vertical blanking interval for the at least one video input card 105 is located by the at least one processor 103 on the input thread 201. There are several ways to locate the vertical blanking interval for both the input thread 201 side and the output thread 202 side. If the computer video processing system 10 is genlocked, it is known that the vertical blanking interval for both the input thread 201 side and the output thread 202 side will happen at the same time. In such a case, if the vertical blanking interval can be recognized on either the input thread 201 side or the output thread 202 side, then it will be known for both the input thread 201 and the output thread 202. Input vertical blanking interval detection depends on the driver capabilities of the at least one video input card 105. At least an approximate vertical blanking interval time can be detected by recording a timestamp when the driver notifies that the next video input frame is ready or when capturing the next video input frame is complete and then adding an empirically tested constant to the timestamp.
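The approximate detection described above (a recorded timestamp plus an empirically tested constant) might be sketched as follows. The class name VblankEstimator, its methods and the offset value are hypothetical; a real driver callback and a measured constant would take their place.

```python
import time
from typing import Optional


class VblankEstimator:
    """Estimate the input vertical blanking time from frame-ready notifications."""

    def __init__(self, empirical_offset_s: float) -> None:
        # Constant found by testing on the target hardware (assumed, not specified).
        self.empirical_offset_s = empirical_offset_s
        self._last_frame_ready_s: Optional[float] = None

    def on_frame_ready(self, timestamp_s: Optional[float] = None) -> None:
        """Call when the driver reports that the next input frame is ready."""
        if timestamp_s is None:
            timestamp_s = time.monotonic()
        self._last_frame_ready_s = timestamp_s

    def estimate(self) -> float:
        """Return the approximate vertical blanking time for the last frame."""
        if self._last_frame_ready_s is None:
            raise RuntimeError("no frame-ready notification recorded yet")
        return self._last_frame_ready_s + self.empirical_offset_s
```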

(19) According to step 2014 at least one single input slice S.sub.ix of input slices from S.sub.i1 to S.sub.in is received from the at least one video input card 105 by the at least one processor 103 for the at least one GPU 106 on the input thread 201. Further, the received at least one single input slice S.sub.ix is sent to the output thread 202 by the at least one processor 103 until all input slices from S.sub.i1 to S.sub.in have been received and sent to the output thread 202. Advantageously, a single input slice S.sub.ix of input slices from S.sub.i1 to S.sub.in is received at a time, and the received single input slice S.sub.ix is sent to the output thread 202 before another received single input slice S.sub.ix is sent to the output thread 202. In other words, each single input slice S.sub.ix is sent to the output thread 202 one by one by the at least one processor 103 until all input slices from S.sub.i1 to S.sub.in have been received and sent to the output thread 202.
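The slice-by-slice handoff between the two threads can be illustrated with a small sketch built on the Python standard library. It only models the ordering of the handoff: the slice payloads are placeholders, the function names input_worker, output_worker and run_pipeline are hypothetical, and the end-of-frame sentinel is an assumption made for the example.

```python
import queue
import threading
from typing import Any, List, Sequence, Tuple


def input_worker(slices: Sequence[Any], handoff: "queue.Queue") -> None:
    """Send each received slice to the output side one by one, in wire order."""
    for index, data in enumerate(slices, start=1):
        handoff.put((index, data))
    handoff.put(None)  # Sentinel: all input slices have been sent.


def output_worker(handoff: "queue.Queue", received: List[Tuple[int, Any]]) -> None:
    """Collect slices as they arrive; a real system would draw them instead."""
    while True:
        item = handoff.get()
        if item is None:
            break
        received.append(item)


def run_pipeline(slices: Sequence[Any]) -> List[Tuple[int, Any]]:
    handoff: "queue.Queue" = queue.Queue()
    received: List[Tuple[int, Any]] = []
    producer = threading.Thread(target=input_worker, args=(slices, handoff))
    consumer = threading.Thread(target=output_worker, args=(handoff, received))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return received
```

A thread-safe FIFO queue keeps the two threads independent of each other while preserving the order in which the slices were received.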

(20) The input thread 201 keeps running by the at least one processor 103 as long as at least one video input frame is to be received from the at least one video input source 11 via the at least one video input card 105. If there is no longer at least one video input frame to be received from the at least one video input source 11 via the at least one video input card 105, then step 2015 is taken and the input thread 201 can be stopped.

(21) The process on the output thread 202 is described in steps 2022-2028. According to step 2022 a start time and an end time for each output slice S.sub.Ox of S.sub.O1 to S.sub.On for the at least one GPU 106 are calculated by the at least one processor 103 on the output thread 202. This calculation further comprises calculating the start time and the end time of each single output slice S.sub.ox of output slices S.sub.o1 to S.sub.on of the at least one video output frame within a period of the at least one video output frame.

(22) According to step 2023 the at least one GPU 106 is configured to draw directly to a front buffer by the at least one processor 103, so that buffering of one or more video output frames does not increase latency. In the computer video processing system 10, the one or more video output frames stored in the front buffer are the one or more video output frames that will be sent over the wire by the at least one GPU 106 to be displayed on the at least one video output display 12. In the computer video processing system 10, each output slice S.sub.Ox of S.sub.O1 to S.sub.On stored in the front buffer is the output slice S.sub.Ox of S.sub.O1 to S.sub.On that will next be transferred over the wire to be displayed on the at least one video output display 12. More generally, in the computer video processing system 10, data stored in the front buffer is the data that will next be displayed on the at least one video output display 12. The front buffer is also known as the visible front buffer.

(23) According to step 2024 a required latency is calculated on the basis of positioning the at least one video input frame within the at least one video output frame by the at least one processor 103. Calculating the required latency is needed if there is a possibility that the positioning of the at least one video input frame is changed in the corresponding at least one video output frame. A simplified example is a situation where the at least one video input frame is rotated from 0 degrees to 90 degrees.
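The specification does not give a formula for the required latency at this point, so the following is only one plausible sketch. It assumes that, for each output slice, the time at which sending of that slice starts (measured from the start of the output frame) and the end time of the last input slice it depends on (measured from the start of the input frame) are already known, and it takes the required latency to be the smallest delay of the output frame that makes every dependency available in time. The name required_latency and the draw_margin_s parameter are hypothetical.

```python
from typing import Sequence


def required_latency(
    output_start_times_s: Sequence[float],
    needed_input_end_times_s: Sequence[float],
    draw_margin_s: float = 0.0,
) -> float:
    """Smallest output-frame delay so each output slice has its input in time.

    Assumed model: output slice k may start no earlier than the end of the last
    input slice it needs, plus an optional margin for drawing.
    """
    if len(output_start_times_s) != len(needed_input_end_times_s):
        raise ValueError("one needed-input end time is required per output slice")
    delays = [
        input_end + draw_margin_s - output_start
        for output_start, input_end in zip(output_start_times_s, needed_input_end_times_s)
    ]
    return max(0.0, max(delays, default=0.0))
```

Under this model, an unrotated frame with two slices needs a delay of about one slice duration, whereas a 90 degree rotation makes every output slice depend on the last input slice, so the delay grows to about one full frame period.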

(24) According to step 2025 a vertical blanking interval is located for the at least one GPU 106 by the at least one processor 103. There are several ways to locate the vertical blanking interval for both the input thread 201 side and the output thread 202 side. If the computer video processing system 10 is genlocked, it is known that the vertical blanking interval for both the input thread 201 side and the output thread 202 side will happen at the same time. In such a case, if the vertical blanking interval can be recognized on either the input thread 201 side or the output thread 202 side, then it will be known for both the input thread 201 and the output thread 202. On the output thread 202 side, when double buffering is used, glSwapBuffers in OpenGL can be used, for example. Further, glFinish makes it possible to wait until the vertical blanking interval has started. The GLX_NV_delay_before_swap OpenGL extension can be used to wait until a specific point of time within a single video output frame. This can be applied to wait for an appropriate position within the at least one video output frame and thus also to calculate the position of the vertical blanking interval. Similar functionality is available in other APIs as well.

(25) According to step 2026 a drawing loop is launched by the at least one processor 103. The drawing loop is launched when the at least one single input slice S.sub.ix of S.sub.i1 to S.sub.in sent from the input thread 201 is received. This is described in relation to FIGS. 2b and 2c further below.

(26) When the drawing loop has ended, step 2027 or 2028 is taken by the at least one processor 103. Step 2027 is taken by the at least one processor 103 if there is a need to update the required latency. The required latency may need to be updated if the positioning of the at least one video input frame within the at least one video output frame has been changed. In such a case the at least one processor 103 may also return to step 2025.

(27) By step 2028 the process is concluded.

(28) FIG. 2b shows an exemplary flow chart representing a sub-set of method steps according to the invention for the drawing loop on the output thread 202. References to the components of the set-up according to FIGS. 1 and 2a are made. The drawing loop is launched according to step 2026, that is, when the output thread 202 receives the at least one single input slice S.sub.ix of S.sub.i1 to S.sub.in sent from the input thread 201 by the at least one processor 103. In step 20261 a sub-set of the input slices from S.sub.i1 to S.sub.in required for drawing output slices S.sub.O1 to S.sub.On is calculated by the at least one processor 103 on the output thread 202 on the basis of the positioning of the at least one video input frame within the at least one video output frame. It should be noted that when a single output slice S.sub.ox of the output slices from S.sub.O1 to S.sub.On is being sent over the wire to the at least one video output display 12 by the at least one GPU 106, drawing at least the next single output slice S.sub.ox+1 needs to be completed for the at least one GPU 106 by the at least one processor 103 before the at least one GPU 106 has finished sending the single output slice S.sub.ox over the wire to the at least one video output display 12. Thus, a required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in for drawing output slices S.sub.O1 to S.sub.On by the at least one GPU 106 is calculated on the output thread 202 by the at least one processor 103 on the basis of the positioning of the at least one video input frame within the at least one video output frame.
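The calculation of step 20261 can be sketched for a simple case. The sketch assumes, purely for illustration, that both frames are split into horizontal slices of equal height, that the video input frame is placed unscaled and unrotated at a vertical offset within the video output frame, and that slice indices are 0-based. The function required_input_slices and its parameters are hypothetical.

```python
def required_input_slices(out_index, out_height, n_out, in_height, n_in, y_offset):
    """Return the 0-based input slice indices overlapping one output slice.

    Heights are in rows; out_height and in_height are assumed to be divisible
    by n_out and n_in respectively. y_offset is the output row at which the
    first row of the input frame is placed.
    """
    out_rows = out_height // n_out
    in_rows = in_height // n_in

    # Rows covered by the output slice, end exclusive.
    top = out_index * out_rows
    bottom = top + out_rows

    # Same range in input-frame rows, clipped to the input frame.
    in_top = max(top - y_offset, 0)
    in_bottom = min(bottom - y_offset, in_height)
    if in_top >= in_bottom:
        return []  # the output slice shows no part of the input frame

    first = in_top // in_rows
    last = (in_bottom - 1) // in_rows
    return list(range(first, last + 1))


# A 540-row input placed 200 rows down in a 1080-row output, 4 slices each.
for j in range(4):
    print(j, required_input_slices(j, 1080, 4, 540, 4, 200))
```

With these illustrative numbers the second output slice depends on three input slices, which shows why the sub-set has to be recalculated whenever the positioning changes.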

(29) In step 20262 the process is set to wait until the input thread 201 has received all the input slices from S.sub.i1 to S.sub.in from the at least one video input card 105 by the at least one processor 103. The required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in for drawing output slices S.sub.O1 to S.sub.On by the at least one GPU 106 was identified in step 20261.

(30) In step 20263 the required at least one input slice S.sub.iy for drawing the output slices S.sub.O1 to S.sub.On for the at least one GPU 106 is drawn by the at least one processor 103. Namely, the required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in for drawing the output slices S.sub.O1 to S.sub.On for the at least one GPU 106 is drawn by the at least one processor 103. When another output slice S.sub.Ox+1 is drawn by the at least one processor 103 for the at least one GPU 106, background content for the output slice S.sub.Ox+1 is drawn first. Then, the required at least one input slice S.sub.iy for drawing the output slice S.sub.Ox+1 is drawn by the at least one processor 103 for the at least one GPU 106. Next, foreground content for the output slice S.sub.Ox+1 is drawn by the at least one processor 103 for the at least one GPU 106. Drawing the background content, the required at least one input slice S.sub.iy and the foreground content can be combined in one pass. However, drawing the background content, the required at least one input slice S.sub.iy and the foreground content needs to be completed before the at least one GPU 106 starts sending out slice S.sub.Ox+1 over the wire to the at least one video output display 12.
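The drawing order described above (background content first, then the required input slices, then foreground content) can be illustrated with a minimal sketch. The function compose_slice is hypothetical, and representing a slice as a flat list of pixels with None marking a transparent pixel is an assumption made only for illustration; an actual implementation would draw on the at least one GPU 106.

```python
def compose_slice(background, video, foreground):
    """Draw three layers of one output slice in back-to-front order.

    Each layer is a list of pixels of equal length; None marks a transparent
    pixel that leaves the content drawn earlier visible.
    """
    out = list(background)            # background content is drawn first
    for layer in (video, foreground): # then input slices, then foreground
        for i, pixel in enumerate(layer):
            if pixel is not None:
                out[i] = pixel
    return out


print(compose_slice(["b", "b", "b"], [None, "v", "v"], [None, None, "f"]))
```

Whether the three layers are drawn in one pass or in separate passes, the composed slice has to be complete before the at least one GPU 106 starts sending it.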

(31) In step 20264 the output thread 202 keeps running by the at least one processor 103 until the at least one GPU 106 has completed generating the output signal to the at least one video output display 12. Advantageously, the output thread 202 keeps running by the at least one processor 103 until the at least one GPU 106 has sent all the output slices from S.sub.O1 to S.sub.On corresponding to the required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in, to the at least one video output display 12. Correspondingly, the step 20263 is repeated until the output slices from S.sub.O1 to S.sub.On have been drawn for all the input slices S.sub.i1 to S.sub.in by the at least one processor 103.

(32) Thus, according to steps 20263 and 20264 together, the required input slices S.sub.i1 to S.sub.in for the output slices S.sub.O1 to S.sub.On are drawn by the at least one processor 103 for the at least one GPU 106 where a single output slice S.sub.ox+1 consisting of the required input slices from S.sub.i1 to S.sub.in is drawn before the at least one GPU 106 completes sending a single output slice S.sub.ox of output slices from S.sub.O1 to S.sub.On corresponding to the required at least one input slice S.sub.iy to the at least one video output display 12 until the last output slice S.sub.on corresponding to the required at least one input slice S.sub.iy is sent by the at least one GPU 106 to the at least one video output display 12.

(33) Advantageously, according to steps 20263 and 20264 together, the required input slices S.sub.i1 to S.sub.in for output slices S.sub.O1 to S.sub.On are drawn by the at least one processor 103 for the at least one GPU 106 where a single output slice S.sub.ox+1 consisting of the required input slices from S.sub.i1 to S.sub.in is drawn by the at least one processor 103 before the at least one GPU 106 completes sending a single output slice S.sub.Ox of output slices from S.sub.O1 to S.sub.On corresponding to the required at least one input slice S.sub.iy to the at least one video output display 12 until the last output slice S.sub.on is sent by the at least one GPU 106 to the at least one video output display 12.

(34) Once the at least one GPU 106 has sent all the output slices from S.sub.O1 to S.sub.On to the at least one video output display 12, step 2027 (described further in connection with FIG. 2a) is taken by the at least one processor 103 and the drawing loop is finished on the output thread 202.
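The interplay of the input thread 201 and the output thread 202 in the drawing loop can be sketched as follows. This is an illustrative sketch and not the claimed implementation: the names input_worker, output_worker and run_pipeline are hypothetical, a standard queue stands in for sending input slices between the threads, and appending to a list stands in for drawing to the front buffer. The sketch takes one reading of steps 20262 and 20263, namely that each output slice waits only for the input slices it requires.

```python
import queue
import threading


def input_worker(slices, out_q):
    """Stand-in for the input thread: forward each received input slice."""
    for index, data in enumerate(slices):
        out_q.put((index, data))
    out_q.put(None)  # marks the end of the input frame


def output_worker(in_q, needed, drawn):
    """Stand-in for the drawing loop: draw output slices in order.

    needed[j] lists the 0-based input slice indices output slice j requires.
    """
    received = {}
    for j, deps in enumerate(needed):
        # Wait until every input slice required by output slice j has arrived.
        while not all(d in received for d in deps):
            item = in_q.get()
            if item is None:
                return  # the input ended before the requirement was met
            received[item[0]] = item[1]
        drawn.append((j, [received[d] for d in deps]))  # "draw" the slice


def run_pipeline(slices, needed):
    q = queue.Queue()
    drawn = []
    t_out = threading.Thread(target=output_worker, args=(q, needed, drawn))
    t_in = threading.Thread(target=input_worker, args=(slices, q))
    t_out.start()
    t_in.start()
    t_in.join()
    t_out.join()
    return drawn


print(run_pipeline(["a", "b", "c"], [[0], [0, 1], [2]]))
```

The sketch does not model the deadline imposed by the at least one GPU 106 sending slice S.sub.ox; it only shows the ordering of receiving and drawing.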

(35) FIG. 2c shows an exemplary flow chart representing another sub-set of method steps according to the invention for the drawing loop on the output thread 202. References to the components and method steps of the set-up according to FIGS. 1 and 2a are made. The drawing loop is launched according to step 2026. The drawing loop is launched when the output thread 202 receives the at least one single input slice S.sub.ix of S.sub.i1 to S.sub.in sent from the input thread 201 by the at least one processor 103.

(36) According to step 20266a a background content for each output slice S.sub.O1 to S.sub.On is drawn, if needed, by the at least one processor 103 for the at least one GPU 106 on the front buffer. If not needed, step 20266b can be taken directly. The background content may comprise, for example, video, images, graphics or even a desktop. Although the input slices from S.sub.i1 to S.sub.in required for the output slices S.sub.O1 to S.sub.On for the at least one GPU 106 together with background and foreground content of the video input frames may be combined into one pass where everything is drawn at the same time, there are certain advantages if their drawing is separated. Separating the drawing of the input slices from S.sub.i1 to S.sub.in needed for drawing output slices S.sub.O1 to S.sub.On, and the drawing of the background and the foreground content of the video input frames, into three steps provides some extra time, as the background content can be drawn while the output thread 202 is still waiting for the input thread 201 to get all the input slices from S.sub.i1 to S.sub.in.

(37) As an example of a possible practical implementation only, the background content can be rendered offscreen into a Frame Buffer Object (FBO) by another, lower-priority thread run by the at least one processor 103 and then just copied to the front buffer in this pass. FBOs are OpenGL objects that allow the creation of user-defined frame buffers. With them, one can render to non-default frame buffer locations and thus render without disturbing the main output display. Multiple FBOs can be used for double or even triple buffering to allow for variations in processing time. It should be noted that the required latency for the background content is much higher than the latency of the input content processing.

(38) The separated drawing of the input slices from S.sub.i1 to S.sub.in needed for drawing output slices S.sub.O1 to S.sub.On, and of the background and the foreground content of the video input frames, may be done, for example, with 2D composited traditional video content. It may also be done with complex 3D scenes, but this is more complicated, especially if multiple inputs are used.

(39) In step 20266b a sub-set of the plurality of input slices from S.sub.i1 to S.sub.in required for drawing output slices S.sub.O1 to S.sub.On is calculated by the at least one processor 103 on the output thread 202 on the basis of the positioning of the at least one video input frame within the at least one video output frame. It should be noted that when a single output slice S.sub.ox of the output slices from S.sub.O1 to S.sub.On is being sent over the wire to the at least one video output display 12 by the at least one GPU 106, drawing at least the next single output slice S.sub.ox+1 needs to be completed for the at least one GPU 106 by the at least one processor 103 before the at least one GPU 106 has finished sending the single output slice S.sub.ox over the wire to the at least one video output display 12. Thus, a required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in for drawing output slices S.sub.O1 to S.sub.On by the at least one GPU 106 is calculated on the output thread 202 by the at least one processor 103 on the basis of the positioning of the at least one video input frame within the at least one video output frame.

(40) In step 20267 the process is set to wait until the input thread 201 has received all the input slices from S.sub.i1 to S.sub.in from the at least one video input card 105 by the at least one processor 103. The required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in for drawing output slices S.sub.O1 to S.sub.On by the at least one GPU 106 was identified in step 20266b.

(41) In step 20268 the required at least one input slice S.sub.iy, wherein y is from 1 to n, of the plurality of input slices from S.sub.i1 to S.sub.in for drawing the output slices S.sub.O1 to S.sub.On for the at least one GPU 106 is drawn by the at least one processor 103. Advantageously, the required at least one input slice S.sub.iy comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in for the output slices S.sub.O1 to S.sub.On for the at least one GPU 106 is drawn by the at least one processor 103. In other words, drawing of a single output slice S.sub.ox consisting of the required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in needs to be completed by the at least one processor 103 before the at least one GPU 106 starts sending the single output slice S.sub.ox over the wire to the at least one video output display 12. Further, when an output slice S.sub.Ox+1 is drawn by the at least one processor 103 for the at least one GPU 106, the required at least one input slice S.sub.iy for drawing the output slice S.sub.Ox+1 may be drawn after drawing the background content according to step 20266a. In that case too, drawing the required at least one input slice S.sub.iy needs to be completed before the at least one GPU 106 starts sending out slice S.sub.Ox+1 over the wire to the at least one video output display 12.

(42) After step 20268 step 20268a or step 20268b or both steps 20268a and 20268b may be taken by the at least one processor 103. However, it is also possible to proceed directly to step 20269 by the at least one processor 103.

(43) Step 20268a can be taken in order to draw the foreground content by the at least one processor 103. According to step 20268a the foreground content for each output slice from S.sub.o1 to S.sub.on is drawn, if needed, by the at least one processor 103 on the front buffer. When an output slice S.sub.Ox+1 is drawn by the at least one processor 103 for the at least one GPU 106, the foreground content for the output slice S.sub.Ox+1 may be drawn by the at least one processor 103 for the at least one GPU 106 after drawing the required at least one input slice S.sub.iy comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in for drawing the output slice S.sub.Ox+1. Drawing the foreground content needs to be completed before the at least one GPU 106 starts sending out slice S.sub.Ox+1 over the wire to the at least one video output display 12. As explained in connection with step 20266a, this embodiment has certain advantages. After step 20268a step 20268b may be taken by the at least one processor 103. However, it is also possible to proceed directly to step 20269 by the at least one processor 103.

(44) Step 20268b can be taken in order to verify that drawing of the background content or the foreground content has been finished within an allowed time limit by the at least one processor 103. Further, step 20268b can be taken in order to verify that drawing of the input slices from S.sub.i1 to S.sub.in required for the output slices S.sub.O1 to S.sub.On for the at least one GPU 106 has been finished within the allowed time limit by the at least one processor 103. Also, in all previous cases, errors in the drawing may be handled here. The allowed time limit is defined by the time at which the at least one GPU 106 starts sending the single output slice S.sub.ox of the output slices from S.sub.o1 to S.sub.on corresponding to the required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more input slices of the plurality of input slices from S.sub.i1 to S.sub.in to the at least one video output display 12. If the at least one GPU 106 starts sending the single output slice S.sub.ox of the output slices from S.sub.o1 to S.sub.on to the at least one video output display 12 before drawing of the background content, the foreground content and/or the required at least one input slice S.sub.iy of the plurality of input slices from S.sub.i1 to S.sub.in corresponding to the single output slice S.sub.ox is completed, then the allowed time limit has been exceeded.
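The verification of step 20268b can be sketched as a comparison of two time stamps per output slice. The function late_slices and its arguments are hypothetical, and it is assumed only for illustration that both the time at which drawing of each output slice finished and the time at which the at least one GPU 106 starts sending that slice are available on a common clock.

```python
def late_slices(draw_finished, send_start):
    """Return the 0-based indices of output slices whose drawing was late.

    draw_finished[x] is the time drawing of output slice x was completed and
    send_start[x] is the time the GPU starts sending it; the allowed time
    limit is exceeded when drawing finishes after sending has started.
    """
    return [
        x
        for x, (done, start) in enumerate(zip(draw_finished, send_start))
        if done > start
    ]


# Illustrative times in milliseconds: the second slice misses its limit.
print(late_slices([0.9, 4.2, 6.5], [1.0, 4.0, 7.0]))
```

How an exceeded time limit is handled, for example by skipping or repeating content, is left open here, as the description only states that errors may be handled in this step.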

(45) In step 20269 the output thread 202 keeps running by the at least one processor 103 until the last output slice S.sub.on corresponding to the required at least one input slice S.sub.iy is sent by the at least one GPU 106 to the at least one video output display 12. Advantageously, the output thread 202 keeps running by the at least one processor 103 until the last output slice S.sub.on corresponding to the required at least one input slice S.sub.iy, wherein y is from 1 to n, comprising one or more of the plurality of input slices from S.sub.i1 to S.sub.in is sent by the at least one GPU 106 to the at least one video output display 12. Correspondingly, the step 20267 and the possible next steps are repeated until the output slices from S.sub.O1 to S.sub.On have been drawn for all the input slices S.sub.i1 to S.sub.in by the at least one processor 103.

(46) According to steps 20268 and 20269 together the required input slices S.sub.i1 to S.sub.in for the output slices S.sub.O1 to S.sub.On are drawn by the at least one processor 103 for the at least one GPU 106 where a single output slice S.sub.ox+1 consisting of the required input slices from S.sub.i1 to S.sub.in is drawn before the at least one GPU 106 completes sending a single output slice S.sub.ox of output slices from S.sub.O1 to S.sub.On corresponding to the required at least one input slice S.sub.iy to the at least one video output display 12 until the last output slice S.sub.on corresponding to the required at least one input slice S.sub.iy is sent by the at least one GPU 106 to the at least one video output display 12.

(47) Advantageously, according to steps 20268 and 20269 together, the required input slices S.sub.i1 to S.sub.in for output slices S.sub.O1 to S.sub.On are drawn by the at least one processor 103 for the at least one GPU 106 where a single output slice S.sub.ox+1 consisting of the required input slices from S.sub.i1 to S.sub.in is drawn by the at least one processor 103 before the at least one GPU 106 completes sending a single output slice S.sub.ox of output slices from S.sub.O1 to S.sub.On corresponding to the required at least one input slice S.sub.iy to the at least one video output display 12 until the last output slice S.sub.on is sent by the at least one GPU 106 to the at least one video output display 12.

(48) Once the at least one GPU 106 has sent all the output slices from S.sub.O1 to S.sub.On corresponding to the input slices S.sub.i1 to S.sub.in to the at least one video output display 12, step 2027 (described further in connection with FIG. 2a) is taken by the at least one processor 103 and the drawing loop is finished on the output thread 202.

(49) Any of the steps described or illustrated herein may be implemented using executable instructions in a general-purpose or special-purpose processor and stored on a computer-readable storage medium (e.g., disk, memory, or the like) to be executed by such a processor. References to computer-readable storage medium and computer should be understood to encompass specialized circuits such as field-programmable gate arrays, application-specific integrated circuits (ASICs), USB flash drives, signal processing devices, and other devices.

(50) Some advantageous embodiments according to the invention were described above. The invention is not limited to the embodiments described. The inventive idea can be applied in numerous ways within the scope defined by the claims attached hereto.