Method for operating a distributed video production system and architecture of a distributed video production system
11622161 · 2023-04-04
Assignee
Inventors
Cpc classification
H04N21/238
ELECTRICITY
H04N5/2228
ELECTRICITY
H04N21/8455
ELECTRICITY
H04N21/236
ELECTRICITY
International classification
H04N21/845
ELECTRICITY
H04N21/231
ELECTRICITY
H04N21/236
ELECTRICITY
Abstract
A new architecture of a distributed video production system is suggested. The distributed video production system enables distributed execution of one or more workflows that are operated by one or more operators at the same time. The present disclosure suggests mechanisms for synchronizing workflows and data streams within the distributed video production system. A workflow orchestrator is aware of all workflows currently executed on the distributed video production system and selects the required processing resources. A run time orchestrator ensures that the blocks are available for processing by controlling the receiving broadcast functionality such that it requests the block only after it is available by taking into account a network delay.
Claims
1. A method for operating a distributed video production system, the method comprising: receiving input video streams from video sources; assigning a timestamp to each video frame of every input video stream; defining a workflow comprising a concatenation of core broadcast functions for processing video streams; mapping each core broadcast function on a processing element within the video production system; determining a size of data blocks for transmission of video streams within the video production system to the processing elements associated with the workflow, wherein the data blocks contain video data; determining processing times of data blocks in each processing element associated with the workflow and transfer times of data blocks between processing elements when executing the workflow; transferring the input video streams in blocks of data to processing elements within the distributed video production system; synchronizing the processing elements to ensure that a data block is available for a receiving processing element when it is needed to perform synchronized processing between input video streams and at least one output video stream, wherein each processing element performs a core broadcast function; and receiving a user input defining an upper limit for an overall latency of the workflow and a number of transferable video streams within the workflow for determining the size of the data blocks.
2. The method according to claim 1, wherein the method further comprises: adapting the size of the data blocks includes limiting occupation of transfer resources by ongoing transfers of other video streams.
3. The method according to claim 1, wherein the method further comprises: storing the determined processing times and transfer times in a memory.
4. The method according to claim 1, wherein the method further comprises: assembling several small data blocks into a bigger hierarchical data block.
5. The method according to claim 4, wherein the method further comprises: processing the small data blocks of a hierarchical data block individually.
6. A distributed video production system comprising: at least one video production server hosting a plurality of processing elements, wherein each processing element is configurable to execute a core broadcast function; an input device for receiving input video streams, wherein the input device comprises an ingest module for assigning time stamps to each incoming data frame of the input video streams; a user interface enabling a user to compose a workflow comprising a concatenation of core broadcast functions, wherein each core broadcast function is mapped on one processing element, and to define an upper limit for an overall latency of the workflow and a number of transferable video streams within the workflow for determining the size of the data blocks; a workflow orchestrator determining a size of data blocks for transmission of video streams within the video production system to the processing elements associated with the workflow, wherein the data blocks contain video data; a run-time orchestrator for determining processing times of data blocks in each processing element associated with the workflow and transfer times of data blocks between processing elements when executing the workflow; wherein the run time orchestrator synchronizes the processing elements to ensure that a data block is available for a receiving processing element when it is needed to perform synchronized processing between input video streams and at least one output video stream, wherein each processing element performs a core broadcast function.
7. The distributed video production system according to claim 6, further comprising an asset management device enabling access to stored video streams.
8. The distributed video production system according to claim 6, further comprising a memory for storing processing times and transfer times associated with the workflow.
9. The distributed video production system according to claim 6, further comprising a plurality of user interfaces for a plurality of users.
10. The distributed video production system according to claim 6, wherein the processing elements are provided with a unified interface.
11. The distributed video production system according to claim 6, wherein the user interface is a graphical user interface.
12. The distributed video production system according to claim 6, wherein the run-time and/or workflow orchestrator are hosted on one or several video production servers in the distributed video production system.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) Exemplary embodiments of the present disclosure are illustrated in the drawings and are explained in more detail in the following description. In the figures, the same or similar elements are referenced with the same or similar reference signs.
DETAILED DESCRIPTION
(9) To begin with, several terms which will be used in the following shall be defined.
(10) Distributed Video Production System
(11) In a distributed video production system, the production resources forming the system are located at different physical locations. The different physical locations can be distributed locally in a studio, for instance, and can be remote from each other, e.g. in different cities. The distributed video production system is configured to execute one or a plurality of workflows.
(12) Workflow
(13) A workflow is a combination of broadcast functionalities resulting in a user-defined operation, such as generating a composed image including two camera streams and a video effect to create an interview situation of two people who are at different locations. A workflow is called a remote workflow if it is distributed in the distributed video production system, i.e. the remote workflow is executed at different locations. A workflow that is executed at a single location, for instance on a single server at a single site, is called a local workflow. A workflow that is executed by more than one server remains a local workflow if the transfer time of data between the servers for the video production is negligible, e.g. if the involved servers are located in the same studio in proximity to each other.
(14) Workflows may or may not be connected with each other. In connected workflows, the output of one workflow is used as an input in another workflow. One or several local workflows can be connected with one or several remote workflows.
(15) Broadcast Functionality
(16) A broadcast functionality is a combination of core broadcast functions. The core broadcast functions are executed by one or more processing elements. Network resources are made available for exchanging audio and/or video signals between the processing elements.
(17) Core Broadcast Function
(18) A core broadcast function is a video processing function executed by one or more processing elements. The processing elements process an input stream to generate an output stream, wherein the streams contain audio and/or video signals. Examples of core broadcast functions are storing, encoding, performing a mixing effect, etc.
(19) Processing Element
(20) A processing element is a piece of hardware, for example a central processing unit (CPU) or a graphical processing unit (GPU) or a portion of it. The kind of hardware onto which a processing element is mapped depends on the nature and complexity of the processing element. The processing element is the physical basis to perform a core broadcast function. In addition to that, the processing element includes an I/O module.
(21) I/O Module
(22) An I/O module is an input/output module that connects a processing element to the network over which the processing elements communicate. The I/O module provides a data exchange format towards the core broadcast function that is independent of the data exchange format inside the network. The I/O module "abstracts" the network from the core broadcast function. Likewise, the I/O module abstracts the processing element from the rest of the production system, i.e. the rest of the production system is unaware of the physical implementation of the processing element.
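The I/O-module abstraction described above can be sketched in code. This is a minimal illustrative sketch, not part of the disclosure: the core broadcast function sees only a uniform frame format, while the wire format used by the network stays hidden behind the module. All class and method names are assumptions.

```python
# Illustrative sketch of the I/O module concept: the core broadcast function
# exchanges uniform Frame objects; the network-facing wire format (here a
# length-naive, timestamp-prefixed blob) is hidden from it.
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: int   # timestamp assigned at ingest
    pixels: bytes    # uniform payload exposed to the core function


class IOModule:
    """Abstracts the network from the core broadcast function."""

    def __init__(self, network):
        # `network` is any object offering send_blob()/recv_blob()
        self._network = network

    def send_frame(self, frame: Frame) -> None:
        # Serialize to the (hypothetical) wire format: 8-byte timestamp + pixels.
        self._network.send_blob(frame.timestamp.to_bytes(8, "big") + frame.pixels)

    def receive_frame(self) -> Frame:
        # Deserialize; the caller never sees the wire format.
        blob = self._network.recv_blob()
        return Frame(timestamp=int.from_bytes(blob[:8], "big"), pixels=blob[8:])
```

Because both sides of the exchange go through such a module, swapping the network implementation (local interconnect, LAN, or wide-area link) leaves the core broadcast functions untouched.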
(23) Network
(24) A network is a data communication network that interconnects processing elements of the distributed video production system. The network is horizontally scalable allowing addition of more processing elements to the video production system for increased processing capability and/or processing power. The network is also vertically scalable to interconnect a local network with a geographically distributed network.
(25) Network Abstraction
(26) Network abstraction provides an interface that abstracts the physical implementation of the network by organizing the communication as an exchange of blocks, wherein each block may contain one or more images, or only parts of an image, depending on the most efficient way of communicating and storing the information. Any delays introduced by the network communication are automatically taken into account in workflows to prevent the delays from impacting the synchronicity of the video production. The network also provides for a mechanism that identifies blocks such that the core broadcast function knows which block is being processed. The network abstraction enables the deployment of the distributed video production system either as a single system with local and dedicated interconnects, as a LAN setup, or as a geographically distributed setup.
(27) Orchestrator
(28) An orchestrator is a software means for automated configuration, coordination, and management of the hardware and software of the distributed video production system. The proposed video production system comprises two kinds of orchestrators.
(29) Workflow Orchestrator
(30) The workflow orchestrator selects the needed processing elements based on the workflows, the available processing elements and the network costs. The network cost depends on the location of the processing elements, the location of the video sources and the location of the operators of the system. The network cost is to be understood in terms of latency, for example the latency incurred by the transmission of data between processing elements that are distant from each other. An increasing latency due to data transmission corresponds to an increasing network cost.
(31) The workflow orchestrator also defines the block size of the communicated data for a specific workflow. The block size is relevant for the communication between the processing elements included in the workflow and influences the latency inside the video production system. Finally, the workflow orchestrator configures the selected processing elements and network components.
(32) Run Time Orchestrator
(33) The run time orchestrator receives information about all workflows and controls the exchange of blocks between processing elements. The run time orchestrator ensures that the blocks are available for processing by controlling the receiving broadcast functionality such that it requests the block only after it is available by taking into account a network delay. This goes together with controlling the network to ensure timely arrival of the blocks. Each block arrives at the processing element when the block is needed for processing.
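The request-scheduling rule described in the previous paragraph can be illustrated with a short sketch. This is an assumption-laden illustration, not the disclosed implementation: a receiving element may request a block only once the block is known to be available, i.e. after its send time plus the known network delay (optionally with a safety margin). All function names are hypothetical.

```python
# Illustrative sketch of the run-time orchestrator rule: request a block
# only after it is available, taking the network delay into account.
def earliest_request_time(send_time: float, network_delay: float,
                          safety_margin: float = 0.0) -> float:
    """Earliest time at which a receiver may request a block sent at `send_time`."""
    return send_time + network_delay + safety_margin


def schedule_requests(block_send_times, network_delay):
    """Map each block index to its earliest request time at the receiver."""
    return {i: earliest_request_time(t, network_delay)
            for i, t in enumerate(block_send_times)}
```

Under this rule a receiving broadcast functionality never stalls waiting for a block that is still in flight, which is what allows the buffers at each processing element to stay small.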
First Embodiment
(35) The input devices 102 are located at the same or different locations within the distributed video production system 100 and are interconnected among each other and the video production system 100 as a whole by networks 103a, 103b.
(36) A/V content, video and/or audio streams as well as video and/or audio data are used synonymously. For the sake of brevity, the term "audio" is omitted, and the description simply refers to video streams and video data. It is assumed that most video streams are accompanied by a corresponding audio stream that is processed accordingly, even if it is not explicitly mentioned in the description. The same applies for meta-data: even if it is not explicitly mentioned in the description, it is assumed that the meta-data are processed jointly with the video and/or audio data.
(37) The network 103a connects storage devices 104a-104c for storing the AV content for later use. The storage devices are interconnected with each other by a network 103c. The storage devices form a database containing AV content assets.
(38) The video production system comprises modules for signal decoding, manipulation (providing mixing and/or effects on the AV content) and play-out of a program stream. These modules are implemented on different physical computing devices operating as processing devices 106a, 106b. The program stream is indicated in the corresponding figure.
(39) A network 103d connects the storage devices 104a-104c with the processing devices 106a, 106b.
(40) An asset management device 107 keeps track of all AV assets, e.g., stored audio and/or video streams, that are available within the video production system. The asset management device 107 is interconnected with all other devices within the system, for instance via the network 103b. Additionally, the asset management device 107 is connected with a database 108 containing AV assets, for instance from previous productions.
(41) Input devices 102, storage devices 104a-104c, processing devices 106a, 106b and asset management device 107 are implemented as standalone hardware devices or, preferably, are software based, wherein the software is executed on one or several video production servers.
(42) The video production system 100 comprises a graphical user interface (GUI) 109 for each operator working on a broadcast production using the video production system. That is, there are as many GUIs 109 as there are operators. All GUIs are connected by a network 103e, and hence the one or several GUI(s) 109 do not have to be at the same location as any other device of the video production system. Each GUI enables an operator to set up and execute workflows, including remote workflows.
(43) The entirety of the communication networks 103a-103e may be implemented as a plurality of separate networks that may even use different data exchange formats. But the networks may also form a single unified network. An intermediate form of both concepts is also possible, i.e. portions of the entire network are unified and some other portions remain separate networks.
(44) In general, the use of networks introduces delays between sender and receiver that are sometimes variable, and which may be large compared to the video frame period; at a frame rate of, for instance, 50 Hz, one image frame arrives every 20 ms. How large the delays are and to which extent they are variable depends on the network. Therefore, distributed video production systems contain buffers at one or more steps in a processing chain. The processing chain is formed by a sequence of processing elements that process video streams in a workflow. The buffers are used to store data and ensure that the data are available when the processing element needs the data for processing to implement a core broadcast function that is contained in a workflow for a broadcast production.
(45) Latency, or speed of execution, is another critical aspect of live productions and must be taken into account, too. A convenient way to solve the problem of latencies in a processing chain is to run each processing element in a processing pipeline at maximum speed upon arrival of the input data and store the results in the buffers referred to in the previous paragraph. The subsequent processing elements in the processing pipeline then consume the information.
(47) In a processing element 200, which is set up as an ingest module, each incoming video frame of an input video stream is time stamped to enable synchronization between the input video streams and output video streams.
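The ingest-side timestamping described above can be sketched as follows. This is a minimal illustration under stated assumptions: timestamps are derived here from a frame counter and a fixed frame period; a real system would use a house clock. The function name and the microsecond unit are illustrative choices, not part of the disclosure.

```python
# Illustrative sketch of timestamping at ingest: each incoming frame of an
# input video stream receives a timestamp so that input and output streams
# can later be synchronized.
def ingest(frames, t0: int = 0, period_us: int = 20_000):
    """Attach a timestamp (in microseconds) to each incoming frame.

    A 50 Hz stream delivers one frame every 20 ms = 20,000 us, matching the
    frame rate mentioned in the description.
    """
    return [(t0 + i * period_us, frame) for i, frame in enumerate(frames)]
```

Because every frame carries its timestamp from the moment of ingest, downstream processing elements can process frames in the right order even when the internal processing is asynchronous.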
(48) Video processing requires synchronization that is achieved in traditional video production systems by a common synchronization signal known as GENLOCK. Incoming video and/or audio frames must therefore be processed internally within a fixed time interval, leading to a worst-case dimensioning of the infrastructure because all processing steps must be finalized before the next frame arrives. The video production server known from EP 3 232 667 A1 utilizes asynchronous internal processing to achieve more flexibility in the dimensioning of the system and to utilize the hardware resources more efficiently. In that video production server, the images are identified by a timestamp which allows processing in the right order and enables the housekeeping of the buffer.
(49) This approach consumes memory which, depending on the implementation, is a scarce resource. In implementations where memory is scarce, buffering needs to be limited and ideally reduced.
(50) The present disclosure proposes an approach to reduce the required buffer capacity or intermediate storage. Firstly, the problem of variation of the transmission or transfer time related to the transmission of data blocks is addressed. Most of the variation can be reduced by making the network congestion-free and by flattening the transmission profile of the senders to avoid bursts. These conditions are achieved, e.g., in SDTI-based networks and in SDN-managed ST 2110-based networks. Other implementations are equally possible. Because the images arrive synchronously in the system, and because the live production requires a synchronous stream of outgoing images, the use of the above-mentioned techniques significantly limits the transmission or transfer delay variation.
(54) The workflow receives a video stream “video-in” as an input which is processed along the workflow to generate a video output that is indicated as “video out”. The input and output video streams are symbolized by vertical bars 402 on a horizontal timeline 403. Each bar 402 stands for a full image I.sub.in and I.sub.out, respectively. For instance, PE.sub.1 relates to ingesting a video stream, PE.sub.2 relates to reframing the video stream and PE.sub.3 relates to compressing the video stream. Reframing is used for instance when a sport event is covered by a broadcast program and includes selecting a portion of an image that contains the region of interest while other parts of the image are discarded.
(55) The processing of the input video stream as a whole is symbolized by an arrow 404 pointing from the video input stream to the video output stream. The video input stream contains one full image I.sub.in per time period T.sub.period. It is assumed that PE.sub.1 receives the first image of the video stream at time t=0; it then receives another full image of the video input stream at every t.sub.i=i×T.sub.period, wherein i=1, 2, 3, . . . . Further, in live video production it is required that the output video stream has the same image frequency as the input video stream, i.e. the output video stream contains one full image every time period T.sub.period, too. Since the processing of the input video stream and the transfer from PE.sub.1 to PE.sub.2 and then to PE.sub.3 also takes time, the sequence of the images of the output video stream is shifted by an overall latency A compared to the image sequence of the input video stream. Specifically, if the n.sup.th image of the video input stream is received at t(n)=n×T.sub.period then the video output stream is outputted at T(n)=n×T.sub.period+A.
(56) The processing times in each processing element PE.sub.n are labeled TPE.sub.n. The processing time depends to a certain degree on the contents of the image. For the sake of explanation, it is assumed that a longitudinal object flies from left to right through the viewing field of the camera. At first, only a small part of the object is visible in the camera image and then more and more of the object, until the object has left the camera image again on the right-hand side. The processing time for encoding such a sequence of images I.sub.i varies slightly. The variation of the processing time is expressed as Δt.sub.n,i for each processing element PE.sub.n. Specifically, the latency between the entry of image I.sub.i into processing element PE.sub.n and the exit at the output of processing element PE.sub.n is TPE.sub.n+Δt.sub.n,i.
(57) In addition to the processing time, there is a network transfer time λ.sub.1 and λ.sub.2 involved for transferring the video images from processing element PE.sub.1 to PE.sub.2 and from processing element PE.sub.2 to processing element PE.sub.3, respectively. Furthermore, a certain amount of network jitter is unavoidable. Therefore, the network transfer times may vary between images, which is expressed by an index i. The amount of jitter is described as Δλ.sub.1,i and Δλ.sub.2,i, respectively. Hence, the effective times for transferring the video images from processing element PE.sub.1 to PE.sub.2 and from processing element PE.sub.2 to processing element PE.sub.3 are λ.sub.1+Δλ.sub.1,i and λ.sub.2+Δλ.sub.2,i, respectively.
(58) The overall latency A describing the delay between the entry of an image I.sub.i into the input of PE.sub.1 and the exit of the same image at the output of PE.sub.3 can therefore be expressed as
A=(TPE.sub.1+Δt.sub.1,i+λ.sub.1+Δλ.sub.1,i)+(TPE.sub.2+Δt.sub.2,i+λ.sub.2+Δλ.sub.2,i)+(TPE.sub.3+Δt.sub.3,i).
(59) The processing times TPE.sub.n and the transfer times λ.sub.n are known from an initialization phase of the video processing system. The processing and transfer times are determined by the run-time orchestrator. Once determined, the processing times TPE.sub.n and the transfer times λ.sub.n are stored in a lookup table (LUT). Once a workflow is configured, the run time orchestrator reads out the processing times TPE.sub.n and the transfer times λ.sub.n associated with this particular workflow from the LUT and determines the overall latency à without considering the variation of processing times Δt.sub.1,i and transfer times Δλ.sub.1,i caused by network jitter:
Ã=(TPE.sub.1+λ.sub.1)+(TPE.sub.2+λ.sub.2)+TPE.sub.3
(60) If a processed image frame or block from processing pipeline 401 is needed as an input in another processing pipeline at a time T.sub.0 then the run time orchestrator ensures that an input image frame or block is entered into the processing pipeline at a time instant T.sub.0−Ã.
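The latency bookkeeping of paragraphs (58) to (60) can be sketched in code. This is an illustrative sketch, not the disclosed implementation: the LUT is modeled as two lists, and all names are assumptions. It computes the nominal latency Ã as the sum of the stored processing times TPE.sub.n and transfer times λ.sub.n (jitter terms excluded), and derives the instant T.sub.0−Ã at which an input must enter the pipeline.

```python
# Illustrative sketch of the run-time orchestrator's latency computation:
# A~ = sum of processing times TPE_n plus sum of transfer times lambda_n.
def nominal_latency(processing_times, transfer_times):
    """Nominal overall latency A~ of a linear pipeline (jitter excluded)."""
    # A pipeline of N elements has N-1 inter-element transfers.
    assert len(transfer_times) == len(processing_times) - 1
    return sum(processing_times) + sum(transfer_times)


def input_deadline(t0, processing_times, transfer_times):
    """Instant T0 - A~ at which an input frame must enter the pipeline so
    that its processed result is available in another pipeline at time t0."""
    return t0 - nominal_latency(processing_times, transfer_times)
```

For the three-element pipeline of the example, `processing_times` would hold TPE.sub.1 to TPE.sub.3 read from the LUT and `transfer_times` the two values λ.sub.1 and λ.sub.2.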
(61) The synchronicity of the video image streams and a congestion-free network design with small jitter limit the variation of the network transfer times and the variation of the processing times to low values. The jitter remains in the order of milliseconds. A small amount of jitter and variation of the processing times, in conjunction with knowing the overall latency Ã, allows for small buffers in each of the processing elements. In most cases a buffer size for storing one or two frames is sufficient.
(62) For live video productions it is important that the latency remains as small as possible. In a distributed video production system, the transmission of data blocks between distributed processing elements makes a significant contribution to the latency. Therefore, it is important to properly implement the transmission of the data blocks to limit the latency.
(63) Video processing in a distributed video processing system inherently involves transferring images from a storage location on a storage device to a processing element where the images are processed. The storage device is implemented for instance as a hard disk drive, an array of hard disk drives, a RAID array, as solid-state disks or as cloud storage, to name only a few possible implementations.
(66) In practice, a video production system handles multiple live video streams and transferring one stream continuously would block the system and bring other workflows to a standstill, which is not acceptable in live video productions.
(67) It is therefore difficult to find a good balance between an efficient transfer of a video stream and a small latency for the video production system. The reason for this trade-off is that a block of a second stream can only be stored or transferred if the handling of a block of the first video stream has been completed. This creates an additional latency for the video production system. However, latencies must be avoided as much as possible in live productions as it has been mentioned before.
(69) The latency T.sub.L also increases if there are more than two streams that need to be stored or transferred. In an example with three streams (not shown), the first block of the third stream can only be stored after the first blocks of the first and second stream have been stored.
(70) A workflow orchestrator determines the block size such that a good compromise is found between the number of video streams that need to be continuously read or written and a maximum acceptable latency. The workflow orchestrator is a software program running on one of the servers inside the video processing system. In one embodiment the workflow orchestrator is hosted on the video production server hosting one or multiple processing devices. Furthermore, there are so-called variable accesses to the stored media streams that are read from a storage, e.g. the selection of a few images at a specific point in time, the selection of one out of every n images of a video stream, or the selection of a section of images.
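The block-size compromise described above can be made concrete with a small sketch. This is a hedged illustration under explicit assumptions, not the disclosed algorithm: the streams are assumed to share one transfer resource in round-robin fashion, so in the worst case a stream waits for one block of every stream ahead of it, and that wait must stay below the user-defined latency limit. The bandwidth model and all names are hypothetical.

```python
# Illustrative sketch of the workflow orchestrator's block-size trade-off:
# larger blocks transfer more efficiently, but a block of one stream can
# only start once the blocks queued ahead of it are done, adding latency.
def max_block_size(latency_limit_s: float, n_streams: int,
                   bandwidth_bytes_per_s: float) -> int:
    """Largest block size (bytes) keeping worst-case queuing delay below
    the user-defined latency limit, assuming round-robin sharing."""
    if n_streams < 1:
        raise ValueError("need at least one stream")
    # Worst case: n_streams blocks are transferred back-to-back before ours,
    # so n_streams * block_size / bandwidth must not exceed the limit.
    return int(latency_limit_s * bandwidth_bytes_per_s / n_streams)
```

This matches the qualitative behavior described in the surrounding paragraphs: adding streams or tightening the latency limit forces smaller blocks, at the cost of less efficient transfers.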
(71) For a multi-viewer application, a minimum latency is highly desirable, and in this case the block contains only a single image line. In other applications the user of the video production system defines an acceptable magnitude of the latency.
(72) In one embodiment, the flexibility of the video production system is increased by organizing the blocks as hierarchical blocks, i.e. a large block can be fetched as smaller blocks from a storage device. Conversely, the concept allows for grouping smaller blocks into one large block. This concept is particularly advantageous, for example, if a video stream was ingested when long latencies were acceptable and is later used in a live production where latency must be kept low. Of course, the reverse is also possible, namely that the video stream is stored in small blocks and later read out again in large blocks, wherein each large block contains a plurality of the original small blocks.
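The hierarchical-block idea above can be sketched as two inverse operations. This is an illustrative sketch, not the disclosed data format: a hierarchical block is modeled simply as a list of small blocks, so the same stored data can be fetched either as one big unit (efficient bulk transfer) or piece by piece (low latency). The grouping factor and names are assumptions.

```python
# Illustrative sketch of hierarchical blocks: grouping small blocks into
# bigger ones, and splitting them back into individually processable blocks.
def group(small_blocks, factor):
    """Assemble consecutive small blocks into bigger hierarchical blocks."""
    return [small_blocks[i:i + factor]
            for i in range(0, len(small_blocks), factor)]


def split(hierarchical_blocks):
    """Recover the individual small blocks for low-latency access."""
    return [b for big in hierarchical_blocks for b in big]
```

Because `split(group(x, k))` returns the original sequence, the choice of block size at ingest does not constrain how the stream is accessed later, which is exactly the flexibility the paragraph describes.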
(74) The method proposed in the present disclosure achieves synchronization of video streams and workflows in a distributed video processing system.
(75) As long as there are no changes or modifications of the workflows executed by the distributed video processing system, only steps S1 and S2 are performed and all other steps are skipped. As soon as at least one workflow is changed or modified, all steps S1 to S8 are executed again. The reason is that a change to one workflow may have an impact on the entire video processing system in terms of processing and/or transmission times.
(76) Reference Signs List
100 Distributed video production system
102 Input devices
103a,b Networks
104a-c Storage devices
106a,b Processing devices
107 Asset management device
108 Database
109 Graphical user interface
200 Processing element
201 Network
202 I/O module
203 Processing unit
204 Double-headed arrow
205 Double-headed arrow
300 Workflow
301 Ingest
302 Encode
303 Store
304 Arrow
305 Arrow
307-309 Processing elements
310 Dashed line
311 Processing chain
401 Processing pipeline
402 Bar
403 Timeline
404 Arrow
501-503 Bars
504 Timeline
506 Initialization time
507 Transmission time
601-602 Video streams
603 Angles