LOCALIZED DYNAMIC VIDEO STREAMING SYSTEM
20230051450 · 2023-02-16
Inventors
CPC Classification
H04N21/21805
ELECTRICITY
H04N21/4126
ELECTRICITY
International Classification
Abstract
A computerized system operable to provide multiple video streams of an event. In an ideal embodiment, the system provides live and dynamic streaming of an event such as a sporting event, concert, march, rally, and the like, to allow viewers to watch video of the event from nearly any angle and vantage point.
Claims
1. A video streaming system comprising: a mobile computing device comprising a camera, and a server; the mobile computing device in wireless communication with the server, and the server operable to transmit a plurality of video streams to at least one of the mobile computing device and a remote device; each of the plurality of video streams being recorded from a node comprising a video camera; and wherein the node has a location, the node comprising information about the location.
2. The video streaming system of claim 1 wherein the system comprises a multi-view video streaming system.
3. The video streaming system of claim 1 wherein the server is operable to transmit the plurality of video streams to at least one of the mobile computing device and a remote device based on a location of the mobile computing device.
4. The video streaming system of claim 3 further comprising a GPS receiver, wherein the location of the mobile computing device is determined by the GPS receiver.
5. The video streaming system of claim 1 wherein the plurality of video streams comprises a first plurality of video streams and a second plurality of video streams.
6. The video streaming system of claim 5 wherein at least one of the server, the mobile computing device, and the remote device are operable to group the first plurality of video streams into a first cluster.
7. The video streaming system of claim 6 wherein at least one of the server, the mobile computing device, and the remote device are operable to group the first plurality of video streams into the first cluster based on a proximity of a first plurality of nodes recording the first plurality of video streams.
8. The video streaming system of claim 7 wherein each node of the first plurality of nodes has a location, each node of the first plurality of nodes comprising information about its location.
9. The video streaming system of claim 5 wherein at least one of the server, the mobile computing device, and the remote device are operable to group the second plurality of video streams into a second cluster.
10. The video streaming system of claim 9 wherein at least one of the server, the mobile computing device, and the remote device are operable to group the second plurality of streams into the second cluster based on proximity of a second plurality of nodes recording the second plurality of video streams.
11. The video streaming system of claim 10 wherein each node of the second plurality of nodes has a location, each node of the second plurality of nodes comprising information about its location.
12. The video streaming system of claim 1 wherein at least one of the mobile computing device, the server, and the remote device are operable to calculate a relative position of the node based on the information about the location in order to relatively map the location of the node.
13. The system of claim 1 wherein the plurality of video streams are not aggregated into a single video stream.
14. The system of claim 1 wherein a grouping of the plurality of video streams into a plurality of clusters, by at least one of the server, the mobile computing device, and the remote device, simplifies processing, which increases an efficiency of the system.
15. The system of claim 1 wherein each of the plurality of video streams comprises a different view taken from a different angle of an event.
16. The system of claim 15 wherein at least one of the mobile computing device and the remote device is operable to receive an input selecting one of the plurality of video streams and further operable to display the different view taken from the different angle of the event.
17. A method of viewing a plurality of video streams comprising the steps of: receiving, by at least one of a mobile computing device and a remote device, the plurality of video streams via a networked connection to a server; the plurality of video streams comprising a first plurality of video streams and a second plurality of video streams; each of the first plurality of video streams and the second plurality of video streams recorded from a node comprising a video camera; and calculating, by at least one of the mobile computing device, the server, and the remote device, a relative position of each node recording the first plurality of video streams and each node recording the second plurality of video streams based on information about a location of each node.
18. The method of claim 17 further comprising the step of presenting, on a display of at least one of the mobile computing device and the remote device, an indication of the relative position of each of the plurality of video streams.
19. The method of claim 18 further comprising the step of receiving an input to display one of the plurality of video streams on the display of at least one of the mobile computing device and the remote device.
20. The method of claim 19 further comprising the step of displaying the one of the plurality of video streams on the display of at least one of the mobile computing device and the remote device.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0014] The detailed description set forth below in connection with the appended drawings is intended as a description of presently preferred embodiments of the invention and does not represent the only forms in which the present disclosure may be constructed and/or utilized. The description sets forth the functions and the sequence of steps for constructing and operating the invention in connection with the illustrated embodiments.
[0015] Generally, the present disclosure concerns systems and methods of dynamically and relatively mapping video streams, or “nodes,” to allow a user to both record video from their position and/or view video from another position relative to their own. The systems and methods herein may group closely positioned cameras/nodes into clusters, and then dynamically map other nodes or clusters relative to each other, keeping their relative positions updated in real time. This allows a user to view camera streams from other positions and provides a user interface allowing the user to select a stream at a desired location. The system may be used for both live video streaming and replay via pre-recorded video streaming. In another embodiment, the system may also create clusters having only one node, such that any node operating within the system is assigned to a cluster. Such an embodiment may simplify processing and algorithm treatment, increasing efficiency. In most embodiments, the system will be localized in that the different video streams are available to a mobile computing device within a certain proximity of the event in question. However, in other embodiments, remote devices not near the event may also access the video system.
[0016] While the participation of a node contributing video streams to a specific cluster may be confined by the geographical position of the node, consumption of the dynamic video streams is not confined by the geographical position or broadcast status of the consumer. A consumer who does not contribute video streams is still able to access the dynamic video contents, and can experience and browse the relatively mapped videos as a first-person audience. Moreover, the dynamic video streams recorded from an event are stored in server memory with all relevant mapping metadata. Therefore, the dynamic video contents will be available for interactive viewing at any time.
[0017] This user interface may allow for relative movement (i.e., view a camera to the right, to the left, opposite, up/down, etc.) and/or may superimpose the camera node locations onto a map or other diagram of the event location, allowing a user to select a particular location on this map and node interface. The number of cameras in each cluster, and their relative positions to each other, stay updated in real time, allowing each node to browse views from other relatively mapped cameras. As a whole, the systems and methods herein allow viewing of an event from multiple viewpoints and, ideally, with enough cameras, allow for seamless dynamic visibility of an event in a specific area. This results in a localized, multi-view, dynamic video streaming system.
[0018] In many embodiments, most if not all of the video recording device nodes and viewing nodes (which may also be recording video, or not) may be mobile computing devices belonging to individuals. The term mobile computing device is used herein to refer to the various portable computerized tools carried by individuals. Such mobile computing devices include, but are not limited to, smartphones, tablet computers, smart watches, and video camera-equipped accessories such as glasses and hats. These are generally battery-powered devices having a microprocessor and memory, as well as networking capability, and optionally a video camera and/or a display screen.
[0019] In one embodiment, dynamic mapping may be performed by a remote server, while in other embodiments, the mapping may be performed by a user's personal mobile computing device, such as a mobile phone. In some cases, both the server and the mobile computing device may be used in determining the relative location of a video stream node with respect to the user device. Location information may be aggregated at a remote server, allowing users to access streams based on the requested location.
[0020] However, in one embodiment, the server may not aggregate the video streams into a single video stream.
[0021] In operation, the clustering of close nodes allows a user to select a cluster for a general location, and then to view different streams from the different nodes of the cluster, allowing the user to explore details of any specific cluster. The user interface of the system will allow the user to scroll or otherwise navigate through each cluster and, once a cluster is selected, scroll or otherwise navigate through the different nodes within that cluster. This clustering operation also allows the system to lazily load, stream, and calculate relative locations using only a subset of the videos/streams. As such, this clustering contributes to optimizing the memory, data flow, and computation required by a server or servers, networks, and end user devices. As is well known, battery life is an important issue, especially when at a public event away from a charging station. Therefore, optimizing computations, networked data flow, and thus data usage is important to prolong battery life and reduce energy consumption. This makes the system contemplated herein more practical for frequent and extended use.
[0022] In one embodiment, the system may be operable to use GPS or other location data gathered from the mobile computing devices to efficiently group closely located nodes into clusters of other nearby nodes. Once clustered, additional computerized methods may be employed to further pinpoint the relative locations of the different nodes to each other and/or to a user's mobile computing device. This may provide for simpler and less resource-intensive cluster positioning, saving more resource-intensive location calculations for the limited nodes of each cluster. In other embodiments, computerized methods for determining location other than, or in addition to, GPS may be used to group nodes into clusters, as well as to determine the relative locations of the specific nodes.
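As a non-limiting sketch of the grouping step described above, nearby nodes can be binned into clusters by their GPS coordinates. The function name, cell size (roughly 100 m of latitude per 0.001 degrees), and coordinate values below are illustrative assumptions, not part of the disclosure.

```python
# A minimal sketch of grouping closely located nodes into clusters by binning
# their GPS coordinates into grid cells. The cell size is a hypothetical
# choice for illustration only.
def cluster_by_grid(node_coords, cell_deg=0.001):
    """Group node ids into clusters of nearby nodes.

    node_coords: dict mapping node id -> (latitude, longitude) in degrees.
    Returns a list of clusters, each a list of node ids.
    """
    clusters = {}
    for node_id, (lat, lon) in node_coords.items():
        # Nodes whose coordinates round to the same grid cell share a cluster.
        key = (round(lat / cell_deg), round(lon / cell_deg))
        clusters.setdefault(key, []).append(node_id)
    return list(clusters.values())
```

Nodes falling near a cell boundary may land in different bins; a fuller system could refine cluster membership with the finer-grained relative-positioning methods discussed below.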
[0023] While GPS location data is very convenient and handy in many cases, such as when driving or walking on city streets to find directions, the resolution of GPS data is limited. GPS has an error of at least a foot or more in any direction, usually at least a few feet. This is not a problem for driving directions. However, for the purposes of locating the source of a video stream in a small space such as a sports arena, this uncertainty is too great for proper positioning and relative mapping by the system, and not fully suitable for many embodiments of the system disclosed herein. Therefore, additional or alternative methods may be employed to properly identify the location of a video stream to properly implement the present system.
[0024] In one embodiment, mapping of nodes may be performed using sensor data. Each node (mobile computing device) contains information about its location, and the system is able to access that information to pinpoint each node. For example, using GPS data in one embodiment, the pairwise distance between the nodes may be calculated from the latitude/longitude points assuming a spherical Earth. For example, the haversine formula may be used to calculate the great-circle distance between two points as:
α = sin²(Δϕ/2) + cos ϕ₁ · cos ϕ₂ · sin²(Δλ/2)
[0025] where ϕ is the latitude and λ is the longitude in radians, and Δϕ and Δλ are the differences between the latitude and longitude values, respectively. The distance d is then computed as:
d = 2R · atan2(√α, √(1 − α))
[0026] where R is the radius of the Earth, with a mean value of 6,371 km. The haversine formula remains numerically stable even for small distances between nodes.
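The haversine computation above can be sketched directly in code; the function name and constant name are illustrative choices, not from the disclosure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius R from the formula above

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees,
    per alpha = sin^2(dphi/2) + cos(phi1) * cos(phi2) * sin^2(dlambda/2)
    and d = 2R * atan2(sqrt(alpha), sqrt(1 - alpha))."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return EARTH_RADIUS_KM * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
```

For example, one degree of longitude along the equator works out to roughly 111.2 km, consistent with the formula.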
[0027] The relative bearing between two nodes is the angle between the north-south line of Earth and the line connecting the two nodes. We compute this angle as:
θ = atan2(sin Δλ · cos ϕ₂, cos ϕ₁ · sin ϕ₂ − sin ϕ₁ · cos ϕ₂ · cos Δλ)
[0028] With the estimates of distance and angle between two points, the system is able to calculate a reasonably precise estimate of the relative positions of two nodes.
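The bearing formula above can likewise be sketched in code; the function name and the convention of degrees clockwise from north are illustrative assumptions.

```python
import math

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Bearing from node 1 toward node 2, in degrees clockwise from north,
    per theta = atan2(sin(dlambda) * cos(phi2),
                      cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlambda))."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0
```

A node due east of another yields a bearing of 90 degrees; a node due north yields 0 degrees.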
[0029] In further embodiments, the system can determine the camera direction from a measure of absolute bearing (the angle between magnetic/true north and the node itself). This indicates which direction the node is facing, independent of other nodes.
[0030] In a further embodiment, accelerometer information may be used to keep track of the location of a node once determined. For example, once the location of the device is accurately known, accelerometer data may indicate that a user carrying the device has walked ten steps. The current position may then be determined based on this information.
[0031] In still a further embodiment, gyroscope data may provide information about the orientation of the device. This may be particularly useful once an initial orientation of the device is known. The gyroscope may indicate an amount moved from the known position, allowing calculation of the new current position.
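The two updates above can be combined into a toy planar dead-reckoning sketch. The stride length, flat-ground assumption, and function name are illustrative assumptions, not part of the disclosure.

```python
import math

def dead_reckon(x_m, y_m, heading_deg, steps, stride_m, turn_deg):
    """Update a node's planar position from a known fix: apply a
    gyroscope-reported heading change (turn_deg), then advance by the
    accelerometer-derived step count times an assumed stride length.
    Coordinates are meters east (x) and north (y); heading is degrees
    clockwise from north."""
    new_heading_deg = (heading_deg + turn_deg) % 360.0
    heading = math.radians(new_heading_deg)
    x_m += steps * stride_m * math.sin(heading)  # east component
    y_m += steps * stride_m * math.cos(heading)  # north component
    return x_m, y_m, new_heading_deg
```

For instance, ten steps of 0.75 m while facing north advance the node 7.5 m north of the last known fix.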
[0032] In another embodiment, mapping of the relative positions of nodes may be performed using computer vision techniques to compute the relative positions based on overlapped regions of image/video streams. Examples of these techniques may include, but are not limited to template matching and feature matching, among others.
[0033] For example, using template matching, it is possible to discretize two images from two different nodes with a reasonable resolution and perform a template matching over the discretized tiles. To do that, one option is to calculate the cross-correlation between two streams or images from two different nodes. This essentially is a simple sum of pairwise multiplications of corresponding pixel values of the images.
[0034] The occurrence of a matched tile will indicate that overlapping regions exist between the two images or image streams from the different nodes, and the number of matched tiles will quantify the amount of overlap. The location of the matched tiles will give a sense of the relative position of the camera capturing the image.
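The tile-based matching just described can be sketched with normalized cross-correlation over discretized tiles. The tile size, match threshold, and use of tiny synthetic grayscale arrays are illustrative assumptions.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-sized tiles: a sum of
    pairwise products of (mean-subtracted) corresponding pixel values."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def count_matched_tiles(img1, img2, tile=8, threshold=0.9):
    """Count tiles of img1 whose best NCC match anywhere on img2's tile grid
    exceeds the threshold; the count quantifies the amount of overlap."""
    h, w = img1.shape
    matches = 0
    for ty in range(0, h - tile + 1, tile):
        for tx in range(0, w - tile + 1, tile):
            t = img1[ty:ty + tile, tx:tx + tile]
            best = max(
                ncc(t, img2[y:y + tile, x:x + tile])
                for y in range(0, img2.shape[0] - tile + 1, tile)
                for x in range(0, img2.shape[1] - tile + 1, tile)
            )
            if best > threshold:
                matches += 1
    return matches
```

An image compared against itself matches every tile, while a featureless image yields no matches.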
[0035] In another example, using feature matching, it is possible to use the scale-invariant feature transform (SIFT) to detect and describe features available in an image. SIFT is rotation and scale invariant and proceeds in five main steps: i) finding potential locations for features, ii) locating feature keypoints with reasonable accuracy, iii) assigning orientations to keypoints, iv) representing the keypoints as high-dimensional vectors, and v) running the keypoint matching process. SIFT keypoints/features of streams coming from different nodes are initially extracted. We then compute a Euclidean distance between feature vectors and find the nearest neighbors to detect an overlapping region. From all the matches, subsets of features that agree on an object and its location/scale/orientation, etc., between two streams from two different nodes are identified as good matches with reasonable confidence and considered as candidates for the overlapped region.
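The nearest-neighbor matching of step v) can be sketched as follows. The descriptors here are synthetic stand-ins (computing real SIFT descriptors is beyond this sketch), and the ratio threshold is an assumed value in the spirit of the standard ratio test.

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.75):
    """Nearest-neighbor matching of feature vectors by Euclidean distance,
    keeping only matches whose nearest neighbor is clearly closer than the
    second-nearest (ratio test). Returns (index_in_desc1, index_in_desc2)
    pairs judged to be good matches."""
    good = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            good.append((i, int(nearest)))
    return good
```

A dense set of such matches between two streams suggests an overlapping region between the corresponding cameras.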
[0036] In still further embodiments, location may be mapped based on image comparisons using deep neural networks (“DNN”). For example, one-shot learning may be applied with a deep neural network operable to learn a similarity function between streams from two different nodes. For example, we may learn a similarity function:
d(stream₁, stream₂) = degree of difference between stream₁ and stream₂
[0037] If the two streams have a large overlapping region, the DNN should output a small number; a small or nonexistent overlap should yield a large number. The magnitude of d will define the amount of overlap. For this purpose, we may use a Siamese network to first encode each of the images from different nodes into an n-dimensional vector of features. We then compute a norm-distance as a measure of similarity. The Siamese network may take streams from different nodes as input and, based on these, output a vector of features representing each stream. The Siamese network may be configured in any number of ways, including various numbers of convolution layers and fully connected layers, without straying from the scope of the invention. The resulting feature vectors for each stream can then be compared.
[0038] With an n-dimensional feature vector f(x) computed for each stream, we compute the norm-distance as:
d(x₁, x₂) = ‖f(x₁) − f(x₂)‖²
[0039] The magnitude of d(x₁, x₂) will provide a measure of the amount of overlap.
[0040] For example, the Siamese network contemplated herein is trained to analyze images received from the nodes of the system. The Siamese network may receive two images from two different cameras. Each image is processed by the Siamese network, which provides an output of features for that image. These feature outputs may then be compared to infer the position of one camera relative to the other.
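The norm-distance comparison above can be sketched directly, assuming the Siamese network has already produced the feature encodings f(x₁) and f(x₂); the function name is an illustrative choice.

```python
import numpy as np

def norm_distance(f1, f2):
    """Squared L2 norm-distance d(x1, x2) = ||f(x1) - f(x2)||^2 between the
    n-dimensional feature encodings of two streams; a small value suggests a
    large overlapping region, a large value little or no overlap."""
    diff = np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float)
    return float(np.dot(diff, diff))
```

Identical encodings yield a distance of zero, i.e., maximal inferred overlap.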
[0041] In further embodiments, we may extend this architecture to work for the general case where no overlapping region exists. This model takes two images (one from each of two different nodes) as input and analyzes reflection, rotation, translation, etc., to compute the relative distance and angle between the two nodes. In other words, while there may be no overlapping parts of the images, the system may identify the same or similar elements, such as a basketball hoop, an image on a court, or another similar landmark or guide point. Based on the differences in perceived size, angle, reflection, rotation, translation, and the like, of this similar element, relative location can be determined.
[0042] In one embodiment, the system may similarly extract n-dimensional feature vectors using the Siamese network, but these vectors are intended to encode relative angle information. The distance d between nodes is computed as above. In addition, the relative angle between the nodes is computed from these encoded feature vectors.
[0043] In one embodiment, the system may be operable to run all of the above-mentioned processes in parallel and then build a final ensemble learning model (decision trees, random forests, bagging, boosting, etc.) to further improve the output.
[0044] In other embodiments, any two or more of the above methods of calculating relative position may be employed to increase accuracy and confidence. The methods may be calculated simultaneously or in series. Also, it should be understood that other non-disclosed methods of determining relative position may be used without straying from the scope of this invention.
[0045] Executing these methods for real-time processing of streams can be computationally expensive. In one embodiment, the system may perform these operations lazily, processing at each time step only the few nearby and most relevant nodes to which the user may hop next. All other nodes simply stay in the pool. If the user hops to a different node or changes location, the priority of nodes to be processed is updated accordingly based on location data provided to the system by the user node.
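The lazy prioritization just described can be sketched as selecting only the nearest candidate nodes for eager processing. The planar coordinates, function name, and default pool size are illustrative assumptions.

```python
import heapq

def nodes_to_process(user_pos, node_positions, k=3):
    """Pick the k nodes nearest the user's position for eager processing at
    this time step; every other node simply stays in the pool. Re-running
    this after the user hops or moves updates the priorities.

    user_pos: (x, y) planar position of the user node.
    node_positions: dict mapping node id -> (x, y) planar position.
    """
    def dist_sq(node_id):
        x, y = node_positions[node_id]
        return (x - user_pos[0]) ** 2 + (y - user_pos[1]) ** 2
    return heapq.nsmallest(k, node_positions, key=dist_sq)
```

Calling this again with the user's new position after each hop keeps the processed subset aligned with where the user is likely to go next.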
[0046] In one embodiment, a user interface will encourage a user to view video streams from adjacent nodes, such that the user moves from node or cluster to adjacent node or cluster in one direction or the other, in a dynamic “spinning” motion about the event at issue. In some embodiments, alternative motion, such as moving directly between distant or opposite nodes, will be prevented, though not in all cases. This allows for more efficient and lazy relative mapping, compared to jumping between two very different streams. In other embodiments, jumping from node to node at random will be supported by the system. In still other embodiments, jumping from node to node at random will be possible, but discouraged through the user interface.
[0047] The methods above have been described with respect to individual nodes, but their extension to a cluster of nodes is straightforward. The GPS coordinates or stream corresponding to a cluster can be assigned as the signal from a particular node (a one-hot node best describing the cluster), the average of the signals from all the member nodes, and so on. Once parameters are assigned to a cluster, all of the methods described above readily extend to the cluster. Organizing a group of nodes as a cluster provides two benefits: [0048] i) The user will be able to browse nodes hierarchically, first hopping from cluster to cluster and then choosing a node for watching its stream. [0049] ii) Clusters will make lazy loading of nodes easier. For example, when the user is browsing inside one cluster, it is safe to skip processing nodes in other clusters.
[0050] One embodiment of forming a cluster involves a manual approach: consider the radius of a cluster as a hyper-parameter and manually tune it to optimize performance and user experience. Another embodiment involves a dynamic approach: dynamically set the size of the radius of the cluster based on the density of nodes in a specific geographic area. The more nodes, and thus the more video streams, in a particular area, the smaller the area covered by a cluster can be. With fewer nodes in an area, a cluster can be larger.
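The dynamic approach above can be sketched as a radius that shrinks with node density. The base radius, minimum radius, and scaling constant are all hypothetical tuning choices, not values from the disclosure.

```python
def cluster_radius_m(node_count, area_m2, base_radius_m=50.0, min_radius_m=5.0):
    """Set the cluster radius from local node density: more nodes (and thus
    more video streams) per unit area yield smaller clusters, while sparse
    areas yield larger ones. The 1000.0 scaling constant is a hypothetical
    hyper-parameter to be tuned."""
    density = node_count / area_m2 if area_m2 > 0 else 0.0
    if density <= 0:
        return base_radius_m  # no local nodes: fall back to the base radius
    return max(min_radius_m, base_radius_m / (1.0 + density * 1000.0))
```

This mirrors the manual approach as well: fixing the radius instead of deriving it from density recovers the hyper-parameter tuning embodiment.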
[0051] The system has been described thus far relative to one user computerized device. As will be understood by those skilled in the art, in practice, many mobile computerized devices may be engaged with the server (or servers, depending on scale) as nodes, with each mobile computerized device having the same functionality as discussed with respect to the single device. Indeed, a plurality of mobile computing devices may be recording video, streaming video from the system, and/or accessing one of the plurality of different video streams at an event, thereby operating as nodes of the system. Accordingly, systems comprising multiple mobile computing devices as well as multiple servers, operating in the same way as discussed herein with respect to a single user device, are well within the scope of this disclosure, and any of the different embodiments disclosed.
[0058] While several variations of the present disclosure have been illustrated by way of example in preferred or particular embodiments, it is apparent that further embodiments could be developed within the spirit and scope of the present disclosure, or the inventive concept thereof. It is to be expressly understood that such modifications and adaptations are within the spirit and scope of the present disclosure, and are inclusive of, but not limited to, the following appended claims as set forth. All different aspects and embodiments disclosed herein may be applicable to other embodiments, such that, unless otherwise noted, all features disclosed herein are interchangeable with other features and versions of the invention.