GEOTAGGED VIDEO SPATIAL INDEXING METHOD BASED ON TEMPORAL INFORMATION

20230079719 · 2023-03-16

    Inventors

    Cpc classification

    International classification

    Abstract

    A geotagged video spatial indexing method for video retrieval based on a two-dimensional (2D) temporal grid is disclosed and includes:

    generating the 2D temporal grid by using the earliest start time, the latest end time of all geotagged video clips and a temporal resolution; calculating row and column number information of each geotagged video clip based on its start time and end time; generating a spatial point for each geotagged video clip based on the row and column number information and obtaining a spatial point set; generating a R-tree spatial index structure corresponding to the spatial point set using a R-tree spatial index method; and locating the corresponding cell in a temporal grid based on retrieval conditions, generating a spatial point based on row and column number information of the grid cell, and finding the spatial point in the R-tree spatial index structure to get the geotagged video corresponding thereto.

    Claims

    1. A geotagged video spatial index method for video retrieval based on a two-dimensional (2D) temporal grid, comprising: step 1: calculating row and column number information of each of geotagged video clips by combining a start time and an end time of each of the geotagged video clips based on an earliest start time T.sub.s, a latest end time T.sub.e and a temporal resolution of the geotagged video clips, and thereby mapping one-dimensional continuous time interval information to a 2D grid space based on corresponding temporal resolution to obtain the 2D temporal grid; wherein in the step 1, the calculating row and column number information of each of geotagged video clips, specifically comprises: step A1: calculating original column number information u and row number information v of the k-th geotagged video clip under the temporal resolution ct using a formula (1): { u = ( t k s - T s ) ct v = ( t k e - T s ) ct ; ( 1 ) where t.sub.k.sup.s and t.sub.k.sup.e represent the start time and the end time of the k-th geotagged video clip respectively; T.sub.s represents the earliest start time among the geotagged video clips, T.sub.s=min({t.sub.k.sup.s}), k=1, 2, 3 . . . K, and K represents a total number of the geotagged video clips; step A2: expressing the original row and column number information in a form of a vector (u, v, 1), and obtaining the column number information j and the row number information i of the k-th geotagged video clip in the 2D temporal grid by calculating the vector according to preset rules; wherein the preset rules comprise: j=u and i=v when 1≤u<(N+1)/2 and 1<v≤(N+1)/2; the vector (u, v, 1) is converted to (j, i, 1) using a formula (2) when v>u+(N+1)/2, 1≤u≤(N+1)/2, and (N+1)/2<v≤N: ( j i 1 ) = ( u v 1 ) × ( 0 1 0 1 0 0 - 1 0 1 ) ; ( 2 ) the vector (u, v, 1) is converted to (j, i, 1) using a formula (3) when (N+1)/2<v≤u+(N+1)/2≤N, and 1≤u≤(N+1)/2: ( j i 1 ) = ( u v 1 ) × ( 1 0 0 0 1 0 0 - ( N + 1 ) 2 1 ) ; ( 3 ) the vector (u, v, 1) is converted to (j, i, 1) using a formula (4) when (N+1)/2<u≤N, and (N+1)/2≤v≤N: ( j i 1 ) = ( u v 1 ) × ( 1 0 0 0 1 0 0 - ( N + 1 ) 2 1 ) ; ( 4 ) where N represents columns of the 2D temporal grid, N=(T.sub.e−T.sub.s−ct)/ct, and T.sub.e represents the latest end time among the geotagged video clips, T.sub.e=max({t.sub.k.sup.e}); step 2: generating a spatial point for each of the geotagged video clips, based on the row and column number information of each of the geotagged video clips in the 2D temporal grid, and building a spatial point set P by the spatial points of the geotagged video clips; step 3: generating a R-tree spatial index structure corresponding to the spatial point set P using a R-tree spatial index method; step 4: according to a starting time and an end time defined by retrieval conditions, generating a corresponding point accord to row and column number information in the 2D temporal grid, and searching the point in the generated R-tree spatial index structure to get geotagged video clips corresponding to the point.

    2. (canceled)

    3. The geotagged video spatial indexing method for video retrieval based on the 2D temporal grid according to claim 1, wherein in the step 2, the generating a spatial point for each of the geotagged video clips, comprises: taking the column number information j of the k-th geotagged video clip in the 2D temporal grid to be as x, and the row number information i of the k-th geotagged video clip in the 2D temporal grid to be as y, and thereby generating the spatial point p.sub.k (x, y) of the k-th geotagged video clip; where k=1, 2, 3 . . . , K, and K represents a total number of the geotagged video clips.

    4. The geotagged video spatial indexing method for video retrieval based on the 2D temporal grid according to claim 1, wherein the step 3 comprises: performing segmentation on a spatial range where the spatial point set P is located to obtain rectangular regions; wherein each of the rectangular regions is called a directory rectangle and used as an intermediate node of a tree structure; and allocating the spatial points to the rectangular regions according to spatial locations, and generating sub-nodes of the intermediate nodes.

    5. The geotagged video spatial indexing method for video retrieval based on the 2D temporal grid according to claim 4, wherein in the step 4, the searching the point in the generated R-tree spatial index structure comprises: searching for the directory rectangle that satisfies the retrieval conditions and locating the searched directory rectangle into the intermediate node corresponding thereto; and searching the sub-nodes of the located intermediate node to find a spatial point object that satisfy the retrieval conditions.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0029] FIG. 1 is a flowchart of a geotagged video spatial indexing method for video retrieval based on temporal information provided by an embodiment of the invention.

    [0030] FIG. 2 is a schematic diagram of the 2D temporal grid provided by an embodiment of the invention.

    [0031] FIG. 3 is a schematic diagram of generating spatial point objects of geotagged video clips provided by an embodiment of the invention.

    [0032] FIG. 4 is a schematic diagram of generating an R-tree spatial index structure provided by an embodiment of the invention.

    [0033] FIG. 5 is a schematic diagram of a process of geotagged video retrieval based on the 2D temporal grid provided by an embodiment of the invention.

    [0034] FIG. 6 is a schematic diagram of a test result of a geotagged video retrieval efficiency provided by an embodiment of the invention.

    DETAILED DESCRIPTION OF THE EMBODIMENTS

    [0035] In order to make the objectives, technical solutions and advantages of the invention clearer, the technical solutions in the embodiments of the invention will be clearly described below in combination with the drawings in the embodiments of the invention. Apparently, the described embodiments are part of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by a person of ordinary skill in the art without making creative labor fall within the scope of the invention.

    [0036] As shown in FIG. 1, an embodiment of the invention discloses a geotagged video spatial indexing method for video retrieval based on a 2D temporal grid, including the following steps.

    [0037] S101: based on an earliest start time T.sub.s, a latest end time T.sub.e and a temporal resolution of all geotagged video clips, calculating row and column number information of each geotagged video clip by combining a start time and an end time of each geotagged video clip, and thereby mapping one-dimensional continuous time interval information to a 2D grid space based on corresponding temporal resolution to obtain the 2D temporal grid.

    [0038] Specifically, this step S101 mainly defines the 2D temporal grid based on a grid structure in combination with the temporal resolution and temporal range.

    [0039] Generally speaking, video clips shot with cameras or smartphones are all shot at a specific moment and have a certain shooting duration, and the time interval can be expressed as [t.sub.s, t.sub.e], where t.sub.s and t.sub.e represent the start time and end time of the video clip, respectively. Please set the start time and end time of the k-th video clip as t.sub.k.sup.s and t.sub.k.sup.e respectively, and t.sub.k.sup.s≤t.sub.k.sup.e. In an illustrated embodiment, calculating the row and column number information of the k-th geotagged video clip specifically includes:

    [0040] step A1: using a formula (1) to calculate original column number information u and row number information v of the k-th geotagged video clip under the temporal resolution ct:

    [00005] { u = ( t k s - T s ) ct v = ( t k e - T s ) ct ; ( 1 )

    [0041] Where t.sub.k.sup.s and t.sub.k.sup.e represent the start time and the end time of the k-th geotagged video clip, respectively; T.sub.s represents the earliest start time among all geotagged video clips, T.sub.s=min({t.sub.k.sup.s}), k=1, 2, 3 . . . K, K represents a total number of the geotagged video clips. The temporal resolution ct can be measured in hours, minutes, seconds, or other different durations.

    [0042] step A2: expressing the original row and column number information in a form of a vector (u, v, 1), and obtaining the new column number information j and the new row number information i of the k-th geotagged video clip in the 2D temporal grid by calculating the vector according to preset rules.

    [0043] The preset rules specifically include:

    [0044] j=u and i=v when 1≤u<(N+1)/2 and 1<v≤(N+1)/2;

    [0045] converting the vector (u, v, 1) to (j, i, 1) using a formula (2) when u>v+(N+1)/2, 1≤u≤(N+1)/2, and (N+1)/2<v≤N:

    [00006] ( j i 1 ) = ( u v 1 ) × ( 0 1 0 1 0 0 - 1 0 1 ) ; ( 2 )

    [0046] converting the vector (u, v, 1) to (j, i, 1) using a formula (3) when (N+1)/2<v≤u+(N+1)/2≤N, and 1≤u≤(N+1)/2:

    [00007] ( j i 1 ) = ( u v 1 ) × ( 1 0 0 0 1 0 0 - ( N + 1 ) 2 1 ) ; ( 3 )

    [0047] converting the vector (u, v, 1) to (j, i, 1) using a formula (4) when (N+1)/2<u≤N, and (N+1)/2≤v≤N:

    [00008] ( j i 1 ) = ( u v 1 ) × ( 1 0 0 0 1 0 0 - ( N + 1 ) 2 1 ) ; ( 4 )

    [0048] where N represents columns of the 2D temporal grid, N=(T.sub.e−T.sub.s−ct)/ct, and T.sub.e represents the latest end time among all the geotagged video clips. T.sub.e=max({t.sub.k.sup.e}).

    [0049] The row and column number information of all geotagged video clips can be obtained by calculating according to the above steps, and finally the 2D temporal grid with M rows and N columns can be obtained, where M=(N+1)/2.

    [0050] It can be seen from the above embodiment that the 2D temporal grid includes a plurality of temporal grid cells, and location information of each temporal grid cell represents time period information of the geotagged video clip corresponding thereto. As shown in FIG. 2, column information of the 2D temporal grid represents the start time of the geotagged video clip, and row information of the 2D temporal grid represents the end time of the geotagged video clip, each temporal grid cell may be used to store an identification ID of the geotagged video clip. Still taking FIG. 2 as an example, it is assumed that the k-th geotagged video clip is located in the i-th row and j-th column of the 2D temporal grid, and its temporal grid is C.sub.k (j, i), and its time semantics are defined as follows:

    [0051] (1) in R1 sub-region, i.e., when 1≤j<M, 1≤i<M and j<i, the time semantics of cell (j, i) starts at j and ends at i.

    [0052] (2) in R2 sub-region, i.e., when M<j≤N, 1≤i<M and j≥i+M, the time semantics of cell (j, i) starts at i and ends at j+1.

    [0053] (3) in R3 sub-region, i.e., when M<j≤N, 1<i≤M and j<i+M, the time semantics of cell (j, i) starts at j and ends at i+M.

    [0054] (4) in R4 sub-region, i.e., when 1≤j≤M, 1≤i≤M and j≥i, the time semantics of cell (j, i) starts at j and ends at i+M.

    [0055] S102: based on the row and column number information of each geotagged video clip in the 2D temporal grid, generating a spatial point for each geotagged video clip, and building a spatial point set P by the spatial points of all geotagged video clips.

    [0056] In an illustrated embodiment, as shown in FIG. 3, generating the spatial point for each geotagged video clip specifically includes: taking the column number information j of the k-th geotagged video clip in the 2D temporal grid to be as x, and the row number information i of the k-th geotagged video clip in the 2D temporal grid to be as y, and thereby generating the spatial point p.sub.k (x, y) of the k-th geotagged video clip; where k=1, 2, 3 . . . , K, and K represents a total number of the geotagged video clips.

    [0057] S103: generating a R-tree spatial index structure corresponding to the spatial point set P using a R-tree spatial index method.

    [0058] In an illustrated embodiment, this step S103 specifically includes: performing segmentation on a spatial range where the spatial point set P is located to obtain rectangular regions; each rectangular region is called a directory rectangle and used as an intermediate node of a tree structure; and allocating the spatial points to the rectangular regions according to spatial locations, and generating sub-nodes of the intermediate nodes. As shown in FIG. 4.

    [0059] S104: accord to that starting time and the end time defined by the retrieval conditions, generating a corresponding point accord to the row and column number information in the 2D temporal grid, and searching the point in the generated R-tree spatial index structure to get the geotagged video clips corresponding to that point.

    [0060] In an illustrated embodiment, the searching the spatial point in the generated R-tree spatial index structure specifically includes: searching for the directory rectangle that satisfies query conditions and locating the searched directory rectangle into the intermediate node corresponding thereto; and then searching the sub-nodes of the located intermediate node to find a spatial point object that satisfy the retrieval conditions. As shown in FIG. 5, a point q represents the defined time retrieval conditions. According to the R-tree index, the point q is located in the directory rectangle, and the video clips that meet the conditions may only exist in the directory rectangle R2. Therefore, it is only compared with the subordinate sub-nodes gv7, gv8, and gv9 of R2, thereby improving the retrieval efficiency.

    [0061] The invention combines massive video retrieval requirements, from the perspective of the 2D grid, and combines the start time and end time of the geotagged video shooting to define the 2D temporal grid according to the corresponding temporal resolution; using the row and column information of grid cells to express temporal information of the geotagged video; converting the grid cells into spatial point objects, and apply the spatial indexing method to construct a temporal index based on the 2D temporal grid, the temporal-based geotagged video query can be performed in a graphical manner, which reduces the processing complexity of temporal information of the geotagged video and significantly improves query efficiency.

    [0062] In order to test the retrieval efficiency of the geotagged video spatial indexing method for video retrieval provided by the invention, the invention also provides the following experimental data.

    [0063] According to the geotagged video spatial indexing method for video retrieval provided by the invention, six geotagged video data sets from 100,000 to 1,000,000 records are used to test the response time of the retrieval. And compared with a retrieval method of traditional timestamp-based B-tree index. As shown in FIG. 6, in the case of different amounts of data, the geotagged video spatial indexing method for video retrieval based on the 2D temporal grid can significantly improve the retrieval efficiency. Among them, the maximum improvement is achieved when there are 800,000 records, and the response time is shortened by 43.34 ms. Although the retrieval efficiency decreased slightly with the increase of data volume, it still shortened the response time by 29.62 ms at 1,000,000 records. In practical applications, query optimization methods such as cache size optimization will further improve retrieval efficiency.

    [0064] Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of the technical features can be equivalently replaced; these modifications or substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the invention.