SHOOTING TARGET COLLATION METHOD, SHOOTING TARGET COLLATION DEVICE, AND PROGRAM
20230018573 · 2023-01-19
Assignee
Inventors
- Hiroyuki ISHIHARA (Musashino-shi, Tokyo, JP)
- Shiro KUMANO (Musashino-shi, Tokyo, JP)
- Takayuki NAKACHI (Musashino-shi, Tokyo, JP)
Cpc classification
G06V40/23
PHYSICS
International classification
Abstract
An object of the invention is to provide a photographed target matching method and the like which allow video image data including a plurality of photographed targets and sensor data from terminals worn by the targets to be automatically associated with one another, so that a data set for analysis can be produced. In the photographed target matching method according to the present invention, a gravitational acceleration component g.sub.c in a camera coordinate system is estimated for an arbitrary combination of a photographed target j in the camera video image and a terminal i, from acceleration vectors a.sub.c.sup.(j)(t) of a plurality of photographed targets j produced from video image data from the camera and acceleration vectors a.sub.d.sup.(i)(t) obtained from the sensors of terminals i worn by the plurality of photographed targets. The acceleration vector (a.sub.c.sup.(j)+g.sub.c) in the video data, obtained by adding the gravitational acceleration component in the camera coordinate system, and the acceleration vector a.sub.d.sup.(i)(t) of the terminal are shifted by the estimated time gap τ and compared, and the combination (i, j, τ) which maximizes their correlation is obtained to match the target in the camera video image and the terminal.
Claims
1. A photographed target matching method, comprising: photographing a moving image including a plurality of targets to be photographed j by a fixed camera; obtaining acceleration vectors a.sub.c.sup.(j)(t) of the photographed targets j from the moving image; obtaining acceleration vectors a.sub.d.sup.(i)(t) of terminals i carried by the photographed targets j; estimating a gravitational acceleration vector g.sub.c in a camera coordinate system for an arbitrary combination of the photographed target j and the terminal i from the acceleration vectors a.sub.c.sup.(j)(t) and the acceleration vectors a.sub.d.sup.(i)(t); and adding the gravitational acceleration vector g.sub.c to the acceleration vector a.sub.c.sup.(j)(t) and comparing the result with the acceleration vector a.sub.d.sup.(i)(t) to match the photographed target j included in the moving image and data in the terminal i.
2. The photographed target matching method according to claim 1, further comprising: calculating motion similarity between the acceleration vector a.sub.d.sup.(i)(t) and an acceleration obtained by adding the gravitational acceleration vector g.sub.c to the acceleration vector a.sub.c.sup.(j)(t) in matching the photographed target j included in the moving image and the data in the terminal i; calculating gravitational acceleration similarity between the gravitational acceleration vector g.sub.c and the gravitational acceleration vector g′.sub.c estimated assuming that the terminals i are synchronized in time; and detecting a combination of the terminal i and the photographed target j which maximizes an objective function obtained as a weighted sum of the motion similarity and the gravitational acceleration similarity.
3. A photographed target matching device, comprising: an input unit to which acceleration vectors a.sub.c.sup.(j)(t) of photographed targets j included in a moving image photographed by a fixed camera and acceleration vectors a.sub.d.sup.(i)(t) measured by terminals i carried by the photographed targets j are input; an estimating unit which estimates a gravitational acceleration vector g.sub.c in a camera coordinate system for an arbitrary combination of the photographed target j and the terminal i from the acceleration vectors a.sub.c.sup.(j)(t) and the acceleration vectors a.sub.d.sup.(i)(t); and a detecting unit which adds the gravitational acceleration vector g.sub.c to the acceleration vector a.sub.c.sup.(j)(t) and compares the result with the acceleration vector a.sub.d.sup.(i)(t) to match the photographed target j included in the moving image and data in the terminal i.
4. The photographed target matching device according to claim 3, wherein the estimating unit estimates the gravitational acceleration vector g′.sub.c when it is assumed that the terminals i are synchronized in time, and when the photographed target j included in the moving image and data in the terminal i are matched, the detecting unit calculates motion similarity between the acceleration vector a.sub.d.sup.(i)(t) and acceleration obtained by adding the gravitational acceleration vector g.sub.c to the acceleration vector a.sub.c.sup.(j)(t), calculates gravitational acceleration similarity between the gravitational acceleration vector g.sub.c and the gravitational acceleration vector g′.sub.c, and detects a combination of the terminal i and the photographed target j which maximizes an objective function obtained as a weighted sum of the motion similarity and the gravitational acceleration similarity.
5. A program for causing a computer to execute a photographed target matching method, the photographed target matching method comprising: photographing a moving image including a plurality of targets to be photographed j with a fixed camera; obtaining acceleration vectors a.sub.c.sup.(j)(t) of the photographed targets j from the moving image; obtaining acceleration vectors a.sub.d.sup.(i)(t) from terminals i carried by the photographed targets j; estimating a gravitational acceleration vector g.sub.c in a camera coordinate system for an arbitrary combination of the photographed target j and the terminal i from the acceleration vectors a.sub.c.sup.(j)(t) and the acceleration vectors a.sub.d.sup.(i)(t); and adding the gravitational acceleration vector g.sub.c to the acceleration vector a.sub.c.sup.(j)(t) and comparing the result with the acceleration vector a.sub.d.sup.(i)(t) to match the photographed target j included in the moving image and data in the terminal i.
6. The program according to claim 5, wherein when the photographed target j included in the moving image and the data in the terminal i are matched, the method further comprises: calculating motion similarity between the acceleration vector a.sub.d.sup.(i)(t) and an acceleration obtained by adding the gravitational acceleration vector g.sub.c to the acceleration vector a.sub.c.sup.(j)(t); calculating gravitational acceleration similarity between the gravitational acceleration vector g.sub.c and the gravitational acceleration vector g′.sub.c on the basis of the assumption that the terminals i are synchronized in time; and detecting a combination of the terminal i and the photographed target j which maximizes an objective function obtained as a weighted sum of the motion similarity and the gravitational acceleration similarity.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
DESCRIPTION OF EMBODIMENTS
[0035] Embodiments of the present invention will be described in conjunction with the accompanying drawings. The following embodiments are examples only and are not intended to limit the present invention. The elements denoted by the same reference characters refer to the same elements in the specification and the drawings.
[0036] [Measuring System]
[0037]
[0038] The wearable devices 12 and the fixed camera 11 have their own clocks and give timestamps to moving images and acceleration data according to the clocks. It is not guaranteed that the clock of each of the wearable devices 12 and the clock of the fixed camera 11 are synchronized with each other.
[0039] [Definitions]
[0040] The definitions of parameters which appear in the following description will be described.
[0041] N is the number of targets (number of people) to be photographed.
[0042] t is time.
[0043] An acceleration signal (vector) measured with a wearable device 12-i is represented as follows.
[Math A01]
a.sub.d.sup.(i)(t) (i=1,2, . . . , N) (A01)
[0044] The vector has three axis components for x, y, and z axes. The coordinate system depends on the orientation of the wearable device 12, which changes with time.
[0045] The acceleration signal (vector) of a photographed target (person) j in a video image is represented as follows.
[Math A02]
a.sub.c.sup.(j)(t) (j=1,2, . . . , N) (A02)
[0046] The vector has three axis components for x, y, and z axes. The coordinate system depends on the direction of the fixed camera 11.
[0047] A gravitational acceleration component (vector) measured with the wearable device 12-i is represented as follows.
[Math A03]
g.sub.d.sup.(i)(t) (i=1,2, . . . , N) (A03)
[0048] The coordinate system depends on the orientation of the wearable device 12, which changes with time.
[0049] The matching, estimated by the photographed target matching device 301, between the ID of a wearable device and the ID of a person in the camera video image is represented as follows.
[Math A04]
C*={(i.sub.n*,j.sub.n*)} (n=1,2, . . . , N) (A04)
where τ is the real timestamp lag between the fixed camera 11 and the wearable device 12, and τ* is the timestamp lag between the fixed camera 11 and the wearable device 12 as estimated by the photographed target matching device 301.
[0050] The track (vector) of a photographed target (person) j in a video image is represented as follows.
[Math A05]
P.sub.c.sup.(j)(t) (j=1,2, . . . , N) (A05)
[0051]
[0052] [Object]
[0053] The photographed target matching device 301 is directed to estimation of the following two things.
[0054] (1) Matching C*
[0055] (2) Lag amount τ*
[0056] [Assumption]
[0057] It is assumed that during measuring, all the photographed targets (persons) are visible in a video image.
[0058] It is assumed that the timestamps between the plurality of wearable devices 12 are synchronized in a known manner.
[0059] [Details]
[0060]
[0061] Data from Wearable Device 12
[0062] Each person to be photographed wears a wearable device 12, and the wearable device 12 measures acceleration associated with the motion of the person. The wearable device 12 also measures gravitational acceleration. Therefore, the acceleration data (vector) a.sub.d.sup.(i)(t) output by the wearable device 12 can be represented by the following expression.
[Math 1]
a.sub.d.sup.(i)(t)=a.sub.d:m.sup.(i)(t)+g.sub.d.sup.(i)(t) (1)
where a.sub.d:m.sup.(i)(t) is acceleration by the motion of a person (an acceleration component other than the gravitational acceleration component), and g.sub.d.sup.(i)(t) is the gravitational acceleration component.
[0063] Data from Fixed Camera 11
[0064] The fixed camera 11 captures a moving image of the N persons to be photographed. The moving image is input to the converting unit 15. The converting unit 15 detects a person from the moving image using an existing person detection algorithm. The converting unit 15 also obtains the position P.sub.c.sup.(j)(t) (see Expression A05) of the detected person in the video image. Then, the converting unit 15 calculates the acceleration vector a.sub.c.sup.(j)(t) in Expression A02 using the following expression.
[0065] Note that Δt is the frame interval of the moving image (the reciprocal of the frame rate).
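The conversion from the track P.sub.c.sup.(j)(t) to the acceleration a.sub.c.sup.(j)(t) can be illustrated with a short sketch. It assumes a central second difference over consecutive frames, which is one common choice and not necessarily the exact form of Expression 2. The function name `acceleration_from_track` and its parameters are hypothetical.

```python
from typing import List, Sequence


def acceleration_from_track(track: Sequence[Sequence[float]], dt: float) -> List[List[float]]:
    """Approximate acceleration from a position track sampled every dt seconds.

    Assumption (not taken from the source): a central second difference,
    a(t) ~ (P(t + dt) - 2 P(t) + P(t - dt)) / dt**2.
    One acceleration vector is returned per interior frame, so the first
    and last frames of the track produce no output.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    acc = []
    for k in range(1, len(track) - 1):
        prev, cur, nxt = track[k - 1], track[k], track[k + 1]
        acc.append([(n - 2.0 * c + p) / (dt * dt) for p, c, n in zip(prev, cur, nxt)])
    return acc
```

In practice the detected positions are noisy, so the track would usually be smoothed before differencing; that step is omitted here.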
[0066] In
[0067] Here, the acceleration data a.sub.d.sup.(i)(t) from the wearable device 12 includes a gravitational acceleration component, but the data a.sub.c.sup.(j)(t) from the fixed camera 11 converted by the converting unit 15 does not include a gravitational acceleration component. Therefore, the acceleration data a.sub.d.sup.(i)(t) and the acceleration data a.sub.c.sup.(j)(t) cannot be compared directly.
[0068] Therefore, the photographed target matching device 301 introduces the gravitational acceleration g.sub.c into the camera coordinate system. Since the fixed camera 11 is fixed, the gravitational acceleration g.sub.c is estimated as a constant vector. The estimation method will be described in the following.
[0069] The following expression is established between the acceleration data a.sub.d.sup.(i)(t) and the acceleration data a.sub.c.sup.(j)(t). In the expression, the subscripts i and j and the variable (t) are omitted.
[Math 3]
∥a.sub.c∥.sub.2=∥a.sub.d:m∥.sub.2,
a.sub.c.sup.Tg.sub.c=a.sub.d:m.sup.Tg.sub.d,
∥g.sub.c∥.sub.2=∥g.sub.d∥.sub.2=9.8 [m/s.sup.2] (3)
where ∥x∥.sub.2 is the L2 norm of the vector x, and x.sup.T is the transpose of the vector x.
[0070] The relation represented by the following expression is established from Expressions 1 and 3. Also in the following expression, the subscripts i and j and the variable (t) are omitted.
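As a hedged sketch of how Expressions 1 and 3 combine, using the symbols defined above and taking the magnitude of gravity as 9.8 m/s.sup.2, expanding the squared norm of a.sub.d gives a relation that is linear in g.sub.c. This is a derivation consistent with the surrounding text, not a reproduction of the published expression.

```latex
\begin{aligned}
\|a_d\|_2^2 &= \|a_{d:m} + g_d\|_2^2 \\
            &= \|a_{d:m}\|_2^2 + 2\,a_{d:m}^{T} g_d + \|g_d\|_2^2 \\
            &= \|a_c\|_2^2 + 2\,a_c^{T} g_c + \|g_c\|_2^2
             = \|a_c + g_c\|_2^2 , \\
\text{hence}\quad 2\,a_c^{T} g_c &= \|a_d\|_2^2 - \|a_c\|_2^2 - 9.8^2 .
\end{aligned}
```

Every quantity on the right-hand side of the last line is measured, so each time sample contributes one linear equation in the three components of g.sub.c.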
[0071] In Expression 4, the only unknown vector is g.sub.c. Therefore, the extracting unit 22 extracts M samples each from the acceleration data a.sub.d.sup.(i)(t) and the acceleration data a.sub.c.sup.(j)(t), and the estimating unit 23 formulates the following simultaneous equations to estimate g.sub.c. In the following expression, the subscripts i and j and the variable (t) are also omitted.
[0072] Note that g.sub.c in a general expression is as follows.
[Math 5a]
g.sub.c=argmin.sub.g.sub.
[0073] The M samples are acceleration data pieces at time t=t.sub.1, t=t.sub.2, . . . , and t=t.sub.M, as shown in the example in the following expression.
[Math 5b]
M samples of a.sub.d.sup.(i)(t)=[a.sub.d.sup.(i)(t=t.sub.1),a.sub.d.sup.(i)(t=t.sub.2), . . . , a.sub.d.sup.(i)(t=t.sub.M−1),a.sub.d.sup.(i)(t=t.sub.M)] (5b)
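The estimation of g.sub.c from M paired samples can be sketched as an ordinary least-squares problem. The sketch assumes that each sample contributes one linear equation of the form 2 a.sub.c.sup.T g.sub.c = ∥a.sub.d∥.sub.2.sup.2 − ∥a.sub.c∥.sub.2.sup.2 − 9.8.sup.2, which follows from Expressions 1 and 3 but is not quoted from the source. The names `estimate_gravity` and `solve3` are hypothetical.

```python
from typing import List, Sequence

G = 9.8  # assumed magnitude of gravitational acceleration [m/s^2]


def solve3(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine g_c (singular system)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def estimate_gravity(a_c: Sequence[Sequence[float]],
                     a_d: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares estimate of g_c in the camera coordinate system.

    a_c[k] and a_d[k] are the camera-derived and device-measured acceleration
    vectors at the same (already time-shifted) sample k. Only the norm of
    a_d[k] is used, so the device orientation does not matter.
    """
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    for ac, ad in zip(a_c, a_d):
        row = [2.0 * x for x in ac]
        rhs = sum(x * x for x in ad) - sum(x * x for x in ac) - G * G
        for r in range(3):
            Atb[r] += row[r] * rhs
            for c in range(3):
                AtA[r][c] += row[r] * row[c]
    # Normal equations: (A^T A) g = A^T b
    return solve3(AtA, Atb)
```

At least three samples with linearly independent a.sub.c are needed; otherwise the system is singular and the sketch raises an error.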
[0074] Here, assuming that the timestamps between the wearable devices 12 are synchronized, Expression 5 can be extended as follows. Also in the following expression, the subscripts i and j and the variable (t) are omitted.
[0075] Note that g′.sub.c in a general expression is as follows.
[Math 6a]
g′.sub.c=argmin.sub.g′.sub.
[0076] When the timestamps between the wearable devices 12 are synchronized, the effect of large noise, if any, in the signal from any one of the wearable devices 12 can be reduced.
[0077] In the following description, the gravitational acceleration component estimated by Expression 5 which depends on i, j and τ is expressed by
[Math A06]
g.sub.c.sup.(i,j)(τ) (A06)
[0078] The gravitational acceleration component estimated by Expression 6, which depends only on τ, is expressed by
[Math A07]
g′.sub.c(τ) (A07)
[0079] The gravitational acceleration g.sub.c according to Expression A06 or A07 estimated by the estimating unit 23 is added to the acceleration vector a.sub.c.sup.(j)(t) according to Expression 2, so that the result can be directly compared with the acceleration vector a.sub.d.sup.(i)(t) according to Expression 1.
[0080] The direct comparison is made at the detecting unit 24. Here, a specific example of the direct comparison will be described. First, in order to compensate for the effect of the difference between the coordinate systems of the wearable device 12 and the fixed camera 11, the L2 norms of the respective acceleration vectors are used as the feature quantities for comparison as shown in the following expressions.
[Math A08]
s.sub.d.sup.i(t)=∥a.sub.d.sup.(i)(t)∥.sub.2 (A08)
[Math A09]
s.sub.c.sup.j(t+τ)=∥a.sub.c.sup.(j)(t+τ)+g′.sub.c(τ)∥.sub.2 (A09)
[0081] Expression A09 represents the signal obtained by shifting s.sub.c.sup.j(t) by time τ.
[0082] The detecting unit 24 calculates the correlation between these two acceleration values by the following expression, and the combination of i, j, and τ which maximizes the correlation value is obtained as the estimation result.
[0083] The correlation in Expression 7 will be referred to as “motion similarity”.
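The motion similarity between the feature quantities of Expressions A08 and A09 can be sketched as follows. The sketch assumes a Pearson correlation coefficient and a lag τ expressed as an integer number of samples; both are illustrative assumptions and need not match Expression 7. The names `motion_similarity` and `pearson` are hypothetical.

```python
import math
from typing import List, Sequence


def norm(v: Sequence[float]) -> float:
    """L2 norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either signal is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length signals with at least 2 samples")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def motion_similarity(a_d: Sequence[Sequence[float]],
                      a_c: Sequence[Sequence[float]],
                      g_c: Sequence[float],
                      shift: int) -> float:
    """Correlate s_d(t) = ||a_d(t)|| with s_c(t + tau) = ||a_c(t + tau) + g_c||.

    shift is the lag tau in samples (assumption: both signals share one
    sampling rate). Samples whose shifted index falls outside a_c are skipped.
    """
    s_d: List[float] = []
    s_c: List[float] = []
    for k in range(len(a_d)):
        m = k + shift
        if 0 <= m < len(a_c):
            s_d.append(norm(a_d[k]))
            s_c.append(norm([x + g for x, g in zip(a_c[m], g_c)]))
    return pearson(s_d, s_c)
```

Because only norms are compared, the result does not depend on the orientation of the wearable device, which is the reason the text gives for using the L2 norm as the feature quantity.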
[0084] Here, the detecting unit 24 preferably introduces a parameter called Gravity Direction Consistency (GDC). The gravitational acceleration component in Expression A06 has different values depending on combinations of i, j, and τ, and is correct when the combination of i, j, and τ is correct. Meanwhile, the gravitational acceleration component in Expression A07 depends only on τ and has a correct value when τ is correct. More specifically, for the correct combination of i, j, and τ, the gravitational acceleration component with Expression A06 is equal to the gravitational acceleration component with Expression A07. GDC describes this constraint in the following expression representing “gravitational acceleration similarity”.
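One way to score how well the pair-specific estimate of Expression A06 agrees with the shared estimate of Expression A07 is the cosine of the angle between the two vectors. This is an assumed form chosen for illustration, since the text only requires that the two estimates coincide for the correct combination. The function name `gravity_direction_consistency` is hypothetical.

```python
import math
from typing import Sequence


def gravity_direction_consistency(g_pair: Sequence[float],
                                  g_shared: Sequence[float]) -> float:
    """Cosine similarity between g_c^(i,j)(tau) and g'_c(tau).

    Returns a value in [-1, 1]; 1 means both estimates point the same way.
    A zero-length estimate carries no direction, so 0.0 is returned for it.
    """
    dot = sum(a * b for a, b in zip(g_pair, g_shared))
    n_pair = math.sqrt(sum(a * a for a in g_pair))
    n_shared = math.sqrt(sum(b * b for b in g_shared))
    if n_pair == 0.0 or n_shared == 0.0:
        return 0.0
    return dot / (n_pair * n_shared)
```

A cosine measure ignores differences in magnitude; a distance-based score would penalize those as well and may be preferable when the estimated magnitude is informative.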
[0085] The detecting unit 24 generates a weighted sum of motion similarity in Expression 7 and gravitational acceleration similarity in Expression 8.
where λ is a weight coefficient (0≤λ≤1).
[0086] The detecting unit 24 further produces the following relation for each τ.
[Math A10]
C(τ)={c.sub.1=(ĩ.sub.1,j̃.sub.1), . . . , c.sub.N=(ĩ.sub.N,j̃.sub.N)} (A10)
[0087] Here, c.sub.1 to c.sub.N in C(τ) are the estimated combinations of the number (ID) of the terminal i and the photographed target j. The mark “˜” above i and j in Expression A10 indicates that the ID is an estimated one.
[0088] Finally, the detecting unit 24 finds and outputs the set of i, j, and τ which maximizes the objective function as shown in the following expression.
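The final search over i, j, and τ can be sketched as follows. The sketch assumes that the weighted sum takes the form (1 − λ) × motion similarity + λ × gravitational acceleration similarity, that the score of a matching C(τ) is the sum of its per-pair scores, and that N is small enough for an exhaustive search over permutations. None of these choices is taken from the source, the names `weighted_objective` and `best_matching` are hypothetical, and IDs are zero-based in the code.

```python
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Tuple


def weighted_objective(motion: float, gdc: float, lam: float) -> float:
    """Assumed form of the weighted sum, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * motion + lam * gdc


def best_matching(
    objective: Dict[int, Sequence[Sequence[float]]]
) -> Tuple[int, List[Tuple[int, int]], float]:
    """Return (tau, matching, score) maximizing the summed per-pair objective.

    objective[tau][i][j] is the per-pair score of terminal i and target j
    at lag tau. For each tau every one-to-one assignment is tried, which
    costs N! evaluations and is only practical for small N.
    """
    best: Optional[Tuple[int, List[Tuple[int, int]], float]] = None
    for tau, table in objective.items():
        n = len(table)
        for perm in permutations(range(n)):
            total = sum(table[i][perm[i]] for i in range(n))
            if best is None or total > best[2]:
                best = (tau, [(i, perm[i]) for i in range(n)], total)
    if best is None:
        raise ValueError("objective is empty")
    return best
```

For larger N, the inner loop could be replaced by an assignment solver such as the Hungarian algorithm, which finds the same maximum in polynomial time.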
[0089]
[0090] Expressions 7 and 8 for calculating similarity are examples, and other expressions may be used to obtain similarity.
[0091]
[0092] In step S06, step S06a and step S06b are performed. When performing steps S06a and S06b, the gravitational acceleration vector g.sub.c and the gravitational acceleration vector g′.sub.c are estimated for each combination of i, j, and τ from both Expressions 5 and 6 in step S05.
[0093] Then, the detecting unit 24 calculates motion similarity in Expression 7 and gravitational acceleration similarity in Expression 8 (step S06a). Then, the detecting unit 24 adds the motion similarity and the gravitational acceleration similarity as in Expression 9 and detects the combination of i, j, and τ which maximizes the result (step S06b).
Other Embodiments
[0094] The photographed target matching device 301 can also be implemented by a computer and a program, and the program can be recorded in a recording medium or provided over a network.
[0095]
[0096] The network 135 is a data communication network. The network 135 may be a private or public network and may include some or all of (a) a personal area network covering, for example, a room, (b) a local area network covering, for example, a building, (c) a campus area network covering, for example, a campus, (d) a metropolitan area network covering, for example, a city, (e) a wide area network covering, for example, an area connected across city, regional or national boundaries, and (f) the Internet. The communication is carried out by electronic and optical signals over the network 135.
[0097] The computer 105 includes a processor 110 and a memory 115 connected to the processor 110. Although the computer 105 is described herein as a standalone device, the computer arrangement is not limited in this manner, and the computer may be connected to any other device (which is not shown) in the distributed processing system.
[0098] The processor 110 is an electronic device including a logic circuit which responds to and executes instructions.
[0099] The memory 115 is a tangible computer-readable storage medium on which a computer program is encoded. In this regard, the memory 115 stores data and instructions, or program codes which can be read and executed by the processor 110 to control the operation of the processor 110. The memory 115 can be realized by a random access memory (RAM), a hard drive, or a read-only memory (ROM), or a combination thereof. One of the elements of the memory 115 is a program module 120.
[0100] The program module 120 includes an instruction for controlling the processor 110 to perform the process described herein. Herein, the kinds of operation are described as being executed by the computer 105 or a method or process or sub-processes thereof, but these kinds of operation are actually executed by the processor 110.
[0101] The term “module” is used herein to refer to functional operation which may be embodied either as a stand-alone component or an integrated configuration including a plurality of subordinate components. Therefore, the program module 120 can be realized as a single module or as a plurality of modules which operate in cooperation with one another. Although the program module 120 is described herein as being installed in the memory 115 and thus realized as software, the module can be realized as hardware (for example as electronic circuitry), firmware, software, or any combination thereof.
[0102] The program module 120 is illustrated as already being loaded in the memory 115 but may be configured to be located on the storage device 140 for later loading into the memory 115. The storage device 140 is a tangible, computer-readable storage medium which stores the program module 120. Examples of the storage device 140 include a compact disk, a magnetic tape, a read-only memory, an optical storage medium, a memory unit including a hard drive or multiple parallel hard drives, and a Universal Serial Bus (USB) flash drive. Alternatively, the storage device 140 may be a random access memory or any other kind of electronic storage device located in a remote storage system (which is not shown) and connected to the computer 105 over the network 135.
[0103] The system 100 further includes a data source 150A and a data source 150B collectively referred to herein as data sources 150 and communicatively connected to the network 135. In practice, the data source 150 may include any number of data sources, i.e., one or more data sources. The data source 150 can include non-systemized data and can include social media.
[0104] The system 100 further includes a user device 130 which is operated by the user 101 and connected to the computer 105 over the network 135. The user device 130 can be an input device such as a keyboard or a voice recognition subsystem for allowing the user 101 to communicate information and command selection to the processor 110. The user device 130 further includes an output device such as a display device, a printer, or a speech synthesizer. A cursor control unit such as a mouse device, a trackball, or a touch-sensitive screen allows the user 101 to manipulate the cursor on the display device to communicate further information and command selection to the processor 110.
[0105] The processor 110 outputs the results 122 of the execution of the program module 120 to the user device 130. Alternatively, the processor 110 can bring the output to a storage device 125 such as a database and a memory or to a remote device (which is not shown) over the network 135.
[0106] For example, the program which performs the flowchart in
[0107] It should be construed that the terms “comprises,” “comprising,” “includes,” or “including” specify the presence of stated features, integers, steps, or elements, but do not preclude the presence of one or more other features, integers, steps, or elements or groups thereof. The indefinite articles “a” and “an” do not preclude the presence of an embodiment including a plurality of the referenced items.
[0108] Note that the present invention is not limited by the above described embodiments but can be carried out in various forms without departing from the gist and scope of the invention. In short, the present invention is not limited by the above described embodiments as it is and can be embodied by the modification of components without departing from the gist and scope of the invention in the application.
[0109] Also, various inventions can be formed by appropriate combinations of the plurality of components disclosed in the above embodiments. For example, some components may be deleted from all components shown in the embodiments. Components across different embodiments may be combined as appropriate.
INDUSTRIAL APPLICABILITY
[0110] The photographed target matching method, the photographed target matching device, and the program according to the present invention may be applied, for example, to integrating (person matching and time synchronization) video image data and sensor data with wearable terminals to produce a data set for analysis, or estimating and analyzing states such as actions and emotions of a person in a video image to create a multimodal data set.
REFERENCE SIGNS LIST
[0111] 11 Fixed camera
[0112] 12 Wearable device carried by photographed target (person)
[0113] 15 Converting unit
[0114] 21 Input unit
[0115] 22 Extracting unit
[0116] 23 Estimating unit
[0117] 24 Detecting unit
[0118] 100 System
[0119] 101 User
[0120] 105 Computer
[0121] 110 Processor
[0122] 115 Memory
[0123] 120 Program Module
[0124] 122 Result
[0125] 125 Storage Device
[0126] 130 User Device
[0127] 135 Network
[0128] 140 Storage Device
[0129] 150 Data Source