Information processing apparatus and information processing method
12169965 · 2024-12-17
Assignee
Inventors
CPC classification: H04N1/00331 · G06F3/04842 · G06V30/20 · G06F3/0488 · G06F2203/04101 · G06V30/1444 · G06V40/28 · G06F3/0425 · G06F3/0346
International classification: G06V10/94 · G06F3/0488 · G06V30/20
Abstract
Provided is an information processing apparatus capable of improving detection accuracy of a position pointed by a target object. An acquisition unit acquires a distance image indicating a distance to each object present within a predetermined range. Subsequently, a vector calculation unit calculates a vector extending from the target object present within the predetermined range in a direction pointed by the target object on the basis of the acquired distance image. Subsequently, an intersection calculation unit calculates a position of an intersection of a predetermined surface present within the predetermined range and the calculated vector on the basis of the acquired distance image. Subsequently, a processing execution unit executes processing corresponding to the calculated position of the intersection.
Claims
1. An information processing apparatus, comprising: an acquisition unit configured to acquire a distance image that indicates a distance to each object within a range of a surface; a vector calculation unit configured to calculate, for each direction pointed by a target object, a first vector that extends from the target object within the range of the surface based on the acquired distance image; an intersection calculation unit configured to calculate a position of each intersection of a plurality of intersections of the surface and the first vector for the each direction pointed by the target object; and a processing execution unit configured to: calculate a first position of a specific representative point of a plurality of representative points from a first intersection of the plurality of intersections calculated before a specific time to a second intersection of the plurality of intersections calculated immediately before, wherein the first intersection is centered on the second intersection.
2. The information processing apparatus according to claim 1, wherein the target object is a finger.
3. The information processing apparatus according to claim 2, wherein the plurality of representative points includes a first representative point of a hand, and the vector calculation unit includes: a representative point position calculation unit configured to calculate a position of the first representative point of the hand based on the acquired distance image, and a calculation execution unit configured to calculate the first vector based on the calculated position of the first representative point.
4. The information processing apparatus according to claim 3, wherein the representative point position calculation unit is further configured to estimate the position of the first representative point of the hand based on the acquired distance image and a learning model learned by teacher data, and the teacher data includes a specific distance image of the hand and a specific position of the specific representative point in the acquired distance image.
5. The information processing apparatus according to claim 4, wherein the plurality of representative points includes a second representative point on a wrist side and a third representative point on a fingertip side, and the vector calculation unit is further configured to calculate a second vector toward the surface through a position of the second representative point and a position of the third representative point.
6. The information processing apparatus according to claim 4, further comprising a gesture determination unit configured to determine a pointing gesture with a finger based on the position of the first representative point, wherein, in a case where the pointing gesture is performed, the vector calculation unit is further configured to calculate the first vector based on the position of the first representative point.
7. The information processing apparatus according to claim 6, wherein the gesture determination unit is further configured to determine the pointing gesture based on the position of the first representative point and a learning model learned by teacher data, and the teacher data includes the specific position of the specific representative point of the hand and information that indicates whether the position of the first representative point corresponds to a position of the specific representative point at a time of the pointing gesture.
8. The information processing apparatus according to claim 1, further comprising a captured image acquisition unit configured to acquire a captured image in the range in which the distance image is generated, wherein the processing execution unit is further configured to: calculate, after a lapse of the specific time from the calculation of the first position, a second position that is a position of a representative point of the plurality of intersections from the first intersection to the second intersection and execute a specific process on a region in the captured image corresponding to the calculated first position and the second position.
9. The information processing apparatus according to claim 8, further comprising a region setting unit configured to set a plurality of rectangular regions, wherein each of the plurality of rectangular regions includes at least a part of a character in the captured image, the plurality of rectangular regions comprises a first rectangular region and a second rectangular region, and the processing execution unit is further configured to: calculate a third position that is a position in the captured image corresponding to the first position, correct the calculated third position to a center position of the first rectangular region to which the third position belongs, calculate a fourth position that is a position in the captured image corresponding to the second position, correct the calculated fourth position to a center position of the second rectangular region to which the fourth position belongs, and perform the specific process on the region in the captured image specified by the corrected third position and fourth position.
10. The information processing apparatus according to claim 8, further comprising: a region setting unit configured to set a plurality of rectangular regions, wherein each of the plurality of rectangular regions includes at least a part of a character in the captured image; a distribution calculation unit configured to calculate, for each position in a specific direction perpendicular to a writing direction of the character, a distribution of a sum of a number of pixels that belong to a specific rectangular region of the plurality of rectangular regions among pixels of the captured image, wherein the pixels of the captured image are in the writing direction; and an estimation line setting unit configured to set an estimation line, which is a straight line that extends in the writing direction, at a distribution position where the distribution reaches a peak value, wherein the peak value corresponds to a specific threshold or more among positions in the specific direction perpendicular to the writing direction, the processing execution unit is further configured to: calculate a third position that is a position in the captured image corresponding to the first position, correct the calculated third position to a position on the estimation line closest to the third position, calculate a fourth position that is a position in the captured image corresponding to the second position, correct the calculated fourth position to a position on the estimation line closest to the fourth position, and perform the specific process on the region in the captured image specified by the corrected third position and fourth position.
11. The information processing apparatus according to claim 9, wherein the specific process includes optical character recognition (OCR) processing to recognize the character and search processing to search for information regarding the character recognized by the OCR processing.
12. The information processing apparatus according to claim 1, wherein the processing execution unit is further configured to control a projection unit such that a specific image is projected at the position of the each intersection of the plurality of intersections.
13. The information processing apparatus according to claim 1, wherein the processing execution unit is further configured to display, on a display screen, a processing result.
14. An information processing method, comprising: acquiring a distance image that indicates a distance to each object within a range of a surface; calculating, for each direction pointed by a target object, a vector that extends from the target object within the range of the surface based on the acquired distance image; calculating a position of each intersection of a plurality of intersections of the surface and the calculated vector for the each direction pointed by the target object; and calculating a first position of a specific representative point of a plurality of representative points from a first intersection of the plurality of intersections calculated before a specific time to a second intersection of the plurality of intersections calculated immediately before, wherein the first intersection is centered on the second intersection.
Description
BRIEF DESCRIPTION OF DRAWINGS
MODES FOR CARRYING OUT THE INVENTION
(16) Hereinafter, embodiments of an information processing apparatus and an information processing method of the present disclosure will be described with reference to the drawings.
(17) However, the embodiments described below are merely examples, and are not intended to exclude various modifications and applications of techniques that are not explicitly described below. The present disclosure can be variously modified and implemented without departing from a gist thereof. For example, the embodiments may be implemented in combination.
(18) Furthermore, in the following drawings, the same or similar portions are denoted by the same or similar reference numerals. Furthermore, the drawings are schematic, and do not necessarily coincide with actual dimensions, ratios, and the like. The drawings may include portions having different dimensional relationships and ratios.
(19) Furthermore, effects described in the present specification are merely examples and are not limited, and there may be other effects.
(20) The embodiments of the present disclosure will be described in the following order.
1. First Embodiment: Information Processing Apparatus and Information Processing Method
1-1 Overall Configuration of Information Processing Apparatus
1-2 Contents of Search Processing
1-3 Contents of Display Processing
1-4 Modifications
2. Second Embodiment: Information Processing Apparatus and Information Processing Method
1. First Embodiment
(21) [1-1 Overall Configuration of Information Processing Apparatus]
(22)
(23) As illustrated in
(24) The distance measurement unit 7 is a device that sequentially outputs a distance image indicating a distance to each object present within a predetermined range. The distance image is an image indicating a distance (depth value) to an object for each pixel, and is also called a depth image.
(25) Note that although
(26) The imaging unit 8 is a device that sequentially generates a captured image within a predetermined range in which a distance image is generated.
(27) The projection unit 9 is a device that projects various calculation results and the like by the device main body 6 onto the object 3 placed in the reading region 2.
(28) The display unit 10 is a device that displays various calculation results and the like by the device main body 6.
(29) The device main body 6 includes hardware resources such as a storage device 11, a processor 12, a random access memory (RAM) 13, and the like. The storage device 11, the processor 12, and the RAM 13 are connected to each other by a system bus 14. Moreover, the distance measurement unit 7, the imaging unit 8, the projection unit 9, the display unit 10, and a drive 15 are connected to the system bus 14.
(30) The storage device 11 is a secondary storage device such as a hard disk drive (HDD) or a solid state drive (SSD). The storage device 11 stores a program of the information processing apparatus 1 executable by the processor 12, as well as various data necessary for executing the program.
(31) The processor 12 is one of various processors such as a central processing unit (CPU), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and the like. The processor 12 loads a program and the like stored in the storage device 11 into the RAM 13, executes the program, performs various calculations and logical determinations, and controls each component connected to the system bus 14. For example, the processor 12 implements a distance image processing unit 12a, an image picture processing unit 12b, a finger-hand posture estimation unit 12c, an object selection unit 12d, a display information generation unit 12e, and a layout detection unit 12f as illustrated in
(32) Then, on the basis of outputs from the distance measurement unit 7 and the imaging unit 8, the processor 12 uses the acquisition unit 16, the vector calculation unit 18 (the representative point position calculation unit 19 and the calculation execution unit 20), the gesture determination unit 21, the intersection calculation unit 22, the processing execution unit 23, and the region setting unit 24 to execute search processing: in a case where the user points at the upper surface S1 of the object 3 with the finger 4 (target object 4) in a non-contact manner, the pointed position is calculated, processing (for example, information search) corresponding to the calculated position is executed, and the projection unit 9 and the display unit 10 are caused to display an image. During the execution of the search processing, the user performs an operation of sequentially pointing the finger 4 (target object 4) at positions on both ends of a region (hereinafter, also referred to as selection region) of the upper surface S1 of the object 3 in which a character and the like for which information search is to be performed are present.
(33) Note that the program executed by the processor 12 (computer) is, for example, provided by being recorded in a removable medium 15a, which is a package medium including, for example, a magnetic disk (including a flexible disk), an optical disk (a compact disc-read only memory (CD-ROM), a digital versatile disc (DVD), and the like), a magneto-optical disk, a semiconductor memory, and the like. Alternatively, for example, the program is provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting. Then, the program can be installed on the storage device 11 via the system bus 14 by mounting the removable medium 15a on the drive 15. Furthermore, the program can be received by a communication unit (not illustrated) via a wired or wireless transmission medium and installed in the storage device 11. Moreover, the program can be installed in the storage device 11 in advance.
(34) [1-2 Contents of Search Processing]
(35) Next, search processing executed by the acquisition unit 16, the vector calculation unit 18 (the representative point position calculation unit 19, the calculation execution unit 20), the gesture determination unit 21, the intersection calculation unit 22, the processing execution unit 23, and the region setting unit 24 will be described. The search processing is executed when the object 3 is placed in the reading region 2.
(36) As illustrated in
(37) Subsequently, the processing proceeds to step S102, and the representative point position calculation unit 19 calculates a position of a predetermined representative point (hereinafter, also referred to as skeleton point 26) of a hand 25 on the basis of the distance image acquired in step S101, as illustrated in
(38) Subsequently, the processing proceeds to step S103, and the gesture determination unit 21 determines whether the user performs a pointing gesture with the finger 4 on the basis of the position of the skeleton point 26 calculated in step S102. As the pointing gesture, for example, a hand gesture in which an index finger is extended and other fingers are bent can be adopted. As a method of determining the pointing gesture, for example, whether the user performs the pointing gesture is determined on the basis of the position of the skeleton point 26 calculated in step S102 using a learning model learned by teacher data including the position of the skeleton point 26 of the hand 25 and information indicating whether the position of the skeleton point 26 is a position of the skeleton point 26 at the time of the pointing gesture. According to the method using the learning model, whether the user performs the pointing gesture can be determined by inputting the estimated position of the skeleton point 26 of the hand 25, and the determination can be easily performed. Then, in a case where the gesture determination unit 21 determines that the pointing gesture is performed (Yes), the processing proceeds to step S104. On the other hand, in a case where it is determined that the pointing gesture is not performed (No), the processing returns to step S101.
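The determination in step S103 can be sketched as a small classifier learned from teacher data. The sketch below substitutes a nearest-centroid classifier over flattened skeleton-point coordinates for whatever learning model the apparatus actually uses; the 21-point hand skeleton and all function names are assumptions for illustration only:

```python
import numpy as np

N_POINTS = 21  # assumed number of hand skeleton points (not specified in the source)

def fit_centroids(poses, labels):
    """'Learn' one centroid per class from teacher data: poses is an
    (n_samples, N_POINTS, 3) array of skeleton coordinates, labels is
    an array of class ids (1 = pointing gesture, 0 = other)."""
    X = np.asarray(poses, dtype=float).reshape(len(poses), -1)
    y = np.asarray(labels)
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def is_pointing_gesture(centroids, skeleton):
    """Classify an estimated skeleton pose by its nearest class centroid;
    returns True if the pose falls in the pointing-gesture class."""
    x = np.asarray(skeleton, dtype=float).reshape(-1)
    best = min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
    return best == 1
```

A production system would replace the centroid model with the learned model the text describes, but the interface (skeleton points in, yes/no out) is the same.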
(39) Subsequently, the processing proceeds to step S104, and the calculation execution unit 20 calculates a vector 27 extending from the finger 4 in a direction pointed by the finger 4 on the basis of the positions of the skeleton points 26 calculated in step S102, as illustrated in
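Assuming the skeleton points 26 include a wrist-side point and a fingertip-side point (as in claim 5), the vector 27 can be sketched as the normalized direction through those two 3-D points; the function and parameter names are illustrative:

```python
import numpy as np

def pointing_vector(wrist_side, fingertip_side):
    """Unit vector from the wrist-side skeleton point through the
    fingertip-side skeleton point, i.e. the direction pointed by the finger."""
    p0 = np.asarray(wrist_side, dtype=float)
    p1 = np.asarray(fingertip_side, dtype=float)
    d = p1 - p0
    n = np.linalg.norm(d)
    if n == 0.0:
        raise ValueError("skeleton points coincide; direction is undefined")
    return d / n
```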
(40) Subsequently, the intersection calculation unit 22 calculates, on the basis of the distance image acquired in step S101, a position of an intersection 28 of the vector 27 calculated by the vector calculation unit 18 and the upper surface S1 of the object 3 (an object existing within the predetermined range where the distance image is generated;
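If the upper surface S1 is approximated by a plane estimated from the distance image, computing the intersection 28 reduces to a standard ray-plane intersection. A minimal sketch, assuming the plane is given by a point `q` on it and a normal `n` (both assumed inputs, since the source derives the surface from the distance image):

```python
import numpy as np

def ray_plane_intersection(origin, direction, q, n):
    """Intersect the ray origin + t * direction (t >= 0) with the plane
    through point q with normal n. Returns the 3-D intersection point,
    or None if the ray is parallel to or points away from the plane."""
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    q = np.asarray(q, dtype=float)
    n = np.asarray(n, dtype=float)
    denom = d.dot(n)
    if abs(denom) < 1e-9:
        return None          # ray parallel to the plane
    t = (q - o).dot(n) / denom
    if t < 0:
        return None          # plane lies behind the finger
    return o + t * d
```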
(41) Subsequently, the processing proceeds to step S105, and the processing execution unit 23 controls the projection unit 9 so that the predetermined image 29 is projected at the position of the intersection 28 calculated in step S104. In other words, it can be said that the processing execution unit 23 executes processing according to the position of the intersection 28. For example, a circle can be used as the predetermined image 29. Therefore, the position pointed by the finger 4 can be fed back to the user, and the user can more reliably point to a desired position of the object 3 with the finger 4.
(42) Subsequently, the processing proceeds to step S106, and the processing execution unit 23 determines whether the user has performed command input. For example, as illustrated in
(43) On the other hand, in a case where the processing execution unit 23 determines that all the intersections 28, from the intersection 28 old to the intersection 28 new, are located within the predetermined region 30, it is determined whether the information processing apparatus 1 is in the command standby state Ph1. Then, in a case where it is determined to be in the command standby state Ph1, it is determined that a start command has been performed as the command input (start command detected), and the processing proceeds to step S107. On the other hand, in a case where it is determined not to be in the command standby state Ph1, it is determined that an end command has been performed as the command input (end command detected), and the processing proceeds to step S108. Therefore, when the finger 4 continues to point at the same position, the processing exits the repetition of steps S101 to S106 and proceeds to step S107 or S108.
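The command-input check above (all intersections 28 of the last second lying within a region centered on the oldest one, then averaged into a representative point) can be sketched as a dwell test over a sliding time window. The one-second window matches the text; the radius of the predetermined region 30 is an assumed value, and the class name is illustrative:

```python
from collections import deque
import numpy as np

class DwellDetector:
    """Keep the intersections 28 of the last `window_s` seconds and, once
    the window is full, report a command when every intersection stays
    within `radius` of the oldest one; the returned representative point
    is the average of all intersections in the window."""

    def __init__(self, window_s=1.0, radius=10.0):
        self.window_s = window_s
        self.radius = radius
        self.samples = deque()  # (timestamp, xy position) pairs

    def update(self, t, xy):
        self.samples.append((t, np.asarray(xy, dtype=float)))
        # Discard intersections older than the window.
        while self.samples and t - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        oldest_t, oldest_p = self.samples[0]
        if t - oldest_t < self.window_s:
            return None  # less than one second of samples so far
        if all(np.linalg.norm(p - oldest_p) <= self.radius
               for _, p in self.samples):
            return np.mean([p for _, p in self.samples], axis=0)
        return None
```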
(44) In step S107, the processing execution unit 23 calculates a position of one end of the selection region (selection start position), and then returns to step S101. For example, a position of a representative point of the plurality of intersections 28 calculated by repeating the flow of steps S101 to S106 a plurality of times (hereinafter, also referred to as first position) on the upper surface S1 of the object 3 (document) is calculated. Specifically, the first position may be calculated on the basis of all the intersections 28 from the intersection 28 old to the intersection 28 new among the plurality of calculated intersections 28. As a method of calculating the first position, for example, a method of calculating an average of coordinate values of all the intersections 28 from the intersection 28 old to the intersection 28 new can be adopted. Therefore, the user can set the first position indicating one end of the selection region by continuously pointing at the upper surface S1 of the object 3 (document) with the finger 4 in a non-contact manner for a predetermined time (one second) while performing the pointing gesture. Subsequently, the processing execution unit 23 controls the display unit 10 so that an image indicating one end of the selection region is displayed in superposition with the captured image displayed in step S101 at a position in the captured image corresponding to the first position. Furthermore, in step S107, as illustrated in
(45) On the other hand, in step S108, the processing execution unit 23 calculates the position of the other end of the selection region (selection end position). For example, a position of a representative point of the plurality of intersections 28 calculated after the state transitions to the selection start state Ph2 (hereinafter, also referred to as second position) on the upper surface S1 of the object 3 (document) is calculated. Specifically, the second position may be calculated on the basis of all the intersections 28 from the intersection 28 old to the intersection 28 new among the plurality of calculated intersections 28. As a method of calculating the second position, for example, a method of calculating an average of coordinate values of all the intersections 28 from the intersection 28 old to the intersection 28 new can be adopted. Therefore, the user can set the second position indicating the other end of the selection region by performing the pointing gesture again after setting the first position, and continuously pointing at the upper surface S1 of the object 3 (document) with the finger 4 in a non-contact manner for a predetermined time (one second) while performing the pointing gesture. Subsequently, the processing execution unit 23 controls the display unit 10 so that an image indicating the other end of the selection region is displayed in superposition with the captured image displayed in step S101 at a position in the captured image corresponding to the second position. Furthermore, in step S108, as illustrated in
(46) Subsequently, the processing proceeds to step S109 where the region setting unit 24 and the processing execution unit 23 specify a region in the captured image corresponding to the first position calculated in step S107 and the second position calculated in step S108, execute predetermined processing (for example, OCR processing) on the specified region, and execute display processing of displaying a processing result on the display unit 10. Then, the processing returns to step S101.
(47) [1-3 Contents of Display Processing]
(48) Next, display processing executed by the region setting unit 24 and the processing execution unit 23 will be described.
(49) As illustrated in
(50) Subsequently, the processing proceeds to step S202, and the processing execution unit 23 calculates a position in the captured image corresponding to the first position calculated in step S107 (hereinafter, also referred to as third position 32). For example, after a pixel in the distance image corresponding to the first position is calculated, calibration is performed by edge detection of the object 3 (document) in the distance image and the captured image, and the correspondence between each pixel of the object 3 (document) in the distance image and each pixel of the object 3 (document) in the captured image is analyzed. The pixel in the captured image corresponding to the calculated pixel in the distance image is then obtained on the basis of the analysis result, which gives the third position 32. Subsequently, the processing execution unit 23 corrects the calculated third position 32 to the center position of the rectangular region 31 to which the third position 32 belongs. Hereinafter, the corrected third position 32 is also referred to as corrected third position 33. Subsequently, the processing execution unit 23 controls the display unit 10 so that an image indicating the corrected third position 33 is displayed in superposition with the captured image displayed in step S101.
(51) Subsequently, the processing execution unit 23 calculates a position in the captured image corresponding to the second position calculated in step S108 (hereinafter, also referred to as fourth position 34). For example, similarly to the method for calculating the third position 32, the pixel in the captured image corresponding to the pixel in the distance image at the second position is obtained on the basis of the calibration and correspondence analysis, which gives the fourth position 34. Subsequently, the processing execution unit 23 corrects the calculated fourth position 34 to the center position of the rectangular region 31 to which the fourth position 34 belongs. Therefore, for example, in a case where a space between characters or a space between lines is erroneously designated, the third position 32 and the fourth position 34 can be corrected into rectangular regions 31 each including at least a part of a character. Hereinafter, the corrected fourth position 34 is also referred to as corrected fourth position 35. Subsequently, the processing execution unit 23 controls the display unit 10 so that an image indicating the corrected fourth position 35 is displayed in superposition with the captured image displayed in step S101.
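The correction in step S202 can be sketched as snapping a designated point to the center of the rectangular region 31 that contains it. Rectangles are assumed to be (x, y, width, height) tuples, and the fall-through behavior for a point outside every region is an assumption, not specified in the source:

```python
def snap_to_region_center(point, rects):
    """Correct a designated (x, y) point to the center of the rectangular
    region (x, y, w, h) that contains it. If the point falls outside every
    region, return it unchanged (assumed fallback)."""
    px, py = point
    for (x, y, w, h) in rects:
        if x <= px < x + w and y <= py < y + h:
            return (x + w / 2.0, y + h / 2.0)
    return (px, py)
```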
(52) Subsequently, the processing proceeds to step S203, and as illustrated in
(53) As a method of searching for the information regarding the character included in the region 36, for example, a method of executing optical character recognition (OCR) processing that recognizes the character included in the specified region 36 and search processing that searches for information regarding the character recognized in the OCR processing can be adopted. Furthermore, for example, a web page including the character and a meaning of the character can be adopted as the information regarding the character. Subsequently, as illustrated in
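The region extraction and OCR-plus-search of step S203 can be sketched as follows. The OCR engine and search backend are passed in as callables, since the source does not name specific implementations (a real system might plug in Tesseract as `ocr_fn`). The sketch crops only the axis-aligned rectangle spanned by the two corrected positions, leaving aside the multi-line case; all names are illustrative:

```python
import numpy as np

def extract_selection(image, start, end):
    """Crop the axis-aligned region of `image` spanned by the corrected
    start and end positions, given as (column, row) pairs; both corner
    pixels are included in the crop."""
    (x0, y0), (x1, y1) = start, end
    left, right = sorted((int(x0), int(x1)))
    top, bottom = sorted((int(y0), int(y1)))
    return image[top:bottom + 1, left:right + 1]

def search_selection(image, start, end, ocr_fn, search_fn):
    """Recognize the characters in the selected region with `ocr_fn`
    (OCR processing) and look up information about the recognized text
    with `search_fn` (search processing)."""
    region = extract_selection(np.asarray(image), start, end)
    text = ocr_fn(region)
    return text, search_fn(text)
```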
(54) As described above, in the information processing apparatus 1 according to the first embodiment, the vector calculation unit 18 calculates the vector 27 extending from the target object 4 (finger 4) in the direction pointed by the target object 4 (finger 4) on the basis of the acquired distance image. Subsequently, the intersection calculation unit 22 calculates the position of the intersection 28 of the upper surface S1 (predetermined surface) of the object 3 (document) and the calculated vector 27 on the basis of the acquired distance image. Subsequently, the processing execution unit 23 executes processing corresponding to the calculated position of the intersection 28. Therefore, in a case where the user points at the upper surface S1 of the object 3 (document) with the target object 4 (finger 4) in a non-contact manner, the pointed position can be calculated as the intersection 28. As a result, unlike a method in which the user touches the predetermined surface with the target object 4 (finger 4), the upper surface S1 (predetermined surface) of the object 3 (document) and the target object can be prevented from being assimilated in the distance image, and recognition accuracy of the target object 4 (finger 4) can be improved. It is thus possible to provide the information processing apparatus 1 capable of improving detection accuracy of the position pointed by the target object 4 (finger 4).
(55) Here, as a method of recognizing a character and the like present on the upper surface S1 (predetermined surface) of the object 3 (document), there is, for example, a method of causing a user to touch the character and the like with the target object 4 (finger 4) and recognizing the character and the like present at the touch position from a captured image captured by the imaging unit 8 above the object 3 (document). In such a method, however, the character and the like are hidden by the target object 4 (finger 4) and do not appear in the captured image, so recognition accuracy of the character and the like may deteriorate.
(56) On the other hand, in the information processing apparatus 1 according to the first embodiment, in a case where the processing execution unit 23 determines that all the intersections 28 from the intersection 28 old calculated a predetermined time (one second) before to the intersection 28 new calculated immediately before among the calculated intersections 28 are located within the predetermined region 30, of the upper surface S1 (predetermined surface) of the object 3, centered on the intersection 28 old calculated the predetermined time (one second) before, the position (first position) of the representative point of all the intersections 28 is calculated. Subsequently, after a lapse of a predetermined time (one second) or more from calculation of the first position, in a case where it is determined that all the intersections 28 from the intersection 28 old calculated the predetermined time (one second) before to the intersection 28 new calculated immediately before are located in the region, of the upper surface S1 (predetermined surface) of the object 3, centered on the intersection 28 old calculated the predetermined time (one second) before, the position (second position) of the representative point of all the intersections 28 is calculated. Subsequently, OCR processing of recognizing a character and search processing of searching for information regarding the character recognized by the OCR processing are executed in the region 36 in the captured image corresponding to the calculated first position and second position. Therefore, it is possible to prevent a character and the like from being hidden by the target object 4 (finger 4), to acquire a more appropriate captured image, and to improve recognition accuracy of the character and the like.
(57) Furthermore, as another method of recognizing a character and the like present on the upper surface S1 (predetermined surface) of the object 3 (document), for example, there is a method of causing a user to trace a character and the like with a pen-type scanner dictionary and recognizing the character and the like present at a traced position with the pen-type scanner dictionary. However, in such a method, it is necessary to trace the character and the like with a button of the pen-type scanner dictionary, which takes time and effort. In particular, in a case where a character and the like extends over a plurality of lines, it is necessary to trace each line in order, and it is necessary to perform an ON/OFF operation of a scan button at the time of moving to a line, which takes much time and effort.
(58) Furthermore, as another method of recognizing a character and the like present on the upper surface S1 (predetermined surface) of the object 3 (document), for example, there is a method of causing a document scanner connected to a personal computer to scan the entire upper surface S1 (predetermined surface) of the object 3 (document), causing the personal computer to perform OCR processing on the entire image obtained by the scanning to recognize the characters, causing a user to select an arbitrary character from the recognition result by operating the personal computer, and searching for information regarding the selected character using the personal computer. However, in such a method, it is necessary to perform the OCR processing on the entire image obtained by the scanning, to select the character for which information is desired from the characters obtained by the OCR processing, and to perform the search operation for the information regarding the selected character, which takes time and effort. In particular, in a case where there are many characters in the captured image, it takes much time and effort.
(59) On the other hand, in the information processing apparatus 1 according to the first embodiment, the target object 4 (the finger 4) sequentially points at the positions of both ends of the region (selection region) where the character and the like to be subjected to the information search are present, so that the information regarding the character can be searched for and the time and effort required for the information search can be reduced.
(60) [1-4 Modifications]
(61) (1) In the first embodiment, an example has been described in which both the projection unit 9 and the display unit 10 are provided, the projection unit 9 is caused to project the predetermined image 29 at the position of the intersection 28, and the display unit 10 is caused to display the captured image, the region 36 in the captured image, and the processing result by the processing execution unit 23. However, other configurations can be adopted. For example, the projection unit 9 may be configured to perform at least one of projection of the predetermined image 29 at the position of the intersection 28, projection of the region 36, or projection of the processing result by the processing execution unit 23. Furthermore, for example, the display unit 10 may be configured to perform at least one of display of the captured image, display of the predetermined image 29 at the position corresponding to the intersection 28 in the captured image, display of the region 36 in the captured image, or display of the processing result by the processing execution unit 23.
(2) Furthermore, in the first embodiment, an example has been described in which the object 3 is placed in the reading region 2, and the upper surface S1 of the object 3 is set as the predetermined surface and is pointed by the target object 4 in a non-contact manner. However, another configuration can be adopted. For example, a surface on which the reading region 2 is formed may be set as the predetermined surface, an image forming a user interface may be projected on the reading region 2 by a projector or the like, and the projected image (for example, an image of a switch) may be pointed by the target object 4 in a non-contact manner.
2. Second Embodiment
(62) Next, an information processing apparatus 1 according to a second embodiment will be described. The information processing apparatus 1 according to the second embodiment is obtained by changing a part of the configuration of the information processing apparatus 1 according to the first embodiment. An overall configuration of the information processing apparatus 1 according to the second embodiment is similar to that in
(63) The information processing apparatus 1 according to the second embodiment is different from that of the first embodiment in a method of correcting the third position 32 and the fourth position 34 in a captured image. Specifically, the processor 12 implements a distribution calculation unit 37 and an estimation line setting unit 38 illustrated in
(64) In step S301, as illustrated in
(65) Subsequently, the process proceeds to step S302, and as illustrated in
(66) Subsequently, the processing proceeds to step S303, and as illustrated in
(67) Furthermore, the present disclosure may include the following technical matters.
(68) (1)
(69) An information processing apparatus including: an acquisition unit that acquires a distance image indicating a distance to each object present within a predetermined range; a vector calculation unit that calculates a vector extending from a target object present within the predetermined range in a direction pointed by the target object on the basis of the distance image acquired by the acquisition unit; an intersection calculation unit that calculates a position of an intersection of a predetermined surface present within the predetermined range and the vector calculated by the vector calculation unit on the basis of the distance image acquired by the acquisition unit; and a processing execution unit that executes processing according to the position of the intersection calculated by the intersection calculation unit.
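The intersection calculation of item (1) can be illustrated as a ray–plane intersection, assuming the predetermined surface is locally planar and is recovered from the distance image as a point and a normal. This is a minimal sketch; the function name and parameterization are assumptions, not the claimed implementation.

```python
def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Intersection of a ray (origin + t * direction, t >= 0) with a
    plane given by a point on it and its normal; returns None if the
    ray is parallel to the plane or the plane lies behind the origin."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None  # pointing direction parallel to the surface
    diff = [p - o for p, o in zip(plane_point, origin)]
    t = sum(d * n for d, n in zip(diff, plane_normal)) / denom
    if t < 0:
        return None  # surface is behind the target object
    return tuple(o + t * d for o, d in zip(origin, direction))
```

With the vector from the vector calculation unit as `direction` and the fingertip as `origin`, the returned point corresponds to the position of the intersection 28 on the predetermined surface.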
(2)
(70) The information processing apparatus according to (1), in which the target object is a finger.
(3)
(71) The information processing apparatus according to (2), in which the vector calculation unit includes a representative point position calculation unit that calculates a position of a predetermined representative point of a hand on the basis of the distance image acquired by the acquisition unit, and a calculation execution unit that calculates the vector on the basis of the position of the representative point calculated by the representative point position calculation unit.
(4)
(72) The information processing apparatus according to (3), in which the representative point position calculation unit estimates the position of the representative point of the hand on the basis of the distance image acquired by the acquisition unit by using a learning model learned by teacher data including a distance image of the hand and a position of the representative point in the distance image.
(5)
(73) The information processing apparatus according to (4), in which the vector calculation unit calculates a vector toward the predetermined surface through a position of the representative point on a wrist side and a position of the representative point on a fingertip side among the representative points calculated by the representative point position calculation unit.
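The vector of item (5) passes through a wrist-side representative point and a fingertip-side representative point. A minimal sketch, assuming 3-D representative-point positions and returning a unit direction vector (the normalization is an assumption for illustration):

```python
import math

def pointing_vector(wrist, fingertip):
    """Unit vector from the representative point on the wrist side
    through the representative point on the fingertip side, i.e. the
    direction pointed by the finger toward the predetermined surface."""
    d = [f - w for f, w in zip(fingertip, wrist)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0:
        raise ValueError("wrist and fingertip representative points coincide")
    return tuple(c / norm for c in d)
```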
(6)
(74) The information processing apparatus according to (4) or (5), further including a gesture determination unit that determines whether a user performs a pointing gesture with a finger on the basis of the position of the representative point calculated by the representative point position calculation unit, in which in a case where it is determined that the pointing gesture is performed, the vector calculation unit calculates the vector on the basis of the position of the representative point calculated by the representative point position calculation unit.
(7)
(75) The information processing apparatus according to (6), in which the gesture determination unit determines whether the user performs the pointing gesture on the basis of the position of the representative point calculated by the representative point position calculation unit by using a learning model learned by teacher data including the position of the representative point of the hand and information indicating whether the position of the representative point is a position of the representative point at the time of the pointing gesture.
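Items (6) and (7) determine the pointing gesture with a learned model. Purely as an illustration of what such a determination distinguishes, the following heuristic stand-in checks whether the index finger is nearly straight; it is not the learned model of the embodiment, and the joint names (`index_mcp`, `index_pip`, `index_dip`, `index_tip`) and the threshold are assumptions.

```python
import math

def is_pointing_gesture(skeleton, threshold=0.8):
    """Heuristic stand-in for the learned gesture model: treat the hand
    as pointing when the knuckle-to-tip distance of the index finger is
    close to the summed lengths of its bones (finger nearly straight).
    `skeleton` maps joint names to 3-D positions."""
    chain = ["index_mcp", "index_pip", "index_dip", "index_tip"]
    pts = [skeleton[j] for j in chain]

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    bone_sum = sum(dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1))
    straight = dist(pts[0], pts[-1])
    return bone_sum > 0 and straight / bone_sum >= threshold
```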
(8)
(76) The information processing apparatus according to any one of (1) to (7), further including a captured image acquisition unit that acquires a captured image in a predetermined range in which the distance image is generated, in which the processing execution unit calculates a first position that is a position of a representative point of all intersections in a case where it is determined that all the intersections from an intersection calculated a predetermined time before to an intersection calculated immediately before, among the intersections calculated by the intersection calculation unit, are located in a region, of the predetermined surface, centered on the intersection calculated the predetermined time before, calculates a second position that is a position of a representative point of all intersections in a case where, after a lapse of the predetermined time or more from calculation of the first position, it is determined that all the intersections from the intersection calculated the predetermined time before to the intersection calculated immediately before are located in the region, of the predetermined surface, centered on the intersection calculated the predetermined time before, and executes predetermined processing on a region in the captured image corresponding to the calculated first position and second position.
(9)
(77) The information processing apparatus according to (8), further including a region setting unit that sets a plurality of rectangular regions each including at least a part of a character in the captured image, in which the processing execution unit calculates a third position that is a position in the captured image corresponding to the first position, corrects the calculated third position to a center position of the rectangular region to which the third position belongs, calculates a fourth position that is a position in the captured image corresponding to the second position, corrects the calculated fourth position to a center position of the rectangular region to which the fourth position belongs, and performs the predetermined processing on a region in the captured image specified by the corrected third position and fourth position.
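The correction of item (9) snaps the third and fourth positions to the center of the rectangular region they belong to. A minimal sketch, assuming axis-aligned rectangles given as (x, y, width, height); the function name is illustrative.

```python
def snap_to_rect_center(point, rects):
    """Correct a position in the captured image to the center of the
    rectangular character region it falls in; `rects` holds tuples of
    (x, y, w, h). The point is returned unchanged if it belongs to no
    rectangle."""
    px, py = point
    for x, y, w, h in rects:
        if x <= px <= x + w and y <= py <= y + h:
            return (x + w / 2.0, y + h / 2.0)
    return point
```

The region on which the predetermined processing is performed is then specified by the two corrected positions.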
(10)
(78) The information processing apparatus according to (8), further including: a region setting unit that sets a plurality of rectangular regions each including at least a part of a character in the captured image; a distribution calculation unit that calculates, for each position in a direction perpendicular to a writing direction of the characters, a distribution of the sum of the number of pixels belonging to the rectangular region among pixels of the captured image arranged in the writing direction; and an estimation line setting unit that sets an estimation line, which is a straight line extending in the writing direction, at a position where the distribution calculated by the distribution calculation unit takes a peak value and the peak value is a predetermined threshold or more among the positions in the direction perpendicular to the writing direction, in which the processing execution unit calculates a third position that is a position in the captured image corresponding to the first position, corrects the calculated third position to a position on the estimation line closest to the third position, calculates a fourth position that is a position in the captured image corresponding to the second position, corrects the calculated fourth position to a position on the estimation line closest to the fourth position, and performs the predetermined processing on a region in the captured image specified by the corrected third position and fourth position.
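The distribution and estimation-line scheme of item (10) can be sketched as follows, assuming a horizontal writing direction (rows run along the writing direction, the perpendicular direction is the row index) and rectangles given as (x, y, w, h). The function names and the simple local-peak test are assumptions for illustration.

```python
def estimation_lines(rects, height, threshold):
    """For each row (position perpendicular to the writing direction),
    sum the number of pixels on that row that belong to some character
    rectangle, then return the rows where this distribution takes a
    peak value at or above `threshold` -- the estimation lines."""
    dist = [0] * height
    for x, y, w, h in rects:  # y increases downward
        for row in range(max(0, y), min(height, y + h)):
            dist[row] += w
    lines = []
    for row in range(height):
        left = dist[row - 1] if row > 0 else -1
        right = dist[row + 1] if row < height - 1 else -1
        if dist[row] >= threshold and dist[row] >= left and dist[row] >= right:
            lines.append(row)
    return lines

def snap_to_nearest_line(point, lines):
    """Correct a position to the closest estimation line (same column)."""
    px, py = point
    return (px, min(lines, key=lambda r: abs(r - py)))
```

The third and fourth positions are then corrected to the positions on the estimation lines closest to them before the predetermined processing is performed.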
(11)
(79) The information processing apparatus according to (9) or (10), in which the predetermined processing is processing including optical character recognition (OCR) processing of recognizing the character and search processing of searching for information regarding the character recognized by the OCR processing.
(12)
(80) The information processing apparatus according to any one of (1) to (11), in which the processing execution unit controls a projection unit such that a predetermined image is projected at the position of the intersection calculated by the intersection calculation unit.
(13)
(81) The information processing apparatus according to any one of (1) to (12), in which the processing execution unit causes a display unit to display a processing result by the processing execution unit.
(14)
(82) An information processing method including: acquiring a distance image indicating a distance to each object present within a predetermined range; calculating a vector extending from a target object present within the predetermined range in a direction pointed by the target object on the basis of the acquired distance image; calculating a position of an intersection of a predetermined surface present within the predetermined range and the calculated vector on the basis of the acquired distance image; and executing processing according to the calculated position of the intersection.
REFERENCE SIGNS LIST
(83)
1 Information processing apparatus
2 Reading region
3 Object (document)
4 Target object (finger)
5 User interface
6 Device main body
7 Distance measurement unit
8 Imaging unit
9 Projection unit
10 Display unit
11 Storage device
12 Processor
12a Distance image processing unit
12b Image picture processing unit
12c Finger-hand posture estimation unit
12d Object selection unit
12e Display information generation unit
12f Layout detection unit
13 RAM
14 ROM
15 System bus
16 Acquisition unit
17 Captured image acquisition unit
18 Vector calculation unit
19 Representative point position calculation unit
20 Calculation execution unit
21 Gesture determination unit
22 Intersection calculation unit
23 Processing execution unit
24 Region setting unit
25 Hand
26 Skeleton point
27 Vector
28 Intersection
29 Predetermined image
30 Predetermined region
31 Rectangular region
32 Third position
33 Corrected third position
34 Fourth position
35 Corrected fourth position
36 Region
36a Band-shaped region
36b Band-shaped region
36c Band-shaped region
37 Distribution calculation unit
38 Estimation line setting unit
39 Distribution
40 Estimation line