METHODS AND APPARATUS FOR AUTOMATIC HAND POSE ESTIMATION USING MACHINE LEARNING
20220189195 · 2022-06-16
International classification: G06V40/10 (Physics)
Abstract
Systems and methods for hand pose estimation are provided. For example, a computing device may obtain an image, such as an image of a hand. The computing device may apply one or more preprocessing processes to the image to generate an augmented image. Further, the computing device may apply a first machine learning process to the augmented image to generate a plurality of keypoints. The computing device may also apply a second machine learning process to the plurality of keypoints to generate a plurality of depth values. The computing device may further determine a plurality of angles based on the plurality of keypoints and the plurality of depth values. In some examples, the computing device may generate a model comprising a plurality of segments based on the plurality of angles. The computing device may store the plurality of angles and, in some examples, the model in a memory device.
Claims
1. A system comprising: a memory device; and a computing device communicatively coupled to the memory device, wherein the computing device is configured to: obtain an image; apply one or more preprocessing processes to the image to generate an augmented image; apply a first machine learning process to the augmented image to generate a plurality of keypoints; apply a second machine learning process to the plurality of keypoints to generate a plurality of depth values; determine a plurality of angles based on the plurality of keypoints and the plurality of depth values; and store the plurality of angles in the memory device.
2. The system of claim 1, wherein the computing device is configured to: generate a model comprising a plurality of segments, wherein the plurality of segments are oriented based on the plurality of angles; and store the model in the memory device.
3. The system of claim 1, wherein the computing device is configured to: transmit a request to a second computing device, wherein the request causes the second computing device to display a request to capture an image; receive, in response to the request, the image; and store the image in the memory device.
4. The system of claim 3, wherein the request comprises an orientation image comprising joints of a hand at a plurality of angles.
5. The system of claim 3, wherein the request comprises orientation instructions.
6. The system of claim 1, wherein the one or more preprocessing processes comprise at least one of a color jitter, a blurring, a black and white, a flip, a resize, a shift, and a zoom.
7. The system of claim 1, wherein applying the second machine learning process to the plurality of keypoints comprises identifying a first keypoint closest to a foreground of the image, and identifying a depth ratio for each keypoint based on the first keypoint.
8. The system of claim 1, wherein the plurality of keypoints identify a location of one or more pixels of the image.
9. The system of claim 1, wherein determining the plurality of angles comprises determining a plurality of distances between the plurality of keypoints.
10. A method by a computing device comprising: obtaining an image; applying one or more preprocessing processes to the image to generate an augmented image; applying a first machine learning process to the augmented image to generate a plurality of keypoints; applying a second machine learning process to the plurality of keypoints to generate a plurality of depth values; determining a plurality of angles based on the plurality of keypoints and the plurality of depth values; and storing the plurality of angles in a memory device.
11. The method of claim 10, further comprising: generating a model comprising a plurality of segments, wherein the plurality of segments are oriented based on the plurality of angles; and storing the model in the memory device.
12. The method of claim 10, further comprising: transmitting a request to a second computing device, wherein the request causes the second computing device to display a request to capture an image; receiving, in response to the request, the image; and storing the image in the memory device.
13. The method of claim 12, wherein the request comprises an orientation image comprising joints of a hand at a plurality of angles.
14. The method of claim 12, wherein the request comprises orientation instructions.
15. The method of claim 10, wherein applying the second machine learning process to the plurality of keypoints comprises identifying a first keypoint closest to a foreground of the image, and identifying a depth ratio for each keypoint based on the first keypoint.
16. The method of claim 10, wherein determining the plurality of angles comprises determining a plurality of distances between the plurality of keypoints.
17. A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by at least one processor, cause a device to perform operations comprising: obtaining an image; applying one or more preprocessing processes to the image to generate an augmented image; applying a first machine learning process to the augmented image to generate a plurality of keypoints; applying a second machine learning process to the plurality of keypoints to generate a plurality of depth values; determining a plurality of angles based on the plurality of keypoints and the plurality of depth values; and storing the plurality of angles in a memory device.
18. The non-transitory computer readable medium of claim 17, wherein the operations further comprise: generating a model comprising a plurality of segments, wherein the plurality of segments are oriented based on the plurality of angles; and storing the model in the memory device.
19. The non-transitory computer readable medium of claim 17, wherein the operations further comprise: transmitting a request to a second computing device comprising an orientation image, wherein the request causes the second computing device to display the orientation image; receiving, in response to the request, the image, wherein the image was captured by the second computing device; and storing the captured image in a memory device.
20. The non-transitory computer readable medium of claim 17, wherein applying the second machine learning process to the plurality of keypoints comprises identifying a first keypoint closest to a foreground of the image, and identifying a depth ratio for each keypoint based on the first keypoint.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The features and advantages of the present disclosures will be more fully disclosed in, or rendered obvious by, the following detailed descriptions of example embodiments. The detailed descriptions of the example embodiments are to be considered together with the accompanying drawings, wherein like numbers refer to like parts.
DETAILED DESCRIPTION
[0023] The description of the preferred embodiments is intended to be read in connection with the accompanying drawings, which are to be considered part of the entire written description of these disclosures. While the present disclosure is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and will be described in detail herein. The objectives and advantages of the claimed subject matter will become more apparent from the following detailed description of these exemplary embodiments in connection with the accompanying drawings.
[0024] It should be understood, however, that the present disclosure is not intended to be limited to the particular forms disclosed. Rather, the present disclosure covers all modifications, equivalents, and alternatives that fall within the spirit and scope of these exemplary embodiments. The terms “couple,” “coupled,” “operatively coupled,” “operatively connected,” and the like should be broadly understood to refer to connecting devices or components together either mechanically, electrically, wired, wirelessly, or otherwise, such that the connection allows the pertinent devices or components to operate (e.g., communicate) with each other as intended by virtue of that relationship.
[0025] Among other advantages, the embodiments may provide flexibility measurements, such as for joints and wrists, through machine learning processes that operate on still images and video without the need to visit a medical professional. As such, patient recovery times are reduced, thereby increasing patient satisfaction. Further, the embodiments may allow medical providers and patients to utilize their time and resources more effectively by enabling the use of telehealth for acute injuries, fractures, stiffness, and post-rehab visits. In addition, the embodiments may allow patient information (e.g., measured joint angles) to be easily transferred to a patient's chart, which may be used for medical billing. Further, the embodiments may allow medical professionals to more quickly reference previous patient information to gauge improvement, and allow for escalation or de-escalation of rehab protocols. Similarly, the embodiments may allow patients to track their own progress as well. Persons of ordinary skill in the art having the benefit of these disclosures would recognize additional advantages as well.
[0026] Turning to the drawings,
[0027] Each of hand pose computing device 102, medical computing device 114, and patient computing device 112 may include any hardware, or combination of hardware and software, that allows for processing data. For example, each can include one or more processors, one or more field-programmable gate arrays (FPGAs), one or more application-specific integrated circuits (ASICs), one or more state machines, digital circuitry, or any other suitable circuitry. Further, each of hand pose computing device 102, medical computing device 114, and patient computing device 112 can be a computer, a workstation, a laptop, a server, or any other suitable computing device. In addition, each can transmit and receive data over communication network 118.
[0029] Processors 201 can include one or more distinct processors, each having one or more cores. Each of the distinct processors can have the same or different structure. Processors 201 can include one or more central processing units (CPUs), one or more graphics processing units (GPUs), application specific integrated circuits (ASICs), digital signal processors (DSPs), and the like.
[0030] Instruction memory 207 can store instructions that can be accessed (e.g., read) and executed by processors 201. For example, instruction memory 207 can be a non-transitory, computer-readable storage medium such as a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), flash memory, a removable disk, CD-ROM, any non-volatile memory, or any other suitable memory. Processors 201 can be configured to perform a certain function or operation by executing code, stored on instruction memory 207, embodying the function or operation. For example, processors 201 can be configured to execute code stored in instruction memory 207 to perform one or more of any function, method, or operation disclosed herein.
[0031] Additionally, processors 201 can store data to, and read data from, working memory 202. For example, processors 201 can store a working set of instructions to working memory 202, such as instructions loaded from instruction memory 207. Processors 201 can also use working memory 202 to store dynamic data created during the operation of hand pose computing device 102. Working memory 202 can be a random access memory (RAM) such as a static random access memory (SRAM) or dynamic random access memory (DRAM), or any other suitable memory.
[0032] I/O devices 203 can include any suitable device that allows for data input or output. For example, I/O devices 203 can include one or more of a keyboard, a touchpad, a mouse, a stylus, a touchscreen, a physical button, a speaker, a microphone, or any other suitable input or output device.
[0033] Communication port(s) 209 can include, for example, a serial port such as a universal asynchronous receiver/transmitter (UART) connection, a Universal Serial Bus (USB) connection, or any other suitable communication port or connection. In some examples, communication port(s) 209 allows for the programming of executable instructions in instruction memory 207. In some examples, communication port(s) 209 allow for the transfer (e.g., uploading or downloading) of data, such as patient data.
[0034] Display 206 can be any suitable display, and may display user interface 205. User interface 205 can enable user interaction with computing device 200. In some examples, a user can interact with user interface 205 by engaging I/O devices 203. In some examples, display 206 can be a touchscreen, where user interface 205 is displayed on the touchscreen.
[0035] Transceiver 204 allows for communication with a network, such as the communication network 118 of
[0036] Referring back to
[0037] Communication network 118 can be a WiFi network, a cellular network such as a 3GPP® network, a Bluetooth® network, a satellite network, a wireless local area network (LAN), a network utilizing radio-frequency (RF) communication protocols, a Near Field Communication (NFC) network, a wireless Metropolitan Area Network (MAN) connecting multiple wireless LANs, a wide area network (WAN), or any other suitable network. Communication network 118 can provide access to, for example, the Internet.
[0038] In some examples, hand pose computing device 102 transmits an image request message to patient computing device 112. In response to receiving the image request message, patient computing device 112 executes an application (e.g., an “App”) that allows patient 122 to capture an image, such as an image of their hand. For example, the application may activate a camera of patient computing device 112. Patient 122 may place their hand in front of the camera and, in response to an input from patient 122, capture an image of the hand. In some examples, the image request message includes hand pose orientation instructions that instruct patient 122 on how to orient their hand when capturing the image.
[0039] In some examples, the hand pose orientation instructions may include text that is displayed (e.g., via display 206) to patient 122. In some examples, the hand pose orientation instructions may include an orientation image, such as an image illustrating joints of a hand at various angles. Patient computing device 112 may display the orientation image to patient 122, and patient 122 may capture an image of their hand in an orientation in accordance with the orientation image. In some examples, the orientation image may be a hand pose model, such as the hand pose model described below with respect to
[0040] Further, patient computing device 112 may transmit an image data response message to hand pose computing device 102 in response to the image request message. The image data response message may include the image captured by patient 122. In some examples, the image data response message is encrypted according to any suitable encryption process, such as one using a public and private key. Hand pose computing device 102 may receive the image data response message, and may store the image data response message in database 116.
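The request/response exchange described above can be sketched as simple message payloads. The field names and helper below (`ImageRequest`, `ImageDataResponse`, `build_response`) are hypothetical illustrations, not structures taken from the disclosure:

```python
import base64
from dataclasses import dataclass

@dataclass
class ImageRequest:
    """Hypothetical image request message sent to the patient device."""
    patient_id: str
    orientation_instructions: str = "Hold your hand flat, palm toward the camera."

@dataclass
class ImageDataResponse:
    """Hypothetical image data response carrying the captured image back."""
    patient_id: str
    image_b64: str  # captured image bytes, base64-encoded for transport

def build_response(request: ImageRequest, image_bytes: bytes) -> ImageDataResponse:
    # Encode the raw capture so it survives a text-based transport; a real
    # system would also encrypt the payload, as the disclosure notes.
    return ImageDataResponse(
        patient_id=request.patient_id,
        image_b64=base64.b64encode(image_bytes).decode("ascii"),
    )

req = ImageRequest(patient_id="patient-122")
resp = build_response(req, b"raw image bytes")
print(resp.patient_id)  # patient-122
```

A receiving device would decode `image_b64` back to bytes before storing the image in the database.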
[0042] The medical professional 124 may provide input (e.g., via I/O device 203) to medical computing device 114 to cause medical computing device 114 to send a message to hand pose computing device 102. In response to receiving the message, hand pose computing device 102 transmits an image request 402 to patient computing device 112. The image request 402 may cause patient computing device 112 to execute the application (e.g., an “App”) that allows patient 122 to capture an image of their hand. In some examples, the image request 402 includes hand pose orientation instructions as described herein. Patient 122 may capture an image of their hand in accordance with the hand pose orientation instructions. Once the image is captured, patient computing device 112 may transmit image data 404 to hand pose computing device 102. The image data 404 may identify and characterize the captured image. Hand pose computing device 102 may store the image data 404 in database 116.
[0043] Referring back to
[0044] After preprocessing, hand pose computing device 102 may apply a machine learning process (e.g., algorithm) to the preprocessed image to generate keypoints. For example, hand pose computing device 102 may apply a trained convolutional neural network to the preprocessed image to generate the keypoints. The trained convolutional neural network may generate the keypoints using a series of masking on each image to determine both x and y axis data. The x and y data may be overlaid on the image during post processing to visually label each keypoint. Each keypoint may be associated with one or more pixels of the preprocessed image, and may identify the location of the one or more pixels (e.g., defined by x and y coordinates corresponding to the preprocessed image). The keypoints may identify one or more joints in the preprocessed image. For example,
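The keypoint step can be sketched as picking, for each joint, the (x, y) location of the strongest response in a per-joint mask. This is a minimal NumPy illustration with hand-built masks, not the trained convolutional neural network itself:

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """Return one (x, y) keypoint per joint mask.

    heatmaps: array of shape (num_joints, height, width), where each
    slice scores how strongly every pixel matches that joint.
    """
    keypoints = []
    for hm in heatmaps:
        # argmax over the flattened mask, unraveled back to row/column.
        y, x = np.unravel_index(np.argmax(hm), hm.shape)  # row = y, col = x
        keypoints.append((int(x), int(y)))
    return keypoints

# Two hand-built 4x4 masks with known peaks.
masks = np.zeros((2, 4, 4))
masks[0, 1, 2] = 1.0  # joint 0 peaks at x=2, y=1
masks[1, 3, 0] = 1.0  # joint 1 peaks at x=0, y=3
print(keypoints_from_heatmaps(masks))  # [(2, 1), (0, 3)]
```

The resulting (x, y) pairs can then be overlaid on the image during post-processing to visually label each keypoint, as the paragraph above describes.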
[0045] Further, hand pose computing device 102 may apply a second machine learning process to the keypoints to generate depth values for each image pixel. For example, hand pose computing device 102 may apply a second trained convolutional neural network to the keypoints to generate the depth values. The second trained convolutional neural network may include, for example, eight layers. Incorporating eight layers instead of, for example, seven layers allows the convolutional neural network to operate on additional features.
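As a rough sketch of the shape of that computation, the following builds an eight-layer feed-forward stack mapping flattened (x, y) keypoints to one depth value per keypoint. The dense layers, layer width, and random untrained weights are illustrative assumptions; the disclosure's network is convolutional and trained:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_depth_network(num_keypoints, width=16, num_layers=8):
    """Illustrative 8-layer stack mapping 2D keypoints to one depth each.

    Stand-in for the second (depth) network: dense layers with random,
    untrained weights, purely to show the shape of the computation.
    """
    sizes = [num_keypoints * 2] + [width] * (num_layers - 1) + [num_keypoints]
    return [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])]

def predict_depths(layers, keypoints):
    x = np.asarray(keypoints, dtype=float).ravel()  # flatten (x, y) pairs
    for w in layers[:-1]:
        x = np.maximum(x @ w, 0.0)  # ReLU after each hidden layer
    return x @ layers[-1]           # one depth value per keypoint

layers = make_depth_network(num_keypoints=21)  # e.g., 21 hand keypoints
depths = predict_depths(layers, rng.uniform(size=(21, 2)))
print(depths.shape)  # (21,)
```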
[0046] In some examples, the second trained convolutional neural network generates the depth values by identifying a first keypoint closest to the foreground of the image. The first keypoint may be determined using a series of masking over each image to determine which range of pixels are closest (and furthest away) from the camera's point of view. The second trained convolutional neural network may assign the closest keypoint a depth value (e.g., 1). Further, the second trained convolutional neural network may determine a depth value for each of the remaining keypoints based on the closest keypoint. Each of the depth values may be, for example, a ratio, where each ratio identifies a “depth distance” from the closest keypoint to the corresponding keypoint.
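The depth-ratio scheme can be sketched as normalizing every estimated depth against the keypoint nearest the camera, so the closest keypoint maps to 1. The raw depth values below are made-up inputs, not network outputs:

```python
import numpy as np

def depth_ratios(raw_depths):
    """Express each keypoint's depth relative to the closest keypoint.

    raw_depths: estimated distance of each keypoint from the camera.
    The closest keypoint maps to 1.0; larger ratios mean farther from
    the foreground.
    """
    raw = np.asarray(raw_depths, dtype=float)
    return raw / raw.min()  # divide by the depth of the closest keypoint

# Made-up depth estimates for three keypoints.
ratios = depth_ratios([0.5, 1.0, 0.75])
print(ratios.tolist())  # [1.0, 2.0, 1.5]
```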
[0047] Hand pose computing device 102 may then determine joint angles based on the keypoints and corresponding depth values. For example, hand pose computing device 102 may apply one or more algorithms to the keypoints (e.g., keypoint vectors) and depth values to determine the joint angles. As an example, hand pose computing device 102 may employ a Euclidean Distance algorithm to determine distances between keypoints. For example,
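Treating each keypoint as a 3D point (x and y from the keypoint stage, the depth value as the third coordinate), the distance step reduces to the standard Euclidean formula. The coordinates below are illustrative:

```python
import math

def keypoint_distance(p, q):
    """Euclidean distance between two keypoints given as (x, y, depth)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Illustrative keypoints: pixel x, pixel y, depth value.
wrist = (0.0, 0.0, 1.0)
knuckle = (3.0, 4.0, 1.0)
print(keypoint_distance(wrist, knuckle))  # 5.0
```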
[0048] Referring back to
[0049] In some examples, hand pose computing device 102 overlays the model over the image received from patient computing device 112. Hand pose computing device 102 may overlay the model based on corresponding pixel locations. For example, hand pose computing device 102 may overlay a model pixel located at coordinate (0,0) over the received image pixel at coordinate (0,0), such as when the resolution of the received image and the model are the same. In some examples, hand pose computing device 102 may resize the model and/or the received image to align the model and received image. In some examples, the model is transparent, such that the image of the hand may be seen through the portions of the generated model including segments and keypoints.
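The overlay step might look like the following NumPy sketch, which blends a semi-transparent model layer over the hand image at matching pixel coordinates. The blending rule and alpha value are assumptions, not taken from the disclosure:

```python
import numpy as np

def overlay_model(image, model, alpha=0.5):
    """Blend a semi-transparent model layer over a hand image.

    image, model: float arrays of shape (H, W). Pixel (0, 0) of the
    model lands on pixel (0, 0) of the image, so both must share a
    resolution (resize one of them first if they do not).
    alpha: opacity of the model layer (0 = invisible, 1 = opaque).
    """
    if image.shape != model.shape:
        raise ValueError("resize the model or the image so the shapes match")
    # Blend only where the model actually drew something (nonzero pixels),
    # leaving the rest of the hand image untouched.
    mask = model > 0
    blended = image.copy()
    blended[mask] = (1 - alpha) * image[mask] + alpha * model[mask]
    return blended

img = np.full((2, 2), 100.0)    # uniform gray "hand" image
mdl = np.array([[0.0, 200.0],   # model drew one bright pixel
                [0.0, 0.0]])
out = overlay_model(img, mdl)
print(out[0, 1], out[0, 0])  # 150.0 100.0
```

Masking on nonzero model pixels is one way to realize the transparency described above, so the hand remains visible outside the drawn segments and keypoints.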
[0050] For example,
[0051] In some examples, hand pose computing device 102 transmits the model to medical computing device 114. Further, medical computing device 114 may display the received model to medical professional 124. As such, medical professional 124 may assess the image, and determine patient's 122 progress, such as progress from a hand injury.
[0053] In this example, database 116 stores patient data 320. Patient data 320 may include, for each patient 122, a name 322, a phone number 324, and an email address 326. Patient data 320 may also include, for each patient 122, one or more images 328. Each image 328 may be captured by patient computing device 112 for a patient 122 as described herein, and stored in database 116 within patient data 320 corresponding to the patient 122.
[0054] Data augmentation engine 302 obtains image 328 for a patient 122, and preprocesses the image 328. For example, data augmentation engine 302 may apply one or more of color jitter, blurring, black and white, flip, resize, shift, or zoom processes to image 328. Data augmentation engine 302 generates augmented image data 303 identifying and characterizing the preprocessed image 328, and provides augmented image data 303 to keypoint generation engine 304.
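A few of those augmentations can be sketched with plain NumPy stand-ins. Real pipelines typically use an image library such as OpenCV or torchvision; the simplified operations below are assumptions for illustration:

```python
import numpy as np

def flip(img):
    """Mirror the image left-to-right."""
    return img[:, ::-1, ...]

def black_and_white(img):
    """Collapse an (H, W, 3) color image to a single gray channel."""
    return img.mean(axis=2)

def shift(img, dx):
    """Shift the image dx pixels to the right, wrapping at the border."""
    return np.roll(img, dx, axis=1)

# Tiny 2x3 RGB image, then a chained augmentation pass.
rgb = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
augmented = shift(black_and_white(flip(rgb)), 1)
print(augmented.shape)  # (2, 3)
```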
[0055] Keypoint generation engine 304 may generate keypoint data 305 identifying and characterizing one or more keypoints based on augmented image data 303. For example, keypoint generation engine 304 may apply a trained convolutional neural network to the preprocessed image to generate the keypoints. Each keypoint may be associated with one or more pixels of augmented image data 303, and may identify the location of the one or more pixels. Each keypoint may identify one or more joints depicted in augmented image data 303, for example. In some examples, keypoint data 305 is a keypoint vector.
[0056] Depth generation engine 306 obtains the keypoint data 305 and generates depth values based on keypoint data 305. For example, depth generation engine 306 may apply a second trained convolutional neural network to keypoint data 305 to generate the depth values. The second trained convolutional neural network may include, for example, eight layers. In some examples, the second trained convolutional neural network generates the depth values by identifying a first keypoint closest to the foreground of the image, and identifies a depth ratio for each keypoint based on the first keypoint. Depth generation engine 306 may generate depth data 307 identifying and characterizing the depth ratios.
[0058] Referring back to
[0059] Keypoint based model generation engine 310 generates a model based on angle data 309 and keypoint data 305. For example, keypoint based model generation engine 310 may generate an image that includes segments (e.g., segments 452) between the keypoints (e.g., keypoints 454) that represent joints, with the segments oriented in accordance with the corresponding joint angle.
[0060] In some examples, keypoint based model generation engine 310 overlays the model over image 328 to generate an overlaid image (e.g., overlaid image 700). Keypoint based model generation engine 310 may overlay the model based on corresponding pixel locations. In some examples, keypoint based model generation engine 310 resizes the model and/or image 328 to align the model and image 328. In some examples, the model is transparent, such that the image of the hand may be seen through the portions of the generated model including segments and keypoints. Keypoint based model generation engine 310 generates keypoint based model data 330 identifying and characterizing the generated model, and stores keypoint based model data 330 in database 116.
[0062] Proceeding to step 906, a machine learning process is applied to the plurality of keypoints to generate a depth value for each of the plurality of keypoints. For example, hand pose computing device 102 may apply a trained convolutional neural network to the plurality of keypoints to generate the depth values. At step 908, a distance between each of the plurality of keypoints and at least one neighboring keypoint is determined. For example, hand pose computing device 102 may apply a Euclidean distance equation to the keypoints to determine the distances.
[0063] At step 910, a plurality of angles are determined based on the distances. For example, hand pose computing device 102 may apply known algebraic equations that operate on distances to determine the plurality of angles. At step 912, a model of the hand is generated based on the distances and the plurality of angles. For example, hand pose computing device 102 may generate a model, such as hand pose model 450, that identifies a plurality of segments (e.g., segments 452) between keypoints (e.g., keypoints 454). The segments are angled based on the plurality of angles (e.g., angles 456). For example, each angle may correspond to an angle between segments, where each segment is between two keypoints. In some examples, the model identifies the angles. For example, the model may include text identifying a value of each joint angle (e.g., joint angles 704).
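The disclosure leaves the algebra unspecified; one standard choice, shown here as an assumption, is the law of cosines, which recovers the angle at a middle keypoint from the three pairwise distances:

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, given 3D keypoints a, b, c.

    Uses the law of cosines on the three pairwise distances:
    cos(B) = (ab^2 + bc^2 - ac^2) / (2 * ab * bc).
    """
    dist = lambda p, q: math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))
    ab, bc, ac = dist(a, b), dist(b, c), dist(a, c)
    cos_b = (ab ** 2 + bc ** 2 - ac ** 2) / (2 * ab * bc)
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_b))))

# A right angle at the middle keypoint (all depths equal).
print(joint_angle((1, 0, 1), (0, 0, 1), (0, 1, 1)))  # 90.0
```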
[0064] At step 914, the model is stored in a data repository. For example, hand pose computing device 102 may store the model in database 116. In some examples, hand pose computing device 102 transmits the model to medical computing device 114. The method then ends.
[0066] Proceeding to step 1006, an input is received. For example, patient computing device 112 may receive, via I/O device 203, an input from patient 122. At step 1008, in response to the input, the hand image is captured. For example, patient 122 may place a hand in front of a camera of patient computing device 112, and provide the input. In response to the input, an application executed by patient computing device 112 may cause a camera to capture an image of the patient's 122 hand. At step 1010, the hand image is transmitted in response to the request. For example, patient computing device 112 may transmit image data 404 identifying and characterizing the captured image to hand pose computing device 102. In some examples, hand pose computing device 102 stores the image data 404 in database 116. The method then ends.
[0067] In some examples, a system includes a memory device and a computing device (e.g., hand pose computing device 102). The computing device is communicatively coupled to the memory device, and is configured to obtain an image. The image may be obtained from the memory device, for example. The computing device is also configured to apply one or more preprocessing processes to the image to generate an augmented image. Further, the computing device is configured to apply a first machine learning process to the augmented image to generate a plurality of keypoints. The computing device is also configured to apply a second machine learning process to the plurality of keypoints to generate a plurality of depth values. Further, the computing device is configured to determine a plurality of angles based on the plurality of keypoints and the plurality of depth values. The computing device is also configured to store the plurality of angles in the memory device.
[0068] In some examples, the computing device is further configured to generate a model including a plurality of segments, where the plurality of segments are oriented based on the plurality of angles. The computing device is also configured to store the model in the memory device.
[0069] In some examples, a system includes a memory device and a computing device (e.g., patient computing device 112). The computing device is communicatively coupled to the memory device, and is configured to transmit a request to a second computing device including an orientation image. The request causes the second computing device to display the orientation image. Further, the computing device is configured to receive, in response to the request, a captured image, where the captured image was captured by the second computing device. The computing device is further configured to store the captured image in the memory device.
[0070] In some examples, a method by a computing device (e.g., hand pose computing device 102) includes obtaining an image. The method also includes applying one or more preprocessing processes to the image to generate an augmented image. Further, the method includes applying a first machine learning process to the augmented image to generate a plurality of keypoints. The method also includes applying a second machine learning process to the plurality of keypoints to generate a plurality of depth values. The method further includes determining a plurality of angles based on the plurality of keypoints and the plurality of depth values. The method also includes storing the plurality of angles in a memory device.
[0071] In some examples, the method further includes generating a model including a plurality of segments, where the plurality of segments are oriented based on the plurality of angles. The method also includes storing the model in the memory device.
[0072] In some examples, a method by a computing device (e.g., patient computing device 112) includes transmitting a request to a second computing device comprising an orientation image, wherein the request causes the second computing device to display the orientation image. The method may also include receiving, in response to the request, a captured image, wherein the captured image was captured by the second computing device. Further, the method includes storing the captured image in a memory device.
[0073] In some examples, a non-transitory computer readable medium has instructions stored thereon, where the instructions, when executed by at least one processor, cause a device (e.g., hand pose computing device 102) to perform operations. The operations include obtaining an image. The operations also include applying one or more preprocessing processes to the image to generate an augmented image. Further, the operations include applying a first machine learning process to the augmented image to generate a plurality of keypoints. The operations also include applying a second machine learning process to the plurality of keypoints to generate a plurality of depth values. The operations further include determining a plurality of angles based on the plurality of keypoints and the plurality of depth values. The operations also include storing the plurality of angles in a memory device.
[0074] In some examples, the operations further include generating a model including a plurality of segments, where the plurality of segments are oriented based on the plurality of angles. The operations also include storing the model in the memory device.
[0075] In some examples, a non-transitory computer readable medium has instructions stored thereon, where the instructions, when executed by at least one processor, cause a device (e.g., patient computing device 112) to perform operations. The operations include transmitting a request to a second computing device comprising an orientation image, where the request causes the second computing device to display the orientation image. The operations also include receiving, in response to the request, a captured image, wherein the captured image was captured by the second computing device. Further, the operations include storing the captured image in a memory device.
[0076] The foregoing is provided for purposes of illustrating, explaining, and describing embodiments of these disclosures. Modifications and adaptations to these embodiments will be apparent to those skilled in the art and may be made without departing from the scope or spirit of these disclosures.