SURFACE RECOGNITION
20220237894 · 2022-07-28
Assignee
Inventors
Cpc classification
G06F18/214
PHYSICS
G06T1/20
PHYSICS
G06T17/20
PHYSICS
G06V40/10
PHYSICS
G06V10/25
PHYSICS
G06T7/521
PHYSICS
International classification
G06T17/20
PHYSICS
G06T7/521
PHYSICS
Abstract
System and related methods for applying machine learning to the classification of surface materials using images of spots of light, such as those resulting from a laser beam impinging on the surface. A classifier trained using such spot images, resulting from light beams impinging on the surface, achieves excellent classification results, in spite of a lack of fine surface details in these images as compared to a more uniformly lit larger scene that would appear to contain more information on the surface type. Classifiers can achieve classification accuracies on biological tissues significantly above 90% using a number of well-known classifier architectures. The classification results can be used to generate a map of classified surface types, and such a map can be combined with a three-dimensional model of the surface, reconstructed from a pattern of spots projected onto the surface, so that the model has classified surface portions.
Claims
1. A method of training a computer-implemented classifier for classifying a surface portion of a surface as one of a predefined set of surface types, wherein the classifier takes an input image of a surface portion as an input and produces an output indicating a surface type of the predefined set, the method comprising: obtaining a data set of input images of surface portions, wherein each input image comprises an image of a spot on a respective surface portion resulting from a beam of light generated by a light source and impinging on the respective surface portion and the data set associates each input image with a corresponding surface type; and training the classifier using the data set.
2. The method according to claim 1, wherein obtaining the data set comprises: shining a light beam onto a plurality of surface portions of different surface types; obtaining an input image for each of the surface portions and associating each input image with the corresponding surface types.
3. A method of classifying a surface portion as one of a predefined set of surface types, wherein the classifier takes an input image of a surface portion as an input and produces an output indicating a surface type of the predefined set, the method comprising: obtaining an input image of a spot on the surface portion resulting from a beam of light generated by a light source and impinging on the surface portion; providing the input image as an input to a classifier, wherein the classifier was trained using the method according to claim 1; obtaining an output of the classifier in response to the input image; and determining a surface type of the surface portion based on the output.
4. The method according to claim 3, wherein obtaining the image comprises: shining a light beam onto the surface portion and obtaining the input image.
5. The method according to claim 1, wherein obtaining the input image comprises: detecting the spot in a captured image; and extracting a cropped image of the captured image comprising the spot and a border around the spot.
6. The method according to claim 1, wherein the input image comprises at least a quarter of image pixels corresponding to the spot and having a pixel value in the top ten percentiles of pixel values.
7. The method according to claim 3 comprising: obtaining a plurality of input images, each input image corresponding to a spot on a respective surface portion of the surface resulting from a respective beam of light generated by a light source and impinging on the respective surface portion; providing each input image as an input to the classifier; obtaining an output of the classifier in response to each input image; and determining a surface type of the respective surface portion based on each output.
8. The method according to claim 7, wherein obtaining the input images comprises: detecting each spot in a captured image; and extracting a respective cropped image of the captured image comprising the spot and a border around the spot.
9. The method according to claim 7, comprising: altering an image of the surface for display on a display device to visually indicate in a displayed image the corresponding determined surface type for each of the surface portions.
10. The method according to claim 7, wherein the respective beams are projected onto the surface according to a predetermined pattern, the method comprising: analysing a pattern of the spots on the surface to determine a three-dimensional shape of the surface.
11. The method according to claim 10 comprising: rendering a view of the three-dimensional shape of the surface visually indicating the determined surface type for each of the surface portions.
12. The method according to claim 1, wherein the set of predefined surface types comprises biological tissue surfaces.
13. The method according to claim 1, wherein the predefined set of surface types comprises one or more of the surface types of muscle, fat, bone and skin surfaces.
14. The method according to claim 1, wherein the predefined set of surface types comprises a metallic surface.
15. A computer-implemented classifier trained using the method of claim 1.
16. The method according to claim 1, wherein the classifier is an artificial neural network.
17. The method according to claim 16, wherein the artificial neural network is a convolutional neural network.
18. The method according to claim 17, wherein the convolutional neural network is one of GoogLeNet, AlexNet, DenseNet101 or VGG-16.
19. The method according to claim 1, wherein the classifier takes as a further input one or more values indicative of a distance between a light source used to generate the beam and the surface and/or a distance between an image capture device used to capture the image and the surface.
20. One or more computer-readable media comprising: coded instructions that, when run on a computing device, implement the method according to claim 1.
21. A system for classifying a surface portion as one of a predefined set of surface types, the system comprising: a light source for generating one or more light beams; an image capture device for capturing images of respective spots resulting from the one or more light beams impinging on a surface; a processor coupled to the image capture device and configured to implement a method according to claim 3.
22. The method according to claim 1, wherein the light has a wavelength in the range of 400-60 nm, preferably 850 nm or in the near infrared spectrum.
23. The method according to claim 1, wherein a beam diameter is less than 3 mm at the surface.
24. The method according to claim 1, wherein the light source is configured to emit coherent light.
25. The method according to claim 24, wherein the light source comprises a laser or light emitting diode.
26. The method according to claim 1, wherein the light source comprises an optical element to generate a pattern of beams, for example a diffraction grating, hologram, spatial light modulator or steerable mirror.
Description
BRIEF DESCRIPTION OF THE FIGURES
[0022] Specific embodiments are now described by way of example only for the purpose of illustration and with reference to the accompanying drawings, in which:
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
DETAILED DESCRIPTION
[0035] With reference to
[0036] Some embodiments use light sources other than a laser, for example an LED or a laser diode. The wavelength of the emitted light may be, for example, in the red or infrared part of the spectrum, or as described above, and the beam diameter may be 3 mm or less (where a pattern is generated other than by collimated beams, for example using a hologram to generate a pattern on the surface, a corresponding spot size of 3 mm or less can be defined on the surface or on a notional flat surface coinciding with the surface). An image capture device 110, such as a camera, is configured to capture images of the pattern of spots on the surface 108. An optional second (or further) image capture device 110′ may be included to deal with potential occlusions by capturing an image from a different angle than the image capture device 110.
[0037] A camera controller 112 is coupled to the image capture device 110 (and 110′ if applicable) to control image capture and receive captured images. A light source controller 114 is coupled to the laser 104 and, if applicable, the optical element 106 to control the beam pattern with which the surface 108 is illuminated. A central processor 116 and memory 118 are coupled to the camera and light source controllers 112, 114 by a data bus 120 to coordinate pattern generation and image capture and to pre-process captured images to produce images of surface portions containing a spot each. A machine learning engine 122 is also connected to the data bus 120, implementing a classifier, for example an ANN or CNN, that takes pre-processed spot images as input and outputs surface classifications. Further, in some embodiments, a stereo engine 124 is connected to the data bus 120 to process the image of the surface 108 to infer a three-dimensional shape of the surface. The central processor 116 is configured to use the surface classifications and, where applicable, the three-dimensional surface shape to generate an output image for display on a display device (not shown) via a display interface (also not shown). Other interfaces, such as interfaces for other inputs or outputs, like a user interface (touch screen, keyboard, etc.) and network interface, are also not shown.
[0038] It will be understood that stereo reconstruction of the surface and the corresponding components are optional, as is the projection of a pattern of a plurality of spots, with some embodiments only having a single spot projected, so that the optical element 106 may not be required. Alternative arrangements for generating a beam pattern are equally possible. It will further be appreciated that the described functions can be distributed in any suitable way. For example, all computation may be done by the central processor 116, which in turn may itself be distributed (as may be the memory 118). Functions may be distributed between the central processor 116 and any co-processors, for example engines 122, 124 or others, in any suitable way. Likewise, any or all described computations may equally be performed remotely in the cloud on dedicated servers or services such as AWS™, with the system adapted appropriately. By way of overview with reference to
[0039] With reference to
[0040] Once a trained classifier is stored ready for use, with reference to
[0041] With reference to
[0042] Isolating 606 the spots includes determining the coordinates of each isolated spot (for example with reference to the brightness peak or a reference point in the cropped image) in a frame of reference. The frame of reference may for example be fixed on the image capture device and the transformation may be obtained from knowledge of the disposition of the image capture device relative to the imaged surface. The surface portion corresponding to each imaged spot is classified 610 as described above and the classification results for each spot/surface portion are amalgamated 612 into a surface type map by associating the respective surface type for each spot/surface portion with the respective determined coordinates in the map.
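By way of illustration only, the amalgamation of per-spot classification results into a surface type map can be sketched in Python as follows. The scale-and-offset coordinate transformation, the function names and the example values are assumptions made for this sketch and are not taken from the disclosure; any transformation derived from the disposition of the image capture device could be substituted.

```python
def to_reference_frame(pixel, scale, offset):
    """Map a pixel coordinate into the frame of reference.

    A uniform scale plus offset is a hypothetical stand-in for whatever
    transformation follows from the camera's disposition.
    """
    return (pixel[0] * scale + offset[0], pixel[1] * scale + offset[1])


def build_surface_map(spot_pixels, spot_types, scale=1.0, offset=(0.0, 0.0)):
    """Associate each classified surface type with its spot coordinates."""
    return {
        to_reference_frame(pixel, scale, offset): surface_type
        for pixel, surface_type in zip(spot_pixels, spot_types)
    }


# Two spots located at these pixel positions and classified as shown.
surface_map = build_surface_map(
    [(10, 20), (30, 20)], ["skin", "bone"], scale=0.5, offset=(1.0, 2.0)
)
print(surface_map)
```

With the default scale and offset the map coordinates remain in the frame of reference of the image capture device.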
[0043] The map may be used for example for automated control of a robot, such as a surgical robot or may be displayed, for example associating each surface type with a corresponding visual label and overlaying the resulting visual map over an image of the surface. The overlay of the map on the image of the surface may be based on the known coordinate transformation between the surface and the image capture device, or the map coordinates may already be in the frame of reference of the image capture device, as described above. The spots may be generated by infrared light, in which case they are not visible to a human observer in the image and the surface labels can be directly superimposed without additional visual distraction by way of a colour code or other symbols. Alternatively, visible spots for visible light patterns can be retained in the image or may be removed by image processing.
[0044] As described above, in some embodiments multiple image capture devices, for example a second image capture device 110′ in addition to the image capture device 110, are used to capture images of the surface, for example to deal with the potential of occlusion of portions of the surface in one image capture device view. In these embodiments, steps 604 to 610 are repeated for the image(s) captured by the second or further image capture devices, as indicated by reference signs 604′ to 610′ in
[0045] The registered classification of surface portions as described above with reference to
[0046] Embodiments that comprise such three-dimensional surface reconstruction comprise the same steps as described above with the addition of a step of calculating 702 depth, for example for each pixel of the surface, or at each identified spot, based on the pattern of spots in the image. The resulting depth information is combined 704 with the surface type map resulting from step 612 to form a reconstructed scene in terms of a three-dimensional model of the imaged surface labelled with surface types based, for example, on a suitable mesh with colour coded cells or tetrahedrons centred on the coordinates identified for each classified spot. Depth may be defined as a distance to an actual or notional camera or as a position along a direction extending away from the surface, for example normal to the surface, such as normal to a plane corresponding to a plane along which the surface extends.
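As a non-limiting sketch, the combination of depth information with the surface type map may be expressed as a list of labelled three-dimensional points, one per classified spot. The colour code, the dictionary layout and the sample depth values below are hypothetical; construction of a mesh around the points is not shown.

```python
# Hypothetical colour code (RGB); the text only says cells may be colour coded.
COLOUR_CODE = {
    "skin": (255, 205, 170),
    "bone": (240, 240, 240),
    "muscle": (200, 40, 40),
    "fat": (250, 220, 90),
}
UNKNOWN_COLOUR = (128, 128, 128)


def label_surface_points(surface_map, depth_by_spot):
    """Combine a surface type map with per-spot depth values.

    surface_map:   {(x, y): surface_type}
    depth_by_spot: {(x, y): depth}
    Spots without a depth estimate are skipped.
    """
    points = []
    for (x, y), surface_type in surface_map.items():
        if (x, y) not in depth_by_spot:
            continue
        points.append({
            "position": (x, y, depth_by_spot[(x, y)]),
            "surface_type": surface_type,
            "colour": COLOUR_CODE.get(surface_type, UNKNOWN_COLOUR),
        })
    return points


example_map = {(6.0, 12.0): "skin", (16.0, 12.0): "bone", (26.0, 12.0): "fat"}
example_depth = {(6.0, 12.0): 101.5, (16.0, 12.0): 98.0}
model_points = label_surface_points(example_map, example_depth)
print(model_points)
```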
[0047] In a specific example, images obtained using systems and methods described above were used to train a number of known CNN architectures. A Class II red laser (650±10 nm, <1 mW) was used to project spots onto four different tissues obtained from a cadaver: bone, skin, fat and muscle. A 1280×720 pixel CMOS camera was used to capture 1000 images of each tissue type being impinged by the laser. The images were captured from multiple areas of the cadaver at various distances from the camera and laser, resulting in a range of spot sizes. The full 1280×720 images were cropped to isolate the pixels around the laser spots using intensity/greyscale brightness thresholding based on the local maxima within the image, with the cropped area suitably scaled to capture the full perimeter of the laser spot. The cropped images were resized to 224×224 pixels using bicubic interpolation to fit the input of the CNN architectures used, resulting in images as illustrated in
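The cropping step described in this example can be sketched, under simplifying assumptions, as follows. The sketch handles a single spot per image and thresholds relative to the global brightness maximum rather than local maxima; the threshold fraction, the border width and the function name are illustrative choices that do not appear in the disclosure. Resizing the crop to 224×224 pixels with bicubic interpolation would follow using an image library and is not shown.

```python
def crop_spot(image, rel_threshold=0.5, margin=2):
    """Crop a greyscale image (list of rows) around its brightest spot.

    Returns the cropped image and the (row, col) of the brightness peak.
    rel_threshold and margin are illustrative values, not from the text.
    """
    peak = max(max(row) for row in image)
    cutoff = rel_threshold * peak
    # Coordinates of all pixels at or above the brightness cutoff.
    bright = [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= cutoff
    ]
    rows = [r for r, _ in bright]
    cols = [c for _, c in bright]
    # Bounding box of the spot, widened by a border and clipped to the image.
    top = max(min(rows) - margin, 0)
    bottom = min(max(rows) + margin, len(image) - 1)
    left = max(min(cols) - margin, 0)
    right = min(max(cols) + margin, len(image[0]) - 1)
    crop = [row[left:right + 1] for row in image[top:bottom + 1]]
    peak_pos = next((r, c) for r, c in bright if image[r][c] == peak)
    return crop, peak_pos


# Synthetic 12x16 frame with a 3x3 spot whose centre pixel is brightest.
frame = [[0] * 16 for _ in range(12)]
for r in range(4, 7):
    for c in range(7, 10):
        frame[r][c] = 200
frame[5][8] = 255

patch, centre = crop_spot(frame)
print(len(patch), len(patch[0]), centre)
```

The peak position returned here could serve as the reference point for locating the spot in a frame of reference.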
[0048] The network weights were initialised with pre-trained weights available in the Deep Learning Toolbox, which in particular provides usefully adapted filters in the convolution layers, and a non-zero learning rate was used for the entire network so that all weights, including in the convolution layers, were adapted during training. The network was trained for 100 epochs using half of the images of each tissue type (total 2000 images) with the remaining images reserved for testing the recognition accuracy of the trained network.
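The split of the data set into training and test halves for each tissue type can be illustrated as below. The disclosure does not state how the training half was selected, so the seeded random shuffle is an assumption, and the file names are placeholders invented for the sketch.

```python
import random


def split_half_per_class(images_by_type, seed=0):
    """Split each surface type's images into equal training and test halves.

    Random selection with a fixed seed is an assumption; the text only
    says that half of the images of each tissue type were used for training.
    """
    rng = random.Random(seed)
    train, test = [], []
    for surface_type, images in images_by_type.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train += [(image, surface_type) for image in shuffled[:half]]
        test += [(image, surface_type) for image in shuffled[half:]]
    return train, test


# Placeholder file names: 1000 images for each of the four tissue types.
dataset = {
    tissue: [f"{tissue}_{i:04d}.png" for i in range(1000)]
    for tissue in ("bone", "skin", "fat", "muscle")
}
train_set, test_set = split_half_per_class(dataset)
print(len(train_set), len(test_set))
```

Splitting per tissue type keeps both halves balanced at 500 images per type, giving 2000 training images in total.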
[0049] Recognition accuracy was found to be mostly in the high nineties: skin (99.2%); bone (97.8%); muscle (97.0%); and fat (93.4%), with respective false-positive rates of 0.8%, 2.2%, 3.0% and 6.6% and false-negative rates of 2.2%, 1.2%, 5.5% and 3.7%. The average recognition accuracy was 96.9%. Similarly promising results were obtained using other CNN architectures, specifically AlexNet, DenseNet101 and VGG-16, again using the MATLAB™ Deep Learning Toolbox, with the output layer adapted accordingly, as described above. Average recognition accuracy for these architectures on the same training and test data was evaluated as 95%, 93% and 92%, respectively. Notably, the dataset used in this disclosure provides excellent generalisation on a large test data set, with high correct recognition rates using out-of-the-box network architectures. The skilled person will therefore appreciate that the high recognition rates are likely to be due to the chosen image type having a high information content in its brightness structure with respect to surface types, irrespective of the specific nature of the classifier used.
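For illustration, per-class rates of the kind reported above can be derived from a confusion matrix of test results. The disclosure does not state which definitions were used for its rates, so the one-vs-rest convention below is an assumption, and the small matrix of counts is a made-up toy example, not data from the described experiment.

```python
def per_class_rates(confusion, labels):
    """Compute one-vs-rest rates for each class.

    confusion[i][j] is the number of test images whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        rates[label] = {
            "recall": tp / (tp + fn),
            "false_negative_rate": fn / (tp + fn),
            "false_positive_rate": fp / (fp + tn),
        }
    return rates


labels = ["skin", "bone", "muscle", "fat"]
# Toy counts for illustration only (10 test images per class).
toy_confusion = [
    [9, 0, 1, 0],
    [0, 10, 0, 0],
    [1, 0, 8, 1],
    [0, 0, 2, 8],
]
rates = per_class_rates(toy_confusion, labels)
for label in labels:
    print(label, rates[label])
```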
[0050]
[0051] (PC), a tablet computer, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single computing device is illustrated, the term “computing device” shall also be taken to include any collection of machines (e.g., computers) that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
[0052] The example computing device 900 includes a processing device 902, a main memory 904 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a static memory 906 (e.g., flash memory, static random access memory (SRAM), etc.), and a secondary memory (e.g., a data storage device 918), which communicate with each other via a bus 930.
[0053] Processing device 902 represents one or more general-purpose processors such as a microprocessor, central processing unit, or the like. More particularly, the processing device 902 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 902 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. Processing device 902 is configured to execute the processing logic (instructions 922) for performing the operations and steps discussed herein.
[0054] The computing device 900 may further include a network interface device 908. The computing device 900 also may include a video display unit 910 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 912 (e.g., a keyboard or touchscreen), a cursor control device 914 (e.g., a mouse or touchscreen), and an audio device 916 (e.g., a speaker).
[0055] The data storage device 918 may include one or more machine-readable storage media (or more specifically one or more non-transitory computer-readable storage media) 928 on which is stored one or more sets of instructions 922 embodying any one or more of the methodologies or functions described herein. The instructions 922 may also reside, completely or at least partially, within the main memory 904 and/or within the processing device 902 during execution thereof by the computing device 900, the main memory 904 and the processing device 902 also constituting computer-readable storage media.
[0056] The various methods described above may be implemented by a computer program. The computer program may include computer code arranged to instruct a computer to perform the functions of one or more of the various methods described above. The computer program and/or the code for performing such methods may be provided to an apparatus, such as a computer, on one or more computer readable media or, more generally, a computer program product. The computer readable media may be transitory or non-transitory. The one or more computer readable media could be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, for example for downloading the code over the Internet. Alternatively, the one or more computer readable media could take the form of one or more physical computer readable media such as semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random-access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R/W or DVD.
[0057] In an implementation, the modules, components and other features described herein can be implemented as discrete components or integrated in the functionality of hardware components such as ASICs, FPGAs, DSPs or similar devices.
[0058] A “hardware component” is a tangible (e.g., non-transitory) physical component (e.g., a set of one or more processors) capable of performing certain operations and may be configured or arranged in a certain physical manner. A hardware component may include dedicated circuitry or logic that is permanently configured to perform certain operations. A hardware component may be or include a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. A hardware component may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations.
[0059] Accordingly, the phrase “hardware component” should be understood to encompass a tangible entity that may be physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
[0060] In addition, the modules and components can be implemented as firmware or functional circuitry within hardware devices. Further, the modules and components can be implemented in any combination of hardware devices and software components, or only in software (e.g., code stored or otherwise embodied in a machine-readable medium or in a transmission medium).
[0061] Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “receiving”, “determining”, “comparing”, “enabling”, “maintaining”, “identifying”, “obtaining”, “taking”, “classifying”, “training”, “associating”, “providing”, “detecting”, “analysing”, “rendering” or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
[0062] It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other implementations will be apparent to those of skill in the art upon reading and understanding the above description. Although the present disclosure has been described with reference to specific example implementations, it will be recognized that the disclosure is not limited to the implementations described but can be practiced with modification and alteration within the spirit and scope of the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense. The scope of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.