Automated crab meat picking system and method
11337432 · 2022-05-24
Assignee
Inventors
- Yang Tao (North Potomac, MD, US)
- Robert Vinson (Rockville, MD, US)
- Dongyi Wang (College Park, MD, US)
- Maxwell Holmes (Washington, DC, US)
- Gary E. Seibel (Westminster, MD, US)
Cpc classification
G06V20/70
PHYSICS
G06V10/454
PHYSICS
A22C29/025
HUMAN NECESSITIES
International classification
A22C29/02
HUMAN NECESSITIES
Abstract
A vision-guided intelligent system for automated crab meat picking operates in a fully automated or a semi-automatic mode of operation using a crab meat picking routine based on (a) the CNN-model-based back-fin knuckle localization algorithm, and/or (b) the Deep Learning model which accurately locates not only knuckle positions, but also crab legs and crab cores, with high pixel accuracy (up to 0.9843) and low computation time. The subject system uses the concept of analyzing crab morphologies obtained from digital crab images and, using a Deep Learning architecture integrated in the system, segments crab images into five regions of interest in a single step with high accuracy and efficiency. The image segmentation results are used for generating crab cut lines in the XYZ and angular directions, determining starting cutting points in the Z plane, and guiding cutting tools and end effectors to automatically cut crabs and harvest crab meat.
Claims
1. A system for automatic crab processing, comprising: a conveyor belt having a first end and a second end thereof, a plurality of crabs being loaded on the conveyor belt at said first end thereof in a spaced apart relationship one respective to another with a predetermined distance therebetween, said conveyor belt being controllably linearly displaced in a stepped fashion with a step pitch corresponding to said predetermined distance between crabs; a dual-modality imaging sub-system disposed in alignment with said conveyor belt in proximity to said first end thereof, said dual-modality imaging sub-system is to obtain one or more images of each of said crabs traveling on said conveyor belt, and said dual-modality imaging sub-system is to generate a respective 3-D crab model for each of said crabs based, at least in part, on 2-D crab morphology information and 3-D laser information obtained for respective said crabs; at least one cutting station positioned along said conveyor belt downstream from said dual-modality imaging sub-system, said at least one cutting station being equipped with a controllable cutting tool for automatic crab cutting; a computer system operatively coupled to said dual-modality imaging sub-system for receiving said images of the crabs and processing said images; an image processing sub-system integrated with said computer system and configured for processing said images, said image processing sub-system generating instructions relative to at least a cutline of each of said crabs based on the respective 2-D crab morphology acquired from each of said images; and a control sub-system operatively coupled to said computer system, said at least one cutting station, and said conveyor belt to control and synchronize motion of said conveyor belt and operation of said cutting tool at said at least one cutting station in accordance with said at least cutline output by said image processing sub-system for automatic crab cutting.
2. The system of claim 1, further comprising a meat removal station disposed downstream of said at least one cutting station and equipped with at least one controllable end effector tool for automatic crab meat removal, wherein said control sub-system is further operatively coupled to said meat removal station to control and synchronize said at least one end effector tool at said meat removal station for automatic meat removal.
3. The system of claim 1, wherein said image processing sub-system is a Convolutional Neural Network (CNN) processing sub-system configured to determine, based on color information extracted from said images, a back-fin knuckle position, and to generate said cutline in the XY plane based on the back-fin knuckle position, said cutline being transmitted from said image processing sub-system to said control sub-system and said at least one cutting station to actuate said cutting tool in accordance therewith.
4. The system of claim 1, wherein said image processing sub-system includes a Deep Learning model-based segmentation processing sub-system and a post-segmentation processing sub-system, wherein said Deep Learning model-based segmentation processing sub-system is adapted for in-one-step segmentation of a respective one of said images into a plurality of regions-of-interest (ROIs), including at least a conveyor background, crab legs, crab core, back-fin knuckles, and crab back bones, and wherein said post-segmentation processing sub-system is adapted for computing the cutline in at least the XY plane using position of said back-fin knuckles via a template matching routine, and for locating crab chamber meat based on position of a crab core.
5. The system of claim 1, wherein said dual-modality imaging sub-system includes a Red-Green-Blue (RGB) color imaging sub-system and a 3-dimensional (3-D) laser imaging sub-system, wherein said RGB color imaging sub-system generates the 2-D crab morphology information of respective said crabs, including a crab size and a crab orientation on the conveyor belt, and wherein said 3-D laser imaging sub-system generates the 3-D laser information of respective said crabs.
6. The system of claim 5, wherein said at least one cutting station includes: a vertical cutting station for cutting each of the crabs along lump meat walls of each respective crab to reveal jumbo meat components based on respective 2-D crab morphology information of each crab, and an angled cutting station for performing an angular cut based on a 3-D crab model of a respective crab to expose small compartments of the respective crab's body, wherein each of said vertical cutting station and said angled cutting station are equipped with said cutting tool configured with a waterjet nozzle for performing cuts in XYZ directions.
7. The system of claim 6, wherein said control sub-system includes a set of digitally controlled, at a predetermined motion speed and precision, linear motors operatively coupled to said vertical cutting station, an angular servo-motor operatively coupled to said angled cutting station, and a step motion servo-motor operatively coupled to said conveyor belt, wherein said control sub-system tracks a position of each of said crabs at said vertical cutting station, said angled cutting station, and a meat removal station.
8. The system of claim 7, wherein said control sub-system includes an encoder sub-system coupled to said computer system and generating a clock signal processed by said computer system to synchronize and track the operation of said conveyor belt, said dual-modality imaging sub-system, said at least one cutting station, and said meat removal station, and wherein said image processing sub-system, said vertical cutting station, said angled cutting station, and said control sub-system share said instructions generated by said image processing sub-system.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(31) Referring to
(32) The cutting stations are positioned downstream of the imaging system 18, and may be represented by an X,Y vertical cutting station 20 and an angular cutting station 22.
(33) The meat removal station 24 is located downstream of the cutting stations 20 and 22 and may be positioned proximal to the discharge edge 16 of the conveyor belt 12. The meat removal station 24 is adapted for removal of the lump meat and chamber meat which is revealed by cutting the crabs either at the XY/vertical cutting station 20, or at the station 20 and the angular cutting station 22.
(34) The subject system 10 can operate either in a fully automated mode of operation or in a semi-automatic mode of operation. In the fully automated mode of operation, the meat removal station 24 automatically picks the crab meat following the cutting station(s) 20, 22, thus totally replacing human involvement in the process. In the semi-automatic mode of operation, manual meat picking can be used after the crab is automatically cut at the cutting station(s) 20, 22 along the crab body boundary and the top cover to reveal the crab meat chambers. In the semi-automatic mode of operation, the subject system permits human co-robot activity, where the machine performs the cutting while human operators are involved in the manual crab meat picking. In commercial settings, either the fully automated or the semi-automatic mode of operation may be chosen, depending on the facilities' financial abilities, labor availability, costs, etc.
(35) The entrance edge 14 serves to load the crabs 26 on the conveyor belt 12 upstream of the vision station 18. Prior to loading onto the conveyor belt 12, crabs 26 are steamed and cooled to solidify the protein according to the current industrial crab meat picking practice. All crabs may be de-backed to remove the top shell and internal organs.
(36) The conveyor belt 12 is, for example, a mesh conveyor formed from stainless steel which is driven by a servo motor 28 in a stepped fashion. As shown in
(37) Each cutting station, either the XY cutting station 20 (performing vertical cuts in the XY plane) or the angular cutting station 22, is equipped with a cutting tool, which may be represented, for example, by a high-pressure water-jet cutting tool. The cutting tool is a robotic sub-system which is configured with a high-pressure water-jet hose 30 (at the XY cutting station 20) and a high-pressure water-jet hose 32 (at the angular cutting station 22) which serve to supply water to water-jet nozzles (knives) 34 and 36, respectively. The water-jet knife 34 (referred to herein also as the XYZ water-jet knife) at the XY cutting station 20 is controlled to make vertical cuts (Z direction) in two directions in the XY (horizontal) plane. The water-jet knife 36 (referred to herein also as the XYZ plus angular knife) at the angular cutting station 22 has more degrees of freedom, and makes a two-directional horizontal XY cut, a vertical cut, and an angular cut.
(38) To be able to make an angular cut, the water-jet knife 36 is controllably positioned to assume a required angular orientation relative to the X, Y and Z axes, and is actuated for the cutting action by the subject control sub-system (as will be detailed in further paragraphs).
(39) The meat removal station 24 is equipped with end effectors 38, 40 such as, for example, spoon-like end effectors, and comb-like end effectors (schematically shown in
(40) For example, the end effector 38 may be adapted for lump meat removal and can move in XYZ directions, and rotate. The end effector 40 may be adapted for chamber meat removal, and can be controllably displaced in XYZ directions, as well as rotate, as needed for the crab chamber meat removal.
(41) Subsequent to the chamber and lump meat removal at the meat removal station 24, the crab meat travels on a pair of meat belts 42 which extend from the meat removal station 24 to the discharge edge 16 to carry the removed lump and chamber meat to a crab meat packing station 44 (schematically shown in
(42) Referring to
(43) The vision station (sub-system) 18 includes, for example, a Charge-Coupled Device (CCD) camera 46 for two-dimensional (2-D) RGB (red-green-blue) image acquisition. The vision station 18 is also equipped with a laser scanning system 48 for 3-D laser scanning of the crabs 26 while the crabs travel on the conveyor belt 12 from the entrance edge 14 through the vision station 18. A ring-like light source 50 (which may be equipped with numerous LEDs 52) is mounted underneath the CCD camera 46 to provide stable lighting.
(44) The entire vision sub-system 18 is enclosed in an IP 69 waterproof food grade enclosure 54 for food sanitation safety purposes and electronic protection.
(45) The autonomous crab meat picking system 10 further includes a computer system 56 (best presented in
(46) The subject system 10 further includes a control sub-system 60 which is operatively coupled to the computer system 56 to control the operation of the mechanical sub-system 62 for executing operation of the cutting tools, end effectors, and conveyor belt, as well as other involved mechanical parts, in a highly synchronized fashion. The control sub-system 60 supports the operation of the entire system 10 in accordance with the results of the image processing by the algorithm 58 running on the computer sub-system 56.
(47) The mechanical sub-system 62 in the context of the subject system 10, includes the mechanisms supporting the operation of the conveyor belt 12, robotics of the cutting tools and crab meat removing end effectors. Various sub-systems of the subject system 10 cooperate with each other in a highly synchronized order. Various actuators, such as motors (servo-motors) adapted to move the conveyor belt 12, the cutting tools 34, 36, and the end effectors 38, 40, as well as other associated parts of the system 10 cooperating with each other to support the operation of the subject system, are controlled by the control sub-system 60 in accordance with instructions generated by the image processing sub-system 58.
(48) The control sub-system 60 also controls the motion of the 3-D scanning laser sub-system 48 of the vision station 18, and synchronizes the image taking routine by the CCD camera 46 (and the laser scanning system 48) with other routines executed in the system.
(49) The control sub-system 60 may be a customized SST LinMot robotic gantry motion system equipped with controllers permitting 4 degrees of freedom, for the XYZ and angular displacement of the water-jet knives 34, 36 and the end effectors 38, 40 at the cutting stations 20, 22 and meat removal station 24, respectively.
(50) For synchronization and tracking of the operation of the system 10, the control sub-system 60 further includes an encoder 64 operating in cooperation with the servo-motor 28 which actuates the stepwise displacement of the conveyor belt 12. The readings of the encoder 64 are transmitted to the computer system 56 to be processed and are used by the control sub-system 60 to control the system operation.
(51) The control sub-system 60 operates in correspondence with the algorithm 58 embedded in the computer system 56, and controls the operation of the conveyor belt 12, the laser system 48, the cutting tools 34, 36, and the end effectors 38, 40, as well as the meat belts 42, in accordance with the results of the image processing computed by the algorithm 58.
(52) The control sub-system 60 cooperates with the computer system 56 and the algorithm 58 therein, as well as the encoder 64, to synchronize the operation of various parts of the mechanical sub-system 62.
(53) As shown in
(54) The control sub-system 60 further cooperates with the clock 76 in accordance with which the cutting curves generated at the cutting curves sub-system 72 are supplied to the crab cut processing sub-system 78, which controls the cutting trajectories of the water-jet knives 34 and 36 at the cutting stations 20 and 22, respectively, to cut the crabs, in accordance with the generated cut lines.
(55) The computer system 56 coordinates the operation of the clocks 68, 70, 76, in accordance with the main block 66, to synchronize and control the operation of the entire system 10.
(56) The subject system 10 is envisioned for operation in at least two modes of operation. For this purpose, as shown in
(57) In accordance with the CNN for back-fin knuckles detection routine 59, as schematically shown in
(58) The RGB camera 46, which preferably is a CCD camera, and the laser system 48 are subjected to a calibration process, and their operation is synchronized by the control sub-system 60 using the clocks 68 and 70, as shown in
(59) The output of the laser system 48, i.e., the 3-D laser images 82, carries the crab height information which is used for generation of the vertical cut line. The 2-D RGB images 80 are processed for back-fin knuckles detection in the back-fin knuckles detection sub-system 84, the results of which are used (as will be detailed in further paragraphs) for computing the horizontal (XY plane) cut line through application of the cut line template routine in the cut line template processor sub-system 86.
(60) Subsequent to the image processing in the cutline template processing sub-system 86 and the back-fin knuckle detection processing sub-system 84, the cutline in the XY plane is computed in the cutline-in-XY-axis processing sub-system 88. The cutline in the XY axis is supplied from the processing sub-system 88 to the mechanical system 62 which is controlled by the control sub-system 60 to cut the crab for the crab case exposure at the cutting stations 20 and 22, and for meat picking at the meat removal station 24.
(61) As shown in
(63) Although most crabs share similar morphological structures, it is impossible to directly calculate the cutline from the images because of the noise found around the crab's main body, which is produced by the eggs, lung tissues, and leg joints. However, the pair of backfin knuckles 100 (shown in
(64) Preliminary studies resulted in the finding that the distance (L) between two knuckles and the distance (W) between the mid-point of the junction cartilage and the knuckle line 102 (shown in
(65) In the present system, a template matching routine is used to generate the cutline in the XY plane, based on the crabs' morphological information and the knuckle locations acquired via the subject imaging method.
(66) In the digital crab images, the knuckles are small and amorphous objects. Small object detection is challenging, since there may be an overwhelming number of misleading objects. The basic solution to this detection challenge may be transformation of a global detection problem into a local detection problem. In the subject system, a localization process is used which rules out a large percentage of noise and offers a rough target region where small objects are much easier to detect.
(67) The localization strategy may be based on human experience. However, such an approach is not reliable. For this reason, the present algorithm 59 uses a sliding window routine to slice the picture (image) into small pieces (sub-images, or regions) and subsequently designs a classification model to determine the presence of a target object (back-fin knuckle) in each small sub-image.
(68) The classification model in question is contemplated to accept various features related to the small target as an input, including color, shape, or both.
(69) Conventional classification models may be used, which may include principal component analysis (PCA), support vector machines (SVM), and discrimination models. Artificial neural networks (ANNs) are another classification model acceptable for the subject system.
(70) The Convolutional Neural Network (CNN) model, which improves on the traditional ANN classification models, is considered for application in the subject system 10. The CNN classification model uses the internal properties of 2D images and shares weights to avoid the overfitting problem to some degree.
(71) With the development of highly powerful and efficient Graphics Processing Units (GPUs) and the advent of deep networks, CNNs have altered the course of computer vision development. CNNs utilize numerous (millions of) hidden features to achieve complicated image analysis tasks. Region-based CNNs and versions thereof have been widely used in the object detection area in recent years to find a particular set of objects in an image. However, the performance of CNNs is highly dependent on the image training dataset. Lack of a sufficient training data volume in a specific area may hamper CNN application in industry.
(72) The subject system and method support the steps of: (a) crab image acquisition, (b) image processing, (c) knuckle sub-image dataset preparation, (d) network training and validation, (e) final knuckle positions determination, and (f) cutline generation in the XY plane.
(73) In the present system 10, as shown in the flow chart depicted in
(74) If the sub-image is determined in Step 4 to contain a knuckle, the CNN classifier localizes a rough region for it in Step 5. To obtain exact knuckle positions in Step 7, an additional k-means clustering routine is applied in Step 6 to the rough region in order to cluster pixels in the rough region based on their color.
(75) The exact position generated in Step 7 is subsequently used in Step 8 for the cutline generation, which, in turn, is supplied to the control sub-system 60 for the cutting tool actuation in Step 9.
(76) Statistical data will be presented in the following paragraphs to demonstrate the results of the subject CNN-guided knuckle detection routine. The sub-image classification accuracy attained in the subject system is comparable with SVM and ANN methods. The final detection performance after k-means clustering is compared with clustering results based on traditional localization method established upon human-level assumptions and complex blob analysis.
(77) Experiment Material and Image Acquisition
(78) As shown in
(79) The mechanical operation sub-system 62 included a set of waterjet knives producing a 1 mm stream at 30,000 PSI, which makes sharp and fine bladeless cuts.
(80) The basic concept supporting the knuckle segmentation approach is the use of the red color information to separate the knuckles from the crab body. However, as knuckles are small objects relative to the whole image, a localization strategy is needed in advance to determine a rough knuckle region and to avoid the obfuscating objects such as eggs and legs. In the rough knuckle region, pixel-level clustering based on red color information extracts the final knuckle location.
(81) For knuckle localization, the easiest procedure is to manually set up standards. For example, it can be assumed that knuckles will be located near the crab-background boundary. Every crab usually has two knuckles, and they have a particular distance to the crab centroid. Most of these standards rely on human assumptions and are unreliable. The localization results usually are unable to ensure that a knuckle is the main part of the region and other red objects may still exist in the region.
(82) An alternative concept for localization of the knuckles, used in the subject system, is to train a classifier to indicate whether the target object is present in a region. In this implementation of the subject system, the binary knuckle detection classifier is trained on thousands of sub-images, both containing knuckles and devoid of knuckles. The size of the sub-images is predetermined to be larger than the size of the back-fin knuckles. The size of the sub-images is also chosen to ensure that the knuckle can be the key part in the sub-image. In the experiments and studies of the subject system performance, the size of the sub-images was chosen to be 81 pixels×81 pixels, with a pixel resolution of 0.2 mm×0.2 mm per pixel.
(83) After obtaining the rough knuckle positions in Step 5 (shown in
(84) Image preprocessing, as shown in
(85) For the background removal, the Otsu algorithm is applied. The Otsu algorithm automatically determines a threshold to separate the conveyor belt and the crab body. The saturation value of the conveyor belt is much lower than the saturation levels for a crab. The Otsu algorithm operates to maximize the inter-class variance based on the saturation gray scale histogram, and divides the pixels into two classes. A typical image after the conveyor belt background removal is shown in
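As a minimal illustration of this background-removal step, the following Python sketch (not the patent's actual code; the function name and the use of OpenCV are assumptions) applies Otsu thresholding to the saturation channel of an 8-bit image:

```python
# Illustrative sketch only: Otsu thresholding on the saturation channel to
# separate the low-saturation conveyor-belt background from the crab body.
import cv2

def remove_belt_background(bgr_image):
    """Return the crab image with belt pixels zeroed out, plus the crab mask."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1]
    # Otsu picks the threshold maximizing the inter-class variance of the
    # saturation histogram; belt pixels fall below it, crab pixels above it.
    _, mask = cv2.threshold(saturation, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(bgr_image, bgr_image, mask=mask), mask
```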
(86) To determine the crab body centroid in Step C (of
(87) Finally, the watershed routine is applied to the distance transformation result in Step F. The principle of the watershed algorithm is to merge pixels with similar gray scale value into a group. In this application, because the leg pixels are closer to the background than the crab body pixels, the watershed method based on the distance transformation results in a specific grouping for the crab body, as shown in
(88) Some other alternative image processing methods could also remove the legs. However, in this case, accurate leg removal and centroid location are not necessary for the following processes, and the updated centroid located between the two knuckles is good enough for the subsequent work.
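A hedged sketch of the distance-transform-plus-watershed idea is given below; the marker placement and the two-pixel border band are illustrative assumptions rather than the patent's exact routine:

```python
# Illustrative sketch: keep the watershed basin that contains the deepest point
# of the distance transform (the crab body) and discard the thin leg regions.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def keep_crab_body(mask):
    """mask: boolean foreground mask obtained after belt-background removal."""
    distance = ndi.distance_transform_edt(mask)
    markers = np.zeros(mask.shape, dtype=np.int32)
    body_seed = np.unravel_index(np.argmax(distance), distance.shape)
    markers[body_seed] = 1                        # seed inside the thick crab body
    markers[(distance > 0) & (distance < 2)] = 2  # seed band near the boundary/legs
    labels = watershed(-distance, markers, mask=mask)
    body_mask = labels == 1
    cy, cx = ndi.center_of_mass(body_mask)        # updated centroid between knuckles
    return body_mask, (cx, cy)
```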
(89) CNN Based Crab Knuckle Localization
(90) Dataset Preparation
(91) Ninety full-sized (1200×800 pixel) background-removed RGB crab images with a pixel resolution of 0.2 mm×0.2 mm per pixel were used in the training dataset. Another set of 43 full-sized background-removed images from a different batch of crabs were used for comparison. Using a different batch of crabs for the comparison images can validate the generalization of the classification model effectively.
(92) All 133 images were knuckle-labeled. In particular, other leg and crab core joints were also marked with a different label, which was helpful in the subsequent sub-image selection operation. An 81 pixels×81 pixels sliding window sliced the full-sized image into small sub-images (regions). The size of the sliding window was determined by the actual knuckle size in the raw image. It should be larger than the size of the knuckles, but it also needs to ensure that the knuckle is the key part in the sub-image. The application of the sliding window started from the (0,0) pixel of a full-sized image and moved in the X/Y direction with a stride of 5 pixels.
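The sliding-window slicing can be pictured with the short Python sketch below (illustrative only; the generator name is hypothetical):

```python
# 81x81 sliding window with a 5-pixel stride, starting from pixel (0, 0)
# of a background-removed full-sized crab image.
WINDOW = 81   # larger than a knuckle, yet small enough that a knuckle dominates
STRIDE = 5

def slice_sub_images(image):
    """Yield (x, y, sub_image) for every window position in an HxWx3 array."""
    h, w = image.shape[:2]
    for y in range(0, h - WINDOW + 1, STRIDE):
        for x in range(0, w - WINDOW + 1, STRIDE):
            yield x, y, image[y:y + WINDOW, x:x + WINDOW]
```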
(93) All sub-images were classified into three types intuitively, i.e., the background (all black pixels), the crab (all color pixels), and the boundary (including both color and background pixels). Knuckles always belong to boundary sub-images, and thus background and crab sub-images were discarded, and were not included in the training and test datasets.
(94) For the boundary sub-images, the ones with more than 80×40.5 color pixels were regarded as effective sub-images, because the opposite case cannot ensure that the knuckle is the main part of the sub-image. Therefore, the opposite cases were also completely discarded in advance and were not included in the training and test datasets.
(95) Returning to
(96) The dataset generated from the above process still had several times more non-knuckle sub-images than knuckle sub-images. To ensure the training set has a comparable scale of positive and negative training samples, some non-knuckle images were discarded. Instead of discarding randomly, a larger percentage of non-knuckle images with leg and crab core components were kept, and a smaller percentage of images with only meat or only leg components was kept. This process was implemented with the previously mentioned leg and crab core joint labels. This strategy permits obtaining better test results when new images are obtained. The final dataset included 21,509 knuckle sub-images and 50,021 non-knuckle sub-images.
(97) In the model training procedures, the sub-images generated from 68 full-sized training images (75% of the 90 images in the training dataset) were used for training. The sub-images from the 22 remaining full-sized training images (25% of the 90 training images) were used for validation.
(98) CNN Architecture and Network Training
(99) The basic idea of the CNN architecture is to utilize hundreds of convolutional filters to describe hidden features in an image. Modern CNN architectures integrate convolutional layers with maxpooling layers and rectified linear unit (ReLU) layers, establishing the idea of a Deep Neural Network.
(100) In the CNN architecture 59 (shown in
(101) The mathematic expression of the three layers is shown in Eq. 1, where K is a (2N+1)×(2N+1) filter matrix and I represents a sub-image sample.
(102) Generally, a CNN architecture is equipped with a couple of fully-connected layers to perform the final classification, similar to the traditional ANN model.
(104) The subject CNN architecture follows the idea of the VGG model developed by the Visual Geometry Group of the University of Oxford. It uses 3×3 convolutional filters in each layer instead of larger filters to construct a deeper network. The subject network architecture is shown in Table 1. In the experiments, some classical architectures, including AlexNet, GoogLeNet, VGG and ResNet, were used. For each network, different numbers of layers, different numbers of convolutional filters and different sizes of convolutional layers were tested. There were no obvious differences among the best results of the different models. The presented architecture shows the best experimental result.
(105) In the training procedure, adaptive gradient descent is used to minimize the cross entropy (training loss) as Eq. 2 shows, where y′ is the true distribution and y is the predicted probability distribution.
(106) In the binary classification application, for a non-knuckle sample, y′ is [1, 0], and for knuckle detection, y′ is [0, 1], where y represents the probability of the two classes, which is computed by a sigmoid function.
(107) The learning rate was set as 0.001, and it automatically decreases based on past gradients without additional parameters needed.
(108) TABLE 1: Knuckle detection CNN architecture
Layer | Parameters | Output size
Input Data | N/A | 81 × 81 × 3
Conv 1 | kernel size: 3, stride: 1 | 79 × 79 × 64
Conv 2 | kernel size: 3, stride: 1 | 77 × 77 × 64
Conv 3 | kernel size: 3, stride: 1 | 75 × 75 × 64
Conv 4 | kernel size: 3, stride: 1 | 73 × 73 × 64
Max Pool 1 | kernel size: 2, stride: 2 | 36 × 36 × 64
Conv 5 | kernel size: 3, stride: 1 | 34 × 34 × 128
Conv 6 | kernel size: 3, stride: 1 | 32 × 32 × 128
Max Pool 2 | kernel size: 2, stride: 2 | 16 × 16 × 128
Conv 7 | kernel size: 3, stride: 1 | 14 × 14 × 256
Conv 8 | kernel size: 3, stride: 1 | 12 × 12 × 256
FC 1 | N/A | 2048
FC 2 | N/A | 2048
FC 3 (compute the loss and accuracy) | N/A | 2
H_y′=−Σy′×log(y) (Eq. 2)
(109) The batch size for training was 256, and the network converged after 5,000 iterations. The cross entropy and accuracy curves are shown in
(110) The training accuracy may float up and down at every training step, but the general tendency is upward. Conversely, the cross entropy (loss) goes down. A 99.6% training accuracy and a 98.7% validation accuracy were attained. The CNN architecture was implemented in both Caffe and TensorFlow with an Nvidia GeForce Titan X GPU.
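For orientation, a minimal Keras sketch of a VGG-style stack following Table 1 is shown below; it is not the patent's code, and details such as the ReLU activations, the softmax output, and the Adagrad optimizer (suggested by the description of a learning rate that decays automatically based on past gradients) are assumptions:

```python
# Sketch of the Table 1 stack: valid-padded 3x3 convolutions, two max-pooling
# stages, and three fully-connected layers ending in a two-class output.
from tensorflow import keras
from tensorflow.keras import layers

def build_knuckle_classifier():
    model = keras.Sequential([
        keras.Input(shape=(81, 81, 3)),
        layers.Conv2D(64, 3, activation="relu"),    # 79 x 79 x 64
        layers.Conv2D(64, 3, activation="relu"),    # 77 x 77 x 64
        layers.Conv2D(64, 3, activation="relu"),    # 75 x 75 x 64
        layers.Conv2D(64, 3, activation="relu"),    # 73 x 73 x 64
        layers.MaxPooling2D(2),                     # 36 x 36 x 64
        layers.Conv2D(128, 3, activation="relu"),   # 34 x 34 x 128
        layers.Conv2D(128, 3, activation="relu"),   # 32 x 32 x 128
        layers.MaxPooling2D(2),                     # 16 x 16 x 128
        layers.Conv2D(256, 3, activation="relu"),   # 14 x 14 x 256
        layers.Conv2D(256, 3, activation="relu"),   # 12 x 12 x 256
        layers.Flatten(),
        layers.Dense(2048, activation="relu"),
        layers.Dense(2048, activation="relu"),
        layers.Dense(2, activation="softmax"),      # knuckle vs. non-knuckle
    ])
    model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Hypothetical usage with the batch size quoted above:
# model = build_knuckle_classifier()
# model.fit(x_train, y_train, batch_size=256, validation_data=(x_val, y_val))
```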
(111) Polling and the Rough Knuckle Position Determination
(112) After the CNN model is well-trained, the rough knuckle positions in full-sized test images are ready to be determined in Step 5 of
(113) For a new crab image, all preprocessing procedures were performed in advance. In the test phase, an 81×81 sliding window was still needed to slice the test images into sub-images, and the subject CNN model simply determined whether an effective sub-image contained a knuckle or was devoid of one.
(114) The size of test crab image was 800×1200, and the test sliding window started from the coordinate (0, 0) to (720, 1120) with a 5-pixel stride at both x and y axes.
(115) The subject CNN model can obtain 99% training and validation accuracy. However, both of the accuracy values are based on sub-images 108 (shown in
(116) To rule out the incorrect prediction cases, an additional polling routine was applied in Step 5 (shown in
(119) K-Means Clustering in Rough Knuckle Region
(120) K-means clustering is a classic unsupervised learning method, which operates to cluster pixels based on minimizing the total within-class distance, which is expressed in Eq. 4, where k is the number of clustering types, x is a d-dimensional feature vector, and u_i is the mean vector in each group.
J=Σ_{i=1}^{k} Σ_{x∈S_i} ∥x−u_i∥^2, where S_i denotes the i-th pixel group (Eq. 4)
(122) The a* value in L*a*b* color space is a sensitive feature to describe knuckle colors, as shown in
(123) In the experiment, the binary k-means clustering routine in the a* channel was executed twice in Step 6 (shown in
(124) The final knuckle segmentation results generated in Step 7 of
(125) Template Matching Based on Knuckle Location
(126) Based on the fact that different crabs share a similar morphological structure, as shown in
(127) The template matching requires the execution of three sequential routines, i.e., rotation, scaling and translation. First, the rotation, based on Eq. 5, is performed. Then scaling is performed, based on Eq. 6. To simplify the process, the coordinates were transformed to polar coordinates, the scaling was executed by multiplying the radius r (in polar coordinates) by d, and the coordinates were changed back. Finally, the translation was conducted based on both the left knuckle and the right knuckle via computing the average translation distance. The cutline 122 generated from the template matching in Step 8 of
(128) Rotation:
x_r=(x−c_x)cos θ−(y−c_y)sin θ+c_x
y_r=(x−c_x)sin θ+(y−c_y)cos θ+c_y (Eq. 5)
(129) Scaling:
x_s=d(x_r−c_x)+c_x
y_s=d(y_r−c_y)+c_y (Eq. 6)
(131) where the rotation factor θ is determined by the angle between the template knuckle line and the knuckle line, d is the ratio of the template knuckle line to the knuckle line, (x, y) is a point in the template image, (x_r, y_r) is the coordinate after rotation, (x_s, y_s) is the coordinate after scaling, and (c_x, c_y) is the coordinate of the template centroid 112.
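For illustration, Eqs. 5 and 6 plus the final translation can be applied to the template cutline with the short sketch below (a hedged reading of the procedure, assuming the scaling is taken about the template centroid):

```python
# Sketch of the template-matching geometry: rotate the template cutline about
# the template centroid, scale it by the knuckle-line ratio d, then translate
# it by the average of the left/right knuckle displacements.
import numpy as np

def transform_template(points, centroid, theta, d, translation):
    """points: Nx2 array of template cutline coordinates; theta in radians."""
    cx, cy = centroid
    x, y = points[:, 0] - cx, points[:, 1] - cy
    # Eq. 5: rotation about the template centroid (coordinates kept relative here).
    xr = x * np.cos(theta) - y * np.sin(theta)
    yr = x * np.sin(theta) + y * np.cos(theta)
    # Eq. 6 (as described in the text): scale the radius by d about the centroid.
    xs, ys = d * xr + cx, d * yr + cy
    # Translation by the average left/right knuckle offset (tx, ty).
    return np.stack([xs + translation[0], ys + translation[1]], axis=1)
```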
(132) The cutline template was generated based on the preliminary studies presented in previous paragraphs. In the crab industry, there is no standard to qualify the quality of the cutline ground truth. The resolution of the subject waterjet system is 1 mm, which permits a 5-pixel error in the images. With the cutline in the XY plane, the present system opens the chitin walls to expose the jumbo meat for picking.
(133) The subject CNN-based knuckle determination approach was tested on the prepared test dataset, which included numerous full-sized images. The sub-image classification ability of the subject CNN model was validated, and its performance was compared with the traditional SVM model and the ANN model.
(134) Ten thousand (10,000) test sub-images were generated from the 43 images in accordance with the procedure presented in the previous paragraphs for the training dataset. Only boundary sub-images were included in the test dataset, rather than background and crab sub-images. In the test dataset, the ratio of positive to negative samples followed the values in the training dataset, which can reflect the ratio in real applications.
(135) The SVM and ANN models were trained on the same training dataset of 90 images. To avoid the overfitting problem of the SVM and ANN models and to accelerate the training process, Principal Component Analysis (PCA) was conducted on the image data in advance, and the first 500 components were chosen to describe the image features. For the SVM, three common kernels, including the polynomial kernel, the Gaussian kernel and the sigmoid kernel, were tested separately. After adjusting the free parameters in the three kernels, experiments showed the Gaussian kernel obtained the best result, with 96.7% training accuracy and 88.1% test accuracy.
(136) For the ANN model, the number of hidden layers was set from 1 to 3 separately. Experiments showed a two-hidden-layer network with 20 and 5 neurons in the hidden layers achieved the best balance between the training and test accuracy, with 95.2% training accuracy and 85.7% test accuracy.
(137) There were 86 knuckles in the 43 test images. After the polling process, the rough knuckle determination accuracy of the CNN model reached 97.67%. Compared to the manual localization process, the method demonstrates great advantages.
(138) In the manual localization process, background removal and watershed-based leg removal are conducted in advance, and an assumption is made that knuckles are located near the lower boundary of the crab body. In practice, the distance computation is achieved by a KD-tree data structure. In this region, binary k-means clustering on the a* channel is applied once to rule out crab body pixels. The remaining regions include the back shell, knuckles, and legs. Based on the constraints of blob size, distance to the crab centroid, and blob eccentricity, two regions can be selected as the left and right knuckles, and the final localization accuracy can be 82.56%, which is inferior to the CNN-based localization.
(139) A qualification experiment was conducted to describe the final knuckle segmentation result. A Jaccard index J(A, B) was defined as the metric, whose mathematical expression is presented in Eq. 7, where A is the segmentation result of the present method, B is the segmentation ground truth, and the cardinal number of a region represents its area.
J(A,B)=|A∩B|/|A∪B| (Eq. 7)
(141) Applying the k-means clustering presented in the previous paragraphs to the 84 correct rough knuckle regions acquired by the subject CNN model, every region attains a Jaccard value, and the average is 0.91. Because the subject clustering routine is pixel-based clustering, it may be affected by some impulse noise. However, a 0.91 Jaccard index is sufficient for the cutline determination.
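The Jaccard metric of Eq. 7, applied to binary pixel masks, amounts to the following few lines of Python (illustrative only):

```python
# Pixel-mask Jaccard index (Eq. 7): intersection area over union area.
import numpy as np

def jaccard(segmentation_mask, ground_truth_mask):
    """Both arguments are boolean arrays of the same shape."""
    intersection = np.logical_and(segmentation_mask, ground_truth_mask).sum()
    union = np.logical_or(segmentation_mask, ground_truth_mask).sum()
    return intersection / union if union else 1.0
```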
(142) As shown in
(143) The watershed algorithm can effectively eliminate the problem, as shown in
(144) In extreme cases, for example, if there are crab bodies which have a saturation value comparably low to that of the background and are connected to the background, the watershed method may fail and would split the crab body into two or more pieces. In the present test dataset, this condition does not exist, since the crab pixels and conveyor belt pixels show significant differences in the saturation channel, and the background removal routine demonstrates a high precision with a 0.97 Jaccard value (Eq. 7).
(145) In practice, the light source and camera settings need to be adjusted in different work environments. If the watershed fails, the crabs need to be rejected and reprocessed to clean the dirt manually.
(146) The Strategy for Training Data Preparation
(147) When preparing the training dataset, some negative samples should be discarded to obtain comparable numbers of negative (non-knuckle) and positive (knuckle) sub-image samples. Non-effective sub-images can be discarded. Effective sub-images can be divided into three types: only the background (type 1), only the crab body and background (type 2), and leg, body and background (type 3).
(148) All type 1 and type 2 sub-images are negative samples, and the type 3 images comprise both negative samples and positive samples. For a classification machine, the type 3 sub-images are more likely to produce false positive cases when new test images are obtained, which can further affect the determination of the rough knuckle positions.
(149) Instead of discarding non-knuckle sub-images randomly, more negative samples belonging to type 3 tend to be kept. The weight map in the polling step is shown in
(150) Selection of k-Means Parameters
(151) In general, the clustering method results rely greatly on the selection of features and the relationships among different features. The representativeness and robustness of features are the key factors to obtain the ideal clustering results. In the subject knuckle detection application, the a* channel in L*a*b* color space is the most reliable feature to describe the knuckle pixels compared to RGB values.
(152) There are two potential problems in considering the R, G, B values as clustering parameters. First, the R, G, B values of the crab body pixels in sub-images are very random. They could be very dark or very bright, and thus can affect the clustering results.
(153) The second critical problem accompanying the RGB based clustering is that when the clustering is finished, there is not a reliable way to determine which class represents the knuckles. However, if the clustering is based on a* value, it is always true that the class with largest a* value represents the knuckle.
(154) To determine the number of clustering classes in the a*-based clustering, an additional experiment was conducted based on the subject dataset. There are three possible parts in the rough knuckle position, including the leg, the knuckle, and the crab body. Knuckle pixels have the highest a* value, legs are intermediate, and the crab body has the lowest a* value. The a* value differences between the knuckle and the crab body are obvious, but between the knuckle and the leg they are insignificant.
(155) To choose adequate clustering classes, single 2-class clustering and single 3-class clustering were validated. 2-class clustering means all the pixels in the rough knuckle region would be clustered into two classes. Single 2-class clustering separates the crab main body pixels from the others. Similarly, 3-class clustering was expected to separate the leg, the knuckle and the crab main body apart. Between the 2-class clustering and the 3-class clustering, 2-class clustering performs better.
(156) In order to execute the segmentation routine more precisely and to further separate the knuckle pixels from the trivial leg pixels, a second 2-class clustering was conducted based on the first clustering result, returning the class with the highest a* value. The main problem with performing the 2-class clustering is that it can result in some over-segmentation cases and weaken the final performance. Therefore, the second clustering is regarded as an optional step in practice, and is conducted only when the a* value difference between the two classes is large enough (>10). An average 0.91 Jaccard value can be achieved using the constrained 2-class clustering twice in the 84 correct rough knuckle regions.
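A hedged sketch of this constrained two-round clustering is given below; the use of OpenCV's Lab conversion and scikit-learn's KMeans, and the function names, are assumptions made only for illustration:

```python
# Two rounds of binary k-means on the a* channel; the reddest class (largest a*)
# is kept each time, and the second round is applied only when the two class
# centers are separated by more than 10 a* units, to avoid over-segmentation.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def _reddest_class(a_values):
    """One 2-class k-means round; return (membership of reddest class, center gap)."""
    km = KMeans(n_clusters=2, n_init=10).fit(a_values.reshape(-1, 1))
    centers = km.cluster_centers_.ravel()
    return km.labels_ == int(np.argmax(centers)), abs(centers[0] - centers[1])

def segment_knuckle(region_bgr):
    """region_bgr: rough knuckle region (e.g., 81x81); returns a knuckle mask."""
    a_channel = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2LAB)[:, :, 1].astype(np.float32)
    mask = np.ones(a_channel.shape, dtype=bool)

    # First round: separate the crab main-body pixels from the redder pixels.
    keep, _ = _reddest_class(a_channel[mask])
    new_mask = np.zeros_like(mask)
    new_mask[mask] = keep
    mask = new_mask

    # Optional second round: split knuckle pixels from the remaining leg pixels.
    keep, gap = _reddest_class(a_channel[mask])
    if gap > 10:
        new_mask = np.zeros_like(mask)
        new_mask[mask] = keep
        mask = new_mask
    return mask
```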
(157) Although most crabs share similar morphological structures, complicated and amorphous crab components, such as crab legs, crab cores, eggs, heart, and stomach, are difficult to describe via traditional image features, including color and textures. The subject system, in an alternative implementation, uses the Deep Learning methodology as a powerful image processing tool. Compared to traditional image processing techniques, the Deep Learning method can extract many describable and undescribable image features. The applications of deep learning in the agriculture area are very limited, and most research focuses on cereal and plant image classifications. However, the use of the Deep Learning concept for crab meat picking has never been proposed in the related communities.
(158) As shown in
(159) The crab processing system 10 (as shown in
(160) Referring again to
(161) The cutting station 22 performs the angular tilted cut based on 3D crab information acquired by the 3-D scanning laser system 48. The operation of the angular cutting station 22 is adapted to remove the fin chamber meat cartilage to expose small compartments hidden below the cartilages.
(162) Each gantry station 20, 22 is mounted with a water-jet knife system 34, 36, each configured with a water-jet hose 30, 32, respectively, and having a very small water-jet nozzle (serving as a knife) that is displaceable in XYZ directions. In the exemplary implementation, the stainless waterjet nozzle 34, 36 weighs about 2 lbs, and has a small orifice (about 0.006″) that ensures the high cut resolution.
(163) The control sub-system 60 includes a set of linear motors 31, which can be digitally controlled at the motion speed of 1,000 mm/sec at 0.01 mm precision to produce cutting action for the knife 34 at the cutting station 20. An additional angular servo motor 29 is integrated in the angular cutting station 22 to actuate an angular displacement of the water-jet nozzle (knife) 36 in XYZ directions to perform angular tilt cuts.
(164) The fourth station of the system 10 (also referred to herein as the meat removal station 24) performs meat picking by manipulating the end effectors 38, 40. For lump meat, spoon end effectors can mimic the hand-picking trajectory and conduct the meat harvesting from the crab back-fin knuckles, which ensures the integrity of the lump meat. For chamber meat, the comb-like end effector is actuated to brush off the meat.
(165) The mesh conveyor 12 is driven by the servomotor 28 running in step motion. The control sub-system 60 is adapted to control the motion of the conveyor belt 12 by a number of steps in distance (for example, 6″ per crab pitch, with no crab overlap) in accordance with instructions generated by the image processing system 58, particularly the Deep Learning Processor 130 embedded therein. The processor 58 knows the exact position of each object (crab), based on FIFO principles, at the different stations.
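The FIFO bookkeeping can be pictured with the hypothetical sketch below, which assumes, purely for illustration, that consecutive stations are one crab pitch apart so that every conveyor step shifts each crab to the next station:

```python
# Hypothetical FIFO tracking of crab records through the stations; the record
# captured at the vision station simply shifts downstream one slot per step.
from collections import deque

STATIONS = ("vision", "vertical_cut", "angular_cut", "meat_removal")

class CrabTracker:
    def __init__(self):
        self.pipeline = deque(maxlen=len(STATIONS))   # index 0 = vision station

    def step(self, new_crab_record):
        """Call once per conveyor step with the cutlines / 3-D model computed
        for the crab that has just arrived at the vision station."""
        self.pipeline.appendleft(new_crab_record)
        # After the shift, pipeline[i] is the crab currently at STATIONS[i].
        return dict(zip(STATIONS, self.pipeline))
```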
(166) The present design ensures that four working stations (the imaging station 18, the cutting stations 20, 22, and the meat removal station 24) cooperate in substantially parallel fashion and the stations 20, 22, 24 can share the information captured from the vision station 18.
(167) The shaft encoder 64 provides the clock signal of the system 10, whose resolution may be, for example, 10,000 pulses/revolution. For the conveyor's shaft of 2″ in diameter, the conveyor tracking resolution can reach 0.016 mm/pulse. An encoder control board (not shown in the Drawings) is plugged in a PCIe bus slot in the computer system (PC) 56 to handle the interface functions between the encoder 64 and the software 58.
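The quoted tracking resolution follows directly from the shaft geometry, as the short check below shows:

```python
# 2-inch shaft circumference divided by 10,000 encoder pulses per revolution.
import math

shaft_diameter_mm = 2 * 25.4
resolution_mm_per_pulse = math.pi * shaft_diameter_mm / 10_000
print(round(resolution_mm_per_pulse, 3))   # ~0.016 mm/pulse
```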
(168) Deep Learning is an emerging concept for application in the artificial intelligence area. As a branch of the Deep Learning concept, the Convolutional Neural Network (CNN) is presented in the previous paragraphs as a processing sub-system 59 of the subject algorithm 58. A CNN trains hundreds of convolutional filters by itself, which enables the subject system to excavate numerous potential image features and to combine them nonlinearly. Compared to traditional image feature operators, like HOG, Gabor, and GLCM, a CNN can extract more image features. From another aspect, different from the traditional artificial neural network architecture, the convolution operations in a CNN decrease the risk of overfitting by utilizing the image spatial information.
(169) The ongoing Deep Learning research in agriculture applications focuses on image classifications, and some applications target image segmentation. A typical CNN architecture for a classification problem integrates convolutional layers to extract image features, maxpooling layers for down-sampling and avoiding overfitting, and activation layers to combine image features nonlinearly. At the end of these layers, the high-dimensional image features are regarded as the input of a traditional neural network to implement image classifications, and the final output of a classification problem is a class label.
(170) Comparatively, the result of the segmentation routine includes the localization information. Its output is expected to be an image, whose size should be the same as the input image, and each pixel of the image is assigned to a category.
(171) The Fully Convolutional Network (FCN) started the line of research using CNN models to achieve the image segmentation task. It makes use of down-sampling CNN architectures to extract image features. Subsequently, up-sampling strategies are applied to these features to recover a well-segmented image. This concept originates from a special neural network named the auto-encoder (AE), which is an unsupervised learning strategy. Its architecture comprises two serially connected sub-networks, i.e., the encoder network and the decoder network. The input of the encoder network is the raw data, and the outputs of the encoder network are high-dimensional data features. These data features are further fed into the decoder network, and the output of the decoder network is the raw data, identical to the input of the encoder network. In essence, the AE is a dimension reduction tool. It can automatically extract some low-dimensional features which can well represent the raw data.
(172) Compared to classic non-linear dimension reduction methods, like kernel PCA, the AE shows great superiority. The convolutional autoencoder is an updated version of the AE, which makes use of the advantages of CNNs to increase the robustness of the AE.
(173) The FCN can be regarded as a special convolutional autoencoder, but it is a supervised learning method. Instead of recovering the raw image, the output of the FCN is a manually labelled, well-segmented image. The FCN transforms the segmentation problem into a pixel-level classification problem, and every pixel in the image will be classified into a specific class. In the experiments and study of the subject system, the pixels have been classified into five classes.
(174) Network Architecture, Dataset Preparation and Network Training
(175) The problem with the FCN is that directly training the upsampling process is difficult without a large dataset. Research shows that, instead of rudimentary decoding of the image features from the encoder network, sharing some information from the encoding process can improve the performance of the segmentation network.
(176) U-net is a symmetric network architecture proposed in the medical imaging community to perform the binary cell segmentation task. The U-net copies and concatenates the features from the encoding layers to their corresponding upsampling decoding layers, as shown in
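A reduced Keras sketch of this encoder-decoder idea is shown below; the depth, filter counts, and use of simple upsampling are assumptions, and only the copy-and-concatenate skip connections and the five-class per-pixel softmax reflect the description above:

```python
# Small U-net-style network: a down-sampling encoder whose feature maps are
# concatenated into the up-sampling decoder, ending in a 5-class softmax
# (background, legs, crab core, back-fin knuckles, back bones).
from tensorflow import keras
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(800, 800, 3), n_classes=5):
    inputs = keras.Input(shape=input_shape)
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)
    c3 = conv_block(p2, 128)                            # bottleneck
    u2 = layers.UpSampling2D(2)(c3)
    c4 = conv_block(layers.concatenate([u2, c2]), 64)   # skip connection
    u1 = layers.UpSampling2D(2)(c4)
    c5 = conv_block(layers.concatenate([u1, c1]), 32)   # skip connection
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(c5)
    return keras.Model(inputs, outputs)
```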
(178) In the experiments, 90 well-labelled crab images (800×800 pixels), shown, for example, in
(179) In the experiment, non-risky elastic deformation strategies (such as translation, mirroring, and random cropping) were directly conducted on the training and validation dataset. Some other deformation strategies (such as rotation, shear stretch, and zoom in and out) were conducted with constraints.
(180) For the rotation strategy, because there is a mechanical component in the machine to hold down crabs, the image rotation range was set at ±25 degrees. For the shear stretch strategy, which is relatively rarely implemented in practice, its range was set at ±10 degrees. For the zoom in and out strategy, the crab size in the augmented images was set at ±20%.
(181) In these methods, pixel displacements were computed using bi-cubic interpolation. Other common data augmentation operations in the Deep Learning area (such as color augmentation or contrast enhancement) were not useful because the light source of the vision station 18 is stable, the camera settings are adjustable, and the vision system is corrected regularly. By random generation of the deformation parameters for a single crab raw image, 300 images were selected from the augmented images, and the labelled images were deformed correspondingly. Finally, there was a total of 9,000 images used for network training, 3,000 images were used for validation, and their RGB values were normalized to the 0-1 range.
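These constrained deformations can be approximated with Keras' ImageDataGenerator, as in the sketch below; the shift amounts are illustrative, a shared random seed keeps the image and label-map transforms synchronized, and the generator's interpolation differs from the bi-cubic scheme mentioned above:

```python
# Constrained augmentation sketch: rotation within +/-25 degrees, shear within
# +/-10 degrees, zoom within +/-20%, plus translation and mirroring.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = dict(rotation_range=25,
           shear_range=10,
           zoom_range=0.2,
           width_shift_range=0.1,     # illustrative translation amount
           height_shift_range=0.1,
           horizontal_flip=True)

image_gen = ImageDataGenerator(**aug)
label_gen = ImageDataGenerator(**aug)
# image_flow = image_gen.flow(raw_images, batch_size=1, seed=42)
# label_flow = label_gen.flow(label_maps, batch_size=1, seed=42)
```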
(182) The subject network was implemented using Python Keras library with Nvidia GeForce Titan X GPU. The training process used the adaptive gradient descent algorithm Adam to optimize the loss function.
(183) The loss function, in accordance with Equation 8, was set as the pixel-level categorical cross entropy, where y is the predicted probability distribution, generated by the Softmax layer, and y′ is the ground truth distribution, expressed in one-hot format with a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0).
(184) The batch size of the training process was set at 4 to make full use of the graphics card memory. The best model on the validation dataset was saved for subsequent utilization.
H_y′=−Σy′×log(y) (Eq. 8)
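Under the same assumptions as the earlier network sketch, the training setup described here reduces to a few Keras calls (the file name and epoch count are placeholders):

```python
# Adam optimizer, pixel-level categorical cross entropy (Eq. 8) against one-hot
# label maps, batch size 4, keeping the best model seen on the validation set.
from tensorflow.keras.callbacks import ModelCheckpoint

model = build_unet()   # from the earlier U-net sketch
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
checkpoint = ModelCheckpoint("best_crab_unet.h5", monitor="val_loss",
                             save_best_only=True)
# model.fit(train_images, train_onehot_labels, batch_size=4, epochs=50,
#           validation_data=(val_images, val_onehot_labels),
#           callbacks=[checkpoint])
```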
(185) The crab core was XY cut at the cutting station 20. The core cut was based on the image segmentation results and was guided by a template matching concept detailed in previous paragraphs.
(186) The Deep Learning segmentation model 130 is integrated into the crab processing system 10. The Deep Learning segmentation architecture 130 may operate as an alternative mode of operation to the knuckle localization processing approach 59, or both modes of operation may be used.
(187) As shown in
(188) TABLE 2: Network Training Metrics
Dataset | Classification Loss | Classification Accuracy
Training dataset | 0.0289 | 0.9892
Validation dataset | 0.0196 | 0.9943
(189) The test phase was performed online in Step 3. In order to seamlessly integrate the trained Python network model into a C++ project, the Python class was wrapped into C++ code with the C++ Boost library. The segmentation results computed in Step 4 were transformed to a C++ array for generating the crab cutline in Step 5, and were coordinated with other image acquisition codes and motion control codes.
(191) To quantify the segmentation performance of the model, 50 images in the test dataset were segmented and their results were compared with the ground truth. The segmentation routine can be considered as a pixel-level classification problem. The multi-class confusion matrix was established based on the pixels, as shown in Table 3. The total number of test pixels is 32,000,000 and the average pixel-level accuracy was 0.9843.
(192) TABLE 3: Multi-class confusion matrix for the test dataset (GT: ground truth, P: predicted results; columns are ground-truth classes, rows are predicted classes)
 | Background (GT) | Legs (GT) | Crab Core (GT) | Backfin Knuckles (GT) | Back Bones (GT)
Background (P) | 23675208 (99.16%) | 143084 (3.62%) | 43328 (1.15%) | 7735 (3.46%) | 13961 (8.05%)
Legs (P) | 148957 (0.62%) | 3784552 (95.78%) | 25944 (0.69%) | 3176 (1.42%) | 0 (0%)
Crab Core (P) | 36417 (0.15%) | 20840 (0.53%) | 3685632 (97.61%) | 7920 (3.54%) | 6641 (3.83%)
Backfin Knuckles (P) | 7830 (0.03%) | 2616 (0.07%) | 10313 (0.27%) | 199628 (89.29%) | 1247 (0.72%)
Back Bones (P) | 7577 (0.03%) | 0 (0%) | 10804 (0.29%) | 5112 (2.29%) | 151478 (87.39%)
Total Pixels | 23875989 | 3951092 | 3776021 | 223571 | 173327
Pixel-level Accuracy | 0.9916 | 0.9578 | 0.9761 | 0.8929 | 0.8739
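The per-class pixel accuracies in Table 3 correspond to reading the diagonal of a pixel-level confusion matrix against each ground-truth class total; a small sketch of that evaluation (using scikit-learn, an assumption) is given below:

```python
# Pixel-level evaluation sketch: flatten predicted and ground-truth label maps,
# build a 5x5 confusion matrix, and report per-ground-truth-class accuracy.
import numpy as np
from sklearn.metrics import confusion_matrix

CLASSES = ["Background", "Legs", "Crab Core", "Backfin Knuckles", "Back Bones"]

def pixel_metrics(gt_maps, pred_maps):
    """gt_maps, pred_maps: integer label arrays of identical shape."""
    cm = confusion_matrix(np.ravel(gt_maps), np.ravel(pred_maps),
                          labels=list(range(len(CLASSES))))
    per_class_accuracy = np.diag(cm) / cm.sum(axis=1)   # per ground-truth class
    overall_accuracy = np.diag(cm).sum() / cm.sum()
    return cm, per_class_accuracy, overall_accuracy
```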
(193) Crab backfin knuckles were the key components in determining the crab cutline in the XY plane. To further quantify the performance of the knuckle segmentation, the largest two knuckle blobs were extracted to rule out possible noise. Based on the centroid position of the crab core, the two knuckles can be determined as the left knuckle and the right knuckle for both the ground truth segmentation and the predicted segmentation. Comparison of the predicted and ground truth knuckle positions, including the X and Y positions of both the left and right knuckles, showed a high correlation between the predicted position and the ground truth position. At a 5-pixel resolution, 81% of the knuckles were found correctly. At a 10-pixel resolution, 97% of the knuckles were found correctly.
(194) The distance between two knuckles is representative of the size of the crab. With 0.2 mm pixel resolution of the subject system, the accuracy of the predicted knuckle distance achieved was about 1 mm.
(195) The subject network architecture shows great potential for the precision of the automatic crab meat picking: (a) the Deep Learning processing approach segments different crab components in parallel, and the network model is capable of balancing the segmentation accuracy of the different classes of crab body segments. Specifically, for the centroid of the crab core, the average deviation between the predicted centroid and the labelled centroid attained was 12.37 vs. 11.94 pixels. (b) The U-net model (Deep Learning) is an all-in-one segmentation model and does not need any additional configurations in the test phase, and thus attains an extremely short computation time (about 241 ms).
(196) In the subject Patent Application, an automated crab picking system is disclosed which uses intelligent vision sensors to guide high-pressure waterjets and other end effectors to expose and to harvest crab meat. The subject processing architecture focuses on understanding crab morphology based on 2D crab images. To achieve the subject system objectives, the Deep Learning model is integrated into the system. It can automatically segment the crab images into, for example, five ROIs (Regions-of-Interest), with high speed and high accuracy, including (1) conveyor background, (2) crab legs, (3) crab core, (4) back-fin knuckles and (5) crab back bones. Among them, the crab back-fin knuckles determine the crab cutline in the XY plane via a template matching strategy. The crab core can be used for locating the chamber meat of the crab in the subsequent processing steps. The subject system can operate based on the patch-based knuckle detection algorithm, and/or based on the Deep Learning algorithm model which can segment different crab components in a single step.
(197) For the Deep Learning model, the computation time cost for a single test image decreases about 50-fold compared to the patch-based knuckle detection. The average pixel classification accuracy on the test dataset can reach 0.9843. As the initial step of the crab meat picking machine, the Deep Learning model guides the machine to understand 2-D crab morphologies and is integrated into the system seamlessly, which can meet the accuracy and efficiency requirements of a real-time crab machine.
(198) Although this invention has been described in connection with specific forms and embodiments thereof, it will be appreciated that various modifications other than those discussed above may be resorted to without departing from the spirit or scope of the invention as defined in the appended claims. For example, functionally equivalent elements may be substituted for those specifically shown and described, certain features may be used independently of other features, and in certain cases, particular locations of elements, steps, or processes may be reversed or interposed, all without departing from the spirit or scope of the invention as defined in the appended claims.