Automated crab meat picking system and method
11337432 · 2022-05-24
Assignee
Inventors
- Yang Tao (North Potomac, MD, US)
- Robert Vinson (Rockville, MD, US)
- Dongyi Wang (College Park, MD, US)
- Maxwell Holmes (Washington, DC, US)
- Gary E. Seibel (Westminster, MD, US)
Cpc classification
G06V20/70
PHYSICS
G06V10/454
PHYSICS
A22C29/025
HUMAN NECESSITIES
International classification
A22C29/02
HUMAN NECESSITIES
Abstract
A vision-guided intelligent system for automated crab meat picking operates in a fully automated or a semi-automatic mode of operation using a crab meat picking routine based on (a) the CNN-model-based back-fin knuckle localization algorithm, and/or (b) the Deep Learning model which accurately locates not only knuckle positions, but also crab legs and crab cores, with high pixel accuracy (up to 0.9843) and low computation time. The subject system uses the concept of analyzing crab morphologies obtained from digital crab images and, using a Deep Learning architecture integrated in the system, segments crab images into five regions of interest in a single step with high accuracy and efficiency. The image segmentation results are used for generating crab cut lines in the XYZ and angular directions, determining starting cutting points in the Z plane, and guiding cutting tools and end effectors to automatically cut crabs and harvest crab meat.
Claims
1. A system for automatic crab processing, comprising: a conveyor belt having a first end and a second end thereof, a plurality of crabs being loaded on the conveyor belt at said first end thereof in a spaced apart relationship one respective to another with a predetermined distance therebetween, said conveyor belt being controllably linearly displaced in a stepped fashion with a step pitch corresponding to said predetermined distance between crabs; a dual-modality imaging sub-system disposed in alignment with said conveyor belt in proximity to said first end thereof, said dual-modality imaging sub-system is to obtain one or more images of each of said crabs traveling on said conveyor belt, and said dual-modality imaging sub-system is to generate a respective 3-D crab model for each of said crabs based, at least in part, on 2-D crab morphology information and 3-D laser information obtained for respective said crabs; at least one cutting station positioned along said conveyor belt downstream from said dual-modality imaging sub-system, said at least one cutting station being equipped with a controllable cutting tool for automatic crab cutting; a computer system operatively coupled to said dual-modality imaging sub-system for receiving said images of the crabs and processing said images; an image processing sub-system integrated with said computer system and configured for processing said images, said image processing sub-system generating instructions relative to at least a cutline of each of said crabs based on the respective 2-D crab morphology acquired from each of said images; and a control sub-system operatively coupled to said computer system, said at least one cutting station, and said conveyor belt to control and synchronize motion of said conveyor belt and operation of said cutting tool at said at least one cutting station in accordance with said at least cutline output by said image processing sub-system for automatic crab cutting.
2. The system of claim 1, further comprising a meat removal station disposed downstream of said at least one cutting station and equipped with at least one controllable end effector tool for automatic crab meat removal, wherein said control sub-system is further operatively coupled to said meat removal station to control and synchronize said at least one end effector tool at said meat removal station for automatic meat removal.
3. The system of claim 1, wherein said image processing sub-system is a Convolutional Neural Network (CNN) processing sub-system configured to determine, based on color information extracted from said images, a back-fin knuckle position, and to generate said cutline in the XY plane based on the back-fin knuckle position, said cutline being transmitted from said image processing sub-system to said control sub-system and said at least one cutting station to actuate said cutting tool in accordance therewith.
4. The system of claim 1, wherein said image processing sub-system includes a Deep Learning model-based segmentation processing sub-system and a post-segmentation processing sub-system, wherein said Deep Learning model-based segmentation processing sub-system is adapted for in-one-step segmentation of a respective one of said images into a plurality of regions-of-interest (ROIs), including at least a conveyor background, crab legs, crab core, back-fin knuckles, and crab back bones, and wherein said post-segmentation processing sub-system is adapted for computing the cutline in at least the XY plane using position of said back-fin knuckles via a template matching routine, and for locating crab chamber meat based on position of a crab core.
5. The system of claim 1, wherein said dual-modality imaging sub-system includes a Red-Green-Blue (RGB) color imaging sub-system and a 3-dimensional (3-D) laser imaging sub-system, wherein said RGB color imaging sub-system generates the 2-D crab morphology information of respective said crabs, including a crab size and a crab orientation on the conveyor belt, and wherein said 3-D laser imaging sub-system generates the 3-D laser information of respective said crabs.
6. The system of claim 5, wherein said at least one cutting station includes: a vertical cutting station for cutting each of the crabs along lump meat walls of each respective crab to reveal jumbo meat components based on respective 2-D crab morphology information of each crab, and an angled cutting station for performing an angular cut based on a 3-D crab model of a respective crab to expose small compartments of the respective crab's body, wherein each of said vertical cutting station and said angled cutting station are equipped with said cutting tool configured with a waterjet nozzle for performing cuts in XYZ directions.
7. The system of claim 6, wherein said control sub-system includes a set of digitally controlled, at a predetermined motion speed and precision, linear motors operatively coupled to said vertical cutting station, an angular servo-motor operatively coupled to said angled cutting station, and a step motion servo-motor operatively coupled to said conveyor belt, wherein said control sub-system tracks a position of each of said crabs at said vertical cutting station, said angled cutting station, and a meat removal station.
8. The system of claim 7, wherein said control sub-system includes an encoder sub-system coupled to said computer system and generating a clock signal processed by said computer system to synchronize and track the operation of said conveyor belt, said dual-modality imaging sub-system, said at least one cutting station, and said meat removal station, and wherein said image processing sub-system, said vertical cutting station, said angled cutting station, and said control sub-system share said instructions generated by said image processing sub-system.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(31) Referring to
(32) The cutting stations are positioned downstream of the imaging system 18, and may be represented by an X,Y vertical cutting station 20 and an angular cutting station 22.
(33) The meat removal station 24 is located downstream of the cutting stations 20 and 22 and may be positioned proximal to the discharge edge 16 of the conveyor belt 12. The meat removal station 24 is adapted for removal of the lump meat and chamber meat which is revealed by cutting the crabs either at the XY/vertical cutting station 20, or at the station 20 and the angular cutting station 22.
(34) The subject system 10 can operate either in a fully automated mode of operation or in a semi-automatic mode of operation. In the fully automated mode of operation, the meat removal station 24 automatically picks the crab meat following the cutting station(s) 20, 22, thus totally replacing human involvement in the process. In the semi-automatic mode of operation, manual meat picking can be used after the crab is automatically cut at the cutting station(s) 20, 22 along the crab body boundary and the top cover to reveal the crab meat chambers. In the semi-automatic mode of operation, the subject system permits human co-robot activity, where the machine performs the cutting while human operators are involved in the manual crab meat picking. In commercial settings, either the fully automated or the semi-automatic mode of operation may be chosen, depending on the facilities' financial abilities, labor availability, costs, etc.
(35) The entrance edge 14 serves to load the crabs 26 on the conveyor belt 12 upstream of the vision station 18. Prior to loading onto the conveyor belt 12, crabs 26 are steamed and cooled to solidify the protein according to the current industrial crab meat picking practice. All crabs may be de-backed to remove the top shell and internal organs.
(36) The conveyor belt 12 is, for example, a mesh conveyor formed from stainless steel which is driven by a servo motor 28 in a stepped fashion. As shown in
(37) Each cutting station, either the XY cutting station 20 (performing vertical cuts in the XY plane) or the angular cutting station 22, is equipped with a cutting tool, which may be represented, for example, by a high-pressure water-jet cutting tool. The cutting tool is a robotic sub-system which is configured with a high-pressure water-jet hose 30 (at the XY cutting station 20) and a high-pressure water-jet hose 32 (at the angular cutting station 22) which serve to supply water to water-jet nozzles (knives) 34 and 36, respectively. The water-jet knife 34 (referred to herein also as the XYZ water-jet knife) at the XY cutting station 20 is controlled to make vertical cuts (Z direction) in two directions in the XY (horizontal) plane. The water-jet knife 36 (referred to herein also as the XYZ plus angular knife) at the angular cutting station 22 has more degrees of freedom, and makes a two-directional horizontal XY cut, a vertical cut, and an angular cut.
(38) To be able to make an angular cut, the water-jet knife 36 is controllably positioned to assume a required angular orientation relative to the X, Y and Z axes, and is actuated for the cutting action by the subject control sub-system (as will be detailed in further paragraphs).
(39) The meat removal station 24 is equipped with end effectors 38, 40 such as, for example, spoon-like end effectors, and comb-like end effectors (schematically shown in
(40) For example, the end effector 38 may be adapted for lump meat removal and can move in XYZ directions, and rotate. The end effector 40 may be adapted for chamber meat removal, and can be controllably displaced in XYZ directions, as well as rotate, as needed for the crab chamber meat removal.
(41) Subsequent to the chamber and lump meat removal at the meat removal station 24, the crab meat travels on a pair of meat belts 42 which extend from the meat removal station 24 to the discharge edge 16 to carry the removed lump and chamber meat to a crab meat packing station 44 (schematically shown in
(42) Referring to
(43) The vision station (sub-system) 18 includes, for example, a Charge-Coupled Device (CCD) camera 46 for two-dimensional (2-D) RGB (red-green-blue) image acquisition. The vision station 18 is also equipped with a laser scanning system 48 for 3-D laser scanning of the crabs 26 while the crabs travel on the conveyor belt 12 from the entrance edge 14 through the vision station 18. A ring-like light source 50 (which may be equipped with numerous LEDs 52) is mounted underneath the CCD camera 46 to provide stable lighting.
(44) The entire vision sub-system 18 is enclosed in an IP 69 waterproof food grade enclosure 54 for food sanitation safety purposes and electronic protection.
(45) The autonomous crab meat picking system 10 further includes a computer system 56 (best presented in
(46) The subject system 10 further includes a control sub-system 60 which is operatively coupled to the computer system 56 to control the operation of the mechanical sub-system 62 for executing operation of the cutting tools, end effectors, and conveyor belt, as well as other involved mechanical parts, in a highly synchronized fashion. The control sub-system 60 supports the operation of the entire system 10 in accordance with the results of the image processing by the algorithm 58 running on the computer sub-system 56.
(47) The mechanical sub-system 62 in the context of the subject system 10, includes the mechanisms supporting the operation of the conveyor belt 12, robotics of the cutting tools and crab meat removing end effectors. Various sub-systems of the subject system 10 cooperate with each other in a highly synchronized order. Various actuators, such as motors (servo-motors) adapted to move the conveyor belt 12, the cutting tools 34, 36, and the end effectors 38, 40, as well as other associated parts of the system 10 cooperating with each other to support the operation of the subject system, are controlled by the control sub-system 60 in accordance with instructions generated by the image processing sub-system 58.
(48) The control sub-system 60 also controls the motion of the 3-D scanning laser sub-system 48 of the vision station 18, and synchronizes the image taking routine by the CCD camera 46 (and the laser scanning system 48) with other routines executed in the system.
(49) The control sub-system 60 may be a customized SST LinMot robotic gantry motion system equipped with controllers permitting 4 degrees of freedom, for the XYZ and angular displacement of the water-jet knives 34, 36 and the end effectors 38, 40 at the cutting stations 20, 22 and meat removal station 24, respectively.
(50) For synchronization and tracking of the operation of the system 10, the control sub-system 60 further includes an encoder 64 operating in cooperation with the servo-motor 28 which actuates the stepwise displacement of the conveyor belt 12. The readings of the encoder 64 are transmitted to the computer system 56 to be processed and are used by the control sub-system 60 to control the system operation.
(51) The control sub-system 60 operates in correspondence with the algorithm 58 embedded in the computer system 56, and controls the operation of the conveyor belt 12, the laser system 48, the cutting tools 34, 36, and the end effectors 38, 40, as well as the meat belts 42, in accordance with the results of the image processing computed by the algorithm 58.
(52) The control sub-system 60 cooperates with the computer system 56 and the algorithm 58 therein, as well as the encoder 64, to synchronize the operation of various parts of the mechanical sub-system 62.
(53) As shown in
(54) The control sub-system 60 further cooperates with the clock 76 in accordance with which the cutting curves generated at the cutting curves sub-system 72 are supplied to the crab cut processing sub-system 78, which controls the cutting trajectories of the water-jet knives 34 and 36 at the cutting stations 20 and 22, respectively, to cut the crabs, in accordance with the generated cut lines.
(55) The computer system 56 coordinates the operation of the clocks 68, 70, 76, in accordance with the main block 66, to synchronize and control the operation of the entire system 10.
(56) The subject system 10 is envisioned for operation in at least two modes of operation. For this purpose, as shown in
(57) In accordance with the CNN for back-fin knuckles detection routine 59, as schematically shown in
(58) The RGB camera 46, which preferably is a CCD camera, and the laser system 48 are subjected to a calibration process, and their operation is synchronized by the control sub-system 60 using the clocks 68 and 70, as shown in
(59) The output of the laser system 48, i.e., the 3-D laser images 82, carries the crab height information which is used for generation of the vertical cut line. The 2-D RGB images 80 are processed for back-fin knuckles detection in the back-fin knuckles detection sub-system 84, the results of which are used (as will be detailed in further paragraphs) for computing the horizontal (XY plane) cut line through application of the cut line template routine in the cut line template processor sub-system 86.
(60) Subsequent to the image processing in the cutline template processing sub-system 86 and the back-fin knuckle detection processing sub-system 84, the cutline in the XY plane is computed in the cutline-in-XY-axis processing sub-system 88. The cutline in the XY axis is supplied from the processing sub-system 88 to the mechanical system 62 which is controlled by the control sub-system 60 to cut the crab for the crab case exposure at the cutting stations 20 and 22, and for meat picking at the meat removal station 24.
(61) As shown in
(63) Although most crabs share similar morphological structures, it is impossible to directly calculate the cutline from the images because of the noise found around the crab's main body, which is produced by the eggs, lung tissues, and leg joints. However, the pair of backfin knuckles 100 (shown in
(64) Preliminary studies resulted in the finding that the distance (L) between two knuckles and the distance (W) between the mid-point of the junction cartilage and the knuckle line 102 (shown in
(65) In the present system, a template matching routine is used to generate the cutline in the XY plane, based on the crabs' morphological information and the knuckle locations acquired via the subject imaging method.
(66) In the digital crab images, the knuckles are small and amorphous objects. Small object detection is challenging, since there may be an overwhelming number of misleading objects. The basic solution to this detection challenge may be transformation of a global detection problem into a local detection problem. In the subject system, a localization process is used which rules out a large percentage of noise and offers a rough target region where small objects are much easier to detect.
(67) The localization strategy may be based on human experience. However, such an approach is not reliable. For this reason, the present algorithm 59 uses a sliding window routine to slice the picture (image) into small pieces (sub-images, or regions) and subsequently designs a classification model to determine the presence of a target object (back-fin knuckle) in each small sub-image.
(68) The classification model in question is contemplated to accept various features related to the small target as an input, including color, shape, or both.
(69) Conventional classification models may be used, which may include principal component analysis (PCA), support vector machines (SVM), and discrimination models. Artificial neural networks (ANNs) are another classification model acceptable for the subject system.
(70) The Convolutional Neural Network (CNN) model, which improves on the traditional ANN classification models, is considered for application in the subject system 10. The CNN classification model uses the internal properties of 2D images and shares weights to avoid the overfitting problem to some degree.
(71) With the development of highly powerful and efficient Graphics Processing Units (GPUs) and the advent of deep networks, CNNs have altered the course of computer vision development. CNNs utilize numerous (millions of) hidden features to achieve complicated image analysis tasks. Region-based CNNs and versions thereof have been widely used in the object detection area in recent years to find a particular set of objects in an image. However, the performance of CNNs is highly dependent on the image training dataset. Lack of a sufficient training data volume in a specific area may hamper CNN application in industry.
(72) The subject system and method support the steps of: (a) crab image acquisition, (b) image processing, (c) knuckle sub-image dataset preparation, (d) network training and validation, (e) final knuckle positions determination, and (f) cutline generation in the XY plane.
(73) In the present system 10, as shown in the flow chart depicted in
(74) If the sub-image is determined in Step 4 to contain a knuckle, the CNN classifier localizes a rough region for it in Step 5. To obtain exact knuckle positions in Step 7, an additional k-means clustering routine is applied in Step 6 to the rough region in order to cluster pixels in the rough region based on their color.
(75) The exact position generated in Step 7 is subsequently used in Step 8 for the cutline generation, which, in turn, is supplied to the control sub-system 60 for the cutting tool actuation in Step 9.
(76) Statistical data will be presented in the following paragraphs to demonstrate the results of the subject CNN-guided knuckle detection routine. The sub-image classification accuracy attained in the subject system is comparable with SVM and ANN methods. The final detection performance after k-means clustering is compared with clustering results based on traditional localization method established upon human-level assumptions and complex blob analysis.
(77) Experiment Material and Image Acquisition
(78) As shown in
(79) The mechanical operation sub-system 62 included a set of waterjet knives producing a 1 mm stream at 30,000 PSI, which makes sharp and fine bladeless cuts.
(80) The basic concept supporting the knuckle segmentation approach is the use of the red color information to separate the knuckles from the crab body. However, as knuckles are small objects relative to the whole image, a localization strategy is needed in advance to determine a rough knuckle region and to avoid the obfuscating objects such as eggs and legs. In the rough knuckle region, pixel-level clustering based on red color information extracts the final knuckle location.
(81) For knuckle localization, the easiest procedure is to manually set up standards. For example, it can be assumed that knuckles will be located near the crab-background boundary. Every crab usually has two knuckles, and they have a particular distance to the crab centroid. Most of these standards rely on human assumptions and are unreliable. The localization results usually are unable to ensure that a knuckle is the main part of the region and other red objects may still exist in the region.
(82) An alternative concept for localization of the knuckles, used in the subject system, is to train a classifier to indicate whether the target object is present in a region. In this implementation of the subject system, the binary knuckle detection classifier is trained on thousands of sub-images, both containing knuckles and devoid of knuckles. The size of the sub-images is predetermined to be larger than the size of the back-fin knuckles. The size of the sub-images is also chosen to ensure that the knuckle can be the key part in the sub-image. In the experiments and studies of the subject system performance, the size of the sub-images was chosen to be 81 pixels×81 pixels, with a pixel resolution of 0.2 mm×0.2 mm per pixel.
(83) After obtaining the rough knuckle positions in Step 5 (shown in
(84) Image preprocessing, as shown in
(85) For the background removal, the Otsu algorithm is applied. The Otsu algorithm automatically determines a threshold to separate the conveyor belt and the crab body. The saturation value of the conveyor belt is much lower than the saturation levels for a crab. The Otsu algorithm operates to maximize the inter-class variance based on the saturation gray scale histogram, and divides the pixels into two classes. A typical image after the conveyor belt background removal is shown in
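As a minimal illustration of this background-removal step, the following Python sketch (not the patent's actual code; the function name and the use of OpenCV are assumptions) applies Otsu thresholding to the saturation channel of an 8-bit image:

```python
# Illustrative sketch only: Otsu thresholding on the saturation channel to
# separate the low-saturation conveyor-belt background from the crab body.
import cv2

def remove_belt_background(bgr_image):
    """Return the crab image with belt pixels zeroed out, plus the crab mask."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1]
    # Otsu picks the threshold maximizing the inter-class variance of the
    # saturation histogram; belt pixels fall below it, crab pixels above it.
    _, mask = cv2.threshold(saturation, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(bgr_image, bgr_image, mask=mask), mask
```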
(86) To determine the crab body centroid in Step C (of
(87) Finally, the watershed routine is applied to the distance transformation result in Step F. The principle of the watershed algorithm is to merge pixels with similar gray scale value into a group. In this application, because the leg pixels are closer to the background than the crab body pixels, the watershed method based on the distance transformation results in a specific grouping for the crab body, as shown in
(88) Some other alternative image processing methods could also remove the legs. However, in this case, accurate leg removal and centroid location are not necessary for the following processes, and the updated centroid located between the two knuckles is good enough for the subsequent work.
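A hedged sketch of the distance-transform-plus-watershed idea is given below; the marker placement and the two-pixel border band are illustrative assumptions rather than the patent's exact routine:

```python
# Illustrative sketch: keep the watershed basin that contains the deepest point
# of the distance transform (the crab body) and discard the thin leg regions.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def keep_crab_body(mask):
    """mask: boolean foreground mask obtained after belt-background removal."""
    distance = ndi.distance_transform_edt(mask)
    markers = np.zeros(mask.shape, dtype=np.int32)
    body_seed = np.unravel_index(np.argmax(distance), distance.shape)
    markers[body_seed] = 1                        # seed inside the thick crab body
    markers[(distance > 0) & (distance < 2)] = 2  # seed band near the boundary/legs
    labels = watershed(-distance, markers, mask=mask)
    body_mask = labels == 1
    cy, cx = ndi.center_of_mass(body_mask)        # updated centroid between knuckles
    return body_mask, (cx, cy)
```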
(89) CNN Based Crab Knuckle Localization
(90) Dataset Preparation
(91) Ninety full-sized (1200×800 pixel) background-removed RGB crab images with a pixel resolution of 0.2 mm×0.2 mm per pixel were used in the training dataset. Another set of 43 full-sized background-removed images from a different batch of crabs were used for comparison. Using a different batch of crabs for the comparison images can validate the generalization of the classification model effectively.
(92) All 133 images were knuckle-labeled. In particular, other leg and crab core joints were also marked with a different label, which was helpful in the subsequent sub-image selection operation. An 81 pixels×81 pixels sliding window sliced the full-sized image into small sub-images (regions). The size of the sliding window was determined by the actual knuckle size in the raw image. It should be larger than the size of the knuckles, but it also needs to ensure that the knuckle is the key part in the sub-image. The application of the sliding window started from the (0,0) pixel of a full-sized image and moved in the X/Y direction with a stride of 5 pixels.
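The sliding-window slicing can be pictured with the short Python sketch below (illustrative only; the generator name is hypothetical):

```python
# 81x81 sliding window with a 5-pixel stride, starting from pixel (0, 0)
# of a background-removed full-sized crab image.
WINDOW = 81   # larger than a knuckle, yet small enough that a knuckle dominates
STRIDE = 5

def slice_sub_images(image):
    """Yield (x, y, sub_image) for every window position in an HxWx3 array."""
    h, w = image.shape[:2]
    for y in range(0, h - WINDOW + 1, STRIDE):
        for x in range(0, w - WINDOW + 1, STRIDE):
            yield x, y, image[y:y + WINDOW, x:x + WINDOW]
```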
(93) All sub-images were classified into three types intuitively, i.e., the background (all black pixels), the crab (all color pixels), and the boundary (including both color and background pixels). Knuckles always belong to boundary sub-images, and thus background and crab sub-images were discarded, and were not included in the training and test datasets.
(94) For the boundary sub-images, the ones with more than 80×40.5 color pixels were regarded as effective sub-images, because the opposite case cannot ensure that the knuckle is the main part of the sub-image. Therefore, the opposite cases were also completely discarded in advance and were not included in the training and test datasets.
(95) Returning to
(96) The dataset generated from the above process still had several times more non-knuckle sub-images than knuckle sub-images. To ensure the training set has a comparable scale of positive and negative training samples, some non-knuckle images were discarded. Instead of discarding randomly, a larger percentage of non-knuckle images with leg and crab core components were kept, and a smaller percentage of images with only meat or only leg components was kept. This process was implemented with the previously mentioned leg and crab core joint labels. This strategy permits obtaining better test results when new images are obtained. The final dataset included 21,509 knuckle sub-images and 50,021 non-knuckle sub-images.
(97) In the model training procedures, the sub-images generated from 68 full-sized training images (75% of the 90 images in the training dataset) were used for training. The sub-images from the 22 remaining full-sized training images (25% of the 90 training images) were used for validation.
(98) CNN Architecture and Network Training
(99) The basic idea of the CNN architecture is to utilize hundreds of convolutional filters to describe hidden features in an image. Modern CNN architectures integrate convolutional layers with maxpooling layers and rectified linear unit (ReLU) layers, establishing the idea of a Deep Neural Network.
(100) In the CNN architecture 59 (shown in
(101) The mathematic expression of the three layers is shown in Eq. 1, where K is a (2N+1)×(2N+1) filter matrix and I represents a sub-image sample.
(102) Generally, a CNN architecture is equipped with a couple of fully-connected layers to perform the final classification, similar to the traditional ANN model.
(104) The subject CNN architecture follows the idea of the VGG model developed by the Visual Geometry Group of the University of Oxford. It uses 3×3 convolutional filters in each layer instead of larger filters to construct a deeper network. The subject network architecture is shown in Table 1. In the experiments, some classical architectures, including AlexNet, GoogLeNet, VGG and ResNet, were used. For each network, different numbers of layers, different numbers of convolutional filters and different sizes of convolutional layers were tested. There were no obvious differences among the best results of the different models. The presented architecture shows the best experimental result.
(105) In the training procedure, adaptive gradient descent is used to minimize the cross entropy (training loss) as Eq. 2 shows, where y′ is the true distribution and y is the predicted probability distribution.
(106) In the binary classification application, for a non-knuckle sample, y′ is [1, 0], and for knuckle detection, y′ is [0, 1], where y represents the probability of the two classes, which is computed by a sigmoid function.
(107) The learning rate was set as 0.001, and it automatically decreases based on past gradients without additional parameters needed.
(108) TABLE 1: Knuckle detection CNN architecture
Layer | Parameters | Output size
Input Data | N/A | 81 × 81 × 3
Conv 1 | kernel size: 3, stride: 1 | 79 × 79 × 64
Conv 2 | kernel size: 3, stride: 1 | 77 × 77 × 64
Conv 3 | kernel size: 3, stride: 1 | 75 × 75 × 64
Conv 4 | kernel size: 3, stride: 1 | 73 × 73 × 64
Max Pool 1 | kernel size: 2, stride: 2 | 36 × 36 × 64
Conv 5 | kernel size: 3, stride: 1 | 34 × 34 × 128
Conv 6 | kernel size: 3, stride: 1 | 32 × 32 × 128
Max Pool 2 | kernel size: 2, stride: 2 | 16 × 16 × 128
Conv 7 | kernel size: 3, stride: 1 | 14 × 14 × 256
Conv 8 | kernel size: 3, stride: 1 | 12 × 12 × 256
FC 1 | N/A | 2048
FC 2 | N/A | 2048
FC 3 (compute the loss and accuracy) | N/A | 2
H_y′=−Σy′×log(y) (Eq. 2)
(109) The batch size for training was 256, and the network converged after 5,000 iterations. The cross entropy and accuracy curves are shown in
(110) The training accuracy may float up and down at every training step, but the general tendency is upward. Conversely, the cross entropy (loss) goes down. A 99.6% training accuracy and a 98.7% validation accuracy were attained. The CNN architecture was implemented in both Caffe and TensorFlow with an Nvidia GeForce Titan X GPU.
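For orientation, a minimal Keras sketch of a VGG-style stack following Table 1 is shown below; it is not the patent's code, and details such as the ReLU activations, the softmax output, and the Adagrad optimizer (suggested by the description of a learning rate that decays automatically based on past gradients) are assumptions:

```python
# Sketch of the Table 1 stack: valid-padded 3x3 convolutions, two max-pooling
# stages, and three fully-connected layers ending in a two-class output.
from tensorflow import keras
from tensorflow.keras import layers

def build_knuckle_classifier():
    model = keras.Sequential([
        keras.Input(shape=(81, 81, 3)),
        layers.Conv2D(64, 3, activation="relu"),    # 79 x 79 x 64
        layers.Conv2D(64, 3, activation="relu"),    # 77 x 77 x 64
        layers.Conv2D(64, 3, activation="relu"),    # 75 x 75 x 64
        layers.Conv2D(64, 3, activation="relu"),    # 73 x 73 x 64
        layers.MaxPooling2D(2),                     # 36 x 36 x 64
        layers.Conv2D(128, 3, activation="relu"),   # 34 x 34 x 128
        layers.Conv2D(128, 3, activation="relu"),   # 32 x 32 x 128
        layers.MaxPooling2D(2),                     # 16 x 16 x 128
        layers.Conv2D(256, 3, activation="relu"),   # 14 x 14 x 256
        layers.Conv2D(256, 3, activation="relu"),   # 12 x 12 x 256
        layers.Flatten(),
        layers.Dense(2048, activation="relu"),
        layers.Dense(2048, activation="relu"),
        layers.Dense(2, activation="softmax"),      # knuckle vs. non-knuckle
    ])
    model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Hypothetical usage with the batch size quoted above:
# model = build_knuckle_classifier()
# model.fit(x_train, y_train, batch_size=256, validation_data=(x_val, y_val))
```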
(111) Polling and the Rough Knuckle Position Determination
(112) After the CNN model is well-trained, the rough knuckle positions in full-sized test images are ready to be determined in Step 5 of
(113) For a new crab image, all preprocessing procedures were performed in advance. In the test phase, an 81×81 sliding window was still needed to slice the test images into sub-images, and the subject CNN model simply determined whether an effective sub-image contained a knuckle or was devoid of one.
(114) The size of test crab image was 800×1200, and the test sliding window started from the coordinate (0, 0) to (720, 1120) with a 5-pixel stride at both x and y axes.
(115) The subject CNN model can obtain 99% training and validation accuracy. However, both of the accuracy values are based on sub-images 108 (shown in
(116) To rule out the incorrect prediction cases, an additional polling routine was applied in Step 5 (shown in
(119) K-Means Clustering in Rough Knuckle Region
(120) K-means clustering is a classic unsupervised learning method, which operates to cluster pixels based on minimizing the total within-class distance, which is expressed in Eq. 4, where k is the number of clustering types, x is a d-dimensional feature vector, and u_i is the mean vector in each group.
J=Σ_{i=1}^{k} Σ_{x∈S_i} ∥x−u_i∥^2, where S_i denotes the i-th pixel group (Eq. 4)
(122) The a* value in L*a*b* color space is a sensitive feature to describe knuckle colors, as shown in
(123) In the experiment, the binary k-means clustering routine in the a* channel was executed twice in Step 6 (shown in
(124) The final knuckle segmentation results generated in Step 7 of
(125) Template Matching Based on Knuckle Location
(126) Based on the fact that different crabs share a similar morphological structure, as shown in
(127) The template matching requires the execution of three sequential routines, i.e., rotation, scaling and translation. First, the rotation, based on Eq. 5, is performed. Then scaling is performed, based on Eq. 6. To simplify the process, the coordinates were transformed to polar coordinates, the scaling was executed by multiplying the radius r (in polar coordinates) by d, and the coordinates were changed back. Finally, the translation was conducted based on both the left knuckle and the right knuckle via computing the average translation distance. The cutline 122 generated from the template matching in Step 8 of
(128) Rotation:
x_r=(x−c_x)cos θ−(y−c_y)sin θ+c_x
y_r=(x−c_x)sin θ+(y−c_y)cos θ+c_y (Eq. 5)
(129) Scaling:
x_s=d(x_r−c_x)+c_x
y_s=d(y_r−c_y)+c_y (Eq. 6)
(131) where the rotation factor θ is determined by the angle between the template knuckle line and the knuckle line, d is the ratio of the template knuckle line to the knuckle line, (x, y) is a point in the template image, (x_r, y_r) is the coordinate after rotation, (x_s, y_s) is the coordinate after scaling, and (c_x, c_y) is the coordinate of the template centroid 112.
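For illustration, Eqs. 5 and 6 plus the final translation can be applied to the template cutline with the short sketch below (a hedged reading of the procedure, assuming the scaling is taken about the template centroid):

```python
# Sketch of the template-matching geometry: rotate the template cutline about
# the template centroid, scale it by the knuckle-line ratio d, then translate
# it by the average of the left/right knuckle displacements.
import numpy as np

def transform_template(points, centroid, theta, d, translation):
    """points: Nx2 array of template cutline coordinates; theta in radians."""
    cx, cy = centroid
    x, y = points[:, 0] - cx, points[:, 1] - cy
    # Eq. 5: rotation about the template centroid (coordinates kept relative here).
    xr = x * np.cos(theta) - y * np.sin(theta)
    yr = x * np.sin(theta) + y * np.cos(theta)
    # Eq. 6 (as described in the text): scale the radius by d about the centroid.
    xs, ys = d * xr + cx, d * yr + cy
    # Translation by the average left/right knuckle offset (tx, ty).
    return np.stack([xs + translation[0], ys + translation[1]], axis=1)
```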
(132) The cutline template was generated based on the preliminary studies presented in previous paragraphs. In the crab industry, there is no standard to qualify the quality of the cutline ground truth. The resolution of the subject waterjet system is 1 mm, which permits a 5-pixel error in the images. With the cutline in the XY plane, the present system opens the chitin walls to expose the jumbo meat for picking.
(133) The subject CNN-based knuckle determination approach was tested on the prepared test dataset, which included numerous full-sized images. The sub-image classification ability of the subject CNN model was validated, and its performance was compared with the traditional SVM model and the ANN model.
(134) Ten thousand (10,000) test sub-images were generated from the 43 images in accordance with the procedure presented in the previous paragraphs for the training dataset. Only boundary sub-images were included in the test dataset, rather than background and crab sub-images. In the test dataset, the ratio of positive to negative samples followed the values in the training dataset, which can reflect the ratio in real applications.
(135) The SVM and ANN models were trained on the same training dataset of 90 images. To avoid the overfitting problem of the SVM and ANN models and to accelerate the training process, Principal Component Analysis (PCA) was conducted on the image data in advance, and the first 500 components were chosen to describe the image features. For the SVM, three common kernels, including the polynomial kernel, the Gaussian kernel and the sigmoid kernel, were tested separately. After adjusting the free parameters in the three kernels, experiments showed the Gaussian kernel obtained the best result, with 96.7% training accuracy and 88.1% test accuracy.
(136) For the ANN model, the number of hidden layers was set from 1 to 3 separately. Experiments showed a two-hidden-layer network with 20 and 5 neurons in the hidden layers achieved the best balance between the training and test accuracy, with 95.2% training accuracy and 85.7% test accuracy.
(137) There were 86 knuckles in the 43 test images. After the polling process, the rough knuckle determination accuracy of the CNN model reached 97.67%. Compared to the manual localization process, the method demonstrates great advantages.
(138) In the manual localization process, background removal and watershed-based leg removal are conducted in advance, and an assumption is made that knuckles are located near the lower boundary of the crab body. In practice, the distance computation is achieved by a KD-tree data structure. In this region, binary k-means clustering on the a* channel is applied once to rule out crab body pixels. The remaining regions include the back shell, knuckles, and legs. Based on the constraints of blob size, distance to the crab centroid, and blob eccentricity, two regions can be selected as the left and right knuckles, and the final localization accuracy can be 82.56%, which is inferior to the CNN-based localization.
(139) A qualification experiment was conducted to describe the final knuckle segmentation result. A Jaccard index J(A, B) was defined as the metric, whose mathematical expression is presented in Eq. 7, where A is the segmentation result of the present method, B is the segmentation ground truth, and the cardinal number of a region represents its area.
J(A,B)=|A∩B|/|A∪B| (Eq. 7)
(141) Applying the k-means clustering presented in the previous paragraphs to the 84 correct rough knuckle regions acquired by the subject CNN model, every region attains a Jaccard value, and the average is 0.91. Because the subject clustering routine is pixel-based clustering, it may be affected by some impulse noise. However, a 0.91 Jaccard index is sufficient for the cutline determination.
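The Jaccard metric of Eq. 7, applied to binary pixel masks, amounts to the following few lines of Python (illustrative only):

```python
# Pixel-mask Jaccard index (Eq. 7): intersection area over union area.
import numpy as np

def jaccard(segmentation_mask, ground_truth_mask):
    """Both arguments are boolean arrays of the same shape."""
    intersection = np.logical_and(segmentation_mask, ground_truth_mask).sum()
    union = np.logical_or(segmentation_mask, ground_truth_mask).sum()
    return intersection / union if union else 1.0
```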
(142) As shown in
(143) The watershed algorithm can effectively eliminate the problem, as shown in
(144) In extreme cases, for example, if there are crab bodies which have a saturation value comparably low to that of the background and are connected to the background, the watershed method may fail and would split the crab body into two or more pieces. In the present test dataset, this condition does not exist, since the crab pixels and conveyor belt pixels show significant differences in the saturation channel, and the background removal routine demonstrates a high precision with a 0.97 Jaccard value (Eq. 7).
(145) In practice, the light source and camera settings need to be adjusted in different work environments. If the watershed fails, the crabs need to be rejected and reprocessed to clean the dirt manually.
(146) The Strategy for Training Data Preparation
(147) When preparing the training dataset, some negative samples should be discarded to obtain comparable numbers of negative (non-knuckle) and positive (knuckle) sub-image samples. Non-effective sub-images can be discarded. Effective sub-images can be divided into three types: only the background (type 1), only the crab body and background (type 2), and leg, body and background (type 3).
(148) All type 1 and type 2 sub-images are negative samples, and the type 3 images comprise both negative samples and positive samples. For a classification machine, the type 3 sub-images are more likely to produce false positive cases when new test images are obtained, which can further affect the determination of the rough knuckle positions.
(149) Instead of discarding non-knuckle sub-images randomly, more negative samples belonging to type 3 tend to be kept. The weight map in the polling step is shown in
(150) Selection of k-Means Parameters
(151) In general, the clustering method results rely greatly on the selection of features and the relationships among different features. The representativeness and robustness of features are the key factors to obtain the ideal clustering results. In the subject knuckle detection application, the a* channel in L*a*b* color space is the most reliable feature to describe the knuckle pixels compared to RGB values.
(152) There are two potential problems in considering the R, G, B values as clustering parameters. First, the R, G, B values of the crab body pixels in sub-images are very random. They could be very dark or very bright, and thus can affect the clustering results.
(153) The second critical problem accompanying the RGB based clustering is that when the clustering is finished, there is not a reliable way to determine which class represents the knuckles. However, if the clustering is based on a* value, it is always true that the class with largest a* value represents the knuckle.
(154) To determine the number of clustering classes in the a*-based clustering, an additional experiment was conducted based on the subject dataset. There are three possible parts in the rough knuckle position, including the leg, the knuckle, and the crab body. Knuckle pixels have the highest a* value, legs are intermediate, and the crab body has the lowest a* value. The a* value differences between the knuckle and the crab body are obvious, but between the knuckle and the leg they are insignificant.
(155) To choose adequate clustering classes, single 2-class clustering and single 3-class clustering were validated. 2-class clustering means all the pixels in the rough knuckle region would be clustered into two classes. Single 2-class clustering separates the crab main body pixels from the others. Similarly, 3-class clustering was expected to separate the leg, the knuckle and the crab main body apart. Between the 2-class clustering and the 3-class clustering, 2-class clustering performs better.
(156) In order to execute the segmentation routine more precisely and to further separate the knuckle pixels from the trivial leg pixels, a second 2-class clustering was conducted based on the first clustering result, returning the class with the highest a* value. The main problem with performing the 2-class clustering is that it can result in some over-segmentation cases and weaken the final performance. Therefore, the second clustering is regarded as an optional step in practice, and is conducted only when the a* value difference between the two classes is large enough (>10). An average 0.91 Jaccard value can be achieved using the constrained 2-class clustering twice in the 84 correct rough knuckle regions.
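A hedged sketch of this constrained two-round clustering is given below; the use of OpenCV's Lab conversion and scikit-learn's KMeans, and the function names, are assumptions made only for illustration:

```python
# Two rounds of binary k-means on the a* channel; the reddest class (largest a*)
# is kept each time, and the second round is applied only when the two class
# centers are separated by more than 10 a* units, to avoid over-segmentation.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def _reddest_class(a_values):
    """One 2-class k-means round; return (membership of reddest class, center gap)."""
    km = KMeans(n_clusters=2, n_init=10).fit(a_values.reshape(-1, 1))
    centers = km.cluster_centers_.ravel()
    return km.labels_ == int(np.argmax(centers)), abs(centers[0] - centers[1])

def segment_knuckle(region_bgr):
    """region_bgr: rough knuckle region (e.g., 81x81); returns a knuckle mask."""
    a_channel = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2LAB)[:, :, 1].astype(np.float32)
    mask = np.ones(a_channel.shape, dtype=bool)

    # First round: separate the crab main-body pixels from the redder pixels.
    keep, _ = _reddest_class(a_channel[mask])
    new_mask = np.zeros_like(mask)
    new_mask[mask] = keep
    mask = new_mask

    # Optional second round: split knuckle pixels from the remaining leg pixels.
    keep, gap = _reddest_class(a_channel[mask])
    if gap > 10:
        new_mask = np.zeros_like(mask)
        new_mask[mask] = keep
        mask = new_mask
    return mask
```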
(157) Although most crabs share similar morphological structures, complicated and amorphous crab components, such as crab legs, crab cores, eggs, heart, and stomach, are difficult to describe via traditional image features, including color and textures. The subject system, in an alternative implementation, uses the Deep Learning methodology as a powerful image processing tool. Compared to traditional image processing techniques, the Deep Learning method can extract many describable and undescribable image features. The applications of deep learning in the agriculture area are very limited, and most research focuses on cereal and plant image classifications. However, the use of the Deep Learning concept for crab meat picking has never been proposed in the related communities.
(158) As shown in
(159) The crab processing system 10 (as shown in
(160) Referring again to
(161) The cutting station 22 performs the angular tilted cut based on 3D crab information acquired by the 3-D scanning laser system 48. The operation of the angular cutting station 22 is adapted to remove the fin chamber meat cartilage to expose small compartments hidden below the cartilages.
(162) Each gantry station 20, 22 is mounted with a water-jet knife system 34, 36, each configured with a water-jet hose 30, 32, respectively, and having a very small water-jet nozzle (serving as a knife) that is displaceable in XYZ directions. In the exemplary implementation, the stainless waterjet nozzle 34, 36 weighs about 2 lbs, and has a small orifice (about 0.006″) that ensures the high cut resolution.
(163) The control sub-system 60 includes a set of linear motors 31, which can be digitally controlled at the motion speed of 1,000 mm/sec at 0.01 mm precision to produce cutting action for the knife 34 at the cutting station 20. An additional angular servo motor 29 is integrated in the angular cutting station 22 to actuate an angular displacement of the water-jet nozzle (knife) 36 in XYZ directions to perform angular tilt cuts.
(164) The fourth station of the system 10 (also referred to herein as the meat removal station 24) performs meat picking by manipulating the end effectors 38, 40. For lump meat, spoon end effectors can mimic the hand-picking trajectory and conduct the meat harvesting from the crab back-fin knuckles, which ensures the integrity of the lump meat. For chamber meat, the comb-like end effector is actuated to brush off the meat.
(165) The mesh conveyor 12 is driven by the servomotor 28 running in step motion. The control sub-system 60 is adapted to control the motion of the conveyor belt 12 by a number of steps in distance (for example, 6″ per crab pitch, with no crab overlap) in accordance with instructions generated by the image processing system 58, particularly the Deep Learning Processor 130 embedded therein. The processor 58 knows the exact position of each object (crab), based on FIFO principles, at the different stations.
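The FIFO bookkeeping can be pictured with the hypothetical sketch below, which assumes, purely for illustration, that consecutive stations are one crab pitch apart so that every conveyor step shifts each crab to the next station:

```python
# Hypothetical FIFO tracking of crab records through the stations; the record
# captured at the vision station simply shifts downstream one slot per step.
from collections import deque

STATIONS = ("vision", "vertical_cut", "angular_cut", "meat_removal")

class CrabTracker:
    def __init__(self):
        self.pipeline = deque(maxlen=len(STATIONS))   # index 0 = vision station

    def step(self, new_crab_record):
        """Call once per conveyor step with the cutlines / 3-D model computed
        for the crab that has just arrived at the vision station."""
        self.pipeline.appendleft(new_crab_record)
        # After the shift, pipeline[i] is the crab currently at STATIONS[i].
        return dict(zip(STATIONS, self.pipeline))
```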
(166) The present design ensures that four working stations (the imaging station 18, the cutting stations 20, 22, and the meat removal station 24) cooperate in substantially parallel fashion and the stations 20, 22, 24 can share the information captured from the vision station 18.
(167) The shaft encoder 64 provides the clock signal of the system 10, whose resolution may be, for example, 10,000 pulses/revolution. For the conveyor's shaft of 2″ in diameter, the conveyor tracking resolution can reach 0.016 mm/pulse. An encoder control board (not shown in the Drawings) is plugged in a PCIe bus slot in the computer system (PC) 56 to handle the interface functions between the encoder 64 and the software 58.
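The quoted tracking resolution follows directly from the shaft geometry, as the short check below shows:

```python
# 2-inch shaft circumference divided by 10,000 encoder pulses per revolution.
import math

shaft_diameter_mm = 2 * 25.4
resolution_mm_per_pulse = math.pi * shaft_diameter_mm / 10_000
print(round(resolution_mm_per_pulse, 3))   # ~0.016 mm/pulse
```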
(168) Deep Learning is an emerging concept for application in the artificial intelligence area. As a branch of the Deep Learning concept, the Convolutional Neural Network (CNN) is presented in the previous paragraphs as a processing sub-system 59 of the subject algorithm 58. A CNN trains hundreds of convolutional filters by itself, which enables the subject system to excavate numerous potential image features and to combine them nonlinearly. Compared to traditional image feature operators, like HOG, Gabor, and GLCM, a CNN can extract more image features. From another aspect, different from the traditional artificial neural network architecture, the convolution operations in a CNN decrease the risk of overfitting by utilizing the image spatial information.
(169) The ongoing Deep Learning research in agriculture applications focuses on image classifications, and some applications target image segmentation. A typical CNN architecture for a classification problem integrates convolutional layers to extract image features, maxpooling layers for down-sampling and avoiding overfitting, and activation layers to combine image features nonlinearly. At the end of these layers, the high-dimensional image features are regarded as the input of a traditional neural network to implement image classifications, and the final output of a classification problem is a class label.
(170) Comparatively, the result of the segmentation routine includes the localization information. Its output is expected to be an image, whose size should be the same as the input image, and each pixel of the image is assigned to a category.
(171) The Fully Convolutional Network (FCN) started the line of research using CNN models to achieve the image segmentation task. It makes use of down-sampling CNN architectures to extract image features. Subsequently, up-sampling strategies are applied to these features to recover a well-segmented image. This concept originates from a special neural network named the auto-encoder (AE), which is an unsupervised learning strategy. Its architecture comprises two serially connected sub-networks, i.e., the encoder network and the decoder network. The input of the encoder network is the raw data, and the outputs of the encoder network are high-dimensional data features. These data features are further fed into the decoder network, and the output of the decoder network is the raw data, identical to the input of the encoder network. In essence, the AE is a dimension reduction tool. It can automatically extract some low-dimensional features which can well represent the raw data.
(172) Compared to classic non-linear dimension reduction methods, like kernel PCA, the AE shows great superiority. The convolutional autoencoder is an updated version of the AE, which makes use of the advantages of CNNs to increase the robustness of the AE.
(173) The FCN can be regarded as a special convolutional autoencoder, but it is a supervised learning method. Instead of recovering the raw image, the output of the FCN is a manually labelled, well-segmented image. The FCN transforms the segmentation problem into a pixel-level classification problem, and every pixel in the image will be classified into a specific class. In the experiments and study of the subject system, the pixels have been classified into five classes.
(174) Network Architecture, Dataset Preparation and Network Training
(175) The problem with the FCN is that directly training the upsampling process is difficult without a large dataset. Research shows that, instead of rudimentary decoding of the image features from the encoder network, sharing some information from the encoding process can improve the performance of the segmentation network.
(176) U-net is a symmetric network architecture proposed in the medical imaging community to perform the binary cell segmentation task. The U-net copies and concatenates the features from the encoding layers to their corresponding upsampling decoding layers, as shown in
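A reduced Keras sketch of this encoder-decoder idea is shown below; the depth, filter counts, and use of simple upsampling are assumptions, and only the copy-and-concatenate skip connections and the five-class per-pixel softmax reflect the description above:

```python
# Small U-net-style network: a down-sampling encoder whose feature maps are
# concatenated into the up-sampling decoder, ending in a 5-class softmax
# (background, legs, crab core, back-fin knuckles, back bones).
from tensorflow import keras
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(800, 800, 3), n_classes=5):
    inputs = keras.Input(shape=input_shape)
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D(2)(c2)
    c3 = conv_block(p2, 128)                            # bottleneck
    u2 = layers.UpSampling2D(2)(c3)
    c4 = conv_block(layers.concatenate([u2, c2]), 64)   # skip connection
    u1 = layers.UpSampling2D(2)(c4)
    c5 = conv_block(layers.concatenate([u1, c1]), 32)   # skip connection
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(c5)
    return keras.Model(inputs, outputs)
```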
(178) In the experiments, 90 well-labelled crab images (800×800 pixels), shown, for example, in
(179) In the experiment, non-risky elastic deformation strategies (such as translation, mirroring, and random cropping) were directly conducted on the training and validation dataset. Some other deformation strategies (such as rotation, shear stretch, and zoom in and out) were conducted with constraints.
(180) For the rotation strategy, because there is a mechanical component in the machine to hold down crabs, the image rotation range was set at ±25 degrees. For the shear stretch strategy, which is relatively rarely implemented in practice, its range was set at ±10 degrees. For the zoom in and out strategy, the crab size in the augmented images was set at ±20%.
(181) In these methods, pixel displacements were computed using bi-cubic interpolation. Other common data augmentation operations in the Deep Learning area (such as color augmentation or contrast enhancement) were not useful because the light source of the vision station 18 is stable, the camera settings are adjustable, and the vision system is corrected regularly. By random generation of the deformation parameters for a single crab raw image, 300 images were selected from the augmented images, and the labelled images were deformed correspondingly. Finally, there was a total of 9,000 images used for network training, 3,000 images were used for validation, and their RGB values were normalized to the 0-1 range.
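These constrained deformations can be approximated with Keras' ImageDataGenerator, as in the sketch below; the shift amounts are illustrative, a shared random seed keeps the image and label-map transforms synchronized, and the generator's interpolation differs from the bi-cubic scheme mentioned above:

```python
# Constrained augmentation sketch: rotation within +/-25 degrees, shear within
# +/-10 degrees, zoom within +/-20%, plus translation and mirroring.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = dict(rotation_range=25,
           shear_range=10,
           zoom_range=0.2,
           width_shift_range=0.1,     # illustrative translation amount
           height_shift_range=0.1,
           horizontal_flip=True)

image_gen = ImageDataGenerator(**aug)
label_gen = ImageDataGenerator(**aug)
# image_flow = image_gen.flow(raw_images, batch_size=1, seed=42)
# label_flow = label_gen.flow(label_maps, batch_size=1, seed=42)
```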
(182) The subject network was implemented using Python Keras library with Nvidia GeForce Titan X GPU. The training process used the adaptive gradient descent algorithm Adam to optimize the loss function.
(183) The loss function, in accordance with Equation 8, was set as the pixel-level categorical cross entropy, where y is the predicted probability distribution, generated by the Softmax layer, and y′ is the ground truth distribution, expressed in one-hot format with a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0).
(184) The batch size of the training process was set at 4 to make full use of the graphics card memory. The best model on the validation dataset was saved for subsequent utilization.
H_y′=−Σy′×log(y) (Eq. 8)
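Under the same assumptions as the earlier network sketch, the training setup described here reduces to a few Keras calls (the file name and epoch count are placeholders):

```python
# Adam optimizer, pixel-level categorical cross entropy (Eq. 8) against one-hot
# label maps, batch size 4, keeping the best model seen on the validation set.
from tensorflow.keras.callbacks import ModelCheckpoint

model = build_unet()   # from the earlier U-net sketch
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
checkpoint = ModelCheckpoint("best_crab_unet.h5", monitor="val_loss",
                             save_best_only=True)
# model.fit(train_images, train_onehot_labels, batch_size=4, epochs=50,
#           validation_data=(val_images, val_onehot_labels),
#           callbacks=[checkpoint])
```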
(185) The crab core was XY cut at the cutting station 20. The core cut was based on the image segmentation results and was guided by a template matching concept detailed in previous paragraphs.
(186) The Deep Learning segmentation model 130 is integrated into the crab processing system 10. The Deep Learning segmentation architecture 130 may operate as an alternative mode of operation to the knuckle localization processing approach 59, or both modes of operation may be used.
(187) As shown in
(188) TABLE 2: Network Training Metrics
Dataset | Classification Loss | Classification Accuracy
Training dataset | 0.0289 | 0.9892
Validation dataset | 0.0196 | 0.9943
(189) The test phase was performed online in Step 3. In order to seamlessly integrate the trained Python network model into a C++ project, the Python class was wrapped into C++ code with the C++ Boost library. The segmentation results computed in Step 4 were transformed to a C++ array for generating the crab cutline in Step 5, and were coordinated with other image acquisition codes and motion control codes.
(191) To quantify the segmentation performance of the model, 50 images in the test dataset were segmented and their results were compared with the ground truth. The segmentation routine can be considered as a pixel-level classification problem. The multi-class confusion matrix was established based on the pixels, as shown in Table 3. The total number of test pixels is 32,000,000 and the average pixel-level accuracy was 0.9843.
(192) TABLE 3: Multi-class confusion matrix for the test dataset (GT: ground truth, P: predicted results; columns are ground-truth classes, rows are predicted classes)
 | Background (GT) | Legs (GT) | Crab Core (GT) | Backfin Knuckles (GT) | Back Bones (GT)
Background (P) | 23675208 (99.16%) | 143084 (3.62%) | 43328 (1.15%) | 7735 (3.46%) | 13961 (8.05%)
Legs (P) | 148957 (0.62%) | 3784552 (95.78%) | 25944 (0.69%) | 3176 (1.42%) | 0 (0%)
Crab Core (P) | 36417 (0.15%) | 20840 (0.53%) | 3685632 (97.61%) | 7920 (3.54%) | 6641 (3.83%)
Backfin Knuckles (P) | 7830 (0.03%) | 2616 (0.07%) | 10313 (0.27%) | 199628 (89.29%) | 1247 (0.72%)
Back Bones (P) | 7577 (0.03%) | 0 (0%) | 10804 (0.29%) | 5112 (2.29%) | 151478 (87.39%)
Total Pixels | 23875989 | 3951092 | 3776021 | 223571 | 173327
Pixel-level Accuracy | 0.9916 | 0.9578 | 0.9761 | 0.8929 | 0.8739
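The per-class pixel accuracies in Table 3 correspond to reading the diagonal of a pixel-level confusion matrix against each ground-truth class total; a small sketch of that evaluation (using scikit-learn, an assumption) is given below:

```python
# Pixel-level evaluation sketch: flatten predicted and ground-truth label maps,
# build a 5x5 confusion matrix, and report per-ground-truth-class accuracy.
import numpy as np
from sklearn.metrics import confusion_matrix

CLASSES = ["Background", "Legs", "Crab Core", "Backfin Knuckles", "Back Bones"]

def pixel_metrics(gt_maps, pred_maps):
    """gt_maps, pred_maps: integer label arrays of identical shape."""
    cm = confusion_matrix(np.ravel(gt_maps), np.ravel(pred_maps),
                          labels=list(range(len(CLASSES))))
    per_class_accuracy = np.diag(cm) / cm.sum(axis=1)   # per ground-truth class
    overall_accuracy = np.diag(cm).sum() / cm.sum()
    return cm, per_class_accuracy, overall_accuracy
```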
(193) Crab backfin knuckles were the key components in determining the crab cutline in the XY plane. To further quantify the performance of the knuckle segmentation, the largest two knuckle blobs were extracted to rule out possible noise. Based on the centroid position of the crab core, the two knuckles can be determined as the left knuckle and the right knuckle for both the ground truth segmentation and the predicted segmentation. Comparison of the predicted and ground truth knuckle positions, including the X and Y positions of both the left and right knuckles, showed a high correlation between the predicted position and the ground truth position. At a 5-pixel resolution, 81% of the knuckles were found correctly. At a 10-pixel resolution, 97% of the knuckles were found correctly.
(194) The distance between two knuckles is representative of the size of the crab. With 0.2 mm pixel resolution of the subject system, the accuracy of the predicted knuckle distance achieved was about 1 mm.
(195) The subject network architecture shows great potential for the precision of the automatic crab meat picking: (a) the Deep Learning processing approach segments different crab components in parallel, and the network model is capable of balancing the segmentation accuracy of the different classes of crab body segments. Specifically, for the centroid of the crab core, the average deviation between the predicted centroid and the labelled centroid attained was 12.37 vs. 11.94 pixels. (b) The U-net model (Deep Learning) is an all-in-one segmentation model and does not need any additional configurations in the test phase, and thus attains an extremely short computation time (about 241 ms).
(196) In the subject Patent Application, an automated crab picking system is disclosed which uses intelligent vision sensors to guide high-pressure waterjets and other end effectors to expose and to harvest crab meat. The subject processing architecture focuses on understanding crab morphology based on 2D crab images. To achieve the subject system objectives, the Deep Learning model is integrated into the system. It can automatically segment the crab images into, for example, five ROIs (Regions-of-Interest), with high speed and high accuracy, including (1) conveyor background, (2) crab legs, (3) crab core, (4) back-fin knuckles and (5) crab back bones. Among them, the crab back-fin knuckles determine the crab cutline in the XY plane via a template matching strategy. The crab core can be used for locating the chamber meat of the crab in the subsequent processing steps. The subject system can operate based on the patch-based knuckle detection algorithm, and/or based on the Deep Learning algorithm model which can segment different crab components in a single step.
(197) For the Deep Learning model, the computation time cost for a single test image decreases about 50-fold compared to the patch-based knuckle detection. The average pixel classification accuracy on the test dataset can reach 0.9843. As the initial step of the crab meat picking machine, the Deep Learning model guides the machine to understand 2-D crab morphologies and is integrated into the system seamlessly, which can meet the accuracy and efficiency requirements of a real-time crab machine.
(198) Although this invention has been described in connection with specific forms and embodiments thereof, it will be appreciated that various modifications other than those discussed above may be resorted to without departing from the spirit or scope of the invention as defined in the appended claims. For example, functionally equivalent elements may be substituted for those specifically shown and described, certain features may be used independently of other features, and in certain cases, particular locations of elements, steps, or processes may be reversed or interposed, all without departing from the spirit or scope of the invention as defined in the appended claims.