Discriminative 3D Shape Modeling for Few-Shot Instance Segmentation
20230267614 · 2023-08-24
Abstract
An imaging controller is provided for segmenting instances from depth images including objects to be manipulated by a robot. The imaging controller includes an input interface configured to receive a depth image that includes objects, a memory configured to store instructions and a neural network trained to segment instances from the objects in the depth image, and a processor, coupled with the memory, configured to perform the instructions to segment a pickable instance using the trained neural network. The instructions include steps of selecting a tallest point in the depth image, defining a region using a shape such that the region surrounds the tallest point, sampling points in the region of the depth image, computing depth-geodesics between the tallest point and the sampled points, submitting the depth-geodesics to the neural network to segment the pickable instance among instances of the objects in the depth image, and an output interface configured to output a geometrical feature of the pickable instance to a manipulator controller of the robot.
Claims
1. An imaging controller for segmenting instances from depth images including objects to be manipulated by a robot comprising: an input interface configured to receive a depth image that includes objects; a memory configured to store instructions and a neural network trained to segment instances from the objects in the depth image; and a processor, coupled with the memory, configured to perform the instructions to segment a pickable instance using the trained neural network, wherein steps of the instructions comprise: selecting a tallest point in the depth image; defining a region using a shape such that the region surrounds the tallest point; sampling points in the region of the depth image; computing depth-geodesics between the tallest point and the sampled points; submitting the depth-geodesics to the neural network to segment the pickable instance among instances of the objects in the depth image; and an output interface configured to output a geometrical feature of the pickable instance to a manipulator controller of the robot.
2. The imaging controller of claim 1, wherein the depth images are acquired by a camera or sensor.
3. The imaging controller of claim 1, wherein the steps further comprises: selecting a next tallest point from the depth images such that a pick radius around the next tallest point excludes an overlap with the pickable instance; defining a next region using the shape such that the next region surrounds the next tallest point; sampling next points in the next region of the depth image; computing next depth-geodesics between the next tallest point and the sampled next points; and submitting the depth-geodesics to the neural network to segment a next pickable instance among the instances of the objects in the depth image.
4. The imaging controller of claim 3, wherein the steps of selecting, defining, the sampling, computing and submitting are continued until the steps are performed to a rest of the objects in the depth image.
5. The imaging controller of claim 1, wherein the shape is a square, a rectangle, a triangle, a circle or an oval.
6. The imaging controller of claim 1, wherein the neural network is trained to classify each feature vector as belonging to an identical instance or different instance.
7. The imaging controller of claim 1, wherein end-points of the geodesics are initialized using peaks produced by a Watershed Algorithm.
8. A computer-implemented method for training a neural network for segmenting instances in depth images, wherein the method uses a processor coupled with stored instructions implementing the method, wherein the instructions, when executed by the processor carry out at steps of the method, comprising steps of: selecting a depth image from a set of depth images; determining points of xy-locations on a 2-dimensional image grid and corresponding depth points with respect to the selected depth image, wherein the points on the 2-dimensional image grid are respectively annotated with ground truth instance labels; computing geodesic straight lines between pairs of the annotated determined points; generating depth geodesics by projecting the geodesic straight lines on the depth image; discretizing each of the depth geodesics to create discretized vectors, wherein each discretized vector corresponds to one of the depth geodesics between a pair of the annotated determined points; and submitting the discretized vectors and corresponding annotated labels of the discretized vectors to the neural network, wherein the steps from the selecting through the providing are repeatedly performed until a rest of all the set of depth images are used.
9. The method of claim 8, wherein the neural network is trained to classify each feature vector as belonging to an identical instance or different instance.
10. The method of claim 9, further comprises computing a convex hull of all endpoints of the geodesics that are classified as the geodesics belonging to the identical instance as a pickable point.
11. The method of claim 8, wherein the generated depth geodesics are debiased such that the geodesics are created from points being lower-depth to points being higher depth.
12. The method of claim 8, wherein end-points of the geodesics are initialized using peaks produced by a Watershed Algorithm.
13. The method of claim 12, wherein the endpoints of the geodesic are determined using a systematic selection method.
14. A bin-picking system for picking objects from a bin, comprising: an end-tool configured to pickup an object from among the objects; a robot arm including the end-tool, wherein the robot arm is configured to be driven by control signals that include end-tool signals to pickup the object from the bin using the end-tool; an interface configured to transmit and receive the control signals, sensor signals of sensors arranged on the robot arm, imaging signals of at least one imaging device; a memory configured to store instructions of a robot control program, and a classifier and a trained neural network that segments instances from the objects in the depth image, the trained neural network having been trained by a computer-implemented method of claim 8; and a processor, coupled with the memory, configured to perform the instructions to segment a pickable instance using the trained neural network and generate the control signals that drive the robot arm and the end-tool, wherein steps of the instructions comprise: selecting a tallest point in the depth image; defining a region using a shape such that the region surrounds the tallest point; sampling points in the region of the depth image; computing depth-geodesics between the tallest point and the sampled points; submitting the depth-geodesics to the neural network to segment the pickable instance among instances of the objects in the depth image; generating a geometrical feature of the pickable instance and the control signals based on the imaging signals; and transmitting the generated geometrical feature and generated control signals to the robot arm such that the end-tool pickups the object corresponding to the pickable instance from the bin using the end-tool.
15. The bin-picking system of claim 14, wherein the depth images are acquired by a camera or sensor.
16. The bin-picking system of claim 14, wherein the steps further comprises: selecting a next tallest point from the depth images such that a pick radius around the next tallest point excludes an overlap with the pickable instance; defining a next region using the shape such that the next region surrounds the next tallest point; sampling next points in the next region of the depth image; computing next depth-geodesics between the next tallest point and the sampled next points; and submitting the depth-geodesics to the neural network to segment a next pickable instance among the instances of the objects in the depth image.
17. The bin-picking system of claim 16, wherein the steps of selecting, defining, the sampling, computing and submitting are continued until the steps are performed to a rest of the objects in the depth image.
18. The bin-picking system of claim 14, wherein the shape of the region is a square, a rectangle, a triangle, a circle or an oval.
19. The bin-picking system of claim 14, wherein the neural network is trained to classify each feature vector as belonging to an identical instance or different instance.
20. The bin-picking system of claim 14, wherein end-points of the geodesics are initialized using peaks produced by a Watershed Algorithm.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The presently disclosed embodiments will be further explained with reference to the attached drawings. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
DETAILED DESCRIPTION
[0039] While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.
[0040] Segmenting nearly-identical object instances is a problem that is ubiquitous in a variety of robotic bin-picking applications. Some examples include: (i) a robotic arm that needs to pick products moving on a conveyor belt in a manufacturing setting, (ii) a supermarket robot that needs to pick and place fruits from a bin, or (iii) a library-assistant robot that needs to pick books from a box and hand them over to a human.
[0041]
[0042] In the present disclosure, we consider this problem of instance segmentation of nearly-identical convex object instances in depth images in a few-shot setting, where we assume access to a limited set (fewer than five) of annotated depth images, each with a few instances annotated with their segments. Our key idea is to create surface trajectories, or geodesics, on the 3D surface of the depth image, with the goal of training a neural network to classify these trajectories as lying within a single ground truth instance or crossing between two instances. The network thus potentially learns an implicit 3D model of a single object instance within its parameters, even though it is trained using only single geodesic trajectories. For a depth image with n pixels, there are n(n - 1)/2 such geodesics potentially possible, which, if carefully used, could provide a significantly large training dataset. Our idea is to leverage this insight for instance segmentation when very few images are annotated. Specifically, our algorithm has the following steps. (i) For two randomly chosen points on the depth image, we compute a surface geodesic that is the projection, onto the depth image, of a straight line connecting the two points on the 2D RGB image grid. Given that the objects are assumed convex (and the camera plane is assumed orthogonal to the objects), this projection will be (approximately) a shortest path connecting the two points on the object’s depth surface, and thus will be a depth geodesic. (ii) We discretize this geodesic into a pre-defined (fixed) set of bins, where each bin holds the depth of the geodesic at that bin location. The bins are equally spaced on the straight line from which the geodesic was projected. (iii) Next, we give a label to the geodesic using the ground truth segments provided. Specifically, if the two ends of the geodesic belong to the same object instance, then we give the label 1 to the discretized geodesic vector, and 0 otherwise. (iv) We train a neural network classifier on these discrete vectors and their labels.
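To give a sense of scale for the n(n - 1)/2 count mentioned above, the following minimal sketch evaluates it for a 320 × 240 depth image, the resolution reported in the experiments of this disclosure. The function name is illustrative only and is not part of the disclosed method.

```python
import math


def num_candidate_geodesics(height, width):
    """Number of unordered pixel pairs, i.e. n(n - 1)/2 for n = height * width."""
    n = height * width
    return math.comb(n, 2)


# 320 x 240 is the resized resolution reported in the experiments.
count = num_candidate_geodesics(240, 320)
print(count)  # 2949081600
```

Even a single annotated image therefore offers on the order of billions of candidate endpoint pairs, of which only a subset needs to be sampled.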
[0043] At test time, given a depth image, we first select a seed location in the depth image to start the segmentation process. In robotic bin-picking applications, it is usually easier for the robot to pick an instance that is located at the very top (i.e., closest to the robot). In other cases, the most isolated instance may be preferable. We propose various heuristics to compute this initial seed. Next, we compute geodesics from this seed point to random spatial locations around the seed point and within a predefined radius. We discretize these geodesics, and classify each geodesic as belonging to the same instance or not using the pre-trained classifier. We then compute a convex hull of all the points that are classified as being within the same instance, and consider all the pixels within this hull as corresponding to that instance, thus achieving instance segmentation. To create segmentations for more than one instance, we select another seed point from the depth image that is outside a predefined proximity of the already segmented instance, and repeat the above process until we have obtained a suitable number of object instances for the task.
Proposed Method
[0044] Suppose we are given a set of annotated depth images $\mathcal{D} = \{D_1, D_2, \ldots, D_N\}$, where each depth image $D \in \mathcal{D}$ defines an image grid of width $W$ and height $H$ pixels, such that $D_{xy}$ holds a non-negative value corresponding to the depth of the scene at location $(x, y) \in [H] \times [W]$, with $[Z]$ denoting the index set of integers $\{0, 1, \ldots, Z - 1\}$. For a pixel $(x, y)$ on the image grid, we assume it is annotated with an instance label $\ell_{xy} \in [L_D] \cup \{L_B\}$, where $L_D$ is the number of instances in the depth image $D$, and $L_B$ corresponds to a background label (i.e., a pixel that does not contain the depth of any object instance, such as the pixels for the base of the bin, the walls of the bin, etc.). To introduce our method, we will need some background notation, which we describe next.
Surface Geodesics and Assumptions
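[0045] For two distinct points $(x_1, y_1)$ and $(x_2, y_2)$ on the image grid, let $\gamma(t)$ (for $t \in [0,1]$) be the directed surface curve starting at $D_{x_1 y_1}$ and ending at $D_{x_2 y_2}$. We take $\operatorname{len}(\gamma) = \int_0^1 \lVert \dot{\gamma}(t) \rVert \, dt$ as the length of this curve $\gamma$, and a geodesic $g$ is a curve (or set of curves) with the minimal length connecting the two points. That is, $g = \arg\min_{\gamma} \operatorname{len}(\gamma)$.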
[0046] We also refer to this geodesic as a z-geodesic in the subsequent text. To derive our method, we make the following assumptions on our problem setting.
[0047] Assumption 1 (Surface Convexity) We assume the objects used in our setup are convex and the depth patch associated with the instances form an approximately convex smooth surface.
[0048] By convex object surfaces, we mean that all the one-dimensional curves $\gamma(t)$ on the surface are convex with respect to $t$, such that $\gamma(t) \le (1 - t)\gamma(0) + t\gamma(1)$, $\forall t \in [0,1]$. Suppose a patch of the depth image $D$ is such that all of its elements have the same instance label $\ell$. Then, for two distinct points $(x_1, y_1)$, $(x_2, y_2)$ on the image grid where both $D_{x_1 y_1}$ and $D_{x_2 y_2}$ lie in this patch, if $g$ is a geodesic starting at $D_{x_1 y_1}$ and ending at $D_{x_2 y_2}$, and if $\mathrm{label}(g(t))$ denotes the instance label of the point $g(t)$ on the geodesic, then we have the following proposition, which is straightforward to prove using the basic properties of convexity. We omit the subscripts and superscripts on $g$ for now to simplify our notation, and will revert to them whenever required.
[0049] Proposition 1 If the patch is a convex depth patch from a depth map $D$, and if $g(t)$ is a geodesic from $D_{x_1 y_1}$ to $D_{x_2 y_2}$, then $\mathrm{label}(g(t)) = \ell$, $\forall t \in [0,1]$.
[0050] Assumption 2 (Orthogonal Projection) The camera projection plane is located suitably far from the instances, such that the image XY-plane is approximately orthogonal to the velocity γ̇(t) of any trajectory on the depth surface.
[0051] This assumption allows us to parameterize the geodesic $g$ connecting the 3D points $D_{x_1 y_1}$ and $D_{x_2 y_2}$ by the straight line joining their image-grid locations, $xy(t) = (1 - t)(x_1, y_1) + t(x_2, y_2)$ for $t \in [0,1]$. We will use $e(xy(t))$ to denote this straight line for simplicity, and with this parameterization, we have the geodesic as $g(e(xy(t)))$, where now, instead of $t$, we use the points on the straight line to index depth.
[0052] Assumption 3 (Stationary Pose) We further assume that the camera location and pose, as well as the bin are stationary when capturing all the depth images.
[0053] We also assume that there are one or more instances of the object in the bin in the training images and that all the instances are of the same object. We do not make any assumption on either the arrangement of the instances in the bin or the number of instances in the bin. We also assume that the ground truth annotations are reasonably accurate, and that at least one instance in each training image is associated with a ground truth annotation. While we may have access to RGB images of the bin alongside the depth images, we do not use these RGB images in the approach described in this work. Further, one could also easily extend the approach to work with depth point clouds instead of depth images. In this case, the geodesic approximation using Assumption 2 may not be directly applicable, as the XY points may no longer be described by a fixed image grid.
Discriminative Shape Modeling
[0054]
[0055] Geodesic Discretization: In practice, directly using the geodesics for instance segmentation is problematic, because one would need an implicit parametrization of the surface geodesics as continuous curves. This may be difficult for arbitrary curves and objects for which there may not be any analytical form for such curves (e.g., a surface geodesic on a chicken nugget). Instead, to keep things computationally cheap, we discretize the curves using a fixed number of bins. Specifically, for a geodesic $g(t)$, we represent it using a fixed $m$-dimensional vector $\nu \in \mathbb{R}^m$, where the $k$-th dimension is $\nu_k = g((k - 1)/m)$. Such discretized geodesics can be computed very cheaply using Assumption 2 (orthogonal projection of the camera plane), as in that case one only needs to split the Euclidean geodesic approximation $e$ into $m$ parts, i.e., $xy(k) = \left(1 - \tfrac{k-1}{m}\right)(x_1, y_1) + \tfrac{k-1}{m}(x_2, y_2)$ for $k = 1, \ldots, m$, to obtain the $(x, y)$ 2D image grid location, which can then be used to directly index the depth map to get $\nu_k = D_{xy(k)}$.
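The discretization described above can be sketched as follows. This is a minimal illustration, assuming the depth map is a nested list indexed as depth[x][y] with (x, y) in [H] × [W], and assuming that fractional grid locations are rounded to the nearest pixel (the source does not state how they are handled). The function name is hypothetical.

```python
def discretize_geodesic(depth, p1, p2, m):
    """Return the m-dimensional vector of depths sampled along the line p1 -> p2.

    Bin k (1-based) uses t = (k - 1) / m, following nu_k = g((k - 1) / m).
    """
    x1, y1 = p1
    x2, y2 = p2
    vec = []
    for k in range(1, m + 1):
        t = (k - 1) / m
        # Point on the straight line in the image grid (Assumption 2).
        x = int(round((1 - t) * x1 + t * x2))
        y = int(round((1 - t) * y1 + t * y2))
        # Assumption: nearest-pixel lookup into the depth map.
        vec.append(depth[x][y])
    return vec


# Toy 4 x 4 depth map whose value encodes its own location as 10 * x + y.
toy_depth = [[10 * x + y for y in range(4)] for x in range(4)]
print(discretize_geodesic(toy_depth, (0, 0), (3, 3), 3))  # [0, 11, 22]
```

Note that with t = (k - 1)/m the last bin stops short of the second endpoint; whether the endpoint itself should be included is a design choice the text leaves open.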
[0056] Instance Supervision: If the discontinuities or non-convexity of the surface geodesics are sufficient to find the instance boundaries, then why would one need instance annotations? This is because the above discretization step may skip discontinuities in the curve if two instances are very close. For example, consider two cubes touching each other. In the depth image, at locations where the cubes touch, the curve may be almost continuous, and the discontinuity may be skipped by the discretization step. A similar problem can arise when the depth images are noisy: standard noise removal and hole-filling algorithms may smooth the depth images so much that the true geodesic discontinuities are suppressed. To circumvent these issues, the present method assumes access to ground truth instance masks.
Instance Segmentation Training Pipeline
[0057]
302 (one such point and its straight lines to a couple of other points are only shown in
on the depth image by projecting these Euclidean geodesics on the depth map. Each depth geodesic is then discretized into $m$ bins, forming the set $\nu = \{\nu_1, \nu_2, \ldots, \nu_M\}$ of $M$ vectors as described in the above section, each $\nu_i$ corresponding to a discretized depth geodesic 303. Suppose $\nu_i$ is such a discretized vector corresponding to a depth geodesic from point $(x_i, y_i)$ to $(x_j, y_j)$; then we assign a label $\mathrm{label}_g$ to it as: $\mathrm{label}_g = 1$ if $\ell_{x_i y_i} = \ell_{x_j y_j}$, and $\mathrm{label}_g = 0$ otherwise.
[0058] Recall that ℓ.sub.xy is the instance label associated with the image point (x,y).
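The labeling rule can be sketched in a few lines. This is an illustrative sketch only: it assumes the annotation is a nested list labels[x][y] of instance labels, samples endpoint pairs uniformly at random with a fixed seed, and does not treat background pixels specially (the source does not say how pairs touching the background are handled). All names are hypothetical.

```python
import random


def geodesic_label(labels, p, q):
    """1 when both endpoints carry the same instance label, 0 otherwise."""
    return 1 if labels[p[0]][p[1]] == labels[q[0]][q[1]] else 0


def sample_labeled_pairs(labels, n_pairs, seed=0):
    """Draw random pairs of distinct grid points and attach their binary label."""
    rng = random.Random(seed)
    h, w = len(labels), len(labels[0])
    samples = []
    while len(samples) < n_pairs:
        p = (rng.randrange(h), rng.randrange(w))
        q = (rng.randrange(h), rng.randrange(w))
        if p == q:
            continue  # the two endpoints must be distinct
        samples.append((p, q, geodesic_label(labels, p, q)))
    return samples


# Toy annotation: instance 0 on the left two columns, instance 1 on the right.
toy_labels = [[0, 0, 1],
              [0, 0, 1]]
pairs = sample_labeled_pairs(toy_labels, 5)
```

Each sampled pair would then be discretized into a depth vector and paired with its label to form one training example.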
[0059] Our final step in the training pipeline is to use the set $\nu$ and its corresponding binary labels to train a neural network model $f_\theta: \nu \to \{0, 1\}$, parametrized by $\theta$. Specifically, the neural network 304 is a series of multi-layer perceptrons (MLP), and takes as input a batch of samples from $\nu$ and predicts the label of the respective sample. This prediction is then matched with the ground truth binary label 305 using the softmax cross-entropy loss 306, which is then used to derive a gradient to train the network parameters. In our experiments, we found that augmenting each vector in $\nu$ with the length of the Euclidean geodesic between its two endpoints (i.e., adding an extra $(m + 1)$-th dimension to the vector with this length) improves the training and performance of the network. This is because, in situations where there are no discontinuities in the geodesics that the network can discern, it can learn an approximate size of the underlying shape for classification.
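The length augmentation and the loss can be illustrated as below. This is a hedged sketch, assuming the Euclidean length is measured in pixels between the two endpoints on the image grid and that the classifier outputs two logits; the helper names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def augment_with_length(vec, p, q):
    """Append the Euclidean distance between endpoints p and q as an extra dimension."""
    return list(vec) + [math.dist(p, q)]


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    mx = max(logits)  # subtract the maximum for numerical stability
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logits))
    return log_z - logits[label]


features = augment_with_length([1.0, 2.0], (0, 0), (3, 4))
print(features)  # [1.0, 2.0, 5.0]
```

With two uninformative logits the loss equals log 2, and it approaches zero as the logit of the correct class dominates.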
Instance Segmentation Inference Pipeline
[0060]
[0061] At test time, given a test depth image D, our goal is to repeat the process during the training phase for instance segmentation. As our goal is finally to produce a segmentation for an instance in the bin that is perhaps most useful for a robotic arm to grasp and pick, we propose to segment instances that are at the top of the bin (i.e., those instances closest to the camera) as shown in
[0062]
uniformly around H, and create Euclidean geodesics
These geodesics are then mapped to discretized depth geodesics ν (504) and classified using the pre-trained f.sub.θ (neural network 505, corresponding to the trained neural network 304) to determine whether the other endpoint of ν (corresponding to a point (x.sub.i,y.sub.i) around H) is within the instance segment or not. The points that are classified as within the segment are then fed to a robust convex hull computation algorithm to produce a segmentation (506) of the instance. Note that the convexity of the object is thus important for this step to work correctly.
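The hull step can be sketched with a standard convex hull routine. The sketch below uses Andrew's monotone chain algorithm as a stand-in; it is not the robust convex hull algorithm referred to in the text, which is not specified. Points are assumed to be integer (x, y) grid locations already classified as in-instance, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True when p lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


in_instance = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(in_instance)
```

Every pixel for which `inside_hull` returns True would then be assigned to the segmented instance.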
[0063] To create a segmentation for a different instance, we select another tall point H′ from the depth image such that the pick radius r around H′ will not overlap with the pick radius around H. That is, we search for instances whose depth geodesics will not overlap with the instance that we already segmented. Once we find a point H′, we apply the procedure described above. We do this process sequentially, generating one instance segment at a time.
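A greedy version of this sequential seed selection can be sketched as follows. Several assumptions are made here that the text does not state: the tallest point is taken to be the pixel with the smallest depth value (closest to the camera), two pick circles of radius r are treated as non-overlapping when their centers are at least 2r apart, and background pixels are not filtered out. The function name is hypothetical.

```python
import math


def select_seeds(depth, r, max_seeds):
    """Greedily pick seed points whose pick circles of radius r do not overlap."""
    # Assumption: smaller depth value means closer to the camera, i.e. taller.
    pixels = sorted(
        (depth[x][y], x, y)
        for x in range(len(depth))
        for y in range(len(depth[0]))
    )
    seeds = []
    for _, x, y in pixels:
        # Keep the candidate only if it is at least 2r from every earlier seed.
        if all(math.dist((x, y), s) >= 2 * r for s in seeds):
            seeds.append((x, y))
            if len(seeds) == max_seeds:
                break
    return seeds


toy_depth_map = [
    [5, 5, 5, 5, 5, 5],
    [5, 1, 5, 5, 5, 5],
    [5, 5, 5, 5, 2, 5],
    [5, 5, 5, 5, 5, 5],
]
print(select_seeds(toy_depth_map, 1, 2))  # [(1, 1), (2, 4)]
```

In practice one would also restrict candidates to pixels that belong to objects, for example by thresholding against the known depth of the bin floor.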
Algorithm Extensions
Debiasing the Training Set
[0064] As an astute reader might immediately pick out, there is a difference in the way the geodesics are computed at training and at test time. While the training samples in the above setup were selected at random from the image grid, the test samples are selected from the tallest point in the depth map. Thus, in the latter, the initial dimensions (closest to H) of the discretized geodesic ν will tend to go up (i.e., the depth increases); however, this need not be the case for those in the training set, creating a mismatch between the training and test distributions. To mitigate this issue, we sort the training points in ascending order of their depth values, and always compute the geodesics during training from points that have a lower depth to points that have a higher depth.
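For a single pair of endpoints, the debiasing rule amounts to orienting the pair before the geodesic is computed. A minimal sketch, assuming the depth map is a nested list indexed as depth[x][y] and that ties are left in their given order; the function name is hypothetical.

```python
def order_by_depth(depth, p, q):
    """Return (start, end) so that the geodesic runs from lower to higher depth."""
    dp = depth[p[0]][p[1]]
    dq = depth[q[0]][q[1]]
    return (p, q) if dp <= dq else (q, p)


toy = [[3, 1],
       [2, 5]]
print(order_by_depth(toy, (0, 0), (0, 1)))  # ((0, 1), (0, 0))
```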
Systematic Sampling of the Test Geodesics
[0065] In the basic inference algorithm described above, we randomly sampled the test points around the seed point. However, a more efficient approach would be to select the points systematically. To this end, we propose to use the pick radius r to define a circular region around the pick point H; this region is then divided into equal sectors, by dividing r into β equal parts, and dividing the circle into ζ equal angles. This leads to βζ points to consider for generating the surface geodesics, where these parameters can be adjusted depending on the underlying shape of the segment we ought to learn.
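The systematic sampling scheme can be sketched as a polar grid around the pick point. This is a minimal illustration, assuming the radial divisions are placed at r/β, 2r/β, ..., r and the angles at multiples of 2π/ζ; the returned coordinates are continuous and would still need rounding and clipping to the image grid. The function name is hypothetical.

```python
import math


def systematic_points(seed, r, beta, zeta):
    """Return beta * zeta points on a polar grid of radius r centered at seed."""
    hx, hy = seed
    points = []
    for i in range(1, beta + 1):
        radius = r * i / beta  # beta equal parts of the pick radius
        for j in range(zeta):
            angle = 2 * math.pi * j / zeta  # zeta equal angles
            points.append((hx + radius * math.cos(angle),
                           hy + radius * math.sin(angle)))
    return points


grid_points = systematic_points((10, 10), 6, 3, 8)
print(len(grid_points))  # 24
```

Increasing β refines the sampling along each ray, while increasing ζ refines the angular coverage of the instance boundary.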
Watershed Initialization
[0066] a. So far, we have used randomly sampled points (albeit sorted) during the training phase. Such sampling does not distinguish between easy and hard geodesics when learning the classifier. For example, a trivial discontinuity may be sufficient for a classifier to flag an out-of-instance trajectory; however, if such discontinuities do not occur, there may be other subtle cues in the geodesic that the classifier should pay attention to. Such attention could be difficult to learn when these cases are relatively infrequent in the deluge of simple trajectories. To this end, we propose a hard-negative mining extension to our basic approach using Watershed Transforms (WT). The Watershed algorithm is a classical unsupervised method for image segmentation that uses the analogy of water being poured from a hill top (the interior of an instance) and flowing towards the valleys (i.e., segment boundaries). If we block the valleys using “dams” (via characterizing the edges using image Laplacians), then the pixels within which the water gets trapped form a segment. A challenge in making this approach work correctly is the choice of where to construct the dams such that the trapped water corresponds to a ground truth segment.
[0067] b. In WT, the points where to start the region growing (i.e., the location to pour the water) are found using distance transforms. That is, first distance transforms are computed on the images to find regions where the peaks are (which corresponds to points that are farthest from the edges), and next these points are selected for region growing. There are two advantages of using this idea in our setup: (i) points that are isolated from other instances could have such a peak, and such isolated instances could be useful for robotic picking, similar to the instance corresponding to the tallest depth point, and (ii) wherever clutter is, i.e., peaks are higher (as the water could not be blocked by the edges due to discontinuous/broken edges), those segments produced by WT might be corresponding to multiple instances being falsely segmented as a single instance by WT, and thus could be useful for our geodesic trajectory based scheme to rectify better, using the provided supervision. Thus, we propose to improve the selection of the seed points to construct the geodesic trajectories via selecting the peaks produced by WT, and confine the end points of the geodesics to be within the segmentation mask produced by WT for that respective peak point.
Experiments
[0068] a. In this section, we provide experiments demonstrating the empirical performance of our method for the task of instance segmentation. For this empirical study, we used a dataset consisting of several chicken nuggets in a bin. The images were HD quality; however, for our experiments, we resized them to 320 × 240. We used only a single annotated depth image for training, while the test set consisted of 17 images. The depth images were captured using an Ensenso camera. For our systematic sampling of the endpoints, we used k = 14, and the number of angles depended on the pick radius (i.e., β = 2πr/3). The pick radius is selected depending on the size of the object to be segmented, e.g., from the average radius of the instances in the provided ground truth segmentations.
[0069] b. Neural Network: We used a discretization of the geodesic trajectory with 50 bins; with the Euclidean length appended as an extra dimension, the input has m = 51 dimensions. Our neural network consisted of 5 MLP layers, with respective output dimensions m, 5m, m, m/2, 2, using ReLU activations, and was optimized with Adam using the default learning rate and other settings. We also experimented with other non-linear classifier models for the proposed approach (such as a non-linear SVM), the results for which will be presented shortly.
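The layer sizes listed above can be sketched as a plain forward pass. This is a minimal sketch under stated assumptions: m/2 is rounded down for odd m, ReLU is applied after every layer except the last, weights are drawn uniformly at random, and no training loop or Adam update is shown. The names `layer_dims`, `init_mlp` and `forward` are illustrative only.

```python
import random


def layer_dims(m):
    """Output dimensions m, 5m, m, m/2, 2 (m/2 rounded down, an assumption)."""
    return [m, 5 * m, m, m // 2, 2]


def init_mlp(m, seed=0):
    """Random weights and zero biases for an m-dimensional input."""
    rng = random.Random(seed)
    params, fan_in = [], m
    for fan_out in layer_dims(m):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        params.append((weights, [0.0] * fan_out))
        fan_in = fan_out
    return params


def forward(params, x):
    """Apply each linear layer, with ReLU on all but the last; returns 2 logits."""
    for i, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(params) - 1:
            x = [max(0.0, v) for v in x]
    return x


params = init_mlp(51)
logits = forward(params, [0.5] * 51)
print(layer_dims(51))  # [51, 255, 51, 25, 2]
```

The two output logits correspond to the two classes (same instance or different instance) and would be fed to the softmax cross-entropy loss during training.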
[0070] c. Evaluation: For the evaluation of the method, we sampled 1000 points from the ground truth and the predicted instance segment, and computed an F1 score over this overlap, measuring whether the classifier predicted these samples correctly. We compute the performance for predicting various numbers of instance segments in the depth image. One caveat of our sequential way of predicting the segmentations is that the method will sometimes not return the required number of segmentations: some instances partially overlap others, so the exclusion of instances using the pick radius cannot find them.
[0071] Thus, we evaluate only for the instances that had a pick point identified.
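The F1 computation over sampled points can be sketched as below. This is an illustration only, assuming each sampled point is reduced to a pair of binary flags (predicted in-segment, ground truth in-segment); the exact sampling protocol is not specified in the text, and the function name is hypothetical.

```python
def f1_score(predicted, truth):
    """F1 score of binary predictions against binary ground truth flags."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```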
[0072]
[0073] Computational Performance: As our scheme consists of basic computations on the image and depth maps, and the trajectories are discretized into low-dimensional vectors, our method is computationally very efficient: it takes roughly 5 minutes to train on a 4-core CPU with 100K trajectories, and about 0.05 seconds to segment an instance during inference.
Generalizability:
[0074]
[0075]
Qualitative Segmentation Results
[0076]
[0077]
[0078] The imaging controller 1200 can include an input interface to receive depth images from an imaging device including cameras or external data 1295 including a set of training datasets. The input interface can include a human machine interface 1210 within the imaging controller 1200 that connects the processor 1220 to a keyboard/measurement device 1211 and pointing device 1212, wherein the pointing device 1212 can include a mouse, trackball, touchpad, joystick, pointing stick, stylus, or touchscreen, among others. Alternatively, the input interface can include a network interface controller 1250 adapted to connect the imaging controller 1200 through the bus 1206 to a network 1290. Through the network 1290, the external data 1295 can be downloaded and stored within the storage system 1230 as training and/or operating data 1234 for storage and/or further processing.
[0079] Still referring to
[0080]
[0081] The robot 150 is configured to perform the picking operation, e.g., pick the segmented object instance 103, along the trajectory while using imaging devices 106 connected to the imaging controller 1200 that can provide depth images of objects to be manipulated by the robotic arm 101. The imaging controller 1200 is connected to the controller of the robot 150 such that the controller of the robot 150 acquires and uses the features of segmented instances from the imaging controller 1200. As used herein, the trajectory corresponds to a path defining a motion of the object 103 held by the gripper 104, for performing the picking operation. In a simple scenario, the trajectory can dictate only a vertical motion of the wrist 102. However, as the wrist 102 includes multiple degrees of freedom, the trajectory may comprise a motion profile spanning in multi-dimensional space.
[0082] A pose of an object refers to a combination of a position and an orientation of the object. The gripper 104 is movable, in a start pose 111. A pose of the gripper 104 corresponding to the start pose 111 is referred to as a start pose of the gripper 104. According to an embodiment, the aim of the picking operation is to pick a segmented object instance 103. The pose 115 of the object 112 may refer to a position and/or orientation of the object 112. The robot 150 is configured to move the gripper 104 along a trajectory 113 to pick the object 103 in a pose 114. The pose 114 of the object 103 of the object 112 is referred to as a goal pose. A pose of the gripper 104 corresponding to the goal pose is referred to as a goal pose of the gripper 104.
[0083] The goal pose of the gripper 104 is determined based on a position of the object 112. At the end of a successful execution of the picking operation, the pose of the gripper 104 of the robot arm 101 is considered to have attained the goal pose of the gripper 104. Therefore, achieving the goal pose of the gripper 104 is equivalent to the successful execution of the picking operation. According to an embodiment, the trajectory 113 is defined according to the start pose and goal pose of the gripper 104, and the pose 115 of the object 112. Further, such picking operation may be executed repeatedly by the robot 150.
[0084]
[0085] Contemplated are various component configurations that may be mounted on a common motherboard, by non-limiting example, 1430, depending upon the specific application. Further still, an input interface 1417 can be connected via bus 1450 to an external receiver 1406 and an output interface 1418. A receiver 1419 can be connected to an external transmitter 1407 and a transmitter 1420 via the bus 1450. Also connected to the bus 1450 can be an external memory 1404, external sensors 1403, machine(s) 1402 and an environment 1401. Further, one or more external input/output devices 1405 can be connected to the bus 1450. A network interface controller (NIC) 1421 can be adapted to connect through the bus 1450 to a network 1422, wherein data or other data, among other things, can be rendered on a third-party display device, third-party imaging device, and/or third-party printing device outside of the computer device 1400.
[0086] Still referring to
[0087] Still referring to
[0088] Still referring to
[0089] Still referring to
[0090] Although the robotic system described above is presented, as an example, as a robot that can manipulate or assemble parts of a product, the robotic system can also be applied to cases where many of the foods handled in food processing plants are irregularly shaped objects (cut vegetables, fried chicken, etc.). The robotic system, which includes a system for generating verisimilar images from real depth images and automatically segmenting multiple instances of a rigid object in depth images, can be applied to the automation of food processing plants and to industrial robots that manipulate foods. Further, the robotic system described above can be applied to a segmentation system (or method) for food recognition. Segmentation is one of the most popular and important problems in image processing. For application to food processing plants, it is essential that segmentation accuracy be high and that both training and computation times be short.
[0091] The above description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims.
[0092] Although the present disclosure has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the present disclosure. Therefore, it is the aspect of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the present disclosure.