AUGMENTED REALITY-ENHANCED FOOD PREPARATION SYSTEM AND RELATED METHODS

20210030199 · 2021-02-04

Assignee

Miso Robotics, Inc. (Pasadena, CA, US)

Inventors

Cpc classification

International classification

Abstract

A food preparation system is configured to enhance the efficiency of food preparation operations in a commercial kitchen by displaying instructions on a surface in the kitchen work area. The food preparation system includes a plurality of cameras aimed at a kitchen workspace for preparing the plurality of food items and a processor operable to compute an instruction for a kitchen worker to perform a food preparation step based on one or more types of information selected from order information, recipe information, kitchen equipment information, data from the cameras, and food item inventory information. A projector in communication with the processor visually projects the instruction onto a location in the kitchen workspace for the kitchen worker to observe. Related methods for projecting food preparation instructions are described.

Claims

1. A food preparation system for preparing a plurality of food items in a commercial kitchen environment, the system comprising: at least one camera aimed at a kitchen workspace for preparing the plurality of food items; a processor operable to compute an instruction for a kitchen worker to perform a food preparation step based on data from the at least one camera, order information, and recipe information; and a projector in communication with the processor and operable to visually project the instruction onto a location in the kitchen workspace for the kitchen worker to observe.

2. The food preparation system of claim 1, wherein the projector is configured to project the instruction on the food item or in close proximity to the food item.

3. The food preparation system of claim 1, wherein the projector comprises a laser for projecting the instruction.

4. The food preparation system of claim 1, wherein the projector comprises AR glasses.

5. (canceled)

6. The food preparation system of claim 1, wherein the instruction comprises at least one of a text, indicia, symbols, figures, or diagrams.

7. (canceled)

8. (canceled)

9. The food preparation system of claim 1, further comprising at least one sensor in addition to the at least one camera.

10. (canceled)

11. The food preparation system of claim 1, further comprising a communication interface to connect with the internet.

12. The food preparation system of claim 1, wherein the processor is in communication with the POS and adapted to receive the order information.

13. The food preparation system of claim 1, wherein the processor is operable to recognize and locate the food item based on data from the at least one camera.

14. (canceled)

15. (canceled)

16. The food preparation system of claim 1, wherein one of the cameras is an IR camera, and wherein the processor is operable to determine next food step based on image data from the IR camera of the food item.

17. (canceled)

18. (canceled)

19. (canceled)

20. (canceled)

21. A method to assist a kitchen worker to prepare a plurality of food items in a kitchen workspace in order to complete a customer order, the method comprising: receiving the customer order; computing an instruction for a kitchen worker to perform a food preparation step on the plurality of food items in the kitchen workspace based on order information, camera data, and recipe information; and projecting the instruction onto a location in the workspace viewable by the kitchen worker.

22. The method of claim 21, wherein the step of projecting is performed directly onto the location.

23. (canceled)

24. (canceled)

25. (canceled)

26. The method of claim 21, further comprising recognizing and locating the food items based on the camera data.

27. The method of claim 26, wherein the step of recognizing and locating is performed with a trained CNN.

28. The method of claim 26, further comprising monitoring a state of the food items.

29. The method of claim 28, further comprising providing an instruction based on a change in the state.

30. (canceled)

31. (canceled)

32. The method of claim 21, further comprising computing the total time to carry out the customer order.

33. (canceled) The method of claim 32, further comprising computing a first set of steps to complete a first customer order including computing a duration for each step, and a time to commence each step.

34. (canceled)

35. The method of claim 21, wherein the step of projecting includes projecting a symbol onto a grill or the food item.

36. (canceled)

37. (canceled)

38. The method of claim 21, further comprising displaying the instruction and wherein the displaying comprises use of an AR display adapted to superimpose instructions onto another location shown in the display.

39. The method of claim 38, wherein the location is on top of a food item.

40. (canceled)

41. (canceled)

42. (canceled)

43. (canceled)

44. (canceled)

45. (canceled)

46. (canceled)

47. A method to assist a kitchen worker to prepare a plurality of food items in a kitchen workspace to complete a customer order, the method comprising: detecting when a cooking threshold has been reached; computing an instruction for a kitchen worker to perform a food preparation step on the plurality of food items in the kitchen workspace; and projecting the instruction onto a location in the workspace viewable by the kitchen worker.

48. The method of claim 47 wherein the threshold is temperature.

49.-88. (canceled)

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0056] FIG. 1 illustrates a food preparation system in a kitchen environment;

[0057] FIG. 2 schematically depicts an example architecture of an automated food preparation system;

[0058] FIG. 3 is a flow diagram of a method for projecting an instruction onto a location in the kitchen workspace viewable by the kitchen worker;

[0059] FIGS. 4A-4B depict various software modules of a system for projecting an instruction onto a location in the kitchen workspace in accordance with embodiments of the invention; and

[0060] FIG. 5 is a flow diagram of a method for processing data from multiple sensors in accordance with an embodiment of the invention.

DISCLOSURE OF THE INVENTION

[0061] Before the present invention is described in detail, it is to be understood that this invention is not limited to particular variations set forth herein as various changes or modifications may be made to the invention described and equivalents may be substituted without departing from the spirit and scope of the invention. As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process act(s) or step(s) to the objective(s), spirit or scope of the present invention. All such modifications are intended to be within the scope of the claims made herein.

[0062] Methods recited herein may be carried out in any order of the recited events which is logically possible, as well as the recited order of events. Furthermore, where a range of values is provided, it is understood that every intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the invention. Also, it is contemplated that any optional feature of the inventive variations described may be set forth and claimed independently, or in combination with any one or more of the features described herein.

[0063] All existing subject matter mentioned herein (e.g., publications, patents, patent applications and hardware) is incorporated by reference herein in its entirety except insofar as the subject matter may conflict with that of the present invention (in which case what is present herein shall prevail). Amongst other patent applications and patents listed herein, provisional patent application no. 62/467,735, filed Mar. 6, 2017, and entitled VISUAL INSTRUCTION DISPLAY SYSTEM TO ENHANCE EFFICIENCY OF WORKERS is incorporated herein by reference in its entirety.

[0064] Reference to a singular item, includes the possibility that there are plural of the same items present. More specifically, as used herein and in the appended claims, the singular forms a, an, said and the include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as solely, only and the like in connection with the recitation of claim elements, or use of a negative limitation.

[0065] It is also to be appreciated that unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

[0066] Food Preparation System Overview

[0067] FIG. 1 depicts a food preparation system 100 for projecting instructions to a location in the kitchen environment viewable by kitchen workers in accordance with an embodiment of the invention. Particularly, the food preparation system 100 is shown including: cameras 110 for visual recognition, a laser projection system 120, various food items 130, viewable instructions 140 projected by the laser projection system onto the food items or onto the commercial grill 150 as desired. Additional components or features can be included (or operate) with the food preparation system (not shown) including but are not limited to: a computer for processing information, an Internet connection, a point of sale system, a kitchen display system (KDS), local and remote server(s), and a human worker to read the instructions and act accordingly.

[0068] In preferred embodiments, the food preparation system takes or receives orders from a system that collects customer orders, and then, as described further herein, projects meaningful visual information onto the work areas (either directly with a laser, for instance, or virtually with AR glasses) to help guide kitchen workers in the preparation of food items. The food preparation system is operable to determine which instructions to project (and when and where to project the instructions) based on various types of information including knowledge of the current state of the grill, status of food items being prepared, information collected by cameras and other sensors, recipes for various food items, past orders, and other information. In embodiments, the system automatically monitors the work area for evidence that the step has been completed and then projects a next step.

[0069] It is also to be understood that the food preparation system may be used with a wide variety of kitchen equipment including but not limited to various types of grills, stove tops, assembly areas, areas for prepping food items, e.g., cutting and dicing, packaging areas, and storage areas as desired. Examples of grills include a standard electric grill with a cooking surface that is 24 inches deep and 48 inches wide. Such a grill can be used for preparation of burger patties, chicken, steaks, and onions. Another electric grill is 24 inches deep and 24 inches wide that is used for the preparation of veggie patties and buns. Another type of equipment is an assembly area and packaging area that contains work surfaces and bins with the various toppings that the customer can choose from. Yet another equipment is storage for the various ingredients. Indeed, the invention described herein may be used with a wide variety of kitchen equipment as desired.

[0070] FIG. 2 schematically depicts an example architecture of a food preparation system 210 in accordance with an embodiment of the invention. System 210 is shown including a computer 212 with storage 214, a processor 216, and communication interface 218. The storage may be preloaded with various information relating to recipe data, kitchen equipment parameters, and computer instructions for the processor to read and perform.

[0071] System 210 is also shown including sensors 222 which, as described further herein, can be used in combination with the cameras 220 to recognize and locate various food items as well as determine doneness of a food item being cooked.

[0072] In embodiments, three cameras are employed. A first camera is trained on a first type of grill such as a 2448 inch grill. The first camera can be mounted approximately 48 inches up from the back surface of the grill, at the midpoint of the grill width and set back approximately 6 inches.

[0073] A second camera can be trained on a second type of grill such as the 2424 grill and positioned similarly to that described in connection with the first camera.

[0074] A third camera may be located in the assembly area and trained on the assembly area.

[0075] Examples of cameras include but are not limited to the Blackfly 2.3 MP Color USB3 Vision with Sony Pregius IMX249 sensors with Fujinon CF12.5HA-1 lens with focal length of 12.5 mm, each of which is commercially available. However, other cameras may be used. Additionally, in embodiments, supplemental lighting and/or bandpass filters may be employed to improve the quality of images captured by the cameras.

[0076] Additionally, other types of visual sensors may be used that provide additional information, e.g., depth sensing systems incorporating projected infrared grids, such as the one used by the Xbox Kinect v1 sensor, and/or depth sensing system employing time-of-flight technologies, such as the one used by Xbox Kinect v2, developed by PrimeSense, manufactured by Microsoft Corporation.

[0077] Additionally, 3D, Lidar and ultrasonic-based sensors can be employed to provide data to locate and identify food items, workers, and equipment.

[0078] In embodiments, the plurality of sensors includes a visible spectrum camera (e.g., a black and white, or RGB camera), a depth sensor, and an infrared (IR) camera.

[0079] The infrared or IR camera generates IR image data by measuring the intensity of infrared waves and providing data representing such measurements over the observed area. In embodiments, the focal length of the camera lens and orientation of the optics has been set such that the area imaged includes the work area. Preferably, the IR camera is adapted to measure the intensity of IR waves over an area and generates IR image data. Preferably, the IR wavelength ranges from 7.2 to 13 microns, but other wavelengths in the IR may be used. An exemplary IR sensor is the CompactPro high resolution thermal imaging camera manufactured by Seek Thermal Corporation (Santa Barbara, Calif.), which can provide an image of size 320240 with each value a 16-bit unsigned integer representing measured thermal intensity.

[0080] In embodiments, the visible spectrum camera is an RGB camera to generate image data. The RGB image comprises a 960 by 540 grid with intensity data for red, green, and blue portions of the spectrum for each pixel in the form of 8-bit unsigned integers. In embodiments, the focal length of the camera lens and orientation of the optics have been set such that the area imaged includes the work surface. An exemplary visible spectrum camera is the Kinect One sensor manufactured Microsoft Corporation (Redmond, Wash.).

[0081] A depth sensor incorporates a time of flight (TOF) camera to generate data on the distance of each point in the field of view from the camera. The TOF camera is a range imaging camera system that resolves distance based on the known speed of light, measuring the time-of-flight of a light signal between the camera and the subject for each point of the image. In embodiments, the image comprises a 960 by 540 grid with a value of the distance from the sensor for each point in the form of a 16-bit unsigned integer. An exemplary depth sensor is the Kinect One sensor manufactured Microsoft Corporation (Redmond, Wash.).

[0082] Without intending to be bound to theory, we have discovered the IR camera sensors providing IR image data have the potential to mitigate or overcome the above-mentioned shortcomings associated with conventional automated cooking equipment. Due to the temperature differences typically present when an uncooked food is placed on a hot grill or other high temperature cooking surface or when a kitchen worker or kitchen worker's appendage is imaged against a predominantly room temperature background, IR camera sensors are able to provide high contrast and high signal-to-noise image data that is an important starting point for determining identity and location of kitchen objects, including food items, food preparation items and human workers. In contrast, the signal-to-noise ratio is significantly lower using only traditional RGB images than if using IR images. This occurs because some kitchen backgrounds, work surfaces, and cooking surfaces can be similar to food items in color, but temperatures are generally significantly different. Based on the foregoing, embodiments of the invention include IR-camera sensors in combination with other types of sensors as described herein. Use of IR sensors for assisting with food preparation is also described in provisional patent application no. 62/592,130, filed Nov. 29, 2017, and entitled AN INFRARED-BASED AUTOMATED KITCHEN ASSISTANT SYSTEM FOR RECOGNIZING AND PREPARING FOOD AND RELATED METHODS, incorporated herein by reference in its entirety.

[0083] FIG. 2 also shows human input 224 which may be in the form of a keyboard, touchscreen display or another means to provide input to the system. If available, a point of sale (POS) 226 can be coupled to the computer 212 which sends real time customer order information to the system.

[0084] Projection System

[0085] As described herein, the invention is directed to projecting instructions to the kitchen worker via a projection system. A wide range of projection systems may be employed to project the instructions onto a location or food item in the kitchen workspace.

[0086] For example, projection of visual information can be performed via a laser projection system such as the Clubmax 800 from Pangolin Laser Systems, Inc. Such commercially available laser projection systems comprise a laser source, mirrors, galvanometer scanners, various electronic components, and other optical components capable of projecting a laser beam at a given area in the system's field of view. One advantage of such a laser projection system is that they have, over the ranges under consideration for this application, essentially infinite depth of field.

[0087] Other methods may be used to perform the projection including, but not limited to, augmented reality (AR) glasses and digital light processing (DLP) projectors. In embodiments, AR glasses are employed with beacons placed on or near the grill to properly orient the projected visual information. Such beacons can be rigidly attached to a fixed surface, preferably the grill, and emit visual, infrared, or ultraviolet light that is recognized by sensors in the AR glasses. An example of a beacon is shown in FIG. 1, represented by reference numeral 160. Additional examples of AR type glasses include, without limitation, the Microsoft HoloLens, Google glass, Oculus rift. See also U.S. Pat. Nos. 9,483,875 and 9,285,589.

[0088] The eyepiece's object recognition software may process the images being received by the eyepiece's forward-facing camera in order to determine what is in the field of view.

[0089] In other embodiments, the GPS coordinates of the location as determined by the eyepiece's GPS are used to determine what is in the field of view.

[0090] In other embodiments, an RFID or other beacon in the environment may be broadcasting a location. Any one or combination of the above may be used by the eyepiece to identify the location and the identity of what is in the field of view.

[0091] In embodiments, a DLP-type projector is employed to project the instructions. For example, a DLP projection system rated at 1500 lumens, mounted 4 feet above the surface of the grill at the midpoint of its long axis on the back side, so as to be out of the way of the workers operating the grill, can be used. The DLP projector uses optics to align the focal plane of the image with the grill surface. With these optics in place, the DLP projector is able to project images onto the grill surface that are easily readable by workers operating the grill.

[0092] Instructions may be projected on a wide variety of locations whether flat or irregularly shaped. In embodiments, instructions may be projected onto food items using standard video projector and projection mapping software that uses 3D knowledge of work surface to be projected upon to modify projected image so that it appears clear to the worker. Such software uses data on the orientation and shape of the surface to be projected upon, such as data provided by a 3D sensor and modifies the projected image accordingly.

[0093] Indeed, a wide range of projection systems may be incorporated into the food preparation system described herein and the invention is only intended to be limited as recited in the appended claims.

[0094] With reference again to FIG. 2, other modalities to deliver instructions or output to the kitchen worker include, for example, audio 240 or a display 250. For example, the computed instructions may be communicated and shown to the worker via the restaurant's KDS.

[0095] FIG. 3 is a flowchart illustrating an overview of a food preparation assistance method 300 for projecting food preparation instructions to a kitchen worker in accordance with an embodiment of the invention.

[0096] Step 310 states to receive customer order. Order data can be electronically collected from a POS system using any of a number of techniques, including, but not limited to: querying a remote server that collects order data, querying a local server that collects order data, and intercepting data sent to printer to create order tickets. Preferably, the order data is collected from a local or remote server via communication interface.

[0097] Step 320 states to compute an instruction for a kitchen worker to perform a food preparation step. The computation carried out in step 320 is performed by a processor programmed with software and is based on a number of types of information and/or data including, but not limited to, camera and sensor data, the current state of the food items, customer order information from step 310, recipe information, kitchen equipment information, food inventory information, and estimates of upcoming demand based on such items as historical order information correlated to relevant variables such as day of the week, time of day, presence of holiday, etc.

[0098] Step 330 states projecting the instruction onto a location in the workspace and viewable by the kitchen worker. As described further herein in connection with the projection system, an instruction can be projected (directly or virtually) onto the grill or a food item to indicate to the kitchen worker the food item needs to be manipulated (e.g., removed from the grill, flipped, etc.)

[0099] Step 340 states to update the state of food items, customer order, recipe, kitchen equipment, and food item inventory information. As described further herein, in embodiments, a kitchen scene understanding engine or module computes and updates a state of the food items in the kitchen using data from the sensors and cameras, as well as input from other means.

[0100] In embodiments, the state system monitors the time a food item is cooked based on an internal clock. The system automatically detects the food item placed on the grill using camera data, and automatically commences an internal electronic stopwatch type timer.

[0101] In other embodiments, the state system monitors the surface temperature of the food item being cooked based on IR camera readings. Yet in other embodiments, the state system monitors the internal temperature of the food item based on extrapolating or applying predictive algorithms to the a) surface temperature data of the food item, b) volumetric analysis of the food item based on the visual camera data, and/or c) temperature data of the grill surface. The system is updated and computes instructions once a threshold condition is met such as time elapsed, internal temperature target (e.g., a threshold temperature for a minimum amount of time), or surface temperature target (e.g., a threshold temperature for a minimum amount of time).

[0102] FIG. 4A depicts an overview of various software modules or engines 400 in accordance with an embodiment of the invention and which are run on the processor or server(s) described herein. FIG. 4A shows a Kitchen Scene Understanding Engine (KSUE) 430 receiving camera and sensor data 420. As described further herein, the KSUE computes the food item type and location information.

[0103] FIG. 4A also shows Food Preparation Supervisory System (FPSS) 494 receiving the location information from the KSUE. The FPSS keeps track of the state of the system and is operable, as described further herein, to compute instructions to present to the kitchen worker. The FPSS is a computational engine utilizing a wide range of inputs 410 such as but not limited to state of all relevant items, recipe, inventory, POS and order information, human inputs, and kitchen equipment specifications. The FPSS also accepts and is updated with the type and location of the food items and kitchen objects by the KSUE 430. Particularly, the system automatically detects the items, and updates the system state accordingly with the updated information.

[0104] FIG. 4A shows output information 412 including a restaurant display, restaurant KDS, data log, and in some embodiments, a robotic kitchen assistant (RKA) to carry out food preparation steps.

[0105] After the instructions for the kitchen worker area computed by the FPSS 494, the instructions are delivered to the projection system 496 to either virtually or actually project the instructions on the food item or kitchen location.

[0106] Now with reference to FIG. 4B, an expanded view of the Kitchen Scene Understanding Engine is shown. Particularly, the sensor data 420, including cameras and IR image data arising from viewing objects in the kitchen environment, is provided to the kitchen scene understanding engine 430. The sensor image data 420 is pre-processed 440 in order that the multi-sensor image data are aligned, or registered into one reference frame (e.g., the IR image reference frame).

[0107] The combined image data serves as an input layer 450 to a trained convolutional neural network (CNN) 460.

[0108] As shown with reference to step 460, a CNN processes the image input data to produce the CNN output layer 470. In embodiments, the CNN has been trained to identify food items and food preparation items, kitchen items, and other objects as may be necessary for the preparation of food items. Such items include but are not limited to human workers, kitchen implements, and food.

[0109] For each set of combined image data provided as an input layer to the CNN, the CNN outputs a CNN output layer 470 containing location in the image data and associated confidence levels for objects the CNN has been trained to recognize. In embodiments, the location data contained in the output layer 470 is in the form of a bounding box in the image data defined by two corners of a rectangle.

[0110] As described further herein, one embodiment of the CNN 460 is a combination of a region proposal network and CNN. An example of region proposal network and CNN is described in Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 39 Issue 6, June 2017, which is hereby incorporated by reference in its entirety. Examples of other types of convolution neural networks are described in Patent Publication Nos. US 20170169315 entitled Deeply learned convolutional neural networks (cnns) for object localization and classification; 20170206431 entitled Object detection and classification in images, and Pat. No. 9,542,621 entitled Spatial pyramid pooling networks for image processing, each of which is herein incorporated by reference in its entirety.

[0111] Optionally, the accuracy of the object's location within the image may be further computed. In some embodiments, for example, IR image data measured within the area defined by the bounding box taken from the CNN output layer is further processed to more accurately determine an object's location. Techniques to do so include various computer vision and segmentation algorithms known in the art such Ohta, Yu-Ichi, Takeo Kanade, and Toshiyuki Sakai. Color information for region segmentation. Computer graphics and image processing 13.3 (1980): 222-241; and Beucher, Serge, and Fernand Meyer. The morphological approach to segmentation: the watershed transformation. Optical Engineering-New York-Marcel Dekker Incorporated-34 (1992): 433-433.

[0112] In some embodiments, determining location information includes determining information on orientation including angular position, angle, or attitude.

[0113] It is to be appreciated that the direct incorporation of the IR image data into the image data that, along with the RGB and depth data, makes up the input layer 450 to the CNN 460 improves the performance of the system. Although determining exactly why the inclusion of a given sensor improves the capabilities of a CNN is challenging because of the nature of CNNs, we conjecture, and without intending to be bound to theory, that the IR data offer higher signal-to-noise ratios for certain objects of a given temperature in a kitchen environment where such objects are often placed on work surfaces or imaged against backgrounds with significantly different temperatures. In cases where the CNN is used to recognize foods by the extent to which they are cooked, the IR data provides helpful information to the CNN on the thermal state of the food item and work surface, which can be a cooking surface.

[0114] With reference again to FIG. 4B, the CNN output layer 470 is then further processed to translate the location data of the identified objects given in the two dimensional coordinate system of the image into a three dimensional coordinate system such as a world coordinate frame or system reference frame. In embodiments, the world coordinate frame is the same frame used by ancillary apparatus such as robotic kitchen assistants or other automated food preparation and cooking devices. Step 480 may be carried out using standard transformations as is known to those of ordinary skill in the art.

[0115] The resulting object output vector 490 represents a single observation on the presence of a food or other type of object. Particularly, the object output vector 490 contains the location of recognized objects in the 3D or world coordinate frame and a confidence level that each such recognized object is the object the CNN has been trained to identify.

[0116] In embodiments, the output vector comprises instances of known food items and the degree that each is cooked (namely, degree of doneness). In embodiments, the measure of cooking is the internal temperature of the object, such as a steak cooked to medium rare corresponding to an internal temperature of 130 to 135 degrees Fahrenheit. In embodiments, the CNN is trained to detect not just individual objects and their location, but the internal temperature of the objects. Measurements of the internal temperature of the food item can be taken with temperature sensors and used in the output vector for the training of the CNN. In some embodiments, these temperature measurements are taken dynamically by a thermocouple that is inserted into the food item.

[0117] In embodiments, an alternate or additional thermal model is used to track the estimated internal temperature of various food items to determine when they are cooked to the appropriate level. In these cases, data can be provided by the Kitchen Scene Understanding Engine on how long the various items have been cooked and their current surface temperature and or temperature history as measured by the IR camera.

[0118] Kitchen Bayesian Belief Engine 492 receives the object output vector 490 and assembles/aggregates the real-time continuous stream of these vectors into a set of beliefs which represents the state of all recognized food and kitchen implements in the kitchen area. In a sense, the output of the engine 430 is an atlas or aggregated set of information on the types of food, kitchen implements, and workers within the work space. An example of a final set of beliefs is represented as a list of objects that are believed to exist with associated classification confidences and location estimates, and in embodiments internal temperatures.

[0119] A Food Preparation Supervisory System 494 is shown receiving the updated food item state information from the Kitchen Scene Understanding Engine 430. As described above with reference to FIG. 4A, the Food Preparation Supervisory System 494 also can receive data from the order management system, inputs from workers, and computes visual instructions for the grill and assembly workers. The visual instructions are sent to the projection system as described above to instruct the worker accordingly. This Food Preparation Supervisory computation engine 494 has recipes on all the various food items and estimates (and in embodiments, may assist in defining or creating the estimates) for the time required to perform all relevant steps in the process. This computation engine preferably tracks the state of all relevant items, including but not limited to: grill temperature, items currently on grill, items in assembly area, total number of items processed, number of workers in the assembly area, tasks being performed by workers in assembly area, grill temperature, grill cleanliness, ingredient levels in storage bins, and inputs from restaurant workers.

[0120] The computational engine then sends the computed instructions to the projection system for viewing by the kitchen worker. In embodiments, commands are sent to a robotic kitchen assistant and optionally to the KDS or other displays and data logs.

[0121] FIG. 5 Pre-Processing Sensor Data

[0122] As stated above, in embodiments, the invention collects data from multiple sensors and cameras. This data is preferably pre-processed prior to being fed to the convolutional neural network for object recognition. With reference to FIG. 5, a flow diagram serves to illustrate details of a method 500 for pre-processing the data from multiple sensors in accordance with an embodiment of the invention.

[0123] Step 510 states to create multi-sensor point cloud. Image data from RGB and depth sensors are combined into a point cloud as is known in the art. In embodiments, the resulting point cloud is a size of m by n with X, Y, Z, and RGB at each point (herein we refer to the combined RGB and depth image point cloud as the RGBD point cloud). In embodiments, the size of the RGBD point cloud is 960 by 540.

[0124] Step 520 states to transform the multi-sensor point cloud to the IR sensor coordinates. The process of transforming an image from one frame to another is commonly referred to as registration (see, e.g., Lucas, Bruce D., and Takeo Kanade. An iterative image registration technique with an application to stereo vision. (1981): 674-679). Particularly, in embodiments, the RGBD point cloud is transformed into the frame of the IR camera using extrinsic transformations and re-projection. In embodiments, because the field of view of the RGB and depth sensors is larger than the field of view of the IR sensor, a portion of the RGB and depth data is cropped during registration and the resulting RGBD point cloud becomes 720 by 540.

[0125] Step 530 states to register the multi-sensor point cloud to the IR sensor data and coordinates. The transformed RGBD point cloud is registered into the IR frame by projecting the RGBD data into the IR image frame. In embodiments, the resulting combined sensor image input data is 720 by 540 RGBD, and IR data for each point. In embodiments, values are converted to 8-bit unsigned integers. In other embodiments, the registration process is reversed and the IR image is projected into the RGBD frame.

[0126] In embodiments with multiple sensors, including an IR camera, the registration of the data from the various sensors simplifies the training of the CNN. Registering the IR data and the RGB and depth data in the same frame of reference converts the input (namely, the image input data 450 of FIG. 4B) into a more convenient form for the CNN, improving the accuracy of the CNN to recognize food items and/or reducing the number of labeled input images required to train the CNN.

[0127] Following step 530, the registered multi-sensor image data is fed into the CNN, described above in connection with the Kitchen Scene Understanding Engine.

EXAMPLE

[0128] Described below is a prophetic example for illustrating the steps and aspects of the subject invention.

[0129] Initially, the processor or computer is configured to receive customer order information and operable to execute or run the software engines and modules described above.

[0130] When a first order is received the system evaluates the preparation time of each of the various items. The calculation of preparation time can include assembly and packaging.

[0131] Once preparation time is calculated, the system identifies the sequence of steps to cook the items so as to: (a) Minimize the time between completion of the food item and transfer of the food item for pick-up by the consumer; and (b) Ensure the even flow of completed items to the assembly area (based on a range of variables, including but not limited to, activities of current staff).

[0132] Once the computer has selected the initial item to be cooked, the system projects the representative symbol onto the grill which is an instruction for the grill operator to put the chosen food item onto the image on the grill. In embodiments, the symbols are as follows: C for chicken; B for burger patty; V for vegetable patty; and N for bun. Other abbreviations and symbols and insignia may be used to represent the food item.

[0133] The system selects the specific location of the area onto which to project the symbol by dividing up the grill space as follows: (a) The 24 inch deep 48 inch wide grill is divided into a grid of squares, with 4 squares on the short edge and 8 squares on the long edge; (b) The 24 inch deep 24 inch wide grill is divided into two separate grids of squares. The first grid runs the full depth and half the width of the grilling surface and is comprised of 8 squares in 4 rows and 2 columns. The second grid runs the full depth and the other half of the width of the grill surface and consists of 3 columns of 6 squares. Other grid dimensions may be used.

[0134] The squares on the grills can be denoted by their row and column position, with the bottom left square being denoted as e.g., (1,1). Further, squares on the 2448 grill can include the letter A and shall be denoted as e.g., A(1,1). Squares on the 2424 grill can include the letter B and shall be denoted as e.g., B(1,1).

[0135] For the first item to be cooked, the system selects the appropriate position based on which grill is to be used. Subsequent items are then placed to fill out rows. The specific locations may be chosen in order to optimize throughput and minimize errors.

[0136] The system monitors the necessary preparation time and cooking steps for each item and signals appropriate instructions to the worker.

[0137] The following instructions are used: F to flip the item; O to apply grilled onions to the item; and A to apply cheese to the item.

[0138] The letter number-combinations X1, X2, X3, X4, . . . , and X8 shall mean the instruction for the kitchen worker to remove the food item and put it into any one of eight different positions for later assembly.

[0139] For this example, a first order shall consist of: (a) Cheeseburger with grilled onions, lettuce, tomato, and special sauce; (b) Chicken sandwich with cheese, grilled onions, and special sauce; (c) Hamburger with lettuce, tomato, and special sauce.

[0140] A second order which comes in 3 minutes after the first order shall consist of: (a) Chicken sandwich with cheese, grilled onions, and special sauce; (b) Hamburger with lettuce, tomato, and special sauce; and (c) Hamburger with lettuce, tomato, and special sauce.

[0141] Cooking times may be pre-stored, uploaded with recipe data, or otherwise input into storage. In this example, the cooking times for the various items are as follows:

TABLE-US-00001 Order 1 cheeseburger 4 minutes Order 1 chicken sandwich with cheese 7 minutes Order 1 hamburger 4 minutes Order 2 chicken sandwich with cheese 7 minutes Order 2 hamburger 4 minutes Order 2 hamburger 4 minutes

[0142] Upon receiving the initial order, the system calculates the total production time of each item.

[0143] The system looks at the state of the all relevant elements of the food preparation system, recipes, time required for each step, available restaurant worker capacity, and other variables and determines an optimal cooking order and set of steps that are then communicated to grill workers via the projection system.

[0144] In this example, the system signals that the chicken is placed on the grill and cooked for one minute, followed by the two hamburgers. The system signals when to flip each item and put on cheese.

[0145] When the second order comes in, there are three items on the grill, and the chicken needs to be flipped in 30 seconds. The system calculates that there is sufficient time for the worker to place the new chicken patty on the grill and communicates that step to the grill worker.

[0146] Alternate Embodiments

[0147] In embodiments, instead of a predetermined time to cook a specific food item, one or more of the sensors and cameras are operable to detect placement of the food item on the grill, and to determine doneness of the food item. Particularly, in embodiments, IR data is used to determine whether a food item is done cooking by comparing an estimated internal temperature to a maximum or target internal temperature for the food item. The computational engine is operable to receive the IR data and trained to predict the internal temperature of the food item based on the IR data, and to evaluate whether the recognized food item is done. Alternatively, an internal temperature is estimated of the food item based on one or more of the following alone or in combination: grill temperature, time cooked, volumetric analysis of the food item, and IR surface temperature data.

[0148] In embodiments, a display (such as monitor or tablet) showing the kitchen location or food item is augmented with the instructions. The processor, cameras, sensors and display operate together to fuse or superimpose the instructions on top of the kitchen item, food item or location. Such super-imposition AR can be used in lieu or in combination with the AR projection described herein. In embodiments, the screen is a transparent screen (e.g., large glass panel) is positioned between the worker and food items or kitchen equipment, and instructions are presented on the screen to appear as if the instructions are located on top of the applicable equipment, utensil, and food item to be manipulated.

[0149] In embodiments, a portion of the tasks necessary to prepare various food items may be performed by a robotic kitchen assistant as described in, for example, provisional patent application no. 62/467,743, filed Mar. 6, 2017, and entitled ROBOTIC SYSTEM FOR PREPARING FOOD ITEMS IN A COMMERCIAL KITCHEN, and co-pending PCT Application No. ***to be assigned***, inventors D. Zito et al., filed Mar. 5, 2018, entitled ROBOTIC KITCHEN ASSISTANT FOR PREPARING FOOD ITEMS IN A COMMERCIAL KITCHEN AND RELATED METHODS corresponding to attorney docket number MIS001PCT, each of which is incorporated herein by reference in its entirety.

[0150] The invention described herein delegates some tasks to the human worker and other tasks to the Robotic Kitchen Assistant (RKA). For example, frying or sauting vegetables such as onions or peppers on the grill may be delegated to the human worker, and operating the deep fryer to cook French fries is performed automatically by the RKA.

[0151] Though the invention has been described in context of kitchen environments, the system or aspects of the system may be applied to any process involving directing human workers using projected information in dynamic environments. The invention is only intended to be limited as recited in the appended claims.

[0152] Other modifications and variations can be made to the disclosed embodiments without departing from the subject invention.

AUGMENTED REALITY-ENHANCED FOOD PREPARATION SYSTEM AND RELATED METHODS

Assignee

Inventors

Cpc classification

Classification Explorer

G06F18/2414

PHYSICS

Classification Explorer

G06Q10/06311

PHYSICS

Classification Explorer

G06V20/52

PHYSICS

Classification Explorer

G05B19/42

PHYSICS

Classification Explorer

B25J9/0081

PERFORMING OPERATIONS; TRANSPORTING

Classification Explorer

A47J37/12

HUMAN NECESSITIES

Classification Explorer

A47J37/06

HUMAN NECESSITIES

Classification Explorer

A23V2002/00

HUMAN NECESSITIES

Classification Explorer

G06V10/764

PHYSICS

Classification Explorer

G06V20/68

PHYSICS

Classification Explorer

A47J36/32

HUMAN NECESSITIES

Classification Explorer

G05B2219/40391

PHYSICS

Classification Explorer

G06T2207/10048

PHYSICS

Classification Explorer

G06Q20/202

PHYSICS

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

G02B27/017

PHYSICS

Classification Explorer

A23L5/15

HUMAN NECESSITIES

Classification Explorer

G06Q50/12

PHYSICS

Classification Explorer

G06T2207/10024

PHYSICS

Classification Explorer

G06T2207/20084

PHYSICS

Classification Explorer

G06T7/70

PHYSICS

Classification Explorer

A23L5/10

HUMAN NECESSITIES

Classification Explorer

G06T7/73

PHYSICS

Classification Explorer

G06T2207/10028

PHYSICS

Classification Explorer

G05B2219/36184

PHYSICS

International classification