OBJECT RECOGNITION SYSTEMS AND METHODS
20230237558 · 2023-07-27
Inventors
- Bhavin Asher (Boca Raton, FL, US)
- Sam Zietz (Boca Raton, FL, US)
- Farshad Tafazzoli (Boca Raton, FL, US)
- Smit Patel (Boca Raton, FL, US)
- Badhri Suresh (Boca Raton, FL, US)
CPC Classification
- G06Q20/208 (PHYSICS)
- G06V10/22 (PHYSICS)
Abstract
An image sensor is used to capture an image that includes a plurality of objects. Presence and location data are identified for the plurality of objects. The image and the presence and location data are utilized to create individual representations of the plurality of objects. The plurality of objects are classified through employment of the individual representations. A machine learning model is updated with the classification data generated by classifying the plurality of objects.
Claims
1. A method comprising: using an image sensor to capture an image that includes a plurality of objects; detecting presence and location data of the plurality of objects; utilizing the image and the presence and location data to create individual representations of the plurality of objects; classifying the plurality of objects through employment of the individual representations; and updating a machine learning model with classification data generated by classifying the plurality of objects.
2. The method of claim 1, wherein the image sensor is a video camera.
3. The method of claim 1, wherein the machine learning model is a deep learning model.
4. The method of claim 1, wherein utilizing comprises cropping each of the plurality of objects from the image to create the individual representations.
5. The method of claim 1, wherein updating comprises adding classification information for the plurality of objects to a pre-existing machine learning model.
6. The method of claim 1, further comprising: displaying the image on an output display device; and using the individual representations to draw a boundary around each of the plurality of objects on the output display device.
7. The method of claim 1, wherein the image is a two-dimensional image.
8. A method comprising: using an image sensor, at a first location, to capture an image that includes a plurality of objects; sending the image over a network to a second location; detecting, at the second location, presence and location data of the plurality of objects; utilizing the image and the presence and location data to create individual representations of the plurality of objects; classifying the plurality of objects through employment of the individual representations; updating a machine learning model with classification data generated by classifying the plurality of objects; and sending the machine learning model over the network to the first location.
9. The method of claim 8, further comprising: loading the machine learning model at a user terminal of a point-of-sale system at the first location.
10. The method of claim 9, further comprising: capturing a second image of a second plurality of objects at the point-of-sale system; using the machine learning model to identify the second plurality of objects; creating a checkout cart including the second plurality of objects; and enabling a customer to purchase the second plurality of objects through the checkout cart.
11. The method of claim 10, further comprising: drawing a boundary around each of the second plurality of objects on an output display device.
12. The method of claim 10, wherein the second image is a two-dimensional image.
13. The method of claim 8, wherein the image sensor is a video camera.
14. An apparatus comprising: a processor; and a memory coupled with the processor, the memory comprising executable instructions that when executed by the processor cause the processor to effectuate operations comprising: using an image sensor to capture an image that includes a plurality of objects; detecting presence and location data of the plurality of objects; utilizing the image and the presence and location data to create individual representations of the plurality of objects; classifying the plurality of objects through employment of the individual representations; and updating a machine learning model with classification data generated by classifying the plurality of objects.
15. The apparatus of claim 14, wherein the image sensor is a video camera.
16. The apparatus of claim 14, wherein the machine learning model is a deep learning model.
17. The apparatus of claim 14, wherein utilizing comprises cropping each of the plurality of objects from the image to create the individual representations.
18. The apparatus of claim 14, wherein updating comprises adding classification information for the plurality of objects to a pre-existing machine learning model.
19. The apparatus of claim 14, wherein the operations further comprise: displaying the two-dimensional image on an output display device; and using the individual representations to draw a boundary around each of the plurality of objects on the output display device.
20. The apparatus of claim 14, wherein the image is a two-dimensional image.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] A more detailed understanding may be had from the following description, given by way of example in conjunction with the accompanying drawings wherein:
DETAILED DESCRIPTION
[0014] Referring to
[0015] In one example, system 100 includes a base surface 102, a terminal 104, and a mount 106. Base surface 102 is utilized by a user of system 100 to place one or more objects thereon. Terminal 104 allows a user or operator to interact with the system and may include one or more input/output devices, such as touchscreens, keypads, and the like. Mount 106 in one embodiment comprises a first arm 108 extending from one end 110 upward at a 45-degree angle from base surface 102. In one embodiment a second end 112 of first arm 108 is connected through a hinge 114 to a second arm 116. Second arm 116 in one example has a first end 118 and a second end 120. Second arm 116 extends longitudinally from first end 118 to second end 120 along a plane that is parallel to a plane of base surface 102. First arm 108 and second arm 116 may be used to mount hardware components utilized in system 100. In one embodiment first arm 108 includes a sensor 122 mounted thereto and second arm 116 includes a lighting source mounted thereto. It should be noted that the depicted configuration of base surface 102, first arm 108, and second arm 116 is provided for illustrative purposes and could be altered, added to, or subtracted from without departing from the scope of the present disclosure.
[0016] Referring further to
[0017] Referring to
[0018] Referring further to
[0019] It should be noted that the functionality, which is executed in connection with
[0020] Referring to
[0021] Referring now to
[0022] Referring to
[0023] Cropping module 322, in one example, utilizes the object identifiers to “crop” the individual objects 202(1) . . . 202(n) located in image 301. To “crop”, in one example, means to create individual images for each of the objects located in image 301. For instance, cropping module 322 may determine coordinate boundaries for an object 202(1) . . . 202(n) and then extract image data from image 301 corresponding to those boundaries. In one example, coordinate boundaries may be provided to terminal 104 (
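The cropping operation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the array layout and the (x1, y1, x2, y2) box format are assumptions, since the disclosure does not fix a coordinate convention for the boundaries.

```python
import numpy as np

def crop_objects(image, boxes):
    """Extract an individual sub-image for each detected object.

    image -- H x W x C array captured by the image sensor
    boxes -- list of (x1, y1, x2, y2) coordinate boundaries,
             one per detected object (an assumed format)
    """
    crops = []
    for (x1, y1, x2, y2) in boxes:
        # Clamp the boundaries to the image extent before slicing.
        x1, y1 = max(0, x1), max(0, y1)
        x2 = min(image.shape[1], x2)
        y2 = min(image.shape[0], y2)
        crops.append(image[y1:y2, x1:x2].copy())
    return crops

# Example: a 100 x 100 RGB frame with two detected objects.
frame = np.zeros((100, 100, 3), dtype=np.uint8)
individual = crop_objects(frame, [(10, 10, 40, 50), (60, 20, 90, 80)])
```

Each entry of `individual` corresponds to one object's image data, which can then be passed on for classification.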
[0024] The output of the cropping module is image data associated with objects 202(1) . . . 202(n) present in image 204(i). The image data is input to classifier module 324. Classifier module 324, in one example, comprises a machine learning module that is trained to identify objects. In one example, classifier module 324 is trained to identify food items. An embodiment of classifier module 324 has a deep learning structure, based on a neural network, which can identify food items. Such a deep learning module may be supervised, unsupervised, or semi-supervised.
[0025] Classifier module 324 in one example receives image data corresponding to each object 202(1) . . . 202(n) and compares or superimposes the image data over one or more data sets provided to classifier module 324. For example, there are a number of available data sets (Freiburg Groceries Dataset, UEC Food 256), relating to grocery items, that may be provided to classifier module 324. Classifier module 324 utilizes the image data from an object 202(1) . . . 202(n) with such data sets to identify objects 202(1) . . . 202(n) with precision. Similar to ODC 302, a deep learning architecture model is trained to recognize food items present in the inventory. In one example, the classifier module 324 is trained at the architecture level and then fine-tuned based on custom datasets to optimize its performance. The output of classifier module 324 is data set 303. Data set 303 in one example is then added to a model of objects that are utilized by system 100 (
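The comparison step above can be illustrated with a simple nearest-prototype sketch. This is only a stand-in for the disclosed deep learning classifier: the feature-vector representation and the per-class mean vectors are assumptions made for illustration, since the disclosure does not specify how image data are compared against a reference data set.

```python
import numpy as np

def classify(features, class_means, labels):
    """Assign each object's feature vector to the nearest class mean.

    features    -- N x D array, one row per cropped object
    class_means -- K x D array of reference vectors, one per class
    labels      -- list of K class names (e.g. food items)
    """
    # Euclidean distance from every object to every class mean.
    dists = np.linalg.norm(
        features[:, None, :] - class_means[None, :, :], axis=2
    )
    return [labels[i] for i in dists.argmin(axis=1)]

# Toy reference set with two food classes.
means = np.array([[0.0, 0.0], [1.0, 1.0]])
names = ["apple", "banana"]
objects = np.array([[0.1, 0.2], [0.9, 1.1]])
print(classify(objects, means, names))  # -> ['apple', 'banana']
```

In the disclosed system, the role played here by `class_means` would instead be filled by a trained neural network fine-tuned on the custom dataset.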
[0026] Referring to
[0027] Referring to
[0028] In one embodiment, a deep learning model is created or updated by system 300 when an operator of system 100 wishes to add new items to its deep learning model. Accordingly, system 100 commences a learning process 501. As part of the learning process 501, an operator of the system places items on base surface 102 and an image 301 is captured (
[0029] In one example, the model is created by updating a pre-existing model to include the items that are trained as part of a particular process. For instance, a pre-existing model may have a dataset comprising a classification index having n objects 304(1) . . . 304(n). An operator of system 100 may elect to train k new objects. Accordingly, system 100 captures image data for the k new objects and provides the image data to system 300, which performs object detection and classification on the k new objects. Upon completion of object detection and classification for the k new objects, system 300 updates the classification index to 304(1) . . . 304(n+k) by adding the classification data for the k new objects to the pre-existing data model.
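The index update in this paragraph can be sketched with a simple mapping. The dictionary representation is an assumption standing in for whatever model format system 300 actually uses; the point is only that the k new classes receive indices n+1 through n+k while the pre-existing entries are preserved.

```python
def update_classification_index(model, new_classes):
    """Extend a pre-existing model's index of n classes by k new classes.

    model       -- dict mapping class index (1..n) to class name
    new_classes -- list of k newly trained class names
    """
    n = len(model)
    updated = dict(model)  # leave the pre-existing model untouched
    for k, name in enumerate(new_classes, start=1):
        updated[n + k] = name  # new entries occupy indices n+1 .. n+k
    return updated

# Pre-existing model with n = 3 classes, extended by k = 2 new items.
base = {1: "apple", 2: "banana", 3: "orange"}
extended = update_classification_index(base, ["mango", "kiwi"])
print(sorted(extended))  # -> [1, 2, 3, 4, 5]
```

After the update, the original n entries and the k new entries coexist in a single index, mirroring the 304(1) . . . 304(n+k) classification described above.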
[0030] It should be noted the processes described in connection with
[0031]
[0032]
[0033] The computer 920 may further include a hard disk drive 927 for reading from and writing to a hard disk (not shown), a magnetic disk drive 928 for reading from or writing to a removable magnetic disk 929, and an optical disk drive 930 for reading from or writing to a removable optical disk 931 such as a CD-ROM or other optical media. The hard disk drive 927, magnetic disk drive 928, and optical disk drive 930 are connected to the system bus 923 by a hard disk drive interface 932, a magnetic disk drive interface 933, and an optical drive interface 934, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules and other data for the computer 920. As described herein, computer-readable media is a tangible, physical, and concrete article of manufacture and thus not a signal per se.
[0034] Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 929, and a removable optical disk 931, it should be appreciated that other types of computer readable media which can store data that is accessible by a computer may also be used in the exemplary operating environment. Such other types of media include, but are not limited to, a magnetic cassette, a flash memory card, a digital video or versatile disk, a Bernoulli cartridge, a random access memory (RAM), a read-only memory (ROM), and the like.
[0035] A number of program modules may be stored on the hard disk, magnetic disk 929, optical disk 931, ROM 924 or RAM 925, including an operating system 935, one or more application programs 936, other program modules 937 and program data 938. A user may enter commands and information into the computer 920 through input devices such as a keyboard 940 and pointing device 942. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 921 through a serial port interface 946 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port, or universal serial bus (USB). A monitor 947 or other type of display device is also connected to the system bus 923 via an interface, such as a video adapter 948. In addition to the monitor 947, a computer may include other peripheral output devices (not shown), such as speakers and printers. The exemplary system of
[0036] The computer 920 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 949. The remote computer 949 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and may include many or all of the elements described above relative to the computer 920, although only a memory storage device 950 has been illustrated in
[0037] When used in a LAN networking environment, the computer 920 is connected to the LAN 951 through a network interface or adapter 953. When used in a WAN networking environment, the computer 920 may include a modem 954 or other means for establishing communications over the wide area network 952, such as the Internet. The modem 954, which may be internal or external, is connected to the system bus 923 via the serial port interface 946. In a networked environment, program modules depicted relative to the computer 920, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
[0038] Computer 920 may include a variety of computer readable storage media. Computer readable storage media can be any available media that can be accessed by computer 920 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media include both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 920. Combinations of any of the above should also be included within the scope of computer readable media that may be used to store source code for implementing the methods and systems described herein. Any combination of the features or elements disclosed herein may be used in one or more examples.
[0039] In describing preferred examples of the subject matter of the present disclosure, as illustrated in the Figures, specific terminology is employed for the sake of clarity. The claimed subject matter, however, is not intended to be limited to the specific terminology so selected, and it is to be understood that each specific element includes all technical equivalents that operate in a similar manner to accomplish a similar purpose.
[0040] This written description uses examples to disclose the invention, including the best mode, and also to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal languages of the claims.