SYSTEMS AND METHODS FOR BUILDING AND CONTROLLING A SELF-DRIVING ROBOT
20250123630 · 2025-04-17
Abstract
Systems and methods for controlling self-driving robots are provided. Data from a plurality of sensors, associated with a robotic device, is received by a plurality of neural network models. Each neural network model of the plurality of neural network models receives a subset of the data from the plurality of sensors. The plurality of neural network models generate, based on the data from the plurality of sensors, a plurality of outputs. Each output of the plurality of outputs is generated by a particular neural network model of the plurality of neural network models. Each output corresponds to a respective subset of data received by the particular neural network model of the plurality of neural network models. Thereafter, an output from the plurality of outputs is selected, where the output includes data for controlling the robotic device. The robotic device is then controlled based on the output.
Claims
1. A system, comprising: a non-transitory memory; and one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations comprising: receiving, at a plurality of neural network models, data from a plurality of sensors associated with a robotic device, wherein each neural network model of the plurality of neural network models receives a subset of the data from a different subset of sensors of the plurality of sensors; generating, by the plurality of neural network models and based on the data from the plurality of sensors, a plurality of outputs, wherein each output of the plurality of outputs is generated by a particular neural network model of the plurality of neural network models, and wherein each output corresponds to a respective subset of data received by the particular neural network model of the plurality of neural network models; selecting an output from the plurality of outputs, the output comprising data for controlling the robotic device; and controlling the robotic device based on the output.
2. The system of claim 1, wherein the output comprises steering data and throttle data.
3. The system of claim 1, wherein the operations further comprise: converting the output into control signals; and controlling the robotic device based on the control signals.
4. The system of claim 1, wherein the subset of data received by respective first and second models of the plurality of neural network models is different.
5. The system of claim 1, wherein the data is received from the plurality of sensors in real-time or at predefined time intervals.
6. The system of claim 1, wherein the selecting the output is performed by an agent module, in communication with the plurality of neural network models, based on a target task for the robotic device.
7. The system of claim 1, wherein the plurality of neural network models are configured to perform a plurality of different tasks associated with the controlling the robotic device.
8. The system of claim 1, wherein two or more models of the plurality of neural network models are configured to perform a same task associated with the controlling the robotic device, and wherein the selecting the output is performed by an agent module, in communication with the plurality of neural network models, based on a performance metric for each of the two or more models for performing the same task.
9. The system of claim 1, wherein at least some of the plurality of neural network models comprise models trained using a plurality of data augmentation transformations.
10. The system of claim 6, wherein a neural network engine comprises the plurality of neural network models and the agent module, and wherein the robotic device comprises the neural network engine.
11. The system of claim 6, wherein a neural network engine comprises the plurality of neural network models and the agent module, and wherein the robotic device is configured to communicate with the neural network engine over a network.
12. A method, comprising: receiving, at a plurality of neural network models, data from a plurality of sensors associated with a robotic device, wherein each neural network model of the plurality of neural network models receives a subset of the data from a different subset of sensors of the plurality of sensors; generating, by the plurality of neural network models and based on the data from the plurality of sensors, a plurality of outputs, wherein each output of the plurality of outputs is generated by a particular neural network model of the plurality of neural network models, and wherein each output corresponds to a respective subset of data received by the particular neural network model of the plurality of neural network models; selecting an output from the plurality of outputs, the output comprising data for controlling the robotic device; and controlling the robotic device based on the output.
13. The method of claim 12, wherein the output comprises steering data and throttle data.
14. The method of claim 12, further comprising: converting the output into control signals; and controlling the robotic device based on the control signals.
15. The method of claim 12, wherein the subset of data received by respective first and second models of the plurality of neural network models is different.
16. The method of claim 12, wherein the plurality of neural network models are configured to perform a plurality of different tasks associated with the controlling the robotic device.
17. The method of claim 12, wherein at least some of the plurality of neural network models comprise models trained using a plurality of data augmentation transformations.
18. The method of claim 12, wherein a neural network engine comprises the plurality of neural network models and an agent module, and wherein the robotic device comprises the neural network engine or is configured to communicate with the neural network engine over a network.
19. A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: receiving, at a plurality of neural network models, data from a plurality of sensors associated with a robotic device, wherein each neural network model of the plurality of neural network models receives a subset of the data from a different subset of sensors of the plurality of sensors; generating, by the plurality of neural network models and based on the data from the plurality of sensors, a plurality of outputs, wherein each output of the plurality of outputs is generated by a particular neural network model of the plurality of neural network models, and wherein each output corresponds to a respective subset of data received by the particular neural network model of the plurality of neural network models; selecting an output from the plurality of outputs, the output comprising data for controlling the robotic device; and controlling the robotic device based on the output.
20. The non-transitory machine-readable medium of claim 19, wherein the output comprises steering data and throttle data, and wherein the operations further comprise: converting the output into control signals; and controlling the robotic device based on the control signals.
Description
BRIEF DESCRIPTION OF THE FIGURES
[0015] Embodiments of the present disclosure and their advantages are best understood by referring to the detailed description that follows. It should be appreciated that like reference numerals are used to identify like elements illustrated in one or more of the figures, wherein showings therein are for purposes of illustrating embodiments of the present disclosure and not for purposes of limiting the same.
DETAILED DESCRIPTION
[0016] The following disclosure provides different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. In addition, one skilled in the art will understand that the following description has broad application, and the discussion of any embodiment is meant only to be exemplary of that embodiment, and not intended to intimate that the scope of the disclosure, including the claims, is limited to that embodiment.
[0017] Certain terms are used throughout the following description and claims to refer to particular system components. As one skilled in the art will appreciate, various companies may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not function. In the following discussion and in the claims, the terms including and comprising are used in an open-ended fashion, and thus should be interpreted to mean including, but not limited to . . . . Also, the term couple or couples is intended to mean either an indirect or direct connection. Thus, if a first device couples to a second device, that connection may be through a direct connection, or through an indirect connection via other devices and connections.
[0018] As used herein, the term network may comprise any hardware or software-based framework that includes any artificial intelligence network or system, neural network or system and/or any training or learning models implemented thereon or therewith. Further, the term module may comprise a hardware- or software-based framework that performs one or more functions. In some embodiments, the module may be implemented on one or more neural networks. Also, the term autonomous or autonomously may be used to describe operations performed without human intervention or input, such as operations performed by a computing device, network, module, neural network engine, or combinations thereof. For purposes of this discussion, real-time may be defined as any process which has an immediate request, processing, and response workflow. In some embodiments, real-time may also be defined as a process that occurs while an application or service is running (e.g., during runtime).
[0019] As previously noted, autonomous robots have experienced rapid technological advancement due, in large part, to advancements in artificial intelligence (AI). Generally, autonomous robots (or autonomous vehicles, as one example) may be equipped with, or otherwise in communication with, a plurality of sensors that provide vast amounts of information about the vehicle and/or a surrounding environment. AI models can be used to process these vast amounts of data, collected and/or provided by the sensors, in real-time to provide for autonomous robot control in dynamically changing environments. However, accurate and efficient processing of such data, as well as systems and platforms to develop and test AI models, remains challenging. Thus, existing systems and methods have not proved entirely satisfactory in all respects.
[0020] Embodiments of the disclosure offer advantages over the existing art, though it is understood that other embodiments may offer different advantages, not all advantages are necessarily discussed herein, and no particular advantage is required for all embodiments. For example, embodiments discussed herein include systems and methods for controlling self-driving robots, thereby effectively overcoming various shortcomings of existing implementations. In various implementations, and for purposes of the present disclosure, autonomous robots (or robotic devices) may have the form of a toy car. However, it will be understood that aspects of the present disclosure are not limited to robotic devices having a particular form. In some embodiments, autonomous robots may have alternate forms such as robots suitable for use as a humanoid robot, industrial robot, service robot, factory robot, military robot, mining robot, drone, automobile, boat, security monitoring device, medical device, or other type of vehicle or robotic device.
[0021] In some embodiments, the disclosed autonomous robot may include or be communicatively coupled to multiple sensors. In some cases, the sensors, or other components of the robot, may be attached to the robot using custom 3D printed components. By way of example, and in some embodiments, the sensors may include a camera, a depth camera, radar, LiDAR, GPS, inertial sensors, other suitable sensors, or combinations thereof. Generally, the data provided by the sensors may be used to identify the location of the autonomous robot, the environment surrounding the autonomous robots (e.g., including obstacles in the environment), and predict or otherwise control the steering, throttle, speed, direction, etc., of the autonomous robot. In some embodiments, the sensor data may be provided to a computing device that includes a neural network engine and an automatic drive engine. In some examples, the neural network engine may include multiple neural network models and an agent module. For purposes of the present disclosure, the neural network models may include machine learning (ML) models, deep learning (DL) models, and/or other suitable AI models, as discussed in more detail below. Each neural network model may receive a subset of data from one or more of the sensors and generate an output that includes steering and/or throttle data. In some cases, the subset of data received by each neural network model may be different from subsets of data received by the other neural network models as each subset of data may include data from different combinations of sensors. In some embodiments, the agent module may select an output of one of the neural network models and transmit the output to the automatic drive engine. The automatic drive engine, in turn, may convert the steering and/or throttle data in the output into control data for controlling the autonomous robot. In various embodiments, this process may repeat iteratively to control the driving direction and speed of the autonomous robot.
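As a non-limiting illustration of the control flow just described, the following Python sketch shows sensor data being routed to multiple models, one output being selected by an agent module, and that output being converted into control signals; the names control_step, agent.select, drive_engine.convert, and robot.apply are hypothetical stand-ins rather than the actual interfaces.

```python
# Illustrative sketch of one control iteration; the data structures and the
# agent/drive-engine interfaces are hypothetical stand-ins for the neural
# network engine, agent module, and automatic drive engine described above.

def control_step(sensor_data, models, agent, drive_engine, robot):
    # Each model receives a (possibly different) subset of the sensor data
    # and produces an output, e.g., {"steering": ..., "throttle": ...}.
    outputs = []
    for model, sensor_keys in models:
        subset = {key: sensor_data[key] for key in sensor_keys}
        outputs.append(model(subset))

    # The agent module selects one output (e.g., based on a target task or
    # a performance metric of the models).
    selected = agent.select(outputs)

    # The automatic drive engine converts steering/throttle data into
    # control signals used to drive the robotic device.
    control_signals = drive_engine.convert(selected)
    robot.apply(control_signals)
```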
[0022] Additional embodiments and advantages will become evident in the discussion that follows and with reference to the accompanying figures. As one example, embodiments of the present disclosure provide a client graphical user interface (GUI) that is compatible with, and which can communicate with, diverse hardware systems mounted on and used to control the autonomous robot. This may be particularly useful for testing AI models across diverse hardware systems and/or for providing a user with the option to choose a particular hardware system based on availability or preference.
[0023] Referring now to
[0024] In some embodiments, the system 100 may include one or more client devices 102, computing devices 104, and robotic devices 106. Although only one or two of each is illustrated, it will be understood that various embodiments may include any number of client devices 102, computing devices 104, and robotic devices 106. In various examples, the client devices 102, computing devices 104, and robotic devices 106 are coupled to a network 108, for example by way of network communication devices, as discussed below. As shown, some of the computing devices 104 and robotic devices 106 may be separate devices, each independently coupled to the network 108. In other examples, the robotic device 106 may include the computing device 104 (e.g., the computing device 104 may be mounted onto or otherwise coupled to the robotic device 106), or be a single device 106/104, where the combination robotic device 106/computing device 104 is then coupled to the network 108.
[0025] In some embodiments, the client devices 102 may include any type of computing device such as a laptop, a desktop, a mobile computing device, a smart phone, a tablet, a PC, a wearable computing device (e.g., such as a smart watch, virtual reality headset, eyeglasses that incorporate computing devices, implantable computing devices, etc.), and/or any other computing device having computing and/or communications capabilities in accordance with the described embodiments. Client devices 102 may be operated by a user and may be configured to transmit, receive, manipulate data, execute various applications, and communicate with other devices connected to the network 108.
[0026] By way of example, client devices 102 generally may provide one or more client programs, such as system programs and application programs to perform various computing and/or communications operations. Example system programs may include, without limitation, an operating system (e.g., MICROSOFT OS, UNIX OS, LINUX OS, macOS, JavaOS, and others), run-time environments (e.g., such as for NVIDIA Jetson computing boards, Google Coral, Intel Movidius Neural Compute Stick, BeagleBone AI, Xilinx Zynq UltraScale+ MPSoC, Qualcomm Snapdragon, Arduino Portenta H7), device drivers, programming tools, utility programs, software libraries, application programming interfaces (APIs), and so forth. Example application programs may include, without limitation, an autonomous robot or vehicle control application, a web browser application, messaging application, contacts application, calendar application, electronic document application, database application, media application (e.g., music, video, television), location-based services (LBS) application (e.g., GPS, mapping, directions, positioning systems, geolocation, point-of-interest, locator) that may utilize hardware components such as an antenna, and so forth. One or more of client programs may display various graphical user interfaces (GUIs) to present information to and/or receive information inputted by one or more users of client devices 102.
[0027] As shown, client devices 102 are coupled to one or more networks 108, the one or more networks 108 further coupled to robotic devices 106 and computing devices 104. The computing devices 104 may include a neural network engine configured to implement one or more of the functionalities of the various embodiments of the present disclosure, as described in more detail below. In some instances, computing devices 104 may be a server conducive to processing and storing large amounts of data. Similarly, the robotic devices 106 may be configured to implement one or more of the functionalities of the embodiments of the present disclosure, as described below. Further, in various embodiments, the system 100 may be structured, arranged, and/or configured such that the functionalities of the computing device 104, including functionalities provided by the neural network engine, are provided alternatively and/or additionally by one or both of the client devices 102 and the robotic devices 106.
[0028] In some embodiments, the network 108 of the system 100 may be implemented as a single network or as a combination of multiple networks. For example, in various embodiments, the network 108 may include the Internet and/or one or more intranets, landline networks, wireless networks, cellular networks, satellite networks, private or local area networks, wide area networks, and/or other appropriate types of networks. In some examples, the client devices 102, the computing devices 104, and/or the robotic devices 106 may communicate through the network 108 via cellular communication, by way of one or more user network communication devices. In other examples, the client devices 102, the computing devices 104, and/or the robotic devices 106 may communicate through the network 108 via wireless communication (e.g., via a WiFi network), by way of one or more user network communication devices. In yet other examples, the client devices 102, the computing devices 104, and/or the robotic devices 106 may communicate through the network 108 via any of a plurality of other radio and/or telecommunications protocols, by way of one or more user network communication devices. In still other embodiments, the client devices 102, the computing devices 104, and/or the robotic devices 106 may communicate through the network 108 using a Short Message Service (SMS)-based text message, by way of one or more user network communication devices.
[0029] In some embodiments, the robotic devices 106 include one or more computing boards. The computing boards may provide a complete set of computer components integrated onto a single circuit board. For example, in various embodiments, the computing boards may include one or more processors (e.g., CPU and/or GPU), a system on a chip (SoC), a memory, input/output (I/O) interfaces, USB ports, Ethernet ports, wireless networking chips (e.g., such as an IEEE 802.11 chipset, a Bluetooth low-energy module), as well as other appropriate features. By way of example, the computing boards may include a Jetson Nano board, a Raspberry Pi single-board computer, or other type of computing board. In some embodiments, the robotic devices 106 also include one or more microcontroller boards, such as an Arduino board, or other microcontroller board(s). Further, in various cases, the computing boards and microcontroller boards embedded within the robotic devices 106 may be coupled to the network 108, for example by way of network communication devices. In some embodiments, such as when the computing boards are embedded within the robotic devices 106, the computing boards may serve as the computing device 104. In some examples, the robotic devices 106 may include a handheld computing device executing one or more applications to provide for streaming video to a client, remote control, local control, neural network inference, interfacing with the computing boards, data recording, etc. In some embodiments, the robotic devices 106 include, or are communicatively coupled to, a plurality of sensors. In some cases, the sensors, as well as other components (e.g., such as the handheld computing device), may be attached to the robotic devices 106 using custom 3D printed components. As noted above, the sensors may include a camera, a depth camera, radar, LiDAR, GPS, inertial sensors, other suitable sensors, or combinations thereof. In various examples, the sensors may be coupled to the robotic devices 106 in various locations, including front, back, sides, top, and bottom of the robotic devices 106, or be included within the computing device 104. In some embodiments, the data provided by the sensors may be communicated to the computing boards. Alternatively, in some embodiments, the data provided by the sensors may be communicated to the client devices 102 and/or the computing device 104, over the network 108, to control the robotic device 106.
[0030] In some embodiments, the computing device 104 may be a server or another computing device that includes software and hardware for processing large amounts of data in real-time. As described in more detail below, the computing device 104 may include a neural network engine including a plurality of neural network models. In some embodiments, the computing device 104 may provide and execute one or more programs, such as operating system (e.g., MICROSOFT OS, UNIX OS, LINUX OS, macOS, JavaOS, and others), run-time environments (e.g., such as for NVIDIA Jetson computing boards), device drivers, programming tools, utility programs, software libraries, APIs, and so forth. In some embodiments, the computing device 104 includes a database used to store and maintain various types of information for use by the system 100 and may comprise or be implemented by various types of computer storage devices (e.g., servers, memory) and/or database structures (e.g., relational, object-oriented, hierarchical, dimensional, network) in accordance with the described embodiments. It can be appreciated that the computing device 104 may be deployed in other ways and that the operations performed and/or the services provided by the computing device 104 may be combined or separated for a given implementation and may be performed by a greater number or fewer number of servers. Further, as previously noted, the operations performed and/or the services provided by the computing device 104 may be performed by one or both of the client devices 102 and the robotic devices 106. Thus, in at least some cases, one or both of the client devices 102 and the robotic devices 106 may include the computing device 104. For instance, as shown in
[0031] For a better understanding of the various embodiments disclosed herein, reference is now made to
[0032] In each of the implementations of
[0033] As shown in each of
[0034] In some embodiments, such as shown in the implementations of
[0035] In various embodiments, and as shown in the implementations of
[0036] In some embodiments, and with reference to the various implementations of
[0037] Elaborating on aspects of at least some of the hardware components, and with reference to
[0038] In accordance with various embodiments of the disclosure, computer system 300, such as a computer and/or a server, includes various resources, such as a bus 302 or other communication mechanism for communicating information, which interconnects subsystems and components, such as a processing component 304 (e.g., processor, micro-controller, digital signal processor (DSP), graphics processing unit (GPU), etc.), a system memory component 306 (e.g., RAM), a static storage component 308 (e.g., ROM), a disk drive component 310 (e.g., magnetic, optical, or solid-state), a network interface component 312 (e.g., modem, Ethernet card, IEEE 802.11 networking component, or a Bluetooth low-energy module), a display component 314 (e.g., CRT or LCD), an input component 318 (e.g., keyboard, keypad, or virtual keyboard), a cursor control component 320 (e.g., mouse, pointer, or trackball), a location determination component 322 (e.g., a Global Positioning System (GPS) device as illustrated, a cell tower triangulation device, and/or a variety of other location determination devices known in the art), and/or a camera component 323. In one implementation, the disk drive component 310 may comprise a database having one or more disk drive components.
[0039] In accordance with embodiments of the disclosure, the computer system 300 performs specific operations by the processing component 304 executing one or more sequences of instructions contained in the memory component 306, such as described herein with respect to the client devices, computing devices, and robotic devices. Such instructions may be read into the system memory component 306 from another computer readable medium, such as the static storage component 308 or the disk drive component 310. In other embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement embodiments of the disclosure.
[0040] Logic may be encoded in a computer readable medium, which may refer to any medium that participates in providing instructions to the processing component 304 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. In one embodiment, the computer readable medium is non-transitory. In various implementations, non-volatile media includes optical or magnetic disks, such as the disk drive component 310, volatile media includes dynamic memory, such as the system memory component 306, and transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise the bus 302. In one example, transmission media may take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
[0041] Some common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, carrier wave, or any other medium from which a computer is adapted to read. In one embodiment, the computer readable media is non-transitory.
[0042] In various embodiments of the disclosure, execution of instruction sequences to practice the disclosure may be performed by the computer system 300. In various other embodiments of the disclosure, a plurality of the computer systems 300 coupled by a communication link 324 to the network 108/208 (e.g., such as a LAN, WLAN, PSTN, and/or various other wired or wireless networks, including telecommunications, mobile, and cellular phone networks) may perform instruction sequences to practice embodiments of the disclosure in coordination with one another.
[0043] The computer system 300 may transmit and receive messages, data, information and instructions, including one or more programs (i.e., application code) through the communication link 324 and the network interface component 312. The network interface component 312 may include an antenna, either separate or integrated, to enable transmission and reception via the communication link 324. Received program code may be executed by processing component 304 as received and/or stored in disk drive component 310 or some other non-volatile storage component for execution.
[0044] Where applicable, various embodiments provided by the disclosure may be implemented using hardware, software, or combinations of hardware and software. Also, where applicable, the various hardware components and/or software components set forth herein may be combined into composite components comprising software, hardware, and/or both without departing from the scope of the disclosure. Where applicable, the various hardware components and/or software components set forth herein may be separated into sub-components comprising software, hardware, or both without departing from the scope of the disclosure. In addition, where applicable, it is contemplated that software components may be implemented as hardware components and vice-versa.
[0045] Software, in accordance with the disclosure, such as program code and/or data, may be stored on one or more computer readable mediums. It is also contemplated that software identified herein may be implemented using one or more general purpose or specific purpose computers and/or computer systems, networked and/or otherwise. Where applicable, the ordering of various steps described herein may be changed, combined into composite steps, and/or separated into sub-steps to provide features described herein.
[0046] Referring now to
[0047] As shown in the example of
[0048] For purposes of illustration,
[0049] Returning to
[0050] In some embodiments, the drive window 502 may further include a status window 510 that includes radio buttons, toggle buttons, or other graphical control elements, to indicate a status of, and in some embodiments to provide selection of, various features and/or functions of the system 200. In the present example, the status window 510 provides radio buttons to indicate an ON/OFF state of the system and/or to turn the system ON or OFF, to indicate a status of a record feature and/or to activate or deactivate the record feature, and to indicate a status of an autodrive feature and/or to activate or deactivate the autodrive feature. In some embodiments, when the record feature is active, the client application may save (e.g., at the client device 202, at the robotic device 206, at the computing device 204, at the handheld computing device 234, at the computing board 236, or a combination thereof) data collected by the sensors 210-218 (including sensors 210A, 212A, 216A, 218A) and/or data generated by user inputs or the neural network engine 222/automatic drive engine 220 (e.g., such as image data, throttle data, steering data, controller data, radar data, LiDAR data, and/or other data). In some embodiments, when the autodrive feature is active, the robotic device 206 may be controlled using the computing device, including the neural network engine 222 and the automatic drive engine 220, as previously described. Additional details regarding the autodrive feature are provided below with reference to
[0051] The drive window 502 may also include a custom view window 512. In some embodiments, the custom view window 512 may display a cropped view of frames/images that are to be subsequently normalized and provided as inputs to a model (e.g., such as a neural network model) which generates an output that includes steering and/or throttle data. Alternatively, in some cases, the custom view window 512 may display an image captured by the depth camera 212. The desired display of the custom view window 512 may be selectable using a drop-down menu 514, as shown.
[0052] As noted above, the client application interface may be created using Python (e.g., PySide6 and Qt for Python). As such, in some embodiments, the drive window 502 may use multiple Python threads (e.g., two threads) for control. For example, one of the threads is for updating the camera (e.g., including the camera feed portion 504) and one for the controller (e.g., selectable via the drop-down menu 508). Both of these threads work independently of each other and provide for concurrent execution. In some examples, a Python object is also used to instantiate a handler, controller, and camera, all of which are used conjointly to provide the user with the necessary capabilities to drive the robotic device 206 remotely. In some embodiments, the handler can be used to handle both the controllers (e.g., selectable via the drop-down menu 508) and the robotic devices 206 regardless of particular configuration (e.g., type of handheld computing device 234 or computing board 236). Thus, as one example, regardless of a particular platform (e.g., such as iOS, Android, Jetson run-time) that a user selects, all of the controllers available via the drop-down menu 508 will work seamlessly with the selected platform. As a result, it may not be necessary to write custom code for each controller, adding new controllers or new platforms is fairly simple, and the system can be more easily maintained.
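For illustration, a minimal sketch of such a two-thread arrangement, assuming PySide6 and hypothetical helper functions grab_frame() and read_controller(), might look as follows; the actual client application may organize its threads and objects differently.

```python
# Minimal two-thread sketch assuming PySide6; grab_frame() and
# read_controller() are hypothetical helpers, not part of the actual client.
from PySide6.QtCore import QThread, Signal

class CameraThread(QThread):
    frame_ready = Signal(object)            # emits the latest camera frame

    def run(self):
        while not self.isInterruptionRequested():
            self.frame_ready.emit(grab_frame())

class ControllerThread(QThread):
    command_ready = Signal(float, float)    # emits (steering, throttle)

    def run(self):
        while not self.isInterruptionRequested():
            steering, throttle = read_controller()
            self.command_ready.emit(steering, throttle)
```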
[0053] With reference to
[0054] With reference to
[0055] Turning to the use of neural networks to control the robotic device 206, and as noted above, the handheld computing device 234 may execute one or more applications to provide for neural network inference and training, among others. In some embodiments, when an Android OS platform is used (e.g., such as when the handheld computing device 234 includes an Android handheld computing device), the handheld computing device 234 may utilize a first machine learning platform including a Tensorflow/Pytorch application for training and inference of the neural network models 224. In some embodiments, when an Apple iOS platform or a Jetson run-time environment is used (e.g., such as when the handheld computing device 234 includes an Apple handheld computing device or when the computing board 236 includes a NVIDIA Jetson computing board), the handheld computing device 234 and/or the computing board 236 may utilize a second machine learning platform including a Pytorch application for training and inference of the neural network models 224. It will be understood that in other embodiments, different machine learning platforms may be used without departing from the scope of the present disclosure.
[0056] With respect to the neural network training and deployment stack, and in various embodiments, different strategies may be used to enhance the output of the neural network models (e.g., such as the neural network models 224). For instance, in some cases, the neural network engine may be trained on a training dataset that includes images and/or sensor data collected using sensors 210-218 or sensors 210A, 212A, 216A, 218A. In some embodiments, training may be performed using a remote device (e.g., such as the cloud-based computing device 204), and the trained model(s) may be downloaded to a handheld computing device 234 or a computing board 236 for local execution. In some cases, data augmentation may be used to increase and/or supplement data in a training set for a neural network model, to increase its robustness and performance, and to cover various test cases that are not included in the original training dataset. Data augmentation may be used in behavior cloning and may be applied to various image types. Data augmentation may generate new training images by applying various transformations to an original set of images. By way of example, such transformations may include geometric transformations (e.g., translation, cropping, flipping, and rotation), random erasing, and photometric transformations (e.g., lighting and color adjustments), among others. It is also noted that before being used to train a neural network model, the input data (e.g., which may include frames/images, for instance) should also be normalized to provide faster model convergence, prevent gradient issues, and improve model performance.
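By way of illustration only, an augmentation and normalization pipeline of this kind could be sketched with torchvision transforms as below; the specific transforms, parameters, and normalization statistics are assumptions rather than the exact training configuration.

```python
# Illustrative augmentation/normalization pipeline; transform choices and
# parameters are assumptions, not the exact training configuration.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                  # geometric: flipping
    transforms.RandomRotation(degrees=10),                   # geometric: rotation
    transforms.ColorJitter(brightness=0.3, contrast=0.3),    # photometric: lighting/color
    transforms.ToTensor(),                                    # convert image to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],          # normalize inputs for
                         std=[0.229, 0.224, 0.225]),          # faster convergence
    transforms.RandomErasing(p=0.2),                          # random erasing (on tensors)
])
```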
[0057] Referring to
[0058] Referring to
[0059] Referring to
[0060] Referring to
[0061] As previously discussed, the neural network models used in the disclosed embodiments may include a backbone model composed of various CNNs such as PilotNet, MobileNet, Resnet50, VGG19, VGG16, Alexnet, YOLO, or other suitable CNNs. In some cases, one or more multi-layer perceptrons (MLPs) are added to the backbone model. In some embodiments, transfer learning (applying a pre-trained model to another related task) is used to initialize the model. By way of example, transferred weights may come from a model pre-trained using an image database for training (e.g., such as ImageNet). If transfer learning is used, there are several ways to implement the transfer learning, such as: (i) fixing the backbone model and training only the MLPs; (ii) training the whole model together (copying the weights only); or (iii) fixing only the first several layers of the backbone model.
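A minimal sketch of option (i), assuming PyTorch/torchvision, a ResNet50 backbone pre-trained on ImageNet, and an illustrative MLP head that regresses steering and throttle, might look as follows.

```python
# Sketch of transfer learning option (i): freeze the backbone, train only the
# MLP head; layer sizes and the weight choice are illustrative assumptions.
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights="IMAGENET1K_V1")   # pre-trained backbone

for param in backbone.parameters():                   # fix the backbone weights
    param.requires_grad = False

backbone.fc = nn.Sequential(                          # trainable MLP head
    nn.Linear(backbone.fc.in_features, 256),
    nn.ReLU(),
    nn.Linear(256, 2),                                # outputs: [steering, throttle]
)
```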
[0062] Expanding on the neural network models that may be implemented in various embodiments of the present disclosure,
[0063] The neural network models 224 may comprise a neural network architecture. The example neural network architecture may comprise an input layer 1302, one or more hidden layers 1304 and an output layer 1306. The neural network models 224 may be built as a collection of connected units or nodes, referred to as neurons 1308. Each layer 1302, 1304, or 1306 may comprise the same or different number of neurons or nodes 1308, with neurons between layers being interconnected according to a specific topology. Each neuron 1308 may be associated with an adjustable weight. The neurons 1308 may be aggregated into layers 1302, 1304, 1306 such that different layers may perform different transformations on the respective input to generate a transformed output, which is an input for the subsequent layer. Further, different layers in a neural network model may be combined into their own neural network models, such that an output layer of one neural network model is an input into the next neural network model, until a final output layer 1306 is reached.
[0064] Input layer 1302 receives input data, such as a subset of data from one or more of the sensors 210-218 or sensors 210A, 212A, 216A, 218A, including structured, numerical data, and/or images. The number of nodes (neurons) in the input layer 1302 may be determined by the dimensionality of the input data (e.g., the length of a vector of a given example of the input). Each node 1308 in the input layer 1302 may represent a feature or attribute of the input.
[0065] The hidden layers 1304 are intermediate layers located between the input and output layers 1302, 1306 of the neural network models 224. Although three hidden layers 1304 are shown, there may be any number of hidden layers in the neural network models 224. Hidden layers 1304 may extract and transform the input data through a series of weighted computations and activation functions associated with individual neurons.
[0066] For example, the neural network models 224 may receive input (e.g., such as a subset of data from one or more of the sensors 210-218, including sensors 210A, 212A, 216A, 218A) at input layer 1302 and generate a classifier that is an output of output layer 1306. To perform the transformation, each neuron 1308 receives input signals (which may be the input to a neural network model or the output of the preceding layer), performs a weighted sum of the inputs according to weights assigned to each connection and then applies an activation function associated with the respective neuron 1308 to the result. The output of the neuron is passed to the next layer of neurons or serves as the final output of the network. The activation function may be the same or different across different layers 1302, 1304, 1306, and may be different at neurons 1308 within each layer. Example activation functions include, but are not limited to, Sigmoid, hyperbolic tangent, Rectified Linear Unit (ReLU), Leaky ReLU, Softmax, and/or the like. In this way, input data received at the input layer 1302 is transformed by hidden layers 1304 into different values indicative of data characteristics corresponding to a task that the neural network model has been trained to perform.
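The per-neuron computation described above can be illustrated with a short NumPy sketch; the input values, weights, and bias below are arbitrary examples.

```python
# Single-neuron sketch: weighted sum of inputs plus bias, then an activation
# function (ReLU here); the numbers are arbitrary illustrative values.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([0.2, -1.0, 0.5])        # inputs (from the preceding layer)
w = np.array([0.7, 0.1, -0.4])        # weights assigned to each connection
b = 0.05                              # bias

output = relu(np.dot(w, x) + b)       # passed to the next layer (or the final output)
```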
[0067] The output layer 1306 is the final layer of the neural network structure. It produces the network's output or prediction based on the computations performed in the preceding layers (e.g., 1302, 1304). The number of nodes in the output layer depends on the nature of the task being addressed. For example, in a binary classification problem, the output layer may consist of a single node representing the probability of belonging to one class. In a multi-class classification problem, the output layer may have multiple nodes, each representing the probability of belonging to a specific class. In the embodiments discussed herein, an output of output layer 1306 may comprise steering data and/or throttle data for controlling a robotic device (e.g., such as the robotic device 206).
[0068] The above neural network structure of neural network models 224, including the structure and number of hidden layers 1304, may be adjusted to improve accuracy, speed, and throughput of neural network models 224. For example, one of neural network models 224 may have fewer hidden layers 1304, take up less memory space, be less accurate, and may have faster processing speed, and may be suitable for the handheld computing device 234 discussed in
[0069] Neural network models 224 may also be implemented by hardware, software and/or a combination thereof. For example, the neural network models 224 may comprise a specific neural network structure implemented and run on various hardware platforms, such as but not limited to CPUs (central processing units), GPUs (graphics processing units), FPGAs (field-programmable gate arrays), Application-Specific Integrated Circuits (ASICs), dedicated AI accelerators like TPUs (tensor processing units), and specialized hardware accelerators designed specifically for the neural network computations described herein, and/or the like. Example specific hardware for neural network structures may include, but is not limited to, Google Edge TPU, Deep Learning Accelerator (DLA), NVIDIA AI-focused GPUs, and/or the like. The hardware used to implement the neural network structure may be specifically configured based on factors such as the complexity of the neural network, the scale of the tasks (e.g., training time, input data scale, size of training dataset, etc.), and the desired performance.
[0070] The neural network models 224 may be trained by iteratively updating the underlying weights of the neurons 1308, bias parameters, and/or coefficients in the activation functions associated with the neurons 1308. The weights may be updated based on a loss function, such as a mean squared estimation error (MSEE), cross-entropy loss, log-loss, and the like. For example, during training, training data, such as historical signals, is fed into the neural network models 224 over thousands of iterations. The training data flows through the network's layers 1302, 1304, 1306, with each layer performing computations based on its weights, biases, and activation functions until the output layer 1306 produces the output.
[0071] The training data may be labeled with an expected output (e.g., a ground-truth label, such as the steering and/or throttle values recorded for the corresponding input). The output generated by the output layer 1306 is compared to the expected output from the training data to compute a loss function that measures the discrepancy between the predicted output and the expected output. In some embodiments, the negative gradient of the loss function may be computed with respect to the weights of each layer individually. This negative gradient is computed one layer at a time, iteratively backward from the output layer 1306 to the input layer 1302 of the neural network model. These gradients quantify the sensitivity of the network's output to changes in the parameters. The chain rule of calculus is applied to efficiently calculate these gradients by propagating the gradients backward (in a back propagation network) from the output layer 1306 to the input layer 1302.
[0072] Parameters of the neural network are updated backwardly from the output layer to the input layer (backpropagating) based on the computed negative gradient using an optimization algorithm to minimize the loss. The backpropagation from the last layer (output layer 1306) to the input layer 1302 may be conducted for a number of training samples in a number of iterative training epochs. In this way, parameters of the neural network models 224 may be gradually updated in a direction to result in a lesser or minimized loss, indicating the neural network has been trained to generate a predicted output value closer to the target output value with improved prediction accuracy. Training may continue until a stopping criterion is met, such as reaching a maximum number of epochs or achieving satisfactory performance on the validation data. In a multiple neural network embodiment, the neural network models may be trained separately and then combined together and trained as a single neural network model.
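For illustration, a training loop of the kind described above could be sketched in PyTorch as follows; the model, data loader, loss function, and hyperparameters are assumptions rather than the exact training configuration.

```python
# Illustrative PyTorch training loop; `model`, `train_loader`, and the
# hyperparameters are assumed to exist and are not the exact configuration.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()                      # e.g., loss on steering/throttle

for epoch in range(10):                           # illustrative number of epochs
    for inputs, targets in train_loader:          # targets: expected outputs (ground truth)
        predictions = model(inputs)               # forward pass through the layers
        loss = loss_fn(predictions, targets)      # discrepancy vs. expected output

        optimizer.zero_grad()
        loss.backward()                           # backpropagate gradients layer by layer
        optimizer.step()                          # update parameters to reduce the loss
```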
[0073] Neural network parameters may be trained over multiple stages. For example, initial training (e.g., pre-training) may be performed on one set of training data, and then an additional training stage (e.g., fine-tuning) may be performed using a different set of training data. In some embodiments, all or a portion of parameters of one or more neural-network models being used together may be frozen, such that the frozen parameters are not updated during that training phase. This may allow, for example, a smaller subset of the parameters to be trained without the computing cost of updating all the parameters.
[0074] Therefore, the training process transforms the neural network into an updated trained neural network with updated parameters such as weights, activation functions, and biases. The trained neural network thus improves neural network technology for generating outputs, such as steering and/or throttle data, for controlling a robotic device.
[0075] Once training is complete, the trained neural network models 224 may enter an inference stage where the neural network models 224 may be used to make predictions on new, unseen data, such as generating steering and/or throttle data from newly received sensor data.
[0076] In various embodiments, and with respect to deployment of the neural network model(s), the neural network model(s) may be converted prior to deployment (e.g., within the system 200). In some embodiments, an intermediate model exchange format such as ONNX (Open Neural Network Exchange) is used to perform the conversion of the neural network model(s). By way of example, a model in Tensorflow/Pytorch format may be converted (e.g., by the intermediate model exchange format) to a Tensorflow Lite (tflite) or PyTorch Mobile format, which is more suitable for inference and edge devices such as present in the system 200. In another example, a model in Pytorch format may be converted (e.g., by the intermediate model exchange format) to a TorchScript format, a TensorRT format, or another suitable format. In some cases, if the neural network model does not support conversion using the intermediate model exchange format (e.g., ONNX), then other options are available. In one example, the backbone model may be inspected and rewritten (at least in part) so that the model is compatible with the intermediate model exchange format. In another example, it may be possible to directly convert some neural network model(s) while bypassing the intermediate model exchange format.
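As an illustration, converting a trained PyTorch model via ONNX or directly to TorchScript could look like the following sketch; the file names, input shape, and opset version are assumptions.

```python
# Illustrative model conversion for deployment; file names, input shape, and
# opset version are assumptions.
import torch

model.eval()
example_input = torch.randn(1, 3, 120, 160)            # illustrative input shape

# Conversion through the intermediate model exchange format (ONNX).
torch.onnx.export(model, example_input, "model.onnx", opset_version=17)

# Direct conversion to TorchScript, bypassing the exchange format.
scripted = torch.jit.trace(model, example_input)
scripted.save("model_torchscript.pt")
```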
[0077] With reference now to
[0078] Referring to
[0079] In the present example, the model layer includes four (4) different models (Model #1, Model #2, Model #3, and Model #4). However, it will be understood that in other embodiments, more or fewer models may be used. The different models in the model layer may include a variety of different types of models configured to perform a variety of different tasks. In some examples, the various models may include various CNNs, such as previously described. In some cases, the various models may be configured to perform tasks such as image classification, determining steering angles and/or throttle, behavior cloning, etc. In some embodiments, two or more of the various models may be configured to perform the same task, where each of the two or more models may have different performance metrics such as different latency (e.g., amount of time to produce a prediction for a single input), different throughput (e.g., total number of predictions for a given amount of time), and/or different accuracy (e.g., such as different values of mean average precision, mAP, or other accuracy metric). In addition, the various models may be trained, or otherwise modified or optimized, using one or more of the methods described above. As shown, and in various embodiments, each neural network model (Model #1, Model #2, Model #3, and Model #4) of the plurality of neural network models 224 may receive a subset of data from one or more of the sensors. In the illustrated example, Model #1 receives depth camera data from the depth camera 212, Model #2 receives radar data from the radar 214 and image/video data from the camera 210, Model #3 receives image/video data from the camera 210, and Model #4 receives image/video data from the camera 210 and LiDAR data from the LiDAR 216. In turn, each of the neural network models (Model #1, Model #2, Model #3, and Model #4) generates an output that includes steering data and/or throttle data. It is noted that the subset of data received by each neural network model may be different from subsets of data received by the other neural network models as each subset of data may include data from different combinations of sensors. Further, the exemplary subsets of data shown in
[0080] In various embodiments, a user may seamlessly switch between remote (or local) control of the robotic device 206 (e.g., as described with respect to the data flow diagram 1400 of
[0081] Elaborating on data provided by the camera 210 and collected by the data collector (
[0082] Data path 1602 includes an encoder 1606, a packetizer 1608, and a transport socket 1610. The encoder 1606 performs encoding of the video and/or image data. In some embodiments, the encoder 1606 may include an H.264 codec (also referred to as advanced video coding, AVC). In other cases, the encoder 1606 may include an H.265 codec (also referred to as high efficiency video coding, HEVC). After encoding of the video and/or image data, the packetizer 1608 may create real-time transport protocol (RTP) packets using the encoded video and/or image data. In various examples, use of an RTP protocol provides for a low-latency stream. After creating the RTP packets, the transport socket 1610 may be used to send the RTP packets to the client device 202, for example, for display of a video stream and/or images via the GUI 230 (or the GUI 400) of the client device 202. In some embodiments, the transport socket 1610 may include a user datagram protocol (UDP) socket. Alternatively, in some cases, the transport socket 1610 may include a transmission control protocol (TCP) socket.
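One possible way to realize data path 1602 (encoder, packetizer, and transport socket) is with a GStreamer pipeline such as the sketch below; GStreamer itself is not required by the disclosure, and the element choices, host address, and port are assumptions.

```python
# Illustrative GStreamer launch string for data path 1602; the elements,
# host, and port are assumptions, not the required implementation.
CLIENT_HOST = "192.168.1.50"      # hypothetical address of the client device
CLIENT_PORT = 5600                # hypothetical RTP/UDP port

pipeline = (
    "v4l2src ! videoconvert ! "
    "x264enc tune=zerolatency bitrate=2000 ! "          # encoder 1606 (H.264/AVC)
    "rtph264pay config-interval=1 pt=96 ! "             # packetizer 1608 (RTP)
    f"udpsink host={CLIENT_HOST} port={CLIENT_PORT}"    # transport socket 1610 (UDP)
)
```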
[0083] Data path 1604 may include a shared memory 1612. In various embodiments, the video and/or image data may thus also be provided to the shared memory 1612 for storage therein. In some embodiments, the shared memory 1612 includes memory that resides in the handheld computing device 234, the computing board 236, or the computing device 204 such that the video and/or image data is stored in the memory disposed within the handheld computing device 234, the computing board 236, or the computing device 204. Alternatively, or additionally, the shared memory 1612 may include the database 228, in some embodiments. In yet other embodiments, the shared memory 1612 may include memory that resides at another local or remote location and is in communication with the data collector (
[0084] In view of the above discussion, and with reference to
[0085] The method 1700 begins at block 1702 where data from a plurality of sensors (e.g., such as the sensors 210-218, including sensors 210A, 212A, 216A, 218A) associated with a robotic device (e.g., such as the robotic device 206) are received by a plurality of neural network models 224. In some embodiments, one or more of the neural network models 224 may also receive video and/or image data from the shared memory 1612, as described above. In various embodiments, the plurality of neural network models 224 may be configured to perform a variety of tasks, and the various models may be trained, or otherwise modified or optimized, using one or more of the methods described above. In some embodiments, each neural network model of the plurality of neural network models 224 may receive a subset of data from a different subset of sensors of the plurality of sensors (and in some cases from the shared memory 1612).
[0086] The method 1700 then proceeds to block 1704 where a plurality of outputs are generated by the plurality of neural network models 224 based on the data received from the plurality of sensors (and in some cases from the shared memory 1612). In various embodiments, each particular neural network model of the plurality of neural network models 224 may generate an output that corresponds to a respective subset of data received by the particular neural network model of the plurality of neural network models 224. In some embodiments, the output generated by each of the plurality of neural network models 224 may include steering data and/or throttle data.
[0087] The method 1700 then proceeds to block 1706 where an output is selected from a plurality of outputs. In particular, and in some cases, the outputs from each of the plurality of neural network models 224 are provided to an agent module (e.g., such as the agent module 226), and the agent module may select a particular output from the plurality of outputs provided by the plurality of neural network models 224. The agent module may use a variety of different criteria to select the output, in accordance with various embodiments. For example, in some cases, the agent module may select an output based on a target task for the robotic device 206, as described above. In other examples, the agent module may select an output based on the model that has the best performance metric for a given task, as described above. In still other examples, various other criteria may instead be used by the agent module to select the output of one of the neural network models.
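By way of illustration, the selection at block 1706 might be sketched as below, where the output dictionaries, task names, and accuracy metric are hypothetical.

```python
# Illustrative selection logic for the agent module; the output fields,
# task names, and metric are hypothetical.
def select_output(outputs, target_task=None):
    # outputs: list of dicts, e.g.,
    #   {"task": "lane_follow", "accuracy": 0.91, "steering": 0.1, "throttle": 0.4}
    candidates = outputs
    if target_task is not None:
        # Prefer outputs from models configured for the current target task.
        candidates = [o for o in outputs if o.get("task") == target_task] or outputs

    # Among the candidates, choose the model output with the best metric.
    return max(candidates, key=lambda o: o.get("accuracy", 0.0))
```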
[0088] The method 1700 then proceeds to block 1708 where the output selected by the agent module is converted into one or more control signals. In some embodiments, the selected output is provided to an automatic drive engine (e.g., such as the automatic drive engine 220). In various examples, after the automatic drive engine receives the selected output from the agent module, the automatic drive engine may convert the received steering and/or throttle data into control data (e.g., such as ROS control signals, in some cases).
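For illustration, if ROS control signals are used, the conversion at block 1708 might resemble the following sketch, assuming ROS 1 (rospy), a Twist-style command topic, and direct mapping of steering and throttle; the actual automatic drive engine may use different messages, scaling, or middleware.

```python
# Illustrative conversion of steering/throttle data into ROS control signals;
# the node name, topic, message type, and scaling are assumptions.
import rospy
from geometry_msgs.msg import Twist

rospy.init_node("automatic_drive_engine")                 # hypothetical node name
cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)

def publish_control(steering, throttle):
    msg = Twist()
    msg.linear.x = throttle          # forward speed derived from throttle data
    msg.angular.z = steering         # turn rate derived from steering data
    cmd_pub.publish(msg)
```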
[0089] The method 1700 then proceeds to block 1710 where the robotic device 206 is controlled based on the control signals. For instance, in various embodiments, after the automatic drive engine converts the steering and/or throttle data into control data, the control data may be provided to the robotic device 206 to control the robotic device 206. In some cases, it may also be said that the robotic device 206 is controlled by the output selected by the agent module (which is subsequently converted into the one or more control signals). In various embodiments, the steps of the method 1700 may be repeated iteratively to control the driving direction and speed of the robotic device 206, as previously described.
[0090] The foregoing disclosure is not intended to limit the disclosure to the precise forms or particular fields of use disclosed. As such, it is contemplated that various alternate embodiments and/or modifications to the disclosure, whether explicitly described or implied herein, are possible in light of the disclosure. Having thus described embodiments of the disclosure, persons of ordinary skill in the art will recognize that changes may be made in form and detail without departing from the scope of the disclosure. Thus, the disclosure is limited only by the claims.