Dynamic multi-sensor and multi-robot interface system
10179407 · 2019-01-15
Assignee
Inventors
Cpc classification
G06F3/015
PHYSICS
G05B2219/35503
PHYSICS
Y10S901/16
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06F3/011
PHYSICS
Y10S901/09
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06F3/0484
PHYSICS
International classification
G06F3/0484
PHYSICS
Abstract
An adaptive learning interface system for end-users for controlling one or more machines or robots to perform a given task, combining identification of gaze patterns, EEG channel's signal patterns, voice commands and/or touch commands. The output streams of these sensors are analyzed by the processing unit in order to detect one or more patterns that are translated into one or more commands to the robot, to the processing unit or to other devices. A pattern learning mechanism is implemented by keeping immediate history of outputs collected from those sensors, analyzing their individual behavior and analyzing time correlation between patterns recognized from each of the sensors. Prediction of patterns or combination of patterns is enabled by analyzing partial history of sensors' outputs. A method for defining a common coordinate system between robots and sensors in a given environment, and therefore dynamically calibrating these sensors and devices, is used to share characteristics and positions of each object detected on the scene.
Claims
1. A method for generating a common coordinate system between robotic devices and sensors in a given environment comprising: providing at least one processing unit, a first sensor and a second sensor, and at least one robotic device; collecting a sequence of images from said first sensor showing said second sensor and said robotic device; analyzing said sequence of images to uniquely identify said second sensor and said robot device and their relative location and pose; generating a set of conversion parameters for permuting said relative location to location data relative to at least one of said second sensor and said robotic device, wherein said second sensor is a gaze tracker.
2. The method according to claim 1, wherein said first sensor is a digital camera.
3. The method of claim 1, wherein said first sensor is a depth sensor.
4. The method according to claim 3, wherein collecting a sequence of images comprises collecting a depth mapping input stream, and wherein identifying said relative location and pose is performed dynamically on said depth map input stream.
5. The method according to claim 1, wherein determining the relative location and pose further comprises determining coordinates of additional objects detected in said sequence of images.
6. The method according to claim 1, wherein determining the relative location comprises determining coordinates of objects relative to said robotic device.
7. The method of claim 1, wherein said robotic device includes a gripper mounted on a robot end point.
8. The method of claim 7, wherein determining the relative location and pose of said robotic device comprises determining a relative location and pose of said gripper with respect to at least one of said first and said second sensor.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
DESCRIPTION OF EMBODIMENTS OF THE INVENTION
(10) According to some embodiments of the present invention, there is provided a method for defining a common coordinate system between robots and sensors in a given environment. The method comprises collecting a sequence of a plurality of images showing at least one sensor and/or one mechanical device with internal coordinate system, such as a robotic arm, performing an analysis of the sequence of a plurality of images to identify the sensor or machine in the scene and to determine its relative position with regards to the sensor from where the sequence of images was collected, creating a dynamic function and parameters required to transform any given coordinates from/to the generating sensor to/from the detected sensor or machine.
(11) Optionally, the plurality of images are complemented by corresponding depth maps or other special correlating matrices. The analysis includes object recognition algorithms to detect the rest of the sensors and/or robots in the environment.
(12) Optionally, special stickers or markers such as chessboards or barcodes or other recognizable IDs are placed over each of the sensors on the environment and/or over the robot's end points in order to facilitate these algorithms. Orthogonal unit vectors representing the axes of the objects coordinate system can be derived from these special stickers. They assist in describing the rotation of its coordinates with respect to the generating sensor.
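By way of illustration only, the derivation of orthogonal unit vectors from a marker can be sketched as follows. The sketch assumes that three corner points of the marker (an origin corner, a corner along the marker's x edge and a corner along its y edge) have already been located in the generating sensor's 3D coordinates; the function name `marker_axes` and its arguments are hypothetical and do not appear in the original description.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v):
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("degenerate marker: zero-length edge")
    return tuple(x / n for x in v)

def marker_axes(origin, x_corner, y_corner):
    """Return unit (x, y, z) axes of a marker in the generating sensor's frame.

    All three inputs are 3D points measured by the generating sensor
    (hypothetical inputs, e.g. from chessboard corners plus a depth map).
    """
    x_axis = _unit(_sub(x_corner, origin))
    y_raw = _sub(y_corner, origin)
    # Gram-Schmidt step: remove the component of the y edge along x,
    # so that measurement noise does not break orthogonality.
    proj = _dot(y_raw, x_axis)
    y_axis = _unit(tuple(y - proj * x for y, x in zip(y_raw, x_axis)))
    # The z axis completes a right-handed frame.
    z_axis = _cross(x_axis, y_axis)
    return x_axis, y_axis, z_axis
```

The three returned vectors can be used as the columns of a rotation matrix describing the marker's orientation with respect to the generating sensor.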
(13) Optionally, the stickers or markers above are placed on a skin section or sections of an end-user, or on clothes or devices worn by the end-user, in order to determine the end-user's relative position with regard to one or more robots and/or sensors in the environment.
(14) Optionally, color segmentation can be used in each image in order to locate other sensors or mechanical devices. This will be useful in environments where colors of sensors and/or robots are not present in other objects.
(15) Optionally, shape or contrast segmentation can be used where the shapes of robots and/or sensors are unique in the environment.
(16) Optionally, a combination of the above can be used for segmentation purposes in determining the presence and location of other sensors and/or robots in the environment.
(18) 201 External camera with robot arm and scene objects in its field of view
(19) 202 Robotic arm
(20) 203 Robotic arm's gripper
(21) 204 Special identifier of robotic end-point pose for location and identification by other sensors
(22) 205 Objects on scene to be manipulated by robot
(23) Reference is now made to
(24) In an initial step (1), a set of depth maps, images and/or other spatial data is collected. In step (2), algorithms are run over the collected matrices to find other sensors in the field of view and/or mechanical devices such as robots. Search algorithms might be assisted by adding identifiable stickers, each with an encoded ID, on each sensor. These stickers might be chessboard printouts, barcodes, characters and numbers that can be identified through OCR algorithms, or other uniquely identifiable signs. Once the corners of the chessboard are detected in step (4), vectors are constructed for x, y and z coordinates relative to x,y,z coordinates 0,0,0 of the generating sensor. The resulting matrix might include a parallel shift (i.e. x+xdisplacement, y+ydisplacement, z+zdisplacement) and/or a rotational angle displacement (i.e. degrees of rotation for the x coordinate with respect to the x coordinate of the generating sensor, degrees of rotation for y and degrees of rotation for z). This information is stored and used on demand (step 8) each time an object is detected by any of the sensors and needs to be translated to the coordinates of another sensor in the environment or of a robot or mechanical device.
(25) Once a sensor is detected and its position determined with respect to the generating sensor (the one from which the images were collected), a conversion matrix is built to allow the transformation of coordinates from one device to the other and back. This is done by calculating a parallel shift of each axis (x,y,z in 3D or x,y in 2D), and calculating rotation angles and direction for each of these axes. The parallel shift and rotation angle parameters are saved and used to transform coordinates between sensors and/or machines in the environment. For example, if a sensor detects an object, it will determine the x,y,z coordinates of the object within the sensor's coordinate system (where usually the sensor's position is 0,0,0 in its coordinate system). Then, when a robot is required to perform an action on this object, the coordinates of the object are transformed into the coordinate system of the robot itself. This transformation typically utilizes knowledge of the relationship between the sensor and the robot.
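By way of illustration only, the parallel shift and rotation angles described above can be applied as in the following sketch. It assumes a particular convention that the original description does not specify: the rotation is composed as Rz·Ry·Rx from angles in degrees and is applied before the shift. The function names `rotation_matrix`, `sensor_to_robot` and `robot_to_sensor` are hypothetical.

```python
import math

def rotation_matrix(rx_deg, ry_deg, rz_deg):
    """Build R = Rz * Ry * Rx (assumed convention) from per-axis angles in degrees."""
    rx, ry, rz = (math.radians(a) for a in (rx_deg, ry_deg, rz_deg))
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy,     cy * sx,                cy * cx],
    ]

def sensor_to_robot(point, rotation, shift):
    """Map a point from sensor coordinates to robot coordinates: R * p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + shift[i]
        for i in range(3)
    )

def robot_to_sensor(point, rotation, shift):
    """Inverse mapping: R^T * (q - t), valid because R is orthonormal."""
    d = [point[i] - shift[i] for i in range(3)]
    return tuple(
        sum(rotation[j][i] * d[j] for j in range(3))
        for i in range(3)
    )

# Hypothetical stored parameters: robot frame rotated 90 degrees about z
# and shifted by (1, 2, 3) with respect to the generating sensor.
R = rotation_matrix(0.0, 0.0, 90.0)
T = (1.0, 2.0, 3.0)
object_in_robot = sensor_to_robot((1.0, 0.0, 0.0), R, T)
```

Storing only the rotation angles and the shift, as the description suggests, is sufficient to rebuild both directions of the conversion on demand.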
(26) Optionally, the matrices and/or parameters mentioned above describe a single rotation about some axis according to Euler's rotation theorem, using three or more real numbers.
(27) Optionally, quaternions are used to represent the translation and rotation of the detected sensor and/or mechanical device with respect to the generating sensor.
(28) For example, analysis of frame 0 of a video stream plus depth map from a given sensor can identify the location of a second sensor, a gripper, a robotic arm end-point or other devices on the scene. A rotation unit quaternion is calculated based on, for example, three given points that are co-planar to the detected device. Then, the detected device shares its 3D location based on its own 3D coordinate system. For example, a robotic arm can share where the end-point is located according to its own XYZ coordinate system and can also share the rotation of the end-point represented as a rotation angle around each of its base axes. A later frame will again detect the position of these three points in the generating sensor's coordinate system. If the position of one or more of these points has changed according to the generating sensor's coordinate system, the processing unit can estimate the rotation and translation of the robotic end-point with respect to the previous location in robot coordinates by calculating Q × P × Conjugate of Q, Q being the inverse quaternion of the rotation quaternion defined by the three robot end-point points detected in the previous frame with respect to the planes of the sensor, normalized as a unit quaternion, and P being the quaternion that represents the delta displacement from the previous frame to the current frame. The resulting matrix is used to increase/decrease each robot axis coordinate value, shared in the previous frame, in order to know the robot coordinates equivalent to the camera/sensor ones. The rotation of the end-point in robot coordinates is calculated by robQ × CamQ, where robQ is the unit quaternion representing the robot end-point rotation in the original frame, expressed in the robot coordinate system as rotations around each of its base axes, and CamQ is the unit quaternion representing the delta rotation of the three detected points with respect to the previous frame in camera coordinates.
Pre-equivalence between axes might be set up by the end-user by defining, for example, that the X axis coordinate in the sensor's coordinate system will be equivalent to the Z axis in the robot coordinate system.
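By way of illustration only, the quaternion products mentioned above can be sketched as follows. Quaternions are represented as (w, x, y, z) tuples and multiplied with the Hamilton product; the displacement P is a pure quaternion (0, dx, dy, dz). The helper names are hypothetical, and the sketch shows only the generic sandwich product and the composition of two rotations, not the full calibration procedure.

```python
import math

def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def q_conj(q):
    """Conjugate; for a unit quaternion this is also its inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def q_normalize(q):
    n = math.sqrt(sum(c * c for c in q))
    if n == 0.0:
        raise ValueError("zero quaternion cannot be normalized")
    return tuple(c / n for c in q)

def rotate_displacement(q, delta):
    """Compute Q x P x Conjugate(Q), with P = (0, dx, dy, dz)."""
    q = q_normalize(q)
    p = (0.0,) + tuple(delta)
    _, x, y, z = q_mul(q_mul(q, p), q_conj(q))
    return (x, y, z)

def compose_rotation(rob_q, cam_q):
    """Compute robQ x CamQ, the updated end-point rotation."""
    return q_normalize(q_mul(rob_q, cam_q))

# Hypothetical example: a 90 degree rotation about the z axis.
half = math.radians(90.0) / 2.0
Q = (math.cos(half), 0.0, 0.0, math.sin(half))
delta_in_robot = rotate_displacement(Q, (1.0, 0.0, 0.0))
```

In this example a displacement of one unit along the sensor's x axis maps to one unit along the y axis after rotation, and the result would be added to the robot coordinates shared in the previous frame.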
(29) Optionally, the method described above can be used to calibrate a robotic gripper and one or more cameras in the environment dynamically and without user intervention.
(30) Optionally, the method described above can be used to control a robotic arm and bring its gripper and/or endpoint to a given location and in a given rotation calculated based on the camera or camera's coordinate system. In this case, instead of determining the location and rotation from identification of the gripper points in a later frame, these points are calculated according to the processing unit software in the sensors' coordinate system and the method above is used to convert these sensor coordinate values into robot coordinate values.
(31) According to some embodiments of the present invention, there is provided an end-user interface for controlling one or more machines or robots or electrical devices to perform a given task required by an end-user. See
(32) 101 End user
(33) 102 Eye tracker sensor or camera, visually identifiable from external sensors such as the sensor (5) in illustration
(34) 103 Special graphical identifier for unique object pose and location identification by other devices
(35) 104 External camera or sensor with other devices in its field of view
(36) 105 Robot with any combination and quantities of arms, legs, wheels and other actuators and sensors
(37) 106 Cameras or sensors mounted on robot
(38) The end-user can control the robot's actions by moving his/her gaze in the direction of an object in the environment, then selecting through the user interface a given object and the action to be performed on it or with it. The method comprises gathering gaze position data from a plurality of images collected from a sensor that has one or more of the end-user's eyes in its field of view, and position data from one or more sensors in the environment that have the object and the end-user's gaze tracking device in at least one of their fields of view, then enabling a selection capability for the given object by detecting either an eye blink of predetermined time length or a predetermined gaze gesture, and highlighting the object on the screen for feedback purposes. Then, an option selection is enabled by showing on screen to the end-user a list of available actions and allowing the end-user to scroll through them by directing his/her gaze in a given direction or by using any other computer pointing device. Options are highlighted on screen in response to detected gaze movements in the given direction. Finally, a selection capability is enabled by detecting the end-user's blink for a predetermined length of time, or by detecting a predetermined gaze gesture in the tracking history of the center of the end-user's pupil, while an option is highlighted on screen. Optionally, at this point, a processing unit transfers the coordinates of the selected objects to the robot, converted to coordinates that are relative to the robot itself, based on the first method described above. It also transfers the type of actions selected by the end-user to be performed by the robot on the object or with the object.
(39) Optionally, the end-user's pupil home position is set by enabling the end-user to select the set home position option while the pupil is detected at a certain image position.
(40) Optionally, the end-user's pupil home position is set automatically by keeping track of the pupil's position on the images at the initial stage and for a given length of time, and making an average of where the pupil was detected on the image matrix (i.e. BGR matrix retrieved from sensor).
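By way of illustration only, the automatic home position can be sketched as an average of the pupil centers detected during an initial period. The sample format `(timestamp_s, x, y)`, the function name `home_position` and the parameter `settle_s` are assumptions introduced for this sketch.

```python
def home_position(samples, settle_s=2.0):
    """Average the pupil centers seen during the first `settle_s` seconds.

    `samples` is a list of (timestamp_s, x, y) tuples in acquisition order,
    where x, y are the pixel column and row of the detected pupil center.
    """
    if not samples:
        raise ValueError("no pupil samples available")
    t0 = samples[0][0]
    # Keep only samples that fall inside the initial settling period.
    initial = [(x, y) for t, x, y in samples if t - t0 <= settle_s]
    n = len(initial)
    return (sum(x for x, _ in initial) / n,
            sum(y for _, y in initial) / n)

# Hypothetical detections; the last one is outside the settling period.
detections = [(0.0, 100, 50), (0.5, 102, 52), (1.0, 98, 48), (5.0, 300, 300)]
home = home_position(detections)
```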
(41) Optionally, gaze gestures are detected by keeping a history of the pupil's centre coordinates for the last set of images or for a given length of time, in a dynamic way.
(42) Optionally, pupil trajectories that are detected as being similar to an ellipse or a circle by, for example, using the fitEllipse function or the HoughCircles function of the OpenCV library, and that are moving in a clockwise direction, are interpreted as an increase command or as a scrolling command in one direction.
(43) Optionally, pupil trajectories that are similar in shape to a circumference and moving in a counter-clockwise direction are interpreted as a decrease command or as a scrolling command in the opposite direction.
(44) 301 Pupils are detected by searching for circle-like dark shapes that fit within predetermined diameter limits. Pupils' home positions are set.
(45) 302 Pupil displacement is calculated by detecting pixel differences between the pupil's center at the home position and the pupil's center in the current image. Direction and pixel displacement are translated into a robot movement direction and distance to be performed. Movement speed is calculated by detecting the pixel displacements of the centers on each image and using the timestamp of each of the image frames used.
(46) 303 Gaze gestures are recognized by keeping a history of the pupil's center detected through multiple frames, and analyzing trajectory shapes. In this illustration, a counter-clockwise circular type of trajectory is detected after several sequential images are analyzed and the pupil center in each of them is detected.
(47) A home position is set in 301. In 303, circles 1 to 8 illustrate the positions where the pupil was tracked in the last 8 frames. An ellipse type of shape and a counter-clockwise direction are detected in the tracked history.
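By way of illustration only, the direction of a closed pupil trajectory can be estimated from the sign of its enclosed area (the shoelace formula). This is a simplification swapped in for the sketch: it does not fit an ellipse, and it only decides whether the tracked history turns clockwise or counter-clockwise as seen in the image, where the y coordinate grows downwards. The function names and the `min_area_px2` threshold are assumptions.

```python
def signed_area(points):
    """Shoelace formula over a closed polygon of (x, y) pixel positions."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return total / 2.0

def rotation_direction(points, min_area_px2=25.0):
    """Return 'clockwise', 'counter-clockwise' or None for a flat trajectory.

    With image coordinates (y pointing down), a positive shoelace area
    corresponds to a clockwise trajectory as seen on screen.
    """
    if len(points) < 3:
        return None
    area = signed_area(points)
    if abs(area) < min_area_px2:
        return None  # too flat to be a circle-like gesture
    return "clockwise" if area > 0 else "counter-clockwise"

def gesture_command(points):
    """Map the direction to the increase/decrease commands described above."""
    direction = rotation_direction(points)
    return {"clockwise": "increase", "counter-clockwise": "decrease"}.get(direction)
```

A tracked history such as the eight positions of 303 would be passed as `points`; a straight-line trajectory yields an area near zero and is left to other gesture handlers.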
(48) Optionally, pupil's trajectories that are similar in shape to straight lines are interpreted as scrolling command in that direction.
(49) Optionally, using selection options described above, an end-user can increase or decrease the distance that the robot will move for each step. This will work as a virtual-gear, where a given pixel displacement of the pupil's centre is translated into a given spatial displacement of the robot's end point multiplied or divided by a factor that the end-user selects.
(50) Optionally, using the selection options described above, an end-user can increase or decrease the distance that a cursor on screen will move to indicate each step. This will work as a virtual-gear, where a given pixel displacement of the pupil's centre is translated into a given spatial displacement of the cursor on screen multiplied or divided by a factor that the end-user selects.
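By way of illustration only, the virtual-gear behaviour and the displacement and speed calculation of 302 can be sketched as follows. The scale `mm_per_pixel`, the `gear` factor and all function names are assumptions made for this sketch.

```python
import math

def pupil_displacement(home, current):
    """Pixel offset of the current pupil center from the home position."""
    return (current[0] - home[0], current[1] - home[1])

def pupil_speed(p0, t0, p1, t1):
    """Pupil speed in pixels per second between two timestamped frames."""
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("frames must have increasing timestamps")
    return math.hypot(p1[0] - p0[0], p1[1] - p0[1]) / dt

def geared_step(displacement_px, mm_per_pixel, gear):
    """Translate a pixel displacement into a robot (or cursor) step.

    `gear` is the end-user selected factor: values above 1 multiply the
    step, values between 0 and 1 divide it.
    """
    if gear <= 0:
        raise ValueError("gear factor must be positive")
    dx, dy = displacement_px
    return (dx * mm_per_pixel * gear, dy * mm_per_pixel * gear)

# Hypothetical values: 0.5 mm of robot travel per pixel, gear doubled.
offset = pupil_displacement((100, 100), (110, 95))
step_mm = geared_step(offset, mm_per_pixel=0.5, gear=2.0)
```

The same `geared_step` function could drive an on-screen cursor by interpreting the result in screen units instead of millimetres.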
(51) Optionally, a camera is placed on a mechanical device and/or robot. The direction of the pupil's movement is translated into movements that the mechanical device performs in the same direction. The images from this camera are then transferred back to the end-user. This enables the ability to explore an environment visually by moving the pupil towards the direction where the end-user wants to expand and explore. If the end-user, for example, moves his gaze rightwards far from the pupil's home position, then the mechanical device will move rightwards and images of the camera mounted on it will be transmitted showing an additional portion of the environment towards the right of the previous field of view.
(52) Optionally, when controlling a robot with gaze an end user can switch between sets of coordinates and see on screen the robot and optionally the object being moved by retrieving images of another sensor that offers this image. This is illustrated in
(53) Optionally, an option is enabled to the end-user through gaze gestures allowing him to switch between sets of 2D coordinates of a given mechanical device, and then control with gaze that device on those target coordinates, with or without visual feedback from sensors around that device. See
(54) Optionally, a 3D coordinate system is implemented where x,y coordinates are obtained from the row and column of the detected pupil's center in the image, while the z coordinate is calculated based on the diameter of the pupil detected or its relative variations. See
(55) Optionally, the pupil diameter change is used to calculate a spatial difference for one of the coordinate axes. For example, an increase/decrease in the pupil's diameter can be interpreted as an increase/decrease in z coordinates.
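By way of illustration only, a 3D gaze coordinate can be assembled from the pupil center and its diameter as sketched below. The linear relation between relative diameter change and z, the `z_gain` parameter and the function name are assumptions; the original description does not specify how the diameter is scaled.

```python
def gaze_point_3d(row, col, diameter_px, home_diameter_px, z_gain=1.0):
    """Return (x, y, z) where x, y come from the pupil center's column and row
    and z is proportional to the relative change of the pupil diameter."""
    if home_diameter_px <= 0:
        raise ValueError("home diameter must be positive")
    relative_change = (diameter_px - home_diameter_px) / home_diameter_px
    return (float(col), float(row), z_gain * relative_change)

# Hypothetical values: the pupil grew from 10 px to 12 px in diameter.
point = gaze_point_3d(row=40, col=60, diameter_px=12, home_diameter_px=10, z_gain=100.0)
```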
(56) Optionally, x,y coordinates on screen are compensated for the 3D circularity of the users' eyeball.
(57) Optionally, an axis of eye pupil can be transformed to a different axis on the robot or machine to be controlled by a selection of the end-user.
(58) According to some embodiments of the present invention, there are provided methods and devices for robotic machine task learning through recording and reproduction of end-users' commands through one or more of the selection methods described above. End-users' commands are stored sequentially or in parallel and then replicated on demand.
(59) Optionally, an option is enabled to allow the end-user to save the robot's current position at any given time in its local coordinate system, and to create a robot trajectory between the saved positions that can be run later at the end-user's request.
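By way of illustration only, saving positions and building a trajectory between them can be sketched as follows. Straight-line interpolation between saved positions is an assumption made for this sketch, as are the class name `TrajectoryRecorder` and its methods; a real robot controller would impose its own motion planning.

```python
class TrajectoryRecorder:
    """Stores robot positions (in the robot's local coordinates) and replays them."""

    def __init__(self):
        self._waypoints = []

    def save(self, position):
        """Save the robot's current position, e.g. an (x, y, z) tuple."""
        self._waypoints.append(tuple(float(c) for c in position))

    def trajectory(self, steps_between=1):
        """Return the list of positions to visit, interpolating linearly.

        `steps_between` is the number of segments between two saved
        positions; 1 means moving directly from one to the next.
        """
        if steps_between < 1:
            raise ValueError("steps_between must be at least 1")
        if not self._waypoints:
            return []
        path = [self._waypoints[0]]
        for a, b in zip(self._waypoints, self._waypoints[1:]):
            for k in range(1, steps_between + 1):
                f = k / steps_between
                path.append(tuple(ai + (bi - ai) * f for ai, bi in zip(a, b)))
        return path

recorder = TrajectoryRecorder()
recorder.save((0, 0, 0))
recorder.save((10, 0, 0))
path = recorder.trajectory(steps_between=2)
```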
(60) Optionally, the controller analyses the direction where the end-user is looking in the environment, then through the coordinates transformation system described above, identifies this object's location from the point of view of an external sensor. Then these coordinates are converted to any of the devices' or sensors' coordinates systems for future actions.
(61) According to some embodiments of the present invention, there is provided an apparatus associated with a robotic controller. The apparatus comprises at least one processing unit and one or more sensors of images and/or depth maps and/or sounds and/or voice and/or EEG and/or touch. The outputs of these sensors are analysed by the processing unit in order to detect one or more patterns in inputs from one or more sensors that are translated into one or more commands to the robot, to the processing unit or to other devices. A pattern learning mechanism is implemented by keeping a history of outputs collected from those sensors, analysing any apparent pattern in these outputs and analysing time correlations between patterns recognized from each of the sensors. The end-user can then visualize those patterns and their interrelation, and define a command or sets of commands to be executed each time similar pattern combinations are detected in the future.
(62) Optionally, sensors connected to the controller produce raw data such as bit-map images, EEG signals per channel and sound.
(63) Optionally, one or more devices connected to the controller produce pre-processed data. For example, an Emotiv EEG device pre-detects certain commands based on EEG channels, and/or Primesense's sensors identify gestures and produce notifications of these gestures, and/or cellphone devices are able to recognize words pronounced by the end-user. The proposed controller then takes these inputs into account and produces a combined pattern that is later used by the end-user to generate a command or sets of commands. If the word "Do" is detected by the cellphone just after a particular command was detected at the Emotiv EEG device and just before the end-user created a given gaze signal, a new pattern is defined and the end-user can associate a command with this pattern. Optionally, each time the same sequence of events is recognized, the controller will perform the selected command.
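By way of illustration only, associating a command with a sequence of pre-processed events from several devices can be sketched as follows. The event format `(timestamp_s, source, label)`, the `max_gap_s` time limit between consecutive events, the class name `PatternController` and the example labels are all assumptions introduced for this sketch.

```python
from collections import deque

class PatternController:
    """Matches the most recent events against end-user defined sequences."""

    def __init__(self, max_gap_s=2.0, history=32):
        self.max_gap_s = max_gap_s
        self._events = deque(maxlen=history)  # oldest events drop out first
        self._patterns = {}                   # sequence tuple -> command

    def define(self, sequence, command):
        """Associate a command with an ordered list of (source, label) pairs."""
        self._patterns[tuple(sequence)] = command

    def observe(self, timestamp_s, source, label):
        """Record one event; return a command if a defined pattern just completed."""
        self._events.append((timestamp_s, source, label))
        for sequence, command in self._patterns.items():
            if self._tail_matches(sequence):
                return command
        return None

    def _tail_matches(self, sequence):
        n = len(sequence)
        if n == 0 or len(self._events) < n:
            return False
        tail = list(self._events)[-n:]
        if any((s, l) != want for (_, s, l), want in zip(tail, sequence)):
            return False
        # Consecutive events must be close in time to count as one pattern.
        return all(b[0] - a[0] <= self.max_gap_s for a, b in zip(tail, tail[1:]))

controller = PatternController()
controller.define([("eeg", "push"), ("voice", "Do"), ("gaze", "circle")], "grasp")
```

Feeding the three events in order and within the time limit returns the associated command on the third call to `observe`; any other order or a long pause returns `None`.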
(64) Optionally, patterns are detected by fitting geometrical shapes to trajectories created by tracking the relative displacement of the end-user's eye centres, for example by detecting a circular or linear type of movement and its direction. The fitEllipse or HoughCircles functions of OpenCV can be used to enable this option, by running them on the previously recorded positions. This tracking mechanism records to memory or disk the position where the centre of the pupil or eyes was detected in each frame and the time when the frame was acquired, among other useful data. The history buffer is pre-set to store a pre-defined number of eye centre positions. Optionally, the history buffer is set by a pre-defined time period. For example, detected eye centres are recorded and analysed dynamically for the last 10 seconds with regard to the current frame. A FIFO queue is implemented for these records.
(65) Optionally, the end-user's eye centre is detected by fitting an ellipse of predefined minimum and maximum diameter to darker areas of an image collected from a sensor that is located close to the end-user's eye. Using an IR-illuminated black and white CMOS camera or equivalent, for example, the pupil will be the darkest section of the image.
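By way of illustration only, the time-limited FIFO history can be sketched with a double-ended queue. The class name `PupilHistory` and its methods are hypothetical; the 10-second default follows the example given above.

```python
from collections import deque

class PupilHistory:
    """FIFO record of (timestamp_s, x, y) pupil centers over a sliding time window."""

    def __init__(self, window_s=10.0):
        self.window_s = window_s
        self._records = deque()

    def add(self, timestamp_s, x, y):
        """Append the newest detection and drop records older than the window."""
        self._records.append((timestamp_s, x, y))
        while self._records and timestamp_s - self._records[0][0] > self.window_s:
            self._records.popleft()  # first in, first out

    def points(self):
        """Return the retained (x, y) positions, oldest first."""
        return [(x, y) for _, x, y in self._records]

    def __len__(self):
        return len(self._records)

history = PupilHistory(window_s=10.0)
history.add(0.0, 100, 50)
history.add(5.0, 104, 52)
history.add(11.0, 108, 55)  # the record from t=0.0 is now older than 10 s
```

The list returned by `points()` is the trajectory on which shape fitting would then be run.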
(66) Optionally, patterns are detected by fitting geometrical shapes to trajectories of other body parts such as finger tips, hands, head orientation and others.
(67) Optionally, patterns are pre-recorded and used to identify end-users' requests.
(68) Optionally, a mechanical device such as a robot is connected to the controller. Commands detected through the patterns system described above are translated into actions that this device will execute.
(69) Optionally, other electrical devices such as lights, appliances or other electrical-powered artefacts are connected to the controller. Commands detected through the patterns system described above are translated into actions that this device will execute.
(70) Optionally, a predictive method is implemented that anticipates the pattern or combination of patterns to be generated by analysing a partial history of the sensors' output. For example, if patterns were detected and defined based on a set of 50 consecutive images from an input video camera, or from a collection of images acquired during 5 seconds of video history, a prediction method is implemented to detect a potential future pattern based on only the last 20 consecutive images or on the last 2 seconds of video history. If the movement tracked from the end-user's eye centre position is circular-like, detecting a half circle in the partial history track activates a prediction that translates into a predicted command corresponding to the circle-like type of shape in the history tracking.
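By way of illustration only, predicting a circle-like gesture from a partial history can be sketched by accumulating how much the trajectory has turned so far. This turning-angle measure is a technique chosen for the sketch and is not named in the original description; the function names and the `fraction` threshold are assumptions. A full circle turns through roughly 2π, so a sampled half circle turns through a little less than π.

```python
import math

def turning_angle(points):
    """Sum of signed heading changes (radians) along a polyline of (x, y) points."""
    total = 0.0
    for a, b, c in zip(points, points[1:], points[2:]):
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        total += math.atan2(cross, dot)  # signed angle between v1 and v2
    return total

def predict_circle(points, fraction=0.4):
    """Predict a circle-like gesture once the partial track has turned through
    at least `fraction` of a full revolution (0.4 tolerates sampling of a
    half circle)."""
    return abs(turning_angle(points)) >= fraction * 2.0 * math.pi

def arc(radius, start_deg, end_deg, steps):
    """Helper that generates sample points on an arc, for demonstration only."""
    return [
        (radius * math.cos(math.radians(start_deg + (end_deg - start_deg) * i / steps)),
         radius * math.sin(math.radians(start_deg + (end_deg - start_deg) * i / steps)))
        for i in range(steps + 1)
    ]

half_circle_seen = predict_circle(arc(20.0, 0.0, 180.0, 10))
```

When the prediction fires, the controller could issue the command associated with the full circle pattern without waiting for the remaining frames.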
(71) Optionally, the methods and embodiments described above are used as a system to assist physically impaired patients who can demand actions from a robot combining one or more gesture mechanisms: Eye gaze, voice, gestures, EEG signals, touch, and others.
(72) Optionally, the methods and embodiments described above are used to control a robot remotely through the Internet or other communication means.
(73) Optionally, the methods and embodiments described above are used to create a semi-automatic robotic system where the end-user highlights objects on the screen based on images collected from the system's sensors, offering feedback on the objects identified and their locations.