ELECTRONIC PERSONAL INTERACTIVE DEVICE

20220319517 · 2022-10-06

    Abstract

    An interface device and method of use, comprising audio and image inputs; a processor for determining topics of interest, and receiving information of interest to the user from a remote resource; an audio-visual output for presenting an anthropomorphic object conveying the received information, having a selectively defined and adaptively alterable mood; an external communication device adapted to remotely communicate at least a voice conversation with a human user of the personal interface device. Also provided is a system and method adapted to receive logic for, synthesize, and engage in conversation dependent on received conversational logic and a personality.

    Claims

    1. An electronic system, comprising: a network communication port configured to communicate with a communication network; a memory configured to store topics of interest to a user; an external interface, configured to transmit a series of search requests to at least one of an external automated search engine and an external automated social network system, and to receive responses through the network communication port; a conversational agent configured to: interact with the user according to a natural language conversation dependent on the stored topics of interest and the received responses; define the series of search requests; and update the memory to maintain current topics of interest to the user dependent on user feedback obtained during the natural language conversation; and a user interface directed by the conversational agent.

    2. The electronic system according to claim 1, wherein the user interface comprises a microphone, a speaker, a display, and a camera.

    3. The electronic system according to claim 1, wherein: the topics of interest comprise identified persons; the conversational agent is configured to communicate the identified persons to the external automated social network system; and the received responses comprise social network records relating to the identified persons.

    4. The electronic system according to claim 1, wherein: the topics of interest comprise current events; the conversational agent is configured to communicate characteristics of the current events to the external automated search engine; and the received responses comprise news reports relating to the current events.

    5. The electronic system according to claim 1, wherein: the user interface comprises an audiovisual interface; the conversational agent is further configured to determine an emotional state of the user based on interactions with the user through the audiovisual interface; and the conversational agent is further configured to selectively react to the emotional state of the user.

    6. The electronic system according to claim 1, wherein: the conversational agent is further configured to determine an emotional state of the user; and the series of search requests is selectively dependent on the determined emotional state of the user.

    7. The electronic system according to claim 1, wherein: the conversational agent is further configured to determine a current emotional state of the user and a desired emotional state of the user; and the natural language conversation is dependent on the stored topics of interest, the received responses, the determined current emotional state of the user, and the determined desired emotional state of the user.

    8. The electronic system according to claim 1, wherein the conversational agent is further configured to: store a status of conversational elements at an end of a conversation in the memory; update the conversational elements at the end of the conversation by transmitting search requests; and introduce the updated conversational elements into the natural language conversation.

    9. The electronic system according to claim 1, wherein the conversational agent is implemented using an artificial neural network.

    10. The electronic system according to claim 1, wherein the conversational agent is further configured to determine an emergency state of the user, and to automatically contact emergency assistance services in event of the determined emergency state of the user.

    11. The electronic system according to claim 10, wherein the conversational agent is further configured to control the user interface to selectively communicate at least one of audio and visual information to the emergency assistance services.

    12. The electronic system according to claim 10, wherein the conversational agent is further configured to determine a mood of the user based on implicit information in audio input received through the user interface.

    13. The electronic system according to claim 10, wherein the conversational agent is further configured to mine data from the at least one of the external automated search engine and the external automated social network system based on learned relevance of information within the topics of interest from prior interaction with the user.

    14. The electronic system according to claim 10, wherein the conversational agent is further configured to initiate a conversation with the user.

    15. An interactive conversational system, comprising: a network communication port; a memory configured to store topics of interest to a user; a database interface, configured to transmit search requests to an automated database system, and to receive responses, through the network communication port; a conversational agent configured to define the search requests and to interact with the user in a natural language conversation dependent on the stored topics of interest and the received responses; a conversation continuity agent configured to update the memory to maintain current topics of interest to the user dependent on user feedback obtained during the natural language conversation; and a user interface directed by the conversational agent.

    16. The interactive conversational system according to claim 15, wherein the automated database system comprises an Internet search engine comprising knowledge records.

    17. The interactive conversational system according to claim 15, wherein the automated database system comprises an Internet social network system comprising human relationship records.

    18. The interactive conversational system according to claim 15, wherein: the user interface comprises an audiovisual interface; the conversational agent is further configured to determine an emotional state of the user based on implicit interactions with the user through the audiovisual interface; and the conversational agent is further configured to select topics of conversation based on the emotional state of the user.

    19. The interactive conversational system according to claim 15, wherein the conversational agent is further configured to: store a status of conversational elements at an end of a conversation in the memory; update the conversational elements at the end of the conversation by transmitting search requests; and introduce the updated conversational elements into the natural language conversation.

    20. An interactive conversational method, comprising: storing topics of interest to a user in a memory; transmitting search requests to an automated database system selected from the group consisting of an Internet search engine, a news search engine, and a social network database search engine, through an automated communication network; receiving responses to the transmitted search requests through the automated communication network; defining the search requests based on conversational natural language communications with a user through a user interface device, the topics of interest to the user, and the received responses; updating the memory to maintain current topics of interest to the user dependent on user feedback obtained during the conversational natural language communications; and conducting the conversational natural language communications with the user with an automated conversational agent according to the topics of interest to the user, the received responses, natural language user inputs, and a user context.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0148] FIG. 1 illustrates an exemplary machine implementing an embodiment of the present invention.

    [0149] FIG. 2 illustrates a flowchart of a method implementing an embodiment of the present invention.

    [0150] FIG. 3 illustrates an embodiment of this invention which can be run on a substantially arbitrary cell phone with low processing abilities.

    [0151] FIG. 4 illustrates a flowchart for a processor implementing an embodiment of the present invention.

    [0152] FIG. 5 illustrates a smart clock radio implementing an embodiment of the present invention.

    [0153] FIG. 6 illustrates a television with a set-top box implementing an embodiment of the present invention.

    [0154] FIG. 7 illustrates a special purpose robot implementing an embodiment of the present invention.

    [0155] FIG. 8 shows a prior art computer system.

    DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

    EXAMPLE 1

    Cell Phone

    [0156] FIG. 1 illustrates an exemplary machine 100 that can be used to implement an embodiment of the present invention. The machine comprises a microphone 110 adapted to receive audio information input and a camera 120 adapted to receive image information input. The camera 120 preferably is facing the user. There are one or more speakers 130 for audio output (e.g., voice reproduction) and a display 140, which also preferably faces the user. There is also a processor (not illustrated in FIG. 1, but an exemplary processor appears in FIG. 4) and the machine is preferably at least sometimes able to connect to the Internet or a remote database server which stores a variety of human-interest information. The image 150 in display 140 is preferably the face of a person who is selected by the user. The face may also be of another species, or completely synthetic. In one embodiment, the lips of image 150 move as image 150 speaks, and image 150's facial expression is determined to convey an anthropomorphic mood, which itself may be responsive to the mood of the user, as signaled by the audio and image input through microphone 110 and camera 120. The mood of the user may be determined from the words spoken by the user, the voice tone of the user, the facial expression and gestures of the user, the hand gestures of the user, etc. The device 100 may be configured as a cellular telephone or so-called smartphone, but persons having ordinary skill in the art will realize that this invention could be implemented in many other form factors and configurations. For example, the device could be run on a cell phone, a smart phone (e.g., Blackberry, Apple iPhone), a PDA (e.g., Apple iPod, Apple iPad, Amazon Kindle), a laptop computer, a desktop computer, or a special purpose computing machine, with relatively minor modifications. 
The interface may be used for various consumer electronics devices, such as automobiles, televisions, set-top boxes, stereo equipment, kitchen appliances, thermostats and HVAC equipment, laundry appliances, and the like. The interface may be employed in public venues, such as vending machines and ATMs. In some cases, the interface may be an audio-only interface, in which imaging may be unidirectional or absent. In audio-only systems, the interface seeks to conduct an intelligent conversational dialog and may be part of a call center or interactive voice response system. Thus, for example, the technology might be employed to make waiting queues for call centers more interesting and tolerable for users.

    [0157] FIG. 2 is a flowchart 200 illustrating the operation of one embodiment of the invention. In step 210, the user Ulysses looks into the camera and speaks into the microphone. Preferably, the user would naturally be looking into the camera because it is located near the screen where an image of a person is displayed. The person could be anyone whom the user selects, of whom the user can provide a photograph. For example, it might be a deceased friend or spouse, or a friend or relative who lives far away and visits rarely. Alternatively, the image might be of a famous person. In the example, the image in the machine (not illustrated) is of Ulysses' wife, Penelope.

    [0158] In the example, in step 210, Ulysses says, “Is my grandson James partying instead of studying?” Ulysses has an angry voice and a mad facial expression. In step 220, the machine detects the mood of the user (angry/mad) based on audio input (angry voice) and image input (mad facial expression). This detection is performed by one or more processors, such as a Qualcomm Snapdragon processor. The one or more processors also detect the meaning of the speech, so that the machine can provide a conversationally relevant response that is at least partially responsive to any query or comment the user makes and builds on the user's last statement, in the context of this conversation and the course of dealings between the machine and the user. Roy, US App. 2009/0063147, incorporated herein by reference, discusses an exemplary speech recognition system driven by phonetic, syntactic and conceptual analysis. Roy's system, or a similar technology, could be used to map the words and grammatical structures uttered by the user to a “meaning”, which could then be responded to; the response is converted back to speech and presented in conjunction with an anthropomorphic avatar on the screen, in order to provide a conversationally relevant output. Another embodiment of this invention might use hierarchical stacked neural networks, such as those described by Commons, U.S. Pat. No. 7,613,663, incorporated herein by reference, in order to detect the phonemes the user pronounces and to convert those phonemes into meaningful words and sentences or other grammatical structures. In one embodiment, the facial expression and/or the intonation of the user's voice are coupled with the words chosen by the user to generate the meaning. In any case, at a high level, the device may interpret the user input as a concept with a purpose and generate a response as a related concept with a counter-purpose.
The purpose need not be broader than furthering the conversation, or it may be goal-oriented. In step 230, the machine then adjusts the facial expression of the image of Penelope to angry/mad to mirror the user, as a contextually appropriate emotive response. In another embodiment, the machine might use a different facial expression in order to attempt to modify the user's mood. Thus, if the machine determines that a heated argument is an appropriate path, then a similar emotion to that of the user would carry the conversation forward. In other cases, the interface adopts a more submissive response, to defuse the aggression of the user.
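    The choice in step 230 between mirroring the user's emotion and adopting a calmer expression can be sketched as below. This is a minimal illustration, not the patent's method: the `Strategy` enum, the `DEFUSING_EXPRESSION` table, and the expression names are hypothetical, and how the machine decides which strategy fits the conversation is left to the caller.

```python
from enum import Enum


class Strategy(Enum):
    MIRROR = "mirror"  # match the user's emotion to carry the exchange forward
    DEFUSE = "defuse"  # adopt a calmer, more submissive expression


# Hypothetical mapping from a detected user mood to a calming counter-expression.
DEFUSING_EXPRESSION = {
    "angry": "concerned",
    "sad": "warm",
    "anxious": "reassuring",
}


def select_avatar_expression(user_mood: str, strategy: Strategy) -> str:
    """Choose the facial expression for the on-screen avatar (step 230)."""
    if strategy is Strategy.MIRROR:
        return user_mood
    # Fall back to a neutral face for moods with no defined counter-expression.
    return DEFUSING_EXPRESSION.get(user_mood, "neutral")
```

    For Ulysses, `select_avatar_expression("angry", Strategy.MIRROR)` gives Penelope an angry face, while `Strategy.DEFUSE` gives her a concerned one.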

    [0159] Clearly, the machine has no way of knowing whether James is partying or studying without relying on external data. However, according to one embodiment of the invention, the machine can access a network, such as the Internet, or a database to get some relevant information. Here, in step 240, the machine checks the social networking website Facebook to determine James' recent activity. Facebook reveals that James got a C on his biology midterm and displays several photographs of James getting drunk and engaging in “partying” behavior. The machine then replies 250 to the user, in an angry female voice, “It is horrible. James got a C on his biology midterm, and he is drinking very heavily. Look at these photographs taken by his neighbor.” The machine then proceeds to display the photographs to the user. In step 260, the user continues the conversation, “Oh my God. What will we do? Should I tell James that I will disinherit him unless he improves his grades?”

    [0160] Note that a female voice was used because Penelope is a woman. In one embodiment, other features of Penelope, for example, her race, age, accent, profession, and background could be used to select an optimal voice, dialect, and intonation for her. For example, Penelope might be a 75-year-old, lifelong white Texan housewife who speaks with a strong rural Texas accent.

    [0161] The machine could look up the information about James in response to the query, as illustrated here. In another embodiment, the machine could know that the user has some favorite topics that he likes to discuss (e.g., family, weather, etc.). The machine would then prepare for these discussions in advance or in real time by looking up relevant information on the network and storing it. This way, the machine would be able to discuss James' college experience in a place where there was no Internet access. In accordance with this embodiment, at least one Internet search may occur automatically, without a direct request from the user. In yet another embodiment, instead of doing the lookup electronically, the machine could connect to a remote computer server or a remote person who would select a response to give the user. Note that the remote person might be different from the person whose photograph appears on the display. This embodiment is useful because it helps ensure that the machine will not advise the user to do something rash, such as disinheriting his grandson.
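    Preparing favorite topics in advance, so they remain available without Internet access, amounts to a small prefetch cache. The sketch below assumes a hypothetical `fetch` callable (topic to list of text snippets) standing in for the network search, and an assumed freshness window of one day; stale entries are still served, since old material is better than none when offline.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TopicCache:
    """Stores prefetched material on the user's favorite topics for offline use."""

    def __init__(self, fetch: Callable[[str], List[str]], max_age_s: float = 24 * 3600):
        self._fetch = fetch  # hypothetical search function: topic -> snippets
        self._max_age_s = max_age_s
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def prefetch(self, topics: List[str], now: Optional[float] = None) -> None:
        """Run searches automatically, without a direct request from the user."""
        now = time.time() if now is None else now
        for topic in topics:
            self._entries[topic] = (now, self._fetch(topic))

    def lookup(self, topic: str) -> List[str]:
        """Return cached snippets (possibly stale); empty list if never fetched."""
        entry = self._entries.get(topic)
        return [] if entry is None else entry[1]

    def is_stale(self, topic: str, now: Optional[float] = None) -> bool:
        """True if the topic was never fetched or is older than max_age_s."""
        now = time.time() if now is None else now
        entry = self._entries.get(topic)
        return entry is None or now - entry[0] > self._max_age_s
```

    A device might call `prefetch` whenever it has connectivity and `is_stale` reports true, and call `lookup` during conversation regardless of connectivity.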

    [0162] Note that both the machine's response to the user's first inquiry and the user's response to the machine are conversationally relevant, meaning that the statements respond to the queries, add to the conversation, and increase the knowledge available to the other party. In the first step, the user asked a question about what James was doing. The machine then responded that James' grades were bad and that he had been drunk on several occasions. This information added to the user's base of knowledge about James. The user then built on what the machine had to say by suggesting threatening to disinherit James as a potential solution to the problem of James' poor grades.

    [0163] In one embodiment, the machine starts up and shuts down in response to the user's oral commands. This is convenient for elderly users who may have difficulty pressing buttons. Deactivation places the machine in a power-saving, low power consumption mode. In another embodiment, the microphone and camera continuously monitor the scene for the presence of an emergency. If an emergency is detected, one or more emergency assistance services, for example police, fire, ambulance, nursing home staff, hospital staff, or family members, might be called. Optionally, the device could store information relevant to the emergency and provide it to emergency assistance personnel. Information relevant to the emergency includes, for example, a video, photograph or audio recording of the circumstance causing the emergency. To the extent the machine is a telephone, an automated e911 call might be placed, which typically conveys the user's location. The machine, therefore, may include a GPS receiver or other satellite geolocation receiver, or be usable with a network-based location system.
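    Once an emergency has been detected, notifying the selected services with the stored evidence and location can be sketched as follows. The `EmergencyReport` fields and the `send` transport callable are hypothetical placeholders (a real device would place an e911 call or send messages); the point illustrated is that a failure on one channel should not stop the remaining contacts from being tried.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class EmergencyReport:
    description: str
    location: Optional[Tuple[float, float]] = None    # (lat, lon) from GPS, if available
    media: List[str] = field(default_factory=list)    # paths to video/photo/audio clips


def dispatch_emergency(
    report: EmergencyReport,
    contacts: List[str],
    send: Callable[[str, EmergencyReport], None],  # hypothetical transport
) -> List[str]:
    """Notify each selected assistance service; return those reached."""
    reached = []
    for contact in contacts:
        try:
            send(contact, report)
        except ConnectionError:
            continue  # keep trying the remaining contacts if one channel fails
        reached.append(contact)
    return reached
```

    The returned list lets the device retry or escalate if, for example, no contact was reached.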

    [0164] In another embodiment of this invention, the machine provides social networking functionality by sharing how various people responded to different situations. For example, Ulysses is not the first grandfather to deal with a grandson who has poor grades and drinks and parties a lot. If the machine could provide Ulysses with information about how other grandparents dealt with this problem (without disinheriting their grandchildren), it might be useful to Ulysses.

    [0165] In yet another embodiment (not illustrated), the machine implementing the invention could be programmed to periodically initiate conversations with the user on its own, for example, if the machine learns of an event that would be interesting to the user. (E.g., in the above example, if James received an A+ in chemistry, the machine might be prompted to share the happy news with Ulysses.) To implement this embodiment, the machine would receive relevant information from a network or database, for example through a web crawler or an RSS feed. Alternatively, the machine could itself check various relevant websites, such as James' social networking pages, to determine whether there are updates. The machine might also receive proactive communications from a remote system, such as an SMS or MMS message, email, IP packet, or other electronic communication.
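    Detecting news worth raising proactively from an RSS feed reduces to remembering which items have already been seen. This sketch assumes an RSS 2.0 feed already downloaded as a string and uses the standard-library `xml.etree.ElementTree` parser; the `opening_line` template is a hypothetical placeholder for the conversation logic.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def new_items(rss_xml: str, seen: Set[str]) -> List[str]:
    """Return titles of RSS 2.0 items not seen before, and mark them as seen."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer the guid as a stable identity; fall back to link, then title.
        key = item.findtext("guid") or item.findtext("link") or item.findtext("title")
        if key and key not in seen:
            seen.add(key)
            fresh.append(item.findtext("title", default=""))
    return fresh


def opening_line(title: str) -> str:
    # Hypothetical template for a machine-initiated conversation.
    return f"I just saw some news you might like: {title}"
```

    Polling the feed periodically and calling `opening_line` for each fresh title gives the machine a reason to start a conversation, as with James' A+ in chemistry.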

    EXAMPLE 2

    Cell Phone with Low Processing Abilities

    [0166] This embodiment of this invention, as illustrated in FIG. 3, can be run on an arbitrary cell phone 310, such as the Motorola Razr or Sony Ericsson W580, connected to a cellular network, such as the GSM and CDMA networks available in the US. The cell phone implementing this embodiment of the invention preferably has an ability to place calls, a camera, a speakerphone, and a color screen. To use the invention, the user of the cell phone 310 places a call to a call center 330. The call could be placed by dialing a telephone number or by running an application on the phone. The call is carried over cell tower 320. In response to placing the call, an image of a person selected by the user or an avatar appears on the screen of the cell phone 310. Preferably, the call center is operated by the telephone company that provides cell phone service for cell phone 310. This way, the telephone company has control over the output on the screen of the cell phone as well as over the voice messages that are transmitted over the network.

    [0167] The user says something that is heard at call center 330 by employee 332. The employee 332 can also see the user through the camera in the user's telephone. An image of the user appears on the employee's computer 334, such that the employee can look at the user and infer the user's mood. The employee then selects a conversationally relevant response to say to the user, which builds on what the user said and is at least partially responsive to the query. The employee can control the facial expression of the avatar on the user's cell phone screen. In one embodiment, the employee sets up the facial expression on the computer screen by adjusting the face through mouse “drag and drop” techniques. In another embodiment, the computer 334 has a camera that detects the employee's facial expression and makes the same expression on the user's screen. The employee's spoken response is processed by the call center computer 334 and output to the user through the speaker of cell phone 310. If the user asks a question, such as, “What will the weather be in New York tomorrow?” the call center employee 332 can look up the answer through Google or Microsoft Bing search on computer 334.

    [0168] Preferably, each call center employee is assigned to a small group of users whose calls she answers. This way, the call center employee can come to personally know the people with whom she speaks and the topics that they enjoy discussing. Conversations will thus be more meaningful to the users.
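    Assigning each employee a small, stable group of users is a "sticky" routing problem with a per-employee cap. The sketch below is one assumed policy (least-loaded employee with spare capacity gets each new user, and returning users always go back to the same employee); the class name and cap are hypothetical.

```python
from typing import Dict, List


class AgentAssigner:
    """Routes each user to the same employee, capping each employee's group size."""

    def __init__(self, employees: List[str], max_users_per_employee: int):
        self._load: Dict[str, int] = {e: 0 for e in employees}
        self._cap = max_users_per_employee
        self._assigned: Dict[str, str] = {}

    def route(self, user_id: str) -> str:
        if user_id in self._assigned:
            return self._assigned[user_id]  # same employee as on earlier calls
        # Pick the least-loaded employee who still has capacity.
        candidates = [e for e, n in self._load.items() if n < self._cap]
        if not candidates:
            raise RuntimeError("all employees are at capacity")
        employee = min(candidates, key=lambda e: self._load[e])
        self._load[employee] += 1
        self._assigned[user_id] = employee
        return employee
```

    A production system would also need to handle an employee being off shift, for example by routing to a designated backup, which this sketch omits.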

    EXAMPLE 3

    Smart Phone, Laptop or Desktop with CPU Connected to a Network

    [0169] Another embodiment of the invention, illustrated in FIG. 4, is implemented on a smartphone, laptop computer, or desktop computer with a CPU connected to a network, such as a cellular network or an Ethernet or WiFi network that is connected to the Internet. The phone or computer implementing the invention has a camera 410 and a microphone 420 for receiving input from the user. The image data received by the camera and the audio data received by the microphone are fed to the logic to determine the user's mood 430 and to a speech recognizer 440. The logic to determine the user's mood 430 outputs a representation of the mood, and the speech recognizer 440 outputs a representation of the speech.

    [0170] As noted above, persons skilled in the art will recognize many ways the mood-determining logic 430 could operate. For example, Bohacek, U.S. Pat. No. 6,411,687, incorporated herein by reference, teaches that a speaker's gender, age, and dialect or accent can be determined from the speech. Black, U.S. Pat. No. 5,774,591, incorporated herein by reference, teaches using a camera to ascertain the facial expression of a user and determining the user's mood from the facial expression. Bushey, U.S. Pat. No. 7,224,790, similarly teaches “verbal style analysis” to determine a customer's level of frustration when the customer telephones a call center. A similar “verbal style analysis” can be used here to ascertain the mood of the user. Combining the technologies taught by Bohacek, Black, and Bushey may provide a more complete picture of the emotional state of the user, taking many different factors into account.
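    One simple way to combine several mood cues is weighted late fusion: each analyzer (face, voice tone, word choice) produces its own probability-like scores per mood, and the device takes a weighted average. This is a swapped-in, generic technique, not the method of Bohacek, Black, or Bushey; the modality names and weights are hypothetical and would need tuning.

```python
from typing import Dict


def fuse_mood_scores(
    modalities: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted late fusion of per-modality mood scores."""
    combined: Dict[str, float] = {}
    total_weight = 0.0
    for name, scores in modalities.items():
        w = weights.get(name, 0.0)
        if w <= 0:
            continue  # ignore modalities with no (or non-positive) weight
        total_weight += w
        for mood, p in scores.items():
            combined[mood] = combined.get(mood, 0.0) + w * p
    if total_weight == 0:
        return {}
    # Normalize so the fused scores stay on the same scale as the inputs.
    return {mood: s / total_weight for mood, s in combined.items()}
```

    With face scores `{"angry": 0.8, "calm": 0.2}`, voice scores `{"angry": 0.6, "calm": 0.4}`, word scores `{"angry": 0.5, "calm": 0.5}` and weights 0.5, 0.3 and 0.2, the fused score for "angry" is 0.4 + 0.18 + 0.1 = 0.68, so the user is judged angry.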

    [0171] Persons skilled in the art will also recognize many ways to implement the speech recognizer 440. For example, Gupta, U.S. Pat. No. 6,138,095, incorporated herein by reference, teaches a speech recognizer where the words that a person is saying are compared with a dictionary. An error checker is used to determine the degree of the possible error in pronunciation. Alternatively, in a preferred embodiment, a hierarchical stacked neural network, as taught by Commons, U.S. Pat. No. 7,613,663, incorporated herein by reference, could be used. If the neural networks of Commons are used to implement the invention, the lowest level neural network would recognize speech as speech (rather than background noise). The second level neural network would arrange speech into phonemes. The third level neural network would arrange the phonemes into words. The fourth level would arrange words into sentences. The fifth level would combine sentences into meaningful paragraphs or idea structures. The neural network is the preferred embodiment for the speech recognition software because the meanings of words (especially keywords) used by humans are often fuzzy and context sensitive. Rules, which are programmed to process clear-cut categories, are not efficient at interpreting ambiguity.
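    The dictionary-comparison approach can be illustrated by matching a recognized phoneme sequence against a pronunciation dictionary and reporting a normalized error. Here the error checker is assumed to be Levenshtein edit distance over phonemes, and the ARPAbet-style symbols and tiny lexicon are hypothetical; this is a generic sketch, not Gupta's specific method.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_word(phonemes: Sequence[str], lexicon: Dict[str, List[str]]) -> Tuple[str, float]:
    """Return (best word, error per dictionary phoneme) for a recognized sequence."""
    if not lexicon:
        raise ValueError("lexicon is empty")
    best_word, best_dist = min(
        ((word, edit_distance(phonemes, pron)) for word, pron in lexicon.items()),
        key=lambda pair: pair[1],
    )
    return best_word, best_dist / max(len(lexicon[best_word]), 1)
```

    For example, the phonemes `["JH", "EY", "M", "S"]` match "james" (`["JH", "EY", "M", "Z"]`) with one substitution, an error of 0.25, rather than "games", which differs in two places.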

    [0172] The outputs of the logic to determine mood 430 and the speech recognizer 440 are provided to a conversation logic 450. The conversation logic selects a conversationally relevant response 452 to the user's verbal (and preferably also image and voice tone) input to provide to the speakers 460. It also selects a facial expression for the face on the screen 470. The conversationally relevant response should expand on the user's last statement and what was previously said in the conversation. If the user's last statement included at least one query, the conversationally relevant response preferably answers at least part of the query. If necessary, the conversation logic 450 could consult the Internet 454 to get an answer to the query 456. This could be necessary if the user asks a query such as “Is my grandson James partying instead of studying?” or “What is the weather in New York?”
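    The data flow through the conversation logic 450 (recognized speech and mood in; spoken response and avatar expression out; Internet lookup only for queries) can be sketched as follows. The keyword-based query test, the fallback phrases, and the `lookup` callable standing in for the Internet 454 are all hypothetical simplifications; a real system would rely on the speech recognizer's parse.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical list of words that commonly open a question in English.
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how",
                  "is", "are", "do", "does", "should", "will", "can")


@dataclass
class Reply:
    text: str        # sent to the speakers 460
    expression: str  # applied to the face on the screen 470


def is_query(utterance: str) -> bool:
    words = utterance.strip().lower().split()
    return utterance.strip().endswith("?") or (bool(words) and words[0] in QUESTION_WORDS)


def conversation_logic(utterance: str, mood: str,
                       lookup: Callable[[str], Optional[str]]) -> Reply:
    expression = "concerned" if mood in ("angry", "sad") else "friendly"
    if is_query(utterance):
        answer = lookup(utterance)  # consult the Internet only when needed
        if answer:
            return Reply(answer, expression)
        return Reply("I'm not sure yet. Let me find out more about that.", expression)
    return Reply("Tell me more about that.", expression)
```

    Separating the query test from the lookup keeps network access out of ordinary small talk, which matters on devices with intermittent connectivity.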

    [0173] To determine whether the user's grandson James is partying or studying, the conversation logic 450 would first convert “grandson James” into a name, such as James Kerner. The last name could be determined either through memory (stored either in the memory of the phone or computer or on a server accessible over the Internet 454) of prior conversations or by asking the user, “What is James' last name?” The data as to whether James is partying or studying could be determined using a standard search engine accessed through the Internet 454, such as Google or Microsoft Bing. While these might not provide accurate information about James, these might provide conversationally relevant information to allow the phone or computer implementing the invention to say something to keep the conversation going. Alternatively, to provide more accurate information the conversation logic 450 could search for information about James Kerner on social networking sites accessible on the Internet 454, such as Facebook, LinkedIn, Twitter, etc., as well as any public internet sites dedicated specifically to providing information about James Kerner. (For example, many law firms provide a separate web page describing each of their attorneys.) If the user is a member of a social networking site, the conversation logic could log into the site to be able to view information that is available to the user but not to the general public. For example, Facebook allows users to share some information with their “friends” but not with the general public. The conversation logic 450 could use the combination of text, photographs, videos, etc. to learn about James' activities and to come to a conclusion as to whether they constitute “partying” or “studying.”
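    Converting "grandson James" into a full name, or asking for the last name when memory has none, can be sketched as a lookup in stored conversation memory. The `(relation, first name)` keying, the question wording, and `build_query` are hypothetical; the `site:` operator used to restrict a search to one social network is a widely supported search-engine convention, but its exact behavior depends on the engine.

```python
from typing import Dict, Optional, Tuple


def resolve_person(mention: str, relations: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Map a phrase like 'grandson James' to a full name from conversation memory."""
    parts = mention.lower().split()
    if len(parts) != 2:
        return None
    relation, first = parts
    return relations.get((relation, first))


def name_or_question(mention: str, relations: Dict[Tuple[str, str], str]):
    """Return (full name, None) if known, else (None, question to ask the user)."""
    full = resolve_person(mention, relations)
    if full is not None:
        return full, None
    first = mention.split()[-1]
    return None, f"What is {first}'s last name?"


def build_query(full_name: str, site: Optional[str] = None) -> str:
    """Quote the name for an exact-phrase search, optionally limited to one site."""
    query = f'"{full_name}"'
    return f"{query} site:{site}" if site else query
```

    The user's answer to the follow-up question would be stored back into `relations` so the machine does not need to ask again.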

    [0174] To determine the weather in New York, the conversation logic 450 could use a search engine accessed through the Internet 454, such as Google or Microsoft Bing. Alternatively, the conversation logic could connect with a server adapted to provide weather information, such as The Weather Channel, www.weather.com, or AccuWeather, www.accuweather.com, or the National Oceanic and Atmospheric Administration, www.nws.noaa.gov.

    [0175] Note that, to be conversationally relevant, each statement must expand on what was said previously. Thus, if the user asks the question, “What is the weather in New York?” twice, the second response must be different from the first. For example, the first response might be, “It will rain in the morning,” and the second response might be, “It will be sunny after the rain stops in the afternoon.” However, if the second response were exactly the same as the first, it would not be conversationally relevant as it would not build on the knowledge available to the parties.
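    The rule that a repeated question must not receive a repeated answer can be enforced with a minimal filter over candidate responses. This sketch only rejects responses that match something already said after normalizing case and whitespace; detecting paraphrases that add no new knowledge would require a semantic comparison, which is not shown.

```python
from typing import Iterable, List, Optional


def normalize(text: str) -> str:
    return " ".join(text.lower().split())


def pick_relevant(candidates: Iterable[str], history: List[str]) -> Optional[str]:
    """Return the first candidate response not already said in this conversation."""
    said = {normalize(h) for h in history}
    for candidate in candidates:
        if normalize(candidate) not in said:
            return candidate
    return None  # nothing new to add; the caller should look up more information
```

    Given the history `["It will rain in the morning."]` and the two weather responses above as candidates, the filter skips the first and returns the afternoon forecast.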

    [0176] The phone or computer implementing the invention can say arbitrary phrases. In one embodiment, if the voice samples of the person on the screen are available, that voice could be used. In another embodiment, the decision as to which voice to use is made based on the gender of the speaker alone.
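    The voice-selection rules (use the displayed person's own voice samples if available, otherwise choose a stock voice by gender and, where possible, accent, as in the Penelope example) can be sketched as a fallback chain. The `Persona` fields, the voice identifiers, and the `cloned:` prefix are hypothetical labels, not a real text-to-speech API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Persona:
    name: str
    gender: str
    accent: Optional[str] = None
    voice_samples: Optional[str] = None  # path to recordings of the person, if any


# Hypothetical catalog of stock synthetic voices keyed by (gender, accent).
STOCK_VOICES: Dict[Tuple[str, Optional[str]], str] = {
    ("female", None): "stock-female",
    ("male", None): "stock-male",
    ("female", "rural texas"): "stock-female-texas",
}


def choose_voice(p: Persona) -> str:
    if p.voice_samples:
        return f"cloned:{p.voice_samples}"  # model built from the person's own voice
    # Prefer a gender+accent match, then gender alone, then a neutral default.
    return (STOCK_VOICES.get((p.gender, p.accent))
            or STOCK_VOICES.get((p.gender, None), "stock-neutral"))
```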

    [0177] In a preferred embodiment, the image on the screen 470 looks like it is talking. When the image on the screen is talking, several parameters need to be modified, including jaw rotation and thrust, horizontal mouth width, lip corner and protrusion controls, lower lip tuck, vertical lip position, horizontal and vertical teeth offset, and tongue angle, width, and length. Preferably, the processor of the phone or computer that is implementing the invention will model the talking head as a 3D mesh that can be parametrically deformed (in response to facial movements during speech and facial gestures).
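    A common way to animate such articulation parameters is to define a keyframe of parameter values for each viseme (visual mouth shape) and interpolate between keyframes over time; the resulting values would then drive the parametric 3D mesh. The parameter subset, the normalized 0 to 1 values, and the viseme labels below are hypothetical, and linear interpolation is an assumed simplification.

```python
from typing import Dict

# Hypothetical keyframes: articulation parameters (normalized 0..1) for a few visemes.
VISEMES: Dict[str, Dict[str, float]] = {
    "rest": {"jaw_rotation": 0.0, "mouth_width": 0.5, "lip_protrusion": 0.0, "lower_lip_tuck": 0.0},
    "AA":   {"jaw_rotation": 0.8, "mouth_width": 0.6, "lip_protrusion": 0.1, "lower_lip_tuck": 0.0},
    "OO":   {"jaw_rotation": 0.3, "mouth_width": 0.2, "lip_protrusion": 0.9, "lower_lip_tuck": 0.0},
    "FV":   {"jaw_rotation": 0.1, "mouth_width": 0.5, "lip_protrusion": 0.0, "lower_lip_tuck": 0.9},
}


def blend(a: str, b: str, t: float) -> Dict[str, float]:
    """Linearly interpolate from viseme a to viseme b, with t clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    pa, pb = VISEMES[a], VISEMES[b]
    return {k: (1 - t) * pa[k] + t * pb[k] for k in pa}
```

    Halfway from rest to an open "AA" mouth, for instance, the jaw rotation is 0.4; sampling `blend` at the display frame rate between phoneme timestamps gives a continuous mouth motion.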

    EXAMPLE 4

    Smart Clock Radio

    [0178] Another embodiment of this invention, illustrated in FIG. 5, includes a smart clock radio 500, such as the Sony Dash, adapted to implement the invention. The radio once again includes a camera 510 and a microphone 520 for receiving input from the user. Speakers 530 provide audio output, and a screen 550 provides visual output. The speakers 530 may also be used for other purposes, for example, to play music or news on AM, FM, XM, or Internet radio stations or to play CDs or electronic audio files. The radio is able to connect to the Internet through the home WiFi network 540. In another embodiment, an Ethernet wire or another wired or wireless connection is used to connect the radio to the Internet.

    [0179] In one embodiment, the radio 500 operates in a manner equivalent to that described in the smartphone/laptop embodiment illustrated in FIG. 4. However, it should be noted that, while a user typically sits in front of a computer or cell phone when working with it, users are typically located farther away from a clock radio. For example, the clock radio might be located in a fixed corner of the kitchen, and the user could talk to the clock radio while the user is washing the dishes, setting the table or cooking.

    [0180] Therefore, in a preferred embodiment, the camera 510 is more powerful than a typical laptop camera and is adapted to view the user's face and determine the facial expression from a distance. Camera resolutions on the order of 8-12 megapixels are preferred, although any camera will suffice for the purposes of the invention.
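The benefit of a higher-resolution camera at kitchen distances can be estimated with a pinhole-camera calculation of how many pixels span the user's face. The figures used here are illustrative assumptions, not values from the specification: a 70° horizontal field of view, a 15 cm face width, a 12-megapixel 4:3 sensor 4000 pixels wide, and a 720p laptop camera 1280 pixels wide.

```python
import math


def pixels_across_face(h_pixels: int, hfov_deg: float, distance_m: float,
                       face_width_m: float = 0.15) -> float:
    """Approximate number of horizontal pixels covering a face.

    Pinhole model: the scene width visible at distance_m is
    2 * d * tan(hfov / 2), and pixels are spread evenly across it.
    """
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return face_width_m * h_pixels / scene_width_m


for label, px, d in [("720p laptop, 0.6 m", 1280, 0.6),
                     ("720p at kitchen distance, 3 m", 1280, 3.0),
                     ("12 MP at kitchen distance, 3 m", 4000, 3.0)]:
    print(f"{label}: {pixels_across_face(px, 70.0, d):.0f} px")
```

Under these assumptions, the 720p camera puts roughly 229 pixels across the face at laptop distance but only about 46 at 3 m, while the 12 MP camera recovers about 143, which is why a stronger camera helps when the user stands across the kitchen.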

    EXAMPLE 5

    Television with Set-Top Box

    [0181] The next detailed embodiment of the invention, illustrated in FIG. 6, is a television 600 with a set-top box (STB) 602. The STB is a standard STB, such as a cable converter box or a digital TV tuner available from many cable companies. However, the STB preferably either has, or is configured to receive input from, a camera 610 and a microphone 620. Output is provided to the user through the TV screen 630 and speakers 640.

    [0182] If the STB has memory, is able to process machine instructions, and can connect to the Internet (over WiFi, Ethernet, or similar), the invention may be implemented on the STB itself (not illustrated). Otherwise, the STB may connect to a remote server 650 that implements the invention. The remote server takes as input the audio and image data gathered by the STB's microphone and camera, and its output is an image to display on screen 630 and audio for speakers 640.
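The choice between running the logic on the STB and delegating it to remote server 650 reduces to a capability check. The sketch below is a hypothetical illustration of that rule; the `StbCapabilities` class and its fields are assumed names that simply restate the three conditions in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StbCapabilities:
    has_memory: bool
    can_execute_instructions: bool
    has_internet: bool  # over WiFi, Ethernet, or similar


def execution_site(caps: StbCapabilities) -> str:
    """Return "stb" if the set-top box can host the logic itself, else "remote".

    In the "remote" case the STB only forwards camera and microphone data
    to the remote server and plays back the image and audio it returns.
    """
    if caps.has_memory and caps.can_execute_instructions and caps.has_internet:
        return "stb"
    return "remote"


print(execution_site(StbCapabilities(True, True, True)))   # stb
print(execution_site(StbCapabilities(True, False, True)))  # remote
```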

    [0183] The mood-determination logic 430, the speech recognizer 440, and the conversation logic 450 (which connects to the Internet 454 to provide data for discussion) all operate in a manner identical to that described for FIG. 4.

    [0184] When setting up the person to be displayed on the screen, the user either selects a default display or sends the company implementing the invention a photograph of the person the user wishes to speak with. In one embodiment, the image is transmitted electronically over the Internet. In another embodiment, the user mails a paper photograph to an office, where the photograph is scanned and a digital image of the person is stored.

    EXAMPLE 6

    Robot with a Face

    [0185] FIG. 7 illustrates a special purpose robot 700 designed to implement an embodiment of this invention. The robot receives input through a camera 710 and at least one microphone 720. Output is provided through a screen 730, which displays the face of a person 732 or of a non-human being, either selected by the user or provided by default. There is also at least one speaker 740. The robot further has joints 750, which it can move in order to make gestures.

    [0186] The logic implementing the invention operates in a manner essentially identical to that illustrated in FIG. 4. In a preferred embodiment, all of the logic is internal to the robot. However, other embodiments, such as a processor external to the robot connecting to the robot via the Internet or via a local connection, are possible.

    [0187] There are some notable differences between the present embodiment and that illustrated in FIG. 4. In a preferred embodiment, the Internet connection, which is essential for the conversation logic 450 of FIG. 4, is provided by WiFi router 540, and the robot 700 is able to connect to WiFi. Alternatively, the robot 700 could connect to the Internet through a cellular network or an Ethernet cable. In addition to determining words, voice tone, and facial expression, the conversation logic 450 can now suggest gestures to the robot, e.g., wave the right hand, point the middle finger, etc.
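As a hypothetical sketch of how the conversation logic 450 might hand a suggested gesture to robot 700, the structure below bundles the outputs named in the text (words, voice tone, facial expression) with an optional gesture and forwards the gesture only if the robot's joints can perform it. The class name, field names, gesture vocabulary, and the choice to drop unsupported gestures silently are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical gesture vocabulary for joints 750; the text gives only examples.
ROBOT_GESTURES: FrozenSet[str] = frozenset({"wave_right_hand", "nod", "point"})


@dataclass
class ConversationOutput:
    words: str
    voice_tone: str
    facial_expression: str
    gesture: Optional[str] = None  # only used by embodiments with movable joints


def gesture_command(out: ConversationOutput,
                    supported: FrozenSet[str] = ROBOT_GESTURES) -> Optional[str]:
    """Pass a suggested gesture to the robot only if its joints can perform it."""
    if out.gesture is not None and out.gesture in supported:
        return out.gesture
    return None


reply = ConversationOutput("Good morning!", "cheerful", "smile", "wave_right_hand")
print(gesture_command(reply))  # wave_right_hand
```

Keeping the gesture optional lets the same conversation output drive the screen-only embodiments of FIG. 4, which would simply ignore the field.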

    [0188] In one embodiment, the camera is mobile, and the robot rotates the camera so as to keep looking at the user when the user moves. Further, the camera is a three-dimensional camera comprising a structured light illuminator. Preferably, the structured light illuminator operates at a non-visible frequency, which allows it to ascertain the image of the user's face and all of its contours unobtrusively.
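Keeping the mobile camera pointed at a moving user is commonly done with a feedback loop that turns the camera in proportion to how far the detected face sits from the image center. The proportional controller below is a minimal sketch under assumed values (gain, per-step limit, field of view, and a linear mapping from pixel offset to angle); the specification does not prescribe a control law, and face detection itself is outside the sketch.

```python
def pan_correction_deg(face_x: float, image_width: int, hfov_deg: float,
                       gain: float = 0.5, max_step_deg: float = 5.0) -> float:
    """Pan adjustment (degrees) that moves the camera toward the face.

    face_x is the horizontal pixel coordinate of the detected face center.
    Positive output means "rotate right" (assumed sign convention).
    """
    half_width = image_width / 2.0
    # Face offset from center as a fraction of half the image width, in [-1, 1].
    offset = (face_x - half_width) / half_width
    # Approximate angular error, mapping the offset linearly across the field of view.
    error_deg = offset * hfov_deg / 2.0
    # Proportional step, clamped so the camera never jumps more than max_step_deg.
    step = gain * error_deg
    return max(-max_step_deg, min(max_step_deg, step))


print(pan_correction_deg(800, 1280, 60.0))  # 3.75
```

Calling this once per video frame makes the camera converge on the user gradually; the clamp keeps motion smooth when the user moves quickly.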

    [0189] Structured light involves projecting a known pattern of pixels (often grids or horizontal bars) onto a scene. These patterns deform when striking surfaces, allowing vision systems to calculate depth and surface information for the objects in the scene. For the present invention, this property of structured light is useful for ascertaining the facial features of the user. The structured light could be outside the visible spectrum, for example, infrared light. This allows the robot to detect the user's facial features effectively without causing the user discomfort.
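For a projector and camera separated by a baseline, the shift (disparity) of a projected stripe in the camera image determines depth by triangulation, Z = f * b / d. The sketch below assumes a rectified projector-camera pair, a focal length expressed in pixels, and disparity measured relative to the stripe's position for a surface at infinity; these are textbook simplifications, not details given in the specification, and the rig values in the example are hypothetical.

```python
from typing import List


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth (m) of a surface point from the observed shift of a projected stripe.

    Triangulation for a rectified projector-camera pair: Z = f * b / d.
    Larger shifts mean closer surfaces.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def depth_profile(disparities_px: List[float], focal_px: float, baseline_m: float) -> List[float]:
    """Convert stripe shifts sampled along one image row into a depth profile."""
    return [depth_from_disparity(d, focal_px, baseline_m) for d in disparities_px]


# Assumed rig: 600 px focal length, 7.5 cm projector-camera baseline.
# A row crossing a face about 1.5 m away: cheek, nose tip, cheek.
print(depth_profile([30.0, 30.6, 30.0], focal_px=600.0, baseline_m=0.075))
```

With these assumed values, the nose tip sitting about 3 cm closer than the cheeks shows up as a stripe shift only 0.6 pixels larger, which illustrates why fine facial contours demand a sharp, well-calibrated camera.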

    [0190] In a preferred embodiment, the robot is completely responsive to voice prompts and has very few buttons, all of which are relatively large. This embodiment is preferred because it makes the robot easier to use for elderly and disabled people who might have difficulty pressing small buttons.

    [0191] In this disclosure, we have described several embodiments of this broad invention. Persons skilled in the art will likely conceive of other ways in which the teachings of this specification can be used. It is not our intent to limit this broad invention to the embodiments described in the specification. Rather, the invention is limited by the following claims.

    [0192] With reference to FIG. 8, a generic system for processing program instructions, such as that disclosed in U.S. Pat. No. 7,631,317, is shown, which includes a general purpose computing device in the form of a conventional personal computer 20, including a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read-only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS) containing the basic routines that help to transfer information between elements within the personal computer 20, such as during start-up, is stored in ROM 24. In one embodiment of the present invention on a server computer 20 with a remote client computer 49, commands are stored in system memory 22 and are executed by processing unit 21 for creating, sending, and using self-descriptive objects as messages over a message queuing network in accordance with the invention. The personal computer 20 further includes a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD-ROM or other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the personal computer 20.
Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer-readable media that can store data accessible by a computer, such as flash memory, network storage systems, magnetic cassettes, random access memories (RAM), read-only memories (ROM), and the like, may also be used in the exemplary operating environment.

    [0193] A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37, and program data 38. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port, or universal serial bus (USB). A monitor 47 or another type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.

    [0194] The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49, through a packet data network interface to a packet-switched data network. The remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 8. The logical connections depicted in FIG. 8 include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

    [0195] When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other elements for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other elements for establishing a communications link between the computers may be used.

    [0196] A digital data stream from a superconducting digital electronic processing system may have a data rate that exceeds the capability of a room-temperature processing system to handle it. However, some functions, such as complex (but not necessarily high-data-rate) calculations or user interface functions, may be executed more efficiently on a general purpose computer than on a specialized superconducting digital signal processing system. In that case, the data may be parallelized or decimated to provide a lower clock rate, while retaining the information essential for downstream processing.
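The two rate-reduction options mentioned above can be illustrated in a few lines: demultiplexing splits one fast stream into several slower parallel lanes without discarding samples, while decimation low-pass filters and then downsamples. This is a minimal sketch; the boxcar (block-average) filter is an assumed, deliberately simple anti-aliasing choice, and a practical system would use a properly designed filter.

```python
from typing import List, Sequence


def demultiplex(samples: Sequence[float], lanes: int) -> List[List[float]]:
    """Round-robin split into `lanes` parallel streams.

    Lane k receives samples k, k + lanes, k + 2*lanes, ..., so each lane
    runs at 1/lanes of the input rate and no information is discarded.
    """
    return [list(samples[k::lanes]) for k in range(lanes)]


def decimate(samples: Sequence[float], factor: int) -> List[float]:
    """Block-average each run of `factor` samples, keeping one output per block.

    Averaging acts as a crude low-pass filter before downsampling; a trailing
    partial block is dropped.
    """
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]


stream = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(demultiplex(stream, 3))  # [[0.0, 3.0], [1.0, 4.0], [2.0, 5.0]]
print(decimate(stream, 2))     # [0.5, 2.5, 4.5]
```

Demultiplexing preserves every sample at the cost of more parallel channels, whereas decimation reduces the total data volume and is appropriate only when the discarded high-frequency content is not needed downstream.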

    [0197] The present embodiments are to be considered in all respects as illustrative and not restrictive, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. The invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The disclosure shall be interpreted to encompass all of the various combinations and permutations of the elements, steps, and claims disclosed herein, to the extent consistent, and shall not be limited to specific combinations as provided in the detailed embodiments.