CLOUD-BASED AUTOMATED ASSISTANT FOR HANDLING INCOMING TELEPHONE CALL(S)
20250280068 · 2025-09-04
Inventors
- Dekel Auster (Tel Aviv, IL)
- Yoav Tzur (Tel Aviv, IL)
- Ran El Manor (Savyon, IL)
- Ohad Dekel (Ganey Tikva, IL)
- Liran Peretz (Tel Mond, IL)
CPC classification
H04M3/4365
ELECTRICITY
H04M3/54
ELECTRICITY
G08B3/1008
PHYSICS
G10L13/02
PHYSICS
International classification
H04M3/436
ELECTRICITY
H04M3/42
ELECTRICITY
G10L13/02
PHYSICS
G08B5/22
PHYSICS
H04M3/54
ELECTRICITY
G08B3/10
PHYSICS
Abstract
Implementations are directed to a cloud-based automated assistant that can handle incoming telephone calls on behalf of a user. In some implementations, processor(s) of a remote server can receive an indication of an incoming telephone call that is directed to a client device and via a cloud-based automated assistant executing at least in part at the remote server (e.g., that is remote from the client device). The processor(s) can determine a next action to be implemented by the cloud-based automated assistant and cause the next action to be implemented by the cloud-based automated assistant. In some implementations, the indication of the incoming telephone call can be directed to the cloud-based automated assistant (e.g., without the client device routing the incoming telephone call to the cloud-based automated assistant). In other implementations, the indication of the incoming telephone call is routed to the cloud-based automated assistant via the client device.
Claims
1. A method implemented by one or more processors, the method comprising: receiving an indication of an incoming telephone call, the incoming telephone call being directed to a client device of a user, and the indication of the incoming telephone call being received via a cloud-based automated assistant executing at least in part at a remote server that is remote from the client device; determining whether to cause the cloud-based automated assistant to answer the incoming telephone call; in response to determining to cause the cloud-based automated assistant to answer the incoming telephone call: causing the cloud-based automated assistant to answer the incoming telephone call and conduct a conversation with a caller that initiated the incoming telephone call; determining, based on the cloud-based automated assistant conducting the conversation with the caller that initiated the incoming telephone call, a next action to be implemented by the cloud-based automated assistant; and causing the next action to be implemented by the cloud-based automated assistant.
2. The method of claim 1, wherein determining whether to cause the cloud-based automated assistant to answer the incoming telephone call is based on a caller telephone number associated with the caller that initiated the incoming telephone call.
3. The method of claim 2, wherein determining whether to cause the cloud-based automated assistant to answer the incoming telephone call based on the caller telephone number associated with the caller that initiated the incoming telephone call comprises: comparing the caller telephone number to a deny list of telephone numbers; and determining, based on comparing the caller telephone number to the deny list of telephone numbers, whether the caller telephone number is included on the deny list of telephone numbers, wherein determining to cause the cloud-based automated assistant to answer the incoming telephone call is in response to determining that the caller telephone number is not included on the deny list of telephone numbers.
4. The method of claim 3, further comprising: in response to determining that the caller telephone number is included on the deny list of telephone numbers: refraining from causing the cloud-based automated assistant to answer the incoming telephone call.
5. The method of claim 4, wherein no indication of the incoming telephone call is rendered at the client device of the user.
6. The method of claim 1, wherein causing the cloud-based automated assistant to conduct the conversation with the caller that initiated the incoming telephone call comprises: causing the cloud-based automated assistant to generate one or more corresponding instances of synthesized speech; causing the cloud-based automated assistant to render one or more of the corresponding instances of the synthesized speech, at an additional client device of the caller, to conduct the conversation with the caller that initiated the incoming telephone call; receiving one or more corresponding instances of audio data that each capture speech of the caller; and determining, based on one or more of the corresponding instances of the audio data, the next action to be implemented by the cloud-based automated assistant.
7. The method of claim 6, further comprising: determining, based on one or more of the corresponding instances of the audio data, to forward the incoming telephone call to the client device of the user as the next action to be implemented by the cloud-based automated assistant; and causing the incoming telephone call to be forwarded to the client device.
8. The method of claim 7, wherein causing the incoming telephone call to be forwarded to the client device causes the client device to ring based on the incoming telephone call.
9. The method of claim 7, wherein determining to forward the incoming telephone call to the client device as the next action to be implemented by the cloud-based automated assistant is further based on a user state of the user and/or a client device state of the client device indicating that the user is currently available to answer the incoming telephone call at the client device.
10. The method of claim 6, further comprising: determining, based on one or more of the corresponding instances of the audio data, to generate a notification associated with the incoming telephone call as the next action to be implemented by the cloud-based automated assistant; and causing the notification to be rendered at the client device of the user.
11. The method of claim 10, wherein causing the notification to be rendered at the client device of the user comprises causing notification data to be transmitted to the client device that, when received, causes the client device to visually render the notification without causing the client device to ring based on the incoming telephone call.
12. The method of claim 10, wherein determining to generate the notification associated with the telephone call as the next action to be implemented by the cloud-based automated assistant is further based on a user state of the user and/or a client device state of the client device indicating that the user is not currently available to answer the incoming telephone call.
13. The method of claim 6, further comprising: determining, based on one or more of the corresponding instances of the audio data, to forward the incoming telephone call to a further additional client device of the user as the next action to be implemented by the cloud-based automated assistant, the further additional client device being associated with a different telephone number than a callee telephone number associated with the client device; and causing the incoming telephone call to be forwarded to the further additional client device.
14. The method of claim 13, wherein causing the incoming telephone call to be forwarded to the further additional client device causes the further additional client device to ring based on the incoming telephone call.
15. The method of claim 13, wherein determining to forward the incoming telephone call to the further additional client device as the next action to be implemented by the cloud-based automated assistant is further based on a user state of the user and/or a client device state of the client device indicating that the user is currently available to answer the incoming telephone call at the further additional client device, but not at the client device.
16. The method of claim 6, further comprising: determining, based on one or more of the corresponding instances of the audio data, to terminate the incoming telephone call.
17. The method of claim 16, wherein determining to terminate the incoming telephone call is based on content included in one or more of the corresponding instances of the audio data.
18. The method of claim 6, further comprising: determining, based on one or more of the corresponding instances of the audio data, to instruct the caller that initiated the incoming telephone call to leave a voicemail, wherein determining to instruct the caller that initiated the incoming telephone call to leave the voicemail as the next action to be implemented by the cloud-based automated assistant is further based on a user state of the user and/or a client device state of the client device indicating that the user is not currently available to answer the incoming telephone call; and causing the voicemail to be forwarded to the client device, wherein causing the voicemail to be forwarded to the client device causes the client device to render an indication of the voicemail.
19. A system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the at least one processor to be operable to: receive an indication of an incoming telephone call, the incoming telephone call being directed to a client device of a user, and the indication of the incoming telephone call being received via a cloud-based automated assistant executing at least in part at a remote server that is remote from the client device; determine whether to cause the cloud-based automated assistant to answer the incoming telephone call; in response to determining to cause the cloud-based automated assistant to answer the incoming telephone call: cause the cloud-based automated assistant to answer the incoming telephone call and conduct a conversation with a caller that initiated the incoming telephone call; determine, based on the cloud-based automated assistant conducting the conversation with the caller that initiated the incoming telephone call, a next action to be implemented by the cloud-based automated assistant; and cause the next action to be implemented by the cloud-based automated assistant.
20. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to be operable to perform operations, the operations comprising: receiving an indication of an incoming telephone call, the incoming telephone call being directed to a client device of a user, and the indication of the incoming telephone call being received via a cloud-based automated assistant executing at least in part at a remote server that is remote from the client device; determining whether to cause the cloud-based automated assistant to answer the incoming telephone call; in response to determining to cause the cloud-based automated assistant to answer the incoming telephone call: causing the cloud-based automated assistant to answer the incoming telephone call and conduct a conversation with a caller that initiated the incoming telephone call; determining, based on the cloud-based automated assistant conducting the conversation with the caller that initiated the incoming telephone call, a next action to be implemented by the cloud-based automated assistant; and causing the next action to be implemented by the cloud-based automated assistant.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
DETAILED DESCRIPTION
[0019] Turning now to
[0020] The user input engine 111 can detect various types of user input at the client device 110. In some examples, the user input detected at the client device 110 can include spoken utterance(s) of a human user of the client device 110 that is detected via microphone(s) of the client device 110. In these examples, the microphone(s) of the client device 110 can generate audio data that captures the spoken utterance(s). In other examples, the user input detected at the client device 110 can include touch input of a human user of the client device 110 that is detected via user interface input device(s) (e.g., touch sensitive display(s)) of the client device 110, and/or typed input detected via user interface input device(s) (e.g., touch sensitive display(s) and/or keyboard(s)) of the client device 110. In these examples, the user interface input device(s) of the client device 110 can generate textual data that captures the touch input and/or the typed input.
[0021] The rendering engine 112 can cause content and/or other output to be visually rendered for presentation to the user at the client device 110 (e.g., via a touch sensitive display or other user interface output device(s)) and/or audibly rendered for presentation to the user at the client device 110 (e.g., via speaker(s) or other user interface output device(s)). The content and/or other output can include, for example, a transcript of a dialog between a user of the client device 110 and an automated assistant 115 executing at least in part at the client device 110, a transcript of a dialog between the automated assistant 115 executing at least in part at the client device 110 and an additional user that is in addition to the user of the client device 110, notifications, selectable graphical elements, and/or any other content and/or output described herein.
[0022] The state engine 113 can determine a user state of the user of the client device 110 and/or a client device state of the client device 110 (and/or client device states of other client devices of the user). The user state can, for example, indicate whether the user is available to answer the incoming telephone call at the client device 110 (and/or other client devices of the user), indicate whether the user is a threshold distance away from the client device 110 (and/or other client devices of the user), and/or other user state information. Further, the user state can be determined based on, for example, calendar information associated with the user (e.g., stored in databases 198), software application activity data associated with the user (e.g., stored in the databases 198), user profile data associated with the user (e.g., stored in the databases 198), or sensor data generated by the client device 110 (and/or generated by other client devices of the user). Further, the client device state can, for example, indicate whether a state of charge of the client device 110 (and/or other client devices of the user) is below a threshold state of charge, indicate whether a mode of the client device 110 (and/or other client devices of the user) is associated with the user's current availability, indicate whether the client device 110 (and/or other client devices of the user) is a threshold distance away from the user, and/or other client device state information. Further, the client device state can be determined based on, for example, sensor data generated by the client device 110 (and/or generated by other client devices of the user).
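The availability determination described in paragraph [0022] can be illustrated with a minimal, hypothetical sketch. The class names, fields, and thresholds below are illustrative assumptions for exposition, not part of the disclosure:

```python
from dataclasses import dataclass


@dataclass
class UserState:
    in_meeting: bool                # e.g., derived from calendar information
    distance_from_device_m: float   # e.g., derived from sensor data


@dataclass
class DeviceState:
    battery_fraction: float         # state of charge, 0.0 to 1.0
    do_not_disturb: bool            # device mode tied to availability


def user_available(user: UserState, device: DeviceState,
                   max_distance_m: float = 3.0,
                   min_battery: float = 0.05) -> bool:
    """Return True if the user is likely able to answer at this device."""
    if user.in_meeting or device.do_not_disturb:
        return False
    if user.distance_from_device_m > max_distance_m:
        return False
    if device.battery_fraction < min_battery:
        return False
    return True
```

In practice, the fields would be populated from the calendar, software application, user profile, and sensor signals enumerated above, and evaluated per client device of the user.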
[0023] Further, the client device 110 is illustrated in
[0024] The automated assistant system 120 can leverage various databases. For instance, and as noted above, the ML model engine 130 can leverage the ML models database 130A that stores various ML models; the phone call answering engine 150 can leverage the deny list database 150A that stores telephone numbers associated with telemarketers, scammers, spammers, etc.; and the client device 110 and/or the automated assistant system 120 can leverage databases 198 for other information described herein (e.g., calendar data, software application data, user profile data, sensor data, and/or other data). Although
[0025] Moreover, the client device 110 can execute the automated assistant system client 114. An instance of the automated assistant system client 114 can be an application that is separate from an operating system of the client device 110 (e.g., installed on top of the operating system), or can alternatively be implemented directly by the operating system of the client device 110. The automated assistant system client 114 can communicate with the automated assistant system 120 via one or more of the networks 199 (e.g., as shown in
[0026] Furthermore, the client device 110 and/or the automated assistant system 120 may include one or more memories for storage of data and software applications, one or more processors for accessing data and executing the software applications, and other components that facilitate communication over one or more of the networks 199. In some implementations, one or more of the software applications can be installed locally at the client device 110, whereas in other implementations one or more of the software applications can be hosted remotely from the client device 110 (e.g., by one or more servers), but accessible by the client device 110 over one or more of the networks 199.
[0027] As described herein, the automated assistant system 120 can be utilized to handle incoming telephone calls that are directed to the client device 110. By handling the incoming telephone calls that are directed to the client device 110, the automated assistant system 120 can serve as a cloud-based personal call assistant. In some implementations, the incoming telephone call can be initially routed to the cloud-based personal call assistant without the client device 110 ringing to indicate that there is an incoming telephone call. In some versions of those implementations, the cloud-based personal call assistant may determine that a caller that initiated the incoming telephone call is a known telemarketer, scammer, spammer, etc., and refrain from answering the telephone call. As a result, the user may not be aware of the incoming telephone call, thereby mitigating and/or eliminating instances of these telephone calls. Further, computational resources and/or battery resources can be conserved at the client device in these implementations. In some versions of those implementations, the cloud-based personal call assistant may answer the incoming telephone call to conduct a conversation with the caller on behalf of the user and determine a next action to be implemented based at least in part on the conversation (and optionally the user state and/or the client device state). Various next actions to be implemented are described in more detail herein (e.g., with respect to
[0028] The conversations described herein can be conducted by the automated assistant system 120. For example, the conversations can be performed during telephone calls conducted using Voice over Internet Protocol (VoIP), public switched telephone networks (PSTN), and/or other telephonic communication protocols. Further, the conversations described herein are automated in that the automated assistant system 120 conducts the conversations and determines the next action to be implemented using one or more of the components depicted in
[0029] In various implementations, the ASR engine 131 can process, using ASR model(s) stored in the ML models database 130A (e.g., a recurrent neural network (RNN) model, a transformer model, and/or any other type of ML model capable of performing ASR), audio data that captures a spoken utterance and that is generated by microphone(s) of the client device 110 (or microphone(s) of an additional client device) to generate ASR output. Further, the NLU engine 132 can process, using NLU model(s) stored in the ML models database 130A (e.g., a long short-term memory (LSTM), gated recurrent unit (GRU), and/or any other type of RNN or other ML model capable of performing NLU) and/or NLU rule(s), the ASR output (or other typed or touch inputs received via the user input engine 111 of the client device 110) to generate NLU output. Moreover, the fulfillment engine 133 can process, using fulfillment model(s) and/or fulfillment rules stored in the ML models database 130A, the NLU data to generate fulfillment output. Additionally, the TTS engine 134 can process, using TTS model(s) stored in the ML models database 130A, textual content (e.g., text formulated by the automated assistant 115) to generate synthesized speech audio data that includes computer-generated synthesized speech. Furthermore, in various implementations, the LLM engine 135 can replace one or more of the aforementioned components. For instance, the LLM engine 135 can replace the NLU engine 132 and/or the fulfillment engine 133. In these implementations, the LLM engine 135 can process, using LLM(s) stored in the ML models database 130A (e.g., PaLM, BARD, BERT, LaMDA, Meena, GPT, and/or any other LLM, such as any other LLM that is encoder-only based, decoder-only based, sequence-to-sequence based and that optionally includes an attention mechanism or other memory), the ASR output (or other typed or touch inputs received via the user input engine 111 of the client device 110) to generate LLM output.
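The engine chain described in paragraph [0029] (ASR output feeding NLU, NLU output feeding fulfillment, with the LLM engine optionally replacing the latter two) can be sketched as a hypothetical pipeline. The function names and signatures below are illustrative assumptions, not the actual interfaces of the engines:

```python
from typing import Callable, Optional


def process_turn(audio: bytes,
                 asr: Callable[[bytes], str],
                 nlu: Callable[[str], dict],
                 fulfill: Callable[[dict], str],
                 llm: Optional[Callable[[str], str]] = None) -> str:
    """Run one conversational turn through the model pipeline."""
    text = asr(audio)            # ASR engine 131: audio -> recognized text
    if llm is not None:          # LLM engine 135 can replace NLU + fulfillment
        return llm(text)
    intent = nlu(text)           # NLU engine 132: text -> structured output
    return fulfill(intent)       # fulfillment engine 133: output -> response
```

The TTS engine would then convert the returned response text into synthesized speech audio data, as described in the same paragraph.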
[0030] In various implementations, the ASR output can include, for example, a plurality of speech hypotheses (e.g., term hypotheses and/or transcription hypotheses) that are predicted to correspond to spoken utterance(s) based on the processing of audio data that captures the spoken utterance(s). The ASR engine 131 can optionally select a particular speech hypothesis as recognized text for the spoken utterance(s) based on a corresponding value associated with each of the plurality of speech hypotheses (e.g., probability values, log likelihood values, and/or other values). In various implementations, the ASR model(s) stored in the ML model(s) database 130A are end-to-end speech recognition model(s), such that the ASR engine 131 can generate the plurality of speech hypotheses directly using the ASR model(s). For instance, the ASR model(s) can be end-to-end model(s) used to generate each of the plurality of speech hypotheses on a character-by-character basis (or other token-by-token basis). One non-limiting example of such end-to-end model(s) used to generate the recognized text on a character-by-character basis is a recurrent neural network transducer (RNN-T) model. An RNN-T model is a form of sequence-to-sequence model that does not employ attention mechanisms or other memory. In other implementations, the ASR model(s) are not end-to-end speech recognition model(s) such that the ASR engine 131 can instead generate predicted phoneme(s) (and/or other representations). For instance, the predicted phoneme(s) (and/or other representations) may then be utilized by the ASR engine 131 to determine a plurality of speech hypotheses that conform to the predicted phoneme(s). In doing so, the ASR engine 131 can optionally employ a decoding graph, a lexicon, and/or other resource(s). In various implementations, a corresponding transcription that includes the recognized text can be rendered at the client device 110.
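Selecting a particular speech hypothesis based on its corresponding value, as described in paragraph [0030], can be sketched as follows. This is a hypothetical illustration; a real ASR decoder scores hypotheses with probabilities or log likelihoods as noted above:

```python
def select_hypothesis(hypotheses):
    """Pick the speech hypothesis with the highest corresponding value
    (e.g., a probability value or log likelihood value)."""
    return max(hypotheses, key=lambda h: h[1])[0]


# Example with log likelihood values, where the less negative value wins.
best = select_hypothesis([("call mom", -1.2), ("call tom", -0.4)])
```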
[0031] In various implementations, the NLU output can include, for example, annotated recognized text that includes one or more annotations of the recognized text for one or more (e.g., all) of the terms of the recognized text. For example, the NLU engine 132 may include a part of speech tagger (not depicted) configured to annotate terms with their grammatical roles. Additionally, or alternatively, the NLU engine 132 may include an entity tagger (not depicted) configured to annotate entity references in one or more segments of the recognized text, such as references to people (including, for instance, literary characters, celebrities, public figures, etc.), organizations, locations (real and imaginary), and so forth. In some implementations, data about entities may be stored in one or more databases, such as in a knowledge graph (not depicted). In some implementations, the knowledge graph may include nodes that represent known entities (and in some cases, entity attributes), as well as edges that connect the nodes and represent relationships between the entities. The entity tagger may annotate references to an entity at a high level of granularity (e.g., to enable identification of all references to an entity class such as people) and/or a lower level of granularity (e.g., to enable identification of all references to a particular entity such as a particular person). The entity tagger may rely on content of the natural language input to resolve a particular entity and/or may optionally communicate with a knowledge graph or other entity database to resolve a particular entity. Additionally, or alternatively, the NLU engine 132 may include a coreference resolver (not depicted) configured to group, or cluster, references to the same entity based on one or more contextual cues. 
For example, the coreference resolver may be utilized to resolve the term "them" to "theatre tickets" in the natural language input "buy them", based on "theatre tickets" being mentioned in a client device notification rendered immediately prior to receiving the input "buy them". In some implementations, one or more components of the NLU engine 132 may rely on annotations from one or more other components of the NLU engine 132. For example, in some implementations the entity tagger may rely on annotations from the coreference resolver in annotating all mentions of a particular entity. Also, for example, in some implementations, the coreference resolver may rely on annotations from the entity tagger in clustering references to the same entity. Also, for example, in some implementations, the coreference resolver may rely on user data of the user of the client device 110 in coreference resolution and/or entity resolution. The user data may include, for example, historical location data, historical temporal data, user preference data, user account data, calendar information, email data, and/or any other user data that is accessible at the client device 110.
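A toy illustration of the entity tagger consulting a knowledge graph of nodes and edges, as described in paragraphs [0031] and the passage above. The entity identifiers and the naive matching rule here are hypothetical and far simpler than a production entity resolver:

```python
# Toy knowledge graph: nodes map entity ids to attributes; edges express
# relationships between entities.
nodes = {
    "person/jane_doe": {"class": "person"},
    "org/acme": {"class": "organization"},
}
edges = [("person/jane_doe", "works_for", "org/acme")]


def resolve_entity(mention: str):
    """Resolve a surface mention to a knowledge-graph node id, or None.

    Naive suffix match for illustration; a real resolver would use
    context, annotations from other NLU components, and user data.
    """
    key = mention.lower().replace(" ", "_")
    for node_id in nodes:
        if node_id.endswith(key):
            return node_id
    return None
```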
[0032] In various implementations, the fulfillment output can include, for example, one or more tasks to be performed by the automated assistant 115. For example, the user can provide unstructured free-form natural language input in the form of spoken utterance(s). The spoken utterance(s) can include, for instance, an indication of the one or more tasks to be performed by the automated assistant 115. The one or more tasks may require the automated assistant 115 to provide certain information to the user, engage with one or more external systems on behalf of the user (e.g., an inventory system, a reservation system, etc. via a remote procedure call (RPC)), and/or any other task that may be specified by the user and performed by the automated assistant 115. Accordingly, it should be understood that the fulfillment output may be based on the one or more tasks to be performed by the automated assistant 115 and may be dependent on the corresponding conversations with the user.
[0033] In various implementations, the TTS engine 134 can generate synthesized speech audio data that captures computer-generated synthesized speech. The synthesized speech audio data can be rendered at the client device 110 via speaker(s) of the client device 110 and/or other client devices described herein. The synthesized speech may include any output generated by the automated assistant 115 as described herein, and may include, for example, synthesized speech generated as part of a dialog between the user of the client device 110 and the automated assistant 115, as part of an automated telephone call between the automated assistant 115 and a representative associated with an entity (e.g., a human representative associated with the entity, an automated assistant representative associated with the entity, an interactive voice response (IVR) system associated with the entity, etc.), and so on.
[0034] In various implementations, the LLM output can include, for example, a probability distribution over a sequence of tokens, such as words, phrases, or other semantic units, that are predicted to be responsive to the spoken utterance(s) or other user inputs provided by the user of the client device 110 and/or other users (e.g., the representative associated with the entity). Notably, the LLM(s) stored in the ML model(s) database 130A can include billions of weights and/or parameters that are learned through training the LLM on enormous amounts of diverse data. This enables these LLM(s) to generate the LLM output as the probability distribution over the sequence of tokens. In these implementations, the LLM engine 135 can replace the NLU engine 132 and/or the fulfillment engine 133 since these LLM(s) can perform the same or similar functionality in terms of natural language processing.
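The LLM output described in paragraph [0034] is a probability distribution over a sequence of tokens. Converting an LLM's raw scores (logits) into such a distribution is conventionally done with a softmax, sketched here as a generic illustration rather than any particular model's implementation:

```python
import math


def softmax(logits):
    """Convert raw token scores into a probability distribution."""
    m = max(logits)                           # subtract max for numerical
    exps = [math.exp(x - m) for x in logits]  # stability before exponentiating
    total = sum(exps)
    return [e / total for e in exps]
```

The resulting probabilities sum to one, and the highest-scoring token receives the largest probability mass, which is what allows the LLM engine 135 to rank candidate responses.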
[0035] Although
[0036] Referring now to
[0037] Further, and in response to receiving the indication of the incoming telephone call, the phone call answering engine 150 can determine whether to answer the incoming telephone call as indicated by 202. In some implementations, the phone call answering engine 150 can compare a caller telephone number associated with the incoming telephone call to a plurality of telephone numbers included in a deny list of telephone numbers (e.g., stored in the deny list database 150A). The deny list can include telephone numbers associated with known telemarketers, spammers, scammers, etc. Notably, the deny list of telephone numbers can be dynamically updated over time as telephone numbers associated with telemarketers, spammers, scammers, etc. are identified by users (e.g., the user of the client device 110 and/or other users) and/or by other means.
[0038] Assuming that the phone call answering engine 150 determines to refrain from answering the incoming telephone call (e.g., the telephone number associated with the caller is included on the deny list of telephone numbers), then the phone call answering engine 150 can cause the incoming phone call to be terminated as indicated by 203 (and optionally without alerting the user of the client device 110 to the incoming telephone call). However, and assuming that the phone call answering engine 150 determines to answer the incoming telephone call (e.g., the telephone number associated with the caller is not included on the deny list of telephone numbers), then the phone call answering engine 150 can cause the incoming phone call to be answered as indicated by 204.
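The answer/refrain decision of paragraphs [0037]-[0038] can be sketched as a hypothetical deny-list check. The function name and the string return values are illustrative assumptions, not part of the disclosure:

```python
def handle_incoming_call(caller_number: str, deny_list: set) -> str:
    """Decide how the cloud-based assistant handles an incoming call.

    Returns "terminate" for deny-listed numbers (optionally without
    alerting the user of the client device) and "answer" otherwise.
    """
    if caller_number in deny_list:
        return "terminate"
    return "answer"
```

Because the deny list is dynamically updated as telemarketer, spammer, and scammer numbers are identified, the same caller number may be handled differently over time.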
[0039] In response to answering the incoming telephone call, the conversation engine 160 can cause a cloud-based automated assistant to conduct a conversation with the caller during the telephone call. For instance, and in response to answering the incoming telephone call, the conversation engine 160 can generate a corresponding instance of synthesized speech 205 (e.g., using the TTS model(s) stored in the ML model(s) database 130A) that corresponds to text formulated by the cloud-based automated assistant. The corresponding instance of synthesized speech 205 can, for example, greet the caller, identify itself as an automated assistant employed by the user of the client device 110, ask the caller for the reason that the caller is calling the user of the client device 110, etc. Further, the conversation engine 160 can cause the corresponding instance of synthesized speech to be rendered at an additional client device 210 that is associated with the caller and over one or more telephonic networks (e.g., VoIP, PSTN, etc.). Moreover, and in response to rendering the corresponding instance of synthesized speech at the additional client device 210, the conversation engine 160 can receive, over one or more of the telephonic networks, a corresponding instance of audio data 206 that captures speech of the caller, and cause the corresponding instance of audio data 206 to be processed (e.g., using the ASR model(s) stored in the ML model(s) database 130A and optionally the LLM(s), NLU model(s), and/or fulfillment model(s) stored in the ML model(s) database 130A). The corresponding instance of audio data 206 can, for example, identify the caller, identify the reason that the caller is calling the user of the client device 110, and/or include other content.
[0040] Notably, while the cloud-based automated assistant is conducting the conversation, the next action engine 170 can process the corresponding instance of synthesized speech 205 and/or the corresponding instance of audio data 206 to determine a next action 208 to be implemented by the cloud-based automated assistant. Although the process flow 200 of
[0041] In some implementations, the state engine 113 can provide an indication of user state(s) of a user of the client device 110 and/or client device state(s) of client device(s) of the user of the client device 110 as indicated by 207. In these implementations, the next action engine 170 can additionally process the indication of the user state(s) and/or the client device state(s) in determining the next action to be implemented. How the indication of the user state(s) and/or the client device state(s) can be utilized to influence which of the various next actions to be implemented is described in more detail herein (e.g., with respect to
[0042] Turning now to
[0043] At block 352, the system receives an indication of an incoming telephone call. The incoming telephone call may be directed to a client device of a user, and the indication of the incoming telephone call can be received via a cloud-based automated assistant that is executed at least in part by the system.
[0044] At block 354, the system determines whether to answer the incoming telephone call. The system can determine whether to answer the incoming telephone call based on a caller telephone number associated with a caller that initiated the incoming telephone call. For example, the system can compare the caller telephone number to a deny list of telephone numbers. In this example, if the caller telephone number is included on the deny list of telephone numbers, then the system may determine not to answer the incoming telephone call. However, if the caller telephone number is not included on the deny list of telephone numbers, then the system may determine to answer the incoming telephone call.
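The deny-list check at block 354 can be sketched as a simple membership test. This is an illustrative sketch, not the patented implementation; the function name and the example telephone numbers are hypothetical.

```python
def should_answer(caller_number: str, deny_list: set[str]) -> bool:
    """Answer the incoming call only if the caller's number is not on the deny list."""
    return caller_number not in deny_list
```

If `should_answer` returns `False`, the system would proceed to block 356 and refrain from answering; otherwise it would proceed to block 358 and answer the call.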
[0045] If, at an iteration of block 354, the system determines not to answer the incoming telephone call, the system proceeds to block 356. At block 356, the system refrains from answering the incoming telephone call. The method 300 may end and initiate an additional iteration of the method 300 in response to receiving an indication of an additional incoming telephone call.
[0046] If, at an iteration of block 354, the system determines to answer the incoming telephone call, the system proceeds to block 358. At block 358, the system answers the incoming telephone call. At block 360, the system causes a cloud-based automated assistant to conduct a conversation during a telephone call and with a caller that initiated the incoming telephone call. The system can cause the cloud-based automated assistant to conduct the conversation by generating corresponding instances of synthesized speech and by rendering the corresponding instances of synthesized speech at an additional client device of the caller that initiated the incoming telephone call. Further, the system can cause the cloud-based automated assistant to conduct the conversation by receiving corresponding instances of audio data (e.g., capturing speech of the caller) and by processing the corresponding instances of audio data.
[0047] At block 362, the system determines a next action to be implemented. The system can determine the next action to be implemented based on content of the conversation between the cloud-based automated assistant and the caller. In some implementations, the system can determine the next action further based on an indication of user state(s) of a user of the client device and/or client device state(s) of client device(s) of the user of the client device. In determining the next action, the system can seek to appropriately handle the telephone call while also minimizing computational resources consumed in interacting with the caller, network resources consumed in interacting with the caller, and/or battery resources at the client device of the user.
[0048] For example, at an iteration of block 362, the system can determine to terminate the telephone call. In this example, the system can proceed to block 364. At block 364, the system causes the cloud-based automated assistant to terminate the telephone call. An example of causing the cloud-based automated assistant to terminate the telephone call (e.g., via the termination engine 171) is described in more detail with respect to
[0049] As another example, at an iteration of block 362, the system can determine to forward the telephone call. In this example, the system can proceed to block 366. At block 366, the system determines a client device to which the telephone call is to be forwarded. For instance, at an iteration of block 366, the system can determine to forward the telephone call to a client device of a user (e.g., the client device of the user to which the incoming telephone call was initially directed). In this instance, the system proceeds to block 368. At block 368, the system causes the telephone call to be forwarded to the client device of the user. An example of causing the cloud-based automated assistant to forward the telephone call to the client device of the user (e.g., via the forwarding engine 172) is described in more detail with respect to
[0050] Also, for instance, at an iteration of block 366, the system can determine to forward the telephone call to an additional client device of a user. In this instance, the system proceeds to block 370. At block 370, the system causes the telephone call to be forwarded to the additional client device of the user (e.g., the additional client device being in addition to the client device of the user to which the incoming telephone call was initially directed, and the additional client device optionally being associated with a different telephone number). An example of causing the cloud-based automated assistant to forward the telephone call to the additional client device of the user (e.g., via the forwarding engine 172) is described in more detail with respect to
[0051] As yet another example, at an iteration of block 362, the system can determine to instruct the caller to leave a voicemail based on the conversation during the telephone call. In this example, the system can proceed to block 372. At block 372, the system causes the cloud-based automated assistant to instruct the caller to leave the voicemail. The system may then cause an indication of the voicemail to be transmitted to the client device of the user and/or the additional client device of the user (e.g., in the same or similar manner as described with respect to block 366). An example of causing the cloud-based automated assistant to instruct the caller to leave the voicemail (e.g., via the voicemail engine 173) is described in more detail with respect to
[0052] As yet another example, at an iteration of block 362, the system can determine to generate a notification based on the conversation during the telephone call. In this example, the system can proceed to block 374. At block 374, the system causes the cloud-based automated assistant to generate, based on the conversation during the telephone call, a notification. The system may then cause an indication of the notification to be transmitted to the client device of the user and/or the additional client device of the user (e.g., in the same or similar manner as described with respect to block 366). An example of causing the cloud-based automated assistant to generate the notification (e.g., via the notification engine 174) is described in more detail with respect to
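The next-action selection at block 362 — choosing among terminating, forwarding, voicemail, and notification — can be sketched as a dispatch over the conversation content and availability signals. This sketch is illustrative only and not part of the patent disclosure; the spam cues, the flag names, and the ordering of the checks are all hypothetical.

```python
from enum import Enum, auto

class NextAction(Enum):
    TERMINATE = auto()              # block 364
    FORWARD_TO_CLIENT = auto()      # block 368
    FORWARD_TO_ADDITIONAL = auto()  # block 370
    NOTIFY = auto()                 # block 374
    VOICEMAIL = auto()              # block 372

def determine_next_action(conversation_text: str, user_available: bool,
                          additional_device_available: bool) -> NextAction:
    """Pick a next action from the conversation content and availability signals."""
    text = conversation_text.lower()
    if "warranty" in text or "robocall" in text:  # hypothetical spam cues
        return NextAction.TERMINATE
    if user_available:
        return NextAction.FORWARD_TO_CLIENT
    if additional_device_available:
        return NextAction.FORWARD_TO_ADDITIONAL
    if "urgent" in text:
        return NextAction.NOTIFY
    return NextAction.VOICEMAIL
```

A real implementation would derive these signals from the processed audio data and the state engine rather than from keyword matching.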
[0053] Although certain next actions are described with respect to
[0054] Turning now to
[0055] At block 452, the system receives an incoming telephone call. The incoming telephone call may be directed to a client device of a user. In contrast with the method 300 of
[0056] At block 454, the system determines a user state of a user and/or a client device state of a client device of the user. The user state can, for example, indicate whether the user is available to answer the incoming telephone call at the client device, indicate whether the user is a threshold distance away from the client device, and/or other user state information. Further, the system can determine the user state based on, for example, calendar information associated with the user, software application activity data associated with the user, user profile data associated with the user, or sensor data generated by the client device. The client device state can, for example, indicate whether a state of charge of the client device is below a threshold state of charge, indicate whether a mode of the client device is associated with the user's current availability, indicate whether the client device is a threshold distance away from the user, and/or other client device state information. Further, the system can determine the client device state based on, for example, sensor data generated by the client device.
[0057] At block 456, the system determines whether to forward an indication of the incoming telephone call to a cloud-based automated assistant. The system can determine whether to forward the indication of the incoming telephone call to the cloud-based automated assistant based on, for example, the user state and/or the client device state. For example, if the user state indicates that the user is in a meeting or otherwise unavailable based on the calendar information associated with the user, that the user is engaged in a certain activity based on the software application activity data associated with the user (e.g., the user is actively using a workout or run tracking application, the user is actively using a video conference application, etc.), or that the user is not near the client device (e.g., based on proximity sensor data), then the system can determine to forward the indication of the incoming telephone call to the cloud-based automated assistant. Additionally, or alternatively, if the client device state indicates that the client device has little/no battery or that the client device is in a do not disturb mode, then the system can determine to forward the indication of the incoming telephone call to the cloud-based automated assistant.
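The decision at block 456 can be sketched as a predicate over the user state and client device state. This sketch is illustrative and not part of the patent disclosure; the dictionary keys, the battery threshold, and the application names are hypothetical.

```python
def route_to_cloud_assistant(user_state: dict, device_state: dict) -> bool:
    """Return True if the incoming call should be handled by the cloud-based
    assistant instead of ringing the client device (block 456)."""
    # User-state signals: in a meeting, away from the device, or busy in an app.
    if user_state.get("in_meeting") or not user_state.get("near_device", True):
        return True
    if user_state.get("active_app") in {"run_tracker", "video_conference"}:
        return True
    # Device-state signals: nearly dead battery or do-not-disturb mode.
    if device_state.get("battery_pct", 100) <= 5 or device_state.get("do_not_disturb", False):
        return True
    return False
```

A `True` result corresponds to proceeding to block 460 (transmit the indication to the cloud-based assistant); `False` corresponds to block 458 (ring the client device as usual).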
[0058] If, at an iteration of block 456, the system determines not to forward the indication of the incoming telephone call to the cloud-based automated assistant, then the system proceeds to block 458. At block 458, the system causes the client device to ring based on the incoming telephone call. Put another way, the system can cause the client device to ring like the client device typically does in response to receiving incoming telephone calls.
[0059] If, at an iteration of block 456, the system determines to forward the indication of the incoming telephone call to the cloud-based automated assistant, then the system proceeds to block 460. At block 460, the system transmits the indication of the incoming telephone call to the cloud-based automated assistant. Put another way, the system can proceed to block 354 of the method 300 of
[0060] Turning now to
[0061] The display 190 of the client device 110 in
[0062] For the sake of example throughout
[0063] Referring specifically to
[0064] Referring specifically to
[0065] For instance, the forwarding engine 172 can determine to forward the telephone call to the client device 110 of the user. In this instance, and in determining where to forward the telephone call, the forwarding engine 172 can consider a user state of the user of the client device 110 and/or a client device state of the client device 110. In the example of
[0066] In contrast, and referring specifically to
[0067] Referring specifically to
[0068] Referring specifically to
[0069] Although particular content of conversations is described above with respect to
[0070] Turning now to
[0071] Computing device 610 typically includes at least one processor 614 which communicates with a number of peripheral devices via bus subsystem 612. These peripheral devices may include a storage subsystem 624, including, for example, a memory subsystem 625 and a file storage subsystem 626, user interface output devices 620, user interface input devices 622, and a network interface subsystem 616. The input and output devices allow user interaction with computing device 610. Network interface subsystem 616 provides an interface to outside networks and is coupled to corresponding interface devices in other computing devices.
[0072] User interface input devices 622 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into the display (e.g., a touch sensitive display), audio input devices such as voice recognition systems, microphones, and/or other types of input devices. In general, use of the term input device is intended to include all possible types of devices and ways to input information into computing device 610 or onto a communication network.
[0073] User interface output devices 620 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual display such as via audio output devices. In general, use of the term output device is intended to include all possible types of devices and ways to output information from computing device 610 to the user or to another machine or computing device.
[0074] Storage subsystem 624 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 624 may include the logic to perform selected aspects of the methods disclosed herein, as well as to implement various components depicted in
[0075] These software modules are generally executed by processor 614 alone or in combination with other processors. Memory 625 used in the storage subsystem 624 can include a number of memories including a main random-access memory (RAM) 630 for storage of instructions and data during program execution and a read only memory (ROM) 632 in which fixed instructions are stored. A file storage subsystem 626 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations may be stored by file storage subsystem 626 in the storage subsystem 624, or in other machines accessible by the processor(s) 614.
[0076] Bus subsystem 612 provides a mechanism for letting the various components and subsystems of computing device 610 communicate with each other as intended. Although bus subsystem 612 is shown schematically as a single bus, alternative implementations of the bus subsystem 612 may use multiple busses.
[0077] Computing device 610 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computing device 610 depicted in
[0078] In situations in which the systems described herein collect or otherwise monitor personal information about users (or may make use of personal and/or monitored information), the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, a user's preferences, or a user's current geographic location), or to control whether and/or how to receive content from the content server that may be more relevant to the user. Also, certain data may be treated in one or more ways before it is stored or used, so that personal identifiable information is removed. For example, a user's identity may be treated so that no personal identifiable information can be determined for the user, or a user's geographic location may be generalized where geographic location information is obtained (such as to a city, ZIP code, or state level), so that a particular geographic location of a user cannot be determined. Thus, the user may have control over how information is collected about the user and/or used.
[0079] In some implementations, a method implemented by one or more processors is provided, and includes: receiving an indication of an incoming telephone call, the incoming telephone call being directed to a client device of a user, and the indication of the incoming telephone call being received via a cloud-based automated assistant executing at least in part at a remote server that is remote from the client device; determining whether to cause the cloud-based automated assistant to answer the incoming telephone call; in response to determining to cause the cloud-based automated assistant to answer the incoming telephone call: causing the cloud-based automated assistant to answer the incoming telephone call and conduct a conversation with a caller that initiated the incoming telephone call; determining, based on the cloud-based automated assistant conducting conversation with the caller that initiated the incoming telephone call, a next action to be implemented by the cloud-based automated assistant; and causing the next action to be implemented by the cloud-based automated assistant.
[0080] These and other implementations of technology disclosed herein can optionally include one or more of the following features.
[0081] In some implementations, determining whether to cause the cloud-based automated assistant to answer the incoming telephone call may be based on a caller telephone number associated with the caller that initiated the incoming telephone call.
[0082] In some versions of those implementations, determining whether to cause the cloud-based automated assistant to answer the incoming telephone call based on the caller telephone number associated with the caller that initiated the incoming telephone call may include: comparing the caller telephone number to a deny list of telephone numbers; and determining, based on comparing the caller telephone number to the deny list of telephone numbers, whether the caller telephone number is included on the deny list of telephone numbers. Determining to cause the cloud-based automated assistant to answer the incoming telephone call may be in response to determining that the caller telephone number is not included on the deny list of telephone numbers.
[0083] In some further versions of those implementations, the method may further include, in response to determining that the caller telephone number is included on the deny list of telephone numbers: refraining from causing the cloud-based automated assistant to answer the incoming telephone call.
[0084] In some yet further versions of those implementations, no indication of the incoming telephone call may be rendered at the client device of the user.
[0085] In some implementations, causing the cloud-based automated assistant to conduct the conversation with the caller that initiated the incoming telephone call may include: causing the cloud-based automated assistant to generate one or more corresponding instances of synthesized speech; causing the cloud-based automated assistant to render one or more of the corresponding instances of the synthesized speech, at an additional client device of the caller, to conduct the conversation with the caller that initiated the incoming telephone call; receiving one or more corresponding instances of audio data that each capture speech of the caller; and determining, based on one or more of the corresponding instances of the audio data, the next action to be implemented by the cloud-based automated assistant.
[0086] In some versions of those implementations, the method may further include determining, based on one or more of the corresponding instances of the audio data, to forward the incoming telephone call to the client device of the user as the next action to be implemented by the cloud-based automated assistant; and causing the incoming telephone call to be forwarded to the client device.
[0087] In some further versions of those implementations, causing the incoming telephone call to be forwarded to the client device may cause the client device to ring based on the incoming telephone call.
[0088] In some additional or alternative further versions of those implementations, determining to forward the incoming telephone call to the client device as the next action to be implemented by the cloud-based automated assistant may be further based on a user state of the user and/or a client device state of the client device indicating that the user is currently available to answer the incoming telephone call at the client device.
[0089] In additional or alternative versions of those implementations, the method may further include determining, based on one or more of the corresponding instances of the audio data, to generate a notification associated with the incoming telephone call as the next action to be implemented by the cloud-based automated assistant; and causing the notification to be rendered at the client device of the user.
[0090] In some further versions of those implementations, causing the notification to be rendered at the client device of the user may include causing notification data to be transmitted to the client device that, when received, causes the client device to visually render the notification and without causing the client device to ring based on the incoming telephone call.
[0091] In some additional or alternative further versions of those implementations, determining to generate the notification associated with the telephone call as the next action to be implemented by the cloud-based automated assistant may be further based on a user state of the user and/or a client device state of the client device indicating that the user is not currently available to answer the incoming telephone call.
[0092] In additional or alternative versions of those implementations, the method may further include determining, based on one or more of the corresponding instances of the audio data, to forward the incoming telephone call to a further additional client device of the user as the next action to be implemented by the cloud-based automated assistant, the further additional client device being associated with a different telephone number than the callee telephone number associated with the client device; and causing the incoming telephone call to be forwarded to the further additional client device.
[0093] In some further versions of those implementations, causing the incoming telephone call to be forwarded to the further additional client device may cause the further additional client device to ring based on the incoming telephone call.
[0094] In some additional or alternative further versions of those implementations, determining to forward the incoming telephone call to the further additional client device as the next action to be implemented by the cloud-based automated assistant may be further based on a user state of the user and/or a client device state of the client device indicating that the user is currently available to answer the incoming telephone call at the further additional client device, but not at the client device.
[0095] In additional or alternative versions of those implementations, the method may further include determining, based on one or more of the corresponding instances of the audio data, to terminate the incoming telephone call.
[0096] In some further versions of those implementations, determining to terminate the incoming telephone call may be based on content included in one or more of the corresponding instances of the audio data.
[0097] In additional or alternative versions of those implementations, the method may further include determining, based on one or more of the corresponding instances of the audio data, to instruct the caller that initiated the incoming telephone call to leave a voicemail; and causing the voicemail to be forwarded to the client device.
[0098] In some further versions of those implementations, causing the voicemail to be forwarded to the client device may cause the client device to render an indication of the voicemail.
[0099] In some additional or alternative further versions of those implementations, determining to instruct the caller that initiated the incoming telephone call to leave the voicemail as the next action to be implemented by the cloud-based automated assistant is further based on a user state of the user and/or a client device state of the client device indicating that the user is not currently available to answer the incoming telephone call.
[0100] In some implementations, a method implemented by one or more processors is provided, and includes: receiving an indication of an incoming telephone call, the incoming telephone call being directed to a client device of a user, and the indication of the incoming telephone call being received via a cloud-based automated assistant executing at least in part at a remote server that is remote from the client device; receiving an indication of a user state of the user and/or a client device state of the client device; determining, based on the user state of the user and/or the client device state of the client device, whether to forward the incoming telephone call to the client device or instruct a caller that initiated the incoming telephone call to leave a voicemail; in response to determining to instruct the caller that initiated the incoming telephone call to leave the voicemail: causing the cloud-based automated assistant to generate one or more corresponding instances of synthesized speech; causing the cloud-based automated assistant to render one or more of the corresponding instances of the synthesized speech, at an additional client device of the caller, to instruct the caller to leave the voicemail; and in response to determining that the caller has left the voicemail: causing the voicemail to be transmitted to the client device or an additional client device of the user.
[0101] These and other implementations of technology disclosed herein can optionally include one or more of the following features.
[0102] In some implementations, determining whether to forward the incoming telephone call to the client device or instruct the caller that initiated the incoming telephone call to leave a voicemail may be based on the user state of the user, and the user state of the user may indicate one or more of: that the user is not currently available to answer the incoming telephone call at the client device, or that the user is a threshold distance away from the client device.
[0103] In some versions of those implementations, the user state of the user may be determined based on calendar information associated with the user, software application activity data associated with the user, user profile data associated with the user, or sensor data generated by the client device.
[0104] In some implementations, determining whether to forward the incoming telephone call to the client device or instruct the caller that initiated the incoming telephone call to leave a voicemail may be based on the client device state of the client device, and the client device state of the client device may indicate one or more of: that a state of charge of the client device is below a threshold, that a mode of the client device indicates that the user is not currently available to answer the incoming telephone call at the client device, or that the client device is a threshold distance away from the user.
[0105] In some versions of those implementations, the client device state of the client device may be determined based on sensor data generated by the client device.
[0106] In some implementations, the method may further include, in response to determining to forward the incoming telephone call to the client device: causing the incoming telephone call to be forwarded to the client device.
[0107] In some versions of those implementations, causing the incoming telephone call to be forwarded to the client device may cause the client device to ring based on the incoming telephone call.
[0108] In some implementations, causing the voicemail to be forwarded to the client device may cause the client device to render an indication of the voicemail.
[0109] In some implementations, causing the voicemail to be transmitted to the client device or the additional client device may include causing the voicemail to be transmitted to the client device in response to determining that the client device has a threshold state of charge and is within a threshold distance of the user.
[0110] In some implementations, causing the voicemail to be transmitted to the client device or the additional client device may include causing the voicemail to be transmitted to the additional client device in response to determining that the client device does not have a threshold state of charge or is not within a threshold distance of the user.
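The device-selection logic of the two preceding paragraphs — transmit the voicemail to the client device when it has a threshold state of charge and is within a threshold distance of the user, and otherwise to the additional client device — can be sketched as follows. This sketch is illustrative and not part of the patent disclosure; the field names and the default threshold values are hypothetical.

```python
def voicemail_destination(primary: dict, additional: dict,
                          min_charge_pct: int = 20,
                          max_distance_m: float = 10.0) -> str:
    """Pick which of the user's devices should receive the voicemail."""
    # Prefer the primary client device only when it is sufficiently charged
    # and close enough to the user; otherwise fall back to the additional device.
    if (primary["charge_pct"] >= min_charge_pct
            and primary["distance_m"] <= max_distance_m):
        return primary["device_id"]
    return additional["device_id"]
```

Either failing condition (low charge or excessive distance) routes the voicemail to the additional client device, matching the disjunction in the paragraph above.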
[0111] In some implementations, a method implemented by one or more processors is provided, and includes: receiving an incoming telephone call, the incoming telephone call being directed to a client device of a user, and the incoming telephone call being received via an automated assistant executing at least in part at the client device; determining a user state of the user and/or a client device state of the client device; determining, based on the user state of the user and/or the client device state of the client device, whether to forward the incoming telephone call to a remote server to be handled by a cloud-based automated assistant executing at least in part at the remote server; and in response to determining to forward the incoming telephone call to the remote server to be handled by the cloud-based automated assistant: transmitting, to the remote server, an indication of the incoming telephone call. Transmitting the indication of the incoming telephone call to the remote server causes the cloud-based automated assistant to: determine a next action to be implemented for handling the incoming telephone call; and cause the next action to be implemented for handling the incoming telephone call.
[0112] In addition, some implementations include one or more processors (e.g., central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), and/or tensor processing unit(s) (TPU(s))) of one or more computing devices, where the one or more processors are operable to execute instructions stored in associated memory, and where the instructions are configured to cause performance of any of the aforementioned methods. Some implementations also include one or more non-transitory computer readable storage media storing computer instructions executable by one or more processors to perform any of the aforementioned methods. Some implementations also include a computer program product including instructions executable by one or more processors to perform any of the aforementioned methods.
[0113] It should be appreciated that all combinations of the foregoing concepts and additional concepts described in greater detail herein are contemplated as being part of the subject matter disclosed herein. For example, all combinations of claimed subject matter appearing at the end of this disclosure are contemplated as being part of the subject matter disclosed herein.