ANSWER ASSISTANCE COMPUTING SYSTEM
20260093729 · 2026-04-02
Inventors
- Pedro TABACOF (San Francisco, CA, US)
- Pratik BOTHRA (San Francisco, CA, US)
- Edward HELSBY (San Francisco, CA, US)
- Fergal Charles REID (San Francisco, CA, US)
Abstract
Technology is disclosed for programmatically generating answers for a user that are responsive to aspects of a conversation. In one implementation, a conversation record is processed to determine a message embedding of a most recent message received. The message embedding is used to determine a semantically similar question embedding of a conversational snippet from a knowledge base. An answer-generation input instruction for a language model is generated based on the most recent message, the conversational snippet, and an answer-format instruction. The language model is directed to produce an answer output, which is presented via a user interface. An answer-augmentation instruction for the language model is generated based on the answer output, similar messages sent by the user based on string similarity with the answer output, and an augmented-answer format instruction. The language model is directed to produce an augmented-answer output, which is presented via the user interface.
Claims
1. A computer-implemented method comprising: generating a message embedding corresponding to a representation of a most recent message received by a user from a conversation history record; for a plurality of question and answer (Q&A) pairs determined from previous conversation history records, determining a Q&A pair relevant to the representation of the most recent message based on a computed semantic similarity of the message embedding to a question embedding corresponding to a corresponding question of the Q&A pair; programmatically generating an answer-generation input instruction for a language model to cause the language model to produce an answer output, the answer-generation input instruction generated based at least on the representation of the most recent message, a corresponding answer of the Q&A pair, and an answer-format instruction; causing a representation of the answer output to be presented via a user interface (UI) of a computing device; and causing a representation of an augmented-answer output to be presented via the UI of the computing device by: for a plurality of messages previously sent by the user, determining a set of messages similar to the answer output, each message having a similarity to the answer output based on a computed string similarity of the message to the answer output; and programmatically generating an answer-augmentation instruction for the language model to cause the language model to produce the augmented-answer output, the answer-augmentation instruction generated based at least on the answer output, the set of messages, and an augmented-answer format instruction.
2. The computer-implemented method of claim 1, wherein the computed semantic similarity of the message embedding to the question embedding is above a threshold semantic similarity and highest ranking semantic similarity of the plurality of Q&A pairs.
3. The computer-implemented method of claim 1, wherein the most recent message corresponds to a set of messages received following the last message sent by the user.
4. The computer-implemented method of claim 1, wherein the answer-generation input instruction is further generated based on other portions of the conversation history record to provide at least one of context, style, or tone.
5. The computer-implemented method of claim 1, wherein the answer-generation input instruction is further generated based on commonly-used greetings extracted from other conversation history records of the user.
6. The computer-implemented method of claim 1, wherein the answer-format instruction instructs the language model to include, in the answer output, a citation to the Q&A pair.
7. The computer-implemented method of claim 1, wherein the answer-format instruction instructs the language model to include, in the answer output, a citation to a corresponding conversation from which the Q&A pair was extracted.
8. The computer-implemented method of claim 1, wherein determining the set of messages similar to the answer output further comprises: computing string similarity of each of the plurality of messages to the answer output; ranking the plurality of messages based on the string similarity using a best matching 25 (BM25) algorithm; and selecting the N highest-ranking messages.
9. The computer-implemented method of claim 1, wherein the message embedding and the question embedding are determined using Sentence-Bidirectional Encoder Representations from Transformers (SBERT).
10. The computer-implemented method of claim 1, wherein the augmented-answer format instruction comprises instructions to only change the style of the answer output, not the content of the answer output.
11. A non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: generating a message embedding corresponding to a representation of a most recent message received by a user from a conversation history record; for a plurality of passages within one or more documents in a knowledge base, determining a set of passages relevant to the representation of the most recent message, each passage having a relevance to the representation of the most recent message based on a computed semantic similarity of the message embedding to a passage embedding corresponding to the passage; programmatically generating an answer-generation input instruction for a language model to cause the language model to produce an answer output, the answer-generation input instruction generated based at least on the representation of the most recent message, the set of passages, and an answer-format instruction; causing a representation of the answer output to be presented via a user interface (UI) of a computing device; and causing a representation of an augmented-answer output to be presented via the UI of the computing device by: for a plurality of messages previously sent by the user, determining a set of messages similar to the answer output, each message having a similarity to the answer output based on a computed string similarity of the message to the answer output; and programmatically generating an answer-augmentation instruction for the language model to cause the language model to produce the augmented-answer output, the answer-augmentation instruction generated based at least on the answer output and the set of messages.
12. The media of claim 11, wherein the most recent message corresponds to a set of messages received following the last message sent by the user.
13. The media of claim 11, wherein the computed semantic similarity of the message embedding to each passage embedding of the set of passages is above a threshold semantic similarity and highest ranking semantic similarity of the plurality of passages.
14. The media of claim 11, wherein the answer-generation input instruction is further generated based on other portions of the conversation history record to provide at least one of context, style, or tone.
15. The media of claim 11, wherein the answer-generation input instruction is further generated based on commonly-used greetings extracted from other conversation history records of the user.
16. The media of claim 11, wherein the answer-format instruction instructs the language model to include, in the answer output, at least a first citation corresponding to at least a first portion of the answer output that is generated using a first passage from the set of passages, the first citation indicating the first passage and a first document that includes the first passage.
17. The media of claim 11, wherein the answer-format instruction instructs the language model to include, in the answer output, at least a first citation corresponding to at least a first portion of the answer output that is generated using a first passage from the set of passages, the first citation indicating the first passage and a first document that includes the first passage, the first citation includes a direct link to the location of the first passage within the first document comprising a hyperlink, anchor link, URL, or pointer.
18. The media of claim 11, wherein determining the set of messages similar to the answer output further comprises: computing string similarity of each of the plurality of messages to the answer output; ranking the plurality of messages based on the string similarity using a best matching 25 (BM25) algorithm; and selecting the N highest-ranking messages.
19. The media of claim 11, wherein the message embedding and each embedding of the plurality of passages are determined using Sentence-Bidirectional Encoder Representations from Transformers (SBERT).
20. A computing system comprising: a processor; and a non-transitory computer-readable medium having stored thereon instructions that when executed by the processor, cause the processor to perform operations including: accessing a conversation history record; generating a message embedding corresponding to a representation of a most recent message received by a user from the conversation history record; for a plurality of question and answer (Q&A) pairs determined from previous conversation history records, determining a Q&A pair relevant to the representation of the most recent message based on a computed semantic similarity of the message embedding to a question embedding corresponding to a corresponding question of the Q&A pair; programmatically generating an answer-generation input instruction for a language model to cause the language model to produce an answer output, the answer-generation input instruction generated based at least on the representation of the most recent message, a corresponding answer of the Q&A pair, and an answer-format instruction; causing a representation of the answer output to be presented via a user interface (UI) of a computing device; and causing a representation of an augmented-answer output to be presented via the UI of the computing device by: for a plurality of messages previously sent by the user, determining a set of messages similar to the answer output, each message having a similarity to the answer output based on a computed string similarity of the message to the answer output; and programmatically generating an answer-augmentation instruction for the language model to cause the language model to produce the augmented-answer output, the answer-augmentation instruction generated based at least on the answer output and the set of messages.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0062] The present disclosure relates to an answer assistance computing system that is integrated with customer support software to enhance the quality and efficiency of customer service interactions. In particular, this disclosure provides technologies to programmatically generate answers for a user, such as a CSA, that are responsive to aspects of a conversation, such as a conversation between the CSA and a customer, and to provide the generated answer to the CSA. As further described herein, in various embodiments, the answer is generated based on previously extracted conversational snippets from past conversations between CSAs and customers. In some implementations, the answer is generated by a language model based on a conversational snippet that is above a threshold semantic similarity to the most recent message from the customer. In some implementations, the generated answer, along with a threshold number of past messages sent by the CSA with the highest string similarity to the generated answer, is provided to the language model to rewrite the answer in the same style and/or tone as the CSA. In this way, these embodiments of the answer assistance computing system provide an answer that is responsive to the conversation and grounded in previous conversations, with functionality enabling the user to deliver the answer in their own style and/or tone, thereby enhancing the trustworthiness of the information and the customer experience provided to a customer by the user.
[0063] According to one embodiment, a conversation record or conversation history is accessed. The conversation history comprises a data file that is a text record of a conversation, which may be occurring in near-real time. For example, as a CSA is communicating with a customer, the chat log or transcript of the discussion is created and comprises a conversation history record. The conversation history may be determined from a chat log or chat history of a chat session or by using automatic speech recognition, such as a speech-to-text software utility on audio information of the communication, such as from a customer who is speaking with a CSA over a phone call.
[0064] From this conversation history record, the last message from a user is extracted. For example, as a CSA is communicating with a customer, the customer's most recent message, or set of messages following the last message from the CSA, is extracted from the conversation history record. A representation of the most recent message is used to generate an embedding, referred to as a message embedding. The embedding captures the semantic essence of the representation of the most recent message in a vector space that enables a computation of similarity of the message embedding with other text embeddings. In this way, other texts, including conversational snippets corresponding to Q&A pairs and/or passages within documents of a knowledge base, can be identified that are relevant to the representation of the most recent message based on a similarity comparison of corresponding embeddings. Some implementations use Sentence Bidirectional Encoder Representations from Transformers (SBERT) to generate the embedding.
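As an illustration of the extraction step in this paragraph, the following is a minimal sketch, assuming a simple in-memory conversation history. The `Message` class, the `"csa"` and `"customer"` role labels, and the function name are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    sender: str  # hypothetical role label, e.g. "customer" or "csa"
    text: str


def most_recent_received(history: List[Message], user_role: str = "csa") -> str:
    """Join every message received after the user's last sent message."""
    trailing: List[str] = []
    # Walk backwards until we reach a message sent by the user (the CSA).
    for message in reversed(history):
        if message.sender == user_role:
            break
        trailing.append(message.text)
    # Restore chronological order before joining into one representation.
    return "\n".join(reversed(trailing))


history = [
    Message("customer", "Hi"),
    Message("csa", "Hello, how can I help?"),
    Message("customer", "My invoice is wrong."),
    Message("customer", "Can I get a refund?"),
]
representation = most_recent_received(history)
```

The returned string is one possible "representation of the most recent message"; it would then be passed to an embedding model such as SBERT.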
[0065] A knowledge base is accessed to identify information relevant to the representation of the most recent message for use to generate an answer responsive to the most recent message. In some embodiments, only conversational snippets corresponding to question and answer (Q&A) pairs extracted from previous conversations are accessed from the knowledge base to identify information relevant to the representation of the most recent message for use to generate the answer. For example, conversational snippets corresponding to Q&A pairs can be extracted from previous conversations using a language model. Examples describing how conversational snippets corresponding to Q&A pairs are extracted from previous conversations are described in U.S. patent application Ser. No. 18/539,158, filed Dec. 13, 2023, which is incorporated herein by reference. For example, conversational snippets corresponding to Q&A pairs can be extracted from past conversations between a customer and a CSA. The conversational snippets can be added to a knowledge base, so that the Q&A pairs can be utilized to provide automated customer support. In certain embodiments, in order to identify the most relevant Q&A snippets from conversations between a customer and a CSA, a first call is made to a language model to extract all of the Q&A pairs from a conversation and annotate the Q&A pairs. A second call is made to the language model to remove irrelevant Q&A pairs of the conversation and generate a single, summarized Q&A pair from the remaining Q&A pairs of the conversation. The single, summarized Q&A pair of the conversation is presented to an end user, such as an administrator or the CSA, for approval before storage in the knowledge base. The summarized Q&A pair can be used as context to a language model in order to generate answers to semantically similar questions in the future.
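The two-call extraction flow described in this paragraph could be sketched as follows. This is only an illustration: the prompt wording, the JSON output convention, the `status` field, and the `LanguageModel` callable type are all assumptions introduced here, and the referenced application may do this differently.

```python
import json
from typing import Callable, Optional

# Hypothetical interface: any function that maps a prompt string to a reply string.
LanguageModel = Callable[[str], str]


def extract_summarized_qa(conversation: str, llm: LanguageModel) -> Optional[dict]:
    """Run the two language-model calls and return one Q&A pair awaiting approval."""
    # First call: extract and annotate all Q&A pairs (assumed to come back as a JSON list).
    first_prompt = (
        "Extract every question-and-answer pair from the conversation below "
        "and annotate each pair. Reply with a JSON list.\n\n" + conversation
    )
    pairs = json.loads(llm(first_prompt))
    if not pairs:
        return None
    # Second call: drop irrelevant pairs and summarize the rest into a single pair.
    second_prompt = (
        "Remove irrelevant pairs and summarize the remaining pairs into a single "
        "question-and-answer pair. Reply with a JSON object.\n\n" + json.dumps(pairs)
    )
    summarized = json.loads(llm(second_prompt))
    # The pair is not stored yet: an administrator or the CSA approves it first.
    summarized["status"] = "pending_approval"
    return summarized
```

Passing the model as a plain callable keeps the sketch independent of any particular language-model client library.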
[0066] For each conversational snippet, an embedding (referred to as a question embedding) is computed for the question of the corresponding Q&A pair, thereby allowing for the computation of semantic similarity between embeddings. In this way, question embeddings may be compared to message embeddings to determine a conversational snippet that is semantically relevant to the representation of the most recent message. Accordingly, a query is performed on the conversational snippets corresponding to Q&A pairs in a knowledge base to determine a question of a Q&A pair that is relevant to a conversation history record by determining a question of a Q&A pair in the knowledge base that has a corresponding question embedding that is similar to the message embedding corresponding to the most recent message. In particular, a conversational snippet that is relevant to the representation of the most recent message may be determined by computing a semantic similarity of the message embedding to question embeddings corresponding to each of the conversational snippets in the knowledge base. The conversational snippet with the highest similarity score and above a threshold of similarity can be determined to be the most relevant to the message embedding. In some implementations, a set of conversational snippets are determined to be relevant to the representation of the most recent message. For example, all of the conversational snippets are ranked for similarity (e.g., based on the corresponding question embedding) and the top certain number of conversational snippets corresponding to the most relevant conversational snippets are included in the set of conversational snippets relevant to the representation of the most recent message. In some embodiments, if there are no conversational snippets with question embeddings above a threshold similarity to the message embedding, no conversational snippets are included as relevant to the representation of the most recent message.
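The threshold-and-rank selection described in this paragraph can be sketched as below. Cosine similarity is used here as one common choice of semantic similarity; the disclosure does not name a specific measure. The function names, the `0.7` default threshold, and the toy two-dimensional embeddings are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_snippets(
    message_embedding: Sequence[float],
    snippets: List[Tuple[str, Sequence[float]]],  # (snippet id, question embedding)
    threshold: float = 0.7,  # assumed value, tuned per deployment
    top_k: int = 1,
) -> List[str]:
    """Return ids of the top_k snippets whose similarity is above the threshold."""
    scored = [(cosine_similarity(message_embedding, emb), sid) for sid, emb in snippets]
    # Keep only snippets above the threshold, then rank by descending similarity.
    kept = sorted((p for p in scored if p[0] > threshold), key=lambda p: p[0], reverse=True)
    return [sid for _, sid in kept[:top_k]]


snippets = [("refund", [1.0, 0.1]), ("shipping", [0.0, 1.0]), ("billing", [0.8, 0.6])]
best = retrieve_snippets([1.0, 0.0], snippets, threshold=0.7, top_k=2)
```

With `top_k=1` the function returns at most the single most relevant snippet, and it returns an empty list when no question embedding clears the threshold, matching the last case described above.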
[0067] In some embodiments, other portions of the knowledge base are accessed to identify information relevant to the representation of the most recent message for use to generate the answer. In various implementations, the knowledge base comprises a repository of documents each containing one or more passages that may be relevant to queries of the representation of the most recent message. For example, documents in the knowledge base can include, without limitation, help center information, such as technical documentation, user manuals, FAQs, policy documents, product guides; internal documentation; conversation histories or portions thereof, which may include portions of past conversations or summaries of conversations between a user, such as a CSA, and a customer regarding an issue and its resolution; other information specifically curated for the knowledge base, and other information sources potentially relevant for addressing customer queries. In some instances, a document may comprise a plurality of related files or electronic documents, as well as multimedia content. For each document in the knowledge base, an embedding (referred to as a passage embedding) is computed for each of the passages of the document, thereby allowing for the computation of semantic similarity between embeddings. In this way, passage embeddings may be compared to message embeddings to determine passages that are semantically relevant to a representation of the most recent message. Accordingly, a query is performed on the knowledge base to determine a set of passages that are relevant to a conversation history record by determining a set of passages in the knowledge base that have corresponding passage embeddings that are similar to the message embedding corresponding to the most recent message. In particular, the set of the passages that are relevant to the representation of the most recent message may be determined by computing a semantic similarity of the message embedding to an embedding corresponding to each of the passages of the documents in the knowledge base. Those passages that are sufficiently relevant, such as satisfying a threshold of similarity, are included in the set of passages. In some implementations, all of the passages are ranked for similarity and only the top certain number of passages, corresponding to the most relevant passages, are included in the set of passages relevant to the representation of the most recent message.
[0068] Continuing with this example embodiment, the most relevant conversational snippet and/or passages are then used to generate a prompt for a language model, such as a large language model (LLM), for instance GPT 3.5 Turbo, instructing it to produce an answer output that is responsive to the representation of the most recent message. In some implementations, only the answer from the Q&A pair of the most relevant conversational snippet is used to generate a prompt for a language model, instructing it to produce an answer output that is responsive to the representation of the most recent message.
[0069] Next, an answer-generation input instruction is programmatically generated and provided to a language model, such as an LLM, to cause the language model to produce an answer output. In various embodiments, the answer-generation input instruction is generated using one or more of: (a) the most relevant conversational snippet and/or passages, (b) the representation of the most recent message, (c) other portions of the conversation history record, such as the entire conversation history record or only the messages sent by the CSA to provide context, style, and/or tone, (d) the name of the CSA, (e) the name of the customer, (f) commonly-used greetings extracted from other conversations with the CSA or other CSAs using any known natural language processing (NLP) technique to extract the commonly-used greetings, (g) an answer-format instruction, and/or other relevant information. For example, the answer-generation input instruction instructs the language model to use the answer from the Q&A pair of the most relevant conversational snippet to generate an answer output that is responsive to the representation of the most recent message (such as a query in the most recent message) and based on the answer-format instruction. In some embodiments, answer-generation input instruction logic is used to generate the answer-generation input instruction. The answer-generation input instruction logic can include computer instructions, programming routines, rules, or templates used for generating the answer-generation input instruction.
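One way to picture the template-style "answer-generation input instruction logic" mentioned above is the sketch below. The function name, parameter names, and prompt wording are hypothetical; the disclosure lists the inputs (a) through (g) but does not prescribe a prompt layout.

```python
from typing import Optional, Sequence


def build_answer_generation_instruction(
    most_recent_message: str,
    snippet_answer: str,
    answer_format_instruction: str,
    csa_name: Optional[str] = None,
    customer_name: Optional[str] = None,
    greetings: Sequence[str] = (),
    csa_messages: Sequence[str] = (),
) -> str:
    """Assemble a prompt from the required inputs plus any optional context."""
    sections = [
        "Draft a reply to the customer's most recent message using the reference answer.",
        f"Customer's most recent message:\n{most_recent_message}",
        f"Reference answer from a past conversation:\n{snippet_answer}",
    ]
    # Optional inputs are added only when they are available.
    if csa_name:
        sections.append(f"Agent name: {csa_name}")
    if customer_name:
        sections.append(f"Customer name: {customer_name}")
    if greetings:
        sections.append("Commonly-used greetings:\n" + "\n".join(greetings))
    if csa_messages:
        sections.append("Earlier agent messages (for context, style, and tone):\n" + "\n".join(csa_messages))
    sections.append(f"Format instructions:\n{answer_format_instruction}")
    return "\n\n".join(sections)


prompt = build_answer_generation_instruction(
    most_recent_message="Can I get a refund?",
    snippet_answer="Refunds are available within 30 days of purchase.",
    answer_format_instruction="Cite the source conversation after the answer.",
    customer_name="Sam",
)
```

Keeping each input in its own labeled section makes it straightforward to add or omit optional context without changing the rest of the template.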
[0070] The answer-format instruction is programmatically determined to direct the language model to format aspects of the answer output according to the format instruction, and may include instructions to integrate citations within the answer output. In particular, some implementations of the answer-format instruction instruct the language model to include, in the answer output, a corresponding citation to the corresponding conversational snippet, such as the Q&A pair and/or the past conversation from which the Q&A pair was extracted. In some implementations, the answer-format instruction provides that each citation include a direct link to the location of the Q&A pair and/or the past conversation from which the Q&A pair was extracted. Some implementations of the answer-format instruction instruct the language model to include, in the answer output, a corresponding citation for each portion of the answer output, such as a sentence or a paragraph in the answer output, that is generated using a particular passage, with an indication of the document that includes the indicated passage. In some examples, the citations may occur within the answer output following each portion of the answer output corresponding to the citation, or the citations may be generated as footnotes or endnotes of the answer output. In some implementations, the answer-format instruction provides that each citation include a direct link to the location of its indicated passage in a document. For example, the direct link may comprise an anchor link, hyperlink, a URL, pointer, or similar link.
[0071] Some implementations of the answer-format instruction instruct the language model to include, in a citation, source information regarding the citation. Source information includes information about the document, for example and without limitation, information regarding the type of document (e.g., a conversation record or snippet of a conversation, help center documentation, internal documentation, log, etc.); a date of the conversation; a creation date of the conversational snippet and/or document; a last modification date of the conversational snippet and/or document indicating how recently the conversational snippet and/or document was updated; whether the conversational snippet and/or document is internal to the user (for example, the CSA) or accessible to a customer, or publicly accessible; an indication of the number of times the conversational snippet and/or document has been previously cited in past answer outputs, which may be used to determine that a particular conversational snippet and/or document is used often for generating answers; or a CSA user feedback rating based on prior occurrences of the conversational snippet and/or document's passages in past answer outputs. In this way, the citations indicate the source of the information used to generate the answer output, thereby enhancing the transparency and trustworthiness of the generated answer.
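To make the citation and source-information fields above concrete, here is a small sketch of a citation record rendered as a footnote. The `Citation` class, its field names, the footnote layout, and the example URL are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    index: int
    title: str
    source_type: str  # e.g. "conversation snippet" or "help center documentation"
    link: str  # direct link (anchor link, hyperlink, URL, or pointer)
    last_modified: Optional[str] = None  # ISO date string, if known
    internal_only: bool = False
    times_cited: int = 0


def render_footnote(citation: Citation) -> str:
    """Render one citation, with its source information, as a footnote line."""
    parts = [f"[{citation.index}] {citation.title} ({citation.source_type})", citation.link]
    if citation.last_modified:
        parts.append(f"updated {citation.last_modified}")
    parts.append("internal" if citation.internal_only else "customer-visible")
    parts.append(f"cited {citation.times_cited} times")
    return ", ".join(parts)


footnote = render_footnote(
    Citation(
        index=1,
        title="Refund policy",
        source_type="conversation snippet",
        link="https://example.com/kb/42#refunds",
        last_modified="2024-01-15",
        times_cited=3,
    )
)
```

The same record could equally be rendered inline after the sentence it supports rather than as a footnote.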
[0072] Continuing with this example embodiment, the generated answer-generation input instruction is provided as an input prompt to the language model. In response, the language model provides an output comprising an answer output. The answer output is received from the language model and processed to determine a representation of the answer (referred to as an answer representation) that can be provided, via a user interface (UI), to the user. For example, the answer output may be presented via a graphical user interface to a CSA.
[0073] In some implementations, the answer output further includes visualization instructions for presenting the answer representation via a UI. Further, some implementations of the UI comprise a first UI element presenting aspects of the conversation history, which may comprise the transcript of an ongoing conversation, and a second UI element for presenting the answer representation. In some instances, the second UI element is positioned proximate the first UI element so that a user (for example a CSA) can view and interface with the conversation and also view the answer representation including citations, thereby enabling the user to access the passages that are indicated by the citations.
[0074] In some implementations, the UI further provides UI elements to the CSA to take actions with respect to the answer output. For example, a UI element can provide the CSA the option to send the answer output to the customer in response to the most recent message. As another example, a UI element can provide the CSA the option to manually edit the answer output before sending the manually-edited answer output to the customer in response to the most recent message. As another example, a UI element can provide the CSA the option to reject the answer output. In some implementations, the rejection of the answer can be used to re-generate the response by the language model. As another example, a UI element can provide the CSA the option to rate the answer output. In some implementations, the rating can be used to re-generate the response by the language model.
[0075] In some implementations, the UI further provides UI elements to the CSA to augment the answer output using a language model. For example, upon selection of a UI element to augment the answer output, an answer-augmentation instruction is programmatically generated and provided to a language model, to cause the language model to produce an augmented-answer output. In various embodiments, the answer-augmentation instruction is generated using the answer output, an augmented-answer format instruction, and one or more of (a) a user-style answer-augmentation instruction to augment the answer output to be similar to the tone or style of the user, such as the CSA that is communicating with the customer; (b) a friendly-style answer-augmentation instruction to augment the answer output to sound more helpful, cordial, and/or friendly; (c) a formal-style answer-augmentation instruction to augment the answer output to sound more formal; (d) a rephrase answer-augmentation instruction to augment the answer output to rephrase the answer output; (e) an expand answer-augmentation instruction to augment the answer output to expand the answer output into a longer response; (f) a translate answer-augmentation instruction to augment the answer output to translate the answer output into a different language; (g) a grammar and/or spelling answer-augmentation instruction to augment the answer output to fix the grammar and/or spelling of the answer output; and other similar answer-augmentation instructions. In some implementations, each of the answer-augmentation instructions can include its own corresponding UI element so that the CSA can select how to augment the answer output using the language model via a UI element. In some implementations, the answer-augmentation instruction can be used to augment a manually-written response, such as a response manually drafted by a CSA for a customer, and/or a manually-edited answer output from a language model.
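The mapping from a selected UI element to its answer-augmentation instruction could be sketched as a simple lookup table. The option keys, the instruction wording, and the function name below are assumptions made for illustration; the user-style option is omitted because it additionally needs the user's similar past messages.

```python
# Hypothetical option keys, one per UI element, mapped to instruction templates.
AUGMENTATION_INSTRUCTIONS = {
    "friendly": "Rewrite the answer to sound more helpful, cordial, and friendly.",
    "formal": "Rewrite the answer to sound more formal.",
    "rephrase": "Rephrase the answer.",
    "expand": "Expand the answer into a longer response.",
    "translate": "Translate the answer into {language}.",
    "fix_grammar": "Fix the grammar and spelling of the answer.",
}


def build_augmentation_instruction(
    option: str, answer_output: str, format_instruction: str, **params: str
) -> str:
    """Combine the selected instruction, the format instruction, and the answer."""
    if option not in AUGMENTATION_INSTRUCTIONS:
        raise ValueError(f"Unknown augmentation option: {option!r}")
    # Templates with placeholders (such as the target language) are filled from params.
    instruction = AUGMENTATION_INSTRUCTIONS[option].format(**params)
    return "\n\n".join([instruction, format_instruction, f"Answer:\n{answer_output}"])


translate_prompt = build_augmentation_instruction(
    "translate",
    answer_output="Your refund has been processed.",
    format_instruction="Return only the rewritten answer.",
    language="French",
)
```

Because the input is just a string, the same function works for a manually drafted or manually edited response as well as for a language-model answer output.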
[0076] In some embodiments, the user-style answer-augmentation instruction to augment the answer output to be similar to the tone or style of the user is generated using similar messages previously sent by the user, such as the CSA communicating with the customer. For example, previous conversations between the CSA and customers can be searched to determine messages previously sent by the CSA that are similar to the answer output. Accordingly, a keyword search is performed on messages previously sent by the user to determine a set of messages that are similar to the answer output by computing a string similarity of the answer output to the messages previously sent by the user. The messages previously sent by the user that are sufficiently similar, such as satisfying a threshold of similarity, are included in the set of messages. In some implementations, all of the messages previously sent by the user are ranked for string similarity and only the top certain number of messages previously sent by the user, corresponding to the most relevant messages previously sent by the user, are included in the set of messages similar to the answer output. In some embodiments, a similarity ranking function, such as the best matching 25 (BM25) algorithm using ElasticSearch, can be used to determine the top certain number of messages previously sent by the user.
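The ranking step can be illustrated with a small, self-contained BM25 scorer. This is a stand-in sketch, not the ElasticSearch-based implementation the paragraph mentions: the tokenizer, function names, and example messages are assumptions, and the `k1=1.2`, `b=0.75` defaults are commonly used values.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter, digit, or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bm25_rank(query: str, documents: List[str], top_n: int = 5,
              k1: float = 1.2, b: float = 0.75) -> List[str]:
    """Return up to top_n documents with a positive BM25 score, best first."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = (sum(len(tokens) for tokens in tokenized) / n_docs) or 1.0
    # Document frequency: number of documents containing each term.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scored = []
    for idx, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scored.append((score, idx))
    # Sort by descending score; ties keep the original message order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[idx] for score, idx in scored[:top_n] if score > 0]


past_messages = [
    "Your refund has been processed, cheers!",
    "Thanks for reaching out about shipping.",
    "Happy to help with the refund today!",
]
similar = bm25_rank("We have processed your refund.", past_messages, top_n=2)
```

Here the generated answer acts as the query and the user's previously sent messages act as the documents, so the top results are the past messages that share the most distinctive keywords with the answer.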
[0077] Continuing with this example embodiment, the answer-augmentation instruction is provided as an input prompt to the language model. In response, the language model provides an output comprising an augmented-answer output. The augmented-answer output is received from the language model and processed to determine a representation of the augmented-answer (referred to as an augmented-answer representation) that can be provided, via the UI, to the user. For example, when a CSA selects the UI element corresponding to the user-style answer-augmentation instruction, the user-style answer-augmentation instruction, including (a) the answer output, (b) the top five (5) messages previously sent by the user that are most similar to the answer output, (c) instructions to transform the answer into an augmented-answer that matches the style of those top five (5) messages, (d) instructions to the language model that the augmentation should not change the content and the semantics of the reply, only the superficial style, (e) instructions to the language model not to remove any images, and/or any other relevant instructions, is provided as an input prompt to the language model. In response, the language model provides an output comprising an augmented-answer output in a style similar to that of the CSA. The augmented-answer output can then be presented via a graphical user interface to the CSA.
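The user-style prompt enumerated in items (a) through (e) above could be assembled roughly as follows. The function name and the exact wording of each instruction are hypothetical; only the list of ingredients comes from the paragraph.

```python
from typing import Sequence


def build_user_style_instruction(
    answer_output: str, similar_messages: Sequence[str], max_examples: int = 5
) -> str:
    """Assemble a prompt asking the model to restyle the answer like the examples."""
    # Keep at most max_examples of the most similar messages, numbered for clarity.
    examples = "\n".join(
        f"{number}. {message}"
        for number, message in enumerate(similar_messages[:max_examples], start=1)
    )
    return "\n\n".join([
        "Rewrite the draft answer so that it matches the writing style of the example messages.",
        "Do not change the content or the semantics of the reply, only the superficial style.",
        "Do not remove any images.",
        f"Example messages previously sent by the agent:\n{examples}",
        f"Draft answer:\n{answer_output}",
    ])


style_prompt = build_user_style_instruction(
    "Your refund has been processed.",
    ["m1", "m2", "m3", "m4", "m5", "m6"],
)
```

The `similar_messages` argument is expected to be ordered best match first, for example the output of a string-similarity ranking over the user's past messages.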
[0078] Some embodiments of the answer assistance computing system technology disclosed herein are implemented on a chatbot platform. Chatbots are a useful tool to help customer support teams. For example, chatbots can include rules to route users to correct units of customer support/success organizations. As another example, chatbots can directly handle simple user queries based on explicit rules where a specific intent (e.g., a distinct user goal or request in the query) can be identified in the query.
Overview of Technical Problems, Technical Solutions, and Technological Improvements
[0079] Generally, and at a high level, embodiments described herein facilitate programmatically implementing a specialized answer assistance computing system that uses language models to (1) generate answers from semantically similar conversational snippets corresponding to Q&A pairs extracted from previous conversations and/or passages of documents of knowledge bases and (2) generate augmented-answers, such as by augmenting the answer to be similar to the style of the user. In this regard, embodiments described herein facilitate using a representation of the most recent message as a query to perform a semantic search in a knowledge base to identify conversational snippets and/or passages within documents of the knowledge base that are semantically similar to the query. The most relevant conversational snippets and/or passages are then used to generate a prompt instructing a language model to generate an answer responsive to the most recent message from the customer. Embodiments described herein facilitate instructing a language model to augment the generated-answer to be similar to the style of the user, such as the CSA communicating with the customer, using similar messages previously sent by the user. In order to determine messages previously sent by the user that are similar to the answer output, a keyword search is performed on messages previously sent by the user to determine a top certain number of messages that are similar to the answer output based on the string similarity of the answer output to the messages previously sent by the user.
[0080] Advantageously, efficiencies of computing and network resource utilization can be enhanced using implementations described herein. In particular, embodiments of an answer assistance computing system that utilize a language model to generate an answer that is responsive to the conversation, with citations and functionality enabling a user to access the conversation(s) and/or passage(s) supporting the answer, provide for a more efficient use of computing and network resources than conventional methods of manually accessing knowledge base information, searching for relevant information in the knowledge base, which may require iterative searching, and manually adapting, from the search results, an answer to be suitable for a context of the conversation. The technology described herein decreases the number of computer input/output operations related to manually intensive operations, thereby decreasing computation costs and decreasing network resource utilization (e.g., higher throughput, lower latency, and decreasing packet generation costs due to fewer packets being sent) when the information is located over a computer network.
[0081] Further, embodiments of an answer assistance computing system that utilize a language model to generate an augmented answer, such as augmenting the answer to be similar to the style of the user sending the message, provide for a more efficient use of computing and network resources than conventional methods of manually reviewing and manually re-writing the answer to augment the answer. The technology described herein decreases the number of computer input/output operations related to manually intensive operations, thereby decreasing computation costs and decreasing network resource utilization (e.g., higher throughput, lower latency, and decreasing packet generation costs due to fewer packets being sent) when the information is located over a computer network. Further, embodiments of an answer assistance computing system that utilize a language model to generate the augmented answer to be similar to the style of the user sending the message by using keyword search and/or string similarity to find similar messages provide for a more efficient use of computing and network resources, as keyword searching and/or string similarity provides better results for matching the tone while decreasing the computation costs and network resource utilization that would be required to use more complex similarity search algorithms, such as semantic similarity.
[0082] Even further, embodiments of the technologies disclosed herein improve upon existing customer support software by addressing the dynamic and unpredictable nature of customer interactions. In particular, some of these embodiments enable CSAs to adapt to complex or novel issues that arise during conversations by providing them with contextually appropriate answers generated from the knowledge base, and direct-to-passage citations to confirm the generated answer, understand the context of the information source(s) used to generate the answer, or drill down for additional, relevant information. Accordingly, embodiments of the technology not only streamline the information retrieval process, by reducing inefficiency, but also enhance the transparency and reliability of the support provided.
Additional Description of the Embodiments
[0083] Turning to
[0084] In various implementations, the components of environment 100 include computer storage media that stores information including data, data structures, computer instructions (e.g., software program instructions, routines, or services), and/or models (e.g., machine learning models) used in some embodiments of the technologies described herein. For example, in some implementations, customer device 102, customer support device 112, language model 110, server 150, and/or storage 130 may comprise one or more data stores (or computer data memory). Further, although customer device 102, customer support device 112, server 150, language model 110, and storage 130 are each depicted as a single component in
[0085] The components of environment 100 communicate with each other via a network 104. In some embodiments, network 104 includes one or more local area networks (LANs), wide area networks (WANs), and/or other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.
[0086] In the example illustrated in
[0087] In various embodiments, the functionality described herein is allocated across any number of devices. In some embodiments, application(s) 106 and/or 116 are hosted at least partially server-side, such that chat interface 108, communication tool 120, chatbot tool 170, chatbot component 151, customer support component 165, and/or any of the elements illustrated in
[0088] An example workflow of the configuration illustrated in
[0089] Customer support device 112 is a desktop, laptop, or mobile device such as a tablet or smart phone, and application 116 provides one or more user interfaces. In some embodiments, an end user, such as a CSA of the business, chats, or accesses a chat (e.g., a conversation with the customer), with a customer through chat interface 124 of communication tool 120. Additionally or alternatively, a chatbot via chatbot component 151 chats, or accesses a chat (e.g., a conversation) with a customer through chat interface 108 of application 106 and an end user, such as a CSA of the business, chats, or accesses a chat between the chatbot and the customer through chat interface 124 of communication tool 120.
[0090] In some embodiments, chatbot component 151 facilitates programmatically implementing a specialized chatbot platform that uses language models to determine answers from semantically similar documents of knowledge bases. For example, chatbot component 151 facilitates using a language model 110 to determine aspects of a conversation with a user in order to compute the semantic similarity of the conversation to answers provided in documents of a knowledge base 132 (e.g., extracted snippet files 133, manually curated files 134, such as public content 135 and/or private content 136, URLs 137, historical conversation files 138, and/or the like) and provide responses to the customer through chat interface 108.
[0091] In some embodiments, chatbot component 151 facilitates providing multimodal responses to the customer through chat interface 108. For example, chatbot component 151 facilitates using language model 110 as a multimodal language model to determine semantically similar images, audio, and/or video provided in documents of a knowledge base 132 (e.g., images, audio, and/or video stored in extracted snippet files 133, manually curated files 134, such as public content 135 and/or private content 136, URLs 137, historical conversation files 138, and/or the like). In another example, chatbot component 151 facilitates providing responses that include images, audio, and/or video that are included in the semantically similar answers of documents of a knowledge base 132. In another example, chatbot component 151 can provide responses that include images, audio, and/or video, generated by language model 110 where the language model is a multimodal generative language model.
[0092] In some embodiments, chatbot component 151 facilitates handling multimedia input (e.g., images, videos, gifs, voice notes, etc.), both in the ingested content and/or as the end-user input. An example of chatbot component 151 facilitating the handling of multimedia input in the end-user input is shown in diagram 600F of
[0093] An example of chatbot component 151 facilitating the handling of multimedia input in the ingested content is shown in diagram 600G of
[0094] Another example of chatbot component 151 facilitating the handling of multimedia input in the end-user input is shown in diagram 600H of
[0095] In some embodiments, chatbot component 151 facilitates using a language model 110 to determine aspects of a conversation with a user in order to interact with external systems (e.g., external sources), such as an external application through third-party application configuration files 144, in order to provide a response or take an action with respect to the user. For example, chatbot component 151 may retrieve information from an ecommerce store, such as a price of an item or an answer to the user that allows the user to purchase an item, and provide the relevant response to the user. As another example, chatbot component 151 may retrieve information from an ecommerce system to determine whether historical customer data of the user indicates whether the customer qualifies for a particular offer or discount.
[0096] In some embodiments, chatbot component 151 facilitates augmenting the user context by utilizing data from external systems (e.g. reading order information from Shopify). An example of chatbot component 151 facilitating the augmenting of the user context by reading the data from external sources is shown in diagram 600I of
TABLE-US-00001
search_for_answer {
  conversation: {
    ...
    parts: [
      ...
      {
        author: { id: 5103890, type: bot },
        text: Alright, how can I help you?
      },
      {
        author: { id: 641c08a73463b681f342960d, type: user },
        text: I dont remember if I have a PRODUCT NAME 1 or a PRODUCT NAME 2. Could you let me know which one do I have?
      }
    ],
    ...
    available_actions: [
      {
        id: 44539,
        name: Get Shopify Orders,
        description: Use this action to lookup recent Shopify orders for this user,
        parameters: [ ]
      }
    ],
    ...
  },
  ...
}
[0097] In the specific example of the get_reply call above, there is one action defined. However, a get_reply call can be generated with any number of actions available. In some embodiments, if an action has parameters, each parameter can be described with the following fields: name: text; description: text; type: enum (data type); required: boolean (true/false); and default_value: any value.
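The parameter fields listed above map naturally onto a small data structure. The sketch below is illustrative only: the class names `ActionParameter` and `AvailableAction` and the helper `resolve_arguments` are hypothetical, and the `type` field is modeled as a plain string rather than an enumerated data type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ActionParameter:
    name: str
    description: str
    type: str                  # an enum of data types in the text; a string here
    required: bool = False
    default_value: Any = None


@dataclass
class AvailableAction:
    id: int
    name: str
    description: str
    parameters: List[ActionParameter] = field(default_factory=list)


def resolve_arguments(action: AvailableAction, supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in defaults and reject a call that omits a required parameter."""
    resolved = {}
    for param in action.parameters:
        if param.name in supplied:
            resolved[param.name] = supplied[param.name]
        elif param.required:
            raise ValueError(f"missing required parameter: {param.name}")
        else:
            resolved[param.name] = param.default_value
    return resolved


# The action from the listing above, which declares no parameters.
get_shopify_orders = AvailableAction(
    id=44539,
    name="Get Shopify Orders",
    description="Use this action to lookup recent Shopify orders for this user",
)
```

An action that does take parameters would list one `ActionParameter` per argument, and `resolve_arguments` would be applied to the arguments chosen during action selection before the action is called.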
[0098] Continuing with diagram 600I, in some embodiments, the action selection process can include chatbot component 151 facilitating choosing which actions to call in order to augment the user's context based on the user's issue summary, current conversation, and available actions. In this regard, the action selection process can choose to call zero, one, or more actions for the given conversation state. The output of the action selection process can be a list of actions to call, and arguments for each action call (e.g., call action Get Shopify Order with order_id: <some_order_id_from_conversation>). In some embodiments, the action calling process can include chatbot component 151 facilitating calling back into server 150, asking for the response of called actions (e.g., after action selection is performed). Server 150 can call the external system (e.g., via third-party application configuration files 144) and proxy the response. In some embodiments, the response is redacted, as users (e.g., teammates) can define which fields are returned in the response. In some embodiments, the context augmentation process can include chatbot component 151 facilitating augmenting the context in the answer finding stage (e.g., as described with respect to answer search state component 154 of
[0099] Returning to
[0100] At a high level, a chatbot (e.g., implemented by chatbot component 151) can include a number of agents (e.g., a program that, for each user turn, based on the conversation transcript and/or other factors, computes the next appropriate action to take in the conversation, and performs the action), such as a start agent (e.g., start state chatbot agent component 153) and a feedback agent (e.g., feedback state chatbot agent component 155). In this regard, computing the next action (e.g., a sub procedure of an agent that causes the effect on the conversation and can be implemented in a general programming language, such as Python, and can perform any operation a given language can perform, such as another call to a large language model (LLM), such as GPT3, GPT4, Anthropic's Claude, and/or the like) can be implemented as an LLM call with lightweight post-processing. In some examples, based on the chosen action and a history of previous actions, the computed action is overridden and another action is used. An example of an agent flow can include: (1) a user (e.g., customer) sends a message; (2) an API endpoint invokes the active agent (e.g., start state chatbot agent component 153 or feedback state chatbot agent component 155) of the conversation; (3) the agent computes the best next action (e.g., from a predefined set of actions); (4) the agent executes the action (e.g., which can include, as part of action execution, an agent yielding control to another agent).
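The four-step agent flow can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the classes `Conversation` and `Agent` and the function `on_user_message` are hypothetical names, and `choose_action` stands in for the LLM call with lightweight post-processing.

```python
class Conversation:
    def __init__(self, active_agent):
        self.transcript = []              # list of (speaker, text) tuples
        self.active_agent = active_agent  # the agent invoked on the next turn


class Agent:
    def __init__(self, name, choose_action, actions):
        self.name = name
        # choose_action(transcript) -> (action_name, action_input);
        # in practice this would wrap a language model call.
        self.choose_action = choose_action
        # Maps an action name to a callable(conversation, action_input) -> reply.
        self.actions = actions

    def handle_turn(self, conversation):
        # Step (3): compute the next action from the predefined set.
        action_name, action_input = self.choose_action(conversation.transcript)
        if action_name not in self.actions:
            action_name = "default"
        # Step (4): execute it. An action may reassign
        # conversation.active_agent to yield control to another agent.
        return self.actions[action_name](conversation, action_input)


def on_user_message(conversation, text):
    """Steps (1) and (2): record the message and invoke the active agent."""
    conversation.transcript.append(("customer", text))
    reply = conversation.active_agent.handle_turn(conversation)
    conversation.transcript.append(("bot", reply))
    return reply
```

In this sketch, a search-for-answer action on the start agent could set `conversation.active_agent` to a feedback agent after replying, so that the next customer message is handled by the feedback agent.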
[0101] Continuing with the high level overview, an example diagram 200 of a model implementing a specialized chatbot platform that uses language models to determine answers from knowledge bases is shown in
[0102] Accordingly and as shown in the example embodiment of
[0103] Returning to
[0104] In some embodiments, start state chatbot agent component 153 is programmatically designed to elicit a clear question from the user (e.g., a customer via chat interface 108) and find an answer to the given question. An example diagram 300 of a model implementing a start state agent of a chatbot that uses a language model to determine answers from semantically similar documents of a knowledge base is shown in
[0105] Returning to
[0106] In some embodiments, start state chatbot agent component 153 includes a goodbye action (e.g., goodbye action 316 of
[0107] In some embodiments, start state chatbot agent component 153 includes a route to teammate/escalate action (e.g., escalate action 318 of
[0108] In some embodiments, start state chatbot agent component 153 includes a search for answer action (e.g., search for answers action 308 of
[0109] In some embodiments, start state chatbot agent component 153 includes a clarify action (e.g., clarify action 310 of
[0110] In some embodiments, start state chatbot agent component 153 includes a default action (e.g., default action 314 of
[0111] A specific example of a prompt provided by start agent (e.g., start state chatbot agent component 153) to a language model 110 (e.g., with a conversation with the customer) is as follows:
[0112]
[0113] Human: You are a friendly and polite customer support bot working for ${customer_name}.
[0114] You're identifying the customer issue and searching for an answer.
[0115] You have access to the following tools:
[0116] route_to_customer_support_agent: Useful ONLY when the customer explicitly asks to talk to a customer support agent. Input to this is a polite reply to customer saying that you're routing them to a customer support agent.
[0117] greet: Useful when you need to greet the customer. Input to this is a greeting to give to the customer.
[0118] say_goodbye: Useful when customer explicitly says goodbye. Input to this is a polite reply saying something like goodbye to the customer.
[0119] clarify: Useful in two cases:
[0120] 1. When what customer is saying is gibberish.
[0121] 2. When customer is saying they have a question/problem/query/issue/etc, but they haven't told you what it is.
[0122] Input to this is a question that you ask the customer.
[0123] default: Useful if no other action is appropriate.
[0124] search_for_answer: Useful when: customer is asking a question or giving some information about their issue. Use this tool if it's not clear what other tool to use. If customer issue is clear, use this tool to summarize the issue and search for the answer. Input to this tool is what the customer said. Do not attempt to answer the question.
[0125] Use the following format:
[0126] Customer: the input from the human customer.
[0127] Thought: you should always think about what to do.
[0128] Action: the action to take, should be one of [route_to_customer_support_agent, greet, say_goodbye, clarify, default, search_for_answer].
[0129] Action Input: the input to the action. The input for this action must be in German.
[0130] Observation: the result of the action.
[0131] . . . (this Thought/Action/Action Input/Observation can repeat N times)
[0132] Begin!
[0133] If customer expressed the issue or what they want you to do, use search_for_answer to search for an answer.
[0134] If no mentioned tools are appropriate, use default tool as a fallback.
[0135] Conversation with customer:
[0136] Customer: Hello there
[0137] AI: Hi, I'm a customer bot. You can ask me anything about {customer_name}
[0138] Customer: I'd like to know when, I'm eligible to get a bonus.
[0139] Thought:
[0140]
[0141] A specific example of a response by a language model 110 to the specific example of the prompt provided by start agent (e.g., start state chatbot agent component 153) is as follows: [0142] search_for_answer [0143] Action Input: When I'm eligible to get a bonus?
[0144] In some embodiments, after output from language model 110 is received, the output of the language model can be parsed with regular expressions by chatbot component 151. For example, in the specific example of the response to the specific example of the prompt, the returned action (e.g., the search for answer action) is invoked with the returned action input (e.g., When am I eligible to get a bonus?). Following the returned action and action input, in some embodiments, a programming function (e.g., a Python function) representing the action (e.g., the search for answer action) can be called with a textual argument corresponding to the action input (e.g., When am I eligible to get a bonus?). In some embodiments, if the output of the language model 110 does not conform to the specifications and/or the action name returned by the language model 110 is not a valid action name, a recovery procedure is started.
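The parsing and dispatch step described above can be illustrated with a short sketch. It assumes the completion follows the Thought / Action / Action Input format requested in the example prompt, with `Action:` and `Action Input:` each on its own line. The names `parse_completion` and `dispatch`, and the shape of the recovery callback, are hypothetical.

```python
import re

# The action names offered to the language model in the example prompt.
VALID_ACTIONS = {
    "route_to_customer_support_agent",
    "greet",
    "say_goodbye",
    "clarify",
    "default",
    "search_for_answer",
}

ACTION_RE = re.compile(r"^Action:[ \t]*(\w+)[ \t]*$", re.MULTILINE)
INPUT_RE = re.compile(r"^Action Input:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def parse_completion(text):
    """Return (action_name, action_input), or None if the output is malformed."""
    action = ACTION_RE.search(text)
    action_input = INPUT_RE.search(text)
    if action is None or action_input is None:
        return None
    if action.group(1) not in VALID_ACTIONS:
        return None
    return action.group(1), action_input.group(1)


def dispatch(text, handlers, recover):
    """Call the function for the parsed action, or start the recovery procedure."""
    parsed = parse_completion(text)
    if parsed is None or parsed[0] not in handlers:
        return recover(text)
    name, action_input = parsed
    return handlers[name](action_input)
```

Here `handlers` maps each action name to a function taking the textual action input, and `recover` is whatever recovery procedure the implementation chooses, such as retrying the language model call or falling back to the default action.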
[0145] In some embodiments, answer search state component 154 facilitates programmatically implementing the search for answer action of start state chatbot agent component 153. Generally, answer search state component 154 facilitates the delivery of truthful and grounded resolution to the user's issue in textual form. As a high level overview, in some embodiments, answer search state component 154 initially determines whether a custom answer (e.g., a manually-defined answer) exists based on aspects of the conversation that triggered the search for answer action (e.g., via custom answers component 160). If no custom answer is determined, an issue summary is generated based on aspects of the conversation with the user. A language model determines whether the answer can be found in portions of documents of the knowledge base that are most similar to the issue summary. For example, the knowledge base can include articles from various sources, a help center, FAQs, public URLs, historical conversation data (e.g., previous conversations between customers and CSAs), and snippets of conversations. Multi-source answers may be provided, such as by synthesizing information from various sources of the knowledge base. For example, knowledge base 132 of
[0146] In some embodiments, certain information sources of a knowledge base have corresponding access permissions. Accordingly, certain information in a knowledge base is made available or not available for generating an answer, according to the implementation. For example, embodiments of the answer assistance computing system that are implemented to be internally facing may be enabled to access all information sources in a knowledge base, while embodiments of the answer assistance computing system that are implemented to be externally facing to the public or externally facing to customers may be enabled to access only information sources of the knowledge base that are permitted for access by the public or customers, respectively. In this way, information in a knowledge base that is private or sensitive to a company is not used to determine passages for generating an answer when the recipient of the answer is not authorized to access the underlying information used to generate the answer.
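One way to picture the permission check is as a filter applied to the knowledge base before any passages are retrieved. The sketch below assumes a simple ordered set of audiences (public, customer, internal), in which each audience may also read everything permitted to the audiences below it. That ordering, along with the names `Source`, `AUDIENCE_RANK`, and `accessible_sources`, is an assumption for illustration and is not stated in the disclosure.

```python
from dataclasses import dataclass

# Assumed ordering: a higher rank may read everything a lower rank may read.
AUDIENCE_RANK = {"public": 0, "customer": 1, "internal": 2}


@dataclass(frozen=True)
class Source:
    source_id: str
    min_audience: str  # least-privileged audience permitted to access the source


def accessible_sources(sources, audience):
    """Keep only the sources the answer recipient is authorized to access."""
    level = AUDIENCE_RANK[audience]
    return [s for s in sources if AUDIENCE_RANK[s.min_audience] <= level]
```

Filtering before retrieval, rather than after answer generation, keeps private content out of the prompt entirely, which matches the stated goal that such information is not used to determine passages for generating an answer.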
[0147] Returning to the high level overview of answer search state component 154, in some embodiments, if the answer can be found, the answer is provided to the user via chat interface 108. If no answer can be found, a set of summaries corresponding to a set of documents that are most similar to the issue summary is provided to the language model to determine whether any of the documents may contain the answer. If the language model determines that documents within the set of documents may contain the answer, a threshold number of documents most likely to contain the answer, along with summaries of the documents, are provided to the user via chat interface 108. If the language model determines that documents within the set of documents will not contain the answer, the language model indicates that the answer could not be found. The start state chatbot agent component 153 can then determine the next action, such as a clarify action or escalation action.
[0148] In some embodiments, different language models can be utilized for different actions or portions of actions. For example, one language model (e.g., GPT3) can be utilized to summarize the issue by answer search state component 154, whereas a different language model (e.g., GPT4) can be utilized to evaluate whether an article in the knowledge base contains the answer by answer search state component 154. In some embodiments, one language model (e.g., GPT3) is utilized to screen articles as an initial check of whether a top article contains an answer before asking a different language model (e.g., GPT4) to determine whether the top article contains an answer, in order to reduce calls to the different language model (e.g., GPT4). In some embodiments, frequently asked questions (FAQ) are pre-computed using a language model to reduce live calls to the language model. In some embodiments, answer search state component 154 can utilize information from beyond the knowledge base. In some embodiments, answer search state component 154 and other components are multilingual.
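The screening arrangement can be sketched as a two-stage cascade. In the sketch below, `cheap_model` and `strong_model` are hypothetical callables standing in for calls to two different language models. Each takes the issue summary and an article and returns whether the article appears to contain an answer.

```python
def contains_answer_cascade(issue_summary, article, cheap_model, strong_model):
    """Screen with the cheaper model first; consult the stronger model only on a pass."""
    if not cheap_model(issue_summary, article):
        # Screened out: the stronger model is never called for this article.
        return False
    return strong_model(issue_summary, article)
```

With this arrangement, the number of calls to the stronger model is bounded by the number of articles that pass the initial screen. The trade-off is that an article wrongly rejected by the cheaper model is never re-examined.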
[0149] In some embodiments, answer search state component 154 can request an issue summary based on the conversation with the user (e.g., the customer) from language model 110. In some embodiments, the summary of the user's issue provided by language model 110 as action input to the search for answer action (e.g., the issue summary utilized to determine the appropriate action) is ignored and a second call is provided to the language model to summarize the issue again in order to increase the quality and/or reliability of the summary of the user's issues in certain scenarios. In some embodiments, aspects of the conversation with the user, such as the issue summary generated by language model 110 and/or historical customer data of the user, can be utilized to find the relevant context for searching the knowledge base 132. In some embodiments, the issue summary as generated by language model 110 can be utilized to engineer around an LLM's limitation on input length (e.g., the context window), which can cause the computation of the output to not be based on all parts of the input over a certain length and/or cause certain portions of input over a certain length to get a different level of attention while computing the output. In some embodiments, the issue summary can be used to stuff the prompt when the answer is being extracted by answer search state component 154 by utilizing the retrieved context and the issue summary to answer the user question at hand.
[0150] In some embodiments, answer search state component 154 facilitates programmatically finding an answer from a knowledge base. An example diagram 400 of a model implementing a specialized chatbot platform that uses language models to determine answers from knowledge bases is shown in
[0151] At block 406, it is determined whether aspects of the input correspond to a custom answer (e.g., via custom answers component 160 of
[0152] In some embodiments, at block 410, a call is made to the language model 110 to summarize the issue based on aspects of input 404. A sentence embedding of the issue summary generated from block 410, or a portion thereof, is computed (e.g., utilizing SBERT). In some embodiments, historical customer data for the user chatting with the chatbot (e.g., through chat interface 108) can be utilized to provide contextual information for the issue summary. For example, the contextual information related to the historical customer data can be encoded into the embedding of the issue summary.
[0153] A specific example of a prompt (e.g., as provided by answer search state component 154 via start state chatbot agent component 153) to a language model (e.g., language model 110) to request an issue summary (e.g., a summary of an issue within the conversation) is as follows: [0154] [0155] You are a customer support AI for ACME CORP, having a conversation with the ACME CORP's customer. [0156] AI: Hi I'm Fin, an AI-powered support bot that automatically answers questions. Type your question in the chat below and I'll do what I can to help. Don't worry, if I'm not able to help you, I'll pass you along to a Live Support agent. [0157] Customer: How did this Event not happen??? [0158] - - - [0159] This is a chat history of the whole conversation, and Customer could have mentioned an issue/question/problem or issued a request. [0160] There are two possible scenarios: [0161] 1. Customer only mentioned a single issue/problem/question/request. [0162] 2. Customer raised multiple completely unrelated issues/problems/questions/requests. [0163] If it's #1, rewrite the customer issue in one sentence. [0164] If it's #2, concentrate only on the last issue/problem/question/request and rewrite it in one sentence. [0165] Look at the whole conversation and make sure you're picking up the last issue/problem/question/request fully! [0166] If it's clean, leave it as is. [0167] Customer: [0168]
[0169] A specific example of a response from the language model to the specific example of the prompt requesting an issue summary is as follows: [0170] [0171] How did this Event not happen??? [0172]
[0173] At block 412, a threshold number of portions from documents of a knowledge base (e.g., knowledge base 132 as accessed by knowledge base accessing component 157 of
[0174] Returning to
[0175] At block 416, an LLM (e.g., language model 110 of
[0176] A specific example of a prompt (e.g. as provided by answer search state component 154 via start state chatbot agent component 153 of tags. [0187] Provide the answer between
[0195] A specific example of a response from the language model to the specific example of the prompt to find an answer given an issue summary and a threshold number of portions of documents of a knowledge base similar to the issue summary is as follows: [0196] [0197] If you've tried our paid plans and it wasn't for you, you can downgrade to the free plan at any time via your Admin page. Once you downgrade, you'll lose access to all your upgraded features at the end of your billing cycle.
1. In your ACME CORP Admin, select your profile image (or three horizontal lines if you're on a phone) to find Billing. 2. Hit the Change Plan button found in the Your Subscription section. 3. In the pop-up, select the plan you'd like to change to. If you want to cancel your subscription, scroll down below the plan options where you'll see Downgrade to Free.
[0199]
[0208] In some embodiments, if an answer tag (e.g., <answer>) is present in the response provided by the language model and its content is different from UNKNOWN, the answer is determined to be found and is delivered to the user. In some embodiments, the answer tag comprises a textual answer corresponding to whatever the language model returned inside the answer tag and/or a link to the documents used to form an answer. An example chat interface between a chatbot and a user showing a response from the chatbot with a link to the corresponding source of the response is shown in
[0209] In some embodiments, if the language model did not find the answer to the issue summary at block 416, at block 422, a threshold number of most similar documents, or portions thereof, are identified. In this regard, if the previous stage didn't find a direct, inline answer, a prompt is provided to the language model to find content that is not directly answering the question, but might be relevant. In some embodiments, semantic search is used to determine the portions of documents of the knowledge base that are most similar to the issue summary based on the semantic similarity of the issue summary, or a portion thereof, to portions of documents of the knowledge base. For example, the semantic similarity of an embedding corresponding to the issue summary and an embedding corresponding to each portion of documents of the knowledge base can be computed (e.g., based on a vector search query, a dot product of the embeddings, using MS MARCO, and/or the like) to determine the portions of documents of the knowledge base most similar to the issue summary. In some embodiments, utilizing a Q&A asymmetrical search model (e.g., MS MARCO), the embedding of the issue summary can be utilized as a question and the answer can be found based on embeddings of corresponding portions of the knowledge base. A threshold number of passages (e.g., the 20 most similar passages) can be identified based on the dot product of the embeddings. In some embodiments, the location of the documents, or portions thereof, such as the identification of the article, can be determined.
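The dot-product ranking described above can be illustrated with a small sketch over precomputed embeddings. It assumes the embeddings are plain lists of floats produced elsewhere by an embedding model; the names `dot` and `top_k_passages` are hypothetical, and a production system would more likely issue a vector search query than score every passage in a loop.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    return sum(x * y for x, y in zip(a, b))


def top_k_passages(issue_embedding, passages, k=20):
    """Rank passages by dot-product similarity to the issue-summary embedding.

    passages is an iterable of (passage_id, embedding) pairs. Returns up to k
    (score, passage_id) pairs, highest score first.
    """
    scored = [(dot(issue_embedding, emb), pid) for pid, emb in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

If each passage identifier also records the article it came from, the caller can recover the location of each top-ranked passage.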
[0210] In some embodiments, at block 424, a prompt is provided to a language model to determine whether an article might contain an answer relevant to the issue based on the issue summary and summaries 426 of each of the threshold number of most similar documents, or portions thereof, as identified at block 422. In some embodiments, the summaries of each of the documents of the knowledge base are precomputed and stored in the knowledge base. At block 428, if the language model determines that documents within the threshold number of documents may contain the answer, a threshold number of documents most likely to contain the answer are provided to the user (e.g., via chat interface 108 of
[0211] A specific example of a prompt (e.g., as provided by answer search state component 154 via start state chatbot agent component 153 of
[0213] System: Here is a list of articles. For each, you have: ID, title, short summary:
[0214] ID: 0; Title: General FAQs; Summary: The article contains a list of frequently asked questions about various topics related to Intercom, including installation, access tokens, Slack integration, ticketing, exporting data, managing teammates, and resolving login issues.
[0215] ID: 1; Title: Intercom features explained; Summary: This article provides an overview of the different features offered by Intercom, including the Messenger, Next-Gen Inbox, Tickets, Fin - The AI-Powered Bot, AI assist Features, Articles, Outbound Messages, News, Surveys, 2-Way SMS, Product Tours, Checklists, Tooltips, Workflows, Custom Answers, Custom Actions and Objects, and Switch, as well as information on how to set up Intercom for your business.
[0216] ID: 2; Title: Workflows explained; Summary: This article explains how to use Workflows in Intercom to easily build chatbots, automate tasks, and provide support to customers across multiple channels, with features such as triggers, templates, a visual builder, omnichannel support, rules, conditions, updating attributes, open tickets, custom actions, tagging conversations, and passover to other bots, and also provides information on how to manage Workflows.
[0217] ID: 3; Title: AI assist for Articles [Beta]; Summary: The article introduces AI assist for Articles, a feature integrated with OpenAI's GPT 3.5 that allows teams to generate full article versions from summaries, saving time and cognitive load for creating help center content and providing customers with self-serve resources faster.
[0218] ID: 4; Title: What is Automation?; Summary: Intercom is introducing new automation features, including an AI-powered bot called Fin, the rebranding of Custom Bots and Task Bots to Workflows, and the introduction of Basics for simple automations. Existing features like Resolution Bot and Inbox Rules will still be available but may be deprecated in the future. The navigation in the workspace will change depending on the features enabled.
[0219] ID: 5; Title: Using Fin Profiles alongside other automations; Summary: This article explains how to use Fin Profiles alongside other automations in Intercom, such as Custom Answers and Workflows, and provides tips for configuring and troubleshooting conflicts between these features.
[0220] ID: 6; Title: AI assist for Inbox; Summary: The article introduces AI assist for Inbox, a feature that uses OpenAI's GPT 3.5 to help support reps in writing customer responses, rephrasing messages, and summarizing conversations, ultimately saving time and improving the customer experience. The article also provides information on how Intercom's own support team is using these features and answers frequently asked questions about the functionality and availability of AI assist for Inbox.
[0221] ID: 7; Title: Custom Actions and Objects explained; Summary: This article explains how to use Custom Actions and Objects to create personalized self-serve support experiences in bots and Inbox without any coding.
[0222] ID: 8; Title: 6 great ways to use Surveys; Summary: The article explains 6 great ways to use surveys to capture and act on customer insights, from NPS to onboarding discovery, product satisfaction, lead generation, and understanding customer churn.
[0223] ID: 9; Title: Articles explained; Summary: This article explains how to use Articles in Intercom to create self-service content, sync with Zendesk, build a knowledge base, provide faster support, generate AI answers, get feedback, and improve content.
[0224]
[0225] Look at the list above and figure out whether given articles are related to the following customer query. [0226] Customer: Can you integrate your AI chatbot with our product search to handle customer enquiries? [0227] Please provide the IDs of the articles that are related to the customer query. [0228] Wrap each article returned with
[0231] A specific example of a response from the language model to the specific example of the prompt to determine documents likely to contain an answer to an issue summary is as follows:
[0232] In some embodiments, article indices are then tied back to article identification from the knowledge base (e.g., knowledge base 132 of
[0233] Returning to
[0234] An example diagram 500 of a model implementing a feedback state agent of a chatbot is shown in
[0235] Returning to
[0236] In some embodiments, feedback state chatbot agent component 155 includes a greet action (e.g., greet action 510 of
[0237] In some embodiments, feedback state chatbot agent component 155 includes a goodbye action (e.g., goodbye action 512 of
[0238] In some embodiments, feedback state chatbot agent component 155 includes a route to teammate/escalate action (e.g., escalate action 318 of
[0239] In some embodiments, feedback state chatbot agent component 155 includes a route to start agent action (e.g., route to main action 516 of
[0240] Data regarding the prompts utilized by chatbot component 151 and/or data communicated to/from the language model 110, customer device 102, and/or customer support device 112 can be stored in any suitable storage location, such as storage 130, customer support device 112, server 150, some combination thereof, and/or other locations as communication records files 131 and/or chatbot configuration files 140.
[0241] In an example implementation, chat interface 108 provides interface functionality that allows a user (e.g., a customer) to communicate with a chatbot and/or a CSA. Generally, chat interface 108 presents one or more interaction elements that provide various interaction modalities for chatting with a chatbot and/or a CSA. In various embodiments, these tools are implemented using code that causes a presentation of a corresponding interaction element(s), and detects and interprets inputs interacting with the interaction element(s).
[0242] In an example implementation, chatbot preview tool 128 provides interface functionality that allows a user (e.g., CSA) to communicate with a chatbot, for example, to preview the functionality of the chatbot. Generally, chatbot preview tool 128 presents one or more interaction elements that provide various interaction modalities for chatting with a chatbot. In various embodiments, these tools are implemented using code that causes a presentation of a corresponding interaction element(s), and detects and interprets inputs interacting with the interaction element(s).
[0243] Examples of a chat interface displayed to a customer (e.g., via chat interface 108) and/or CSA (e.g., via chatbot preview tool 128) are shown in
Example Configuration Environment
[0244] Returning to
[0245] Customer data accessing component 156 facilitates programmatically accessing customer data (e.g., customer data files 141) in order to implement a chatbot configuration of a chatbot implemented by chatbot component 151 based on the customer data of the customer. For example, a chatbot may access certain portions of the knowledge base based on the status of the customer as a VIP. Customer data can be stored in any suitable storage location, such as storage 130, customer support device 112, server 150, some combination thereof, and/or other locations as customer data files 141. Customer data within a dataset may include, by way of example and not limitation, data that is sensed or determined from one or more sensors, such as location information of mobile device(s), smartphone data (such as phone state, charging data, date/time, or other information derived from a smartphone), activity information (for example: app usage; online activity; searches; browsing certain types of webpages; listening to music; taking pictures; voice data such as automatic speech recognition; activity logs; communications data including calls, texts, instant messages, and emails; website posts; other user data associated with communication events) including activity that occurs over more than one device, user history, session logs, application data, contacts data, calendar and schedule data, notification data, social network data, news (including popular or trending items on search engines or social networks), online gaming data, ecommerce activity, sports data, health data, and nearly any other source of data that may be used to identify the customer.
[0246] Knowledge base accessing component 157 facilitates programmatically implementing access to the knowledge base. In some embodiments, knowledge base accessing component 157 accesses portions of the knowledge base based on the configuration of the chatbot implemented by chatbot component 151. For example, the configuration of chatbot component 151 may indicate that only a specific set of documents of the knowledge base should be searched by the chatbot. In some embodiments, knowledge base accessing component 157 automatically syncs content (e.g., such as content provided by content management tool 173 and/or snippets tool 129) provided by the customer support application 116 and/or stored in knowledge base 132. Data regarding the configuration of the chatbot facilitating access to different portions of the knowledge base can be stored in any suitable storage location, such as storage 130, customer support device 112, server 150, some combination thereof, and/or other locations as chatbot configuration files 140.
[0247] Route to agent component 158 facilitates programmatically implementing routing of a conversation from a chatbot to a CSA via the customer support component 165 to the communication tool 120 of the customer support application 116. For example, if a customer requests to speak to a person or the chatbot decides to take the route to teammate/escalate action, route to agent component 158 automatically implements the routing of the conversation from the chatbot to a CSA. Data regarding the configuration of the chatbot to route to a CSA can be stored in any suitable storage location, such as storage 130, customer support device 112, server 150, some combination thereof, and/or other locations as chatbot configuration files 140.
[0248] Chat workflow implementation component 159 facilitates programmatically implementing chat workflows designed to implement bots, triggers, conditions, and/or rules for chats. For example, a CSA can design a chat workflow through chat workflow design tool 180 so that the chatbot automatically implements certain actions, such as asking for specific information, before routing to a CSA. Data regarding chat workflows of the chatbot can be stored in any suitable storage location, such as storage 130, customer support device 112, server 150, some combination thereof, and/or other locations as workflow configuration files 142.
[0249] Custom answers component 160 facilitates programmatically implementing manually written answers and/or chat workflows to specific questions from customers by prioritizing the custom answers over AI-generated answers from the chatbot. In some embodiments, custom answers are searched using SBERT directly. For example, an embedding generated based on the customer's query can be searched against embeddings of one or more example questions corresponding to a custom answer (e.g., a user of customer support application 116 can input multiple examples of a single question). In some embodiments, the custom answer can include examples of questions, keywords, and/or phrases to automatically call a chat workflow designed in chat workflow design tool 180 and implemented by chat workflow implementation component 159. For example, if a customer requests a refund through chat interface 108, the custom answer for a refund can be determined through SBERT, which implements a chat workflow that automatically calls an API to trigger a refund. Data regarding custom answers can be stored in any suitable storage location, such as storage 130, customer support device 112, server 150, some combination thereof, and/or other locations as custom answer files 139.
[0250] Chatbot analysis component 161 facilitates programmatically generating reports by analyzing chatbot performance, such as through chatbot conversation monitoring, chatbot customer satisfaction score (CSAT), usage metrics, usage limits notifications, performance/return on investment (ROI) metrics, custom reports, and/or content-level metrics regarding access to content. Data regarding analysis of chatbots can be stored in any suitable storage location, such as storage 130, customer support device 112, server 150, some combination thereof, and/or other locations as metrics files 145.
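By way of a non-limiting sketch, matching a customer query against the example questions of each custom answer could look like the following Python. The cosine similarity measure, the `min_similarity` cutoff of 0.8, the identifiers, and the toy vectors are all assumptions for illustration; the description above states only that query embeddings are searched against example-question embeddings (e.g., using SBERT).

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def match_custom_answer(
    query_embedding: Sequence[float],
    custom_answers: Dict[str, List[Sequence[float]]],
    min_similarity: float = 0.8,  # assumed cutoff, not taken from the description
) -> Optional[str]:
    """Return the ID of the best-matching custom answer, or None if no example is close enough."""
    best_id, best_score = None, min_similarity
    for answer_id, example_embeddings in custom_answers.items():
        for emb in example_embeddings:
            score = cosine(query_embedding, emb)
            if score >= best_score:
                best_id, best_score = answer_id, score
    return best_id


# Hypothetical custom answers, each with embeddings of its example questions.
answers = {"refund": [[1.0, 0.0], [0.9, 0.1]], "shipping": [[0.0, 1.0]]}
matched = match_custom_answer([0.95, 0.05], answers)
```

When the function returns None, the chatbot would fall back to its AI-generated answer path, which is consistent with custom answers being prioritized over AI-generated answers.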
[0251] In an example implementation, chatbot tool 170 provides interface functionality that allows a user (e.g., a CSA) of customer support application 116 to implement settings and/or configuration of the chatbot in order to customize the chatbot of chatbot component 151. Generally, chatbot tool 170 presents one or more interaction elements that provide various interaction modalities for customizing the chatbot. In various embodiments, these tools are implemented using code that causes a presentation of a corresponding interaction element(s), and detects and interprets inputs interacting with the interaction element(s). An example of chatbot tool 170 is shown in
[0252] In the example implementation in
[0253] Chatbot tool 170 includes chatbot identity customization tool 172 that allows the user to rename the chatbot and/or change the image/icon for the chatbot, which can be stored in chatbot configuration files 140. An example of chatbot identity customization tool 172 is shown in
[0254] Chatbot tool 170 includes content management tool 173 that allows the selection of content of the knowledge base 132 that the chatbot uses to search for answers. For example, the CSA may not want to utilize certain content as the content may be outdated. As another example, different content can be utilized for different audiences to target the chatbot to different audiences, such as based on customer data or customer segment (e.g., VIP customer). In some embodiments, knowledge base accessing component 157 automatically syncs any content (e.g., such as content provided by content management tool 173) provided by the customer support application 116 and/or stored in knowledge base 132. In some embodiments, content management tool 173 allows the selection of multimodal content stored in knowledge base 132 in order for the chatbot to provide multimodal answers, such as images, audio, and/or video in responses. In some embodiments, content management tool 173 allows the selection of external systems, such as an external application (e.g., an ecommerce store and/or system) through third-party application configuration files 144, in order for the chatbot to utilize external systems to provide answers. An example of content management tool 173 is shown in
[0255] Chatbot tool 170 includes audience selection tool 174 that allows the selection of audience to target the chatbot, such as based on customer data or customer segment (e.g., VIP customer) based on customer data stored in customer data files 141. Examples of audience selection tool 174 are shown in
[0256] Chatbot tool 170 includes communication channel selection tool 175 that allows the selection of various communication channels, such as Web, mobile, application, telephonic, SMS, messaging applications, social media, email, and/or any communication channel (e.g., as stored in third-party application configuration files 144). In some embodiments, the chatbots can be deployed on any number of communication channels. In some embodiments, different settings/configuration of the chatbot can be applied to different communication channels. For example, only some content of knowledge base 132 may be utilized by the chatbot for customers communicating through a specific communication channel. In some embodiments, communication channel selection tool 175 allows the selection of multimodal communication, such as receiving user queries and/or outputting chatbot responses via audio. An example of communication channel selection tool 175 is shown in
[0257] Chatbot tool 170 includes chatbot behavior settings tool 176 that allows a user to provide settings for the chatbot regarding how the chatbot should behave when answering multiple questions and/or when to hand over the conversation to a teammate, which can be stored in chatbot configuration files 140. Examples of chatbot behavior settings tool 176 are shown in
[0258] Chatbot tool 170 includes inactive conversations settings tool 177 that allows a user of customer support application 116 to set automatic replies to inactive conversations, which can be stored in chatbot configuration files 140 and/or workflow configuration files 142. For example, inactive conversations settings tool 177 can send auto-replies to snoozed conversations. As another example, inactive conversations settings tool 177 can automate snoozing and set conversation pacing with new actions. As yet another example, inactive conversations settings tool 177 can automatically reroute conversations that receive no response to a human, such as a CSA. In some embodiments, inactive conversations settings tool 177 is implemented through chat workflow design tool 180 as a part of a chat workflow. An example of inactive conversations settings tool 177 is shown in
[0259] Chatbot tool 170 includes language settings tool 178 that allows the chatbot to be multilingual and/or to be targeted differently to customers in different languages, with settings that can be stored in language configuration files 143. For example, only some content of knowledge base 132 may be utilized by the chatbot for customers communicating in a specific language. Examples of language settings tool 178 are shown in
[0260] Chatbot tool 170 includes scheduling settings tool 179 that allows a user to provide settings as to when to enable a chatbot, which can be stored in chatbot configuration files 140. For example, the chatbot can be implemented only for a certain timeframe. An example of scheduling settings tool 179 is shown in
[0261] Chatbot tool 170 includes chat workflow design tool 180 that allows the user of the customer support application 116 to design chat workflows, which can be stored as workflow configuration files 142. For example, the user can design chat workflows to hand off to support teams or to other bots. For example, a chat workflow can be designed to automatically hand over to a CSA when the chatbot cannot answer the question. As another example, a chat workflow can be designed to automatically request additional information from the customer before handing the chat over to a CSA when the chatbot cannot answer a question. In some embodiments, the chatbot that uses a language model to determine answers from a knowledge base can be implemented in an initial block of the chat workflow. In some embodiments, a chat workflow can be designed to implement the chatbot that uses a language model to determine answers from a knowledge base at a later block in the workflow. For example, the chat workflow may determine whether the customer data indicates that the customer is not a VIP customer, who would be automatically assigned to a CSA, before implementing the chatbot that uses a language model to determine answers from a knowledge base. Examples of chat workflow design tool 180 are shown in
[0262] Chatbot tool 170 includes chatbot analysis tool 181 that provides an interface for generating reports analyzing chatbot performance, such as through chatbot conversation monitoring, chatbot CSAT, usage metrics, usage limits notifications, performance/ROI metrics, custom reports, and/or content-level metrics, which can be stored as metrics files 145. Examples of chatbot analysis tool 181 and output of chatbot analysis tool 181 are shown in
[0263] Chatbot tool 170 includes custom answers tool 182 that provides an interface for a user to implement manually written answers and/or chat workflows to specific questions from customers. For example, a user of the customer support application 116 may provide a specific written answer and examples of potential customer questions that would require the specific written answer so that specific written answer is provided without requiring the chatbot to call the language model. In some embodiments, the custom answer can include examples of questions, keywords, and/or phrases to automatically call a chat workflow designed in chat workflow design tool 180 and implemented by chat workflow implementation component 159. For example, if a customer requests a refund through chat interface 108, the custom answer for a refund can be determined through SBERT based on aspects of the conversation, which implements a chat workflow that automatically calls an API to trigger a refund. An example of custom answers tool 182 is shown in
Example Customer Support Communication Environment
[0264] Continuing with
[0265] In some embodiments, messages routing component 166 routes messages from conversations from customers, such as conversations between a customer and a chatbot (e.g., through route to agent component 158 of chatbot component 151). In some embodiments, messages routing component 166 routes messages to and/or from customers from various communication channels, such as Web, mobile, application, telephonic, SMS, messaging applications, social media, email, and/or any communication channel. Messages routing component 166 can route messages to message inbox interface 122 and/or chat interface 124 of a customer support application 116 of a CSA.
[0266] In some embodiments, AI-assisted chat component 167 assists customer support in responding to customers in chats using AI. For example, a language model may be called to revise an initial response drafted by the CSA to expand the response, rephrase the response, make the response more formal, make the response friendlier, and/or the like. As another example, using a similar process as the answer search state component 154 of chatbot component 151, the AI-assisted chat component 167 can determine answers to customer queries and provide the answers as suggestions to the CSA to respond to the customer query.
[0267] In some embodiments, snippets component 168 can programmatically extract conversational snippets from conversations between a customer and a CSA (e.g., from a communication record of communication records files 131, such as a chat log, chat transcript, meeting transcript, email, or other communication between a customer and a CSA). For example, a customer initiates a chat with a CSA through a chat interface 108 of application 106 executing on customer device 102. During the chat, the customer asks the CSA a series of questions and the CSA provides answers to each question through chat interface 124 of application 116 executing on customer support device 112 before the chat is ended. After the chat ends, the chat is stored (e.g., communication records files 131) so that the chat can be accessed in order to extract each question and corresponding answer (Q&A pairs) from the chat in a subsequent block. In some embodiments, the conversation can be accessed from the communication record by snippets component 168 during the conversation (e.g., chat) between the customer and the CSA in order to extract Q&A pairs from the conversation during the conversation. Snippets component 168 generates a prompt to language model 110 in order to extract Q&A pairs from the portion of the conversation record and to generate a single, summarized Q&A pair based on the Q&A pairs extracted from the conversation record. In some embodiments, snippets component generates a prompt to language model 110 to tag each Q&A pair of the conversation with contextual metadata corresponding to each of the Q&A pairs and filter irrelevant Q&A pairs of the conversation based on the metadata of each Q&A pair before generating a single, summarized Q&A pair based on the remaining Q&A pairs extracted from the conversation record. 
Examples of contextual metadata include: (1) contextual metadata corresponding to whether the question was answered by a human (e.g., a CSA) or a bot (e.g., the chatbot); (2) contextual metadata corresponding to the topic of the Q&A pair; (3) contextual metadata corresponding to a score indicating the relevance of the Q&A pair to other customers; and/or (4) contextual metadata corresponding to a dialog classification corresponding to a category of the type of dialogue. Examples of the type of dialogue can include: (a) informational content where an answer in the Q&A pair is directed to general information, knowledge and/or instructions; (b) a clarification where an answer in the Q&A pair is a request for clarification from the user; (c) a CSA action where an answer in the Q&A pair required the CSA to take some action in the background of the conversation; (d) a feature request where an answer in the Q&A pair is directed to future product improvements or feature requests; and/or (e) other categories, or a category for an answer in the Q&A pair that does not fit into the previously specified types of dialogue.
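As a hedged illustration, the tagging and filtering of Q&A pairs by contextual metadata might be represented as below. The `QAPair` field names, the dialogue-type labels, and the particular filter rule (keep informational pairs whose relevance score meets a minimum) are assumptions made for this sketch; the description does not specify which metadata values cause a pair to be filtered out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QAPair:
    question: str
    answer: str
    answered_by: str   # "human" (e.g., a CSA) or "bot"
    topic: str
    relevance: float   # score for relevance of the pair to other customers
    dialog_type: str   # e.g., "informational", "clarification", "csa_action", "feature_request", "other"


def filter_qa_pairs(
    pairs: Sequence[QAPair],
    min_relevance: float = 0.5,                         # assumed cutoff
    keep_types: Sequence[str] = ("informational",),     # assumed policy
) -> List[QAPair]:
    """Keep only pairs whose metadata suggests they are useful to other customers."""
    return [
        p for p in pairs
        if p.dialog_type in keep_types and p.relevance >= min_relevance
    ]


# Hypothetical pairs as they might be tagged by the language model.
tagged = [
    QAPair("How do I export data?", "Use the export page.", "human", "export", 0.9, "informational"),
    QAPair("Can you check my account?", "Done, I reset it.", "human", "account", 0.2, "csa_action"),
    QAPair("What do you mean?", "Which plan are you on?", "human", "billing", 0.1, "clarification"),
]
kept = filter_qa_pairs(tagged)
```

The remaining pairs would then be passed to the language model to generate the single, summarized Q&A pair.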
[0268] In an example implementation, communication tool 120 provides interface functionality that allows a user (e.g., a CSA) to chat with a customer, chat with a customer with AI-assisted chat capabilities, trigger extractions of snippets, interact with messages from various customers and/or communication channels through their inbox, and/or preview a chatbot through interactions with an interface controlled by communication tool 120. Generally, communication tool 120 presents one or more interaction elements that provide various interaction modalities for its functionality. In various embodiments, these tools are implemented using code that causes a presentation of a corresponding interaction element(s), and detects and interprets inputs interacting with the interaction element(s). An example of communication tool 120 is shown in
[0269] In the example implementation in
[0270] Communication tool 120 includes chatbot preview tool 128 that allows the user of the customer support application 116 to preview the chatbot, such as before setting the chatbot live for customers. An example of chatbot preview tool 128 is shown in
[0271] Communication tool 120 includes snippets tool 129 that provides an interface so that users of customer support application 116 can add snippets, determined from conversational data (e.g., communication records files 131) by snippets component 168, to the knowledge base 132 (e.g., snippets files 133). In this regard, the answers in snippets of snippets files 133 can be edited, targeted (e.g., by content management tool 173), and managed in the knowledge base utilized by the chatbot to provide answers. An example of snippets tool 129 is shown in
Example Chat Interface
[0272]
[0273]
Example Customer Support Application Interface
[0274]
[0275]
[0276]
Example Chatbot Configuration Interface
[0277]
[0278]
[0279]
[0280]
[0281]
[0282]
[0283]
[0284]
[0285]
[0286]
[0287]
[0288]
[0289]
[0290]
[0291]
[0292]
Example Process Flows
[0293] With reference now to
[0294]
[0295]
[0296]
[0297]
[0298]
[0299]
[0300] At block 1420, generate a representation of the conversation record (referred to as a conversation representation) via a language model. Embodiments of block 1420 generate, from the conversation record, a conversation representation. The conversation representation represents a distilled summary and context of the conversation, or a set of one or more extracted queries that encapsulate the customer's issues or questions. In some implementations, the conversation representation is generated using a language model, such as a large language model (LLM), for instance GPT 3.5 Turbo, or using a small language model. The language model is provided, as input, a portion of the conversation history record and an issue summarization prompt. Issue summarization prompts are designed to instruct a language model to summarize complex topics, discussions, or content into a concise and coherent summary. In one example, an issue summarization prompt is based on the following instruction:
[0301]
[0302] Chat history:
[0303] {chat_history}
[0304] - - -
[0305] You have a chat history with a customer of {app_name}.
[0306] Your goal is to summarize relevant information about customer's LAST question or issue in one full, meaningful and clear standalone sentence.
[0307] You have to rewrite it on the customer behalf, jump directly to the question or issue. Be formal.
[0308] Keep it short, don't include many details. Only stick to the information in the chat history - this is very important to my career.
[0309]
In one embodiment, the portion of the conversation history includes recent conversation parts, such as the back-and-forth messages exchanged in the conversation; for instance, some implementations determine and use the five most recent conversation parts to determine the conversation representation.
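For illustration, filling the issue summarization prompt with the most recent conversation parts could be sketched as follows. The template here is abbreviated from the example instruction above, and the function name, the (speaker, text) tuple format, and the "speaker: text" rendering of each part are assumptions of this sketch rather than details of the described embodiments.

```python
from typing import List, Tuple

# Abbreviated form of the example instruction; placeholders match the source prompt.
ISSUE_SUMMARY_TEMPLATE = (
    "Chat history:\n"
    "{chat_history}\n"
    "- - -\n"
    "You have a chat history with a customer of {app_name}.\n"
    "Your goal is to summarize relevant information about customer's LAST question "
    "or issue in one full, meaningful and clear standalone sentence.\n"
    "Keep it short, don't include many details."
)


def build_issue_summary_prompt(
    parts: List[Tuple[str, str]],  # (speaker, text) pairs, oldest first
    app_name: str,
    max_parts: int = 5,            # five most recent parts, per the example above
) -> str:
    """Render the prompt from the most recent conversation parts."""
    recent = parts[-max_parts:]
    chat_history = "\n".join(f"{speaker}: {text}" for speaker, text in recent)
    return ISSUE_SUMMARY_TEMPLATE.format(chat_history=chat_history, app_name=app_name)


# Hypothetical seven-part conversation; only the last five parts reach the prompt.
conversation = [("customer" if i % 2 else "agent", f"part-{i}") for i in range(1, 8)]
prompt = build_issue_summary_prompt(conversation, app_name="ExampleApp")
```

The resulting string would be sent to the language model, whose output serves as the conversation representation.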
[0310] At block 1430, generate an embedding corresponding to the conversation representation. Embodiments of block 1430 generate an embedding corresponding to the conversation representation thereby forming a representation embedding. In some embodiments, the embedding is generated or computed using Sentence Bidirectional Encoder Representations from Transformers (SBERT) or another suitable transformer.
[0311] At block 1440, determine a set of passages of documents in a knowledge base that are relevant to the conversation representation. Embodiments of block 1440 comprise accessing a knowledge base to identify information relevant to the conversation representation for use to generate an answer responsive to the conversation representation. In various implementations, the knowledge base comprises a repository of documents each containing one or more passages that may be relevant to queries of the conversation representation. For example, documents in the knowledge base can include, without limitation, help center information, such as technical documentation, user manuals, FAQs, policy documents, product guides; internal documentation; conversation histories or portions thereof, which may include portions of past conversations or summaries of conversations between a user, such as a CSA, and a customer regarding an issue and its resolution; other information specifically curated for the knowledge base, and other information sources potentially relevant for addressing customer queries. In some instances, a document may comprise a plurality of related files or electronic documents, as well as multimedia content.
[0312] As described herein, documents in the knowledge base are segmented into discrete units known as passages. For each document in the knowledge base, an embedding is computed for the passages of the document thereby allowing for the computation of semantic similarity between embeddings. Accordingly, passage embeddings are compared to representation embeddings to determine a set of passages that are semantically relevant to a conversation representation.
[0313] In some embodiments of block 1440, a query is performed on the knowledge base to determine a set of passages that are relevant to a conversation history record by determining a set of passages in the knowledge base that have corresponding passage embedding that are similar to the representation embedding corresponding to the conversation representation. In particular, the set of the passages that are relevant to the conversation representation may be determined by computing a semantic similarity of the representation embedding to an embedding corresponding to each of the passages of the documents in the knowledge base. Those passages that are sufficiently relevant, such as satisfying a threshold of similarity, are included in the set of passages. In some implementations, all of the passages are ranked for similarity and only the top certain number of passages, corresponding to the most relevant passages, are included in the set of passages relevant to the conversation representation. For instance, the set of passages may comprise forty-five passages that are ranked in order of similarity, representing relevance, to the conversation the conversation representation. In some implementations, the similarity comparison is performed using the model msmarco-distilbert-base-tas-b if the language corresponding to the embeddings is English and Multilingual-e5-base for other languages.
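A minimal sketch of the ranking step and the language-dependent model choice is given below, assuming similarity scores have already been computed for each passage. The function names, the passage identifiers, and the use of a language code beginning with "en" to denote English are assumptions for illustration; the model names are those recited above.

```python
from typing import Dict, List, Tuple


def pick_embedding_model(language: str) -> str:
    """Choose the model name by language; 'en' prefixes are assumed to mean English."""
    if language.lower().startswith("en"):
        return "msmarco-distilbert-base-tas-b"
    return "Multilingual-e5-base"


def rank_passages(scores: Dict[str, float], top_n: int = 45) -> List[Tuple[str, float]]:
    """Rank passages by similarity score and keep the top_n most similar."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical similarity scores keyed by passage identifier.
example_scores = {"p1": 0.2, "p2": 0.9, "p3": 0.5}
top_two = rank_passages(example_scores, top_n=2)
```

The default of 45 follows the forty-five-passage example above and could be changed per implementation.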
[0314] In some embodiments using a similarity threshold, the similarity threshold is 0.8. In other embodiments using a similarity threshold, the threshold is pre-determined or configured according to the number of potentially relevant passages (e.g., the threshold may be set higher if there are more passages, so that only the most relevant passages are included in the set, but the threshold may be set lower if there are fewer potentially relevant passages, so that more are included for consideration in generating the answer output response).
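By way of a non-limiting illustration, the threshold-and-ranking selection of block 1440 might be sketched as follows. The function names `cosine_similarity` and `retrieve_passages` are hypothetical, the use of cosine similarity as the semantic-similarity measure is an assumption (the description does not fix a particular measure), and the embeddings are taken to be precomputed lists of floats. The defaults of 0.8 and forty-five mirror the example values given above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_passages(
    representation_embedding: Sequence[float],
    passage_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # example threshold from the description
    top_k: int = 45,         # example set size from the description
) -> List[Tuple[str, float]]:
    """Return (passage_id, similarity) pairs, most similar first.

    Passages below the threshold are dropped, then the ranked list is
    truncated to at most top_k entries.
    """
    scored = [
        (passage_id, cosine_similarity(representation_embedding, embedding))
        for passage_id, embedding in passage_embeddings.items()
    ]
    relevant = [item for item in scored if item[1] >= threshold]
    relevant.sort(key=lambda item: item[1], reverse=True)
    return relevant[:top_k]
```

In this sketch the threshold and the top-k cutoff are applied together; an implementation could equally apply only one of the two, as the embodiments above describe.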
[0315] At block 1450, prune the number of passages in the set of passages based on properties of a language model. Some embodiments of method 1400 include decreasing the number of passages in the set of passages that will be used by the language model for generating an answer output. Accordingly, embodiments of block 1450 include programmatically pruning the number of passages in the set according to the limitations of the language model or a computation speed to provide a desired user experience. For instance, in one embodiment, a programmatic pruning process may be performed automatically to determine the top N source documents that contain the passages with the highest relevance to the conversation representation. The value of N may be set to 15, for instance, which would comprise identifying the top 15 source documents with the corresponding high-ranking passages with regard to relevance to the conversation representation. However, N could also be 10, 20, or another number, depending on the implementation or the characteristics of the language model in use. In some implementations, a document relevance is determined for each of the documents and used for determining the N documents most relevant to the conversation representation. For example, for each document having passages in the set of passages relevant to the conversation representation, the relevance of the document to the conversation representation is determined based on the relevance of each passage within the document that is in the set of passages.
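As one hedged illustration of the pruning of block 1450, the following sketch keeps only the passages belonging to the N most relevant documents. The function name `prune_to_top_documents` and the tuple layout are hypothetical. The description states only that document relevance is based on the relevance of the document's passages; taking the maximum passage score as the document relevance is an assumption made here for simplicity, and a sum or mean would be an equally plausible aggregate.

```python
from typing import Dict, List, Tuple

# (document_id, passage_index, relevance_score); layout is illustrative only.
ScoredPassage = Tuple[str, int, float]


def prune_to_top_documents(
    scored_passages: List[ScoredPassage], n: int = 15
) -> List[ScoredPassage]:
    """Keep the passages of the n documents with the highest relevance."""
    # Assumed aggregate: a document is as relevant as its best passage.
    doc_relevance: Dict[str, float] = {}
    for doc_id, _passage_index, score in scored_passages:
        previous = doc_relevance.get(doc_id, float("-inf"))
        doc_relevance[doc_id] = max(previous, score)

    ranked_docs = sorted(
        doc_relevance, key=lambda doc_id: doc_relevance[doc_id], reverse=True
    )
    top_docs = set(ranked_docs[:n])

    # Preserve the incoming passage order for the surviving documents.
    return [p for p in scored_passages if p[0] in top_docs]
```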
[0316] At block 1460, determine document-passage groupings for the remaining passages. Embodiments of block 1460 use the set of relevant passages, or the passages corresponding to the number of N documents, to generate document-passage groupings. A document-passage grouping indicates a document and each passage of the document that is in the set of relevant passages. For example the indication may comprise a document ID and a passage ID or index number from an index of passages for the document. In some implementations, the indication of a particular passage is based on the document ID and an index or number corresponding to each passage in the document. Accordingly, in instances having a number of N documents that have the passages with the highest relevance to the conversation representation, there will be N document-passage groupings, comprising a grouping of the relevant passages for each document.
[0317] In some embodiments of block 1460, each relevant document-passage grouping is determined from a predetermined document-passage grouping. In particular, in some implementations, a pre-determined document-passage grouping is determined for each document in the knowledge base and comprises an indication of the document and indications of each passage occurring within the document. The indications of passages in the predetermined document-passage grouping include all passages that are in the document, including passages that are not necessarily relevant to the conversation representation. Accordingly, in some embodiments, a document-passage grouping is determined by removing from the predetermined document-passage grouping, the indications of those passages that are not in the set of passages relevant to the conversation representation.
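The derivation of a document-passage grouping from a predetermined grouping, as described for block 1460, might be sketched as below. The name `filter_groupings` and the dictionary representation (document ID mapped to a list of passage indices) are assumptions chosen for illustration; the description requires only that a grouping indicate a document and its relevant passages.

```python
from typing import Dict, List, Set, Tuple


def filter_groupings(
    predetermined: Dict[str, List[int]],
    relevant_passages: Set[Tuple[str, int]],
) -> Dict[str, List[int]]:
    """Remove from each predetermined grouping the passages that are not relevant.

    predetermined maps a document ID to the indices of every passage in that
    document; relevant_passages holds (document ID, passage index) pairs.
    Documents left with no relevant passage are omitted from the result.
    """
    groupings: Dict[str, List[int]] = {}
    for doc_id, passage_indices in predetermined.items():
        kept = [i for i in passage_indices if (doc_id, i) in relevant_passages]
        if kept:
            groupings[doc_id] = kept
    return groupings
```

For example, a predetermined grouping of `{"doc-1": [0, 1, 2]}` with relevant passages `{("doc-1", 0), ("doc-1", 2)}` yields the grouping `{"doc-1": [0, 2]}`.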
[0318] At block 1470, programmatically generate an answer-generation input instruction to cause the language model to produce an answer output. Embodiments of block 1470 programmatically generate an answer-generation input instruction for inputting to a language model, such as an LLM, to cause the language model to produce an answer output. In various embodiments, the answer-generation input instruction is generated using one or more of: (a) the conversation representation, (b) the relevant document-passage groupings (or a portion of the relevant document-passage groupings), and (c) an answer-format instruction. For example, the answer-generation input instruction instructs the language model to use the document-passage groupings (or a portion of the document-passage groupings) to generate an answer output that is responsive to the conversation representation (such as a query in the conversation representation) and based on the answer-format instruction. In some embodiments, the language model comprises a GPT-4 LLM, and the answer output is generated using Retrieval-Augmented Generation (RAG). In some embodiments, answer-generation input instruction logic is used to generate the answer-generation input instruction. The answer-generation input instruction logic can include computer instructions, programming routines, rules, or templates used for generating the answer-generation input instruction.
[0319] In some embodiments, the number of relevant document-passage groupings included in the answer-generation input instruction is based on a target token length corresponding to the language model. Thus, the number of document-passage groupings may be limited so that only the document-passage groupings having the most relevant passages are included based on the target token length. Accordingly, document-passage groupings having less relevant passages may be excluded from the answer-generation input instruction, if the target token length is small. In some implementations, the target token length is determined using an LLM tokenizer configured for the language model.
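One possible way to limit the groupings to a target token length is sketched below. The names `select_within_budget` and `whitespace_token_count` are hypothetical, and the whitespace-based counter is only a stand-in: an actual implementation would pass in a counting function backed by the LLM tokenizer configured for the language model. Stopping at the first grouping that does not fit, rather than skipping it and trying smaller ones, is a design assumption that keeps the selection strictly in relevance order.

```python
from typing import Callable, List, Tuple


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter; a real system would use the model's tokenizer."""
    return len(text.split())


def select_within_budget(
    groupings: List[Tuple[str, str, float]],
    target_token_length: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Return IDs of the most relevant groupings that fit the token budget.

    Each grouping is (document_id, grouping_text, relevance). Groupings are
    taken in descending relevance until the next one would exceed the budget.
    """
    selected: List[str] = []
    used = 0
    for doc_id, text, _relevance in sorted(
        groupings, key=lambda g: g[2], reverse=True
    ):
        cost = count_tokens(text)
        if used + cost > target_token_length:
            break  # less relevant groupings are excluded from the instruction
        selected.append(doc_id)
        used += cost
    return selected
```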
[0320] Embodiments of block 1470 include programmatically determining an answer-format instruction, which is included in the answer-generation prompt instruction, to direct the language model to include specific information or format aspects of the answer output according to the format instruction. Some embodiments of answer-format instructions include instructions to generate and integrate citations within the answer output. In particular, some implementations of the answer-format instruction instruct the language model to include, in the answer output, a corresponding citation for each portion of the answer output that is generated using a particular passage from the document-passage groupings. A citation corresponds to at least a portion of the answer output, such as a sentence or a paragraph in the answer output, and indicates the passage used to generate the corresponding portion of the answer output. A citation also may indicate the document that includes the indicated passage. For example, the citations may occur within the answer output following each portion of the answer output corresponding to the citation, or the citations may be generated as footnotes or endnotes of the answer output. In one example implementation, the citations are provided in the answer output in the format: [@doc_id=120-1-0], where 120 is a code for the document type, 1 indicates an ID of the source, and 0 indicates the particular passage within that source.
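For illustration only, citations in the example format [@doc_id=120-1-0] could be recovered from an answer output as sketched below. The names `Citation`, `CITATION_PATTERN`, and `extract_citations` are hypothetical, and the sketch assumes that all three fields are non-negative decimal integers, as in the example; the description does not state that this is always the case.

```python
import re
from typing import List, NamedTuple


class Citation(NamedTuple):
    doc_type: int       # code for the document type, e.g. 120
    source_id: int      # ID of the source
    passage_index: int  # particular passage within that source


# Assumes three dash-separated decimal integers, as in [@doc_id=120-1-0].
CITATION_PATTERN = re.compile(r"\[@doc_id=(\d+)-(\d+)-(\d+)\]")


def extract_citations(answer_output: str) -> List[Citation]:
    """Return the citations found in the answer output, in order of appearance."""
    return [
        Citation(*(int(group) for group in match.groups()))
        for match in CITATION_PATTERN.finditer(answer_output)
    ]
```

A UI layer could then map each extracted citation to a direct link to the indicated passage.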
[0321] In some implementations, the answer-format instruction specifies that each citation include a direct link to the location of its indicated passage in a document. That is, in these implementations, the citation does not merely link to the document that has the indicated passage, but the citation links directly to the passage within the document. For example, the direct link may comprise an anchor link, hyperlink, a URL, pointer, or similar link.
[0322] Further, some embodiments of the answer-format instruction generated in block 1470 include an instruction to direct the language model to include, in a citation, source information regarding the cited document. Source information includes information about the document, for example and without limitation, information regarding the type of document (e.g., a conversation record or snippet of a conversation, help center documentation, internal documentation, log, etc.); a creation date of the document; a last modification date of the document indicating how recently the document was updated; whether the document is internal to the user (for example, the CSA), accessible to a customer, or publicly accessible; an indication of the number of times the document has been previously cited in past answer outputs, which may be used to determine that a particular document is used often for generating answers; or a CSA user feedback rating based on prior occurrences of the document's passages in past answer outputs. In this way, the citations serve to indicate information about the source of the information used to generate the answer output, thereby enhancing the transparency and trustworthiness of the generated answer.
[0323] Further, some embodiments of the answer-format instruction generated in block 1470 include an instruction directing the language model to include, in the answer output, an indication of a likelihood that the answer output is contained entirely within a passage or multiple passages of a document. Similarly, in some embodiments, the answer-format instruction instructs the language model to determine a likelihood that the answer output is contained entirely within a passage of a document (or alternatively to determine a likelihood that the answer output is contained entirely within a document), and based on the likelihood satisfying a confidence threshold, the language model is instructed to include, in the answer output, an indication that the answer output answers the conversation representation. For example, when the CSA user is presented with the answer output, the word Answer may be also presented in or with the citation, or presented before or after the answer output, thereby indicating that the provided answer output is contained entirely within the cited passage or the cited document. For example, in response to this instruction, the answer output may include: [0324] [0325]
[0327] Some embodiments of the answer-format instruction generated in block 1470 include an example of an answer output, an answer output template, a portion of an example answer output, or a portion of an answer output template.
[0328] Example aspects of an answer-generation input instruction generated at block 1470 include the following: [0329] [0330] ================== System Message ================================ [0331] You are a customer support answer service for {app_name}, helping a customer support agent. Your job is to answer questions as accurately and efficiently as possible. You're not talking to a customer. [0332] {query} [0333] ==================== AI Message ================================== [0334] Here is a list of passages from internal knowledge base in descending order of relevance to your query: [0335] {internal_passages_text} [0336] Here is a list of passages from public knowledge base in descending order of relevance to your query: [0337] {passages_text} [0338] =================== Human Message ================================= [0339] Lets refer to those passages as The information I've found about your query [0340] Given that information I want you to reasonably reinterpret my most recent query and determine if you can answer my query by summarising the relevant information into a helpful response. [0341] First, using a score between 1 to 10 write [0342]
[0362] At block 1480, receive the answer output from the language model in response to providing the answer-generation input instruction to the language model. In particular, the answer-generation input instruction is provided as an input prompt to the language model, which generates a response and provides the response as the answer output. The answer output is received from the language model and processed (in block 1490) to determine a representation of the answer (referred to as an answer representation) that can be provided, via a user interface (UI), to the user.
[0363] At block 1490, cause a representation of the answer to be presented via a user interface on a computing device. Embodiments of block 1490 process the answer output to determine an answer representation and cause the answer representation to be presented via a user interface on a computing device, such as computing device 1700, described in
[0364] Some implementations of the UI comprise a first UI element for presenting aspects of the conversation history, which may comprise the transcript of an ongoing conversation, and a second UI element for presenting the answer representation. In some instances, the second UI element is positioned proximate the first UI element so that a user (for example a CSA) can view and interface with the conversation and also view the answer representation including citations, thereby enabling the user to access the passages that are indicated by the citations. In some embodiments, passage-level citations are presented in the UI as links which highlight the relevant passage in the original source, thereby enhancing the ability of the user to quickly fact check the answers given by Copilot.
[0365] Some implementations of method 1400 operate continuously or as needed during a conversation. Accordingly, the process of method 1400 for generating and providing an answer representation to a user based on a conversation history record is continuously updated as the conversation continues. In this way, the user, such as a CSA, may be continuously presented with an answer representation that is relevant to the current conversation with a customer. For instance, as the conversation evolves and new, more recent conversation parts are added to the conversation history, those more recent conversation parts are used to ultimately determine a new answer representation.
[0366]
[0367] At block 1510 a conversation from a communication record is accessed. Embodiments of block 1510 access a conversation record or conversation history. The conversation history comprises a data file that is a text record of a conversation, which may be occurring in near-real time. For example, as a CSA is communicating with a customer, the chat log or transcript of the discussion is created and comprises a conversation history record. The conversation history may be determined from a chat log or chat history of a chat session or by using automatic speech recognition, such as a speech-to-text software utility on audio information of the communication, such as from a customer who is speaking with a CSA over a phone call.
[0368] At block 1520, the most recent message from a user, such as a customer, is extracted from the communication record, and a representation of the most recent message is used to generate a message embedding. For example, as a CSA is communicating with a customer, the customer's most recent message, or set of messages following the last message from the CSA, is extracted from the conversation history record. The embedding captures the semantic essence of the representation of the most recent message in a vector space that enables a computation of similarity of the message embedding with other text embeddings. In this way, other texts, including conversational snippets corresponding to Q&A pairs and/or passages within documents of a knowledge base, can be identified that are relevant to the representation of the most recent message based on a similarity comparison of corresponding embeddings. Some implementations use Sentence Bidirectional Encoder Representations from Transformers (SBERT) to generate the embedding.
[0369] At block 1530, a knowledge base is accessed to identify information relevant to the representation of the most recent message for use to generate an answer responsive to the most recent message. Some embodiments of block 1530 only access conversational snippets corresponding to Q&A pairs extracted from previous conversations from the knowledge base to identify information relevant to the representation of the most recent message for use to generate the answer. For example, conversational snippets corresponding to Q&A pairs can be extracted from previous conversations using a language model. Examples describing how conversational snippets corresponding to Q&A pairs are extracted from previous conversations are described in U.S. patent application Ser. No. 18/539,158, filed Dec. 13, 2023, which is incorporated herein by reference. For example, conversational snippets corresponding to Q&A pairs can be extracted from past conversations between a customer and a CSA. The conversational snippets can be added to a knowledge base, so that the Q&A pairs can be utilized to provide automated customer support. In certain embodiments, in order to identify the most relevant Q&A snippets from conversations between a customer and a CSA, a first call is made to a language model to extract all of the Q&A pairs from a conversation and annotate the Q&A pairs. A second call is made to the language model to remove irrelevant Q&A pairs of the conversation and generate a single, summarized Q&A pair from the remaining Q&A pairs of the conversation. The single, summarized Q&A pair of the conversation is presented to an end user, such as an administrator or the CSA, for approval before storage in the knowledge base. The summarized Q&A pair can be used as context to a language model in order to generate answers to semantically similar questions in the future. 
For each conversational snippet, an embedding (referred to as a question embedding) is computed for the question of the corresponding Q&A pair, thereby allowing for the computation of semantic similarity between embeddings. In this way, question embeddings may be compared to message embeddings to determine a conversational snippet that is semantically relevant to the representation of the most recent message. Accordingly, a query is performed on the conversational snippets corresponding to Q&A pairs in a knowledge base to determine a question of a Q&A pair that is relevant to a conversation history record by determining a question of a Q&A pair in the knowledge base that has a corresponding question embedding that is similar to the message embedding corresponding to the most recent message. In particular, a conversational snippet that is relevant to the representation of the most recent message may be determined by computing a semantic similarity of the message embedding to question embeddings corresponding to each of the conversational snippets in the knowledge base. The conversational snippet with the highest similarity score and above a threshold of similarity can be determined to be the most relevant to the message embedding. In some implementations, a set of conversational snippets are determined to be relevant to the representation of the most recent message. For example, all of the conversational snippets are ranked for similarity (e.g., based on the corresponding question embedding) and the top certain number of conversational snippets corresponding to the most relevant conversational snippets are included in the set of conversational snippets relevant to the representation of the most recent message. In some embodiments, if there are no conversational snippets with question embeddings above a threshold similarity to the message embedding, no conversational snippets are included as relevant to the representation of the most recent message.
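As a minimal sketch of the snippet selection described above, the following picks the conversational snippet whose question embedding is most similar to the message embedding and returns nothing when no snippet exceeds the threshold. The names `cosine` and `most_relevant_snippet` are hypothetical, cosine similarity is assumed as the similarity measure, and the embeddings are taken to be precomputed (for example, by an SBERT model, which is not invoked here).

```python
import math
from typing import List, Optional, Sequence, Tuple

# (question, answer, question_embedding); layout is illustrative only.
Snippet = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_relevant_snippet(
    message_embedding: Sequence[float],
    snippets: List[Snippet],
    threshold: float,
) -> Optional[Tuple[str, str]]:
    """Return the (question, answer) pair with the highest similarity above
    the threshold, or None when no question embedding is similar enough."""
    best: Optional[Tuple[str, str]] = None
    best_score = threshold
    for question, answer, question_embedding in snippets:
        score = cosine(message_embedding, question_embedding)
        if score > best_score:
            best, best_score = (question, answer), score
    return best
```

Returning None corresponds to the embodiments in which no conversational snippet is included as relevant; the answer of a returned pair can then be supplied as context to the language model.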
[0370] Some embodiments of block 1530 access other portions of the knowledge base to identify information relevant to the representation of the most recent message for use to generate the answer. In various implementations, the knowledge base comprises a repository of documents each containing one or more passages that may be relevant to queries of the representation of the most recent message. For example, documents in the knowledge base can include, without limitation, help center information, such as technical documentation, user manuals, FAQs, policy documents, product guides; internal documentation; conversation histories or portions thereof, which may include portions of past conversations or summaries of conversations between a user, such as a CSA, and a customer regarding an issue and its resolution; other information specifically curated for the knowledge base, and other information sources potentially relevant for addressing customer queries. In some instances, a document may comprise a plurality of related files or electronic documents, as well as multimedia content. For each document in the knowledge base, an embedding (referred to as a passage embedding) is computed for the passages of the document thereby allowing for the computation of semantic similarity between embeddings. In this way, passage embeddings may be compared to message embeddings to determine passages that are semantically relevant to a representation of the most recent message. Accordingly, a query is performed on the knowledge base to determine a set of passages that are relevant to a conversation history record by determining a set of passages in the knowledge base that have corresponding passage embeddings that are similar to the message embedding corresponding to the most recent message.
In particular, the set of the passages that are relevant to the representation of the most recent message may be determined by computing a semantic similarity of the message embedding to an embedding corresponding to each of the passages of the documents in the knowledge base. Those passages that are sufficiently relevant, such as satisfying a threshold of similarity, are included in the set of passages. In some implementations, all of the passages are ranked for similarity and only the top certain number of passages, corresponding to the most relevant passages, are included in the set of passages relevant to the representation of the most recent message.
[0371] Some embodiments of block 1530 use the most relevant conversational snippet and/or passages to generate a prompt for a language model, such as an LLM (for instance, GPT 3.5 Turbo), instructing it to produce an answer output that is responsive to the representation of the most recent message. In some implementations, only the answer from the Q&A pair of the most relevant conversational snippet is used to generate a prompt for a language model, instructing it to produce an answer output that is responsive to the representation of the most recent message.
[0372] At block 1540, an answer-generation input instruction is programmatically generated to cause the language model to produce an answer output. Embodiments of block 1540 programmatically generate an answer-generation input instruction for inputting to a language model, such as an LLM, to cause the language model to produce an answer output. In various embodiments, the answer-generation input instruction is generated using one or more of: (a) the most relevant conversational snippet and/or passages, (b) the representation of the most recent message, (c) other portions of the conversation history record, such as the entire conversation history record or only the messages sent by the CSA to provide context, style, and/or tone, (d) the name of the CSA, (e) the name of the customer, (f) commonly-used greetings extracted from other conversations with the CSA or other CSAs using any known natural language processing (NLP) technique to extract the commonly-used greetings, (g) an answer-format instruction, and/or other relevant information. For example, the answer-generation input instruction instructs the language model to use the answer from the Q&A pair of the most relevant conversational snippet to generate an answer output that is responsive to the representation of the most recent message (such as a query in the most recent message) and based on the answer-format instruction. In some embodiments, answer-generation input instruction logic is used to generate the answer-generation input instruction. The answer-generation input instruction logic can include computer instructions, programming routines, rules, or templates used for generating the answer-generation input instruction.
[0373] Embodiments of block 1540 include programmatically determining an answer-format instruction, which is included in the answer-generation prompt instruction, to direct the language model to include specific information or format aspects of the answer output according to the format instruction. Some embodiments of answer-format instructions include instructions to generate and integrate citations within the answer output. In particular, some implementations of the answer-format instruction instruct the language model to include, in the answer output, a corresponding citation to the corresponding conversational snippet, such as the Q&A pair and/or the past conversation from which the Q&A pair was extracted. In some implementations, the answer-format instruction provides that each citation include a direct link to the location of the Q&A pair and/or the past conversation from which the Q&A pair was extracted. Some implementations of the answer-format instruction instruct the language model to include, in the answer output, a corresponding citation for each portion of the answer output, such as a sentence or a paragraph in the answer output, that is generated using a particular passage, with an indication of the document that includes the indicated passage. In some examples, the citations may occur within the answer output following each portion of the answer output corresponding to the citation, or the citations may be generated as footnotes or endnotes of the answer output. In some implementations, the answer-format instruction provides that each citation include a direct link to the location of its indicated passage in a document. For example, the direct link may comprise an anchor link, hyperlink, a URL, pointer, or similar link.
[0374] Some implementations of the answer-format instruction instruct the language model to include, in a citation, source information regarding the citation. Source information includes information about the document, for example and without limitation, information regarding the type of document (e.g., a conversation record or snippet of a conversation, help center documentation, internal documentation, log, etc.); a date of the conversation; a creation date of the conversational snippet and/or document; a last modification date of the conversational snippet and/or document indicating how recently the conversational snippet and/or document was updated; whether the conversational snippet and/or document is internal to the user (for example, the CSA), accessible to a customer, or publicly accessible; an indication of the number of times the conversational snippet and/or document has been previously cited in past answer outputs, which may be used to determine that a particular conversational snippet and/or document is used often for generating answers; or a CSA user feedback rating based on prior occurrences of the conversational snippet and/or document's passages in past answer outputs. In this way, the citations serve to indicate information about the source of the information used to generate the answer output, thereby enhancing the transparency and trustworthiness of the generated answer.
[0375] Embodiments of block 1540 provide the generated answer-generation input instruction as an input prompt to the language model. Example aspects of an answer-generation input instruction generated at block 1540 include the following: [0376] [0377] You're called {admin_name} \ [0378] and are an expert customer support agent working for {app_name}. \ [0379] You are speaking to {user_name}. \ [0380] Your job is to provide customer support answers. [0381] Your answers should be grounded in the context provided below (if it makes sense). \ If the context is not enough, you can ask clarifying questions. \ [0382] Don't assume anything about {app_name} products or processes if it is not in the context below. [0383] Never ask the customer to contact customer support: you're the customer support! \ [0384] Also, never refer to yourself as an AI or bot, always pretend to be a human agent. [0385] Try to sound human and natural: don't repeat yourself too much and don't be too formal. \ [0386] Don't start every answer in the same way. \ [0387] Be concise and to the point: don't pad your answers with unnecessary words. [0388] {greeting_style} \ [0389] Maintain a consistent style as the previous conversation turns but don't be afraid to be [0390] creative in order to \ [0391] deliver a great customer support experience. [0392] The previous conversation turns are stored in the conversation history. \ [0393] Your previous answers are marked with {admin_name}, and the end users are marked with {user_name}. \ [0394] Other names refer to people in your team. Use those previous turns to gauge the style and \ tone of the conversation and to avoid repeating yourself. \ [0395] Context (FAQ snippets found by semantic search): [0396] [0397] {context} [0398]
[0399] At block 1550, receive the answer output from the language model in response to providing the answer-generation input instruction to the language model. In particular, the answer-generation input instruction is provided as an input prompt to the language model, which generates a response and provides the response as the answer output. The answer output is received from the language model and processed (in block 1560) to determine a representation of the answer (referred to as an answer representation) that can be provided, via a UI, to the user.
[0400] At block 1560, cause a representation of the answer to be presented via a user interface on a computing device. Embodiments of block 1560 process the answer output to determine an answer representation and cause the answer representation to be presented via a user interface on a computing device, such as computing device 1700, described in
[0401] In some implementations, the answer output further includes visualization instructions for presenting the answer representation via a UI. Further, some implementations of the UI comprise a first UI element presenting aspects of the conversation history, which may comprise the transcript of an ongoing conversation, and a second UI element for presenting the answer representation. In some instances, the second UI element is positioned proximate the first UI element so that a user (for example a CSA) can view and interface with the conversation and also view the answer representation including citations, thereby enabling the user to access the passages that are indicated by the citations.
[0402] In some implementations, the UI further provides UI elements to the CSA to take actions with respect to the answer output. For example, a UI element can provide the CSA the option to send the answer output to the customer in response to the most recent message. As another example, a UI element can provide the CSA the option to manually edit the answer output before sending the manually-edited answer output to the customer in response to the most recent message. As another example, a UI element can provide the CSA the option to reject the answer output. In some implementations, the rejection of the answer can be used to re-generate the response by the language model. As another example, a UI element can provide the CSA the option to rate the answer output. In some implementations, the rating can be used to re-generate the response by the language model.
[0403] At block 1570, an answer-augmentation instruction is programmatically generated in response to a selection by a user, such as a CSA. In some implementations, the UI further provides UI elements to the CSA to augment the answer output using a language model. For example, upon selection of a UI element to augment the answer output, an answer-augmentation instruction is programmatically generated and provided to a language model, to cause the language model to produce an augmented-answer output. In various embodiments, the answer-augmentation instruction is generated using the answer output, an augmented-answer format instruction, and one or more of (a) a user-style answer-augmentation instruction to augment the answer output to be similar to the tone or style of the user, such as the CSA that is communicating with the customer; (b) a friendly-style answer-augmentation instruction to augment the answer output to sound more helpful, cordial, and/or friendly; (c) a formal-style answer-augmentation instruction to augment the answer output to sound more formal; (d) a rephrase answer-augmentation instruction to augment the answer output to rephrase the answer output; (e) an expand answer-augmentation instruction to augment the answer output to expand the answer output into a longer response; (f) a translate answer-augmentation instruction to augment the answer output to translate the answer output into a different language; (g) a grammar and/or spelling answer-augmentation instruction to augment the answer output to fix the grammar and/or spelling of the answer output; and other similar answer-augmentation instructions. In some implementations, each of the answer-augmentation instructions can include its own corresponding UI element so that the CSA can select how to augment the answer output using the language model via a UI element. 
In some implementations, the answer-augmentation instruction can be used to augment a manually-written message, such as a response manually drafted by a CSA to a customer, and/or a manually-edited answer output from a language model.
[0404] Some embodiments of block 1570 programmatically generate the answer-augmentation instruction as a user-style answer-augmentation instruction to augment the answer output to be similar to the tone or style of the user, such as the CSA communicating with the customer, using similar messages previously sent by the user. For example, previous conversations between the CSA and customers can be searched to determine messages previously sent by the CSA that are similar to the answer output. Accordingly, a keyword search is performed on messages previously sent by the user to determine a set of messages that are similar to the answer output by computing a string similarity of the answer output to the messages previously sent by the user. The messages previously sent by the user that are sufficiently similar, such as those satisfying a threshold of similarity, are included in the set of messages. In some implementations, all of the messages previously sent by the user are ranked for string similarity and only a certain number of top-ranked messages, corresponding to the most relevant messages previously sent by the user, are included in the set of messages similar to the answer output. In some embodiments, a similarity ranking function, such as the BM25 algorithm using ElasticSearch, can be used to determine the top-ranked messages previously sent by the user.
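The ranking step described above can be sketched in plain Python. This is a minimal illustration of Okapi BM25 scoring, not the ElasticSearch implementation; the function names, the simple tokenizer, and the parameter values `k1=1.2`, `b=0.75`, and `k=5` are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bm25_top_k(answer_output, previous_messages, k=5, k1=1.2, b=0.75):
    """Rank previous_messages against answer_output with Okapi BM25; return the top k."""
    docs = [tokenize(m) for m in previous_messages]
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: number of messages containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(answer_output))

    scored = []
    for message, doc in zip(previous_messages, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, message))

    # Highest score first; drop messages sharing no terms with the answer output.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for s, m in scored[:k] if s > 0]
```

Returning only messages with a positive score plays the role of a minimal similarity threshold; a stricter threshold could be applied to the score instead.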
[0405] Example aspects of a user-style answer-augmentation instruction generated at block 1570 include the following:
[0406]
[0407] Use the following messages from previous conversations for style and tone examples:
[0408] {similar_turns}
[0409] Rewrite the message below in the same style/tone as the messages above using the following format:
[0410]
[0420] Example aspects of an additional instruction for an answer-augmentation instruction, such as a user-style answer-augmentation instruction, generated at block 1570 when images are present in the answer output (or in a manually-written or manually-edited message) include the following:
[0421]
[0422] The customer support message will contain text like \n\n[Image: URL]\n\n. These represent contextually relevant \
[0423] images not words and should be left as-is including the \n\n characters and whatever the URL was.
[0424] For example, \n\n [Image:
[0425] https://media.intercomcdn.com/attachments/1234/5678/image.png]\n\n in the input message \
[0426] should remain as \n\n [Image:
[0427] https://media.intercomcdn.com/attachments/1234/5678/image.png]\n\n in the rewritten \
[0428] message.
[0429] Images in the rewritten message should not be used within sentences.
[0430] All of the images from the input message should be in the rewritten message.
[0431] Each image makes sense within the context of the surrounding input text so needs to be kept with it when rewriting the \
[0432] message.
[0433]
[0434] At block 1580, the augmented-answer output is received from the language model in response to providing the answer-augmentation instruction to the language model. Embodiments of block 1580 are similar to the answer output being received from the language model in block 1550. At block 1590, a representation of the augmented-answer output is caused to be presented via a user interface on a computing device. Embodiments of block 1590 are similar to presenting the representation of the augmented-answer via the UI on the computing device in block 1550. For example, when a CSA selects the UI element corresponding to the user-style answer-augmentation instruction, the user-style answer-augmentation instruction, including (a) the answer output, (b) the top five (5) messages previously sent by the user that are similar to the answer output, (c) instructions to transform the answer into an augmented-answer that matches the style of the top five (5) similar messages previously sent by the user, (d) instructions to the language model that the augmentation should not change the content and the semantics of the reply, only the superficial style, (e) instructions to the language model not to remove any images, and/or any other relevant instructions, is provided as an input prompt to the language model. In response, the language model provides an output comprising an augmented-answer output in a style similar to that of the CSA. The augmented-answer output can then be presented via a graphical user interface to the CSA.
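As an illustration of how items (a) through (e) might be assembled into a single input prompt, the following sketch concatenates them in order. The two template lines are taken from the example instruction for block 1570; the wording of `KEEP_CONTENT` and `KEEP_IMAGES`, the function name, and the `format_instruction` parameter (supplied by the caller) are hypothetical and only approximate the intent described above.

```python
STYLE_HEADER = (
    "Use the following messages from previous conversations "
    "for style and tone examples:"
)
REWRITE_LINE = (
    "Rewrite the message below in the same style/tone as the "
    "messages above using the following format:"
)
# Hypothetical wording for items (d) and (e); only their intent is described.
KEEP_CONTENT = (
    "Do not change the content or meaning of the message, "
    "only its superficial style."
)
KEEP_IMAGES = "Do not remove any images from the message."


def build_user_style_prompt(answer_output, similar_messages,
                            format_instruction, max_examples=5):
    """Assemble a user-style answer-augmentation prompt as one string."""
    # Item (b): keep only the top-ranked similar messages (five by default).
    examples = similar_messages[:max_examples]
    similar_turns = "\n".join(f"- {m}" for m in examples)

    parts = [STYLE_HEADER, similar_turns, REWRITE_LINE,
             format_instruction, KEEP_CONTENT]
    # Item (e): add the image instruction only when an image marker is present.
    if "[Image:" in answer_output:
        parts.append(KEEP_IMAGES)
    # Item (a): the answer output to be rewritten goes last.
    parts.append(answer_output)
    return "\n".join(parts)
```

The returned string would then be sent to the language model by whatever client the deployment uses; that call is omitted here because no particular model interface is specified.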
Example Operating Environments
[0435] Having described an overview of aspects of the technology described herein and various implementations, several example computing environments are provided, in
[0436] First, a natural language corpus (for example, various WIKIPEDIA English words or BooksCorpus) of the inputs 1601 is converted into tokens and then feature vectors and embedded into an input embedding 1602 to derive the meaning of individual natural language words (for example, English semantics) during pre-training. In some embodiments, to understand the English language, corpus documents, such as text books, periodicals, blogs, social media feeds, and the like are ingested by the language model 1600.
[0437] In some embodiments, each word or character in the input(s) 1601 is mapped into the input embedding 1602 in parallel or at the same time, unlike existing long short-term memory (LSTM) models, for example. The input embedding 1602 maps a word to a feature vector representing the word. But the same word (for example, "apple") in different sentences may have different meanings (for example, phone versus fruit). This is why a positional encoder 1604 can be implemented. A positional encoder 1604 is a vector that gives context to words (for example, "apple") based on a position of a word in a sentence. For example, with respect to a message "I just sent the document," because "I" is at the beginning of a sentence, embodiments can indicate a position in an embedding closer to "just," as opposed to "document." Some embodiments use a sine/cosine function to generate the positional encoder vector using the following two example equations:
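One common choice consistent with this description is the sinusoidal encoding PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), where pos is the word position, i indexes a sine/cosine pair, and d is the embedding size. The sketch below assumes that form; the function name, `d_model`, and the base of 10000 are illustrative assumptions rather than values stated here.

```python
import math


def positional_encoding(num_positions, d_model, base=10000.0):
    """Return a num_positions x d_model table of sine/cosine position vectors."""
    table = []
    for pos in range(num_positions):
        row = []
        for j in range(d_model):
            i = j // 2  # index of the sine/cosine pair this dimension belongs to
            angle = pos / (base ** (2 * i / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```

Each row can be added element-wise to the word embedding at the same position, so that nearby positions receive similar vectors and distant positions receive dissimilar ones.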
[0438] After passing the input(s) 1601 through the input embedding 1602 and applying the positional encoder 1604, the output is a word embedding feature vector, which encodes positional information or context based on the positional encoder 1604. These word embedding feature vectors are then passed to the encoder and/or decoder block(s) 1606, where they pass through a multi-head attention layer 1606-1 and a feedforward layer 1606-2. The multi-head attention layer 1606-1 is generally responsible for focusing on or processing certain parts of the feature vectors representing specific portions of the input(s) 1601 by generating attention vectors. For example, in Question-Answering systems, the multi-head attention layer 1606-1 determines how relevant the i-th word (or particular word in a sentence) is for answering the question or relevant to other words in the same or other blocks, the output of which is an attention vector. For every word, some embodiments generate an attention vector, which captures contextual relationships between other words in the same sentence or other sequences of characters. For a given word, some embodiments compute a weighted average or otherwise aggregate attention vectors of other words that contain the given word (for example, other words in the same line or block) to compute a final attention vector.
[0439] In some embodiments, a single-headed attention has abstract vectors Q, K, and V that extract different components of a particular word. These are used to compute the attention vectors for every word, using the following equation (3):
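A widely used form of this computation is scaled dot-product attention, Z = softmax(Q K^T / sqrt(d_k)) V, where d_k is the length of each key vector. The sketch below assumes that form and uses plain Python lists; the function names and the list-of-rows layout (one row per word) are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, one row per word."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this word's query to every word's key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # The attention vector is the weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights for each word sum to one, every attention vector is a weighted average of the value vectors, which matches the aggregation described for the multi-head attention layer 1606-1.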
[0440] For multi-headed attention, there are multiple weight matrices W^q, W^k, and W^v, so there are multiple attention vectors Z for every word. However, a neural network may expect one attention vector per word. Accordingly, another weight matrix, W^z, is used to make sure the output is still one attention vector per word. In some embodiments, after the layers 1606-1 and 1606-2, there is some form of normalization (for example, batch normalization and/or layer normalization) performed to smooth the loss surface, making it easier to optimize while using larger learning rates.
[0441] Layers 1606-3 and 1606-4 represent residual connection and/or normalization layers where normalization re-centers and rescales or normalizes the data across the feature dimensions. The feedforward layer 1606-2 is a feed-forward neural network that is applied to every one of the attention vectors outputted by the multi-head attention layer 1606-1. The feedforward layer 1606-2 transforms the attention vectors into a form that can be processed by the next encoder block or used to make a prediction at 1608. For example, given that a document includes a first natural language sequence "the due date is . . . ," the encoder/decoder block(s) 1606 predicts that the next natural language sequence will be a specific date or particular words based on past documents that include language identical or similar to the first natural language sequence.
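The residual connection and normalization step can be sketched as follows. This is a simplified layer normalization over a single feature vector that omits the learnable gain and bias terms found in many implementations; the function names and the `eps` value are assumptions.

```python
import math


def layer_norm(x, eps=1e-5):
    """Re-center and rescale one feature vector to roughly zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add_and_norm(x, sublayer_out):
    """Residual connection followed by normalization across the feature dimension."""
    return layer_norm([a + b for a, b in zip(x, sublayer_out)])
```

Here `x` stands for the input to a sublayer (attention or feedforward) and `sublayer_out` for that sublayer's output for the same word.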
[0442] In some embodiments, the encoder/decoder block(s) 1606 is pre-trained to learn language and make corresponding predictions. In some embodiments, there is no fine-tuning because some embodiments perform prompt engineering or learning. Pre-training is performed to understand language, and fine-tuning is performed to learn a specific task, such as learning an answer to a set of questions (in Question-Answering [QA] systems).
[0443] In some embodiments, the encoder/decoder block(s) 1606 learns what language and context for a word is in pre-training by training on two unsupervised tasks (Masked Language Model [MLM] and Next Sentence Prediction [NSP]) simultaneously. In terms of the inputs and outputs, at pre-training, the natural language corpus of the inputs 1601 may be various historical documents, such as text books, journals, and periodicals, in order to output the predicted natural language characters in 1608 (not make the predictions at runtime or prompt engineering at this point). The example encoder/decoder block(s) 1606 takes in a sentence, paragraph, or sequence (for example, included in the input[s] 1601), with random words being replaced with masks. The goal is to output the value or meaning of the masked tokens. For example, if a line reads, "please [MASK] this document promptly," the prediction for the mask value is "send." This helps the encoder/decoder block(s) 1606 understand the bidirectional context in a sentence, paragraph, or line of a document. In the case of NSP, the encoder/decoder block(s) 1606 takes, as input, two or more elements, such as sentences, lines, or paragraphs, and determines, for example, if a second sentence in a document actually follows (for example, is directly below) a first sentence in the document. This helps the encoder/decoder block(s) 1606 understand the context across all the elements of a document, not just within a single element. Using both of these together, the encoder/decoder block(s) 1606 derives a good understanding of natural language.
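The masking step of the MLM task can be illustrated with a short sketch that replaces random tokens with a mask and records the original tokens as prediction targets. The function name, the 15% default masking rate, and the `seed` parameter are assumptions; no particular rate is specified above.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Randomly replace tokens with a mask; return (masked tokens, labels)."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)   # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)  # unmasked positions carry no MLM target
    return masked, labels
```

For the sentence "please send this document promptly", masking the second token yields the input "please [MASK] this document promptly" with the target "send" at that position.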
[0444] In some embodiments, during pre-training, the input to the encoder/decoder block(s) 1606 is a set (for example, two) of masked sentences (sentences for which there are one or more masks), which could alternatively be partial strings or paragraphs. In some embodiments, each word is represented as a token, and some of the tokens are masked. Each token is then converted into a word embedding (for example, 1602). At the output side is the binary output for the next sentence prediction. For example, this component may output 1, for example, if masked sentence 2 followed (for example, was directly beneath) masked sentence 1. The outputs are word feature vectors that correspond to the outputs for the machine learning model functionality. Thus, the number of word feature vectors that are input is the same number of word feature vectors that are output.
[0445] In some embodiments, the initial embedding (for example, the input embedding 1602) is constructed from three vectors: the token embeddings, the segment or context-question embeddings, and the position embeddings. In some embodiments, the following functionality occurs in the pre-training phase. The token embeddings are the pre-trained embeddings. The segment embeddings are the sentence numbers (of the sentences that make up the input[s] 1601) encoded into a vector (for example, first sentence, second sentence, and so forth, assuming a top-down and right-to-left approach). The position embeddings are vectors that represent the position of a particular word in such a sentence and that can be produced by positional encoder 1604. When these three embeddings are added or concatenated together, an embedding vector is generated that is used as input into the encoder/decoder block(s) 1606. The segment and position embeddings are used for temporal ordering since all of the vectors are fed into the encoder/decoder block(s) 1606 simultaneously, and language models need some sort of order preserved.
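The combination of the three embeddings can be sketched as below, covering both the element-wise addition and the concatenation options. The function name and the `mode` parameter are illustrative assumptions, and each argument is taken to be a list with one vector per token.

```python
def combine_embeddings(token_vecs, segment_vecs, position_vecs, mode="add"):
    """Combine per-token embeddings element-wise ("add") or end-to-end ("concat")."""
    combined = []
    for tok, seg, pos in zip(token_vecs, segment_vecs, position_vecs):
        if mode == "add":
            # Addition requires all three vectors to have the same length.
            combined.append([t + s + p for t, s, p in zip(tok, seg, pos)])
        elif mode == "concat":
            combined.append(list(tok) + list(seg) + list(pos))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return combined
```

Addition keeps the embedding size fixed, while concatenation triples it when the three vectors have equal length.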
[0446] In pre-training, the output is typically a binary value C (for NSP) and various word vectors (for MLM). With training, a loss (for example, cross-entropy loss) is minimized. In some embodiments, all the feature vectors are of the same size and are generated simultaneously. As such, each word vector can be passed to a fully connected output layer with a number of neurons equal to the number of tokens in the vocabulary.
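As a sketch of the loss mentioned above, the cross-entropy for one masked position can be computed from the output layer's scores (one score per vocabulary token) and the index of the correct token. The function name and the use of raw scores ("logits") as input are assumptions.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy of a softmax over vocabulary logits against the true token index."""
    m = max(logits)
    # log(sum(exp(z))) computed stably by factoring out the largest logit.
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]
```

The loss is small when the score of the correct token dominates the others and grows as probability mass shifts to incorrect tokens.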
[0447] In some embodiments, after pre-training is performed, the encoder/decoder block(s) 1606 performs prompt engineering or fine-tuning on a variety of QA data sets by converting different QA formats into a unified sequence-to-sequence format. For example, some embodiments perform the QA task by adding a new question-answering head or encoder/decoder block, just the way a masked language model head is added (in pre-training) for performing an MLM task, except that the task is a part of prompt engineering or fine-tuning. This includes the encoder/decoder block(s) 1606 processing the inputs 1602 and/or 1628 in order to make the predictions and generate a prompt response, as indicated in 1604. Prompt engineering, in some embodiments, is the process of crafting and optimizing text prompts for language models to achieve desired outputs. In other words, prompt engineering comprises a process of mapping prompts (for example, a question) to the output (for example, an answer) to which each belongs for training. For example, if a user asks a model to generate a poem about a person fishing on a lake, the expectation is that it will generate a different poem each time. Users may then label the outputs or answers from best to worst. Such labels are an input to the model to make sure the model gives more human-like or better answers, while trying to minimize the worst answers (for example, via reinforcement learning). In some embodiments, a prompt as described herein includes one or more of: a request (for example, a question or instruction [for example, "write a poem"]), target content, and one or more examples, as described herein.
[0448] In some embodiments, the inputs 1601 additionally or alternatively include other inputs, such as the inputs to machine learning models described in the embodiments disclosed herein. Alternative to prompt engineering, certain embodiments of inputs represent inputs provided to the encoder/decoder block(s) 1606 at runtime or after the model 1600 has been trained, tested, and deployed. Likewise, in these embodiments, the predictions in the output 1608 represent predictions made at runtime or after the model 1600 has been trained, tested, and deployed.
[0449] With reference to
[0450] Embodiments of the disclosure are described in the general context of computer code or machine-useable instructions, including computer-useable or computer-executable instructions, such as program modules, being executed by a computer or other machine such as a smartphone, a tablet PC, or other mobile device, server, or client device. Generally, program modules, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. Embodiments of the disclosure are practiced in a variety of system configurations, including mobile devices, consumer electronics, general-purpose computers, more specialty computing devices, or the like. Embodiments of the disclosure are also practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
[0451] Some embodiments comprise an end-to-end software-based system that operates within system components described herein to operate computer hardware to provide system functionality. At a low level, hardware processors generally execute instructions selected from a machine language (also referred to as machine code or native) instruction set for a given processor. The processor recognizes the native instructions and performs corresponding low-level functions related to, for example, logic, control, and memory operations. Low-level software written in machine code can provide more complex functionality to higher level software. Accordingly, in some embodiments, computer-executable instructions include any software, including low-level software written in machine code, higher level software such as application software, and any combination thereof. In this regard, the system components can manage resources and provide services for system functionality. Any other variations and combinations thereof are contemplated within the embodiments of the present disclosure.
[0452] With reference to
[0453] With continued reference to
[0454] Computing device 1700 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 1700 and includes both volatile and nonvolatile, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program sub-modules, or other data.
[0455] Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Computer storage media does not comprise a propagated data signal.
[0456] Communication media typically embodies computer-readable instructions, data structures, program sub-modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
[0457] Memory 1712 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory 1712 may be removable, non-removable, or a combination thereof. Exemplary memory includes solid-state memory, hard drives, and optical-disc drives. Computing device 1700 includes one or more processors 1714 that read data from various entities such as bus 1710, memory 1712, or I/O components 1720. Presentation component(s) 1716 present data indications to a user or other device. Exemplary presentation components 1716 include a display device, speaker, printing component, and vibrating component. I/O port(s) 1718 allow computing device 1700 to be logically coupled to other devices including I/O components 1720, some of which may be built in.
[0458] Illustrative I/O components include a microphone, joystick, game pad, satellite dish, scanner, printer, display device, wireless device, a controller (such as a keyboard, and a mouse), a natural user interface (NUI) (such as touch interaction, pen (or stylus) gesture, and gaze detection), and the like. In aspects, a pen digitizer (not shown) and accompanying input instrument (also not shown but which may include, by way of example only, a pen or a stylus) are provided in order to digitally capture freehand user input. The connection between the pen digitizer and processor(s) 1714 may be direct or via a coupling utilizing a serial port, parallel port, and/or other interface and/or system bus known in the art. Furthermore, the digitizer input component may be a component separated from an output component such as a display device, or in some aspects, the usable input area of a digitizer may be coextensive with the display area of a display device, integrated with the display device, or may exist as a separate device overlaying or otherwise appended to a display device. Any and all such variations, and any combination thereof, are contemplated to be within the scope of aspects of the technology described herein.
[0459] A NUI processes air gestures, voice, or other physiological inputs generated by a user. Appropriate NUI inputs may be interpreted as ink strokes for presentation in association with the computing device 1700. These requests may be transmitted to the appropriate network element for further processing. A NUI implements any combination of speech recognition, touch and stylus recognition, facial recognition, biometric recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, and touch recognition associated with displays on the computing device 1700. The computing device 1700 may be equipped with depth cameras, such as stereoscopic camera systems, infrared camera systems, RGB camera systems, and combinations of these, for gesture detection and recognition. Additionally, the computing device 1700 may be equipped with accelerometers or gyroscopes that enable detection of motion. The output of the accelerometers or gyroscopes may be provided to the display of the computing device 1700 to render immersive augmented reality or virtual reality.
[0460] A computing device may include radio(s) 1724. The radio 1724 transmits and receives radio communications. The computing device may be a wireless terminal adapted to receive communications and media over various wireless networks. Computing device 1700 may communicate via wireless protocols, such as code division multiple access (CDMA), global system for mobiles (GSM), or time division multiple access (TDMA), as well as others, to communicate with other devices. The radio communications may be a short-range connection, a long-range connection, or a combination of both a short-range and a long-range wireless telecommunications connection. When we refer to short and long types of connections, we do not mean to refer to the spatial relation between two devices. Instead, we are generally referring to short range and long range as different categories, or types, of connections (i.e., a primary connection and a secondary connection). A short-range connection may include a Wi-Fi connection to a device (e.g., mobile hotspot) that provides access to a wireless communications network, such as a WLAN connection using the 802.11 protocol. A Bluetooth connection to another computing device is a second example of a short-range connection. A long-range connection may include a connection using one or more of CDMA, GPRS, GSM, TDMA, and 802.16 protocols.
[0461] The technology described herein has been described in relation to particular aspects, which are intended in all respects to be illustrative rather than restrictive. The technology described herein is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms step and block may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.