Patent classifications
G10L15/28
SOUND SIGNAL PROCESSING DEVICE AND SOUND SIGNAL PROCESSING METHOD
A sound signal processing device includes a plurality of sound signal processing units and a plurality of buffers. Each of the plurality of sound signal processing units belongs to one of a plurality of groups and executes a predetermined process by using a supplied sound signal. The plurality of buffers are provided with mutually different buffer sizes where each of the plurality of buffers is associated with one of the groups and supplies an accumulated sound signal to a sound signal processing unit that belongs to an associated group.
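The group-to-buffer arrangement described above can be illustrated with a minimal sketch. The names `GroupBuffer` and `dispatch`, the group labels, and the block sizes are hypothetical and are not taken from the patent; the sketch only shows buffers of different sizes each feeding the processing units of one group.

```python
class GroupBuffer:
    """Accumulates samples and releases them in blocks of a fixed size."""

    def __init__(self, size):
        self.size = size
        self.pending = []

    def push(self, samples):
        """Append samples and return every full block now available."""
        self.pending.extend(samples)
        blocks = []
        while len(self.pending) >= self.size:
            blocks.append(self.pending[:self.size])
            self.pending = self.pending[self.size:]
        return blocks


def dispatch(signal, chunk, buffers, units):
    """Feed `signal` in chunks to one buffer per group.

    buffers: dict mapping group name -> GroupBuffer (sizes may differ)
    units:   list of (unit name, group name, callable) tuples (assumed layout)
    Returns a dict mapping unit name -> list of results, one per block.
    """
    outputs = {name: [] for name, _, _ in units}
    for start in range(0, len(signal), chunk):
        piece = signal[start:start + chunk]
        for group, buf in buffers.items():
            for block in buf.push(piece):
                # Only units that belong to this buffer's group see the block.
                for name, unit_group, process in units:
                    if unit_group == group:
                        outputs[name].append(process(block))
    return outputs
```

In this sketch a unit in a small-buffer group is called often on short blocks, while a unit in a large-buffer group is called less often on longer blocks.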
Edge appliance to provide conversational artificial intelligence based software agents
In some aspects, an edge appliance is placed in an active mode and causes a software agent that is based on a machine learning algorithm to engage in a conversation to take an order from a customer that is located at an order post. The edge appliance provides, using a communication interface, audio data that includes the conversation, to a communication system of a restaurant. The edge appliance provides, using the communication interface, a content of a cart associated with the order to a point-of-sale terminal of the restaurant. If the edge appliance determines, using the communication interface, that a microphone of the communication system is receiving audio input from an employee, the edge appliance automatically transitions from the active mode to an override mode, enabling the employee to receive a remainder of the order from the customer.
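The mode transition described above can be sketched as a small state machine. The class and method names (`EdgeAppliance`, `on_customer_item`, `on_employee_mic`) are hypothetical, and the sketch leaves out the audio, communication interface, and point-of-sale parts entirely.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    OVERRIDE = "override"


class EdgeAppliance:
    """Toy model: the agent fills the cart until an employee takes over."""

    def __init__(self):
        self.mode = Mode.ACTIVE
        self.cart = []

    def on_customer_item(self, item):
        """Return who handles the item; the agent only acts in active mode."""
        if self.mode is Mode.ACTIVE:
            self.cart.append(item)
            return "agent"
        return "employee"

    def on_employee_mic(self, receiving_audio):
        """Switch to override mode once the employee microphone is in use."""
        if receiving_audio and self.mode is Mode.ACTIVE:
            self.mode = Mode.OVERRIDE
```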
Multithreaded Speech-to-Text Processing
An apparatus includes a processor to: receive a request to perform speech-to-text conversion of a speech data set; perform pause detection to identify a set of likely sentence pauses and/or a speaker diarization technique to identify a set of likely speaker changes; based on the set of likely sentence pauses and/or the set of likely speaker changes, divide the speech data set into data segments representing speech segments; use an acoustic model with the data segments to derive sets of probabilities of speech sounds uttered; store the sets of probabilities in temporal order within a buffer queue; distribute the sets of probabilities from the buffer queue in temporal order among threads of a thread pool; and within each thread, and based on set(s) of probabilities, derive one candidate word and select either the candidate word or an alternate candidate word derived from a language model as the next word most likely spoken.
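The final stage described above, distributing probability sets among the threads of a thread pool and choosing between an acoustic candidate and a language-model alternate, can be sketched roughly as follows. This is a simplification under stated assumptions: each probability set is treated as a mapping from candidate word to probability, the language-model alternates are supplied as a precomputed list, and the confidence threshold rule is invented for illustration. None of these details come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def pick_word(probs, lm_alternate, threshold=0.6):
    """Keep the acoustic candidate if it is confident enough (assumed rule),
    otherwise fall back to the language-model alternate."""
    word, p = max(probs.items(), key=lambda kv: kv[1])
    return word if p >= threshold else lm_alternate


def transcribe(prob_sets, lm_alternates, workers=4):
    """Process the queued probability sets on a thread pool.

    Executor.map returns results in input order, so the words come back
    in temporal order even though the work is spread across threads.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pick_word, prob_sets, lm_alternates))
```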
MULTI-DEVICE WAKEWORD DETECTION
A method for selecting a device for audio processing may involve receiving a first wakeword confidence metric from a first device that includes at least a first microphone and receiving a second wakeword confidence metric from a second device that includes at least a second microphone. The first and second wakeword confidence metrics may correspond to a first local maximum of a first plurality of wakeword confidence values determined by the first device and a second local maximum of a second plurality of wakeword confidence values determined by the second device. The method may involve comparing the first wakeword confidence metric and the second wakeword confidence metric and selecting a device for subsequent audio processing based, at least in part, on a comparison of the first wakeword confidence metric and the second wakeword confidence metric.
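The selection step described above can be sketched in two parts: each device reduces its own confidence values to a local maximum, and an arbiter compares the reported metrics. The function names and the tie-breaking behaviour are assumptions made for this sketch, not details from the patent.

```python
def local_maximum(values):
    """Device side: return the first value that is higher than the value
    before it and at least as high as the value after it. Falls back to
    the overall maximum when no interior peak exists (assumed behaviour)."""
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] >= values[i + 1]:
            return values[i]
    return max(values)


def select_device(metrics):
    """Arbiter side: pick the device that reported the highest metric.

    metrics: dict mapping device name -> wakeword confidence metric.
    Ties go to the device listed first (assumed behaviour).
    """
    return max(metrics, key=metrics.get)
```

A device would call `local_maximum` on its own stream of confidence values and send only the resulting metric, so the arbiter compares one number per device.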
Detection of speech
A method of own voice detection is provided for a user of a device. A first signal is detected, representing air-conducted speech using a first microphone of the device. A second signal is detected, representing bone-conducted speech using a bone-conduction sensor of the device. The first signal is filtered to obtain a component of the first signal at a speech articulation rate, and the second signal is filtered to obtain a component of the second signal at the speech articulation rate. The component of the first signal at the speech articulation rate and the component of the second signal at the speech articulation rate are compared, and it is determined that the speech has not been generated by the user of the device, if a difference between the component of the first signal at the speech articulation rate and the component of the second signal at the speech articulation rate exceeds a threshold value.
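The comparison described above can be sketched as follows. The abstract does not say how the filtering is done; this sketch swaps in a single-bin discrete Fourier transform of the rectified signal as a stand-in for the filter, and the 5 Hz articulation rate and the 0.1 threshold are illustrative assumptions only.

```python
import math


def component_at(signal, sample_rate, freq_hz):
    """Magnitude of the rectified signal's envelope at freq_hz.

    Uses a single-bin DFT as a stand-in for the filter in the abstract.
    """
    env = [abs(x) for x in signal]
    mean = sum(env) / len(env)
    re = im = 0.0
    for n, e in enumerate(env):
        angle = 2 * math.pi * freq_hz * n / sample_rate
        re += (e - mean) * math.cos(angle)
        im += (e - mean) * math.sin(angle)
    return 2 * math.hypot(re, im) / len(env)


def is_own_voice(air, bone, sample_rate, rate_hz=5.0, threshold=0.1):
    """Return False when the two components differ by more than the
    threshold, i.e. the speech is judged not to come from the wearer."""
    diff = abs(component_at(air, sample_rate, rate_hz)
               - component_at(bone, sample_rate, rate_hz))
    return diff <= threshold
```

When the wearer speaks, both sensors carry the same slow amplitude modulation and the difference stays small. When someone else speaks, the bone-conduction signal carries little of it and the difference grows.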
METHODS AND SYSTEMS FOR REDUCING LATENCY IN AUTOMATED ASSISTANT INTERACTIONS
Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
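The decision described above can be sketched as a small planning function. The latency prediction model is replaced here by any callable that returns seconds, and the function name, the one-second threshold, and the placeholder for the responsive content are all hypothetical.

```python
def plan_rendering(command, predict_latency, precached, threshold_s=1.0):
    """Return the ordered list of things to render audibly.

    predict_latency: callable standing in for the latency prediction model
    precached:       dict mapping command -> pre-cached filler content
    If the predicted latency exceeds the (assumed) threshold, the
    pre-cached content tailored to the command is rendered first.
    """
    steps = []
    if predict_latency(command) > threshold_s and command in precached:
        steps.append(precached[command])
    # Placeholder for the content that actually answers the utterance.
    steps.append("<content for " + command + ">")
    return steps
```

For example, a slow command such as a weather lookup would yield the filler phrase followed by the responsive content, while a fast command would yield only the responsive content.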
Discrete Three-Dimensional Processor
A discrete three-dimensional (3-D) processor comprises communicatively coupled first and second dice. The first die comprises 3-D memory (3D-M) arrays, whereas the second die comprises at least a non-memory circuit and at least an off-die peripheral-circuit component of the 3D-M arrays. The first die does not comprise said off-die peripheral-circuit component. The non-memory circuit on the second die is not part of a memory.