G10L25/78

Privacy device for smart speakers
11711662 · 2023-07-25 ·

Systems, apparatuses, and methods are described for a privacy blocking device configured to prevent receipt, by a listening device, of video and/or audio data until a trigger occurs. A blocker may be configured to prevent receipt of video and/or audio data by one or more microphones and/or one or more cameras of a listening device. The blocker may use the one or more microphones, the one or more cameras, and/or one or more second microphones and/or one or more second cameras to monitor for a trigger. The blocker may process the data. Upon detecting the trigger, the blocker may transmit data to the listening device. For example, the blocker may transmit all or a part of a spoken phrase to the listening device.
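The gate-and-forward behavior described above can be sketched as follows. Everything here (class name, trigger matching on a transcript string) is an illustrative assumption, not the patent's implementation, which operates on raw audio/video:

```python
class PrivacyBlocker:
    """Toy model of a blocker that withholds microphone data from a
    listening device until a trigger phrase occurs. Names and the
    string-based trigger logic are illustrative assumptions."""

    def __init__(self, trigger_phrase):
        self.trigger_phrase = trigger_phrase.lower()
        self.forwarded = []  # data actually released to the listening device

    def process_audio(self, transcript):
        """Monitor locally; forward only the trigger and what follows it,
        keeping everything before the trigger private."""
        idx = transcript.lower().find(self.trigger_phrase)
        if idx == -1:
            return None  # no trigger: nothing reaches the listening device
        released = transcript[idx:]  # all or part of the spoken phrase
        self.forwarded.append(released)
        return released
```

The key property is that audio with no trigger never leaves the blocker; only the triggering phrase (or a part of it) is transmitted onward.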

System and method for conversation-based notification management

A method for dynamic notification management at a head-mounted display (HMD) includes presenting, at the HMD, an augmented reality display; receiving, at a processor controlling the HMD, a notification from an application for display at the HMD; receiving, at the processor at a first time, first sensor data from one or more of a camera or a microphone; determining, based on the first sensor data, a first value of one or more factors associated with a probability that a user of the HMD is currently in a real-world conversation; determining an importance value of the received notification at the first time; and determining whether to display the notification based on a comparison of the first value of the one or more factors relative to the importance value of the received notification.
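The comparison at the heart of the method can be sketched as a weighted score: sensor-derived conversation factors are combined and weighed against the notification's importance. Factor names and weights below are illustrative assumptions, not from the patent:

```python
def should_display(conversation_factors, importance, weights=None):
    """Suppress an HMD notification when weighted evidence that the user
    is in a real-world conversation outweighs the notification's
    importance. Factor names and weights are illustrative assumptions."""
    weights = weights or {"speech_detected": 0.6, "face_in_view": 0.4}
    conversation_score = sum(
        weights.get(name, 0.0) * value
        for name, value in conversation_factors.items()
    )
    return importance >= conversation_score
```

Under this sketch, a routine notification is held back mid-conversation, while a sufficiently important one still gets through.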

Speech endpointing

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing are described. In one aspect, a method includes the action of accessing voice query log data that includes voice queries spoken by a particular user. The actions further include determining, based on that voice query log data, a pause threshold for the particular user. The actions further include receiving, from the particular user, an utterance. The actions further include determining that the particular user has stopped speaking for at least a period of time equal to the pause threshold and, based on that determination, processing the utterance as a voice query.
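One plausible reading is that the per-user threshold is derived statistically from pause durations observed in that user's past queries. The percentile choice below is an illustrative assumption:

```python
def pause_threshold_from_log(pause_durations_s, percentile=0.9):
    """Derive a per-user pause threshold (seconds) from pause durations
    logged within that user's past voice queries. Using the 90th
    percentile is an illustrative assumption, not from the patent."""
    ordered = sorted(pause_durations_s)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def utterance_ended(silence_s, threshold_s):
    """Endpoint once the user has been silent at least as long as the
    learned threshold."""
    return silence_s >= threshold_s
```

A slow, deliberate talker thus earns a longer threshold than a fast one, so the recognizer does not cut them off mid-query.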

SPEECH ENDPOINTING BASED ON WORD COMPARISONS

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
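The two counts and the classification can be sketched directly. The word-level prefix matching and the toy corpus are illustrative assumptions:

```python
def classify_incomplete(transcription, corpus):
    """Classify an utterance as likely incomplete by counting, over a
    collection of text samples, (i) samples matching the transcription
    with no additional terms and (ii) samples matching it but continuing
    with more terms. Word-level prefix matching is an illustrative
    assumption."""
    words = transcription.lower().split()
    exact = 0      # first value: matches with no additional terms
    extended = 0   # second value: matches with one or more additional terms
    for sample in corpus:
        sample_words = sample.lower().split()
        if sample_words[: len(words)] == words:
            if len(sample_words) == len(words):
                exact += 1
            else:
                extended += 1
    return extended > exact  # more continuations => likely incomplete
```

Intuitively, "what is the" almost always continues in real queries, so it is classified as likely incomplete, and the recognizer should keep listening.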

METHODS AND APPARATUS TO OPERATE A MOBILE CAMERA FOR LOW-POWER USAGE
20230237791 · 2023-07-27 ·

Disclosed examples include accessing sensor data; recognizing, by executing an instruction with programmable circuitry, a feature in the sensor data based on a convolutional neural network; and transitioning, by executing an instruction with the programmable circuitry, a mobile device between at least two of motion feature detection, audio feature detection, or camera feature detection after the feature is recognized in the sensor data, the mobile device to operate at a different level of power consumption after the transition than before the transition.
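The staged transitions can be sketched as a small state machine: the device idles in a cheap detection mode and escalates to costlier ones only when a feature is recognized. The recognizer flag stands in for the patent's convolutional neural network, and the power figures and state names are made-up assumptions:

```python
class LowPowerPipeline:
    """Sketch of staged sensing: stay in cheap motion detection and
    escalate to costlier audio, then camera, detection only after a
    feature is recognized. Power figures (mW) and state names are
    illustrative assumptions."""

    POWER_MW = {"motion": 5, "audio": 40, "camera": 300}
    NEXT = {"motion": "audio", "audio": "camera", "camera": "camera"}

    def __init__(self):
        self.state = "motion"

    def power(self):
        return self.POWER_MW[self.state]

    def on_sensor_data(self, feature_recognized):
        """Transition to the next detection stage after recognition, so
        power consumption differs before and after the transition."""
        if feature_recognized:
            self.state = self.NEXT[self.state]
        return self.state
```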

ATTENTION AWARE VIRTUAL ASSISTANT DISMISSAL
20230236676 · 2023-07-27 ·

Systems and processes for operating an intelligent automated assistant are provided. An example process includes initiating a virtual assistant session responsive to receiving user input. In accordance with initiating the virtual assistant session, the process includes determining, based on data obtained using one or more sensors of the electronic device, whether one or more criteria representing expressed user disinterest are satisfied. In accordance with determining that the one or more criteria representing expressed user disinterest are satisfied prior to a first time, the process includes automatically deactivating the virtual assistant session prior to the first time. The first time is defined by a setting of the electronic device. In accordance with determining that the one or more criteria representing expressed user disinterest are not satisfied prior to the first time, the process includes automatically deactivating the virtual assistant session at the first time.
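The two-branch timing rule reduces to a simple function: dismiss at the first disinterest event that precedes the configured timeout, otherwise dismiss at the timeout. The event representation below is an illustrative assumption:

```python
def dismissal_time(disinterest_events_s, timeout_s):
    """Return when the assistant session deactivates: at the first
    disinterest event (e.g., the user looks away) occurring before the
    device-configured timeout, otherwise at the timeout itself. Times
    are seconds since session start; details are illustrative."""
    early = [t for t in disinterest_events_s if t < timeout_s]
    return min(early) if early else timeout_s
```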

NETWORKED DEVICES, SYSTEMS, & METHODS FOR INTELLIGENTLY DEACTIVATING WAKE-WORD ENGINES

In one aspect, a playback device is configured to identify in an audio stream, via a second wake-word engine, a false wake word for a first wake-word engine that is configured to receive as input sound data based on sound detected by a microphone. The first and second wake-word engines are configured according to different sensitivity levels for false positives of a particular wake word. Based on identifying the false wake word, the playback device is configured to (i) deactivate the first wake-word engine and (ii) cause at least one network microphone device to deactivate a wake-word engine for a predetermined amount of time. While the first wake-word engine is deactivated, the playback device is configured to cause at least one speaker to output audio based on the audio stream. After a predetermined amount of time has elapsed, the playback device is configured to reactivate the first wake-word engine.
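The suppression pattern can be sketched with logical ticks in place of wall-clock time: a second, more sensitive engine scans the stream the speaker is about to play, and a hit there deactivates the primary engine for a fixed hold period so the playback itself cannot wake the device. Tick-based timing, the substring check, and all names are illustrative assumptions:

```python
class WakeWordCoordinator:
    """Toy model of false-wake-word suppression: a second engine scans
    the outgoing audio stream; when it spots the wake word there, the
    primary microphone-facing engine is deactivated for a fixed number
    of ticks. Timing and names are illustrative assumptions."""

    def __init__(self, wake_word, hold_ticks):
        self.wake_word = wake_word.lower()
        self.hold_ticks = hold_ticks
        self.reactivate_at = 0  # tick at which the primary engine resumes

    def primary_active(self, now):
        return now >= self.reactivate_at

    def scan_stream(self, chunk, now):
        """Second engine: deactivate the primary if the stream content
        itself contains the wake word (a false wake word)."""
        if self.wake_word in chunk.lower():
            self.reactivate_at = now + self.hold_ticks
```

Playback continues while the primary engine is deactivated, and the engine reactivates automatically once the hold period elapses.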