Patent classifications
G06F2203/0381
Method and device for recognizing speech in vehicle
The present disclosure relates to a method and a device for recognizing speech in a vehicle. The method for recognizing the speech in the vehicle may include collecting one or more types of information, determining which information to link together for speech recognition based on an information processing priority predefined for each type of collected information, analyzing the linked information to perform speech recognition on a signal input through a microphone, and extracting at least one of a wake-up voice or a command voice through the speech recognition to control the vehicle. The present disclosure therefore has the advantage of performing speech recognition more accurately by linking the various types of information collected in the vehicle.
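A minimal sketch of the priority-driven linking step described in this abstract, assuming illustrative information types, priority values, and a wake-up phrase (none of these names come from the patent):

```python
# Hypothetical sketch of priority-driven information linking for in-vehicle
# speech recognition. All type names, priorities, and phrases are invented.

# Predefined processing priority per information type (lower = linked first).
PRIORITY = {
    "microphone_audio": 0,
    "seat_occupancy": 1,
    "camera_gaze": 2,
    "vehicle_speed": 3,
}

def link_information(collected):
    """Order collected items by their predefined type priority so that
    higher-priority signals are analyzed first."""
    return sorted(collected, key=lambda item: PRIORITY.get(item["type"], 99))

def recognize(collected):
    """Analyze the linked information and extract a wake-up or command voice."""
    for item in link_information(collected):
        text = item.get("transcript", "")
        if text.startswith("hey car"):   # wake-up voice
            return ("wake", text)
        if text:                          # command voice
            return ("command", text)
    return (None, None)

print(recognize([
    {"type": "vehicle_speed", "transcript": ""},
    {"type": "microphone_audio", "transcript": "hey car open window"},
]))
```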
Electronic device and control method thereof
An electronic device and method are disclosed. The electronic device includes a foldable display that is at least partially foldable, at least one processor, and at least one memory. The processor implements the method, which includes: detecting that the foldable display changes from an unfolded state to a partially folded state; configuring a first region of the foldable display to accept touch inputs in the partially folded state; and configuring a second region of the foldable display to accept non-touch inputs in the partially folded state.
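A toy sketch of the state-dependent region configuration, with invented region names and a simplified state model (not the patent's actual API):

```python
# Hypothetical sketch: reconfigure display regions when the fold state changes.
from dataclasses import dataclass, field

@dataclass
class Region:
    name: str
    accepts_touch: bool = True

@dataclass
class FoldableDisplay:
    state: str = "unfolded"   # "unfolded" | "partially_folded"
    regions: dict = field(default_factory=lambda: {
        "first": Region("first"), "second": Region("second")})

    def on_fold_change(self, new_state: str) -> None:
        self.state = new_state
        if new_state == "partially_folded":
            # Per the abstract: the first region keeps touch input; the
            # second region switches to non-touch input in the folded state.
            self.regions["first"].accepts_touch = True
            self.regions["second"].accepts_touch = False

d = FoldableDisplay()
d.on_fold_change("partially_folded")
print({r.name: r.accepts_touch for r in d.regions.values()})
```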
Content size adjustment
One embodiment provides a method, including: receiving, at an information handling device, an indication to display content at a first size on a display screen, wherein only a portion of the content is viewable at the first size on the display screen; detecting, at the information handling device, a resize gesture; and adjusting, responsive to the resize gesture, a size of the content from the first size to a second size, wherein more than the portion of the content is viewable at the second size on the display screen. Other aspects are described and claimed.
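A small numeric sketch of the resize step this abstract describes, with illustrative screen and content widths (the gesture detection itself is out of scope here):

```python
# Hypothetical sketch: adjust content size in response to a resize gesture
# so that more of the content is viewable. All numbers are illustrative.

def visible_fraction(content_width: float, screen_width: float) -> float:
    """Fraction of the content that fits on the screen at its current size."""
    return min(1.0, screen_width / content_width)

def apply_resize_gesture(content_width: float, scale: float) -> float:
    """Scale the content (scale < 1 shrinks it) and return the new width."""
    return content_width * scale

screen, first_size = 1080.0, 2400.0           # only part of the content fits
print(visible_fraction(first_size, screen))   # -> 0.45
second_size = apply_resize_gesture(first_size, scale=0.5)
print(visible_fraction(second_size, screen))  # -> 0.9 (more is viewable)
```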
Wearable speech input-based vision to audio interpreter
An eyewear device provides camera-based compensation that improves the user experience for users having partial or complete blindness. The camera-based compensation determines features, such as objects, and then converts the determined objects to audio that is indicative of the objects and perceptible to the eyewear user. The camera-based compensation may use a region-based convolutional neural network (RCNN) to generate a feature map including text that is indicative of objects in images captured by a camera. The feature map is then processed through a text-to-speech algorithm featuring a natural language processor to generate audio indicative of the objects in the processed images.
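A hedged pipeline sketch of the detect-then-speak flow, using torchvision's Faster R-CNN as one concrete region-based CNN (the patent does not name a specific model) and a stubbed text-to-audio step:

```python
# Hypothetical pipeline: detect objects in a camera frame and speak them
# aloud. Faster R-CNN stands in for the patent's RCNN; TTS is stubbed out.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

COCO_NAMES = {1: "person", 2: "bicycle", 3: "car"}   # abbreviated label map

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_objects(frame: torch.Tensor, threshold: float = 0.8) -> list[str]:
    """Return text labels for confidently detected objects in one frame."""
    with torch.no_grad():
        out = model([frame])[0]
    return [COCO_NAMES.get(int(label), "object")
            for label, score in zip(out["labels"], out["scores"])
            if score > threshold]

def speak(labels: list[str]) -> None:
    """Stub for the text-to-audio step (a real system would call a TTS engine)."""
    print("audio:", ", ".join(labels) or "nothing detected")

speak(detect_objects(torch.rand(3, 480, 640)))   # random frame, likely empty
```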
Enhanced graphical user interface for voice communications
Enhanced graphical user interfaces for transcription of audio and video messages are disclosed. Audio data may be transcribed, and the transcription may include emphasized words and/or punctuation corresponding to emphasis of user speech. Additionally, the transcription may be translated into a second language. A message spoken by a user depicted in one or more images of video data may also be transcribed and provided to one or more devices.
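A toy sketch of rendering a transcription with emphasis and punctuation derived from speech signals; the cues used here (per-word relative volume, pause length) and their thresholds are invented for illustration:

```python
# Hypothetical sketch: emphasize loud words and punctuate after long pauses.

def render_transcript(words):
    """words: list of (token, relative_volume, pause_after_seconds)."""
    out = []
    for token, volume, pause in words:
        token = token.upper() if volume > 1.5 else token   # emphasized word
        out.append(token + ("." if pause > 0.6 else ""))   # punctuation
    return " ".join(out)

print(render_transcript([
    ("call", 1.0, 0.1), ("me", 1.0, 0.1), ("now", 1.9, 0.8),
]))   # -> "call me NOW."
```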
System for gaze interaction
The present invention provides improved methods and systems for assisting a user interacting with a graphical user interface by combining gaze-based input with gesture-based user commands. The systems, devices, and methods enable a user of a computer system without a traditional touch-screen to interact with graphical user interfaces in a touch-screen-like manner by combining gaze-based input and gesture-based user commands. The invention also offers gaze-and-gesture interaction as a complement or an alternative to touch-screen interaction on a device that has a touch-screen, for instance where the touch-screen is arranged ergonomically unfavourably for the user, or where it is simply more comfortable for the user to interact through gaze and gesture than through the touch-screen.
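A minimal sketch of the core combination: gaze selects the target, the gesture supplies the command. Element names, coordinates, and gesture labels are all invented:

```python
# Hypothetical sketch: combine a gaze-based target with a gesture-based
# command to emulate a touch interaction.

UI_ELEMENTS = {"play_button": (100, 200), "volume_slider": (300, 200)}

def element_under_gaze(gaze_xy, radius=50):
    """Pick the UI element closest to the gaze point, if close enough."""
    x, y = gaze_xy
    name, (ex, ey) = min(UI_ELEMENTS.items(),
                         key=lambda kv: (kv[1][0] - x)**2 + (kv[1][1] - y)**2)
    return name if (ex - x)**2 + (ey - y)**2 <= radius**2 else None

def handle_interaction(gaze_xy, gesture):
    """Gaze selects the target; the gesture supplies the command."""
    target = element_under_gaze(gaze_xy)
    if target and gesture == "tap":
        return f"activate {target}"   # behaves like touching the element
    return "no action"

print(handle_interaction((105, 195), "tap"))   # -> "activate play_button"
```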
System and method for context driven voice interface in handheld wireless mobile devices
A sequence of context-based search verbs and search terms is selected, via either touch or voice, in a mobile wireless device, and a human-articulated voice query is then expanded using a culture dictionary and a world-intelligence dictionary to conduct more efficient searches. Focus groups are used to populate prior-query search databases, which are organized by context-based search terms and stored in the mobile wireless device for efficient search.
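A toy sketch of the query-expansion step; the contents of the "culture" and "world intelligence" dictionaries below are invented stand-ins for the patent's dictionaries:

```python
# Hypothetical sketch: expand a spoken query before searching.

CULTURE_DICT = {"footy": "football"}                      # slang -> canonical
WORLD_DICT = {"eiffel": ["eiffel tower", "paris landmark"]}  # world knowledge

def expand_query(spoken: str) -> list[str]:
    """Normalize culture-specific terms, then add world-knowledge synonyms."""
    terms = [CULTURE_DICT.get(w, w) for w in spoken.lower().split()]
    expanded = list(terms)
    for w in terms:
        expanded.extend(WORLD_DICT.get(w, []))
    return expanded

print(expand_query("find footy near eiffel"))
# -> ['find', 'football', 'near', 'eiffel', 'eiffel tower', 'paris landmark']
```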
System and method for continuous multimodal speech and gesture interaction
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing multimodal input. A system configured to practice the method continuously monitors an audio stream associated with a gesture input stream, and detects a speech event in the audio stream. Then the system identifies a temporal window associated with a time of the speech event, and analyzes data from the gesture input stream within the temporal window to identify a gesture event. The system processes the speech event and the gesture event to produce a multimodal command. The gesture in the gesture input stream can be directed to a display, but is remote from the display. The system can analyze the data from the gesture input stream by calculating an average of gesture coordinates within the temporal window.
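A direct sketch of the windowing computation the abstract names: averaging the gesture coordinates that fall within a temporal window around the speech event. The window size is an invented parameter:

```python
# Sketch: average the gesture coordinates inside a temporal window
# centered on the speech event time.

def gesture_in_window(gesture_stream, speech_time, window=0.5):
    """gesture_stream: list of (timestamp, x, y) samples.
    Returns the average (x, y) of samples within +/- window seconds of the
    speech event, or None if no samples fall inside the window."""
    inside = [(x, y) for t, x, y in gesture_stream
              if abs(t - speech_time) <= window]
    if not inside:
        return None
    n = len(inside)
    return (sum(x for x, _ in inside) / n, sum(y for _, y in inside) / n)

stream = [(0.9, 100, 210), (1.1, 110, 190), (3.0, 500, 500)]
print(gesture_in_window(stream, speech_time=1.0))   # -> (105.0, 200.0)
```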
Context-aware augmented reality object commands
Embodiments are disclosed that relate to operating a user interface on an augmented reality computing device comprising a see-through display system. For example, one disclosed embodiment includes receiving a user input selecting an object in a field of view of the see-through display system, determining a first group of commands currently operable based on one or more of an identification of the selected object and a state of the object, and presenting the first group of commands to a user. The method may further include receiving a command from the first group of commands, changing the state of the selected object from a first state to a second state in response to the command, determining a second group of commands based on the second state, where the second group of commands is different from the first group of commands, and presenting the second group of commands to the user.
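A compact sketch of the state-dependent command groups; the object type, states, and commands below are invented for illustration:

```python
# Hypothetical sketch: the command group shown to the user depends on the
# selected object's state, and executing a command can change that state.

COMMANDS = {
    ("video_player", "paused"):  ["play", "rewind", "close"],
    ("video_player", "playing"): ["pause", "volume up", "close"],
}

TRANSITIONS = {("paused", "play"): "playing", ("playing", "pause"): "paused"}

def commands_for(obj_type, state):
    return COMMANDS.get((obj_type, state), [])

state = "paused"
print(commands_for("video_player", state))   # first group of commands
state = TRANSITIONS[(state, "play")]         # user issues "play"
print(commands_for("video_player", state))   # second, different group
```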
Reducing the need for manual start/end-pointing and trigger phrases
Systems and processes for selectively processing and responding to a spoken user input are provided. In one example, audio input containing a spoken user input can be received at a user device. The spoken user input can be identified from the audio input by identifying start and end-points of the spoken user input. It can be determined whether or not the spoken user input was intended for a virtual assistant based on contextual information. The determination can be made using a rule-based system or a probabilistic system. If it is determined that the spoken user input was intended for the virtual assistant, the spoken user input can be processed and an appropriate response can be generated. If it is instead determined that the spoken user input was not intended for the virtual assistant, the spoken user input can be ignored and/or no response can be generated.
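A hedged sketch of the rule-based branch of the intent determination; the contextual features and the rules themselves are invented for illustration:

```python
# Hypothetical sketch of a rule-based system deciding whether a spoken
# input was intended for the virtual assistant. Features are invented.

def intended_for_assistant(ctx: dict) -> bool:
    """ctx holds contextual signals, e.g. whether the user faces the device,
    time since the assistant's last reply, and whether others are present."""
    if ctx.get("user_facing_device") and ctx.get("seconds_since_reply", 99) < 10:
        return True    # likely a follow-up addressed to the assistant
    if ctx.get("other_person_present") and not ctx.get("user_facing_device"):
        return False   # likely a side conversation: ignore
    return ctx.get("seconds_since_reply", 99) < 3

audio = {"user_facing_device": True, "seconds_since_reply": 4}
print("respond" if intended_for_assistant(audio) else "ignore")  # -> respond
```

A probabilistic variant, which the abstract also mentions, would replace these rules with a score (e.g., a classifier over the same contextual features) compared against a threshold.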