Patent classifications
G10L25/54
VOICE IDENTIFICATION FOR OPTIMIZING VOICE SEARCH RESULTS
Systems and methods are provided for processing a voice input stream with interruptions and/or supplemental comments. Generally, a virtual voice assistant may receive an input stream with a first input comprising a voice query from a first voice and a second input comprising a secondary query from a second voice (e.g., an interruption or a supplement). The virtual assistant may determine that the second voice does not match the first voice, and then process the voice query to produce first results. Some embodiments may determine whether the secondary query is a supplement or an interruption and, e.g., choose to ignore an interruption or set aside a supplement if it may be used to help the search query. In some embodiments, results for the first query may be compared with results for the first query with a portion of the supplement.
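The voice-matching decision in this abstract can be illustrated with a minimal sketch. It assumes each input already carries a fixed-length speaker embedding and compares them by cosine similarity against a threshold. The embedding format, the 0.8 threshold, the `is_supplement` callable, and the handling of a matching voice are all hypothetical choices for illustration, not details taken from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_voice(first_embedding, second_embedding, threshold=0.8):
    """Assumed rule: two inputs share a voice when similarity reaches the threshold."""
    return cosine_similarity(first_embedding, second_embedding) >= threshold


def handle_inputs(first, second, is_supplement, threshold=0.8):
    """Decide what to search for, given a voice query and a secondary input.

    `first` and `second` are dicts with "text" and "embedding" keys.
    `is_supplement` is a hypothetical callable (query_text, secondary_text) -> bool.
    """
    if same_voice(first["embedding"], second["embedding"], threshold):
        # Assumption: the same speaker is simply continuing the query.
        return {"query": first["text"] + " " + second["text"], "set_aside": None}
    if is_supplement(first["text"], second["text"]):
        # Different voice, but possibly helpful: keep it aside for a later comparison.
        return {"query": first["text"], "set_aside": second["text"]}
    # Different voice and unrelated: treat as an interruption and ignore it.
    return {"query": first["text"], "set_aside": None}
```

A set-aside supplement could then be appended to the query in a second search, so that the two result sets can be compared as the abstract describes.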
Server for navigation, navigation system, and navigation method
In a server for navigation that causes a navigation device mounted in a moving object to provide guidance on a moving route and display map data on the basis of the map data and the position of the moving object, evaluation information about a point is acquired automatically from a recognition result of a user's utterance and position information, so an evaluation about the point can be left automatically and safely without requiring any special operation by the user. Further, by adding the evaluation information to the map data and displaying this map data at the time of use, the navigation system enables the user to revisit a point the user wishes to pass through again, bypass a point the user never wishes to pass through again, and so on, without requiring the user to consciously memorize an impression of each point.
METHODS AND APPARATUS TO PERFORM SIGNATURE MATCHING USING NOISE CANCELLATION MODELS TO ACHIEVE CONSENSUS
Examples are disclosed to perform signature matching using noise cancellation models to achieve consensus. Example apparatus disclosed herein include a signature matcher to compare a first stream of monitored media signatures to streams of reference signatures representative of corresponding reference media to determine a first signature match, and compare a second stream of monitored media signatures to the streams of reference signatures to determine a second signature match; a match selector to use at least one of the first signature match or the second signature match to identify a first one of the reference media corresponding to the monitored media data; and a creditor interface to output identification data for the first one of the reference media identified with the at least one of the first signature match or the second signature match, the identification data to be used to credit a media exposure corresponding to the monitored media.
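The signature matcher and match selector described above can be sketched roughly as follows. The sketch assumes signatures are small integers compared position by position with a Hamming-distance tolerance, and that the selector prefers agreement between the two matches and otherwise takes the higher score. These rules, the function names, and the `max_bit_errors` parameter are hypothetical and are not drawn from the patent.

```python
def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


def best_match(monitored, references, max_bit_errors=1):
    """Return (reference_id, score) for the reference stream with the most
    position-aligned signatures within max_bit_errors of the monitored stream.
    Returns (None, 0) when no signature matches any reference."""
    best_id, best_score = None, 0
    for ref_id, ref_sigs in references.items():
        score = sum(
            1 for m, r in zip(monitored, ref_sigs) if hamming(m, r) <= max_bit_errors
        )
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id, best_score


def select_match(first, second):
    """Assumed consensus rule: take the shared answer if both matches agree,
    otherwise the reference with the higher score."""
    if first[0] == second[0]:
        return first[0]
    return max(first, second, key=lambda match: match[1])[0]
```

In use, the first stream might come from the raw monitored audio and the second from a noise-cancelled version of it; each is passed through `best_match` against the same references, and `select_match` picks the media to credit.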
Gathering information for members of call
Responsive to detecting a call from a first user into a call system, a user identifier associated with the first user by the call and user information associated with the user identifier can be added to respective user information stored in an e-meeting data structure associated with the call. Responsive to determining the call system recognizes the voice of the first user, the first user can be identified based on the voice of the first user. Responsive to determining that the user information associated with the user identifier contains information not related to the identified first user, the information not related to the identified first user can be removed from the e-meeting data structure. An e-meeting for the call can be generated, the e-meeting including as participants the first user and at least a second user.
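As a rough illustration of the flow in this abstract, the sketch below models the e-meeting data structure as a dictionary keyed by user identifier. All information tied to an identifier (for example, a shared phone number) is added when the call arrives, and records unrelated to the speaker are removed once the voice has been recognized. The record layout, the `directory` lookup, and the function names are assumptions made for illustration; voice recognition itself is taken as given and supplies `identified_name`.

```python
def add_caller(e_meeting, user_id, directory):
    """Add every record associated with user_id to the e-meeting structure."""
    e_meeting.setdefault(user_id, []).extend(directory.get(user_id, []))


def prune_to_identified_user(e_meeting, user_id, identified_name):
    """Keep only the records that belong to the user identified by voice."""
    e_meeting[user_id] = [
        record for record in e_meeting.get(user_id, [])
        if record["name"] == identified_name
    ]


# Example: two people share one identifier; voice recognition identifies Alice.
directory = {
    "+1-555-0100": [
        {"name": "Alice", "role": "engineer"},
        {"name": "Bob", "role": "manager"},
    ]
}
e_meeting = {}
add_caller(e_meeting, "+1-555-0100", directory)
prune_to_identified_user(e_meeting, "+1-555-0100", "Alice")
```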
Method and system for learning and using latent-space representations of audio signals for audio content-based retrieval
A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which select salient attributes of the signals that represent psychoacoustic differences between the signals.
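Once latent-space representations are comparable, content-based retrieval reduces to a nearest-neighbor search over them. The sketch below assumes each audio signal has already been mapped to a short vector by some trained model (not shown) and ranks a catalog by Euclidean distance to a query vector. The distance metric, the catalog layout, and the example vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_vector, catalog, k=2):
    """Return the names of the k catalog entries closest to the query vector."""
    ranked = sorted(catalog.items(), key=lambda item: euclidean(query_vector, item[1]))
    return [name for name, _ in ranked[:k]]


# Made-up two-dimensional latent vectors for three drum sounds.
catalog = {"kick": [0.9, 0.1], "snare": [0.2, 0.8], "tom": [0.8, 0.2]}
nearest = retrieve([1.0, 0.0], catalog, k=2)
```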
PROCESSING AUDIO SIGNALS
The application describes a data processing system and associated methods for processing received speech data. The data processing system comprises: a classification unit configured to receive data derived from an audio signal and, based on the received data, to determine a classification state of an acoustic environment; wherein access to a subsequent processing unit is controlled based on the classification state of the acoustic environment. The classification state may be derived based on a pre-trained model, wherein the representation comprises a representation of the direct-to-reverberant ratio (DRR) of the audio signal.
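To make the DRR-based gating concrete, the sketch below computes a direct-to-reverberant ratio from a room impulse response and uses a fixed threshold to decide whether a later processing stage may run. This swaps in two simplifications that are not from the patent: the impulse response is assumed to be known (the abstract instead derives a representation from the audio signal), and a plain threshold stands in for the pre-trained model. The window length, the threshold, and the state names are likewise hypothetical.

```python
import math


def direct_to_reverberant_ratio_db(impulse_response, sample_rate, direct_window_ms=2.5):
    """DRR in dB: energy in a short window around the strongest peak (direct path)
    divided by the energy of everything after that window (reverberant tail)."""
    peak = max(range(len(impulse_response)), key=lambda i: abs(impulse_response[i]))
    half = int(sample_rate * direct_window_ms / 1000)
    start, end = max(0, peak - half), peak + half + 1
    direct = sum(x * x for x in impulse_response[start:end])
    reverberant = sum(x * x for x in impulse_response[end:])
    if direct == 0.0:
        raise ValueError("impulse response contains no energy")
    if reverberant == 0.0:
        return math.inf
    return 10 * math.log10(direct / reverberant)


def classify_environment(drr_db, threshold_db=0.0):
    """Assumed rule: a high DRR suggests the talker is close to the microphone."""
    return "near_field" if drr_db >= threshold_db else "far_field"


def allow_subsequent_processing(state):
    """Gate access to the next processing unit on the classification state."""
    return state == "near_field"
```

A caller would compute the ratio, classify it, and only pass the speech data on (for example, to a speaker verification stage) when `allow_subsequent_processing` returns `True`.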