G10L21/01

System and method for multifaceted singing analysis

A system for multifaceted singing analysis for retrieval of songs or music including singing voices having some relationship in latent semantics with a singing voice included in one particular song or music. A topic analyzing processor uses a topic model to analyze a plurality of vocal symbolic time series obtained for a plurality of musical audio signals. The topic analyzing processor generates a vocal topic distribution for each of the musical audio signals whereby the vocal topic distribution is composed of a plurality of vocal topics each indicating a relationship of one of the musical audio signals with the other musical audio signals. The topic analyzing processor generates a vocal symbol distribution for each of the vocal topics whereby the vocal symbol distribution indicates occurrence probabilities for the vocal symbols. A multifaceted singing analyzing processor performs analysis of singing voices included in musical audio signals, in the multifaceted viewpoint.

System and method for multifaceted singing analysis

A system for multifaceted singing analysis for retrieval of songs or music including singing voices having some relationship in latent semantics with a singing voice included in one particular song or music. A topic analyzing processor uses a topic model to analyze a plurality of vocal symbolic time series obtained for a plurality of musical audio signals. The topic analyzing processor generates a vocal topic distribution for each of the musical audio signals whereby the vocal topic distribution is composed of a plurality of vocal topics each indicating a relationship of one of the musical audio signals with the other musical audio signals. The topic analyzing processor generates a vocal symbol distribution for each of the vocal topics whereby the vocal symbol distribution indicates occurrence probabilities for the vocal symbols. A multifaceted singing analyzing processor performs analysis of singing voices included in musical audio signals, in the multifaceted viewpoint.

Coherent pitch and intensity modification of speech signals

A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.

Coherent pitch and intensity modification of speech signals

A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.

Pitch marking in speech processing

According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal value between the lower limit temporal value and the upper limit temporal value.

Pitch marking in speech processing

According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal value between the lower limit temporal value and the upper limit temporal value.

Vehicle audio transmission control

Methods and systems for controlling audio communications between occupants of a vehicle are provided. In accordance with one embodiment, a system includes an interface and a processor. The interface is configured to at least facilitate receiving a request for sound transmission from a first occupant inside a vehicle to a second occupant inside the vehicle. The processor is coupled to the interface, and is configured to at least facilitate identifying respective locations of the first occupant and the second occupant, and performing the sound transmission with an adjustment for a phase difference based at least in part on the respective locations of the first occupant and the second occupant.

Vehicle audio transmission control

Methods and systems for controlling audio communications between occupants of a vehicle are provided. In accordance with one embodiment, a system includes an interface and a processor. The interface is configured to at least facilitate receiving a request for sound transmission from a first occupant inside a vehicle to a second occupant inside the vehicle. The processor is coupled to the interface, and is configured to at least facilitate identifying respective locations of the first occupant and the second occupant, and performing the sound transmission with an adjustment for a phase difference based at least in part on the respective locations of the first occupant and the second occupant.

AUTOMATIC EMPHASIS OF SPOKEN WORDS
20170148464 · 2017-05-25 ·

Embodiments of the present invention provide systems, methods, and computer storage media directed towards automatic emphasis of spoken words. In one embodiment, a process may begin by identifying, within an audio recording, a word that is to be emphasized. Once identified, contextual and lexical information relating to the emphasized word can be extracted from the audio recording. This contextual and lexical information can be utilized in conjunction with a predictive model to determine a set of emphasis parameters for the identified word. These emphasis parameters can then be applied to the identified word to cause the word to be emphasized. Other embodiments may be described and/or claimed.

PITCH MARKING IN SPEECH PROCESSING
20170117001 · 2017-04-27 ·

According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal value between the lower limit temporal value and the upper limit temporal value.