Patent classifications
G10L21/0356
METHODS AND SYSTEMS FOR IMAGE AND VOICE PROCESSING
Systems and methods are disclosed configured to train an autoencoder using images that include faces, wherein the autoencoder comprises an input layer, an encoder configured to output a latent image from a corresponding input image, and a decoder configured to attempt to reconstruct the input image from the latent image. An image sequence of a face exhibiting a plurality of facial expressions and transitions between facial expressions is generated and accessed. Images of the plurality of facial expressions and transitions between facial expressions are captured from a plurality of different angles and using different lighting. An autoencoder is trained using source images that include the face with different facial expressions captured at different angles with different lighting, and using destination images that include a destination face. The trained autoencoder is used to generate an output where the likeness of the face in the destination images is swapped with the likeness of the source face, while preserving expressions of the destination face.
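The encoder, latent representation, and decoder described in the abstract above can be illustrated with a toy model. The following is a minimal sketch, not the patented system: it assumes a purely linear autoencoder, short numeric vectors standing in for face images, and plain gradient descent on squared reconstruction error. The class name `LinearAutoencoder` and all parameters are hypothetical, and the face-swapping step itself is not modeled.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LinearAutoencoder:
    """Toy autoencoder: the encoder maps an input to a smaller latent vector,
    and the decoder attempts to reconstruct the input from that latent vector."""

    def __init__(self, n_inputs, n_latent, seed=0):
        rng = random.Random(seed)
        # Small random weights (an assumption; any small initialisation works here).
        self.enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                    for _ in range(n_latent)]
        self.dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_latent)]
                    for _ in range(n_inputs)]

    def encode(self, x):
        return matvec(self.enc, x)

    def decode(self, z):
        return matvec(self.dec, z)

    def loss(self, x):
        """Squared reconstruction error for one input."""
        reconstruction = self.decode(self.encode(x))
        return sum((r - t) ** 2 for r, t in zip(reconstruction, x))

    def train_step(self, x, lr=0.05):
        """One gradient-descent step on the reconstruction error of x."""
        z = self.encode(x)
        err = [r - t for r, t in zip(self.decode(z), x)]
        # Gradient with respect to the latent vector, using the decoder
        # weights as they were before this step's update.
        grad_z = [sum(self.dec[i][j] * err[i] for i in range(len(x)))
                  for j in range(len(z))]
        for i in range(len(x)):
            for j in range(len(z)):
                self.dec[i][j] -= lr * 2 * err[i] * z[j]
        for j in range(len(z)):
            for i in range(len(x)):
                self.enc[j][i] -= lr * 2 * grad_z[j] * x[i]
```

Calling `train_step` repeatedly over a set of input vectors should reduce the value returned by `loss`. A real system of the kind described would use image tensors and a nonlinear network rather than this linear toy.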
ELECTRONIC DEVICE WITH AUDIO ZOOM AND OPERATING METHOD THEREOF
An electronic device is provided. The electronic device includes a camera, a plurality of microphones, and at least one processor electrically coupled with the camera and the plurality of microphones. The at least one processor may acquire a video signal via the camera based on a designated zoom level, acquire a plurality of audio signals respectively via the plurality of microphones while acquiring the video signal, identify a first signal characteristic of a first audio signal acquired via a first microphone and a second signal characteristic of a second audio signal acquired via a second microphone among the plurality of microphones, derive a control parameter for signal processing for the first audio signal and the second audio signal, based on the designated zoom level, the first signal characteristic, and the second signal characteristic, and perform audio signal processing including beamforming using the first audio signal and the second audio signal, based on the derived control parameter. Various other embodiments are also possible.
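One way to picture the control-parameter step above is as a blend weight between an unprocessed microphone signal and a beamformed signal. The sketch below is illustrative only and rests on several assumptions not stated in the abstract: the signal characteristic is taken to be RMS level, the control parameter is a weight in [0, 1], and the beamformer is a basic two-microphone delay-and-sum. All function names and parameters are hypothetical.

```python
from math import sqrt


def rms(signal):
    """Root-mean-square level, used here as the assumed signal characteristic."""
    return sqrt(sum(s * s for s in signal) / len(signal))


def derive_beam_weight(zoom_level, max_zoom, rms_a, rms_b):
    """Hypothetical control parameter: grows with zoom level, and shrinks when
    the two microphone levels are badly mismatched (e.g. one mic is covered)."""
    zoom_term = min(max(zoom_level / max_zoom, 0.0), 1.0)
    louder = max(rms_a, rms_b)
    balance = min(rms_a, rms_b) / louder if louder > 0 else 0.0
    return zoom_term * balance


def delay_and_sum(mic_a, mic_b, delay_samples):
    """Basic two-microphone beamformer: delay mic_b by a non-negative number
    of samples, then average it with mic_a."""
    delayed_b = [0.0] * delay_samples + list(mic_b[:len(mic_b) - delay_samples])
    return [(a + b) / 2 for a, b in zip(mic_a, delayed_b)]


def audio_zoom(mic_a, mic_b, zoom_level, max_zoom, delay_samples=0):
    """Blend the raw first-microphone signal with the beamformed signal."""
    weight = derive_beam_weight(zoom_level, max_zoom, rms(mic_a), rms(mic_b))
    beam = delay_and_sum(mic_a, mic_b, delay_samples)
    return [(1 - weight) * a + weight * b for a, b in zip(mic_a, beam)]
```

With this weighting, a low zoom level leaves the output close to the first microphone's signal, while a high zoom level with well-matched microphones relies mostly on the beamformed signal.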
METHOD, SYSTEM AND STORAGE MEDIUM FOR SIGNAL SEPARATION
Methods, systems, and storage media for separating a target signal from noise are disclosed. A method comprises providing a plurality of input signals, each of the plurality of input signals comprising the target signal; synchronizing the plurality of input signals; and separating the plurality of synchronized input signals into the target signal and the noise.
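The synchronize-then-separate flow above can be sketched under simplifying assumptions. Here the per-input delays are assumed to be already known as sample offsets, and separation is done by averaging the synchronized inputs, a stand-in technique chosen for brevity because uncorrelated noise tends to cancel in the mean. The abstract does not specify either choice, and the names `synchronize` and `separate` are hypothetical.

```python
def synchronize(signals, offsets):
    """Drop the first offsets[k] samples of signal k so that the target starts
    at the same index in every input, then trim all inputs to a common length."""
    shifted = [list(sig[off:]) for sig, off in zip(signals, offsets)]
    common_length = min(len(sig) for sig in shifted)
    return [sig[:common_length] for sig in shifted]


def separate(aligned):
    """Estimate the target as the sample-wise mean of the synchronized inputs.
    The per-input noise estimate is what remains after subtracting the target."""
    count = len(aligned)
    target = [sum(column) / count for column in zip(*aligned)]
    noise = [[s - t for s, t in zip(sig, target)] for sig in aligned]
    return target, noise
```

For example, two recordings of the same source with offsets of 0 and 2 samples would first be trimmed so that they line up, after which the mean serves as the target estimate and the residuals as the noise estimates.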
Contextual center-of-gravity for audio output in collaborative environments
Embodiments of systems and methods for using contextual center-of-gravity for outputting audio in collaborative environments are described. In some embodiments, an Information Handling System (IHS) may include: a processor; and a memory coupled to the processor, the memory having program instructions stored thereon that, upon execution by the processor, cause the IHS to: identify a first position of a first participant and a second position of a second participant during a collaboration session; calculate a Center-of-Gravity (CoG) based, at least in part, upon the first and second positions; and output audio during the collaboration session with a level determined based upon the CoG.
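As a rough illustration of the Center-of-Gravity calculation described above, the sketch below computes a (optionally weighted) centroid of participant positions in two dimensions and maps it to an output level. The mapping from CoG to level is an assumption made for this example: the level rises linearly with the distance between the CoG and a loudspeaker, up to a cap. The function names, the weights, and the constants are all hypothetical.

```python
from math import hypot


def center_of_gravity(positions, weights=None):
    """Weighted centroid of (x, y) participant positions.
    With no weights given, every participant counts equally."""
    if not positions:
        raise ValueError("at least one participant position is required")
    if weights is None:
        weights = [1.0] * len(positions)
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, positions)) / total
    y = sum(w * p[1] for w, p in zip(weights, positions)) / total
    return (x, y)


def output_level(cog, speaker_pos, base_level=0.5, gain_per_meter=0.1,
                 max_level=1.0):
    """Assumed mapping: play louder the farther the CoG is from the loudspeaker,
    capped at max_level."""
    distance = hypot(cog[0] - speaker_pos[0], cog[1] - speaker_pos[1])
    return min(max_level, base_level + gain_per_meter * distance)
```

Weights could represent context such as which participant is currently speaking, although the abstract leaves the nature of that context open.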
FACIAL RECOGNITION METHOD FOR VIDEO CONFERENCE AND SERVER USING THE METHOD
A facial recognition method for video conferencing, which requires reduced bandwidth and transmits video and audio frames synchronously, first determines whether a 3D body model of a first user at a local end has already been retrieved or is otherwise retrievable from a historical database. Multiple audio frames of the first user are collected, and audio frequencies in a specific range are filtered out. An envelope curve of the audio frames, together with multiple attack time periods and multiple release time periods of the envelope curve, is calculated and correlated with lip movements of the first user. Information packets of the same, along with head-rotating and limb-swinging images of the first user, are transmitted to a remote second user so that the 3D body model can simulate and show lip shapes and other movements of the first user.
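The envelope-curve step above can be sketched as follows. This is an illustrative reading rather than the claimed method: the envelope is assumed to be the peak absolute amplitude per fixed-size frame, an attack period is assumed to be a run of frames where the envelope rises, and a release period a run where it falls. The function names and the frame-based definition are hypothetical.

```python
def envelope(samples, frame_size):
    """Peak absolute amplitude of each consecutive frame of samples."""
    return [max(abs(s) for s in samples[i:i + frame_size])
            for i in range(0, len(samples), frame_size)]


def attack_release_periods(env):
    """Return (attacks, releases) as lists of (start_index, end_index) spans
    over the envelope: attacks where it rises, releases where it falls."""
    attacks, releases = [], []
    start = 0
    direction = 0  # +1 rising, -1 falling, 0 flat
    for i in range(1, len(env)):
        step = (env[i] > env[i - 1]) - (env[i] < env[i - 1])
        if step != direction:
            # The previous run ended at index i - 1; record it if it was
            # a rise or a fall, then start a new run there.
            if direction == 1:
                attacks.append((start, i - 1))
            elif direction == -1:
                releases.append((start, i - 1))
            start = i - 1
            direction = step
    if direction == 1:
        attacks.append((start, len(env) - 1))
    elif direction == -1:
        releases.append((start, len(env) - 1))
    return attacks, releases
```

The resulting spans are the kind of compact timing data that could be correlated with lip opening and closing, in place of sending full video of the mouth.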
Distributed audio capture and mixing controlling
Apparatus including a processor configured to: receive an audio signal from a close microphone, wherein the audio signal is an input to spatial audio mixing, and the close microphone is associated with a first sound source; receive a beam-formed audio signal from a microphone array, wherein the beam-formed audio signal is a result of forming a beam of the microphone array directed from the microphone array towards the close microphone so as to enhance the audio signal; determine a time duration where no further sound source is active within the sound scene comprising the first sound source; and determine a time difference, during the time duration, between the audio signal and the beam-formed audio signal to enable alignment of the audio signal and the beam-formed audio signal.
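The time-difference step above is commonly approached with cross-correlation, and the sketch below uses that technique as an assumption; the abstract does not name a specific method. The caller is expected to pass only samples from the time duration in which no other sound source is active. The function names and the `max_lag` search bound are hypothetical.

```python
def estimate_delay(close_mic, beamformed, max_lag):
    """Return the lag (in samples) at which the beam-formed signal best matches
    the close-microphone signal. A positive lag means the beam-formed signal
    arrives later than the close-microphone signal."""
    def correlation(lag):
        return sum(close_mic[i] * beamformed[i + lag]
                   for i in range(len(close_mic))
                   if 0 <= i + lag < len(beamformed))
    return max(range(-max_lag, max_lag + 1), key=correlation)


def align(beamformed, lag, length):
    """Shift the beam-formed signal by the estimated lag so that it lines up
    with the close-microphone signal, padding with zeros where needed."""
    return [beamformed[i + lag] if 0 <= i + lag < len(beamformed) else 0.0
            for i in range(length)]
```

Restricting the estimate to an interval with a single active source matters because other sources would add their own correlation peaks and could bias the lag.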