Patent classifications
H04R3/005
FACE DETECTION GUIDED SOUND SOURCE LOCALIZATION PAN ANGLE POST PROCESSING FOR SMART CAMERA TALKER TRACKING AND FRAMING
A videoconferencing system includes a camera acquiring image data and a microphone array acquiring audio data. Image data is used in conjunction with sound source localization (SSL) data to locate a talker depicted in the image data. SSL processes the audio data and determines SSL pan angle values indicative of an estimated direction of a sound. Columns of pixels in an image are associated with bins. A bin count is incremented for each SSL pan angle value of the audio data that falls within a given bin. A bounding box in the image data is determined that encompasses a face depicted in the image data. A range of pixels is determined for the bounding box, such as extending from a leftmost column to a rightmost column. The bin with the highest bin count that also overlaps a range of pixels for a bounding box is deemed to contain the talker.
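The bin-counting step described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `locate_talker`, the linear mapping from pan angle to pixel column, the 90-degree field of view, and the choice of 16 bins are all assumptions introduced for the example. Face bounding boxes are reduced to (leftmost column, rightmost column) pairs.

```python
def locate_talker(pan_angles_deg, face_ranges, image_width,
                  fov_deg=90.0, num_bins=16):
    """Return (face_index, bin_index) for the busiest SSL bin that
    overlaps a face's pixel range, or None if no such bin exists.

    pan_angles_deg: SSL pan angle values, 0 = image center (assumed).
    face_ranges:    (leftmost_col, rightmost_col) per detected face.
    """
    bin_width = image_width / num_bins
    counts = [0] * num_bins

    for angle in pan_angles_deg:
        # Assumed linear mapping from pan angle to pixel column.
        col = (angle + fov_deg / 2) / fov_deg * image_width
        if 0 <= col < image_width:
            b = min(int(col // bin_width), num_bins - 1)
            counts[b] += 1

    # Visit bins from highest to lowest count and stop at the first
    # bin whose column range overlaps any face's pixel range.
    for b in sorted(range(num_bins), key=lambda i: counts[i], reverse=True):
        if counts[b] == 0:
            break
        lo, hi = b * bin_width, (b + 1) * bin_width
        for idx, (left, right) in enumerate(face_ranges):
            if left < hi and right >= lo:
                return idx, b
    return None
```

Requiring an overlap with a face range means a strong sound peak from a location with no face (for example a reflection off a wall) is skipped in favor of the next busiest bin.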
Distributed audio processing system for processing audio signals from multiple sources
A distributed audio processing system is disclosed, for providing users with the capability of producing a personalized audio mix of a plurality of signals from a plurality of audio sources. The system includes a wireless transmitter for each audio source and, for each user, a wireless receiver. The receiver comprises a programmable audio signal processor configured to process and mix a plurality of audio tracks received via a radio broadcast of a multi-track audio signal comprising the audio signals from the plurality of sources, said processing and mixing being programmable via received commands, instructions and/or parameters. The transmitters are configured to process the audio signals received from their respective sources, according to received commands, instructions and/or parameters. According to an embodiment, a user may provide commands, instructions and/or parameters to any of the receivers and/or transmitters of the system.
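As a rough illustration of the receiver-side mixing described here, the sketch below applies per-user gains to the tracks of a multi-track signal and sums them. The function name `personal_mix`, the dictionary layout, and the hard clipping to [-1, 1] are assumptions made for the example; the abstract does not specify the processing chain.

```python
def personal_mix(tracks, gains):
    """Mix named audio tracks into one signal using per-user gains.

    tracks: dict mapping track name -> list of float samples.
    gains:  dict mapping track name -> gain; missing tracks are muted.
    """
    length = max(len(samples) for samples in tracks.values())
    mix = [0.0] * length
    for name, samples in tracks.items():
        gain = gains.get(name, 0.0)
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    # Hard-clip to the nominal full-scale range (an assumed choice).
    return [max(-1.0, min(1.0, v)) for v in mix]
```

Each user would hold a separate `gains` mapping, so the same broadcast yields a different mix per receiver.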
BEAM GENERATOR, BEAM GENERATING METHOD, AND CHIP
A beam generator, a beam generating method, and a chip are provided. The beam generator comprises a first channel, a second channel, and a signal merging module. The first channel comprises a first-channel filter that filters an input signal to obtain a first filtered signal, which comprises a desired signal. The second channel comprises: a second-channel blocking module that blocks the desired signal in the input signal to obtain a blocked signal; a compensation filter, connected to the second-channel blocking module, that compensates the blocked signal to obtain a second filtered signal; and an adaptive filter, connected to the compensation filter, that adaptively filters the second filtered signal to obtain a third filtered signal. The signal merging module merges the first filtered signal and the third filtered signal to obtain an output signal.
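The two-channel structure in this abstract resembles a sidelobe-canceller arrangement, and a minimal sketch under that reading follows. Everything specific here is an assumption: a two-microphone input, an averaging filter for the first channel, a difference for the blocking module, a pass-through compensation filter (`comp_taps`), an NLMS update for the adaptive filter, and subtraction as the merge. The abstract names none of these choices.

```python
import numpy as np

def beam_generator(x1, x2, comp_taps=(1.0,), num_taps=4, mu=0.02, eps=1e-8):
    """Two-channel beam generator sketch for two microphone signals."""
    # First channel: fixed filter that keeps the in-phase desired signal.
    first = 0.5 * (x1 + x2)
    # Second channel, blocking module: the difference cancels the
    # in-phase desired signal and leaves mostly interference.
    blocked = x1 - x2
    # Compensation filter (pass-through FIR by default, an assumption).
    second = np.convolve(blocked, comp_taps)[: len(blocked)]

    w = np.zeros(num_taps)      # adaptive filter weights
    buf = np.zeros(num_taps)    # most recent samples of `second`
    out = np.zeros(len(first))
    for n in range(len(first)):
        buf = np.roll(buf, 1)
        buf[0] = second[n]
        third = w @ buf                  # third filtered signal
        out[n] = first[n] - third        # merge (assumed subtraction)
        # NLMS update that drives the output power down.
        w += mu * out[n] * buf / (eps + buf @ buf)
    return out
```

With a desired signal common to both microphones and interference that differs between them, the adaptive branch learns to predict the interference left in the first channel, so the merged output should carry less of it than the first channel alone.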
POSITION LOCATING SYSTEM, MARINE VESSEL, AND TRAILER FOR MARINE VESSEL
A position locating system that locates relative position information between a marine vessel and a trailer includes a wave signal generator located on a first object that is one of the marine vessel and the trailer to emit wave signals from at least three different positions having known relative positional relationships with each other, a wave signal receiver located on a second object that is the other of the marine vessel and the trailer to receive the wave signal emitted from each of the positions of the wave signal generator, and a position locator configured or programmed to locate relative position information between the marine vessel and the trailer that includes at least a direction of the second object as viewed from the first object based on the wave signal from each of the positions received by the wave signal receiver.
ELECTRONIC DEVICE AND METHOD FOR PROCESSING SPEECH BY CLASSIFYING SPEECH TARGET
Various embodiments of the disclosure provide a method and an electronic device that includes multiple cameras arranged at different positions, multiple microphones arranged at different positions, a memory, and a processor operatively connected to at least one of the multiple cameras, the multiple microphones, and the memory, wherein the processor is configured to: determine, using at least one of the multiple cameras, whether at least one of a user wearing the electronic device or a counterpart having a conversation with the user makes an utterance, configure directivity of at least one of the multiple microphones based on the determination, obtain audio from at least one of the multiple microphones based on the configured directivity, obtain an image including a mouth shape of the user or the counterpart from at least one of the multiple cameras, and process speech of an utterance target in a different manner based on the obtained audio and the image.
METHOD FOR GENERATING A DIGITAL MODEL-BASED REPRESENTATION OF A VEHICLE
A method for generating a digital model-based representation of a vehicle. The method includes: receiving sensor data of a plurality of acoustic sensors of a vehicle, wherein the sensor data describes sounds of the vehicle and/or sounds of an environment of the vehicle, and wherein the sensor data has been recorded for a plurality of trips of the vehicle; evaluating the sensor data and creating relations between the received sounds of the vehicle and/or the environment and the particular sound-causing statuses of the vehicle and/or the environment; and storing the determined relations between the sounds and the sound-causing statuses in a model-based representation of the vehicle and/or the environment.
MICROPHONE ARRAY
A microphone array includes a four-channel serial peripheral interface, a core logic unit, a data receiving unit and a voice recognition unit. The four-channel serial peripheral interface includes a bit clock signal line, a frame clock signal line, and four data signal lines. The core logic unit includes a frequency divider module for converting a control signal and a clock signal to provide a bit clock signal and a frame clock signal. The data receiving unit includes a shift register and a buffer; the shift register is connected to the four data signal lines and receives input data of four digital microphones, and the buffer is connected to the shift register. The voice recognition unit is connected to the data receiving unit and receives microphone signals of the four digital microphones to perform voice recognition.
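The data receiving unit described here can be modelled in software as four shift registers clocked together, with completed words copied to a buffer on each frame boundary. The sketch below assumes MSB-first bit order, one word per data line per frame, and a configurable word width; none of these details are given in the abstract, and `receive_frames` is a hypothetical name.

```python
def receive_frames(bit_stream, word_bits=16):
    """Shift bits from four data lines into words, one frame at a time.

    bit_stream: iterable of 4-tuples, one bit per data line for each
                bit-clock tick (MSB first, an assumed convention).
    Returns the buffer: a list of 4-tuples of received words.
    """
    mask = (1 << word_bits) - 1
    shift = [0, 0, 0, 0]   # one shift register per data line
    buffer = []
    for tick, bits in enumerate(bit_stream, start=1):
        for ch in range(4):
            shift[ch] = ((shift[ch] << 1) | (bits[ch] & 1)) & mask
        # Treat every word_bits-th tick as the frame-clock boundary.
        if tick % word_bits == 0:
            buffer.append(tuple(shift))
    return buffer
```

A downstream consumer, such as the voice recognition unit, would read complete four-channel samples from the buffer rather than individual bits.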
ELECTRONIC DEVICE
An electronic device includes a housing sidewall defining an opening and a display component, such as a display cover, disposed in the opening to form a gap between the housing sidewall and the display component. In at least one example, a cavity is defined by the sidewall and the display cover, with the cavity in fluid communication with an external environment through the gap. In at least one example, an epoxy component at least partially defines the cavity and can be in direct contact with the housing sidewall.
PRIVACY-PRESERVING SOCIAL INTERACTION MEASUREMENT
Various systems, devices, and methods for social interaction measurement that preserve privacy are presented. An audio signal can be captured using a microphone. The audio signal can be processed using an audio-based machine learning model that is trained to detect the presence of speech. The audio signal can be discarded such that content of the audio signal is not stored after the audio signal is processed using the machine learning model. An indication of whether speech is present within the audio signal can be output based at least in part on processing the audio signal using the audio-based machine learning model.
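The process-then-discard flow in this abstract can be sketched as below. The `energy_detector` function is only a stand-in for the trained audio-based machine learning model, which the abstract does not describe; the function names, the threshold, and the in-place zeroing used to discard audio are all assumptions for illustration.

```python
def energy_detector(frame):
    """Placeholder for the speech-detection model: mean signal power."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0

def measure_speech(frames, detector=energy_detector, threshold=0.01):
    """Return one speech-present flag per frame, discarding the audio.

    frames: list of mutable sample lists; each is zeroed and emptied
            after processing so that no audio content is retained.
    """
    indications = []
    for frame in frames:
        indications.append(detector(frame) >= threshold)
        # Discard the audio: overwrite the samples, then empty the buffer.
        frame[:] = [0.0] * len(frame)
        frame.clear()
    return indications
```

Only the boolean indications leave the function, which mirrors the idea of outputting whether speech is present without storing what was said.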
Methods and system for adjusting level of tactile content when presenting audio content
An audio system presented herein includes a transducer array, a sensor array, and a controller. The transducer array presents audio content to a user. The controller controls the transducer array to adjust a level of tactile content imparted to the user via actuation of at least one transducer in the transducer array while presenting the audio content to the user. The audio system can be part of a headset.