METHOD, APPARATUS AND COMPUTER-READABLE MEDIA FOR VIRTUAL POSITIONING OF A REMOTE PARTICIPANT IN A SOUND SPACE
20170353811 · 2017-12-07
Inventors
CPC classification
H04S2400/15
ELECTRICITY
H04M3/568
ELECTRICITY
G06F3/167
PHYSICS
H04S2400/01
ELECTRICITY
G06F3/165
PHYSICS
H04S2400/13
ELECTRICITY
H04S2420/01
ELECTRICITY
H04S7/302
ELECTRICITY
H04S2400/11
ELECTRICITY
International classification
H04S7/00
ELECTRICITY
H04M3/56
ELECTRICITY
Abstract
Method, apparatus, and computer-readable media for virtually positioning one or more remote participants in a shared sound space include structure and/or function whereby sound signals are received from a plurality of microphones in the shared space. One or more processors identify one or more sound sources in the shared space, based on the received sound signals. The processor(s) map respective locations of the sound source(s) in the shared space, based on the received sound signals. The processor(s) receive from the remote participant(s) signals corresponding to respective position placements of the remote participant(s) in the shared space. The processor(s) mix the received sound signals to output corresponding sound signals for each participant based on relationships between (i) the respective locations of the sound source(s) and (ii) the respective position placements of the remote participant(s) in the shared space. The processor(s) then transmit the corresponding sound signals to the remote participant(s).
Claims
1. A method for simulating a presence of one or more remote participants in a shared space, comprising: receiving, from a plurality of microphones, sound signals of the shared space; identifying, by one or more processors, one or more sound sources in the shared space based on the received sound signals; mapping, by the one or more processors, respective locations of the one or more sound sources in the shared space, based on the received sound signals; receiving, by the one or more processors and from the one or more remote participants, signals corresponding to respective position placements of the one or more remote participants in the shared space; mixing, by the one or more processors, the received sound signals to output corresponding sound signals for each of the one or more remote participants based on relationships between (i) the respective locations of the one or more sound sources and (ii) the respective position placements of the one or more remote participants in the shared space; and transmitting, by the one or more processors, the corresponding sound signals to the one or more remote participants.
2. The method according to claim 1, wherein there are plural sound sources in the shared space, and wherein there are plural remote participants.
3. The method according to claim 1, wherein each remote participant can independently control his/her (i) sound field size and/or shape, (ii) facing position, and (iii) position, within the shared space.
4. The method according to claim 3, wherein each remote participant can independently control in real-time his/her (i) sound field size and/or shape, (ii) facing position, and (iii) position, within the shared space.
5. The method according to claim 4, wherein each remote participant can eliminate his/her reception of sound from a sound source from the sound field.
6. The method according to claim 1, further comprising displaying to the one or more remote participants a plurality of sound sources within the shared space.
7. The method according to claim 6, further comprising displaying to the one or more remote participants indicia of volumes of at least two sound sources within the shared space.
8. The method according to claim 1, wherein each remote participant can independently focus the plurality of microphones on a desired sound source in the shared space, and defocus the plurality of microphones from an undesired sound source in the shared space.
9. The method according to claim 1, further comprising using at least one bubble processor to, based on the received sound signals, define an array of virtual microphone bubbles in the shared space.
10. The method according to claim 1, further comprising tracking a moving sound source in the shared space.
11. The method according to claim 1, wherein the remote participant comprises a second shared space.
12. The method according to claim 1, wherein there are multiple sound fields in the shared space.
13. The method according to claim 1, wherein there are multiple sound fields for at least one remote participant in the shared space.
14. A sound mixing apparatus, comprising: an interface configured to receive, from a plurality of microphones, sound signals of a shared space; a network interface configured to receive, from one or more remote participants, respective position placements in the shared space; and one or more processors configured to: identify one or more sound sources in the shared space based on the received sound signals; map respective locations of the one or more sound sources in the shared space, based on the received sound signals; mix the received sound signals to output corresponding sound signals for each of the one or more remote participants based on relationships between (i) the respective locations of the one or more sound sources and (ii) the respective position placements of the one or more remote participants in the shared space; and transmit the corresponding sound signals to the one or more remote participants via the network interface.
15. The apparatus according to claim 14, wherein there are plural sound sources in the shared space, and wherein there are plural remote participants.
16. The apparatus according to claim 14, wherein each remote participant has at least one participant processor configured to independently control his/her (i) sound field size and/or shape, (ii) facing position, and (iii) position, within the shared space.
17. The apparatus according to claim 16, wherein each remote participant's at least one participant processor is configured to independently control in real-time his/her (i) sound field size and/or shape, (ii) facing position, and (iii) position, within the shared space.
18. The apparatus according to claim 14, wherein each remote participant's at least one participant processor is configured to eliminate his/her reception of sound from a sound source from the sound field.
19. The apparatus according to claim 14, further comprising one or more remote participant displays configured to display to the one or more remote participants a plurality of sound sources within the shared space.
20. The apparatus according to claim 19, wherein the one or more remote participant displays are configured to display to the one or more remote participants indicia of volumes of at least two sound sources within the shared space.
21. The apparatus according to claim 14, wherein each remote participant has at least one processor configured to independently focus the plurality of microphones on a desired sound source in the shared space, and defocus the plurality of microphones from an undesired sound source in the shared space.
22. The apparatus according to claim 14, further comprising at least one bubble processor configured to, based on the received sound signals, define an array of virtual microphone bubbles in the shared space.
23. The apparatus according to claim 14, wherein the one or more processors tracks a moving sound source in the shared space.
24. The apparatus according to claim 14, wherein the remote participant comprises a second shared space.
25. The apparatus according to claim 14, wherein the one or more processors provides multiple sound fields in the shared space.
26. The apparatus according to claim 14, wherein the one or more processors provides multiple sound fields for at least one remote participant in the shared space.
27. At least one computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors perform a method comprising: receiving, from a plurality of microphones, sound signals of a shared space; identifying one or more sound sources in the shared space, based on the received sound signals; mapping respective locations of the one or more sound sources in the shared space based on the received sound signals; receiving, from one or more remote participants, respective position placements in the shared space; mixing the received sound signals to output corresponding sound signals for each of the one or more remote participants based on relationships between the respective locations of the one or more sound sources and the respective position placements of the one or more remote participants in the shared space; and transmitting the corresponding sound signals to the one or more remote participants.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE PRESENTLY PREFERRED EXEMPLARY EMBODIMENTS
[0031] The present invention is directed to systems and methods that enable groups of people, known as participants, to join together over a network, such as the Internet or similar electronic channel, in a remotely distributed, real-time fashion, employing personal computers, network workstations, or other similarly connected appliances or devices, without face-to-face contact, to engage in effective audio conference meetings that utilize large multi-user spaces (spaces) with distributed participants.
[0032] Advantageously, embodiments of the present systems and methods give remote participants the capability to focus the in-multiuser-space microphone array on the desired speaking participant and/or sound sources. The present invention may also be applied to any one or more shared space(s) having multiple microphones, both for focusing sound source pickup and for simulating a local sound recipient for a remote listening participant.
[0033] It is important to establish good-quality, immersive, and spatially accurate audio for conference or multi-person audio with a plurality of remote participants and in-space participants. The remote participants are usually constrained to the placement of the microphones in the multiuser space, which limits their ability to reduce unwanted sound sources and, as a result, leaves them unable to control the focus on desired sound sources. In the present embodiments, it is desirable to give the remote participants the ability to manage (i) the desired microphone placement and (ii) the focus direction, to give an in-space presence that is optimized for desired individual sound source pickup while reducing unwanted sound sources.
[0034] Implementation of the process is preferably on at least one field programmable gate array (FPGA); equivalently, it could be implemented on one or more application-specific integrated circuits (ASICs) or one or more digital signal processors (DSPs). On the FPGA is a processor core that can preferably do all the basic operations in parallel in a single clock cycle. Twelve copies of the processor core are preferably provided, one for each microphone, to allow for sufficient processing capability. This system can then compute 60 operations in parallel while operating at a modest clock rate of 100 MHz. A small DSP processor for filtering and final array processing may also preferably be used. The processing functions (in the sound system, processors, and the remote participant processors) can be performed by any of the above, and by any suitable combination of personal computers, servers, cloud-based devices, etc.
[0035] The words "computational device," "computer," and "device" are used interchangeably in this specification.
[0036] A “device” in this specification may include, but is not limited to, one or more of, or any combination of processing device(s) such as, a cell phone, a Personal Digital Assistant, a smart watch or other body-borne device (e.g., glasses, pendants, rings, etc.), a personal computer, a laptop, a pad, a cloud-access device, a white board, and/or any device capable of sending/receiving messages to/from a local area network or a wide area network (e.g., the Internet), such as devices embedded in cars, trucks, aircraft, household appliances (refrigerators, stoves, thermostats, lights, electrical control circuits, the Internet of Things, etc.).
[0037] An “engine” is preferably a program that performs a core function for other programs. An engine can be a central or focal program in an operating system, subsystem, or application program that coordinates the overall operation of other programs. It is also used to describe a special-purpose program containing an algorithm that can sometimes be changed. The best known usage is the term search engine which uses an algorithm to search an index of topics given a search argument. An engine is preferably designed so that its approach to searching an index, for example, can be changed to reflect new rules for finding and prioritizing matches in the index. In artificial intelligence, for another example, the program that uses rules of logic to derive output from a knowledge base is called an inference engine. A “module” may comprise one or more engines and/or one or more hardware modules, or any suitable combination of both.
[0038] As used herein, a “server” may comprise one or more processors, one or more Random Access Memories (RAM), one or more Read Only Memories (ROM), one or more user interfaces, such as display(s), keyboard(s), mouse/mice, etc. A server is preferably apparatus that provides functionality for other computer programs or devices, called “clients.” This architecture is called the client-server model, and a single overall computation is typically distributed across multiple processes or devices. Servers can provide various functionalities, often called “services”, such as sharing data or resources among multiple clients, or performing computation for a client. A single server can serve multiple clients, and a single client can use multiple servers. A client process may run on the same device or may connect over a network to a server on a different device. Typical servers are database servers, file servers, mail servers, print servers, web servers, game servers, application servers, and chat servers. The servers discussed in this specification may include one or more of the above, sharing functionality as appropriate. Client-server systems are most frequently implemented by (and often identified with) the request-response model: a client sends a request to the server, which performs some action and sends a response back to the client, typically with a result or acknowledgement. Designating a computer as “server-class hardware” implies that it is specialized for running servers on it. This often implies that it is more powerful and reliable than standard personal computers, but alternatively, large computing clusters may be composed of many relatively simple, replaceable server components.
[0039] The servers and devices in this specification typically use the one or more processors to run one or more stored “computer programs” and/or non-transitory “computer-readable media” to cause the device and/or server(s) to perform the functions recited herein. The media may include Compact Discs, DVDs, ROM, RAM, solid-state memory, or any other storage device capable of storing the one or more computer programs.
[0041] The Nureva sound system is preferably made up of the audio system 228 and the virtual position system 222. The Nureva sound system 200 preferably communicates the outbound and inbound signal traffic (control and intelligence signals including, but not limited to, audio streams and positional and user position information) through the network 104 to the remote locations 105. The Nureva sound system 200 preferably communicates with the remote locations 105, carrying the main and back channel information with technologies such as, but not limited to, Web Real-Time Communications (WebRTC) and/or Object Real-Time Communications (ORTC). These technologies communicate the high-bandwidth real-time control and intelligence signals through an open framework for the web that enables real-time communications in the browser, such as the network, audio, and video components used in voice and video chat applications.
[0042] A remote location can be made up of a plurality of remote participants 109, each using a PC 106 with video and audio capability running a Virtual Positioning Web application 401 connected to a stereo-capable headset and microphone device 107. Although a headset is shown, any audio listening device is suitable at the remote user end, such as, but not limited to, audio speakers and ultrasonic speakers. Through the Virtual Positioning Web application 401, the remote participants 109 are able to adjust, utilizing the control signals via WebRTC, their audio and position in the space 101. Parameters such as, but not limited to, position, direction, and sound field size are processed through the Nureva sound system 200 and sent back through the network 104 to the remote participants 109.
[0045] The processing gain for each virtual microphone position 1001 (
[0046] The sound position unit 204 functionality can determine the sound source positions utilizing the 3D virtual bubble microphone matrix processing gain values 226, which are passed from the bubble processor 207. The processing gains for each virtual microphone are examined to find the virtual microphone that has the largest processing gain, which is assigned to one of the four tracked sound source positions. That sound source will continue to be tracked (stationary or moving sound source(s), tracked in time and/or position) as long as there is a significant peak in the processing gain in the vicinity of that sound source. It will continue to be tracked until either it is lost for a given timeout period or four more-recent strong signals are found. The sound position unit 204 sends position packets 215 through the network 104 at a rate of approximately 10 packets per second, utilizing a technology such as, but not limited to, WebRTC, to communicate the sound source positions to the remote participant 109 web application, which can be used to display sound source locations in a calibrated virtual space representing the multi-user space. The sound position unit 204 also sends the spatial positions and activity levels of the sound sources 219 of the four tracked positions to the microphone mixer 225 and sound mixer 205.
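The peak-tracking behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the timeout value, vicinity radius, and all names (`TrackedSource`, `update_tracks`) are assumptions, since the description leaves these parameters open.

```python
import math

MAX_TRACKED = 4      # the sound position unit tracks up to four sources
TIMEOUT_S = 2.0      # hypothetical timeout; the description leaves the value open
VICINITY_M = 0.5     # hypothetical radius for "in the vicinity" of a tracked peak

class TrackedSource:
    def __init__(self, position, gain, now):
        self.position = position   # (x, y, z) of the peak virtual microphone bubble
        self.gain = gain           # processing gain at that bubble
        self.last_seen = now       # time the peak was last refreshed

def _dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def update_tracks(tracks, bubble_gains, now):
    """bubble_gains: iterable of ((x, y, z), processing_gain), one per bubble."""
    # Find the virtual microphone bubble with the largest processing gain.
    peak_pos, peak_gain = max(bubble_gains, key=lambda b: b[1])
    # Refresh an existing track if the peak is in its vicinity, else start one.
    for t in tracks:
        if _dist(t.position, peak_pos) < VICINITY_M:
            t.position, t.gain, t.last_seen = peak_pos, peak_gain, now
            break
    else:
        tracks.append(TrackedSource(peak_pos, peak_gain, now))
    # Drop tracks lost for the timeout period; keep only the most recent four.
    tracks[:] = [t for t in tracks if now - t.last_seen < TIMEOUT_S]
    tracks.sort(key=lambda t: t.last_seen, reverse=True)
    del tracks[MAX_TRACKED:]
    return tracks
```

A real implementation would run this on every processing-gain frame, so a stationary or slowly moving talker keeps refreshing the same track while transient noises age out.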
[0047] The raw microphone inputs from the microphone array 224 and the tracked sound positions 219 go into a microphone mixer 225, which combines the raw microphone 224 inputs to produce a mono sound channel 227 that is focused on each of the tracked sound sources. The user position unit 206 receives network packets 217 from the remote participants that indicate where each user wishes to be in the space, the direction that they wish to be facing, and the size of the sound field that they desire (for example, a user may position themselves 3.2 m east of the west wall and 2.3 m north of the south wall, facing in the compass direction 40 degrees, and listening to all signals within a 2 m radius). The user position unit 206 stores the positions and passes the information signals 220 to the sound mixer 205 and additional signals 221 to the output mixer 208.
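The contents of a position packet 217 can be illustrated with a small data structure. The field names and units here are assumptions; only the worked example's values (3.2 m, 2.3 m, 40 degrees, 2 m radius) come from the description.

```python
from dataclasses import dataclass

@dataclass
class UserPosition:
    """One network packet 217 from a remote participant (field names assumed)."""
    x_m: float             # metres east of the west wall
    y_m: float             # metres north of the south wall
    facing_deg: float      # compass direction the participant is facing
    field_radius_m: float  # radius of the desired sound field

def in_sound_field(pos, source_xy):
    """True if a tracked sound source at source_xy lies inside the sound field."""
    dx = source_xy[0] - pos.x_m
    dy = source_xy[1] - pos.y_m
    return (dx * dx + dy * dy) ** 0.5 <= pos.field_radius_m

# The worked example from the description: 3.2 m east of the west wall,
# 2.3 m north of the south wall, facing 40 degrees, 2 m listening radius.
pos = UserPosition(x_m=3.2, y_m=2.3, facing_deg=40.0, field_radius_m=2.0)
```

A predicate like `in_sound_field` is the kind of test the sound mixer 205 would apply per tracked source when deciding what to include in a participant's mix.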
[0048] The sound mixer 205 creates a unique stereo sound output 216 for each of the participants 109. For each participant, it determines which of the sound sources are active and within the participant's desired sound field. It determines the angle of each sound source from the participant's virtual location and orientation and mixes a stereo signal for each (using known methods of different delays, gains, and filters on the left and right channels) so that the sound is presented to the remote participant 109 as if they were in the space at the specified virtual position. For example, if the sound source is to the left of the virtual position of the participant, the system would send a signal with more delay and attenuation in the right channel of the stereo signal. If more than one active sound source is within the participant's sound field, their signals are added together. If there are no active sound sources within the sound field, then only the ambient space noise is sent. To accomplish this, the sound mixer 205 requires the sound position signals 219 from the sound position unit 204, an ambient noise signal 223 from the audio processor 212, and the mono sound channels 227 from the microphone mixer 225.
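One simple way to realize the per-source delays and gains mentioned above is gain/delay panning from the source's angle relative to the listener. The description only says "known methods" are used, so this is a sketch under assumed constants (head width, constant-power gain law), not the patented mixer.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
EAR_SPACING = 0.2       # metres; assumed effective head width

def pan_parameters(x_m, y_m, facing_deg, source_xy):
    """Return per-channel (gain, delay_s) placing a source at its angle.

    A simplified gain/delay panning scheme: the channel on the far side of
    the head gets a later, quieter copy of the mono source signal.
    """
    # Angle of the source relative to the participant's facing direction;
    # positive angles mean the source is to the participant's right.
    dx, dy = source_xy[0] - x_m, source_xy[1] - y_m
    bearing = math.degrees(math.atan2(dx, dy))   # compass bearing to the source
    angle = math.radians(bearing - facing_deg)
    # Interaural time difference: the far ear hears the wavefront later.
    itd = EAR_SPACING * math.sin(angle) / SPEED_OF_SOUND
    delay_l = max(itd, 0.0)    # source on the right: delay the left channel
    delay_r = max(-itd, 0.0)   # source on the left: delay the right channel
    # Constant-power gains: pan of -1 is fully left, +1 is fully right.
    pan = math.sin(angle)
    theta = (pan + 1.0) * math.pi / 4.0
    gain_l, gain_r = math.cos(theta), math.sin(theta)
    return (gain_l, delay_l), (gain_r, delay_r)
```

For a source to the participant's left, this yields more delay and attenuation in the right channel, matching the example in the paragraph above; summing the panned signals of all in-field sources gives the stereo output 216.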
[0049] The voice recognition 203 functionality utilizes sound mapping and identification techniques to assign an ID to each sound source which can be a person or object, as shown in
[0050] The output mixer 208 receives the user position signals 221 and the remote participants' 109 audio output signal 218 and mixes the signal to the appropriate sound location, outputting it in the multi-user space 101 through the correct speaker 210 within the speaker system 209. The remote participant's 109 voice location will be an accurate spatial representation of where they are configured as a virtual participant in the multi-user space 101.
[0051] The internet network 104 preferably transmits bi-directional information on a per-remote-participant 109 basis for the left audio channel (L) and right audio channel (R) 216, voice recognition information (VR; e.g., name, volume, picture link) 214, user position parameters (UP; e.g., x, y, direction) 217, sound position (SP) parameters (e.g., x, y) 215, and output mixer (OM) 218 audio information (e.g., sound samples). It should be noted that the number of signal channels is dynamic, based on the number of remote participants 109 signed into the system, and is not limited to the number shown in the example.
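The per-participant signal bundle just enumerated can be summarized as a record. The reference numerals follow the description, but the field shapes are illustrative assumptions, not a wire format from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ParticipantChannels:
    """Bi-directional signals carried over the network 104 for one remote
    participant 109 (numerals from the description; shapes assumed)."""
    stereo_lr: Tuple[bytes, bytes]              # 216: left/right audio channels
    voice_recognition: Dict[str, str]           # 214: e.g. name, volume, picture link
    user_position: Tuple[float, float, float]   # 217: x, y, facing direction
    sound_positions: List[Tuple[float, float]]  # 215: x, y of tracked sources
    output_mix: bytes                           # 218: audio played into the space

# A hypothetical bundle for one signed-in participant.
ch = ParticipantChannels(
    stereo_lr=(b"", b""),
    voice_recognition={"name": "participant-1"},
    user_position=(3.2, 2.3, 40.0),
    sound_positions=[(1.0, 2.0)],
    output_mix=b"",
)
```

Because the channel count is dynamic, a server would simply hold one such record per signed-in participant rather than a fixed-size table.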
[0060] Although this invention has been illustrated in a conference multi-user space scenario, the principles and application are directly applicable to any sound space environment—such as, but not limited to, passenger cabins, control rooms and spaces, lecture halls, class spaces, meeting spaces, and/or any configuration which allows for a space that is suitable to configure with a Nureva sound system to enable remote participation and control of the audio listening experience.
[0061] The individual components shown in outline or designated by blocks in the attached Drawings are all well-known in the electronic processing arts, and their specific construction and operation are not critical to the operation or best mode for carrying out the invention.
[0062] While the present invention has been described with respect to what is presently considered to be the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. To the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.