VIDEO CONFERENCE SERVER CAPABLE OF PROVIDING VIDEO CONFERENCE BY USING PLURALITY OF TERMINALS FOR VIDEO CONFERENCE, AND METHOD FOR REMOVING AUDIO ECHO THEREFOR
20210218932 · 2021-07-15
Assignee
Inventors
Cpc classification
H04M3/568
ELECTRICITY
H04M3/002
ELECTRICITY
H04N7/157
ELECTRICITY
International classification
Abstract
Provided are a videoconferencing server capable of providing multiscreen videoconferencing by using multiple videoconferencing terminals, and a method therefor. The videoconferencing server of the present disclosure can be implemented in such a manner that multiple conventional videoconferencing terminals (physical terminals) having one or two displays are logically grouped to operate as a logical terminal which operates as if it were one videoconferencing point. Through distribution of videos provided to multiple physical terminals constituting a logical terminal, the videoconferencing server can perform processing as if the logical terminal supported a multiscreen, and the videoconferencing server provides an echo cancellation function in the logical terminal.
Claims
1. A videoconferencing service provision method of a videoconferencing server, the method comprising: a registration step where multiple physical terminals are registered as a first logical terminal so that the multiple physical terminals operate like one videoconferencing point, and one of the multiple physical terminals is registered as an output-dedicated physical terminal; a call connection step where videoconferencing between multiple videoconferencing points is connected, and with respect to the first logical terminal, individual connections to the multiple physical terminals constituting the first logical terminal are provided; a source audio reception step where source audio signals provided by the multiple videoconferencing points are received, and with respect to the first logical terminal, the source audio signals are received from the respective multiple physical terminals; an audio processing step where from entire source audio received at the source audio reception step, the audio signals provided by the other videoconferencing points are mixed into an output audio signal to be provided to the first logical terminal; and an audio output step where the output audio signal is transmitted to the output-dedicated physical terminal among the multiple physical terminals belonging to the first logical terminal whereby the first logical terminal operates as one virtual videoconferencing point.
2. The method of claim 1, further comprising: removing, by using the output audio signal, an echo from the source audio signal provided by the physical terminal other than the output-dedicated physical terminal among the source audio signals received from the first logical terminal.
3. The method of claim 1, wherein at the audio output step, the output audio signal is transmitted to the physical terminal other than the output-dedicated physical terminal among the multiple physical terminals belonging to the first logical terminal so that the output audio signal is used for echo cancellation, and thus the physical terminal other than the output-dedicated physical terminal does not output the output audio signal to a speaker and uses the output audio signal as a reference audio signal for echo cancellation.
4. The method of claim 1, wherein at the audio processing step, when the other videoconferencing points are a second logical terminal including multiple physical terminals, one audio signal selected among audio signals provided by the multiple physical terminals belonging to the second logical terminal is mixed into the output audio signal to be provided to the first logical terminal.
5. The method of claim 1, further comprising: a source video reception step where source videos provided by the multiple videoconferencing points are received, and with respect to the first logical terminal, the source videos are received from the respective multiple physical terminals; and a multiscreen video provision step where among all the source videos received at the source video reception step, the videos provided by the other videoconferencing points are distributed to the multiple physical terminals of the first logical terminal, whereby the first logical terminal operates as the one virtual videoconferencing point.
6. The method of claim 1, wherein the call connection step comprises: receiving a call connection request message from a calling party point; inquiring, while a calling party and a called party are connected in response to the receiving of the call connection request message, whether the calling party or the called party is the first logical terminal; creating, when the calling party is the physical terminal of the first logical terminal as a result of the inquiring, individual connections to the other physical terminals of the first logical terminal; and creating, when the called party requested for call connection is a physical terminal of a second logical terminal as the result of the inquiring, individual connections to the other physical terminals of the second logical terminal.
7. A videoconferencing server capable of providing a videoconferencing service, the server comprising: a terminal registration unit registering multiple physical terminals as a first logical terminal that operates like one videoconferencing point, and registering one of the multiple physical terminals as an output-dedicated physical terminal; a teleconversation connection unit connecting videoconferencing between multiple videoconferencing points including the first logical terminal wherein with respect to the first logical terminal, individual connections to the multiple physical terminals constituting the first logical terminal are provided, and receiving source audio signals from the multiple videoconferencing points wherein with respect to the first logical terminal, the source audio signals are received from the respective multiple physical terminals; and an audio processing unit mixing the audio provided by the other videoconferencing points from entire source audio received, into an output audio signal to be provided to the first logical terminal, and transmitting the output audio signal to the output-dedicated physical terminal among the multiple physical terminals belonging to the first logical terminal.
8. The server of claim 7, further comprising: an echo processing unit removing, by using the output audio signal, an echo from the source audio signal provided by the physical terminal other than the output-dedicated physical terminal among the source audio signals received from the first logical terminal.
9. The server of claim 7, wherein the audio processing unit transmits the output audio signal to the physical terminal other than the output-dedicated physical terminal among the multiple physical terminals belonging to the first logical terminal so that the output audio signal is used for echo cancellation, and thus the physical terminal other than the output-dedicated physical terminal does not output the output audio signal to a speaker and uses the output audio signal as a reference audio signal for echo cancellation.
10. The server of claim 7, wherein the audio processing unit mixes, when the other videoconferencing points are a second logical terminal including multiple physical terminals, one audio signal selected among audio signals provided by the multiple physical terminals belonging to the second logical terminal into the output audio signal to be provided to the first logical terminal.
11. The server of claim 7, wherein the teleconversation connection unit receives source videos from the multiple videoconferencing points, and receives, with respect to the first logical terminal, the source videos from the respective multiple physical terminals, and the server further comprises: a video processing unit distributing the videos provided by the other videoconferencing points among all the source videos received by the teleconversation connection unit to the multiple physical terminals of the first logical terminal.
12. The server of claim 7, wherein the teleconversation connection unit is configured to inquire, while a calling party and a called party are connected in response to a call connection request message from a calling party point, whether the calling party or the called party is the first logical terminal, to create, when the calling party is the physical terminal of the first logical terminal as a result of the inquiring, individual connections to the other physical terminals of the first logical terminal, and to create, when the called party requested for call connection is a physical terminal of a second logical terminal as the result of the inquiring, individual connections to the other physical terminals of the second logical terminal.
Description
DESCRIPTION OF DRAWINGS
[0044]
[0045]
[0046]
[0047]
[0048]
[0049]
BEST MODE
[0050] Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings.
[0051] Referring to
[0052] The connection network 30 between the server 110 and the videoconferencing terminals 11, 13, 15, 17, and 19 is an IP network, and may include a heterogeneous network connected via a gateway or may be connected with the heterogeneous network. For example, a wireless telephone using a mobile communication network may be the videoconferencing terminal of the present disclosure. In this case, the network 30 includes the mobile communication network where connection via a gateway is made to process an IP packet.
[0053] The server 110 provides overall control of the videoconferencing system 100 of the present disclosure. In addition to the functions of a conventional general server for processing videoconferencing, the server 110 includes a terminal registration unit 111, a teleconversation connection unit 113, a video processing unit 115, an audio processing unit 117, and an echo processing unit 119.
[0054] The terminal registration unit 111 performs registration, setting, management, and the like of a physical terminal and a logical terminal, which will be described below. The teleconversation connection unit 113 controls videoconferencing call connection of the present disclosure. When the videoconferencing call is connected, the video processing unit 115 processes (mixing, decoding, encoding, and the like) the videos provided between the physical terminals and/or logical terminals, thereby implementing a multiscreen, similarly to telepresence.
[0055] The audio processing unit 117, which is a feature of the present disclosure, controls audio processing in the logical terminal. The echo processing unit 119 removes an echo from the audio signal transmitted from the logical terminal.
[0056] Operation of the terminal registration unit 111, the teleconversation connection unit 113, the video processing unit 115, the audio processing unit 117, and the echo processing unit 119 will be described again below.
[0057] Logical Terminal
[0058] All the videoconferencing terminals 11, 13, 15, 17, and 19 included in the videoconferencing system 100 support the standard protocol related to videoconferencing. Each of the videoconferencing terminals 11, 13, 15, 17, and 19 is not a terminal capable of providing the telepresence service described in Background Art, but is a videoconferencing terminal to which one display device is connected or to which two display devices are connected for document conferencing. Examples of the standard protocol include H.323, Session Initiation Protocol (SIP), and the like. Naturally, among the videoconferencing terminals 11, 13, 15, 17, and 19, the terminal supporting document conferencing supports H.239 and Binary Floor Control Protocol (BFCP).
[0059] For example, in the case where the SIP session is created between the server 110 and the videoconferencing terminals 11, 13, 15, 17, and 19 according to the SIP protocol, a video signal or audio signal described below is transmitted in the form of an RTP packet.
[0060] In addition, the videoconferencing terminals 11, 13, 15, 17, and 19 are conventional videoconferencing terminals that have respective video/voice codecs, and have microphones 11-1, 13-1, 15-1, 17-1, and 19-1 converting the voices of the talkers into audio signals and speakers 11-2, 13-2, 15-2, 17-2, and 19-2 for audio output, respectively.
[0061] The videoconferencing terminals connected to the videoconferencing system of the present disclosure may constitute a logical terminal. The logical terminal is a logical combination of multiple videoconferencing terminals that is handled as if it were one videoconferencing terminal. The logical terminal may be composed of two or more videoconferencing terminals, but no direct connection is provided between the multiple videoconferencing terminals constituting the logical terminal. In other words, direct connection between the multiple videoconferencing terminals is not required in constituting the logical terminal.
[0062] Hereinafter, in order to distinguish between the logical terminal and the conventional general videoconferencing terminal, the conventional general videoconferencing terminal is referred to as a physical terminal. In other words, the logical terminal is merely a logical combination of multiple physical terminals for videoconferencing.
[0063] Each conventional videoconferencing terminal operates as one videoconferencing point. However, in the present disclosure, all of the multiple videoconferencing terminals belonging to the logical terminal operate like one terminal, so all of them operate as one videoconferencing point. In another aspect, the logical terminal is one videoconferencing point that has as many display devices as there are display devices which are individually owned by the multiple physical terminals, namely, the constituent members of the logical terminal. When necessary, the logical terminal designates one of the multiple constituent terminals as a representative terminal. No matter how many physical terminals the logical terminal includes, the logical terminal is treated as a single videoconferencing point in videoconferencing.
[0064] For example,
[0065] The logical terminal is a logical component managed by the server 110 and the standard protocol between the server 110 and the terminal supports only 1:1 connection, and thus the connection between the server 110 and the logical terminal means that the multiple physical terminals constituting the logical terminal are individually connected to the server 110 according to the standard protocol. For example, according to the SIP protocol,
[0066] According to the present disclosure, the server 110 of the videoconferencing system supports the following connections.
[0067] (1) Videoconferencing in Which One Physical Terminal and One Logical Terminal Are Connected
[0068] For example, this relates to a case in which the fifth physical terminal 19 in
[0069] (2) Videoconferencing in Which a Single Logical Terminal Calls One Physical Terminal
[0070] For example, this relates to a case in which the user causes the first physical terminal 11 that is the representative terminal of the first logical terminal 130 to call the fifth physical terminal 19. The server 110 simultaneously or sequentially calls the second physical terminal 13 that is the other physical terminal of the first logical terminal 130 and the fifth physical terminal 19 that is a called party, for connection.
[0071] (3) Videotelephony in Which One Logical Terminal Calls Another Logical Terminal
[0072] For example, this relates to a case where the first logical terminal 130 in
[0073] (4) Multipoint Videoconferencing
[0074] The videoconferencing system of the present disclosure supports, as shown in
[0075] <Multiscreen Support>
[0076] The videoconferencing system 100 of the present disclosure may provide a multiscreen, similarly to telepresence, using a logical terminal structure. Although the logical terminal is a virtual terminal, the logical terminal is processed as having as many screens as all the multiple physical terminals, which are the constituent members, are able to provide.
[0077] The server 110 reconstructs the multi-videoconferencing video using a method of matching the number (m1, or the number of videos that the server needs to provide to each logical terminal) of display devices included in each logical terminal with the total number (M, the number of source videos) of physical terminals included in the points that are connected to videoconferencing, thereby re-editing m3 videos into m1 videos with respect to the logical terminal and providing the resulting videos. Herein, m3, as the number of source videos that the logical terminal needs to display for videoconferencing, is shown in Equation 1 below.
m3 = M - m2 [Equation 1]
[0078] Herein, m2 is the number of physical terminals constituting each logical terminal.
[0079] In the meantime, each physical terminal may make a setting or a request in such a manner as to display its video (source video). In this case, when with respect to each logical terminal, the m3 videos are re-edited into the m1 videos and the resulting videos are distributed to each of the physical terminals constituting the logical terminal, the source videos provided by the corresponding physical terminals are also mixed and provided.
[0080] Unless m3 and m1 are the same value, the server 110 needs to perform reprocessing in which the source videos are mixed. However, according to an embodiment, with respect to the logical terminal, the m3 videos may not be re-edited into the m1 videos, and the m3 videos may be sequentially provided at regular time intervals. For example, in the case of m3=3 and m1=1, three source videos are not re-edited through mixing, and the like, and the three source videos may be sequentially provided. In this case, relay-type videoconferencing processing is possible, which was impossible in the conventional standard videoconferencing terminal.
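The relay-type provision described above can be illustrated with a minimal sketch. The disclosure states only that the m3 source videos may be provided sequentially at regular time intervals; the rotation rule used here (advance by one source per interval), the function name `relay_frame`, and the source labels are assumptions made for illustration.

```python
from typing import List, Sequence


def relay_frame(sources: Sequence[str], screens: int, tick: int) -> List[str]:
    """Return the source videos shown on each screen during interval `tick`.

    Assumed rule: the window of `screens` sources advances by one per interval.
    """
    if screens < 1:
        raise ValueError("a point needs at least one screen")
    if tick < 0:
        raise ValueError("tick must be non-negative")
    if not sources:
        return []
    if screens >= len(sources):
        # Enough screens for every source, so no rotation is needed.
        return list(sources)
    n = len(sources)
    return [sources[(tick + i) % n] for i in range(screens)]


# m3 = 3 source videos relayed onto m1 = 1 screen.
videos = ["video_15", "video_17", "video_19"]
schedule = [relay_frame(videos, 1, t) for t in range(4)]
```

In this sketch no mixing takes place; each interval simply forwards one of the source videos, which matches the case of m3=3 and m1=1 described above.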
[0081] In the meantime, regardless of the configuration of the logical terminal, any physical terminal participating in the videoconferencing of the present disclosure may provide two source videos when a presenter token is obtained. For example, as a result of obtaining the presenter token, the first physical terminal 11 may provide a main video with a video for document conferencing to the server 110. In this case, M is the sum of one and the total number of physical terminals included in the points connected to videoconferencing.
[0082]
[0083] The first logical terminal 130 has the two display devices of the first physical terminal 11 and the second physical terminal 13, which gives m1=2 and m2=2. In this three-point videoconference, the physical terminals connected to the first logical terminal 130 are the third to fifth physical terminals 15, 17, and 19, which are three in number (m3 = 5 - 2 = 3), so the three source videos that these physical terminals provide need to be re-edited into two videos for display. Separately, it may be determined which source video is displayed on which screen. In
[0084] The third physical terminal 15 has two display devices and the fourth physical terminal 17 has one display device, so the second logical terminal 150 has the three display devices, which refers to m1=3 and m2=2. Therefore, with respect to the second logical terminal 150, the server 110 causes the source videos that the three physical terminals provide to be displayed as three videos. Since the number of source videos to be displayed and the number of screens are the same, one for each is displayed. Separately, it may be determined which source video is displayed on which screen. In
[0085] Equation 1 is applied to the fifth physical terminal 19 in the same manner. For the fifth physical terminal 19, m1=2 and m2=1, so the server 110 re-edits four source videos (m3 = 5 - 1 = 4) into two (m1) videos and provides the result to the fifth physical terminal 19. The fifth physical terminal 19 needs to display, on its two display devices, the source videos provided by the four physical terminals 11, 13, 15, and 17 of the first logical terminal 130 and the second logical terminal 150, so the four source videos are edited to be displayed as two videos.
[0086] When the third physical terminal 15 of the second logical terminal 150 obtains the presenter token, two source videos are provided. In this case, the second logical terminal 150 provides a total of three source videos, and M is 6. The number of source videos to be processed by the server 110 for transmission to the first logical terminal 130, the second logical terminal 150, and the fifth physical terminal 19 is greater than that of the description above by one.
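The counts in the three-point example can be checked with a short sketch of Equation 1 (m3 = M - m2). The point labels, the dictionary layout, and the function names are illustrative assumptions and are not part of the disclosure.

```python
from typing import Dict


def source_videos_to_display(total_sources: int, own_terminals: int) -> int:
    """Equation 1: m3 = M - m2."""
    if own_terminals < 1 or own_terminals > total_sources:
        raise ValueError("m2 must be between 1 and M")
    return total_sources - own_terminals


# m2 (number of physical terminals) for each videoconferencing point.
POINTS: Dict[str, int] = {"logical_130": 2, "logical_150": 2, "physical_19": 1}


def m3_per_point(points: Dict[str, int], presenter_videos: int = 0) -> Dict[str, int]:
    """Compute m3 for every point; a presenter token adds one source video to M."""
    total = sum(points.values()) + presenter_videos  # M
    return {name: source_videos_to_display(total, m2) for name, m2 in points.items()}


without_presenter = m3_per_point(POINTS)                   # M = 5
with_presenter = m3_per_point(POINTS, presenter_videos=1)  # M = 6
```

With M=5 the sketch gives m3 values of 3, 3, and 4 for the first logical terminal 130, the second logical terminal 150, and the fifth physical terminal 19. With the presenter token (M=6) each value is greater by one, as stated above.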
[0087] Videoconferencing Service (Video Processing) For Logical Terminal
[0088] Hereinafter, a multiscreen videoconferencing service provision method of the server 110 will be described with reference to
[0089] <A Registration Step of the Logical Terminal: S301>
[0090] The terminal registration unit 111 of the server 110 executes and manages registration of the physical terminal and the logical terminal. Registration of the physical terminals needs to precede, or be performed simultaneously with, registration of the logical terminal. For registration of each physical terminal, an IP address of each physical terminal is essential.
[0091] The process of registering the physical terminal may be performed by various methods known in the related art. For example, the registration of the physical terminal may be executed using a location registration process through a register command on the SIP protocol. Herein, a telephone number of the physical terminal, and the like may be included. When the location of the physical terminal is registered, the server 110 determines whether the physical terminal is currently turned on and is in operation.
[0092] In the registration of the logical terminal, the physical terminals included in the logical terminal are designated, and the number of the display devices connected to each physical terminal is registered. According to an embodiment, the arrangement (or relative positions) between the display devices included in the logical terminal, a video mixing method (including a relay method) or a layout of the mixed video according to the number (m3) of source videos, or the like may be set. For example, the terminal registration unit 111 receives configuration information for configuring the first physical terminal 11 and the second physical terminal 13 as the first logical terminal 130, for registration and management. In the registration of the logical terminal, a web page that the terminal registration unit 111 provides may be used, or a separate access terminal may be used.
[0093] In addition, with respect to the logical terminal, one of the physical terminals constituting the logical terminal is registered as a physical terminal for outputting an audio signal (hereinafter, referred to as an output-dedicated physical terminal) described below.
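The registration information described in step S301 could be held in a structure such as the following minimal sketch. The class names, field names, telephone numbers, IP addresses (taken from a documentation address range), and the choice of which terminal is output-dedicated are all hypothetical; the disclosure does not prescribe a data model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PhysicalTerminal:
    number: str      # telephone number (hypothetical values below)
    ip_address: str  # essential for registration of a physical terminal
    displays: int    # number of display devices connected to the terminal


@dataclass(frozen=True)
class LogicalTerminal:
    name: str
    members: Tuple[PhysicalTerminal, ...]
    output_dedicated: str                 # number of the output-dedicated member
    representative: Optional[str] = None  # designated only when necessary

    def __post_init__(self) -> None:
        numbers = {m.number for m in self.members}
        if len(self.members) < 2:
            raise ValueError("a logical terminal groups two or more physical terminals")
        if self.output_dedicated not in numbers:
            raise ValueError("the output-dedicated terminal must be a member")
        if self.representative is not None and self.representative not in numbers:
            raise ValueError("the representative terminal must be a member")

    @property
    def display_count(self) -> int:
        """m1: total number of display devices across all members."""
        return sum(m.displays for m in self.members)


t11 = PhysicalTerminal("1011", "192.0.2.11", 1)
t13 = PhysicalTerminal("1013", "192.0.2.13", 1)
t15 = PhysicalTerminal("1015", "192.0.2.15", 2)
t17 = PhysicalTerminal("1017", "192.0.2.17", 1)

first = LogicalTerminal("130", (t11, t13), output_dedicated="1011", representative="1011")
second = LogicalTerminal("150", (t15, t17), output_dedicated="1015")
```

The validation in `__post_init__` reflects the requirement that the output-dedicated physical terminal is one of the physical terminals constituting the logical terminal.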
[0094] <An Outgoing Call-Connection Step For Videoconferencing: S303>
[0095] Videoconferencing call establishment between videoconferencing points is initiated as the teleconversation connection unit 113 of the server 110 receives a call connection request from one point. In the case of the SIP protocol, the teleconversation connection unit 113 receives an SIP signaling message which is INVITE. In the example in
[0096] <Inquiring Whether Caller and/or Receiver is Logical Terminal: S305>
[0097] The teleconversation connection unit 113 of the server 110 inquires of the terminal registration unit 111 whether the called-party telephone number is one of the telephone numbers (or IP addresses) of the respective physical terminals constituting the logical terminal. Similarly, the teleconversation connection unit 113 of the server 110 inquires of the terminal registration unit 111 whether the calling party has one of the telephone numbers (or IP addresses) of the respective physical terminals constituting the logical terminal. Through this, the teleconversation connection unit 113 determines whether the call connection is connection to the logical terminal.
[0098] According to an embodiment, when the called party is a physical terminal of the logical terminal, the teleconversation connection unit 113 additionally identifies whether that physical terminal is the representative terminal of the logical terminal. When the called party is not the representative terminal, the called party is not processed as the logical terminal. Likewise, for the calling party, it is additionally identified whether the calling party is the representative terminal of the logical terminal. When the calling party is not the representative terminal, the calling party is not processed as the logical terminal.
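The inquiry of step S305, including the optional representative-terminal check, may be sketched as a simple lookup. The registry layout, the telephone numbers, and the function name `lookup_logical_terminal` are assumptions made for illustration only.

```python
from typing import Dict, Optional, Tuple

# telephone number -> (logical terminal name, is representative terminal)
# Hypothetical contents kept by the terminal registration unit.
REGISTRY: Dict[str, Tuple[str, bool]] = {
    "1011": ("130", True),
    "1013": ("130", False),
    "1015": ("150", True),
    "1017": ("150", False),
}


def lookup_logical_terminal(
    number: str,
    registry: Dict[str, Tuple[str, bool]],
    representative_only: bool = False,
) -> Optional[str]:
    """Return the logical terminal that `number` belongs to, or None.

    With representative_only=True, a member that is not the representative
    terminal is not processed as the logical terminal.
    """
    entry = registry.get(number)
    if entry is None:
        return None  # an ordinary physical terminal
    name, is_representative = entry
    if representative_only and not is_representative:
        return None
    return name
```

For example, `lookup_logical_terminal("1013", REGISTRY)` resolves to the logical terminal "130", whereas the same call with `representative_only=True` returns `None` because "1013" is not registered as the representative terminal in this sketch.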
[0099] <Videoconferencing Connection: S307 and S309>
[0100] When the called-party telephone number is the logical terminal's number, the teleconversation connection unit 113 performs a procedure for creating SIP sessions to all the physical terminals belonging to the called-party logical terminal. In the example in
[0101] In the example in
[0102] All the physical terminals of the called party receiving the INVITE and/or the calling party perform negotiation in which a video, a voice codec, and the like are selected through Session Description Protocol (SDP) information. When the negotiation is successfully completed, the actual session is established and the call is connected.
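The fan-out of individual connections in steps S307 and S309 can be sketched as follows. The sketch only lists which terminals the server would call; it performs no SIP signaling. The membership table, the telephone numbers, and the function name `invite_targets` are hypothetical.

```python
from typing import Dict, List

# logical terminal name -> telephone numbers of its physical terminals (hypothetical)
MEMBERS: Dict[str, List[str]] = {
    "130": ["1011", "1013"],
    "150": ["1015", "1017"],
}


def invite_targets(caller: str, callee: str, members: Dict[str, List[str]]) -> List[str]:
    """List the terminals the server calls when `caller` requests `callee`."""
    owner = {number: name for name, numbers in members.items() for number in numbers}
    targets: List[str] = []
    # Calling side: connect the other physical terminals of the caller's logical terminal.
    if caller in owner:
        targets += [n for n in members[owner[caller]] if n != caller]
    # Called side: connect every physical terminal of the callee's logical terminal,
    # or the callee alone when it is an ordinary physical terminal.
    if callee in owner:
        targets += members[owner[callee]]
    else:
        targets.append(callee)
    return targets
```

When the representative terminal "1011" of the first logical terminal calls the physical terminal "1019", the sketch yields "1013" and "1019", which corresponds to connection case (2) described earlier.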
[0103] <A Step of Receiving the Source Video From Each Single Physical Terminal: S311>
[0104] As described above, since the teleconversation connection of the logical terminal is actually the connection to the individual physical terminals constituting the logical terminal, multiple sessions are established. The physical terminals constituting the logical terminal individually create the source videos and transmit the same to the server 110.
[0105] Therefore, in the case of
[0106] <Reprocessing of the Source Video by the Server: S313>
[0107] In order to render the source videos received from the physical terminals into a video for each point, the video processing unit 115 of the server 110 decodes the source videos, mixes the decoded videos, and encodes the resulting videos. In other words, the video processing unit 115 may re-edit m3 videos into m1 videos with respect to each logical terminal.
[0108] The video processing unit 115 performs mixing on the source videos according to a layout preset for each logical terminal or each physical terminal or according to a layout requested by each terminal.
[0109] As described above, without video processing by the video processing unit 115, the teleconversation connection unit 113 may provide the source videos sequentially at preset time intervals so that the source videos are displayed in relays. In this case, transmission takes place as it is without mixing and the like. When matching with the video codec of the terminal is necessary, change of the video format or transcoding is sufficient therefor.
[0110] <Transmitting of the Encoded Video Data to Each Physical Terminal: S315>
[0111] The teleconversation connection unit 113 provides the videos that the video processing unit 115 processes for the respective physical terminals 11, 13, 15, 17, and 19, to the respective physical terminals 11, 13, 15, 17, and 19 that participate in the videoconferencing. As a result, each point participating in the videoconferencing may receive a service similar to telepresence which uses a multiscreen.
[0112] By the above-described method, the multiscreen for videoconferencing of the videoconferencing system 100 of the present disclosure is processed.
[0113] (Embodiment) Another Method of Step S305
[0114] When registering the logical terminal, the terminal registration unit 111 generates a virtual telephone number for the logical terminal to register the same. In this case, at step S305, only when the called-party telephone number is the virtual telephone number of the logical terminal, the called party is processed as the logical terminal.
[0115] Provision of the Videoconferencing Service For the Logical Terminal (Audio Processing)
[0116] Since the videoconferencing system 100 of the present disclosure provides a feature which is the logical terminal, unlike the conventional videoconferencing system or device, the audio signal processing in the server 110 is different from the conventional method.
[0117] The audio processing unit 117 of the server 110 decodes the audio signal from the RTP packet that is received by the teleconversation connection unit 113 from each point participating in the videoconferencing. The videoconferencing system 100 in
[0118] As described above, the videoconferencing terminals 11, 13, 15, 17, and 19 have the SIP sessions individually created to the server 110 and are each the terminal for videoconferencing. Therefore, unless otherwise set, all the physical terminals participating in the videoconferencing configured by the server 110 may transmit the audio signals to the server 110 through the SIP sessions regardless of the configuration of the logical terminal. Hereinafter, the audio signal processing by the audio processing unit 117 will be described with reference to
[0119] <Source Audio Receiving Step: S501>
[0120] Referring to
[0121] <Source Audio Processing Step: S503>
[0122] The audio processing unit 117 decodes the RTP packets received through the SIP sessions to obtain the audio signals (hereinafter, referred to as source audio signals) provided from all the physical terminals 11, 13, 15, 17, and 19 participating in the videoconferencing, and mixes the signals into an audio signal (hereinafter, referred to as an output audio signal) to be provided to each videoconferencing point. This corresponds to step S313.
[0123] The output audio signal to be provided to each videoconferencing point is obtained by mixing audio signals provided from different videoconferencing points. Herein, various methods are possible.
[0124] (Method 1) First, regardless of whether each videoconferencing point is the physical terminal or the logical terminal, all audio signals provided from the corresponding videoconferencing point may be mixed. For example, in the output audio signal to be transmitted to the first logical terminal 130, the source audio signals provided by the second logical terminal 150 and the fifth physical terminal 19 need to be mixed, so the audio processing unit 117 mixes the source audio signals provided by the third physical terminal 15, the fourth physical terminal 17, and the fifth physical terminal 19. In the audio signal to be transmitted to the second logical terminal 150, the source audio signals provided by the first logical terminal 130 and the fifth physical terminal 19 need to be mixed, so the audio processing unit 117 mixes the source audio signals provided by the first physical terminal 11, the second physical terminal 13, and the fifth physical terminal 19. In the audio signal to be transmitted to the fifth physical terminal 19, the source audio signals provided by the first logical terminal 130 and the second logical terminal 150 need to be mixed, so the audio processing unit 117 mixes the source audio signals provided by the first physical terminal 11, the second physical terminal 13, the third physical terminal 15, and the fourth physical terminal 17.
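Method 1 can be summarized with a minimal sketch that lists, for each videoconferencing point, the physical terminals whose source audio is mixed into its output audio signal. The point labels, the dictionary layout, and the function name are illustrative assumptions.

```python
from typing import Dict, List

# videoconferencing point -> reference numerals of its physical terminals
POINTS: Dict[str, List[int]] = {
    "logical_130": [11, 13],
    "logical_150": [15, 17],
    "physical_19": [19],
}


def method1_sources(points: Dict[str, List[int]], target: str) -> List[int]:
    """Return every physical terminal of every other point (Method 1)."""
    if target not in points:
        raise KeyError(f"unknown videoconferencing point: {target}")
    return [
        terminal
        for name, terminals in points.items()
        if name != target
        for terminal in terminals
    ]
```

For the first logical terminal 130 the sketch returns terminals 15, 17, and 19, for the second logical terminal 150 it returns 11, 13, and 19, and for the fifth physical terminal 19 it returns 11, 13, 15, and 17, in agreement with the example above.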
[0125] (Method 2) When another videoconferencing point is the logical terminal, only the audio signal provided by one physical terminal selected among the physical terminals belonging to the logical terminal is subjected to mixing for the output audio signal. For example, in the output audio signal to be transmitted to the first logical terminal 130, the source audio signals provided by the second logical terminal 150 and the fifth physical terminal 19 need to be mixed. Since the second logical terminal 150 includes the third physical terminal 15 and the fourth physical terminal 17, the audio processing unit 117 mixes only the source audio signal provided by one terminal selected among the third physical terminal 15 and the fourth physical terminal 17 with the source audio signal provided by the fifth physical terminal 19. Herein, the source audio signal selected for mixing does not necessarily have to be the source audio signal provided by the output-dedicated physical terminal.
[0126] There are various reasons for adopting this method. For example, in the specific application step of this method, the audio signal received through the microphone closest to the talker's location on the side of the second logical terminal 150 may be selected for mixing, and the audio signal provided by the other physical terminal of the second logical terminal 150 may not be mixed. This solves the problem that due to the slight time difference occurring when the talker's speech is input to all the microphones 15-1 and 17-1 of the second logical terminal 150, the audio or voice is not clearly heard.
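The selection in Method 2 may be sketched as follows. The present disclosure does not state how the microphone closest to the talker is identified; this sketch assumes, purely for illustration, that the frame with the highest energy comes from the closest microphone. The names `POINTS`, `frame_energy`, and `sources_for_method2` are hypothetical.

```python
# Hypothetical topology from the example: videoconferencing point -> physical terminals.
POINTS = {
    "A": [11, 13],  # first logical terminal 130
    "B": [15, 17],  # second logical terminal 150
    "C": [19],      # fifth physical terminal 19
}


def frame_energy(frame):
    """Sum of squared samples; used here as an assumed proxy for talker proximity."""
    return sum(sample * sample for sample in frame)


def sources_for_method2(target_point, points, frames):
    """Pick one physical terminal per other point (Method 2).

    frames maps a terminal id to its current frame of samples.
    A point with a single physical terminal simply contributes that terminal.
    """
    selected = []
    for point, terminals in points.items():
        if point == target_point:
            continue
        # max() keeps the first terminal when energies are equal.
        selected.append(max(terminals, key=lambda t: frame_energy(frames[t])))
    return selected
```

With this rule, only one of the third physical terminal 15 and the fourth physical terminal 17 contributes to the output audio signal for the first logical terminal 130, so the same speech is not mixed twice with a slight time difference.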
[0127] <Transmission of the Output Audio Signal: S505>
[0128] The audio processing unit 117 compresses the output audio signal obtained by mixing for provision to each videoconferencing point in a preset audio signal format and encodes the result into the RTP packet for transmission to each videoconferencing point. However, on the side of the logical terminal, the output audio signal is transmitted to the output-dedicated physical terminal described below.
[0129] The Output-Dedicated Physical Terminal
[0130] Regardless of the logical terminal settings, the server 110 establishes the SIP sessions to all the physical terminals participating in the videoconferencing, and the audio signals are transmitted through the SIP sessions. Herein, when the videoconferencing point is the logical terminal like the first point A and the second point B, the audio processing unit 117 transmits the newly encoded audio signal only to the output-dedicated physical terminal. When the videoconferencing point is not the logical terminal but the physical terminal like the third point C, the audio processing unit 117 transmits the newly encoded audio signal to the physical terminal as in the related art. To this end, in the process of registering the logical terminal, the terminal registration unit 111 of the server 110 receives and registers one of the physical terminals constituting the logical terminal as the output-dedicated physical terminal. The output-dedicated physical terminal may be the representative terminal of the logical terminal described above, or may be a terminal different from the representative terminal.
[0131] When the logical terminal participates in the videoconferencing, the audio signals provided by another videoconferencing point are not output through all the physical terminals constituting the logical terminal, but output only through the output-dedicated physical terminal. Otherwise, the same audio signals are output through the multiple speakers with slight time differences and thus clear audio is not output. In addition, when the output-dedicated physical terminal is not determined, a number of complex cases regarding echo cancellation occur, which is inappropriate.
[0132] Therefore, all the physical terminals constituting the logical terminal convert the voices of the talkers and the like into audio signals and are capable of providing the results to the server 110, but the audio signal provided by the server 110 is provided only to the output-dedicated physical terminal.
[0133] Referring to
[0134] The audio processing unit 117 provides the output audio signal (15b+17b+19b) to be provided to the first point A only to the first physical terminal 11 that is the output-dedicated physical terminal, and provides the output audio signal (11b+13b+19b) to be provided to the second point B only to the fourth physical terminal 17. Since the third point C is the physical terminal, the audio processing unit 117 transmits the output audio signal (11b+13b+15b+17b) to be provided to the third point C to the fifth physical terminal 19.
[0135] Among the physical terminals constituting the logical terminal, each terminal other than the output-dedicated physical terminal may be sent an RTP packet having no audio signal. Herein, "no audio signal" refers to, for example, an audio signal having zero amplitude. According to an embodiment, the RTP packet for the audio signal may not be transmitted at all.
[0136] Therefore, in the first point A that is the logical terminal, the first physical terminal 11 outputs the output audio signal (15b+17b+19b) through its speaker 11-2, and does not output any audio through the speaker 13-2 of the second physical terminal 13. Similarly, in the second point B that is the logical terminal, the fourth physical terminal 17 outputs the output audio signal (11b+13b+19b) through its speaker 17-2, and does not output any audio through the speaker 15-2 of the third physical terminal 15.
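The routing described above may be sketched as follows. This is only an illustrative sketch: the function `route_output`, its parameters, and the representation of a frame as a list of samples are assumptions, and RTP packetization itself is not shown.

```python
def route_output(members, output_terminal, mixed_frame, send_silence=True):
    """Decide what each physical terminal of one videoconferencing point receives.

    members:         physical terminals of the point (one entry for a plain physical terminal)
    output_terminal: the output-dedicated physical terminal of the point
    mixed_frame:     the mixed output audio samples for the point
    send_silence:    True  -> other members receive a zero-amplitude frame
                     False -> other members receive no audio packet at all
    Returns a dict: terminal id -> frame to send.
    """
    if output_terminal not in members:
        raise ValueError("output-dedicated terminal must belong to the point")
    routed = {}
    for terminal in members:
        if terminal == output_terminal:
            routed[terminal] = list(mixed_frame)
        elif send_silence:
            routed[terminal] = [0] * len(mixed_frame)
        # Otherwise the terminal is omitted: no audio packet is transmitted to it.
    return routed
```

For example, `route_output([11, 13], 11, frame)` delivers the mixed frame to the first physical terminal 11 and a silent frame to the second physical terminal 13, while `send_silence=False` corresponds to the embodiment in which no audio packet is transmitted.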
[0137] Paring Echo Cancelling in the Logical Terminal (
[0138] As described above, since all the physical terminals participating in the videoconferencing configured by the server 110 are each a terminal for videoconferencing regardless of the configuration of the logical terminal, the source audio signals input to their microphones are not output through their own speakers.
[0139] In addition, the physical terminal participating in the videoconferencing configured by the server 110 may have an echo cancellation function regardless of the configuration of the logical terminal. However, in order to remove the echo from the input source audio signal, an audio signal (output audio signal) for reference is required. The output audio signal to be transmitted to the logical terminal is transmitted only to the output-dedicated physical terminal. Therefore, the videoconferencing terminal that belongs to the logical terminal but is not the output-dedicated physical terminal does not have the reference audio signal for performing the echo cancellation function.
[0140] In the example in
[0141] Conversely, on the side of the first logical terminal 130, the first physical terminal 11 and the second physical terminal 13 transmit, to the server 110, the source audio signals 11b and 13b received through their microphones 11-1 and 13-1, respectively. Herein, the first physical terminal 11 that is the output-dedicated physical terminal receives, from the server 110, the audio signal for output, and is thus capable of performing echo cancellation on the signal input through the microphone 11-1. However, the second physical terminal 13 is not the output-dedicated physical terminal and thus does not receive the output audio signal from the server 110 and does not have the reference signal for the echo cancellation.
[0142] Therefore, the second physical terminal 13 is not capable of performing echo cancellation on the source audio signal input through the microphone 13-1. Accordingly, the echo processing unit 119 of the videoconferencing server 110 of the present disclosure performs the echo cancellation function.
[0143] The echo processing unit 119 performs the echo cancellation function before mixing for the output audio signal to be provided to each videoconferencing point, and may perform basic noise cancellation when necessary. The echo cancellation of the present disclosure is completely different from the echo cancellation in the conventional general videoconferencing system or equipment. Hereinafter, the echo cancellation function which is the feature of the present disclosure is referred to as paring echo cancelling.
[0144] When the source audio received at the logical terminal side is not provided by the output-dedicated physical terminal, the echo processing unit 119 uses the output audio signal transmitted to the logical terminal so as to remove the echo. Hereinafter, the echo cancellation method of the videoconferencing server 110 will be described with reference to
[0145] First, at step S501, when the audio processing unit 117 receives the source audio signal from each of the physical terminals 11, 13, 15, 17, and 19 participating in the videoconferencing, the echo processing unit 119 determines, at steps S601 and S603, whether the source audio signal is provided by a physical terminal that belongs to the logical terminal but is not the output-dedicated physical terminal.
[0146] As the result of the determination at steps S601 and S603, when the source audio signal is provided by a physical terminal that belongs to the logical terminal but is not the output-dedicated physical terminal, the echo processing unit 119 performs the echo cancellation function on the basis of the output audio signal transmitted to the logical terminal. The echo cancellation algorithm of the echo processing unit 119 removes, from the input audio signal, the waveform that is the same as that of the output audio signal, and a commonly known echo cancellation algorithm may be used. In the example in
[0147] As the result of the determination at steps S601 and S603, when the audio signal is not transmitted from the logical terminal, or is provided by the output-dedicated physical terminal of the logical terminal, the echo processing unit 119 does not need to perform the echo cancellation function. This is because the output-dedicated physical terminal has its own echo cancellation function and removes the echo. As another method, as in step S603, through comparison with the output audio signal that has already been transmitted to the first physical terminal 11, the echo may be removed.
[0148] Using the above-described method, the paring echo cancelling of the present disclosure is performed.
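The determination and the server-side echo removal may be sketched as follows. The present disclosure states only that a commonly known echo cancellation algorithm may be used; the normalized least-mean-squares (NLMS) adaptive filter below is one such algorithm chosen for illustration and is not asserted to be the algorithm of the echo processing unit 119. All names, the tap count, the step size, and the synthetic test signal are assumptions.

```python
import random


def needs_server_echo_cancel(terminal, logical_terminals):
    """True when the source comes from a logical-terminal member that is not output-dedicated.

    logical_terminals: list of (members, output_dedicated_terminal) pairs.
    """
    for members, output_terminal in logical_terminals:
        if terminal in members:
            return terminal != output_terminal
    return False  # a plain physical terminal cancels its own echo


def nlms_echo_cancel(mic, reference, taps=8, mu=0.5, eps=1e-6):
    """Remove the echo of `reference` (output audio) from `mic` (source audio)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = mic_sample - echo_estimate  # echo-reduced sample
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Synthetic check: the microphone hears only a delayed, attenuated copy of the output audio.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * x for x in reference[:-2]]
cleaned = nlms_echo_cancel(mic, reference)
```

In the example topology, `needs_server_echo_cancel(13, [([11, 13], 11), ([15, 17], 17)])` is true, so the source audio signal of the second physical terminal 13 would be passed through the canceller with the output audio signal for the first point A as the reference before mixing.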
[0149] (Embodiment) Another Method For Audio Processing and Echo Cancellation in the Logical Terminal
[0150] In the example described above, the audio processing unit 117 provides the output audio signal only to the output-dedicated physical terminal, but no limitation thereto is imposed. For example, the same output audio signals may be provided to all the physical terminals constituting the logical terminal. However, only the output-dedicated physical terminal outputs the output audio signal, and the remaining physical terminal simply uses the output audio signal as a reference audio signal for echo cancellation.
[0151] The audio processing unit 117 transmits the same output audio signals to all the physical terminals constituting the logical terminal, but the audio signal in the RTP packet provided to the output-dedicated physical terminal is marked as for output, and the audio signal in the RTP packet provided to the remaining physical terminal is marked as for echo cancellation. In this case, echo cancellation is performed in each physical terminal, and thus the server 110 does not need to have the echo processing unit 119.
[0152] For example, in the example in
[0153] Accordingly, the first physical terminal 11 outputs the output audio signal through the speaker 11-2. The second physical terminal 13 retains the output audio signal provided from the server 110 without outputting the output audio signal through the speaker 13-2, and uses the output audio signal for removing the echo from the audio signal received through the microphone 13-1.
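This alternative embodiment may be sketched as follows. The present disclosure does not specify how the audio signal in the RTP packet is marked; the `purpose` field below is a hypothetical stand-in for that marking and does not correspond to any actual RTP header field. The class and function names are likewise assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    OUTPUT = "output"                  # play through the speaker
    ECHO_REFERENCE = "echo_reference"  # keep only as a reference for echo cancellation


@dataclass(frozen=True)
class MarkedAudio:
    terminal: int
    purpose: Purpose
    samples: tuple


def mark_for_members(members, output_terminal, mixed_frame):
    """Server side: send the same frame to every member, marked by its role."""
    return [
        MarkedAudio(
            terminal,
            Purpose.OUTPUT if terminal == output_terminal else Purpose.ECHO_REFERENCE,
            tuple(mixed_frame),
        )
        for terminal in members
    ]


def handle_on_terminal(packet):
    """Terminal side: return (frame to play or None, frame retained as echo reference)."""
    play = packet.samples if packet.purpose is Purpose.OUTPUT else None
    return play, packet.samples
```

Under these assumptions, the second physical terminal 13 receives a packet marked `ECHO_REFERENCE`, plays nothing, and retains the samples for removing the echo from the signal received through the microphone 13-1.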
[0154] Although the exemplary embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the aforesaid particular embodiments, and can be variously modified by those skilled in the art without departing from the gist of the present disclosure defined in the claims. The modifications should not be understood separately from the technical idea or perspective of the present disclosure.