REAL-TIME CALL TRANSLATION SYSTEM AND METHOD
20210312143 · 2021-10-07
Cpc classification
H04M3/42
ELECTRICITY
G06F40/58
PHYSICS
H04M2242/12
ELECTRICITY
G10L15/1815
PHYSICS
International classification
G06F40/58
PHYSICS
Abstract
A real-time call translation system and method is provided. The invention provides establishing a voice call between a user speaking a source language and another user understanding and speaking a different target language; and performing translation of the audio of the source user into audio in the target language, and translation of the audio of the target user back to audio in the source language during the call. Further, the invention provides interlacing of the audio of the source user, the target user and the translated audio; in which the listener first hears the original audio from the other participant and then the associated translated audio and the speaker synchronously also hears the translated audio. Further the interlacing provides participants a better understanding of the conversation and conversational flow. Further the method facilitates better translations and clearer transcription, as the audio streams are not overlapped, and further noise and interference are reduced in the audio streams.
Claims
1. A computer-implemented method of performing in-call translation through a communication interface, the method comprising: calling through a first device associated with a source user to a second device associated with a target user and establishing a call session, where the source user is speaking a source language and the target user is speaking a target language; selecting a target language of the target user to initiate translation of an audio of the source user during the call; performing translation of the audio of the source user into the selected target language; performing translation of an audio of the target user back to the language of the source user; analysing translated audio data of the call; interlacing the audio of the source user, the target user and the translated audio of the call; and transmitting the translated audio to the target user and playing back the translated audio to the source user.
2. The method of claim 1, wherein the in-call translation processing is executed on one or both devices, where the communication interface is executed on the first device associated with the source user and/or the second device associated with the target user, for the translation of the audio of the source user into the target language and the translation of the audio of the target user into the source language.
3. The method of claim 1, wherein the in-call translation is performed within the communications infrastructure, such as, but not limited to, a telephony network, an IP network, a cloud server or other connectivity.
4. The method of claim 1, wherein a voice command, a key button, a screen touch, a visual gesture or automatic language detection is used for, but not limited to, selecting the target language, pausing the call, repeating a sentence of the translated audio data, or terminating the in-call translation.
5. The method of claim 1, where the target user first hears the original untranslated audio as it is spoken and then hears the translated audio.
6. The method of claim 1, wherein the source user pauses after speaking to hear the translated audio of their utterance, synchronously or largely synchronously with the target user.
7. The method of claim 1, wherein further a context of conversations during the call is used in the analysis and adaptation of the Speech to Text (STT) process that increases confidence and improves accuracy of the translation.
8. The method of claim 1, wherein the interlacing of the source audio, the target audio and the translated audio allows the target user to understand and know that the translation is being performed and alerts the target user to wait for both the source audio and the translated audio to be heard.
9. The method of claim 1, wherein the interlacing coordinates and synchronises overlapping between the source audio, the target audio with the translated audio, and further noise and interference are reduced which provides for improved transcribing and recording to aid documentation of the call session, as used in, but not limited to, security, proof, verification, evidence purposes, analysis, and collection of data for training.
10. A computer-implemented in-call translation system, comprising: a memory; a processor; and a communication interface; where the processor is coupled to the memory, and the processor is configured with the communication interface to: establish a call from a first device associated with a source user to a second device associated with a target user, where the source user speaks a source language and the target user speaks a target language; select the target language to initiate translation of an audio of the source user during the call; perform the translation of the audio of the source user into the target language; analyse at least one part of the translated audio data; interlace the audio of the source user, the target user and the translated audio; and transmit the translated audio to the target user and simultaneously play back the translated audio to the source user.
11. The system of claim 10, wherein a device is any communications device, such as, but not limited to, Dial Phones, Mobile phones, Smartphones, Smart glasses, Tablets, Smart bands, Wearables or Human Augmentations.
12. The system of claim 10, wherein the in-call translation is executed on one-side or both-sides, where the communication interface is executed on either the first device associated with the source user and/or the second device associated with the target user, for the translation of the audio of the source user into the target language and the translation of the audio of the target user into the source language.
13. The system of claim 10, wherein the in-call translation is performed within the network communication infrastructure or a cloud server or connectivity.
14. The system of claim 10, where the target user first hears the original untranslated audio as it is spoken and then hears the translated audio.
15. The system of claim 10, wherein the source user pauses after speaking to hear the translated audio of their utterance, synchronously or largely synchronously with the target user.
16. The system of claim 10, wherein the interlacing and feedback of the source audio, the target audio and the translated audio allows the target user to understand that the translation is being performed and alerts the target user to wait for both the source audio and the translated audio to be heard.
17. The system of claim 10, wherein a context of conversations during the call is further analysed from a Speech to Text (STT) perspective to increase confidence and improve accuracy of the translation.
18. The system of claim 10, wherein the interlacing coordinates and synchronises overlapping between the source audio, the target audio with the translated audio, and further noise and interference are reduced, which provides transcribing and recording to aid documentation of the call session for, but not limited to, security, proof, verification, evidence purposes, analysis, and collection of data for training.
19. The system of claim 10, wherein the system further provides a valid service for translating audio of the call for users including, but not limited to, legal, banking, and medical users, where a third party is not allowed on the call for privacy reasons.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0037] The objects of the invention may be understood in more detail, and a more particular description of the invention briefly summarized above may be had, by reference to certain embodiments thereof which are illustrated in the appended drawings, which form a part of this specification. It is to be noted, however, that the appended drawings illustrate preferred embodiments of the invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective equivalent embodiments.
DETAILED DESCRIPTION OF THE INVENTION
[0048] The present invention will now be described by reference to more detailed embodiments. This invention may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
[0049] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for describing particular embodiments only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
[0050] The term “source user” as used herein refers to the user who starts the call, i.e. the caller or dialler.
[0051] The term “target user” as used herein refers to the user who receives the call, i.e. the receiver or recipient.
[0052] Further, in the present invention, when audio or a voice is converted from one language into another, the original language is referred to as the “source language” and the language produced is referred to as the “target language”. Alternatively, the language of the source user is the “source language” and the language of the target user is the “target language”.
[0053] As described herein with several embodiments, the present invention provides a real-time call translation system and method. Now referring to figures the present invention provides a call translation system 10 as illustrated in the
[0054] In other words, the system 10 includes the interface 20 to facilitate communication and translation on the communication devices 16, 18 associated with the users. In one embodiment, each communication device 16, 18 is a mobile phone (e.g. a Smartphone), a personal computer, a tablet, smart sunglasses, a smart band, or another embedded device. An application on the device includes the communication interface 20, through which the source user can make a call to a target user who is on a standard phone with no special capabilities.
[0055] As shown in
[0056] In some embodiments, the system 10 facilitates the call translation on both-sides, where the communication interface 20 is executed on the device 16, 18 of both the source user 12 and the target user 14.
[0057] In some embodiments, the system 10 facilitates the call translation on one-side, where the communication interface 20 is executed on the device 16 associated with the source user 12 for the translation of the audio of the source user 12 into the target language as shown in
[0058] In some embodiments, the system 10 facilitates the call translation in a group call or multi-participant conversation, where the communication interface 20 is executed on the communication device associated with each user for the translation into the target language. As shown in
[0059] In some embodiments as shown in
[0061] The communication device 16, 18, 24 may be, for example, a mobile phone (e.g. a Smartphone), a personal computer, a tablet, smart sunglasses, a smart band or another embedded device able to communicate over the network 36.
[0062] A control server 37 operates the interface 20 for performing translation during the call. The control server 37 is configured with the interface 20 for the communication along with the translation process. While the call may be a simple telephone call on one or both ends of a call between two or more parties, the descriptions hereinafter will reference an embodiment in which at least one end of the call is accomplished using VoIP.
[0063] The control server 37 may accommodate two-party or multi-party calls and may be scaled to accommodate any number of users. Multiple users may participate in a communication, as in a telephone conference call conducted simultaneously in multiple languages.
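In a multi-party call conducted in several languages, each utterance only needs to be translated into the languages of the other participants that differ from the speaker's own. The following is a minimal sketch of that selection, assuming each participant is registered with a single language code; the function name `languages_needed` and the participant names are hypothetical and do not come from the disclosure.

```python
def languages_needed(participants: dict[str, str], speaker: str) -> list[str]:
    """Return the distinct languages an utterance must be translated into.

    `participants` maps a participant name to a language code (assumed
    representation). Listeners who share the speaker's language need no
    translation, so their language is left out.
    """
    source_language = participants[speaker]
    return sorted(
        {
            language
            for name, language in participants.items()
            if name != speaker and language != source_language
        }
    )


# Hypothetical four-party conference call.
conference = {"alice": "en", "bo": "zh", "carla": "es", "dan": "en"}
targets_for_alice = languages_needed(conference, "alice")
```

Here `targets_for_alice` contains one entry per foreign language rather than one per listener, so a call with many listeners who share a language still requires only one translation per utterance for that language.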
[0064] Turning now to
[0065] In a preferred embodiment, the invention provides an interface 20 for: establishing a call from the first communication device 16 associated with the source user to the second communication device 18 associated with the target user, where the source user speaks a source language and the target user speaks a target language; requesting selection of the target language to initiate the translation of the audio of the source user in the call, by a voice command, a key button press, a screen touch or a visual gesture on the communication interface 20; performing the translation of the audio of the source user into the target language; analyzing at least part of the translated audio call data; interlacing the audio of the source user, the target user and the translated audio; and transmitting the translated audio to the target user while simultaneously playing the translated audio back to the source user.
[0066] As shown in
[0067] As discussed above, the translation engine 42 includes a speech recognition unit 54 that can accept speech, perform Speech to Text (STT) conversion, then perform text translation from the source language to the target language, and then perform Text to Speech conversion. In some embodiments, context-based Speech to Text (STT) and context-based translation improve the translation while giving possible alternative sentences. As shown in
[0068] As discussed herein, in some embodiments, the translation engine 42 is configured with the speech recognition unit 54, which performs a speech recognition procedure on the source audio. The speech recognition procedure is configured for recognizing the source language. Specifically, the speech recognition procedure detects particular patterns in the call audio which it matches to known speech patterns of the source language in order to generate an alternative representation of that speech. On the request of the source user, the system performs translation of the source language into the target language. The translation is performed ‘substantially live’, e.g. per sentence (or few sentences), per detected segment, on a pause, or per word (or few words). In one embodiment, the translated audio is not only sent to the target user but also played back to the source user. In a normal call the source audio is not played back to the speaker, because it would be heard as an echo and confuse the speaker. In this case, however, it is the translated audio that is played back to the source user.
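The chain described above (speech recognition, text translation, then speech synthesis, applied segment by segment, with the translated audio delivered to both parties) might be sketched as follows. This is a minimal illustration and not the disclosed implementation: the `stt`, `translate` and `tts` callables are hypothetical placeholders for whatever engines an embodiment uses, and `TranslatedSegment`, `translate_segments` and the toy stand-ins are names introduced here only so that the sketch runs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class TranslatedSegment:
    source_text: str      # text recognised from the speaker's audio
    target_text: str      # the same text after translation
    target_audio: bytes   # synthesised speech in the target language
    recipients: tuple     # parties that hear the translated audio


def translate_segments(
    segments: Iterable[bytes],
    stt: Callable[[bytes], str],
    translate: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Iterator[TranslatedSegment]:
    """Translate one detected segment (e.g. a sentence) at a time."""
    for audio in segments:
        text = stt(audio).strip()
        if not text:
            continue  # silence or noise: nothing to translate
        translated = translate(text)
        # The translated audio is sent to the target user and is also
        # played back to the source user.
        yield TranslatedSegment(
            text, translated, tts(translated), ("target", "source")
        )


# Toy stand-ins (assumptions) so the sketch runs without a speech engine:
# "audio" is UTF-8 text and translation is a tiny phrase table.
_PHRASES = {"hello": "bonjour", "thank you": "merci"}


def toy_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_translate(text: str) -> str:
    return _PHRASES.get(text.lower(), text)


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


results = list(
    translate_segments([b"hello", b"  ", b"thank you"], toy_stt, toy_translate, toy_tts)
)
```

Because `translate_segments` is a generator, each segment can be delivered as soon as it has been processed, which matches the per-sentence or per-segment granularity described above.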
[0069] Further, in another embodiment, the present invention provides monitoring of the translation that allows the user to pause and wait for a response from the translation process.
[0070] In another embodiment, the present invention provides interlacing of the source audio, target audio and translated audio, which allows the target user to understand that a translation process is under way and that they should wait until both the source audio and the translated audio have been played. In an exemplary embodiment, audio cues, such as beep tones, are activated using the voice command or key button, which make the users aware of the gap and the coordination between the source audio and the translated audio.
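One way to picture the interlacing is as a playback timeline in which the original audio, an optional cue tone and the translated audio occupy consecutive, non-overlapping slots. The sketch below is illustrative only and rests on several assumptions: durations are known in whole milliseconds, a cue tone of fixed length is always inserted, and the names `Utterance`, `Slot`, `CUE_MS` and `interlace` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str         # "source" or "target"
    original_ms: int     # length of the spoken audio
    translated_ms: int   # length of the synthesised translation


@dataclass(frozen=True)
class Slot:
    start_ms: int
    end_ms: int
    kind: str            # "original", "cue" or "translated"
    speaker: str
    listeners: tuple     # parties that hear this slot


CUE_MS = 200  # assumed length of the beep between original and translation


def interlace(utterances, parties=("source", "target")):
    """Lay utterances out back to back so that no two slots overlap."""
    timeline = []
    clock = 0
    for utterance in utterances:
        others = tuple(p for p in parties if p != utterance.speaker)
        plan = (
            # The listener first hears the original, untranslated audio.
            ("original", utterance.original_ms, others),
            # A cue marks the gap before the translation starts.
            ("cue", CUE_MS, tuple(parties)),
            # Both the listener and the speaker hear the translation.
            ("translated", utterance.translated_ms, tuple(parties)),
        )
        for kind, length, listeners in plan:
            timeline.append(
                Slot(clock, clock + length, kind, utterance.speaker, listeners)
            )
            clock += length
    return timeline


timeline = interlace(
    [Utterance("source", 2000, 2500), Utterance("target", 1000, 1200)]
)
```

Each slot starts exactly where the previous one ends, so the next speaker's slot cannot begin until the translation of the previous utterance has finished playing.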
[0071] In another embodiment of the present invention, the translation assistance can be turned on during the call (i.e. does not need to be turned on prior to making a call).
[0072] In another embodiment, the source user initiates the call and can subsequently turn on the translation through a voice command, a key button feature or smart triggers, or set the function to automatically detect and translate to the target language. The user can provide commands for selecting a language for the translation, for pausing the call, for repeating a sentence, etc. For example: “Polyglottel™, please pause the call for 10 seconds”; “Polyglottel™, please translate audio into Chinese language”.
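Commands of the kind given in the examples above could be recognised from the transcribed utterance with a few patterns. The following is a minimal sketch under stated assumptions: the wake word is taken from the examples, while the exact command grammar, the pattern list `_PATTERNS`, the action names and the function `parse_command` are hypothetical and are not defined by the disclosure.

```python
import re
from typing import Optional

WAKE_WORD = "polyglottel"

# Assumed command grammar: one (action, pattern) pair per supported command.
_PATTERNS = [
    ("pause", re.compile(r"pause the call for (?P<seconds>\d+) seconds?")),
    ("translate", re.compile(r"translate (?:the )?audio (?:in)?to (?P<language>[a-z]+)")),
    ("repeat", re.compile(r"repeat (?:the )?(?:last )?sentence")),
    ("stop", re.compile(r"stop (?:the )?translation")),
]


def parse_command(utterance: str) -> Optional[dict]:
    """Return a command dictionary, or None if the utterance is ordinary speech."""
    text = utterance.lower().replace("™", "").strip()
    if not text.startswith(WAKE_WORD):
        return None  # no wake word: treat as speech to be translated
    rest = text[len(WAKE_WORD):]
    for action, pattern in _PATTERNS:
        match = pattern.search(rest)
        if match:
            command = {"action": action}
            for key, value in match.groupdict().items():
                # Numeric arguments become integers, others stay as text.
                command[key] = int(value) if value.isdigit() else value
            return command
    return None  # wake word heard, but no known command followed
```

Requiring the wake word at the start of the utterance keeps ordinary conversation from being mistaken for a command, at the cost of ignoring commands that are spoken mid-sentence.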
[0073] Further, in another embodiment, the original audio of the source user is sent to the target user and vice-versa.
[0074] In another embodiment, the system 10 provides an ability to change the sound levels of both the source audio and the translated audio. This is done through the interface 20 (Graphical user interface—GUI) of the App on the device or through voice commands during the call. For example, it provides an interactive interface for increasing or decreasing the sound of the source audio and the translated audio as per the user's convenience.
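Because the source audio and the translated audio are kept as separate streams, each can be scaled by its own gain before playback. The sketch below assumes 16-bit signed PCM samples held as Python integers; the names `VolumeSettings` and `apply_gain` and the example gain values are hypothetical.

```python
from dataclasses import dataclass

PCM_MIN, PCM_MAX = -32768, 32767  # range of a 16-bit signed sample


@dataclass
class VolumeSettings:
    original_gain: float = 1.0    # level of the untranslated audio
    translated_gain: float = 1.0  # level of the translated audio


def apply_gain(samples: list[int], gain: float) -> list[int]:
    """Scale samples by `gain`, clipping to the 16-bit range."""
    if gain < 0:
        raise ValueError("gain must be non-negative")
    scaled = (round(sample * gain) for sample in samples)
    return [max(PCM_MIN, min(PCM_MAX, sample)) for sample in scaled]


# Example: the user turns the original audio down and the translation up.
settings = VolumeSettings(original_gain=0.5, translated_gain=2.0)
quiet_original = apply_gain([1000, -2000, 30000], settings.original_gain)
loud_translation = apply_gain([30000, -30000], settings.translated_gain)
```

Clipping keeps a boosted stream within the valid sample range; a production system would more likely use a limiter or automatic gain control to avoid audible distortion.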
[0075] The invention provides a high-quality audio stream; that is, the source audio and the translated audio are not mixed together in the audio stream, as they are in prior art methods.
[0076] Unlike other voice apps, this system allows both the source user and the target user to hear the translation of their own audio input. This has the benefit of keeping the rhythm of natural speech within the context of the dialogue.
[0077] A method of facilitating communication and translation in real-time between users during an audio or video call will be described herein with reference to
[0078] In another embodiment, the method of facilitating communication and translation in real-time between users is described herein with various steps. The method includes: at step 71, opening a communication interface 20 which is executed on a communication device; at step 72, calling through the communication interface 20 on a first communication device associated with a source user to a second communication device associated with a target user to establish a call session, where the source user speaks a source language and the target user speaks a target language; at step 73, selecting the target language to initiate translation of the audio of the source user in the call through an interactive voice command, a key button, a screen touch or a visual gesture on the interface; at step 74, performing translation of the audio of the source user into the target language; at step 75, interlacing the audio of the source user, the target user and the translated audio during the call; at step 76, transmitting the translated audio to the target user and playing the translated audio back to the source user; and at step 77, transcribing and recording to aid documentation of calls for purposes including, but not limited to, security, proof, verification, evidence, analysis and collection of data for training.
[0079] In some embodiments, the interlacing function allows a pause recognition sound to be inserted so that the source user and the target user can recognize the start and end of the translation and/or of the output by both users.
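A pause recognition sound of this kind could be a short sine tone generated on the device. The sketch below is illustrative only: the frequency, duration, sample rate and amplitude are assumed defaults, the output is a list of 16-bit PCM sample values, and the function name `cue_tone` is hypothetical.

```python
import math


def cue_tone(
    frequency_hz: float = 880.0,
    duration_ms: int = 150,
    sample_rate: int = 8000,
    amplitude: float = 0.3,
) -> list[int]:
    """Generate a sine beep as 16-bit PCM sample values."""
    sample_count = sample_rate * duration_ms // 1000
    peak = amplitude * 32767  # fraction of full scale
    return [
        round(peak * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        for i in range(sample_count)
    ]


# Two different pitches could mark the start and the end of a translation.
start_cue = cue_tone(frequency_hz=880.0)
end_cue = cue_tone(frequency_hz=660.0)
```

Using distinct pitches for the start and end cues is one possible way to let both users tell the two events apart by ear; the disclosure does not specify the tones.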
[0080] As shown in
[0081] As one advantage, the present invention provides a call terminal (communication interface 20) for real-time translation of the original voice during the call, and the translated voice is sent to the users, giving a stronger sense of reality together with high accuracy and quality.
[0082] As a further advantage, the translation is performed by the interface on the communication device of the source user, so the system 10 does not require any additional equipment or process. As long as the caller's side (source user) is equipped with the call terminal, the receiver (target user) can use a regular conversation terminal, for example when the source user is speaking to a bank representative, doctor or legal person.
[0083] As another advantage, the invention provides interlacing of the audio of the source user, the target user, and the translated audio during the call, which is beneficial for communication in which a third-party human translator is not allowed, for example when speaking to a bank representative, doctor or legal person.
[0084] As another advantage, the present invention provides interlacing of the audio for clear transcription of the conversation to text. Because the audio of the source user and the target user is interlaced, the audio streams do not overlap, so noise and interference are reduced, which allows for better translation and transcription.
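The non-overlap property is what makes a simple, ordered transcript possible. The following sketch checks a set of timed segments for overlap and, if there is none, renders them as a transcript. It is an illustration under assumptions: segment times are in whole milliseconds, and the names `Segment`, `find_overlaps` and `format_transcript` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start_ms: int
    end_ms: int
    label: str   # e.g. "source", "target" or "translation"
    text: str


def find_overlaps(segments):
    """Return (earlier, later) pairs where `later` starts before `earlier` ends."""
    ordered = sorted(segments, key=lambda s: s.start_ms)
    overlaps = []
    latest = None  # the segment with the furthest end time seen so far
    for segment in ordered:
        if latest is not None and segment.start_ms < latest.end_ms:
            overlaps.append((latest, segment))
        if latest is None or segment.end_ms > latest.end_ms:
            latest = segment
    return overlaps


def format_transcript(segments) -> str:
    """Render non-overlapping segments as one line per segment, in time order."""
    if find_overlaps(segments):
        raise ValueError("segments overlap; audio was not interlaced")
    ordered = sorted(segments, key=lambda s: s.start_ms)
    return "\n".join(f"[{s.start_ms} ms] {s.label}: {s.text}" for s in ordered)


hello = Segment(0, 1000, "source", "Hello")
bonjour = Segment(1200, 2200, "translation", "Bonjour")
transcript = format_transcript([bonjour, hello])
```

Segments that merely touch (one ends at the instant the next begins) are not treated as overlapping, which is the case an interlaced call is expected to produce.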
[0085] As one more advantage, the present invention provides call translation on the target user's side. The target user may provide this as a valid service for translating the audio of calls from users, for example when users talk to a bank, a doctor or a legal person and confidential information cannot be shared with third-party human translators.
[0086] As another advantage, the present invention provides transcribing and recording of the audio of the users and the translated audio to aid documentation of calls for security purposes, to meet the legal and security requirements of, but not limited to, financial, medical, government and military applications.
[0087] As another advantage, the present invention provides better audio translation, and the users are aware that an automated translation is taking place.
[0088] As another advantage, the present invention provides translation during the call, where the translation is further based on the context of the conversation, which improves the accuracy of the translation.
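The disclosure does not specify how conversational context is applied. One simple possibility, offered here purely as an assumption, is to re-rank the alternative sentences returned by the speech recognizer, giving a small bonus to alternatives that contain words already seen in the conversation. The function name `pick_hypothesis`, the `weight` parameter and the example values are all hypothetical.

```python
def pick_hypothesis(hypotheses, context_words, weight=0.1):
    """Choose the recognition alternative that best fits the context.

    `hypotheses` is a list of (text, confidence) pairs; `context_words`
    is a set of lower-case words drawn from the conversation so far.
    """
    def score(item):
        text, confidence = item
        hits = sum(1 for word in text.lower().split() if word in context_words)
        return confidence + weight * hits

    return max(hypotheses, key=score)[0]


# Two acoustically similar alternatives from a banking call.
alternatives = [
    ("i need to check my ballads", 0.60),
    ("i need to check my balance", 0.55),
]
banking_context = {"bank", "account", "balance"}
chosen = pick_hypothesis(alternatives, banking_context)
```

Without context the recognizer's own confidence decides the outcome, so the sketch degrades to ordinary best-hypothesis selection when `context_words` is empty.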
[0089] In system implementations of the described technology, the application interface 20 is capable of executing a program to perform the translation, and the interface 20 is connected with a network 36, a control server 37 and a computer system capable of executing a computer program to perform the translation. Further, data and program files may be input to the computer system, which reads the files and executes the programs therein. Some of the elements of a general-purpose computer system are a processor having an input/output (I/O) section, a Central Processing Unit (CPU), a translation program, and a memory.
[0090] The described technology is optionally implemented in software devices loaded in memory, stored in a database, and/or communicated via a wired or wireless network link, thereby transforming the computer system into a special purpose machine for implementing the described operations.
[0091] The embodiments of the invention described herein are implemented as logical steps in one or more computer systems. The implementation is a matter of choice, dependent on the performance requirements of the computer system implementing the invention. Accordingly, the logical operations making up the embodiments of the invention described herein are referred to variously as operations, steps, objects, or modules. Furthermore, it should be understood that logical operations may be performed in any order, unless explicitly claimed otherwise or a specific order is inherently necessitated by the claim language.
[0092] The foregoing description of embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiments were chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.