SYSTEM AND METHOD FOR PROCESSING SPEECH FILES
20180322186 · 2018-11-08
Abstract
A system and method for speech file processing which provides users with differentially selectable speech file transcripts which can be sent to one or more other users. The speech files may be voicemail messages from which respective voicemail transcripts are created. The voicemail transcripts are provided in a user selectable format from which users may select non-contiguous portions of the transcript.
Claims
1. A method comprising: displaying, at a graphical user interface, a transcript of an audio file; receiving, via the graphical user interface, a selection of a portion of the transcript of the audio file; identifying the portion of the transcript selected based on a transcript index which indexes text in the transcript relative to an occurrence of the text in the audio file; and transmitting the portion of the transcript selected to a particular recipient.
2. The method of claim 1, further comprising: receiving a second selection of a second portion of the transcript associated with the audio file, wherein the portion and the second portion are not contiguous; and transmitting the second selection of the second portion of the transcript to the particular recipient.
3. The method of claim 1, wherein information associated with the audio file is used to identify the particular recipient.
4. The method of claim 3, wherein the information comprises a name associated with the audio file, a message summary, a telephone number associated with the audio file, a date associated with the audio file, a size of the audio file, or any combination thereof.
5. The method of claim 1, further comprising: receiving an indication for including the audio file with the portion of the transcript transmitted to the particular recipient; and transmitting the indication for including the audio file to a server for delivery of the audio file to the particular recipient with the portion of the transcript.
6. The method of claim 5, wherein the audio file comprises only a second portion of the audio file that corresponds to the portion of the transcript.
7. The method of claim 1, wherein the transmitting uses an electronic mail message, and wherein the electronic mail message comprises an attachment of the audio file corresponding to the portion of the transcript.
8. A communication device, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, comprising: displaying, at a graphical user interface, a transcript of an audio file; receiving, via the graphical user interface, a selection of a portion of the transcript of the audio file; identifying the portion of the transcript selected based on a transcript index which indexes text in the transcript relative to an occurrence of the text in the audio file; and transmitting the portion of the transcript selected to a particular recipient.
9. The communication device of claim 8, wherein the memory stores additional executable instructions that, when executed by the processor, facilitate performance of operations further comprising: receiving a second selection of a second portion of the transcript associated with the audio file, wherein the portion and the second portion are not contiguous; and transmitting the second selection of the second portion of the transcript to the particular recipient.
10. The communication device of claim 8, wherein information associated with the audio file is used to identify the particular recipient.
11. The communication device of claim 10, wherein the information comprises a name associated with the audio file, a message summary, a telephone number associated with the audio file, a date associated with the audio file, a size of the audio file, or any combination thereof.
12. The communication device of claim 8, wherein the memory stores additional executable instructions that, when executed by the processor, facilitate performance of operations further comprising: receiving an indication for including the audio file with the portion of the transcript transmitted to the particular recipient; and transmitting the indication for including the audio file to a server for delivery of the audio file to the particular recipient with the portion of the transcript.
13. The communication device of claim 12, wherein the audio file comprises only a second portion of the audio file that corresponds to the portion of the transcript.
14. The communication device of claim 8, wherein the transmitting uses an electronic mail message, and wherein the electronic mail message comprises an attachment of the audio file corresponding to the portion of the transcript.
15. A non-transitory machine-readable storage medium comprising executable instructions that, when executed by a processor, facilitate performance of operations comprising: displaying, at a graphical user interface, a transcript of an audio file; receiving, via the graphical user interface, a selection of a portion of the transcript of the audio file; identifying the portion of the transcript selected based on a transcript index which indexes text in the transcript relative to an occurrence of the text in the audio file; and transmitting the portion of the transcript selected to a particular recipient.
16. The non-transitory machine-readable storage medium of claim 15, further comprising: receiving a second selection of a second portion of the transcript associated with the audio file, wherein the portion and the second portion are not contiguous; and transmitting the second selection of the second portion of the transcript to the particular recipient.
17. The non-transitory machine-readable storage medium of claim 15, wherein information associated with the audio file is used to identify the particular recipient.
18. The non-transitory machine-readable storage medium of claim 17, wherein the information comprises a name associated with the audio file, a message summary, a telephone number associated with the audio file, a date associated with the audio file, a size of the audio file, or any combination thereof.
19. The non-transitory machine-readable storage medium of claim 15, wherein the non-transitory machine-readable storage medium stores additional executable instructions that, when executed by the processor, facilitate performance of operations further comprising: receiving an indication for including the audio file with the portion of the transcript transmitted to the particular recipient; and transmitting the indication for including the audio file to a server for delivery of the audio file to the particular recipient with the portion of the transcript.
20. The non-transitory machine-readable storage medium of claim 19, wherein the audio file comprises only a second portion of the audio file that corresponds to the portion of the transcript.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0008]
[0009]
[0010]
[0011]
[0012]
DETAILED DESCRIPTION OF THE DRAWINGS
[0013] Referring to
[0014] Referring again to
[0015] In this embodiment, voice mail server 20 includes at least one database 40, for storing, for example, voice mail message files and voice mail message transcripts as discussed in more detail later herein, as well as the operating programs for the particular voice mail server served by database 40. Database 40 may be any type or combination of types of storage media such as magnetic, optical, optical-magnetic, etc. so long as the storage facility has sufficient capacity to store a plurality of voice mail messages from a plurality of subscribers.
[0016] In one embodiment, voice mail server 20 is preferably a computer system that essentially functions as a central answering machine for subscribers to the voice mail system. It is understood that the present invention can be utilized in or adapted to a variety of voice mail servers or similar equipment.
[0017] Voice mail server 20 is also connected via respective trunk lines, not shown, to a communications network 70, which is illustrated in
[0018] In the present embodiment, voice mail server 20 is in communication with a message server 90, such as an electronic mail message server, for use in delivering messages, such as certain selections of voicemail transcripts and corresponding audio, to one or more entities. As discussed in more detail later herein, voice mail server 20 processes speech files, which in this case are voicemail messages, to produce one or more voicemail transcripts. Users are then provided the opportunity to select one or more portions of a voicemail transcript. The one or more selected portions are provided to one or more identified recipients via message server 90.
[0019] Referring to
[0020] Referring to
[0021] Referring to
[0022] In an exemplary embodiment, automatic speech recognition, or simply speech-to-text, techniques are used to derive text from speech, i.e., to identify the letters or words spoken by a human subject in one or more speech files, such as voicemail messages. In the present invention, automatic speech recognition is used to analyze the speech signals contained in a speech file, such as a voicemail message, to produce a textual transcript of the speech signals in the voicemail message. In an exemplary embodiment, such speech recognition techniques may use a combination of pattern recognition and sophisticated guessing based on some linguistic and contextual knowledge to transcribe the speech files. It is contemplated that other methodologies and techniques may be used so long as the speech is properly transcribed into a textual format to produce a workable transcript from which a user may select one or more portions to send or forward on to one or more other parties or entities.
[0023] In the present invention, transcribing of the voicemails by automatic speech recognition is preferably performed automatically, for example, as soon as a voicemail message is left for a user. Alternatively, transcribing may be performed periodically as determined by the user or by system defaults. In one embodiment of the present invention, automatic speech recognition is performed in conjunction with or immediately subsequent to the recording of the voice or speech signals as voicemail messages. For example, transcribing may be performed as someone is leaving a voicemail message by transmitting the voice signals to the respective voicemail server for processing. Alternatively, transcribing may be performed immediately after the voicemail is saved on the voicemail server by having the voicemail server first transmit the stored voicemail message to the speech recognition component of the voicemail server and then using automatic speech recognition to transcribe the voicemail.
[0024] Alternatively, the system may wait until a certain predetermined number of voicemails are stored for a certain user on the voicemail server before processing the voicemails. Once the predetermined number of voicemails is attained, processing of the voicemail messages may be performed on the group of voicemails by the speech recognition component. For example, the system may be configured to transcribe voicemail messages after at least two messages are left in a user's mailbox. As a further alternative, transcribing of the voicemails may be performed only after a user has actively selected for transcribing to be performed on the voicemails. For example, the user may be provided in the system with a menu selection or selection key which, when pressed or selected, would initiate transcribing of their voicemails. The user may also be provided with the choice of having specific voicemails of their choosing processed by the system. In this instance, some users may prefer to listen to some of their voicemails in the conventional manner while having other voicemails, such as relatively longer voicemails, transcribed and indexed by the system. It is contemplated that the system may provide the user with the choice of having his/her voicemails processed by the system. In one embodiment, the user may be charged a certain fee for voicemail processing or, alternatively, the voicemail processing may be offered as a free value-added service.
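By way of illustration only, the alternative transcription triggers described above (immediately upon storage, after a predetermined number of messages has accumulated, or upon active user selection) might be sketched as follows. The names Trigger, Mailbox and messages_to_transcribe are hypothetical and do not appear in the specification; the default threshold of two merely mirrors the example given above.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """Hypothetical labels for the transcription triggers described above."""
    IMMEDIATE = "immediate"   # transcribe as soon as a message is stored
    THRESHOLD = "threshold"   # wait until a set number of messages accumulate
    ON_DEMAND = "on_demand"   # transcribe only messages the user has selected


@dataclass
class Mailbox:
    pending: list[str]        # ids of stored messages not yet transcribed
    requested: set[str]       # ids the user explicitly chose to transcribe


def messages_to_transcribe(box: Mailbox, trigger: Trigger, threshold: int = 2) -> list[str]:
    """Return the ids of the messages that should be transcribed now."""
    if trigger is Trigger.IMMEDIATE:
        return list(box.pending)
    if trigger is Trigger.THRESHOLD:
        # Process the whole group once the predetermined count is reached.
        return list(box.pending) if len(box.pending) >= threshold else []
    # ON_DEMAND: only the messages the user picked, in mailbox order.
    return [m for m in box.pending if m in box.requested]
```

In this sketch a mailbox holding a single message yields nothing under the threshold trigger, while the on-demand trigger returns only the user-selected messages regardless of how many are stored.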
[0025] Once the voicemails have been transcribed, the text of the voicemail message(s) may be indexed using full text indexing/retrieval techniques as known in the art. Once a user selects a portion or portions of a speech file transcript as described earlier herein, those selected portions are used in conjunction with the transcript index, such as the one shown in
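By way of illustration only, a transcript index that relates text in the transcript to its occurrence in the audio file might be sketched as follows. The layout assumed here (one entry per word, with start and end offsets in seconds) and the names IndexEntry and resolve_selection are hypothetical; the specification does not prescribe a particular index structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One indexed word and where it occurs in the audio (assumed layout)."""
    word: str
    start_s: float  # offset of the start of the word, in seconds
    end_s: float    # offset of the end of the word, in seconds


def resolve_selection(index, spans):
    """Map selected word ranges to transcript text and audio time ranges.

    `spans` is a list of (first, last) word positions, inclusive. The spans
    need not be contiguous. Returns one (text, start_s, end_s) tuple per
    non-empty span, ordered by position in the transcript.
    """
    resolved = []
    for first, last in sorted(spans):
        entries = index[first:last + 1]
        if not entries:
            continue  # selection falls outside the transcript
        text = " ".join(e.word for e in entries)
        resolved.append((text, entries[0].start_s, entries[-1].end_s))
    return resolved
```

Given such an index, two non-contiguous selections resolve to two separate text portions, each with the time range of the corresponding audio, which is the information needed to send only the matching portion of the audio file along with the selected text.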
[0026] In another embodiment of the present invention, a sound or audio file of the voicemail message is also provided to the one or more users. In one embodiment, the sound or audio file may be provided as an attachment to the electronic mail message. The sound or audio file may be provided as an MPEG-x Audio Layer-x (mpx) file such as an mp3 file, a .WAV file, a streaming audio file or other similar file format.
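By way of illustration only, an electronic mail message carrying the selected transcript portions, with the audio optionally attached, might be assembled with the Python standard library as sketched below. The function name build_forward_message, the subject line, the default file name, and the choice of the audio/wav content type are assumptions for the example and are not taken from the specification.

```python
from email.message import EmailMessage


def build_forward_message(sender, recipient, portions,
                          audio_bytes=None, audio_name="voicemail.wav"):
    """Build an e-mail holding the selected transcript portions.

    `portions` is a list of selected transcript strings. If `audio_bytes`
    is given, it is attached as a WAV file (assumed format for this sketch).
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Voicemail transcript excerpt"  # hypothetical subject
    # Separate non-contiguous portions with a blank line in the body.
    msg.set_content("\n\n".join(portions))
    if audio_bytes is not None:
        msg.add_attachment(audio_bytes, maintype="audio", subtype="wav",
                           filename=audio_name)
    return msg
```

The resulting message object could then be handed to a message server for delivery. Sending is omitted here because it depends on the particular mail infrastructure in use.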
[0027] It will be apparent to those skilled in the art that many changes and substitutions can be made to the system and method described herein without departing from the spirit and scope of the invention as defined by the appended claims.