Abstract
A speech redaction engine includes a natural language processing (NLP)-based content redaction module that receives an automatic speech recognition (ASR) decoding of a decoded portion of a digitized speech signal and utilizes NLP techniques to determine whether said decoded portion contains sensitive information that should be redacted, and an ASR confidence-based redaction module that receives a confidence indicator and utilizes said confidence indicator to determine, independent of said NLP-based content redaction module, whether said decoded portion contains one or more word(s) that were recognized with a confidence level that is below a threshold. The speech redaction engine further includes means for redacting said decoded portion if the NLP-based content redaction module determines that said portion should be redacted, and means for redacting the one or more word(s) if the ASR confidence-based redaction module determines that the one or more word(s) have the confidence level that is below the threshold.
Claims
1. A method comprising: receiving an automatic speech recognition (ASR) decoding of a decoded portion of a digitized speech signal; determining, utilizing natural language processing (NLP), whether said decoded portion of said digitized speech signal contains sensitive information that should be redacted; receiving a confidence score and utilizing said confidence score to determine, independent of said NLP-based content redaction determination, whether said decoded portion of said digitized speech signal contains one or more word(s) within said digitized speech signal that were recognized with a confidence level that is below a threshold; redacting said decoded portion of said digitized speech signal if the NLP-based content redaction determination determines that said portion should be redacted; and redacting the one or more word(s) within said digitized speech signal if the ASR confidence-based redaction determination determines that the one or more word(s) have the confidence level that is below the threshold.
2. The method of claim 1 further comprising encrypting and storing an encrypted version of said digitized speech signal prior to redaction.
3. The method of claim 1 further comprising storing a redacted version of said digitized speech signal and an encrypted version of the decoded portion of said digitized speech signal that is unredacted.
4. The method of claim 1, wherein the ASR confidence score is derived, at least in part, from normalized likelihood scores.
5. The method of claim 1, wherein the ASR confidence score is computed, at least in part, using an N-best homogeneity analysis.
6. The method of claim 1, wherein the ASR confidence score is computed, at least in part, based on an acoustic stability analysis.
7. The method of claim 1, wherein the ASR confidence score is computed, at least in part, based on a word graph hypothesis density analysis.
8. The method of claim 1, wherein the ASR confidence score is derived, at least in part, based on state, phoneme, or word durations.
9. The method of claim 1, wherein the ASR confidence score is derived, at least in part, from language model (LM) scores or LM back-off behaviors.
10. The method of claim 1, wherein the ASR confidence score is computed, at least in part, using a posterior probability analysis.
11. The method of claim 1, wherein the ASR confidence score is computed, at least in part, using a log-likelihood-ratio analysis.
12. The method of claim 1, wherein the ASR confidence score is computed, at least in part, using a neural net that includes word identity and aggregated words as predictors.
13. The method of claim 1, wherein identifying whether the decoded portion contains sensitive information is based, at least, on Personally Identifiable Information (PII).
14. The method of claim 1, wherein identifying whether the decoded portion contains sensitive information is based, at least, on Nonpublic Personal Information (NPI).
15. The method of claim 1, wherein identifying whether the decoded portion contains sensitive information is based, at least, on Personal Health Information (PHI).
16. The method of claim 1, wherein identifying whether the decoded portion contains sensitive information is based, at least, on Sensitive Personal Information (SPI) or Personal Credit Information (PCI).
17. The method of claim 1 further comprising de-identifying a voice of a speaker in the digitized speech signal.
18. The method of claim 1 further comprising normalizing an accent associated with the digitized speech signal.
19. The method of claim 1 further comprising de-identifying the redacted digitized speech signal.
20. The method of claim 1, wherein said digitized speech signal is determined to contain sensitive information based on a redaction strictness factor that, depending upon its setting, affects the likelihood of a given ASR decoding being identified as containing sensitive information.
Description
BRIEF DESCRIPTION OF THE FIGURES
(1) Aspects, features, and advantages of the present invention, and its numerous exemplary embodiments, can be further appreciated with reference to the accompanying set of figures, in which:
(2) FIG. 1 is a high-level block diagram, showing a Front End, Processing Tier, and Storage Tier;
(3) FIG. 2 depicts an exemplary embodiment in which the Front End, Processing Tier, and Storage Tier are all provisioned in one or more Data Processing Cloud(s);
(4) FIG. 3 depicts an exemplary embodiment in which the Processing and Storage Tiers are provisioned in Data Processing Cloud(s), and the Front End is provisioned On Premises;
(5) FIG. 4 depicts an exemplary embodiment in which the Front End is provisioned in a Data Processing Cloud, and the Processing and Storage Tiers are provisioned On Premises;
(6) FIG. 5 depicts an exemplary embodiment in which the Processing Tier is provisioned in a Data Processing Cloud, and the Front End and Storage Tier are provisioned On Premises;
(7) FIG. 6 depicts an exemplary embodiment in which the Front End and Storage Tier are provisioned in Data Processing Cloud(s), and the Processing Tier is provisioned On Premises;
(8) FIG. 7 depicts an exemplary embodiment in which the Storage Tier is provisioned in a Data Processing Cloud, and the Front End and Processing Tier are provisioned On Premises;
(9) FIG. 8 depicts an exemplary embodiment in which the Front End and Processing Tier are provisioned in Data Processing Cloud(s), and the Storage Tier is provisioned On Premises;
(10) FIG. 9 depicts an exemplary embodiment in which the Front End, Processing Tier, and Storage Tier are provisioned On Premises;
(11) FIG. 10 depicts an exemplary embodiment in which the Front End, Processing Tier, and Storage Tier are provisioned partly in Data Processing Cloud(s) and partly On Premises;
(12) FIG. 11 depicts an exemplary embodiment in which the Front End and Processing Tier are provisioned partly in Data Processing Cloud(s) and partly On Premises, and the Storage Tier is provisioned On Premises;
(13) FIG. 12 depicts an exemplary embodiment in which the Front End and Processing Tier are provisioned partly in Data Processing Cloud(s) and partly On Premises, and the Storage Tier is provisioned in a Data Processing Cloud;
(14) FIG. 13 depicts an exemplary embodiment in which the Front End and Storage Tier are provisioned partly in Data Processing Cloud(s) and partly On Premises, and the Processing Tier is provisioned On Premises;
(15) FIG. 14 depicts an exemplary embodiment in which the Front End and Storage Tier are provisioned partly in Data Processing Cloud(s) and partly On Premises, and the Processing Tier is provisioned in a Data Processing Cloud;
(16) FIG. 15 depicts an exemplary embodiment in which the Processing and Storage Tiers are provisioned partly in Data Processing Cloud(s) and partly On Premises, and the Front End is provisioned On Premises;
(17) FIG. 16 depicts an exemplary embodiment in which the Processing and Storage Tiers are provisioned partly in Data Processing Cloud(s) and partly On Premises, and the Front End is provisioned in a Data Processing Cloud;
(18) FIG. 17 depicts an exemplary embodiment in which the Storage Tier is provisioned partly in a Data Processing Cloud and partly On Premises, and the Front End and Processing Tier are provisioned On Premises;
(19) FIG. 18 depicts an exemplary embodiment in which the Storage Tier is provisioned partly in a Data Processing Cloud and partly On Premises, the Front End is provisioned On Premises, and the Processing Tier is provisioned in a Data Processing Cloud;
(20) FIG. 19 depicts an exemplary embodiment in which the Storage Tier is provisioned partly in a Data Processing Cloud and partly On Premises, the Processing Tier is provisioned On Premises, and the Front End is provisioned in a Data Processing Cloud;
(21) FIG. 20 depicts an exemplary embodiment in which the Storage Tier is provisioned partly in a Data Processing Cloud and partly On Premises, and the Front End and Processing Tier are provisioned in Data Processing Cloud(s);
(22) FIG. 21 depicts an exemplary embodiment in which the Processing Tier is provisioned partly in a Data Processing Cloud and partly On Premises, and the Front End and Storage Tier are provisioned On Premises;
(23) FIG. 22 depicts an exemplary embodiment in which the Processing Tier is provisioned partly in a Data Processing Cloud and partly On Premises, the Front End is provisioned On Premises, and the Storage Tier is provisioned in a Data Processing Cloud;
(24) FIG. 23 depicts an exemplary embodiment in which the Processing Tier is provisioned partly in a Data Processing Cloud and partly On Premises, the Storage Tier is provisioned On Premises, and the Front End is provisioned in a Data Processing Cloud;
(25) FIG. 24 depicts an exemplary embodiment in which the Processing Tier is provisioned partly in a Data Processing Cloud and partly On Premises, and the Front End and Storage Tier are provisioned in Data Processing Cloud(s);
(26) FIG. 25 depicts an exemplary embodiment in which the Front End is provisioned partly in a Data Processing Cloud and partly On Premises, and the Processing and Storage Tiers are provisioned On Premises;
(27) FIG. 26 depicts an exemplary embodiment in which the Front End is provisioned partly in a Data Processing Cloud and partly On Premises, the Processing Tier is provisioned On Premises, and the Storage Tier is provisioned in a Data Processing Cloud;
(28) FIG. 27 depicts an exemplary embodiment in which the Front End is provisioned partly in a Data Processing Cloud and partly On Premises, the Storage Tier is provisioned On Premises, and the Processing Tier is provisioned in a Data Processing Cloud;
(29) FIG. 28 depicts an exemplary embodiment in which the Front End is provisioned partly in a Data Processing Cloud and partly On Premises, and the Processing and Storage Tiers are provisioned in Data Processing Cloud(s);
(30) FIGS. 29-31 show three examples of possible data flows between the Front End, Processing Tier, and Storage Tier, in accordance with the various exemplary embodiments herein;
(31) FIG. 32 depicts a stored call database, in accordance with certain exemplary embodiments herein;
(32) FIG. 33 depicts a block diagram of an ASR engine, in accordance with certain exemplary embodiments herein;
(33) FIG. 34 depicts a block diagram of an audio pre-processor, in accordance with certain exemplary embodiments herein;
(34) FIG. 35 depicts the operation of an audio format converter (or transcoder), in accordance with certain exemplary embodiments herein;
(35) FIG. 36 depicts the operation of a voice activity detector (VAD), in accordance with certain exemplary embodiments herein;
(36) FIG. 37 depicts the operation of a word/utterance separator, in accordance with certain exemplary embodiments herein;
(37) FIG. 38 depicts the operation of an ASR engine, in accordance with certain exemplary embodiments herein;
(38) FIG. 39 depicts the operation of an emotion/sentiment module, in accordance with certain exemplary embodiments herein;
(39) FIG. 40 depicts the operation of a gender identification module, in accordance with certain exemplary embodiments herein;
(40) FIG. 41 depicts the operation of an age identification module, in accordance with certain exemplary embodiments herein;
(41) FIG. 42 depicts the operation of a speaker identification module, in accordance with certain exemplary embodiments herein;
(42) FIG. 43 depicts the operation of an accent identification module, in accordance with certain exemplary embodiments herein;
(43) FIG. 44 depicts a structured data record, in accordance with certain exemplary embodiments herein;
(44) FIG. 45 contains a block diagram of a speech redaction engine, in accordance with certain exemplary embodiments herein;
(45) FIG. 46 depicts a first enhanced redaction flow, in accordance with one of the exemplary embodiments herein;
(46) FIG. 47 depicts a second enhanced redaction flow, in accordance with another of the exemplary embodiments herein;
(47) FIG. 48 depicts a third enhanced redaction flow, in accordance with another of the exemplary embodiments herein;
(48) FIG. 49 depicts a first exemplary assignment of the enhanced redaction flow elements (of FIGS. 46-48) between the Front End, Processing, and Storage Tiers, in accordance with certain exemplary embodiments herein;
(49) FIG. 50 depicts a second exemplary assignment of the enhanced redaction flow elements between the Front End, Processing, and Storage Tiers, in accordance with certain exemplary embodiments herein;
(50) FIG. 51 depicts a third exemplary assignment of the enhanced redaction flow elements between the Front End, Processing, and Storage Tiers, in accordance with certain exemplary embodiments herein;
(51) FIG. 52 depicts a fourth exemplary assignment of the enhanced redaction flow elements between the Front End, Processing, and Storage Tiers, in accordance with certain exemplary embodiments herein;
(52) FIG. 53 depicts a fifth exemplary assignment of the enhanced redaction flow elements between the Front End, Processing, and Storage Tiers, in accordance with certain exemplary embodiments herein; and,
(53) FIG. 54 depicts a sixth exemplary assignment of the enhanced redaction flow elements between the Front End, Processing, and Storage Tiers, in accordance with certain exemplary embodiments herein.
FURTHER DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
(54) Referring to FIGS. 10-28, the depicted data link(s) are each preferably provisioned as one or more secure data tunnels, using, for example, a secure shell (SSH) protocol. See, e.g., the SSH protocol specification (incorporated by reference herein). Indeed, such SSH-provisioned data link(s) may be used to support any, or all, of the data communication links between functional or structural components of the various embodiments herein.
(55) Reference is now made to FIGS. 29-31, which show three examples of possible data flows between the Front End, Processing Tier, and Storage Tier, in accordance with the various exemplary embodiments herein. While the arrows depict the directionality of “data” transfers (such as audio data, text, meta-data, etc.), it will be understood that additional signaling and control flows may simultaneously operate in other directions or between other components. For example, in FIG. 29, data flow from the Front End to the Storage Tier is indicated as one way; however, those skilled in the art will appreciate that there will likely be some signaling channel or pathway that, for example, permits the Storage Tier to signal its readiness to receive data to the Front End.
(56) FIG. 29 depicts an exemplary embodiment in which data is collected by or originates in the Front End, is then “sent” or “routed”—by, for example, a push protocol (see, e.g., https://en.wikipedia.org/wiki/Push_technology, incorporated by reference herein) or a pull/get protocol (see, e.g., https://en.wikipedia.org/wiki/Pull_technology, incorporated by reference herein)—to the Storage Tier. Data is then sent from the Storage Tier to the Processing Tier for processing, with the processed data sent back to the Storage Tier for storage/archiving. This embodiment also permits data that already exists in the Storage Tier, or is sent there through other network connections, to be routed to the Processing Tier for processing and sent back for storage/archiving.
(57) FIG. 30 depicts an exemplary embodiment in which data is collected by or originates in the Front End, is then sent directly to the Processing Tier for processing, and then sent to the Storage Tier for storage/archiving. Such direct data transfer from the Front End to the Processing Tier reduces latency, which is important in the case of systems that have “real time” monitoring or alerting aspects. This embodiment also permits data that already exists in the Storage Tier, or is sent there through other network connections, to be routed to the Processing Tier for processing and sent back for storage/archiving. Additionally, though not depicted in FIG. 30, “real time” systems may interact directly with the Processing Tier to receive processed data without the additional latency associated with the Storage Tier.
(58) FIG. 31 depicts a “hybrid” embodiment, in which data is collected by or originates in the Front End, some or all of which may be then sent directly to the Processing Tier for processing, then sent to the Storage Tier for storage/archiving, and some or all of which may also be sent to the Storage Tier, from which it is then sent to the Processing Tier for processing, with the processed data sent back to the Storage Tier for storage/archiving. This permits use of the direct data routing approach for “real time” audio feeds, and lower cost “batch mode” processing for other data feeds, which can be processed during time(s) when power and cloud resources are cheaper, for example.
(59) Referring now to FIGS. 46-48, which depict exemplary flow diagrams for improved automatic redaction flows that employ ASR confidence to avoid or minimize the effect of poor recognition on redaction completeness, additional detail regarding the depicted elements is as follows:
(60) Receive audio: Audio may be received or obtained from any source, whether a “live” feed (such as CTI, VOIP tap, PBX) or a recorded source (such as on-prem storage, cloud storage, or a combination thereof). Thus, the depicted Source Audio may be a disk, or a provisioned portion of a data processing cloud (configured as a cloud PBX), for example.
(61) Transcode: Audio transcoding is the direct digital-to-digital conversion of one audio encoding format to another (e.g., MP3 to WAV). This is an optional step. Typically, all audio is transcoded to a single, least-compressed format (such as WAV) prior to ingestion into an ASR engine. Numerous audio transcoders are available free of charge. See https://en.wikipedia.org/wiki/List_of_audio_conversion_software (incorporated by reference herein).
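By way of a non-limiting illustration, the following Python sketch shows one way to wrap raw PCM samples in a WAV container in memory; a full MP3-to-WAV conversion would typically invoke a dedicated transcoder such as those listed at the link above. The function name and default parameters are hypothetical.

```python
import io
import wave

def pcm_to_wav(pcm_bytes, sample_rate=8000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM samples in a WAV container,
    returning the encoded bytes (an in-memory 'transcode')."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)        # mono by default
        w.setsampwidth(sample_width)    # bytes per sample (2 = 16-bit)
        w.setframerate(sample_rate)     # samples per second
        w.writeframes(pcm_bytes)
    return buf.getvalue()
```

The resulting bytes begin with the standard RIFF/WAVE header, making the audio directly ingestible by tools that expect WAV input.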
(62) VAD: Voice activity detection is another optional step. Its main function is to eliminate dead space, to improve utilization efficiency of more compute-intensive resources, such as the ASR engine, or of storage resources. VAD algorithms are well known in the art. See https://en.wikipedia.org/wiki/Voice_activity_detection (incorporated by reference herein).
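A minimal, purely illustrative energy-based VAD may be sketched as follows; the frame length and energy threshold are hypothetical values, and production VADs typically use more robust spectral features.

```python
def energy_vad(samples, frame_len=160, threshold=500.0):
    """Flag each fixed-length frame of audio samples as speech (True)
    or silence (False) based on its root-mean-square energy."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = (sum(s * s for s in frame) / frame_len) ** 0.5
        flags.append(rms >= threshold)
    return flags
```

Frames flagged False (dead space) can be dropped before the audio reaches the more compute-intensive ASR stage.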
(63) Segregate: Segregation of the speech input into words or utterances (preferred) is performed as an initial step to ASR decoding. Though depicted as a distinct step, it may be performed as part of the VAD or ASR processes.
(64) Confidence: Confidence may be determined either by the ASR engine (as per FIG. 46) or using a separate confidence classifier (as per FIGS. 47 and 48). As shown in FIG. 47, the confidence classifier may operate from the same input stream as the ASR, or may utilize both the input and output of the ASR in its computation, as per FIG. 48.
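As one non-limiting illustration of the N-best homogeneity analysis recited in claim 5, the following toy sketch scores each word of the top hypothesis by the fraction of N-best hypotheses that agree with it at that position; the function name and simple positional voting scheme are illustrative assumptions.

```python
def nbest_word_confidence(nbest):
    """Given an N-best list (each hypothesis a list of words), return a
    per-word homogeneity score for the top hypothesis: the fraction of
    hypotheses agreeing on each word position."""
    top = nbest[0]
    scores = []
    for i, word in enumerate(top):
        votes = sum(1 for hyp in nbest if i < len(hyp) and hyp[i] == word)
        scores.append(votes / len(nbest))
    return scores
```

A word on which all hypotheses agree scores 1.0; positions where the decoder wavered score lower and become candidates for confidence-based redaction.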
(65) Sensitive information: Identifying sensitive information may utilize simple text queries (such as simple number searches) or more sophisticated NLP and/or language classification techniques. As previously discussed, numerous techniques are known and available for identifying redaction targets in unstructured text.
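By way of illustration, a simple text-query approach might flag numeric redaction targets with regular expressions, as in the following sketch; the patterns and labels are hypothetical examples, and NLP/NER classifiers would be used for more sophisticated detection.

```python
import re

# Hypothetical patterns for common numeric redaction targets.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_sensitive_spans(text):
    """Return (label, start, end) tuples, one per match of any
    sensitive pattern, sorted by position in the text."""
    spans = []
    for label, pat in PATTERNS.items():
        for m in pat.finditer(text):
            spans.append((label, m.start(), m.end()))
    return sorted(spans, key=lambda s: s[1])
```

The returned character spans can be mapped back to word timings in the ASR output to determine which audio regions to flag for redaction.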
(66) Low ASR confidence: If ASR confidence dips below a “threshold” value, then a word, phrase, or utterance that would ordinarily be passed unredacted will nevertheless be flagged for redaction. In some embodiments, the “threshold” is preset; whereas in other embodiments, it may vary dynamically, based for example on the moving average of confidence values being seen by the redaction module.
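The dynamically varying threshold variant may be illustrated, purely by way of example, as follows; the window size, fraction, and floor values are hypothetical.

```python
from collections import deque

class DynamicThreshold:
    """Flag words whose confidence falls below a fraction of the moving
    average of recently observed confidence values, with a fixed floor."""

    def __init__(self, window=50, fraction=0.6, floor=0.5):
        self.recent = deque(maxlen=window)  # sliding window of scores
        self.fraction = fraction
        self.floor = floor

    def should_redact(self, confidence):
        # Before any history accumulates, fall back to the fixed floor.
        avg = (sum(self.recent) / len(self.recent)) if self.recent else self.floor
        threshold = max(self.floor, self.fraction * avg)
        self.recent.append(confidence)
        return confidence < threshold
```

In a stretch of uniformly noisy audio the moving average drops, so the threshold adapts rather than flagging every word for redaction.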
(67) Prepare redacted audio: This process involves removing (or masking with noise or a voice-over token, e.g., “name”, “health condition”) those portions of the source audio that have been flagged for redaction. Preparing redacted audio preferably involves the creation of a structured data record (e.g., FIG. 44) that would include associated meta-data, as well as unredacted records encoded with appropriate levels of encryption. In some embodiments, additional processing of the source audio may be employed to eliminate all personally identifiable characteristics by, for example, using voice de-identification processing and/or accent normalization, as previously discussed.
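Purely as an illustrative sketch, masking flagged sample ranges with silence might be implemented as follows; a production system might instead substitute noise or a voice-over token, as described above.

```python
def redact_samples(samples, flagged_spans, mask_value=0):
    """Return a copy of the audio samples with each flagged
    (start, end) sample range overwritten by mask_value (silence)."""
    out = list(samples)
    for start, end in flagged_spans:
        for i in range(start, min(end, len(out))):
            out[i] = mask_value
    return out
```

The flagged spans would be derived from the word timings of regions identified by the NLP-based and confidence-based redaction determinations.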
(68) Redacted audio: The redacted audio may be stored in any means of permanent or temporary data storage (e.g., disk or cloud), or may be immediately passed on for ingestion into an analytics or real-time alerting system.
(69) FIGS. 49-54 depict how these functions/modules can be assigned amongst the Front End, Processing Tier, and Storage Tier, in accordance with the invention herein. These, however, are merely illustrative, and not meant to be in any way limiting. Furthermore, FIGS. 2-28 illustrate how the Front End, Processing Tier, and Storage Tier functions may be provisioned in cloud(s), on premises, or in embodiments that bridge the on-prem/cloud boundary with secure data links.