SPEECH TO TEXT CONVERSION OF NON-SUPPORTED TECHNICAL LANGUAGE

Abstract

The invention relates to a computer-implemented method for converting speech to text. The method comprises: receipt (102) of a speech signal (206), which contains general language terms and technical language terms; input (104) of the received speech signal into a speech-to-text conversion system (226), which only supports the conversion of speech signals into a target vocabulary (234) which does not contain the technical language terms; receipt (106) of a text (208), which was generated by the speech-to-text conversion system from the speech signal; generation (108) of a corrected text (210) by automatically replacing terms and expressions from the target vocabulary in the received text with technical language terms according to an assignment table (238), which assigns at least one term or one expression from the target vocabulary, incorrectly recognized by the speech-to-text conversion system, to each of a plurality of technical language terms; and output (110) of the corrected text to the user or to software and/or a hardware component for executing a function.

Claims

1. A computer-implemented method for converting speech to text, including: receipt (102) by an end device (212) of a speech signal (206) of a user (202), wherein the speech signal contains general language terms and technical language terms spoken by the user; input (104) of the received speech signal into a speech-to-text conversion system (226), wherein the speech-to-text conversion system only supports the conversion of speech signals into a target vocabulary (234) which does not contain the technical language terms; receipt (106) from the speech-to-text conversion system of a text (208), which was generated by the speech-to-text conversion system from the speech signal; generation (108) of a corrected text (210) by automatically replacing terms and expressions from the target vocabulary in the received text with technical language terms according to an assignment table (238) of terms in text form, wherein the assignment table assigns at least one term from the target vocabulary to each of a plurality of technical language terms, wherein the at least one term of the target vocabulary, assigned to one technical language term, is a term or an expression, which the speech-to-text conversion system incorrectly recognizes when this technical language term is entered in the form of an audio signal; and output (110) of the corrected text to the user and/or to software (528/240) and/or to a hardware component (506-516, 240), wherein the software or hardware component is configured to execute a function according to information in the corrected text.

2. The computer-implemented method according to claim 1, wherein the generation of the corrected text is carried out by a correction system, wherein the correction system is the end device (212) or a correction computer system (314, 402) operatively connected to the end device via a network.

3. The computer-implemented method according to one of the preceding claims, wherein the target vocabulary comprises a quantity of general language terms; or wherein the target vocabulary comprises a quantity of general language terms and terms derived therefrom; or wherein the target vocabulary comprises a quantity of general language terms, supplemented by terms derived therefrom and/or supplemented by terms which are formed by combinations of recognized syllables.

4. The computer-implemented method according to one of the preceding claims, wherein the technical language terms are terms from one of the following categories: names of chemical substances, especially paints and lacquers or additives in the paint and lacquer sector; physical, chemical, mechanical, optical, or haptic properties of chemical substances; names of laboratory devices and equipment in the chemical industry; names of laboratory consumables and laboratory supplies; trade names in the paint and lacquer sector.

5. The computer-implemented method according to one of the preceding claims, further comprising: receipt or calculation of frequency information, wherein the frequency information for at least some of the terms in the text, which was generated by the speech-to-text conversion system from the speech signal, indicates how often the occurrence of this term is to be statistically expected; wherein, during the generation of the corrected text, only those terms of the target vocabulary in the received text, whose statistically-expected frequency of occurrence lies below a predefined threshold value according to the received frequency information, are replaced by technical language terms according to the assignment table.

6. The computer-implemented method according to claim 5, wherein the calculation of the frequency information is carried out by means of a hidden Markov model.

7. The computer-implemented method according to one of the preceding claims, further comprising: receipt of part-of-speech tags—POS tags—for at least some of the terms in the text, which were generated by the speech-to-text conversion system from the speech signal, wherein the POS tags contain at least tags for noun, adjective, and verb; wherein the technical language terms of the assignment table are stored together with the part-of-speech tags of the technical language terms; wherein, during the generation of the corrected text, only those terms of the target vocabulary in the received text are replaced by technical language terms, whose POS tags match, according to the assignment table.

8. The computer-implemented method according to one of the preceding claims, further comprising: for each of a plurality of technical language terms, recording of at least one reference speech signal, which selectively reproduces this technical language term, by at least one speaker; input of each of the reference speech signals into the speech-to-text conversion system; for each of the entered reference speech signals, receipt from the speech-to-text conversion system of at least one term of the target vocabulary, which was generated by the speech-to-text conversion system from the entered reference speech signal, wherein each of the received terms of the target vocabulary represents an incorrect conversion, since the target vocabulary of the speech-to-text conversion system does not support the technical language terms; wherein the assignment table assigns the at least one term of the target vocabulary in text form, which was respectively generated by the speech-to-text conversion system from the reference speech signal containing this technical language term, to each of the technical language terms and expressions, for which at least one reference speech signal was recorded.

9. The computer-implemented method according to claim 8: wherein multiple reference speech signals are respectively spoken and recorded by different speakers for at least some of the technical language terms, wherein the multiple reference speech signals reproduce this technical language term; wherein the assignment table assigns multiple terms of the target vocabulary in text form to each of the at least some of the technical language terms, wherein the multiple terms of the target vocabulary represent incorrect conversions, which the speech-to-text conversion system generated for the different speakers depending on their voices.

10. The computer-implemented method according to one of the preceding claims, wherein the output of the corrected text to the user is carried out and comprises: display of the corrected text on a screen (218) of the end device; and/or output of the corrected text via a text-to-speech interface and a speaker of the end device.

11. The computer-implemented method according to one of the preceding claims, wherein the output of the corrected text is carried out to the software, wherein the software is selected from a group comprising: a chemical substance database, which is designed to interpret the corrected text as a search input and to determine and return information related to the search input in the database; and/or an internet search engine, which is designed to interpret the corrected text as a search input and to determine and return information from the internet related to the search input; and/or simulation software, which is designed to simulate properties of chemical products, in particular of lacquers and paints, based on a predetermined recipe, wherein the simulation software is designed to interpret the corrected text as a specification of a recipe of a product, whose properties are to be simulated; control software for controlling chemical syntheses and/or the generation of substance mixtures, in particular of paints and lacquers, wherein the control software is designed to interpret the corrected text as a specification of the synthesis or of the components of the substance mixture.

12. The computer-implemented method according to one of the preceding claims, further comprising: output of a result of executing the function by the software or hardware component via a speaker or a screen of the end device.

13. The computer-implemented method according to one of the preceding claims, wherein the output of the corrected text is carried out to the hardware component, wherein the hardware component is a system for carrying out chemical analyses, chemical syntheses, and/or for generating substance mixtures, in particular of paints and lacquers, wherein the system is designed to additionally interpret the corrected text as a specification of the synthesis or of the components of the substance mixture or as a specification of the analysis.

14. The computer-implemented method according to one of the preceding claims, wherein the speech-to-text conversion system is implemented as a service which is provided via the internet to a plurality of end devices; and/or wherein the end device is a desktop computer, notebook computer, smartphone, a computer integrated into a laboratory device, a computer coupled locally to a laboratory device, or a single-board computer (Raspberry Pi).

15. An end device (212), comprising: a microphone (214) for receiving a speech signal (206) of a user, wherein the speech signal contains general language terms and technical language terms spoken by the user; an interface (224) to a speech-to-text conversion system (226), wherein the interface is designed to input the received speech signal into the speech-to-text conversion system, wherein the speech-to-text conversion system only supports the conversion of speech signals into a target vocabulary (234) which does not contain the technical language terms; and wherein the interface is designed to receive a text (208), which was generated by the speech-to-text conversion system from the speech signal; a data memory (220) with an assignment table (238) of terms in text form, wherein the assignment table assigns at least one term from the target vocabulary to each of a plurality of technical language terms, wherein the at least one term of the target vocabulary assigned to a technical language term is a term or an expression, which the speech-to-text conversion system incorrectly recognizes when this technical language term is entered in the form of an audio signal; and a correction program (222), which is designed to generate a corrected text (210) by automatically replacing terms and expressions of the target vocabulary in the received text with technical language terms according to the assignment table; and an output interface (218) to output (110) the corrected text to the user and/or to software (528/240) and/or to a hardware component (506-516, 240), wherein the software or hardware component is configured to execute a function according to information in the corrected text.

16. A system including one or more end devices (212) according to claim 15, further comprising a speech-to-text conversion system (226), wherein the speech-to-text conversion system includes: an interface (224′) for receiving speech signals (206) from each of the one or more end devices; an automatic speech recognition processor (232) for generating text (208) from a received speech signal (206), wherein the speech recognition processor only supports the conversion of speech signals into a target vocabulary (234), which does not include the technical language terms; and wherein the interface is designed to return the text (208), generated from the received speech signal, to that end device, from which the speech signal was received.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0098] Embodiments of the invention are explained in greater detail by way of example with reference to the following figures:

[0099] FIG. 1 shows a flow chart of a method for the speech-to-text conversion of texts with technical language terms;

[0100] FIG. 2 shows a block diagram of a distributed system for the speech-to-text conversion of texts with technical language terms;

[0101] FIG. 3 shows a block diagram of another distributed system for the speech-to-text conversion;

[0102] FIG. 4 shows a block diagram of another distributed system for the speech-to-text conversion; and

[0103] FIG. 5 shows a block diagram of another distributed system for the speech-to-text conversion in the context of a laboratory.

DETAILED DESCRIPTION OF THE INVENTION

[0104] FIG. 1 shows a flow chart of a computer-implemented method for the speech-to-text conversion of texts with technical language terms. The particular advantage of the method is that an existing speech-to-text conversion system may be used for the recognition and conversion of texts with technical terms, even if this conversion system does not support the technical language vocabulary. The method may be executed by an end device alone, or by an end device and additional data processing devices, for example, a control computer and/or a computer which provides a correction service via a network. Some possible architectures of distributed and non-distributed data processing systems, which may implement a method according to embodiments of the invention, are depicted in FIGS. 2, 3, and 4. The following description of the flow chart in FIG. 1 also refers in part to these figures.

[0105] The method may typically be used in the context of a chemical or biological laboratory. A series of individual analysis devices and a high throughput environment system (HTE system) are located in the laboratory. The HTE system includes a plurality of units and modules, which may analyze and measure different chemical or physical parameters of substances and substance mixtures, and which may combine and synthesize a plurality of different chemical products based on a recipe entered by a user. In addition, an end device, for example, a notebook computer of the laboratory worker with corresponding software in the form of a browser plugin, is located in the laboratory. The HTE system includes an internal database, in which recipes are stored, for example, of paints and lacquers and their raw materials, and also their respective physical, chemical, optical, and other properties. In addition, other relevant data may be stored in the database, for example, product data sheets from the producers of the substances, safety data sheets, parameters for the configuration of individual modules of the HTE system for the analysis or synthesis of certain substances or products, or the like. The HTE system is designed to execute analyses and syntheses based on recipes and instructions, which are entered in text form.

[0106] Frequent activities inside a laboratory, here with the laboratory room number 22, include, for example, the following, each with a possible related speech input by which laboratory worker 202 prompts software or hardware to execute an operation:

[0107] The day before, the laboratory worker started an analysis of a certain lacquer with respect to its rheological properties, and would now like to retrieve the result stored in the database of the HTE system. Possible speech input: “CONTROL COMPUTER, show me the results of the rheological analysis on Feb. 24, 2019, by the HTE system in room 22.”

[0108] The laboratory worker would like to reduce costs and considers replacing a certain solvent «SOLVENT_EXPENSIVE» with a less expensive solvent «SOLVENT_INEXPENSIVE». The name «SOLVENT_INEXPENSIVE» is a trade name of the manufacturer. However, the worker is not certain whether the less expensive solvent is suitable for the lacquer to be produced, and would like to view the product data sheet, in which additional information regarding the chemical and physical properties of the inexpensive solvent is specified. Possible speech input: “CONTROL COMPUTER, display the product data sheet for «SOLVENT_INEXPENSIVE»” or “CONTROL COMPUTER, display the product data sheet for «SOLVENT_INEXPENSIVE» stored in the HTE database of room 22”.

[0109] After viewing the product data sheet for the solvent «SOLVENT_INEXPENSIVE», the laboratory worker is of the opinion that the solvent may potentially be used for the production of the certain lacquer instead of the more expensive solvent. However, it is assumed that the recipe must be adapted somewhat, since multiple parameters, for example, pH value, rheological properties, polarity, and others deviate from those of the more expensive solvent. Since these properties interact with one another, it is not possible to manually identify the necessary adjustments to the recipe. Carrying out test series is labor intensive and costs time. However, the laboratory has software, which may predict (simulate) the properties of a chemical product, for example of paints and lacquers, on the basis of a certain recipe. The simulation may be based on, e.g., CNNs (convolutional neural networks). The laboratory worker would like to use this simulation software in order to simulate the predicted properties of a lacquer, based on a known recipe, in which the expensive solvent was replaced by the inexpensive one. Possible speech input: “CONTROL COMPUTER, prompt the HTE simulation software to calculate the properties of a lacquer with the following recipe: 70.2 g naphthenic oil, 4 g methyl n-amyl ketone, 1.5 g n-pentyl propionate, 1 g Ultrasorb, 50 g «SOLVENT_INEXPENSIVE»”.

[0110] The simulation has shown that the inexpensive solvent is not suited for the production of the lacquer. The laboratory worker would now like to search the internet for other solvents, which may replace the expensive solvent without degrading the quality of the product, in order to reduce costs. Possible speech input: “CONTROL COMPUTER, search the internet for «high viscosity solvents for lacquer production»”.

[0111] According to embodiments of the invention, all of these inputs and commands to the respective execution systems may be carried out without the user having to leave the laboratory room and/or remove gloves.

[0112] In a first step 102, laboratory worker 202 makes a speech input 204 into a microphone 214 of end device 212, 312. For example, the speech input may comprise one of the above-mentioned voice commands. The speech inputs generally include both general language and technical language terms and expressions. Thus, for example, the terms or expressions “rheological”, “naphthenic oil”, “methyl n-amyl ketone”, and “n-pentyl propionate” are chemical technical terms, and «SOLVENT_INEXPENSIVE» is a trade name of a chemical product. These terms or expressions are typically not included in the vocabulary (“target vocabulary”) supported by the commonly used, general language speech-to-text conversion systems.

[0113] Microphone 214 converts the speech input into an electronic speech signal 206. This speech signal is then input into a speech-to-text conversion system 226 in step 104.

[0114] For example, as shown in FIG. 2, the end device may have an interface 224 and a client application 222 corresponding to one of the known general language speech-to-text conversion systems 226 from, for example, Google, Apple, Amazon, or Nuance. This client application 222 transmits the speech signal via interface 224 directly to speech-to-text conversion system 226. However, in other embodiments, it is also possible that the speech signal is transmitted to speech-to-text conversion system 226 via one or more intermediary data processing devices. According to the embodiments of the invention depicted in FIGS. 3 and 4, the speech signal is initially transmitted to a control computer 314, 414, which then forwards it to speech-to-text conversion system 226 via a network 236. This network may be, for example, the internet.

[0115] Control computer system 314, 414 executes coordination and control activities related to the management and processing of the speech signal and the text generated from the same. Control computer 314 is a data processing system which executes the text correction itself. Control computer 414 has outsourced this computing step to another data processing system.

[0116] Speech-to-text conversion system 226 is a general language conversion system, i.e., it only supports the conversion of speech signals into a general language target vocabulary 234, which does not contain the technical language terms of speech input 204.

[0117] The speech-to-text conversion system now carries out the conversion of the speech signal into a text based on the target vocabulary. Typically, speech-to-text conversion system 226 is a cloud service, which may process a plurality of speech signals of multiple end devices in parallel and may return the texts generated from them to the respective end devices via the network. However, regardless of how the speech-to-text conversion system is implemented, the generated text certainly, or with a high degree of probability, contains incorrectly recognized terms and expressions, since at least some of the terms and expressions of speech input 204 are technical language terms or expressions, whereas the conversion system only supports the target vocabulary, which does not contain the technical language terms and expressions.

[0118] In step 106, that data processing system, which transmitted speech signal 206 to speech-to-text conversion system 226, receives, as a response thereto, text 208, generated by the speech-to-text conversion system from this signal. The data processing system functioning as the receiver (“receiving system”) may thus be, depending on the system architecture, the end device, or a control computer 314, as shown in FIG. 3, or a control computer 414, as shown in FIG. 4.

[0119] In another step 108, an assignment table 238 is used in order to correct the received text. The data processing system, which carries out the text correction, is also designated according to its function in this case as the “correction system”. This may be, depending on the embodiment, end device 212, or control computer system 314 or a correction computer system 402. In the case that the receiving system and the correction system are not identical, text 208, received by the receiving system, is forwarded to the correction computer system.

[0120] In assignment table 238, terms are assigned to one another in text form. Stated more precisely, the assignment table assigns at least one term from the target vocabulary to each of a plurality of technical language terms or technical language expressions. The at least one term of the target vocabulary, assigned to a technical language term (or technical language expression), is a term or an expression, which the speech-to-text conversion system incorrectly recognizes (and has incorrectly recognized earlier during the generation of the assignment table), when this technical language term is input into the speech-to-text conversion system in the form of an audio signal.
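The generation of such an assignment table may be illustrated by the following minimal Python sketch. It is not the implementation of the embodiments: the function names `build_assignment_table` and `fake_transcribe`, the audio placeholders, and the misrecognized outputs are hypothetical and chosen only for illustration. The sketch assumes that a reference speech signal is available for each technical language term and that the conversion system can be called as a function that returns text.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple


def build_assignment_table(
    reference_signals: Iterable[Tuple[str, bytes]],
    transcribe: Callable[[bytes], str],
) -> Dict[str, Set[str]]:
    """Map each technical term to the misrecognitions observed for it."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for technical_term, audio in reference_signals:
        recognized = transcribe(audio).strip().lower()
        # Assumption of this sketch: only output that differs from the
        # technical term is stored as a misrecognition.
        if recognized and recognized != technical_term.lower():
            table[technical_term].add(recognized)
    return dict(table)


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for the general language conversion system."""
    canned = {
        b"speaker1-rheological": "real logical",
        b"speaker2-rheological": "theological",
        b"speaker1-ketone": "methyl an ample key tone",
    }
    return canned[audio]


# Two speakers recorded the same term, so two misrecognitions are stored.
recordings = [
    ("rheological", b"speaker1-rheological"),
    ("rheological", b"speaker2-rheological"),
    ("methyl n-amyl ketone", b"speaker1-ketone"),
]
table = build_assignment_table(recordings, fake_transcribe)
```

Storing a set of misrecognitions per technical language term allows several incorrect conversions, for example from different speakers, to be assigned to the same term.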

[0121] In step 108, correction system 212, 314, 402 generates a corrected text 210 from incorrect text 208 of conversion system 226. The corrected text is automatically generated by the correction system, in that terms and expressions of the target vocabulary in received text 208 are replaced with technical language terms according to assignment table 238.
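The replacement of step 108 may be sketched, for example, as follows in Python. This is only one possible realization under stated assumptions: the function name `correct_text`, the table layout (technical language term mapped to a list of misrecognized expressions), and the example misrecognitions are hypothetical. The sketch matches case-insensitively at word boundaries and prefers longer expressions over shorter ones.

```python
import re
from typing import Dict, Iterable


def correct_text(text: str, table: Dict[str, Iterable[str]]) -> str:
    """Replace known misrecognitions in `text` with their technical terms."""
    # Invert the table: misrecognized expression -> technical term.
    replacements = {
        wrong.lower(): technical
        for technical, wrongs in table.items()
        for wrong in wrongs
    }
    if not replacements:
        return text
    # Longest expressions first, so that a multi-word expression is
    # preferred over a shorter expression starting at the same position.
    ordered = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(w) for w in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: replacements[m.group(1).lower()], text)


# Hypothetical assignment table and received text.
table = {
    "rheological": ["real logical", "theological"],
    "n-pentyl propionate": ["n pencil propionate"],
}
received = "show me the Real Logical analysis of n pencil propionate"
corrected = correct_text(received, table)
print(corrected)
```

Compiling all misrecognized expressions into a single pattern means that the received text is scanned once, regardless of the number of entries in the table.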

[0122] In the case that the correction system is a correction computer, as shown in FIG. 4, the corrected text is returned to a control computer.

[0123] The end device or the control computer inputs corrected text 210 directly or indirectly into an execution system 240 in step 110. Examples of different execution systems are depicted in FIG. 5. The execution system, a software component and/or a hardware component, executes a software function and/or hardware function according to the corrected text and returns result 242. The result may be returned, for example, directly to the end device or may be returned to the end device via the control computer as an intermediate station. Alternatively or additionally, however, the result may also be returned to different end devices and other data processing systems.

[0124] In the embodiments depicted in FIGS. 3 and 4, control computer 314, functioning as the correction system, transmits the corrected text to execution system 240, receives result 242 of the execution by the same, and forwards this result to the end device to be output to user 202. The result is typically a text, e.g., a recipe, researched in a database, for the synthesis of a chemical substance; a document, e.g. product data sheet of a substance, identified in a database or the internet; the confirmation that a chemical analysis or synthesis, which was carried out according to the information in the corrected text, was successfully completed (or, if this was not the case, a corresponding error message).

[0125] Finally, the end device or another data processing system may output the result of carrying out the function by execution system 240, comprising software and/or hardware, to user 202. The software and/or the hardware is preferably software and hardware, which are developed inside of a laboratory or specifically for activities inside of a laboratory, or which are at least usable for this.

[0126] For example, end device 212 may include a speaker or may be communicatively coupled to the same and may output the result in acoustic form via this speaker.

[0127] Additionally or alternatively, the end device may include a screen to output the result to the user. Additional output interfaces are also possible, for example, Bluetooth-based components.

[0128] For example, the method according to embodiments of the invention may be used to implement voice control of electronic devices, in particular of laboratory instruments and HTE systems. The voice control may also be used in order to research and to output results from analyses and syntheses already carried out in the laboratory, laboratory protocols, and product data sheets in corresponding databases of the laboratory, and to carry out voice-controlled supplemental searches both on the internet and in public and proprietary databases accessible via the internet. Voice commands, which include specific trade names of chemicals or laboratory devices or laboratory consumables and/or names and adjectives of the chemical technical language, are also correctly converted into text and may thus be correctly interpreted by the execution system. According to embodiments of the invention, a largely voice-controlled, highly integrated operation of a chemical or biological laboratory or a laboratory HTE system is thus facilitated. The term “CONTROL COMPUTER” in the speech input may, for example, represent the name of a virtual assistant 502 for speech-based operation of the devices of a laboratory and/or an HTE system of a laboratory. Analogous to the virtual assistants Alexa and Siri for everyday problems, the term “CONTROL COMPUTER” (or, optionally, any other name more reminiscent of a human being, like “EVA”) may function as a trigger signal to prompt a text evaluation logic of this laboratory assistant to evaluate the corrected text. The laboratory assistant is configured to check each received text for whether this text includes its name and, optionally, other key terms. If this is the case, then the corrected text is further analyzed to recognize and execute commands encoded therein.
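The check for the name of the laboratory assistant may be sketched, for example, as follows in Python. The function name `extract_command` and the rule that everything following the name is treated as the command are assumptions of this sketch and are not taken from the embodiments.

```python
import re
from typing import Optional

ASSISTANT_NAME = "CONTROL COMPUTER"  # trigger term used in the examples


def extract_command(corrected_text: str, name: str = ASSISTANT_NAME) -> Optional[str]:
    """Return the text following the assistant's name, or None if absent."""
    match = re.search(re.escape(name), corrected_text, re.IGNORECASE)
    if match is None:
        return None  # text is not addressed to the assistant and is ignored
    # Assumption of this sketch: the command is whatever follows the name.
    command = corrected_text[match.end():].strip(" ,.:;")
    return command or None


command = extract_command(
    "CONTROL COMPUTER, display the product data sheet for Ultrasorb"
)
ignored = extract_command("please pass me the pipette")
```

A text that does not contain the name yields `None`, so that only texts addressed to the assistant are analyzed further for commands.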

[0129] According to one embodiment, the output of the results data, which was determined on the basis of the corrected text input into the laboratory device or the HTE system, is carried out via a speaker, which is located within the laboratory room. For example, the speaker may be a component of the end device that received the speech input of the user. This may, however, also be a different speaker, which is communicatively connected to this end device. This has the advantage that a laboratory worker may seamlessly enter voice commands, for example, regarding analysis results, product data sheets, or another context, to quickly obtain information on chemical analyses, syntheses, and products. The results of this verbal search instruction are acoustically output via the speaker. The user may use the heard information in order to formulate additional search commands and/or to speak a voice command into the microphone to carry out an analysis or synthesis while taking into account the acoustically-output research results. This cycle of acoustic input and output may be repeated multiple times without necessitating an input of data or commands via a keyboard. Laboratory processes may thus be configured substantially more efficiently.

[0130] In the context of the chemical synthesis of paints and lacquers, efficiently obtaining information related to chemical substances and a voice-based control of laboratory devices and HTE systems is particularly advantageous, as a large plurality of raw materials is necessary for the production of paints and lacquers, wherein their properties interact with one another in complex ways and strongly influence the properties of the product. Thus, a plurality of analyses, control steps, and test series arise in the context of the production of paints and lacquers. Paints and lacquers are highly complex mixtures of up to 20 raw materials and more, for example, solvents, resins, curing agents, pigments, fillers, and numerous additives (dispersing agents, wetting agents, adhesion promoters, defoamers, biocides, flame retardants, and others). An efficient procurement of information related to the individual components and for controlling the corresponding analysis and synthesis systems may substantially accelerate the production process and the quality assurance of the products.

[0131] FIG. 2 shows a block diagram of a distributed system 200 for the speech-to-text conversion of texts with technical language terms.

[0132] The essential functions of system 200 and its components were already described with reference to FIG. 1. End device 212 may be, for example, a notebook computer, a standard computer, a tablet computer, or a smartphone. Client software 222, which is interoperable with an existing general language speech-to-text conversion system 226, is installed on the end device. For example, speech-to-text conversion system 226 is a cloud computer system, which offers the conversion as a service over the internet via a corresponding speech-to-text interface (StT interface) 224. This service is a software program 232, which is implemented on the server side and which corresponds, from a functional perspective, to a speech recognition and speech conversion processor. For example, software program 232 may be Google's speech-to-text cloud service. Interface 224 is, in this case, a cloud-based API from Google.

[0133] In the embodiment depicted in FIG. 2, the end device has an assignment table 238 and sufficient computing power to itself carry out the correction, based on the table, of text 208 generated by speech-to-text conversion system 226. The transmission of speech signal 206 to server 226, the receipt of text 208 from server 226, and the correction of the text to generate corrected text 210, may thus be implemented in client program 222. Client program 222 may be, for example, a browser plugin or a standalone application, which is interoperable with server software 232 via interface 224.

[0134] FIG. 3 shows a block diagram of another distributed system 300 for the speech-to-text conversion.

[0135] The essential functions of system 300 and its components were already described with reference to FIG. 1 and FIG. 2. The system architecture of system 300 differs from the architecture of system 200 in that end device 312 has outsourced the function of the text correction to a control computer 314. Client software 316, installed on end device 312 and called control client in this case, is interoperable with a corresponding control program 320, which is installed on control computer 314. The end device is connected to control computer 314 via a network 236, for example, the internet. Control interface 318 serves for data exchange between control client 316 and control program 320.

[0136] Control computer 314 may be, for example, a standard computer. However, the control computer is advantageously a server or a cloud computer system.

[0137] Control program 320, installed on the control computer, first implements a coordinative function 322 in order to coordinate the exchange of data (speech signal 206, recognized text 208, corrected text 210) between the various data processing devices (end device, control computer, speech-to-text conversion system). Secondly, in the embodiment shown here, control program 320 implements a text correction function 324, which is executed in system 200 by the end device. Correction function 324 comprises the replacement of terms and expressions of the target vocabulary in received text 208 with technical language terms and expressions according to assignment table 238. In addition, over the course of the replacement, probabilities of occurrence and/or POS tags may be taken into consideration, which are calculated by control computer 314 or are received via StT interface 224 from speech-to-text conversion system 226 together with text 208. Speech client 222, which in this embodiment only controls the data exchange with conversion system 226 and does not carry out the text correction, may be implemented as a component of control program 320. However, it is also possible that control program 320 and client 222 are separate but mutually interoperable programs.
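
As a hedged illustration of how POS tags may be taken into consideration during the replacement, the following Python sketch replaces a word only when its POS tag matches the tag stored with the table entry. The `Token` type, the tag names, and the single table entry are hypothetical; probabilities of occurrence are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    pos: str  # POS tag, e.g. "NOUN" or "ADJ" (tag set is an assumption)


# Hypothetical entry: (misrecognized word, required POS tag) -> technical term.
POS_AWARE_TABLE = {("saline", "NOUN"): "silane"}


def correct_tokens(tokens, table=POS_AWARE_TABLE):
    """Replace a token only if both its text and its POS tag match an entry."""
    corrected = [
        table.get((token.text.lower(), token.pos), token.text) for token in tokens
    ]
    return " ".join(corrected)
```

With this assumed entry, "saline" tagged as a noun is replaced, whereas the same word tagged as an adjective is left as recognized.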

[0138] The architecture depicted in FIG. 3 has the advantage that the end device does not have to execute any computationally intensive operations. Both the conversion of the speech signal into text and also the correction of this text are taken over by other data processing systems. The function of end device 312 is substantially limited to the receipt of speech signal 206, forwarding the speech signal to a predefined control computer 314 with a known address, and the output of a result, which is returned from an execution system for carrying out a function according to the corrected text.

[0139] FIG. 4 shows a block diagram of another distributed system 400 for the speech-to-text conversion.

[0140] The essential functions of system 400 and its components were already described with reference to FIGS. 1, 2, and 3. The system architecture of system 400 differs from the architecture of system 300 in that control computer 414 does not itself undertake the text correction, but instead has it carried out by another computer, designated here as “correction computer” or “correction server” 402, wherein other computer 402 is interoperably connected to control program 320 of the control computer via a network and an intrinsic interface 406.

[0141] This architecture may be advantageous, since a separate computer or computer network, which may be designed as a cloud system, is used for the text correction. This enables a separate granting of access rights. Control program 320 on control computer 414 may, for example, have comprehensive access rights with respect to different, sometimes sensitive data, which is generated over the course of the analysis and synthesis of chemical substances and substance mixtures in the laboratory, for example, using an HTE system. According to embodiments of the invention, control computer 414 may have, for example, a machine-to-machine interface in order to transmit the corrected text, in the form of a control command, directly to a laboratory device or an HTE system, or to its database in order to initiate an analysis, chemical synthesis, or research, based on corrected text 210. Secure and strict access protection for control computer 414 is therefore particularly important.

[0142] In the context of the architecture of system 400, correction server 402 only functions to correct text 208, which was generated by speech-to-text conversion system 226 and returned to control program 320. A user, who receives access to correction server 402, for example, in order to update and supplement table 238 with additional technical terms and technical expressions, thus has no read and/or write access to control computer 414 according to embodiments of the invention. It is thus possible to continuously update the assignment table and thus the text correction, without necessitating the granting of comprehensive access rights to sensitive control logic and databases of a laboratory to the personnel responsible for this.
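
The separation of responsibilities described above may be sketched, in simplified form, as follows. The sketch models the network boundary as an in-process call, so it shows only which operations each side uses and does not enforce access rights. The class and method names are hypothetical, and the correction uses plain substring replacement for brevity.

```python
class CorrectionServer:
    """Stands in for correction server 402, which holds the assignment table."""

    def __init__(self):
        self._table = {}  # misrecognized expression -> technical language term

    def update_table(self, misrecognized, technical_term):
        """Maintenance operation offered only by the correction server."""
        self._table[misrecognized] = technical_term

    def correct(self, recognized_text):
        """Return the corrected text for a recognized text."""
        corrected = recognized_text
        for misrecognized, term in self._table.items():
            corrected = corrected.replace(misrecognized, term)
        return corrected


class ControlProgram:
    """Stands in for control program 320; it only calls the correction."""

    def __init__(self, correct):
        self._correct = correct  # callable standing in for interface 406

    def handle_recognized_text(self, recognized_text):
        return self._correct(recognized_text)
```

In this sketch, a person maintaining the table would call `update_table` on the correction server, while the control program exposes no table maintenance operation of its own.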

[0143] End device 312 of distributed systems 300, 400 may be, for example, a computer, a notebook computer, a smartphone, or the like. However, it may also be a comparatively computationally weak single-board computer, e.g., a Raspberry Pi system.

[0144] The hardware (smart speakers) of known speech-to-text cloud service providers pursues the objective of directly controlling and using services developed by the cloud providers themselves. Its use in the area of technical vocabulary is currently not developed or developed only to a very limited extent.

[0145] All of the system architectures 200, 300, 400, and 500 shown here allow the use of existing speech-to-text APIs of diverse cloud providers by means of separate hardware, independent of the cloud provider, in order to enable subject-specific speech recognition and, based on this, to control laboratory devices and electronic search functions in a laboratory.

[0146] FIG. 5 shows a block diagram of another distributed system 500 for the speech-to-text conversion in the context of a chemical laboratory. The laboratory comprises a laboratory area 504 with conventional safety regulations. Various individual laboratory devices 516, e.g., a centrifuge, and an HTE system 518 are located in this area. The HTE system includes a plurality of modules and hardware units 506-514, which are managed and controlled by a controller 520. The controller functions as the central interface for external monitoring and control of the devices included in the HTE system. Control program 320 on control computer 414 includes a software module 502, which implements a virtual laboratory assistant.

[0147] The generation of a corrected text 210 from a speech input 204 of a user 202 is carried out as already described according to embodiments of the invention. After control program 320 has received the corrected text from correction computer 402, the control program evaluates it and, in doing so, searches for a keyword such as “CONTROL COMPUTER” or “EVA”. If the corrected text contains this keyword, virtual laboratory assistant 502 is prompted to further analyze the corrected text in order to determine whether it contains commands to carry out a hardware or software function and, if so, which hardware or software controlled by laboratory assistant 502 should execute these commands. For example, the corrected text may contain names of devices or laboratory areas, which specify to which device or to which software the command should be forwarded.
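
The keyword check and the subsequent selection of a target may be sketched as follows. This is a minimal sketch under stated assumptions: the spoken device names in `TARGETS`, the function name `route`, and the return values are hypothetical and chosen only for illustration; only the two keywords are taken from the description.

```python
import re

KEYWORDS = ("control computer", "eva")

# Hypothetical spoken names and the targets they are assumed to designate.
TARGETS = {
    "centrifuge": "laboratory device 516",
    "hte system": "HTE system 518",
    "search": "search engine 528",
}


def contains_phrase(text, phrase):
    """Whole-word, case-insensitive search for a phrase."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None


def route(corrected_text):
    """Return the target named in the text, or None if no keyword is present."""
    if not any(contains_phrase(corrected_text, keyword) for keyword in KEYWORDS):
        return None  # the assistant is not prompted without a keyword
    for spoken_name, target in TARGETS.items():
        if contains_phrase(corrected_text, spoken_name):
            return target
    return "unresolved"  # keyword present, but no known target named
```

Whole-word matching is used in this sketch so that a word such as "evaporate" is not mistaken for the keyword "EVA".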

[0148] In one possible implementation example, the evaluation of corrected text 210 by the virtual laboratory assistant yields that an internet search engine 528 is to search for a certain substance, which is specified as a technical language term or expression in corrected text 210. The corrected text or certain parts thereof are input by virtual assistant 502 into the search engine via the internet. Results 524 of the internet research are returned to assistant 502, which forwards them to a suitable output device in the vicinity of user 202, for example, end device 312, where they are output via a speaker or screen 218.

[0149] In another possible implementation example, the evaluation of corrected text 210 by the virtual laboratory assistant yields that laboratory device 516, a centrifuge, should pelletize a certain material at a certain rotational speed. The name of the centrifuge and the material are specified in corrected text 210 as a technical language term or expression, which is sufficient, since the centrifuge automatically reads the centrifugation parameters to be used, like duration and number of revolutions, from an internal database based on the substance names. The corrected text or certain parts thereof are transmitted by virtual assistant 502 to centrifuge 516 via the internet. The centrifuge starts a centrifugation program, related to the substance, and returns a message about the successful or unsuccessful centrifugation as a text message 522. Result 522 is returned to assistant 502, which forwards this to a suitable output device, for example, end device 312, where it is output via a speaker or screen 218.
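
The parameter lookup on the device side may be illustrated by the following sketch, in which the internal database is modeled as a dictionary. The substance entry, the parameter values, and the wording of the returned messages are hypothetical and serve only to show the lookup by substance name and the text result that is returned.

```python
# Hypothetical internal database: substance name -> (duration in s, speed in rpm).
CENTRIFUGATION_PARAMETERS = {
    "titanium dioxide": (600, 4000),  # illustrative values, not from the source
}


def run_centrifugation(substance_name):
    """Look up the parameters for a substance and return a text message."""
    parameters = CENTRIFUGATION_PARAMETERS.get(substance_name.lower())
    if parameters is None:
        return f"Centrifugation failed: no parameters stored for '{substance_name}'."
    duration_s, speed_rpm = parameters
    # A real device would start its centrifugation program here.
    return (
        f"Centrifugation of {substance_name} finished: "
        f"{duration_s} s at {speed_rpm} rpm."
    )
```

The returned string corresponds to the text message that is forwarded by the assistant to an output device.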

[0150] In another possible implementation example, the evaluation of corrected text 210 by the virtual laboratory assistant yields that HTE system 518 should synthesize a certain lacquer. The components of the lacquer are likewise specified in the corrected text and comprise a mixture of trade names of chemical products and IUPAC substance names. The HTE system receives corrected text 210 and autonomously decides to carry out the synthesis in synthesis unit 514. A message about the successful synthesis or an error message is returned as result 526 from synthesis unit 514 to the controller of HTE system 518, and the controller in turn returns result 526 to virtual laboratory assistant 502, which forwards it to a suitable output device, for example, end device 312, where it is output via a speaker or screen 218.

LIST OF REFERENCE NUMERALS

[0151] 102-110 Steps
[0152] 200 Distributed system
[0153] 202 User
[0154] 204 Speech input
[0155] 206 Speech signal
[0156] 208 Recognized text
[0157] 210 Corrected text
[0158] 212 End device
[0159] 214 Microphone
[0160] 216 Processor(s)
[0161] 218 Screen
[0162] 220 Storage medium
[0163] 222 Client program
[0164] 224 Interface (client side)
[0165] 224′ Interface (server side)
[0166] 226 Speech-to-text conversion system/Cloud system
[0167] 228 Processor(s)
[0168] 230 Storage medium
[0169] 232 Speech recognition processor
[0170] 234 Target vocabulary
[0171] 236 Network
[0172] 238 Assignment table
[0173] 240 Execution system (software and/or hardware)
[0174] 242 Result of the execution of the corrected text (in text form)
[0175] 300 Distributed system
[0176] 312 End device
[0177] 316 Client software of the control program
[0178] 318 Interface of the control program
[0179] 320 Control program
[0180] 322 Coordination function
[0181] 324 Text correction function/Text correction program
[0182] 400 Distributed system
[0183] 402 Correction server/Text correction cloud system
[0184] 404 Client software of the text correction program
[0185] 406 Interface of the text correction program
[0186] 414 Control computer
[0187] 500 Distributed system
[0188] 502 Virtual laboratory assistant
[0189] 504 Laboratory area
[0190] 506 Analysis device
[0191] 508 Analysis device
[0192] 510 Mixer
[0193] 512 Synthesis unit
[0194] 514 Synthesis unit
[0195] 516 Standalone laboratory device
[0196] 522 Result of the execution of the corrected text (text form)
[0197] 524 Result of the execution of the corrected text (text form)
[0198] 526 Result of the execution of the corrected text (text form)
[0199] 528 Internet search engine


CPC classification: G10L15/22; G10L15/142; G10L15/19; G10L15/30; G06F40/253; G06F40/20; G10L2015/221; G06F40/166; G10L15/26; G06F40/157

International classification: G10L15/19; G06F40/166; G06F40/253; G10L15/22; G10L15/30
