Machine-Learning-Based Identification of Drawing Attributes
20230053656 · 2023-02-23
Inventors
- Patavee Meemeng (Los Angeles, CA, US)
- Gagik Hacobian (La Crescenta, CA, US)
- Hunter Buckhorn (Santa Barbara, CA, US)
CPC Classification
- G06V30/414 (PHYSICS)
- G06V30/416 (PHYSICS)
International Classification
Abstract
An example computing system is configured to: (i) access a drawing associated with a construction project; (ii) identify, in the drawing, a set of candidate textual elements that potentially represent a title of the drawing; (iii) for each candidate textual element, (a) determine a respective dataset comprising values for a set of data variables that are potentially predictive of whether the candidate textual element is the title of the drawing, and (b) input the respective dataset into a machine-learning model that functions to (1) evaluate the respective dataset and (2) output, based on the evaluation, a respective score indicating a likelihood that the candidate textual element represents the title of the drawing; and (iv) based on the respective scores for the candidate textual elements that are output by the machine-learning model, select one given candidate textual element as the title of the drawing.
Claims
1. A computing system comprising: at least one processor; a non-transitory computer-readable medium; and program instructions stored on the non-transitory computer-readable medium that are executable by the at least one processor such that the computing system is configured to: access a drawing associated with a construction project; identify, in the drawing, a set of candidate textual elements that potentially represent a title of the drawing; for each candidate textual element in the set of candidate textual elements, (i) determine a respective dataset comprising values for a set of data variables that are potentially predictive of whether the candidate textual element is the title of the drawing, and (ii) input the respective dataset for the candidate textual element into a machine-learning model that functions to (a) evaluate the respective dataset and (b) based on the evaluation, output a respective score indicating a likelihood that the candidate textual element represents the title of the drawing; based on the respective scores for the candidate textual elements in the set of candidate textual elements that are output by the machine-learning model, select one given candidate textual element in the set of candidate textual elements as the title of the drawing; and populate a data attribute field representing the title of the drawing with the given candidate textual element.
2. The computing system of claim 1, wherein identifying the set of candidate textual elements that potentially represent the title of the drawing comprises: identifying one or more rectangular boxes in the drawing; identifying a first subset of candidate textual elements in the one or more rectangular boxes; and including the first subset of candidate textual elements in the set of candidate textual elements.
3. The computing system of claim 2, wherein identifying the set of candidate textual elements that potentially represent the title of the drawing further comprises: applying a mask to the one or more rectangular boxes in the drawing; identifying a second subset of candidate textual elements in one or more targeted portions of the drawing, wherein the one or more targeted portions of the drawing at least partially overlaps with the one or more rectangular boxes; and including the second subset of candidate textual elements in the set of candidate textual elements.
4. The computing system of claim 3, wherein the one or more targeted portions of the drawing includes at least one of (i) a first portion extending upward from a bottom edge of the drawing by no more than 30% of the drawing height, or (ii) a second portion extending leftward from a right edge of the drawing by no more than 30% of the drawing width.
5. The computing system of claim 1, wherein determining the respective dataset comprising values for the set of data variables that are potentially predictive of whether the candidate textual element is the title of the drawing comprises: determining (i) one or more spatial characteristics of the candidate textual element, (ii) one or more linguistic characteristics of the candidate textual element, and (iii) one or more surrounding characteristics of the candidate textual element; and determining the respective dataset based on (i) the one or more spatial characteristics of the candidate textual element, (ii) the one or more linguistic characteristics of the candidate textual element, and (iii) the one or more surrounding characteristics of the candidate textual element.
6. The computing system of claim 5, wherein the one or more spatial characteristics of the candidate textual element comprises at least one of (i) a distance between the candidate textual element and an edge of the drawing, (ii) a location of the textual element within the drawing, (iii) an orientation of the candidate textual element, or (iv) a size of the candidate textual element.
7. The computing system of claim 5, wherein the one or more linguistic characteristics of the candidate textual element comprises at least one of (i) one or more words in the candidate textual element, (ii) one or more parts of speech of the one or more words in the candidate textual element, (iii) information identifying whether the candidate textual element includes a word in a first predefined list of words, (iv) information identifying whether the textual element includes a capitalized character, or (v) information identifying whether the textual element includes a numerical character.
8. The computing system of claim 5, wherein the one or more surrounding characteristics of the candidate textual element comprises at least one of (i) information identifying whether the candidate textual element is located within a rectangular box, (ii) information identifying whether one or more other textual elements are within a threshold distance of the candidate textual element, or (iii) information identifying whether the one or more other textual elements within the threshold distance of the candidate textual element include a word in a second predefined list of words.
9. A non-transitory computer-readable medium, wherein the non-transitory computer-readable medium is provisioned with program instructions that, when executed by at least one processor, cause a computing system to: access a drawing associated with a construction project; identify, in the drawing, a set of candidate textual elements that potentially represent a title of the drawing; for each candidate textual element in the set of candidate textual elements, (i) determine a respective dataset comprising values for a set of data variables that are potentially predictive of whether the candidate textual element is the title of the drawing, and (ii) input the respective dataset for the candidate textual element into a machine-learning model that functions to (a) evaluate the respective dataset and (b) based on the evaluation, output a respective score indicating a likelihood that the candidate textual element represents the title of the drawing; based on the respective scores for the candidate textual elements in the set of candidate textual elements that are output by the machine-learning model, select one given candidate textual element in the set of candidate textual elements as the title of the drawing; and populate a data attribute field representing the title of the drawing with the given candidate textual element.
10. The non-transitory computer-readable medium of claim 9, wherein identifying the set of candidate textual elements that potentially represent the title of the drawing comprises: identifying one or more rectangular boxes in the drawing; identifying a first subset of candidate textual elements in the one or more rectangular boxes; and including the first subset of candidate textual elements in the set of candidate textual elements.
11. The non-transitory computer-readable medium of claim 10, wherein identifying the set of candidate textual elements that potentially represent the title of the drawing further comprises: applying a mask to the one or more rectangular boxes in the drawing; identifying a second subset of candidate textual elements in one or more targeted portions of the drawing, wherein the one or more targeted portions of the drawing at least partially overlaps with the one or more rectangular boxes; and including the second subset of candidate textual elements in the set of candidate textual elements.
12. The non-transitory computer-readable medium of claim 11, wherein the one or more targeted portions of the drawing includes at least one of (i) a first portion extending upward from a bottom edge of the drawing by no more than 30% of the drawing height, or (ii) a second portion extending leftward from a right edge of the drawing by no more than 30% of the drawing width.
13. The non-transitory computer-readable medium of claim 9, wherein determining the respective dataset comprising values for the set of data variables that are potentially predictive of whether the candidate textual element is the title of the drawing comprises: determining (i) one or more spatial characteristics of the candidate textual element, (ii) one or more linguistic characteristics of the candidate textual element, and (iii) one or more surrounding characteristics of the candidate textual element; and determining the respective dataset based on (i) the one or more spatial characteristics of the candidate textual element, (ii) the one or more linguistic characteristics of the candidate textual element, and (iii) the one or more surrounding characteristics of the candidate textual element.
14. The non-transitory computer-readable medium of claim 13, wherein: the one or more spatial characteristics of the candidate textual element comprises at least one of (i) a distance between the candidate textual element and an edge of the drawing, (ii) a location of the textual element within the drawing, (iii) an orientation of the candidate textual element, or (iv) a size of the candidate textual element; the one or more linguistic characteristics of the candidate textual element comprises at least one of (i) one or more words in the candidate textual element, (ii) one or more parts of speech of the one or more words in the candidate textual element, (iii) information identifying whether the candidate textual element includes a word in a first predefined list of words, (iv) information identifying whether the textual element includes a capitalized character, or (v) information identifying whether the textual element includes a numerical character; and the one or more surrounding characteristics of the candidate textual element comprises at least one of (i) information identifying whether the candidate textual element is located within a rectangular box, (ii) information identifying whether one or more other textual elements are within a threshold distance of the candidate textual element, or (iii) information identifying whether the one or more other textual elements within the threshold distance of the candidate textual element include a word in a second predefined list of words.
15. A method comprising: accessing a drawing associated with a construction project; identifying, in the drawing, a set of candidate textual elements that potentially represent a title of the drawing; for each candidate textual element in the set of candidate textual elements, (i) determining a respective dataset comprising values for a set of data variables that are potentially predictive of whether the candidate textual element is the title of the drawing, and (ii) inputting the respective dataset for the candidate textual element into a machine-learning model that functions to (a) evaluate the respective dataset and (b) based on the evaluation, output a respective score indicating a likelihood that the candidate textual element represents the title of the drawing; based on the respective scores for the candidate textual elements in the set of candidate textual elements that are output by the machine-learning model, selecting one given candidate textual element in the set of candidate textual elements as the title of the drawing; and populating a data attribute field representing the title of the drawing with the given candidate textual element.
16. The method of claim 15, wherein identifying the set of candidate textual elements that potentially represent the title of the drawing comprises: identifying one or more rectangular boxes in the drawing; identifying a first subset of candidate textual elements in the one or more rectangular boxes; and including the first subset of candidate textual elements in the set of candidate textual elements.
17. The method of claim 16, wherein identifying the set of candidate textual elements that potentially represent the title of the drawing further comprises: applying a mask to the one or more rectangular boxes in the drawing; identifying a second subset of candidate textual elements in one or more targeted portions of the drawing, wherein the one or more targeted portions of the drawing at least partially overlaps with the one or more rectangular boxes; and including the second subset of candidate textual elements in the set of candidate textual elements.
18. The method of claim 17, wherein the one or more targeted portions of the drawing includes at least one of (i) a first portion extending upward from a bottom edge of the drawing by no more than 30% of the drawing height, or (ii) a second portion extending leftward from a right edge of the drawing by no more than 30% of the drawing width.
19. The method of claim 15, wherein determining the respective dataset comprising values for the set of data variables that are potentially predictive of whether the candidate textual element is the title of the drawing comprises: determining (i) one or more spatial characteristics of the candidate textual element, (ii) one or more linguistic characteristics of the candidate textual element, and (iii) one or more surrounding characteristics of the candidate textual element; and determining the respective dataset based on (i) the one or more spatial characteristics of the candidate textual element, (ii) the one or more linguistic characteristics of the candidate textual element, and (iii) the one or more surrounding characteristics of the candidate textual element.
20. The method of claim 19, wherein: the one or more spatial characteristics of the candidate textual element comprises at least one of (i) a distance between the candidate textual element and an edge of the drawing, (ii) a location of the textual element within the drawing, (iii) an orientation of the candidate textual element, or (iv) a size of the candidate textual element; the one or more linguistic characteristics of the candidate textual element comprises at least one of (i) one or more words in the candidate textual element, (ii) one or more parts of speech of the one or more words in the candidate textual element, (iii) information identifying whether the candidate textual element includes a word in a first predefined list of words, (iv) information identifying whether the textual element includes a capitalized character, or (v) information identifying whether the textual element includes a numerical character; and the one or more surrounding characteristics of the candidate textual element comprises at least one of (i) information identifying whether the candidate textual element is located within a rectangular box, (ii) information identifying whether one or more other textual elements are within a threshold distance of the candidate textual element, or (iii) information identifying whether the one or more other textual elements within the threshold distance of the candidate textual element include a word in a second predefined list of words.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0041] The following disclosure makes reference to the accompanying figures and several example embodiments. One of ordinary skill in the art should understand that such references are for the purpose of explanation only and are therefore not meant to be limiting. Part or all of the disclosed systems, devices, and methods may be rearranged, combined, added to, and/or removed in a variety of manners, each of which is contemplated herein.
I. Example System Configuration
[0042] The present disclosure is generally directed to software technology for automatically determining a title of an electronic drawing file. The disclosed software technology may be used in many different contexts, one of which is in connection with technology for managing construction projects. As one possible implementation, this software technology may include both front-end software running on client stations that are accessible to individuals associated with construction projects (e.g., contractors, project managers, architects, engineers, designers, etc.) and back-end software running on a back-end platform (sometimes referred to as a “cloud” platform) that interacts with and/or drives the front-end software, and which may be operated (either directly or indirectly) by the provider of the front-end client software. As another possible implementation, this software technology may include front-end client software that runs on client stations without interaction with a back-end platform. The software technology disclosed herein may take other forms as well.
[0043] In general, such front-end client software may enable one or more individuals responsible for a construction project to perform various tasks related to the management of the construction project, which may take various forms. According to some implementations, these tasks may include, as some non-limiting examples, uploading electronic drawing files that depict various aspects of the construction project, viewing the uploaded drawings, and managing the uploaded drawings, such as by specifying various attributes of the drawings. Further, such front-end client software may take various forms, examples of which may include a native application (e.g., a mobile application) and/or a web application running on a client station, among other possibilities.
[0044] Turning now to the figures,
[0045] Broadly speaking, the back-end computing platform 102 may comprise one or more computing systems that have been provisioned with software for carrying out one or more of the functions disclosed herein, including but not limited to providing tools and/or interfaces for (i) uploading an electronic drawing file for a construction project, (ii) identifying candidate textual elements in the uploaded drawing that could potentially represent a title of the drawing, and (iii) using a machine-learning model to determine a likelihood that each candidate textual element is the title of the drawing. The one or more computing systems of back-end computing platform 102 may take various forms and may be arranged in various manners.
[0046] For instance, as one possibility, the back-end computing platform 102 may comprise computing infrastructure of a public, private, and/or hybrid cloud (e.g., computing and/or storage clusters) that has been provisioned with software for carrying out one or more of the functions disclosed herein. In this respect, the entity that owns and operates the back-end computing platform 102 may either supply its own cloud infrastructure or may obtain the cloud infrastructure from a third-party provider of “on demand” computing resources, such as Amazon Web Services (AWS) or the like. As another possibility, the back-end computing platform 102 may comprise one or more dedicated servers that have been provisioned with software for carrying out one or more of the functions disclosed herein. Other implementations of the back-end computing platform 102 are possible as well.
[0047] In turn, the client stations 112 may each be any computing system that is capable of running the front-end software disclosed herein. In this respect, the client stations 112 may each include hardware components such as a processor, data storage, a user interface, and a network interface, among others, as well as software components that facilitate the client station’s ability to run the front-end software disclosed herein (e.g., operating system software, web browser software, etc.). As representative examples, the client stations 112 may each take the form of a desktop computer, a laptop, a netbook, a tablet, a smartphone, and/or a personal digital assistant (PDA), among other possibilities.
[0048] As further depicted in
[0049] The interaction between the client stations 112 and the back-end computing platform 102 may take various forms. As one possibility, the client stations 112 may send certain user input related to a construction project to the back-end computing platform 102, which may in turn trigger the back-end computing platform 102 to take one or more actions based on the user input. As another possibility, the client stations 112 may send a request to the back-end computing platform 102 for certain project-related data and/or a certain front-end software module, and the client stations 112 may then receive project-related data (and perhaps related instructions) from the back-end computing platform 102 in response to such a request. As yet another possibility, the back-end computing platform 102 may be configured to “push” certain types of project-related data to the client stations 112, such as rendered three-dimensional views, in which case the client stations 112 may receive project-related data (and perhaps related instructions) from the back-end computing platform 102 in this manner. As still another possibility, the back-end computing platform 102 may be configured to make certain types of project-related data available via an API, a service, or the like, in which case the client stations 112 may receive project-related data from the back-end computing platform 102 by accessing such an API or subscribing to such a service. The interaction between the client stations 112 and the back-end computing platform 102 may take various other forms as well.
[0050] Although not shown in
[0051] It should be understood that the network configuration 100 depicted in
II. Example Computing System
[0052]
[0053] The processor 202 may comprise one or more processor components, such as general-purpose processors (e.g., a single- or multi-core microprocessor), special-purpose processors (e.g., an application-specific integrated circuit or digital-signal processor), programmable logic devices (e.g., a field programmable gate array), controllers (e.g., microcontrollers), and/or any other processor components now known or later developed. In line with the discussion above, it should also be understood that the processor 202 could comprise processing components that are distributed across a plurality of physical computing devices connected via a network, such as a computing cluster of a public, private, or hybrid cloud.
[0054] In turn, the data storage 204 may comprise one or more non-transitory computer-readable storage mediums, examples of which may include volatile storage mediums such as random-access memory, registers, cache, etc. and non-volatile storage mediums such as read-only memory, a hard-disk drive, a solid-state drive, flash memory, an optical-storage device, etc. In line with the discussion above, it should also be understood that the data storage 204 may comprise computer-readable storage mediums that are distributed across a plurality of physical computing devices connected via a network, such as a storage cluster of a public, private, or hybrid cloud.
[0055] As shown in
[0056] The communication interface 206 may be configured to facilitate wireless and/or wired communication with other computing systems or devices, such as one or more of the client stations 112 when the computing system 200 serves as the back-end computing platform 102, or as the back-end computing platform 102 when the computing system 200 serves as one of the client stations 112. As such, the communication interface 206 may take any suitable form for carrying out these functions, examples of which may include an Ethernet interface, a serial bus interface (e.g., FireWire, USB 3.0, etc.), a chipset and antenna adapted to facilitate wireless communication, and/or any other interface that provides for wireless and/or wired communication. The communication interface 206 may also include multiple communication interfaces of different types. Other configurations are possible as well.
[0057] Although not shown, the computing system 200 may additionally include one or more other interfaces that provide connectivity with external user-interface components (sometimes referred to as “peripherals”), such as a keyboard, a mouse or trackpad, a display screen, a touch-sensitive interface, a stylus, a virtual-reality headset, speakers, etc., which may allow for direct user interaction with the computing system 200. Additionally or alternatively, the computing system 200 may itself include one or more integrated user interface components, such as a display screen, a touch-sensitive interface, a keyboard, speakers, a microphone, a camera, etc., which may allow for direct user interaction with the computing system 200.
[0058] Further, when performing the operations described herein, the computing system 200 may use various different technologies within an overarching technology stack. Examples of technologies that the computing system 200 may use in connection with the present disclosure include a microservice architecture, RESTful APIs, and Kubernetes, as well as various machine-learning frameworks and tools, such as Jupyter Notebooks, scikit-learn, and spaCy, to name a few.
[0059] It should be understood that the computing system 200 is one example of a computing system that may be used with the embodiments described herein. Numerous other arrangements are possible and contemplated herein. For instance, other computing systems may include additional components not pictured and/or more or fewer of the pictured components.
III. Example Operations
[0060] As mentioned above, the present disclosure relates to software technology that enables automatically determining a title of an electronic drawing. While the examples described herein are directed toward technical drawings for a construction project, it should be understood that the described techniques can be applied to other types of electronic drawings as well.
[0061] To illustrate an example drawing,
[0062] Another example of a 2D technical drawing is a schematic that visually communicates how the heating, ventilation, and air conditioning (HVAC) ductwork is routed throughout the building. Like the architectural drawing 300, this schematic may visually communicate the HVAC ductwork routing through the use of a scaled depiction of the ductwork along with indications of other relevant aspects of the ductwork, such as measurements, dimensions, materials, etc. Still other examples of 2D drawings are possible as well.
[0063] As noted above, using the software technology described herein, a construction professional has the ability to upload electronic versions of drawings associated with a construction project to a data repository (e.g., cloud-based data platform) so that such drawings can later be accessed in a quick and efficient manner. Such a process may involve assigning certain data attributes to each uploaded electronic drawing file to help with management of the drawings. Examples of such data attributes may include data fields containing information identifying a title, number, version, and/or date of the drawing, as well as data fields containing information for categorizing the drawings into sets, such as information identifying a discipline (e.g., architectural, electrical, HVAC, plumbing, etc.) of each drawing, or information identifying a portion (e.g., room, floor, wing, or building) of the construction project that each drawing corresponds to.
[0064] As further noted above, the software technology described herein may improve upon this process by using predictive analytics to automatically determine and auto-populate title information for an electronic drawing file. In accordance with the present disclosure, the predictive analytics that is utilized to identify and extract title information for an electronic drawing file may take the form of a machine-learning model that is trained to predict whether a given textual element from an electronic drawing file is likely to be the title of the drawing by (i) receiving, as input, data for a given set of data variables that are predictive of whether the given textual element is the title and (ii) based on an evaluation of the received data, outputting an indication of a likelihood that the given textual element is the title.
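For illustration only, the scoring-and-selection step described above can be sketched as follows. The toy scoring function stands in for the trained machine-learning model (e.g., the class-probability output of a scikit-learn classifier); the feature vectors and the helper itself are hypothetical and not taken from the specification.

```python
# Minimal sketch of candidate scoring and title selection: each
# candidate's dataset of feature values is passed to a scoring model,
# and the highest-scoring candidate is selected as the drawing title.

def select_title(candidates, model_score):
    """candidates: (text, feature_vector) pairs; model_score: vector -> [0, 1]."""
    best_text, best_score = None, float("-inf")
    for text, features in candidates:
        score = model_score(features)
        if score > best_score:
            best_text, best_score = text, score
    return best_text, best_score


# Hypothetical feature vectors for three candidate textual elements.
candidates = [
    ("SCALE 1:100", [0.1, 0.0, 1.0]),
    ("FIRST FLOOR PLAN", [0.9, 1.0, 1.0]),
    ("A-101", [0.3, 0.0, 1.0]),
]
toy_score = lambda f: sum(f) / len(f)  # stand-in for the trained model
title, likelihood = select_title(candidates, toy_score)
print(title)  # FIRST FLOOR PLAN
```

The selected candidate would then be used to populate the data attribute field representing the drawing's title.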
[0065]
[0066]
[0067] Further, as a general matter, when reference is made herein to a “computing system” engaging in one or more operations, it should be understood that such operations may, in practice, be carried out by one or more computing systems executing one or more instances of the disclosed software technology. In particular, it should be understood that such example operations may be carried out by a back-end computing platform, such as the back-end computing platform 102 depicted in
[0068] The process 500 begins at block 502, at which the computing system accesses a plurality of training drawings, where each training drawing has a known title. In line with the discussion above, the training drawings may include historical drawings (e.g., electronic drawing files that have previously been uploaded for construction projects) and/or simulated drawings having simulated drawing titles that are created for purposes of training the machine-learning model. The known title for each drawing may be specified in various manners. For example, a user may specify the known title for each drawing, such as by manually inputting the known title through a user interface of the computing system 200. Referring back to the drawing 300 depicted in
[0069] At block 504, the computing system determines a set of training data for each training drawing based on the training drawing’s known title. As noted above, the training data for a given training drawing may include data identifying various characteristics of the text that corresponds to the known title, which will be described in further detail below.
[0070] In order to determine the set of training data for a given training drawing, the computing system may first identify a textual element in the training drawing that corresponds to the known title. As an example, the computing system may apply OCR to the training drawing to identify textual elements in the training drawing. The computing system may apply OCR to the entire training drawing to identify all textual elements in the drawing, or the computing system may apply OCR to certain portions of the drawing instead.
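As one illustration of restricting OCR to certain portions of the drawing, the claims describe targeted portions such as a band extending upward from the bottom edge and a band extending leftward from the right edge, each by no more than 30% of the drawing's height or width. The helper below is a hypothetical sketch of computing those regions, not an implementation from the specification.

```python
# Sketch of computing the targeted portions of a drawing: a bottom band
# and a right band, each capped at a fraction (here 30%) of the
# drawing's height or width. Boxes are (left, top, right, bottom) in
# pixel coordinates with the origin at the top-left corner.

def targeted_regions(width, height, fraction=0.30):
    """Return the bottom-edge band and right-edge band of the drawing."""
    bottom_band = (0, int(height * (1 - fraction)), width, height)
    right_band = (int(width * (1 - fraction)), 0, width, height)
    return bottom_band, right_band


bottom, right = targeted_regions(3000, 2000)
print(bottom)  # (0, 1400, 3000, 2000)
print(right)   # (2100, 0, 3000, 2000)
```

OCR could then be applied only within these regions, which is where title blocks conventionally appear on technical drawings.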
[0071] When applying OCR, the computing system may use any OCR techniques now known or later developed. For instance, the computing system may use the Tesseract OCR engine, the CuneiForm OCR engine, or the OCRopus OCR engine to apply OCR to the training drawing. When using these or other OCR engines, the computing system may be configured to apply pre-processing to the training drawing to assist in recognizing textual elements in the drawing. For instance, the computing system may apply a rotation to the drawing to de-skew various textual elements in the training drawing, the computing system may binarize the drawing to convert the training drawing to a black-and-white binary image, and/or the computing system may perform line and word detection techniques for grouping adjacent characters into words and along common lines. In order to perform binarization and/or line detection on the drawing, the computing system may use various other image processing software technology, such as computer vision software tools like those available through the OpenCV computer vision platform. Additional or alternative pre-processing techniques may be employed as well when applying OCR to the training drawing.
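The line-and-word grouping mentioned above can be sketched in simplified form: characters recognized by OCR, each with a bounding box, are clustered into words when they share a common line and sit within a small horizontal gap. The gap threshold, line tolerance, and box format are assumptions for illustration; a production OCR engine performs this internally.

```python
# Simplified sketch of grouping adjacent OCR character boxes into words
# along common lines. Each box is (x, y, w, h, ch).

def group_into_words(chars, max_gap=8, line_tol=5):
    """Cluster character boxes into word strings by line and adjacency."""
    # Sort by line (quantized y), then by horizontal position.
    chars = sorted(chars, key=lambda c: (c[1] // line_tol, c[0]))
    words, current, prev = [], "", None
    for x, y, w, h, ch in chars:
        same_line = prev is not None and abs(y - prev[1]) <= line_tol
        adjacent = prev is not None and x - (prev[0] + prev[2]) <= max_gap
        if same_line and adjacent:
            current += ch  # continue the current word
        else:
            if current:
                words.append(current)
            current = ch  # start a new word
        prev = (x, y, w, h)
    if current:
        words.append(current)
    return words


boxes = [
    (0, 0, 10, 12, "P"), (11, 0, 10, 12, "L"), (22, 0, 10, 12, "A"),
    (33, 0, 10, 12, "N"), (60, 0, 10, 12, "A"),  # large gap -> new word
]
print(group_into_words(boxes))  # ['PLAN', 'A']
```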
[0072] After applying OCR to the training drawing to identify textual elements in the drawing, the computing system may then compare the identified textual elements to the known title to determine which of the identified textual elements matches the known title. Referring again to the drawing 300 depicted in
[0073] Upon identifying the textual element in the training drawing that corresponds to the known title of the drawing, the computing system may determine various characteristics of the textual element. The determined characteristics may be predictive of whether the textual element is the title of the training drawing and may correspond with data variables that the machine-learning model will be trained to use when determining the likelihood of whether a given textual element is the title of a drawing.
[0074] Examples of characteristics that may be predictive of whether a given textual element is the title of a drawing may include spatial information about the textual element, linguistic information about the textual element, and/or information about how the textual element relates to other surrounding elements in the drawing. Spatial information may be predictive of whether a given textual element is the title of a drawing because drawing titles may be more or less likely to be presented in a particular location in the drawing, at a particular orientation, and/or in a particular font size. Linguistic information may be predictive of whether a given textual element is the title of a drawing because drawing titles may be more or less likely to include particular types and/or quantities of various parts of speech or to include or exclude particular words. Information about how the textual element relates to other surrounding elements may be predictive of whether a given textual element is the title of a drawing because drawing titles may be more or less likely to be contained within a particular shape (e.g., a rectangle) or to be nearby other text generally or other text that contains particular words.
[0075] In line with the discussion above, the computing system may determine spatial information about the textual element, which may take various forms. For example, the computing system may determine a position of the textual element in the drawing, such as a distance between the textual element and an edge of the drawing. To do so, the computing system may determine a number of pixels between a pixel in the textual element (e.g., the center pixel of the textual element, a corner pixel of the textual element, or a pixel of the textual element closest to an edge of the drawing) and one or more edges of the drawing (e.g., one or more horizontal edges and/or one or more vertical edges), and the computing system may treat the determined number of pixels as the distance between the textual element and the one or more edges of the drawing. The computing system may additionally or alternatively determine the position of the textual element by determining coordinates of one or more pixels of the textual element. For instance, the computing system may apply a pixel coordinate system to the drawing in which each pixel in the drawing is represented by an X-Y coordinate representing how many pixels away that pixel is from an arbitrary origin pixel, typically the first pixel in one of the corners of the drawing. Using the pixel coordinate system, the computing system may determine one or more pixel coordinates of the textual element, such as pixel coordinates representing a center pixel of the textual element and/or one or more corner pixels of the textual element (e.g., the coordinates of the upper-leftmost pixel of the upper-leftmost character in the textual element).
[0076] As another example of determining spatial information about the textual element, the computing system may determine information identifying an orientation of the textual element, such as whether the textual element is oriented vertically, horizontally, or at some other angle of rotation. In some examples, the computing system may determine this information as part of applying OCR to the drawing. For instance, as noted above, when applying OCR, the computing system may apply a rotation to the drawing to de-skew various textual elements. As such, the computing system can determine the orientation of a given textual element to be the angle of rotation applied to the drawing when using OCR to recognize that given textual element.
[0077] As yet another example of determining spatial information about the textual element, the computing system may determine information identifying a size of the textual element. For instance, the computing system may determine a number of pixels along one or more dimensions of the textual element, such as along an entire width and/or height of the entire textual element. Additionally or alternatively, the computing system may determine a number of pixels along a height of a single character of the textual element, which the computing system may use to determine a font size of the characters in the textual element. Further, it should be understood that these examples of determining spatial information about the textual element are for non-limiting illustrative purposes only and that the computing system may determine various other spatial information about the textual element that is predictive of whether the textual element represents a title of the drawing as well.
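The spatial measurements described in the preceding paragraphs can be sketched from an OCR bounding box. The box format (left, top, right, bottom in pixel coordinates, origin at the upper-left corner) is an assumption made for this example; real OCR engines report geometry in engine-specific structures.

```python
# Illustrative sketch: derive spatial features (position, size, distance
# to edges) from an assumed (left, top, right, bottom) bounding box.

def spatial_features(box, page_width, page_height):
    left, top, right, bottom = box
    return {
        "width": right - left,            # element width in pixels
        "height": bottom - top,           # element height (~font size for one line)
        "center": ((left + right) // 2, (top + bottom) // 2),
        "dist_to_bottom_edge": page_height - bottom,
        "dist_to_right_edge": page_width - right,
    }

features = spatial_features((1800, 2300, 2200, 2350),
                            page_width=2400, page_height=2400)
```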
[0078] Further in line with the discussion above, the computing system may determine linguistic information about the textual element, which may also take various forms. For example, the computing system may determine some or all of the actual words that make up the textual element, which the computing system may do as part of the OCR process described above. As another example, once the computing system has determined the words in the textual element, the computing system may determine whether the textual element includes any predefined blacklisted words (i.e., words that are not typically found in drawing titles), such as “title,” “name,” or “schematic,” and/or how many of these blacklisted words are included in the textual element. As yet another example, the computing system may determine which parts of speech are included in the textual element and/or how many of the different parts of speech are included in the textual element. For instance, the computing system may reference an electronic dictionary or other reference database that correlates words with their parts of speech, and the computing system may determine the part of speech for each word in a textual element by looking up the word in the dictionary or other database. As still another example, the computing system may determine whether the textual element is capitalized or whether the textual element includes any numerical characters. Further, it should be understood that these examples of determining linguistic information about the textual element are for non-limiting illustrative purposes only and that the computing system may determine various other linguistic information about the textual element that is predictive of whether the textual element represents a title of the drawing as well.
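The linguistic features described above can be sketched as follows. The blacklist entries come from the examples in this disclosure, but the tiny part-of-speech lookup is an illustrative stand-in: in practice parts of speech would come from an electronic dictionary or other reference database.

```python
# Illustrative sketch: derive linguistic features (blacklist hits,
# part-of-speech counts, capitalization, digits) from an element's text.
# POS_LOOKUP is an assumed toy stand-in for a real dictionary lookup.

BLACKLIST = {"title", "name", "schematic"}
POS_LOOKUP = {"first": "adjective", "floor": "noun", "plan": "noun"}

def linguistic_features(text):
    words = text.lower().split()
    pos_counts = {}
    for word in words:
        pos = POS_LOOKUP.get(word, "unknown")
        pos_counts[pos] = pos_counts.get(pos, 0) + 1
    return {
        "blacklisted_count": sum(1 for w in words if w in BLACKLIST),
        "pos_counts": pos_counts,
        "is_capitalized": text.isupper(),
        "has_digits": any(ch.isdigit() for ch in text),
    }

features = linguistic_features("FIRST FLOOR PLAN")
```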
[0079] Still further in line with the discussion above, the computing system may determine information about how the textual element relates to other surrounding elements in the training drawing. For example, the computing system may determine whether the textual element is located within a particular shape, such as whether the textual element is located inside a rectangular box. One way for the computing system to make such a determination is to use computer vision software technology to identify shapes in the training drawing. For instance, the computing system may use a shape detector tool available through the OpenCV computer vision platform to identify any or all rectangles in the drawing. Once the computing system has identified the rectangles, the computing system may determine whether the textual element is inside one of the identified rectangles.
[0080] As another example of determining information about how the textual element relates to other surrounding elements in the training drawing, the computing system may determine whether any other textual elements are nearby (e.g., within a threshold distance of) the textual element. For instance, when applying OCR to the training drawing to identify the textual element that matches the known title of the drawing, the computing system may also identify other textual elements in the drawing. The computing system may determine which, if any, of the other textual elements are positioned within a threshold number of pixels of the titular textual element. The computing system may additionally determine whether any of these nearby textual elements include any predefined whitelisted words (i.e., words that are typically found nearby drawing titles), such as “title,” “name,” or “schematic.” While the example whitelisted words provided herein overlap with the example blacklisted words described above, it should be understood that there may be words that are only blacklisted words or only whitelisted words. In some examples, the blacklisted and/or whitelisted words may be predefined lists manually specified by a user. In other examples, the blacklisted and/or whitelisted words may be learned lists that are derived from titles of historical drawings as part of the machine learning training process described above, and that are continually updated based on new drawings that are processed using the machine learning model. 
Further, it should be understood that the above examples of determining information about how the textual element relates to other surrounding elements in the training drawing are for non-limiting illustrative purposes only and that the computing system may determine various other information about how the textual element relates to other surrounding elements in the training drawing that is predictive of whether the textual element represents a title of the drawing as well.
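The surrounding-element checks described above (rectangle containment, nearby textual elements, and whitelisted words in nearby text) can be sketched as follows. The (left, top, right, bottom) box format and the 100-pixel nearness threshold are illustrative assumptions.

```python
# Illustrative sketch: features describing how an element relates to its
# surroundings. Box format and the 100-pixel threshold are assumptions.

WHITELIST = {"title", "name", "schematic"}

def inside(box, rect):
    """True if box lies entirely within rect."""
    l, t, r, b = box
    rl, rt, rr, rb = rect
    return rl <= l and rt <= t and r <= rr and b <= rb

def nearby(box_a, box_b, threshold=100):
    """True if the gap between the boxes is within the threshold."""
    x_gap = max(box_b[0] - box_a[2], box_a[0] - box_b[2], 0)
    y_gap = max(box_b[1] - box_a[3], box_a[1] - box_b[3], 0)
    return max(x_gap, y_gap) <= threshold

def surrounding_features(box, rectangles, other_elements, threshold=100):
    near = [(b, text) for b, text in other_elements
            if nearby(box, b, threshold)]
    whitelisted = sum(1 for _, text in near
                      for w in text.lower().split() if w in WHITELIST)
    return {
        "in_rectangle": any(inside(box, r) for r in rectangles),
        "nearby_count": len(near),
        "nearby_whitelisted_words": whitelisted,
    }

features = surrounding_features(
    box=(1850, 2300, 2150, 2350),
    rectangles=[(1800, 2250, 2200, 2400)],
    other_elements=[((1850, 2360, 2150, 2390), "Drawing Title"),
                    ((100, 100, 300, 150), "North Elevation")])
```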
[0081] Once the computing system has determined various characteristics of the textual element that are potentially predictive of whether the textual element is the title of the drawing, the computing system may determine a set of training data based on these determined characteristics. In some examples, the set of training data may take the form of a vector or matrix, with each column or row corresponding to a particular data value indicating one of the determined characteristics. In line with the discussion above, data values representing spatial characteristics of the textual element may indicate a distance between the textual element and an edge of the drawing, coordinates of one or more pixels in the textual element, an orientation of the textual element, a font size of the textual element, and/or a width or height of the textual element. Further in line with the discussion above, data values representing linguistic characteristics of the textual element may indicate the actual words in the textual element, whether the textual element includes any blacklisted words, how many blacklisted words the textual element includes, the parts of speech included in the textual element, how many of the different parts of speech are included in the textual element, whether the textual element is capitalized, and/or whether the textual element includes any numerical characters. Still further in line with the discussion above, data values representing surrounding characteristics of the textual element may indicate whether the textual element is located within a rectangular box (or some other shape), whether and/or how many other textual elements are within a threshold distance of the textual element, a distance between the textual element and the nearest other textual element, and/or whether and/or how many whitelisted words are included in any nearby other textual elements. 
Again, it should be understood that these examples of data values are for non-limiting illustrative purposes only and that the computing system may determine a set of training data that includes additional, fewer, or different data values as well.
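The flattening of determined characteristics into a fixed-order vector, as described above, can be sketched as follows. The particular features and their ordering are illustrative assumptions; what matters is that every training example (and, later, every candidate) uses the same ordering.

```python
# Illustrative sketch: flatten a dictionary of determined characteristics
# into a fixed-order feature vector. Feature names are assumed examples.

FEATURE_ORDER = ["dist_to_bottom_edge", "font_size", "blacklisted_count",
                 "noun_count", "in_rectangle", "nearby_whitelisted_words"]

def to_vector(characteristics):
    # Booleans become 0/1; missing features default to 0.
    return [int(characteristics.get(name, 0)) for name in FEATURE_ORDER]

vector = to_vector({"dist_to_bottom_edge": 50, "font_size": 42,
                    "blacklisted_count": 0, "noun_count": 2,
                    "in_rectangle": True, "nearby_whitelisted_words": 1})
```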
[0082] Returning to block 504 of the process 500, the computing system may repeat the above-described operations to determine sets of training data for each training drawing in the plurality of training drawings.
[0083] At block 506 of the process 500, the computing system may train a machine-learning model using a supervised learning process based on the determined sets of training data for the plurality of training drawings. In order to train the machine-learning model, the computing system may input the sets of training data into a machine-learning algorithm that is configured to identify patterns in the sets of training data and map those patterns to a desired target output. In the present case, because each set of training data corresponds to characteristics of known titles of drawings, the machine-learning algorithm may map any identified patterns in the training data to the target characteristic of whether the data corresponds to a textual element that represents a title of a drawing. As a result, the machine-learning model may be trained to output a score indicating a likelihood of whether a given set of data corresponds to a textual element that represents a title of a drawing. The score may be a continuous variable, such as a numerical value between 0 and 1, or between 0 and 100, where higher values correspond with a higher likelihood that the textual element represents the title of the drawing. Examples of machine-learning algorithms that the computing system may use to train the machine-learning model in this manner may include regression algorithms, but other machine-learning algorithms may be used as well.
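A toy supervised-learning sketch in the spirit of the regression approach described above is shown below: a logistic-regression model trained by stochastic gradient descent on hand-made feature vectors (label 1 = title, 0 = non-title). The features, data, and hyperparameters are all illustrative assumptions; a production system would use an established machine-learning library and far more training data.

```python
# Illustrative sketch: train a tiny logistic-regression scorer whose
# output is a likelihood between 0 and 1, as described at block 506.
import math

def predict(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))   # score in (0, 1)

def train(examples, labels, epochs=2000, lr=0.5):
    weights, bias = [0.0] * len(examples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            err = predict(weights, bias, x) - y   # gradient of log loss
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Assumed features: (in_rectangle, nearby_whitelisted_words, blacklisted_count)
examples = [(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 0, 0)]
labels = [1, 1, 0, 0]
weights, bias = train(examples, labels)

title_score = predict(weights, bias, (1, 1, 0))   # title-like features
junk_score = predict(weights, bias, (0, 0, 1))    # non-title-like features
```

After training on this toy data, title-like feature vectors score near 1 and non-title-like vectors near 0, mirroring the continuous likelihood score described above.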
[0084] Once the machine-learning model has been trained, the computing system may use the machine-learning model to automatically determine a title of an electronic drawing. Turning now to
[0085] The process 600 begins at block 602, at which the computing system accesses a drawing associated with a construction project. As described above, the drawing may be an electronic drawing that a construction professional uploads to the computing system using construction management software that includes various features or tools to help facilitate management of drawings associated with a construction project.
[0086] At block 604, the computing system identifies, in the drawing, a set of candidate textual elements that potentially represent a title of the drawing. The computing system may identify the set of candidate textual elements in various ways.
[0087] Using the drawing 300 depicted in
[0088] At step 604b, the computing system applies OCR to the identified rectangular boxes to identify textual elements in the rectangular boxes. The computing system can apply OCR to the identified rectangular boxes in the same or similar manner as described above in connection with applying OCR to a training drawing. For instance, the computing system can use any OCR techniques now known or later developed, such as those available through the Tesseract OCR engine, the CuneiForm OCR engine, or the OCRopus OCR engine. However, unlike applying OCR to the entire drawing, here the computing system applies OCR only to the identified rectangular boxes to identify textual elements in the rectangular boxes.
[0090] At step 604c, the computing system applies a mask to the identified rectangular boxes. When applied to the rectangular boxes, the mask may serve to visually conceal the rectangular boxes, such that when OCR is applied to the masked rectangular boxes, any text previously visible in the boxes becomes unrecognizable by OCR. Examples of applying a mask to the rectangular boxes may include applying an entirely black mask, an entirely white mask, or a patterned mask to the rectangular boxes. However, these examples are merely illustrative and, in other examples, the computing system may apply any other type of mask that prevents OCR from identifying text in the masked boxes.
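The masking step at 604c can be sketched as follows, again using a plain list-of-rows image in place of a real image library: every pixel inside the box is overwritten with white so that a later OCR pass finds no text there. The (left, top, right, bottom) box format is an illustrative assumption.

```python
# Illustrative sketch: apply an entirely white mask to a rectangular box
# so that text inside it becomes unrecognizable to a later OCR pass.

def apply_mask(image, box, fill=255):
    left, top, right, bottom = box
    for y in range(top, bottom):
        for x in range(left, right):
            image[y][x] = fill
    return image

image = [[0] * 4 for _ in range(3)]     # a 4x3 all-black "image"
apply_mask(image, (1, 0, 3, 2))         # mask a 2x2 region
```

An entirely black or patterned mask would work the same way; only the fill value changes.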
[0091] At step 604d, after applying the mask to the identified rectangular boxes, the computing system applies OCR to one or more targeted portions of the drawing to identify textual elements in the one or more targeted portions of the drawing. As noted above, the targeted portions of the drawing may include a first portion along a bottom edge of the drawing and/or a second portion along a right edge of the drawing. Targeting the bottom edge and/or the right edge of the drawing for OCR may be advantageous because the title block is often positioned near one of these edges. In some examples, the first portion may extend upward from the bottom edge by no more than 30% of the height of the drawing, and the second portion may extend leftward from the right edge by no more than 30% of the width of the drawing. However, these examples are merely illustrative, and the targeted portions may be smaller or larger in other examples, depending on an expected location of the title block. For instance, if the drawings of interest typically include a title block closer to the bottom edge or to the right edge, then the first or second portions may be reduced in size to better target the expected title block location without unnecessarily covering other portions of the drawing. Thus, in some examples, the first portion may extend upward from the bottom edge by no more than 25% of the height of the drawing, 20% of the height of the drawing, or even less. Similarly, the second portion may extend leftward from the right edge by no more than 25% of the width of the drawing, 20% of the width of the drawing, or even less. Further, the first and second portions may extend inward by different proportions. For instance, the first portion may extend upward from the bottom edge by 30% of the height of the drawing, and the second portion may extend leftward from the right edge by 20% of the width of the drawing. Other examples are possible as well.
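Computing the targeted OCR regions described above can be sketched as follows: a strip along the bottom edge and a strip along the right edge, each sized as a fraction of the page. The 30% and 20% defaults mirror proportions discussed in the text; the (left, top, right, bottom) region format is an assumption.

```python
# Illustrative sketch: compute the bottom-edge and right-edge strips
# targeted for OCR at step 604d, as fractions of the page dimensions.

def targeted_regions(page_width, page_height,
                     bottom_fraction=0.30, right_fraction=0.20):
    bottom_strip = (0, page_height - round(page_height * bottom_fraction),
                    page_width, page_height)
    right_strip = (page_width - round(page_width * right_fraction), 0,
                   page_width, page_height)
    return bottom_strip, right_strip

bottom_strip, right_strip = targeted_regions(page_width=2000,
                                             page_height=1000)
```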
[0093] At step 604e, the computing system includes the textual elements identified in the rectangular boxes as well as the textual elements identified in the one or more targeted portions of the drawing in the set of candidate textual elements. For instance, with respect to the example depicted in
[0094] Referring back to
[0095] When determining the dataset for each candidate textual element, the computing system may determine the dataset to include values for the same set of data variables included in the training data used to train the machine-learning model. As such, in line with the discussion above, when determining the dataset for a given candidate textual element, the computing system may include data values in the dataset that represent spatial information about the candidate textual element, linguistic information about the candidate textual element, and/or information about how the candidate textual element relates to other surrounding elements in the drawing.
[0096] Further, when determining the dataset for each candidate textual element, the computing system may do so in the same or similar manner as described above in connection with determining the set of training data and the corresponding characteristics of the textual elements from which the training data is derived. For example, the computing system may determine spatial information about a given candidate textual element, such as a position of the candidate textual element in the drawing, an orientation of the candidate textual element, and/or a size of the candidate textual element using any of the techniques described herein to do so. Further, the computing system may determine linguistic information about a given candidate textual element, such as some or all of the actual words that make up the candidate textual element, whether the candidate textual element includes any predefined blacklisted words, which parts of speech are included in the candidate textual element, how many of the different parts of speech are included in the textual element, whether the textual element is capitalized, and/or whether the textual element includes any numerical characters using any of the techniques described herein to do so. Still further, the computing system may determine information about how the candidate textual element relates to other surrounding elements in the drawing, such as whether the candidate textual element is located inside a rectangular box, whether any other textual elements are nearby (e.g., within a threshold distance of) the candidate textual element, and/or whether any of these nearby textual elements include any predefined whitelisted words using any of the techniques described herein to do so.
[0097] Based on the determined spatial information about the candidate textual elements, linguistic information about the candidate textual elements, and information about how the candidate textual elements relate to other surrounding elements in the drawing, the computing system may determine the datasets for the candidate textual elements, with the datasets comprising values for a set of data variables that are potentially predictive of whether each candidate textual element is the title of the drawing. Again, the determined datasets may take the same or similar form as the set of training data used to train the machine-learning model. For instance, the computing system may determine the dataset for a given candidate textual element to be a vector or matrix, with each column or row corresponding to a particular data value indicating one characteristic of the determined spatial information about the candidate textual element, linguistic information about the candidate textual element, or information about how the candidate textual element relates to other surrounding elements in the drawing. As such, the vector or matrix may include data values indicating a distance between the candidate textual element and an edge of the drawing, coordinates of one or more pixels in the candidate textual element, an orientation of the candidate textual element, a font size of the candidate textual element, and/or a width or height of the candidate textual element. Further, the vector or matrix may include data values indicating the actual words in the candidate textual element, whether the candidate textual element includes any blacklisted words, how many blacklisted words the candidate textual element includes, the parts of speech included in the candidate textual element, and/or how many of the different parts of speech are included in the candidate textual element. 
Still further, the vector or matrix may include data values indicating whether the candidate textual element is located within a rectangular box (or some other shape), whether and/or how many other textual elements are within a threshold distance of the candidate textual element, a distance between the candidate textual element and the nearest other textual element, and/or whether and/or how many whitelisted words are included in any nearby other textual elements. Again, it should be understood that these examples of data values are for non-limiting illustrative purposes only and that the computing system may determine a dataset for the candidate textual element that includes additional, fewer, or different data values as well.
[0098] At block 608, the computing system inputs, for each candidate textual element in the set of candidate textual elements, the respective dataset for the candidate textual element into a machine-learning model that functions to (i) evaluate the respective dataset and (ii) based on the evaluation, output a respective score indicating a likelihood that the candidate textual element represents the title of the drawing. The machine-learning model may take the form of the machine-learning model described herein, such as a machine-learning model trained according to the process 500 depicted in
[0099] At block 610, the computing system selects one given candidate textual element in the set of candidate textual elements as the title of the drawing based on the respective scores for the candidate textual elements that are output by the machine-learning model. For instance, the computing system may access the scores output by the machine-learning model to identify the highest respective score, and the computing system may select the particular candidate textual element associated with the highest respective score as the title of the drawing based on the particular candidate textual element having the highest respective score.
[0100] In some examples, the computing system may perform various additional operations based on the respective scores for the candidate textual elements. For example, if the highest respective score output by the machine-learning model is below a threshold value, this may indicate that the machine-learning model has not confidently identified any candidate textual element as the title of the drawing. In such a scenario, it may be desirable to seek human intervention to confirm the accuracy of the drawing title. For instance, if the computing system determines that the highest score output by the machine-learning model is below a threshold value, then the computing system may display a prompt to a user via a user interface of the computing system to confirm whether the candidate textual element with the highest score is actually the title of the drawing. As an example, the prompt may include a picture of the drawing, an indication of the candidate textual element with the highest score, and one or more buttons or other user input mechanisms through which the user may approve or deny the selection of the candidate textual element as the title of the drawing. In still other examples, the computing system may be configured to seek user confirmation for all title selections regardless of the score of the selected candidate textual element.
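The selection and low-confidence check described above can be sketched as follows: pick the candidate with the highest model score, and flag it for user confirmation when that score falls below a threshold. The 0.5 threshold and the scores shown are illustrative assumptions.

```python
# Illustrative sketch: select the highest-scoring candidate as the title
# and flag low-confidence selections for user confirmation.

def select_title(scored_candidates, confidence_threshold=0.5):
    """scored_candidates: list of (text, score) pairs from the model."""
    best_text, best_score = max(scored_candidates, key=lambda pair: pair[1])
    needs_confirmation = best_score < confidence_threshold
    return best_text, needs_confirmation

title, confirm = select_title([("SHEET 3 OF 12", 0.12),
                               ("FIRST FLOOR PLAN", 0.91),
                               ("DRAWING TITLE", 0.07)])
```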
[0101] At block 612, the computing system populates a data attribute field representing the title of the drawing with the given candidate textual element. In some examples, this may involve the computing system storing the given candidate textual element as title metadata in association with the electronic drawing file in a data repository that includes a number of drawings or other files associated with a given construction project. In other examples, this may involve populating a title attribute field in a prompt for confirmation by a user as described above. However, these examples are merely illustrative, and other examples of populating title data attribute fields are possible as well.
IV. Conclusion
[0102] Example embodiments of the disclosed innovations have been described above. Those skilled in the art will understand, however, that changes and modifications may be made to the embodiments described without departing from the true scope and spirit of the present invention, which will be defined by the claims.
[0103] For instance, those in the art will understand that the disclosed operations for training and utilizing a machine-learning model in the manner described herein to automatically determine the title of an electronic drawing may not be limited to only technical drawings for construction projects. Rather, the disclosed operations could be used in other contexts in connection with other types of electronic drawings as well.
[0104] Further, to the extent that examples described herein involve operations performed or initiated by actors, such as “humans,” “operators,” “users,” or other entities, this is for purposes of example and explanation only. The claims should not be construed as requiring action by such actors unless explicitly recited in the claim language.