Systems and Methods for Executing Robotic Process Automation (RPA) Within a Web Browser
20230236910 · 2023-07-27
Inventors
CPC classification
G06F3/04842
PHYSICS
G06F3/04812
PHYSICS
G06F3/0484
PHYSICS
International classification
H04L67/12
ELECTRICITY
G06F3/04842
PHYSICS
G06F3/14
PHYSICS
Abstract
In some embodiments, a robotic process automation (RPA) agent executing within a first browser window/tab interacts with an RPA driver injected into a target web page displayed within a second browser window/tab. A bridge module establishes a communication channel between the RPA agent and the RPA driver. In one exemplary use case, the RPA agent receives a robot specification from a remote server, the specification indicating at least one RPA activity, and communicates details of the respective activity to the RPA driver via the communication channel. The RPA driver identifies a runtime target for the RPA activity within the target web page and executes the respective activity.
Claims
1. A method comprising employing at least one hardware processor of a computer system to execute a first web browser process, a second web browser process, and a bridge module, wherein: the bridge module is configured to set up a communication channel between the first web browser process and the second web browser process; the first web browser process exposes to a user a first web browser window, and is further configured to: receive a specification of a robotic process automation (RPA) workflow from a remote server computer, select an RPA activity for execution from the RPA workflow, the RPA activity comprising mimicking an action of the user on a target element of a target web page displayed within a second browser window, and transmit a set of target identification data characterizing the target element via the communication channel; and the second web browser process executes an RPA driver configured to: receive the set of target identification data via the communication channel, in response, identify the target element within the target web page according to the target identification data, and carry out the RPA activity.
2. The method of claim 1, wherein the bridge module is further configured to inject the RPA driver into the target web page.
3. The method of claim 1, wherein the bridge module is further configured to: detect an instantiation of a new browser window; in response, inject another instance of the RPA driver into a document displayed within the new browser window; and set up another communication channel between the first web browser process and another web browser process displaying the document.
4. The method of claim 3, wherein the other instance of the RPA driver is configured to: receive another set of target identification data via the other communication channel, the other set of target identification data characterizing an element of the document; in response, identify the element of the document according to the other target identification data; and carry out another RPA activity of the RPA workflow on the element of the document.
5. The method of claim 1, wherein: the RPA driver is further configured to transmit a result of carrying out the RPA activity to the first browser process via the communication channel; and the first browser process is further configured to generate a display according to the result within the first browser window.
6. The method of claim 5, wherein the result of carrying out the RPA activity comprises data extracted from the target webpage.
7. The method of claim 1, wherein the RPA driver is further configured to transmit a result of carrying out the RPA activity to the remote server computer.
8. The method of claim 1, wherein the RPA driver is further configured, in response to a failure to identify the target element, to automatically select an alternative target element within the target web page.
9. The method of claim 8, wherein the RPA driver is further configured, in response to selecting an alternative target element, to change an appearance of the alternative target element to highlight the alternative target element with respect to other elements of the target web page.
10. The method of claim 1, wherein the RPA driver is further configured, in response to a failure to identify the target element, to: receive a user input indicating an alternative target element of the target web page; and in response, carry out the RPA activity on the alternative target element.
11. The method of claim 1, wherein the RPA driver is further configured, in response to a failure to identify the target element, to transmit an activity report to the first browser process via the communication channel, the activity report identifying a subset of the target identification data that could not be matched to any element of the target webpage.
12. A computer system comprising at least one hardware processor configured to execute a first web browser process, a second web browser process, and a bridge module, wherein: the bridge module is configured to set up a communication channel between the first web browser process and the second web browser process; the first web browser process exposes to a user a first web browser window, and is further configured to: receive a specification of an RPA workflow from a remote server computer, select an RPA activity for execution from the RPA workflow, the RPA activity comprising mimicking an action of the user on a target element of a target web page displayed within a second browser window, and transmit a set of target identification data characterizing the target element via the communication channel; and the second web browser process executes an RPA driver configured to: receive the set of target identification data via the communication channel, in response, identify the target element within the target web page according to the target identification data, and carry out the RPA activity.
13. The computer system of claim 12, wherein the bridge module is further configured to inject the RPA driver into the target web page.
14. The computer system of claim 12, wherein the bridge module is further configured to: detect an instantiation of a new browser window; in response, inject another instance of the RPA driver into a document displayed within the new browser window; and set up another communication channel between the first web browser process and another web browser process displaying the document.
15. The computer system of claim 14, wherein the other instance of the RPA driver is configured to: receive another set of target identification data via the other communication channel, the other set of target identification data characterizing an element of the document; in response, identify the element of the document according to the other target identification data; and carry out another RPA activity of the RPA workflow on the element of the document.
16. The computer system of claim 12, wherein: the RPA driver is further configured to transmit a result of carrying out the RPA activity to the first browser process via the communication channel; and the first browser process is further configured to generate a display according to the result within the first browser window.
17. The computer system of claim 16, wherein the result of carrying out the RPA activity comprises data extracted from the target webpage.
18. The computer system of claim 12, wherein the RPA driver is further configured to transmit a result of carrying out the RPA activity to the remote server computer.
19. The computer system of claim 12, wherein the RPA driver is further configured, in response to a failure to identify the target element, to automatically select an alternative target element within the target web page.
20. The computer system of claim 19, wherein the RPA driver is further configured, in response to selecting an alternative target element, to change an appearance of the alternative target element to highlight the alternative target element with respect to other elements of the target web page.
21. The computer system of claim 12, wherein the RPA driver is further configured, in response to a failure to identify the target element, to: receive a user input indicating an alternative target element of the target web page; and in response, carry out the RPA activity on the alternative target element.
22. The computer system of claim 12, wherein the RPA driver is further configured, in response to a failure to identify the target element, to transmit an activity report to the first browser process via the communication channel, the activity report identifying a subset of the target identification data that could not be matched to any element of the target webpage.
23. A non-transitory computer-readable medium storing instructions which, when executed by at least one hardware processor of a computer system, cause the computer system to form a bridge module configured to set up a communication channel between a first web browser process and a second web browser process, wherein the first and second web browser processes execute on the computer system, and wherein: the first web browser process exposes to a user a first web browser window, and is further configured to: receive a specification of an RPA workflow from a remote server computer, select an RPA activity for execution from the RPA workflow, the RPA activity comprising mimicking an action of the user on a target element of a target web page displayed within a second browser window, and transmit a set of target identification data characterizing the target element via the communication channel; and the second web browser process executes an RPA driver configured to: receive the set of target identification data via the communication channel, in response, identify the target element within the target web page according to the target identification data, and carry out the RPA activity.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The foregoing aspects and advantages of the present invention will become better understood upon reading the following detailed description and upon reference to the drawings.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0028] In the following description, it is understood that all recited connections between structures can be direct operative connections or indirect operative connections through intermediary structures. A set of elements includes one or more elements. Any recitation of an element is understood to refer to at least one element. A plurality of elements includes at least two elements. Any use of ‘or’ is meant as a nonexclusive or. Unless otherwise required, any described method steps need not be necessarily performed in a particular illustrated order. A first element (e.g. data) derived from a second element encompasses a first element equal to the second element, as well as a first element generated by processing the second element and optionally other data. Making a determination or decision according to a parameter encompasses making the determination or decision according to the parameter and optionally according to other data. Unless otherwise specified, an indicator of some quantity/data may be the quantity/data itself, or an indicator different from the quantity/data itself. A computer program is a sequence of processor instructions carrying out a task. Computer programs described in some embodiments of the present invention may be stand-alone software entities or sub-entities (e.g., subroutines, libraries) of other computer programs. A process is an instance of a computer program, the instance characterized by having at least an execution thread and a separate virtual memory space assigned to it, wherein a content of the respective virtual memory space includes executable code. The term ‘database’ is used herein to denote any organized, searchable collection of data. Computer-readable media encompass non-transitory media such as magnetic, optic, and semiconductor storage media (e.g., hard drives, optical disks, flash memory, DRAM), as well as communication links such as conductive cables and fiber optic links. 
According to some embodiments, the present invention provides, inter alia, computer systems comprising hardware (e.g., one or more processors) programmed to perform the methods described herein, as well as computer-readable media encoding instructions to perform the methods described herein.
[0029] The following description illustrates embodiments of the invention by way of example and not necessarily by way of limitation.
[0031] Mimicking a human operation/action is herein understood to encompass reproducing the sequence of computing events that occur when a human operator performs the respective operation/action on the computer, as well as reproducing a result of the human operator's performing the respective operation on the computer. For instance, mimicking an action of clicking a button of a graphical user interface (GUI) may comprise having the operating system move the mouse pointer to the respective button and generating a mouse click event, or may alternatively comprise toggling the respective GUI button itself to a clicked state.
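For illustration only, the two mimicking strategies above can be sketched against a hypothetical in-memory GUI button; this is an assumed stand-in, not the browser's event machinery.

```javascript
// Illustrative sketch: a minimal stand-in for a GUI button. The object
// shape is a hypothetical assumption used only to contrast the two
// mimicking strategies described above.
function makeButton() {
  return {
    clicked: false,
    listeners: [],
    addListener(fn) { this.listeners.push(fn); },
    dispatch(event) {
      if (event.type === 'click') this.clicked = true;
      this.listeners.forEach((fn) => fn(event));
    },
  };
}

// Strategy 1: reproduce the sequence of computing events a human click
// would produce (pointer movement followed by a click event).
function mimicClickByEvents(button) {
  button.dispatch({ type: 'mousemove' });
  button.dispatch({ type: 'click' });
}

// Strategy 2: reproduce only the result, toggling the clicked state directly.
function mimicClickByState(button) {
  button.clicked = true;
}
```

Either strategy leaves the button in the clicked state; only the first replays the intermediate events that listeners would observe.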
[0032] Activities typically targeted for RPA automation include processing of payments, invoicing, communicating with business clients (e.g., distribution of newsletters and/or product offerings), internal communication (e.g., memos, scheduling of meetings and/or tasks), auditing, and payroll processing, among others. In some embodiments, a dedicated RPA design application 30 enables a human developer to design a software robot implementing a workflow that automates such activities.
[0033] Some types of workflows may include, but are not limited to, sequences, flowcharts, finite state machines (FSMs), and/or global exception handlers. Sequences may be particularly suitable for linear processes, enabling flow from one activity to another without cluttering a workflow. Flowcharts may be particularly suitable to more complex business logic, enabling integration of decisions and connection of activities in a more diverse manner through multiple branching logic operators. FSMs may be particularly suitable for large workflows. FSMs may use a finite number of states in their execution, which are triggered by a condition (i.e., transition) or an activity. Global exception handlers may be particularly suitable for determining workflow behavior when encountering an execution error and for debugging processes.
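The FSM workflow type above can be sketched as a small runnable example; the state/transition shape below is an illustrative assumption, not a format prescribed by the disclosure.

```javascript
// Minimal sketch of a workflow modeled as a finite state machine: each
// state may carry an activity and a transition, which is either fixed or
// triggered by a condition evaluated on the input. Names are illustrative.
function runFsm(states, input) {
  let state = 'start';
  const trace = [state];
  while (states[state] && !states[state].final) {
    const { activity, next } = states[state];
    if (activity) activity(input);                              // carry out the state's activity
    state = typeof next === 'function' ? next(input) : next;    // transition
    trace.push(state);
  }
  return trace;
}
```

A conditional transition (e.g., routing an invoice to approval or rejection according to its amount) then amounts to supplying a function for `next`.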
[0034] Once an RPA workflow is developed, it may be encoded in computer-readable form and exported as an RPA package 40 comprising a set of RPA scripts 42 specifying the respective workflow.
[0035] In some embodiments, RPA package 40 further comprises a resource specification 44 indicative of a set of process resources used by the respective robot during execution. Exemplary process resources include a set of credentials, a computer file, a queue, a database, and a network connection/communication link, among others. Credentials herein generically denote private data (e.g., username, password) required for accessing a specific RPA host machine and/or for executing a specific software component. Credentials may comprise encrypted data; in such situations, the executing robot may possess a cryptographic key for decrypting the respective data. In some embodiments, credential resources may take the form of a computer file. Alternatively, an exemplary credential resource may comprise a lookup key (e.g., hash index) into a database holding the actual credentials. Such a database is sometimes known in the art as a credential vault. A queue herein denotes a container holding an ordered collection of items of the same type (e.g., computer files, structured data objects). Exemplary queues include a collection of invoices and the contents of an email inbox, among others. The ordering of queue items may indicate an order in which the respective items should be processed by the executing robot.
[0036] In some embodiments, for each process resource, specification 44 comprises a set of metadata characterizing the respective resource. Exemplary resource characteristics/metadata include, among others, an indicator of a resource type of the respective resource, a filename, a filesystem path and/or other location indicator for accessing the respective resource, a size, and a version indicator of the respective resource. Resource specification 44 may be formulated according to any data format known in the art, for instance as an XML or JSON script, a relational database, etc.
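To make the resource kinds and metadata above concrete, the sketch below shows a hypothetical resource specification; all field names are illustrative assumptions rather than a format defined by the disclosure.

```javascript
// Hypothetical resource specification in the spirit of specification 44:
// a credential resource referenced by a lookup key into a vault, a file
// resource with location/size/version metadata, and a queue resource.
const resourceSpec = [
  { type: 'credential', lookupKey: 'a1b2c3', vault: 'prod-vault' },
  { type: 'file', path: '/data/invoices.xlsx', sizeBytes: 52480, version: '1.2' },
  { type: 'queue', name: 'pending-invoices', itemType: 'invoice' },
];

// A consuming robot might check that every entry declares its resource
// type and at least one locator before attempting execution.
function validateSpec(spec) {
  return spec.every((r) =>
    typeof r.type === 'string' &&
    ('lookupKey' in r || 'path' in r || 'name' in r));
}
```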
[0037] A skilled artisan will appreciate that RPA design application 30 may comprise multiple components/modules, which may execute on distinct physical machines. In one example, RPA design application 30 may execute in a client-server configuration, wherein one component of application 30 may expose a robot design interface to a user of a client computer, and another component of application 30 executing on a server computer may assemble the robot workflow and formulate/output RPA package 40. For instance, a developer may access the robot design interface via a web browser executing on the client computer, while the software formulating package 40 actually executes on the server computer.
[0038] Once formulated, RPA script(s) 42 may be executed by a set of robots 12a-c, which may be further controlled and coordinated by an orchestrator 14.
[0039] Types of robots 12a-c include, but are not limited to, attended robots, unattended robots, development robots (similar to unattended robots, but used for development and testing purposes), and nonproduction robots (similar to attended robots, but used for development and testing purposes).
[0040] Attended robots are triggered by user events and/or commands and operate alongside a human operator on the same computing system. In some embodiments, attended robots can only be started from a robot tray or from a command prompt and thus cannot be controlled from orchestrator 14 and cannot run under a locked screen, for example. Unattended robots may run unattended in remote virtual environments and may be responsible for remote execution, monitoring, scheduling, and providing support for work queues.
[0041] Orchestrator 14 controls and coordinates the execution of multiple robots 12a-c. As such, orchestrator 14 may have various capabilities including, but not limited to, provisioning, deployment, configuration, scheduling, queueing, monitoring, logging, and/or providing interconnectivity for robots 12a-c. Provisioning may include creating and maintaining connections between robots 12a-c and orchestrator 14. Deployment may include ensuring the correct delivery of software (e.g., RPA scripts 42) to robots 12a-c for execution. Configuration may include maintenance and delivery of robot environments, resources, and workflow configurations. Scheduling may comprise configuring robots 12a-c to execute various tasks according to specific schedules (e.g., at specific times of the day, on specific dates, daily, etc.). Queueing may include providing management of job queues. Monitoring may include keeping track of robot state and maintaining user permissions. Logging may include storing and indexing logs to a database and/or another storage mechanism (e.g., SQL, ElasticSearch®, Redis®). Orchestrator 14 may further act as a centralized point of communication for third-party solutions and/or applications.
[0043] Robot manager 24 may manage the operation of robot executor(s) 22. For instance, robot manager 24 may select tasks/scripts for execution by robot executor(s) 22 according to an input from a human operator and/or according to a schedule. Manager 24 may start and stop jobs and configure various operational parameters of executor(s) 22. When robot 12 includes multiple executors 22, manager 24 may coordinate their activities and/or inter-process communication. Manager 24 may further manage communication between RPA robot 12, orchestrator 14 and/or other entities.
[0044] In some embodiments, robot 12 and orchestrator 14 may execute in a client-server configuration. It should be noted that the client side, the server side, or both, may include any desired number of computing systems (e.g., physical or virtual machines) without deviating from the scope of the invention. In such configurations, robot 12 including executor(s) 22 and robot manager 24 may execute on a client side. Robot 12 may run several jobs/workflows concurrently. Robot manager 24 (e.g., a Windows® service) may act as a single client-side point of contact of multiple executors 22. Manager 24 may further manage communication between robot 12 and orchestrator 14. In some embodiments, communication is initiated by manager 24, which may open a WebSocket channel to orchestrator 14. Manager 24 may subsequently use the channel to transmit notifications regarding the state of each executor 22 to orchestrator 14, for instance as a heartbeat signal. In turn, orchestrator 14 may use the channel to transmit acknowledgements, job requests, and other data such as RPA script(s) 42 and resource metadata to robot 12.
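The manager-to-orchestrator notification pattern above can be sketched as follows, with the WebSocket replaced by an in-memory channel so the example runs anywhere; the message fields are illustrative assumptions.

```javascript
// Sketch of heartbeat-style notifications from a robot manager to an
// orchestrator. A real implementation would send these over a WebSocket
// channel; here an in-memory inbox stands in for the transport.
function makeChannel() {
  const inbox = [];
  return { send: (msg) => inbox.push(msg), inbox };
}

// Report the state of each executor as a heartbeat message.
function reportHeartbeats(channel, executors) {
  for (const ex of executors) {
    channel.send({ kind: 'heartbeat', executorId: ex.id, state: ex.state });
  }
}
```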
[0045] Orchestrator 14 may execute on a server side, possibly distributed over multiple physical and/or virtual machines. In one such embodiment, orchestrator 14 may include an orchestrator user interface (UI) 17 which may be a web application, and a set of service modules 19. Several examples of an orchestrator UI are discussed below. Service modules 19 may include a set of Open Data Protocol (OData) Representational State Transfer (REST) Application Programming Interface (API) endpoints, and a set of service APIs/business logic. A user may interact with orchestrator 14 via orchestrator UI 17 (e.g., by opening a dedicated orchestrator interface on a browser), to instruct orchestrator 14 to carry out various actions, which may include for instance starting jobs on a selected robot 12, creating robot groups/pools, assigning workflows to robots, adding/removing data to/from queues, scheduling jobs to run unattended, analyzing logs per robot or workflow, etc. Orchestrator UI 17 may be implemented using Hypertext Markup Language (HTML), JavaScript®, or any other web technology.
[0046] Orchestrator 14 may carry out actions requested by the user by selectively calling service APIs/business logic. In addition, orchestrator 14 may use the REST API endpoints to communicate with robot 12. The REST API may include configuration, logging, monitoring, and queueing functionality. The configuration endpoints may be used to define and/or configure users, robots, permissions, credentials and/or other process resources, etc. Logging REST endpoints may be used to log different information, such as errors, explicit messages sent by the robots, and other environment-specific information, for instance. Deployment REST endpoints may be used by robots to query the version of RPA script(s) 42 to be executed. Queueing REST endpoints may be responsible for queues and queue item management, such as adding data to a queue, obtaining a transaction from the queue, setting the status of a transaction, etc. Monitoring REST endpoints may monitor the web application component of orchestrator 14 and robot manager 24.
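The queueing operations listed above (adding data to a queue, obtaining a transaction, setting its status) can be sketched as plain functions over an in-memory queue; the function names are illustrative assumptions, not the actual REST endpoint signatures.

```javascript
// Runnable sketch of queue and queue-item management, modeled in memory
// rather than as real REST endpoints.
function makeQueue() {
  const items = [];
  let nextId = 1;
  return {
    addItem(data) { items.push({ id: nextId++, data, status: 'new' }); },
    // Hand out the oldest unprocessed item as an in-progress transaction.
    getTransaction() {
      const item = items.find((i) => i.status === 'new');
      if (item) item.status = 'in-progress';
      return item || null;
    },
    setStatus(id, status) {
      const item = items.find((i) => i.id === id);
      if (item) item.status = status;
    },
    items,
  };
}
```

Items are handed out in insertion order, matching the ordering semantics of queue resources described earlier.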
[0049] A skilled artisan will understand that various components of RPA environment 10 may be implemented and/or may execute on distinct host computer systems (physical appliances and/or virtual machines).
[0051] In some embodiments, RPA host 20 executes a bridge module 34 configured to establish a communication channel between at least two distinct browser processes 32. A communication channel herein denotes any means of transferring data between the respective browser processes. A skilled artisan will know that there may be many ways of establishing such inter-process communication, for instance by mapping a region of a virtual memory of each browser process (e.g., a page of virtual memory) to the same region of physical memory (e.g., a physical memory page), so that the respective browser processes can exchange data by writing to and/or reading the respective data from the respective memory page. Other exemplary inter-process communication means which may be used by bridge module 34 include a socket (i.e., transferring data via a network interface of RPA host 20), a pipe, a file, and message passing, among others. In some embodiments of the present invention, bridge module 34 comprises a browser extension computer program as further described below. The term ‘browser extension’ herein denotes an add-on, custom computer program that extends the native functionality of a browser application, and that executes within the respective browser application (i.e., uses a browser process for execution).
[0053] Some modern browsers enable the rendering of web documents which include snippets of executable code. The respective executable code may control how the content of the respective document is displayed to a user, manage the distribution and display of third-party content (e.g., advertising, weather, stock market updates), gather various kinds of data characterizing the browsing habits of the respective user, etc. Such executable code may be embedded in or hyperlinked from the respective document. Exemplary browser-executable code may be pre-compiled or formulated in a scripting language or bytecode for runtime interpretation or compilation. Exemplary scripting languages include JavaScript® and VBScript®, among others. To enable code execution, some browsers include an interpreter configured to translate the received code from a scripting language/bytecode into a form suitable for execution on the respective host platform, and provide a hosting environment for the respective code to run in.
[0054] Some embodiments of the present invention use browser process 32a and agent browser window 36a to load a web document comprising an executable RPA agent 31, for instance formulated in JavaScript®. In various embodiments, RPA agent 31 may implement some of the functionality of RPA design application 30 and/or some of the functionality of RPA robot 12, as shown in detail below. RPA agent 31 may be fetched from a remote repository/server, for instance by pointing browser process 32a to a pre-determined uniform resource locator (URL) indicating an address of agent 31. In response to fetching RPA agent 31, browser process 32a may interpret and execute agent 31 within an isolated environment specific to process 32a and/or agent browser window 36a.
[0055] Some embodiments further provide an RPA driver 25 to browser process 32b and/or target window 36b. Driver 25 generically represents a set of software modules that carry out low-level processing tasks such as constructing, parsing, and/or modifying a document object model (DOM) of a document currently displayed within target browser window 36b, identifying an element of the respective document (e.g., a button, a form field), changing the on-screen appearance of an element (e.g., color, position, size), drawing a shape, determining a current position of a cursor, registering and/or executing input events such as mouse, keyboard, and/or touchscreen events, detecting a current posture/orientation of a handheld device, etc. In some embodiments, RPA driver 25 is embodied as a set of scripts injected into browser process 32b and/or into a target document currently rendered within target window 36b.
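One driver task named above, identifying an element of the document, can be sketched as a search over a DOM-like tree; the tree shape below is a simplified assumption, not the browser's actual DOM API.

```javascript
// Illustrative sketch: depth-first search for an element matching a tag
// and a set of attribute-value pairs in a simplified DOM-like tree.
function findElement(node, query) {
  const attrsMatch = Object.entries(query.attrs || {})
    .every(([k, v]) => (node.attrs || {})[k] === v);
  if (node.tag === query.tag && attrsMatch) return node;
  for (const child of node.children || []) {
    const hit = findElement(child, query);
    if (hit) return hit;
  }
  return null; // no element of the document matches the query
}
```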
Robot Design Embodiments
[0057] Some embodiments use browser process 32a and agent browser window 36a to expose a robot design interface to a user.
[0058] Workflow design region 51 may display a diagram (e.g., flowchart) of an activity sequence reproducing the flow of a business process currently being automated. The interface may expose various controls enabling the user to add, delete, and re-arrange activities of the sequence. Each RPA activity may be configured independently, by way of an activity configuration UI (illustrated as items 54a-b).
[0059] Another exemplary parameter of the current RPA activity is the operand/target of the respective activity, herein denoting the element of the target document that the RPA robot is supposed to act on. In one example wherein the selected activity comprises a mouse click, the target element may be a button, a menu item, a hyperlink, etc. In another example wherein the selected activity comprises filling out a form, the target element may be the specific form field that should receive the input. Interfaces 50, 54 may enable the user to indicate the target element in various ways. For instance, they may invite the user to select the target element from a menu/list of candidates. In a preferred embodiment, activity configuration interface 54c may instruct the user to indicate the target directly within target browser window 36b, for instance by clicking or tapping on it. Some embodiments expose a target configuration control 56 which, when activated, enables the user to further specify the target by way of a target configuration interface.
[0060] In some embodiments, RPA driver 25 is configured to analyze a user's input to determine a set of target identification data characterizing the element of the target document currently displayed within target browser window 36b which the user has selected as a target for the current RPA activity. Exemplary target identification data includes an element ID 62 comprising a set of attribute-value pairs (e.g., HTML tags) characterizing the respective element.
[0061] Exemplary target identification data may further comprise a target image 64 comprising an encoding of a user-facing image of the respective target element. For instance, target image 64 may comprise an array of pixel values corresponding to a limited region of a screen currently displaying target element 60, and/or a set of values computed according to the respective array of pixel values (e.g., a JPEG or wavelet representation of the respective array of pixel values). In some embodiments, target image 64 comprises a content of a clipping of a screen image located within the bounds of the respective target element.
[0062] Target identification data may further include a target text 66 comprising a computer encoding of a text (sequence of alphanumeric characters) displayed within the screen boundaries of the respective target element. Target text 66 may be determined according to the source code of the respective document and/or according to a result of applying an optical character recognition (OCR) procedure to a region of the screen currently showing target element 60.
[0063] In some embodiments, target identification data characterizing target element 60 further includes identification data (e.g., element ID, image, text, etc.) characterizing another UI element of the target webpage, herein deemed an anchor element. An anchor herein denotes any element co-displayed with the target element, i.e., simultaneously visible with the target element in at least some views of the target webpage. In some embodiments, the anchor element is selected from UI elements displayed in the vicinity of the target element, such as a label, a title, etc.
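Selecting an anchor from the target's vicinity can be sketched as a proximity score over bounding boxes; the distance-between-centers rule below is an assumption for illustration, and real implementations may additionally weigh alignment, element type, etc.

```javascript
// Illustrative sketch: pick as anchor the candidate element whose
// bounding-box center is nearest to the target's bounding-box center.
function center(box) {
  return { x: box.x + box.w / 2, y: box.y + box.h / 2 };
}

function pickAnchor(targetBox, candidates) {
  const t = center(targetBox);
  let best = null;
  let bestDist = Infinity;
  for (const c of candidates) {
    const p = center(c.box);
    const d = Math.hypot(p.x - t.x, p.y - t.y);
    if (d < bestDist) { bestDist = d; best = c; }
  }
  return best;
}
```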
[0064] In some embodiments, activity configuration interface 54c comprises a control 56 which, when activated, triggers the display of a target configuration interface enabling the user to visualize and edit target identification data characterizing target element 60.
[0065] In some embodiments, target configuration interface 70 comprises a menu 72 including various controls, for instance a button for indicating a target element and for editing target identification data, a button for validating a choice of target and/or a selection of target identification data, a button for selecting an anchor element associated with the currently selected target element and for editing anchor identification data, and a troubleshooting button, among others. The currently displayed view allows configuring and/or validating identification features of a target element; a similar view may be available for configuring identification features of anchor elements.
[0066] Interface 70 may be organized in various zones, for instance an area for displaying a tree representation (e.g., a DOM) of the target document, which allows the user to easily visualize target element 60 as a node in the respective tree/DOM. Target configuration interface 70 may further display element ID 62, allowing the user to visualize currently defined attribute-value pairs (e.g., HTML tags) characterizing the respective target element. Some embodiments may further include a tag builder pane enabling the user to select which tags and/or attributes to include in element ID 62.
[0067] Target configuration interface 70 may further comprise areas for displaying target image 64, target text 66, and/or an attribute matching pane enabling the user to set additional matching parameters for individual tags and/or attributes. In one example, the attribute matching pane enables the user to instruct the robot on whether to use exact or approximate matching to identify the runtime instance of target element 60. Exact matching requires that the runtime value of a selected attribute exactly match the respective design-time value included in the target identification data for the respective target element. Approximate matching may require only a partial match between the design-time and runtime values of the respective attribute. For attributes of type text, exemplary kinds of approximate matching include regular expressions, wildcard, and fuzzy matching, among others. Similar configuration fields may be exposed for matching anchor attributes.
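The exact, wildcard, regular-expression, and fuzzy matching modes described above admit a compact illustration. The following is a minimal sketch only, not the disclosed implementation; the function names and the fuzzy-similarity threshold are illustrative assumptions:

```javascript
// Levenshtein edit distance between two strings (used for fuzzy matching).
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
    }
  }
  return d[a.length][b.length];
}

// Compare a runtime attribute value against its design-time counterpart.
// mode: 'exact' | 'wildcard' | 'regex' | 'fuzzy' (threshold is an assumption).
function matchAttribute(runtimeValue, designValue, mode = 'exact', threshold = 0.8) {
  switch (mode) {
    case 'exact':
      return runtimeValue === designValue;
    case 'wildcard': {
      // Treat '*' as "any sequence" and '?' as "any single character".
      const pattern = designValue
        .replace(/[.+^${}()|[\]\\]/g, '\\$&')
        .replace(/\*/g, '.*')
        .replace(/\?/g, '.');
      return new RegExp(`^${pattern}$`).test(runtimeValue);
    }
    case 'regex':
      return new RegExp(designValue).test(runtimeValue);
    case 'fuzzy': {
      const maxLen = Math.max(runtimeValue.length, designValue.length) || 1;
      return 1 - editDistance(runtimeValue, designValue) / maxLen >= threshold;
    }
    default:
      throw new Error(`unknown matching mode: ${mode}`);
  }
}
```

Approximate modes trade identification precision for robustness against small design-time/runtime differences, e.g., a version suffix appended to an element's `id` attribute.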
[0069] A further step 306 may set up communication channel(s) 138a-b. In an exemplary embodiment wherein browser processes 32a-b are instances of a Google Chrome® browser and wherein bridge module 34 comprises a browser extension, step 306 may comprise setting up a runtime.Port object that RPA agent 31 and driver 25 may then use to exchange data. In alternative embodiments wherein the respective browser application does not support inter-process communication, but instead allows reading data from and/or writing data to a local file, agent 31 and driver 25 may use the respective local file as a container for depositing and/or retrieving communications. In such embodiments, step 306 may comprise generating a file name for the respective container and communicating it to RPA agent 31 and/or driver 25. In one such example, the injected driver may be customized to include the respective filename. In some embodiments, step 306 comprises setting up distinct file containers for each browser window/tab/frame currently exposed on the respective RPA host. In yet other embodiments, agent 31 and driver 25 may exchange communications via a remote server, e.g., orchestrator 14 (
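In a Chrome® extension embodiment, the channel of step 306 may be built on runtime.Port objects. The relay logic of the bridge module can be sketched independently of the extension API by injecting the two ports; the `postMessage`/`onMessage.addListener` shape assumed here matches Chrome's `runtime.Port`, but the function and message-envelope names are illustrative assumptions:

```javascript
// Minimal sketch of a bridge module relaying messages between the RPA agent
// and an injected RPA driver. Port objects are passed in so the same logic
// works with Chrome's runtime.Port or any object exposing postMessage() and
// an onMessage listener registry.
function createBridge(agentPort, driverPort) {
  // Forward driver messages (e.g., target identification data) to the agent.
  driverPort.onMessage.addListener((msg) => {
    agentPort.postMessage({ source: 'driver', payload: msg });
  });
  // Forward agent messages (e.g., activity details) to the driver.
  agentPort.onMessage.addListener((msg) => {
    driverPort.postMessage({ source: 'agent', payload: msg });
  });
}
```

Wrapping each message in an envelope that records its source lets a single agent distinguish traffic arriving from drivers injected into different windows/tabs.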
[0070] In some embodiments, bridge module 34 exposes target configuration interface 70 within bridge browser window 36c (step 308). In a step 310, module 34 may then listen for communications from RPA driver 25; such communications may comprise target identification data as shown below. In response to such communications, a step 312 may populate interface 70 with the respective target identification data, enabling the user to review, edit, and/or validate the respective choice of target element. In some embodiments, step 312 may further comprise receiving user input comprising changes to the target identification data (e.g., adding or removing HTML tags or attribute-value pairs to/from element ID 62, setting attribute matching parameters, etc.). When the user validates the current target identification data (a step 314 returns a YES), in a step 316 module 34 may forward the respective target identification data to RPA agent 31.
[0072] The user may then be instructed to select a target for the respective activity from the webpage displayed within target browser window 36b. In some embodiments, in a sequence of steps 406-408 RPA agent 31 may signal to RPA driver 25 to acquire target identification data, and may receive the respective data from RPA driver 25 (more details on target acquisition are given below). Such data transfers occur over the communication channel set up by bridge module 34 (e.g., channels 138a-b in
[0073] In an alternative embodiment, instead of formulating an RPA script or package 40 for an entire robotic workflow, RPA agent 31 may formulate a specification for each individual RPA activity, complete with target identification data, and transmit the respective specification to a remote server computer, which may then assemble RPA package 40 describing the entire designed workflow from individual activity data received from RPA agent 31.
[0075] In some embodiments, a step 508 may highlight the previously identified target candidate element. Highlighting herein denotes changing an appearance of the respective target candidate element to indicate it as a potential target for the current RPA activity.
[0076] In some embodiments, identifying a target candidate automatically triggers selection of an anchor element. The anchor may be selected according to the type, position, orientation, and/or size of the target candidate, among others. For instance, some embodiments select as anchors elements located in the immediate vicinity of the target candidate, preferably aligned with it. Step 510 (
[0077] In a step 514, RPA driver 25 may determine target identification data characterizing the candidate target and/or the selected anchor element. To determine element ID 62, some embodiments may parse a live DOM of the target webpage, extracting and/or formulating a set of HTML tags and/or attribute-value pairs characterizing the candidate target element and/or anchor element. Step 514 may further include taking a snapshot of a region of the screen currently showing the candidate target and/or anchor elements to determine image data (e.g., target image 64 in
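The extraction of tag and attribute-value pairs in step 514 can be sketched as a walk up the element's ancestor chain. This is an illustrative sketch only: the kept attribute names and the resulting data shape are assumptions, and only `tagName`, `getAttribute`, and `parentElement` are relied upon, so the function also runs against any DOM-like object tree:

```javascript
// Attribute names retained in the element ID (illustrative selection).
const KEPT_ATTRIBUTES = ['id', 'name', 'class', 'type'];

// Assemble an element ID as a chain of tag/attribute entries from the
// document root down to the target element.
function buildElementId(element) {
  const chain = [];
  for (let node = element; node && node.tagName; node = node.parentElement) {
    const entry = { tag: node.tagName };
    for (const name of KEPT_ATTRIBUTES) {
      const value = node.getAttribute(name);
      if (value !== null && value !== undefined) entry[name] = value;
    }
    chain.unshift(entry); // root first, target last
  }
  return chain;
}
```

A live-DOM driver would call this on the candidate target and again on the selected anchor, bundling both results into the target identification data.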
[0078] The exemplary flowchart in
[0079] The description above refers to an exemplary embodiment wherein bridge module 34 intermediates communication between RPA agent 31 and driver 25 (see e.g.,
[0080] The description above also focused on a version of robot design wherein the user selects from a set of activities available for execution, and then proceeds to configure each individual activity by indicating a target and other parameters. Other exemplary embodiments may implement another popular robot design scenario, wherein the robot design tools record a sequence of user actions (such as the respective user's navigating through a complex target website) and configure a robot to reproduce the respective sequence. In some such embodiments, for each user action such as a click, scroll, type in, etc., driver 25 may be configured to determine a target of the respective action including a set of target identification data, and to transmit the respective data together with an indicator of a type of user action to RPA agent 31 via communication channel 38 or 138a-b. RPA agent 31 may then assemble a robot specification from the respective data received from RPA driver 25.
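The recording scenario above can be sketched as a handler factory the injected driver might register for user events. Here `describeTarget` and `send` are hypothetical injected dependencies standing in for the driver's target-identification routine and for the communication channel, respectively:

```javascript
// Build an event handler that captures a user action and ships an indicator
// of the action type together with target identification data to the agent.
function makeActionRecorder(describeTarget, send) {
  return function onUserAction(event) {
    send({
      action: event.type,                    // e.g., 'click', 'scroll', 'input'
      target: describeTarget(event.target),  // target identification data
      timestamp: Date.now(),
    });
  };
}

// In a real driver this might be registered on the document, e.g.:
// document.addEventListener('click',
//     makeActionRecorder(buildElementId, (m) => port.postMessage(m)), true);
```

The agent can then replay the received (action, target) pairs in order to assemble the robot specification.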
Robot Execution Embodiments
[0081] In contrast to the exemplary embodiments illustrated above, which were directed at designing an RPA robot to perform a desired workflow, in other embodiments of the present invention RPA agent 31 comprises at least a part of RPA robot 12 configured to actually carry out an automation. For instance, RPA agent 31 may embody some of the functionality of robot manager 24 and/or robot executors 22 (see
[0082] In one exemplary robot execution embodiment, the user may use agent browser window 36a to open a robot specification. The specification may instruct a robot to navigate to a target web page and perform some activity, such as filling in a form, scraping some text or images, etc. For example, an RPA package 40 may be downloaded from a remote ‘robot store’ by accessing a specific URL or selecting a menu item from a web interface exposed by a remote server computer. Package 40 may include a set of RPA scripts 42 formulated in a computer-readable form that enables scripts 42 to be executed by a browser process. For instance, scripts 42 may be formulated in a version of JavaScript®. Scripts 42 may comprise a specification of a sequence of RPA activities (e.g., navigating to a webpage, clicking on a button, etc.), including a set of target identification data characterizing a target/operand of each RPA activity (e.g., which button to click, which form field to fill in, etc.).
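By way of illustration only, such a sequence of RPA activities might be represented as plain data and walked by a handler table; every field name below is an assumption for the sketch, not the disclosed script format:

```javascript
// Hypothetical in-browser representation of an RPA script: an ordered list
// of activities, each carrying its type, parameters, and target
// identification data.
const exampleScript = {
  name: 'fill-login-form',
  activities: [
    { type: 'navigate', url: 'https://example.com/login' },
    { type: 'typeInto',
      target: { elementId: [{ tag: 'INPUT', name: 'username' }] },
      text: 'jdoe' },
    { type: 'click',
      target: { elementId: [{ tag: 'BUTTON', type: 'submit' }] } },
  ],
};

// Execute activities in order, delegating each to a per-type handler.
async function runScript(script, handlers) {
  const results = [];
  for (const activity of script.activities) {
    results.push(await handlers[activity.type](activity));
  }
  return results;
}
```

Keeping the specification as data (rather than executable code) is what allows a remote server to assemble, store, and redistribute packages independently of the browser that eventually runs them.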
[0084] In a further sequence of steps 608-610, module 34 may inject RPA driver 25 into the target webpage/browser window 36b and set up a communication channel between RPA agent 31 and driver 25 (see e.g., channel 38 in
[0087] When target identification is successful (a step 808 returns a YES), a step 812 may execute the current RPA activity, for instance click on the identified button, fill in the identified form field, etc. Step 812 may comprise manipulating a source code of the target web page and/or generating an input event (e.g., a click, a tap, etc.) to reproduce a result of a human operator actually carrying out the respective action.
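The event-synthesis approach of step 812 can be sketched as below. This is a hedged sketch: the element and event constructor are passed in so the logic can run outside a browser, and the function and field names are assumptions. In an actual page, the driver would call `element.dispatchEvent(new MouseEvent('click', { bubbles: true }))`:

```javascript
// Execute a single RPA activity on an already-identified runtime target by
// manipulating page state and/or generating a synthetic input event.
function performActivity(element, activity, EventCtor) {
  switch (activity.type) {
    case 'click':
      element.dispatchEvent(new EventCtor('click', { bubbles: true }));
      return { status: 'done' };
    case 'typeInto':
      element.value = activity.text; // manipulate page state directly
      element.dispatchEvent(new EventCtor('input', { bubbles: true }));
      return { status: 'done' };
    default:
      return { status: 'error', reason: `unsupported activity: ${activity.type}` };
  }
}
```

Dispatching a bubbling event after setting the value matters in practice: many pages attach their own listeners and would otherwise not react to the robot's input.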
[0088] When the runtime target of the current activity cannot be identified according to target identification data received from RPA agent 31 (for instance in situations wherein the target webpage has changed substantially between design time and runtime), some embodiments transmit an error message/report to RPA agent 31 via communication channel 38. In an alternative embodiment, RPA driver 25 may search for an alternative target. In one such example, driver 25 may identify an element of the target webpage approximately matching the provided target identification data. Some embodiments identify multiple target candidates partially matching the desired target characteristics and compute a similarity measure between each candidate and the design-time target. An alternative target may then be selected by ranking the target candidates according to the computed similarity measure. In response to selecting an alternative runtime target, some embodiments of driver 25 may highlight the respective UI element, for instance as described above in relation to
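The alternative-target selection described above can be sketched as scoring each candidate by the fraction of design-time attribute-value pairs it reproduces, ranking candidates by score, and accepting the best one above a minimum similarity. The scoring scheme, names, and threshold are illustrative assumptions, not the disclosed similarity measure:

```javascript
// Fraction of design-time attribute-value pairs matched by a candidate.
function similarity(designAttrs, candidateAttrs) {
  const keys = Object.keys(designAttrs);
  if (keys.length === 0) return 0;
  const matched = keys.filter((k) => candidateAttrs[k] === designAttrs[k]);
  return matched.length / keys.length;
}

// Rank partially matching candidates and pick the best one, or null when no
// candidate clears the minimum-similarity threshold (an assumed default).
function pickAlternativeTarget(designAttrs, candidates, minSimilarity = 0.5) {
  const ranked = candidates
    .map((c) => ({ candidate: c, score: similarity(designAttrs, c.attrs) }))
    .sort((a, b) => b.score - a.score);
  const best = ranked[0];
  return best && best.score >= minSimilarity ? best.candidate : null;
}
```

Returning `null` rather than a poor match is the hook for the fallback behavior: the driver can then highlight nothing and instead report the failure to the agent.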
[0089] When for any reason driver 25 cannot identify any alternative target, in some embodiments a step 814 returns an activity report to RPA agent 31 indicating that the current activity could not be executed because of a failure to identify the runtime target. In some embodiments, the activity report may further identify a subset of the target identification data that could not be matched in any element of the target webpage. Such reporting may facilitate debugging. When the current activity was successfully executed, the report sent to RPA agent 31 may comprise a result of executing the respective activity. In an alternative embodiment, step 814 may comprise sending the activity report and/or a result of executing the respective activity to a remote server computer (e.g., orchestrator 14) instead of the local RPA agent.
[0091] The illustrated computer system comprises a set of physical devices, including a hardware processor 82 and a memory unit 84. Processor 82 comprises a physical device (e.g. a microprocessor, a multi-core integrated circuit formed on a semiconductor substrate, etc.) configured to execute computational and/or logical operations with a set of signals and/or data. In some embodiments, such operations are delivered to processor 82 in the form of a sequence of processor instructions (e.g. machine code or other type of encoding). Memory unit 84 may comprise volatile computer-readable media (e.g. DRAM, SRAM) storing instructions and/or data accessed or generated by processor 82.
[0092] Input devices 86 may include computer keyboards, mice, and microphones, among others, including the respective hardware interfaces and/or adapters allowing a user to introduce data and/or instructions into the respective computer system. Output devices 88 may include display devices such as monitors and speakers among others, as well as hardware interfaces/adapters such as graphic cards, allowing the illustrated computing appliance to communicate data to a user. In some embodiments, input devices 86 and output devices 88 share a common piece of hardware, as in the case of touch-screen devices. Storage devices 92 include computer-readable media enabling the non-volatile storage, reading, and writing of software instructions and/or data. Exemplary storage devices 92 include magnetic and optical disks and flash memory devices, as well as removable media such as CD and/or DVD disks and drives. The set of network adapters 94, together with associated communication interface(s), enables the illustrated computer system to connect to a computer network (e.g., network 13 in
[0093] The exemplary systems and methods described above facilitate the uptake of RPA technologies by enabling RPA software to execute on virtually any host computer, irrespective of its hardware type and operating system. As opposed to conventional RPA software, which is typically distributed as a separate self-contained software application, in some embodiments of the present invention RPA software comprises a set of scripts that execute within a web browser such as Google Chrome®, among others. Said scripts may be formulated in a scripting language such as JavaScript® or some version of bytecode which browsers are capable of interpreting.
[0094] Whereas in conventional RPA separate versions of the software must be developed for each hardware platform (i.e., processor family) and/or each operating system (e.g., Microsoft Windows® vs. Linux®), some embodiments of the present invention allow the same set of scripts to be used on any platform and operating system which can execute a web browser with script interpretation functionality. On the software developer's side, removing the need to build and maintain multiple versions of a robot design application may substantially facilitate software development and reduce time-to-market. Client-side advantages include a reduction in administration costs by removing the need to purchase, install, and upgrade multiple versions of RPA software, and further simplifying the licensing process. Individual RPA developers may also benefit by being able to design, test, and run automations from their own computers, irrespective of operating system.
[0095] However, performing RPA from inside of a browser presents substantial technical challenges. RPA software libraries may be relatively large, so inserting them into a target web document may be impractical and may occasionally cause the respective browser process to crash or slow down. Instead, some embodiments of the present invention break up the functionality of RPA software into several parts, each part executing within a separate browser process, window, or tab. For instance, in a robot design embodiment, a design interface may execute within one browser window/tab, distinct from another window/tab displaying the webpage targeted for automation. Some embodiments then only inject a relatively small software component (e.g., an RPA driver as disclosed above) into the target web page, the respective component configured to execute basic tasks such as identifying UI elements and mimicking user actions such as mouse clicks, finger taps, etc. By keeping the bulk of RPA software outside of the target document, some embodiments improve user experience, stability, and performance of RPA software.
[0096] Another advantage of having distinct RPA components in separate windows/tabs is enhanced functionality. Since modern browsers typically keep distinct windows/tabs isolated from each other for computer security and privacy reasons, an RPA system wherein all RPA software executes within the target web page may only have access to the contents of the respective window/tab. In an exemplary situation wherein clicking a hyperlink triggers the display of an additional webpage within a new window/tab, the contents of the additional webpage may therefore be off limits to the RPA software. In contrast to such RPA strategies, some embodiments of the present invention are capable of executing interconnected snippets of RPA code in multiple windows/tabs at once, thus eliminating the inconvenience. In one exemplary embodiment, the RPA driver executing within the target webpage detects an activation of a hyperlink and communicates the fact to the bridge module. In response, the bridge module may detect an instantiation of a new browser window/tab, automatically inject another instance of the RPA driver into the newly opened window/tab, and establish a communication channel between the new instance of the RPA driver and the RPA agent executing within the agent browser window, thus enabling a seamless automation across multiple windows/tabs.
[0097] Furthermore, a single instance of the RPA agent may manage automation of multiple windows/tabs. In a robot design embodiment, the RPA agent may collect target identification data from multiple instances of the RPA driver operating in distinct browser windows/tabs, thus capturing the details of the user's navigation across multiple pages and hyperlinks. In a robot execution embodiment, the RPA agent may transmit window-specific target identification data to each instance of the RPA driver, thus enabling the robot to reproduce complex interactions with multiple web pages, for instance scraping and combining data from multiple sources.
[0098] Meanwhile, keeping distinct RPA components in distinct windows/tabs creates extra technical problems by explicitly going against the browser's code isolation policy. To overcome such hurdles, some embodiments set up a communication channel between the various RPA components to allow exchange of messages, such as target identification data and status reports. One exemplary embodiment uses a browser extension mechanism to set up such communication channels.
[0099] It will be clear to one skilled in the art that the above embodiments may be altered in many ways without departing from the scope of the invention. Accordingly, the scope of the invention should be determined by the following claims and their legal equivalents.