Patent classifications
G06F16/316
Bit vector search index using shards
The technology described herein provides a bit vector search index for a search system that uses shards. The bit vector search index comprises a data structure for indexing data about terms from a corpus of documents. The data structure includes a number of bit vectors. Each bit vector comprises an array of bits and corresponds to a different set of terms. Bits in the bit vector are used to represent whether at least one document corresponding to the bit includes at least one term from the set of terms corresponding to the bit vector. The search index is provided in a number of shards. Each shard corresponds to a subset of documents having document lengths within a particular range of document lengths.
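The structure described above can be illustrated with a minimal sketch. It is not the patented implementation: the CRC32-based assignment of terms to bit vectors, the row count, the shard length boundaries, and all function names are assumptions chosen for illustration, and each bit here corresponds to exactly one document.

```python
import zlib

NUM_ROWS = 8  # assumed number of bit vectors per shard
# Assumed shard boundaries, in words: [low, high) document-length ranges.
LENGTH_RANGES = [(0, 4), (4, 16), (16, 10**9)]


def row_for(term):
    """Map a term to a bit vector; all terms sharing a row form its term set."""
    return zlib.crc32(term.encode("utf-8")) % NUM_ROWS


def shard_for(length):
    """Pick the shard whose length range contains the document length."""
    for i, (low, high) in enumerate(LENGTH_RANGES):
        if low <= length < high:
            return i
    raise ValueError("document length outside all shard ranges")


def build_sharded_index(docs):
    """Build one set of bit vectors (stored as Python ints) per shard."""
    shards = [{"doc_ids": [], "rows": [0] * NUM_ROWS} for _ in LENGTH_RANGES]
    for doc_id, text in docs.items():
        words = text.lower().split()
        shard = shards[shard_for(len(words))]
        bit = len(shard["doc_ids"])  # bit position assigned to this document
        shard["doc_ids"].append(doc_id)
        for word in set(words):
            shard["rows"][row_for(word)] |= 1 << bit
    return shards


def candidates(shards, query):
    """AND the bit vectors of every query term, shard by shard.

    The result can contain false positives (terms share bit vectors),
    but never misses a document that contains all query terms.
    """
    result = []
    for shard in shards:
        mask = (1 << len(shard["doc_ids"])) - 1
        for word in query.lower().split():
            mask &= shard["rows"][row_for(word)]
        result += [d for i, d in enumerate(shard["doc_ids"]) if (mask >> i) & 1]
    return result
```

Because several terms share one bit vector, a set bit only says that a document may contain the term, so candidates would normally be verified against the documents themselves in a later step.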
Systems and methods for document analytics
A system and method dynamically analyze documents to determine the relevancy of a document relatively quickly and efficiently. Potentially relevant documents can be determined using a search string and then converted into corresponding document data structures for analysis. Keywords can be used to identify documents of interest from the document data structures. Tools are provided to assess the relevancy of documents, including tools to determine the frequency of keywords in the documents, to compare documents, and to contrast documents. Algorithms are provided that use prior searches to determine sets of relevant documents. Adaptive search methods are provided that refine searching during analysis to reduce the number of documents that are not sufficiently relevant. A dynamic relevancy matrix can be generated that provides access to keyword frequency and associated keyword frequency for a plurality of documents.
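A keyword-frequency matrix of the kind mentioned above might look like the following sketch. The tokenization rule, the ranking by total keyword count, and the function names are assumptions for illustration; the abstract does not specify them, and "associated keyword frequency" is not modeled here.

```python
import re
from collections import Counter


def relevancy_matrix(documents, keywords):
    """Return {document name: {keyword: frequency}} for the given keywords."""
    matrix = {}
    for name, text in documents.items():
        # Assumed tokenization: lowercase runs of letters and digits.
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        matrix[name] = {k: counts[k.lower()] for k in keywords}
    return matrix


def rank_by_total(matrix):
    """Order documents by total keyword occurrences, highest first."""
    return sorted(matrix, key=lambda name: sum(matrix[name].values()), reverse=True)
```

Regenerating the matrix whenever the keyword list changes is one simple way to make it "dynamic" in the sense the abstract suggests.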
Efficient migration to distributed storage
A computer program product, system, and method for determining a list of objects, within source storage, to migrate; generating a chunk layout for the objects to migrate; and for each unencoded chunk within the chunk layout: retrieving objects from source storage specified by the unencoded chunk within the chunk layout; generating data and coded fragments for the unencoded chunk using the retrieved objects; and storing the data and coded fragments to primary storage.
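The migration steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: a greedy in-order chunk layout is assumed, a single XOR parity fragment stands in for whatever erasure code a real system would use, and the dictionaries `source` and `primary` are hypothetical stand-ins for source and primary storage.

```python
def plan_chunks(sizes, chunk_size):
    """Greedy layout: pack objects into chunks in listing order."""
    layout, current, used = [], [], 0
    for name, size in sizes.items():
        if current and used + size > chunk_size:
            layout.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        layout.append(current)
    return layout


def encode_chunk(payload, k):
    """Split payload into k data fragments plus one XOR parity fragment."""
    frag_len = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(frag_len * k, b"\0")
    data = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytearray(frag_len)
    for fragment in data:
        for i, byte in enumerate(fragment):
            parity[i] ^= byte
    return data, [bytes(parity)]


def migrate(source, primary, chunk_size=8, k=4):
    """Determine objects, lay out chunks, then encode and store each chunk."""
    layout = plan_chunks({n: len(v) for n, v in source.items()}, chunk_size)
    for chunk_id, names in enumerate(layout):
        payload = b"".join(source[n] for n in names)  # retrieve objects
        data, coded = encode_chunk(payload, k)        # generate fragments
        primary[chunk_id] = {"objects": names, "data": data, "coded": coded}
    return layout
```

With XOR parity, any single lost data fragment of a chunk can be rebuilt by XOR-ing the parity fragment with the remaining data fragments. The point of encoding directly from the retrieved objects is that the unencoded chunk never has to be written to primary storage first.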
CHEMICAL FORMULATION-AWARE COGNITIVE SEARCH AND ANALYTICS
A method, computer system, and a computer program product for identifying and storing at least one representation of at least one chemical compound is provided. The present invention may include identifying a chemical compound associated with a source data. The present invention may also include assigning a structure representation to the identified chemical compound associated with the source data. The present invention may further include computing an unformulated representation based on the assigned structure representation. The present invention may then include indexing the computed unformulated representation and the assigned structure representation. The present invention may further include storing the indexed unformulated representation and the indexed structure representation separately as single records in a database.
Search index
Method of searching comprising applying a function to individual elements within a digital work to form a set of index elements. Storing the index elements as an index for the digital work. Receiving a search term. Applying the function to one or more individual elements within the search term to convert the search term into one or more converted search term elements. Identifying a digital work having an index containing one or more index elements that match one or more of the converted search term elements. Returning search results of the identified digital work. Method of searching for a digital work comprising the steps of providing a search term. Receiving search results formulated by applying a function to one or more individual elements within the search term to convert the search term into one or more converted search term elements. Identifying a digital work having an index containing one or more index elements that match one or more of the converted search term elements, wherein the index is formed by applying the function to individual elements within the digital work to form a set of the index elements. Searchable index for a digital work formed by applying a function to individual elements within the digital work to form a set of index elements.
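The indexing and matching scheme described above can be sketched as follows. The abstract does not say which function is applied, so a truncated SHA-256 digest of each lowercased word is assumed here purely for illustration; the function names are hypothetical as well.

```python
import hashlib


def convert(element):
    """Assumed function applied to each element (here: a word)."""
    digest = hashlib.sha256(element.lower().encode("utf-8")).hexdigest()
    return digest[:12]


def build_work_index(text):
    """Apply the function to every element of a digital work."""
    return {convert(word) for word in text.split()}


def search(indexes, term):
    """Return works whose index shares at least one converted element."""
    converted = {convert(word) for word in term.split()}
    return [title for title, index in indexes.items() if index & converted]
```

Because the same function is applied to the work and to the search term, matching happens entirely on converted elements, so the index does not need to store the original text of the work.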
Detecting violation of aircraft separation requirements
A computing system obtains flight information comprising a plurality of waypoints for each of a plurality of aircraft flight paths, and detects a violation of aircraft separation requirements at a given time instance. Each waypoint specifies an altitude, a longitudinal position, a latitudinal position, a velocity, and a time instance. Detecting the violation comprises selecting a set of time-correlated waypoints from the flight information, each time-correlated waypoint specifying the given time instance. The detecting further comprises selecting a set of altitude-correlated waypoints from the set of time-correlated waypoints, each of the altitude-correlated waypoints being vertically-separated from at least one other altitude-correlated waypoint by less than a threshold vertical separation. The detecting further comprises determining that first and second position-correlated waypoints from the set of altitude-correlated waypoints are vertically-separated from each other by less than the threshold vertical separation and horizontally-separated from each other by less than a threshold horizontal separation.
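The three filtering stages described above can be sketched as follows. Several simplifications are assumed: positions are planar coordinates in nautical miles rather than longitude and latitude, velocity is omitted because the described checks do not use it, and the default thresholds (1000 ft, 5 nmi) are example values only, not requirements taken from the abstract.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Waypoint:
    flight: str
    time: int           # time instance, e.g. seconds since a reference
    altitude_ft: float
    x_nmi: float        # assumed planar east-west position
    y_nmi: float        # assumed planar north-south position


def find_violations(waypoints, t, min_vertical_ft=1000.0, min_horizontal_nmi=5.0):
    """Return flight pairs that violate both separation thresholds at time t."""
    # Stage 1: time-correlated waypoints.
    timed = [w for w in waypoints if w.time == t]
    # Stage 2: altitude-correlated waypoints, i.e. those vertically close
    # to at least one other time-correlated waypoint.
    close_in_altitude = [
        w for w in timed
        if any(o is not w and abs(o.altitude_ft - w.altitude_ft) < min_vertical_ft
               for o in timed)
    ]
    # Stage 3: pairs that are too close both vertically and horizontally.
    return [
        (a.flight, b.flight)
        for a, b in combinations(close_in_altitude, 2)
        if abs(a.altitude_ft - b.altitude_ft) < min_vertical_ft
        and hypot(a.x_nmi - b.x_nmi, a.y_nmi - b.y_nmi) < min_horizontal_nmi
    ]
```

The altitude stage is a cheap pre-filter: waypoints with no vertically close neighbor are dropped before any horizontal distance is computed.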
Method and system for hybrid information query
Method, system, and programs for hybrid information query. A request is first received from a user associated with a hybrid query. The hybrid query is expressed in accordance with an input in terms of one of a user, a feature, and a document, and a desired hybrid query result in terms of one of a user, a feature, and a document. A mapping is then determined between the input and the desired hybrid query result. A hybrid model is established based on hybrid information collected and associated with one or more users. The mapping is performed based on the hybrid model to obtain the desired hybrid query result based on the input. Eventually, the desired hybrid query result is provided as a response to the hybrid query.
Automatic ontology generation
An ontology is automatically generated for a set of data from multiple data sources. A semantic network of known concepts serves as an ontology template for a target domain of knowledge with known concepts defined as base entity types. Logical groupings of data and associated technical metadata are read from the data sources. Data discovery techniques are applied to detect semantic and/or syntactic classification of data attributes in the logical groupings of data. For each of the logical groupings of data, an entity type with properties for the data attributes is generated and the generated entity type is added to the semantic network with classifications of the properties as derived from the applied data discovery techniques. Semantic meanings of the generated entity types are generated and associated with other entity types within the semantic network and the semantic network is output as a resultant ontology for the set of data.
Computer Search Engine for Healthcare Outcome Efficiency
A computer search engine for medical and/or pharmacy claims (in combination with employer human resource records or on a stand-alone basis) that ranks healthcare providers and/or intervention strategies by root diagnosis based upon their overall average outcome efficiency. Outcome efficiency is the adjusted cost per day to keep a patient functional (or in the case of an employer, keep an employee at work), so the lower the outcome efficiency the better. The search engine uses drop-down menus and/or similar techniques that require the user to select a root diagnosis on which to search, as well as other variables (e.g. provider category, geographic proximity, in-network versus in or out of network, etc.), turning an open-ended question, e.g. "Which doctor should I go to for back pain?", into a closed-ended one: "Which surgeons in my network within 25 miles have the best outcome efficiencies for back surgery?"
ORCHESTRATED SUPERVISION OF A COGNITIVE PIPELINE
A method, computer system, and a computer program product for coordinating supervision of at least one document processing pipeline is provided. The present invention may include receiving one or more documents. The present invention may then include parsing the received one or more documents to identify one or more performance indicators associated with the received one or more documents. The present invention may also include processing the parsed one or more documents based on a series of processor nodes. The present invention may further include identifying one or more deviations associated with the identified one or more performance indicators. The present invention may also include transferring the identified one or more deviations to a supervisor component. The present invention may then include generating at least one deviation escalation. The present invention may then further include reprocessing the generated at least one deviation escalation after a human response.