Patent classifications
G06F16/3335
Locating meaningful stopwords or stop-phrases in keyword-based retrieval systems
A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
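The comparison step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the stopword list, the toy document index, the Jaccard similarity measure, the 0.8 threshold, and the names `retrieve` and `material_stopwords` are all assumptions chosen for illustration, and stopwords are tested one at a time for simplicity.

```python
from typing import Dict, List, Set

# Hypothetical list of known stopwords (assumption, not from the abstract).
KNOWN_STOPWORDS = {"the", "a", "an", "of", "to", "in"}

# Toy document index standing in for the real context source (assumption).
DOCS: Dict[str, str] = {
    "d1": "the who live at leeds",
    "d2": "the who tour dates",
    "d3": "who invented radio",
    "d4": "who discovered penicillin",
    "d5": "history of rome",
    "d6": "rome travel guide",
}


def retrieve(terms: List[str]) -> Set[str]:
    """Return ids of documents that contain every query term."""
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if all(term in text.split() for term in terms)
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def material_stopwords(query: str, threshold: float = 0.8) -> List[str]:
    """Return the potential stopwords whose removal changes the context."""
    terms = query.lower().split()
    baseline = retrieve(terms)
    material = []
    for i, term in enumerate(terms):
        if term not in KNOWN_STOPWORDS:
            continue  # not a potential stopword
        reduced = terms[:i] + terms[i + 1:]
        # Context sets that are not substantially similar mean the
        # stopword matters and should stay in the query.
        if jaccard(baseline, retrieve(reduced)) < threshold:
            material.append(term)
    return material
```

With this toy index, removing "the" from "the who" pulls in unrelated documents, so "the" is kept, while removing "of" from "history of rome" leaves the result set unchanged.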
Object location and processing
Embodiments described herein locate objects in input. Embodiments first parse the input into a form that can be used to perform the analysis required to construct a set of one or more objects. Embodiments then form, when possible, object character strings by using the grammatical values of the underlying terms. The set of object character strings can be used in a variety of textual analysis procedures, such as search, comparisons, and other combinatorial analysis that requires the use of objects in performing tasks related to an information repository of documents, files, messages, etc.
Method for managing semantic information on M2M/IoT platform
A method for managing semantic information on an M2M/IoT platform is provided. The method for managing semantic information according to an embodiment of the present invention stores semantic data in a first attribute of an M2M resource and updates a part of the semantic data stored in the first attribute. Accordingly, efficient management of semantic information on an M2M/IoT platform is possible, and in particular a partial update of the semantic information can be performed.
Hierarchical search for improved search relevance
A computer-implemented method is provided that includes receiving a search query and, responsive to the search query, providing one or more textual comments relevant to the search query. This includes tokenizing the search query and calculating a set of query term frequency metrics. A set of records relevant to the search query is then selected, from a persistent storage, based on determined similarities between the query term frequency metrics and frequency metrics determined for the records in the persistent storage. Textual comments within the selected records are associated with usefulness metrics. The textual comments relevant to the search query are selected by selecting those textual comments within the selected records that are associated with usefulness metrics that are within a pre-determined range, e.g., an inter-quartile range for a population of usefulness metrics.
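The two-stage selection in this abstract (records first, then comments) can be sketched as follows. The abstract does not name a similarity measure, so cosine similarity over raw term counts is an assumption here, as are the record layout, the `min_sim` cutoff, and the function names.

```python
import math
import re
import statistics
from collections import Counter
from typing import Dict, List


def term_freq(text: str) -> Counter:
    """Tokenize on alphanumeric runs and count term occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevant_comments(query: str, records: List[Dict], min_sim: float = 0.3) -> List[str]:
    """Select similar records, then keep comments inside the inter-quartile range."""
    q = term_freq(query)
    selected = [r for r in records if cosine(q, term_freq(r["text"])) >= min_sim]
    comments = [c for r in selected for c in r["comments"]]
    scores = [c["usefulness"] for c in comments]
    if len(scores) < 2:
        # Quartiles are undefined for fewer than two values.
        return [c["text"] for c in comments]
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return [c["text"] for c in comments if q1 <= c["usefulness"] <= q3]


# Hypothetical records; the field names are assumptions for this sketch.
RECORDS = [
    {
        "text": "vpn connection drops after login",
        "comments": [
            {"text": "have you tried rebooting", "usefulness": 1},
            {"text": "restart the client", "usefulness": 5},
            {"text": "update the certificate", "usefulness": 6},
            {"text": "reinstall everything", "usefulness": 9},
        ],
    },
    {
        "text": "printer out of toner",
        "comments": [{"text": "order a new cartridge", "usefulness": 7}],
    },
]
```

Calling `relevant_comments("vpn drops", RECORDS)` selects only the first record and keeps the two comments whose usefulness falls between the first and third quartiles.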
Normalization of unstructured catalog data
Provided is a method and system for normalizing catalog item data to create higher quality search results. In one example, the method may include receiving a record comprising an unstructured description of an object, identifying a type of the object from among a plurality of object types and identifying a predefined attribute of the identified type of object, extracting a value from the unstructured description corresponding to the predefined attribute and modifying the extracted value to generate a normalized attribute value, and storing a structured record of the object in a structured format comprising a plurality of values of a plurality of attributes of the object from the unstructured description including the normalized attribute value for the predefined attribute of the object.
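As a rough illustration of the extract-and-normalize step, the sketch below identifies an object type by keyword, extracts a length from the free-text description with a regular expression, and converts it to millimetres. The object types, the `length_mm` attribute, the keyword tables, and the unit table are all hypothetical; the abstract does not specify how types or attributes are recognized.

```python
import re
from typing import Dict, Optional

# Hypothetical type keywords and per-type attribute (assumptions).
TYPE_KEYWORDS = {"bolt": ["bolt", "screw"], "cable": ["cable", "cord"]}
TYPE_ATTRIBUTE = {"bolt": "length_mm", "cable": "length_mm"}

# Conversion factors to the normalized unit (millimetres).
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}
LENGTH_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm|m|in)\b")


def normalize(description: str) -> Optional[Dict]:
    """Turn an unstructured description into a structured record."""
    text = description.lower()
    obj_type = next(
        (t for t, kws in TYPE_KEYWORDS.items() if any(k in text for k in kws)),
        None,
    )
    if obj_type is None:
        return None  # unknown object type
    record = {"type": obj_type, "description": description}
    match = LENGTH_RE.search(text)
    if match:
        value, unit = float(match.group(1)), match.group(2)
        # Store the extracted value in the normalized unit.
        record[TYPE_ATTRIBUTE[obj_type]] = round(value * UNIT_TO_MM[unit], 2)
    return record
```

For example, `normalize("Steel bolt 5 cm zinc plated")` yields a record of type `bolt` with `length_mm` set to `50.0`, so that items described in different units become comparable in search.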
SYSTEM AND METHOD FOR SELF-GENERATED ENTITY-SPECIFIC BOT
The present disclosure relates to a system and method for generating an executable bot application specific to an entity. In an exemplary implementation, the proposed system receives a knowledgebase comprising a set of potential queries associated with the entity, and receives video frame responses corresponding to the potential queries, wherein each potential query is mapped to an intent. The system processes, through a machine learning model, training data comprising the set of potential queries, the video frame responses, and the intent mapped to each potential query to generate a trained model. Based on the trained model, a prediction engine is configured to process an end-user query, predict an intent associated with the end-user query, and facilitate a response to the end-user query based on the video frame response that is mapped to the predicted intent. Using the prediction engine, the proposed system auto-generates the executable bot application for the entity.
METHOD AND APPARATUS FOR PROVIDING INFORMATION ABOUT SIMILAR ITEMS BASED ON MACHINE LEARNING
Provided is a method of providing information on similar items based on machine learning, the method including receiving information on a target item, generating a target vector based on a character string corresponding to the information on the target item using a machine learning model, identifying one or more vector sets respectively corresponding to a plurality of items derived through the machine learning model, and providing information on one or more items corresponding to one or more vectors in the one or more vector sets, the one or more vectors having a similarity value with the generated target vector greater than or equal to a preset threshold value.
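The threshold test on vector similarity can be sketched as below. The abstract relies on a machine learning model to produce the vectors; this sketch swaps in character-trigram counts as a simple stand-in, which is an assumption made only so the example is self-contained. Cosine similarity, the 0.5 threshold, and the function names are likewise illustrative.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Stand-in for the ML model: a vector of character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similar_items(target: str, catalog: List[str], threshold: float = 0.5) -> List[str]:
    """Return catalog items whose similarity to the target meets the threshold."""
    target_vec = embed(target)
    scored = [(name, cosine(target_vec, embed(name))) for name in catalog]
    # Most similar first; items below the preset threshold are dropped.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, score in scored if score >= threshold]
```

In a real system the item vectors would be computed once and stored, rather than re-embedded on every request as this sketch does.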
Biased string search structures with embedded range search structures
Provided are a method in a data processing system and an apparatus for organizing electronic data, structured or unstructured, of one or more users stored across one or more server computers into structures on a recordable medium of the data processing system. The data items are structured in a heterogeneous string structure, with one or more n-dimensional range structures embedded within the heterogeneous string structure. The string structures can then be searched with a query that includes at least one term and a range threshold. Each data item is associated with a scoring function that is used to filter and rank the matched results.
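A query that combines a term with a range can be illustrated with a much simpler structure than the one this abstract describes. The sketch below is not a biased string structure: it uses a plain dictionary keyed by term, with a sorted one-dimensional list of values per term searched by bisection. The class name, its methods, and the single numeric value per item are assumptions, and the scoring function is omitted.

```python
import bisect
from collections import defaultdict
from typing import Dict, List


class TermRangeIndex:
    """Maps each term to item ids kept in ascending order of a numeric value."""

    def __init__(self) -> None:
        self._values: Dict[str, List[float]] = defaultdict(list)
        self._ids: Dict[str, List[str]] = defaultdict(list)

    def add(self, item_id: str, text: str, value: float) -> None:
        """Index an item under each distinct term of its text."""
        for term in set(text.lower().split()):
            pos = bisect.bisect_right(self._values[term], value)
            self._values[term].insert(pos, value)
            self._ids[term].insert(pos, item_id)

    def search(self, term: str, low: float, high: float) -> List[str]:
        """Return ids of items containing the term with low <= value <= high."""
        values = self._values.get(term.lower(), [])
        ids = self._ids.get(term.lower(), [])
        start = bisect.bisect_left(values, low)
        end = bisect.bisect_right(values, high)
        return ids[start:end]


# Hypothetical items: (id, text, price).
index = TermRangeIndex()
index.add("a", "red shirt", 10)
index.add("b", "red hat", 25)
index.add("c", "blue shirt", 15)
```

Here `index.search("shirt", 10, 15)` returns the matching ids in ascending value order; a ranking step based on a per-item score would follow in a fuller implementation.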
SYSTEMS AND METHODS FOR MICRO-CREDENTIAL ACCREDITATION
Systems and methods provide micro-credential accreditation. The systems and methods analyze, using one or more prediction models, text submissions received from applicants via interaction with an applicant device. The prediction model(s) fit one or more micro-credentials to the received text submission, which may collectively or independently qualify the applicant for one or more accreditation credits. By processing the received text submission, the systems and methods allow for consistent and standard output of micro-credentials by the prediction model(s). Furthermore, the systems and methods provide for monitoring the prediction model output(s) to ensure ethical fairness across varying demographic groups of applicants.
SYSTEM AND METHOD FOR CONTENT COMPREHENSION AND RESPONSE
A method, apparatus, and system for training an embedding space for content comprehension and response include, for each layer of a hierarchical taxonomy having at least two layers including respective words resulting in layers of varying complexity, determining a set of words associated with a layer of the hierarchical taxonomy, determining a question answer pair based on a question generated using at least one word of the set of words and at least one content domain, determining a vector representation for the generated question and for content related to the at least one content domain of the question answer pair, and embedding the question vector representation and the content vector representations into a common embedding space where vector representations that are related are closer in the embedding space than unrelated embedded vector representations. Requests for content can then be fulfilled using the trained, common embedding space.