Patent classifications
G06F16/319
Method and device for creating an index
Embodiments of the present disclosure generally relate to a method and device for creating an index. For example, the embodiments of the present disclosure propose a method for creating an index, comprising: dividing a document into a plurality of regions; determining the number of times that a token appears in the plurality of regions, the token including at least one character in the document; assigning respective weights to the plurality of regions; and creating an inverted document linked list directed to the token based on the number of times that the token appears in the plurality of regions and respective weights of the plurality of regions. In addition, the embodiments of the present disclosure propose a corresponding device and computer program product for creating an index.
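The region-weighted indexing described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the region names, the weight values, the whitespace tokenizer, and the function names are all assumptions, and a sorted Python list stands in for the inverted document linked list.

```python
from collections import defaultdict

# Hypothetical region weights; the abstract does not specify any values.
REGION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return text.lower().split()


def build_weighted_index(documents, weights=REGION_WEIGHTS):
    """Map each token to a posting list of (doc_id, weighted_count) pairs.

    `documents` maps doc_id -> {region_name: region_text}.
    """
    index = defaultdict(list)
    for doc_id, regions in documents.items():
        scores = defaultdict(float)
        for region, text in regions.items():
            for token in tokenize(text):
                # Each occurrence contributes the weight of its region.
                scores[token] += weights.get(region, 1.0)
        for token, score in scores.items():
            index[token].append((doc_id, score))
    # Order each posting list by descending weighted count, then by doc_id.
    for postings in index.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


docs = {
    "d1": {"title": "inverted index", "body": "an index maps tokens to documents"},
    "d2": {"title": "search engines", "body": "index index index"},
}
index = build_weighted_index(docs)
print(index["index"])  # [('d1', 4.0), ('d2', 3.0)]
```

Here one title occurrence plus one body occurrence in d1 outranks three body occurrences in d2, which is the effect that region weighting is meant to produce.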
METHOD AND DEVICE FOR SEARCHING CHARACTER STRING
Embodiments of the present disclosure provide a method and device for searching a character string. In one embodiment, a method of searching a character string is provided. The method comprises: determining a first set of documents that include a first token in the character string, and a second set of documents that include a second token in the character string; and generating a third set of documents based on the first and second sets of documents, such that each document in the third set i) is included in both the first and second sets of documents, and ii) has a distance between the first and second tokens equal to the distance between the first and second tokens in the character string. A corresponding device and a computer program product are also disclosed.
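A minimal sketch of the two-token matching step, assuming a positional inverted index built from whitespace tokens. The index layout and the names `build_positional_index` and `match_pair` are illustrative assumptions and do not come from the patent.

```python
from collections import defaultdict


def build_positional_index(documents):
    """Map token -> {doc_id: [positions]} using whitespace tokens."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index


def match_pair(index, first, second, distance):
    """Return doc_ids where `second` occurs exactly `distance` positions after `first`."""
    first_docs = index.get(first, {})
    second_docs = index.get(second, {})
    result = set()
    # Only documents present in both sets can qualify.
    for doc_id in first_docs.keys() & second_docs.keys():
        second_positions = set(second_docs[doc_id])
        if any(p + distance in second_positions for p in first_docs[doc_id]):
            result.add(doc_id)
    return result


documents = {
    1: "new york city",
    2: "york is new",
    3: "new city of york",
}
positional_index = build_positional_index(documents)
print(match_pair(positional_index, "new", "york", 1))  # {1}
```

All three documents contain both tokens, but only document 1 has them at the same distance as in the query string "new york".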
Encrypted search over encrypted data with reduced volume leakage
A method for performing encrypted search includes receiving, from a user device, a search query for a plurality of keywords that appear in one or more encrypted documents stored on an untrusted storage device. The method also includes accessing an encrypted search index to obtain a first list of document identifiers each representative of a document that includes a first keyword and a second keyword of the plurality of keywords. The method also includes, for each remaining keyword, determining a corresponding list of document identifiers each representative of a document that includes the first keyword, the second keyword, and the respective remaining keyword. The method includes determining, based on the first list of document identifiers and each corresponding list of document identifiers, a second list of document identifiers each representative of a document that includes each of the plurality of keywords. The method also returns the second list to the user device.
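The order of the list operations in this abstract can be illustrated with a plaintext sketch. It performs no encryption at all and says nothing about how the patent limits volume leakage; the plain dictionary index and the function name `conjunctive_search` are assumptions made only to show the sequence of intersections.

```python
def conjunctive_search(index, keywords):
    """Return sorted document ids that contain every keyword.

    `index` maps keyword -> set of document ids (plaintext stand-in for
    the encrypted search index). At least two keywords are required.
    """
    if len(keywords) < 2:
        raise ValueError("expected at least two keywords")
    first, second, *remaining = keywords
    # First list: documents containing both the first and second keyword.
    first_list = index.get(first, set()) & index.get(second, set())
    # One list per remaining keyword: first, second, and that keyword.
    per_keyword_lists = [first_list & index.get(k, set()) for k in remaining]
    # Second list: documents present in the first list and every per-keyword list.
    second_list = set(first_list)
    for ids in per_keyword_lists:
        second_list &= ids
    return sorted(second_list)


keyword_index = {
    "alpha": {1, 2, 3, 4},
    "beta": {2, 3, 4},
    "gamma": {3, 4, 5},
    "delta": {1, 4},
}
print(conjunctive_search(keyword_index, ["alpha", "beta", "gamma", "delta"]))  # [4]
```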
Method, apparatus, server and storage medium for image retrieval
Embodiments of the present disclosure provide a method, apparatus, server and storage medium for image retrieval. The method includes: identifying a plurality of groups of images having identical contents from images on all webpages; aggregating, for each image group, image-related texts on all source webpages of each image to obtain text descriptions of each image group; establishing an inverted index for each image in the image groups based on the text descriptions of the image group, the inverted index at least including, for each text description, source webpages corresponding to all text descriptions of the image group of the text description; and performing image retrieval based on an inputted query and the inverted index.
ELECTRONIC DEVICE FOR SORTING HOMOMORPHIC CIPHERTEXT USING SHELL SORTING AND OPERATING METHOD THEREOF
Provided are an electronic device for sorting homomorphic ciphertext by using shell sorting and an operating method thereof to sort ciphertext generated by using homomorphic encryption according to a size of an original number corresponding thereto.
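For reference, a plain Shell sort with a pluggable comparison is sketched below. It operates on ordinary Python values and performs no homomorphic encryption; the `less_than` parameter is a hypothetical placeholder for whatever ciphertext comparison the device would use, and the halving gap sequence is an assumption, since the abstract does not name one.

```python
def shell_sort(items, less_than=lambda a, b: a < b):
    """Return a sorted copy of `items` using Shell sort with gaps n/2, n/4, ..., 1."""
    data = list(items)
    gap = len(data) // 2
    while gap > 0:
        # Gapped insertion sort: each element is moved back in steps of `gap`.
        for i in range(gap, len(data)):
            current = data[i]
            j = i
            while j >= gap and less_than(current, data[j - gap]):
                data[j] = data[j - gap]
                j -= gap
            data[j] = current
        gap //= 2
    return data


print(shell_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```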
Using aggregate compatibility indices to identify query results for queries having qualitative search terms
The disclosed embodiments relate to a system that facilitates performing searches based on qualitative search terms. During operation, the system receives a query that applies a qualitative search term to an attribute of data items in a set of data items. While executing the query, the system processes each data item in the set of data items by extracting an attribute value from the data item and then using a concept-mapping to determine a compatibility index for the attribute value, wherein the concept-mapping associates each attribute value with a numerical compatibility index that indicates a compatibility between the attribute value and the qualitative search term. Finally, the system uses the compatibility index as a factor in determining whether to include the data item in a set of query results.
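The query flow above might look like the following sketch. The qualitative term ("affordable"), the piecewise-linear concept-mapping, the price thresholds, and the inclusion cutoff of 0.5 are all invented for illustration; the abstract says only that the compatibility index is a factor in deciding whether an item is included.

```python
def affordable(price):
    """Hypothetical concept-mapping for the term "affordable".

    Returns a compatibility index in [0, 1]: 1.0 at or below 100,
    0.0 at or above 500, and linear in between.
    """
    if price <= 100:
        return 1.0
    if price >= 500:
        return 0.0
    return (500 - price) / 400


def run_query(items, attribute, concept_mapping, threshold=0.5):
    """Return (name, compatibility index) pairs at or above `threshold`, best first."""
    results = []
    for item in items:
        score = concept_mapping(item[attribute])  # extract attribute, map to index
        if score >= threshold:
            results.append((item["name"], score))
    return sorted(results, key=lambda r: -r[1])


catalog = [
    {"name": "A", "price": 80},
    {"name": "B", "price": 300},
    {"name": "C", "price": 450},
]
print(run_query(catalog, "price", affordable))  # [('A', 1.0), ('B', 0.5)]
```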
MASK-AUGMENTED INVERTED INDEX
The embodiments disclosed herein are related to a computing system for generating a mask-augmented inverted index. The mask-augmented inverted index is structured to allow phrase query searching while minimizing the amount of computing system processing and memory resources needed to generate the mask-augmented inverted index. In one embodiment, a first token is mapped to a first listing of documents that include the first token. A first mask is included that comprises a probabilistic representation of a set of integers corresponding to one or more locations of the first token in each of the individual documents of the first listing. A second mask is included that comprises a probabilistic representation of a set of integers that indicate a positional relationship between the first token and one or more other tokens in each of the individual documents of the first listing.
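One way to picture a probabilistic position mask is a fixed-width bitmask in which bit `p mod 64` is set for each token position `p`. The sketch below covers only that idea, applied to a two-token phrase check. The 64-bit width, the modulo hashing, and every function name are assumptions, and the patent's second mask (positional relationships between tokens) is not modeled.

```python
from collections import defaultdict

MASK_BITS = 64  # assumed mask width
FULL_MASK = (1 << MASK_BITS) - 1


def position_mask(positions):
    """Fold a set of token positions into a MASK_BITS-wide bitmask."""
    mask = 0
    for p in positions:
        mask |= 1 << (p % MASK_BITS)
    return mask


def rotate_left(mask, shift):
    """Rotate a MASK_BITS-wide mask left by `shift` bits."""
    shift %= MASK_BITS
    return ((mask << shift) | (mask >> (MASK_BITS - shift))) & FULL_MASK


def build_mask_index(documents):
    """Map token -> {doc_id: position mask}."""
    positions = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for pos, token in enumerate(text.lower().split()):
            positions[token][doc_id].append(pos)
    return {
        token: {doc_id: position_mask(p) for doc_id, p in docs.items()}
        for token, docs in positions.items()
    }


def may_contain_phrase(index, first, second):
    """Return doc_ids that may contain `first` immediately followed by `second`."""
    first_docs = index.get(first, {})
    second_docs = index.get(second, {})
    return {
        doc_id
        for doc_id in first_docs.keys() & second_docs.keys()
        # Shift the first token's positions by one and test for overlap.
        if rotate_left(first_docs[doc_id], 1) & second_docs[doc_id]
    }


mask_index = build_mask_index({1: "big red car", 2: "red big car"})
print(may_contain_phrase(mask_index, "big", "red"))  # {1}
```

Because positions are folded modulo the mask width, documents longer than 64 tokens can produce false positives, but a true adjacent pair is never missed. That is the usual trade-off of a probabilistic representation.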
FEEDBACK-BASED INVERTED INDEX COMPRESSION
The disclosed technology is generally directed to the compression of inverted indexes. In one example of the technology, an inverted index that includes a plurality of posting lists and metadata is provided. The inverted index indicates compression settings that are associated with the plurality of posting lists. At periodic scheduled times, a regeneration is performed on the inverted index. The regeneration includes decompressing the inverted index. The decompressing uses the compression settings indicated by the inverted index. The regeneration further includes determining compression settings to use during a next periodic scheduled time of the plurality of periodic scheduled times, such that at least a first posting list of the plurality of posting lists uses a different compression setting than a second posting list of the plurality of posting lists.
Superindexing systems and methods
Embodiments of the present disclosure are directed to systems and methods for managing a database and performing database operations. An exemplary method in accordance with embodiments of this disclosure comprises: receiving a request to perform one or more database operations on a dataset comprising one or more data items; inputting the dataset into a statistical model, wherein the statistical model is configured to identify one or more storage locations associated with the one or more data items based on a similarity between one or more properties of the one or more data items; receiving the one or more storage locations associated with the one or more data items; updating the one or more data items based on the received one or more storage locations; and performing the one or more database operations on the one or more updated data items based on the one or more storage locations.
DATA PROCESSING DEVICE, DATA PROCESSING METHOD, AND DATA PROCESSING PROGRAM
A data processing apparatus (10) includes an acquisition unit (151) that acquires words included in a displayed content of a terminal screen, and a determination unit (153) that determines that the words acquired by the acquisition unit (151) are search candidates in a case where the acquired words are included in an inverted index in which words and identification information of work data including the words are stored in association with each other.
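The determination step amounts to a membership test against the inverted index, as in the sketch below. The index contents, the lowercase normalization, and the function name `search_candidates` are assumptions for illustration.

```python
def search_candidates(screen_words, inverted_index):
    """Return the on-screen words that occur in the inverted index.

    `inverted_index` maps word -> list of work-data identifiers.
    Duplicates are dropped and first-seen order is preserved.
    """
    seen = set()
    candidates = []
    for word in screen_words:
        key = word.lower()
        if key in inverted_index and key not in seen:
            seen.add(key)
            candidates.append(key)
    return candidates


work_index = {
    "invoice": ["work-001", "work-007"],
    "shipment": ["work-003"],
}
print(search_candidates(["Invoice", "total", "shipment", "invoice"], work_index))
# ['invoice', 'shipment']
```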