G06F16/162

Systems and methods for improved web searching

Systems and methods are provided for improved web searching. In one implementation, suggested search queries are provided based on previous search queries and click data. A weighted bi-partite graph or index may be used to identify related search queries based on overlapping clicked URLs. According to a method, query-click log data of a search engine is processed to generate sets of suggested search queries, data corresponding to each suggested search query, and a set of clicked URLs related to each suggested search query. Additionally, or independently, methods may be provided for contextually correcting spelling errors within sets of suggested search queries using a contextual algorithm, and/or identifying and discarding sets of suggested search queries and URLs that lead to restricted material, such as restricted content and related URLs.
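
As a rough illustration of the overlapping-click idea, the sketch below builds a weighted bipartite graph from a toy query-click log and ranks other queries by the click weight they share with a given query. The function names, the example log, and the overlap score (the sum of the smaller click count on each shared URL) are assumptions made for this example; the abstract does not specify a scoring formula.

```python
from collections import defaultdict


def build_click_graph(click_log):
    """Build a weighted bipartite graph: query -> {url: click count}."""
    graph = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        graph[query][url] += 1
    return graph


def suggest_queries(graph, query, top_n=3):
    """Rank other queries by the click weight they share with `query`."""
    clicked = graph.get(query, {})
    scores = {}
    for other, urls in graph.items():
        if other == query:
            continue
        # Assumed overlap score: smaller click count on each shared URL.
        overlap = sum(
            min(count, urls[url]) for url, count in clicked.items() if url in urls
        )
        if overlap > 0:
            scores[other] = overlap
    # Highest overlap first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda q: (-scores[q], q))[:top_n]


# Hypothetical (query, clicked URL) log entries.
log = [
    ("cheap flights", "flights.example"),
    ("cheap flights", "flights.example"),
    ("cheap flights", "travel.example"),
    ("airline tickets", "flights.example"),
    ("airline tickets", "flights.example"),
    ("hotel deals", "travel.example"),
    ("pasta recipe", "food.example"),
]
graph = build_click_graph(log)
print(suggest_queries(graph, "cheap flights"))  # ['airline tickets', 'hotel deals']
```

Queries with no clicked URL in common, such as "pasta recipe" here, never appear as suggestions.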

Method, Apparatus and Device for Deleting Distributed System File, and Storage Medium
20230025135 · 2023-01-26

A method, apparatus and device for deleting a distributed system file, and a storage medium. The method comprises: querying whether an incomplete file deletion operation exists under a sub-tree root corresponding to a given MDS (S102); if there is an incomplete file deletion operation, continuing to delete data under the sub-tree root corresponding to the MDS, and determining whether the sub-tree root is a copy (S103); and if the sub-tree root is a copy, deleting the sub-tree root copy in a memory of the MDS (S104). By means of these steps, the number of copies of the sub-tree root in the distributed file system can be reduced, thereby reducing the number of master-copy locking interactions between different MDSs, improving the file deletion efficiency, and improving user friendliness and differentiated competitiveness of a product.

COMPRESSION OF LOCALIZED FILES
20230021891 · 2023-01-26

A method for compressing a first application file and a second application file includes accessing the first and the second application files, the first application file being in a first language and the second application file being in a second language and being a counterpart of the first application file; decompressing the first and second application files to access internal files for the first and the second application files; comparing one of the first internal files to one of the second internal files; upon determining that the first internal file is identical to the second internal file, copying one of the internal files to an output folder; upon determining that the files are not identical, either copying both of the internal files to the output folder, or executing a differencing procedure on the first and second internal files to identify differences between them and storing data about the differences in the output folder; and compressing the output folder into one output file.
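
The compare-and-copy flow can be sketched with Python's standard `zipfile` module. This is a minimal sketch under several assumptions: both application files are ZIP archives, internal files are matched by name, and non-identical files take the "copy both" branch rather than the differencing branch. The `shared/`, `lang1/` and `lang2/` prefixes and the helper names are hypothetical.

```python
import io
import zipfile


def merge_localized_archives(first_zip: bytes, second_zip: bytes) -> bytes:
    """Merge two localized archives, storing identical internal files once."""
    with zipfile.ZipFile(io.BytesIO(first_zip)) as a, \
            zipfile.ZipFile(io.BytesIO(second_zip)) as b:
        first = {name: a.read(name) for name in a.namelist()}
        second = {name: b.read(name) for name in b.namelist()}

    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as z:
        for name in sorted(set(first) | set(second)):
            if first.get(name) == second.get(name):
                # Identical in both languages: keep a single shared copy.
                z.writestr(f"shared/{name}", first[name])
            else:
                # Different, or present on one side only: keep each version.
                if name in first:
                    z.writestr(f"lang1/{name}", first[name])
                if name in second:
                    z.writestr(f"lang2/{name}", second[name])
    return out.getvalue()


def make_zip(files):
    """Build an in-memory ZIP from a {name: bytes} mapping (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, data in files.items():
            z.writestr(name, data)
    return buf.getvalue()


english = make_zip({"logo.png": b"\x89PNG-bytes", "strings.txt": b"Hello"})
french = make_zip({"logo.png": b"\x89PNG-bytes", "strings.txt": b"Bonjour"})
merged = merge_localized_archives(english, french)
with zipfile.ZipFile(io.BytesIO(merged)) as z:
    merged_names = sorted(z.namelist())
print(merged_names)  # ['lang1/strings.txt', 'lang2/strings.txt', 'shared/logo.png']
```

The shared image is stored once, while the language-specific string files are both kept.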

DETERMINING SHARED NODES BETWEEN SNAPSHOTS USING PROBABILISTIC DATA STRUCTURES

The present disclosure is related to methods, systems, and machine-readable media for determining shared nodes between snapshots using probabilistic data structures. A unique identifier can be assigned to each node of a first tree data structure corresponding to a first snapshot of a virtual computing instance (VCI). A first probabilistic data structure representing the first tree data structure can be created that includes hashes of the identifiers assigned to the nodes of the first tree data structure. A unique identifier can be assigned to each node of a second tree data structure corresponding to a second snapshot of the VCI. A second probabilistic data structure representing the second tree data structure can be created that includes hashes of the identifiers assigned to the nodes of the second tree data structure. A particular node of the second tree data structure can be determined to be shared by the first tree data structure responsive to a determination that the first probabilistic data structure includes a hash of an identifier assigned to the particular node.
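
One common probabilistic data structure that fits this description is a Bloom filter. The sketch below assumes a Bloom filter, SHA-256-derived hash positions, and string node identifiers; none of these choices are stated in the abstract. A Bloom filter never misses a member, but it can occasionally report a node as shared when it is not.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over string identifiers (illustrative only)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive several bit positions by salting the hash input with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical node identifiers for two snapshots of the same VCI.
first_snapshot_nodes = ["node-1", "node-2", "node-3"]
second_snapshot_nodes = ["node-2", "node-3", "node-4"]

first_filter = BloomFilter()
for node_id in first_snapshot_nodes:
    first_filter.add(node_id)

# A node of the second tree is treated as shared if the first filter
# reports its identifier as present (false positives are possible).
probably_shared = [n for n in second_snapshot_nodes if n in first_filter]
print(probably_shared)
```

Because of possible false positives, a caller that needs certainty would confirm a reported match against the first tree itself.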

Information source agent systems and methods for distributed data storage and management using content signatures
11561931 · 2023-01-24

Information source agent systems and methods for distributed content storage and management using content signatures that use file identicality properties are provided. A data management system is provided that includes a content engine for managing the storage of file content, a content signature generator that generates a unique content signature for a file processed by the content engine, a content signature comparator that compares content signatures and a content signature repository that stores content signatures. Information source agents are provided that include content signature generators and content signature comparators. Methods are provided for the efficient management of files using content signatures that take advantage of file identicality properties. Content signature application modules and registries exist within information source clients and centralized servers to support the content signature methods.
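
The generator, comparator and repository roles can be sketched as follows. The use of SHA-256 as the content signature is an assumption for illustration (it is unique only in the practical, collision-resistant sense), and the class and method names are hypothetical.

```python
import hashlib


class ContentSignatureRepository:
    """Stores content signatures and the paths that share each one."""

    def __init__(self):
        self._paths_by_signature = {}

    @staticmethod
    def signature(content: bytes) -> str:
        # Generator: assumed here to be a SHA-256 digest of the file bytes.
        return hashlib.sha256(content).hexdigest()

    def register(self, path: str, content: bytes) -> bool:
        """Record a file; return True if identical content was already stored."""
        sig = self.signature(content)
        # Comparator: identical content yields an identical signature.
        already_known = sig in self._paths_by_signature
        self._paths_by_signature.setdefault(sig, []).append(path)
        return already_known

    def paths_for(self, content: bytes):
        return list(self._paths_by_signature.get(self.signature(content), []))


repo = ContentSignatureRepository()
print(repo.register("/home/a/report.doc", b"quarterly numbers"))   # False
print(repo.register("/home/b/copy.doc", b"quarterly numbers"))     # True
print(repo.register("/home/b/notes.txt", b"something else"))       # False
```

When `register` returns True, the file body does not need to be stored or transferred again; only the new path is recorded against the existing signature.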

Independent evictions from datastore accelerator fleet nodes

A fleet of query accelerator nodes is established for a data store. Each accelerator node caches data items of the data store locally. In response to determining that an eviction criterion has been met, one accelerator node removes a particular data item from its local cache without notifying any other accelerator node. After the particular data item has been removed, a second accelerator node receives a read query for the particular data item and provides a response using a locally-cached replica of the data item.
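
The behaviour described can be illustrated with two cache nodes that evict independently. The eviction criterion used here (a per-node capacity with least-recently-used removal) is an assumption, as are the class and attribute names.

```python
from collections import OrderedDict


class AcceleratorNode:
    """A query accelerator node with its own local cache (illustrative)."""

    def __init__(self, name, data_store, capacity):
        self.name = name
        self.data_store = data_store
        self.capacity = capacity
        self.cache = OrderedDict()

    def read(self, key):
        """Return (value, source), where source is 'cache' or 'store'."""
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key], "cache"
        value = self.data_store[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Assumed eviction criterion: capacity exceeded. The node drops
            # its least recently used item and notifies no other node.
            self.cache.popitem(last=False)
        return value, "store"


data_store = {"a": 1, "b": 2, "c": 3}
node1 = AcceleratorNode("node1", data_store, capacity=2)
node2 = AcceleratorNode("node2", data_store, capacity=3)

for key in ("a", "b"):
    node1.read(key)
    node2.read(key)

node1.read("c")                 # node1 evicts "a" on its own
print("a" in node1.cache)       # False
print(node2.read("a"))          # (1, 'cache'): node2 still serves its replica
```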

Method, device and computer program product for shrinking storage space

Techniques for shrinking a storage space involve determining a used storage space in a storage pool allocated to a plurality of file systems, and determining a usage level of the storage pool based on the used storage space in the storage pool and the storage capacity of the storage pool. The techniques further involve shrinking a storage space from one or more of the plurality of file systems based on the usage level of the storage pool. Such techniques may automatically shrink storage space in one or more file systems from the global level of the storage pool, determining an auto shrink strategy according to the overall performance of the storage pool, thereby improving the efficiency of auto shrink, balancing system performance, and saving space.
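
A pool-level shrink decision of this kind might look like the sketch below. The 80% high watermark, the 20% per-file-system reserve, and the policy of reclaiming everything above that reserve are assumptions chosen for the example; the abstract does not give thresholds or a concrete strategy.

```python
def plan_shrink(pool_capacity, file_systems, high_watermark=0.8, reserve_percent=20):
    """Decide how much space to reclaim from each file system.

    file_systems maps name -> (allocated, used), in the same unit as
    pool_capacity. Returns a mapping name -> amount to reclaim.
    """
    # Space the pool has handed out to file systems counts as used pool space.
    used_in_pool = sum(allocated for allocated, _ in file_systems.values())
    usage_level = used_in_pool / pool_capacity

    # Assumed strategy: only shrink once the pool passes the high watermark.
    if usage_level < high_watermark:
        return {}

    plan = {}
    for name, (allocated, used) in file_systems.items():
        # Keep the data actually stored plus an assumed headroom reserve.
        target = used + used * reserve_percent // 100
        if allocated > target:
            plan[name] = allocated - target
    return plan


file_systems = {"fs1": (500, 100), "fs2": (400, 380)}  # hypothetical GiB values
print(plan_shrink(1000, file_systems))   # {'fs1': 380}
print(plan_shrink(2000, file_systems))   # {}
```

In the first call the pool is 90% allocated, so the mostly empty `fs1` gives back space while the nearly full `fs2` is left alone; in the second call the pool is below the watermark and nothing is shrunk.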

Intelligent management of stub files in hierarchical storage

Intelligent management of stub files in hierarchical storage is provided by: in response to identifying a file to migrate from a file system to offline storage, providing metadata for the file to a machine learning engine; receiving a stub profile for the file from the machine learning engine that indicates an offset from a beginning of the file and a length from the offset for previewing the file; and migrating the portion of the file from the file system to an offline storage based on the stub profile. In some embodiments this further comprises: monitoring file system operations; in response to detecting a read operation of the portion of the file: determining a file type; providing file data to the machine learning engine; and performing a supervised learning operation based on the file type and the file data to update the machine learning engine.

Efficient filename storage and retrieval
11704336 · 2023-07-18

The disclosed technology relates to a system configured to detect a modification to a node in a tree data structure. The node is associated with a content item managed by a content management service as well as a filename. The system may append the filename and a separator to a filename array, determine a location of the filename in the filename array, and store the location of the filename in the node.
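
The append-and-locate step can be sketched with a single byte array. The NUL byte separator, the UTF-8 encoding, and the dictionary used to stand in for a tree node are assumptions made for illustration.

```python
class FilenameStore:
    """Keeps all filenames in one contiguous array, separated by a marker."""

    SEPARATOR = b"\x00"  # assumed separator; NUL is not valid in most filenames

    def __init__(self):
        self.array = bytearray()

    def append(self, filename: str) -> int:
        """Append filename plus separator; return the filename's location."""
        location = len(self.array)
        self.array += filename.encode("utf-8") + self.SEPARATOR
        return location

    def lookup(self, location: int) -> str:
        """Read the filename starting at `location`, up to the next separator."""
        end = self.array.index(self.SEPARATOR, location)
        return self.array[location:end].decode("utf-8")


store = FilenameStore()

# Hypothetical tree nodes: each stores only the location of its filename.
node_a = {"content_id": "c1", "name_location": store.append("report.pdf")}
node_b = {"content_id": "c2", "name_location": store.append("notes.txt")}

print(node_a["name_location"], node_b["name_location"])  # 0 11
print(store.lookup(node_b["name_location"]))             # notes.txt
```

Storing an integer offset in each node instead of the string keeps nodes fixed-size, at the cost of one scan to the next separator on lookup.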

Performing quantum file concatenation
11556833 · 2023-01-17

Performing quantum file concatenation is disclosed herein. In one example, a quantum file manager receives a request to concatenate a first quantum file comprising a first plurality of qubits and a second quantum file comprising a second plurality of qubits. Responsive to receiving the request, the quantum file manager concatenates the first quantum file and the second quantum file into a concatenated quantum file comprising a third plurality of qubits, wherein the third plurality of qubits comprises a same number of qubits as a union of the first plurality of qubits and the second plurality of qubits, and stores an identical sequence of data values as the first plurality of qubits followed by the second plurality of qubits.