H03M7/707

Managing compression and storage of genomic data

A computer-implemented method according to one embodiment includes identifying genomic data within a system, dividing the genomic data into a plurality of partitions, creating a plurality of groups of different data types within each of the plurality of partitions, independently compressing, within each of the plurality of partitions, each of the plurality of groups of different data types to create a plurality of independently compressed partitions, validating each of the plurality of independently compressed partitions to create a plurality of validated independently compressed partitions, and saving the plurality of validated independently compressed partitions within the system.
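
The partition, group, compress, validate and save steps in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: the record fields (`id`, `seq`, `qual`), the function names, the use of `zlib`, and validation by round-trip decompression are all hypothetical choices, and values are assumed to contain no newline characters.

```python
import zlib


def compress_partitions(records, partition_size):
    """Partition records, group values by field, and compress each group on its own.

    `records` is a list of dicts that all share the same string-valued fields
    (an assumption of this sketch).
    """
    saved = []
    for start in range(0, len(records), partition_size):
        partition = records[start:start + partition_size]

        # Group values of the same data type (here: same field name) together.
        groups = {}
        for record in partition:
            for field, value in record.items():
                groups.setdefault(field, []).append(value)

        compressed = {}
        for field, values in groups.items():
            raw = "\n".join(values).encode("utf-8")
            blob = zlib.compress(raw)
            # Validate the group by decompressing it before it is saved.
            if zlib.decompress(blob) != raw:
                raise ValueError(f"validation failed for group {field!r}")
            compressed[field] = blob
        saved.append(compressed)
    return saved


def restore_partition(compressed):
    """Rebuild the records of one partition from its compressed groups."""
    columns = {
        field: zlib.decompress(blob).decode("utf-8").split("\n")
        for field, blob in compressed.items()
    }
    count = len(next(iter(columns.values())))
    return [{field: columns[field][i] for field in columns} for i in range(count)]
```

Because every partition is compressed independently, a single partition can be restored without touching the others.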

METHOD AND DEVICE FOR THE LOSSLESS COMPRESSION OF A DATA STREAM
20200007154 · 2020-01-02 ·

Provided is a method and a device for the lossless compression of a data stream which includes a sequence of structured data objects which have a list of properties which each contain a key value pair, the method having the following steps: dividing the structured data objects of the data stream into a constant data object portion which has key value pairs with constant values and into variable data object portions which have key value pairs with variable values; transmitting the constant data object portion of the structured data objects once to a receiver; and transmitting the variable data object portions of the divided data objects of the data stream to the receiver.
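
A minimal sketch of the constant/variable split described above, assuming the structured data objects are plain dictionaries and that the whole sequence is available before splitting (the abstract itself describes a stream). The function names `split_stream` and `rebuild` are hypothetical.

```python
def split_stream(objects):
    """Split objects into one constant portion and per-object variable portions."""
    if not objects:
        return {}, []
    first = objects[0]
    # A key value pair is constant if every object carries the same value for the key.
    constant = {
        key: value
        for key, value in first.items()
        if all(key in obj and obj[key] == value for obj in objects)
    }
    variable = [
        {key: value for key, value in obj.items() if key not in constant}
        for obj in objects
    ]
    return constant, variable


def rebuild(constant, variable):
    """Receiver side: merge the constant portion back into each variable portion."""
    return [{**constant, **portion} for portion in variable]
```

The constant portion is sent once, and only the variable portions need to be sent per object, which is where the saving comes from.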

CUSTOMIZABLE DELIMINATED TEXT COMPRESSION FRAMEWORK
20240095218 · 2024-03-21 ·

A method for compressing data includes obtaining a compression schema customized to a format of a delimited text file, and using the compression schema to parse the delimited text file into a plurality of data blocks, split each of the data blocks into a plurality of data units for efficient selective access, and compress the plurality of data units in the plurality of data blocks using different compression algorithms for improved compression ratio. The delimited file is split into a plurality of data blocks based on the region definitions in the schema. Each of the plurality of data blocks is split into the plurality of data units based on its respective data unit size specified in the schema. The plurality of data units in each of the plurality of data blocks are compressed using the different compression algorithms indicated by the compression instructions in the schema. The compressed file consists of the compressed data blocks, the compression schema and various metadata for data decompression, file reconstruction and functionalities such as data security and search query. The delimited text file may include genomic information or another type of information.
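
The idea of a schema that selects a data unit size and a compression algorithm per column might look like the sketch below. It is a simplification under assumptions: the schema layout (`delimiter`, `unit_rows`, `codecs`) is invented for illustration, the whole file is treated as a single data block, and every row is assumed to have the same number of columns.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: name -> (compress, decompress).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_delimited(text, schema):
    """Split rows into data units and compress each column with its own codec."""
    rows = [line.split(schema["delimiter"]) for line in text.splitlines()]
    size = schema["unit_rows"]
    units = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        unit = []
        for col, codec in enumerate(schema["codecs"]):
            raw = "\n".join(row[col] for row in chunk).encode("utf-8")
            unit.append(CODECS[codec][0](raw))
        units.append(unit)
    return units


def read_column(units, schema, unit_index, col):
    """Selective access: decompress one column of one data unit only."""
    decompress = CODECS[schema["codecs"][col]][1]
    return decompress(units[unit_index][col]).decode("utf-8").split("\n")
```

Keeping the units small is what allows selective access: a query for one column of one unit never decompresses the rest of the file.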

SECURE DECOMPRESSION
20190377803 · 2019-12-12 ·

A method and system including receiving a main input stream for a compressed file at an application server, wherein the main input stream includes two or more file streams; extracting a file-type extension from each file stream; determining the file-type extension is supported; determining, for each file stream with the supported file-type extension, a signature for the file stream with the supported file-type extension is valid; determining, for each valid file stream, a size of the file is less than a threshold level; and storing the valid file stream on a storage device when the size of the file is less than the threshold level. Numerous other aspects are provided.
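
The three checks in this abstract (supported extension, valid signature, size below a threshold) can be sketched with Python's standard `zipfile` module. Several points are assumptions of the sketch rather than details from the abstract: the archive is a ZIP held in memory, "signature" is interpreted as the leading magic bytes of the file type, the `SIGNATURES` table and the `safe_extract` name are hypothetical, and accepted files are returned in a dictionary instead of being written to a storage device.

```python
import io
import os
import zipfile

# Hypothetical allowlist: supported extension -> expected leading bytes.
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".pdf": b"%PDF-",
}


def safe_extract(archive_bytes, max_size):
    """Return only members that pass the extension, signature and size checks."""
    accepted = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            extension = os.path.splitext(info.filename)[1].lower()
            magic = SIGNATURES.get(extension)
            if magic is None:
                continue  # unsupported file type
            # Read at most max_size bytes so an oversized member is never
            # fully decompressed into memory.
            with archive.open(info) as stream:
                data = stream.read(max_size)
            if not data.startswith(magic):
                continue  # content does not match the claimed type
            if len(data) >= max_size:
                continue  # size is not below the threshold
            accepted[info.filename] = data
    return accepted
```

Bounding the read, rather than trusting the size recorded in the archive header, is one way to limit exposure to decompression bombs.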

INFORMATION PROCESSING METHOD AND RELATED DEVICE

An information processing method includes obtaining text information and a sentence set; encoding a sentence in the sentence set using a first encoder to obtain a first encoded vector, and encoding the sentence using a second encoder to obtain a second encoded vector. The first encoded vector is determined according to the sentence, and the second encoded vector is determined according to a feature of the sentence. The method also includes determining a sentence encoded vector according to the first and second encoded vectors; encoding the sentence encoded vector using a third encoder to obtain global information; decoding the global information using a decoder; and determining a probability value corresponding to the sentence. Accordingly, when a deep learning method is used, a manually extracted sentence is further added to perform feature training, to effectively improve a learning capability of a model, thereby improving an information processing capability and effect.

Method for processing and loading web pages supporting multiple languages and system thereof

The present invention relates to the application field of computer networks, and disclosed are a method for processing and loading a web page supporting multiple languages and a system thereof, so as to reduce time and cost of labor investment when some language is added or modified, save storage capacity of a web page server, increase the speed of page loading and translation rendering, and reduce the redundancy of a translation file set. The present invention is based on a tree-shaped translation file set, where each hypertext markup language (HTML) has a corresponding translation file. The method includes the following steps: scanning all translation files in a translation file set; extracting a same language string in different translation files or in different node sets of a same translation file, and inserting the same language string into a minimum common ancestor translation file of the different translation files or a common node set of the same translation file; deleting the same language string from the original translation files or node sets.
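
The scan, extract and delete steps above can be sketched as follows. The sketch assumes that each translation file is a dictionary of key to translated string, that a file's position in the tree is a tuple of path components, and that a page resolves a key by walking from its own file up to the root. The names `hoist_duplicates` and `lookup` are hypothetical, and node sets within a single file are not modeled.

```python
from collections import defaultdict


def common_prefix(paths):
    """Longest shared path prefix, i.e. the lowest common ancestor in the tree."""
    prefix = paths[0]
    for path in paths[1:]:
        i = 0
        while i < min(len(prefix), len(path)) and prefix[i] == path[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


def hoist_duplicates(files):
    """Move strings repeated across files into their lowest common ancestor file."""
    owners = defaultdict(list)
    for path, strings in files.items():
        for key, text in strings.items():
            owners[(key, text)].append(path)

    result = {path: dict(strings) for path, strings in files.items()}
    for (key, text), paths in owners.items():
        if len(paths) < 2:
            continue
        ancestor = common_prefix(paths)
        # Skip if the ancestor already defines this key with a different string.
        if result.get(ancestor, {}).get(key, text) != text:
            continue
        for path in paths:
            del result[path][key]
        result.setdefault(ancestor, {})[key] = text
    return result


def lookup(files, path, key):
    """Resolve a key by searching the page's file, then each ancestor in turn."""
    for depth in range(len(path), -1, -1):
        strings = files.get(path[:depth], {})
        if key in strings:
            return strings[key]
    raise KeyError(key)
```

After hoisting, each repeated string is stored once, and pages still resolve it through the ancestor chain.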

SYSTEM AND METHOD FOR CONTROLLING ACCESS TO ENCRYPTED VEHICULAR DATA
20190260580 · 2019-08-22 ·

This document describes a system and method for controlling access to encrypted vehicular data. The system described in this document employs a hierarchical access control method that allows select encrypted vehicular data stored in a cloud server to be accessed by an authorized user in a hierarchical manner whereby the authorized user is then able to decrypt the select encrypted data and all child data associated with the select encrypted data.

Computer Architecture for High-Speed, Graph-Traversal
20190258401 · 2019-08-22 ·

A computer architecture for graph-traversal provides a processor for bottom-up sequencing through the graph data according to vertex degree. This ordered sequencing reduces redundant edge checks. In one embodiment, vertex adjacency data describing the graph may be allocated among different memory structures in the memory hierarchy to provide faster access to vertex data associated with vertices of higher degree, reducing data access time. The adjacency data also may be coded to provide higher compression in memory of vertex data having high vertex degree.
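
One possible software reading of bottom-up traversal ordered by vertex degree is sketched below. It is an interpretation, not the patented hardware: the graph is assumed undirected and stored as a dictionary of adjacency lists, each unvisited vertex scans its neighbors from highest degree to lowest, and the scan stops at the first neighbor found in the current frontier. The function name and the `checks` counter are hypothetical.

```python
def bottom_up_bfs(adj, source):
    """Breadth-first search in which unvisited vertices look for a parent.

    Returns the parent map and the number of edge checks performed.
    """
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    # High-degree neighbors are the most likely to be in the frontier already,
    # so checking them first tends to end each scan early.
    ordered = {
        v: sorted(neighbors, key=lambda u: -degree[u])
        for v, neighbors in adj.items()
    }

    parent = {source: source}
    frontier = {source}
    checks = 0
    while frontier:
        next_frontier = set()
        for v in adj:
            if v in parent:
                continue
            for u in ordered[v]:
                checks += 1
                if u in frontier:
                    parent[v] = u
                    next_frontier.add(v)
                    break  # remaining edges of v need not be checked
        frontier = next_frontier
    return parent, checks
```

The early `break` is where the saving in edge checks comes from: once a parent is found, the rest of the adjacency list is skipped.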

Warm start file compression using sequence alignment
11977517 · 2024-05-07 ·

Compressing files is disclosed. An input file to be compressed is first aligned. Aligning the file includes splitting the file into sequences that can be aligned. The result is a compression matrix, where each row of the matrix corresponds to part of the file. The compression matrix may also serve as a warm start if additional compression is desired. Compression may be performed in stages, where an initial compression matrix is generated in a first stage using larger letter sizes for alignment and then a second compression stage is performed using smaller letter sizes. A consensus sequence is determined from the compression matrix. Using the consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the consensus matrix. The compressed file includes the pointer pairs and the consensus sequence.
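
The final stage, in which the file is represented as pointer pairs into a consensus sequence, can be illustrated with a small sketch. The alignment and compression-matrix stages are skipped entirely: the consensus is simply given as input, each pointer pair is assumed to be a `(start, length)` tuple, and a greedy longest-match search stands in for whatever matching the disclosed method uses. The names `encode` and `decode` are hypothetical.

```python
def encode(data, consensus):
    """Greedily cover `data` with (start, length) pointers into `consensus`."""
    pairs = []
    pos = 0
    while pos < len(data):
        best_start, best_len = -1, 0
        length = 1
        # Extend the match one symbol at a time while it still occurs in the consensus.
        while pos + length <= len(data):
            start = consensus.find(data[pos:pos + length])
            if start < 0:
                break
            best_start, best_len = start, length
            length += 1
        if best_len == 0:
            raise ValueError(f"symbol {data[pos]!r} is missing from the consensus")
        pairs.append((best_start, best_len))
        pos += best_len
    return pairs


def decode(pairs, consensus):
    """Rebuild the original data from the pointer pairs and the consensus."""
    return "".join(consensus[start:start + length] for start, length in pairs)
```

For example, with the consensus `"ACGTTGCA"`, the input `"ACGTACGTTGCATGCA"` is covered by three pointer pairs, and `decode` restores it exactly.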

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, PROGRAM AND STORAGE MEDIUM
20190251136 · 2019-08-15 ·

An information processing device obtains an increasing tendency of a storage capacity utilized by a blog containing at least one article, sets, to the blog, a threshold for determining whether or not to compress at least a part of the article contained in the blog in accordance with the increasing tendency, determines whether or not the blog is to be compressed based on a total data amount of the at least one article contained in the blog and on the threshold, and determines whether or not to compress each of the at least one article contained in the blog in accordance with a degree of accessibility.