Patent classifications
G16B15/00
T-CELL RECEPTOR REPERTOIRE SELECTION PREDICTION WITH PHYSICAL MODEL AUGMENTED PSEUDO-LABELING
Systems and methods for predicting T-Cell receptor (TCR)-peptide interaction, including training a deep learning model for the prediction of TCR-peptide interaction by determining a multiple sequence alignment (MSA) for TCR-peptide pair sequences from a dataset of TCR-peptide pair sequences using a sequence analyzer, building TCR structures and peptide structures using the MSA and corresponding structures from a Protein Data Bank (PDB) using a MODELLER, and generating an extended TCR-peptide training dataset based on docking energy scores determined by docking peptides to TCRs using physical modeling based on the TCR structures and peptide structures built using the MODELLER. TCR-peptide pairs are classified and labeled as positive or negative pairs using pseudo-labels based on the docking energy scores, and the deep learning model is iteratively retrained based on the extended TCR-peptide training dataset and the pseudo-labels until convergence.
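The pseudo-labeling step described above can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes docking energies are already available as plain numbers (lower meaning more favourable binding), and the function name `pseudo_label` and the count parameters `n_pos` and `n_neg` are hypothetical. The abstract does not state how the energy scores are thresholded.

```python
def pseudo_label(energies, n_pos, n_neg):
    """Assign pseudo-labels from docking energies (illustrative assumption).

    energies: dict mapping (tcr, peptide) -> docking energy, lower = more favourable.
    The n_pos lowest-energy pairs are labeled positive (1) and the n_neg
    highest-energy pairs negative (0). Pairs in between stay unlabeled.
    Assumes n_pos + n_neg <= len(energies).
    """
    ranked = sorted(energies, key=energies.get)  # most favourable first
    n = len(ranked)
    labels = {pair: 1 for pair in ranked[:n_pos]}
    labels.update({pair: 0 for pair in ranked[n - n_neg:]})
    return labels
```

In an iterative scheme, the labeled pairs would be added to the training set, the model retrained, and the process repeated until the model stops changing; the retraining loop itself is omitted here because the abstract gives no details of it.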
Methods and systems for analysis of mass spectrometry data
A method of analysing a structure of a composition of matter in a sample includes obtaining a data set comprising a plurality of spectra from the composition, from a first method of analysis, dividing each of the spectra into a plurality of bins, determining a control parameter or parameters indicative of synchronised fluctuations in signal intensity across some or all channels, resulting in universal correlation between said bins, and determining a partial covariance of different bins across the plurality of spectra using the control parameter to correct the correlation of intensity fluctuations between said bins.
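For a single scalar control parameter, partial covariance is commonly written as pcov(X, Y; I) = cov(X, Y) - cov(X, I) cov(I, Y) / cov(I, I). The sketch below applies that textbook formula to two bins. It assumes one control value per spectrum (for example a total-signal measure, which is an assumption here, not a detail from the abstract); the function names are hypothetical, and the multi-parameter case is not covered.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def _cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def partial_covariance(spectra, control, i, j):
    """Partial covariance of bins i and j given one control parameter.

    spectra: list of spectra, each a list of binned intensities.
    control: one control value per spectrum (must not be constant).
    """
    x = [s[i] for s in spectra]
    y = [s[j] for s in spectra]
    return _cov(x, y) - _cov(x, control) * _cov(control, y) / _cov(control, control)
```

If both bins fluctuate only because the control parameter fluctuates, the correction term cancels the plain covariance and the result is zero, which is the behaviour the abstract describes as correcting the universal correlation between bins.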
Transcriptome-wide design of selective, bioactive small molecules targeting RNA
Methods and computer systems are described herein for identifying small molecules that bind to selected RNA structural features (e.g., to RNA secondary structures). Also described are compounds and compositions that modulate RNA function and/or activity.
Method for verifying the primary structure of protein
Disclosed herein is a method for verifying the primary structure of a protein through comparative analyses between ion clusters observed in mass spectra and a series of simulated ion clusters deduced from its putative chemical formula. The method comprises the steps of: preparing a protein sample for mass spectrometric analyses; collecting mass spectra of the protein sample; obtaining a master ion cluster from a plurality of ion clusters in the mass spectra; producing a series of simulated ion clusters according to the chemical formula of the protein; finding the best fit for the master ion cluster among the series of simulated ion clusters; and verifying whether said best-fit simulated ion cluster corresponds to the chemical formula of the protein.
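The best-fit step can be illustrated with a simple similarity search. The abstract does not name a fit metric, so cosine similarity is used here purely as a stand-in. The sketch assumes the master cluster and every simulated cluster are intensity lists sampled on the same m/z grid; the function names and candidate labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_fit(master, simulated):
    """Return the key of the simulated cluster most similar to the master.

    simulated: dict mapping a candidate label -> intensity list.
    The similarity metric is an assumption, not taken from the source.
    """
    return max(simulated, key=lambda name: cosine_similarity(master, simulated[name]))
```

Verification then reduces to checking whether the returned label is the one generated from the protein's putative chemical formula.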
PREDICTION OF PEPTIDE CLEAVAGE IN POLYPEPTIDES THROUGH PHYSICS-BASED SIMULATIONS
The present disclosure relates to polypeptide degradation, and in particular to techniques for predicting the likelihood that a peptide bond for a given polypeptide molecule is susceptible to a cleavage reaction. Particularly, aspects of the present disclosure are directed to generating a representation of a polypeptide, performing a molecular-dynamics simulation using the representation to obtain a set of polypeptide conformations, determining, for each polypeptide conformation, a spatial characteristic of an amino acid, estimating a nucleophilic attack distance of each polypeptide conformation based on the spatial characteristic, identifying a reactive conformation that is susceptible to a cleavage reaction based on the nucleophilic attack distance of each polypeptide conformation, determining a free energy of the spatial characteristic of the amino acid in the reactive conformation, and predicting a probability of the side chain of the amino acid being trapped in the reactive conformation based on the free energy.
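One way to picture the link between attack distance, free energy, and probability is a two-state Boltzmann model. The sketch below is an assumption-laden illustration rather than the disclosed method: it treats a conformation as reactive when its nucleophilic attack distance falls at or below a hypothetical cutoff, estimates the free-energy difference from the reactive fraction of simulation frames, and converts a free energy back into a probability. All names, the cutoff, and the default temperature are illustrative.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def reactive_fraction(distances, cutoff):
    """Fraction of conformations whose attack distance is <= cutoff."""
    return sum(1 for d in distances if d <= cutoff) / len(distances)


def free_energy(p_reactive, temperature=300.0):
    """Two-state free energy (kcal/mol) of the reactive state relative to
    the non-reactive state. Requires 0 < p_reactive < 1."""
    return -R_KCAL * temperature * math.log(p_reactive / (1.0 - p_reactive))


def probability_from_free_energy(delta_g, temperature=300.0):
    """Invert the two-state relation: probability of the reactive state."""
    return 1.0 / (1.0 + math.exp(delta_g / (R_KCAL * temperature)))
```

A lower free energy for the reactive state corresponds to a higher probability of the side chain being found in it, which is the qualitative relation the abstract relies on.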
Flywheel discovery system that twins machine learning with high-throughput expression and laboratory analysis to identify and develop individual proteins as food ingredients
This disclosure provides a technology for developing alternative protein sources for use in industrial food production. The technology mines sequence data by a process that is done partly in silico. Instead of sampling and testing a vast library of compounds, machine learning and its implementation narrow the field of functional candidates by predictive modeling based on known protein structure. Candidate proteins that are selected by this analysis are then produced and screened in a high-throughput manner by recombinant expression and testing to determine whether they have a target function. Multiple cycles of the machine learning, database mining, expression, and testing are done to yield potential ingredients suitable for assessment as part of a commercial food product.