Infrared and fluorescence spectroscopic finger-printing of raw materials for use in the cultivation of a mammalian cell expressing a protein of interest
10816477 · 2020-10-27
Assignee
Inventors
- Jose Cardoso-Menezes (Lisbon, PT)
- Christian HAKEMEYER (Munich, DE)
- Gledson Emidio Jose (Glasgow-Scotland, GB)
- Ulrike Strauss (Kochel am See, DE)
- Silke Werz (Munich, DE)
Cpc classification
G01N21/6486
PHYSICS
G01N24/087
PHYSICS
International classification
Abstract
The invention provides a method for the selection of cultivation component batches to be used in the cultivation of a mammalian cell expressing a protein of interest wherein at least two different components are employed in the cultivation.
Claims
1. A method for selecting testing lots of cultivation components to be used in the cultivation of a mammalian cell expressing a protein of interest wherein at least two different components are employed in the cultivation, the method comprising the following steps: (a) measuring spectra of different lots of a first component, wherein the spectra are measured by a first spectrometer using near-infrared (NIR) or mid-infrared (MIR) spectroscopy; (b) measuring spectra of different lots of a second component, wherein the spectra are measured by a second spectrometer using 2D-fluorescence spectroscopy; (c) processing the measured spectra data by chemometrics to generate fused and compressed spectral patterns; (d) cultivating the mammalian cell expressing the protein of interest for a period of time, using combinations of the different lots of the first and second components; (e) measuring a supernatant yield of the protein of interest isolated from the cultivation in step (d) and compiling a calibration dataset comprising the lots' information and corresponding yields for different lots of the first and second components; (f) establishing a mathematical model correlating the spectral patterns from step (c) with the calibration dataset from step (e) for the first and second components; (g) measuring and processing the spectrum of a testing lot of the first component, wherein the spectrum is measured and processed according to steps (a) and (c); (h) measuring and processing the spectrum of a testing lot of the second component, wherein the spectrum is measured and processed according to steps (b) and (c); (i) predicting cultivation supernatant yield for the testing lot of the first component and the testing lot of the second component by applying the spectral data from steps (g) and (h) to the mathematical model from step (f); and (j) selecting a combination of the tested lots of the first and second components for use in the cultivation of the mammalian cell expressing 
the protein of interest if the predicted cultivation supernatant yield from step (i) is within +/-10% of the mean yield of the protein of interest measured in step (e), wherein the chemometrics used in step (c) is principal component analysis (PCA).
2. The method according to claim 1, wherein the spectral patterns are represented by PCA scores.
3. The method according to claim 1, wherein the mathematical model in step (f) is established using partial least square analysis (PLS).
4. The method according to claim 1, wherein the protein of interest is an antibody, an antibody fragment or an antibody conjugate.
5. The method according to claim 1, wherein the first component is a raw material.
6. The method according to claim 5, wherein the raw material is soy protein hydrolysate or rice protein hydrolysate.
7. The method according to claim 1, wherein the second component is a chemically defined basic medium.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
(22) It has been found that the performance of production processes for recombinant proteins can be predicted based on the combined information contained in NIR and 2D-fluorescence spectra of media components, such as protein hydrolyzates and/or chemically defined media preparations which are used as components of a complex cultivation medium.
(23) Herein is reported a method in which spectra from two different (orthogonal) spectroscopy techniques, obtained on two media components used in the fermentation of recombinant biopharmaceuticals, are combined after processing to make them additive via variable reduction to principal component analysis (PCA) scores. Models of such transformed spectra (inputs) are used to predict the yields at harvest (output) of cultivations of the biopharmaceutical product that are based on mixtures of the studied media components, whose lots differ in fermentation performance.
(24) By using different (orthogonal) spectroscopies in combination with PCA methods (to ensure their additivity) and producing process models of the effect of such cultivation media mixtures on yields at harvest of the main fermentation a predictive capability is established that allows selecting media lots of each raw material and/or formulating mixtures that best serve the process goals.
(25) Different lots of individual components forming a complete cultivation medium vary slightly in their detailed composition but are still within the specification given by the manufacturer. In some cases, it is possible to trace this variability to single ingredients, but most commonly the lot-to-lot variability cannot be detected by analytical means. For the evaluation of the influence of different individual component lots on product yield a comparable cultivation of the same mammalian cell line can be repeatedly performed.
(26) Herein are reported 56 cultivations in which nine different lots of a soy protein hydrolyzate, two mixtures of two different soy protein hydrolyzate lots, five lots of a rice protein hydrolyzate, and six lots of a chemically defined basic medium powder were employed in the fermentation and feed medium, respectively.
(27) To assess the influence of different soy protein hydrolyzate lots with respect to product yield comparable cultivations were performed in which the same lots of a chemically defined basic medium and a rice protein hydrolyzate were used in fermentation and feed media. The results can be grouped according to the different soy protein hydrolyzate lots employed. The performance of different lots was evaluated based on the product yield at similar average inoculation cell density (ICD) values (Table 1).
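(28) TABLE 1

| batch | soy protein hydrolyzate lot No. | chemically defined basic medium lot No. | rice protein hydrolyzate lot No. | ICD | product at 330 h [mg/l] |
|---|---|---|---|---|---|
| D45KD11 | 1 | 1 | 1 | 5.7 | 1319 |
| D45KD12 | 1 | 1 | 1 | 5.3 | 1234 |
| D45KD13 | 1 | 1 | 1 | 5.6 | 1305 |
| D45KD22 | 2 | 1 | 1 | 5.3 | 1023 |
| D45KD23 | 2 | 1 | 1 | 5.1 | 1070 |
| D45KD31 | 3 | 1 | 1 | 4.8 | 1008 |
| D45KD32 | 3 | 1 | 1 | 4.9 | 991 |
| D45KD33 | 3 | 1 | 1 | 5.3 | 978 |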
(29) The results obtained for a second set of cultivations are listed in Table 2.
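(30) TABLE 2

| batch | soy protein hydrolyzate lot No. | chemically defined basic medium lot No. | rice protein hydrolyzate lot No. | ICD | product at 330 h [mg/l] |
|---|---|---|---|---|---|
| D52KD11 | 1 | 2 | 2 | 6.1 | 1434 |
| D52KD12 | 1 | 2 | 2 | 5.0 | 1411 |
| D52KD13 | 1 | 2 | 2 | 5.6 | 1459 |
| D52KD21 | 4 | 2 | 2 | 5.0 | 1213 |
| D52KD22 | 4 | 2 | 2 | 5.3 | 1243 |
| D52KD23 | 4 | 2 | 2 | 5.4 | 1163 |
| D55KD11 | 5 | 2 | 2 | 5.0 | 1409 |
| D55KD12 | 5 | 2 | 2 | 5.4 | 1426 |
| D55KD13 | 5 | 2 | 2 | 5.7 | 1430 |
| D55KD21 | 2 | 2 | 2 | 6.8 | 1263 |
| D55KD22 | 2 | 2 | 2 | 6.8 | 1256 |
| D55KD23 | 2 | 2 | 2 | 6.8 | 1278 |
| D55KD31 | 6 | 2 | 2 | 6.1 | 1269 |
| D55KD32 | 6 | 2 | 2 | 6.1 | 1262 |
| D55KD33 | 6 | 2 | 2 | 5.8 | 1265 |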
(31) It can be seen that different lots of the individual components result in different product yields. In this series of cultivations different average ICD values were also used. Despite their low ICD values, cultivations using lot 1 and lot 5 gave significantly higher product yields than those with higher ICD values (lot 3 and lot 6). Thus, different soy protein hydrolyzate lots result in different production performance.
(32) Analogously the influence of rice protein hydrolyzate on process performance can be evaluated (Table 3).
(33) TABLE 3

| batch | soy protein hydrolyzate lot No. | chemically defined basic medium lot No. | rice protein hydrolyzate lot No. | ICD | product at 330 h [mg/l] |
|---|---|---|---|---|---|
| D61KD11 | 3 | 3 | 2 | 5.9 | 1132 |
| D61KD12 | 3 | 3 | 2 | 6.0 | 1085 |
| D61KD13 | 3 | 3 | 2 | 5.3 | 1101 |
| D61KD21 | 3 | 3 | 3 | 6.1 | 1062 |
| D61KD22 | 3 | 3 | 3 | 6.1 | 1056 |
| D61KD23 | 3 | 3 | 3 | 5.6 | 1043 |
(34) Six cultivations were performed and can be grouped according to the different lots of rice protein hydrolyzate used in each of them. Performance of the different rice protein hydrolyzate lots can be evaluated based on the mean product yield. Both groups, i.e. rice protein hydrolyzate lots, have similar ICD values.
(35) To assess the influence of the chemically defined basic medium on the product yield, cultivations can be performed with the same lots of soy protein hydrolyzate and rice protein hydrolyzate in the fermentation initial media formulation and feed media. Three series of experiments were performed (Tables 4, 5 and 6).
(36) The first series comprised six cultivations having soy protein hydrolyzate lot 3 (as in Table 3) and rice protein hydrolyzate lot 2 (as in Table 2) in the fermentation and feed media. Cultivations were grouped according to the chemically defined basic medium lot used. Performance of different chemically defined basic medium lots was evaluated based on the product yield. There is a slight difference between the two groups in both the average ICD and average product yield. With lower ICD a lower product formation can be obtained. Thus, the chemically defined basic medium lots have little or no effect on product yield.
(37) TABLE 4

| batch | soy protein hydrolyzate lot No. | chemically defined basic medium lot No. | rice protein hydrolyzate lot No. | ICD | product at 330 h [mg/l] |
|---|---|---|---|---|---|
| D55KD21 | 3 | 2 | 2 | 6.8 | 1263 |
| D55KD22 | 3 | 2 | 2 | 6.8 | 1256 |
| D55KD23 | 3 | 2 | 2 | 6.8 | 1278 |
| D61KD11 | 3 | 3 | 2 | 5.9 | 1132 |
| D61KD12 | 3 | 3 | 2 | 6.0 | 1085 |
| D61KD13 | 3 | 3 | 2 | 5.3 | 1101 |
(38) The second series involved six cultivations employing soy protein hydrolyzate lot 1 (as in Table 2) in the fermentation initial media formulation and feed media. Experiments were grouped according to the chemically defined basic medium lot used. No significant ICD differences were present. Thus, the differences in product yield are due to differences in the chemically defined basic medium lots used.
(39) TABLE 5

| batch | soy protein hydrolyzate lot No. | chemically defined basic medium lot No. | ICD | product at 330 h [mg/l] |
|---|---|---|---|---|
| D45KD11 | 1 | 1 | 5.7 | 1319 |
| D45KD12 | 1 | 1 | 5.3 | 1234 |
| D45KD13 | 1 | 1 | 5.6 | 1205 |
| D52KD11 | 1 | 2 | 6.1 | 1434 |
| D52KD12 | 1 | 2 | 5.0 | 1411 |
| D52KD13 | 1 | 2 | 5.6 | 1459 |
(40) The third series involved five cultivations having soy protein hydrolyzate lot 2 in the fermentation initial media formulation and feed media. Experiments were grouped according to the chemically defined basic medium lot used. There is a difference between the two groups in both the ICD used and the product concentration obtained.
(41) TABLE 6

| batch | soy protein hydrolyzate lot No. | chemically defined basic medium lot No. | ICD | product at 330 h [mg/l] |
|---|---|---|---|---|
| D45KD22 | 2 | 1 | 5.3 | 1023 |
| D45KD23 | 2 | 1 | 5.1 | 1070 |
| D73KD11 | 2 | 4 | 4.9 | 1062 |
| D73KD12 | 2 | 4 | 4.3 | 1112 |
| D73KD13 | 2 | 4 | 4.4 | 1121 |
(42) From the above it can be seen that there exists a need for raw-material lot characterization and a need to provide a method in which the obtained data can be used to predict which raw-material lots produce higher yields of product without the need to perform fermentation experiments.
(43) NIR, MIR, and 2D-fluorescence spectra can be acquired of all lots of the three different cultivation media components. Thereafter spectra analysis can be performed with established chemometric methods. A novel way of analyzing the spectral information obtained with these different sources is reported herein and can be used for predictive modeling purposes.
(44) NIR spectra of the lots of the raw materials were obtained as triplicates in different time periods. For powder and heterogeneous coarse samples NIR spectra vary among replicates. Such outlying replicates can be eliminated based on their relative location in the PCA scores plot space (Euclidean distance).
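The elimination of outlying replicates by their location in the PCA scores space could, for example, be sketched as follows. This is only an illustration under assumptions: the function names, the use of the per-lot median as reference point, and the distance threshold are hypothetical choices that are not specified in this description.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores of the mean-centered spectra on the first principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def flag_outlying_replicates(scores, lot_ids, threshold):
    """Flag replicates lying far (Euclidean distance) from their lot's center.

    The center is the component-wise median of the lot's replicate scores
    (an assumption; with triplicates it is less affected by one outlier
    than the mean). `threshold` is a hypothetical, user-chosen cut-off.
    """
    scores = np.asarray(scores, dtype=float)
    lot_ids = np.asarray(lot_ids)
    flags = np.zeros(len(lot_ids), dtype=bool)
    for lot in np.unique(lot_ids):
        idx = np.where(lot_ids == lot)[0]
        center = np.median(scores[idx], axis=0)
        distance = np.linalg.norm(scores[idx] - center, axis=1)
        flags[idx] = distance > threshold
    return flags
```

Replicates flagged in this way would be dropped before the remaining spectra of a lot are used further.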
(45) NIR spectra of 18 lots of soy protein hydrolyzate, 12 lots of rice protein hydrolyzate, and 14 lots of chemically defined basic medium were selected out of all provided measurements. NIR spectra were collected between 4,784 cm.sup.-1 and 8,936 cm.sup.-1. This spectral region does not contain noisy regions. The observed strong baseline shifts are due to light scattering associated with different raw-material lots having differences in mean particle size distributions (granularity). The analysis of raw spectra without baseline correction makes it possible to focus on variations mainly caused by physical effects. PCA analysis of raw spectra was performed for each raw material separately.
(49) The three analyzed cultivation media components show significant lot-to-lot variability in granularity and humidity content, as can be seen by the NIR spectra obtained. NIR is very sensitive to both these factors. Additionally both these factors dominate over smaller but still significant chemical composition differences that might be present. Prior to PCA analysis physical information has to be removed by spectra pre-processing.
(50) Water absorbs very strongly in the NIR region, especially in the range of from 6,900 cm.sup.-1 to 7,150 cm.sup.-1 and of from 5,160 cm.sup.-1 to 5,270 cm.sup.-1. These absorption regions are caused by the first overtone of the OH stretching band and the combination of the OH stretching and the OH bending bands, respectively. Water absorption regions can be removed. Moreover, the baseline shift can be eliminated by applying multiplicative scatter correction (MSC). In order to enhance the variance between samples, the Savitzky-Golay filtering and smoothing method can be applied, and spectra can be transformed to their first derivative (window of 25 points).
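The pre-processing chain described above (removal of the water regions, MSC, Savitzky-Golay first derivative with a 25-point window) could be sketched as follows. The function names are hypothetical, the polynomial order of 2 is an assumption (it is not stated for the NIR spectra), and the derivative is applied across the joined spectral segments as a simplification.

```python
import numpy as np
from math import factorial

# Water absorption regions in cm^-1, taken from the text above.
WATER_BANDS = [(5160.0, 5270.0), (6900.0, 7150.0)]

def drop_water_regions(wavenumbers, spectra, bands=WATER_BANDS):
    """Remove the columns of `spectra` that fall inside the water bands."""
    keep = np.ones(wavenumbers.shape, dtype=bool)
    for low, high in bands:
        keep &= ~((wavenumbers >= low) & (wavenumbers <= high))
    return wavenumbers[keep], spectra[:, keep]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    reference = spectra.mean(axis=0)
    corrected = np.empty(spectra.shape, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row = intercept + slope * reference, then undo both effects.
        slope, intercept = np.polyfit(reference, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected

def savgol_derivative(spectra, window=25, polyorder=2, deriv=1, delta=1.0):
    """Savitzky-Golay derivative; edge points without a full window are dropped."""
    half = window // 2
    x = np.arange(-half, half + 1)
    design = np.vander(x, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse gives the least-squares estimate of
    # the polynomial coefficient of x**deriv at the window center.
    coeffs = np.linalg.pinv(design)[deriv] * factorial(deriv) / delta**deriv
    return np.array([np.convolve(row, coeffs[::-1], mode="valid")
                     for row in spectra])
```

A call sequence such as `savgol_derivative(msc(drop_water_regions(wn, X)[1]))` would then follow the order of operations given in the text.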
(51) The PCA analysis was performed on previously pre-processed spectra of soy protein hydrolyzates (
(52) The PCA analysis was performed on previously pre-processed spectra of rice protein hydrolyzates (
(53) The PCA analysis of the chemically defined basic mediums' pre-processed spectra (
(54) Besides NIR spectra, fluorescence excitation-emission spectra (EEM) acquired of different water soluble fermentation raw-materials can be analyzed. A three-way data array, with excitation wavelengths along the x-axis, emission wavelengths along the y-axis, and intensity along the z-axis can be established. In
(55) 2D-Fluorescence spectra of 19 lots of soy protein hydrolyzate, of 12 lots of rice protein hydrolyzate, and of 14 lots of chemically defined basic medium were obtained. The spectra were obtained using excitation wavelengths from 200 nm to 600 nm, with intervals of 5 nm, and emission wavelengths also from 200 nm to 600 nm, with intervals of 2 nm, giving a total of 81 excitation and 201 emission wavelengths.
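The number of excitation and emission wavelengths follows directly from the stated ranges and intervals, as the following short check illustrates (the variable names are chosen for illustration only).

```python
# Wavelength grids of the excitation-emission measurement, as stated in the text.
excitation_nm = list(range(200, 601, 5))   # 200 nm to 600 nm, 5 nm steps
emission_nm = list(range(200, 601, 2))     # 200 nm to 600 nm, 2 nm steps

n_excitation = len(excitation_nm)
n_emission = len(emission_nm)
points_per_landscape = n_excitation * n_emission

print(n_excitation, n_emission, points_per_landscape)  # 81 201 16281
```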
(56) In order to allow a prediction of cultivation yield based on the analysis of the raw material a three-way array for each of the raw materials can be generated from the individual matrices.
(57) A typical EEM spectrum can be influenced by Rayleigh and Raman scattering effects, which affect the information content of the fluorescence landscape. To overcome the Rayleigh effect several strategies and techniques can be used: zeroing the emission wavelengths smaller than the excitation ones; inserting missing values in the region of scattering; excluding the region of scattering and interpolating the removed points; or subtracting the background spectra.
(58) It has been found that excluding the region of scattering and interpolating the removed points is most suited to the method as reported herein. The Matlab algorithm EEMscat can be employed for this purpose. This algorithm can be downloaded free of charge from the world-wide-web site httt://www.models.kvl.dk/source/EEM_correction/. With this procedure the scattering can be removed completely. The spectrum also shows pronounced noise along the entire emission axis at the first excitation wavelengths. This region (200 nm to 225 nm) was excluded from the spectra, as well as the non-informative emission wavelengths (200 nm to 315 nm and 596 nm to 600 nm) and excitation wavelengths (580 nm to 600 nm). The resulting spectrum is shown in
(59) The final soy protein hydrolyzate spectra are made up of the emission wavelength range of 320 nm to 594 nm and the excitation wavelength range of 230 nm to 575 nm, resulting in an array of 19×138×70 elements. The same procedure can be followed for the rice protein hydrolyzates and the chemically defined basic medium datasets. Thus, the final rice protein hydrolyzate spectra comprise the emission and excitation wavelength ranges of 290 nm to 594 nm and 230 nm to 550 nm, respectively, resulting in an array of 12×153×65 elements. The final chemically defined basic medium spectra comprise the emission wavelength range of 290 nm to 594 nm and the excitation wavelength range of 230 nm to 550 nm, resulting in an array of 14×162×60 elements.
(60) In conclusion, a pre-processing of the EEM spectra can be performed for each raw material data set to enhance signal to noise ratio. The differences between each raw material can thus be clearly seen: the soy protein hydrolyzate comprises 2 or 3 fluorophores, the rice protein hydrolyzate comprises 3 fluorophores and the chemically defined basic medium comprises more than 4 fluorophores.
(61) In order to obtain an overview of raw material lot-to-lot variability, a PCA of the unfolded fluorescence data array can be carried out for each component raw material. The unfolding procedure can be applied in any of the three modes of a three-way array. In order to enhance the lot-to-lot differences the unfolding preserving information of the first mode (samples) can be employed. In this way, the fluorescence landscapes can be unfolded into a row of emission spectra one after the other (
(62) The dimensions of the soy protein hydrolyzate array are 19×138×70 (lot × emission wavelength × excitation wavelength). After the unfolding strategy, a two-way matrix of size 19×9,960 can be obtained.
(63) To overcome these deviations, several strategies can be used. It has been found that the Savitzky-Golay smoothing using a window of 19 points and 2.sup.nd order polynomial to remove noise is best suited, and the Multiplicative Scatter Correction (MSC) is best suited to eliminate the baseline drift.
(64) Unfolded-PCA was applied to the soy protein hydrolyzate pre-processed matrix. The data was mean-centered, and the optimal number of principal components was chosen using the leave-one-out cross validation method.
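A mean-centered PCA of an unfolded matrix can be sketched with a singular value decomposition as shown below. This is a minimal illustration with hypothetical function names; the leave-one-out selection of the number of principal components is not shown, so `n_components` is simply passed in.

```python
import numpy as np

def pca(X, n_components):
    """Mean-centered PCA via SVD.

    Returns the scores (lots x components), the loadings
    (variables x components), the fraction of variance explained
    by each retained component, and the column means.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, loadings, explained, mean
```

The rows of `scores` are the compressed representation of each lot; `scores @ loadings.T + mean` gives the approximation of the original matrix by the retained components.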
(65) After unfolding, the resulting rice protein hydrolyzate matrix had the size 12×9,945. The same pre-processing used for soy protein hydrolyzate was applied.
(66) The size of the unfolded chemically defined basic medium matrix was 14×9,600. The same EEM spectra pre-processing procedure as applied to the other two media components was used.
(67) A PLS model can be developed for predicting the product yield at the end of the process based on NIR and/or fluorescence spectra obtained for different lots of each media component and/or their combinations. The PLS algorithm is given an X block (pre-processed spectra, with or without variable selection) and a Y block (product parameter) and correlates both by finding the variation in X responsible for changes in Y (i.e. maximizing the covariance between both blocks). A basic set can be defined wherein most of the different lots of raw materials can be included. Out of replicate batches having the same lot combinations, the one giving the highest product yield was selected for the calibration dataset (Table 7).
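The way PLS relates the X block to a single Y variable can be illustrated with a compact NIPALS-type PLS1 sketch. It is not the implementation of the software packages used herein; the function names are hypothetical. In each step the weight vector is taken proportional to X'y, which is the direction in X with maximal covariance with y.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model (one response) with NIPALS-style deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                    # latent variable (score vector)
        tt = t @ t
        p = Xr.T @ t / tt             # X loading
        c = yr @ t / tt               # regression of y on the score
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    """Predict the response for new rows of X."""
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean
```

The number of latent variables (`n_components`) is the main tuning parameter; in the text it is chosen by cross-validation.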
(68) TABLE 7

| batch | soy protein hydrolyzate F/ZF lot No. | product at 330 h [mg/l] |
|---|---|---|
| D52KD13 | 1 | 1458 |
| D52KD22 | 4 | 1232 |
| D55KD13 | 5 | 1430 |
| D55KD23 | 3 | 1257 |
| D55KD31 | 6 | 1263 |
| D73KD13 | 2 | 1120 |
| D73KD33 | 7 | 1044 |
| D79KD22 | 8 | 1162 |
(69) NIR spectra can be pre-processed as described before to remove the influence of physical effects originating from different particle size distributions. As no replicate spectra were used, the leave-one-out cross-validation method was used as internal validation strategy.
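Leave-one-out cross-validation and the resulting RMSECV can be sketched as follows. To keep the example self-contained, an ordinary least-squares model is used as a stand-in for the PLS model (an assumption made for illustration only); any pair of fit and predict functions can be passed instead. All names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Stand-in model: least squares with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def loo_rmsecv(X, y, fit=fit_ols, predict=predict_ols):
    """Leave each sample out once, refit, and predict the left-out sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    predictions = np.empty_like(y)
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = fit(X[train], y[train])
        predictions[i] = predict(model, X[i:i + 1])[0]
    rmsecv = np.sqrt(np.mean((y - predictions) ** 2))
    return rmsecv, predictions
```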
(70) The obtained model was made up of only two LVs but a non-significant R.sup.2 of 0.139 was obtained. The measured vs. cross-validation predicted plot is presented in
(71) A PLS model correlating NIR spectra of different lots of the chemically defined basic medium and product yield can be built using the calibration dataset as presented in Table 8.
(72) TABLE 8

| batch | chemically defined basic medium F/ZF lot No. | product at 330 h [mg/l] |
|---|---|---|
| D45KD11 | 1 | 1314 |
| D52KD13 | 2 | 1458 |
| D61KD12 | 3 | 1134 |
| D73KD21 | 4 | 1147 |
| D79KD22 | 5 | 1162 |
(73) The obtained model was made up of only two LVs but again a non-significant R.sup.2 of 0.04 was obtained (
(74) Considering not only one medium component, but the two most relevant ones influencing yield, and also taking into account that different chemical information is captured by each different spectroscopic method used, a combination strategy can be used between same spectroscopic/different media components and also between different spectroscopic/different media components.
(75) The criteria used for selecting calibration and validation batches were based on obtaining the widest possible range during calibration (Table 9).
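(76) TABLE 9

| set | batch | soy protein hydrolyzate F/ZF lot No. | chemically defined basic medium F/ZF lot No. | product at 330 h [mg/l] |
|---|---|---|---|---|
| calibration | D45KD11 | 1 | 1 | 1314 |
| calibration | D45KD31 | 3 | 1 | 999 |
| calibration | D52KD13 | 1 | 2 | 1458 |
| calibration | D52KD22 | 4 | 2 | 1232 |
| calibration | D55KD13 | 5 | 2 | 1430 |
| calibration | D55KD31 | 6 | 2 | 1263 |
| calibration | D61KD12 | 3 | 3 | 1134 |
| calibration | D73KD13 | 2 | 4 | 1120 |
| calibration | D73KD33 | 7 | 4 | 1044 |
| calibration | D79KD22 | 8 | 5 | 1162 |
| validation | D45KD23 | 2 | 1 | 1061 |
| validation | D55KD23 | 3 | 2 | 1257 |
| validation | D73KD21 | 8 | 4 | 1147 |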
(77) External validation was done with one third of the data set. Calibration and validation data (NIR spectra) were pre-processed in the same manner as described before. The obtained prediction model is based on 3 LVs and the obtained R.sup.2 reached a significant value of 0.88.
(78) Model accuracy and long term robustness are reflected in a high R.sup.2 with both calibration and validation errors being low, with a small difference between RMSECV and RMSEP (
(79) Thus, it has been found that product yield can be correlated to spectroscopic data from different components of a cultivation medium obtained with a combination of spectroscopic information of the same nature (NIR) for the two (most important) process raw-materials or media components. Each spectrum has 944 wavenumbers and the entire calibration dataset included in the model is represented by 18,880 variables (10 samples × 2 raw materials × 944 wavenumbers after variable selection). In order to reduce the required workload, the spectra were first compressed by PCA, converting the contained information into a few non-correlated variables, and the model was built on these compressed variables. The model obtained in this way was simpler, containing only 2 latent variables (LV), and an R.sup.2 of 0.81 was obtained.
(80) Different spectroscopic methods capture complementary chemical information. Using two different types of spectroscopic information improved the predictive quality of the model. Therefore, fluorescence spectra of soy protein hydrolyzate and NIR spectra of the chemically defined basic medium were used (Table 10).
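(81) TABLE 10

| set | batch | soy protein hydrolyzate F/ZF lot No. | chemically defined basic medium F/ZF lot No. | product at 330 h [mg/l] |
|---|---|---|---|---|
| calibration | D45KD11 | 1 | 1 | 1314 |
| calibration | D45KD31 | 3 | 1 | 999 |
| calibration | D52KD13 | 1 | 2 | 1458 |
| calibration | D52KD22 | 4 | 2 | 1232 |
| calibration | D55KD13 | 5 | 2 | 1430 |
| calibration | D55KD31 | 6 | 2 | 1263 |
| calibration | D61KD12 | 3 | 3 | 1134 |
| calibration | D73KD13 | 2 | 4 | 1120 |
| calibration | D73KD33 | 7 | 4 | 1044 |
| calibration | D79KD22 | 8 | 5 | 1162 |
| validation | D45KD23 | 2 | 1 | 1061 |
| validation | D55KD23 | 3 | 2 | 1257 |
| validation | D73KD21 | 8 | 4 | 1147 |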
(82) Fluorescence spectra and NIR spectra were compressed to a few principal components after pre-processing as described before. The obtained model has only 3 latent variables and an R.sup.2 of 0.90 was obtained (
(83) A further test was made using MIR instead of NIR for the chemically defined basic medium. Calibration and validation datasets used were the same as presented before (see Table 10). Fluorescence and MIR spectra were pre-processed as described before. The obtained model has 3 latent variables, an R.sup.2 of 0.88, and low RMSECV and RMSEP values with no difference between them (ca. 100 mg/l both), thus showing no significant difference to the one obtained with the NIR data for the chemically defined basic medium (
(84) The NIR spectra of the soy protein hydrolyzate and fluorescence spectra of the chemically defined basic medium were joined together and the resulting model was evaluated. The calibration and validation datasets used for building the model were the same as before (see Table 10). The obtained model has 3 latent variables and a very similar R.sup.2 value (0.87) (
(85) With an analytical variance for the reference analytics of product at around 60 mg/l (5% of 1200 mg/l, the average product concentration), most models developed showed a prediction accuracy very close to the experimental limit.
(86) In conclusion, to achieve a prediction of product yield at 330 h, spectral information of both soy protein hydrolyzate and chemically defined basic medium must be used. The use of fluorescence spectroscopy data for the chemically defined basic medium gives slightly lower (though very comparable) prediction errors than models based on NIR spectroscopic data for the chemically defined basic medium and 2D-Fluorescence spectroscopic data for the soy protein hydrolyzate.
(87) The method as reported herein is directed to the combination of spectra of different nature (fluorescence spectra and IR spectra), which intrinsically have different dimensions (two (2D) and one (1D), respectively), and that requires the operations of first compressing each spectrum to principal component analysis scores and second producing linear combinations of each spectrum scores. The spectra of different nature are combined by means of a dimensional reduction and a linear combination of those reduced transformed variables (PCA scores obtained by compressing each spectrum).
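The two operations named above, compressing each spectral block to PCA scores and then combining the scores, can be sketched as follows. The function names and the numbers of retained components are hypothetical; each row of both blocks is assumed to belong to the same cultivation (lot combination). The linear combination of the fused scores is then found by the regression model (e.g. PLS) fitted on the fused matrix.

```python
import numpy as np

def pca_scores(X, n_components):
    """Scores of the mean-centered block on its first principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def fuse_blocks(ir_spectra, eem_unfolded, n_ir=2, n_eem=2):
    """Compress each block separately and join the scores column-wise.

    ir_spectra:   cultivations x wavenumbers (1D spectra, first component)
    eem_unfolded: cultivations x (excitation * emission) (unfolded 2D spectra,
                  second component)
    """
    if ir_spectra.shape[0] != eem_unfolded.shape[0]:
        raise ValueError("both blocks need one row per cultivation")
    return np.hstack([pca_scores(ir_spectra, n_ir),
                      pca_scores(eem_unfolded, n_eem)])
```

Because each block is reduced to a handful of scores before joining, the block with many more variables (the unfolded fluorescence landscape) does not swamp the shorter IR spectrum.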
(88) Thus, in the method as reported herein spectra of different dimensions and nature are used to capture in a mixture of two different fermentation raw materials the components responsible for fermentation performance of said raw materials and to make predictions of fermentation yields for a specific combination of lots.
(89) With the method as reported herein it is possible to predict, based on the spectra of two different raw materials to be used in a fermentation, the process performance, i.e. the conditions at harvest of the fermentation, 10 to 14 days in advance.
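How such a prediction can be turned into a lot selection is illustrated below with the criterion of claim 1, step (j): a lot combination is selected if its predicted yield lies within +/-10% of the mean measured yield of the calibration cultivations. The function name and the two predicted values are hypothetical; the calibration yields are those listed in Table 9.

```python
def within_tolerance(predicted_yield, calibration_yields, tolerance=0.10):
    """True if the prediction is within +/- tolerance of the mean calibration yield."""
    mean_yield = sum(calibration_yields) / len(calibration_yields)
    return abs(predicted_yield - mean_yield) <= tolerance * mean_yield

# Product at 330 h [mg/l] of the calibration batches in Table 9.
calibration_yields = [1314, 999, 1458, 1232, 1430, 1263, 1134, 1120, 1044, 1162]

# Hypothetical model predictions for two candidate lot combinations.
accept_a = within_tolerance(1250.0, calibration_yields)
accept_b = within_tolerance(1000.0, calibration_yields)
```

With these calibration yields the mean is 1215.6 mg/l, so predictions between about 1094 mg/l and 1337 mg/l satisfy the criterion.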
(90) The following examples and figures are provided to aid the understanding of the present invention, the true scope of which is set forth in the appended claims. It is understood that modifications can be made in the procedures set forth without departing from the spirit of the invention.
Example
(91) Materials and Methods
(92) Cell Culture:
(93) The cells were cultivated in shake flasks in a temperature, humidity and carbon dioxide controlled environment. In order to compare different lots, media were prepared with these lots and cells were inoculated in shake flasks containing these media. A certain volume of feed medium was added daily to the shake flask culture in order to prolong cell growth and achieve higher product concentrations.
(94) Near Infrared Spectroscopy (NIR):
(95) NIR emerged in the analytical world in the 1960s, with the work of Karl Norris of the US Department of Agriculture (Siesler et al, 2002). In the electromagnetic spectrum, the NIR region is located between the mid-infrared and the visible regions. In the wavenumber range of 4,000-14,000 cm.sup.-1 (corresponding to wavelengths of 700-2,500 nm), the absorption radiation of overtone and combination bands of covalent bonds such as NH, OH and CH of organic molecules (
(96) NIR spectra were collected using flat bottom scintillation vials in a Bruker MPA FT-NIR system, equipped with a tungsten-halogen source and an InAs detector. Each spectrum was recorded in the wavenumber range of 4,999 to 9,003 cm.sup.-1, as an average of 32 scans and with a spectral resolution of 8 cm.sup.-1.
(97) Mid Infrared Spectroscopy (MIR):
(98) Mid infrared spectra were obtained using quartz cuvettes in an Avatar 370 FT-IR, Thermo Fisher, Diamant ATR. Each spectrum was recorded in the wavenumber range of 4,000 to 400 cm.sup.-1.
(99) Fluorescence Spectroscopy:
(100) Fluorescence spectroscopy uses irradiation at a certain wavelength to excite molecules, which will then emit radiation of a different wavelength. This technique is often used for studying the structure and function of macromolecules, especially protein interactions. Tentative assignment of fluorescence characteristics of chromophores found in proteins and nucleic acids is presented in the following Table.
(101)

| Substance | Absorption λ.sub.max (nm) | Absorption ε.sub.max (×10.sup.-3) | Fluorescence λ.sub.max (nm) | Fluorescence φ.sub.F |
|---|---|---|---|---|
| tryptophan | 280 | 5.60 | 348 | 0.20 |
| tyrosine | 274 | 1.40 | 393 | 0.14 |
| phenylalanine | 257 | 0.20 | 282 | 0.04 |
| adenine | 260 | 13.40 | 321 | 2.60 × 10.sup.-4 |
| guanine | 275 | 8.10 | 329 | 2.60 × 10.sup.-4 |
| cytosine | 267 | 6.10 | 313 | 0.80 × 10.sup.-4 |
| uracil | 260 | 9.50 | 308 | 0.40 × 10.sup.-4 |
| NADH | 340 | 6.20 | 470 | 0.02 |
(102) 2D-fluorescence (emission-excitation) spectra of cell culture raw materials were measured using a Varian Cary Eclipse Spectrometer, with excitation wavelengths from 200 nm to 600 nm at intervals of 5 nm and emission wavelengths also from 200 nm to 600 nm, but at intervals of 2 nm, giving a total of 81 excitation and 201 emission wavelengths. Data was collected using the software Cary Eclipse Bio, Package 1.1.
(103) Spectral Treatment and Chemometrics Analysis:
(104) Spectra pre-processing and chemometrics calculations were performed in Matlab 7.2 (MathWorks, U.S.A.) using PLS toolbox 5.5 (Eigenvector, U.S.A.) and Simca P+ 12.01 (Umetrics, Sweden). Rayleigh and Raman scatterings were removed using the EEMscat algorithm (Bahram et al, 2006).
(105) Multivariate data analysis was performed using PCA (Principal Component Analysis) and PLS (Partial Least Squares). These techniques are based on reducing the dimensionality of the data, allowing the retrieval of relevant information hidden in the massive amount of data. This is done by transforming the original measured variables into new variables called principal components. The PCA analysis was used to find patterns in the spectra. With the aim of relating these patterns to a particular parameter, PLS analysis was carried out to build a mathematical model able to predict the values of this parameter in future samples using only the spectral information.
(106) In order to build reliable models, the quality of analytical measurements has fundamental importance. Since noise and unwanted information are intrinsic to the measurements, it is necessary to pre-treat the obtained spectra.
(107) One of the most common techniques to deal with these problems in the NIR spectra is the Savitzky-Golay smoothing filter (Savitzky, A. and Golay, M. J. E., Anal. Chem., 36 (1964) 1627-1639). It is commonly used in conjunction with derivatives, which have the advantage of reducing baseline shifts and enhancing the significant properties of the spectrum.
(108) For fluorescence spectra, the major problems are related to the Raman and Rayleigh scattering, which are caused by deviations of the light that are not related to the fluorescence properties of the sample. Since the wavelength regions affected by scattering are known, the intensities measured in these regions can be removed and replaced by interpolated points.
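The principle of removing a known scattering region and replacing it by interpolated points can be sketched as follows for first-order Rayleigh scattering, which appears where the emission wavelength equals the excitation wavelength. This is a simplified illustration and not the EEMscat algorithm; the function name and the band half-width are assumptions.

```python
import numpy as np

def remove_rayleigh(eem, excitation_nm, emission_nm, half_width_nm=15.0):
    """Replace the first-order Rayleigh band by linear interpolation.

    eem has shape (n_emission, n_excitation). For each excitation
    wavelength, emission points within +/- half_width_nm of it are
    discarded and re-estimated from the remaining points of the same
    emission spectrum. Where the band touches the edge of the emission
    axis, np.interp holds the nearest kept value constant.
    """
    cleaned = np.array(eem, dtype=float, copy=True)
    emission_nm = np.asarray(emission_nm, dtype=float)
    for j, ex in enumerate(excitation_nm):
        scatter = np.abs(emission_nm - ex) <= half_width_nm
        keep = ~scatter
        if scatter.any() and keep.sum() >= 2:
            cleaned[scatter, j] = np.interp(emission_nm[scatter],
                                            emission_nm[keep],
                                            cleaned[keep, j])
    return cleaned
```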
(109) The three-way emission-excitation spectra were unfolded in order to obtain a matrix suitable for the PLS and PCA analysis. A Parafac-based three-way analysis was also done for calibration purposes (Bahram, M., et al., J. Chemometrics, 20 (2006) 99-105). The unfolding approach consists of concatenating two of these three dimensions, keeping the other fixed. In this case, the emission and excitation axes were concatenated, maintaining the information of the samples.
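The unfolding that keeps the sample mode can be sketched as a transpose followed by a reshape. The function name and the axis order of the input array (sample, emission, excitation) are assumptions made for this illustration; each row of the result holds the emission spectra of one sample placed one after the other, one per excitation wavelength.

```python
import numpy as np

def unfold_samples(eem_array):
    """(n_samples, n_emission, n_excitation) -> (n_samples, n_excitation * n_emission)."""
    n_samples, n_emission, n_excitation = eem_array.shape
    # Bring excitation before emission so that, after flattening, the
    # emission spectrum of excitation 0 comes first, then excitation 1, ...
    return eem_array.transpose(0, 2, 1).reshape(n_samples,
                                                n_excitation * n_emission)
```

For an array of shape (2, 3, 4), for example, the result has shape (2, 12), and the first three entries of each row are the emission spectrum recorded at the first excitation wavelength.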