1. 
 Andersson, Per M, et al.
(author)

Comparison between physicochemical and calculated molecular descriptors
 2000

In: Journal of Chemometrics, Special Issue: Proceedings of the SSC6, August 1999, HiT/TF, Norway. Issue edited by Kim Esbensen. ; 14:5-6, pp. 629-42

Journal article (peer-reviewed). Abstract:
 It has previously been shown that measured physicochemical properties are useful in the selection of building blocks for combinatorial chemistry as well as for investigating the scope and limitations of organic reactions. However, measured physicochemical properties are only available for small subsets of reagents, starting materials or building blocks; it is therefore necessary to use calculated descriptors, and it is essential that the descriptors are relevant. The objective was to investigate whether three different descriptor data sets contained similar information about the chemical structure, with the major aim of determining whether calculated descriptors contain information similar to that in experimental data. A total of 205 heterogeneous primary amines were characterized using three different data sets of molecular descriptor variables. The first set consisted of four physicochemical variables compiled from the literature and from chemical catalogues of commercially available chemicals. From these four descriptors together with molecular weight, three additional descriptors could be calculated, resulting in a total of eight descriptor variables in the first data set. The second data set consisted of 81 calculated molecular descriptor variables relating to size, connectivity, atom count, topology and electrotopology indices. The third data set consisted of 10 semiempirical variables (AM1). All the calculated variables were generated using the software Tsar 3.11. The descriptor variable sets were compared using principal component analysis (PCA) and partial least squares projections to latent structures (PLS). The results show that the different descriptor sets do contain similar latent information and that the different types of calculated variables correlate well with the experimental data, making them suitable for use in, e.g., combinatorial library design.


2. 
 Artursson, Tom, et al.
(author)

Study of Preprocessing Methods for the Determination of Crystalline Phases in Binary Mixtures of Drug Substances by X-ray Powder Diffraction and Multivariate Calibration
 2000

In: Applied Spectroscopy. ISSN 0003-7028. ; 54:8, pp. 272A-301A

Journal article (peer-reviewed). Abstract:
 In this paper, various preprocessing methods were tested on data generated by X-ray powder diffraction (XRPD) in order to enhance the partial least-squares (PLS) regression modeling performance. The preprocessing methods examined were 22 different discrete wavelet transforms, Fourier transform, Savitzky-Golay, orthogonal signal correction (OSC), and combinations of wavelet transform and OSC, and Fourier transform and OSC. Root mean square error of prediction (RMSEP) of an independent test set was used to measure the performance of the various preprocessing methods. The best PLS model was obtained with a wavelet transform (Symmlet 8), which at the same time compressed the data set by a factor of 9.5. With the use of wavelet and X-ray powder diffraction, concentrations of less than 10% of one crystal form could be detected in a binary mixture. The linear range was found to be 10-70% of the crystalline form of phenacetin, although semiquantitative work could be carried out down to a level of approximately 2%. Furthermore, the wavelet-pretreated models were able to handle admixtures and deliberately added noise.
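
The abstract above ranks preprocessing methods by RMSEP on an independent test set. As a minimal sketch of that figure of merit only (the function name `rmsep` and the example concentrations are hypothetical, not taken from the paper), it could be computed as follows:

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an independent test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical reference and predicted contents (% of one crystal form);
# these numbers are illustrative only.
reference = [10.0, 30.0, 50.0, 70.0]
predicted = [12.0, 29.0, 51.0, 68.0]
print(round(rmsep(reference, predicted), 3))
```

A lower RMSEP for one preprocessing method than for another, on the same test set, indicates better predictive performance of the resulting model.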


3. 
 Dåbakk, Eigil, et al.
(author)

Inferring lake water chemistry from filtered seston using NIR spectrometry
 2000

In: Water Research. ; 34:5, pp. 1666-72

Journal article (peer-reviewed). Abstract:
 Near-infrared spectrometry (NIR) is a rapid, inexpensive and reagent-free technique, widely used in industry in areas such as quality control and process management. The technique has great potential for environmental monitoring of aqueous systems. This study assesses relationships, using PLS regression, between NIR spectra of seston collected on glass fibre filters and the following measured lake water parameters: total organic carbon (TOC), total phosphorus (TP), Abs420 and pH. Water samples were collected from 271 oligotrophic lakes during autumn 1995. The predictive model for TOC explained 68% of the variance (SEP=2.1 mg L⁻¹, range 14.9 mg L⁻¹), and that for colour 71% (SEP=0.04 A, range 0.36 A), while the explained variances for pH and TP were 72% (SEP=0.36 μg L⁻¹, range 3.13 μg L⁻¹) and 45% (SEP=4 μg L⁻¹, range 41 μg L⁻¹), respectively. A model correlating NIR spectra and the actual amount of phosphorus in the seston captured on filters explained 86% of the variance (SEP=0.044 μg/filter, range 0.47). Several pretreatments and regression techniques were used in an attempt to enhance modeling performance. However, straightforward PLS on raw data performed best in all cases.


4. 
 Eriksson, Lennart, et al.
(author)

GIFI-PLS: Modeling of Non-Linearities and Discontinuities in QSAR
 2000

In: QSAR. ; 19:4, pp. 345-55

Journal article (peer-reviewed). Abstract:
 This paper introduces to the QSAR community a novel method for modeling and understanding non-linear relationships between biological potency and chemical structure properties of molecules. The approach, GIFI-PLS, is based on "binning" of quantitative X-variables into categorical variables. Each categorical variable is then expanded into a set of linked 1/0 dummy variables, which enable modeling of non-linearity. By way of four QSAR data sets, it is demonstrated that GIFI-PLS is useful for modeling of non-linearity and discontinuity in QSAR, and that the predictive power of a QSAR model may improve.
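
The binning and dummy-variable expansion described in this abstract can be illustrated with a small sketch. The abstract does not state how bin boundaries are chosen, so the equal-width rule below is an assumption, and the helper names `interior_cuts` and `to_dummies` are hypothetical. The subsequent PLS step is not shown.

```python
def interior_cuts(values, n_bins):
    """Interior cut points for equal-width bins (an assumed binning rule)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def to_dummies(values, n_bins):
    """Expand one quantitative variable into n_bins linked 1/0 dummy variables."""
    cuts = interior_cuts(values, n_bins)
    rows = []
    for v in values:
        # The bin index is the number of cut points lying strictly below v.
        index = sum(v > c for c in cuts)
        rows.append([1 if j == index else 0 for j in range(n_bins)])
    return rows

# Hypothetical descriptor values for five compounds.
descriptor = [1.0, 2.0, 4.0, 7.0, 10.0]
for value, row in zip(descriptor, to_dummies(descriptor, 3)):
    print(value, row)
```

Each row contains exactly one 1, so the dummy columns for a given variable are linked: a regression on them can assign a separate coefficient to each bin, which is what allows a step-like or curved response to be captured.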


5. 
 Eriksson, Lennart, et al.
(author)

On the selection of the training set in environmental QSAR analysis when compounds are clustered
 2000

In: Journal of Chemometrics. ; 14:5-6, pp. 599-616

Journal article (peer-reviewed). Abstract:
 In QSAR analysis in environmental sciences, adverse effects of chemicals released to the environment are modelled and predicted as a function of the chemical properties of the pollutants. Usually the set of compounds under study contains several classes of substances, i.e. a more or less strongly clustered set. It is then necessary to ensure that the selected training set comprises compounds representing all those chemical classes. Multivariate design in the principal properties of the compound classes is usually appropriate for selecting a meaningful training set. However, with clustered data, often seen in environmental chemistry and toxicology, a single multivariate design may be suboptimal because of the risk of ignoring small classes with few members and only selecting training set compounds from the largest classes. We recently proposed a procedure for training set selection that recognizes clustering. In this approach, when non-selective biological or environmental responses are modelled, local multivariate designs are constructed within each cluster (class). The compounds chosen by the local designs are finally united in the overall training set, which thus will contain members from all clusters. The proposed strategy is here further tested and elaborated by applying it to a series of 351 chemical substances for which the soil sorption coefficient is available. These compounds are divided into 14 classes containing between 10 and 52 members. The training set selection is discussed, followed by multivariate QSAR modelling, model interpretation and predictions for the test set. Various types of statistical experimental designs are tested during the training set selection phase.
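
The cluster-wise idea in this abstract (select within each class, then unite the selections) can be sketched as below. The paper constructs local multivariate designs in the principal properties; this sketch swaps in a simple greedy maximin (space-filling) pick as a stand-in, and the class names, compound labels, coordinates and function names are all hypothetical.

```python
import math

def maximin_select(points, k):
    """Greedily pick up to k compounds that are spread out within one cluster.

    `points` maps a compound label to its coordinates (e.g. score values).
    This is a stand-in for a local statistical design, not the paper's method.
    """
    names = sorted(points)
    dim = len(points[names[0]])
    centroid = [sum(points[n][d] for n in names) / len(names) for d in range(dim)]
    # Start with the compound farthest from the cluster centroid.
    chosen = [max(names, key=lambda n: math.dist(points[n], centroid))]
    while len(chosen) < min(k, len(names)):
        remaining = [n for n in names if n not in chosen]
        # Add the compound whose nearest already-chosen neighbour is farthest away.
        chosen.append(max(
            remaining,
            key=lambda n: min(math.dist(points[n], points[c]) for c in chosen),
        ))
    return chosen

def select_training_set(clusters, per_cluster):
    """Unite the local selections so that every cluster is represented."""
    training = []
    for label in sorted(clusters):
        training.extend(maximin_select(clusters[label], per_cluster))
    return training

# Hypothetical two-class example with two-dimensional coordinates.
clusters = {
    "class_a": {"a1": (0.0, 0.0), "a2": (1.0, 0.0), "a3": (0.0, 1.0), "a4": (4.0, 4.0)},
    "class_b": {"b1": (10.0, 10.0), "b2": (11.0, 10.0)},
}
print(select_training_set(clusters, per_cluster=2))
```

Because the selection runs separately inside each class, a small class such as `class_b` still contributes members, which is the risk a single global design is said to carry.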


6. 
 Eriksson, Lennart, et al.
(author)

Orthogonal signal correction, wavelet analysis, and multivariate calibration of complicated process fluorescence data
 2000

In: Analytica Chimica Acta. ; 420:2, pp. 181-95

Journal article (peer-reviewed). Abstract:
 In this paper, multivariate calibration of complicated process fluorescence data is presented. Two data sets related to the production of white sugar are investigated. The first data set comprises 106 observations and 571 spectral variables, and the second data set 268 observations and 3997 spectral variables. In both applications, a single response, ash content, is modelled and predicted as a function of the spectral variables. Both data sets contain certain features that make multivariate calibration efforts non-trivial. The objective is to show how principal component analysis (PCA) and partial least squares (PLS) regression can be used to overview the data sets and to establish predictively sound regression models. It is shown how a recently developed technique for signal filtering, orthogonal signal correction (OSC), can be applied in multivariate calibration to enhance predictive power. In addition, signal compression is tested on the larger data set using wavelet analysis. It is demonstrated that a compression down to 4% of the original matrix size, in the variable direction, is possible without loss of predictive power. It is concluded that the combination of OSC for preprocessing and wavelet analysis for compression of spectral data is promising for future use.


7. 
 Linusson, Anna, et al.
(author)

Statistical Molecular Design of Building Blocks for Combinatorial Chemistry
 2000

In: Journal of Medicinal Chemistry. ; 43:7, pp. 1320-8

Journal article (peer-reviewed). Abstract:
 The size of a combinatorial library can be reduced in two ways: by basing the selection either on the building blocks (BBs) or on the full set of virtually constructed products. In this paper we have investigated the effects of applying statistical designs to BB sets compared to selections based on the final products. The two sets of BBs and the virtually constructed library were described by structural parameters, and the correlation between the two characterizations was investigated. Three different selection approaches were used both for the BB sets and for the products. In the first two the selection algorithms were applied directly to the data sets (D-optimal design and space-filling design), while for the third a cluster analysis preceded the selection (cluster-based design). The selections were compared using visual inspection, the Tanimoto coefficient, the Euclidean distance, the condition number, and the determinant of the resulting data matrix. No difference in efficiency was found between selections made in the BB space and in the product space. However, it is of critical importance to investigate the BB space carefully and to select an appropriate number of BBs to obtain adequate diversity. An example from the pharmaceutical industry is then presented, where selection via BBs was made using a cluster-based design.
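
Two of the comparison measures named in this abstract, the Tanimoto coefficient and the Euclidean distance, are simple enough to sketch. The abstract does not say which structural representation the measures were applied to, so the binary fingerprints (given as sets of "on" bit positions) and the descriptor vectors below are hypothetical, as are the function names.

```python
import math

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two binary fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention here.
    return len(a & b) / union if union else 1.0

def euclidean(x, y):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Hypothetical fingerprints: 3 shared bits out of 5 distinct bits in total.
print(tanimoto({1, 2, 3, 5}, {2, 3, 5, 8}))
# Hypothetical two-dimensional descriptor vectors.
print(euclidean([0.0, 0.0], [3.0, 4.0]))
```

A Tanimoto value near 1 means two structures share most features, so a diverse selection tends to show low pairwise Tanimoto values and large pairwise distances.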


8. 
 Uppgård, Lise-Lott, 1970-, et al.
(author)

Multivariate quantitative structure-activity relationships for the aquatic toxicity of alkyl polyglucosides
 2000

In: Tenside Surfactants Detergents. ; 37:2, pp. 131-8

Journal article (peer-reviewed). Abstract:
 The aquatic toxicity of 34 alkyl polyglucosides (APGs) towards two freshwater species, Thamnocephalus platyurus and Brachionus calyciflorus, was studied. The toxicity tests were performed using so-called toxkits, and for each surfactant the results are presented as log10 (mean LC50) values. The toxicity data were combined with physicochemical data for the APGs, and a Multivariate Quantitative Structure-Activity Relationship (M-QSAR) model was calculated. Partial Least Squares (PLS) regression was used to develop the M-QSAR model. The resulting linear M-QSAR model explained 93.6% of the variance in the biological response and had a predictability of 86.6% according to cross-validation. The physicochemical properties with the strongest influences on the toxicity of the surfactants were the critical micelle concentration (c.m.c.), wetting, contact angle, and number of carbon atoms in their hydrophobic parts (C and redC).


9. 
 Uppgård, Lise-Lott, 1970-, et al.
(author)

Multivariate quantitative structure-activity relationships for the aquatic toxicity of technical nonionic surfactants
 2000

In: Journal of Surfactants and Detergents. ISSN 1097-3958 (Print), 1558-9293 (Online). ; 3:1, pp. 33-41

Journal article (peer-reviewed). Abstract:
 The aquatic toxicity of 36 technical nonionic surfactants (ethoxylated fatty alcohols) was examined toward two freshwater animal species, the fairy shrimp Thamnocephalus platyurus and the rotifer Brachionus calyciflorus. Responses of the two species to the surfactants were generally similar. A multivariate quantitative structure-activity relationship (M-QSAR) model was developed from the data. The M-QSAR model consisted of a partial least squares model with three components and explained 92.4% of the response variance and had a predictive capability of 89.1%. The most important physicochemical variables for the M-QSAR model were the number of carbon atoms in the longest chain of the surfactant hydrophobe (redC), the molecular hydrophobicity (log P), the number of carbon atoms in the hydrophobe (C), the hydrophilic-lipophilic balance according to Davis (Davis), the critical packing parameter with respect to whether the hydrophobe was branched or not (redCPP), and the critical micelle concentration. Surfactant toxicity tended to increase with increasing alkyl chain lengths.

