Use of a neural network to determine the normal boiling points of acyclic ethers, peroxides, acetals and their sulfur analogues

 

Authors: Driss Cherqaoui, Didier Villemin, Abdelhalim Mesbah, Jean-Michel Cense and Vladimir Kvasnicka

Journal: Journal of the Chemical Society, Faraday Transactions (RSC, available online 1994)

Volume/Issue: Volume 90, issue 14

Pages: 2015-2019

ISSN: 0956-5000

Year: 1994

DOI: 10.1039/FT9949002015

Publisher: RSC

Data source: RSC

Abstract:

J. CHEM. SOC. FARADAY TRANS., 1994, 90(14), 2015-2019

Use of a Neural Network to determine the Normal Boiling Points of Acyclic Ethers, Peroxides, Acetals and their Sulfur Analogues

Driss Cherqaoui, Didier Villemin* and Abdelhalim Mesbah
Ecole Nationale Superieure d'Ingenieurs de Caen (E.N.S.I. de Caen), I.S.M.R.A., U.R.A. 480 CNRS, 6 boulevard du Marechal Juin, 14050 Caen Cedex, France

Jean-Michel Cense
Ecole Nationale Superieure de Chimie de Paris, 11 rue P. et M. Curie, 75005 Paris, France

Vladimir Kvasnicka
Department of Mathematics, Faculty of Chemical Technology, Slovak Technical University, 81237 Bratislava, Slovakia

Models of relationships between structure and boiling point (bp) of 185 acyclic ethers, peroxides, acetals and their sulfur analogues have been constructed by means of a multilayer neural network (NN) using the back-propagation algorithm. The ability of a neural network to predict the boiling point of acyclic molecules containing polar atoms is outlined. The usefulness of the so-called embedding frequencies for the characterization of chemical structures in quantitative structure-property studies has been shown. NNs proved to give better results than multiple linear regression and other models in the literature.

NNs have recently[1,2] become the focus of much attention, largely owing to their wide range of applicability and the ease with which they can handle complex and non-linear problems. A leading reference book[3] on the application and the meaning of NNs in chemistry has recently been published, in which an extensive list of references can be found. NNs have been applied to the identification of proton-NMR spectra,[4] to the interpretation of IR spectra,[5,6] to the prediction of 13C chemical shifts,[7] to the classification of mass spectra,[8] to the estimation of aqueous solubilities,[9] to the determination of protein structure,[10,11] to the investigation of quantitative structure-activity relationships (QSAR)[12-14] and to the prediction of chemical reactivity.[15,16]

Boiling point (bp) is one of the properties used to characterize organic compounds. However, this property may be unavailable in the literature or difficult to evaluate experimentally. In such cases the usefulness of quantitative structure-property relationships (QSPR) cannot be denied. Several methods[17,18] for prediction of the bp of organic compounds have been described in the literature. We have recently used NNs to predict the bp of alkanes.[19] These compounds were chosen because they are simple, easy to code and have neither polarized atoms nor intramolecular bonds.

The goals of the current work are: (a) to provide an application of the NN theory (developed in our earlier paper[19]) to acyclic ethers, peroxides, acetals and their sulfur analogues; (b) to show the ability of NNs to predict the bp of acyclic molecules containing heteroatoms; (c) to call attention to the interest of molecular descriptors such as the embedding frequencies in the presence of heteroatoms; (d) to compare the results obtained by an NN with those given by multiple linear regression (MLR) and with those given in the literature.

Neural Networks

Artificial NNs are mathematical models of biological neural systems. Three components constitute an NN: the processing elements, the topology of the connections between the nodes (vertices),[20] and the learning rule. In this paper, the specific algorithm used is the back-propagation (BP) system. Its goal is to minimize an error function.
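The back-propagation scheme just described — a one-hidden-layer network whose weights are adjusted by gradient descent on a squared-error function — can be sketched in a few dozen lines. The sketch below is illustrative only: the layer sizes, seed, learning rate and toy data are our own assumptions, not the paper's 20-x-1 setup or its boiling-point data. The weight-initialization range [-0.5, 0.5] and the [0.1, 0.9] target scaling do mirror the paper's Method section.

```python
# Minimal back-propagation sketch: one hidden layer (with bias), sigmoid
# units, plain gradient descent on the squared error. Illustrative only.
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNet:
    def __init__(self, n_in, n_hidden):
        # weights initialized to random values in [-0.5, 0.5], as in the paper
        r = lambda: random.uniform(-0.5, 0.5)
        self.w1 = [[r() for _ in range(n_in + 1)] for _ in range(n_hidden)]  # +1 bias
        self.w2 = [r() for _ in range(n_hidden + 1)]                         # +1 bias

    def forward(self, x):
        xb = x + [1.0]
        self.h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = self.h + [1.0]
        self.out = sigmoid(sum(w * v for w, v in zip(self.w2, hb)))
        return self.out

    def train_step(self, x, target, lr=1.0):
        out = self.forward(x)
        err = target - out
        # output-layer delta for sigmoid activation and squared error
        d_out = err * out * (1.0 - out)
        # hidden-layer deltas, back-propagated through the output weights
        d_hid = [self.w2[j] * d_out * self.h[j] * (1.0 - self.h[j])
                 for j in range(len(self.h))]
        hb = self.h + [1.0]
        xb = x + [1.0]
        for j, w in enumerate(self.w2):
            self.w2[j] = w + lr * d_out * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += lr * d_hid[j] * xb[i]

def total_error(net, data):
    return sum((t - net.forward(x)) ** 2 for x, t in data)

# toy non-linear training set; targets scaled into [0.1, 0.9]
data = [([0.0, 0.0], 0.1), ([0.0, 1.0], 0.9), ([1.0, 0.0], 0.9), ([1.0, 1.0], 0.1)]
net = BPNet(n_in=2, n_hidden=4)
before = total_error(net, data)
for epoch in range(5000):
    for x, t in data:
        net.train_step(x, t, lr=1.0)
after = total_error(net, data)
```

Training reduces the squared error on the toy set, which is all the BP rule guarantees; the paper's networks were trained in the same spirit with a gradually decreasing learning rate.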
A description of the BP algorithm was given previously[19] with a simple example of application, and a more extensive description can be found in other works.[21,22]

Embedding Frequencies

In a BP NN the input layer contains information concerning the data samples under study. In chemistry this information is represented by molecular codes (molecular descriptors). In our study the molecular codes correspond to the embedding frequencies.[23] These integer entities determine to some extent the structure of acyclic compounds composed of carbon, oxygen and sulfur atoms (hydrogen atoms are ignored). Their simple graph-theoretical construction has been described in our recent publication.[24] Let T be a tree with vertices evaluated by the symbols C, O or S. T is assigned to any acyclic molecule with a skeleton composed of carbon, oxygen and sulfur atoms. Let T' be a subtree of the tree T; T' corresponds to a connected cluster of atoms. The embedding frequency[24,25] of T' in T, denoted n(T, T'), is then defined as the number of appearances of the cluster T' in the tree-molecule T. The 20 clusters used in the construction of input activities are listed in Table 1. The input activities correspond to 20 embedding frequencies assigned to these clusters, di = n(T, Ti), for i = 1, 2, ..., 20, where T, formally treated as a tree, corresponds to a molecule determined by these 20 descriptors. Examples of descriptors for three molecules are listed in Table 2.

Table 1  List of 20 clusters used for the construction of embedding frequencies

no.  cluster    no.  cluster    no.  cluster
1    C          8    S-S        15   S-C-S
2    O          9    C-C-C      16   C-C-C-C
3    S          10   C-C-O      17   C-(C)3 (a)
4    C-C        11   C-O-C      18   C-C-C-C-C
5    C-O        12   O-C-O      19   C-C-(C)3 (a)
6    C-S        13   C-C-S      20   C-(C)4 (a)
7    O-O        14   C-S-C

(a) 17: isobutyl; 19: isopentyl; 20: neopentyl.
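The cluster counting just described can be sketched for the 1-, 2- and 3-atom path clusters of Table 1 (the paper's full descriptor set also contains longer paths and branched clusters such as C-(C)3, which this sketch omits). The function name and graph encoding below are our own.

```python
# Sketch of embedding-frequency descriptors: count how often small labelled
# clusters (1-, 2- and 3-atom paths) occur in a hydrogen-suppressed
# tree-molecule given as labelled vertices plus undirected bonds.
from collections import Counter

def embedding_frequencies(labels, edges):
    """labels: atom symbols by index; edges: undirected bonds of the tree."""
    adj = {i: [] for i in range(len(labels))}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    freq = Counter()
    # 1-atom clusters: plain atom counts
    for s in labels:
        freq[s] += 1
    # 2-atom clusters: each bond, with the label pair in canonical order
    for a, b in edges:
        freq["-".join(sorted((labels[a], labels[b])))] += 1
    # 3-atom clusters: each unordered pair of neighbours around a centre atom
    for c in adj:
        nbrs = adj[c]
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                ends = sorted((labels[nbrs[i]], labels[nbrs[j]]))
                freq[f"{ends[0]}-{labels[c]}-{ends[1]}"] += 1
    return freq

# dipropyl sulfide, CH3CH2CH2-S-CH2CH2CH3 (hydrogens ignored)
labels = ["C", "C", "C", "S", "C", "C", "C"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
freq = embedding_frequencies(labels, edges)
```

For dipropyl sulfide this reproduces the corresponding entries of Table 2: d1 = 6 (C), d3 = 1 (S), d4 = 4 (C-C), d6 = 2 (C-S), d9 = 2 (C-C-C), d13 = 2 (C-C-S) and d14 = 1 (C-S-C).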
Table 2  Three examples of 20 descriptors (d1-d20) assigned to acyclic molecules

molecule            d1-d20
dimethyl peroxide   2 2 0 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
dipropyl sulfide    6 0 1 4 0 2 0 0 2 0 0 0 2 1 0 0 0 0 0 0
dibutyl disulfide   8 0 2 6 0 2 0 1 4 0 0 0 2 0 0 2 0 0 0 0

Method

The set of 185 compounds (Table 3) used in the present paper has been studied by Balaban et al.[26] This set essentially consists of two basic types of molecules: (1) acyclic ethers, peroxides and acetals (73 ethers, 17 diethers, 21 acetals and 6 peroxides); (2) acyclic sulfides, disulfides and thioacetals (45 sulfides, 6 bis-sulfides, 4 thioacetals and 13 disulfides).

We used a network with 20 units and a bias in the input layer, a variable hidden layer including bias, and one unit in the output layer. Input and output data were normalized between 0.1 and 0.9. The weights were initialized to random values between -0.5 and +0.5 and no momentum was added. The learning rate was initially set to 1 and was gradually decreased until the error function could no longer be minimized. All computations were performed on an Iris Indigo (Silicon Graphics) workstation using our own programs, written in C.

Table 3  Compounds studied with their experimental (exp) bps, predicted (pred) bps and corresponding residuals (res) (all in °C)

no.  name  bp_exp  bp_pred  res
1  dimethyl ether  -23.70  -4.80  -18.90
2  dimethyl peroxide  14.00  9.81  4.19
3  dimethyl sulfide  37.30  40.64  -3.34
4  dimethyl disulfide  109.70  112.31  -2.61
5  ethyl methyl ether  10.80  7.50  3.30
6  ethyl methyl peroxide  39.00  39.24  -0.24
7  dimethoxymethane  42.00  36.41  5.59
8  ethyl methyl sulfide  66.60  66.95  -0.35
9  ethyl methyl disulfide  135.00  134.93  0.07
10  bis(methylthio)methane  148.50  150.48  -1.98
11  methyl propyl ether  40.00  35.63  4.37
12  diethyl ether  34.60  34.98  -0.38
13  isopropyl methyl ether  32.00  31.41  0.59
14  diethyl peroxide  63.00  58.15  4.85
15  isopropyl methyl peroxide  53.50  59.66  -6.16
16  ethoxymethoxyethane  67.00  69.19  -2.19
17  1,1-dimethoxyethane  64.40  64.48  -0.08
18  1,2-dimethoxyethane  84.70  74.53  10.17
19  methyl propyl sulfide  95.50  94.82  0.68
20  diethyl sulfide  92.00  90.89  1.11
21  isopropyl methyl sulfide  84.40  88.01  -3.61
22  diethyl disulfide  154.00  152.98  1.02
23  1,1-bis(methylthio)ethane  156.00  152.97  3.03
24  ethylthiomethylthiomethane  166.00  167.16  -1.16
25  1,2-bis(methylthio)ethane  183.00  187.30  -4.30
26  butyl methyl ether  70.30  72.38  -2.08
27  ethyl propyl ether  63.60  62.14  1.46
28  ethyl isopropyl ether  52.50  54.75  -2.25
29  isobutyl methyl ether  59.00  61.79  -2.79
30  sec-butyl methyl ether  59.50  65.29  -5.79
31  tert-butyl methyl ether  55.20  53.56  1.64
32  diethoxymethane  88.00  92.65  -4.65
33  2,2-dimethoxypropane  83.00  77.85  5.15
34  1,3-dimethoxypropane  104.50  105.09  -0.59
35  1-ethoxy-2-methoxyethane  102.00  104.25  -2.25
36  1,2-dimethoxypropane  92.00  99.99  -7.99
37  ethyl isopropyl sulfide  107.40  106.47  0.93
38  butyl methyl sulfide  123.20  124.26  -1.06
39  isobutyl methyl sulfide  112.50  114.89  -2.39
40  ethyl propyl sulfide  118.50  116.96  1.54
41  tert-butyl methyl sulfide  101.50  102.34  -0.84
42  ethyl propyl disulfide  173.70  174.22  -0.52
43  ethyl isopropyl disulfide  165.50  165.57  -0.07
44  bis(ethylthio)methane  181.00  183.98  -2.98
45  methyl pentyl ether  99.50  97.25  2.25
46  ethyl butyl ether  92.30  93.88  -1.58
47  dipropyl ether  90.10  89.07  1.03
48  isopropyl propyl ether  80.20  79.21  0.99
49  ethyl isobutyl ether  82.00  84.03  -2.03
50  isopentyl methyl ether  91.20  86.77  4.43
51  methyl 2-methylbutyl ether  91.50  87.01  4.49
52  ethyl sec-butyl ether  81.20  83.83  -2.63
53  methyl 1-methylbutyl ether  93.00  85.81  7.19
54  diisopropyl ether  69.00  69.74  -0.74
55  methyl tert-pentyl ether  86.30  80.03  6.27
56  1,2-dimethylpropyl methyl ether  82.00  85.08  -3.08
57  1,1-diethoxyethane  103.00  103.38  -0.38
58  1,1-dimethoxy-2-methylpropane  103.50  103.49  0.01
59  2-ethoxy-2-methoxypropane  96.00  96.67  -0.67
60  1,1-dimethoxybutane  112.00  114.82  -2.82
61  1-methoxy-1-propoxyethane  104.00  107.02  -3.02
62  1,4-dimethoxybutane  132.50  131.04  1.46
63  1,2-diethoxyethane  123.50  120.49  3.01
64  1,3-dimethoxybutane  120.30  123.54  -3.24
65  methyl pentyl sulfide  145.00  146.94  -1.94
66  butyl ethyl sulfide  144.20  143.05  1.15
67  dipropyl sulfide  142.80  142.02  0.78
68  isopropyl propyl sulfide  132.00  131.01  0.99
69  ethyl isobutyl sulfide  134.20  132.84  1.36
70  isopentyl methyl sulfide  137.00  138.59  -1.59
71  methyl 2-methylbutyl sulfide  139.00  138.21  0.79
72  sec-butyl ethyl sulfide  133.60  131.66  1.94
73  tert-butyl ethyl sulfide  120.40  116.57  3.83
74  diisopropyl sulfide  120.00  119.84  0.16
75  1-ethylpropyl methyl sulfide  137.00  135.40  1.60
76  dipropyl disulfide  195.80  191.77  4.03
77  diisopropyl disulfide  177.20  176.05  1.15
78  sec-butyl ethyl disulfide  181.00  185.93  -4.93
79  isopropyl propyl disulfide  185.90  185.14  0.76
80  tert-butyl ethyl disulfide  175.70  172.93  2.77
81  1,1-bis(ethylthio)ethane  186.00  185.68  0.32
82  1,2-bis(ethylthio)ethane  211.00  210.96  0.04
83  hexyl methyl ether  125.00  122.66  2.34
84  ethyl pentyl ether  118.00  115.99  2.01
85  butyl propyl ether  117.10  117.97  -0.87
86  butyl isopropyl ether  107.00  106.03  0.97
87  isobutyl propyl ether  102.50  106.13  -3.63
88  ethyl isopentyl ether  112.00  108.48  3.52
89  tert-butyl propyl ether  97.40  92.68  4.72
90  2,2-dimethylpropyl ethyl ether  91.50  97.97  -6.47
91  tert-butyl isopropyl ether  87.60  87.88  -0.28
92  ethyl 1-methylbutyl ether  106.50  103.06  3.44
93  ethyl tert-pentyl ether  101.00  98.75  2.25
94  1,2-dimethylpropyl ethyl ether  99.30  104.49  -5.19
95  ethyl 1-ethylpropyl ether  90.00  105.45  -15.45
96  dipropoxymethane  137.00  134.89  2.11
97  2,2-diethoxypropane  114.00  109.58  4.42
98  1-ethoxy-1-propoxyethane  126.00  122.16  3.84
99  1,1-diethoxypropane  124.00  122.39  1.61
100  1,3-diethoxypropane  140.50  139.22  1.28
101  1,5-dimethoxypentane  157.50  152.06  5.44
102  1-ethoxy-4-methoxybutane  146.00  146.66  -0.66
103  1,4-dimethoxypentane  145.00  143.88  1.12
104  1,3-dimethoxypentane  141.00  144.78  -3.78
105  hexyl methyl sulfide  171.00  169.07  1.93
106  butyl propyl sulfide  166.00  166.13  -0.13
107  isobutyl propyl sulfide  155.00  155.18  -0.18
108  isobutyl isopropyl sulfide  145.00  147.67  -2.67
109  ethyl 2-methylbutyl sulfide  159.00  153.68  5.32
110  tert-butyl propyl sulfide  138.00  139.41  -1.41
111  sec-butyl isopropyl sulfide  142.00  144.81  -2.81
112  ethyl isopentyl sulfide  159.00  154.27  4.73
113  butyl isopropyl sulfide  163.50  154.47  9.03
114  1,3-bis(ethylthio)propane  229.50  225.03  4.47
115  dibutyl ether  142.00  142.18  -0.18
116  isopentyl propyl ether  125.00  130.53  -5.53
117  butyl isobutyl ether  132.00  129.77  2.23
118  butyl sec-butyl ether  130.50  130.10  0.40
119  butyl tert-butyl ether  125.00  115.23  9.77
120  sec-butyl isobutyl ether  122.00  122.66  -0.66
121  1,3-dimethylpentyl methyl ether  121.00  133.12  -12.12
122  diisobutyl ether  122.20  119.50  2.70
123  isobutyl tert-butyl ether  112.00  115.69  -3.69
124  di-tert-butyl ether  106.00  113.24  -7.24
125  isopropyl tert-pentyl ether  114.50  114.79  -0.29
126  heptyl methyl ether  151.00  148.31  2.69
127  1-ethylpropyl propyl ether  128.50  126.77  1.73
128  di-tert-butyl peroxide  109.50  101.08  8.42
129  1,1-diisopropoxyethane  126.00  130.91  -4.91
130  1,1-dipropoxyethane  147.00  141.33  5.67
131  1,3-dimethoxyethane  158.00  157.66  0.34
132  2,4-dimethoxy-2-methylpentane  147.00  146.48  0.52
133  1,4-diethoxybutane  165.00  157.69  7.31
134  dibutyl sulfide  188.90  187.68  1.22
135  diisobutyl sulfide  170.00  169.06  0.94
136  butyl isobutyl sulfide  178.00  177.68  0.32
137  di-tert-butyl sulfide  148.50  147.73  0.77
138  di-sec-butyl sulfide  165.00  167.32  -2.32
139  butyl sec-butyl sulfide  177.00  177.83  -0.83
140  sec-butyl isobutyl sulfide  167.00  170.87  -3.87
141  heptyl methyl sulfide  195.00  191.54  3.46
142  dibutyl disulfide  226.00  225.07  0.93
143  diisobutyl disulfide  215.00  216.22  -1.22
144  di-tert-butyl disulfide  201.00  202.29  -1.29
145  1,1-bis(isopropylthio)ethane  205.00  215.31  -10.31
146  1-ethyl-1,3-dimethylbutyl methyl ether  151.50  154.31  -2.81
147  ethyl heptyl ether  165.50  161.94  3.56
148  butyl isopentyl ether  157.00  151.68  5.32
149  tert-butyl isopentyl ether  139.00  142.02  -3.02
150  butyl pentyl ether  163.00  163.76  -0.76
151  1,5-dimethylhexyl methyl ether  153.50  155.94  -2.44
152  isobutyl isopentyl ether  139.00  148.07  -9.07
153  methyl 1-methylheptyl ether  162.00  160.11  1.89
154  methyl octyl ether  173.00  175.51  -2.51
155  2-ethylhexyl methyl ether  159.50  162.75  -3.25
156  methyl 1,1,4-trimethylpentyl ether  159.50  144.30  15.20
157  3,5-dimethylhexyl methyl ether  155.50  165.14  -9.64
158  ethyl 1,1,3-trimethylbutyl ether  141.00  142.65  -1.65
159  tert-butyl tert-pentyl peroxide  126.00  136.20  -10.20
160  1,1-dimethoxy-2,2-dimethylpentane  164.00  145.47  18.53
161  1,1-diethoxypentane  163.00  175.06  -12.06
162  1,1-dipropoxypropane  166.50  158.68  7.82
163  1,1-diisopropoxypropane  146.00  149.80  -3.80
164  1,3-dipropoxypropane  165.00  185.08  -20.08
165  1,3-diisopropoxypropane  159.00  152.69  6.81
166  ethyl heptyl sulfide  195.00  211.74  -16.74
167  methyl octyl sulfide  218.00  215.22  2.78
168  bis(butylthio)methane  250.00  257.62  -7.62
169  2,2-bis(propylthio)propane  235.00  225.10  9.90
170  ethyl octyl ether  186.50  187.80  -1.30
171  ethyl 1,1,3,3-tetramethylbutyl ether  156.50  161.80  -5.30
172  bis(1-ethylpropyl) ether  162.00  160.46  1.54
173  bis(1-methylbutyl) ether  162.00  160.46  1.54
174  butyl 1-methylpropyl ether  170.00  173.67  -3.67
175  diisopentyl ether  173.20  168.25  4.95
176  dipentyl ether  186.80  185.45  1.35
177  isopropyl heptyl ether  173.00  172.71  0.29
178  heptyl propyl ether  187.00  185.97  1.03
179  isopentyl pentyl ether  174.00  176.40  -2.40
180  methyl 1-methyloctyl ether  188.50  186.00  2.50
181  di-tert-pentyl sulfide  199.00  195.40  3.60
182  dipentyl sulfide  228.00  227.28  0.72
183  diisopentyl sulfide  215.00  210.35  4.65
184  isobutyl 4-methylpentyl sulfide  216.00  216.52  -0.52
185  methyl nonyl sulfide  240.00  232.63  7.37

Results and Discussion

In a BP NN the input and output neurons are known, since they represent, respectively, the embedding frequencies and the bp of the molecules. Unfortunately, there are neither theoretical results available nor satisfying empirical rules that would enable us to determine the number of hidden layers and the number of neurons contained in these layers. However, for most applications of NNs to chemistry, one hidden layer seems to be sufficient. For the determination of the number of hidden neurons, we have recently[19] discussed the usefulness of the ρ parameter, defined as:

ρ = (number of data points in the training set) / (sum of the number of connections in the NN)

According to Zupan and Gasteiger,[27] 'a good rule of thumb is that the number of data values taken for training should be equal to or greater than the number of weights to be determined in the network' (i.e. ρ ≥ 1). In this paper, six architectures of NN (20-x-1; x = 3, 4, 5, 6, 7, 8; i.e. ρ ∈ [1.05, 2.76]) have been tried, and two studies have been carried out: learning and prediction. The term learning is used when the NN estimates bp values for molecules in the training set. When it estimates bp values for molecules not included in the training set, this is prediction.

Learning

NNs

In order to determine the best architecture, six different ones have been tried (20-x-1; x = 3, 4, 5, 6, 7, 8). The criteria used for the comparison of the six architectures are the correlation coefficient (R) and the standard error of learning (SEL), defined by:

SEL = [ Σ_{i=1..N} (bp_exp,i − bp_pred,i)² / N ]^{1/2}

R = [ 1 − Σ_{i=1..N} (bp_exp,i − bp_pred,i)² / Σ_{i=1..N} (bp_exp,i − bp_mean)² ]^{1/2}

where bp_mean stands for the arithmetic mean of all N observed values of the bp.

The results obtained are given in Table 4. Fig. 1 clearly indicates that the SEL goes down to a minimum corresponding to six neurons in the hidden layer. It can be seen that the SEL increases slightly (i.e. the learning performance decreases) for seven and eight neurons. That is due to the fact that the number of weights is then nearly equal to the number of molecules in the training set; thus, the information brought by the training set is not sufficient to train correctly the NN with the architecture 20-x-1 (x = 7 and 8).

[Fig. 1  SEL as a function of the number of neurons in the hidden layer.]

Table 4  Comparison of standard error of learning (SEL) and correlation coefficient (R) for NNs, MLR and eqns. (1)-(3)

method    SEL    R
NN3       3.507  0.997
NN4       3.311  0.998
NN5       2.942  0.998
NN6       2.685  0.998
NN7       2.800  0.998
NN8       2.948  0.998
MLR       6.350  0.992
eqn. (1)  9.0    0.982
eqn. (2)  10.5   0.977
eqn. (3)  8.2    0.986

NN3 ... NN8 denote 3 ... 8 neurons in the hidden layer.
MLR

The most widely used mathematical method in QSAR or QSPR is MLR. The objective of such an analysis is to find an equation that relates a dependent variable (such as the bp) to one or more independent variables (such as molecular descriptors). The solution to the problem consists in determining the coefficients ai and the constant term a0 of the following equation:

bp = a0 + Σ_i ai di

It is helpful to note some inherent difficulties of MLR, in particular those arising from the interdependence of molecular descriptors. In this study MLR was used to correlate bp with only 15 independent molecular descriptors (five of the 20 descriptors, d6 among them, were removed). The correlation coefficient and the standard error of learning are 0.992 and 6.350, respectively.

Other Models in the Literature

Bps of the 185 compounds studied were correlated with chemical structure by Balaban et al.[26] using two or three topological descriptors. Three equations were found:

bp = -59.10 + 44.30 ¹χ + 42.88 N_S;  R = 0.982; S = 9.0  (1)

bp = -11.23 - 7.21 S_het + 35.04 ¹χ^v - 18.30 T_Me;  R = 0.977; S = 10.5  (2)

bp = -41.75 + 43.79 ¹χ + 45.03 N_S - 2.90 S_het;  R = 0.986; S = 8.2  (3)

All the results given by NNs, MLR and eqns. (1)-(3) are shown in Table 4. We see that in all cases the NN approach gives the best results. However, the learning abilities of the models are not completely comparable, since the descriptors used are not the same. In this study NNs show an interesting ability to extract information about acyclic compounds directly from the embedding frequencies.

Prediction

The predictive ability of an NN is its ability to give a satisfying output for a molecule not included in the examples from which the NN learned. To determine that predictive ability, cross-validation has been used. In this procedure one compound is removed from the data set, the network is trained with the remaining compounds and then used to predict the discarded compound. The process is repeated in turn for each compound in the data set. After cross-validation, the predictive ability of the different networks was assessed by the standard error of prediction (SEP) and the cross-validated R² (R²cv).
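The leave-one-out procedure just described can be sketched as follows; a one-descriptor least-squares fit stands in for the network or the full MLR model, and the data pairs are invented for illustration.

```python
# Sketch of leave-one-out cross-validation with SEP and cross-validated R^2.
# A simple one-descriptor least-squares model replaces the NN / full MLR.
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a0 + a1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - a1 * mx, a1

def loo_cv(xs, ys):
    preds = []
    for i in range(len(xs)):
        # train with compound i removed, then predict the discarded compound
        xtr = xs[:i] + xs[i + 1:]
        ytr = ys[:i] + ys[i + 1:]
        a0, a1 = fit_line(xtr, ytr)
        preds.append(a0 + a1 * xs[i])
    n = len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sep = math.sqrt(press / n)                       # standard error of prediction
    ymean = sum(ys) / n
    r2cv = 1.0 - press / sum((y - ymean) ** 2 for y in ys)
    return sep, r2cv

# invented descriptor/bp pairs, roughly linear with small noise
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.2, 19.8, 30.5, 39.6, 50.3, 59.9]
sep, r2cv = loo_cv(xs, ys)
```

Each compound is predicted by a model that never saw it, so SEP measures genuine predictive ability rather than fit quality, which is exactly the distinction the text draws between learning and prediction.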
Table 5  Comparison of predictive ability for NNs and MLR

method  SEP    R²cv
NN3     5.223  0.988
NN4     5.152  0.988
NN5     5.102  0.989
NN6     5.946  0.985
NN7     6.215  0.983
MLR     6.710  0.981

[Fig. 2  Architecture of a BP network with three layers (input layer with bias, hidden layer with bias, one output unit). The configuration shown is 20-5-1.]

Table 5 shows the results obtained with five different architectures and with MLR. This table shows that the NN performance is a function of the number of hidden neurons. NNs give a performance superior to that of MLR. In MLR the relationship between bp and molecular descriptors is expressed as a linear combination of the contributing terms; by contrast, the NN owes its predictive ability to its non-linear power. This does not mean that the NN is a polynomial model, but that it is able to learn by example how to make predictions for cases not belonging to the training set. It can be seen that the best architecture is 20-5-1 (ρ = 1.67; Fig. 2).

It is interesting to note the variation of the SEP with the number of iterations. Fig. 3 shows this variation for the NN with the 20-5-1 architecture. The learning performance of the NN increases with the number of iterations, but its predictive ability slowly decreases after 4000 iterations. This is known as the overtraining effect, due to too long a learning time. Indeed, the weights obtained after overtraining contain more information specific to the training set; prediction on the test set will therefore not really be satisfying. Thus, when a very low error on the training set is sought, the predictive ability of an NN is less successful. The ability to predict being an essential quality of an NN, the overtraining effect must be avoided.

[Fig. 3  Predictive ability of the NN (top curve) and learning ability of the NN (bottom curve) as a function of the number of iterations.]

The full results of cross-validation for 4000 iterations and the NN architecture 20-5-1 are gathered in Table 3. Those results are satisfying and show that the embedding frequencies are very useful descriptors for the compounds studied. Nevertheless, six outliers can be seen (compounds 1, 95, 156, 160, 164 and 166, with residuals between 15 and 20 °C). For dimethyl ether, a large deviation is expected because it is the only compound with a negative experimental bp; it should be noted that the NN did predict a negative value for this compound. Since the bp is one of the physical properties that are difficult to measure,[28] the experimental bps of the other outliers may be in error.

Conclusion

This paper has discussed the use of a BP NN to predict the boiling points of acyclic ethers, peroxides, acetals and their sulfur analogues. The performances of the NN were compared with those given by MLR and by other models in the literature, and proved to be better. It is interesting to note that the performance of an NN decreases when overtraining occurs. The embedding frequencies provide enough information to an NN for prediction of the bp of the compounds studied. The approach using the embedding frequencies is suited to the modelling of compounds containing heteroatoms, which is not the case for descriptors based on topological indices.[29]

References

1 J. Zupan and J. Gasteiger, Anal. Chim. Acta, 1991, 248, 1.
2 M. Tusar, J. Zupan and J. Gasteiger, J. Chim. Phys., 1992, 89, 1517.
3 J. Zupan and J. Gasteiger, Neural Networks for Chemists, VCH, New York, 1993.
4 J. U. Thomsen and B. Meyer, J. Magn. Reson., 1989, 84, 212.
5 E. W. Robb and M. E. Munk, Mikrochim. Acta (Wien), 1990, I, 131.
6 M. E. Munk, M. S. Madison and E. W. Robb, Mikrochim. Acta (Wien), 1991, II, 505.
7 V. Kvasnicka, J. Math. Chem., 1991, 6, 63.
8 B. Curry and D. E. Rumelhart, Tetrahedron Comput. Methodol., 1990, 3, 213.
9 N. Bodor, A. Harget and M. J. Huang, J. Am. Chem. Soc., 1991, 113, 9480.
10 L. H. Holley and M. Karplus, Proc. Natl. Acad. Sci. USA, 1989, 86, 152.
11 N. Qian and T. J. Sejnowski, J. Mol. Biol., 1988, 202, 865.
12 D. Villemin, D. Cherqaoui and J-M. Cense, J. Chim. Phys., 1993, 90, 1505.
13 T. Aoyama and H. Ichikawa, Chem. Pharm. Bull., 1991, 39, 358.
14 T. Aoyama and H. Ichikawa, Chem. Pharm. Bull., 1991, 39, 372.
15 V. Simon, J. Gasteiger and J. Zupan, J. Am. Chem. Soc., 1993, 115, 9148.
16 D. W. Elrod, G. M. Maggiora and R. G. Trenary, J. Chem. Inf. Comput. Sci., 1990, 30, 477.
17 D. E. Pearson, J. Chem. Educ., 1957, 28, 60.
18 R. D. Cramer, J. Am. Chem. Soc., 1980, 102, 1837.
19 D. Cherqaoui and D. Villemin, J. Chem. Soc., Faraday Trans., 1994, 90, 97.
20 N. Trinajstic, Chemical Graph Theory, CRC Press, Boca Raton, FL, 1992.
21 J. L. McClelland, D. E. Rumelhart and the PDP Research Group, Parallel Distributed Processing, MIT Press, Cambridge, MA, 1988, vol. I, p. 319.
22 J. A. Freeman and D. M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley, Reading, 1991, p. 89.
23 R. D. Poshusta and M. C. McHughes, J. Math. Chem., 1989, 3, 193.
24 D. Cherqaoui, D. Villemin and V. Kvasnicka, Chemom. Intell. Lab. Syst., in the press.
25 V. Kvasnicka, D. Cherqaoui and D. Villemin, J. Comput. Chem., in the press.
26 A. T. Balaban, L. B. Kier and N. Joshi, J. Chem. Inf. Comput. Sci., 1992, 32, 237.
27 Ref. 3, p. 263.
28 M. Randic, Croat. Chem. Acta, 1993, 66, 289.
29 M. Randic and N. Trinajstic, J. Mol. Struct. (Theochem), 1993, 284, 209.

Paper 31073296; Received 13th December, 1993

 
