New look at analytical data through the gnostical method

 

Author: Tomáš Paukert

 

Journal: Analyst (RSC, available online 1993)
Volume/issue: Volume 118, issue 2

Pages: 145-148

 

ISSN:0003-2654

 

Year: 1993

 

DOI:10.1039/AN9931800145

 

Publisher: RSC

 

Data source: RSC

 

Abstract:

New Look at Analytical Data Through the Gnostical Method

Tomáš Paukert* and Ivan Rubeška
Czech Geological Survey, Malostranské náměstí 19, 118 21 Prague 1, Czechoslovakia
Pavel Kovanic
Institute of Information Theory and Automation, Czechoslovak Academy of Sciences, Pod vodárenskou věží 4, 180 00 Prague 8, Czechoslovakia

* To whom correspondence should be addressed.

The gnostical theory represents a new, powerful approach towards evaluating data files. This paper describes the application of a gnostical analyser which may help in finding outliers, testing the homogeneity of sets of data and classifying individual data. As an example, its use for ascertaining the recommended values for reference materials is demonstrated by means of homogeneous and heterogeneous sets of data.

Keywords: Robust estimator; gnostical method; recommended value; reference material

New analytical procedures may best be verified by analysing certified reference materials (CRMs) with well established concentration values for the constituents to be determined. Apart from this, CRMs are also important for quality assurance and quality control in analytical laboratories. The determination of the true analyte concentration in a CRM from experimental data is therefore an important task toward which the efforts of many workers have been directed. The list of papers dealing with this problem and using statistical methods is fairly extensive. The data to be treated, however, do not always represent a homogeneous population. In such instances, robust estimators less susceptible to non-homogeneities in experimental data have been applied by many workers,1-3 but the outcome has often been dubious, if not frustrating.4,5 In this paper we describe the application of a new non-statistical 'gnostical' method.

The gnostical theory (GT) is an axiomatic-deductive mathematical theory. It has been developed as an alternative to mathematical statistics for the treatment of data containing uncertainty, data whose statistical model is not known, or data whose statistical characterization does not adequately describe the essence of the phenomena. The GT may also be applied successfully to data for which, for various reasons, only a limited number of observations or measurements is available, i.e., where the quality of information is poor or the data are influenced by the contribution of a rare but strong disturbance of an undefined character.

During the development of GT programming, five different classes of gnostical programs have been distinguished. One of them, the gnostical analysis discussed in this work, may be applied to the in-depth analysis of limited data files, robust estimation of the expectancy or probability of phenomena, distribution functions of data files and their densities, robust cluster analyses, investigation of file homogeneity and of the equivalence or diversity of two or more files, robust estimation of the location and scale parameters of individual clusters, and classification of individual data according to the degree of their relevance to individual clusters.

Gnostical Theory of Uncertain Data

Sound data deserve to be given greater weight than unsound data. However, two problems exist in this connection: (i) how to distinguish the unsound data from the sound data; and (ii) how to optimize the weights to make maximum use of the information. Heuristic approaches can ensure neither the optimization nor the universal applicability required by practice. It is well known that 'the most practical tool is a good theory'.
Unfortunately, a good theory of such a complex problem cannot be both simple and directly acceptable by way of 'common sense' considerations. In contrast to other theories involving uncertainty, the GT of data files is based on an axiomatic theory of an individual uncertain datum and on a data composition axiom. The axioms of this theory have a simple algebraic nature. To illustrate the functions of the GT, the main equations will first be described.

Consider the ith real-valued A_i (an 'additive' datum) together with its 'multiplicative' equivalent,

  Z_i = exp(A_i)   (1)

which has a strictly positive value. For a positive real scale parameter s, a real variable z > 0 and a sample of N data, define the auxiliary quantities

  q_i(z,s) = (Z_i/z)^{2/s}   (2)

for use in the calculation of N 'fidelities':

  f_i(z,s) = 2/[1/q_i(z,s) + q_i(z,s)]   (3)

and 'irrelevances':

  h_i(z,s) = [1/q_i(z,s) - q_i(z,s)]/[1/q_i(z,s) + q_i(z,s)]   (4)

Within the framework of the GT, the irrelevance plays the role of the distance between z and Z_i, the fidelity being the weight of the datum Z_i. Introduce the arithmetic mean of the fidelities,

  f(z,s) = (1/N) Σ_{i=1}^{N} f_i(z,s)   (5)

and define the symbol h(z,s) for the irrelevances analogously. Let w(z,s) be the function of weight defined by the relationship

  w(z,s) = [f(z,s)]^2 + [h(z,s)]^2   (6)

The distribution function generated by the individual datum Z_i is then

  L_i(z,s) = [1 + h_i(z,s)]/2   (7)

having the density

  l_i(z,s) = dL_i(z,s)/d(ln z) = [f_i(z,s)]^2/s   (8)

At least two theoretical results of the GT are immediately applicable to the data files provided by analytical chemistry, i.e., the two types of data distribution functions (DDF), global (GDF) and local (LDF). These functions play a role analogous to that of probability distribution functions. The field of application of the gnostical distribution functions is, however, much broader, as these functions do not rely on statistical assumptions or probabilistic concepts. They characterize the data patterns, and expectations about the subject are deduced from their particular shape.

For a weak influence of the uncertainty of the data, i.e., minor data errors, the two gnostical DDFs differ to only a negligible extent. However, their behaviour may differ for major data errors. This is a desirable feature that makes these gnostical DDFs very useful for the analysis of totally different types of data files.

The LDF L(z,s) is simply the arithmetic mean of the distribution functions [eqn. (7)] of the individual data:

  L(z,s) = (1/N) Σ_{i=1}^{N} L_i(z,s)   (9)

The GDF G(z,s) is obtained using the function of weight w [eqn. (6)]:

  G(z,s) = [1 + h(z,s)/√w(z,s)]/2   (10)

The LDF is a relatively universal instrument which can be applied even to data files containing non-homogeneities such as individual subclusters. The derivative of the LDF, called the data density function, then has a multi-modal form in which each mode corresponds to a subcluster. Being a monotonic function for an arbitrary data file, the LDF can always be determined. In the special case where a statistical interpretation of the data is reasonable, the LDF [eqn. (9)] can be an asymptotically consistent kernel estimate of the Parzen type6 of the probability distribution function. In such a case, the GT is used as a source of a theoretically justified kernel which generates remarkably clear and smooth density curves, even for small data files.
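For readers who wish to experiment with these quantities, the following Python sketch shows how eqns. (1)-(10), as reconstructed above, could be evaluated for a data file. It is a minimal illustration of our own, not the authors' GA2 program; the function name gnostic_curves and the exponent 2/s in eqn. (2) are assumptions of this sketch.

```python
import numpy as np

def gnostic_curves(data, z, s=1.0):
    """Local (LDF) and global (GDF) distribution functions of a strictly
    positive data file, evaluated on a grid z, following eqns. (1)-(10)
    as reconstructed above."""
    Z = np.asarray(data, dtype=float)            # multiplicative data Z_i > 0, eqn (1)
    z = np.asarray(z, dtype=float)
    q = (Z[None, :] / z[:, None]) ** (2.0 / s)   # auxiliary quantities, eqn (2)
    f = 2.0 / (1.0 / q + q)                      # fidelities, eqn (3)
    h = (1.0 / q - q) / (1.0 / q + q)            # irrelevances, eqn (4)
    f_mean = f.mean(axis=1)                      # mean fidelity, eqn (5)
    h_mean = h.mean(axis=1)                      # mean irrelevance, defined analogously
    w = f_mean ** 2 + h_mean ** 2                # function of weight, eqn (6)
    ldf = ((1.0 + h) / 2.0).mean(axis=1)         # mean of individual DFs, eqns (7) and (9)
    gdf = 0.5 * (1.0 + h_mean / np.sqrt(w))      # global DF, eqn (10) as reconstructed
    return ldf, gdf
```

Numerical derivatives of these two curves with respect to log z then play the role of the local and global density functions discussed below.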
In the much more general case of data files that do not allow a statistical interpretation, the equation of the DDF is still valid as a continuous model of the distribution of the expectation that a 'new' datum of the same nature as the 'old' data will have a certain value. The LDF is 'locally robust' in the sense that its local form, corresponding to a subinterval of the data range, does not influence its form in another subinterval. This is due to the steep descent of the gnostical kernel [eqn. (8)].

Unlike the LDF, the GDF has theoretical justification only for special data files of homogeneous type. Such files should have a unimodal data density function. For a non-homogeneous data file, the GDF may lose the fundamental feature of a distribution function, its monotony.7 This fact makes it possible to perform an efficient test of the homogeneity of data files. The limited flexibility of the GDF permits the estimation of the proper scale parameter, which characterizes the spread of the data. Most importantly, however, the GDF is globally robust in the sense of the low sensitivity of its shape with respect to 'outliers' and also to all other 'peripheral' subclusters of the data. This leads to a highly reliable prognosis of rare events (of values of the GDF for very small or very large quantiles). Such tasks often appear in practice in connection with random quality controls, studies of lifetimes, etc. This type of gnostical distribution function has no known statistical analogy.

The LDF and GDF differ substantially in their dependence on the scale parameter s. Let F(N) be the 'empirical' distribution function of the data file; F(N) has the well known form of an irregular staircase. The LDF of the same sample can be made to approach F(N) as closely as required by choosing a sufficiently small positive value of the parameter s. In contrast, the maximum distance of the GDF from F(N) has a minimum for a 'best' s, which can be recognized as a robust estimate of the scale parameter; for this s the GDF is as close as possible to F(N). The choice of s determines the power to resolve individual clusters of data files. An overall survey of the GT, with a detailed description of the mathematics involved, can be found elsewhere.8
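The search for the 'best' scale parameter described in the preceding paragraph can be sketched as a direct minimization of the maximum distance between the GDF and the empirical staircase F(N). This is again only our own illustration of the idea, reusing gnostic_curves from the previous sketch; the grid limits and the range of trial s values are arbitrary choices, not the procedure used by the GA2 analyser.

```python
import numpy as np

def empirical_cdf(data, z):
    """The 'irregular staircase' F(N) evaluated on the grid z."""
    Z = np.sort(np.asarray(data, dtype=float))
    return np.searchsorted(Z, z, side="right") / Z.size

def best_scale(data, s_grid=np.linspace(0.2, 3.0, 29), n_grid=500):
    """Return the trial s for which the GDF lies closest to F(N)
    in the maximum-distance (Kolmogorov-type) sense."""
    data = np.asarray(data, dtype=float)
    z = np.geomspace(0.5 * data.min(), 2.0 * data.max(), n_grid)
    fn = empirical_cdf(data, z)
    dist = [np.max(np.abs(gnostic_curves(data, z, s=s)[1] - fn)) for s in s_grid]
    return float(s_grid[int(np.argmin(dist))])
```

With this s fixed, the same GDF can then be inspected as a homogeneity check, as discussed above.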
Application of the GT in Analytical Chemistry

In routine practice, the evaluation of data is usually carried out by statistical calculations. The accuracy of an analytical procedure may best be verified by analysing CRMs with well established 'recommended' values (RVs). For rock CRMs, the RVs are usually derived from round-robin tests with the participation of many laboratories. If the central values (location parameters, PL) for particular elements derived by statistical evaluation of the data are mostly concordant, the assignment job is relatively 'easy' and the RV may be established from the various central values. If the PL are discordant, a decision has to be made as to the suitability of the analytical methods employed for the particular concentration levels. Some elements in the Periodic Table are notoriously troublesome for quantitative determination, as may be demonstrated by repeated analyses. Differences of as much as 100% or more are sometimes encountered, especially when determining element contents at ultra-trace levels.

To demonstrate the possibilities of the new non-statistical mathematical method discussed, we used data from the 1987 Compilation Report on the Ailsa Craig Granite, AC-E.9 The AC-E reference material, prepared with great care, was distributed to 128 laboratories in 29 countries. From the submitted data, RVs for many elements were successfully established. Some elements were determined by only a limited number of laboratories, and for some elements discrepancies in the results were evident. Two elements were selected as examples to demonstrate the potential of the GT for the evaluation of results: first europium, for which the reported data show good concordance, and second cobalt, for which the data file is significantly heterogeneous.

Homogeneous Data File

The results reported for europium are a good example of a homogeneous data file. This element was determined in 41 laboratories. All the results are given in Table 1.

Table 1 Results for Eu (ppm). For explanation of abbreviations used, see text

Laboratory  Value  Procedure   Laboratory  Value  Procedure   Laboratory  Value  Procedure   Laboratory  Value  Procedure
 1          1.39   EMN         11          1.9    EMN         21          2      CSP         31          2.07   EMN
 2          1.4    CSP         12          1.92   CSP         22          2      CSP         32          2.09   CSP
 3          1.6    CSP         13          1.92   EMN         23          2      CSP         33          2.1    CSP
 4          1.7    CSF         14          1.93   CSM         24          2      CSP         34          2.1    EMN
 5          1.8    CSP         15          1.94   CSP         25          2      CSP         35          2.16   EMN
 6          1.8    EMN         16          1.94   EMN         26          2      EMN         36          2.23   EMN
 7          1.86   EMN         17          1.94   EMN         27          2      EMN         37          2.3    BSM
 8          1.87   EMN         18          1.96   EMN         28          2.03   EMN         38          2.3    CSP
 9          1.9    CSP         19          1.98   EMN         29          2.04   EMN         39          2.4    CSP
10          1.9    EMN         20          1.99   EMN         30          2.05   ASM         40          2.68   EMN
                                                                                             41          3.1    ASM

Each method is designated by a three-letter code, the first letter indicating the method of sample preparation and the last two the method of determination.9 For sample preparation, A = acid decomposition, B = fusion with fluxes, C = dissolution + separation, D = mixture with buffers and E = simple physical conditioning; for determination, AA = atomic absorption spectrometry (AAS), FX = X-ray fluorescence spectrometry (XRF), SM = mass spectrometry, MN = nuclear methods, SF = flame photometry and SP = direct reading atomic emission spectrometry (AES).

For the first evaluation of the data file, the data were treated by the GT using the GDF. The resulting distribution function, presented in Fig. 1, may help in investigating the file homogeneity. This type of estimate is based on the a priori assumption that the tested data file is homogeneous; it is robust with respect to peripheral data (outliers). In Fig. 1 both the distribution function and the density function of the data file are shown. The dotted lines indicate the degree of goodness of fit of the estimated distribution function by the Kolmogorov-Smirnov test, with intervals plotted for 10, 20, 50, 95 and 99% probability. The central value for europium, corresponding to the maximum of the density curve, was calculated to be 2.007 ppm.

Fig. 1 (not reproduced) Global distribution of the Eu data file. z, concentration of analyte; P(z), distribution function of the concentration z; dP/d(log z), density function of the concentration z. Kolmogorov-Smirnov test: A, 10; B, 20; C, 50; D, 95; and E, 99%.

Fig. 2 (not reproduced) Local distribution of the Eu data set. A, 10; B, 20; C, 50; D, 95; and E, 99%.

Fig. 3 (not reproduced) Global distribution of the Eu data file after restriction of 4 data. A, 10; B, 20; C, 50; D, 95; and E, 99%.
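The central value quoted above is the location of the maximum of the global density. Using the sketches given earlier (gnostic_curves and best_scale, both our own illustrations), it could be located numerically as follows; the published figure of 2.007 ppm will only be reproduced with the same scale parameter and numerical settings as the authors used, which are not stated in the paper.

```python
import numpy as np

# Eu results from Table 1 (ppm), laboratories 1-41
eu = [1.39, 1.4, 1.6, 1.7, 1.8, 1.8, 1.86, 1.87, 1.9, 1.9,
      1.9, 1.92, 1.92, 1.93, 1.94, 1.94, 1.94, 1.96, 1.98, 1.99,
      2, 2, 2, 2, 2, 2, 2, 2.03, 2.04, 2.05,
      2.07, 2.09, 2.1, 2.1, 2.16, 2.23, 2.3, 2.3, 2.4, 2.68, 3.1]

def central_value(data, s, n_grid=2000):
    """Mode of the global density dG/d(log z): a gnostical central value."""
    data = np.asarray(data, dtype=float)
    log_z = np.linspace(np.log(data.min()) - 0.5, np.log(data.max()) + 0.5, n_grid)
    z = np.exp(log_z)
    _, gdf = gnostic_curves(data, z, s=s)    # GDF from the first sketch
    density = np.gradient(gdf, log_z)        # numerical dG/d(log z)
    return float(z[int(np.argmax(density))])

s_hat = best_scale(eu)                       # robust scale estimate (second sketch)
print(round(central_value(eu, s_hat), 3))    # estimate of the Eu central value, ppm
```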
Even though the peak in Fig. 1 appears clear, the distribution curve was recalculated using the LDF model, which should distinguish local clusters in a data file. The result, presented in Fig. 2, shows three small maxima, obviously corresponding to the values 1.39, 1.4, 2.68 and 3.1 ppm. These may be interpreted as outliers. After their exclusion, the calculation of the global distribution was repeated and the result is presented in Fig. 3. The central value calculated according to the LDF model is 1.980 ± 0.109 ppm; after elimination of the outliers it decreases to 1.970 ppm. The value of 0.109 ppm is the standard deviation derived from the GT calculations. In Table 2 this PL is compared with the various mathematical parameters reported in the AC-E compilation.9

Table 2 Mathematical parameters calculated from 41 results for Eu*

Parameter   N    x_a   M    MG    x_p   x_geo   x_cm   x_g    GCV
Value       41   2     2    1.99  2     1.99    2      1.99   1.98

* The derived RV = 2 ppm.9 N = number of analyses; x_a = arithmetic mean; M = median; MG = Gastwirth median; x_p = preferred mean calculated after ±1 s elimination; x_geo = geometric mean; x_cm = dominant cluster mode; x_g = gamma central value; GCV = gnostical central value. For more information about the statistical parameters, see ref. 9.

The robustness of the GT-derived PL may be demonstrated by the following example. The arithmetic mean x_a and the gnostical central value (GCV) were first calculated from the entire set of 41 analyses. Then a one-step restriction was executed; that is, the values 2.68 and 3.1 ppm were trimmed off and x_a and the GCV were calculated again from the remaining 39 values. A comparison of the calculated results is given in Table 3. The GCV evidently changes considerably less than the arithmetic mean.

Table 3 Comparison of robustness (for definitions of parameters see Table 2)

Number of results   x_a     GCV
41                  2.007   1.983
39                  1.962   1.974

Heterogeneous Data File

We selected a data set with non-homogeneous results, at the same concentration level as europium and with a similar number of analyses. The 40-value file for cobalt from the AC-E CRM9 fulfilled these requirements well. The cobalt values have a great spread, ranging from 0.07 to 10 ppm (see Table 4). The outcome of the calculation of the global distribution from these data is presented in Fig. 4. The broad maximum clearly reflects the wide range of the results reported. As the congruence between the GDF and the empirical distribution function is poor, the data file cannot be considered homogeneous. The data set was therefore recalculated using the LDF model, and the result is shown in Fig. 5. On the local distribution curve, three distinct maxima appear; their locations, as calculated by the GT, correspond to three different concentration levels for cobalt, namely 0.16, 1.39 and 4.9 ppm.

In Table 5 the central values, as reported for AC-E in the compilation9 but split according to the method of analysis used, are given. A comparison with the results given by the GT immediately suggests that the three maxima correspond to the central values of the results obtained by nuclear methods only, by optical spectrometric methods (AAS + AES) and by XRF spectrometry, respectively. It is worth mentioning that the RV chosen was 0.2 ppm and was based on the nuclear methods set of data.
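Before turning to the cobalt data themselves (Table 4 below), here is a sketch of how the homogeneity test and the cluster splitting just described could look in code. The unimodality check on the global density and the crude peak search on the local density are our own stand-ins for the corresponding GA2 procedures, built on the earlier sketches.

```python
import numpy as np

def gnostic_density(data, z, s, which="ldf"):
    """Numerical density d(DF)/d(log z) of the local or global DF."""
    ldf, gdf = gnostic_curves(data, z, s=s)    # from the first sketch
    return np.gradient(ldf if which == "ldf" else gdf, np.log(z))

def homogeneity_and_clusters(data, s, n_grid=2000):
    """Return (is_homogeneous, cluster_locations).

    The file is taken as homogeneous if the global density has a single
    interior maximum; the maxima of the local density serve as candidate
    cluster locations (cf. the three Co levels 0.16, 1.39 and 4.9 ppm)."""
    data = np.asarray(data, dtype=float)
    z = np.geomspace(0.5 * data.min(), 2.0 * data.max(), n_grid)

    def peaks(y):
        inner = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
        return z[1:-1][inner]

    g_peaks = peaks(gnostic_density(data, z, s, which="gdf"))
    l_peaks = peaks(gnostic_density(data, z, s, which="ldf"))
    return len(g_peaks) <= 1, l_peaks
```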
Table 4 Results for Co (ppm). For explanation of abbreviations used, see text

Laboratory  Value  Procedure   Laboratory  Value  Procedure   Laboratory  Value  Procedure   Laboratory  Value  Procedure
 1          0.07   EMN         11          0.21   EMN         21          1.65   EMN         31          4.9    DFX
 2          0.091  EMN         12          0.28   EMN         22          2      AAA         32          5      EFX
 3          0.118  EMN         13          0.5    AAA         23          2      ASP         33          6      ASP
 4          0.13   EMN         14          0.6    EFX         24          2      ASP         34          6      BFX
 5          0.14   EMN         15          1      AAA         25          2      BAA         35          6      EFX
 6          0.153  EMN         16          1      ASP         26          2      EFX         36          7      DFX
 7          0.18   ASP         17          1      EFX         27          3      AAA         37          7      EFX
 8          0.2    EMN         18          1.15   EMN         28          3      EFX         38          9.5    EFX
 9          0.2    EMN         19          1.4    AAA         29          4      EFX         39          10     AAA
10          0.21   EMN         20          1.4    EMN         30          4.7    DFX         40          10     EFX

Fig. 4 (not reproduced) Global distribution of the Co data file. A, 10; B, 20; C, 50; D, 95; and E, 99%.

Fig. 5 (not reproduced) Local distribution of the Co data file. A, 10; B, 20; C, 50; D, 95; and E, 99%.

Table 5 Mathematical parameters for the analyses of Co. For explanation of abbreviations used, see text

Method used   N    x_a    M      MG     x_p    x_geo   x_cm   x_g
MN            14   0.43   0.2    0.19   0.16   0.24    0.19   0.19
AA             7   2.84   2      1.82   1.65   1.88    1.38   1.73
SP             5   2.24   2      1.7    1.3    1.34    -      1.61
FX            14   5.05   4.95   4.98   5.29   4.01    4.87   4.96
Total         40   2.69   1.53   1.69   1.45   1.15    0.17   1.4
GCV (entire file): 0.16-1.39-4.90

From the analytical point of view, it seems surprising that such large discrepancies in analytical data may occur for a frequently determined element such as cobalt. One should nevertheless bear in mind that AC-E was specially prepared as a CRM for the rare earth elements, and the analytical results for cobalt are only a useful by-product of no particular interest. As 0.2 ppm is an unusually low cobalt content in rocks, it is probably well below the concentration range for which the instruments are routinely calibrated. The readings by optical spectrometry and XRF were plausibly evaluated simply by extrapolation to lower concentrations. The greater error by XRF probably reflects the fact that 0.2 ppm of cobalt is closer to the limit of detection of XRF than of optical spectrometry.

However, the most remarkable outcome, in our view, is that the gnostical analyser applied to the entire set of data provided three different results which, in addition, are in fairly good concordance with the statistical analysis applied separately to the results obtained by the different analytical methods. This demonstrates the ability of the GT, when applied to non-homogeneous sets of data, to discern separate data files.

Conclusions

We have tried to draw attention to a new, powerful tool for treating analytical data provided by the gnostical theory. This is demonstrated by applications of the gnostical analyser to the data from a collaborative study9 and for deriving recommended values for the CRM rock AC-E. Although the GT cannot provide RVs from insufficient data, it can reveal their 'heterogeneity'. For such sets the GT is highly sensitive and may distinguish the independent files even without any additional information. With respect to PL calculation, it also exhibits high robustness, thus providing a theoretically based, practical substitute for empirically derived robust estimators. The GT is still being developed and its applications are very promising. The program system entitled 'interactive analyzer GA2' has been adapted for use on an IBM PC. Readers interested in the GT should contact P. Kovanic.

References

1 Ellis, P. J., and Steele, T. W., Geostand. Newsl., 1982, 2, 207.
2 Lister, B., Geostand. Newsl., 1984, 7, 171.
3 Abbey, S., Geostand. Newsl., 1988, 9, 241.
4 Abbey, S., paper presented at Geoanalysis 90, Huntsville, Canada, 1990.
5 Abbey, S., Chem. Geol., 1992, 95, 123.
6 Parzen, E., Ann. Math. Stat., 1962, 33, 1065.
7 Baran, R. H., Automatica, 1988, 24, 283.
8 Kovanic, P., Automatica, 1986, 22, 657.
9 Govindaraju, K., Geostand. Newsl., 1987, 11, 203.

Paper 2/02281H
Received May 1, 1992
Accepted August 5, 1992

 
