We must stress again that reporting and interpreting effect sizes in the context of previously reported effects is essential to good research. It enables readers to evaluate the stability of results across samples, designs, and analyses. Reporting effect sizes also informs power analyses and meta-analyses needed in future research. (Wilkinson & Task Force, 1999, p. 599)

The Task Force's reservations about the accept-reject decision about H0, and its insistence on reporting the effect size (Wilkinson & Task Force, 1999, p. 599) and confidence-interval estimates (Wilkinson & Task Force, 1999, p. 599), have to be considered with reference to (a) Meehl's (1967, 1978) distinction between the substantive and statistical hypotheses, (b) what the statistical hypothesis is about, and (c) Tukey's (1960) distinction between making the statistical decision about chance influences and drawing the conceptual conclusion about the substantive hypothesis. As H0 is the hypothesis about chance influences on data, a dichotomous accept-reject decision is all that is required. It is not shown in the Report why psychologists can ignore Meehl's or Tukey's distinction in their methodological discourse.

The main reason for requiring that the effect size be reported is that the information is crucial to meta-analysis. This insistence would be warranted if meta-analysis were a valid way to ascertain the tenability of an explanatory theory. There are, however, conceptual difficulties with meta-analytic approaches (Chow, 1987). For the present discussion, note that 'effect' as a statistical concept refers to (a) the difference between two or more levels of an independent variable or (b) the relation between two or more variables at the statistical level. Given that different variables are used in the context of diverse tasks in a converging series of experiments (Garner, Hake, & Eriksen, 1956), the effects from diverse experiments are not commensurate, even though the experiments are all ostensibly about the same phenomenon (see Table 5.5 in Chow, 1996, p. 111). It does not make sense to talk about the 'stability of results across samples' when dealing with apples and oranges. Consequently, it is not clear what warrants the assertion that "reporting and interpreting effect sizes in the context of previously reported effects is essential to good research" (Wilkinson & Task Force, 1999, p. 599).

Some Reservations about Statistical Power

The validity of the power-analytic argument is taken for granted in the Report (Wilkinson & Task Force, 1999, p. 596). It may be helpful to consider three issues about the power-analytic approach, namely, that (a) statistical power is a conditional probability, (b) statistical significance and statistical power belong to different levels of abstraction, and (c) the determination of sample size is not a mechanical exercise.

Power Analysis as a Conditional Probability

Statistical power is the complement of β, the probability of a Type II error; that is, power = 1 − β. Statistical power is thus the probability of rejecting H0, given that H0 is false. The probability becomes meaningful only after the decision is made to reject H0. As β is a conditional probability, so is statistical power. How is it possible for such a conditional probability to be an exact probability, namely, "the probability that it will yield statistically significant results" (Cohen, 1987, p. 1; italics added)?
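The conditional character of statistical power can be made concrete with a short simulation. The sketch below (Python with NumPy and SciPy; the population parameters and the assumed effect sizes are hypothetical values chosen for illustration, not quantities supplied by any data) estimates power as the relative frequency with which H0 is rejected in samples drawn from populations in which a specified alternative is stipulated to be true. The figure obtained is meaningful only conditional on that stipulation: change the assumed d and the supposedly 'exact' probability changes with it.

```python
# A minimal sketch of power as a conditional probability, assuming a
# two-sample t-test.  Power = P(reject H0 | a specified alternative is
# true); the standardized effect size d below is a stipulation made by
# the analyst, not something estimated from data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def estimated_power(d, n=20, alpha=0.05, trials=10_000):
    """Estimate P(p < alpha) given that the true standardized effect is d."""
    rejections = 0
    for _ in range(trials):
        control = rng.normal(0.0, 1.0, n)    # population mean 0, sd 1
        treatment = rng.normal(d, 1.0, n)    # population mean d, sd 1
        _, p = stats.ttest_ind(treatment, control)
        rejections += p < alpha
    return rejections / trials

# The 'exact' figure shifts with the conditioning assumption:
for d in (0.2, 0.5, 0.8):
    print(f"assumed d = {d}: estimated power = {estimated_power(d):.2f}")
```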
The Putative Relationship Between Statistical Power and Statistical Significance

Central to the power-analytic approach is the assumption that statistical power is a function of the desired effect size, the sample size, and the alpha level. At the same time, the effect size is commonly defined at the level of the statistical populations underlying the experimental and control conditions (e.g., Cohen's, 1987, d = (μE − μC)/σ). It takes two statistical population distributions to define the effect size. The decision about statistical significance, on the other hand, is made on the basis of a lone theoretical distribution in the case of the t-test (viz., the sampling distribution of the differences between two means). Moreover, the sampling distribution of differences is at a level more abstract than the distributions of the two statistical populations underlying the experimental and control conditions. Consequently, it is impossible to represent both alpha and statistical power correctly at the same level of abstraction (Chow, 1991, 1996, 1998). Should psychologists be oblivious to the 'disparate levels of abstraction' difficulty noted above?

Sample-size Determination

It is asserted in the Report that using the power-analytic procedure to determine the sample size would stimulate the researcher "to take seriously prior research and theory" (Wilkinson & Task Force, 1999, p. 586). This is not possible, even leaving aside the 'disparate levels of abstraction' difficulty for the moment. A crucial element in determining the sample size with reference to statistical power is the 'desired effect size.' At the same time, it is a common power-analytic practice to appeal to "a range of reasonable alpha values and effect sizes" (Wilkinson & Task Force, 1999, p. 597). Such a range typically consists of ten to fourteen effect sizes. Apart from psychological laws qua functional relationships between two or more variables, theories in psychology are qualitative explanatory theories. These explanatory theories are speculative statements about hypothetical mechanisms. Power analysts have never shown how subtle conceptual differences among such qualitative theories may be faithfully represented by their limited range of ten or so 'reasonable' effect sizes. Furthermore, concerns about statistical significance are ultimately concerns about data stability and the exclusion of chance influences as an explanation. These issues cannot be settled mechanically in the way depicted in power analysis. The putative relationships among effect size, statistical power, and sample size bring us to the putative dependence of statistical significance on sample size.

The Relationship Between Statistical Significance and Sample Size Examined

It is taken as a truism in the Report that statistical significance depends on sample size. Yet there is neither empirical evidence nor any analytical reason for saying that "statistical tests depend on sample size" (Wilkinson & Task Force, 1999, p. 598). Consider the assertion, "as sample size increases, the tests often will reject innocuous assumptions" (Wilkinson & Task Force, 1999, p. 598), with reference to Table 3. Suppose that the result of the one-tailed, independent-sample t-test with df = 8 (i.e., n1 = n2 = 5) is 1.58. It is not significant at the .05 level with reference to the critical value of 1.86. When the sample size is increased to n1 = n2 = 75, the df becomes 148. In order for the 'sample size-dependent significance' assertion to be true, the calculated t must become larger than 1.58 when the sample size is increased from n1 = n2 = 5 to n1 = n2 = 75. Even if there is no change in the calculated t when the sample size is increased to 75, the calculated t should become larger still when the sample size is increased to n1 = n2 = 750. Otherwise, increasing the sample size would not make the result significant if the t-ratio remains at 1.58. Six simulation trials were carried out to test the 'sample size-dependent significance' thesis as follows.
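The form of such a trial can be sketched in a few lines (Python with NumPy and SciPy; the population mean and standard deviation are arbitrary illustrative values, and the sketch is a reconstruction of the logic rather than of the six trials themselves). Both samples are drawn from a single population, so that H0 is true by construction, and the calculated t is compared with the one-tailed .05 critical value as n1 = n2 grows from 5 through 75 to 750.

```python
# A sketch of one simulation trial bearing on the 'sample size-dependent
# significance' thesis.  Both samples come from the SAME population
# (mean 50, sd 10 -- arbitrary values), so H0 is true by construction.
# If significance depended on sample size as such, the calculated t
# ought to overtake the critical value as n1 = n2 grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05

for n in (5, 75, 750):
    group1 = rng.normal(50, 10, n)
    group2 = rng.normal(50, 10, n)
    t, _ = stats.ttest_ind(group1, group2)
    df = 2 * n - 2
    critical = stats.t.ppf(1 - alpha, df)    # one-tailed critical value
    print(f"n1 = n2 = {n:4d}: calculated t = {t:6.3f}, "
          f"critical t(.05, df = {df}) = {critical:.3f}")
```

Because the calculated t follows the t distribution whenever H0 is true, repeated runs show it fluctuating around zero rather than climbing past the shrinking critical value (1.86 at df = 8, approximately 1.66 at df = 148); increasing the sample size does not, by itself, manufacture significance.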
Conclusions

Many of the observations made about psychologists' research practice would assume a more benign complexion if theoretical relevance and some subtle distinctions were taken into account. For example, the evidential support for the experimenter's expectancy effects has to be reconsidered once the distinction between meta-experiment and experiment is made. It is necessary for power analysts to resolve the 'disparate levels of abstraction' difficulty and to explain how a conditional probability may be used as an exact probability. Despite what is said in the Report, it is hoped that non-psychologist readers have a better opinion of psychologists' methodological sophistication, conceptual rigor, and intellectual integrity.