Validation of Scoring for the Pronunciation Category in Korean Speaking Performance Evaluation (한국어 말하기 수행 평가의 발음 범주 채점에 대한 타당성 검증)

Editor: 金一助教 · Updated: 2017-04-27

The primary goal of speaking performance evaluation is to predict how well an examinee can speak in real-life communication, based on the scores obtained when that ability is measured. To that end, every element of the evaluation, including the construct, the measurement process, and the measurement results (scores), must be valid. What, then, does 'valid evaluation' actually mean? Since the various kinds of 'validity' were unified under 'construct validity', the validity of an educational evaluation has been defined as the extent to which its scores accurately indicate the level of an examinee's linguistic knowledge or ability (the construct). This change in the concept of validity has drawn attention to how the validity of an evaluation should be verified. Moreover, since Messick's model was applied to language evaluation in the 1990s, the types of evidence used to support validity have diversified, and the verification of validity (hereinafter 'validation') has come to be recognized as a process of reaching an overall conclusion from all of the evidence collected. According to Weir (2004), validation of speaking performance evaluation comprises five components: 'theory-based validity', 'context validity', 'scoring validity', 'consequential validity', and 'criterion-related validity'. Following Weir's framework, this study carried out both theoretical and empirical validation, examining 'theory-based validity' and 'scoring validity' with a focus solely on the pronunciation category of speaking performance evaluation. The aim throughout was to propose valid rating methods for measuring examinees' pronunciation ability. 
First, as part of the theory-based validation, the study reviewed well-known communication models in foreign language education and examined the status of the 'pronunciation' category in research on speaking evaluation in Korean language education. This review confirmed that pronunciation is a critical category that must be included as an independent component of speaking performance evaluation. It also showed that the definitions of the 'pronunciation' category proposed so far in Korean language education research are vague, and that the constructs to be rated under pronunciation, and the methods for evaluating them, are inconsistent from one study to another. Against this backdrop, after examining the pronunciation-related constructs used in pronunciation teaching and evaluation in the foreign language field, the study selected 'segments', 'suprasegmentals', 'speech rate', and 'pause' as the constructs for rating pronunciation in Korean speaking evaluation, with 'phoneme', 'syllable', and 'phonological change' included under 'segments' and 'intonation' under 'suprasegmentals'. Next, the study proposed specific rating methods for these constructs through a pre-scoring validation process and then validated them objectively through a post-scoring validation process based on quantitative analysis. In the pre-scoring validation, the rating criteria, raters, tasks, rating methods, and rating scale were each examined through theory-based validation. Based on the results, the study argues that 'accuracy' and 'fluency' can be used selectively as rating criteria for pronunciation, depending on the circumstances and needs of the evaluation. 
Because the rater is one of the factors with the greatest impact on evaluation results, the study also reaffirms that raters should be experienced both in teaching Korean and in rating speaking performance. Since the evaluation targets general rather than purpose-specific speaking performance, 'construct-based tasks' are proposed as appropriate for rating pronunciation ability, and these tasks should elicit speech samples long enough and of sufficient quality for a sound rating. For rating 'segments' and 'suprasegmentals' by 'accuracy', an 'analytic rating method' is proposed, while a 'holistic rating method' is proposed for rating the same two constructs by 'intelligibility'. A 'holistic rating method' is likewise recommended for rating 'speech rate' and 'pause' by 'phonological fluency'. Finally, a 6-point Likert scale is proposed as the rating scale. In the post-scoring stage, a computer-based speaking performance test was administered, and seven raters scored the responses using the proposed rating methods. The resulting rating data were then validated using multi-facet Rasch analysis and generalizability theory. The multi-facet Rasch analysis identified two raters as unsuitable, so the analysis was rerun without them. The rerun confirmed that each proposed rating criterion functioned independently and effectively for its construct, and that the raters' scores showed both inter-rater and intra-rater consistency. Rater 'severity', however, varied from one rater to another. 
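The abstract does not give the model specification used in the multi-facet Rasch analysis. As a minimal sketch, assuming the common rating-scale form of the many-facet Rasch model with four facets (examinee ability, task difficulty, criterion difficulty, rater severity) and one shared set of step thresholds for the 6-point scale, the log-odds of a rating in category k rather than k-1 is ability - task - criterion - severity - F_k. The Python below computes category probabilities, expected ratings, and the infit/outfit mean-square statistics commonly used to flag misfitting raters. All parameter values and function names are hypothetical; real estimates come from dedicated software such as FACETS, and this is not the study's own procedure.

```python
import math

def category_probabilities(ability, task_difficulty, criterion_difficulty,
                           rater_severity, thresholds):
    """P(rating = 1..K) under a rating-scale many-facet Rasch model.

    With 0-based category index k >= 1:
    log(P[k] / P[k-1]) = eta - thresholds[k-1], all values in logits.
    A 6-point scale therefore needs 5 thresholds.
    """
    eta = ability - task_difficulty - criterion_difficulty - rater_severity
    # Log-numerators: 0 for the lowest category, then cumulative (eta - F_k).
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + eta - f)
    # Normalise; subtracting the max first keeps exp() from overflowing.
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def expected_and_variance(probs, lowest=1):
    """Model-expected rating and its variance for one observation."""
    mean = sum((lowest + k) * p for k, p in enumerate(probs))
    var = sum((lowest + k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def fit_statistics(observations):
    """Infit and outfit mean-squares for one rater.

    observations: list of (observed_rating, category_probabilities) pairs.
    Values near 1.0 indicate ratings consistent with the model.
    """
    sq_resid, variances, z_sq = [], [], []
    for observed, probs in observations:
        mean, var = expected_and_variance(probs)
        sq_resid.append((observed - mean) ** 2)
        variances.append(var)
        z_sq.append((observed - mean) ** 2 / var)
    infit = sum(sq_resid) / sum(variances)  # information-weighted
    outfit = sum(z_sq) / len(z_sq)          # unweighted, outlier-sensitive
    return infit, outfit

# Hypothetical values in logits, not estimates from the study.
THRESHOLDS = [-2.0, -1.0, 0.0, 1.0, 2.0]
lenient = category_probabilities(0.5, 0.0, 0.0, -0.5, THRESHOLDS)
severe = category_probabilities(0.5, 0.0, 0.0, 0.5, THRESHOLDS)
print(expected_and_variance(lenient)[0], expected_and_variance(severe)[0])
```

With these made-up values the severe rater's expected rating for the same examinee is lower than the lenient rater's, which is the sense in which severity varies between raters. Raters whose infit or outfit falls far from 1.0 are the kind a multi-facet analysis would flag; the abstract does not say which cut-offs the study applied.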
The difficulty of the rating criteria increased in the following order: 'suprasegmental rating by accuracy' → 'segment rating by accuracy' → 'speech rate and pause rating by phonological fluency' → 'segment and suprasegmental rating by intelligibility'. As for the tasks, 'read aloud', 'picture description', and 'narration' each functioned independently and effectively, with statistically significant results regardless of their level of difficulty. The category intervals of the 6-point Likert scale were roughly the same across raters, indicating that the raters shared a common understanding of how the scale worked. Next, the ratings of the five remaining raters were validated using generalizability theory. The generalizability study (G study) showed that the largest source of score variance was the 'examinee' facet. Because the scores are meant to reflect differences in examinees' pronunciation ability, this is a desirable result rather than a cause for concern. Finally, a decision study (D study) was conducted to estimate the generalizability coefficients of the scores. It showed that when two raters score three or more tasks, generalizability coefficients of .90 or higher can be obtained; in other words, under these conditions the proposed rating methods yield consistent scores even when different raters do the scoring. The D study also showed that increasing the number of raters raised the generalizability coefficient more than increasing the number of tasks did.
In sum, this study narrowed the scope of speaking performance evaluation to pronunciation in order to propose valid rating methods, and carried out validation procedures to verify the validity of those methods. 
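The D-study arithmetic can be illustrated with the standard relative generalizability coefficient for a fully crossed design in which every examinee (p) is scored by every rater (r) on every task (t): Eρ² = σ²(p) / [σ²(p) + σ²(pr)/n_r + σ²(pt)/n_t + σ²(prt,e)/(n_r·n_t)]. The abstract does not report the design or the variance components, so the crossed p × r × t design, the function name, and all variance values below are assumptions chosen only to show how the coefficient responds to adding raters or tasks; they are not the study's estimates.

```python
from itertools import product

def g_coefficient(var_p, var_pr, var_pt, var_prt_e, n_raters, n_tasks):
    """Relative G coefficient (E rho^2) for a crossed p x r x t D study.

    Only interactions with the examinee facet enter the relative error;
    rater and task main effects would matter only for the absolute (phi) index.
    """
    relative_error = (var_pr / n_raters
                      + var_pt / n_tasks
                      + var_prt_e / (n_raters * n_tasks))
    return var_p / (var_p + relative_error)

# Hypothetical G-study variance components (not from the study).
# Examinee-by-rater variance is deliberately set larger than
# examinee-by-task variance, so adding raters helps more than adding tasks.
COMPONENTS = dict(var_p=0.80, var_pr=0.10, var_pt=0.02, var_prt_e=0.12)

for n_r, n_t in product([1, 2, 3], [1, 2, 3, 4]):
    g = g_coefficient(n_raters=n_r, n_tasks=n_t, **COMPONENTS)
    print(f"raters={n_r} tasks={n_t} E(rho^2)={g:.3f}")
```

Under these made-up components the coefficient rises more when a third rater is added than when a fourth task is added. With different components (for example, a large examinee-by-task variance) the ranking would reverse, so the study's conclusion rests on its own estimates.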
Although the scope of the study is limited, the validation procedures it used are expected to be applicable to other evaluation categories as well. Moreover, if qualitative analysis is conducted alongside the theory-based validation and quantitative analysis used in this study, the approaches should complement one another as reference data for developing speaking performance evaluations and designing rater training programs.
