The main goal of this study was to examine the power and robustness of nine DIF assessment procedures under the graded response model: SIB-ST, SIB-SP, SIB-PA, IRT-LR-ST, IRT-LR-SP, IRT-LR-PA, DFIT-ST, DFIT-SP, and DFIT-PA. Four independent variables were manipulated in a Monte Carlo simulation study: sample size, DIF pattern, percentage of DIF items, and DIF severity. The dependent variables were the Type I error rate and the power of DIF detection.

The main conclusions of this study are summarized as follows.

1. For all nine DIF methods, both power and the Type I error rate increased as the sample size increased. The SP and PA procedures effectively helped control Type I error and improve detection power.
2. Across levels of DIF severity, all nine methods performed better under moderate DIF than under either of the two light DIF conditions, with either uniform or mixed DIF items.
3. For all nine procedures, both power and the Type I error rate increased as the number of DIF items in the test increased.
4. Considering power and Type I error together, IRT-LR with the scale purification procedure and the DFTD strategy performed better than the other methods.
5. Methods using the scale purification procedure and the DFTD strategy outperformed the standard procedure when the percentage of DIF items and the sample size were large; when the number of DIF items and the sample size were small, the standard procedure performed better.
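To make the two dependent variables concrete, the Python sketch below shows how a Monte Carlo study of this kind might generate polytomous responses under Samejima's graded response model and then score the Type I error rate and power across replications. The function names, parameter values, the threshold shift used to mimic uniform DIF, and the choice to record each replication's result as a set of flagged item indices are all illustrative assumptions, not the study's own code; the DIF tests themselves (SIB, IRT-LR, DFIT) are not implemented here.

```python
import math
import random


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under Samejima's graded response model.

    `thresholds` are the between-category locations b_1 < ... < b_m; the item
    has m + 1 ordered categories scored 0..m.
    """
    # Boundary curves P*(X >= k) = 1 / (1 + exp(-a * (theta - b_k))),
    # padded with P*(X >= 0) = 1 and P*(X >= m + 1) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # P(X = k) is the difference between adjacent boundary curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def simulate_grm_response(theta, a, thresholds, rng):
    """Draw one polytomous response (0..m) by inverse-CDF sampling."""
    u = rng.random()
    running = 0.0
    probs = grm_category_probs(theta, a, thresholds)
    for k, p in enumerate(probs):
        running += p
        if u < running:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def type_i_error_and_power(flags_per_rep, dif_items, n_items):
    """Empirical Type I error rate and power across Monte Carlo replications.

    flags_per_rep: one set per replication holding the indices of the items a
    DIF procedure flagged. dif_items: indices of items simulated with DIF.
    """
    reps = len(flags_per_rep)
    dif = set(dif_items)
    non_dif = set(range(n_items)) - dif
    # Type I error: share of truly DIF-free items that were flagged.
    false_pos = sum(len(flags & non_dif) for flags in flags_per_rep)
    # Power: share of truly DIF items that were flagged.
    true_pos = sum(len(flags & dif) for flags in flags_per_rep)
    return false_pos / (len(non_dif) * reps), true_pos / (len(dif) * reps)


if __name__ == "__main__":
    rng = random.Random(2024)
    # Hypothetical 4-category item; shifting every focal-group threshold by 0.25
    # is one common way to simulate uniform DIF (an assumption, not the study's design).
    a, b_ref = 1.2, [-1.0, 0.0, 1.0]
    b_focal = [b + 0.25 for b in b_ref]
    ref = [simulate_grm_response(rng.gauss(0, 1), a, b_ref, rng) for _ in range(500)]
    foc = [simulate_grm_response(rng.gauss(0, 1), a, b_focal, rng) for _ in range(500)]
    print("mean score (ref, focal):", sum(ref) / len(ref), sum(foc) / len(foc))

    # Four hypothetical replications on a 10-item test where items 0 and 1 carry DIF.
    flags = [{0, 1, 5}, {0}, {1, 7}, {0, 1}]
    print(type_i_error_and_power(flags, {0, 1}, 10))  # (0.0625, 0.75)
```

In the example, 2 of the 32 DIF-free item-replications are flagged (Type I error 0.0625) and 6 of the 8 DIF item-replications are flagged (power 0.75). Keeping flagged items as sets makes it straightforward to compare the ST, SP, and PA variants of a method on the same replications.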