Overview

This study explores the relationship between English writing fluency and grades at Minerva while accounting for other factors, such as knowledge level and reading comprehension. Its aim is to create a framework for assessing writing fluency and for examining current institutional practices. We used all poll response data from 2019 and engineered linguistic features to compute a set of fluency metrics for each poll response. We then examined the relationship between grades and fluency, and the difference in grades between low- and high-fluency polls across colleges. Finally, we used the fluency metrics as inputs to machine-learning models that predict grades.

Dataset

After cleaning, the dataset contains 179,790 rows covering all poll responses at Minerva University in 2019. Each row includes a Poll ID, Poll Response, Learning Objective tags (HC/LO), and Poll Grade. Notably, 80% of the students at Minerva University are non-native English speakers. Each response is limited to 300 words and written in under 5 minutes, and grades are on a scale from 1 to 5. Student IDs were excluded from the dataset to protect student identities. The data are grouped into four colleges: Computer Science (CS), Natural Science (NS), Social Science (SS), and Arts and Humanities (AH).

Method

The study used feature engineering and Natural Language Processing (NLP) analysis to generate fluency metrics, such as readability scores, lexical richness metrics, and sentence-to-vector embeddings. Hypothesis testing was conducted to examine correlations between grades and fluency for coding- and writing-related data across the four colleges. Grade difference estimation was used to compare polls in the top 20% and bottom 20% of fluency. Finally, XGBoost and a neural network were trained to predict grades from fluency scores.
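To make the idea of a per-response fluency metric concrete, the following is a minimal sketch of two common measures: a type-token ratio (a simple lexical richness metric) and the Flesch Reading Ease score (a standard readability formula). The study does not state which specific metrics or libraries it used, so the function names, the regex tokenizer, and the vowel-group syllable heuristic here are all illustrative assumptions rather than the authors' pipeline.

```python
import re

def tokenize(text):
    # Assumption: a simple regex tokenizer; the study's tokenizer is not specified.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def split_sentences(text):
    # Naive split on sentence-ending punctuation; empty fragments are dropped.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def count_syllables(word):
    # Crude heuristic (assumption): one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def fluency_metrics(text):
    """Return illustrative fluency metrics for one poll response."""
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        return {"type_token_ratio": 0.0,
                "avg_sentence_length": 0.0,
                "flesch_reading_ease": 0.0}

    n_words = len(words)
    n_syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = n_words / len(sentences)

    # Lexical richness: share of distinct words among all words.
    type_token_ratio = len(set(words)) / n_words

    # Flesch Reading Ease: higher values indicate easier text.
    flesch = (206.835
              - 1.015 * avg_sentence_length
              - 84.6 * (n_syllables / n_words))

    return {"type_token_ratio": type_token_ratio,
            "avg_sentence_length": avg_sentence_length,
            "flesch_reading_ease": flesch}
```

Because responses are capped at 300 words, length-sensitive measures such as the type-token ratio would be roughly comparable across responses, although very short answers can still inflate it.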

Results

The study found positive and significant grade-fluency correlations for all colleges, with the weakest for the Computer Science college (Arts and Humanities: 0.083, Social Science: 0.079, Natural Science: 0.075, Computer Science: 0.013). Grades also differed between high- and low-fluency polls, with the smallest difference in the Computer Science college (0.04) compared to the other colleges (Arts and Humanities: 0.16, Social Science: 0.16, Natural Science: 0.14). Machine-learning prediction achieved an accuracy of 47.09%. However, a causal interpretation of the relationship between writing fluency and grades remains challenging.
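For readers who want to see how figures of this kind can be produced, the sketch below computes a correlation coefficient between fluency and grade, and the mean grade gap between the top and bottom 20% of polls ranked by fluency. It assumes a Pearson correlation and a simple difference of means; the study does not specify which correlation statistic or estimator it used, and the function names are hypothetical. Significance testing is omitted.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient (assumed statistic, not confirmed by the study)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def grade_gap(fluency, grades, fraction=0.2):
    """Mean grade of the top `fraction` of polls by fluency minus that of the bottom `fraction`."""
    # Rank polls by fluency score only, lowest first.
    ranked = sorted(zip(fluency, grades), key=lambda pair: pair[0])
    k = max(1, int(len(ranked) * fraction))
    bottom = [g for _, g in ranked[:k]]
    top = [g for _, g in ranked[-k:]]
    return mean(top) - mean(bottom)
```

In practice these functions would be applied separately to each college's polls, which yields one coefficient and one grade gap per college.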