NSF Awards: 1712423
2019 (see original presentation & discussion)
Undergraduate, Graduate
How can we provide students with more opportunities to practice meaningful scientific writing? How do we move beyond copy editing and provide students with tailored, meaningful feedback that builds deep, transferable communication skills? Are these goals even possible in classes with dozens or hundreds of enrolled students? Yes, we can, and yes, it is possible. Our process for teaching scientific writing brings together student annotation exercises, automated analysis of draft texts, and holistic grading and feedback with text analytics methods from data science. Our approach is designed so multiple instructors can implement it simultaneously, simplifying scale-up to large courses. The exercises and text analytics provide us with two important kinds of data. First, we get benchmark data and actionable formative assessments of students' development as writers. Second, we get data that lets us monitor instructor performance, compare the performance of instructors (either individually or as a cohort) to program guidelines, and make adjustments to instructor training in real time.
A Daniel Johnson
Teaching Professor
Welcome to our video on the STEM Writing Project!
After working for a couple years "below the radar" with our students and graduate teaching assistants (GTAs), we're really excited to have a chance to start sharing what we are learning about how undergraduates develop as writers. We've also learned a LOT about ways to help students get there sooner.
Don't just watch! We want to hear from you too!
Teachers: What bottlenecks make it hard for you to use writing in your classes, especially in large-enrollment intro courses? What problems do you run into when teaching scientific writing? What data do you wish you could get from your students, but don't know how?
Data Scientists: What data explorations can you envision doing with a datastore of >4000 student writing assignments?
Teaching Assistants: What is your biggest challenge when teaching scientific writing?
Phillip Eaglin, PhD
Founder and CEO
How exciting that you are supporting students to develop their scientific writing skills! We need students to communicate in writing so that others may learn from and verify prior scientific work! Questions: I see that you are using data science to analyze the writing--does that system involve the use of artificial intelligence to make recommendations for improving writing? How do instructors decide when a task will be a writing task versus a non-writing task?
A Daniel Johnson
Teaching Professor
Two great questions.
One advantage we have is that lab reports (which we model on scientific articles) have several well-defined elements that do not require AI to identify. Right now, all of the feedback students are getting is pattern- and rules-based feedback from our automated system. We identified a lot of our search patterns by looking at our own students' writing; for example, we screen for the presence of a hypothesis statement located in the Introduction section by searching for an "if...then" clause. Papers that earned an A or B almost always had it; reports with lower grades usually did not.
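To make that kind of rule concrete, here is a minimal sketch (in Python, purely illustrative; the section names and pattern are simplified assumptions, not our production SAWHET rules) of a check for an "if...then" hypothesis statement in the Introduction:

```python
import re

# Illustrative sketch of a rules-based structural check: does the
# Introduction contain an "if ... then ..." hypothesis clause?
# Section names and the pattern are simplified assumptions.

def has_if_then_hypothesis(report_text: str) -> bool:
    # Pull out the text between the "Introduction" heading and the next section.
    intro = re.search(
        r"introduction(.*?)(?:methods|materials and methods|results)",
        report_text,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if not intro:
        return False
    # Look for "if ... then ..." within a single sentence of the Introduction.
    return bool(re.search(r"\bif\b[^.]{3,200}?\bthen\b", intro.group(1), flags=re.IGNORECASE))

if __name__ == "__main__":
    sample = ("Introduction\nIf the salt concentration increases, then "
              "germination rate will decrease.\nMethods\nSeeds were...")
    print(has_if_then_hypothesis(sample))  # True
```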
Since most students had not read scientific literature before, we found they did not know where most of the pieces went or how to format them. Getting the basics in the right place is where the automated system is helpful, because students can submit as often as they want. Instructors can then focus on deep reading and devote their comments to global issues. All that said, what we're working on right now is a machine-learning algorithm that can "read" the stepwise moves students make and provide even deeper early feedback.
Deciding what to make a writing task was something we committed to as a department. Every student needs to be able to communicate precisely regardless of their future career. Writing regularly with multiple rounds of feedback over the course of 1-2 years is the best way to build that skill. We're fortunate that all students in our 100-level lab courses design and complete several experiments of their own as part of inquiry-based labs. Writing is the natural end point for those experiments, because it's what we do as professionals.
Applying data science to student writing requires thinking about writing tasks a bit differently than most instructors are used to. We tend to think in terms of individual grades; the data science approach requires asking, "What data can I extract from this writing assignment? What can those data tell me about my class as a whole?"
Phillip Eaglin, PhD
Danielle Watt
Marcelo Worsley
Assistant Professor
Thank you for sharing this work. I have two questions.
1. I wanted to know more about your vision for the future of this work. Where do you see it going, and how do you see advances in AI allowing for increasingly complex automated feedback?
2. Can you say a bit more about how students feel about this automated feedback? Have you encountered any concerns with algorithmic transparency or students trying to game the system?
A Daniel Johnson
Teaching Professor
We're very aware that using AI raises concerns about eliminating human teachers, and that's not our goal. What we are using the AI for is to identify the PATTERNS in student responses, so the teacher can target the specific problems their students have. For instance, with our annotation exercise this past semester, one TA found that 12% of their students mis-identified where data interpretation statements go (discussion, not results). This is only a couple students, but misplacing data interpretation is a major error in scientific writing. So the TA stepped back and addressed this misconception again in the class debrief of the annotation.
The data we are collecting about student writing is telling us what the most frequent errors are, so we address those first. That brings us to the automated feedback question; some students do not pay attention to it. We do not pressure them about it, because they find out from peers very quickly that SAWHET would have flagged an error that cost them a good score. Their own desire to get a good grade is the motivator.
Students complained the first couple of semesters about the upload process, but we had surprisingly little negative response to the feedback itself. When we asked students, they said they already got automated feedback during their online activities, so it did not feel intrusive.
We tell students on the first day of a class that uses this system that they are welcome to ask us how their feedback is being generated, and we have a dedicated email address for doing so. The fact that we offer transparency seems to work well, because in 4 years we have had fewer than a dozen questions. It also helps that we tell students in their feedback that SAWHET is a computer system and might be wrong. The final decision rests with their TA, and if they have questions, they should talk to their TA. This reiterates our belief that a human has final decision-making authority about grades.
A couple students have tried to game the system, but doing so takes MUCH more time and effort than they would spend following instructions and writing a good report. We count on this "time tax" to limit gaming. Having a live person at the end also limits it.
Danielle Watt
Anya Goodman
I also share the excitement of others on this thread about the project! Thank you for sharing and tackling a much-needed issue (scientific writing skills for undergrads). I cannot imagine that AI will ever replace human feedback (old-fashioned thinking?), but I wonder if building in tiers of feedback, e.g., a bins-based system for the first pass, followed by peer review, would allow the process to scale? Have you tried comparing peer feedback with the automated feedback system, both in terms of building skills and in terms of affective outcomes?
A Daniel Johnson
Teaching Professor
We agree that AI will not replace the human at the end; it is a tool for getting more information so we can help students master basic scientific writing skills sooner. What you describe as tiers is exactly what we are doing. Students submit reports to SAWHET as many times as they like to get feedback on basic structural issues: I think of them as the "what goes where" and "how do I format X" questions. After the deadline for submission passes, TAs get the most recently submitted version of the report, which is what they score using our bins-based model. Concurrently, students bring copies of the report for peers to review. The peer review and the TA's comments on the submission go back to the student, who has a week to upload a revised version. The REVISED version is what gets graded for credit.
This combination has worked very well at improving student writing. We found that peer feedback did little to improve report scores, and that omissions and basic errors persisted even after peer reviews. Using SAWHET, students literally cannot submit work with missing elements; the intake form flags the errors even before submission. Using this approach, we went from an average of 64% of reports meeting basic criteria with the initial submission to >90% of reports meeting basic criteria in 1 year.
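For anyone curious what "flags the errors even before submission" means in practice, conceptually it is just a completeness check on required elements. Here is a hypothetical sketch (the section list and function name are illustrative, not our actual intake form):

```python
# Hypothetical sketch of the kind of completeness check an intake form could
# run before accepting a submission; the section list is illustrative.

REQUIRED_SECTIONS = ["Abstract", "Introduction", "Methods", "Results",
                     "Discussion", "Citations"]

def missing_sections(report_text: str) -> list[str]:
    """Return the required section headings that never appear in the draft."""
    lowered = report_text.lower()
    return [s for s in REQUIRED_SECTIONS if s.lower() not in lowered]

if __name__ == "__main__":
    draft = "Abstract\n...\nIntroduction\n...\nResults\n...\nDiscussion\n..."
    print(missing_sections(draft))  # ['Methods', 'Citations']
```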
Affective outcomes have been positive too. We compared the number of negative comments about peer review and report grading processes on end-of-course lab evaluations, before vs. after implementing SAWHET. Total number of negative comments related to report grading dropped more than 50%, and negative comments about peer review dropped ~30% (this has been harder to put a firm number on.)
One of the unanticipated benefits of SAWHET and bins-based grading is that the remaining negative comments have shifted away from the instructors to the grading model. In the past, we routinely got "my TA is harder/more unfair than every other TA" comments for nearly every TA. That shift puts the TAs in a positive mentoring position, rather than the position of being the judge of students' abilities.
Danielle Watt
Director of Education, Outreach, Diversity
Thank you for sharing your project; science writing/communication is such an integral part of being successful. What are your plans to increase the effectiveness of peer review feedback, i.e., training, peer mentors who completed the course, etc.?
A Daniel Johnson
Teaching Professor
Tackling that is our project for the fall. What we want to try is flipping the order of experience so that students read skill-level-appropriate primary literature first, discuss the findings, then re-analyze the article asking "what is in this section, and what does it accomplish? WHY does it belong there?" We also are testing having the undergraduates annotate their peers' papers using our grading rubric, so we can see what they are saying to one another. The big challenge for this latter approach is how best to collect the data so we can analyze it.
Danielle Watt
Danielle Watt
Director of Education, Outreach, Diversity
I really like the idea of having the students train to review literature and incorporating your grading rubric but I see the challenge you proposed. Good luck with the project!
Gabriela Weaver
This project sounds very promising and addresses an important need. Are there plans for developing faculty training materials to encourage them to use these tools in the most effective ways?
A Daniel Johnson
Teaching Professor
Absolutely. The way we approach writing instruction feels counter-intuitive at first, especially for faculty who are comfortable with points-based grading scales. What we hope is that we can refine the TA training materials we are building right now for fall semester so that faculty can adopt them without direct supervision. That's a bit further down the road.
Danielle Watt
Michelle Quirke
I am very interested in this topic. Will this be published in the future? I'd like to share the project in a journal club with faculty colleagues who are interested in learning more about scientific writing skills.
A Daniel Johnson
Teaching Professor
We are definitely planning to publish. One of my hats is being part of the BioTAP Network and the Association for Biology Laboratory Education; both are interested in seeing this published. We are aiming for a formal publication in the next year or so. In the meantime, anyone interested is welcome to contact me and find out more on an informal basis.
Laura Guertin
I agree that we can always find ways to do more to develop the writing and analytical skills of our students in introductory-level courses - as you mention, this skill is transferable no matter what their intended major. I'm curious to know if you have worked at all with your university's writing studio/writing center, perhaps collaborating with them so they are best positioned to assist students who require additional help/tutoring?
If you are interested in learning more about student writing with audio recording ("audio narratives"), please check out my group's work! https://stemforall2019.videohall.com/presentations/1513
A Daniel Johnson
Teaching Professor
We definitely have been working with our Writing Center. As more students whose first language is not English came onto campus, we had to start making more referrals. At the same time, we built a schematic that shows students "Have this problem? Talk to your TA. Have this problem? Talk to the Writing Center." We also make sure they have a print copy and direct digital access to our writing resource guide and requirements.
I am curious about audio narratives - thanks for pointing me that way!
Rebecca Roberts
Great project. I am curious about how the students respond to the bin method of grading.
A Daniel Johnson
Teaching Professor
First semester we had enormous push-back, in part because we could not predict with certainty what the final averages would be. Turned out that average course grades went UP about 5%. We thought it was inflation in scaling, but we pulled reports and made random comparisons using our old rubric. The scores were higher because the reports had fewer errors and omissions.
Once we had numbers, we started telling students on Day 1 that we use this grading model, why we use it, and that we see higher scores as a result. That eliminated about 1/3 of the complaints between first and second semester. We are in the 6th semester of using bins grading now, and the number of complaints has dropped to about half of the number that we used to get on our old points-based grading rubric. The largest drop was in complaints that a particular TA was unfair or graded too harshly.
Anyone thinking about using bins-based grading needs to be ready for a couple of semesters of very loud student objections. Most of it is because it is different, and students cannot argue for points anymore. After a couple groups pass through, it becomes part of local dept. culture. It helps if you can use it for more than one course in a sequence too.
A Daniel Johnson
Teaching Professor
POP QUIZ / INFORMAL POLL !
How long do you estimate it would take you to grade a lab report?
Our students' reports are 3-4 pages long, with Abstract, Intro, Methods, Results, Discussion, Citations, and data figures and tables.
How much time would you estimate it takes to read and provide meaningful feedback to a 100-level student on a median-quality lab report of that length?
I'll give it a couple days to see who responds, then post one of our findings!
Danielle Watt
Director of Education, Outreach, Diversity
I would say 30 minutes
A Daniel Johnson
Teaching Professor
As I told Erin Kraal below, our TAs averaged 35 minutes per report pre-SAWHET/bins. Currently, TAs average 12-15 minutes per report. Some of my experienced TAs are down to 8-10 minutes.
We know this is not reporting bias because we compared their estimates to the metadata we extracted from commented Word documents. The timestamps on the comments corroborated their estimates.
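For anyone who wants to try the same cross-check: a .docx file is just a zip archive, and reviewer comments live in word/comments.xml with author and date attributes, so the timestamps can be pulled out with a short script along these lines (a hedged sketch; the file name is illustrative, and this is not our exact extraction code):

```python
import zipfile
import xml.etree.ElementTree as ET

# Sketch of pulling comment timestamps out of a commented .docx file.
# A .docx is a zip archive; comments live in word/comments.xml with
# w:author and w:date attributes (the file only exists if there are comments).

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def comment_timestamps(docx_path: str) -> list[tuple[str, str]]:
    """Return (author, ISO timestamp) pairs for every comment in the file."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/comments.xml"))
    return [(c.get(W_NS + "author", ""), c.get(W_NS + "date", ""))
            for c in root.iter(W_NS + "comment")]

if __name__ == "__main__":
    for author, stamp in comment_timestamps("graded_report.docx"):
        print(author, stamp)
```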
Ellis Bell
I really like the idea that via SAWHET students get help to allow them to revise, etc. After a student has taken a course using SAWHET, what happens to their writing skills in a course that doesn't use it?
A Daniel Johnson
Teaching Professor
We've not looked at that formally, but informally, I sat on several senior honors project committees in our department this past April, and about half said they appreciated how we had taught them writing, and that it made thinking about their research easier. Of those who did not say anything specifically, about 1/3 had elements in their writing and presentation style that I recognized as coming from our first year training program. We cannot claim sole credit, but I hope that we can start looking at longer-term impacts soon.
One random observation from this semester: we teach students to use a slightly unusual citation style that avoids favoring any particular field. I saw several senior students use it in their formal honors presentations. It's like a hidden Easter egg in software, or an inside joke. Not a big thing, but it warms your heart to know we made an impression.
This also is anecdotal, but faculty in our more advanced courses report that they don't have to explain the difference between private (meaning, incomplete) drafts and initial submissions anymore to their students. Juniors and seniors tend to turn in draft work that is more fully developed, so faculty can provide more meaningful feedback.
Sheila Homburger
Thanks for sharing about this valuable project!
I see a potential use for this type of tool in high school education. With the Next Generation Science Standards and three-dimensional learning, students are asked to process complex information and think like scientists. More complex tasks require more complex assessment tools; multiple choice questions are not going to cut it.
I'm part of a team that develops K-12 curriculum materials and conducts research on the materials' efficacy. A unit we developed recently was largely structured around the task of argumentation from evidence. It was very challenging for the team to come up with a way to assess the execution of this skill for well over 2,000 students!
Do you have any plans to go in this direction?
A Daniel Johnson
Teaching Professor
We do not have any immediate plans to develop materials ourselves, but I'd be happy to talk strategies and tools that your team could consider.
Big datasets like the one you have are ideal for training text classifiers and running linguistic analyses. To give you some idea of what is possible, one of our analysis tasks is to read and classify every comment TAs make on student reports each semester, which is 10,000+ comments. By hand that takes about 2 months. We created a text classifier in R (which is free) that can classify the comments in about 15 minutes.
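Our tool is written in R, but the recipe translates directly to other languages. As a rough illustration only (the example comments, labels, and model choice below are made up, not our actual categories or code), a minimal Python analogue using scikit-learn might look like:

```python
# Rough Python analogue of the comment-classifier idea (our actual tool is in R).
# The training comments and labels below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_comments = [
    "Move this interpretation to the Discussion section.",
    "Fix the citation format here.",
    "Nice job connecting your results back to your hypothesis.",
    "This figure needs a caption and axis labels.",
]
train_labels = ["structure", "format", "praise", "format"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_comments, train_labels)

# In practice the model is trained on thousands of hand-coded TA comments,
# then used to label each new semester's comments in minutes.
print(clf.predict(["Please cite your sources in the Methods section."]))
```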
I'd love to talk further. Hit me up by email if you are interested.
Sheila Homburger
Erin Kraal
Daniel - very interesting project! I like how you mention in the video that 'grading' is an ongoing dialog with students. Two questions: How do you see this transferring to other types of scientific writing? Lab reports are only one way we ask students to write about science; in particular, I'm thinking about how this might transfer to large courses that are lecture only (as in no 'lab reports') but want to include a writing assignment. Second, I love the move to bin grading. I've personally moved away from points grading in all assignments and all classes (not just writing) and found it saves a huge amount of time AND allows me to provide meaningful feedback. I'm excited to see data on this type of evaluation. Does your research provide insight into bin grading for other areas?
A Daniel Johnson
Teaching Professor
Thanks for your questions Erin. Transfer to another format takes some work, but is definitely possible. Reports have a single standard format, which makes searching for structural issues easier, but if someone has a format they expect students to follow, they can implement form-based automated feedback. The limiting step is identifying what features to screen for, then writing rules to identify them. Fortunately, some features are pretty universal, like citing sources, or formats for summarizing statistics.
In free response writing the process is more complicated. What we did, and what I would suggest anyone thinking of this do, is collect a set of responses that cover the range of quality/scores. Identify the most important features on which the grade is based, ideally ones that can be scored as present/absent, or yes/partly/no. Those are the targets for automated feedback. We also looked at vocabulary range and readability scores (definitely easier with computational support.) They were not useful for our purposes, but the metrics are well-supported by research, and in a particular situation they may be useful.
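If anyone wants to experiment with those two metrics, both are cheap to compute. A hedged sketch follows; the third-party 'textstat' package is one readability option among several, and the type-token ratio here is only a rough proxy for vocabulary range, not the measure we settled on:

```python
# Illustrative sketch of the two extra metrics mentioned above: vocabulary
# range (approximated as a type-token ratio) and standard readability scores.
# Uses the third-party 'textstat' package (pip install textstat).
import re
import textstat

def type_token_ratio(text: str) -> float:
    """Unique words divided by total words: a rough proxy for vocabulary range."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

sample = ("If the salt concentration increases, then we predict that seed "
          "germination rate will decrease relative to the control treatment.")

print(type_token_ratio(sample))
print(textstat.flesch_reading_ease(sample))   # higher = easier to read
print(textstat.flesch_kincaid_grade(sample))  # approximate U.S. grade level
```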
What we have seen so far from bins grading is that grading time goes down drastically. Prior to using bins, our TAs spent an average of 35 minutes per paper, and had no criteria on which to say "this report does not even have enough content to grade." With bins, average grading time has dropped to 12-15 minutes, and there are clear criteria for handing a report back un-commented. The latter sounds harsh, but we do it very rarely; students know that they have to provide a complete report to get any feedback.
Based on the success with reports, we are now using bins grading for short answer questions on quizzes, notebooks, and oral presentations too.
One thing we found very useful was to remove %s and letter grades from our bins. We use "Acceptable", "Minor revisions needed", "Major revisions needed", and "Unacceptable" as the grading levels.
Kirsten Daehler
I was interested to see how you are working with faculty and undergraduate students around supporting writing in science. I can see how you are supporting students in strengthening their writing through peer feedback. I'm curious, are you noticing any changes in incoming students' writing abilities or experience with peer feedback? We see more and more of this happening in middle school and high school science classrooms with the emphasis of the CCSS-ELA efforts.
We work in K–12 teacher professional learning and have found different challenges associated with supporting writing at various grade levels. Most elementary grade teachers know a whole lot about supporting writing, but want to know more about supporting disciplinary writing in science. At the secondary level, we see the reverse with science teachers not having formal training to support literacy skills in their classrooms. We've partnered with the National Writing Project (NWP) and integrated "Writers Groups" into the professional learning we do with teachers to model how they can incorporate writing into their own classrooms. One big ah-ha for teachers is that there is no such thing as "The Writing Process" (just as there is no such thing as "The Scientific Method"), but rather many different approaches individuals take to writing.
Feel free to check out our video and continue the conversation. There is some good discussion about the integration of science and writing in the discussion thread. https://stemforall2019.videohall.com/presentations/1519