August 21, 2017
“We're doing this because we would like to get students that wouldn't have the opportunity otherwise to think about software, to learn about open source software, and then potentially be able to use that in either graduate school or in industry.” – Daniel S. Katz, PI
Ten undergraduate students were on campus this summer to participate in NCSA’s new 3-year Research Experiences for Undergraduates (REU) program, INCLUSION (Incubating a New Community of Leaders Using Software, Inclusion, Innovation, Interdisciplinary and OpeN-Science), funded through the NSF’s Office of Advanced Cyberinfrastructure. The goals of the REU are to enable ten undergraduate students each year to develop software and contribute to software projects, specifically Open Source Software projects; to make the population of software developers more diverse; and to foster cross-disciplinary collaboration, with each project led by two mentors from different disciplines. Student participants gained skills they hope to use in the future: they learned about Open Source Software and programming, learned how to present research, and built relationships and networked.
Daniel S. Katz, NCSA's Assistant Director for Scientific Software & Applications, Research Professor (iSchool, Electrical and Computer Engineering, and Computer Science), and PI of the grant, defines open source software as free software, but adds a caveat: free can have different definitions. Free could mean getting something free and clear, like free beer; a free puppy, on the other hand, has strings attached: one actually has an obligation after getting it. Katz likens open source software to the second, which, he says, “only works if people don't just take from it, but they also contribute to it over time. That's part of what we're trying to encourage, is to build up some skills in the next generation of undergraduates that may go on to graduate school or may go to industry.”
According to Katz, one of the main tools people use to collaborate on and write open source software is a website called GitHub. Git is a version control tool that lets people write and work on software collaboratively; GitHub is a development platform built around it that provides tools and features to foster collaboration among software developers.
Katz explains that the REU’s first goal was to help students grasp that software is an important knowledge product, and to be able to understand, write, and read software code. The students were to create their own software; contribute to existing software; or test software, improve it, or provide documentation and user support. But like the software itself, any contributions they made to the projects were free; “there’s no money involved,” he explains.
Besides being an acronym, the REU’s name, INCLUSION, refers to its second goal—the inclusion of students from underrepresented groups in an effort to increase diversity in software development. For instance, recent survey data identified the demographics of people contributing to software on GitHub’s website as “overwhelmingly white and male,” Katz reports.
How’d they do on meeting their inclusion goal in 2017? While six of the ten participants (60%) were women, the percentage of students of underrepresented ethnicity was low (two Latinas). And while in some fields women might not be considered underrepresented, in STEM, “women certainly are an underrepresented group,” Katz explains. He acknowledges that while they didn’t meet their goals 100%, “the goals that we had are still good goals,” and says, “I would like to see us get more underrepresented groups that are including women, but not exclusively women. That's probably the biggest challenge for the next two years.”
The REU’s third goal is related to collaboration. “Because open source software is a community activity, we wanted to focus on or emphasize people working together.” So they strove to have pairs of students working with pairs of mentors, which they weren’t always able to do. Sometimes they had two mentors, but not two students. In some cases, INCLUSION students were paired with other students already in the lab. Katz is hopeful that in the 2018 REU, some 2017 students will come back and they’ll be able to better implement the model described in their proposal: a more senior student helping a junior student.
Students were matched with their mentors via a mutual ranking system. First, pairs of interested NCSA mentors in different disciplines who were willing to work with a pair of students were recruited to participate. When students applied to the REU, they ranked the three projects in which they were most interested. Then, the mentors looked at the student applications and ranked the students that interested them the most. Olena Kindratenko, INCLUSION co-PI and project coordinator, then made the best matches possible.
Katz admits that while the REU mentors support the REU’s diversity goals, they also are motivated by “more specific goals…Partly, they'd like something useful to come out as a part of their project. Partly, they would like to have some new candidates for new graduate students that might work with them in the future.” The program and the mentors have to balance their desires for students who have already developed software skills and can contribute to projects immediately, perhaps due to advantages they’ve had in their lives, with training less advantaged students, who may be equally or more capable after being trained.
Along with coding software, students gained skills related to reporting on their research. SROP’s graduate student liaison, Nicole Jackson, gave them regular, focused help: she met with the REU students once a week and also worked with them electronically, helping them write research proposals, their final papers, and the posters they presented at the Illinois Summer Research Symposium near the end of the summer.
Katz credits NCSA's five-year-old SPIN program (Students Pushing Innovation) for many of the REU’s components: the students coming on campus for summer research, working with mentors, and ranking their research preferences when applying. One difference? SPIN is for Illinois students; the REU is primarily for students who are not from Illinois.
One such student was Alex Dickinson, a rising junior in computer science at University of California, Irvine, who chose INCLUSION because of Illinois’ strong CS program. Dickinson worked with mentor Prof. Victoria Stodden (iSchool, Law, Statistics, and Computer Science) and her NCSA postdoc Matthew Krafczyk.
Dickinson’s research was to assess the quality of code used in computational physics research, which he said has implications for the replicability of the studies with which the code was associated. So he examined code from computational physics papers, trying to get it to output the results in the corresponding paper. Of the nine papers he looked at, “The results aren't good,” he admits. “None of them achieved full replication of the paper.”
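The kind of check Dickinson describes, rerunning a paper's code and comparing its output with the published results, can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in the project: the function name `replicates`, the tolerance, and every number below are hypothetical and were invented for the example.

```python
import math


def replicates(published, reproduced, rel_tol=1e-6):
    """Return True if the rerun reports the same quantities as the paper
    and every value agrees within the relative tolerance."""
    if published.keys() != reproduced.keys():
        return False
    return all(
        math.isclose(published[name], reproduced[name], rel_tol=rel_tol)
        for name in published
    )


# Hypothetical values, invented purely for illustration.
published = {"energy": 1.2345, "flux": 0.5000}
rerun_ok = {"energy": 1.2345000001, "flux": 0.5000}
rerun_bad = {"energy": 1.2400, "flux": 0.5000}

print(replicates(published, rerun_ok))
print(replicates(published, rerun_bad))
```

In practice the hard part is usually getting the code to build and run at all; a comparison like this only becomes possible once it does.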
He says the authors of the papers wrote most of the code from scratch in C++ or Fortran. “But the C++ code was kind of a mess,” he acknowledges. “I could get it to work, but it took quite a bit of effort.”
Dickinson plans to use what he’s learned about Linux and the command line in the future. “So that's very useful information.” He adds that coming to Illinois, and “having that name on your resume is very useful due to the high ranking. This has definitely been a worthwhile experience.”
While his experience hasn’t changed the trajectory of his future plans, he thinks it will be advantageous: “The stronger my resume is, the more options I'll have. I'm pretty happy with the way this has played out. I've met some interesting people, and I'm sure that they will benefit me somehow in the future.”
Teamed up with Dickinson on his project was clinical psychology major Geraldine Padilla Sainez. A senior at Illinois, she applied to the INCLUSION program because it specified no previous knowledge of computer programming was needed, and she needed a research opportunity on her resume for graduate school. She confesses that at the beginning, it was “a little bit scary, seeing that everyone else was an engineer, and I was not. But I just kept with it and, luckily, I learned new skills that I will use eventually. Although they're not perfect, I'm still working through them, learning, coding programs, and languages. It's a fun experience for me.”
Sainez’ research was to answer the following questions about authors who have published articles: How do different institutions, regions, a person's job and tenure affect the availability of code and data in their articles? While she didn’t have results yet at the time of this interview, she expected to use R to create charts showing which regions have the most authors with the most code and data available in their articles or showing the relationship between tenure and the amount of code in the articles.
Sainez thinks that a lot of what she learned this summer should be helpful down the road. Coding, for one. She’s learned Python, R, and SQL in order to get graphs and charts to visualize her data, and feels like that's going to be helpful with the results; she also expects it will be helpful in grad school. “It's also easier to analyze data through SQL and Python, and with large amounts of data, I'm sure I'll use that,” she says.
She also feels like she learned about the writing process while writing her research paper. “Learning how to cite properly, learning what is appropriate for a research paper,” she says. She says she’s written only one other research paper which she claims was not her best, “So I feel like this time, just having one-on-one help with a mentor was very helpful. I feel like the networking is amazing. It's helped a lot.”
Besides learning about code and how to write about research, Sainez also did some networking this summer and built relationships. “I feel like I'll be able to contact them for just helping me with applying to grad school. Maybe not specifically to my field, because it's very different, but just recommendation letters, and overall just the friendships. Maybe eventually we will work together on some project.”
While Sainez’ dream job is to be a psychiatrist and go to medical school, she reports, “This program has made me realize that maybe a PhD in clinical psychology is where I want to go. I want to focus more on therapy than just medication, and I feel like I definitely want to help my community especially in Chicago's south side. I want to go back and give to my people.”
Another REU undergrad was University of Tennessee student Gavin Ridley, who describes how he got involved with INCLUSION. Already familiar with GitHub, he’d even seen someone make a fork (a copy of a project’s software that one can work on and then contribute back). So when he became interested in the Moltres MOOSE application, which was relevant to his nuclear engineering experience, he asked the developer of Moltres how he could get involved. The developer, NCSA postdoctoral scholar Alexander Lindsay, said, “Oh, you should apply to this INCLUSION program, and that would be a great opportunity for you to help with this.”
Quite knowledgeable about GitHub jargon, Ridley explains that to contribute to a project on GitHub, you make a clone of the code, modify it, and then make “a pull request, saying, ‘Hey, you should incorporate the changes I made into your code; look at all the new stuff it can do!’ The idea is that you would like them to pull your code into their repository.”
He adds that those who maintain the software review pull requests. “They know what's best for the project, usually. Making my first pull request was very rewarding—and stressful—because I got some criticism, but it was very constructive.”
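The fork, pull request, and review cycle Ridley describes can be sketched as a toy model in Python. This is a simplified illustration only: it does not use Git or GitHub's actual interfaces, and the class names (`Repository`, `PullRequest`), the file name, and the owners are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy stand-in for a project's code: an owner plus a dict of files."""
    owner: str
    files: dict = field(default_factory=dict)

    def fork(self, new_owner):
        """Copy the project so a contributor can change it freely."""
        return Repository(owner=new_owner, files=dict(self.files))


@dataclass
class PullRequest:
    """A request that the target project pull in the source's changes."""
    source: Repository
    target: Repository
    status: str = "open"

    def review(self, approve):
        """A maintainer either merges the proposed files or declines them."""
        if self.status != "open":
            raise ValueError("pull request already reviewed")
        if approve:
            self.target.files.update(self.source.files)
            self.status = "merged"
        else:
            self.status = "closed"


# Hypothetical project and contributor, invented for illustration.
upstream = Repository("maintainer", {"solver.py": "v1"})
my_fork = upstream.fork("student")
my_fork.files["solver.py"] = "v2"   # changes stay in the fork for now

pr = PullRequest(source=my_fork, target=upstream)
pr.review(approve=True)             # the maintainer accepts the change
print(upstream.files["solver.py"], pr.status)
```

The point of the model is that a contributor's changes never touch the original project until a maintainer reviews and accepts them.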
Co-mentored by Professors Katy Huff (Nuclear, Plasma, and Radiological Engineering & NCSA) and Matt Turk (iSchool, Astronomy, and NCSA), Ridley had done undergraduate research before, but he’d “never had any time to just focus on that before, which was what this summer was, so that was really good.”
What did Ridley learn this summer? How to make scientific software sustainable and reproducible. “I've learned a huge amount through this. I learned how to make [software] reproducible, meaning others can get the same results, and sustainable, meaning the code can be continued.”
Ridley claims he’s absolutely going to be able to use the things he’s learned this summer down the road. In fact, he hopes to leverage what he’s learned to advertise himself to potential graduate schools, telling them, “I would like to use sustainable software practices to do some finite element modeling to advance nuclear power.”
Ridley cites a benefit of open source software: “The source code is available to everyone online…So I think it enables people from all over to enable your code and possibly extend it or contribute to it.”
Like Sainez, Ridley valued the networking he did over the summer, especially the “tight bonds” he made with people in the REU. He also got to know his mentors well, especially postdoc Alex Lindsay. “He's not my main mentor,” Ridley explains, “but he's definitely the one that's taught me the most while I've been here.”
Lauding the REU as a “really fantastic program that can get people on their feet for a research-oriented life,” Ridley cites Dickinson’s project on “how a lot of modern software for physics is not sustainable and it's not reproducible,” and suggests that “more people in programs like INCLUSION would be able to vastly improve the state of the art with computational sciences.”
Katz describes Ridley’s research as: “trying to simulate different kinds of nuclear reactors that may inherently be more passively safe and sustainable, with a focus on reactor behavior in challenging events.” Katz continues, explaining why Ridley needed to use an open source framework, in this case, MOOSE (Multiphysics Object-Oriented Simulation Environment).
According to Katz, open source software was integral to Ridley’s project because he was trying to design a new reactor, not trying to control an existing one. “There are existing design tools that can design existing solid-fueled reactors,” he explains, “But for people who want to design liquid-fueled reactor types, there is no easily customizable software that can do that right now, so they have to do it themselves. Using something that's open source that does a part of it, reduces the amount of work they have to do to do their own complete job.”
How prevalent is open source software for research? Katz estimates that in science and engineering, at least half of the research done with software uses open source software, though it's field-dependent. He believes that “engineering has probably been the slowest to accept open source software, because there is a lot of commercial software.” While some fields have to use licensed commercial software for regulatory reasons, “in other places like physics, almost everything is open source,” he says.
Story and photographs by Elizabeth Innes, Communications Specialist, I-STEM Education Initiative.