Open Source at Harvard
The other day I mentioned in passing that Open Source at Harvard has a new website, and when I wrote about GitHub Actions, that was me taking a look at how the website is deployed.
But is Open Source at Harvard really a thing? If so, what is it? Let's dig in.
(Today I threw together the logo above in 60 seconds. Patches welcome!)
I've been fascinated with open source since I first discovered it in the late 90s.
When I first started working at Harvard in 2006 it wasn't particularly known for creating open source software. It still isn't. But I could see pockets of it, even near me. Source releases of WhatIf and MatchIt on CRAN go back to 2005, for example.
Every once in a while I would suggest that the code I was hacking on for the RCE, or maybe even the whole system, could probably be open sourced, but it was never designed to be run anywhere but our cluster. When I left IQSS to work for a year or so at FASRC, I would throw the occasional config file onto GitHub (which I remember being surprised was #2 in terms of stars).
In late 2012 I returned to IQSS to work on Dataverse, which was the biggest open source project coming out of Harvard. It still is the biggest, I think.
Time passed. I tried to help turn Dataverse into a proper open source project. My first accomplishment was moving the code from SourceForge to GitHub. I also beefed up our community involvement, establishing an IRC channel and answering questions on the mailing list.
By 2015 someone must have noticed how enthusiastic I am about open source because I was invited to attend a meeting about open sourcing a project. It was probably one of the best meetings I've ever attended but I declined future meetings. I did successfully convince the author of a thoughtful wiki page called “Harvard Library Open Source Project Considerations” to open the page to the public.
I kept thinking to myself that there have got to be other open source projects coming out of Harvard. In 2017 I published version 1 of a dataset of 48 open source projects from Harvard sorted by the number of stars on GitHub, including:
- Dataverse (192 stars)
- OpenScholar (184 stars)
- excel4node (170 stars)
- daisy (115 stars)
- Perma.cc (85 stars)
Obviously, stars on GitHub don't indicate much but it's something.
Also in 2017 I was preparing for a talk about how to run an open source project. I gave a practice talk at ABCD and the actual talk at JavaOne. I encouraged people at Harvard to make pull requests to add their projects to my list.
Around the same time I created a Google doc called “opensource.harvard.edu: What if Harvard showed the world how much we contribute to open source?” At the time opensource.google was new and I really liked how Google showcased their open source projects.
Time passed and I continued to merge pull requests adding more projects to my dataset, eventually releasing version 3 with 110 projects.