GitHub and the working group – Part II

In an earlier post, I outlined my plans to teach and use GitHub for code related to our big Biodiversity Working Group. From Feb 22-26, 20 members of the group met in Leipzig, Germany, and 6 more (including myself) participated remotely (#sChange on Twitter). I felt this was a bit of an experiment, as our group is quite large, and I knew that the majority of participants would be novice git users. There is much yet to do now that the workshop is over, but after a week, I can definitely identify a few challenges and successes in using GitHub for our working group.

Challenges

  1. More than half of our working group members had not used GitHub as a collaborative tool before, or had only sort of heard of it and didn’t actively use an account.
  2. Deciding on the best organization that would work for everyone, while still taking advantage of the workflow and discussion opportunities GitHub offers, and avoiding overloading a single repository with too many large files or users with overwhelming complexities.
    • We generally stuck to the OrganizationTools guidelines that I outlined in my post a few weeks ago, with a few additions (list of working group members who could troubleshoot problems, details on folder organization, reminder to use `git pull` before `git push`, and a note about licenses).
    • A change in repo organization was to have a main working group repository for all developing code (since we weren’t sure yet how code would be broken out vs. shared among projects) instead of separate repositories organized around papers. Hopefully, this doesn’t become too confusing down the road.
    • Another change was to have 2 `code` folders in our main repository: one for code that is ready to be shared with the group, and one that contains “personal” folders where each contributor can work privately on code that they are not ready to share yet.
      • Although these folders aren’t “locked”, the honor system dictates that members should only access their own named folder within this directory, and respect the privacy of others.
      • A major benefit of GitHub is collaborative coding, but after discussing with the larger group, the consensus was to have this protected “safe space” for writing embarrassing, messy code first, before sharing in the main collaborative `code` folder.
      • The benefit of this structure, especially for a large group where many might not have extensive experience in coding collaboratively, or might worry about being “judged” for imperfect code (although we have established a friendly, supportive environment, and urged people to be gently critical of each other’s code editing), is that it gets everyone in the door. The #1 challenge with GitHub and the working group is just getting people to try it out and contribute, so if personal folders help us do that, then I think it’s a big win! The #2 challenge will be to encourage and ensure that code makes it out of these folders and into the main collaborative folder in a timely and organized way in the upcoming months as projects take shape.
  3. Making sure all working group members were successfully added to the repository, and that key players had extended privileges to allow them to help with adding members or acting as administrators for certain projects.
  4. Keeping momentum and activity moving after the workshop.
    • As with any working group, keeping progress alive after everyone goes home is always a challenge. But I’m optimistic that using GitHub will help greatly with keeping everyone in the loop and avoiding mass emails and confusion over things like code and analysis updates.
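For readers new to git, the two-folder layout and the "`git pull` before `git push`" reminder from challenge 2 can be sketched roughly as follows. This is only an illustration, not our actual repository: the folder name `personal_code`, the contributor names, and the file names are all hypothetical, and a local bare repository stands in for the GitHub remote so the script can run offline.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; a local bare repository stands in for the GitHub remote.
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"
git clone --quiet "$sandbox/remote.git" "$sandbox/working-group" 2>/dev/null
cd "$sandbox/working-group"
git config user.name "Example Member"
git config user.email "member@example.org"

# Hypothetical layout: one shared folder, plus one personal folder per contributor.
mkdir -p code personal_code/alice personal_code/bob
echo "# shared analysis" > code/analysis.R
echo "# alice's draft" > personal_code/alice/draft.R
echo "# bob's draft" > personal_code/bob/draft.R
git add .
git commit --quiet -m "Seed shared and personal code folders"
git push --quiet -u origin HEAD

# Day-to-day cycle: pull first to pick up others' changes, then commit and push.
git pull --quiet
echo "# tidy-up" >> personal_code/alice/draft.R
git add personal_code/alice/draft.R
git commit --quiet -m "Update alice's draft"
git push --quiet
```

Pulling first means that changes pushed by other members are merged locally before your own push, which avoids the rejected-push error that tends to confuse first-time users.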

Successes

  1. Although I was participating in and co-organizing the workshop remotely (more on that later), I was able to request and set up a Private GitHub Organization ahead of time, and seed it with some starting repositories, README files, and organizational material. I think this really helped jump-start conversations on how we would use it for the working group, and helped us hit the ground running once coding began.
  2. There were several working group members in Leipzig who use GitHub on a regular basis, and they did an amazing job of stepping up in my absence to help teach the tools to the group, make sure git installs were working on laptops, and encourage members to view and use the GitHub repositories.
  3. Having the code online and updated throughout the week really helped remote participants contribute or at least try to keep track of what was happening at the meeting. It’s great that we now have the technology to make remote participation “easy”, but it is a completely different beast than being in the same room as the rest of your colleagues.
  4. At the end of the week, nearly all members were on GitHub (22/26) actively viewing, commenting, or editing code. Issues were being posted, discussed, and resolved!

 

At this point, I’d say that adding GitHub to our list of collaborative working group tools seems to be working out (our data, notes, and manuscripts are so far mostly on Dropbox and Google Drive), which is really exciting to me. There may still be some remaining challenges down the road, but for a big group with lots of ongoing projects, I really think that incorporating GitHub will help us much more than it will be an obstacle.

Here’s to hoping for continued progress and collaboration, and if we’re lucky, funding for more in-person meetings!

 

 

 

2 thoughts on “GitHub and the working group – Part II”

  1. Great post, and great to hear that using GitHub worked out so well! When I was involved with running a workshop, I wanted to try to get people to use it, but I wasn’t then comfortable enough using it myself, so I chickened out. It sounds like it ended up being a really useful tool.

    One thing I was wondering: did you consider asking each participant to use a branch to do their own work, then when they were ready to share work, merge that branch back into the main one, rather than having separate code folders for each person? I’m not sure if it would be a better approach or not, and it would require teaching another git tool. I was just curious about your group’s thought process for going with the format you did.


    • Thanks for the comments! I have a few thoughts:

      I think GitHub can be really useful for a working group, but it is critical that there are at least 1-3 strong leaders (depending on total size of the group) who are very comfortable with git to help teach, encourage, and troubleshoot. If you weren’t very comfortable with GitHub yet, my guess is that it was the right decision to skip it at that time to avoid confusion and frustration. In our group (26 total members), we had at least 5 members who were comfortable teaching and troubleshooting GitHub (some via command line, some via RStudio), and at least 5 others who had heard of GitHub and had set up an account at some point in the past (even if they weren’t really using it).

      You are correct that in a more advanced workflow, it would be cleaner to use branching to develop private edits and work, re-merging when ready. However, I think this is too complicated when working with a group that consists mainly of first-time and novice users. It requires teaching another git tool and represents a new level of abstraction for how you are asking participants to think about their scientific workflow. When I teach novices git and GitHub at Software Carpentry, I typically don’t teach branches, as the first big hurdle is just to get the basics down and justify git as a useful tool (true also for the working group). After more practice, or with a more experienced group, I think branching would be a great way to go. In a lab group, a smaller research group, or a long-term collaboration, you could work towards incorporating more advanced git workflows over time as participants become more comfortable with the tool and with collaborative coding in general.

      For the purposes of the working group, I think one reason that we got so many participants to sign up was that we kept the workflow and folder structure simple enough (no forking, no pull requests, no branching needed) that it isn’t a big mental leap from using other tools like Dropbox. Some members will mainly be using the repository as a way to look at files, and need to know where to find things, while others will be more active code contributors, and need to know how to pull and push files for updates. The #1 goal for our workshop was to justify using GitHub and make it minimally painful for participants to start using it so that we could keep all our code in one place.
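For comparison, the branch-based alternative raised in the question might look something like the sketch below. It is not what our group did; the branch name `alice-draft` and the file names are hypothetical, and everything runs in a throwaway local repository.

```shell
#!/bin/sh
set -eu

# Throwaway local repository for illustration only.
sandbox=$(mktemp -d)
git init --quiet "$sandbox/working-group"
cd "$sandbox/working-group"
git config user.name "Example Member"
git config user.email "member@example.org"

mkdir code
echo "# shared analysis" > code/analysis.R
git add code
git commit --quiet -m "Add shared analysis script"
# The default branch is "main" or "master", depending on the local git configuration.
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Private work happens on a personal branch instead of a personal folder.
git checkout --quiet -b alice-draft
echo "# alice's addition" >> code/analysis.R
git commit --quiet -am "Draft changes on a personal branch"

# When the work is ready to share, merge it back into the main branch.
git checkout --quiet "$main_branch"
git merge --quiet alice-draft
```

Even this small example asks a novice to keep track of which branch is checked out at each step, which is the extra layer of abstraction I wanted to avoid for first-time users.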

