[Irtalk] Data (set) management - some insight from a researcher in biology

Smith, Ina <ismith@sun.ac.za>
Sat Apr 5 12:24:28 SAST 2014


Every PhD student must use Git (aka research data management)
http://chem-bla-ics.blogspot.com/2014/04/every-phd-student-must-use-git-aka.html

Last Thursday and Friday the SURFAcademy<http://www.surf.nl/diensten-en-producten/surfacademy/index.html> Masterclass Research Data Management in Nederland<http://www.surf.nl/agenda/2014/04/masterclass-research-data-management-in-nederland/index.html> took place, and Chris Evelo<https://twitter.com/Chris_Evelo> and I presented some biology-world use cases. He focused more on the larger projects (e.g. ISA-TAB<http://dx.doi.org/10.1038/ng.1054>, GSCF<https://github.com/PhenotypeFoundation/GSCF>, and FAIRPort<http://esciencecenter.nl/nieuws/629-jointly-designing-a-data-fairport/>) while I presented my day-to-day data management. My day-to-day work habits look more or less like this.

Day 0 is to think about how to do it, but the answer is pretty simple: use a version control system, like Git, because it tracks every bit of what you do, allows for easy backups, and makes it easy to continue working on a different machine in case you forget to take your laptop adapter home :)


  *   Day 1: keep an electronic lab notebook (e.g. a version control system; read Git from the Bottom Up<http://newartisans.com/2008/04/git-from-the-bottom-up/>)
  *   Day 2: carefully select data you build on (can you indeed share it with the rest of your arguments in your next paper?)
  *   Day 3: do your research and store everything
  *   Day 4: integrate data repositories in your data analyses, e.g. rrdf<https://peerj.com/preprints/185/> and knitr<http://yihui.name/knitr/>
  *   Day 5: if you like scientific dissemination, collaboration, and progressing science, share your data in a public repository, like FigShare<http://figshare.com/>, Data Dryad<http://www.datadryad.org/>, Dutch Dataverse<https://www.dataverse.nl/dvn/>, 3TU.Datacentrum<http://data.3tu.nl/repository/>, DANS<https://easy.dans.knaw.nl/ui/home>, etc. (that's a lot of D-D-D-Data...) or in a domain-specific database, like WikiPathways<http://wikpathways.org/>, XMetDb<http://www.xmetdb.org/>, or DrugMet<http://drugmet.rilspace.org/>. Also think about data copyright and licenses; in particular, whatever you choose, be explicit about it and don't let others guess (wrong).
  *   Day 6: think ahead about reuse and suitable formats. Consider semantic web and linked data.
  *   Day 7: did you get impact? Think DataCite<http://www.datacite.org/>, ImpactStory<http://impactstory.org/>, and Altmetric<http://www.altmetric.com/> (and ORCID<http://orcid.org/> and DOI along the way).
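
As a rough illustration of the Day 0 and Day 1 habit (a notebook kept under version control), here is a minimal Python sketch. It is not from the original post: the file name notebook.txt and the helper names append_entry, commit_commands and record are made up for this example, and it assumes the working directory is already a Git repository (created with "git init") with a user name and email configured.

```python
import datetime
import pathlib
import subprocess


def append_entry(notebook: pathlib.Path, text: str, when: datetime.date) -> str:
    """Append one dated line to a plain-text notebook and return that line."""
    entry = f"{when.isoformat()}: {text}\n"
    with notebook.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


def commit_commands(notebook: pathlib.Path, message: str) -> list[list[str]]:
    """Build the Git commands that stage and commit the notebook file."""
    return [
        ["git", "add", notebook.name],
        ["git", "commit", "-m", message],
    ]


def record(repo: pathlib.Path, text: str, run_git: bool = True) -> str:
    """Write today's entry into repo/notebook.txt (hypothetical file name)
    and, unless run_git is False, commit it in the repository at repo."""
    notebook = repo / "notebook.txt"
    entry = append_entry(notebook, text, datetime.date.today())
    if run_git:
        for command in commit_commands(notebook, text):
            # check=True raises CalledProcessError if Git reports a failure.
            subprocess.run(command, cwd=repo, check=True)
    return entry
```

Calling record(pathlib.Path("."), "Reran the analysis with the updated data set") from inside the repository appends a dated line and commits it, so "git log" then doubles as a timeline of the notebook. Pushing to a remote (for backups and for working on another machine) is deliberately left out of the sketch.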
