Open notebook science

Crawled open notebook science

Yesterday, I did a Google search for a procedure I developed in grad school and documented on OpenWetWare, my former open notebook and the home of all my original content. In the search results, I came across a website that had crawled my notebook and reposted one of its pages. The website appears to be about AIDS research, but my original notebook entry has nothing to do with AIDS research. You can take a look at it here. I have come across other blog-type websites that repost my original content, but this is the first time I’ve seen a “medical” website doing so.

This website is obviously culling content from other places on the web. The science I did was completely unrelated to any form of human health research. It was basic research: research done at a fundamental level with zero clinical implications. What happens when people searching the internet for information on a scary subject such as AIDS come across my content there? Now, I’m happy that, in the future, sites that crawl and repost information will help perpetuate my original content. However, the information now sits on a site disguised as a source of information about a deadly human disease. What if the reader does not have enough background to filter my post and realize that it has zero relevance to them? As a scientist and an educator, this concerns me a lot.

The persistence of ONS

It would seem that original content in open notebook science will persist through repostings; however, without context, that science is useless. This is not a good thing, and it could destroy the value of the science being done in the open. I have no idea how to fix it, and this should be discussed at ScienceOnline2012, which I need to remember to register for.

Now, the original reason I decided to start a new notebook was the risk of using a service that has not been shown to have external funding. For instance, what happens when the service is no longer available and all your content is gone? Of course, there are services such as the Wayback Machine that attempt to archive webpages, but from what I understand, services like this are not well known and are basically an “afterthought” and an “exercise for a librarian” in the scientist’s mind.

Also, as gruesome as it sounds, I will eventually die. What happens to my scientific research then? Simple searches on what some of the most popular sites of this era do when a user dies, and on how to access or memorialize that user’s accounts, show that there is no standard procedure. If you run your own server and use it to post your open notebook science, what happens to your content when no one is able to pay the fees associated with owning a domain? From what I understand, unless it has been archived, poof – it’s gone. This is not a good thing, since open notebook science is a treasure trove of data, not only for science in the present but also for future historians and other scientists. It’s even a giant data source for the linguists of the future, because I’ll bet you $1 (which will probably grow into a lot of money by then) that the language we speak and write now will not be the same in 500 years.

Both scenarios, culled content reposted on unrelated websites and open notebook science vanishing because of death or lack of funds, need to be discussed. Open scientists need tools to keep open notebook science persistent. Does this mean creating a non-profit to house our data, or does it mean that university libraries need to create programs to archive ONS?



  1. #1 by Anthony Salvagno (@Thescienceofant) on November 4, 2011 - 8:24 pm

    I would think the priority would be in the hands of universities and libraries. Non-profits can die with their human counterparts, but libraries withstand the test of time (they’ve existed for almost all of human history), and curation should fall to them. Unfortunately, with the advent of the internet there has been a push for raw data, which means there is so much more for libraries to keep up with. Hopefully the technology for archiving and curation catches up; otherwise we may hit a new version of Y2K.

  2. #2 by Anthony Salvagno (@Thescienceofant) on November 4, 2011 - 8:25 pm

    BTW definitely register for ScienceOnline2012. It’s going to be a hoot!

  3. #3 by Jean-Claude Bradley on November 9, 2011 - 1:43 pm

    Andy – this is why Andy Lang and I have invested in making periodic archives of our notebooks and uploading them redundantly (e.g., to our institutional repository, our server, LuLu, etc.).

    That way, even if Google, Blogger, Wikispaces, etc. go out of business, essentially nothing is lost. We have some automated tools to handle this as well – let us know if you would like to explore that possibility for your notebook.

    • #4 by andymaloney on November 10, 2011 - 6:13 pm

      This is great. It would appear that there is no standard service for backing up scientific notebooks. I’ll have to do some research and see whether my new institute will back up my online notebook, and it seems that LuLu is a great service for getting hard copies of the notebooks as well. It will be good to discuss this with the people attending ScienceOnline2012.
