Showing posts with label ELN. Show all posts

Sunday, September 16, 2012

The Great Electronic Lab Notebook Challenge, pt. II

DEVONthink Pro Office and MacJournal: A review and comparison

This is the second in a series of posts on searching for an ELN suitable for use in my research group.  This question is related to a series of hardware, workflow and "data management" questions.  These I will address elsewhere.  In The Great ELN Challenge, pt. I, I laid out what I'm looking for in an ELN, and how it fits my ideal group workflow.  In this post I discuss my experiences with MacJournal and DEVONthink Pro Office.

Both of the software tools discussed here are for Mac OS X.

MacJournal by Mariner Software
The MacJournal interface

The strength of MacJournal is also its weakness:  it is journaling software.  This is well highlighted in Macworld's review.  Using an interface similar to pre-Mountain Lion Mac Mail, MacJournal allows you to write text notes into which you can include images and PDFs.  Notes, or entries, are stored in a folder-based hierarchy.  Sets of notes ("notebooks") are searchable, and there is a range of display options.  These display options all focus on a chronological presentation of entries.  Therefore, as a journal or chronological personal notebook, MacJournal is great.  You can even attach scanned input to text entries, e.g. as PDFs or images.  Even so, it is not generally possible to attach, include or work with non-text files (e.g., MS Excel files, binary data files, large ASCII data files).

There does not appear to be any way to lock/encrypt entries to prevent subsequent modification.  This is despite the fact that entries are "lock-able".  The issue is that this functionality is designed only to prevent unintentional modification, and can be turned on and off by the user at will.

There is now Dropbox support, but there is no server capability, and no multi-user database capability.  For personal use as a journal or notebook, MacJournal is great.  As an ELN it lacks critical functionality--chief among the missing pieces is the ability to collect non-note files.

DEVONthink Pro Office by DEVON Technologies
The DEVONthink Pro Office interface

DTPO is basically an open database, wrapped in a relatively sophisticated, user-friendly and extensible interface.  (Macworld review here.)  Files are imported into DTPO (the external file is physically copied into the DTPO database), indexed (a link to the original file is stored, and the text extracted from the file is added to the DTPO database), or created within DTPO (a new file is created using either the built-in DTPO previewer or the native application associated with the file type).

In the interface, you organize your files into a folder structure.  This appears to be similar, if not identical, to simply saving the files to disk. In DTPO, though, the files themselves are stored in a flat database. The database is open and unencrypted, so you can always access your files directly (there is even a button on the DTPO toolbar that opens a Finder window at the location of the file selected in DTPO).

In contrast to your disk-based file structure, the DTPO interface gives you access to a range of capabilities and metadata associated with your files via the underlying database.  For example, like in Gmail, the folders you create in the interface are really just tags, and you can manually tag individual files however you like.  Smart folders can slice and dice tagged files just like in Gmail.  Files can appear in multiple locations in your folder tree (i.e. be tagged by multiple folders), but even more powerfully, can either appear as "duplicates" (a separate copy of a file) or "replicates" (basically a soft link to a file).
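The difference between a duplicate and a replicate can be illustrated with a small Python sketch.  This is a toy model only, not DTPO's actual implementation or API:  the `Record` and `Database` classes and their method names are hypothetical, invented here for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for one file stored in the database."""
    name: str
    text: str


@dataclass
class Database:
    """Toy model: groups (folders) behave like tags that point at records."""
    groups: dict = field(default_factory=dict)

    def add(self, group, record):
        self.groups.setdefault(group, []).append(record)

    def replicate(self, record, group):
        # A replicate is the same record appearing under another group.
        self.add(group, record)
        return record

    def duplicate(self, record, group):
        # A duplicate is an independent copy of the record.
        clone = copy.deepcopy(record)
        self.add(group, clone)
        return clone


db = Database()
notes = Record("convergence-notes.rtf", "k-point mesh 4x4x4")
db.add("Project A", notes)
replica = db.replicate(notes, "Journal")
dup = db.duplicate(notes, "Scratch")

# Editing the original changes the replicate, but not the duplicate.
notes.text = "k-point mesh 8x8x8"
print(replica.text)
print(dup.text)
```

In this model the replicate follows every edit to the original because it is the same object, while the duplicate keeps the text it had when it was copied.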

Every file included in the database automatically has any contained text extracted and processed via DTPO's "AI", which seeks to improve searching by suggesting potentially related content, and allowing searches to be executed over all text in the database.  This concept is extended by the inclusion of a powerful OCR engine integrated with DTPO itself.

"Indexing" (versus "importing") files allows large files (e.g., files containing research data) to be linked to within the database without having to duplicate the files themselves.  Indexed files can live on remote or archive disks that are not always mounted to the system running DTPO.  In addition, indexing folders allows "file groups" to be included in DTPO.  This is particularly useful when working with LaTeX documents (where the .tex, .aux, .bib, .dvi, etc., files must appear "together" to the LaTeX engine, not spread across DTPO's internal database), or coding projects (where multiple source files and make files must remain associated on the physical file system).  Note also that "indexing" allows the metadata (and text) of "archived" files to be available for live searching.
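The import/index distinction can be sketched in a few lines of Python.  This is a conceptual illustration under simplifying assumptions, not how DTPO works internally:  the functions `import_file`, `index_file` and `search`, the dictionary layout, and the file names are all hypothetical, and deleting a temporary folder stands in for unmounting an archive disk.

```python
import shutil
import tempfile
from pathlib import Path


def import_file(src: Path, store: Path) -> dict:
    """Importing: physically copy the file into the database's own storage."""
    dest = store / src.name
    shutil.copy2(src, dest)
    return {"path": dest, "text": dest.read_text(), "indexed": False}


def index_file(src: Path) -> dict:
    """Indexing: keep only a link to the original plus its extracted text."""
    return {"path": src, "text": src.read_text(), "indexed": True}


def search(entries: list, term: str) -> list:
    """Search the stored text; works even if an indexed original is offline."""
    return [e["path"].name for e in entries if term in e["text"]]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    store = root / "database"
    external = root / "archive_disk"
    store.mkdir()
    external.mkdir()

    small = external / "notes.txt"
    small.write_text("meeting notes on convergence")
    big = external / "run42.dat"
    big.write_text("total energy convergence data")

    entries = [import_file(small, store), index_file(big)]
    copied = sorted(p.name for p in store.iterdir())

    # Simulate unmounting the archive disk: indexed text stays searchable.
    shutil.rmtree(external)
    hits = search(entries, "convergence")

print(copied)
print(hits)
```

Only the imported file is copied into the store, yet both files are found by the search, because the text of the indexed file was captured when it was indexed.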

DTPO does not lend itself directly to journaling, although RTF files can, of course, be created at will within DTPO itself.  Basically, while MacJournal provides more chronological organization than may be strictly necessary in an ELN, by default, DTPO provides NO chronological organization beyond date/time stamps on physical files.  While this is a weakness, it serves to highlight an additional (and major) strength of DTPO:  scriptability.

DTPO can be scripted at a number of levels.  You can create Automator scripts that push actions in DTPO, and you can create "smart document" creation scripts that push database actions at the time of file creation.  This last, combined with tagging and searching capabilities, allows the creation of journal capabilities in DTPO.  For example, the date and time can be auto-generated and inserted into a newly created RTF file.  This file can be auto-tagged at creation with, e.g., a "Journal" tag.  A smart folder in the interface that filters for all "Journal"-tagged files will then show all journal entries throughout the database.  This would allow, for example, journal entries to live topically arranged in the file structure in the interface, but also appear chronologically by creation date/time in the Journal smart folder.
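The journal idea described above can be modeled in Python as follows.  This is a sketch of the logic only, not a DTPO script (which would normally be written with DTPO's own scripting tools):  the `Entry` class and the functions `new_journal_entry` and `smart_folder` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Entry:
    title: str
    group: str  # topical location in the folder tree
    created: datetime
    tags: set = field(default_factory=set)
    body: str = ""


def new_journal_entry(title, group, now=None):
    """Create an entry stamped with date/time and auto-tagged 'Journal'."""
    created = now or datetime.now()
    stamp = created.strftime("%Y-%m-%d %H:%M")
    return Entry(title, group, created, {"Journal"}, body=f"{stamp}\n\n")


def smart_folder(entries, tag):
    """Return all entries carrying `tag`, ordered by creation time."""
    return sorted((e for e in entries if tag in e.tags), key=lambda e: e.created)


entries = [
    new_journal_entry("Mesh test", "Project A", datetime(2012, 9, 16, 14, 0)),
    Entry("Reference list", "Project A", datetime(2012, 9, 15, 9, 0)),
    new_journal_entry("Group meeting", "Admin", datetime(2012, 9, 15, 10, 30)),
]

journal = smart_folder(entries, "Journal")
print([e.title for e in journal])
```

The entries stay filed under their topical groups ("Project A", "Admin"), while the tag-based filter gathers them into a single chronological view.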

The "Pro Office" version--that is, the DT"PO" being discussed here--allows the creation of multiple databases whose search and AI are separate.  It also comes with a built-in web server allowing the database to be accessed via the web remotely.  While this gives definite multi-portal access to the database, it should also, in theory, give rudimentary multi-user capabilities, though I have yet to explore this.  A smattering of scripts are included in the DTPO install version, and an additional smattering are available for download at the DTPO support site.  In a later post, I will share scripts I have created.

In addition to scripting capabilities, there are a number of plug-ins and extensions for DTPO.  Chief among these are tools to integrate DTPO with all major email clients and web browsers.  This allows direct addition of emails and webpages to DTPO.

On the negative side, there does not appear to be any encryption or locking capability, as all files remain modifiable.

Summary

Neither MacJournal nor DTPO is ideal for an ELN, primarily because they lack the ability to lock files and input.  DTPO, though, comes quite close.  As a database-centered tool for collecting and collating data in all forms, DTPO significantly outshines MacJournal as a research ELN.

Stay tuned for later posts on my workflow with DTPO, as well as some scripts and smart document templates I use in my research.

Saturday, September 15, 2012

The Great Electronic Lab Notebook Challenge, pt. I

This is the first in a series of posts on searching for an ELN suitable for use in my group.  This question is related to a series of hardware, workflow and "data management" questions.  These I will address elsewhere.  In this post I lay out what I'm looking for in an ELN, and how it fits my ideal group workflow.  Subsequent posts will address my experiences with MacJournal and DEVONthink Pro Office.

My research is almost entirely computational.  (For details on the research itself, visit the Beck Research Group page.) I have a small group, who are expected to use their laptops as their primary research gateway/tool.  The ideal research workflow for my group members looks something like this:
  1. Students take notes of discussions on laptop (ELN)
  2. Literature search with reference manager (SDB)
  3. Background study and hypothesis development (ELN)
    • Take notes about (and on?) references (ELN/SDB)
    • Note thoughts, ideas, plans, etc., on laptop (ELN)
  4. Generate "Design of Calculations" preliminary report (SO)
  5. Preliminary calculations on local or production compute resources (DO)
    • Data analysis of preliminary results (ELN)
  6. "Pre-production Calculations" report, including: (SO)
    • Convergence parameters (ELN)
    • Comparison to prior results
    • Estimate of production calculation resource requirements (ELN)
  7. Production calculations on local or external resources (DO)
  8. Data analysis (ELN)
  9. "Draft results" report (SO)
  10. Discussion and further analysis *** (ELN)
  11. Paper preparation (ELN/SO/C)
    • Figure and chart preparation
    • Iterative and collaborative text preparation
  12. Archival of results, reports and paper (A)
For each of these steps, I have indicated a rough idea of the nature of the data/information storage environment required.
  • Electronic Lab Notebook (ELN) - Primary and complete legal record of the research. Personally created and recorded by the researcher, but the formal responsibility of the PI.  Must be continuously and constantly available.  Data should be recorded to the researcher's laptop to guarantee offline access, but synched to a central database available for online access from any computer.  I very much prefer an "IMAP" model to a "web-form" model (see below). The data itself must be backed up, and should be secure against tampering/re-writing.  The ELN itself should be permanently stored by the PI and the researcher.
  • Shared Database (SDB) - For external content and data that is of use to the whole group.  The obvious example is the PDF library of reference literature. Content is added by individual researchers and can be tagged/notated by individuals, but the collection itself should be group accessible.
  • *Shared Output (SO) - This is content prepared by individual researchers, but whose final versions must be available (read-only) to the group.  Drafts and information needed during preparation reside in the ELN or as DO (see below).
  • *Data Output (DO) - Content generated by calculations on local or external production resources.  Data must be staged physically on production resource, but must migrate to a central, group readable location.  Data should migrate to read-only.
  • Special categories:
    • Collaborative (C) - This is content that should have parallel multi-individual access to allow collaborative/interactive content generation (e.g., paper drafts).
    • Archive (A) - Not a primary end point for data, but both Shared Output and Data Output (indicated with asterisks, above) should eventually be archived.
Based on the above, an ELN must be able to handle the following either in the ELN app itself, or via importing of third-party files.  The ELN must allow users to:
  • Take notes, anytime and anywhere
  • Handle scanned input, e.g. of handwritten notes or diagrams, and allow diagramming.  The inclusion of a working OCR pathway would be beneficial as well.
  • Enable both symbolic and numerical math, spreadsheet capabilities, plotting and graphing, curve fitting
  • Support presentation and text report generation, as well as image and diagram preparation
  • Collect the generated files and allow searching of them, and allow links pointing to specific locations of DO and SO objects.
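The requirements above that data "migrate to read-only" and be "secure against tampering/re-writing" can be approximated on an ordinary file system.  The Python sketch below is one possible approach and an assumption on my part, not a feature of any of the tools discussed:  it clears the write permission bits and records a SHA-256 checksum so that later modification can at least be detected.  The function names and the example file are hypothetical, and file permissions can be undone by the file's owner, so this gives tamper evidence rather than true locking.

```python
import hashlib
import stat
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def freeze(path: Path) -> str:
    """Clear all write permission bits; return a digest to record elsewhere."""
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return sha256_of(path)


def is_unchanged(path: Path, recorded_digest: str) -> bool:
    """Check a file against the digest recorded when it was frozen."""
    return sha256_of(path) == recorded_digest


with tempfile.TemporaryDirectory() as tmp:
    result = Path(tmp) / "production_run.out"
    result.write_text("E_total = -1234.5678\n")

    digest = freeze(result)
    writable = bool(result.stat().st_mode & stat.S_IWUSR)
    intact = is_unchanged(result, digest)

    # Simulate tampering: restore write permission and alter the file.
    result.chmod(result.stat().st_mode | stat.S_IWUSR)
    result.write_text("E_total = -9999.9999\n")
    tampered = not is_unchanged(result, digest)

print(writable, intact, tampered)
```

For the check to mean anything, the recorded digest would need to be kept somewhere the researcher cannot silently rewrite, e.g. with the PI or on the central group store.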
More at the next update....