The January 2005 issue of ACM Computing Surveys (vol. 37, no. 1, if you prefer) has a good review by Rajendra Bose and James Frew titled "Lineage Retrieval for Scientific Data Processing: A Survey". In it, they look at what scientists do to keep track of what data they have, where it came from, and what has been done to it. Some of my students last term were worrying about the same issues in the context of HL7 medical data. It seems like an ideal place for software engineers to apply their skills: I'd be interested in hearing from people who have home-grown or small-scale systems I could use as a starting point for a lecture in Software Carpentry.