by
lmh » Mon Oct 01, 2012 4:46 pm
no, no, nooooo, don't buy Sieve. OK, this is my personal opinion, and shouldn't be taken as in any way reflecting anything my employers might think, in the unlikely event that I've made my employer traceable in the way I use this site. OK, that's the legal bit over.
Sieve may have improved, but my experience of it was dire. It is very much a black box: there's little information about how it actually works. The first critical step in the sort of analysis you're doing is to find the peaks (m/z and retention-time pairs: e.g. mass 405.123 eluting at 12.3 min). Sieve used to just lower a ceiling towards the data and then put boxes around anything that poked through the ceiling. This made it disastrous for gradient data, because ionisation efficiency frequently rises hugely as the gradient becomes more organic: if you put the ceiling low enough to see the real peaks in the early-to-middle part of the gradient, the whole middle-to-late part becomes a mass of noise-"peaks". Then Sieve insists that you use Spotfire to look at your data, but you only get a cut-down version of Spotfire, because it's a commercial program being tagged onto Sieve, etc. etc. It was also much slower at processing data than the software I'm describing below. For what it did, Sieve used to be possibly the most overpriced piece of software on the market.
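To make the ceiling problem concrete, here's a toy base-R illustration (this is not Sieve's actual code, just the idea of a single global intensity threshold): simulate a baseline that rises through the gradient, spike in one genuine early peak, and apply one fixed threshold. All the numbers are invented for the demo.

```r
# Toy demo of a single global intensity threshold on gradient LC-MS data.
# The baseline/noise intensity rises as the gradient becomes more organic,
# so a threshold low enough to catch the real early peak is swamped later.
set.seed(1)
rt       <- seq(0, 30, by = 0.1)                 # retention time, minutes
baseline <- 100 * exp(rt / 7)                    # noise grows with organic %
signal   <- baseline * runif(length(rt), 0.5, 1.5)
peak     <- rt > 4 & rt < 5
signal[peak] <- signal[peak] + 2000              # one genuine early peak

threshold <- 1500                  # low enough to see the early peak...
flagged   <- rt[signal > threshold]
# Early in the run only the genuine peak is flagged; late in the run the
# rising baseline alone exceeds the threshold, so everything is a "peak".
range(flagged[flagged < 10])       # just the real peak, around 4-5 min
sum(flagged > 28)                  # a solid block of late-eluting noise
```

Raising the threshold loses the early peak instead; there is no single ceiling that works across the whole gradient, which is why per-region or model-based peak detection (as in XCMS's algorithms) copes better.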
There are loads of other things available. For a simple free approach to metabolomics, I personally use XCMS (which runs in the numerical package "R") as a peak-finder. It's available from the Metlin site at the Scripps institute, has a huge number of users, and is continually being improved. You will need to convert your Thermo files to mzXML (cdf isn't as good), for which you can use something like ReadW (others are available); conversion software will generally only run on a PC with Xcalibur installed, as it uses the Xcalibur development kit.
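For a flavour of what an XCMS run looks like, here's a sketch of the classic workflow (xcms is a Bioconductor package; the folder name and the ppm/peakwidth values are placeholders — tune them to your instrument and chromatography, and check the current xcms documentation, as the API has evolved):

```r
library(xcms)   # Bioconductor package; install via BiocManager::install("xcms")

# Point it at your converted files (placeholder folder name):
files <- list.files("mzxml", pattern = "\\.mzXML$", full.names = TRUE)

# Peak-finding: centWave is the usual choice for high-resolution data.
# ppm and peakwidth (seconds) below are illustrative, not recommendations.
xset <- xcmsSet(files, method = "centWave", ppm = 25, peakwidth = c(5, 20))

xset <- group(xset)      # match corresponding peaks across samples
xset <- retcor(xset)     # correct retention-time drift between runs
xset <- group(xset)      # re-group after the correction
xset <- fillPeaks(xset)  # integrate peaks missed in individual samples
```

The result is a table of aligned peaks (m/z, retention time, intensity per sample) that you can export and analyse however you like.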
XCMS might do cleverer stuff now, but the version I use generally just ranks all the peaks by t-test value between two treatments. That's rarely advisable on its own, so I generally do my own statistics, which might be PCA or multiple ANOVAs; if you decide to get computer-savvy, R can do all of this, though it takes a bit of getting used to. There's no reason not to do PLS or PLS-DA in R either, except that it requires a lot more computing skill, especially to run all the controls needed (cross-validation; I am lucky and use a collaborator). There are plenty of other packages around offering good analysis and peak-finding: I vaguely remember names like MetAlign, and I think Lloyd Sumner's group did something similar at some stage.
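Both the per-peak t-test ranking and a PCA can be done in a few lines of base R. Here's a minimal sketch on a fake peak table (rows = samples, columns = peak intensities; all data are simulated for illustration, with one peak deliberately made different between groups):

```r
# Fake peak table: 6 control samples, 6 treated samples, 50 peaks.
set.seed(42)
n_peaks <- 50
control <- matrix(rnorm(6 * n_peaks, mean = 10), nrow = 6)
treated <- matrix(rnorm(6 * n_peaks, mean = 10), nrow = 6)
treated[, 1] <- treated[, 1] + 5        # make peak 1 genuinely different

# Per-peak t-test between the two treatments, then rank by p-value
# (roughly the kind of ranking described above):
pvals  <- sapply(seq_len(n_peaks), function(i)
  t.test(control[, i], treated[, i])$p.value)
ranked <- order(pvals)                  # peak 1 should come out on top

# PCA on the combined, autoscaled peak table:
pca <- prcomp(rbind(control, treated), scale. = TRUE)
summary(pca)$importance[2, 1]           # proportion of variance on PC1
```

A score plot (`plot(pca$x[, 1:2])`) then shows whether the two treatments separate. The same data frame feeds straight into `aov()` for the multiple-ANOVA approach.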
The Metlin site also has some references guiding you through how XCMS actually works (it's open-source; there are no hidden trade secrets), and how to use it. Huge thanks to those who have made it work!