by
lmh » Mon Mar 12, 2012 12:17 pm
The short answer is that it probably doesn't matter what you do. The explanation, and the longer answers, follow:
(1) I find it helpful to remind myself of what an LOQ actually is. It's the threshold below which you feel the error in the measurement is likely to exceed a certain percentage of the value of the measurement (i.e. "if I measure below 1.3 pmol, the typical error is greater than 10%, and this is unacceptable to me, so I do not wish to report it as quantitative"). That means that a value below the LOQ is still a value, just a less precise one. You can therefore include all values below the LOQ in your mean without worrying in the slightest. In fact, ideally you should; otherwise you are shrinking the standard deviation of your samples by pretending that these low-ish results are always exactly reproducible.
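To make point (1) concrete, here is a minimal sketch with made-up replicate values and an assumed LOQ of 1.3 (none of these numbers come from a real assay). It compares keeping the below-LOQ measurements as measured against the common habit of replacing them with the LOQ itself:

```python
import statistics

# Hypothetical replicate measurements, same (arbitrary) units throughout.
loq = 1.3
values = [2.1, 1.8, 0.9, 2.4, 1.1, 2.0]

# Option A: keep every measured value, even those below the LOQ.
mean_all = statistics.mean(values)
sd_all = statistics.stdev(values)

# Option B: replace each below-LOQ value with the fixed LOQ value,
# i.e. pretend the low-ish results are exactly reproducible.
censored = [v if v >= loq else loq for v in values]
mean_cen = statistics.mean(censored)
sd_cen = statistics.stdev(censored)

print(f"as measured:  mean={mean_all:.2f}, sd={sd_all:.2f}")
print(f"LOQ-filled:   mean={mean_cen:.2f}, sd={sd_cen:.2f}")
```

With these numbers the LOQ-filled standard deviation comes out smaller than the honest one, which is exactly the artificial shrinkage described above.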
(2) A value below the LOD must be very small. Whether you add it, or add zero in its place, you will never elevate the mean above the LOD. Such values can pull the mean down towards the LOQ or LOD, but if your mean comes out close to the LOD you will not be reporting it anyway, so the values that are below the LOD should not be able to influence the actual result in any way. Their contribution to the standard deviation of the samples should also be very small (see below).
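A quick numeric sketch of point (2), again with invented values and an assumed LOD: whether the below-LOD samples contribute their tiny measured values or are filled in with zeroes, the two means differ by less than the LOD itself, so the choice cannot change a reportable result:

```python
import statistics

# Assumed LOD and sample values, purely for illustration.
lod = 0.10
above = [1.2, 0.9, 1.5]          # quantifiable samples
below = [0.04, 0.07, 0.02]       # tiny real values below the LOD

mean_with_values = statistics.mean(above + below)
mean_with_zeros = statistics.mean(above + [0.0] * len(below))

# The two treatments of the below-LOD samples give nearly the same mean.
print(f"keep tiny values: {mean_with_values:.3f}")
print(f"fill with zeroes: {mean_with_zeros:.3f}")
```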
(3) If you are calculating a mean for your population, you are almost certainly also calculating a standard deviation or carrying out statistics that imply such calculations (t-tests, ANOVA, etc.). As such, the LOQ and LOD are actually superfluous: you are estimating the error in your mean anyway. The biological variation (inter-sample variation) is probably so much bigger than the analytical error (which is what gives rise to the LOQ and LOD) that the overall s.d. is dominated by genuine inter-sample difference, and the small contribution made by analytical error shrinks to insignificance. The exception is when you get down to situations where all the samples contain virtually nothing, and at this point you have to report that the mean has sunk to a point where you cannot measure it reliably (LOQ) or at all (LOD); but since this is meaningless unless compared to samples where you could measure reliably, again it doesn't matter. I have no problem with a bar graph where one bar is too small to see, but has an error bar that, if only I could see it, would be as large as the bar!
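The domination argument in point (3) follows from how independent error sources combine: they add in variance, not in standard deviation, so the total s.d. is sqrt(sd_biological^2 + sd_analytical^2). A sketch with illustrative (assumed) numbers shows how little a small analytical error moves the overall s.d.:

```python
import math

# Illustrative magnitudes, not from any real data set:
sd_biological = 0.50   # genuine inter-sample (biological) spread
sd_analytical = 0.05   # assay/instrument error near the LOQ

# Independent error sources add in variance:
sd_total = math.sqrt(sd_biological**2 + sd_analytical**2)

print(f"total s.d. = {sd_total:.4f}")  # barely above the biological 0.50
```

Here a 10%-of-biological analytical error inflates the total s.d. by well under one percent, which is why it "shrinks to insignificance".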
[extra: (4) Yes, some statistical techniques don't like missing values and lots of zeroes, but in practice this rarely matters. If you are comparing two groups of samples where both have loads of values that have had to be filled in with zeroes, you are probably up to no good. If you are comparing groups where one is full of zeroes and the other is perfectly respectable, then they are clearly so different that a mild misuse of statistics won't be generating a false positive!]
Incidentally, I once went to a metabolomics talk (a Manchester meeting of the Metabolomics Society in the UK, I think???) where a PhD student had investigated exhaustively how to handle missing data. She had compared replace-with-zero, use-integrated-baseline-at-right-time, replace-with-detection-threshold-value, and various fuzzy-logic options involving calculating a best-guess estimate from the other samples. She did this by taking a genuine set of measurements and missing out bits. Her conclusion was that it made very little difference whatever one did. I don't know if she went on to publish; very sadly, it's quite likely not, because although the work was very good, it's always hard to get a negative result into print.