
<LLOQ means zero or no value in data sheet

Posted: Fri Mar 09, 2012 1:47 pm
by Biomed
Hi folks,

I have a question regarding the graphic representation and statistical analysis of HPLC data.
I performed measurements of a compound at different time points in a population. My concentration decreases over time and falls below my limit of quantification (LOQ) at different time points for the different individuals.
I want to calculate mean value of my population at each time point for comparison between several study groups.

My question is: in my data sheet, should I
1) replace the "non-detected" values with "0"? => but I did not measure the value "0"; my parameter is merely below a certain limit
2) replace the "non-detected" values with the value of my LOQ? => same problem, I did not measure this concentration
3) leave "no value"? => the mean is then biased, as only the highest values are used for the calculation.

sorry for the long post and thanks for the help.

Re: <LLOQ means zero or no value in data sheet

Posted: Fri Mar 09, 2012 8:14 pm
by JGK
If you have LOD and LOQ values expressed as concentrations you could enter nominal values where these occur:

e.g., where a BLQ value has been registered, enter a value = ([LOQ] - [LOD])/2;

where a BLD value has been registered, enter a value = [LOD]/2;

and where no peak is detected, enter 0.

We used a similar scheme when producing aerodynamic particle size distributions. We tabulated the data as normal, entering BLQ or <LLOQ and BLD or <LOD as applicable. However, as reporting all BLQ/BLD data as zeros severely skewed the calculated distributions, we stated that the nominal values given by the rules above would be used in the calculations.
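Assuming the LOD and LOQ are known concentrations from your own method validation, the rules above could be sketched roughly as follows (the function name, flag strings, and the example LOD/LOQ numbers are all hypothetical, not from the thread):

```python
def impute(raw, lod, loq):
    """Replace qualitative flags with nominal concentrations per the rules above."""
    if raw == "ND":       # no peak detected at all
        return 0.0
    if raw == "<LOD":     # peak present but below the limit of detection
        return lod / 2.0
    if raw == "<LOQ":     # detectable but below the limit of quantification
        return (loq - lod) / 2.0
    return float(raw)     # a normally quantified value

# Example data sheet column with mixed quantified values and flags.
raw_column = ["3.2", "<LOQ", "<LOD", "ND"]
values = [impute(v, lod=0.5, loq=1.5) for v in raw_column]
mean = sum(values) / len(values)
```

Whether ([LOQ] - [LOD])/2 or some other nominal value (LOQ/2, (LOQ + LOD)/2) is most appropriate depends on the convention in your field; the point is that one documented rule is applied consistently.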

Re: <LLOQ means zero or no value in data sheet

Posted: Mon Mar 12, 2012 12:17 pm
by lmh
The short answer is that it probably doesn't matter what you do. The explanation, and the longer answers, are:

(1) I find it helpful to remind myself of what a LOQ actually is. It's the threshold below which you feel the error in the measurement is likely to exceed a certain percentage of the value of the measurement (i.e. "if I measure below 1.3 pmol, the typical error is greater than 10%, and this is unacceptable to me, so I do not wish to report it as quantitative"). That means that a value below the LOQ is still a value, just a less precise one. You can therefore include all values below the LOQ in your mean without worrying in the slightest. In fact, ideally you should; otherwise you are shrinking the standard deviation of your samples by pretending that these low-ish results are always exactly reproducible.

(2) A value below the LOD must be very small. Whether you add it or add zero, you will never elevate the mean above the LOD. Such values can pull the mean down towards the LOQ or LOD, but if your mean comes out close to the LOD you will not be reporting it anyway, so the values that are below the LOD cannot influence the actual result in any way. Their contribution to the standard deviation of the samples should also be very small (see below).

(3) If you are calculating a mean for your population, you are almost certainly also calculating a standard deviation or carrying out statistics that imply such calculations (t-tests, ANOVA, etc.). As such, the LOQ and LOD are actually superfluous: you are estimating the error in your mean anyway. The biological variation (inter-sample variation) is probably so much bigger than the analytical error (which is what gives rise to the LOQ and LOD) that the overall s.d. is dominated by genuine inter-sample difference, and the small contribution made by analytical error shrinks to insignificance. The exception is when you get down to situations where all the samples contain virtually nothing, and at this point you have to report that the mean has sunk to a point where you cannot measure it reliably (LOQ) or at all (LOD); but since this is meaningless unless compared to samples where you could measure reliably, again it doesn't matter. I have no problems with a bar graph where one bar is too small to see, but has an error bar that, if only I could see it, would be as large as the bar!

[extra: (4) Yes, some statistical techniques don't like missing-values and lots of zeroes, but in practice this rarely matters. If you are comparing two groups of samples where both have loads of values that have had to be filled in with zeroes, you are probably up to no good. If you are comparing groups where one is full of zeroes and the other is perfectly respectable, then they are clearly so different that a mild misuse of statistics won't be generating a false-positive!]
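For what it's worth, the "it barely matters" claim in (3) is easy to check with a quick simulation: when the inter-sample spread is large compared with the LOQ, the fill-in strategy hardly moves the mean. The distribution and its parameters below are arbitrary assumptions for illustration, not anything from this thread:

```python
import random
import statistics

random.seed(0)
LOQ = 1.0  # assumed limit of quantification (arbitrary units)

# Simulated population with large biological variation; the minority of
# samples falling below the LOQ are recorded as missing (None).
true_values = [random.lognormvariate(1.0, 0.8) for _ in range(50)]
observed = [x if x >= LOQ else None for x in true_values]

def mean_filling_with(fill):
    """Population mean with each BLQ (missing) entry replaced by `fill`."""
    return statistics.mean(fill if x is None else x for x in observed)

mean_zero = mean_filling_with(0.0)   # option 1: replace with 0
mean_loq = mean_filling_with(LOQ)    # option 2: replace with the LOQ
mean_drop = statistics.mean(x for x in observed if x is not None)  # option 3: omit

# The zero-fill and LOQ-fill means can differ by at most LOQ times the
# fraction of BLQ samples, which is small when biological variation dominates.
```

The three strategies always satisfy mean_zero <= mean_loq <= mean_drop, and the gap between the two fill-in versions is bounded by the LOQ itself, so with few BLQ samples the choice is nearly irrelevant.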

Incidentally, I once went to a metabolomics talk (a Manchester meeting of the metabolomics society in the UK I think???) where a PhD student had investigated exhaustively how to handle missing data. She had compared replace-with-zero, use-integrated-baseline-at-right-time, replace-with-detection-threshold-value, and various fuzzy-logic options involving calculating a guess best-estimate from the other samples. She did this by taking a genuine set of measurements and missing out bits. Her conclusion was that it made very little difference whatever one did. I don't know if she went on to publish; very sadly, it's quite likely not, because although the work was very good, it's always hard to get a negative result into print.

Re: <LLOQ means zero or no value in data sheet

Posted: Tue Mar 13, 2012 8:01 am
by Alexandre
Many use ½ of the LOD. Extrapolating the distribution shape to the low end is more accurate, but it does not make a big difference in the end results anyway.

Re: <LLOQ means zero or no value in data sheet

Posted: Tue Mar 20, 2012 8:52 am
by Biomed
Thanks for your time. Very complete and helpful answers.