Calculating Weighted Linear Least Square Regressions

Basic questions from students; resources for projects and reports.

15 posts Page 1 of 1
Hello All:

Please, can anyone describe how one could modify the calculation of linear regression, using a program like Excel for example, to employ weighting factors to variables being correlated? Quite often, software such as ChemStation, Chromeleon, Empower, TotalChrom and other CDS programs calculate fittings for us, but I find that folks ask me if I can provide them with a way to "check" the work of the CDS.

I understand that some of these CDSs are validated by their manufacturers, however that does not stop others from posing the question. I recall that John Dolan wrote a nice column in LC-GC North America back in 2004 or so concerning how/when to apply weighting to linear least squares (and also provided a nice reference within that column), but the math eludes me and I've not seen it explained in a fashion understandable to me anywhere.

I have heard "through the grapevine" of a text edited by Stavros Kromidas and Hans-Joachim Kuss titled "Quantification in LC and GC" that might help to answer my question...does anyone know about this text? It's a hard decision for me to plop down $150 these days while still "in the dark," so-to-speak.

My thanks to you all, in advance.
MattM
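For anyone after the actual equations: the weighted least-squares fit of y = a + bx has the same closed form as the unweighted fit, just with every sum carrying a weight. A minimal sketch in Python (standard library only; the 1/x² weighting shown is one common chromatography choice, and the calibration data here are made up for illustration):

```python
# Weighted linear least squares for y = slope*x + intercept.
# Weights w_i = 1/x_i^2 are one common chromatography choice; 1/x, 1/y,
# and 1/y^2 are used the same way -- only the w list changes.

def weighted_linreg(x, y, w):
    """Return (slope, intercept, r) for the weighted straight-line fit."""
    Sw   = sum(w)
    Swx  = sum(wi * xi for wi, xi in zip(w, x))
    Swy  = sum(wi * yi for wi, yi in zip(w, y))
    Swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Swyy = sum(wi * yi * yi for wi, yi in zip(w, y))
    Swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    d = Sw * Swxx - Swx ** 2
    slope = (Sw * Swxy - Swx * Swy) / d
    intercept = (Swy - slope * Swx) / Sw
    # Weighted correlation coefficient, built from the same sums.
    r = (Sw * Swxy - Swx * Swy) / ((d * (Sw * Swyy - Swy ** 2)) ** 0.5)
    return slope, intercept, r

# Made-up calibration levels with roughly constant relative error:
x = [1.0, 2.0, 5.0, 10.0, 20.0]
y = [1.1, 2.0, 5.2, 9.8, 20.3]
w = [1.0 / xi ** 2 for xi in x]          # 1/x^2 weighting
slope, intercept, r = weighted_linreg(x, y, w)
```

The same six sums can be built as columns in Excel and combined with the formulas above; setting every weight to 1 collapses this to the ordinary least-squares result, which is a handy self-check against SLOPE()/INTERCEPT().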
There are some discussions on computing weighted least squares on the internet. I went looking some time back. It takes a bit of digging, but they are there. The math is a bit more complex than a non-weighted least squares calculation. My need at the time did not justify the effort - so I abandoned it. I don't have references to the web sites, but I can give you the encouragement to look. The information is available for less than $150.
My thanks for your encouragement...I've some other fish to fry at the moment, but the weekend ought to be good for looking again for answers. I guess I didn't try the right searching words...it will be good to get any kind of answer. Not a critical need at the moment, but it will come up again, this weighted linear regression fitting issue, I'm afraid.
MattM
If you have access to SAS, you can use it to check your chromatography software calculations.
All standard disclaimers apply. My posts are my opinions only and do not necessarily reflect the policies of my employer.
Hi Mary,

Good Suggestion...sadly, I do not have access to SAS or other reasonable statistical software such as JMP or SigmaPlot. I'd be thinking of writing a program in Excel, if I had more of an idea of the equations to enter.
MattM
Search Tom Jupille's posts on the subject in this very forum.
Thanks,
DR
My thanks, DR, that will be a good start. I didn't see the Google search option at first under the "Archives" link above, and the list of hyperlinks was intimidating to see. Looks like Dolan's 2004 article is in the archives, in part, with a bit more explanation than in the original column (after HPLC 2006 presentation), and another work includes the equations I was missing.

Your Help Is Appreciated--my thanks to All Responders!

Just curious...no one saw the text edited by Kuss, H-J and Kromidas, S., Quantification in LC and GC? Not essential for what I am looking for, but it seems an interesting text.
MattM
Can I just make a comment on the "just checking" of how the chromatography data system works?

There are two ways of looking at almost anything in HPLC: "We did the right thing, and therefore the answer must be right" and "We got the right answer, so the procedure must have been right". A lot of people are quite wary of the 2nd, and justifiably so, but calibration curves are a place where it makes sense.

Provided the calibration curve is in the right place, it doesn't actually matter how it got there. What you want to know is whether your measurements are right. For that reason, you have one or more QC samples containing known amounts of analyte. If these give the right answer, the calibration curve must be in the right place (to within the precision you expected of your samples). Whether you got it in the right place by the correct algorithm, or by training next door's dog to use a ruler, doesn't actually matter. The QC result is proof that it worked (assuming, of course, that your QC sample(s) is/are appropriately chosen to match the range of real samples).

If we're going to start just-checking that manufacturers have got their curve-fitting right, we should also be checking that their integrator is correct - and then we'll quickly realise that the choice of start/end points in a peak, and what we do about baseline, is often more critical than anything else.

I'm all for checking things are correct, but the quality of the curve-fitter in my CDS is probably the smallest worry on my radar. It's one of the few things whose practical effectiveness I can see at a glance (does the cali curve actually go through the points?), and it's probably the least likely thing in the lab to go wrong.
Hi lmh,

Your commentary is noted...however, please recall my second statement...far, far above in this thread.

"Please, can anyone describe how one could modify the calculation of linear regression, using a program like Excel for example, to employ weighting factors to variables being correlated? Quite often, software such as ChemStation, Chromeleon, Empower, TotalChrom and other CDS programs calculate fittings for us, but I find that folks ask me if I can provide them with a way to "check" the work of the CDS."

I was/am not the one that was/is particularly interested in " 'check(ing)' the work of the CDS"--that would be my management; former management, actually. My point is simply that your comments are misdirected: my supposition is that the folks who were asking me the question over and over again are not actually reading what you have taken the time and thought to convey.

As to checking into what the software does...as far as curve-fitting, or how a peak is defined...it is up to whomever is involved to decide how much or how little they want to understand about it, it seems to me. Some are more curious than others...

It is possible, though, to have wonderful calibration curves....and when the samples are measured, the data returned Don't Make Any Sense...at least, in my experience. This certainly Can Happen in cases where the calibration standards Fail to Represent the Samples in that the matrices of the two are different enough to make such a difference.

All of this said, quite agreed, I tend myself to worry little about calculations a CDS performs, and I rather do, like you, have better things to be concerned with.
MattM
MattM, please don't think I was blaming you for it! I was merely offering myself as one of the many who I suspect wouldn't mind standing up to be counted: a vote for not worrying about something that isn't worth worrying about.

There are so many things that can go wrong in analysis, it hurts me when I see some QA or regulatory thing heading off at an irrelevant tangent. It's so easy for the wrong sort of person to get terrible tunnel vision about things that might go wrong, and spend their life insisting people check things that don't matter, while ignoring huuuuuge disasters that don't happen to be easily classified.
lmh--you are absolutely right that a CDS manufacturer getting something basic like regression wrong is a very low risk. Pipettors being out of calibration is a far greater risk.

However, Matt is right. Folks do ask for verification, and I know from personal experience that some reviewers will check a calibration curve; some will even go so far as to write an SAS program to generate regression parameters using weighting commonly used in chromatography.

So, the last time I did an instrument qualification, I included a one-time-per-software-installation check of the regression calculations. Imagine my surprise when the R value reported from a weighted regression did NOT match the SAS value, even though the slope and intercept matched exactly.

I would not care that much, because I think residuals are a much better test of linearity than correlation coefficients, but one of the Guidances for Industry that I am following explicitly states a criterion for R^2 as a test of linearity.
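For what it's worth, one plausible source of that kind of R mismatch (I don't know which convention SAS or any given CDS actually uses internally; the names here are my own) is that, after a weighted fit, one package reports a weighted correlation coefficient while another reports the plain unweighted Pearson r of the same data. Both are legitimate numbers; they just answer different questions. A sketch:

```python
# Two defensible "r" values for the same weighted fit. With w=None this
# is the ordinary Pearson r; with weights it is the weighted version.

def pearson_r(x, y, w=None):
    """Correlation coefficient; pass weights for the weighted version."""
    if w is None:
        w = [1.0] * len(x)
    Sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / Sw   # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / Sw   # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / (sxx * syy) ** 0.5

# Made-up calibration data:
x = [1.0, 2.0, 5.0, 10.0, 20.0]
y = [1.1, 2.0, 5.2, 9.8, 20.3]
w = [1.0 / xi ** 2 for xi in x]

r_unweighted = pearson_r(x, y)       # what some packages report
r_weighted = pearson_r(x, y, w)      # what others report for the same fit
```

On this toy data both values exceed 0.999, yet they differ slightly -- enough to fail an exact-match comparison even when slope and intercept agree to the last digit.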
All standard disclaimers apply. My posts are my opinions only and do not necessarily reflect the policies of my employer.
Hi lmh, MaryCarson,

lmh, I've no hard feelings...what you wrote brought back some really nasty memories. Heck--I hope your check valves and Acetonitrile have worked themselves out well! (I think some people in that thread mentioned sonication in mixtures of IPA or ACN/MeOH...sonication of the sticking valves in MeOH generally has worked for me in the past, and at least for Waters HPLCs, ceramic does seem to behave a bit better with ACN than ruby). And heck again, I agree with your notion completely...if the analyst has to be concerned with how the data system works, Wow. Someone will always ask about these types of things, though.

MaryCarson, my thanks to you for letting your experience be known. Sad to hear you've been down the same path as me...actually you've been quite a bit further. Agreed wholeheartedly on the "value" of R^2, it means something, but hardly "everything," and it gets too much weight (pun intended). Agreed wholeheartedly on the value of residuals' level-of-agreement, too.

Anyhow, I am also thankful that the answer to the question is here!! Wish I'd have known more about the Chromatography Forum about four years ago or so.

My thanks to all for putting up with this particular curmudgeon, too! :)
MattM
If an auditor asks for a validation of a manufacturer's curve fitting (or anything else that comes with the instrument software), it might be worth checking whether the instrument companies still submit their software to national or international accreditation bodies for validation; as long as the national body is happy (having looked at the source code or whatever they do), all the accreditors that fall under that body have to be happy too. This was certainly happening several years ago, and seemed to make a lot of sense (unlike the alternative of every lab cross-checking everything with another set of unvalidated software).

Peter
Peter Apps
Mary's response brings back a memory - it was years ago, and calculation of results on some data system did not give the same result as calculation outside the data system. And this was fixed in a later version of the data system.

An accrediting body can only go so far in digging into software to see if it is correct. Reviewing code to see if it will always give the correct result is just about impossible. Running a test set of data through the software is good - as long as the design of the test set will turn up the error. Given that different peak detection routines may give differing areas for a peak (although each correct by its own routine), calibration curves and other values derived from the integrated areas may look to be close enough - until one digs into the data.

For most of us, if a data system gives r-squared values for two curves of 0.873 and 0.950 when the correct values are really 0.870 and 0.949, we will make the correct decision as to which curve to use - and never notice that there was an error in the calculation. We get the best validation for our analytical method and analyze samples.

But it is good that people do check. Vendors do make errors.
Oh, have to agree vendors do have errors. Years ago I found a UV-Vis spec with an enzyme kinetics package that would find the slope of a line of successive measurements from a cuvette (so you could find the rate of appearance of light-absorbing product). It also gave some statistics on the line, including confidence intervals on the slope. This was wrong at 2 levels. Trivially, in an enzyme assay, if someone says 44 +/- 5, I expect the error to be a proper biological replicate, not a mere statistical estimate of how well you can measure the change in one cuvette. But more alarmingly, one day someone took the cuvette out after only two measurements, and there was still a confidence interval, on a line fitted through two points!
I found later (by detailed inspection of data files created from lots of artificial runs) that the software interpolates points between measured points, and therefore the line-fitter actually had 5 points to fit through, two measured and three interpolated, which was why it could do statistics. The error was greater than zero because the interpolation takes account of earlier points (i.e. before adding enzyme substrate) that weren't selected for the region where we're calculating the slope.

I regarded this as quite a serious failing from the manufacturer, but I can understand how it happened: someone decided there would be interpolation, and a programming team wrote it. Another team later wrote the kinetics and line-fitting bit, and they weren't aware that only some points were measured... There were a few fundamental mistakes from the manufacturer: interpolation was probably unjustified; it was fundamentally wrong to save the interpolated points without any indication that they weren't measured (you had to know the instrument to know the circumstances in which it interpolated).

Does anyone know what happens when a manufacturer has their software accredited? Reviewing code is a very difficult task (usually we're trying to understand what the code does, so there's a real temptation, having found what we think it's trying to do, to assume it does it, which is the very thing you mustn't assume if you're checking for errors!). And I agree with Don_Hilton that test-sets are OK but very prone to not finding errors where the test-set just doesn't happen to expose them.

Another thing I've noticed is that terminology can vary a little from manufacturer to manufacturer, which means it's easy to find you are not comparing like with like. Chemstation calls the standard deviation of a regression line something quite different to Excel, and yet they both produce (exactly) the same number.
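As a concrete instance of that naming problem: Excel's STEYX() returns the standard error of the estimate, s = sqrt(Σ(y − ŷ)² / (n − 2)), which other packages may label the "standard deviation of the regression" or "residual standard deviation". Recomputing it from first principles is one way to confirm that two differently-named statistics are in fact the same quantity. A sketch for the unweighted straight-line fit, on made-up numbers:

```python
# Standard error of the estimate for an ordinary least-squares line,
# s = sqrt(sum((y - yhat)^2) / (n - 2)) -- the quantity Excel's STEYX()
# documents itself as computing.

def steyx(x, y):
    """Standard error of the estimate for the OLS line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2
                   for xi, yi in zip(x, y))
    return (ss_resid / (n - 2)) ** 0.5
```

For example, steyx([1, 2, 3], [2, 4, 5]) gives about 0.408, which should match STEYX on the same cells - regardless of what a given CDS chooses to call the number.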