I need help understanding the Precision stage of validation, particularly how many different (known) samples must be tested at each concentration. I apologize for the length of this post, but I'm trying to provide as much context as I can, as concisely as possible. If I failed, there's a TL;DR at the end.

In my lab, we receive blood samples (usually serum or plasma) and analyze them to determine the concentrations of certain pharmaceuticals. We do this using a combination of HPLC and automated immunoassay. Our HPLC methods typically involve removing proteins (either by precipitation or by filtration) and sometimes drying and reconstituting the sample to concentrate the drug in the final extract that gets injected.

One point of contention between me (a lab tech) and our two HPLC experts has been the number of samples used in the precision stage of validation. This could be difficult to describe, but I'll do my best. Hopefully someone here can help me.

When validating a new method, we typically weigh out a certain amount of the target drug, dissolve it in something appropriate (usually water, ethanol, or acetonitrile), and create a "standard curve" and control samples by using that "stock solution" to spike blank plasma at certain concentrations. This is fine for the Linearity and Accuracy parts of the validation, but then we get to the Precision stage.

In all of the literature I've found, I see precision defined as either "the degree of agreement among individual test results when the procedure is applied repeatedly to multiple samplings of a homogeneous sample," or something extremely similar. Some documents suggest something like "a minimum of 9 determinations covering the specified range for the procedure (e.g., 3 concentrations/3 replicates each)."

I think this is one of two potential sources of the confusion.

My interpretation is that "3 replicates each" means that we take the prescribed volume of spiked plasma from one tube, three times (at each of three concentrations, of course, for the total of "9 determinations"). This makes sense to me. We're trying to find the approximate range of results that we could expect from a given sample using the test method, so we test one sample many times. Since that variability may depend on the concentration, we repeat this at multiple concentrations. Seems perfectly reasonable as far as I can tell.

The other (incumbent) interpretation is that we would need to spike a minimum of 3 tubes of blank plasma with the stock solution at each of the 3 levels, yielding a total of 9 vials of spiked plasma (3 vials x 3 concentrations). Each of these 9 samples is then tested 3 times, and we end up with a total of 27 results (or 54, from duplicate injections). In practice, we usually do 6 of everything instead of 3, but that's not the point I'm trying to make.
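To make the head count concrete, here's a quick sketch (just my own illustration, using the numbers from this post, nothing from any guideline) of how many determinations each interpretation produces:

```python
# Quick count of the two designs -- my own illustration, nothing official.
levels = ["low", "mid", "high"]

# My interpretation: one spiked vial per level, 3 aliquots tested from it.
mine = [(lvl, "vial 1", rep) for lvl in levels for rep in range(1, 4)]
print(len(mine))        # 9 determinations

# Incumbent interpretation: 3 spiked vials per level, each tested 3 times.
theirs = [(lvl, f"vial {v}", rep) for lvl in levels
          for v in range(1, 4) for rep in range(1, 4)]
print(len(theirs))      # 27 results
print(len(theirs) * 2)  # 54 with duplicate injections
```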

Another contributing factor, I think, is that our protocol was determined primarily using documents that use the term "sample preparation" without clarifying what they mean by that. Granted, any validation guidelines would have to be generalized in order to apply to the various methods being validated, but I believe it's a source of confusion for us here. When I see phrases like "3 preparations of each sample," I think the authors mean we process each sample 3 times (i.e. 3 samplings, each of which undergoes the process of protein removal, etc., to "prepare" it for HPLC injection). However, others have interpreted the meaning as "prepare 3 samples of each concentration," which results in spiking 3 separate vials of plasma for each concentration.

Aside from the obvious additional effort it takes when processing additional samples, my biggest problem with their interpretation is this:
If the different "samples" we're testing at the same concentration actually are the same (that is, we miraculously spiked each vial with exactly the same concentration), then why would we need to spike the additional vials at all instead of simply sampling from the same one? And if (through human error) they are not exactly the same concentration, then how is this actually measuring the precision of the method? It seems to me like we're introducing artificial imprecision: if a hypothetical method has a precision of ±0.00000001, but the actual concentrations of our 3 "preparations" (spiked vials) have a range of, say, 0.4, then won't we end up reporting a spread that's far worse than the method's actual ±0.00000001?
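To show what I mean, here's a rough simulation with entirely made-up numbers (not our actual method or data), just to illustrate the variance argument: if each spiked vial carries its own spiking error on top of a very precise method, the SD computed from measurements spread across several vials reflects roughly sqrt(method_sd^2 + spike_sd^2), i.e. mostly the spiking error, not the method's own precision.

```python
# Minimal sketch with hypothetical numbers -- not our actual method or data.
import random
import statistics

random.seed(1)

nominal = 100.0    # hypothetical nominal concentration (arbitrary units)
method_sd = 0.01   # hypothetical analytical (method) SD -- very precise
spike_sd = 2.0     # hypothetical vial-to-vial spiking SD -- much sloppier

# Interpretation 1: nine replicate measurements drawn from ONE spiked vial.
one_vial = nominal + random.gauss(0, spike_sd)
results_one = [one_vial + random.gauss(0, method_sd) for _ in range(9)]

# Interpretation 2: three separately spiked vials, three measurements each.
results_three = []
for _ in range(3):
    vial = nominal + random.gauss(0, spike_sd)
    results_three += [vial + random.gauss(0, method_sd) for _ in range(3)]

print("SD, one vial (method only):        ", statistics.stdev(results_one))
print("SD, three vials (method + spiking):", statistics.stdev(results_three))
# In expectation, the second SD is dominated by spike_sd: it mostly tells you
# how reproducibly the vials were spiked, not how precise the assay is.
```

With numbers like those, the one-vial SD comes out near method_sd, while the three-vial SD comes out orders of magnitude larger, so the "precision" we'd report would mostly be measuring the spiking step rather than the assay.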

I'm asking you guys because this seems like a question you could easily answer if you've ever done a method validation. My review of the literature (and I've searched pretty extensively) leads me to think I'm right, especially when I see many of these documents use the phrase "of the same homogeneous sample." But when some of them add "or similar samples," I think there's enough room for doubt that I should make sure I'm in the right before I challenge the established protocols.

EDIT: I should add that when I questioned this practice before, I was told that we have to do it this way to account for the human error incurred when spiking the plasma. That is, "We're validating the whole process, and that includes everything the operator does. There's going to be error in that, and we have to count that in the method validation. We have to prove that you can spike plasma with precision, because that's part of the validation process. It's in the GLP documentation, we have to do it." I remain unconvinced, but I want to make sure I'm not totally wrong here before I challenge the notion too openly.

TL;DR: When validating the precision of an assay, do you simply test one spiked blood sample multiple times at each concentration, or do you spike multiple blood samples at each concentration and test each of those?