Statistical Methods for Setting Acceptance Criteria on Parallelism in Bioassays Daniel Joelsson Senior Research Biologist Merck & Co., Inc. West Point, PA Parallelism Acceptance Criteria • Setting acceptance criteria on assay parameters is critical for proper interpretation of the results – Bioassays measure both potency and stability of samples • Current methods for setting criteria on parallelism – Rely heavily on the expertise of an experienced biometrician – Often involve judgment calls on what data to include in an analysis • New method using bootstrapping – Involves less subjectivity – Rugged and statistically defensible criteria Potency Assays and Parallelism Potency is a shift in the X-axis compared to a reference standard 120 100 80 60 Y Data 40 20 0 0 2 4 6 8 10 12 X Data • • Strictly speaking, curves must be the same exact shape for potency to have meaning This is usually called parallelism, but more accurately known as similarity Parallelism in the real world • Bioassays have high variability • Replication commonly performed to control the mean • Curves are samples from all possible curves 120 120 100 100 80 80 60 60 Y Data Y Data 40 20 40 20 0 0 0 2 4 6 8 10 12 -4 -2 0 2 X Data 4 6 X Data Are these curves parallel? How do we know? 8 10 12 14 Measurements of parallelism • • How to measure parallelism? Several different methods available: – Slope ratios – Hypothesis testing • F test and Chi-Square test • – Equivalence testing Each generates a ‘metric’ for parallelism – How parallel are these two curves? • What is the cutoff? – How far off do the curves have to be before they are no longer parallel? 120 120 100 100 80 80 60 60 Y Data Y Data 40 40 20 20 0 0 0 2 4 6 X Data 8 10 12 -4 -2 0 2 4 6 X Data 8 10 12 14 Acceptance Criteria • How do we set the cutoff? • Ideally set by clinical data – Not realistic in early development • Assay capability cutoffs – Based on 95% or 99% limits – How do we set these? – Run 2000 or so assays with the standard compared to another replicate of the standard. • “Parallel” by default • Distribution of metric due solely to assay variability Bioassay for a monoclonal antibody project Ligand Serial Dilution Competitive Antibody Receptor R efe rence S tanda rd Sa mpe l Pos/N e g Co n tr ols P Luminescence Readout StatLIA: Potency Calculation Chi-Square parallelism metric Traditional Method: Actual data from 95 runs Histogramof Chi-square values 40 Frequency 30 20 10 0 2 4 6 8 10 12 14 200 0 – Low sample number • 95% cutoff based on only ~ 4 runs 100 Frequency 50 • Eliminated outliers 0 – Some of these runs are probably NOT parallel – Data ‘massaging’ 150 ‘Traditional Method’ has problems: 7 8 9 10 11 12 13 95% percentile values, Chi-square (df = 4) End results = 95% cutoff at 7.28 S imula ted da ta wti h 100 r n u s. a rg L e va riabilit y n i 95%c u t off Standard Curves • • Can we use the data we already have? Many replicates of the standard – StatLIA reference set – 30 representative assays • Why 30? 18000 16000 14000 12000 RLU 10000 8000 6000 4000 2000 0 0 1 2 3 4 5 6 7 8 9 Normalized Data Normalized to a B-zero control (100% signal) 120 100 % RLU 80 60 40 20 0 0 1 2 3 4 5 6 7 8 9 Combining standards 120 100 Sample 2 curves 80 % RLU 120 100 60 % RLU 80 40 60 40 20 20 0 0 1 2 3 4 5 6 7 8 9 0 0 1 2 3 4 5 6 7 8 • Use StatLIA to generate chi-square value • 435 possible combinations of curves 9 Results of combinations Histogram of Chi-square value 100 Frequency 80 60 40 20 0 0 15 30 45 60 Chi-square value 75 90 • Not good – bimodal distribution? • 95% cutoff at ~62 105 Residuals 120 100 80 60 40 20 0 0 2 4 6 8 10 12 Residual = The distance from each data point to the fitted curve Due to the underlying variability of the assay Distribution of residuals Low end of the curve High end of the curve Histogram Histogram Normal Normal 20 M ean StDev N 0.8379 5.356 90 15 Frequency Frequency 15 10 5 0 • • Mean StDev N 20 10 5 -15 -10 -5 0 5 10 15 0 -2 Close to normally distributed Residuals larger at higher signal levels – Need to do weighted regression -1 0 1 2 3 0.2266 1.034 90 Bootstrap strategy Take the mean at each concentration of the 30 reference curves 100 Relative Signal 80 60 40 20 0 0 1 2 3 4 5 6 7 8 9 Concentration Concentration 1 2 3 4 5 6 7 8 Mean 95.52 94.26 88.56 68.55 35.99 19.37 14.39 10.64 Bootstrap strategy Randomly sample (with replacement) 3 of the 90 residuals at each concentration and add to the mean to make a standard curve 120 100 Relative Signal 80 60 40 20 0 0 1 2 3 4 5 6 7 8 9 Concentration Concentration 1 2 3 4 5 6 7 8 Residual 1 9.46 -1.43 3.38 0.92 1.22 1.59 2.36 0.08 Residual 2 2.28 2.76 -6.56 6.75 1.58 0.50 -0.28 0.13 Residual 3 -5.50 0.02 -8.73 2.04 -4.89 -1.88 1.62 0.54 Bootstrap strategy • Using this strategy we can generate 8 x 10^(46) possible standard curves • That’s 4 x 10^(46) pairs of curves! • Each pair generates one chi-square metric • We generated 2000 of those possible values – Same mean function = parallel curves by default – Differences are due to assay variability Results Histogram of Chi-square values 250 Frequency 200 150 100 50 0 0.0 2.5 5.0 7.5 10.0 12.5 Chi-square value 15.0 17.5 (Ov e ra l y: Chi-squ a re wit h 4d f ) 95t h percentile is at 7.499 *Estimated cutoff based on 95 runs = 7.28 *Estimated chi-square distribution cutoff = 9.483 Is the mean appropriate? “Low” mean “High” mean Histogram of Low mean 350 350 300 300 250 250 Frequency Frequency Histogram of High mean 200 150 200 150 100 100 50 50 0 0 2 4 6 8 Chi-square value 10 95% cutoff = 5.932 12 0 0 4 8 12 16 20 Chi-square value 95% cutoff = 11.21 Wider distribution with a low mean function. Weighting plays a role. 24 Sampling the mean function Added in an extra step – a new mean for each pair of curves Randomly selected one of the 30 original curves as the mean Histogram of Chi-square value 500 Frequency 400 300 200 100 0 0 4 8 12 16 Chi-square value 20 24 28 Distribution shifted slightly to the right 95t h percentile moved to 8.701 from 7.499 Can we use less information? Bootstrapping 20 curves Bootstrapping 10 curves Histogram of Chi-square value Histogram of Chi-square value 300 300 250 200 200 Frequency Frequency 250 150 150 100 100 50 50 0 0 3 6 9 12 Chi-square value 15 18 95% cutoff = 7.017 21 0 0 3 6 9 12 Chi-square value 15 18 95% cutoff = 7.683 21 Summary 95 actual runs of the assay 2000 bootstrapped runs Histogram of Chi-square value Histogram of Chi-square values 500 40 400 Frequency Frequency 30 20 10 0 300 200 100 0 2 4 6 8 10 12 14 95% cutoff = 7.28 Questionable data in the tail of the distribution. 0 0 4 8 12 16 Chi-square value 20 24 95% cutoff = 8.701 Consequence – would fail 7.7% of assays instead of 5% 28 Conclusions Bootstrapping to set a parallelism acceptance criterion has several advantages over traditional methods: • • • Eliminates the need for multiple assays comparing the reference standard to itself Eliminates the subjectivity involved with selecting the appropriate runs to include in the analysis Based on empirically observed data – no simulation necessary • • Can be used to set acceptance criteria very early in development (10 curves) and refined as more data is available Repeatable and defensible Acknowledgements Yue Wang - Senior Biometrician Don Warakomski Todd Ranheim Scott Barmat
© Copyright 2026 Paperzz