Statistical Analysis of TI GHG Stack Data
Joel Dobson, TI
Wednesday, Nov 30, 2011
Objectives
• The objective of this presentation is to provide an in-depth
description of the rigorous statistical methods that have
been used to evaluate the TI stack emissions and gas usage
data.
• The objective of this statistical analysis has been to extract
the maximum amount of information from the data sets
regarding the relationship between gas usage and emissions.
• However, this same level of analysis is neither appropriate
nor, as shown in the following charts, even possible for all
data sets.
An Outline of this Presentation
1. Review our main point and the earlier work from the July 27th
presentation at TI.
2. Show that some GHG bottles have appreciable pressure drop (or weight
change) across the day of FTIR testing, while others do not.
3. Review our data structure and its implications for analysis.
4. Compare, statistically, two stacks’ emission vs usage correlations in
two time frames. (Four simple linear regressions.)
5. Check for correlation of stack concentration (FTIR signal) vs. a
production metric (wafer moves per hour) in the GHG process steps.
6. Present our conclusions.
Table of Contents
• Our main point
• A brief review from July
• Bottle pressure graphs vs time
• Explaining our data structure
• Comparing stacks and timeframes
• Check for correlation vs wafer moves
• Conclusions
Key Takeaways
• A comparison of the slopes of CF4 FTIR vs CF4 pressure deltas
shows them to be similar across the two time frames for each stack’s data.
• We tested our hypothesis that Greenhouse Gas FTIR signals
(emissions) track Greenhouse Gas Usage.
• We can estimate the emission factor by dividing total
emissions by total usage, across the same time frame.
Key Takeaways
• We showed that some GHG weight scale changes are far smaller
than others over a single day. For those, the usage may need
to be taken across a longer period of time.
• Gas usage is a better indicator of emissions within the quarter
hours of a day than wafer moves. We obtained wafer moves
per time period from the GHG processes only. These time
series plots do not align to the FTIR time series plots,
regardless of lag applied.
What’s our point?
• Our point is: We can estimate the emission factor by dividing total
emissions by total usage, across the same time frame.
• We do not need the sophisticated statistical analysis to estimate our
emissions factors; we can use a quotient.
• The point of the emissions regression study was to demonstrate the
correlation.
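A minimal sketch of the quotient estimate, assuming total emissions and total usage have already been put into consistent units over the same time frame. The function name and the numbers are illustrative only, not TI data.

```python
def emission_factor(total_emissions: float, total_usage: float) -> float:
    """Emission factor as total emissions divided by total usage.

    Both totals must cover the same time frame and use consistent units.
    """
    if total_usage <= 0:
        raise ValueError("total usage must be positive over the chosen time frame")
    return total_emissions / total_usage


# Hypothetical one-day totals (illustrative values only).
ef = emission_factor(total_emissions=12.0, total_usage=100.0)
print(ef)
```

The guard against zero usage reflects the point made later in the deck: for low-usage gases, a longer time frame may be needed before the quotient is meaningful.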
Our main point
Linear regression models prove that:
[X = GHG usage] is a good predictor for [Y = GHG emissions].
Let’s call this our ‘key relationship’.
The most salient point is that Y tracks X.
Variability in X is tracked by variability in Y.
Yes, indeed there is much variability! We agree.
We can more easily get data for X than for Y.
A brief review from our July presentation.
We studied Y = quarter-hour average of the instantaneous CF4 FTIR readings (taken about every 7 minutes)
as a function of X = quarter-hour average of CF4 usage (read every 5 minutes). To align the data,
we created quarter-hour averages of the X and Y readings, so each point is
from one quarter-hour. These readings were not accumulated
on either X or Y. Our latest analysis update repeats these non-accumulated
analyses and adds new accumulated analyses. The fit had an R squared of ~50%.
This is from the South Stack (“Stack 2”) of Fab 1 in March.
An improvement over our earlier analysis
• When
Y = m*X + b
then
delta(Y) = m * delta(X) + C
We would expect the value of C to be zero.
• In our analyses in July, we had permitted C to be other than zero.
The statistical analyses confirmed that C was statistically different from zero.
We suspect this is because our 5-minute data is more poorly resolved
than if we had first accumulated it across the day and then analyzed it.
We will talk a lot about this accumulation idea in what follows.
• Analytically, maybe we should have been doing zero-intercept regressions!
In other words, maybe we should have locked the point (0,0) for
the best-fit line and then let the slope vary to best fit the data.
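A zero-intercept regression of the kind described above could be sketched as follows. This assumes NumPy is available. The data are made up and the function names are hypothetical. For the model Y = m*X with the line locked at (0,0), the least-squares slope is sum(X*Y) / sum(X*X).

```python
import numpy as np


def slope_through_origin(x, y):
    """Least-squares slope for the model y = m * x (intercept fixed at 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(x * y) / np.sum(x * x))


def slope_and_intercept(x, y):
    """Ordinary least-squares fit of y = m * x + b, for comparison."""
    m, b = np.polyfit(x, y, deg=1)
    return float(m), float(b)


# Made-up accumulated readings that start at (0, 0).
x = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
y = 0.12 * x  # exactly proportional, so both fits agree

m0 = slope_through_origin(x, y)
m1, b1 = slope_and_intercept(x, y)
print(m0, m1, b1)
```

With real data the two slopes will differ slightly. The difference shrinks when the fitted intercept is small relative to the span of Y.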
Dobson 214 882 1369
Trends of GHG pressures
• Two gases, CF4 and NF3, have appreciable
pressure drop across the 24-hour testing period.
• They are from gas bottles that are inside our
building at a controlled temperature.
• These bottles are much smaller than the type one
would store on a truck trailer.
• For our smaller tanks, the pressure drop across a single
day is a much larger percentage of total pressure than
it would be for a larger cylinder.
CF4 bottle pressure trends
[Figure: March, Fab 1. CF4 cylinder pressures vs minutes in the day.]
[Figure: August, Fab 1. CF4 cylinder pressures vs minutes in the day.]
Each trellis panel is a separate bottle.
NF3 pressures
[Figure: March, Fab 1. NF3 cylinder pressures vs minutes in the day.]
[Figure: August, Fab 1. NF3 cylinder pressures vs minutes in the day.]
Each trellis panel is a separate bottle.
Other Gases ---- C4F8, August, Fab 1
All trends are flat or show only a few steps along the entire path.
Quite a contrast when compared to NF3 or CF4.
Each trellis panel is a separate bottle.
Other Gases ---- C5F8, August, Fab 1
All trends are flat or show only a few steps along the entire path.
Quite a contrast when compared to NF3 or CF4.
Each trellis panel is a separate bottle.
These few might show promise if we divide total emissions by total usage.
Other Gases ---- CH2F2, August, Fab 1
All trends are flat or show only a few steps along the entire path.
Quite a contrast when compared to NF3 or CF4.
Each trellis panel is a separate bottle.
This one has a few steps, though.
Other Gases ---- CHF3, August, Fab 1
All trends are flat or show only a few steps along the entire path.
Quite a contrast when compared to NF3 or CF4.
Each trellis panel is a separate bottle.
Other Gases ---- SF6, August, Fab 1
All trends are flat or show only a few steps along the entire path.
Quite a contrast when compared to NF3 or CF4.
These few might show promise if we divide total emissions by total usage.
Each trellis panel is a separate bottle.
Other Gases?
• Two gases, CF4 and NF3, have appreciable pressure drop across the 24-hour
testing period.
• The other GHGs we measured did NOT show much weight change across a day.
– Some only show weight measurement steps a few times a day.
– The scale may not be able to resolve the weight drop well enough for our data needs.
– Gases with no or low usage do not have enough data for statistical analysis.
• We were NOT able to build good models for their emissions vs usage, using our
simple regression approach.
Explaining our data structure
• Our regression analysis assumes we have gas usages and gas emissions in
the same, aligned time-intervals.
– We have bottle pressures every 5 minutes.
– FTIR readings are irregularly spaced, typically 6 to 9 minutes apart.
• This is neither good nor bad. We must take care when analyzing.
• We will use an interpolation method.
• We discussed this in July as well. We had averaged the data in quarter
hour intervals.
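The slides do not say which interpolation method was used. As one possible sketch, assuming simple linear interpolation with NumPy, irregular FTIR readings can be placed onto the regular 5-minute pressure grid like this. All times and signal values below are made up.

```python
import numpy as np

# Hypothetical FTIR reading times (minutes since start), irregularly spaced.
ftir_minutes = np.array([0.0, 7.0, 13.0, 22.0, 28.0])
ftir_signal = np.array([1.0, 1.7, 2.3, 3.2, 3.8])  # illustrative values

# Pressure readings arrive every 5 minutes.
pressure_minutes = np.arange(0.0, 30.0, 5.0)  # 0, 5, ..., 25

# Linear interpolation of the FTIR signal onto the 5-minute grid.
ftir_on_grid = np.interp(pressure_minutes, ftir_minutes, ftir_signal)
print(ftir_on_grid)
```

After this step, X (pressure) and Y (FTIR) share the same time stamps and can be paired point by point.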
Our raw-data time-stamps misalign.
Fab Tower  Start       Stop        Minutes  N bottle  N FTIR  Ratio  Min/FTIR
Fab1 N     3/11 11:09  3/12 11:09  1440     288       159     1.811  9.057
Fab1 S     3/14 17:49  3/15 17:53  1444     288       211     1.365  6.844
Fab1 S     8/1 13:03   8/2 12:45   1422     284       240     1.183  5.925
Fab1 N     8/2 17:29   8/3 15:35   1386     277       190     1.458  7.295
Note the difference between the count of bottle readings and the count
of FTIR readings. What is not shown here is that the FTIR data
occur at irregularly spaced time intervals. This makes analysis harder.
Nomenclature: “North Stack” = “Stack 1”
and “South Stack” = “Stack 2”
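The two derived columns can be reproduced from the counts in the table. This sketch reads "ratio" as bottle readings per FTIR reading and "Min/FTIR" as average minutes per FTIR reading. That reading is an interpretation, but it matches the tabulated values. The month labels are inferred from the start dates.

```python
# (label, minutes, n_bottle, n_ftir) copied from the table above.
runs = [
    ("Fab1 N, March", 1440, 288, 159),
    ("Fab1 S, March", 1444, 288, 211),
    ("Fab1 S, August", 1422, 284, 240),
    ("Fab1 N, August", 1386, 277, 190),
]

derived = []
for label, minutes, n_bottle, n_ftir in runs:
    ratio = n_bottle / n_ftir        # bottle readings per FTIR reading
    min_per_ftir = minutes / n_ftir  # average minutes per FTIR reading
    derived.append((label, round(ratio, 3), round(min_per_ftir, 3)))
    print(label, round(ratio, 3), round(min_per_ftir, 3))
```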
Illustrating the misalignment of time stamps.
[Figure: FTIR and pressure reading times, with the interpolated FTIR value marked.]
FTIR intervals vary.
Pressure intervals do not vary; they are every 5 minutes.
What is different in our new analysis?
• First, we have interpolated the FTIR readings into the 5 minute intervals
from the pressure readings.
• Second, we have accumulated both the FTIR and the pressure drop across
the day.
• Because averages are appropriately scaled sums, scaled by 1/N, and
because N is the same for Y and X here, our overall regression slope will be
very similar to the average value of Y divided by the average value of X. This
helps explain why a zero-intercept model might be reasonable, and why the
intercept is not very important if we start off X and Y both at zero.
• The July presentation used non-accumulated data.
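The accumulation step and its link to the quotient can be illustrated with a small sketch. It assumes NumPy and uses made-up per-interval values. Both series are accumulated from zero, a line is fitted to the accumulated points, and the slope is compared with total Y divided by total X.

```python
import numpy as np

# Illustrative 5-minute increments (not TI data): usage per interval and
# emissions per interval, with emissions roughly proportional to usage.
usage_step = np.array([2.0, 0.0, 3.0, 1.0, 4.0, 2.0])
emission_step = np.array([0.5, 0.1, 0.6, 0.3, 0.9, 0.6])

# Accumulate both series across the day, starting ("tared") at zero.
x = np.concatenate(([0.0], np.cumsum(usage_step)))
y = np.concatenate(([0.0], np.cumsum(emission_step)))

# Regression slope of accumulated emissions on accumulated usage.
slope, intercept = np.polyfit(x, y, deg=1)

# Quotient estimate: total emissions divided by total usage.
quotient = y[-1] / x[-1]
print(slope, intercept, quotient)
```

In this toy example the fitted slope and the quotient are close but not identical. How close they are on real data depends on how proportional the two series are.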
A simple analogy
• Suppose we have a 3 foot by 48 foot sidewalk.
• Suppose it is laid in twelve 3x4 blocks.
• We could use a 50-foot tape measure to measure each of the
12 blocks, each with its own measurement error.
• We could sum up the 12 lengths and combine the 12
errors by Pythagoras's rule (root-sum-of-squares).
• But we would do better by pulling out the full length of
the metal tape and making one measurement,
incurring only one measurement error.
• The point of this ----> Accumulated measurements are
better.
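The arithmetic behind the analogy can be sketched as follows. It assumes the 12 block measurements have independent errors of equal size; the size itself is a hypothetical value. Independent errors combine as the square root of the sum of squares, so 12 readings carry about sqrt(12) times the error of a single reading.

```python
import math

sigma = 0.125  # assumed error of a single tape reading, in inches (hypothetical)
n_blocks = 12

# Twelve separate readings: independent errors add in quadrature.
error_of_sum = math.sqrt(n_blocks * sigma**2)

# One reading of the full 48-foot length: a single error.
error_of_single = sigma

print(error_of_sum, error_of_single, error_of_sum / error_of_single)
```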
• We next show timeline graphs of the FTIR signals
for CF4 and NF3.
– Two separate stacks:
• North stack = “Stack 1”
• South stack = “Stack 2”
– Two separate months, March & August.
• That makes 4 timeline graphs.
Timeline Graphs FTIR from Fab 1
[Figure panels: March, S Stack (“Stack 2”); March, N Stack (“Stack 1”); August, S Stack (“Stack 2”); August, N Stack (“Stack 1”).]
Blue is NF3 FTIR and red is CF4 FTIR. Gaps show where the FTIR was not read
or where the gas was not detected (calibration? spike test?). We interpolated the FTIR
readings into 5-minute intervals to align them with the pressure drop readings. NF3 FTIR
has non-detects in S Stack, especially in August. This will be important later on.
CF4
• Simple linear regression fit for:
Y = CF4 FTIR accumulated across 24 hours
on
X = CF4 bottle pressure drop across 24 hours
For each of two stacks in each of two months.
We will show four best-fitted lines.
CF4 Accumulated Studies, Fab 1
Why are we using pressure on our X-axis? The data provided was pressure
data. We could just as easily calculate the mass of CF4 gas used using the
ideal gas law. Our cylinders are inside at controlled temperature. Regardless
of the units of usage, the slopes are proportional to emission factors. We can
later multiply by an appropriate scaling factor that will put the slopes into
more meaningful physical units. As stated earlier, the intercept is not of much
concern since we can ‘tare’ the X and Y readings at time zero.
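A minimal sketch of the ideal-gas conversion mentioned above, from pressure drop to mass of gas used. The cylinder volume, temperature and pressure drop are hypothetical values; the slides do not give them. The function name is also an assumption.

```python
R = 8.314462618  # J/(mol*K), molar gas constant
M_CF4 = 88.0     # g/mol, approximate molar mass of CF4


def mass_used_grams(pressure_drop_pa, cylinder_volume_m3, temperature_k,
                    molar_mass_g_per_mol=M_CF4):
    """Mass of gas withdrawn, from a pressure drop at fixed volume and temperature.

    Ideal gas law: delta_n = delta_P * V / (R * T); mass = delta_n * M.
    """
    moles = pressure_drop_pa * cylinder_volume_m3 / (R * temperature_k)
    return moles * molar_mass_g_per_mol


# Hypothetical cylinder: 0.040 m^3 internal volume at 295 K, 100 kPa drop.
print(mass_used_grams(100_000.0, 0.040, 295.0))
```

At fixed volume and temperature the mass is proportional to the pressure drop, which is why the regression slope changes only by a constant scaling factor when the X units change.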
CF4 Accumulated Studies, Fab 1
This looks like good agreement in the two time frames.
Remember that North Stack is sometimes called “Stack One”
and South Stack called “Stack 2” in some of our presentations.
Parameter Estimates
CF4 studies in Fab 1

Group            Term            Estimate    Std Error  t Ratio  Prob>|t|  P02.5    P97.5
N Stack, March   Intercept       -0.733936   0.039488   -18.59   <.0001*
                 CF4_AccumPDrop   0.1240551  0.000175   708.28   <.0001*   0.12371  0.1244
                 R squared = 0.99943, count = 288
N Stack, August  Intercept        0.59687    0.049365    12.09   <.0001*
                 CF4_AccumPDrop   0.1171406  0.000197   593.96   <.0001*   0.11675  0.11753
                 R squared = 0.999249, count = 267
S Stack, March   Intercept       -2.250313   0.148034   -15.2    <.0001*
                 CF4_AccumPDrop   0.317801   0.000568   559.34   <.0001*   0.31669  0.31891
                 R squared = 0.999083, count = 289
S Stack, August  Intercept       -0.096082   0.05623     -1.71   0.089
                 CF4_AccumPDrop   0.3569387  0.000238   1500.9   <.0001*   0.35647  0.35741
                 R squared = 0.999907, count = 212

N Stack: slopes differ by 5%.
S Stack: slopes differ by 12%.
Slopes in N stack are both 0.12, while those for S stack are 0.32 and 0.36.
Though the slopes' 95% confidence intervals do not overlap, we can choose either
the larger one or their average to use when estimating our emissions factor.
Though the slopes slightly differ in the 2 timeframes, we find the
observed agreement phenomenal, strikingly cogent.
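The slope comparison can be checked with a short sketch using the slope estimates and standard errors reported for the four CF4 fits. Two choices here are assumptions: the 95% intervals use a normal approximation (estimate ± 1.96 standard errors), and the percent difference is taken relative to the March slope. The results can be compared with the 5% and 12% figures quoted for the two stacks.

```python
# Slope estimates and standard errors from the four CF4 fits.
fits = {
    ("N", "March"): (0.1240551, 0.000175),
    ("N", "August"): (0.1171406, 0.000197),
    ("S", "March"): (0.317801, 0.000568),
    ("S", "August"): (0.3569387, 0.000238),
}


def ci95(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval (normal approximation)."""
    return estimate - z * std_error, estimate + z * std_error


def intervals_overlap(a, b):
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


results = {}
for stack in ("N", "S"):
    march_est, march_se = fits[(stack, "March")]
    august_est, august_se = fits[(stack, "August")]
    overlap = intervals_overlap(ci95(march_est, march_se),
                                ci95(august_est, august_se))
    # Percent difference relative to the March slope (one possible convention).
    rel_diff_pct = 100.0 * abs(august_est - march_est) / march_est
    results[stack] = (rel_diff_pct, overlap)
    print(stack, round(rel_diff_pct, 1), overlap)
```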
What’s our point?
• Although the slopes in N stack (“Stack 1”) are statistically distinguishable
in the two time frames, March and August, they appear phenomenally
similar based on inspection.
– We can use the steeper (larger) slope for our estimate.
– The same is true for S stack. (“Stack 2”)
• Our point is more subtle: We can estimate the emission factor by
dividing total emissions by total usage, across the same time frame.
• We do not need the sophisticated statistical analysis to estimate our
emissions factors; we can use a quotient.
• The point of the FTIR STUDY was to demonstrate the correlation.
• The time frame used to estimate the emissions factor may need to be
chosen based upon pressure gauge or weight scale resolution
considerations.
About those intercepts
• In 3 of our 4 simple regressions, the intercept is NOT zero, statistically.
• The intercepts are estimated to be: { -0.7, +0.6, -2.3, and -0.1}.
These depend on how we define our time zero.
But time zero was chosen arbitrarily.
It is really only the slopes we are interested in.
• Our Y span goes from 0 to 40 for N stack and 0 to 140 for S stack.
These intercepts are a small fraction of our Y span.
• Although the intercepts from 3 of our 4 best-fit lines are statistically
distinguishable from zero, these intercepts are overall inconsequential.
We could use a zero-intercept regression and force the line through the
origin at (0,0). That seems reasonable.
NF3
For the South Stack (“Stack 2”) in each of two months:
First we start off with the timeline plots.
Accumulated pressure drops are easy to explain: we just read the
gauge over time.
Analogously, accumulated FTIR concentration values can be
mathematically transformed into total mass emissions over the
period of the test by multiplying by an appropriate conversion
factor.
That conversion would multiply the FTIR concentration in
ppm by volume by the appropriate time interval and by the stack
volumetric flow rate.
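One way the conversion described above could look, as a sketch only. It assumes readings on a regular time grid, a constant stack flow, ideal-gas behavior and default stack conditions of 101,325 Pa and 293.15 K. None of these values come from the slides, and the function name is hypothetical.

```python
R = 8.314462618  # J/(mol*K), molar gas constant


def emitted_mass_grams(conc_ppmv, interval_min, flow_m3_per_min,
                       molar_mass_g_per_mol, pressure_pa=101_325.0,
                       temperature_k=293.15):
    """Total mass emitted, from concentration readings on a regular time grid."""
    total = 0.0
    for c in conc_ppmv:
        # Volume of the target gas that passed during this interval.
        gas_volume_m3 = c * 1e-6 * flow_m3_per_min * interval_min
        # Ideal gas law converts that volume to moles at stack conditions.
        moles = pressure_pa * gas_volume_m3 / (R * temperature_k)
        total += moles * molar_mass_g_per_mol
    return total


# Hypothetical NF3 readings (ppmv) every 5 minutes, with a made-up flow rate.
readings = [0.10, 0.12, 0.0, 0.15]
print(emitted_mass_grams(readings, interval_min=5.0, flow_m3_per_min=1000.0,
                         molar_mass_g_per_mol=71.0))
```

Because every factor other than the concentration is a constant here, the accumulated FTIR signal and the accumulated mass differ only by a scaling factor.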
NF3 from Fab1 Stack2 in March
This is for the South Stack of Fab 1, which we sometimes call Stack 2.
The left and right Y-axis scales match in this March graph and
in the August graph on the next page.
NF3 from Fab1 Stack2 in August
The rightmost points do not match only because we kept
the same uniform scales on the previous page and on this page.
NF3
• Simple linear regression fits for:
Y = NF3 FTIR accumulated across 24 hours
on
X = NF3 bottle pressure drop across 24 hours
For the South Stack (“Stack 2”) in each of two months.
We will show two best-fitted lines.
NF3 FTIR regressions, Stack 2, Fab 1
NF3 FTIR. Blue is August and red is March. The software shaded in the
prediction intervals, but they may be hard to see, so we have attempted to
add extra lines manually to show them more clearly. (They are actually
hyperbolae, but the prediction intervals and standard errors are so tight, and
our data so uniformly spread, that they almost look like lines in our graphs.)
We will explain shortly why the slopes are so different!
Summary of linear fits (NF3)
Group      Term       Estimate  Std Error  P02.5  P97.5
03_March   Intercept  -0.514    0.107
           slope       0.337    0.001      0.336  0.338
           RSquare     0.999
           N           291
08_August  Intercept  -4.992    0.302
           slope       0.237    0.002      0.233  0.241
           RSquare     0.983
           N           287
The slopes' 95% confidence intervals do not overlap in the 2 separate time frames. We can
conservatively choose the higher one as the slope to scale up for our emissions factor.
The lower slope in August is largely explained by the presence of far more “ND” (non-detect) readings,
which we had set to zero, in August. The fab was running at a lower level in August. Ideally, we would set the slope
during the more stringent time frame, when the fab is running near full capacity.
Conclusions thus far.
• If the FTIR gauge can resolve the chemical being read, then its readings can
be accumulated over time.
– Non-Detects can make the analysis more difficult.
– CF4 FTIR signal works somewhat better than NF3 FTIR does, but both have
excellent fits.
– The slopes for Y = Accumulated FTIR on X = Accumulated Pressure Drop appear to
be similar.
– We can take our emissions factor from the larger of the two slopes.
Conclusions thus far.
We can estimate the emission factor by dividing total emissions
by total usage, across the same time frame.
– This supports our original proposal, that GHG usage can be used to predict
GHG emissions.
– We do not need the sophisticated statistical analysis to estimate our
emissions factors; we can use a quotient.
– And, it doesn’t have to be a one-day accumulation. Any representative
time period will work.
Time Series for wafer moves.
Per request from EPA in July:
• We next present time series graphs showing:
– Wafer moves per quarter hour in the GHG process
steps.
– Average FTIR signal per quarter hour.
• Silane on next page.
• CF4 the page after that.
• C2F6 the page after that.
• We cannot ‘lag’ these time series to make them
‘align’.
Other possibly correlated measurements.
Red is wafer moves per quarter hour in the CFC process steps.
Blue is the FTIR signal for SiH4.
X-axis is the quarter hour.
We could find no lags that ‘align’ these time series.
Other possibly correlated measurements.
Red is wafer moves per quarter hour in the CFC process steps.
Blue is the FTIR signal for CF4.
X-axis is the quarter hour.
We could find no lags that ‘align’ these time series.
Other possibly correlated measurements.
Red is wafer moves per quarter hour in the CFC process steps.
Blue is the FTIR signal for C2F6.
X-axis is the quarter hour.
We could find no lags that ‘align’ these time series.
Time Series for wafer moves.
• We presented time series graphs showing:
– Wafer moves per quarter hour in the GHG process
steps.
– Average FTIR signal per quarter hour.
• We cannot ‘lag’ these time series to make them
‘align’.
• Emissions correlate to usage but not to wafer
moves.
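The slides do not describe how the lags were searched. One common approach, sketched here under that assumption with NumPy and synthetic data, is to compute the correlation between the two quarter-hour series at each candidate lag and look for a clear peak. The function name is hypothetical, and only non-negative lags are tried (the second series following the first).

```python
import numpy as np


def lagged_correlations(x, y, max_lag):
    """Pearson correlation of x[t] with y[t + lag] for lag = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for lag in range(max_lag + 1):
        xs = x[: len(x) - lag]
        ys = y[lag:]
        out[lag] = float(np.corrcoef(xs, ys)[0, 1])
    return out


# Synthetic check: y is x delayed by 2 quarter-hours, so lag 2 should win.
rng = np.random.default_rng(0)
x = rng.normal(size=96)  # 96 quarter-hours in a day
y = np.roll(x, 2)        # y[t] = x[t - 2], with wrap-around at the ends
corr = lagged_correlations(x, y, max_lag=6)
best_lag = max(corr, key=corr.get)
print(best_lag, corr[best_lag])
```

For series that truly track each other with a delay, one lag stands out. When no lag produces a clear peak, as reported here for wafer moves vs FTIR, the series do not align.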
Conclusions
• A comparison of the slopes of CF4 FTIR vs CF4 pressure deltas
shows them to be similar across the two time frames for each stack’s data.
• We tested our hypothesis that Greenhouse Gas FTIR signals
(emissions) track Greenhouse Gas Usage.
• We can estimate the emission factor by dividing total
emissions by total usage, across the same time frame.
Conclusions
• We showed that some GHG weight scale changes are far smaller
than others over a single day. For those, the usage may need
to be taken across a longer period of time.
• Gas usage is a better indicator of emissions within the quarter
hours of a day than wafer moves. We obtained wafer moves
per time period from the GHG processes only. These time
series plots do not align to the FTIR time series plots,
regardless of lag applied.
What’s our point?
• Our point is: We can estimate the emission factor by dividing total
emissions by total usage, across the same time frame.
• We do not need the sophisticated statistical analysis to estimate our
emissions factors; we can use a quotient.
• The point of the emissions regression study was to demonstrate the
correlation.
Our main point
Linear regression models prove that:
[X = GHG usage] is a good predictor for [Y = GHG emissions].
Let’s call this our ‘key relationship’.
The most salient point is that Y tracks X.
Variability in X is tracked by variability in Y.
Yes, indeed there is much variability! We agree.
We can more easily get data for X than for Y.
Thank you for your time.