Evaluation of Lightning Jumps as a
Predictor of Severe Weather in the
Northeastern United States
Pamela Eck, Brian Tang, and Lance Bosart
University at Albany, SUNY
CSTAR Spring Meeting
Friday, 5 May 2017
Motivation
• Terrain can play an important role in the evolution of severe convection in
the northeastern United States
Great Barrington, MA (1995)
• F4 tornado
• 3 killed, 24 injured
• 11.5 mile track
Springfield, MA (2011)
• EF3 tornado
• 3 killed, 200 injured
Springfield, Massachusetts on 1 June 2011
Mechanicville, NY (1998)
• F3 tornado
• 30.5 mile track
Duanesburg, NY (2014)
• EF3 tornado
• 10-cm-diameter hail
Elevation (m)
Markowski and Dotzek 2011
Background
• Upslope flow + convection = locally enhanced updraft and increased probability of severe weather
[Figure, panels (a)-(c): qr = model rainfall at 1 km (shading), w5km = vertical velocity at 5 km (closed contours); terrain cross section A-B with windward and leeward sides and wind u; vertical axis: Height (m), 0 to 600]
Background
• Lack of surface observations in regions of complex terrain
• Inability of radar beams to sample behind mountains
• Alternative method = sudden increase in total lightning (“lightning jump”)
• A lightning jump is indicative of a strengthening updraft and an increase in the probability of severe weather
Valatie supercell (19 July 2015)
[Figure: time series of flash rates]
• Flash rate increased by 2 standard deviations (σ) in 10 minutes
• Minimum threshold of 10 flashes min⁻¹
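The two criteria on this slide can be sketched in code. The snippet below is a simplified illustration, not the operational algorithm from the literature: it assumes flash rates sampled every 2 minutes (an assumed interval), so that five preceding rate changes span 10 minutes, and it flags a jump when the newest rate change exceeds twice the standard deviation of those preceding changes while the flash rate is at least 10 flashes min⁻¹. The function name, sampling interval, and sample values are hypothetical.

```python
from statistics import pstdev

def detect_jumps(flash_rates, dt_min=2.0, window=5, sigma_level=2.0, min_rate=10.0):
    """Return indices of flash_rates where a simplified jump test is met.

    flash_rates: flash rates (flashes per minute) at a fixed interval.
    dt_min:      assumed sampling interval in minutes.
    window:      number of earlier rate changes used to compute sigma
                 (5 changes x 2 min = 10 minutes).
    """
    # Flash rate change (DFRDT), in flashes per minute squared.
    dfrdt = [(b - a) / dt_min for a, b in zip(flash_rates, flash_rates[1:])]
    jumps = []
    for k in range(window, len(dfrdt)):
        sigma = pstdev(dfrdt[k - window:k])   # spread of the earlier changes
        rate_now = flash_rates[k + 1]         # rate at the end of change k
        if rate_now >= min_rate and dfrdt[k] > sigma_level * sigma:
            jumps.append(k + 1)               # index into flash_rates
    return jumps

rates = [4, 5, 5, 6, 6, 7, 20, 21]   # made-up flash rates, flashes per minute
print(detect_jumps(rates))           # [6]
```

Only the sudden rise to 20 flashes min⁻¹ is flagged; the following sample is not, because the large change just before it inflates the standard deviation.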
Background
• Schultz et al. 2011
• Severe wind producing thunderstorm on 20 June 2000 in western Kansas
[Figure legend: purple = total lightning; red = cloud-to-ground lightning; lightning jumps and wind reports are marked]
Summary of Previous Work
• Using only lightning jumps to predict severe weather…
• High FAR (85%)
• Lightning jumps occur in sub-severe storms
• Implemented upslope filter to eliminate sub-severe storms
• FAR did not drop substantially (80%)
• Introduce random forest algorithm that utilizes pattern recognition rather
than having to rely on strict thresholds and sigma levels
Pattern Recognition
Lightning jumps = flash rate + flash rate change (DFRDT)
For any given time (t₀), use the maximum value from the previous 45 minutes to predict whether or not severe weather will occur!

Predictors (non-binary)
Time (min)                          t₀ − 45   t₀ − 30   t₀ − 15   t₀
Upslope (m s⁻¹)                     0.3       0.3       0.7       0.6
Flash Rate (flashes min⁻¹)          15        10        10        5
Flash Rate Change (flashes min⁻²)   5         6         4         2

Target (binary)
Severe Reports                      0         0         0         0 or 1?

Use non-binary, continuous variables to predict a binary, non-continuous variable
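As a rough illustration of the feature construction in the Pattern Recognition example, the sketch below takes the maximum of each predictor over the previous 45 minutes for a given time t₀, using the example values shown. The 15-minute spacing, the choice to include t₀ itself in the window, the dictionary layout, and the function name are all assumptions made for illustration.

```python
# Example values from the slide, ordered t0-45, t0-30, t0-15, t0.
series = {
    "upslope": [0.3, 0.3, 0.7, 0.6],       # m/s
    "flash_rate": [15, 10, 10, 5],         # flashes per minute
    "flash_rate_change": [5, 6, 4, 2],     # flashes per minute squared
}

def max_features(series, t_index, steps_back=3):
    """Maximum of each predictor from t0-45 min through t0.

    With 15-minute spacing (an assumption), 3 steps back covers 45 minutes.
    Including t0 itself in the window is also an assumption.
    """
    start = max(0, t_index - steps_back)
    return {name: max(values[start:t_index + 1]) for name, values in series.items()}

features = max_features(series, t_index=3)
print(features)  # {'upslope': 0.7, 'flash_rate': 15, 'flash_rate_change': 6}
```

These three maxima form one row of continuous predictors; the matching binary target is whether a severe report occurred at t₀.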
Summary of Previous Work
• Using only lightning jumps to predict severe weather…
• High FAR (85%)
• Lightning jumps occur in sub-severe storms
• Implemented upslope filter to eliminate sub-severe storms
• FAR did not drop substantially (80%)
• Introduce random forest algorithm that utilizes pattern recognition rather
than having to rely on strict thresholds and sigma levels
1. Prove that lightning and upslope are actually correlated
• POD = 84%, FAR = 29%
• Verifies the findings of Markowski and Dotzek 2011
2. NEW! Use lightning and upslope to predict severe weather
Methodology
Spatial Domain:
• New England (CT, MA, RI, VT, ME, NH), New York, Pennsylvania
• 8-km resolution grid spacing (GOES LMA)
Temporal Domain:
• July 2015 (1, 9, 14, 18, 19, 24, 26, 28)
Lightning Data:
• National Lightning Detection Network (NLDN)
• Total lightning = intracloud (IC) and cloud-to-ground (CG)
• Lightning jumps = flash rate & flash rate change
• Flash rate = flashes min⁻¹
• Flash rate change (DFRDT) = flashes min⁻²
Methodology
Severe Reports Data:
• Storm Prediction Center (SPC)
• Wind, hail, and tornado
• Severe weather day = 12Z–12Z
Upslope Data:
• High Resolution Rapid Refresh (HRRR)
• Upslope (Λ) = v ∙ ∇zs > 0
• v = u & v component of the 80-m wind
• ∇zs = gradient of terrain height
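The upslope definition can be sketched numerically. The example below is a minimal illustration, assuming a regular grid, centered finite differences for the terrain gradient, and that negative (downslope) values are set to zero; none of these implementation details come from the slides, and the function name and test values are hypothetical.

```python
def upslope(u, v, zs, dx, dy):
    """Upslope = v . grad(zs) on interior grid points, clipped at zero.

    u, v:   wind components (m/s) as 2-D lists indexed [row j][column i].
    zs:     terrain height (m) on the same grid.
    dx, dy: grid spacing (m) in the x (column) and y (row) directions.
    """
    ny, nx = len(zs), len(zs[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dzdx = (zs[j][i + 1] - zs[j][i - 1]) / (2 * dx)   # centered difference
            dzdy = (zs[j + 1][i] - zs[j - 1][i]) / (2 * dy)
            lam = u[j][i] * dzdx + v[j][i] * dzdy
            out[j][i] = max(lam, 0.0)   # keep only upslope (positive) flow
    return out

# Made-up case: terrain rising 100 m per 8-km grid cell toward the east,
# with a 10 m/s westerly wind blowing up the slope.
zs = [[0.0, 100.0, 200.0] for _ in range(3)]
u = [[10.0] * 3 for _ in range(3)]
v = [[0.0] * 3 for _ in range(3)]
lam = upslope(u, v, zs, dx=8000.0, dy=8000.0)
print(lam[1][1])   # about 0.125 m/s
```

Reversing the wind to easterly gives a negative dot product, which this sketch reports as 0 (no upslope).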
Now, put all data into the random forest… but what is a random forest???
• An ensemble learning method for classification that operates by
constructing a multitude of decision trees
• Think of the trees as deterministic models and the forest as an
ensemble…
Let’s look at a fictitious example of
how a decision tree works…
Decision Tree
Dataset is broken into two parts:
• 2/3 is for training
• 1/3 is for testing
Training
• Nodes partition using best split
• Variables are weighted differently based on importance
Testing
• Each tree “votes” for a class… The forest chooses the classification with the most votes
• How well did this tree do? Calculate verification metrics!
[Figure: decision tree. Root node: 8 non-severe | 5 severe, split successively on Upslope, Flash Rate, and Flash Rate Change]
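The split and the voting step can be sketched as follows. This is only an illustration of the idea: the "trees" are hypothetical single-threshold rules with invented thresholds rather than trees fitted to data, and the helper names are assumptions.

```python
import random
from collections import Counter

def split_train_test(samples, train_fraction=2 / 3, seed=0):
    """Shuffle the samples and split them into training and testing sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def forest_vote(trees, sample):
    """Each tree votes for a class; return the class with the most votes."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained trees: each is a single threshold test.
# The thresholds are invented for illustration, not fitted to any data.
trees = [
    lambda s: "severe" if s["upslope"] > 0.5 else "non-severe",
    lambda s: "severe" if s["flash_rate"] > 12 else "non-severe",
    lambda s: "severe" if s["flash_rate_change"] > 5 else "non-severe",
]

sample = {"upslope": 0.7, "flash_rate": 15, "flash_rate_change": 6}
print(forest_vote(trees, sample))      # severe (all three trees agree)

train, test = split_train_test(list(range(12)))
print(len(train), len(test))           # 8 4
```

Using an odd number of trees avoids tied votes in a two-class problem.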
Decision Tree (fictitious example)
[Figure: decision tree. Root node: 8 non-severe | 5 severe. Splits on Upslope, Flash Rate, and Flash Rate Change give leaves of 3|1, 2|0, 2|0, and 1|4 (non-severe | severe)]

                        ACTUAL Severe            ACTUAL Non-Severe
PREDICTED Severe        Hit (A) = 4              False Alarm (B) = 1
PREDICTED Non-Severe    Miss (C) = 1+0+0 = 1     Correct Null (D) = 3+2+2 = 7

FAR = B / ( A + B ) = 20%
POD = A / ( A + C ) = 80%
This was a pretty good example! Now let’s try it with some real data…
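The arithmetic of the fictitious example can be checked with a short script. It assumes that each leaf predicts its majority class, which is consistent with the counts on the slide (leaves 3|1, 2|0, 2|0, and 1|4 as non-severe | severe); the variable names are illustrative.

```python
# Leaves of the fictitious tree as (non_severe, severe) sample counts.
leaves = [(3, 1), (2, 0), (2, 0), (1, 4)]

hits = false_alarms = misses = correct_nulls = 0
for non_severe, severe in leaves:
    if severe > non_severe:          # leaf predicts "severe"
        hits += severe
        false_alarms += non_severe
    else:                            # leaf predicts "non-severe"
        misses += severe
        correct_nulls += non_severe

far = false_alarms / (hits + false_alarms)   # FAR = B / (A + B)
pod = hits / (hits + misses)                 # POD = A / (A + C)
print(hits, false_alarms, misses, correct_nulls)   # 4 1 1 7
print(f"FAR = {far:.0%}, POD = {pod:.0%}")         # FAR = 20%, POD = 80%
```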
Decision Tree (real data)
[Figure: decision tree. Root node: 2888 non-severe | 281 severe. Leaf counts after splitting on Upslope, Flash Rate, and Flash Rate Change shown as ?|?]
Hit (A) = ?, False Alarm (B) = ?, Miss (C) = ?, Correct Null (D) = ?
FAR = ???
POD = ???
Results
[Figure: verification metrics; vertical axis 0% to 100%]
After 1000 runs we found…
Verification:
• FAR = 28%
• POD = 82%
Very promising result!!!
Results
Accuracy = 96% (0% = worst, 100% = best)
• Each label must correctly predict each sample
• Accounts for true and false negatives and positives
• n = # of samples, ŷi = predicted, yi = actual
F1 Score (F-measure, balanced F-score) = 0.77 (0 = worst, 1 = best)
• Uses the harmonic mean of precision and recall to assess accuracy
• Does not take true negatives into account
• precision = A / ( A + B )
• recall = A / ( A + C )
Contingency table (predicted vs. actual): A = Hit, B = False Alarm, C = Miss, D = Correct Null
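The reported F1 score can be cross-checked against the verification metrics. Because FAR = B / ( A + B ), precision equals 1 − FAR, and recall is the same quantity as POD. The sketch below also includes accuracy, taken here as the fraction of samples whose predicted label matches the actual label; that is the usual definition and an assumption about the formula shown on the slide. Function names and the short label lists are illustrative.

```python
def f1_from_far_pod(far, pod):
    """F1 score from the false alarm ratio and probability of detection."""
    precision = 1.0 - far     # A / (A + B) = 1 - B / (A + B)
    recall = pod              # A / (A + C)
    return 2 * precision * recall / (precision + recall)

def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# FAR = 28% and POD = 82% are the values reported after 1000 runs.
print(round(f1_from_far_pod(0.28, 0.82), 2))    # 0.77

# Made-up labels: three of four predictions match.
print(accuracy([1, 0, 0, 1], [1, 0, 1, 1]))     # 0.75
```

Accuracy can stay high even when severe events are rare and often missed, because correct nulls dominate the count; F1 ignores correct nulls, which is why both are reported.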
Results
[Figure: variable importance; vertical axis 20% to 50%]
Variable Importance:
• Flash Rate = 30%
• Flash Rate Change = 25%
• Upslope = 45%
How will the importance of lightning data compare to the importance of radar data?
Current Work
• Adding several radar products to the random forest in order to compare
the skill of lightning data to that of radar data
• National Centers for Environmental Information
• NEXRAD Level III Radar Data
1. Maximum Reflectivity
• Base Reflectivity (0.5 Deg)
2. Enhanced Echo Tops
3. Digital (High Resolution) Vertical Integrated Liquid
• The following results are still preliminary, but I think they may spark
some interesting discussion…
Current Work
After 1000 runs we found…
Verification:
FAR = 26%
POD = 84%
Accuracy: 96%
F1 Score: 0.79
Current Work
Variable Importance:
Flash Rate = 25%
Flash Rate Change = 20%
Upslope = 37%
Max dBZ = 18%
In the process of adding…
• Echo Tops
• VIL
• ???
Conclusions
• Lightning jumps can be a valuable tool for diagnosing
severe weather in regions of complex terrain
• High false alarm rates suggest that lightning jumps are
occurring in sub-severe storms
• Random forests provide a useful method for eliminating
minimum thresholds and sigma levels, which helps to
differentiate between severe and sub-severe events
• Current work includes incorporating more radar variables to
compare importance
Thank you!
Markowski, P. M., and N. Dotzek, 2011: A numerical study of the effects of orography on supercells. Atmos. Res., 100, 457-478.
Schultz, C. J., W. A. Petersen, and L. D. Carey, 2011: Lightning and severe weather: A comparison between total and cloud-to-ground lightning trends. Wea. Forecasting, 26, 744-755.
Results: Part II
Accuracy = 96% (0% = worst, 100% = best)
• n = # of samples, ŷi = predicted, yi = actual
Matthews Correlation Coefficient (Phi Coefficient) = 0.75 (-1 = worst, 1 = best)
F1 Score (F-measure, balanced F-score) = 0.77 (0 = worst, 1 = best)
• precision = A / ( A + B )
• recall = A / ( A + C )
Contingency table (predicted vs. actual): A = Hit (TP), B = False Alarm (FP), C = Miss (FN), D = Correct Null (TN)
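For reference, the Matthews correlation coefficient can be computed directly from the four contingency-table counts using its standard formula. The sketch below applies it to the counts of the fictitious decision tree example (A = 4, B = 1, C = 1, D = 7). The reported value of 0.75 cannot be reproduced here because the real contingency counts are not given; the function name is illustrative.

```python
from math import sqrt

def mcc(a, b, c, d):
    """Matthews correlation coefficient from hits (a), false alarms (b),
    misses (c), and correct nulls (d)."""
    denom = sqrt((a + b) * (a + c) * (d + b) * (d + c))
    # Return 0 when a row or column of the table is empty (common convention).
    return (a * d - b * c) / denom if denom else 0.0

print(mcc(4, 1, 1, 7))   # 0.675
```

Unlike F1, this score uses all four cells, so the correct nulls contribute to the result.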