Understanding and predicting extreme wildfires in the contiguous United States

Wildfires are becoming more frequent in parts of the globe, but predicting where and when extreme events occur remains difficult. To explain and predict wildfire extremes across the contiguous United States, we integrate a 30 year wildfire occurrence record with meteorological and housing data in spatiotemporal Bayesian models with spatially varying nonlinear effects. We compared models with different distributions for the number and sizes of large fires. A zero-inflated negative binomial model for fire counts and a lognormal model for burn areas provided the best performance. This model attains 99% interval coverage for the number of fires and 92% coverage for fire sizes over a five-year withheld data set. Dryness and air temperature strongly regulate wildfire risk, with precipitation and housing density playing weaker roles. Statistically, these drivers affect the chance of an extreme wildfire in two ways: by altering fire size distributions, and by altering fire frequency, which influences sampling from the tails of fire size distributions. We conclude that recent extremes that have burned hundreds of thousands of acres should not be surprising, and that the contiguous United States may be on the verge of experiencing even larger (million acre) wildfire extremes.

The lognormal distribution performed best on the test set (Table 2), and captured tail-260 behavior better than other burn area distributions ( Figure 6). The GPD model was too 261 heavy-tailed to adequately capture the pattern in the empirical data, predicting fires far 262 larger than those observed in the training and test sets ( Figure 6). The tapered Pareto 263 11/55 distribution was too light-tailed ( Figure 6). The gamma and Weibull models performed very poorly overall on the test set (Table 2), apparently due to a lack of congruence be-265 tween the shapes of these distributions and the actual burn area distribution. Despite a 266 poor fit to the bulk of the wildfire burn area distribution, both performed adequately in 267 the upper tails ( Figure 6). Hereafter we present results for the lognormal model, which 268 had the highest test set log likelihood and captured tail behavior of the empirical fire size 269 distribution.

270
Relative humidity was the primary driver of expected burn area for a fire event (Figure   271 7A). The first basis vector for mean daily minimum relative humidity was the only coeffi-272 cient with a 95% credible interval that did not include zero (posterior median: 1.66, 95% 273 CI: (0.57 -2.28)). This nonlinear effect can be observed in Figure 7B as an increase in the 274 expected burn area below 20% mean daily minimum humidity. This leads to a seasonality  Overall, 95% posterior predictive interval coverage in the test set for burn areas was 92.3%.

281
The lowest test set coverage was 0%, for the Eastern Great Lakes Lowlands L3 ecoregion, 282 followed by 50%, for the Central California Valley L3 ecoregion, though these ecoregions 283 had just 1 and 2 wildfire events in the test set. When observed fire sizes fell outside the 284 95% prediction interval, 23.3% of wildfires were smaller than predicted, and 76.7% of wild-285 fires were larger than predicted. The largest discrepancy between the actual size of a wild-286 fire and the predicted 97.5% posterior quantile was observed with the Wallow Fire in 2011 287 which burned 563,655 acres, but the predicted upper limit for size was 50,136. We inves-288 tigate this discrepancy further in the case study below. The lognormal burn area model 289 achieved 100% interval coverage in 26 of 66 ecoregions that had wildfire events in the test 290 set, accounting for 29% of the land area of the contiguous U.S.

12/55
By combining the output of the event count and burn area models, we derived posterior 293 prediction intervals for the size of the largest fire in a month for each region (the "burn 294 area maximum"), integrating over uncertainty in the number of fires, as well as the log-295 normal mean and standard deviation for burn area. We evaluated the posterior distri-296 bution for the quantile function of the n-sample maximum of a lognormal distribution 297 (exp(µ + σ √ 2erf −1 (2P 1/n − 1)), where erf −1 is the inverse error function, and P is a prob- predictions differed. We defined the contribution of a variable as the dot product of the 329 elements in the design matrix X corresponding to a particular driver variable (e.g., hu- Here we provide the unnormalized posterior densities for each model. Square brackets rep-759 resent a probability mass or density function. Parameterizations for model likelihoods are 760 provided first, followed by the factorization of the joint distribution, with explicit priors.

761
Poisson wildfire count model 762 We used the following parameterization of the Poisson distribution: where µ is the mean and variance.

45/55
Negative binomial wildfire count model 766 We used the following parameterization of the negative binomial distribution: where µ is the mean, and δ is a dispersion parameter.
We used the following parameterization of the zero-inflated Poisson distribution: where µ is the Poisson mean, and 1 − π is the probability of an extra zero.

772
The unnormalized posterior density of this model is: We used the following parameterization of the zero-inflated negative binomial distribution: where µ is the negative binomial mean, δ is the negative binomial dispersion, and , and 776 1 − π is the probability of an extra zero.

777
The unnormalized posterior density of this model is: We used the following parameterization of the GPD/Lomax distribution: where κ is a shape parameter and σ is a scale parameter.

51/55
Tapered Pareto burn area model 783 We used the following parameterization of the tapered Pareto distribution: where κ is a shape parameter and ν a taper parameter.

52/55
Lognormal burn area model 787 We used the following parameterization of the lognormal distribution: where µ and σ are location and scale parameters, respectively.

53/55
Gamma burn area model 791 We used the following parameterization of the gamma distribution:

54/55
Weibull burn area model 795 We used the following parameterization of the Weibull distribution: where κ is a shape parameter and σ is a scale parameter.