Title: | Bootstrap Analyses of Sampling Uncertainty in Goodness-of-Fit Statistics |
---|---|
Description: | Uses jackknife and bootstrap methods to quantify the sampling uncertainty in goodness-of-fit statistics. Full details are in Clark et al. (2021), "The abuse of popular performance metrics in hydrologic modeling", Water Resources Research, <doi:10.1029/2020WR029001>. |
Authors: | Martyn Clark [aut], Kevin Shook [aut, trl, cre] |
Maintainer: | Kevin Shook |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2024-11-11 03:59:43 UTC |
Source: | https://github.com/cran/gumboot |
Performs jackknife-after-bootstrap analyses of the errors in hydrological models by estimating the empirical probability distributions of the NSE (Nash-Sutcliffe efficiency) and KGE (Kling-Gupta efficiency) estimators.
The package was partly funded by the Global Institute for Water Security (GIWS; https://water.usask.ca/) and the Global Water Futures (GWF; https://gwf.usask.ca/) program.
Coded by: Martyn Clark and Kevin Shook
Maintained by: Kevin Shook
The package code is described in:
Clark et al. (2021), "The abuse of popular performance metrics in hydrologic modeling", Water Resources Research, <doi:10.1029/2020WR029001>.
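NSE and KGE are both computed from the paired observed and simulated series. For reference, a minimal R sketch of their standard textbook definitions follows. nse() and kge() are hypothetical helpers written for illustration, not functions exported by gumboot, and the package's internal calculations may differ in detail.

```r
# Hypothetical helpers for illustration; not gumboot functions.

# Nash-Sutcliffe efficiency: 1 minus the ratio of the sum of squared errors
# to the sum of squared deviations of the observations from their mean.
nse <- function(obs, sim) {
  1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)
}

# Kling-Gupta efficiency: 1 minus the Euclidean distance of the correlation (r),
# variability ratio (alpha) and bias ratio (beta) from their ideal value of 1.
kge <- function(obs, sim) {
  r     <- cor(sim, obs)
  alpha <- sd(sim) / sd(obs)
  beta  <- mean(sim) / mean(obs)
  1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
}

obs <- c(1.2, 3.4, 2.2, 5.1, 4.0)
nse(obs, obs)                  # perfect simulation: 1
kge(obs, obs)                  # perfect simulation: 1 (to rounding)
nse(obs, rep(mean(obs), 5))    # simulating the observed mean: 0
```

Both statistics equal 1 for a perfect simulation; NSE falls to 0 when the simulation is no better than the observed mean.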
Bootstrap-jackknife of flow calibration statistics
bootjack( flows, GOF_stat = c("NSE", "KGE"), nSample = 1000, waterYearMonth = 10, startYear = NULL, endYear = NULL, minDays = 100, minYears = 10, returnSamples = FALSE, seed = NULL, bootYearFile = NULL )
Argument | Description |
---|---|
flows | Required. Data frame containing the date, observed and simulated flows. The variable names must be date, obs, and sim, respectively. The … |
GOF_stat | Required. Name(s) of the simulation goodness of fit statistic(s) to be calculated. Currently both NSE and KGE are supported. |
nSample | Required. Number of samples for bootstrapping. |
waterYearMonth | Required. Month of beginning of water year. Default is 10 (October). |
startYear | Optional (default NULL). First year of data to be used. If … |
endYear | Optional (default NULL). Last year of data to be used. If … |
minDays | Required. Minimum number of days per year with valid (i.e. greater than 0) flows. Default is 100. |
minYears | Required. Minimum number of years to be used. Default is 10. |
returnSamples | Optional. Default is FALSE. … |
seed | Optional (default NULL). If … |
bootYearFile | Optional (default NULL). If … |
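The waterYearMonth and minDays arguments imply that daily flows are grouped into water years and that years with too few valid days are screened out. A minimal sketch of that bookkeeping follows, assuming that a water year is labelled by the calendar year in which it ends (the USGS convention) and that only the observed flows are checked for validity. water_year() and valid_years() are hypothetical helpers, and gumboot's own rules may differ.

```r
# Hypothetical helpers; not gumboot functions.

# Label each date with its water year. Months from waterYearMonth onward are
# assigned to the following calendar year; a January start gives calendar years.
water_year <- function(date, waterYearMonth = 10) {
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  yr + as.integer(waterYearMonth > 1 & mo >= waterYearMonth)
}

# Return the water years having at least minDays valid (non-missing, > 0)
# observed flows.
valid_years <- function(flows, waterYearMonth = 10, minDays = 100) {
  wy <- water_year(flows$date, waterYearMonth)
  ok <- !is.na(flows$obs) & flows$obs > 0
  counts <- tapply(ok, wy, sum)          # valid days per water year
  as.integer(names(counts)[counts >= minDays])
}

# Two water years (2001 and 2002) of daily data in the required format
dates <- seq(as.Date("2000-10-01"), as.Date("2002-09-30"), by = "day")
flows <- data.frame(date = dates, obs = 1, sim = 1)
# Zero the first 300 days of water year 2002, leaving 65 valid days
idx <- which(dates >= as.Date("2001-10-01"))[1:300]
flows$obs[idx] <- 0
valid_years(flows)   # 2001
```

A minYears check would then compare the number of retained years against that threshold before any resampling is attempted.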
Returns a data frame containing the goodness of fit statistic name (i.e. NSE and/or KGE) and the following variables:
seJack
standard error of the jackknife
seBoot
standard error of the bootstrap
p05, p50, p95
the 5th, 50th and 95th percentiles of the estimates
score
the jackknife score
biasJack
the bias of the jackknife
biasBoot
the bias of the bootstrap
seJab
the standard error of the jackknife after bootstrap
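The bootstrap columns (seBoot, p05, p50, p95) and seJab can be illustrated with a minimal R sketch. It assumes that whole water years are resampled with replacement, and it forms seJab with the jackknife-after-bootstrap technique: for each water year, the bootstrap standard error is recomputed from only those replicates that never drew that year, and the jackknife standard-error formula is applied to these values. boot_stats() and nse() are hypothetical, and this is not a description of gumboot's code.

```r
# Hypothetical sketch; not gumboot's implementation.
nse <- function(obs, sim) 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)

boot_stats <- function(flows, years, nSample = 1000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  uniq <- unique(years)
  n <- length(uniq)
  picks <- vector("list", nSample)   # year indices drawn in each replicate
  stats <- numeric(nSample)          # NSE of each replicate
  for (b in seq_len(nSample)) {
    picks[[b]] <- sample.int(n, n, replace = TRUE)
    # pool the days of every drawn year (a year drawn twice counts twice)
    idx <- unlist(lapply(uniq[picks[[b]]], function(y) which(years == y)))
    stats[b] <- nse(flows$obs[idx], flows$sim[idx])
  }
  q <- quantile(stats, c(0.05, 0.50, 0.95), names = FALSE)
  # jackknife after bootstrap: SE from the replicates that never used year i
  se_i <- vapply(seq_len(n), function(i) {
    unused <- !vapply(picks, function(p) i %in% p, logical(1))
    sd(stats[unused])
  }, numeric(1))
  c(seBoot = sd(stats), p05 = q[1], p50 = q[2], p95 = q[3],
    seJab = sqrt((n - 1) / n * sum((se_i - mean(se_i))^2)))
}

# Synthetic daily flows for 12 water years (October start)
set.seed(1)
dates <- seq(as.Date("2000-10-01"), as.Date("2012-09-30"), by = "day")
mo <- as.integer(format(dates, "%m"))
years <- as.integer(format(dates, "%Y")) + (mo >= 10)
obs <- exp(sin(2 * pi * as.numeric(dates) / 365.25) + rnorm(length(dates), sd = 0.3))
sim <- obs * exp(rnorm(length(dates), sd = 0.2))   # noisy "simulation"
flows <- data.frame(date = dates, obs = obs, sim = sim)
res <- boot_stats(flows, years, nSample = 200, seed = 42)
res
```

Resampling whole years rather than individual days keeps the seasonal cycle and day-to-day persistence intact within each block, which is why year blocks are assumed here.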
Martyn Clark and Kevin Shook
NSE_stats <- bootjack(flows_1030500, "NSE")
Hydrologic model simulations can be produced using input-response data from the 671 catchments in the CAMELS dataset (Catchment Attributes and MEteorology for Large-sample Studies). Newman et al. (2015) and Addor et al. (2017) provide details on the hydrometeorological and physiographical characteristics of the CAMELS catchments. The CAMELS catchments are those with minimal human disturbance (i.e., minimal land use changes or disturbances, minimal water withdrawals), and are hence almost exclusively smaller, headwater-type catchments (median basin size of 336 km²). The CAMELS data used for the large-domain model simulations are publicly available at the National Center for Atmospheric Research at https://ral.ucar.edu/solutions/products/camels.
CAMELS_bootjack( CAMELS_sites = NULL, NetCDF_file = NULL, sim_var = "kge", GOF_stat = c("NSE", "KGE"), nSample = 1000, waterYearMonth = 10, startYear = NULL, endYear = NULL, minDays = 100, minYears = 10, seed = NULL, bootYearFile = NULL, quiet = FALSE )
Argument | Description |
---|---|
CAMELS_sites | Required. Data frame of CAMELS sites. Must contain a field called hcdn_site. The data frame … |
NetCDF_file | Required. NetCDF file containing CAMELS modelled and gauged flows. |
sim_var | Required (default "kge"). Name of the variable containing simulated flows in … |
GOF_stat | Required. Name(s) of the simulation goodness of fit statistic(s) to be calculated. Currently both NSE and KGE are supported. |
nSample | Required. Number of samples for bootstrapping. |
waterYearMonth | Required. Month of beginning of water year. Default is 10 (October). |
startYear | Optional (default NULL). First year of data to be used. If … |
endYear | Optional (default NULL). Last year of data to be used. If … |
minDays | Required. Minimum number of days per year with valid (i.e. greater than 0) flows. Default is 100. |
minYears | Required. Minimum number of years to be used. Default is 10. |
seed | Optional (default NULL). If … |
bootYearFile | Optional (default NULL). If … |
quiet | Optional (default FALSE). If … |
Returns a data frame containing the following variables:
CAMELS_site
CAMELS site number
lat
CAMELS site latitude
lon
CAMELS site longitude
GOF_stat
Goodness of fit statistic (i.e. NSE or KGE)
seJack
standard error of the jackknife
seBoot
standard error of bootstrap
p05, p50, p95
the 5th, 50th and 95th percentiles of the estimates
score
the jackknife score
biasJack
the bias of the jackknife
biasBoot
the bias of the bootstrap
seJab
the standard error of the jackknife after bootstrap
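The jackknife columns (seJack, biasJack) can be illustrated by leaving out one water year at a time and recomputing the statistic. The R sketch below applies the standard jackknife formulas for the standard error and the bias. jackknife_years() is a hypothetical helper, and treating whole water years as the deleted units is an assumption about how gumboot works.

```r
# Hypothetical sketch; not gumboot's implementation.
# x: data frame of daily values; years: water year of each row;
# stat: function of a data frame returning a single number.
jackknife_years <- function(x, years, stat) {
  uniq <- unique(years)
  n <- length(uniq)
  full <- stat(x)
  # statistic recomputed with each water year left out in turn
  theta_i <- vapply(uniq, function(y) stat(x[years != y, , drop = FALSE]),
                    numeric(1))
  theta_bar <- mean(theta_i)
  c(seJack   = sqrt((n - 1) / n * sum((theta_i - theta_bar)^2)),
    biasJack = (n - 1) * (theta_bar - full))
}

# Known case: for the mean of equal-sized years, the jackknife standard error
# equals sd(yearly means) / sqrt(number of years), and the bias is zero.
set.seed(3)
years <- rep(1:8, each = 30)
x <- data.frame(obs = rnorm(240))
jk <- jackknife_years(x, years, function(d) mean(d$obs))
yearly <- tapply(x$obs, years, mean)
all.equal(unname(jk["seJack"]), sd(yearly) / sqrt(8))   # TRUE
```

For a goodness of fit statistic, stat would instead compute NSE or KGE from d$obs and d$sim.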
Martyn Clark and Kevin Shook
Addor, N., Newman, A., Mizukami, N., and Clark, M. P. (2017). Catchment attributes for large-sample studies. Boulder, CO: UCAR/NCAR. doi:10.5065/D6G73C3Q
Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P. (2017). The CAMELS data set: catchment attributes and meteorology for large-sample studies. Hydrol. Earth Syst. Sci., 21, 5293–5313. doi:10.5194/hess-21-5293-2017
## Not run:
camels <- CAMELS_bootjack(CAMELS_sites = sites, NetCDF_file = "CAMELS_flow.nc")
## End(Not run)
A data frame containing observed and simulated flows for USGS site 1030500
flows_1030500
A data frame with 6940 rows and 3 variables:
Date of flows
Observed flows (m³/s)
Simulated flows (m³/s)
Plots uncertainties in model error estimates
ggplot_estimate_uncertainties(JAB_stats, fill_colour = NULL)
Argument | Description |
---|---|
JAB_stats | Required. Data frame of jackknife after bootstrap statistics for a large number of model runs, as produced by … |
fill_colour | Optional (default NULL). If … |
Returns a ggplot2 object of the plots, faceted by goodness of fit statistic (i.e. NSE/KGE).
The confidence interval (the difference between the 95th and 5th quantiles) and the value of 2 × the bootstrap estimate of the standard error are plotted as lines. The values of 2 × the jackknife estimate of the standard error are plotted filled.
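The three plotted measures of spread can be read directly from the statistics data frame. A minimal base-R sketch of that calculation follows; uncertainty_measures() is a hypothetical helper, it assumes the input has the GOF_stat, p05, p95, seBoot and seJack columns listed for the CAMELS_bootjack output, and the numbers in the example are made up purely for illustration.

```r
# Hypothetical helper; not part of gumboot.
uncertainty_measures <- function(JAB_stats) {
  data.frame(
    GOF_stat   = JAB_stats$GOF_stat,
    ci_width   = JAB_stats$p95 - JAB_stats$p05,  # 5th to 95th percentile range
    two_seBoot = 2 * JAB_stats$seBoot,           # 2 x bootstrap standard error
    two_seJack = 2 * JAB_stats$seJack            # 2 x jackknife standard error
  )
}

# Made-up values for illustration only
toy <- data.frame(GOF_stat = c("NSE", "KGE"),
                  p05 = c(0.61, 0.55), p95 = c(0.82, 0.79),
                  seBoot = c(0.064, 0.071), seJack = c(0.070, 0.068))
uncertainty_measures(toy)
```

For a roughly normal sampling distribution the 5th to 95th percentile range is about 3.3 standard errors wide, so it is expected to exceed the 2 × standard error values.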
Martyn Clark and Kevin Shook
## Not run:
p <- ggplot_estimate_uncertainties(all_stats, "orange")
## End(Not run)
A data frame containing the locations of the USGS Hydro-Climatic Data Network sites for the continental US (CONUS). These are the same sites used by CAMELS (Catchment Attributes and MEteorology for Large-sample Studies).
hcdn_conus_sites
A data frame with 670 rows and 3 variables:
HCDN site number (integer)
Site latitude (decimal degrees)
Site longitude (decimal degrees)
This data set is described in Lins, H. F. (2012). USGS Hydro-climatic data network 2009 (HCDN-2009). U.S. Geological Survey Fact Sheet 2012-3047. Retrieved from https://pubs.usgs.gov/fs/2012/3047/. The data can be downloaded at doi:10.5066/P9HP0WFJ.
Reads simulated and observed values from a CAMELS netCDF file for a single location
read_CAMELS(nc_file, site, obsName = "obs", simName = "kge")
Argument | Description |
---|---|
nc_file | Required. netCDF file to read CAMELS data from. |
site | Required. Site number for which to extract data. |
obsName | Required. Name of the variable containing the observations. Default is "obs". |
simName | Required. Name of the variable containing the simulations. Default is "kge". |
Returns a data frame containing the date, observed and simulated flows. The observed flow variable is named obs, and the simulated flow variable is named sim.
Martyn Clark and Kevin Shook
## Not run:
flows <- read_CAMELS(nc_file = "CAMELS_flow.nc", site = 1030500)
## End(Not run)