GCM data#

Global Climate Models (GCMs) are crucial tools for understanding and predicting the impacts of climate change on various systems. Several sources provide GCM-based projections, each with characteristics suited to different scales and regions. For assessing climate impacts in South Florida, it is essential to use sources that provide data at a resolution high enough to capture local dynamics.

CMIP6-based Multi-model Hydroclimate Projection over the Conterminous US, Version 1.1#

This dataset presents a suite of high-resolution downscaled hydroclimate projections over the conterminous United States (CONUS) based on multiple Global Climate Models (GCMs) selected from the Coupled Model Intercomparison Project Phase 6 (CMIP6). The CMIP6 GCMs are downscaled using either a statistical (DBCCA) or a dynamical (RegCM) downscaling approach against two meteorological reference datasets (Daymet and Livneh). The downscaled precipitation, temperature, and wind speed are then used to drive two calibrated hydrologic models (VIC and PRMS), enabling the simulation of projected future hydrologic responses across the CONUS. Each ensemble member covers the 1980-2019 baseline and 2020-2059 near-future periods under the high-end (SSP585) emission scenario. Using only the DBCCA/Daymet combination, the projections are further extended to the 2060-2099 far-future period and to three additional emission scenarios (SSP370, SSP245, and SSP126). This dataset was formulated to support the SECURE Water Act Section 9505 Assessment for the US Department of Energy (DOE) Water Power Technologies Office (WPTO). For further details, please refer to Kao et al. (2022) and Rastogi et al. (2022).

The data can be accessed at https://hydrosource.ornl.gov/data/datasets/9505v3_1/.
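
To give a feel for the ensemble structure, the sketch below enumerates hypothetical file names following the naming pattern used by the scripts in this section; this is an assumed illustration, not an official index of the archive, and the exact layout on HydroSource may differ.

# Illustrative sketch of the ensemble dimensions; file names follow the
# model_scenario_variant_RegCM_Daymet_VIC4_variable_yr_period.nc pattern
# used by the scripts below (assumed for illustration only).
models = ["ACCESS-CM2", "BCC-CSM2-MR", "CNRM-ESM2-1",
          "MPI-ESM1-2-HR", "MRI-ESM2-0", "NorESM2-MM"]
periods = ["1980_2019", "2020_2059"]  # baseline and near-future
scenario = "ssp585"                   # high-end emission scenario

for model in models:
    # CNRM-ESM2-1 uses a different variant label than the other models
    variant = "r1i1p1f2" if model == "CNRM-ESM2-1" else "r1i1p1f1"
    for period in periods:
        print(f"{model}_{scenario}_{variant}_RegCM_Daymet_VIC4_prcp_yr_{period}.nc")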

Visualization of Baseline and Near-Future Precipitation, Tmax, and Tmin Projections for South Florida#

This script generates precipitation, Tmax, and Tmin maps for South Florida using data from the CMIP6-based CONUS dataset described above. The maps display period-averaged annual precipitation, Tmax, and Tmin for two distinct periods: 1980-2019 (baseline) and 2020-2059 (near-future). The data are extracted from NetCDF files for six global climate models (GCMs) under the SSP5-8.5 scenario. Each model is visualized in a subplot, with a common color scale applied per variable to facilitate comparison. A single color bar represents each variable, and each figure is saved as a high-resolution image.

import xarray as xr
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

# Define Florida's bounding box
florida_bounds = {
    'lon_min': -81.39207,
    'lon_max': -80.09851,
    'lat_min': 25.10831,
    'lat_max': 25.96406
}
#upper_left  -81.39207 25.96406
#upper_right -80.09851 25.96406
#lower_right -80.09851 25.10831 
#lower_left  -81.39207 25.10831 

# Precipitation, Tmax, and Tmin setup (common to all plots)
variables = ['prcp', 'tmax', 'tmin']  # List of variables to loop over

# Model names for the 12 subplots (first six: baseline; last six: near-future)
model_names = [
    'ACCESS-CM2', 'BCC-CSM2-MR', 'CNRM-ESM2-1', 
    'MPI-ESM1-2-HR', 'MRI-ESM2-0', 'NorESM2-MM',
    'ACCESS-CM2', 'BCC-CSM2-MR', 'CNRM-ESM2-1',
    'MPI-ESM1-2-HR', 'MRI-ESM2-0', 'NorESM2-MM'
]

# Units setup based on variables
units = {
    'prcp': 'Precipitation [mm/year]',
    'tmax': 'Tmax [°C]',
    'tmin': 'Tmin [°C]'
}

for variable in variables:
    # Paths to the NetCDF files based on variable
    file_paths = [ 
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/ACCESS-CM2_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_1980_2019.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/BCC-CSM2-MR_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_1980_2019.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/CNRM-ESM2-1_ssp585_r1i1p1f2_RegCM_Daymet_VIC4_{variable}_yr_1980_2019.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/MPI-ESM1-2-HR_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_1980_2019.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/MRI-ESM2-0_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_1980_2019.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/NorESM2-MM_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_1980_2019.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/ACCESS-CM2_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_2020_2059.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/BCC-CSM2-MR_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_2020_2059.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/CNRM-ESM2-1_ssp585_r1i1p1f2_RegCM_Daymet_VIC4_{variable}_yr_2020_2059.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/MPI-ESM1-2-HR_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_2020_2059.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/MRI-ESM2-0_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_2020_2059.nc',
        f'C:/Users/mgebremedhin/Documents/FGCU/Notebook/PCP/NorESM2-MM_ssp585_r1i1p1f1_RegCM_Daymet_VIC4_{variable}_yr_2020_2059.nc'
    ]

    # Set up plotting grid
    fig, axes = plt.subplots(2, 6, figsize=(18, 6), subplot_kw={'projection': ccrs.PlateCarree()})
    axes = axes.flatten()

    data_ranges = []  # To determine min and max across all data

    for idx, file_path in enumerate(file_paths):
        # Open dataset and select the data slice
        ds = xr.open_dataset(file_path, decode_times=False)
        data_slice = ds[variable].sel(
            lon=slice(florida_bounds['lon_min'], florida_bounds['lon_max']),
            lat=slice(florida_bounds['lat_min'], florida_bounds['lat_max'])
        ).mean(dim='time')  # Average over all years in the period
        data_ranges.append((float(data_slice.min()), float(data_slice.max())))

        # Plot the data slice
        im = axes[idx].pcolormesh(
            data_slice['lon'], data_slice['lat'], data_slice,
            cmap='viridis' if variable == 'prcp' else 'coolwarm', shading='auto', transform=ccrs.PlateCarree()
        )
        axes[idx].coastlines()
        axes[idx].add_feature(cfeature.BORDERS, linestyle=':')
        axes[idx].add_feature(cfeature.STATES, linestyle=':', edgecolor='gray')
        axes[idx].set_extent([florida_bounds['lon_min'], florida_bounds['lon_max'],
                              florida_bounds['lat_min'], florida_bounds['lat_max']], crs=ccrs.PlateCarree())
        axes[idx].set_title(f'{variable.upper()} - {model_names[idx]}')

        # Close dataset
        ds.close()

    # Determine common color scale based on combined data range
    vmin = min(r[0] for r in data_ranges)
    vmax = max(r[1] for r in data_ranges)

    # Re-scale all plots to the common color range
    for ax in axes:
        for coll in ax.collections:  # Each axis holds a single pcolormesh
            coll.set_clim(vmin, vmax)

    # Add a single color bar with adjusted fraction and pad
    cax = fig.add_axes([0.92, 0.28, 0.01, 0.58])  # Adjust the position and size of the colorbar
    cbar = fig.colorbar(im, cax=cax, orientation='vertical')
    cbar.set_label(f"{units[variable]}")  # Update the label based on your datasets' units

    # Add common titles for the first six and next six figures
    fig.suptitle(f'{variable.upper()}: baseline (1980-2019)', fontsize=16, y=0.95)
    fig.text(0.5, 0.55, f'{variable.upper()}: near-future (2020-2059) under SSP585', ha='center', va='bottom', fontsize=16)

    # Adjust spacing to narrow the distance between the rows and columns
    plt.subplots_adjust(hspace=0.3, wspace=0.1, bottom=0.25)

    # Save the figure
    plt.savefig(f'{variable.upper()}_Baseline_Near-Future.png', dpi=300, bbox_inches='tight')

    # Display the plot
    plt.show()
(Output figures: PRCP, TMAX, and TMIN maps over South Florida for the baseline and near-future periods.)

Code Summary: Downloading and Averaging NetCDF Files Across Models#

This Python script downloads precipitation (or any other variable) data in NetCDF format for a set of climate models, averages it across the models for each year, and saves the ensemble mean to a specified local directory. The main functions of the script are:

Functions#

fetch_netcdf_from_url(url)#

  • Downloads a NetCDF file from the given URL.

  • Returns the content as a BytesIO object for further processing.

average_prcp_for_year(models, base_url, variable_name, year)#

  • Loops through the list of models and constructs the URL for each model from the base URL, the variant label, and the year.

  • Downloads the corresponding NetCDF file using the fetch_netcdf_from_url function.

  • Opens the file with xarray, selects the data for the requested year, and returns the variable averaged across all models.

Example Usage#

  1. Model Names: A list of climate models (ACCESS-CM2, BCC-CSM2-MR, etc.) to include in the ensemble average.

  2. Base URL: The base URL for accessing the NetCDF files on the server.

  3. Variable Name: The name of the variable (e.g., prcp for precipitation).

  4. Years: The range of years to process (e.g., 2051-2058).

  5. Output Directory: The directory where the averaged NetCDF files will be saved.

The script ensures that the output directory exists and, for each year, downloads the per-model files, averages them across models, and saves the ensemble mean as a single NetCDF file.

Key Notes:#

  • The source files on the server follow the naming pattern model_ssp585_variant_RegCM_Livneh_VIC4_variable_year.nc; the averaged output is saved as year.nc in the specified directory.

import requests
import xarray as xr
import io
import os

def fetch_netcdf_from_url(url):
    """Fetch the NetCDF file from the given URL."""
    response = requests.get(url)
    if response.status_code == 200:
        return io.BytesIO(response.content)
    else:
        raise Exception(f"Failed to download NetCDF file. Status code: {response.status_code}")

def average_prcp_for_year(models, base_url, variable_name, year):
    """Average precipitation data from multiple NetCDF files for a specific year."""
    # List to store precipitation data
    prcp_list = []

    for model in models:
        # Determine variant label
        variant_label = "r1i1p1f2" if model == "CNRM-ESM2-1" else "r1i1p1f1"

        # Construct the URL for the model (RegCM with Livneh; use Daymet instead if desired)
        url = f"{base_url}/{model}_ssp585_{variant_label}_RegCM_Livneh/{variable_name}/{model}_ssp585_{variant_label}_RegCM_Livneh_VIC4_{variable_name}_{year}.nc"
        print(f"Fetching data from: {url}")  # Debugging: Print the URL
        
        # Fetch the NetCDF file
        netcdf_file = fetch_netcdf_from_url(url)

        # Read the NetCDF dataset
        ds = xr.open_dataset(netcdf_file)
        
        # Check if the variable exists
        if variable_name not in ds.variables:
            raise Exception(f"'{variable_name}' variable not found in {url}")

        # Adjust names of lat/lon and time if needed
        lat_name = 'lat'
        lon_name = 'lon'
        time_name = 'time'
        if lat_name not in ds.variables or lon_name not in ds.variables or time_name not in ds.variables:
            raise Exception("Latitude, longitude, or time dimensions not found in the dataset.")
        
        # Filter data for the year
        year_data = ds.sel(
            {time_name: ds[time_name].dt.year == year}
        )

        # Append the prcp data to the list
        prcp_list.append(year_data[variable_name])

    # Compute the average across all models
    prcp_average = xr.concat(prcp_list, dim='model').mean(dim='model')

    return prcp_average

# Define model names, base URL, and other parameters
models = [
    'ACCESS-CM2', 'BCC-CSM2-MR', 'CNRM-ESM2-1',
    'MPI-ESM1-2-HR', 'MRI-ESM2-0', 'NorESM2-MM',
]
base_url = "https://hydroshare.ornl.gov/files/9505/SWA9505V3"
variable_name = "prcp"

# Average precipitation over the selected future years (2051-2058) and save the results
output_dir = "C:/Users/mgebremedhin/OneDrive - Florida Gulf Coast University/FGCU_Mewcha_Data/Livneh/Future"
os.makedirs(output_dir, exist_ok=True)  # Ensure the output directory exists

for year in range(2051, 2059):
    prcp_average = average_prcp_for_year(models, base_url, variable_name, year)
    output_filename = f"{output_dir}/{year}.nc"
    prcp_average.to_netcdf(output_filename)
    print(f"Averaged precipitation data for {year} has been saved to {output_filename}.")

Precipitation Validation#

This Python script performs a statistical comparison between observed and downscaled rainfall data across multiple stations using Q-Q (quantile-quantile) plots. It utilizes Pandas for data handling, NumPy for numerical operations, Matplotlib for visualization, and SciPy & Scikit-Learn for statistical analysis.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error

# Define file paths for all stations
stations = ["MIAMI-FS_R","NP-201", "NP-202", "NP-203", "NP-205", "NP-206", "NP-A13", 
            "NP-FMB", "NP-NESS20", "NP-OT3", "NP-P33", "NP-P34", "NP-P35", "NP-P36", "NP-P38"]
file_paths = {}

for station in stations:
    file_paths[f"Livneh ({station})"] = f"C:/Users/mgebremedhin/Documents/FGCU/BISECT/PostProcessing/RCM_Validation/Livneh/{station}_L_2011_2019.csv"
    file_paths[f"Daymet ({station})"] = f"C:/Users/mgebremedhin/Documents/FGCU/BISECT/PostProcessing/RCM_Validation/Daymet/{station}_D_2011_2019.csv"

# Define columns of interest
models = ['ACCESS-CM2', 'BCC-CSM2-MR', 'CNRM-ESM2-1', 'MPI-ESM1-2-HR', 'MRI-ESM2-0', 'NorESM2-MM']
observed_col = 'Observed'

# Function to load data and drop NaNs
def load_and_clean_data(file_path):
    df = pd.read_csv(file_path)
    df = df.dropna(subset=[observed_col] + models)
    return df

# Load all datasets
dataframes = {name: load_and_clean_data(path) for name, path in file_paths.items()}

# Set up a grid for Q-Q plots
fig, axes = plt.subplots(15, 2, figsize=(15,70))
axes = axes.flatten()

# Loop over datasets and axes
for (title, df), ax in zip(dataframes.items(), axes):
    mean_observed = df[observed_col].mean()  # Compute mean observed rainfall

    for model in models:
        observed = np.sort(df[observed_col])
        modeled = np.sort(df[model])

        # Compute MAE and the model mean; note the MAE compares
        # quantile-sorted values rather than day-by-day pairs
        mae = mean_absolute_error(observed, modeled)
        mean_model = df[model].mean()

        # Compute quantiles
        quantiles = np.linspace(0, 1, len(observed))
        obs_quantiles = np.quantile(observed, quantiles)
        mod_quantiles = np.quantile(modeled, quantiles)

        # Q-Q plot
        ax.scatter(obs_quantiles, mod_quantiles, s=15, alpha=0.8, 
                   label=f"{model} (MAE:{mae:.2f}, µ:{mean_model:.2f} mm)")

    # Plot 1:1 reference line
    max_val = max(df[observed_col].max(), df[models].max().max())  
    ax.plot([0, max_val], [0, max_val], 'r--', label='1:1 Line')

    # Formatting
    ax.set_xlabel("Observed Quantiles", fontsize=12)
    ax.set_ylabel("Modeled Quantiles", fontsize=12)
    ax.set_title(f"{title}", fontsize=12)
    ax.legend(loc="upper left", fontsize=8)
    ax.text(0.5, 0.1, f"Obs µ: {mean_observed:.2f} mm", 
            transform=ax.transAxes, fontsize=10, color='black', fontweight='regular')
    ax.grid(True, linestyle="--", alpha=0.5)

# Adjust layout and show the plot
plt.tight_layout()
plt.savefig("C:/Users/mgebremedhin/Documents/FGCU/BISECT/PostProcessing/RCM_Validation/Figures/Q-Q_Plot.png")
plt.show()
(Output figure: Q-Q plots of observed vs. downscaled rainfall at each station.)
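
Beyond the visual Q-Q comparison, a distribution-level test can complement the plots. The sketch below applies a two-sample Kolmogorov-Smirnov test per station and model, reusing the dataframes, models, and observed_col objects defined above.

from scipy.stats import ks_2samp

# Two-sample KS test per station and model: a small statistic (and large
# p-value) indicates the modeled rainfall distribution is close to the
# observed one.
for title, df in dataframes.items():
    for model in models:
        stat, p_value = ks_2samp(df[observed_col], df[model])
        print(f"{title} | {model}: KS={stat:.3f}, p={p_value:.3g}")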

Combining GeoTIFF Files into a Single .dat File#

This Python script processes multiple GeoTIFF files from a specified directory and combines their data into a single .dat file. The code is designed to handle thousands of TIFF files efficiently and outputs a formatted text file in which each data row is prefixed by a time-step code, following the format required by the BISECT model.

import numpy as np
import rasterio
import os

# Define fixed grid dimensions
cols, rows = 259, 186  # 259 columns × 186 rows
cell_size = 500  # Original grid resolution in meters

# Define new zone resolution
x_res, y_res = 500, 500  # Resolution in meters
zone_cols = cols // (x_res // cell_size)
zone_rows = rows // (y_res // cell_size)
num_zones = zone_cols * zone_rows

# Path to folder containing rainfall raster files
raster_folder = "C:/Users/mgebremedhin/OneDrive - Florida Gulf Coast University/FGCU_Mewcha_Data/Livneh/Model_based/SD/baseline/Tiff_MRI-ESM2-0"

# Get a sorted list of all .tif files
raster_files = sorted([os.path.join(raster_folder, f) for f in os.listdir(raster_folder) if f.endswith('.tif')])

# Output file path
output_file = "C:/Users/mgebremedhin/Documents/FGCU/BISECT/PostProcessing/BISECT_Rainfall_Input/Zones/inputrain_6_hr_SD_baseline500_MRI-ESM2-0_2011_2019.dat"

# Compute zone indices, ensuring bottom-left origin
def get_zone_indices():
    zones = []
    for i in range(zone_rows):  # Start from the bottom row
        for j in range(zone_cols):
            col_start = j * (x_res // cell_size) + 1
            col_end = min((j + 1) * (x_res // cell_size), cols)
            row_start = rows - (i + 1) * (y_res // cell_size) + 1  # Reverse row index
            row_end = min(rows - i * (y_res // cell_size), rows)
            zones.append((col_start, col_end, row_start, row_end))
    return zones

zones = get_zone_indices()

# Compute average rainfall per zone
def get_zone_average(data, col_start, col_end, row_start, row_end):
    zone_data = data[row_start-1:row_end, col_start-1:col_end]
    return np.nanmean(zone_data) if not np.isnan(zone_data).all() else 0

# Open output file for writing
with open(output_file, "w") as file:
    # Write grid cell information in the required format
    for idx, (col_start, col_end, row_start, row_end) in enumerate(zones, start=1):
        file.write(f"{idx}\n1\n{col_start} {col_end} {row_start} {row_end}\n")
    
    # End of header section
    file.write("0\n")
    file.write("1.00," + ",".join(["0."] * len(zones)) + "\n")

    # Process each raster file
    for i, raster_file in enumerate(raster_files, start=1):
        with rasterio.open(raster_file) as src:
            data = src.read(1)  # Read the first (only) band
            data = np.where(data == src.nodata, np.nan, data) / 86400000  # Convert mm/day to m/second
        
        # Generate time step labels
        row_codes = [f"{i + base:.2f}" for base in [0.21, 0.46, 0.71, 0.96]]
        
        # Write the daily rainfall at four 6-hourly time steps (no trailing comma)
        for row_code in row_codes:
            values = [f"{get_zone_average(data, col_start, col_end, row_start, row_end):.2E}" for col_start, col_end, row_start, row_end in zones]
            file.write(f"{row_code}," + ",".join(values) + "\n")

print(f"Rainfall input file for each zone generated: {output_file}")