pygeohydro package

Submodules

pygeohydro.exceptions module

Customized PyGeoHydro exceptions.

exception pygeohydro.exceptions.InvalidInputRange[source]

Bases: ValueError

Exception raised when a function argument is not in the valid range.

exception pygeohydro.exceptions.InvalidInputType(arg, valid_type, example=None)[source]

Bases: Exception

Exception raised when a function argument type is invalid.

Parameters
  • arg (str) – Name of the function argument

  • valid_type (str) – The valid type of the argument

  • example (str, optional) – An example of a valid form of the argument, defaults to None.

exception pygeohydro.exceptions.InvalidInputValue(inp, valid_inputs)[source]

Bases: Exception

Exception raised for invalid input.

Parameters
  • inp (str) – Name of the input parameter

  • valid_inputs (tuple) – List of valid inputs
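As a rough illustration of how an exception with the documented (arg, valid_type, example=None) signature might be used, the sketch below defines a stand-in class and a small validator. The class body, the message wording, and the check_dates helper are assumptions made for illustration; they are not taken from the PyGeoHydro source.

```python
from typing import Optional


class InvalidInputTypeSketch(Exception):
    """Stand-in mirroring the documented (arg, valid_type, example=None) signature."""

    def __init__(self, arg: str, valid_type: str, example: Optional[str] = None) -> None:
        # The message wording is an assumption, not the library's actual text.
        self.message = f"The {arg} argument should be of type {valid_type}"
        if example is not None:
            self.message += f":\n{example}"
        super().__init__(self.message)

    def __str__(self) -> str:
        return self.message


def check_dates(dates) -> tuple:
    """Hypothetical validator: accept only a (start, end) tuple."""
    if not isinstance(dates, tuple) or len(dates) != 2:
        raise InvalidInputTypeSketch("dates", "tuple", "(start, end)")
    return dates


try:
    # A list is rejected because the validator expects a tuple.
    check_dates(["2000-01-01", "2010-12-31"])
except InvalidInputTypeSketch as err:
    caught = str(err)
    print(caught)
```

Carrying the argument name and the expected type in the message lets a caller see which input to fix without reading a traceback.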

pygeohydro.helpers module

Some helper functions for PyGeoHydro.

pygeohydro.helpers.nlcd_helper()[source]

Get legends and properties of the NLCD cover dataset.

Notes

The following references have been used:
pygeohydro.helpers.nwis_errors()[source]

Get error code lookup table for USGS sites that have daily values.

pygeohydro.plot module

Plot hydrological signatures.

Plots include daily, monthly, and annual hydrographs as well as the regime curve (monthly mean) and the flow duration curve.

class pygeohydro.plot.PlotDataType(daily, monthly, annual, mean_monthly, ranked, bar_width, titles, units)[source]

Bases: tuple

Data structure for plotting hydrologic signatures.

annual: pandas.core.frame.DataFrame

Alias for field number 2

bar_width: Dict[str, int]

Alias for field number 5

daily: pandas.core.frame.DataFrame

Alias for field number 0

mean_monthly: pandas.core.frame.DataFrame

Alias for field number 3

monthly: pandas.core.frame.DataFrame

Alias for field number 1

ranked: pandas.core.frame.DataFrame

Alias for field number 4

titles: Dict[str, str]

Alias for field number 6

units: Dict[str, str]

Alias for field number 7
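The "Alias for field number" entries follow from PlotDataType being a named tuple: each field can be read either by name or by position. The sketch below shows this with a stand-in type. PlotDataSketch and its plain-list placeholders are hypothetical, used only so that the example runs without pandas; the real fields hold DataFrames.

```python
from typing import Dict, List, NamedTuple


class PlotDataSketch(NamedTuple):
    """Stand-in with the same field order as the documented PlotDataType."""

    daily: List[float]  # field 0 (a DataFrame in the real class)
    monthly: List[float]  # field 1
    annual: List[float]  # field 2
    mean_monthly: List[float]  # field 3
    ranked: List[float]  # field 4
    bar_width: Dict[str, int]  # field 5
    titles: Dict[str, str]  # field 6
    units: Dict[str, str]  # field 7


data = PlotDataSketch(
    daily=[1.0, 2.0],
    monthly=[1.5],
    annual=[1.5],
    mean_monthly=[1.5],
    ranked=[2.0, 1.0],
    bar_width={"monthly": 30},
    titles={"daily": "Daily"},
    units={"daily": "mm/day"},
)

# Name-based and index-based access refer to the same object.
print(data.annual is data[2])
print(PlotDataSketch._fields.index("units"))
```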

pygeohydro.plot.cover_legends()[source]

Colormap (cmap) and its respective values (norm) for land cover data legends.

pygeohydro.plot.exceedance(daily)[source]

Compute Flow duration (rank, sorted obs).
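A flow duration curve pairs each observation, sorted from largest to smallest, with the percentage of time that flow is equalled or exceeded. The sketch below uses the Weibull plotting position, rank / (n + 1). That formula and the function name exceedance_sketch are assumptions for illustration; the library's own implementation may differ.

```python
from typing import List, Tuple


def exceedance_sketch(flows: List[float]) -> List[Tuple[float, float]]:
    """Return (exceedance probability in percent, flow) pairs, largest flow first."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    # Weibull plotting position: rank / (n + 1), expressed as a percentage.
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]


curve = exceedance_sketch([3.0, 1.0, 4.0, 2.0])
for prob, q in curve:
    print(f"{prob:5.1f}%  {q}")
```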

pygeohydro.plot.prepare_plot_data(daily)[source]

Generate structured data for plotting hydrologic signatures.

Parameters
  • daily (pandas.Series or pandas.DataFrame) – The data to be processed

  • ranked (bool, optional) – Whether to sort the data by rank for plotting flow duration curve, defaults to False.

Returns

A named tuple containing the daily, monthly, annual, mean_monthly, and ranked fields.

Return type

NamedTuple

pygeohydro.plot.signatures(daily, precipitation=None, title=None, title_ypos=1.02, figsize=(14, 13), threshold=0.001, output=None)[source]

Plot hydrological signatures with or without precipitation.

Plots include daily, monthly, and annual hydrographs as well as the regime curve (mean monthly) and the flow duration curve. The input discharges are converted from cms to mm/day based on the watershed area, if provided.

Parameters
  • daily (pd.DataFrame or pd.Series) – The streamflows in mm/day. The column names are used as labels on the plot and the column values should be daily streamflow.

  • precipitation (pd.Series, optional) – Daily precipitation time series in mm/day. If given, the data is plotted on the second x-axis at the top.

  • title (str, optional) – The plot supertitle.

  • title_ypos (float) – The vertical position of the plot title, defaults to 1.02.

  • figsize (tuple, optional) – Width and height of the plot in inches, defaults to (14, 13) inches.

  • threshold (float, optional) – The threshold for cutting off the discharge for the flow duration curve to deal with the log 0 issue, defaults to \(10^{-3}\) mm/day.

  • output (str, optional) – Path to save the plot as png, defaults to None which means the plot is not saved to a file.

pygeohydro.print_versions module

Utility functions for printing version information.

The original script is from xarray

pygeohydro.print_versions.get_sys_info()[source]

Return system information as a dict.

From https://github.com/numpy/numpy/blob/master/setup.py#L64-L89

pygeohydro.print_versions.netcdf_and_hdf5_versions()[source]

Get netcdf and hdf5 versions.

pygeohydro.print_versions.show_versions(file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]

Print the versions of pygeohydro stack and its dependencies.

Parameters

file (file-like, optional) – print to the given file-like object. Defaults to sys.stdout.

pygeohydro.pygeohydro module

Accessing data from the supported databases through their APIs.

class pygeohydro.pygeohydro.NID[source]

Bases: object

Retrieve data from the National Inventory of Dams.

get_attrs(variables)[source]

Get descriptions of the NID variables.

get_codes()[source]

Get the definitions of letter codes in NID database.

get_xlsx()[source]

Get the Excel file that contains the dam data.

class pygeohydro.pygeohydro.NWIS[source]

Bases: object

Access NWIS web service.

get_info(query, expanded=False)[source]

Get NWIS stations by a list of IDs or within a bounding box.

Only stations that record(ed) daily streamflow data are returned. The following columns are included in the dataframe with expanded set to False:

Name                  Description
site_no               Site identification number
station_nm            Site name
site_tp_cd            Site type
dec_lat_va            Decimal latitude
dec_long_va           Decimal longitude
coord_acy_cd          Latitude-longitude accuracy
dec_coord_datum_cd    Decimal latitude-longitude datum
alt_va                Altitude of gage/land surface
alt_acy_va            Altitude accuracy
alt_datum_cd          Altitude datum
huc_cd                Hydrologic unit code
parm_cd               Parameter code
stat_cd               Statistical code
ts_id                 Internal timeseries ID
loc_web_ds            Additional measurement description
medium_grp_cd         Medium group code
parm_grp_cd           Parameter group code
srs_id                SRS ID
access_cd             Access code
begin_date            Begin date
end_date              End date
count_nu              Record count
hcdn_2009             Whether the station is in HCDN-2009

Parameters
  • query (dict) – A dictionary containing a query by IDs or by bounding box. Use the query_byid or query_bybox static methods to generate the queries.

  • expanded (bool, optional) – Whether to get expanded site information, for example, drainage area.

Returns

NWIS stations

Return type

pandas.DataFrame

get_streamflow(station_ids, dates, mmd=False)[source]

Get daily streamflow observations from USGS.

Parameters
  • station_ids (str, list) – The gage ID(s) of the USGS station.

  • dates (tuple) – Start and end dates as a tuple (start, end).

  • mmd (bool) – Convert cms to mm/day based on the contributing drainage area of the stations.

Returns

Streamflow observations in cubic meters per second (cms), or in mm/day if mmd is True.

Return type

pandas.DataFrame
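The mmd option converts discharge to a runoff depth over the contributing drainage area. The unit arithmetic can be sketched as below. The function name cms_to_mmd and its area_km2 parameter are hypothetical and only illustrate the conversion; they are not part of the documented API.

```python
def cms_to_mmd(q_cms: float, area_km2: float) -> float:
    """Convert discharge in m^3/s to runoff depth in mm/day over a drainage area."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    seconds_per_day = 86_400
    m2_per_km2 = 1.0e6
    mm_per_m = 1.0e3
    # Volume per day (m^3) divided by area (m^2) gives depth in m, then scaled to mm.
    return q_cms * seconds_per_day * mm_per_m / (area_km2 * m2_per_km2)


print(cms_to_mmd(10.0, 864.0))
```

The constants collapse to a factor of 86.4 / area_km2, which is a convenient check when converting by hand.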

static query_bybox(bbox)[source]

Generate the geometry keys and values of an ArcGISRESTful query.

static query_byid(ids)[source]

Generate the geometry keys and values of an ArcGISRESTful query.

pygeohydro.pygeohydro.cover_statistics(ds)[source]

Percentages of the categorical NLCD cover data.

Parameters

ds (xarray.Dataset) –

Returns

Statistics of NLCD cover data

Return type

dict
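Computing percentages of categorical cover data amounts to counting cells per class and dividing by the total. A minimal sketch on a flattened list of class codes follows. The function name class_percentages and the sample cells are hypothetical, and the real function operates on an xarray.Dataset rather than a list.

```python
from collections import Counter
from typing import Dict, Iterable


def class_percentages(cells: Iterable[int]) -> Dict[int, float]:
    """Return the percentage of cells that fall in each class code."""
    counts = Counter(cells)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in sorted(counts.items())}


# Hypothetical flattened raster of land cover class codes.
cells = [11, 41, 41, 82, 82, 82, 82, 41]
stats = class_percentages(cells)
print(stats)
```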

pygeohydro.pygeohydro.get_nid()[source]

Get all dams in the US (over 91K) from National Inventory of Dams 2019.

Notes

This function downloads a 25 MB xlsx file and converts it into a GeoDataFrame, so your network speed might be a bottleneck. Another bottleneck is data loading: since the dataset has more than 91K rows, it might take some time for Pandas to load the data into memory.

Returns

A GeoDataFrame containing all the available dams in the database. This dataframe has an attrs property that contains definitions of all the NID variables, including their units. Assuming that nid is the dataframe, you can access this dictionary through nid.attrs. For example, nid.attrs["VOLUME"] returns the definition of the VOLUME column in NID.

Return type

geopandas.GeoDataFrame

pygeohydro.pygeohydro.get_nid_codes()[source]

Get the definitions of letter codes in NID database.

Returns

A multi-index dataframe where the first index is code categories and the second one is letter codes. For example, tables.loc[('Core Type', 'A')] returns Bituminous Concrete.

Return type

pandas.DataFrame

pygeohydro.pygeohydro.interactive_map(bbox)[source]

Generate an interactive map including all USGS stations within a bounding box.

Notes

Only stations that record(ed) daily streamflow data are included.

Parameters

bbox (tuple) – List of corners in this order (west, south, east, north)

Returns

Interactive map within a bounding box.

Return type

folium.Map
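Several functions in this module take a bounding box in (west, south, east, north) order, and swapping longitude and latitude is an easy mistake. The sketch below shows one way to check the order before making a request. The check_bbox helper and the sample coordinates are hypothetical and assume a geographic CRS in degrees; the helper is not part of the documented API.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


def check_bbox(bbox: BBox) -> BBox:
    """Hypothetical check that a bbox is (west, south, east, north) in degrees."""
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    west, south, east, north = bbox
    if not (-180 <= west < east <= 180):
        raise ValueError("expected -180 <= west < east <= 180")
    if not (-90 <= south < north <= 90):
        raise ValueError("expected -90 <= south < north <= 90")
    return bbox


bbox = check_bbox((-69.77, 45.07, -69.31, 45.45))
print(bbox)
```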

pygeohydro.pygeohydro.nlcd(geometry, resolution, years=None, geo_crs='epsg:4326', crs='epsg:4326')[source]

Get data from NLCD database (2016).

Download land use/land cover data from NLCD (2016) database within a given geometry in epsg:4326.

Parameters
  • geometry (Polygon, MultiPolygon, or tuple of length 4) – The geometry or bounding box (west, south, east, north) for extracting the data.

  • resolution (float) – The data resolution in meters. The width and height of the output are computed in pixels based on the geometry bounds and the given resolution.

  • years (dict, optional) – The years for NLCD data as a dictionary, defaults to {'impervious': 2016, 'cover': 2016, 'canopy': 2016}. Set the value of a layer to None to ignore it.

  • geo_crs (str, optional) – The CRS of the input geometry, defaults to epsg:4326.

  • crs (str, optional) – The spatial reference system to be used for requesting the data, defaults to epsg:4326.

Returns

NLCD within a geometry

Return type

xarray.DataArray

pygeohydro.pygeohydro.ssebopeta_bygeom(geometry, dates, geo_crs='epsg:4326')[source]

Get daily actual ET for a region from SSEBop database.

Notes

Since there’s still no web service available for subsetting SSEBop, the data first needs to be downloaded for the requested period then it is masked by the region of interest locally. Therefore, it’s not as fast as other functions and the bottleneck could be the download speed.

Parameters
  • geometry (shapely.geometry.Polygon or tuple) – The geometry for downloading and clipping the data. For a tuple bbox, the order should be (west, south, east, north).

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

  • geo_crs (str, optional) – The CRS of the input geometry, defaults to epsg:4326.

Returns

Daily actual ET within a geometry in mm/day at 1 km resolution

Return type

xarray.DataArray

pygeohydro.pygeohydro.ssebopeta_byloc(coords, dates)[source]

Daily actual ET for a location from SSEBop database in mm/day.

Parameters
  • coords (tuple) – Longitude and latitude of the location of interest as a tuple (lon, lat)

  • dates (tuple or list, optional) – Start and end dates as a tuple (start, end) or a list of years [2001, 2010, …].

Returns

Daily actual ET for a location

Return type

pandas.DataFrame

Module contents