PyleoTUPS User APIs
The following describes the main classes that make up PyleoTUPS. Most users will primarily interact with the functionalities exposed in these classes.
Dataset (pyleotups.core.Dataset.Dataset)
- class pyleotups.core.Dataset.Dataset
A wrapper class for interacting with the NOAA Studies API.
Manages the retrieval, parsing, and aggregation of NOAA study data, and provides methods to access summaries, publications, sites, and external data files.
- BASE_URL
The NOAA API endpoint URL.
- Type:
str
- studies
A mapping from NOAAStudyId to NOAADataset instances.
- Type:
dict
- data_table_index
A mapping from dataTableID to associated study, site, and paleo data.
- Type:
dict
- get_data(dataTableIDs=None, file_urls=None)
Fetch external data for given dataTableIDs or file URLs, perform validations, and attach study and site metadata.
- Parameters:
dataTableIDs (list or str, optional) – One or more NOAA data table IDs.
file_urls (list or str, optional) – One or more file URLs.
- Returns:
A list of DataFrames corresponding to the fetched data.
- Return type:
list of pandas.DataFrame
- Raises:
ValueError – If a data table has no parent study mapping or no file URL, or if the file type is proprietary or unsupported.
Exception – Propagates any exceptions raised by the parser.
Examples
from pyleotups import Dataset
ds = Dataset()
df = ds.search_studies(noaa_id=33213)
dfs = ds.get_data(dataTableIDs="45859")
dfs[0].head()
  | Site | Hole | Core | Type | Section | Section_Depth | Sample_Depth | Age | TEX86H | SST
0 | U1446 | C | 1 | H | 1 | 4.5 | 0.045 | 0.31 | -0.1177 | 30.55
1 | U1446 | C | 1 | H | 1 | 31.5 | 0.315 | 0.66 | -0.1216 | 30.28
2 | U1446 | C | 1 | H | 1 | 61.5 | 0.615 | 1.04 | -0.1183 | 30.51
3 | U1446 | C | 1 | H | 1 | 90.5 | 0.905 | 1.41 | -0.1089 | 31.15
4 | U1446 | C | 1 | H | 1 | 121.5 | 1.215 | 1.80 | -0.1155 | 30.70
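get_data returns a plain list, so when several tables are requested at once it can help to key the results by table ID. The following is a minimal sketch using a hypothetical helper, frames_by_table_id, which is not part of pyleotups. It assumes that the list comes back in the same order as the requested IDs. The documentation does not state this, so the helper at least checks that the lengths agree.

```python
from typing import Dict, Iterable, List


def frames_by_table_id(ds, table_ids: Iterable) -> Dict[str, object]:
    """Fetch several data tables in one call and key the results by dataTableID.

    `ds` is anything with a get_data(dataTableIDs=...) method, such as a
    pyleotups Dataset that already holds the parent studies.
    ASSUMPTION: get_data returns one DataFrame per requested ID, in request order.
    """
    ids: List[str] = [str(t) for t in table_ids]
    frames = ds.get_data(dataTableIDs=ids)
    if len(frames) != len(ids):
        # Guard the order/length assumption instead of silently mispairing tables.
        raise ValueError(f"requested {len(ids)} tables but received {len(frames)}")
    return dict(zip(ids, frames))
```

For example, after ds.search_studies(noaa_id=33213), frames_by_table_id(ds, ["45857", "45859"])["45859"] should hold the same table as dfs[0] in the example, provided the ordering assumption holds.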
- get_funding()
Get a DataFrame of all funding records across loaded studies.
- Returns:
A DataFrame with columns [‘StudyID’, ‘StudyName’, ‘FundingAgency’, ‘FundingGrant’]. Returns an empty DataFrame if no funding is available.
- Return type:
pandas.DataFrame
Examples
from pyleotups import Dataset
ds = Dataset()
dsf = ds.search_studies(noaa_id=33213)
df = ds.get_funding()
df.head()
  | StudyID | StudyName | FundingAgency | FundingGrant
0 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ... | US National Science Foundation | OCE1634774
1 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ... | Japan Society for the Promotion of Science (JSPS) | JPMXS05R2900001
2 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ... | UK Natural Environment Research Council (NERC) | JPMXS05R2900001, 19H05595
3 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ... | United States Geological Survey (USGS) | NE/L002493/1
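As the example shows, a single FundingGrant cell can hold several comma-separated grant numbers. The following is a minimal sketch, assuming rows obtained with df.to_dict("records"), that splits those cells and groups grants by agency. grants_by_agency is a hypothetical helper, not part of the library.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def grants_by_agency(records: Iterable[Mapping]) -> Dict[str, Set[str]]:
    """Group grant numbers by funding agency (hypothetical helper).

    Each record mirrors one row of get_funding(). A FundingGrant cell may
    contain several comma-separated grant numbers, which are split apart.
    """
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        grant_field = rec.get("FundingGrant")
        if not isinstance(grant_field, str):
            continue  # skip missing (None/NaN) grant cells
        for grant in grant_field.split(","):
            grant = grant.strip()
            if grant:
                grouped[rec["FundingAgency"]].add(grant)
    return dict(grouped)


# Two rows copied from the get_funding() example output.
rows = [
    {"FundingAgency": "US National Science Foundation", "FundingGrant": "OCE1634774"},
    {"FundingAgency": "UK Natural Environment Research Council (NERC)",
     "FundingGrant": "JPMXS05R2900001, 19H05595"},
]
result = grants_by_agency(rows)
```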
- get_geo()
Get a DataFrame of site-level geospatial metadata and associated data types from all studies loaded into the Dataset.
- Returns:
A DataFrame with one row per site and columns: [‘StudyID’, ‘SiteID’, ‘SiteName’, ‘LocationName’, ‘Latitude’, ‘Longitude’, ‘MinElevation’, ‘MaxElevation’, ‘DataType’]
- Return type:
pandas.DataFrame
Examples
from pyleotups import Dataset
ds = Dataset()
dsf = ds.search_studies(noaa_id=33213)
df = ds.get_geo()
df.head()
  | StudyID | DataType | SiteID | SiteName | LocationName | Latitude | Longitude | MinElevation | MaxElevation
0 | 33213 | PALEOCEANOGRAPHY | 58697 | IODP U1446 | Ocean>Indian Ocean | 19.083 | 85.733 | -1440 | -1440
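The Latitude and Longitude columns can be passed to mapping tools. The following sketch builds a GeoJSON FeatureCollection from get_geo() rows, assuming the rows were converted with df.to_dict("records"). sites_to_geojson is a hypothetical name. GeoJSON (RFC 7946) lists point coordinates as [longitude, latitude], which is the reverse of the column order here.

```python
import json
from typing import Iterable, Mapping


def sites_to_geojson(rows: Iterable[Mapping]) -> dict:
    """Convert get_geo()-style rows into a GeoJSON FeatureCollection (hypothetical helper)."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [float(row["Longitude"]), float(row["Latitude"])],
            },
            "properties": {
                "StudyID": row["StudyID"],
                "SiteID": row["SiteID"],
                "SiteName": row["SiteName"],
                "DataType": row["DataType"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# The single row from the get_geo() example output.
row = {"StudyID": 33213, "DataType": "PALEOCEANOGRAPHY", "SiteID": 58697,
       "SiteName": "IODP U1446", "LocationName": "Ocean>Indian Ocean",
       "Latitude": 19.083, "Longitude": 85.733,
       "MinElevation": -1440, "MaxElevation": -1440}
geojson_text = json.dumps(sites_to_geojson([row]), indent=2)
```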
- get_publications(save=False, path=None, verbose=False)
Get all publications in both BibTeX and DataFrame formats.
- Parameters:
save (bool, default=False) – If True, save the BibTeX to a .bib file.
path (str or None, optional) – Path to save the .bib file. If None and save=True, saves to ‘bibtex_<timestamp>.bib’.
verbose (bool, default=False) – If True, print the BibTeX content to console.
- Returns:
BibTeX object and DataFrame of publication details.
- Return type:
tuple (pybtex.database.BibliographyData, pandas.DataFrame)
Examples
from pyleotups import Dataset
ds = Dataset()
dsf = ds.search_studies(noaa_id=33213)
bib, df = ds.get_publications()
df.head()
  | Author | Title | Journal | Year | Volume | Number | Pages | Type | DOI | URL | CitationKey | StudyID | StudyName
0 | Clemens, Steven; Yamamoto, Masanobu; Thirumala... | Remote and Local Drivers of Pleistocene South ... | Science Advances | 2021 | 7 | 23 | NaN | publication | 10.1126/sciadv.abg3848 | http://dx.doi.org/10.1126/sciadv.abg3848 | M._Remote_2021_33213 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ...
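The DataFrame half of the result works well for quick reference lists. The following sketch uses a hypothetical helper, short_citation, to turn one row into a one-line citation that links the DOI through the https://doi.org/ resolver. The citation format is only an illustration and is not a pyleotups feature. Missing cells such as Pages appear as NaN, so the sketch skips empty or NaN values.

```python
from typing import Any, Mapping


def _present(value: Any) -> bool:
    """False for None, empty strings, and NaN (NaN is the only value unequal to itself)."""
    return value is not None and value == value and str(value).strip() != ""


def short_citation(pub: Mapping[str, Any]) -> str:
    """Hypothetical helper: one-line citation from a get_publications() DataFrame row."""
    # Authors are ';'-separated "Surname, Given" entries in the Author column.
    authors = [a.strip() for a in str(pub["Author"]).split(";") if a.strip()]
    first = authors[0].split(",")[0] if authors else "Anonymous"
    lead = first + (" et al." if len(authors) > 1 else "")
    text = f"{lead} ({pub['Year']}). {pub['Title']}. {pub['Journal']}"
    if _present(pub.get("Volume")):
        text += f" {pub['Volume']}"
        if _present(pub.get("Number")):
            text += f"({pub['Number']})"
    if _present(pub.get("DOI")):
        text += f". https://doi.org/{pub['DOI']}"
    return text


# Values as displayed in the example output (Author and Title are truncated there).
row = {
    "Author": "Clemens, Steven; Yamamoto, Masanobu; Thirumala...",
    "Title": "Remote and Local Drivers of Pleistocene South ...",
    "Journal": "Science Advances", "Year": 2021, "Volume": 7, "Number": 23,
    "Pages": float("nan"), "DOI": "10.1126/sciadv.abg3848",
}
citation = short_citation(row)
```

With real output, the rows would come from df.to_dict("records").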
- get_sites()
Get a DataFrame of all sites expanded to paleo data files.
- Returns:
A DataFrame with one row per (Site × PaleoData × File).
- Return type:
pandas.DataFrame
- get_summary()
Get a DataFrame summarizing all loaded studies.
- Returns:
A DataFrame with a summary of study metadata and components.
- Return type:
pandas.DataFrame
Examples
from pyleotups import Dataset
ds = Dataset()
df = ds.search_studies(noaa_id=33213)
df.head()
  | StudyID | XMLID | StudyName | DataType | EarliestYearBP | MostRecentYearBP | EarliestYearCE | MostRecentYearCE | StudyNotes | ScienceKeywords | Investigators | Publications | Sites | Funding
0 | 33213 | 74834 | Bay of Bengal, Northeast Indian Margin Stable ... | PALEOCEANOGRAPHY | 1462580 | 280 | -1460630 | 1670 | Provided Keywords: Indian monsoon, South Asian... | None | Steven Clemens, Masanobu Yamamoto, Kaustubh Th... | [{'Author': 'Clemens, Steven; Yamamoto, Masano... | [[{'DataTableID': '45857', 'DataTableName': 'U... | [{'fundingAgency': 'US National Science Founda...
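The CE columns in the summary follow from the BP columns under the usual convention that "present" is 1950 CE. Here, 1950 - 1462580 = -1460630 and 1950 - 280 = 1670. The following minimal sketch reproduces the example values. Whether pyleotups computes these columns in exactly this way is an assumption. No year-zero correction is applied, which matches the numbers shown.

```python
BP_REFERENCE_YEAR_CE = 1950  # "Present" in the BP convention is 1950 CE


def bp_to_ce(year_bp: float) -> float:
    """Convert years Before Present to CE; negative results are BCE (no year-zero correction)."""
    return BP_REFERENCE_YEAR_CE - year_bp


def ce_to_bp(year_ce: float) -> float:
    """Inverse of bp_to_ce."""
    return BP_REFERENCE_YEAR_CE - year_ce
```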
- get_tables()
Get a DataFrame of all data tables across loaded studies, one row per data file, together with its site and study metadata.
- Returns:
A DataFrame with one row per (Site × PaleoData × File).
- Return type:
pandas.DataFrame
Examples
from pyleotups import Dataset
ds = Dataset()
dsf = ds.search_studies(noaa_id=33213)
df = ds.get_tables()
df.head()
  | DataTableID | DataTableName | TimeUnit | FileURL | Variables | FileDescription | TotalFilesAvailable | SiteID | SiteName | LocationName | Latitude | Longitude | MinElevation | MaxElevation | StudyID | StudyName
0 | 45857 | U1446 Benthic Isotopes Clemens2021 | cal yr BP | https://www.ncei.noaa.gov/pub/data/paleo/contr... | [Site, Hole, Type, Section, Core, Section_Dept... | NOAA Template File | 1 | 58697 | IODP U1446 | Ocean>Indian Ocean | 19.083 | 85.733 | -1440 | -1440 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ...
1 | 45858 | U1446 Planktic Isotopes Clemens2021 | cal yr BP | https://www.ncei.noaa.gov/pub/data/paleo/contr... | [Site, Hole, Type, Section, Comment, Core, Sec... | NOAA Template File | 1 | 58697 | IODP U1446 | Ocean>Indian Ocean | 19.083 | 85.733 | -1440 | -1440 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ...
2 | 45859 | U1446 TEX86H_SST Clemens2021 | cal yr BP | https://www.ncei.noaa.gov/pub/data/paleo/contr... | [Site, Hole, Type, Section, Core, Section_Dept... | NOAA Template File | 1 | 58697 | IODP U1446 | Ocean>Indian Ocean | 19.083 | 85.733 | -1440 | -1440 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ...
3 | 45860 | U1446 d18Osw Clemens2021 | cal yr BP | https://www.ncei.noaa.gov/pub/data/paleo/contr... | [Age, SL_Scaled, SL_Scaled_averaged, d18O_G_ru... | NOAA Template File | 1 | 58697 | IODP U1446 | Ocean>Indian Ocean | 19.083 | 85.733 | -1440 | -1440 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ...
4 | 45861 | U1446 LeafWax CarbonIsotope Clemens2021 | cal yr BP | https://www.ncei.noaa.gov/pub/data/paleo/contr... | [d13C_C28, d13C_C30, d13C_C32, d13C_Ave, Site,... | NOAA Template File | 1 | 58697 | IODP U1446 | Ocean>Indian Ocean | 19.083 | 85.733 | -1440 | -1440 | 33213 | Bay of Bengal, Northeast Indian Margin Stable ...
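get_data and get_variables both take values from the DataTableID column, so a common next step is to pick tables by name. The following sketch assumes the rows come from df.to_dict("records"). table_ids_matching is a hypothetical helper that applies a case-insensitive regular expression to DataTableName.

```python
import re
from typing import Iterable, List, Mapping


def table_ids_matching(tables: Iterable[Mapping], pattern: str) -> List[str]:
    """Return DataTableIDs whose DataTableName matches a regex, ignoring case (hypothetical helper)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [str(t["DataTableID"]) for t in tables if rx.search(str(t["DataTableName"]))]


# (DataTableID, DataTableName) pairs from the get_tables() example output.
tables = [
    {"DataTableID": "45857", "DataTableName": "U1446 Benthic Isotopes Clemens2021"},
    {"DataTableID": "45858", "DataTableName": "U1446 Planktic Isotopes Clemens2021"},
    {"DataTableID": "45859", "DataTableName": "U1446 TEX86H_SST Clemens2021"},
    {"DataTableID": "45860", "DataTableName": "U1446 d18Osw Clemens2021"},
    {"DataTableID": "45861", "DataTableName": "U1446 LeafWax CarbonIsotope Clemens2021"},
]
isotope_ids = table_ids_matching(tables, r"isotope")
```

With a live Dataset, the call would be along the lines of ds.get_data(dataTableIDs=table_ids_matching(ds.get_tables().to_dict("records"), "isotope")). Against the example output, this would select the benthic, planktic, and leaf-wax carbon isotope tables.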
- get_variables(dataTableIDs)
Retrieve variable metadata for specified dataTableIDs.
- Parameters:
dataTableIDs (list or str) – One or more NOAA dataTableIDs.
- Returns:
A DataFrame indexed by DataTableID with one row per (file × variable). Includes the full variable metadata, such as cvShortName and cvUnit.
- Return type:
pandas.DataFrame
Examples
from pyleotups import Dataset
ds = Dataset()
dsf = ds.search_studies(noaa_id=33213)
df_var = ds.get_variables(dataTableIDs="45859")
df_var.head()
DataTableID | StudyID | SiteID | FileURL | VariableName | cvDataType | cvWhat | cvMaterial | cvError | cvUnit | cvSeasonality | cvDetail | cvMethod | cvAdditionalInfo | cvFormat | cvShortName
45859 | 33213 | 58697 | https://www.ncei.noaa.gov/pub/data/paleo/contr... | Site | PALEOCEANOGRAPHY | sampling metadata>sample identification | None | None | None | None | None | None | Site identification | Character | Site
45859 | 33213 | 58697 | https://www.ncei.noaa.gov/pub/data/paleo/contr... | Hole | PALEOCEANOGRAPHY | sampling metadata>sample identification | None | None | None | None | None | None | Hole drilled at Site U1446 | Character | Hole
45859 | 33213 | 58697 | https://www.ncei.noaa.gov/pub/data/paleo/contr... | Type | PALEOCEANOGRAPHY | sampling metadata>sample identification | None | None | None | None | None | None | H (9 m hydraulic piston core) F (4.5 m hydrau... | Character | Type
45859 | 33213 | 58697 | https://www.ncei.noaa.gov/pub/data/paleo/contr... | Section | PALEOCEANOGRAPHY | sampling metadata>sample identification | None | None | None | None | None | None | Section number ( 1 through 7 and core catcher ... | Character | Section
45859 | 33213 | 58697 | https://www.ncei.noaa.gov/pub/data/paleo/contr... | Core | PALEOCEANOGRAPHY | sampling metadata>sample identification | None | None | None | None | None | None | Core number | Numeric | Core
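The cvFormat column separates character variables from numeric ones, which helps when deciding which columns of a get_data table to convert. The following minimal sketch uses a hypothetical helper, numeric_variables, on the five rows from the example output. Because df_var is indexed by DataTableID, df_var.to_dict("records") would supply rows of this shape.

```python
from typing import Iterable, List, Mapping


def numeric_variables(variables: Iterable[Mapping]) -> List[str]:
    """List VariableName values whose cvFormat is 'Numeric' (hypothetical helper)."""
    return [v["VariableName"] for v in variables
            if str(v.get("cvFormat", "")).strip().lower() == "numeric"]


# VariableName / cvFormat pairs from the get_variables() example output.
variables = [
    {"VariableName": "Site", "cvFormat": "Character"},
    {"VariableName": "Hole", "cvFormat": "Character"},
    {"VariableName": "Type", "cvFormat": "Character"},
    {"VariableName": "Section", "cvFormat": "Character"},
    {"VariableName": "Core", "cvFormat": "Numeric"},
]
numeric_cols = numeric_variables(variables)
```

This sketch assumes that the VariableName values match the column names returned by get_data, as they do for Site, Hole, and Core in the examples. Under that assumption, the list could be applied with df[cols] = df[cols].apply(pd.to_numeric, errors="coerce").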
- search_studies(xml_id=None, noaa_id=None, data_publisher='NOAA', data_type_id=None, keywords=None, investigators=None, max_lat=None, min_lat=None, max_lon=None, min_lon=None, location=None, publication=None, search_text=None, earliest_year=None, latest_year=None, cv_whats=None, recent=False, limit=100)
Search for NOAA studies using the specified parameters.
At least one parameter must be provided to perform a search. This method interfaces with the NOAA NCEI Paleo Study Search API. Use it to filter studies based on location, investigators, time range, keywords, and more.
- Parameters:
xml_id (str, optional) – Specify the internal XML document ID. Must be an exact match (e.g., ‘1840’).
noaa_id (str, optional) – The unique numeric NOAA Study ID (e.g., ‘13156’).
search_text (str, optional) – General text search across study content. Supports wildcards (%) and logical operators (AND, OR). Examples: ‘younger dryas’, ‘loess AND stratigraphy’
data_publisher (str, default='NOAA') – Data publisher to search. Choose from ‘NOAA’, ‘NEOTOMA’, or ‘PANGAEA’.
data_type_id (str, optional) –
Filter by data type. Use one or more type IDs separated by ‘|’. Available IDs:
1: BOREHOLE, 2: CLIMATE FORCING, 3: CLIMATE RECONSTRUCTIONS, 4: CORALS AND SCLEROSPONGES, 6: HISTORICAL, 7: ICE CORES, 8: INSECT, 9: LAKE LEVELS, 10: LOESS, 11: PALEOCLIMATIC MODELING, 12: FIRE HISTORY, 13: PALEOLIMNOLOGY, 14: PALEOCEANOGRAPHY, 15: PLANT MACROFOSSILS, 16: POLLEN, 17: SPELEOTHEMS, 18: TREE RING, 19: OTHER COLLECTIONS, 20: INSTRUMENTAL, 59: SOFTWARE, 60: REPOSITORY
Example: ‘4|18’
keywords (str, optional) – Use hierarchical terms separated by ‘>’. Separate multiple values using ‘|’. Example: ‘earth science>paleoclimate>paleocean>biomarkers’
investigators (str, optional) – Specify one or more investigator names. Use ‘|’ to separate multiple names. Example: ‘Wahl, E.R.|Vose, R.S.’
max_lat (float, optional) – Upper bound for latitude. Must be between -90 and 90. Example: 90
min_lat (float, optional) – Lower bound for latitude. Must be between -90 and 90. Example: -90
max_lon (float, optional) – Upper bound for longitude. Must be between -180 and 180. Example: 180
min_lon (float, optional) – Lower bound for longitude. Must be between -180 and 180. Example: -180
location (str, optional) – Use region hierarchy separated by ‘>’. Example: ‘Continent>Africa>Eastern Africa>Zambia’
publication (str, optional) – Match against publication metadata such as title, author, or citation. Example: ‘Khider’
earliest_year (int, optional) – Starting year (can be negative for BCE). Used with timeFormat and timeMethod. Example: -500
latest_year (int, optional) – Ending year. Used with timeFormat and timeMethod. Example: 2020
cv_whats (str, optional) – Search using controlled vocabulary terms for measured variables, given as a hierarchical string separated by ‘>’. Example: ‘chemical composition>compound>inorganic compound>carbon dioxide’
recent (bool, default=False) – If True, return only studies from the last two years, sorted newest first.
limit (int, default=100) – Maximum number of studies to retrieve.
- Returns:
A DataFrame of the matching studies. The internal studies attribute is also populated with structured NOAA study data.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If no search parameters are provided.
requests.HTTPError – If the HTTP request returned an unsuccessful status code.
Notes
At least one parameter must be specified, otherwise the API call will fail.
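Several parameters follow the same conventions: multiple values are joined with ‘|’, and latitude/longitude bounds must fall within fixed ranges. The following sketch uses a hypothetical helper, build_search_kwargs, that assembles and checks these arguments before they reach search_studies. It reproduces the ‘4|18’ and ‘Wahl, E.R.|Vose, R.S.’ examples from the parameter list and includes only a subset of the data-type codes.

```python
from typing import Dict, Iterable, Optional, Tuple

# Subset of the documented data_type_id codes; extend as needed.
DATA_TYPE_IDS: Dict[str, int] = {
    "CORALS AND SCLEROSPONGES": 4,
    "ICE CORES": 7,
    "PALEOCEANOGRAPHY": 14,
    "SPELEOTHEMS": 17,
    "TREE RING": 18,
}


def build_search_kwargs(
    data_types: Iterable[str] = (),
    investigators: Iterable[str] = (),
    bbox: Optional[Tuple[float, float, float, float]] = None,
    **extra,
) -> dict:
    """Hypothetical helper that assembles keyword arguments for search_studies.

    bbox is (min_lat, max_lat, min_lon, max_lon). Boxes crossing the antimeridian
    (min_lon > max_lon) are rejected, which is a choice of this sketch.
    """
    kwargs = dict(extra)
    codes = [str(DATA_TYPE_IDS[name.upper()]) for name in data_types]
    if codes:
        kwargs["data_type_id"] = "|".join(codes)  # multiple types are '|'-separated
    names = list(investigators)
    if names:
        kwargs["investigators"] = "|".join(names)  # multiple names are '|'-separated
    if bbox is not None:
        min_lat, max_lat, min_lon, max_lon = bbox
        if not -90 <= min_lat <= max_lat <= 90:
            raise ValueError("need -90 <= min_lat <= max_lat <= 90")
        if not -180 <= min_lon <= max_lon <= 180:
            raise ValueError("need -180 <= min_lon <= max_lon <= 180")
        kwargs.update(min_lat=min_lat, max_lat=max_lat, min_lon=min_lon, max_lon=max_lon)
    if not kwargs:
        # Mirrors the documented rule: search_studies needs at least one parameter.
        raise ValueError("provide at least one search parameter")
    return kwargs
```

The result is passed with keyword unpacking, for example ds.search_studies(**build_search_kwargs(data_types=["paleoceanography"], bbox=(10, 25, 80, 95))).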
Examples
from pyleotups import Dataset
ds = Dataset()
ds.search_studies(noaa_id=33213)
  | StudyID | XMLID | StudyName | DataType | EarliestYearBP | MostRecentYearBP | EarliestYearCE | MostRecentYearCE | StudyNotes | ScienceKeywords | Investigators | Publications | Sites | Funding
0 | 33213 | 74834 | Bay of Bengal, Northeast Indian Margin Stable ... | PALEOCEANOGRAPHY | 1462580 | 280 | -1460630 | 1670 | Provided Keywords: Indian monsoon, South Asian... | None | Steven Clemens, Masanobu Yamamoto, Kaustubh Th... | [{'Author': 'Clemens, Steven; Yamamoto, Masano... | [[{'DataTableID': '45857', 'DataTableName': 'U... | [{'fundingAgency': 'US National Science Founda...