Advanced Functionalities

NOAADataset (pyleotups.utils.NOAADataset)

class pyleotups.utils.NOAADataset.NOAADataset(study_data)

This class encapsulates study metadata and its related components (e.g. publications, sites) retrieved from the NOAA API.

study_id

The unique NOAA study identifier.

Type:

str

xml_id

The XML identifier of the study.

Type:

str

metadata

A dictionary containing basic metadata such as studyName, dataType, earliestYearBP, etc.

Type:

dict

investigators

A comma-separated string of investigator names.

Type:

str

publications

A list of Publication objects associated with the study.

Type:

list of Publication

sites

A list of Site objects associated with the study.

Type:

list of Site

to_dict()

Convert the study data and its components to a dictionary.

Returns:

A dictionary representing the study including metadata, investigators, publications, and sites.

Return type:

dict

PaleoData (pyleotups.utils.PaleoData)

class pyleotups.utils.PaleoData.PaleoData(paleo_data, study_id, site_id)

Represents paleo data associated with a site, including multiple data files and full variable metadata per file.

datatable_id

Unique NOAA data table identifier.

Type:

str

dataTableName

Name of the data table.

Type:

str

timeUnit

Time unit used in the data table.

Type:

str

files

List of raw file info dicts.

Type:

list of dict

file_variable_map

Maps fileUrl to a dict of variables and their full metadata.

Type:

dict

file_url

Shortcut to first file URL (for backward compatibility).

Type:

str or np.nan

variables

Shortcut to variable names in first file (for backward compatibility).

Type:

list of str

to_dict(file_obj=None)

Convert PaleoData into a dictionary, optionally for a specific file.

Parameters:

file_obj (dict, optional) – Specific file object (default is first file).

Returns:

Dictionary of core metadata for one file.

Return type:

dict
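The relationship between files, file_variable_map, and the two backward-compatibility shortcuts can be sketched with plain dictionaries. This is an illustration only, not the library's implementation: apart from fileUrl, every key ("variables", "name", "unit") and both example URLs are hypothetical, and math.nan stands in for np.nan so the snippet needs only the standard library.

```python
import math

# Hypothetical raw file entries; only "fileUrl" is a key named in this reference.
files = [
    {
        "fileUrl": "https://example.org/a.txt",
        "variables": [
            {"name": "age", "unit": "yr BP"},
            {"name": "d18O", "unit": "permil"},
        ],
    },
    {
        "fileUrl": "https://example.org/b.txt",
        "variables": [{"name": "depth", "unit": "cm"}],
    },
]

# fileUrl -> {variable name -> full metadata dict for that variable}
file_variable_map = {
    f["fileUrl"]: {v["name"]: v for v in f["variables"]}
    for f in files
}

# Shortcuts that only look at the first file; NaN when there are no files.
file_url = files[0]["fileUrl"] if files else math.nan
variables = list(file_variable_map[file_url]) if files else []

print(file_url)
print(variables)
```

Code that must handle every file should iterate over file_variable_map rather than rely on the first-file shortcuts.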

Publication (pyleotups.utils.Publication)

class pyleotups.utils.Publication.Publication(pub_data)

Represents a publication within a study.

author

The name of the author(s) of the publication.

Type:

str

title

The title of the publication.

Type:

str

journal

The journal where the publication appeared.

Type:

str

year

The publication year.

Type:

str

volume

The volume number (if applicable).

Type:

str or None

number

The issue number (if applicable).

Type:

str or None

pages

The page numbers (if applicable).

Type:

str or None

pub_type

The type of publication.

Type:

str or None

doi

The Digital Object Identifier.

Type:

str or None

url

URL for the publication.

Type:

str or None

study_id

The NOAA study ID to which this publication belongs.

Type:

str or None

get_citation_key()

Generate a unique citation key for the publication.

Returns:

A citation key in the format: “<LastName>_<FirstSignificantWord>_<Year>_<StudyID>”.

Return type:

str
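As a rough illustration of the key format above, the following standalone function assembles a key from the four parts. It is a sketch under stated assumptions, not the method's actual code: the function name, the stopword list, the "LastName, Initials; ..." author layout, and the sample values are all hypothetical.

```python
import re

# Words skipped when picking the first significant title word (an assumed list).
STOPWORDS = {"a", "an", "the", "of", "on", "in", "and", "for", "to"}


def citation_key(author, title, year, study_id):
    """Build '<LastName>_<FirstSignificantWord>_<Year>_<StudyID>' (illustrative)."""
    # Assume authors are separated by ';' and written as 'LastName, Initials'.
    first_author = author.split(";")[0].strip()
    last_name = first_author.split(",")[0].strip()
    last_name = re.sub(r"[^A-Za-z]", "", last_name) or "Unknown"

    # First alphabetic word of the title that is not a stopword.
    words = re.findall(r"[A-Za-z]+", title)
    significant = next((w for w in words if w.lower() not in STOPWORDS), "Untitled")

    return f"{last_name}_{significant}_{year}_{study_id}"


key = citation_key("Smith, J.A.; Doe, R.", "A record of Holocene climate", "2001", "12345")
print(key)  # Smith_record_2001_12345
```

Appending the study ID keeps keys distinct when the same publication is attached to more than one study.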

to_dict()

Convert the publication data into a dictionary.

Returns:

A dictionary representation of the publication.

Return type:

dict

Site (pyleotups.utils.Site)

class pyleotups.utils.Site.Site(site_data, study_id)

Represents a site within a study.

to_dict()

Convert the site into a list of dictionaries, one per PaleoData file.

Parsers

NonStandardParser (pyleotups.utils.Parser.NonStandardParser)

class pyleotups.utils.Parser.NonStandardParser.NonStandardParser(file_path)

Parser for NOAA files that do not follow standard metadata formatting.

file_path

Path to the file to be parsed.

Type:

str

lines

Lines read from the file.

Type:

list of str

blocks

Segregated blocks of lines with associated metadata.

Type:

list of dict

detect_header_extent(block, delimiter)

Detects how many initial lines qualify as header rows.

Parameters:
  • block (dict) – Block of lines.

  • delimiter (str) – Delimiter used to split lines.

Returns:

Number of header lines, and index of title line if found.

Return type:

tuple of (int, Optional[int])
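The reference does not describe how header rows are recognised, so the following is only one plausible heuristic with the same return shape, not the parser's actual logic. It assumes that leading rows without any numeric field are header rows and that a single-field row is the title. The function name is hypothetical, and it takes a plain list of lines rather than the block dict, whose keys are not documented here.

```python
from typing import List, Optional, Tuple


def _is_number(token: str) -> bool:
    """Return True if the token can be read as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def header_extent(lines: List[str], delimiter: str) -> Tuple[int, Optional[int]]:
    """Count leading non-numeric rows; report the first single-field row as title."""
    n_header = 0
    title_idx = None
    for i, line in enumerate(lines):
        fields = [f.strip() for f in line.split(delimiter) if f.strip()]
        if any(_is_number(f) for f in fields):
            break  # the first row containing a number is treated as data
        if len(fields) == 1 and title_idx is None:
            title_idx = i
        n_header += 1
    return n_header, title_idx


sample = [
    "Lake core chronology",  # single field: treated as the title line
    "depth\tage",            # column names
    "cm\tyr BP",             # units row
    "1.0\t120",
    "2.0\t240",
]
print(header_extent(sample, "\t"))  # (3, 0)
```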

parse()

Parses the file and extracts tabular data.

Returns:

List of extracted tables.

Return type:

list of pandas.DataFrame

Raises:

ParsingError – If no usable tables are found.

exception pyleotups.utils.Parser.NonStandardParser.ParsingError

Exception raised when parsing a non-standard file fails.

StandardParser (pyleotups.utils.Parser.StandardParser)

exception pyleotups.utils.Parser.StandardParser.ParsingError

Exception raised when the StandardParser encounters a parsing error.

class pyleotups.utils.Parser.StandardParser.StandardParser(url=None)

StandardParser parses NOAA .txt data files in the standard format, that is, the NOAA templated file layout: metadata on lines beginning with #, variables on lines beginning with ##, and tab-delimited data.

url

URL of the file to parse.

Type:

str

lines

Fetched lines from file.

Type:

list of str

meta_start

Index where metadata block starts.

Type:

int

meta_end

Index where metadata block ends.

Type:

int

variables

Extracted variable names.

Type:

list of str

skip_lines

Lines to skip after metadata to reach data.

Type:

int

data

Parsed data rows.

Type:

list of list of str

df

Final constructed dataframe.

Type:

pandas.DataFrame

parse(url=None)

Public method to parse the NOAA file.

Parameters:

url (str, optional) – URL to override the existing one.

Return type:

pandas.DataFrame
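The standard layout described for StandardParser (# metadata lines, ## variable lines, tab-delimited data) can be illustrated with a small standalone splitter. This is a sketch, not the class's implementation: the function name and sample text are made up, it works on an in-memory string instead of fetching a URL, and it assumes both that the variable's short name is the first tab-separated field on each ## line and that the first data row may repeat those names as a column header.

```python
def parse_templated(text):
    """Split templated text into metadata lines, variable names, and data rows."""
    metadata, variables, data = [], [], []
    for line in text.splitlines():
        if line.startswith("##"):
            # Variable line: assume the short name is the first tab-separated field.
            variables.append(line[2:].strip().split("\t")[0].strip())
        elif line.startswith("#"):
            metadata.append(line[1:].strip())
        elif line.strip():
            data.append(line.split("\t"))
    # Assume the first non-comment row may repeat the variable names; drop it if so.
    if data and data[0] == variables:
        data = data[1:]
    return metadata, variables, data


# Made-up sample in the assumed layout.
sample = (
    "# Study_Name: Example study\n"
    "# Variables\n"
    "## age\tage, calendar years before present\n"
    "## d18O\tdelta 18O, permil\n"
    "age\td18O\n"
    "100\t-5.2\n"
    "200\t-5.6\n"
)
metadata, variables, data = parse_templated(sample)
print(variables)
print(data)
```

The resulting rows are lists of strings, matching the documented type of the data attribute; a table could then be built with pandas.DataFrame(data, columns=variables).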