Skip to content

API Reference

NHANESExplorer

High-level convenience class for NHANES workflows.

Methods (selected)

  • get_detailed_component_manifest(...)
  • save_detailed_component_manifest(path, **kwargs)
  • get_demographics_data(cycle)
  • get_body_measures(cycle)
  • get_blood_pressure(cycle)
  • create_merged_dataset(cycle)
  • analyze_by_demographics(df, metric, demographic)
  • create_demographic_visualization(df, metric, demographic)
  • generate_summary_report(df)

Refer to inline docstrings for full parameter details.

Pesticide Laboratory Module

get_pesticide_metabolites(cycle, ref_path=None, timeout=30)

Load and harmonize NHANES pesticide laboratory analytes for a given cycle.

Parameters: - cycle (str): NHANES cycle in format YYYY-YYYY (e.g., '2017-2018') - ref_path (Path, optional): Path to pesticide_reference.csv (defaults to data/reference/pesticide_reference.csv) - timeout (int): Download timeout in seconds (default: 30)

Returns: DataFrame with schema:

Column Type Description
participant_id int NHANES SEQN identifier
cycle str Survey cycle
analyte_name str Normalized metabolite name (e.g., '3-PBA')
parent_pesticide str Parent active ingredient or chemical class
metabolite_class str Category (pyrethroid, OP, organochlorine, herbicide)
matrix str Biological matrix ('urine' or 'serum')
concentration_raw float Reported concentration (original units)
unit str Measurement unit (e.g., 'µg/L', 'ng/g lipid')
log_concentration float Natural log of concentration (NaN for ≤0)
detected_flag bool True if concentration_raw > 0
source_file str Originating XPT filename

Returns empty DataFrame if cycle has no pesticide data or download fails.

Raises: - ValueError: If cycle format invalid or not in known mapping

Example:

from pophealth_observatory.laboratory_pesticides import get_pesticide_metabolites

pest_df = get_pesticide_metabolites('2017-2018')
print(pest_df[['participant_id', 'analyte_name', 'concentration_raw']].head())

Data Sources: Attempts to download from multiple NHANES pesticide file series: - UPHOPM: Pyrethroids, Herbicides, & Organophosphorus Metabolites - OPD: Organophosphate Dialkyl Phosphate Metabolites - PP: Priority Pesticides - Current Use

Supported Cycles: 1999-2000 through 2021-2022 (availability varies by analyte)

Schema Notes (0.7.0): - log_concentration uses natural log; values <= 0 yield NaN to avoid math domain errors. - detected_flag is a simple > 0 heuristic; future versions may incorporate LOD/LOQ thresholds when published reference limits are integrated. - parent_pesticide enables grouping analytes by active ingredient lineage for aggregate exposure metrics. - Output remains wide-format per participant per analyte; long-format convenience helper planned.

Test Coverage (0.7.0): Ingestion paths validated via unit tests for empty cycles, synthetic datasets, and edge-case metabolite naming (commas, hyphenation, mixed case).

load_pesticide_reference(ref_path=None)

Load curated pesticide analyte reference metadata.

Parameters: - ref_path (Path, optional): Path to reference CSV

Returns: DataFrame with columns: - analyte_name, parent_pesticide, metabolite_class, cas_rn, typical_matrix, unit, first_cycle_measured, last_cycle_measured, etc.

Returns empty DataFrame if file not found.

Example:

from pophealth_observatory.laboratory_pesticides import load_pesticide_reference

ref_df = load_pesticide_reference()
pyrethroids = ref_df[ref_df['metabolite_class'] == 'Pyrethroid']

BRFSSExplorer

State-level health indicator access from CDC BRFSS dataset.

Methods

get_obesity_data(year=None)

Retrieve state-level adult obesity prevalence (BMI ≥ 30).

Parameters: - year (int, optional): Target year. If None, uses latest available.

Returns: DataFrame with columns: - year, state, state_name, value, low_ci, high_ci, sample_size, data_source, class_name, question

Raises: ValueError if specified year not found.

Example:

brfss = BRFSSExplorer()
obesity_data = brfss.get_obesity_data(year=2022)

get_indicator(class_name, question, year=None)

Retrieve any BRFSS health indicator by class and question.

Parameters: - class_name (str): BRFSS indicator class (e.g., "Physical Activity") - question (str): Exact question text from BRFSS dataset - year (int, optional): Target year. If None, uses latest available.

Returns: DataFrame with same structure as get_obesity_data()

Raises: ValueError if specified year not found for this indicator.

Example:

physical_activity = brfss.get_indicator(
    class_name='Physical Activity',
    question='Percent of adults aged 18 years and older who engage in no leisure-time physical activity'
)

list_available_indicators()

List all unique class/question combinations in BRFSS dataset.

Returns: DataFrame with columns ['class', 'question']

Example:

indicators = brfss.list_available_indicators()
print(indicators[indicators['class'] == 'Obesity / Weight Status'])

summary(df)

Generate summary statistics for a BRFSS indicator DataFrame.

Parameters: - df (DataFrame): Output from get_obesity_data() or get_indicator()

Returns: dict with keys: - count, mean_value, min_value, max_value, year, class_name, question

Example:

obesity_data = brfss.get_obesity_data()
stats = brfss.summary(obesity_data)
print(f"Mean: {stats['mean_value']:.1f}%")

Configuration

BRFSSConfig - base_url (str): CDC API endpoint (default: "https://data.cdc.gov/resource/hn4x-zwk7.json") - timeout (int): HTTP timeout in seconds (default: 30) - default_limit (int): API result limit (default: 5000)

BRFSSExplorer constructor: - config (BRFSSConfig, optional): Configuration object - session (requests.Session, optional): Reusable HTTP session - enable_cache (bool): In-memory caching (default: True)

Data Source

  • Dataset: CDC BRFSS Nutrition, Physical Activity, and Obesity (hn4x-zwk7)
  • Documentation: https://data.cdc.gov/Nutrition-Physical-Activity-and-Obesity
  • Coverage: State-level health indicators for all 50 states + DC

See docs/usage/brfss.md for detailed usage examples.