Data Sources & Methodology

Overview

OilPriceAPI aggregates commodity price data from 15+ public sources to provide reliable, normalized pricing information through a single API.

Our Value Proposition

We don't just scrape data—we solve the hard problems:

  • ✅ 24/7 reliability: Scrapers running continuously with automatic failover
  • ✅ Format normalization: Convert 15+ different formats into one consistent API
  • ✅ Quality validation: Cross-check multiple sources to detect anomalies
  • ✅ Historical archival: Store and serve years of historical data
  • ✅ 99.5% uptime: Professional infrastructure with monitoring and alerts

You could scrape these sources yourself—but you'd spend weeks building and maintaining what we already run.

Primary Data Sources

Exchange Data (Public Quotes)

  • ICE Futures - Brent crude, natural gas (public quotes)
  • CME Group - WTI crude, RBOB gasoline (public data)
  • NYMEX - Heating oil, diesel (public quotes)
  • National Grid - UK natural gas (public pricing)

Government Sources (Public Domain)

  • U.S. EIA - Energy Information Administration
  • DESNZ - UK Department for Energy Security and Net Zero (formerly BEIS)
  • Eurostat - European energy statistics
  • DOE - U.S. Department of Energy

Industry Publishers (Public Data)

  • Argus Media - Public benchmark prices
  • Platts - Published spot prices
  • Reuters - Public commodity data
  • Trading Economics - Economic indicators

Mining & Metals

  • London Metal Exchange - Public quotes
  • COMEX - Gold, silver, copper (public data)

Data Collection Methodology

Automated Collection

  • Frequency: Every 5-15 minutes during market hours
  • Method: Respectful web scraping within robots.txt limits
  • Rate limiting: Throttled to avoid server overload
  • Retry logic: Exponential backoff on failures (sketched below)
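
As a rough sketch of the retry behavior, assuming the requests library and illustrative delay values rather than our production settings:

    import random
    import time

    import requests

    def fetch_with_backoff(url, max_attempts=5, base_delay=1.0):
        """Fetch a source URL, retrying with exponential backoff and jitter."""
        for attempt in range(max_attempts):
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
                return response.text
            except requests.RequestException:
                if attempt == max_attempts - 1:
                    raise  # give up after the final attempt
                # Wait roughly 1s, 2s, 4s, ... plus jitter to avoid hammering the source
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))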

Multi-Source Validation

When available, we cross-check prices across 3+ sources (a simplified sketch of this check follows the list):

  1. Fetch price from Source A
  2. Fetch price from Source B
  3. Fetch price from Source C
  4. Compare: Flag if variance >5%
  5. Manual review for flagged data
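
A simplified sketch of that comparison step, assuming three already-fetched quotes; the 5% threshold comes from the list above, while the function name and return fields are illustrative:

    def cross_check(prices, threshold=0.05):
        """Flag a set of quotes for manual review if their relative spread exceeds the threshold."""
        if len(prices) < 2:
            return {"status": "single_source", "flagged": False}
        low, high = min(prices), max(prices)
        variance = (high - low) / low          # relative spread across sources
        return {
            "status": "cross_checked",
            "variance": round(variance, 4),
            "flagged": variance > threshold,   # >5% triggers manual review
        }

    print(cross_check([82.10, 82.35, 81.95]))  # flagged: False
    print(cross_check([82.10, 88.00, 81.95]))  # flagged: True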

Quality Assurance

  • Outlier detection: Statistical analysis flags anomalies (see the example after this list)
  • Timestamp validation: Ensure data freshness
  • Format verification: Check for parsing errors
  • Manual review: Human verification of suspicious data
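
One way the statistical check can look: a plain z-score test against recent history, with the window size and threshold chosen purely for illustration:

    from statistics import mean, stdev

    def is_outlier(new_price, recent_prices, z_threshold=3.0):
        """Flag a new observation that deviates sharply from recent history."""
        if len(recent_prices) < 10:
            return False                # not enough history to judge
        mu = mean(recent_prices)
        sigma = stdev(recent_prices)
        if sigma == 0:
            return new_price != mu      # flat history: any change is suspicious
        return abs(new_price - mu) / sigma > z_threshold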

Data Processing Pipeline

Raw Sources (15+)
    ↓
Scrapers (Python, Ruby)
    ↓
Raw Data Lake (PostgreSQL)
    ↓
Normalization Layer
    ↓
Validation Engine
    ↓
Clean Data Store
    ↓
API Layer (Rails)
    ↓
Your Application

Normalization Steps

  1. Currency conversion: All prices standardized to USD (unless specified)
  2. Unit conversion: Barrels, therms, MWh normalized
  3. Timezone handling: All timestamps in UTC
  4. Field mapping: Consistent field names across sources (all four steps are sketched below)
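
To make these steps concrete, here is a minimal sketch of one record passing through them; the raw field names, commodity code, and FX rate are hypothetical stand-ins for the per-source mappings and live rates used in production:

    from datetime import datetime, timezone

    # Hypothetical raw record, as scraped, with the source's own field names
    raw = {"product": "UK NBP Gas", "px": "0.6125", "ccy": "GBP",
           "unit": "therm", "ts": "2024-11-29 14:30:00+00:00"}

    GBP_USD = 1.27  # illustrative FX rate

    normalized = {
        "commodity": "natural_gas_uk",                   # consistent field mapping
        "price": round(float(raw["px"]) * GBP_USD, 4),   # currency conversion to USD
        "currency": "USD",
        "unit": raw["unit"],                             # unit conversions slot in here
        "timestamp": datetime.fromisoformat(raw["ts"]).astimezone(timezone.utc).isoformat(),  # UTC
    }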

Historical Data Archive

Storage

  • Retention: All historical data since 2020
  • Frequency: Daily snapshots
  • Granularity: Intraday data where available
  • Format: Time-series optimized storage

Backfilling

When adding new commodities, we backfill historical data:

  • Scrape archived public data
  • Validate against known benchmarks
  • Fill gaps with interpolation (marked as estimated; see the sketch below)
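
A sketch of the gap-filling step: linear interpolation with an estimated flag kept alongside the filled values. The pandas usage is illustrative of the approach, not the exact pipeline code:

    import pandas as pd

    # Hypothetical daily closes with two missing days
    series = pd.Series(
        [78.2, None, None, 79.1, 79.4],
        index=pd.date_range("2021-03-01", periods=5, freq="D"),
    )

    filled = series.interpolate(method="linear")   # straight-line fill between known points
    estimated = series.isna()                      # True where a value had to be estimated

    backfilled = pd.DataFrame({"price": filled, "estimated": estimated})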

Legal Compliance

What We Scrape

  • ✅ Public data only - No authentication bypass
  • ✅ Respect robots.txt - Honor crawler directives
  • ✅ Rate limiting - Responsible scraping practices
  • ✅ Public domain government data - Fully legal

What We Don't Scrape

  • ❌ Paywalled content - Never bypass authentication
  • ❌ Terms of Service violations - Respect all ToS
  • ❌ Private data - Only publicly accessible info
  • ❌ Excessive requests - Rate limiting prevents overload

Legal Precedents

Our data aggregation practices are consistent with established case law:

Feist Publications v. Rural Telephone Service (1991)

  • Facts and data are not copyrightable
  • Applies to commodity prices (factual information)

hiQ Labs v. LinkedIn (2022)

  • Scraping publicly accessible data does not violate the Computer Fraud and Abuse Act
  • The CFAA's "without authorization" standard does not cover information open to the public

eBay v. Bidder's Edge (2000)

  • Established practical limits on automated access (excessive load can create liability)
  • Key takeaway: respect robots.txt and avoid server overload

Compliance Standards

  • GDPR: No personal data collected or stored
  • CCPA: California privacy compliance
  • Terms compliance: Respect all source ToS
  • Ethical scraping: Industry best practices

Data Accuracy & Reliability

Error Rates

  • Target accuracy: 99.9% for latest prices
  • Validation coverage: 85%+ cross-checked
  • Correction SLA: 24 hours for reported errors

Known Limitations

  • ⚠️ Delayed data: Some sources update with 5-15 min delay
  • ⚠️ Source failures: Temporary outages affect availability
  • ⚠️ Market closures: No data during non-trading hours for some commodities
  • ⚠️ Historical gaps: Some commodities have incomplete historical data

Transparency Commitment

Data Lineage

For any price data point, we can provide the following (an illustrative record appears after this list):

  • Source website(s)
  • Collection timestamp
  • Validation status (cross-checked or single-source)
  • Last verified timestamp
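
As an illustration only, a lineage record for a single data point might look like the sketch below; the field names and values are hypothetical, not the exact shape returned by the API:

    lineage = {
        "commodity": "BRENT_CRUDE_USD",                           # hypothetical commodity code
        "price": 82.14,
        "sources": ["ICE public quotes", "Trading Economics"],    # source website(s)
        "collected_at": "2024-11-29T14:35:02Z",                   # collection timestamp (UTC)
        "validation": "cross_checked",                            # or "single_source"
        "last_verified_at": "2024-11-29T14:40:11Z",               # last verified timestamp
    }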

Reporting Issues

Found inaccurate data?

  • Email: [email protected]
  • Include: Commodity, timestamp, expected vs actual price
  • Response time: 24 hours for investigation

Comparison to Alternatives

vs. Manual Research

  • Manual: Hours per day checking 15+ websites
  • OilPriceAPI: Single API call, always current

vs. Bloomberg Terminal

  • Bloomberg: $24,000/year, complex interface
  • OilPriceAPI: Starting at $15/month, simple REST API

vs. Building Your Own

  • DIY: Weeks to build, ongoing maintenance
  • OilPriceAPI: 5 minutes to integrate, we maintain

Future Data Sources

We're constantly adding new sources:

  • 🔜 Asian benchmarks (Dubai, Tapis)
  • 🔜 Cryptocurrency correlation data
  • 🔜 Carbon credit pricing
  • 🔜 Renewable energy certificates

Have a data source request? Email us: [email protected]

Technical Details

Source Monitoring

  • Health checks: Every 5 minutes
  • Alerting: PagerDuty for source failures
  • Failover: Automatic switch to backup sources (sketched below)
  • Uptime target: 99.5% data availability
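
A minimal sketch of how failover can work, assuming hypothetical primary and backup URLs; scheduling, PagerDuty alerting, and persistence are omitted:

    import requests

    SOURCES = {
        "brent_crude": [
            "https://primary.example.com/brent",   # hypothetical primary source
            "https://backup.example.com/brent",    # hypothetical backup source
        ],
    }

    def fetch_with_failover(commodity):
        """Try each configured source in order; return the first healthy response."""
        for url in SOURCES[commodity]:
            try:
                response = requests.get(url, timeout=5)
                if response.ok:
                    return response.text
            except requests.RequestException:
                continue                            # source unhealthy; try the backup
        raise RuntimeError(f"All sources failed for {commodity}")  # surfaces to alerting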

Infrastructure

  • Cloud provider: DigitalOcean
  • Database: PostgreSQL with TimescaleDB
  • Caching: Redis for hot data (see the example after this list)
  • CDN: CloudFlare for global performance
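
A sketch of serving hot data from Redis with a short TTL, assuming the redis-py client and an illustrative key scheme; load_from_db stands in for a query against the clean data store:

    import json

    import redis  # redis-py client

    cache = redis.Redis(host="localhost", port=6379, db=0)

    def get_latest_price(commodity, load_from_db, ttl_seconds=60):
        """Return the latest price from Redis, falling back to the database on a miss."""
        key = f"latest:{commodity}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)                       # cache hit: serve hot data
        price = load_from_db(commodity)                     # cache miss: query the clean data store
        cache.setex(key, ttl_seconds, json.dumps(price))    # expire quickly so prices stay fresh
        return price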

Contact & Support

Questions about our data sources?

  • Email: [email protected]
  • Documentation: https://docs.oilpriceapi.com
  • API Reference: https://docs.oilpriceapi.com/api-reference

Last updated: November 29, 2024