Data Sources & Methodology
Overview
OilPriceAPI aggregates commodity price data from 15+ public sources to provide reliable, normalized pricing information through a single API.
Our Value Proposition
We don't just scrape data; we solve the hard problems:
- ✅ 24/7 reliability: Scrapers running continuously with automatic failover
- ✅ Format normalization: Convert 15+ different formats into one consistent API
- ✅ Quality validation: Cross-check multiple sources to detect anomalies
- ✅ Historical archival: Store and serve years of historical data
- ✅ 99.5% uptime: Professional infrastructure with monitoring and alerts
You could scrape these sources yourself, but you'd spend weeks building and maintaining what we already run.
Primary Data Sources
Exchange Data (Public Quotes)
- ICE Futures - Brent crude, natural gas (public quotes)
- CME Group - WTI crude, RBOB gasoline (public data)
- NYMEX - Heating oil, diesel (public quotes)
- National Grid - UK natural gas (public pricing)
Government Sources (Public Domain)
- U.S. EIA - Energy Information Administration
- DESNZ - UK Department for Energy Security and Net Zero (formerly BEIS)
- Eurostat - European energy statistics
- DOE - U.S. Department of Energy
Industry Publishers (Public Data)
- Argus Media - Public benchmark prices
- Platts - Published spot prices
- Reuters - Public commodity data
- Trading Economics - Economic indicators
Mining & Metals
- London Metal Exchange - Public quotes
- COMEX - Gold, silver, copper (public data)
Data Collection Methodology
Automated Collection
- Frequency: Every 5-15 minutes during market hours
- Method: Respectful web scraping within robots.txt limits
- Rate limiting: Throttled to avoid server overload
- Retry logic: Exponential backoff on failures
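As an illustration, the retry-with-backoff behaviour described above can be sketched in a few lines of Python. The function and parameter names here are hypothetical, not our production code:

```python
import random
import time

def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Call `fetch` until it succeeds, sleeping exponentially longer
    between attempts (1s, 2s, 4s, ...) plus a little random jitter so
    that many scrapers never retry in lockstep against one source."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # exhausted retries; surface the failure
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

The jitter term is the standard safeguard against "thundering herd" retries; the base delay and retry count would be tuned per source.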
Multi-Source Validation
When available, we cross-check prices across 3+ sources:
- Fetch price from Source A
- Fetch price from Source B
- Fetch price from Source C
- Compare: Flag if variance >5%
- Manual review for flagged data
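The cross-check step above boils down to comparing the spread across sources against the 5% threshold. A minimal sketch (the function name and exact spread formula are illustrative, not our internal implementation):

```python
def flag_for_review(prices, threshold=0.05):
    """Given quotes for the same commodity from multiple sources,
    return True if the relative spread between the lowest and highest
    quote exceeds `threshold` (5% by default), i.e. the data point
    should be routed to manual review."""
    lo, hi = min(prices), max(prices)
    spread = (hi - lo) / lo  # relative variance across sources
    return spread > threshold
```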
Quality Assurance
- Outlier detection: Statistical analysis flags anomalies
- Timestamp validation: Ensure data freshness
- Format verification: Check for parsing errors
- Manual review: Human verification of suspicious data
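Outlier detection of the kind listed above is commonly done with a z-score against recent history. This is a sketch of that standard technique, not a description of our exact production checks:

```python
import statistics

def is_outlier(latest, history, z_threshold=3.0):
    """Flag `latest` if it lies more than `z_threshold` standard
    deviations from the mean of recent prices in `history`."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # needs >= 2 historical points
    if stdev == 0:
        return latest != mean  # flat history: any change is suspicious
    return abs(latest - mean) / stdev > z_threshold
```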
Data Processing Pipeline
Raw Sources (15+)
↓
Scrapers (Python, Ruby)
↓
Raw Data Lake (PostgreSQL)
↓
Normalization Layer
↓
Validation Engine
↓
Clean Data Store
↓
API Layer (Rails)
↓
Your Application
Normalization Steps
- Currency conversion: All prices standardized to USD (unless specified)
- Unit conversion: Barrels, therms, MWh normalized
- Timezone handling: All timestamps in UTC
- Field mapping: Consistent field names across sources
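Putting the four normalization steps together, a single record pass might look like the sketch below. The field map, FX rate, and field names are hypothetical examples; production uses live exchange rates and per-source mappings maintained internally:

```python
from datetime import datetime, timezone

USD_PER_GBP = 1.27  # example rate only; real conversion uses live FX
FIELD_MAP = {"px": "price", "sym": "commodity", "ts": "created_at"}

def normalize(raw, currency="USD"):
    """Rename source-specific fields to consistent names, convert GBP
    prices to USD, and render timestamps as UTC ISO-8601."""
    out = {FIELD_MAP.get(key, key): value for key, value in raw.items()}
    if currency == "GBP":
        out["price"] = round(out["price"] * USD_PER_GBP, 4)
    out["currency"] = "USD"
    out["created_at"] = datetime.fromtimestamp(
        out["created_at"], tz=timezone.utc).isoformat()
    return out
```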
Historical Data Archive
Storage
- Retention: All historical data since 2020
- Frequency: Daily snapshots
- Granularity: Intraday data where available
- Format: Time-series optimized storage
Backfilling
When adding new commodities, we backfill historical data:
- Scrape archived public data
- Validate against known benchmarks
- Fill gaps with interpolation (marked as estimated)
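The gap-filling step above is plain linear interpolation with an explicit "estimated" marker. A sketch, assuming gaps are interior to the series (real backfills also handle leading/trailing gaps and more data quality edge cases):

```python
def fill_gaps(series):
    """Replace None entries in a daily price series with linearly
    interpolated values, marking each filled point as estimated."""
    filled = []
    for i, price in enumerate(series):
        if price is not None:
            filled.append({"price": price, "estimated": False})
            continue
        # Nearest known neighbours on each side (assumes interior gaps).
        left = next(j for j in range(i, -1, -1) if series[j] is not None)
        right = next(j for j in range(i, len(series)) if series[j] is not None)
        weight = (i - left) / (right - left)
        value = series[left] + weight * (series[right] - series[left])
        filled.append({"price": round(value, 4), "estimated": True})
    return filled
```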
Legal Compliance
What We Scrape
- ✅ Public data only - No authentication bypass
- ✅ Respect robots.txt - Honor crawler directives
- ✅ Rate limiting - Responsible scraping practices
- ✅ Public domain government data - Fully legal
What We Don't Scrape
- ❌ Paywalled content - Never bypass authentication
- ❌ Terms of Service violations - Respect all ToS
- ❌ Private data - Only publicly accessible info
- ❌ Excessive requests - Rate limiting prevents overload
Legal Precedents
Our data aggregation practices comply with established case law:
Feist Publications v. Rural Telephone Service (1991)
- Facts and data are not copyrightable
- Applies to commodity prices (factual information)
hiQ Labs v. LinkedIn (2022)
- Ninth Circuit held that scraping publicly accessible data does not violate the Computer Fraud and Abuse Act
- "Without authorization" under the CFAA doesn't reach information open to the public
eBay v. Bidder's Edge (2000)
- Held that scraping which burdens a site's servers can constitute trespass to chattels
- Key takeaway: respect robots.txt and keep request volume low enough to avoid server overload
Compliance Standards
- GDPR: No personal data collected or stored
- CCPA: California privacy compliance
- Terms compliance: Respect all source ToS
- Ethical scraping: Industry best practices
Data Accuracy & Reliability
Error Rates
- Target accuracy: 99.9% for latest prices
- Validation coverage: 85%+ cross-checked
- Correction SLA: 24 hours for reported errors
Known Limitations
- ⚠️ Delayed data: Some sources update with 5-15 min delay
- ⚠️ Source failures: Temporary outages affect availability
- ⚠️ Market closures: No data during non-trading hours for some commodities
- ⚠️ Historical gaps: Some commodities have incomplete historical data
Transparency Commitment
Data Lineage
For any price data point, we can provide:
- Source website(s)
- Collection timestamp
- Validation status (cross-checked or single-source)
- Last verified timestamp
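The lineage fields above map naturally to a record like the following. This is a hypothetical shape for illustration; the actual response format and field names may differ, so contact support for the authoritative schema:

```python
# Hypothetical lineage record for one price data point.
lineage = {
    "commodity": "BRENT_CRUDE",
    "sources": ["ICE Futures (public quotes)"],
    "collected_at": "2024-11-29T14:05:00Z",
    "validation": "cross-checked",  # or "single-source"
    "last_verified_at": "2024-11-29T14:10:00Z",
}
```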
Reporting Issues
Found inaccurate data?
- Email: [email protected]
- Include: Commodity, timestamp, expected vs actual price
- Response time: 24 hours for investigation
Comparison to Alternatives
vs. Manual Research
- Manual: Hours per day checking 15+ websites
- OilPriceAPI: Single API call, always current
vs. Bloomberg Terminal
- Bloomberg: $24,000/year, complex interface
- OilPriceAPI: Starting at $15/month, simple REST API
vs. Building Your Own
- DIY: Weeks to build, ongoing maintenance
- OilPriceAPI: 5 minutes to integrate, we maintain
Future Data Sources
We're constantly adding new sources:
- 🔜 Asian benchmarks (Dubai, Tapis)
- 🔜 Cryptocurrency correlation data
- 🔜 Carbon credit pricing
- 🔜 Renewable energy certificates
Have a data source request? Email us: [email protected]
Technical Details
Source Monitoring
- Health checks: Every 5 minutes
- Alerting: PagerDuty for source failures
- Failover: Automatic switch to backup sources
- Uptime target: 99.5% data availability
Infrastructure
- Cloud provider: DigitalOcean
- Database: PostgreSQL with TimescaleDB
- Caching: Redis for hot data
- CDN: Cloudflare for global performance
Contact & Support
Questions about our data sources?
- Email: [email protected]
- Documentation: https://docs.oilpriceapi.com
- API Reference: https://docs.oilpriceapi.com/api-reference
Last updated: November 29, 2024