MarketSource and DataFeeds
Market data is the foundation of any trading system. TheoryCraft provides a flexible architecture for ingesting, transforming, and streaming market data through DataFeeds and MarketSource.
Why Data Quality Matters
Data is the most critical element in backtesting. A strategy is only as good as the data used to test it. Poor data quality or improper data handling renders backtests meaningless: live trading results will diverge significantly from historical simulations.
Common data problems that invalidate backtests:
- Look-ahead bias - Using future information that wasn't available at decision time
- Missing data - Gaps that hide important market events
- Incorrect timestamps - Events processed out of order
- Bad prices - Spikes, gaps, or errors in price data
- Survivorship bias - Testing on today's S&P 500 composition excludes companies like Lehman Brothers or Enron that went bankrupt, artificially inflating historical returns
TheoryCraft addresses these challenges through an event-driven architecture rather than a vectorized approach. Unlike vectorized backtesting engines that operate on pre-loaded arrays of data, TheoryCraft replays market data event by event, preserving temporal ordering and execution realism.
This design choice prioritizes accuracy over raw speed. By processing events sequentially with proper timestamps, TheoryCraft reproduces market conditions as faithfully as possible. Your strategy sees the same data flow in backtesting as it would in live trading, eliminating an entire class of simulation artifacts.
The goal is simple: backtests that translate reliably to live performance.
What is MarketSource?
MarketSource is the main orchestrator for market data in TheoryCraft. It manages the entire data pipeline, from raw data ingestion to processed events ready for strategy consumption.
MarketSource combines two types of components:
- DataFeed - Handles data ingestion from external sources
- Processors - Transform and compute on the data (resampling, indicators, filters)
Together, these components form a streaming pipeline that emits MarketEvent structures to downstream Engines.
MarketEvent represents the immutable market state at a given point in time and forms the contract between data ingestion and strategy execution.
What is a DataFeed?
A DataFeed is the component within MarketSource responsible for data ingestion. It connects to external data sources (files, APIs, databases) and streams raw price data into the pipeline.
In TheoryCraft, a DataFeed implements the TheoryCraft.DataFeed behaviour and is responsible for:
- Fetching raw market data from a source
- Converting data into TheoryCraft's internal format (Tick or Bar)
- Streaming data into the MarketSource pipeline
Think of a DataFeed as an adapter between external data and TheoryCraft.
What are Processors?
Processors are the compute layer within MarketSource. They transform, filter, and enrich the data stream. Common processors include:
- Resampler - Converts tick data to bars or aggregates bars to higher timeframes
- Indicators - Computes technical indicators (SMA, RSI, MACD, etc.)
- Filters - Removes or flags invalid data points
Processors can be chained together, and independent processors can run in parallel for better performance.
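As a sketch of how chaining composes, each processor can be viewed as a function from one lazy stream to another. The processor names below are illustrative, not TheoryCraft's actual processor API:

```elixir
# Illustrative only: add_mid and add_spread are invented names, not
# TheoryCraft processors. Each maps a lazy stream to a lazy stream,
# so a chain is just a pipeline of transformations.

add_mid = fn ticks ->
  Stream.map(ticks, fn tick -> Map.put(tick, :mid, (tick.bid + tick.ask) / 2) end)
end

add_spread = fn ticks ->
  Stream.map(ticks, fn tick -> Map.put(tick, :spread, tick.ask - tick.bid) end)
end

ticks = [
  %{bid: 1.08542, ask: 1.08545},
  %{bid: 1.08544, ask: 1.08548}
]

enriched =
  ticks
  |> add_mid.()
  |> add_spread.()
  |> Enum.to_list()
```

Because every stage is lazy, nothing is computed until a consumer pulls events from the end of the chain.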
Types of Market Data
Tick Data
The most granular form of market data. Each tick represents a single price update with bid, ask, and volume information.
```elixir
%Tick{
  time: ~U[2024-01-15 14:30:00.123Z],
  bid: 1.08542,
  ask: 1.08545,
  bid_volume: 1.5,
  ask_volume: 2.0
}
```

Use cases: High-frequency strategies, spread analysis, order flow
OHLC Bars (Candlesticks)
Aggregated price data over a time period. Each bar contains open, high, low, close prices and volume.
```elixir
%Bar{
  time: ~U[2024-01-15 14:30:00Z],
  open: 1.08540,
  high: 1.08560,
  low: 1.08535,
  close: 1.08552,
  volume: 1250.5
}
```

Use cases: Technical analysis, trend following, most trading strategies
Supported Timeframes
TheoryCraft can resample data to any timeframe. Common examples include:
| Code | Meaning |
|---|---|
| t1 | 1 tick (raw tick data) |
| s30 | 30 seconds |
| m1 | 1 minute |
| m5 | 5 minutes |
| m15 | 15 minutes |
| h1 | 1 hour |
| h4 | 4 hours |
| D1 | 1 day |
| W1 | 1 week |
| M1 | 1 month |
These are just examples. You can resample to any interval (e.g., m3 for 3 minutes, h2 for 2 hours). The resampler processor converts lower timeframes into higher ones automatically.
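To make the aggregation concrete, here is a hand-rolled sketch of m1-to-m5 resampling; in practice you configure TheoryCraft's Resampler processor rather than writing this yourself, and the grouping key here is simplified to the minute bucket (it ignores hour and day boundaries):

```elixir
# Hand-rolled sketch of m1 -> m5 aggregation, for illustration only.

defmodule ResampleSketch do
  def to_m5(bars) do
    bars
    |> Stream.chunk_by(fn bar -> div(bar.time.minute, 5) end)
    |> Stream.map(&merge/1)
  end

  # One higher-timeframe bar: first open, max high, min low, last close,
  # summed volume, stamped with the chunk's first timestamp.
  defp merge([first | _] = chunk) do
    last = List.last(chunk)

    %{
      time: first.time,
      open: first.open,
      high: chunk |> Enum.map(& &1.high) |> Enum.max(),
      low: chunk |> Enum.map(& &1.low) |> Enum.min(),
      close: last.close,
      volume: chunk |> Enum.map(& &1.volume) |> Enum.sum()
    }
  end
end
```

Feeding ten consecutive m1 bars into `ResampleSketch.to_m5/1` yields two m5 bars, each covering five minutes of the input.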
Mental Model
Think of MarketSource as a streaming factory:
- DataFeed is the loading dock - raw materials (market data) enter here
- Processors are the assembly line - data gets transformed, enriched, and quality-checked
- MarketEvent is the finished product - ready for consumption by Engines
Data flows in one direction, from raw source to processed output, with each step adding value.
Building a MarketSource Pipeline
A typical MarketSource pipeline:
- Configure a DataFeed with instrument and date range
- Add a resampler processor to convert to the desired timeframe
- Add indicator processors for technical analysis
- Stream the resulting MarketEvents to an Engine
MarketSource coordinates the DataFeed and Processors into a unified streaming pipeline, handling backpressure and parallelization automatically.
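The steps above can be sketched with plain streams. This is a conceptual stand-in, not the real MarketSource API: a list plays the DataFeed, and a hand-written moving average plays an indicator processor:

```elixir
# Conceptual sketch only: MarketSource wires these stages together for you.
# The sma function keeps the last `period` closes as its accumulator and
# emits nil until enough history has accumulated.

sma = fn bars, period ->
  Stream.transform(bars, [], fn bar, closes ->
    closes = Enum.take([bar.close | closes], period)
    value = if length(closes) == period, do: Enum.sum(closes) / period
    {[Map.put(bar, :sma, value)], closes}
  end)
end

feed = [
  %{close: 1.0},
  %{close: 2.0},
  %{close: 3.0},
  %{close: 4.0}
]

events =
  feed
  |> sma.(3)
  |> Enum.to_list()

# The first two events carry sma: nil (not enough history); from the
# third event on, sma is the mean of the last three closes.
```

Note that the indicator emits an event for every input bar, including the warm-up period; downstream consumers decide how to treat the nil values.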
Available DataFeeds
Dukascopy DataFeed
Free historical data from Dukascopy Bank for 1600+ instruments:
- Forex: Majors, minors, and exotic pairs
- Equities: US and European stocks
- Cryptocurrencies: Major crypto pairs
- Commodities: Precious metals, energy
- Indices: Major global indices
The Dukascopy library is stable and ready for use.
Custom DataFeed
You can create a custom DataFeed by implementing the TheoryCraft.DataFeed behaviour. This allows integration with any data source: proprietary databases, broker APIs, or real-time feeds.
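The exact callbacks of the TheoryCraft.DataFeed behaviour are not listed here, so the sketch below defines its own minimal behaviour to show the adapter idea: any source becomes a lazy stream of ticks or bars.

```elixir
# Sketch only: MyApp.FeedBehaviour and its single callback are invented
# for illustration; TheoryCraft.DataFeed defines its own callbacks.

defmodule MyApp.FeedBehaviour do
  @callback stream(opts :: keyword()) :: Enumerable.t()
end

defmodule MyApp.ListFeed do
  @behaviour MyApp.FeedBehaviour

  # Adapts an in-memory list of rows into a lazy stream of tick maps;
  # a real feed would pull from a file, broker API, or database instead.
  @impl true
  def stream(opts) do
    opts
    |> Keyword.fetch!(:rows)
    |> Stream.map(fn {time, bid, ask} -> %{time: time, bid: bid, ask: ask} end)
  end
end
```

The key design point carries over regardless of the real callback names: the feed converts source-specific records into TheoryCraft's internal format and exposes them lazily, so the rest of the pipeline never sees the source.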
ℹ️ Additional data sources and real-time feeds are planned.
Best Practices
Stream, Don't Load
Always use streaming to process data. Loading entire datasets into memory defeats the purpose of TheoryCraft's streaming architecture and can cause memory issues with large datasets.
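A minimal sketch of the difference, using a made-up CSV file and format:

```elixir
# Sketch contrasting eager loading with streaming; the path and the
# three-column CSV layout are invented for this example.

path = Path.join(System.tmp_dir!(), "ticks_sketch.csv")

File.write!(path, """
2024-01-15T14:30:00Z,1.08542,1.08545
2024-01-15T14:30:01Z,1.08543,1.08546
""")

# Eager (avoid for large files): File.read!(path) holds the whole
# dataset in memory at once.

# Lazy: File.stream! reads one line at a time, keeping memory constant
# even for multi-gigabyte files.
ticks =
  path
  |> File.stream!()
  |> Stream.map(&String.trim/1)
  |> Stream.map(fn line ->
    [time, bid, ask] = String.split(line, ",")
    %{time: time, bid: String.to_float(bid), ask: String.to_float(ask)}
  end)
  |> Enum.to_list()
```

The final `Enum.to_list/1` is only for demonstration; in a real pipeline the stream is handed to MarketSource and consumed incrementally.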
Handle Data Gaps
Market data often has gaps (weekends, holidays, trading halts). Design your strategy to handle missing bars gracefully.
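One way to surface gaps, sketched with plain streams: compare consecutive timestamps against the expected step. How to react (skip, forward-fill, reset indicator state) is a strategy decision:

```elixir
# Sketch: report every pair of consecutive bars whose spacing exceeds
# the expected step, as {last_bar_before_gap, first_bar_after_gap} times.

find_gaps = fn bars, step_seconds ->
  bars
  |> Stream.chunk_every(2, 1, :discard)
  |> Stream.filter(fn [a, b] -> DateTime.diff(b.time, a.time) > step_seconds end)
  |> Enum.map(fn [a, b] -> {a.time, b.time} end)
end
```

For a 1-minute stream, calling `find_gaps.(bars, 60)` flags any jump larger than one minute, such as a weekend or a trading halt.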
Validate Data Quality
Check for data quality issues:
- Missing timestamps
- Invalid prices (negative, zero, or extreme values)
- Incorrect OHLC relationships (e.g., a high below the low, or a close outside the high-low range)
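The checks above can be expressed as a single validation function. These rules are examples, not TheoryCraft built-ins:

```elixir
# Example validation pass; the checks and error atoms are illustrative.

validate = fn bar ->
  cond do
    is_nil(bar[:time]) -> {:error, :missing_timestamp}
    Enum.any?([bar.open, bar.high, bar.low, bar.close], &(&1 <= 0)) -> {:error, :bad_price}
    bar.high < bar.low -> {:error, :inverted_ohlc}
    bar.close > bar.high or bar.close < bar.low -> {:error, :close_out_of_range}
    true -> :ok
  end
end
```

Run as a filter processor early in the pipeline, such a function lets you drop or flag bad bars before they reach indicators and strategies.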
Next Steps
This page focused on market data ingestion and transformation. Strategy execution and order handling are covered in the following sections.
- Broker - Order execution and position management
- Engines - Strategy execution and analysis
- Event-Driven Workflow - See how MarketSource fits in the architecture
- Dukascopy DataFeed Guide - Get started with free historical data
- Visualization with Kino - Visualize your data in Livebook