Lake offers the following data types:

  • trades – trades occurring on the given exchange aggregated by a single taker order and price.
  • book – high-frequency market depth (order book) snapshots, taken at least once per 100 ms where the exchange supports it. Contains 20 price levels for each order book side.
  • level_1 – a practical derivative of book containing just the first price level on each order book side. book rows that only update deeper price levels are skipped, which makes this data type much faster to load and work with.
  • book_delta_v2 – high-frequency market depth (order book) updates, supporting unlimited (e.g. 1000+) price levels on each order book side. It has an even higher update frequency than book data, but is more complicated to process because you have to build an order book model from the updates, as shown in this interactive notebook.
  • candles – 1-minute OHLCV candles.
  • book_1m – 1-minute order book snapshots containing up to one thousand price levels on each order book side.
  • funding – funding and predicted funding rate plus mark_price for futures. On Binance for example this is updated every 3 seconds.
  • open_interest – open interest for futures. On Binance for example this is updated every 20 seconds.
  • liquidations – liquidation metrics such as side, price and quantity for futures. On Binance for example this is updated at most once per second.

All data are partitioned by exchange, symbol and day, and you can access any combination of those. Note that level_1 and candles are merely derivatives of book and trades data, respectively. Perpetuals/futures symbols have a ‘-PERP’ suffix in their name.

You can find more details about exchange coverage and available history in the Coverage section.

Access API

The data are available through an easy-to-use Python API. It offers high-performance parallelized downloads and caching, and it fetches only data it has not seen before when you extend the time range or add token pairs to your query. This enables, for example, complex distributed machine-learning workloads.

For advanced users, it’s also possible to access the data directly through AWS S3. The data storage consists of a directory structure, e.g. trades/exchange=BINANCE/symbol=BTC-USDT/dt=2022-01-01/random_name.parquet, and contains compressed parquet table files that are easy to load from any programming language. If you use this interface, please implement some kind of caching or otherwise avoid downloading the same data repeatedly, to stay compliant with our Terms. Also be warned that the column names might differ slightly from those in the Python API and in the Schemata part of this page.
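The directory layout quoted above can be written down as a small helper. This is only a sketch: `partition_prefix` is a hypothetical name, and the bucket name and the randomly named parquet file are deliberately left out because this page does not state them.

```python
from datetime import date


def partition_prefix(table: str, exchange: str, symbol: str, day: date) -> str:
    """Build the partition directory for one table/exchange/symbol/day.

    Hypothetical helper: it only mirrors the layout quoted in the text and
    contains neither a bucket name nor the parquet file name.
    """
    return f"{table}/exchange={exchange}/symbol={symbol}/dt={day.isoformat()}/"


prefix = partition_prefix("trades", "BINANCE", "BTC-USDT", date(2022, 1, 1))
print(prefix)
```

Listing the objects under such a prefix and storing them locally, keyed by the same path, is one simple way to avoid downloading the same day twice.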

Access and the amount of downloaded data are unlimited, but we reserve the right to contact customers who scrape all data or misconfigure their cache and cause unnecessary traffic on the scale of ~100 GB. More details can be found in the Terms of Service.

Schemata

trades

name             pandas type       example value
side             category          ‘buy’
quantity         float64           0.00342
price            float64           19549.73
trade_id         Int64             1704373229
origin_time      datetime64[ns]    1666051199989000192
received_time    datetime64[ns]    1666051200016254720
exchange         category          ‘BINANCE’
symbol           category          ‘BTC-USDT’

Note that on the SERUM exchange the trades also contain the MPID identification of the buying and selling party. This variant of the trades schema is called trades_mpid.

book

name               pandas type       example value
received_time      datetime64[ns]    1666051200016254720
sequence_number    Int64             548631456
bid_0_price        float64           19549.73
bid_0_size         float64           0.00342
bid_1_price        float64
bid_1_size         float64
…
bid_19_price       float64
bid_19_size        float64
ask_0_price        float64
ask_0_size         float64
…
ask_19_price       float64
ask_19_size        float64
exchange           category          ‘BINANCE’
symbol             category          ‘BTC-USDT’

level_1

name             pandas type       example value
received_time    datetime64[ns]    1666051200016254720
bid_0_price      float64           19549.73
bid_0_size       float64           0.00342
ask_0_price      float64           19549.75
ask_0_size       float64           0.00634
exchange         category          ‘BINANCE’
symbol           category          ‘BTC-USDT’
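As an illustration of working with the level_1 columns, the following sketch derives a mid price and a spread from one row. The function name `mid_and_spread` is made up for this example; the input values are the example values from the table above.

```python
from typing import Tuple


def mid_and_spread(bid_0_price: float, ask_0_price: float) -> Tuple[float, float]:
    """Return (mid price, bid-ask spread) for one level_1 row."""
    mid = (bid_0_price + ask_0_price) / 2
    spread = ask_0_price - bid_0_price
    return mid, spread


mid, spread = mid_and_spread(19549.73, 19549.75)
# Floats carry tiny representation errors, so round before displaying.
print(round(mid, 2), round(spread, 2))
```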

book_delta_v2

name               pandas type       example value
origin_time        datetime64[ns]    1666051200014000000
received_time      datetime64[ns]    1666051200016254720
sequence_number    Int64             5667521204
side_is_bid        boolean           False
price              float64           33.97
size               float64           393.38
exchange           category          ‘BINANCE’
symbol             category          ‘AVAX-USDT’
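Building an order book model from these updates can be sketched as follows. This is not the method from the notebook mentioned earlier, only a minimal illustration under two assumptions that this page does not confirm: each row carries the new absolute size at its price level, and a size of 0 removes that level. The rows are also assumed to arrive already sorted in the order they should be applied. `apply_deltas` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple

# One update: (side_is_bid, price, size), mirroring the schema columns above.
Delta = Tuple[bool, float, float]


def apply_deltas(bids: Dict[float, float], asks: Dict[float, float],
                 deltas: Iterable[Delta]) -> None:
    """Apply updates to the two price -> size maps in place.

    Assumption (not stated in the source): `size` is the new absolute size
    at `price`, and a size of 0 removes the level.
    """
    for side_is_bid, price, size in deltas:
        side = bids if side_is_bid else asks
        if size == 0:
            side.pop(price, None)
        else:
            side[price] = size


bids: Dict[float, float] = {}
asks: Dict[float, float] = {}
apply_deltas(bids, asks, [
    (False, 33.97, 393.38),  # ask level from the example row above
    (True, 33.95, 120.0),    # made-up bid level
    (False, 33.97, 0.0),     # assumed removal of the 33.97 ask level
    (False, 33.98, 50.0),    # made-up ask level
])

best_bid = max(bids) if bids else None
best_ask = min(asks) if asks else None
print(best_bid, best_ask)
```

Plain dictionaries keep the sketch short; a sorted container would make best-price lookups cheaper on deep books.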

deep_book_1m

name               pandas type       example value
received_time      datetime64[ns]    1666051200016254720
sequence_number    Int64             5667521204
bid_prices         list[float64]     [36.6, 36.2, 36.77, …]
bid_sizes          list[float64]     33.97
ask_prices         list[float64]     33.97
ask_sizes          list[float64]     33.97
exchange           category          ‘BINANCE’
symbol             category          ‘AVAX-USDT’

candles

name             pandas type       example value
origin_time      datetime64[ns]    1666051140000000000
open             float64           19549.73
high             float64           19549.87
low              float64           19548.48
close            float64           19549.86
volume           float64           0.03388
trades           Int64             24
received_time    datetime64[ns]    1666051140004562561
start            float64           1666051140000000000
stop             float64           1666051200000000000
exchange         category          ‘BINANCE’
symbol           category          ‘BTC-USDT’
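Since candles are derived from trades, the relationship between the two schemas can be illustrated with a small aggregation. This is a sketch and not Lake's actual pipeline: `trades_to_candles` is a hypothetical name, the sample trades are made up, and the `trades` field is assumed to be the number of trades in the minute.

```python
from typing import Dict, Iterable, List, Tuple

NS_PER_MINUTE = 60 * 1_000_000_000


def trades_to_candles(trades: Iterable[Tuple[int, float, float]]) -> List[Dict]:
    """Aggregate (origin_time_ns, price, quantity) trades into 1-minute candles.

    The input must be sorted by time so that open and close are correct.
    """
    candles: Dict[int, Dict] = {}
    for t_ns, price, qty in trades:
        start = t_ns - t_ns % NS_PER_MINUTE  # floor to the minute boundary
        candle = candles.get(start)
        if candle is None:
            candles[start] = {
                "start": start, "stop": start + NS_PER_MINUTE,
                "open": price, "high": price, "low": price, "close": price,
                "volume": qty, "trades": 1,
            }
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price
            candle["volume"] += qty
            candle["trades"] += 1
    return [candles[key] for key in sorted(candles)]


sample_trades = [  # made-up values
    (1666051141000000000, 19549.73, 0.01),
    (1666051170000000000, 19549.87, 0.02),
    (1666051205000000000, 19550.00, 0.005),
]
candles = trades_to_candles(sample_trades)
for candle in candles:
    print(candle)
```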

funding

name                 pandas type       example value
origin_time          datetime64[ns]    1682899248008000000
mark_price           float64           1867.939062
index_price          float64           0.0
rate                 float64           -0.000026
next_funding_time    datetime64[ns]    1682928000000000000
received_time        datetime64[ns]    1682899273386438400
exchange             category          ‘BINANCE_FUTURES’
symbol               category          ‘ETH-USDT-PERP’

Note that the index price or mark price may be zero, depending on the exchange or data origin.

open_interest

name             pandas type       example value
origin_time      datetime64[ns]    1682899248008000000
open_interest    float64           92098.846
received_time    datetime64[ns]    1682899273386438400
exchange         category          ‘BINANCE_FUTURES’
symbol           category          ‘ETH-BUSD-PERP’

liquidations

name             pandas type       example value
origin_time      datetime64[ns]    1682899248008000000
side             string            ‘buy’
quantity         float64           0.457
price            float64           1942.6
id               float64           -1
status           string            filled
received_time    datetime64[ns]    1682899273386438400
exchange         category          ‘BINANCE_FUTURES’
symbol           category          ‘ETH-BUSD-PERP’

Note that some exchanges publish only the last liquidation in a given second; see the Binance documentation for details.

Notes

  • All quantities, sizes and volumes are in the base asset (e.g. BTC in the case of the BTC-USDT pair).
  • All times are in nanosecond unix integer timestamp format.
  • All decimal numbers are stored as floats. This is to ensure good storage and computational performance. You may wish to round them to the tick size and convert them to Python Decimal or string to get the precise value, e.g. for a pretty human-readable representation.
  • Order book data usually don’t contain origin_time. Dataframes with them may contain an empty origin_time column with 0 or -1.
  • Data for the past day are uploaded every day between 00:00 UTC and 03:00 UTC.
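The timestamp and float notes above can be sketched with the standard library alone. Both helper names are hypothetical, and the tick size of 0.01 is an assumed value for illustration; actual tick sizes depend on the exchange and symbol.

```python
from datetime import datetime, timezone
from decimal import Decimal


def ns_to_datetime(ts_ns: int) -> datetime:
    """Convert a nanosecond unix timestamp to a UTC datetime.

    Python datetimes hold microseconds, so the last three digits are dropped.
    """
    seconds, nanos = divmod(ts_ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=nanos // 1000
    )


def to_tick(value: float, tick_size: str) -> Decimal:
    """Round a float price to the nearest multiple of tick_size as a Decimal."""
    tick = Decimal(tick_size)
    # str(value) gives the shortest repr of the float, avoiding binary noise.
    return (Decimal(str(value)) / tick).quantize(Decimal("1")) * tick


print(ns_to_datetime(1666051199989000192).isoformat())
print(to_tick(19549.73, "0.01"))  # tick size 0.01 is an assumption
```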
