Download data

For now, hmile provides four ways to download the data:

  • yahoofinance : a reliable source that covers a lot of assets. However, YF contains inconsistencies in volume data.

  • polygon.io : a good data provider, but it is not free.

  • csv : a csv file containing the data. It is efficient, but the data is not updated.

  • elasticsearch : an Elasticsearch database containing the data. You have full control over the data, but you need to maintain and update it yourself. See

Every DataProvider provides a method named getData, which returns a dictionary of pandas DataFrames. Each key is the name of a requested pair. Example:

{
   "BTCUSD": pd.DataFrame 1,
   "ETHUSD": pd.DataFrame 2
}

The dataframes are formatted as follows.

data

date (index)   open    high    low     close   volume
2015-01-08     11.01   10.81   11.30   10.75   1433300
2015-01-09     10.96   10.98   11.18   10.72   18536300
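As a rough illustration of this layout, the sketch below builds a DataFrame by hand from the two sample rows shown above and wraps it in a dictionary keyed by pair name. It only uses pandas and does not call hmile; the variable names are chosen for the illustration.

```python
import pandas as pd

# Sample rows copied from the table above.
rows = [
    {"date": "2015-01-08", "open": 11.01, "high": 10.81, "low": 11.30,
     "close": 10.75, "volume": 1433300},
    {"date": "2015-01-09", "open": 10.96, "high": 10.98, "low": 11.18,
     "close": 10.72, "volume": 18536300},
]

df = pd.DataFrame(rows)
df["date"] = pd.to_datetime(df["date"])  # parse the strings into timestamps
df = df.set_index("date")                # the index is the date and is named 'date'

# Same shape as the value described for getData(): one DataFrame per pair.
data = {"BTCUSD": df}
print(data["BTCUSD"])
```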

Yahoofinance

class Hmile.DataProvider.YahooDataProvider(pairs: List[str], start_date: str, end_date: str, interval: str = 'hour')

Get data from Yahoo Finance

Variables
  • pairs – list of pairs to get

  • interval – The interval of the data

  • start_date – The start date

  • end_date – The end date

  • fill_policy – The fill policy to use

checkArguments(pairs: List[str], interval: str, start: str, end: str) → None

Check that the arguments are valid: each pair should look like BTCUSD, interval should be in yahoointervalconverter, start and end should be formatted as YYYY-MM-DD, and start should be before end. The date range must span at least 3 intervals.

Parameters
  • pairs (List[str]) – list of pairs to get

  • interval (str) – The interval of the data

  • start (str) – The start date

  • end (str) – The end date

Raises

DataProviderArgumentException – When the arguments are not correct
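The rules listed for checkArguments can be sketched in plain Python as below. This is not hmile's implementation: the interval table INTERVAL_SECONDS is a hypothetical stand-in for yahoointervalconverter (whose contents are not shown here), the pair check is one possible reading of "like BTCUSD", and ValueError is used in place of DataProviderArgumentException.

```python
from datetime import datetime
from typing import List

# Hypothetical interval table; the real converter used by hmile is not shown here.
INTERVAL_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def check_arguments(pairs: List[str], interval: str, start: str, end: str) -> None:
    """Raise ValueError if the arguments break the rules described above."""
    for pair in pairs:
        # Assumed reading of "like BTCUSD": upper-case letters and digits only.
        if not (pair.isalnum() and pair.isupper()):
            raise ValueError(f"pair should look like BTCUSD, got {pair!r}")
    if interval not in INTERVAL_SECONDS:
        raise ValueError(f"unknown interval {interval!r}")
    # strptime itself raises ValueError when the format is not YYYY-MM-DD.
    start_dt = datetime.strptime(start, "%Y-%m-%d")
    end_dt = datetime.strptime(end, "%Y-%m-%d")
    if start_dt >= end_dt:
        raise ValueError("start should be before end")
    if (end_dt - start_dt).total_seconds() < 3 * INTERVAL_SECONDS[interval]:
        raise ValueError("the date range must span at least 3 intervals")


check_arguments(["BTCUSD"], "hour", "2022-01-01", "2022-01-03")  # passes silently
```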

checkDataframe(dataframe)

Check that the first columns of the dataframe are open, high, low, close, volume. Check that the index is a date and that the interval between consecutive rows is constant.
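A minimal sketch of such a check with pandas is shown below. It is an illustration of the three conditions, not the library's code: the function name check_dataframe and the use of ValueError are assumptions.

```python
import pandas as pd

MAIN_COLUMNS = ["open", "high", "low", "close", "volume"]


def check_dataframe(df: pd.DataFrame) -> None:
    """Raise ValueError if df does not follow the expected layout."""
    # The first five columns must be the main columns, in this order.
    if list(df.columns[:5]) != MAIN_COLUMNS:
        raise ValueError(f"first columns must be {MAIN_COLUMNS}")
    # The index must hold dates.
    if not isinstance(df.index, pd.DatetimeIndex):
        raise ValueError("index must be a date")
    # The step between consecutive rows must always be the same.
    steps = df.index.to_series().diff().dropna()
    if steps.nunique() > 1:
        raise ValueError("interval between rows is not constant")


index = pd.date_range("2022-01-01", periods=3, freq="D", name="date")
df = pd.DataFrame(
    {"open": [1.0, 2.0, 3.0], "high": [2.0, 3.0, 4.0], "low": [0.5, 1.5, 2.5],
     "close": [1.5, 2.5, 3.5], "volume": [10, 20, 30]},
    index=index,
)
check_dataframe(df)  # passes silently
```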

getAvailablePairs() → List[str]

Return the list of available pairs

Raises

NotImplementedError – if the current dataprovider does not implement this method

Returns

the list of available pairs

Return type

List[str]

getData() → Dict[str, DataFrame]

Return a dict of dataframes where each key is a pair and each value is the corresponding dataframe. Every dataframe should have the same columns and the same index: the main columns are named open, high, low, close, volume, and the index holds the date. The index name is 'date'.

Returns

The dict of dataframes

Return type

Dict[str, pd.DataFrame]

normalizeColumnsOrder(dataframe)

Normalize the order of the columns to open, high, low, close, volume. Sort the other columns in alphabetical order.

Parameters

dataframe (pd.DataFrame) – The dataframe to treat

Returns

The dataframe after processing

Return type

pd.DataFrame
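The column ordering described for normalizeColumnsOrder can be sketched with pandas as follows. The function name and the extra columns (vwap, adj_close) are made up for the illustration; this is not the library's code.

```python
import pandas as pd

MAIN_COLUMNS = ["open", "high", "low", "close", "volume"]


def normalize_columns_order(df: pd.DataFrame) -> pd.DataFrame:
    """Put the main columns first, then the remaining ones alphabetically."""
    others = sorted(c for c in df.columns if c not in MAIN_COLUMNS)
    return df[MAIN_COLUMNS + others]


df = pd.DataFrame({
    "vwap": [1.0], "close": [2.0], "volume": [3], "low": [4.0],
    "open": [5.0], "high": [6.0], "adj_close": [7.0],
})
print(list(normalize_columns_order(df).columns))
# ['open', 'high', 'low', 'close', 'volume', 'adj_close', 'vwap']
```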

Example:

from Hmile.DataProvider import YahooDataProvider

PAIR = "BTCUSD"
START = "2022-01-01"
END = "2022-01-03"
INTERVAL = "hour"

dp = YahooDataProvider([PAIR], START, END, interval=INTERVAL)
data = dp.getData()[PAIR]

Polygon.io

class Hmile.DataProvider.PolygonDataProvider(pairs: List[str], start_date: str, end_date: str, api_key: str, interval: str = 'hour')

Download financial data from polygon.io

Variables
  • pairs – list of pairs to get

  • interval – The interval of the data

  • start_date – The start date

  • end_date – The end date

  • fill_policy – The fill policy to use

  • key – The polygon api key to use

checkArguments(pairs: List[str], interval: str, start: str, end: str) → None

Check that the arguments are valid: each pair should look like BTCUSD, interval should be in yahoointervalconverter, start and end should be formatted as YYYY-MM-DD, and start should be before end. The date range must span at least 3 intervals.

Parameters
  • pairs (List[str]) – list of pairs to get

  • interval (str) – The interval of the data

  • start (str) – The start date

  • end (str) – The end date

Raises

DataProviderArgumentException – When the arguments are not correct

checkDataframe(dataframe)

Check that the first columns of the dataframe are open, high, low, close, volume. Check that the index is a date and that the interval between consecutive rows is constant.

getAvailablePairs(market: str = 'crypto') → List[str]

Return the list of available pairs

Returns

the list of available pairs

Return type

List[str]

getData() → Dict[str, DataFrame]

Return a dict of dataframes where each key is a pair and each value is the corresponding dataframe. Every dataframe should have the same columns and the same index: the main columns are named open, high, low, close, volume, and the index holds the date. The index name is 'date'.

Returns

The dict of dataframes

Return type

Dict[str, pd.DataFrame]

normalizeColumnsOrder(dataframe)

Normalize the order of the columns to open, high, low, close, volume. Sort the other columns in alphabetical order.

Parameters

dataframe (pd.DataFrame) – The dataframe to treat

Returns

The dataframe after processing

Return type

pd.DataFrame

Example:

from Hmile.DataProvider import PolygonDataProvider

PAIR = "BTCUSD"
START = "2022-01-01"
END = "2022-01-03"
API_KEY = "YOUR_API_KEY"
INTERVAL = "hour"

dp = PolygonDataProvider([PAIR], START, END, API_KEY, interval=INTERVAL)
data = dp.getData()[PAIR]

CSV

class Hmile.DataProvider.CSVDataProvider(pairs: List[str], start_date: str, end_date: str, directory: str, interval: str = 'hour')

Get data from CSV file. The file name must be in the format f-{pair}-{interval}.csv

Variables
  • pairs – list of pairs to get

  • interval – The interval of the data

  • start_date – The start date

  • end_date – The end date

  • fill_policy – The fill policy to use

  • directory – The directory where the csv files are

checkArguments(pairs: List[str], interval: str, start: str, end: str) → None

Check that the arguments are valid: each pair should look like BTCUSD, interval should be in yahoointervalconverter, start and end should be formatted as YYYY-MM-DD, and start should be before end. The date range must span at least 3 intervals.

Parameters
  • pairs (List[str]) – list of pairs to get

  • interval (str) – The interval of the data

  • start (str) – The start date

  • end (str) – The end date

Raises

DataProviderArgumentException – When the arguments are not correct

checkDataframe(dataframe)

Check that the first columns of the dataframe are open, high, low, close, volume. Check that the index is a date and that the interval between consecutive rows is constant.

getAvailablePairs() → List[str]

Return the list of available pairs

Returns

the list of available pairs

Return type

List[str]

getData() → Dict[str, DataFrame]

Return a dict of dataframes where each key is a pair and each value is the corresponding dataframe. Every dataframe should have the same columns and the same index: the main columns are named open, high, low, close, volume, and the index holds the date. The index name is 'date'.

Returns

The dict of dataframes

Return type

Dict[str, pd.DataFrame]

normalizeColumnsOrder(dataframe)

Normalize the order of the columns to open, high, low, close, volume. Sort the other columns in alphabetical order.

Parameters

dataframe (pd.DataFrame) – The dataframe to treat

Returns

The dataframe after processing

Return type

pd.DataFrame

Example:

from Hmile.DataProvider import CSVDataProvider

PAIR = "BTCUSD"
START = "2022-01-01"
END = "2022-01-03"
DATA_DIR = "mydata/"
INTERVAL = "hour"

dp = CSVDataProvider([PAIR], START, END, DATA_DIR, interval=INTERVAL)
data = dp.getData()[PAIR]

Remark:

The csv file must be named f-{pair}-{interval}.csv and be located in the directory DATA_DIR. It must contain the following columns: date, open, high, low, close, volume.
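As an illustration of this naming rule, the sketch below writes a small DataFrame to a file whose name follows the pattern, reading f-{pair}-{interval}.csv literally (with the leading "f-"), and reads it back with pandas. The helper csv_path is hypothetical, a temporary directory stands in for DATA_DIR, and the exact date format expected by CSVDataProvider is not specified here, so ISO dates are an assumption.

```python
import os
import tempfile

import pandas as pd


def csv_path(directory: str, pair: str, interval: str) -> str:
    """Build the file path following the f-{pair}-{interval}.csv pattern."""
    return os.path.join(directory, f"f-{pair}-{interval}.csv")


df = pd.DataFrame({
    "date": ["2022-01-01 00:00:00", "2022-01-01 01:00:00"],  # assumed date format
    "open": [1.0, 1.1], "high": [1.2, 1.3], "low": [0.9, 1.0],
    "close": [1.1, 1.2], "volume": [100, 150],
})

data_dir = tempfile.mkdtemp()            # stands in for DATA_DIR
path = csv_path(data_dir, "BTCUSD", "hour")
df.to_csv(path, index=False)             # columns: date, open, high, low, close, volume

# Read the file back with the date column as the index.
reloaded = pd.read_csv(path, parse_dates=["date"], index_col="date")
print(os.path.basename(path))
print(reloaded)
```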

Elasticsearch

class Hmile.DataProvider.ElasticDataProvider(pairs: List[str], start_date: str, end_date: str, es_url: str, es_user: str, es_pass: str, interval: str = 'hour')

Get data from Elasticsearch. The index name must be in the format f-{pair}-{interval}. The main columns must be open, high, low, close, volume, and the date must be stored in the field @timestamp.

Variables
  • pairs – list of pairs to get

  • interval – The interval of the data

  • start_date – The start date

  • end_date – The end date

  • fill_policy – The fill policy to use

  • es_url – The url of the elasticsearch server

  • es_user – The elasticsearch user to connect with

  • es_pass – The password of the elasticsearch user

checkArguments(pairs: List[str], interval: str, start: str, end: str) → None

Check that the arguments are valid: each pair should look like BTCUSD, interval should be in yahoointervalconverter, start and end should be formatted as YYYY-MM-DD, and start should be before end. The date range must span at least 3 intervals.

Parameters
  • pairs (List[str]) – list of pairs to get

  • interval (str) – The interval of the data

  • start (str) – The start date

  • end (str) – The end date

Raises

DataProviderArgumentException – When the arguments are not correct

checkDataframe(dataframe)

Check that the first columns of the dataframe are open, high, low, close, volume. Check that the index is a date and that the interval between consecutive rows is constant.

getAvailablePairs() → List[str]

Return the list of available pairs

Returns

the list of available pairs

Return type

List[str]

getData() → Dict[str, DataFrame]

Return a dict of dataframes where each key is a pair and each value is the corresponding dataframe. Every dataframe should have the same columns and the same index: the main columns are named open, high, low, close, volume, and the index holds the date. The index name is 'date'.

Returns

The dict of dataframes

Return type

Dict[str, pd.DataFrame]

normalizeColumnsOrder(dataframe)

Normalize the order of the columns to open, high, low, close, volume. Sort the other columns in alphabetical order.

Parameters

dataframe (pd.DataFrame) – The dataframe to treat

Returns

The dataframe after processing

Return type

pd.DataFrame

Example:

from Hmile.DataProvider import ElasticDataProvider

PAIR = "BTCUSD"
START = "2022-01-01"
END = "2022-01-03"
ELASTIC_URL = "https://myelastic.com:9200" # the port must be specified
ELASTIC_USER = "myuser"
ELASTIC_PASSWORD = "mypassword"
INTERVAL = "hour"

dp = ElasticDataProvider([PAIR], START, END, ELASTIC_URL, ELASTIC_USER, ELASTIC_PASSWORD, interval=INTERVAL)
data = dp.getData()[PAIR]