Download data
hmile currently provides four ways to download data:
yahoofinance: a reliable source that covers many assets. However, its volume data contains inconsistencies.
polygon.io: a good data provider, but it is not free.
csv: a CSV file containing the data. It is efficient, but the data is not updated.
elasticsearch: an Elasticsearch database containing the data. You have full control over the data, but you need to maintain and update it yourself. See
Every DataProvider provides a method named getData, which returns a dictionary of pandas DataFrames. Each key is the name of a requested pair. Example:
{
"BTCUSD": pd.DataFrame 1,
"ETHUSD": pd.DataFrame 2
}
Each DataFrame is formatted as follows:
| date (index) | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2015-01-08 | 11.01 | 10.81 | 11.30 | 10.75 | 1433300 |
| 2015-01-09 | 10.96 | 10.98 | 11.18 | 10.72 | 18536300 |
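To make the expected shape concrete, here is a minimal sketch that builds a dictionary with the same layout as the getData result using plain pandas. The helper name make_frame and all numeric values are made up for illustration; they do not come from hmile or from real market data.

```python
import pandas as pd

COLUMNS = ["open", "high", "low", "close", "volume"]


def make_frame(dates, rows):
    """Build a DataFrame with OHLCV columns and a DatetimeIndex named 'date'."""
    index = pd.DatetimeIndex(dates, name="date")
    return pd.DataFrame(rows, index=index, columns=COLUMNS)


# Made-up values for illustration only; this is not real market data.
dates = ["2022-01-01", "2022-01-02"]
data = {
    "BTCUSD": make_frame(dates, [[100.0, 110.0, 95.0, 105.0, 1000],
                                 [105.0, 112.0, 101.0, 108.0, 1200]]),
    "ETHUSD": make_frame(dates, [[10.0, 11.0, 9.5, 10.5, 5000],
                                 [10.5, 11.2, 10.1, 10.8, 5200]]),
}

for pair, frame in data.items():
    # Each value is addressed by its pair name, as with getData()[PAIR].
    print(pair, frame.index.name, list(frame.columns))
```

Code written against this layout should work unchanged with the output of any of the providers below, as long as they follow the format described above.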
Yahoofinance
- class Hmile.DataProvider.YahooDataProvider(pairs: List[str], start_date: str, end_date: str, interval: str = 'hour')
Get data from Yahoo Finance
- Variables
pairs – list of pairs to get
interval – The interval of the data
start_date – The start date
end_date – The end date
fill_policy – The fill policy to use
- checkArguments(pairs: List[str], interval: str, start: str, end: str) None
Check that the arguments are valid: each pair should look like BTCUSD, interval should be in yahoointervalconverter, start and end should be formatted as YYYY-MM-DD, and start should be before end. The date range must span at least 3 intervals.
- Parameters
pairs (List[str]) – list of pairs to get
interval (str) – The interval of the data
start (str) – The start date
end (str) – The end date
- Raises
DataProviderArgumentException – When the arguments are not correct
- checkDataframe(dataframe)
Check that the first columns of the DataFrame are open, high, low, close, volume. Check that the index is a date and that the interval between rows is constant.
- getAvailablePairs() List[str]
Return the list of available pairs
- Raises
NotImplementedError – if the current dataprovider does not implement this method
- Returns
the list of available pairs
- Return type
List[str]
- getData() Dict[str, DataFrame]
Return a dict that maps each pair to its DataFrame. Every DataFrame should have the same columns and the same index: the main columns are named open, high, low, close, volume, and the index, named 'date', holds the date.
- Returns
The dict of dataframes
- Return type
Dict[str, pd.DataFrame]
- normalizeColumnsOrder(dataframe)
Normalize the order of the columns to open, high, low, close, volume. Sort the other columns in alphabetical order.
- Parameters
dataframe (pd.DataFrame) – The DataFrame to process
- Returns
The DataFrame after processing
- Return type
pd.DataFrame
Example:
from Hmile.DataProvider import YahooDataProvider
PAIR = "BTCUSD"
START = "2022-01-01"
END = "2022-01-03"
INTERVAL = "hour"
dp = YahooDataProvider([PAIR], START, END, interval=INTERVAL)
data = dp.getData()[PAIR]
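The rules listed for checkArguments can be sketched in a few lines of standard-library Python. This is not hmile's implementation: the function validate_arguments, the exception ArgumentError (a stand-in for DataProviderArgumentException), the INTERVALS table, and the pair pattern are all assumptions made for illustration. Only the "hour" interval appears in the documentation above.

```python
import re
from datetime import datetime, timedelta
from typing import List

# Hypothetical interval table; only "hour" is confirmed by the documentation.
INTERVALS = {"hour": timedelta(hours=1), "day": timedelta(days=1)}


class ArgumentError(ValueError):
    """Stand-in for DataProviderArgumentException."""


def validate_arguments(pairs: List[str], interval: str, start: str, end: str) -> None:
    """Raise ArgumentError if any argument breaks the documented rules."""
    for pair in pairs:
        # Assumed pair format: uppercase letters and digits, e.g. "BTCUSD".
        if not re.fullmatch(r"[A-Z0-9]+", pair):
            raise ArgumentError(f"invalid pair: {pair!r}")
    if interval not in INTERVALS:
        raise ArgumentError(f"unknown interval: {interval!r}")
    try:
        start_dt = datetime.strptime(start, "%Y-%m-%d")
        end_dt = datetime.strptime(end, "%Y-%m-%d")
    except ValueError as exc:
        raise ArgumentError("dates must be formatted as YYYY-MM-DD") from exc
    if start_dt >= end_dt:
        raise ArgumentError("start must be before end")
    # The requested range must cover at least three intervals.
    if end_dt - start_dt < 3 * INTERVALS[interval]:
        raise ArgumentError("the date range must span at least 3 intervals")


# The arguments from the example above pass without raising.
validate_arguments(["BTCUSD"], "hour", "2022-01-01", "2022-01-03")
```

Under these assumptions, the same two-day range would be rejected for a "day" interval, because it covers only two intervals.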
Polygon.io
- class Hmile.DataProvider.PolygonDataProvider(pairs: List[str], start_date: str, end_date: str, api_key: str, interval: str = 'hour')
Download financial data from polygon.io
- Variables
pairs – list of pairs to get
interval – The interval of the data
start_date – The start date
end_date – The end date
fill_policy – The fill policy to use
key – The Polygon API key to use
- checkArguments(pairs: List[str], interval: str, start: str, end: str) None
Check that the arguments are valid: each pair should look like BTCUSD, interval should be in yahoointervalconverter, start and end should be formatted as YYYY-MM-DD, and start should be before end. The date range must span at least 3 intervals.
- Parameters
pairs (List[str]) – list of pairs to get
interval (str) – The interval of the data
start (str) – The start date
end (str) – The end date
- Raises
DataProviderArgumentException – When the arguments are not correct
- checkDataframe(dataframe)
Check that the first columns of the DataFrame are open, high, low, close, volume. Check that the index is a date and that the interval between rows is constant.
- getAvailablePairs(market: str = 'crypto') List[str]
Return the list of available pairs
- Returns
the list of available pairs
- Return type
List[str]
- getData() Dict[str, DataFrame]
Return a dict that maps each pair to its DataFrame. Every DataFrame should have the same columns and the same index: the main columns are named open, high, low, close, volume, and the index, named 'date', holds the date.
- Returns
The dict of dataframes
- Return type
Dict[str, pd.DataFrame]
- normalizeColumnsOrder(dataframe)
Normalize the order of the columns to open, high, low, close, volume. Sort the other columns in alphabetical order.
- Parameters
dataframe (pd.DataFrame) – The DataFrame to process
- Returns
The DataFrame after processing
- Return type
pd.DataFrame
Example:
from Hmile.DataProvider import PolygonDataProvider
PAIR = "BTCUSD"
START = "2022-01-01"
END = "2022-01-03"
API_KEY = "YOUR_API_KEY"
INTERVAL = "hour"
dp = PolygonDataProvider([PAIR], START, END, API_KEY, interval=INTERVAL)
data = dp.getData()[PAIR]
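The checks described for checkDataframe (main columns first, a date index, a constant interval between rows) can be reproduced on any downloaded DataFrame. The sketch below is an independent illustration in plain pandas, not the library's code; the function name is_valid_frame is hypothetical, and it returns a boolean because the documentation does not say how checkDataframe reports a failure.

```python
import pandas as pd

MAIN_COLUMNS = ["open", "high", "low", "close", "volume"]


def is_valid_frame(frame: pd.DataFrame) -> bool:
    """Return True if the frame follows the documented layout."""
    # The first five columns must be the OHLCV columns, in this order.
    if list(frame.columns[:5]) != MAIN_COLUMNS:
        return False
    # The index must hold dates.
    if not isinstance(frame.index, pd.DatetimeIndex):
        return False
    # The spacing between consecutive rows must be constant.
    steps = frame.index.to_series().diff().dropna()
    return steps.nunique() <= 1


# Four hourly rows with placeholder values.
index = pd.date_range("2022-01-01", periods=4, freq=pd.Timedelta(hours=1), name="date")
good = pd.DataFrame({column: [1.0] * 4 for column in MAIN_COLUMNS}, index=index)

gapped = good.drop(index[1])  # removing a row makes the spacing irregular
reordered = good[["close", "open", "high", "low", "volume"]]

print(is_valid_frame(good), is_valid_frame(gapped), is_valid_frame(reordered))
```

A check like this is useful after downloading from a source with missing bars, since a gap in the index is easy to overlook by eye.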
CSV
- class Hmile.DataProvider.CSVDataProvider(pairs: List[str], start_date: str, end_date: str, directory: str, interval: str = 'hour')
Get data from a CSV file. The file name must follow the format f-{pair}-{interval}.csv
- Variables
pairs – list of pairs to get
interval – The interval of the data
start_date – The start date
end_date – The end date
fill_policy – The fill policy to use
directory – The directory where the csv files are
- checkArguments(pairs: List[str], interval: str, start: str, end: str) None
Check that the arguments are valid: each pair should look like BTCUSD, interval should be in yahoointervalconverter, start and end should be formatted as YYYY-MM-DD, and start should be before end. The date range must span at least 3 intervals.
- Parameters
pairs (List[str]) – list of pairs to get
interval (str) – The interval of the data
start (str) – The start date
end (str) – The end date
- Raises
DataProviderArgumentException – When the arguments are not correct
- checkDataframe(dataframe)
Check that the first columns of the DataFrame are open, high, low, close, volume. Check that the index is a date and that the interval between rows is constant.
- getAvailablePairs() List[str]
Return the list of available pairs
- Returns
the list of available pairs
- Return type
List[str]
- getData() Dict[str, DataFrame]
Return a dict that maps each pair to its DataFrame. Every DataFrame should have the same columns and the same index: the main columns are named open, high, low, close, volume, and the index, named 'date', holds the date.
- Returns
The dict of dataframes
- Return type
Dict[str, pd.DataFrame]
- normalizeColumnsOrder(dataframe)
Normalize the order of the columns to open, high, low, close, volume. Sort the other columns in alphabetical order.
- Parameters
dataframe (pd.DataFrame) – The DataFrame to process
- Returns
The DataFrame after processing
- Return type
pd.DataFrame
Example:
from Hmile.DataProvider import CSVDataProvider
PAIR = "BTCUSD"
START = "2022-01-01"
END = "2022-01-03"
DATA_DIR = "mydata/"
INTERVAL = "hour"
dp = CSVDataProvider([PAIR], START, END, DATA_DIR, interval=INTERVAL)
data = dp.getData()[PAIR]
Remark:
The CSV file must be named f-{pair}-{interval}.csv and be located in the directory DATA_DIR. It must contain the following columns: date, open, high, low, close, volume.
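As an illustration of the naming and column requirements, the following sketch writes a compatible file with pandas and reads it back. It takes the documented pattern literally, including the leading f-. The helper csv_path, the temporary directory, and the sample values are assumptions made for this example, and the date format expected inside the file is not specified in this document.

```python
import os
import tempfile

import pandas as pd


def csv_path(directory: str, pair: str, interval: str) -> str:
    """Return the path that follows the f-{pair}-{interval}.csv convention."""
    return os.path.join(directory, f"f-{pair}-{interval}.csv")


# Made-up hourly values for illustration only.
frame = pd.DataFrame(
    {
        "open": [100.0, 101.0],
        "high": [102.0, 103.0],
        "low": [99.0, 100.5],
        "close": [101.0, 102.5],
        "volume": [1500, 1700],
    },
    index=pd.DatetimeIndex(["2022-01-01 00:00", "2022-01-01 01:00"], name="date"),
)

data_dir = tempfile.mkdtemp()  # stands in for DATA_DIR
path = csv_path(data_dir, "BTCUSD", "hour")
frame.to_csv(path)  # the named index is written as the first column, "date"

# Reading the file back restores the 'date' index.
loaded = pd.read_csv(path, index_col="date", parse_dates=True)
print(os.path.basename(path), list(loaded.columns))
```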
Elasticsearch
- class Hmile.DataProvider.ElasticDataProvider(pairs: List[str], start_date: str, end_date: str, es_url: str, es_user: str, es_pass: str, interval: str = 'hour')
Get data from Elasticsearch. The index name must follow the format f-{pair}-{interval}. The main fields must be open, high, low, close, volume, and the date must be stored in the field @timestamp.
- Variables
pairs – list of pairs to get
interval – The interval of the data
start_date – The start date
end_date – The end date
fill_policy – The fill policy to use
es_url – The URL of the Elasticsearch server
es_user – The Elasticsearch user to connect as
es_pass – The password of the Elasticsearch user
- checkArguments(pairs: List[str], interval: str, start: str, end: str) None
Check that the arguments are valid: each pair should look like BTCUSD, interval should be in yahoointervalconverter, start and end should be formatted as YYYY-MM-DD, and start should be before end. The date range must span at least 3 intervals.
- Parameters
pairs (List[str]) – list of pairs to get
interval (str) – The interval of the data
start (str) – The start date
end (str) – The end date
- Raises
DataProviderArgumentException – When the arguments are not correct
- checkDataframe(dataframe)
Check that the first columns of the DataFrame are open, high, low, close, volume. Check that the index is a date and that the interval between rows is constant.
- getAvailablePairs() List[str]
Return the list of available pairs
- Returns
the list of available pairs
- Return type
List[str]
- getData() Dict[str, DataFrame]
Return a dict that maps each pair to its DataFrame. Every DataFrame should have the same columns and the same index: the main columns are named open, high, low, close, volume, and the index, named 'date', holds the date.
- Returns
The dict of dataframes
- Return type
Dict[str, pd.DataFrame]
- normalizeColumnsOrder(dataframe)
Normalize the order of the columns to open, high, low, close, volume. Sort the other columns in alphabetical order.
- Parameters
dataframe (pd.DataFrame) – The DataFrame to process
- Returns
The DataFrame after processing
- Return type
pd.DataFrame
Example:
from Hmile.DataProvider import ElasticDataProvider
PAIR = "BTCUSD"
START = "2022-01-01"
END = "2022-01-03"
ELASTIC_URL = "https://myelastic.com:9200" # the port must be specified
ELASTIC_USER = "myuser"
ELASTIC_PASSWORD = "mypassword"
INTERVAL = "hour"
dp = ElasticDataProvider([PAIR], START, END, ELASTIC_URL, ELASTIC_USER, ELASTIC_PASSWORD, interval=INTERVAL)
data = dp.getData()[PAIR]
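The column ordering that normalizeColumnsOrder describes (main columns first, remaining columns alphabetically) applies to every provider above. A minimal sketch of that behaviour in plain pandas follows. The function normalize_columns_order and the extra columns rsi and ema are hypothetical and only illustrate the rule; this is not the library's implementation.

```python
import pandas as pd

MAIN_COLUMNS = ["open", "high", "low", "close", "volume"]


def normalize_columns_order(frame: pd.DataFrame) -> pd.DataFrame:
    """Put the OHLCV columns first, then the other columns alphabetically."""
    others = sorted(column for column in frame.columns if column not in MAIN_COLUMNS)
    return frame[MAIN_COLUMNS + others]


# A single-row frame with shuffled columns and two extra indicator columns.
frame = pd.DataFrame(
    {
        "rsi": [50.0],
        "volume": [10],
        "close": [1.5],
        "low": [1.0],
        "high": [2.0],
        "open": [1.2],
        "ema": [1.4],
    }
)

print(list(normalize_columns_order(frame).columns))
# ['open', 'high', 'low', 'close', 'volume', 'ema', 'rsi']
```

In this sketch a frame that lacks one of the main columns raises a KeyError. How the library handles that case is not stated in this document.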