About this article

This article builds a Python program that converts 1-minute FX data into bars of any longer timeframe (for example, 1-hour bars).

Deliverables

The program reads 1-minute data, converts it to the chosen timeframe, and writes the result to CSV. (The example output shown is for 1-hour bars.)

Data preparation

The 1-minute data was downloaded from GMO Click Securities. Anyone with an account can download it, even with a deposit balance of 0, which I am very grateful for. (* At other companies, the data is unavailable unless your deposit exceeds a certain amount, or only daily-level data is offered.)

For this article, I downloaded the US dollar / yen (USDJPY) data from January 2007 to September 2020.

Folder structure

When you unzip the downloaded data, it has the following folder structure: /currency name_yyyymm/yyyymm/

For example, the August 2020 folder for USDJPY is /USDJPY_202008/202008

The CSV file for each day (currency name_yyyymmdd.csv) is saved in that folder.

I plan to reuse the downloaded CSV data in future programs, so I save it in a tree structure: a top-level csv folder, one folder per currency inside it (for example csv/USDJPY), and the unzipped currency name_yyyymm folders inside that.

With this tree structure, the data stays easy to handle even if more currencies are downloaded in the future. The program created this time goes in a folder called "Make_OHCL", placed at the same level as the csv folder.
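
As a rough illustration of how this layout can be walked, the sketch below lists the daily files for one currency in date order. The helper name `list_daily_csvs` and the throwaway demo tree are assumptions made for this example; they are not part of the original program, which uses `glob.glob` directly.

```python
import tempfile
from pathlib import Path


def list_daily_csvs(csv_root, currency):
    """Return the daily CSV paths for one currency, sorted by file name.

    Assumed layout (hypothetical helper, not from the original program):
        <csv_root>/<currency>/<currency>_yyyymm/yyyymm/<currency>_yyyymmdd.csv
    """
    pattern = f"{currency}/{currency}_*/*/{currency}_*.csv"
    # File names end in yyyymmdd, so sorting by name is chronological.
    return sorted(Path(csv_root).glob(pattern), key=lambda p: p.name)


# Demo on a throwaway tree of empty files that mimics the layout.
with tempfile.TemporaryDirectory() as root:
    for day in ("20200803", "20200731", "20200804"):
        month = day[:6]
        folder = Path(root) / "USDJPY" / f"USDJPY_{month}" / month
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"USDJPY_{day}.csv").touch()

    names = [p.name for p in list_daily_csvs(root, "USDJPY")]

print(names)
```

Sorting explicitly matters because directory listings are not guaranteed to come back in date order.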

About the contents of CSV

The CSV files from 2007 have a smaller set of fields; the code below uses the first five columns as date-time, open, high, low, and close.

However, since around 2016 the number of fields has increased, because spreads (the trading cost) are now taken into account.

It's a bit annoying, but the program needs to support both CSV layouts.
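
One simple way to handle both layouts, and the one the program in this article relies on, is to keep only the first five columns. The sketch below shows that step in isolation on synthetic rows. The helper name `keep_ohlc_columns`, the placeholder timestamp, and the contents of the extra columns are made up for illustration, since the real field names are not reproduced here.

```python
import numpy as np


def keep_ohlc_columns(arr):
    """Keep only the first five columns (date-time, open, high, low, close)."""
    return arr[:, :5] if arr.shape[1] > 5 else arr


# Synthetic rows; the values and the extra column contents are made up.
old_style = np.array(
    [["t0", "105.10", "105.20", "105.00", "105.15"]],
    dtype=object,
)
new_style = np.array(
    [["t0", "105.10", "105.20", "105.00", "105.15", "x", "x", "x", "x"]],
    dtype=object,
)

old_shape = keep_ohlc_columns(old_style).shape
new_shape = keep_ohlc_columns(new_style).shape
print(old_shape, new_shape)
```

After this step both layouts have the same shape, so the rest of the processing does not need to know which one it received.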

Code

The program lives directly under the "Make_OHCL" folder, which sits at the same level as the csv folder. Data processing uses numpy instead of pandas for speed.

from copy import copy
import glob
import numpy as np

def make_ohlc(ashi, arr=None):
    """
    Create OHLC bars for the specified timeframe and return them as an array.
    ashi: Target timeframe in minutes. 60 means 1-hour bars.
    arr: A 1-minute CSV file loaded as an array.
    """

    # If the CSV has 6 or more columns, keep only the first five.
    if arr.shape[1] > 5:
        arr = arr[:, 0:5]

    arr = np.c_[arr, np.zeros((len(arr), 4))]  # Append 4 columns for the new OHLC values
    for i in range(0, len(arr), ashi):
        try:
            arr[i, 5] = arr[i, 1]  # Open
            max_tmp = arr[i:i+ashi, 2].astype(float)  # Highs within the bar (np.float was removed from NumPy)
            arr[i, 6] = max_tmp.max()  # High
            min_tmp = arr[i:i+ashi, 3].astype(float)  # Lows within the bar
            arr[i, 7] = min_tmp.min()  # Low
            arr[i, 8] = arr[i+ashi-1, 4]  # Close; raises IndexError if the last bar is incomplete
        except IndexError:
            pass

    arr = np.delete(arr, [1, 2, 3, 4], axis=1)  # Drop the original 1-minute OHLC columns
    arr = arr[arr[:, 4] != 0]  # Keep only rows where a complete bar was written (Close != 0)

    return arr

currency = 'USDJPY'  # Currency pair name
ashi = 60  # Bar length in minutes (60 = 1-hour bars)
arr = None  # Accumulated result; None until the first file is processed

csv_dir = '../csv/' + currency + '/'  # ../csv/<currency name>/
dir_list = sorted(glob.glob(csv_dir + '*'))  # <currency name>_yyyymm folders, sorted so months are in order

for i in range(len(dir_list)):
    file_list = sorted(glob.glob(dir_list[i] + '/' + dir_list[i][-6:] + '/*'))  # Daily CSV paths, in date order

    for j in range(len(file_list)):
        pre_arr = copy(arr)  # Keep the bars accumulated so far
        csv_arr = np.loadtxt(file_list[j], delimiter=",", skiprows=1, dtype='object')  # Load the CSV into an array
        arr = make_ohlc(ashi, csv_arr)  # Convert to the target timeframe

        if pre_arr is not None:
            # Append the new bars to the accumulated ones
            arr = np.vstack([pre_arr, arr])


filename = currency + '_ashi=' + str(ashi) + '.csv'
np.savetxt(filename, arr, delimiter=",", header="Date,Open,High,Low,Close", fmt="%s")  # Save to CSV

Because the CSV files were spread across multiple directories this time, the concatenation logic made the code a little longer. If your data is already in a single CSV file, the following code is enough.

import numpy as np

def make_ohlc(ashi, arr=None):
    """
    Create OHLC bars for the specified timeframe and return them as an array.
    ashi: Target timeframe in minutes. 60 means 1-hour bars.
    arr: A 1-minute CSV file loaded as an array.
    """

    # If the CSV has 6 or more columns, keep only the first five.
    if arr.shape[1] > 5:
        arr = arr[:, 0:5]

    arr = np.c_[arr, np.zeros((len(arr), 4))]  # Append 4 columns for the new OHLC values
    for i in range(0, len(arr), ashi):
        try:
            arr[i, 5] = arr[i, 1]  # Open
            max_tmp = arr[i:i+ashi, 2].astype(float)  # Highs within the bar (np.float was removed from NumPy)
            arr[i, 6] = max_tmp.max()  # High
            min_tmp = arr[i:i+ashi, 3].astype(float)  # Lows within the bar
            arr[i, 7] = min_tmp.min()  # Low
            arr[i, 8] = arr[i+ashi-1, 4]  # Close; raises IndexError if the last bar is incomplete
        except IndexError:
            pass

    arr = np.delete(arr, [1, 2, 3, 4], axis=1)  # Drop the original 1-minute OHLC columns
    arr = arr[arr[:, 4] != 0]  # Keep only rows where a complete bar was written (Close != 0)

    return arr

currency = 'USDJPY'  # Currency pair name
ashi = 60  # Bar length in minutes (60 = 1-hour bars)

csv_path = '<csv file path>'  # Replace with the path to your 1-minute CSV file
csv_arr = np.loadtxt(csv_path, delimiter=",", skiprows=1, dtype='object')  # Load the CSV into an array
arr = make_ohlc(ashi, csv_arr)  # Convert to the target timeframe

filename = currency + '_ashi=' + str(ashi) + '.csv'
np.savetxt(filename, arr, delimiter=",", header="Date,Open,High,Low,Close", fmt="%s")  # Save to CSV
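
To make the aggregation rule easier to see, here is a small sketch on synthetic numbers. It is not the author's implementation: it swaps the loop for a reshape-based variant and works on the four price columns only, so the timestamp column is left out. The function name `ohlc_by_reshape` and all sample prices are assumptions made for this example. Like the loop version, it groups rows purely by position and drops a final bar that is not full.

```python
import numpy as np


def ohlc_by_reshape(prices, bar_len):
    """Aggregate 1-minute OHLC rows into bars of `bar_len` minutes.

    prices: float array of shape (n, 4) with columns open, high, low, close.
    Trailing rows that do not fill a whole bar are dropped.
    """
    n_bars = len(prices) // bar_len
    blocks = prices[: n_bars * bar_len].reshape(n_bars, bar_len, 4)
    return np.column_stack([
        blocks[:, 0, 0],              # open of the first minute in each bar
        blocks[:, :, 1].max(axis=1),  # highest high in each bar
        blocks[:, :, 2].min(axis=1),  # lowest low in each bar
        blocks[:, -1, 3],             # close of the last minute in each bar
    ])


# Five synthetic 1-minute rows, grouped into 2-minute bars.
minute_rows = np.array([
    [100.0, 100.5, 99.8, 100.2],
    [100.2, 100.9, 100.1, 100.7],
    [100.7, 100.8, 100.0, 100.1],
    [100.1, 100.3, 99.5, 99.9],
    [99.9, 100.0, 99.7, 99.8],  # leftover row, dropped
])
bars = ohlc_by_reshape(minute_rows, 2)
print(bars)
```

Because rows are grouped by position rather than by timestamp, both this sketch and the loop version assume the 1-minute data has no missing minutes within a file.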

Verification

Confirm that the output CSV file contains 1-hour bars.

If you found this helpful, please click LGTM. It encourages me to keep updating.
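
Besides checking the file by eye, a few basic invariants can be tested automatically. The sketch below is one possible check, not part of the original program: it flags rows where the high is below the open or close, or the low is above them. The helper name `check_ohlc_rows` and the sample rows are made up for illustration, and an in-memory string stands in for the output file.

```python
import csv
import io


def check_ohlc_rows(rows):
    """Return the indices of rows that violate basic OHLC invariants.

    rows: iterable of (date, open, high, low, close), prices as strings
    or numbers.
    """
    bad = []
    for idx, (_, op, hi, lo, cl) in enumerate(rows):
        op, hi, lo, cl = float(op), float(hi), float(lo), float(cl)
        if not (lo <= min(op, cl) and max(op, cl) <= hi):
            bad.append(idx)
    return bad


# Synthetic stand-in for the output file; values are made up.
# np.savetxt prefixes the header line with "# " by default.
sample = io.StringIO(
    "# Date,Open,High,Low,Close\n"
    "t0,100.0,100.9,99.8,100.7\n"
    "t1,100.7,100.8,99.5,99.9\n"
    "t2,99.9,99.8,99.7,99.8\n"  # high is below open, so this row is flagged
)
reader = csv.reader(line for line in sample if not line.startswith("#"))
bad_rows = check_ohlc_rows(reader)
print(bad_rows)
```

To run it on the real output, open the generated CSV file in place of the in-memory sample.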

The following article introduces how to create a chart image from a CSV file. https://qiita.com/sw1394/items/b2a86cfc663d89915e28

I made a program in Python that changes the 1-minute data of FX to an arbitrary time frame (1 hour frame, etc.)