I do research using DNNs.
After a year, I feel I've finally settled on a good practice for project management, so I'll share it here.
GitHub is basically enough on its own; OneDrive serves as insurance in case a file suddenly disappears.
program/
├ dataset/
│ ├ dev/
│ └ test/
└ src/
├ common/
│ ├ hoge.py
│ ├ fuga.py
│ ├ ...
├ method_xxx/
│ ├ output/
│ │ ├ YYYYMMDD_ID/
│ │ │ ├ loss/
│ │ │ │ ├ training_loss.npy
│ │ │ │ └ validation_loss.npy
│ │ │ ├ prediction/
│ │ │ │ ├ img/
│ │ │ │ ├ wav/
│ │ │ │ ├ ...
│ │ │ ├ condition.json
│ │ │ ├ model.pth
│ │ │ └ network.txt
│ │ ├ YYYYMMDD_ID/
│ │ ├ ...
│ ├ generate_dataset.py
│ ├ dataset_loader.py
│ ├ dnn_model.py
│ ├ dnn_training.py
│ ├ dnn_evaluation.py
│ ├ training_config.json
│ └ evaluation_config.json
├ method_zzz/
├ ...
method_xxx / method_zzz
: DNN models and datasets are built in various ways, so a folder is created for each method.
common
: Contains modules shared across the methods.
method_xxx/output/
: Training and inference results are written out here.
YYYYMMDD_ID/network.txt
: Describes the network structure of the trained model. The PyTorch model instance is written out as-is.
Exporting the DNN model structure
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, in_units, hidden_units, out_units):
        super(Model, self).__init__()
        self.l1 = nn.Linear(in_units, hidden_units)
        self.a1 = nn.ReLU()
        self.l2 = nn.Linear(hidden_units, hidden_units)
        self.a2 = nn.ReLU()

    def forward(self, x):
        x = self.a1(self.l1(x))
        y = self.a2(self.l2(x))
        return y

# Export the network structure of the DNN model (.txt)
model = Model(in_size, hidden_size, out_size)
with open(OUT_DIR_NAME + '/network.txt', 'w') as f:
    f.write(str(model))
network.txt
Model(
  (l1): Linear(in_features=8546, out_features=682, bias=True)
  (a1): ReLU()
  (l2): Linear(in_features=682, out_features=682, bias=True)
  (a2): ReLU()
)
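Though not shown above, the `YYYYMMDD_ID` run folders can be created automatically at the start of each run. A minimal sketch, assuming a time-based ID; the helper name and the exact naming rule are my assumptions, not part of the original setup:

```python
import os
from datetime import datetime

def make_run_dir(output_root):
    """Create a fresh output/YYYYMMDD_ID/ folder for one training run.

    The ID part is taken from the current time so runs started on the
    same day do not collide (this naming rule is an assumption).
    """
    run_id = datetime.now().strftime('%Y%m%d_%H%M%S')
    run_dir = os.path.join(output_root, run_id)
    os.makedirs(os.path.join(run_dir, 'loss'))
    os.makedirs(os.path.join(run_dir, 'prediction'))
    return run_dir
```

`network.txt`, `model.pth`, and the loss arrays can then all be written under the returned `run_dir`.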
Training parameters, evaluation settings, and experimental results are managed in JSON files. The contents of each are as follows.
training_config.json
{
  "method": "A detailed explanation of the method is described here.",
  "parameters": {
    "max_epochs": 1000,
    "batch_size": 128,
    "optimizer": "adam",
    "learning_rate": 0.001,
    "patience": 50,
    "norm": true
  },
  "datasets": {
    "data1": "../../dataset/dev/<file1_name>",
    "data2": "../../dataset/dev/<file2_name>"
  }
}
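At the start of training, the script can simply read this file and unpack it. A small sketch; the function and variable names are mine:

```python
import json

def load_training_config(path):
    """Read training_config.json and return (parameters, dataset paths)."""
    with open(path, 'r') as f:
        config = json.load(f)
    return config['parameters'], config['datasets']
```

`dnn_training.py` can then pull `max_epochs`, `batch_size`, and so on out of the returned parameters, so all run settings live in one place.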
evaluation_config.json
{
  "target_dir": "YYYYMMDD_ID",
  "src_dir": {
    "file1": "../../dataset/test/<file1_name>",
    "file2": "../../dataset/test/<file2_name>"
  },
  "output_dir": "OUTPUT_DIR_NAME"
}
condition.json
{
  "method": "Explanation of the method",
  "parameters": {
    "max_epochs": 345,
    "batch_size": 128,
    "optimizer": "adam",
    "learning_rate": 0.001,
    "patience": 50,
    "norm": true
  },
  "datasets": {
    "data1": "../../dataset/dev/<file1_name>",
    "data2": "../../dataset/dev/<file2_name>"
  },
  "dnn": {
    "input_size": 8546,
    "output_size": 682
  },
  "loss": {
    "training_loss": 0.087654,
    "validation_loss": 0.152140
  }
}
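One way to produce `condition.json` is to copy the training config and append the results actually observed during the run. Judging from `max_epochs` being 345 here versus 1000 in `training_config.json`, it appears to record the epoch where training actually stopped; that reading, and the merging logic below, are my assumptions:

```python
import json
import os

def write_condition(run_dir, training_config, stopped_epoch,
                    input_size, output_size, train_loss, val_loss):
    """Write condition.json: the training config plus the actual results.

    max_epochs is overwritten with the epoch at which training stopped
    (e.g. via early stopping), and the network sizes and final losses
    are appended.
    """
    condition = dict(training_config)                       # shallow copy
    condition['parameters'] = dict(training_config['parameters'])
    condition['parameters']['max_epochs'] = stopped_epoch
    condition['dnn'] = {'input_size': input_size, 'output_size': output_size}
    condition['loss'] = {'training_loss': train_loss,
                         'validation_loss': val_loss}
    with open(os.path.join(run_dir, 'condition.json'), 'w') as f:
        json.dump(condition, f, indent=2)
```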
For evaluating DNN performance, `evaluation_config.json` is read first, the target run folder is identified from `target_dir`, and the parameters are then obtained from the `condition.json` inside it.
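The read-then-lookup flow just described can be sketched as follows. The function name, the `output_root` parameter, and the assumption that run folders live directly under `output/` are mine:

```python
import json
import os

def load_evaluation_target(eval_config_path, output_root='output'):
    """Read evaluation_config.json, locate the run folder via target_dir,
    then load the condition.json stored inside that folder."""
    with open(eval_config_path, 'r') as f:
        eval_config = json.load(f)
    run_dir = os.path.join(output_root, eval_config['target_dir'])
    with open(os.path.join(run_dir, 'condition.json'), 'r') as f:
        condition = json.load(f)
    return eval_config, condition, run_dir
```

`dnn_evaluation.py` can then rebuild the model from `condition["dnn"]` and `condition["parameters"]` and load `model.pth` from the same `run_dir`.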
After much trial and error, I settled on this management method, but please let me know if there is a better one.