The original author hand-wrote a Fast RCNN in pure TensorFlow, but left almost no guide on how to actually use the code.
After several days of falling into pits, I finally got it running, so I posted these notes in an Issue and copied them to my blog as well.
[Tutorial] How to run the code…
What is the most distressing thing in life?
Not money, not papers, but that the author has released their fantastic code and you do not know how to run it.
So, this tutorial records how I ran the incremental detectors code using the VOC 2007 dataset.
Prepare
- Install TensorFlow 1.5 (conda install tensorflow-gpu==1.5)
- Install MATLAB
- Clone the code
- Make folders: datasets, resnet
- Download the pre-trained weights of the backbone (see the sketch below)
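A minimal sketch of the last two items, assuming the backbone is the TF-slim ResNet-50 checkpoint (that exact checkpoint is an assumption on my side; check the repo's README for the one it actually expects):

```python
import os
import tarfile
import urllib.request

# Create the two folders the code expects.
os.makedirs("datasets", exist_ok=True)
os.makedirs("resnet", exist_ok=True)

# Assumption: TF-slim ResNet-50 weights; swap in whatever backbone the repo uses.
url = "http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz"
archive = os.path.join("resnet", os.path.basename(url))
if not os.path.exists(archive):
    urllib.request.urlretrieve(url, archive)
with tarfile.open(archive) as tar:
    tar.extractall("resnet")  # the archive contains resnet_v1_50.ckpt
```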
Download dataset
Well, nothing to say.
Then, make a soft link: ln -s XXXX/VOCdevkit ./datasets/voc/VOCdevkit/
Finally, create a folder: mkdir -p ./datasets/voc/VOCdevkit/VOC2007/EvalCache/, or you will get an error when evaluating.
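If you would rather script this whole step, here is a rough sketch. The two archive URLs are the standard PASCAL VOC 2007 ones; DATA_ROOT is just a placeholder for the XXXX path above:

```python
import os
import tarfile
import urllib.request

# Placeholder: set this to wherever you keep raw datasets (the "XXXX" above).
DATA_ROOT = "/path/to/data"

# Standard PASCAL VOC 2007 archives (trainval + test).
urls = [
    "http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar",
    "http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar",
]
os.makedirs(DATA_ROOT, exist_ok=True)
for url in urls:
    archive = os.path.join(DATA_ROOT, os.path.basename(url))
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive) as tar:
        tar.extractall(DATA_ROOT)  # both extract into DATA_ROOT/VOCdevkit

# The soft link and the EvalCache folder, as described above.
os.makedirs("./datasets/voc", exist_ok=True)
if not os.path.exists("./datasets/voc/VOCdevkit"):
    os.symlink(os.path.abspath(os.path.join(DATA_ROOT, "VOCdevkit")),
               "./datasets/voc/VOCdevkit")
os.makedirs("./datasets/voc/VOCdevkit/VOC2007/EvalCache", exist_ok=True)
```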
Generate proposals
I HATE MATLAB.
- cd into the code folder
- git clone https://github.com/pdollar/edges
- Download https://pdollar.github.io/toolbox/archive/piotr_toolbox.zip and unzip it
- Open compute_edgeboxes.m. Comment out the COCO part, uncomment the VOC part, and change the paths. Notice that the result folder should be EdgeBoxesProposals
- cd edges and compile it following the instructions:
mex private/edgesDetectMex.cpp -outdir private '-DUSEOMP' CXXFLAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
mex private/edgesNmsMex.cpp -outdir private '-DUSEOMP' CXXFLAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
mex private/spDetectMex.cpp -outdir private '-DUSEOMP' CXXFLAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
mex private/edgeBoxesMex.cpp -outdir private '-DUSEOMP' CXXFLAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
- Run compute_edgeboxes.m. It will take a looooooooooooooong time
PS: the toolbox says it can process images at 60 FPS, but on my machine it runs at 0.6 FPS. :sob: That means about 6 hours for VOC, and 37+ hours for COCO.
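Since the MATLAB run takes hours, it is worth a quick check that every image actually got a proposal file before moving on. A rough sketch, with the proposals folder path only assumed from the EdgeBoxesProposals setting above (adjust it to wherever compute_edgeboxes.m actually writes):

```python
import os

# Assumed paths: adjust to your own layout and to the real output folder.
images_dir = "./datasets/voc/VOCdevkit/VOC2007/JPEGImages"
proposals_dir = "./datasets/voc/EdgeBoxesProposals"

image_ids = {os.path.splitext(f)[0] for f in os.listdir(images_dir)}
proposal_ids = {os.path.splitext(f)[0] for f in os.listdir(proposals_dir)}

missing = sorted(image_ids - proposal_ids)
print("{} images, {} proposal files, {} missing".format(
    len(image_ids), len(proposal_ids), len(missing)))
if missing:
    print("first few missing:", missing[:10])
```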
Make tfrecord
Well, next, generate tfrecord files.
Open datasets.py; at the end you can see the __main__ block. Modify it, or just leave it unchanged.
If your splits are not called train, val, trainval or test (for me, I call them train_step_1, test_step_1 and so on), delete the assert in VOCLoader.
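As a hypothetical illustration of the kind of guard to remove or extend (ALLOWED_SPLITS and check_split are my own names here, not the repo's; the real assert in voc_loader.py will look different):

```python
# Hypothetical stand-in for the split check inside VOCLoader.
ALLOWED_SPLITS = {"train", "val", "trainval", "test",
                  "train_step_1", "test_step_1"}  # extend with your own split names

def check_split(split: str) -> None:
    """Raise if the split name is not whitelisted, like the assert you need to relax."""
    assert split in ALLOWED_SPLITS, "unknown split: {}".format(split)

if __name__ == "__main__":
    check_split("train_step_1")  # passes once the custom name is added above
```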
Then, run this file. The resulting files will appear in the datasets folder.
Run the code! hahaha
For example, I want to train 20 classes as (5+5+5+5).
To start the first session, run:
python3 frcnn.py \
--run_name=voc_5 \
--num_classes=5 \
--dataset=voc07 \
--max_iterations=40000 \
--action=train,eval \
--eval_ckpts=40k \
--learning_rate=0.001 \
--lr_decay 30000 \
--sigmoid
And for the following sessions:
python3 frcnn.py sigmoid \
--run_name=voc_10 \
--num_classes=5 \
--extend=5 \
--dataset=voc07 \
--max_iterations=40000 \
--action=train,eval \
--eval_ckpts=40k \
--learning_rate=0.0001 \
--sigmoid \
--pretrained_net=voc_5 \
--distillation \
--bias_distillation
python3 frcnn.py sigmoid \
--run_name=voc_15 \
--num_classes=10 \
--extend=5 \
--dataset=voc07 \
--max_iterations=40000 \
--action=train,eval \
--eval_ckpts=40k \
--learning_rate=0.0001 \
--sigmoid \
--pretrained_net=voc_10 \
--distillation \
--bias_distillation
python3 frcnn.py sigmoid \
--run_name=voc_20 \
--num_classes=15 \
--extend=5 \
--dataset=voc07 \
--max_iterations=40000 \
--action=train,eval \
--eval_ckpts=40k \
--learning_rate=0.0001 \
--sigmoid \
--pretrained_net=voc_15 \
--distillation \
--bias_distillation
And you may notice that this code only uses the first GPU.
BTW: you should change the parameters, like 40k, to your own values!
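To make the pattern behind these commands explicit, here is a small sketch that just prints the four invocations above; the way num_classes, extend and pretrained_net evolve is inferred from the example, not from any official documentation, so treat it as an illustration:

```python
# Prints the frcnn.py invocations for a 5+5+5+5 schedule on VOC 2007.
STEP = 5      # new classes added per session
SESSIONS = 4  # 4 sessions * 5 classes = 20 classes in total

for s in range(1, SESSIONS + 1):
    total = STEP * s  # classes covered after this session
    cmd = [
        "python3", "frcnn.py",
        "--run_name=voc_{}".format(total),
        "--num_classes={}".format(STEP if s == 1 else STEP * (s - 1)),
        "--dataset=voc07",
        "--max_iterations=40000",
        "--action=train,eval",
        "--eval_ckpts=40k",
        "--learning_rate={}".format(0.001 if s == 1 else 0.0001),
        "--sigmoid",
    ]
    if s == 1:
        cmd.append("--lr_decay=30000")
    else:
        cmd += [
            "--extend={}".format(STEP),
            "--pretrained_net=voc_{}".format(STEP * (s - 1)),
            "--distillation",
            "--bias_distillation",
        ]
    print(" \\\n    ".join(cmd))
    print()
```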
Use on my own dataset
Personally, I think it is easy to convert other dataset formats to the VOC format, isn't it?
So, the first step: convert your dataset! (For me, I converted COCO to VOC; a rough conversion sketch is at the end of this section.)
Then, replace the class names in voc_loader.py.
Then, search for 20 throughout the code and replace it with your number of classes (for me, 20 becomes 80).
Finally, run python3 datasets.py to generate the tfrecord files, and train the net.
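For the COCO-to-VOC part, here is a rough sketch using pycocotools; the paths and the output folder are placeholders, and you should check that the XML fields match what voc_loader.py actually reads:

```python
import os
from xml.etree import ElementTree as ET
from pycocotools.coco import COCO

def coco_to_voc(ann_file, out_dir):
    """Write one VOC-style XML annotation per COCO image."""
    coco = COCO(ann_file)
    os.makedirs(out_dir, exist_ok=True)
    cat_names = {c["id"]: c["name"] for c in coco.loadCats(coco.getCatIds())}
    for img_id in coco.getImgIds():
        img = coco.loadImgs(img_id)[0]
        root = ET.Element("annotation")
        ET.SubElement(root, "filename").text = img["file_name"]
        size = ET.SubElement(root, "size")
        ET.SubElement(size, "width").text = str(img["width"])
        ET.SubElement(size, "height").text = str(img["height"])
        ET.SubElement(size, "depth").text = "3"
        for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id, iscrowd=False)):
            x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
            obj = ET.SubElement(root, "object")
            ET.SubElement(obj, "name").text = cat_names[ann["category_id"]]
            ET.SubElement(obj, "difficult").text = "0"
            box = ET.SubElement(obj, "bndbox")
            ET.SubElement(box, "xmin").text = str(int(round(x)) + 1)  # VOC is 1-based
            ET.SubElement(box, "ymin").text = str(int(round(y)) + 1)
            ET.SubElement(box, "xmax").text = str(int(round(x + w)))
            ET.SubElement(box, "ymax").text = str(int(round(y + h)))
        name = os.path.splitext(img["file_name"])[0]
        ET.ElementTree(root).write(os.path.join(out_dir, name + ".xml"))

# Example call (paths are placeholders):
# coco_to_voc("annotations/instances_train2017.json", "VOCdevkit/COCO2VOC/Annotations")
```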
WARNING: maybe you should write a script to restart training after it crashes due to the annoying NaN loss… Mine looks like this:
import argparse
import subprocess

parser = argparse.ArgumentParser(description='To prevent loss NaN')
parser.add_argument("inc_cls", type=int, help='Classes added per incremental step')
parser.add_argument("step", type=int, help='Incremental step (0 = first session)')
parser.add_argument("--session", type=int, help='Session', default=0)
parser.add_argument("--iters", type=int, help='Iters', default=20000)
parser.add_argument("--learning_rate", type=float, help='Learning rate', default=0.0001)
parser.add_argument("--decay", type=int, help='LR decay point, as a percentage of iters', default=0)
args = parser.parse_args()

cmd = [
    "python3",
    "frcnn.py",
    "--run_name=voc_{}{}".format(args.inc_cls * (args.step + 1), args.session),
    "--num_classes={}".format(args.inc_cls * (args.step + int(args.step == 0))),
    "--step={}".format(args.step),
    "--dataset=voc07",
    "--max_iterations={}".format(args.iters),
    "--action=train,eval",
    "--eval_ckpts={}".format(args.iters),
    "--learning_rate={}".format(args.learning_rate),
    "--sigmoid"
]
if args.decay:
    cmd.extend([
        "--lr_decay={}".format(int(args.iters * args.decay * 0.01))
    ])
if args.step > 0:
    # Incremental sessions start from the previous run and distill from it.
    cmd.extend([
        "--extend={}".format(args.inc_cls),
        "--pretrained_net=voc_{}{}".format(args.inc_cls * args.step, args.session),
        "--distillation",
        "--bias_distillation"
    ])

print(" ".join(cmd))
# Keep relaunching the same command until it exits cleanly.
result = subprocess.run(cmd, stdout=subprocess.DEVNULL)
while result.returncode != 0:
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL)
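For example, if you save the script as restart.py (any filename works), python3 restart.py 5 1 launches the second session of a 5-classes-per-step schedule with the defaults above, and keeps relaunching frcnn.py until it exits cleanly.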
I also think there is something wrong in network.py when distilling, so I use cached_classes = self.subnet.num_classes in the compute_distillation_crossentropy_loss() and compute_distillation_bbox_loss() functions in network.py.
UPDATE 30/11/2018
I noticed that if we train the net very well in the last session, the net easily gets a NaN or Inf loss at the start of the next session. Using a smaller learning rate (e.g. 2e-5 rather than 1e-3) alleviates this to some extent.
Thank you again for the amazing code, and have fun.