MXNet provides a Python utility, im2rec.py, for creating RecordIO packages of data that the framework’s data iterators can read:
https://github.com/dmlc/mxnet/blob/master/tools/im2rec.py
One small detail: im2rec.py doesn’t support .png files unless you manually edit the script and add .png to its list of allowed extensions.
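To make the extension tweak concrete, here is a minimal standalone sketch of how an extension allow-list filters files. It is not the code from im2rec.py: the names ALLOWED_EXTS and list_images are hypothetical, and the list inside the real script may be named and structured differently depending on your MXNet version.

```python
import os

# Hypothetical allow-list; the real script keeps its own list of extensions.
ALLOWED_EXTS = {".jpg", ".jpeg", ".png"}  # ".png" added by hand


def list_images(root, exts=ALLOWED_EXTS):
    """Return sorted paths (relative to root) of files with an allowed extension."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively so "IMG.PNG" is accepted too.
            if os.path.splitext(name)[1].lower() in exts:
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)
```

Calling list_images("frames") on your dataset folder is a quick way to check how many files would be picked up once .png is allowed.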
In order to create a rec file, you first need to make a list of files using this command:
python \\mxnet-master\tools\im2rec.py --list 1 --recursive 1 --num-thread 4 --train-ratio 0.7 --test-ratio 0.2 prefix frames
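--list 1 ==> tells the script to create the .lst files instead of packing images
--recursive 1 ==> walks the subfolders of frames and gives the images in each subfolder their own label
--num-thread 4 ==> runs the script in parallel with 4 workers
--train-ratio 0.7 ==> puts 70 percent of the data in the training list
--test-ratio 0.2 ==> puts 20 percent of the data in the test list; the remaining 10 percent goes to the validation list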
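The arithmetic behind the two ratio flags can be sketched in a few lines. This is only an illustration under assumptions, not the script's own logic: split_listing and lst_lines are hypothetical helpers, the shuffle and seed are my choice, and the tab-separated "index, label, path" layout is what I assume a .lst line looks like. Whatever is left after the train and test shares goes to validation.

```python
import random


def split_listing(paths, train_ratio=0.7, test_ratio=0.2, seed=0):
    """Shuffle paths and split them into test/train/val by the given ratios."""
    items = list(paths)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    n_test = int(len(items) * test_ratio)
    n_train = int(len(items) * train_ratio)
    return {
        "test": items[:n_test],
        "train": items[n_test:n_test + n_train],
        "val": items[n_test + n_train:],  # the remainder
    }


def lst_lines(paths, label=0):
    """Format entries as tab-separated 'index, label, path' lines (assumed layout)."""
    return ["%d\t%f\t%s" % (i, label, p) for i, p in enumerate(paths)]
```

With 10 images and the ratios above, this gives 7 training, 2 test and 1 validation entry.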
After you get the .lst files, let’s create a record set.
The record files will be about the same size as the image data packed into them.
python \\mxnet-master\tools\im2rec.py --num-thread 4 --quality 80 prefix frames
Here we drop the --list argument, so the script reads the .lst files and writes the .rec files.
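If you want to drive both steps from Python, the two invocations can be built from one small helper. This is a sketch under assumptions: im2rec_command is a hypothetical function, SCRIPT is a placeholder path you need to point at your own MXNet checkout, and the flags are simply the ones used in the commands above. The helper only builds the argument lists and does not run anything.

```python
import sys


def im2rec_command(script, prefix, root, make_list, **options):
    """Build the argument list for one im2rec.py run (does not execute it)."""
    cmd = [sys.executable, script]
    if make_list:
        # Listing step: same flags as in the first command above.
        cmd += ["--list", "1", "--recursive", "1"]
    for name, value in options.items():
        # num_thread=4 becomes "--num-thread 4"
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd + [prefix, root]


SCRIPT = "tools/im2rec.py"  # placeholder path, adjust to your MXNet checkout

list_cmd = im2rec_command(SCRIPT, "prefix", "frames", True,
                          num_thread=4, train_ratio=0.7, test_ratio=0.2)
pack_cmd = im2rec_command(SCRIPT, "prefix", "frames", False,
                          num_thread=4, quality=80)
```

Each list can then be passed to subprocess.run(list_cmd, check=True), running the listing step first and the packing step second.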
At the end of its execution, it should display something like
time: 0.0130000114441 count: 0
time: 3.01699995995 count: 1000
time: 2.83000016212 count: 2000
and in the folder where you have your dataset you will see files
prefix_test.rec prefix_test.lst prefix_test.val
and the same for the train and validation sets.
Enjoy!