Options
Click the large Options button.
File Sequences. Here you can limit the size of the Caterpillar output file created during extraction. This is useful for large projects where your computer might struggle with a single huge text file.
After each web page is processed, Caterpillar checks the output file, and if the file has reached a certain word count it starts a new one. Files are numbered sequentially, e.g.
Caterpillar_00001.txt
Caterpillar_00002.txt
Caterpillar_00003.txt ...etc...
The default word count at which a new output file begins is 250000, so normally you will see only a single Caterpillar output file. You may wish to lower this value to 10000 or 50000 words. You can also force Caterpillar to produce a separate output file for every web page processed by entering a value of zero here.
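The rollover rule described above can be sketched in a few lines of Python. This is only an illustration, not Caterpillar's actual code: the function name assign_pages_to_files is hypothetical, and it assumes that "reached" means the running word count is greater than or equal to the limit.

```python
def assign_pages_to_files(page_word_counts, limit=250000):
    """Return the output file name each page would be written to.

    Hypothetical model of the rule in the text: the word count is
    checked only after a page has been processed, so a file may end
    up larger than the limit.
    """
    names = []
    file_number = 1
    words_in_file = 0
    for count in page_word_counts:
        names.append(f"Caterpillar_{file_number:05d}.txt")
        words_in_file += count
        # Assumption: "reached" is modelled as >=. With a limit of
        # zero this is always true, so every page gets its own file.
        if words_in_file >= limit:
            file_number += 1
            words_in_file = 0
    return names


# Three pages of 6000, 5000 and 3000 words with a limit of 10000:
# the first two pages share file 1, the third goes to file 2.
example = assign_pages_to_files([6000, 5000, 3000], limit=10000)
```

Under this model a limit of zero needs no special case, because the check after each page always succeeds.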
When you select a file for integration, Caterpillar automatically detects any sequential files in the same folder. If the selected file is part of a sequence, Caterpillar will ask whether you wish to integrate all following sequential files too.
Multiple output files may make a project harder to manage because there are more files to keep track of. However, there are potential advantages: for example, more than one translator can work on the project at the same time.
- You don't necessarily have to select the first file in a sequence. If you just want to integrate part of a project you can, for example, select the 5th file in the sequence. Caterpillar will then ask if you want to process just that one file, or also the 6th, 7th, 8th etc. In this example, files 1 to 4 in the sequence will not be processed.
- The integration routine can process up to 99999 sequential files.
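To illustrate how the files following a selected file might be found, here is a minimal Python sketch. It is not Caterpillar's own routine. The function name following_files is hypothetical, and it assumes that sequence files are named with a five-digit number before ".txt" (as in the examples above) and that the sequence ends at the first missing number.

```python
import re

# Assumed naming pattern: any prefix, an underscore, five digits, ".txt".
SEQUENCE_NAME = re.compile(r"^(?P<prefix>.+)_(?P<number>\d{5})\.txt$")


def following_files(selected, folder_listing):
    """Return the sequence files that come directly after `selected`.

    `folder_listing` is a list of file names found in the same folder.
    Files numbered lower than the selected one are ignored.
    """
    match = SEQUENCE_NAME.match(selected)
    if match is None:
        return []  # the selected file is not part of a sequence
    prefix = match.group("prefix")

    # Map sequence number -> file name for files sharing the prefix.
    numbered = {}
    for name in folder_listing:
        m = SEQUENCE_NAME.match(name)
        if m and m.group("prefix") == prefix:
            numbered[int(m.group("number"))] = name

    # Assumption: walk upwards and stop at the first gap in numbering.
    result = []
    number = int(match.group("number")) + 1
    while number in numbered:
        result.append(numbered[number])
        number += 1
    return result


folder = [f"Caterpillar_{n:05d}.txt" for n in range(1, 8)] + ["notes.txt"]
# Selecting the 5th file offers the 6th and 7th; files 1 to 4 are skipped.
later = following_files("Caterpillar_00005.txt", folder)
```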