command -h will display all available options.
command -v will display the version.
Restad first needs to list the documents in the database. It then parses and indexes the listed files.
The restad commands can read their options from a config file in INI format. Options given as command line arguments override the config file values. See option-parsing for more details.
Global options of the commands:

    -h          Display help
    -v          Display version
    -f string   Config file
    -q          Silent mode
    -d string   Database name
    -t string   Database host name
    -u string   Database user name
    -w string   Database password
    -x string   Database encoding
The database encoding is used to encode the parsed text properly and to display the database messages in the console.
Listing the files is called preparsing: the command restad-preparser lists the files and adds them to the database. If a file contains more than one document, you can specify it here by giving the tag that surrounds each document.
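The precedence between the config file and the command line arguments can be sketched as follows. This is a minimal Python illustration, not Restad's actual option parser: the `[database]` section and its key names are assumptions made for the example, and only the flags -d, -t and -u are taken from the list above.

```python
import argparse
import configparser

# Hypothetical config file content; the section and key names are assumptions.
SAMPLE_INI = """
[database]
name = restad
host = localhost
user = reader
"""


def resolve_options(ini_text, argv):
    """Merge config file values with command line arguments (the latter win)."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    options = dict(config["database"]) if config.has_section("database") else {}

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-d", dest="name")
    parser.add_argument("-t", dest="host")
    parser.add_argument("-u", dest="user")
    args, _unknown = parser.parse_known_args(argv)

    # Only arguments actually given on the command line replace file values.
    for key, value in vars(args).items():
        if value is not None:
            options[key] = value
    return options


if __name__ == "__main__":
    print(resolve_options(SAMPLE_INI, ["-t", "db.example.org"]))
```

Here the host comes from the command line, while the database name and user keep the values read from the file.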
Preparser command format:

    preparser [options] path-to-explore

    -e string   Filter by extensions. Example "*.xml *.html"
    -m string   Document tag for multiple document files (the name of the tag enclosing one document)
    -r          Recursive listing of the files
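To make the effect of -e and -r concrete, here is a rough Python sketch of an extension-filtered, optionally recursive file listing. It only mimics the behaviour described above; the function name `list_files` is hypothetical, and the real preparser may match patterns differently.

```python
from fnmatch import fnmatch
from pathlib import Path


def list_files(root, patterns, recursive=False):
    """Return the files under root whose name matches one of the patterns."""
    root = Path(root)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        path
        for path in candidates
        if path.is_file() and any(fnmatch(path.name, p) for p in patterns)
    )


if __name__ == "__main__":
    # The filter string uses the format of the -e example: "*.xml *.html"
    for path in list_files(".", "*.xml *.html".split(), recursive=True):
        print(path)
```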
The indexer connects to the database, retrieves the files to process (i.e. their file paths) and parses them using as many threads as it can. The maximum number of threads is the number of cores of the system minus one, or the maximum given with the option -p. By default it fetches 1000 documents and processes them all before quitting. You can set the number of documents to process with the option -c.
When looping (option -l), the indexer closes all threads and fetches new documents each time it has consumed a "block" of documents, which holds at most -c documents. If you have a lot of files to process (and depending on your RAM), consider using a value much larger than the default of 1000.
By default, the indexer inserts a space to separate two words that would otherwise be concatenated when building the raw text without tags. You can disable this with -s.
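The block-by-block behaviour described above can be sketched in Python as follows. This is an illustration under assumptions, not the indexer's code: `fetch_block` and `parse` are hypothetical callables standing in for the database query and the parser, and a new thread pool per block is used here to mirror the "close all threads, then get new documents" description.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(requested=None):
    """Cores minus one by default, or the -p value when one is given."""
    default = max(1, (os.cpu_count() or 2) - 1)
    return requested if requested else default


def run_indexer(fetch_block, parse, block_size=1000, loop=False, threads=None):
    """Process documents block by block and return how many were processed."""
    processed = 0
    while True:
        block = fetch_block(block_size)  # at most -c documents
        if not block:
            break
        # A fresh pool per block: its threads are closed before the next fetch.
        with ThreadPoolExecutor(max_workers=worker_count(threads)) as pool:
            list(pool.map(parse, block))
        processed += len(block)
        if not loop:  # without -l, stop after a single block
            break
    return processed


if __name__ == "__main__":
    # Stand-in for the list of files waiting in the database.
    pending = [f"doc-{i}.xml" for i in range(2500)]

    def fetch_block(limit):
        block = pending[:limit]
        del pending[:limit]
        return block

    print(run_indexer(fetch_block, parse=len, block_size=1000, loop=True))
```

With a small block size, the pool is torn down and rebuilt often, which is one reason a larger -c value can help when many files are waiting.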
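The effect of the space insertion can be shown with a small Python example. The regular expression below is a simplification chosen for the illustration and is an assumption; it is not how the indexer actually strips tags.

```python
import re

TAG = re.compile(r"<[^>]+>")


def raw_text(markup, insert_spaces=True):
    """Strip tags, optionally putting a space where a tag separated two words."""
    text = TAG.sub(" " if insert_spaces else "", markup)
    # Collapse the runs of whitespace left behind by adjacent tags.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = "<title>Restad</title><p>indexes documents</p>"
    print(raw_text(sample))                       # Restad indexes documents
    print(raw_text(sample, insert_spaces=False))  # Restadindexes documents
```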
Indexer command format:

    indexer [options]

    -c int      Max document count to process, default is 1000
    -p int      Max threads to use, default is all available cores except one
    -l          Loop until there is no more document to process
    -s          Do not insert spaces for concatenated words
    -o string   Append the parsing error log to a file
The database keeps the list of files to process and of files already processed. If a file gets a connection error during indexing, the indexer tries to set its status back to Waiting, but if the database connection itself fails it cannot do so. You can look for files still in Processing whose processing_start date is getting too old, or that remain in that status when no indexer is running anymore. If a file is not well formed or causes an error during parsing, the indexer sets its status to Error to prevent it from being parsed again (which could lead to an infinite loop in looping mode). If you change the file content and want it to be parsed again, update its status to Waiting and run the indexer.
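The two manual recovery steps described above (requeueing files stuck in Processing, and retrying a corrected file in Error) might look like the following Python sketch. Only the statuses Waiting, Processing and Error and the processing_start field come from the text; the table name `files`, the `path` and `status` column names, and the use of an in-memory SQLite database are assumptions made to keep the example self-contained. Check the real Restad schema before running anything similar.

```python
import sqlite3
from datetime import datetime, timedelta


def demo_db(now):
    """Build an in-memory table with a hypothetical schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, status TEXT, processing_start TEXT)"
    )
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [
            ("a.xml", "Processing", (now - timedelta(hours=6)).isoformat()),
            ("b.xml", "Processing", (now - timedelta(minutes=1)).isoformat()),
            ("c.xml", "Error", None),
            ("d.xml", "Waiting", None),
        ],
    )
    return conn


def requeue_stale(conn, now, max_age):
    """Put files stuck in Processing for longer than max_age back to Waiting."""
    cutoff = (now - max_age).isoformat()
    cur = conn.execute(
        "UPDATE files SET status = 'Waiting', processing_start = NULL "
        "WHERE status = 'Processing' AND processing_start < ?",
        (cutoff,),
    )
    return cur.rowcount


def retry_file(conn, path):
    """Mark a corrected file in Error as Waiting so that it is parsed again."""
    cur = conn.execute(
        "UPDATE files SET status = 'Waiting' WHERE path = ? AND status = 'Error'",
        (path,),
    )
    return cur.rowcount


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    conn = demo_db(now)
    print(requeue_stale(conn, now, timedelta(hours=1)))
    print(retry_file(conn, "c.xml"))
    print(conn.execute("SELECT path, status FROM files ORDER BY path").fetchall())
```

Requeueing stale files is only safe when no indexer is still working on them, which is why the age threshold should be generous.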