July 9, 2007

Strigi-chemical test suite

The test suite of strigi-chemical deserves special attention, because it will probably be used for all other Strigi analyzers; moreover, it could be useful for Blue Obelisk projects.

The test suite is a set of Python scripts built on Python's unittest framework. The suite provides StrigiTestCase as the base class for all test cases. It is a wrapper around the Strigi command-line tools strigicmd and xmlindexer, and it ensures that the fixtures are properly isolated from each other. The test runner executes all test cases it can find in the current directory.
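A minimal sketch of this pattern, assuming only what is described above. The class name `StrigiTestCaseSketch`, the `run_tool()` helper, the `run_all()` function and the `test*.py` file pattern are hypothetical, not the actual strigi-chemical code. No strigicmd or xmlindexer options are assumed: arguments are passed to the tool verbatim. "Isolation" is modelled here as a fresh scratch directory per test that is deleted afterwards.

```python
import os
import shutil
import subprocess
import tempfile
import unittest


class StrigiTestCaseSketch(unittest.TestCase):
    """Hypothetical base class: every test runs in its own scratch directory."""

    def setUp(self):
        # Fresh working directory (and an index subdirectory) for each test,
        # so no test can see files or indexes left behind by another one.
        self.workdir = tempfile.mkdtemp(prefix="strigi-test-")
        self.indexdir = os.path.join(self.workdir, "index")
        os.mkdir(self.indexdir)

    def tearDown(self):
        # Remove everything the test created.
        shutil.rmtree(self.workdir, ignore_errors=True)

    def run_tool(self, *args):
        """Run a command-line tool (e.g. strigicmd or xmlindexer) inside workdir.

        The argument list is passed through unchanged; no tool-specific
        flags are assumed. Returns (exit code, captured stdout).
        """
        result = subprocess.run(list(args), cwd=self.workdir,
                                capture_output=True, text=True)
        return result.returncode, result.stdout


def run_all(directory="."):
    """Collect every test*.py module under `directory` and run it."""
    suite = unittest.defaultTestLoader.discover(directory, pattern="test*.py")
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_all()
```

Note that `TestLoader.discover()` also descends into subdirectories that are importable packages, so "the current directory" here really means the current directory tree.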

Each test focuses on a particular data format. Having sample data at hand is essential before writing and testing the analyzers. Egon recently started a project to provide a central repository of chemical test files for Blue Obelisk. The problem at the moment is that the repository is incomplete, so here is a CALL FOR DATA: any chemical file with a free license can enter the repository. If you want your chemical data files to be recognized by Strigi, please release samples of your files under an OSI-approved license!

Subversion provides a handy mechanism for including external repositories in the project tree; once set up, it requires no further action. For interested developers: use svn propset, svn propget and svn proplist to manipulate the svn:externals property. In our case, after checkout, ctfr appears as a subdirectory of /test:

ctfr http://blueobelisk.svn.sourceforge.net/svnroot/blueobelisk/ctfr/trunk/
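The property value above is simply a line of the form `directory URL`. As a small Python sketch, reading and writing such a definition could look like this. The helper names are hypothetical, and the parser only handles the plain `directory URL` form shown here (not revision pins or the newer `URL directory` ordering). The only real interface used is the client syntax `svn propset svn:externals VALUE PATH`.

```python
import subprocess


def parse_externals(value):
    """Parse 'directory URL' lines into {directory: url}.

    Blank lines are skipped. Revision options (-rREV) and the newer
    'URL directory' ordering are not handled by this sketch.
    """
    externals = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        local_dir, url = line.split(None, 1)
        externals[local_dir] = url.strip()
    return externals


def propset_argv(path, externals):
    """Build the argument list for `svn propset svn:externals VALUE PATH`."""
    value = "\n".join(f"{d} {u}" for d, u in externals.items())
    return ["svn", "propset", "svn:externals", value, path]


def set_externals(path, externals):
    """Apply the definition to a working copy (requires the svn client)."""
    subprocess.run(propset_argv(path, externals), check=True)
```

After setting the property, it still has to be committed; a subsequent `svn update` or checkout then fetches the external tree automatically.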

TestFileRepository is a Python class that provides generic access to the contents of the XML-based CTFR repository. Every test case inherits an initialized self.ct object from StrigiTestCase; you can call its getTOC(), getFiles() or getFileByName() methods without worrying about the CTFR internals.
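To illustrate the idea of hiding the repository format behind a few accessors, here is a hypothetical stand-in. The method names follow the post, but the XML layout (an `index.xml` with `<file name=… format=… path=…/>` entries) is an assumption made purely for this sketch and is not the real CTFR schema.

```python
import os
import xml.etree.ElementTree as ET


class TestFileRepositorySketch:
    """Hypothetical stand-in for TestFileRepository.

    Assumes (NOT the real CTFR schema) an index file such as:
      <repository>
        <file name="caffeine.mol" format="mdl" path="mdl/caffeine.mol"/>
      </repository>
    """

    def __init__(self, root, index="index.xml"):
        self.root = root
        tree = ET.parse(os.path.join(root, index))
        # One dict of attributes per <file> element, in document order.
        self._entries = [dict(el.attrib) for el in tree.getroot().iter("file")]

    def getTOC(self):
        """Return the names of all files listed in the index."""
        return [e["name"] for e in self._entries]

    def getFiles(self, fmt=None):
        """Return full paths of listed files, optionally only those of one format."""
        return [os.path.join(self.root, e["path"]) for e in self._entries
                if fmt is None or e.get("format") == fmt]

    def getFileByName(self, name):
        """Return the full path of the file called `name`, or None if absent."""
        for e in self._entries:
            if e["name"] == name:
                return os.path.join(self.root, e["path"])
        return None
```

A base test class could build one such object in `setUp()` and store it as `self.ct`, so that a format-specific test only asks for, say, `self.ct.getFiles("mdl")` and never parses the index itself.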

The strigi-chemical test cases have already helped me detect problems with text tokenizers, float values and keyword queries. Fortunately, all of these problems have since been fixed in the Strigi core, and the tests will now make sure that none of them reappears unnoticed.