July 9, 2007

Strigi-chemical test suite

The test suite of strigi-chemical deserves special attention, because it will probably be used for all other Strigi analyzers and could also be useful for Blue Obelisk projects.

The test suite is a set of Python scripts built on Python's unittest infrastructure. The suite provides StrigiTestCase as a base class for all test cases. It is a wrapper around the Strigi command-line tools strigicmd and xmlindexer, and it ensures that the fixtures are properly isolated. The test runner executes all test cases it can find in the current directory.
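To make the idea of fixture isolation concrete, here is a minimal sketch in modern Python (3.7+). It is not the actual StrigiTestCase: the class name IsolatedCommandTestCase, the run_tool helper and the run_all function are hypothetical, and the Python interpreter stands in for a real indexer binary so that the example runs anywhere.

```python
import shutil
import subprocess
import sys
import tempfile
import unittest


class IsolatedCommandTestCase(unittest.TestCase):
    """Hypothetical stand-in for StrigiTestCase: one scratch directory per test."""

    def setUp(self):
        # A fresh directory for every test keeps data from leaking between tests.
        self.workdir = tempfile.mkdtemp(prefix="strigi-test-")

    def tearDown(self):
        # Remove the scratch directory even if the test failed.
        shutil.rmtree(self.workdir, ignore_errors=True)

    def run_tool(self, argv):
        """Run an external command inside the scratch directory and return its stdout."""
        result = subprocess.run(
            argv, cwd=self.workdir, capture_output=True, text=True, check=True
        )
        return result.stdout


class ExampleTest(IsolatedCommandTestCase):
    def test_tool_output(self):
        # Assumption: the interpreter replaces a real command-line tool here.
        out = self.run_tool([sys.executable, "-c", "print('ok')"])
        self.assertEqual(out.strip(), "ok")


def run_all(start_dir="."):
    """Discover and run every test module (test*.py) found under start_dir."""
    suite = unittest.defaultTestLoader.discover(start_dir)
    return unittest.TextTestRunner(verbosity=1).run(suite)
```

Calling run_all() from the directory that holds the test modules mimics a runner that picks up every test case it can find there.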

Each test focuses on a certain data format. It is very important to have sample data on hand before writing and testing the analyzers. Egon recently started a project to provide a central repository of chemical test files for Blue Obelisk. The problem at the moment is that the repository is incomplete. So here is a CALL FOR DATA: any chemical file with a free license can enter the repository. If you want your chemical data files to be recognized by Strigi, please release samples of your files under an OSI-approved license!

Subversion provides a very nice trick that lets you include external repositories in the project tree. Once set up, it requires no further action. For interested developers: use svn propset, propget and proplist to manipulate the svn:externals property. In our case, after checkout, ctfr will appear as a subdirectory of /test:

ctfr http://blueobelisk.svn.sourceforge.net/svnroot/blueobelisk/ctfr/trunk/

TestFileRepository is a Python class for generalized access to the contents of the XML-based CTFR repository. Every test case inherits from StrigiTestCase an initialized self.ct object, which you can ask to getTOC(), getFiles() or getFileByName() without having to care about the CTFR internals.
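As a rough illustration of what such an accessor class can look like, here is a sketch built on xml.etree.ElementTree. Everything in it is an assumption made for the example: the XML layout in SAMPLE_INDEX, the class name CtfrIndexSketch, the optional format filter and the return types are invented here and do not describe the real CTFR index or the real TestFileRepository. Only the three method names mirror the accessors mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical index layout; the real CTFR XML may look quite different.
SAMPLE_INDEX = """\
<repository>
  <file name="benzene.mol" format="MDL molfile" path="mol/benzene.mol"/>
  <file name="water.cml" format="CML" path="cml/water.cml"/>
</repository>
"""


class CtfrIndexSketch:
    """Illustrative accessor that hides the XML details from the test cases."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        # Keep each <file> element as a plain dict of its attributes.
        self._files = [dict(element.attrib) for element in root.iter("file")]

    def getTOC(self):
        """Return the names of all files listed in the index."""
        return [entry["name"] for entry in self._files]

    def getFiles(self, fmt=None):
        """Return all entries, optionally only those with the given format."""
        if fmt is None:
            return list(self._files)
        return [entry for entry in self._files if entry.get("format") == fmt]

    def getFileByName(self, name):
        """Return the entry with the given name, or None if it is not listed."""
        for entry in self._files:
            if entry["name"] == name:
                return entry
        return None


repo = CtfrIndexSketch(SAMPLE_INDEX)
print(repo.getTOC())
```

The point of the design is that a test case only asks for files by name or format, so the index layout can change without touching the tests.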

The strigi-chemical test cases have already helped me detect problems with text tokenizers, float values and keyword queries. Fortunately, all of these problems have already been fixed in the Strigi core. And now the tests won't let them reappear without being noticed.

4 comments:

Egon Willighagen said...

Alexandr, thanx for the svn:external pointers... I have been thinking of using this with Bioclipse too, so that we have a CDK/Jmol plugin that uses the active source code from CDK/Jmol SVN, instead of those static jars.

baoilleach said...

You should also consider vendor drop-ins, if you want to apply patches to whatever you're including in your repository (see the SVN book).

cclib has a quite comprehensive set of comp chem log files. Some of these are stored in SVN. However, others are downloaded with a script that downloads them from our SF web server. All files are public domain.

Egon Willighagen said...

Noel, if they are all public domain, might you be interested in putting them into the BO test file repository?

Egon Willighagen said...

Alexandr, please read this blog item by Rich (and the comments):

http://depth-first.com/articles/2007/08/01/never-draw-the-same-molecule-twice-image-metadata-for-cheminformatics

Would be great if you could get indexing of the chemical file available from the PNG annotation going. What do you think?