August 16, 2007

Strigi-chemical GSoC final timeline

On August 20th, all GSoC students and mentors are supposed to start the final evaluation. This means that the major goals of each project should be complete and working code should be submitted.

The environment of my Strigi-chemical project is very dynamic. Strigi is under heavy development and is currently undergoing the XESAM reforms, and a lot is happening in the open source cheminformatics world as well: a new OpenBabel release, OSRA (optical structure recognition), and structural information embedded in PNG images. This post outlines the final TODO list for my project, which you can expect to be ready by the end of this SoC.

  1. The JStreams SDF substream provider is a nice feature that represents an SDF file as a virtual folder of MOL files; I am currently working on making it stable. It makes SDF analysis transparent, as if the file really were a set of MOL files, and it also allows browsing the contents of an SDF file with the jstream:// KIO;
  2. Fix the CML2 analyzer and make sure it correctly recognizes the newly generated CML files from Jerome's chemical-structures-2 repository;
  3. Use the MIME helper to decide whether to continue analyzing a stream or to skip it. This will probably make it possible to optimize some greedy analyzers;
  4. Make sure that the OpenBabel helper works well and does not crash when analyzers run in parallel;
  5. Create a chemical PNG analyzer to extract chemical information (MOL or InChI) from images; its unit tests are based on samples generated by the Firefly and gchempaint software;
  6. Make sure all strigi-chemical analyzers conform to the Strigi PLUGIN architecture and have unit tests using files taken from the Blue Obelisk CTFR (Chemical Test File Repository);
  7. Update the chemical ontology to prepare it for migration to the XESAM ontology (fix types, cardinalities, child-parent links and indexing flags);
  8. And finally, build a sample GUI application using molsKetch and Avogadro as KParts. This application should let the user input a structure by drawing it, represent it as an OpenBabel molecule, convert it to InChI using OpenBabel, send a XESAM query over D-Bus to strigidaemon, and display the list of results with a structural preview powered by Avogadro (Kalzium).

It is quite a lot of work for the 5 days left, but I have drafts of everything listed above, so it should be feasible.

2 comments:

Egon Willighagen said...

About point 1: does that mean we can open an SDF file in Konqueror as if it were a folder, and that it therefore gets a folder icon, possibly overlaid with a molecule or so? Does it scale to 100 MB SDF files too?

Alexander Goncearenco said...

Egon: yes, it means that with the jstream:// KIO you will be able to open an SDF file in Konqueror as if it were a folder.

I'm not sure about the icon at the moment.

Scaling is a great question. Since it is based on streams, it should scale very well, but it really depends on the implementation, which is exactly what I am working on right now: coding and testing it.