Seen at EuroPython: Xapian text search engine
I attended Michael Salib’s talk about Xapian, “Stupidity and laser cat toys: Indexing the US Patent Database with Xapian and Twisted”.
Xapian is a probabilistic text search engine.
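To make "probabilistic" a bit more concrete: engines of this kind rank documents by an estimated relevance score instead of a plain boolean match, and Xapian's default weighting scheme is BM25. The snippet below is only a toy sketch of a basic BM25 formula in pure Python, not Xapian's code or API; the function name `bm25_scores`, the parameter values `k1` and `b`, and the sample documents are all made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with a basic BM25 formula.

    Hypothetical helper for illustration only; k1 and b are assumed values.
    """
    # Naive tokenization: lowercase and split on whitespace (no stemming).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency is damped and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "laser pointer used as a cat toy",
    "method for indexing patent documents",
    "a cat toy with a spring",
]
scores = bm25_scores("laser cat", docs)
```

With these sample documents, the first one matches both query terms and ranks highest, the third matches only "cat", and the second matches nothing and scores zero.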
Michael used Xapian to index the US Patent Database, which is pretty big indeed. He wrote a Python wrapper called Xapwrap, which you can get here:
http://divmod.org/projects/xapwrap
Michael explained that Xapian was preferred to Lucene because it is easier to wrap in Python and provided faster queries and better precision.
I’m waiting for Michael to upload the slides to the EuroPython site so I can give more precise feedback on this.
More info on PyLucene here:
http://www.sauria.com/~twl/conferences/pycon2005/20050325/Pulling Java Lucene into Python.html (PyCon05 notes)
Feature-wise, Xapian has everything needed to run a scalable text search engine (stemming based on Snowball, meta-indexes, etc.). It optionally uses Twisted’s python.log for logging.
- Lucene features: http://lucene.apache.org/java/docs/features.html
- Xapian features: http://www.xapian.org/features.php
I have the feeling that Xapian would fit pretty well as an external indexer for z3.
(Post originally written by Tarek Ziadé on the old Nuxeo blogs.)