[cgi-wiki-dev] Search indexer errors

Kate L Pugh cgi-wiki-dev@earth.li
Fri, 28 Nov 2003 18:24:41 +0000


Earle Martin <openguides@downlode.org> wrote to openguides-dev:
> [Thu Nov 27 23:15:03 2003] index.cgi:
> Search::InvertedIndex::remove_index_from_group() - Corrupted database.
> Unable to find 'ged_000000000000_c_000000000523' record
> [Thu Nov 27 23:15:03 2003] index.cgi:  at
> /home/earle/openguides.org/lib/CGI/Wiki/Search/SII.pm line 204 
> 
> This is for the London site. Should I be worried?

It's déjà vu all over again:
  http://openguides.org/mail/openguides-dev/2003-September/000039.html

Search::InvertedIndex does throw these little errors from time to
time: about three times a year, which is rare enough that the cause
is very difficult to track down.

So what can we do about it?  CGI::Wiki only has two search backends so
far.  One is the Search::InvertedIndex one, and the other is based on
DBIx::FullTextSearch, which only works on MySQL.  There really is a
dearth of decent indexing/searching modules on CPAN.  Does anyone know
of any that I've overlooked?

Simon Cozens is working on Plucene, a port of Lucene to Perl:
  http://blog.simon-cozens.org/bryar.cgi/id_6587

I've not properly looked into Lucene though, since it's written in
Java and I only found out about Plucene last night.  Anyone here used it?

Another option would be rewriting Search::InvertedIndex to be simpler
and hence hopefully less likely to go wrong.  I did a fair bit of work
towards this about a year or so ago, and sent a multitude of patches
to the author, but he is very busy and apologised profusely for not
having time to integrate them.  I am not hopeful that they will ever
go into the distribution, but they might be useful for a simple
rewrite that could be released as Search::SimpleIndexer or
Search::InvertedIndex::Simple or something.  I don't want to tread on
the guy's toes, and I don't even think he would see it that way,
especially as CGI::Wiki simply doesn't need much of the complexity
that's in the current distro.
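
To give a rough idea of how little a cut-down indexer might need, here
is a minimal sketch.  It is in Python rather than Perl purely for
illustration, and the class and method names (SimpleIndex, index_node,
delete_node, search) are hypothetical; they are not taken from
Search::InvertedIndex or from any of the patches mentioned above.  It
assumes an in-memory index and AND-only queries, and it treats deleting
a node that was never indexed as a no-op rather than an error.

```python
import re
from collections import defaultdict


class SimpleIndex:
    """Hypothetical minimal inverted index: word -> set of node names."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> nodes containing it
        self._words_of = {}                # node -> words indexed for it

    @staticmethod
    def _tokenise(text):
        # Lower-case the text and split it into alphanumeric words.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def index_node(self, node, text):
        # Re-indexing replaces whatever was stored for this node before.
        self.delete_node(node)
        words = self._tokenise(text)
        self._words_of[node] = words
        for word in words:
            self._postings[word].add(node)

    def delete_node(self, node):
        # Deleting an unknown node does nothing instead of raising.
        for word in self._words_of.pop(node, set()):
            nodes = self._postings.get(word)
            if nodes is None:
                continue
            nodes.discard(node)
            if not nodes:
                del self._postings[word]

    def search(self, query):
        # Return the nodes that contain every word in the query.
        words = self._tokenise(query)
        if not words:
            return set()
        results = None
        for word in words:
            nodes = self._postings.get(word, set())
            results = set(nodes) if results is None else results & nodes
        return results
```

Keeping a per-node record of the indexed words is one possible design
choice: removal then only touches entries the index itself created, so
there is no separate lookup record that can go missing.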

The drawback of that approach (i.e. me writing a new search module) is
that I am already overwhelmed with programming tasks.  I am deep in
the guts of both CGI::Wiki and OpenGuides this week, and I don't want
to start a major new project until the current work on those two
projects is released.  I *may* be able to break the back of the
current stuff this weekend.

Is anyone else interested in a Search::InvertedIndex rewrite?  You
could write code, docs, tests or all three.


Kake