On 05/02/12 17:38, Dominic Hargreaves wrote:
Hmm, I'm not too familiar with Lucy, but it looks like quite a heavyweight solution for this. Would a simple serialisation interface like JSON or YAML be better?
Possibly. What attracted me to Lucy was more its (semi-standard, Lucene-style) keyword /indexing/ than its use as a simple metadata store.
You're right though, a JSON file would be much more "consumable". I'll check later whether there are standard modules for searching / indexing JSON.
Well, W::T currently uses revision numbers rather than dates, so your naming convention would need to store that.
Ah, good point.
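To make the JSON idea a bit more concrete, here is a very rough sketch of a metadata store that records a revision number per flat file and builds a naive keyword index over it. It is in Python purely for illustration; the record fields ("path", "revision", "keywords"), the helper names and the example file are all my own assumptions, not anything W::T or Lucy defines, and the "indexing" is plain word matching rather than anything Lucene-like.

```python
import json
import re
from collections import defaultdict


def make_record(path, revision, text):
    """Build a metadata record for one revision of a flat file (hypothetical layout)."""
    # Naive keyword extraction: lowercase alphanumeric words, de-duplicated.
    keywords = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
    return {"path": path, "revision": revision, "keywords": keywords}


def build_index(records):
    """Map each keyword to the (path, revision) pairs whose text contains it."""
    index = defaultdict(list)
    for rec in records:
        for word in rec["keywords"]:
            index[word].append((rec["path"], rec["revision"]))
    return index


# Two revisions of the same (made-up) document.
records = [
    make_record("docs/page.html", 1, "Legacy HTML page"),
    make_record("docs/page.html", 2, "Cleaned HTML page with wiki links"),
]

# Round-trip through JSON to show the store is just plain, consumable text.
stored = json.dumps(records, indent=2)
index = build_index(json.loads(stored))

# Look up which revisions mention a keyword.
print(index["wiki"])
```

The point is only that storing the revision number alongside the path is cheap in JSON; whether a hand-rolled index like this is good enough, or whether something like Lucy is still worth it for the searching side, is exactly what I'd want to find out.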
I'm also working on a project with lots (thousands) of badly formatted, but still useful, legacy HTML documents. We're doing all we can to clean them up computationally, but I envisage that many aspects of the clean-up will still have to be crowdsourced over time (with contributors being screened for HTML competence).
This is the use case where I'd like to be able to say, "Take this document and make it wiki-like": a solution general enough that any given flat file could be "wikified".
Interested to hear if you have any further thoughts on something like that. Otherwise I'll start hacking something together next time I get a chance.
Cheers,
--Ryan