At Fri, 24 Jun 2011 08:38:25 +0100, Peter Alcibiades wrote:
Does anyone know of any packages that do word recurrence intervals, trigram Markov models, etc., to test whether two passages share the same author? Obviously a Linux package would be best, but Windows would be OK. Whatever works.
Are you familiar with R? Quite a few of my colleagues (especially psychologists) use it. It may be worth spending a bit of time learning if you're interested in this sort of thing. Here are a few pointers for NLP:

http://cran.ma.imperial.ac.uk/web/views/NaturalLanguageProcessing.html

The Python Natural Language Toolkit may also be worth considering: http://www.nltk.org/. The book <http://www.nltk.org/book> is available online.

Techniques used for language detection are often quite similar to those used for authorship attribution, e.g.:

http://misja.posterous.com/language-detection-with-python-nltk

Sounds like a fun project.

Best,
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Richard Lewis
ISMS, Computing
Goldsmiths, University of London
Tel: +44 (0)20 7078 5134
Skype: richardjlewis
JID: ironchicken@jabber.earth.li
http://www.richardlewis.me.uk/
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
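
P.S. To give a flavour of the trigram idea, here is a minimal sketch in plain Python (standard library only, no NLTK). It builds a character-trigram frequency profile for each passage, much as simple language detectors do, and compares the two profiles with cosine similarity. The function names (trigram_profile, cosine_similarity, profile_similarity) are made up for illustration, and the choice of character trigrams with cosine similarity is an assumption on my part. It is not a full Markov model and not a validated authorship test.

```python
from collections import Counter
import math


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased text.

    Runs of whitespace are collapsed to single spaces first, so line
    breaks in the source do not affect the profile.
    """
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine_similarity(p, q):
    """Cosine similarity between two trigram count profiles.

    Returns 0.0 when either profile is empty.
    """
    # Counter returns 0 for missing keys, so unshared trigrams add nothing.
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def profile_similarity(text_a, text_b):
    """Hypothetical helper: similarity of two passages' trigram profiles."""
    return cosine_similarity(trigram_profile(text_a), trigram_profile(text_b))
```

A score from profile_similarity(passage_a, passage_b) only means something relative to a baseline: you would want to compare it against scores for passages known to be by the same author and by different authors, and to use passages of a decent length, since short texts give very noisy profiles.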