Cool article, with implications for better news mining of real-time Twitter results, and perhaps financial engineering implications for algo trading.
http://apps.ysf-fsj.ca/virtualcwsf/projectdetails.php?id=2740&switchlanguage=en
Related Article with a good interview
http://www.theglobeandmail.com/news...-better-way-to-search-internet/article2118962
From the project page:
Nicholas Schiefer
Apodora: Markov Chain-Inspired Microsearch
Abstract: A novel information retrieval algorithm called "Apodora" is introduced, using limiting powers of Markov chain-like matrices to determine models for the documents and making contextual statistical inferences about the semantics of words. The system is implemented and compared to the vector space model. Especially when the query is short, the novel algorithm gives results with approximately twice the precision and has interesting applications to microsearch.
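For anyone wondering what "limiting powers" of a Markov chain matrix means in practice, here is a minimal sketch. To be clear, this is not Apodora: the abstract does not say how its matrices are built, so the toy transition matrix `P` and the helper names `mat_mul` and `limiting_power` are my own assumptions for illustration. It only shows the general idea that repeatedly multiplying a well-behaved row-stochastic matrix by itself settles to a fixed matrix.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def limiting_power(p, tol=1e-12, max_iter=1000):
    """Compute p, p^2, p^3, ... until successive powers stop changing.

    Assumes p is row-stochastic and that its powers converge
    (true for the toy matrix below, not for every matrix).
    """
    current = p
    for _ in range(max_iter):
        nxt = mat_mul(current, p)
        change = max(
            abs(x - y)
            for row_new, row_old in zip(nxt, current)
            for x, y in zip(row_new, row_old)
        )
        if change < tol:
            return nxt
        current = nxt
    return current


# Hypothetical 2-state transition matrix (each row sums to 1).
P = [[0.9, 0.1],
     [0.5, 0.5]]

limit = limiting_power(P)
# Every row of the limit approaches the same stationary distribution,
# which for this P is (5/6, 1/6).
print(limit)
```

In the limit every row is the same, so the starting state no longer matters. How the actual algorithm turns that kind of limit into a document model is not covered by the abstract.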
From the interview:
A lot of traditional algorithms for information retrieval tend to break down when you apply them to micro search. The reason is that most, nearly all, existing algorithms make the independence assumption: that all words are completely independent of other words.
Obviously, that is false, but it’s been shown to work pretty well.
But that assumption breaks down quite badly with micro search. You do not have room to stuff your text full of synonyms and descriptions of everything you say so a search engine can find it.
For example, say you wanted to search tweets for the word “cat.” A tweet that contains the word “kitten” but not “cat” is not going to be much help, because the algorithm assumes cat and kitten are independent, even though they’re not.
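The cat/kitten problem is easy to reproduce with a bare-bones vector space model. The sketch below is my own illustration, not code from the project: the two tweets are made up, and `bag_of_words` and `cosine` are hypothetical helper names. It scores each tweet against the query by cosine similarity over raw word counts, which treats every word as its own independent dimension.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, strip basic punctuation, count terms."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up example tweets.
tweets = [
    "My kitten just fell asleep on the keyboard",
    "The cat knocked my coffee over again",
]

query = bag_of_words("cat")
scores = [cosine(query, bag_of_words(t)) for t in tweets]

for tweet, score in zip(tweets, scores):
    print(f"{score:.3f}  {tweet}")
```

The kitten tweet scores exactly zero because it shares no literal term with the query, even though it is obviously relevant. A long document might mention both words somewhere, but a tweet usually will not, which is why the assumption hurts more in micro search.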