
17 Year-Old designs better micro-search for Twitter etc.

Cool article; it has implications for improved news mining of real-time Twitter results, and perhaps Financial Engineering implications for algo trading.

http://apps.ysf-fsj.ca/virtualcwsf/projectdetails.php?id=2740&switchlanguage=en
Nicholas Schiefer
Apodora: Markov Chain-Inspired Microsearch
Abstract: A novel information retrieval algorithm called "Apodora" is introduced, using limiting powers of Markov chain-like matrices to determine models for the documents and making contextual statistical inferences about the semantics of words. The system is implemented and compared to the vector space model. Especially when the query is short, the novel algorithm gives results with approximately twice the precision and has interesting applications to microsearch.

Related Article with a good interview

http://www.theglobeandmail.com/news...-better-way-to-search-internet/article2118962

A lot of traditional algorithms for information retrieval tend to break down when you apply them to micro-search. The reason is that nearly all existing algorithms make the independence assumption: that all words are completely independent of all other words.

Obviously, that is false, but it’s been shown to work pretty well.

But that assumption breaks down quite badly with micro-search. You do not have room to stuff your text full of synonyms and descriptions of everything you say so a search engine can find it.

For example, say you wanted to search tweets for the word “cat.” If a tweet contains the word “kitten,” that’s not going to be very helpful, because the model assumes cat and kitten are independent, even though they’re not.
 
joel_b :: Thanks so much for sharing. I actually have a friend designing a system that uses an approach like the one you mention.
 
The article actually doesn't go into ANY detail as to how the algorithm is different, except that it connects similar search words... is there a published paper?
 
He already works for IBM, so I would imagine they've already struck a deal with him. You won't see a paper published until he's received a bunch of money for it, if ever. Google won't publish their algo either.

It's written in Python, obviously; I would guess that he is building a large database of sentences/paragraphs, likely from the internet, a thesaurus, or other sources, and developing statistical relationships between all the words. Princeton's WordNet could be used for that as well, I would imagine: http://wordnet.princeton.edu/

Maybe more detail will come out, but I would imagine that a science competition for Grade 11 students wouldn't require him to fully describe the process, let alone really get it. Really blows my Volcano out of the water, hah!
 