Alexandre Trilla, PhD - Research Engineer
 

Blog

-- Thoughts on data analysis, software development and innovation management. Comments are welcome.


Post 43

Full-text search ability in the blog posts

21-Dec-2010

As the number of posts in the blog keeps growing, I thought it would be a good idea to enable a full-text search option. This way, the posts relevant to a search query can be retrieved within a few moments.

To deploy such a text search engine, I have taken Ian Barber's Vector Space Model (VSM) implementation (in PHP) as a reference. This simple search method first indexes the post texts directly with a free vocabulary, without applying any stopword filtering, stemming or lemmatisation. It then weights the terms with the tf-idf method, which accounts for both the local contribution of a term (post-wise) and its discriminating power within the collection (blog-wise). Finally, the posts most similar to the query are retrieved and delivered to the user, ranked by a distributional similarity measure (a pseudo-cosine distance computed as an averaged sum of the weighted term measures).
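
For illustration, the three steps above (indexing, tf-idf weighting and ranking) can be sketched in a few lines of Python. This is not the PHP code running on the blog nor Ian Barber's implementation: the function names (tokenize, build_index, search) are hypothetical, the tokenizer simply splits on whitespace, and the score is only one possible reading of the "averaged sum" idea, namely the sum of the tf-idf weights of the matching query terms divided by the number of query terms.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Free vocabulary: lowercase and split on whitespace,
    # with no stopword filtering, stemming or lemmatisation.
    return text.lower().split()


def build_index(posts):
    """Build an inverted index mapping term -> {post_id: tf-idf weight}.

    `posts` is assumed to be a dict of post_id -> post text.
    """
    term_counts = {pid: Counter(tokenize(text)) for pid, text in posts.items()}

    # Document frequency: the number of posts that contain each term.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_posts = len(posts)
    index = defaultdict(dict)
    for pid, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total                         # local, post-wise weight
            idf = math.log(n_posts / doc_freq[term])   # collection-wise weight
            index[term][pid] = tf * idf
    return index


def search(index, query):
    """Return (post_id, score) pairs sorted from best to worst match."""
    terms = tokenize(query)
    scores = defaultdict(float)
    for term in terms:
        for pid, weight in index.get(term, {}).items():
            scores[pid] += weight
    # Assumed scoring: summed weights averaged over the query terms.
    # Posts whose only matches are terms present in every post score 0
    # and are dropped.
    ranked = [(pid, s / len(terms)) for pid, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

With a handful of posts loaded into a dict, build_index is run once and search is then called per query. A term that appears in every post gets an idf of zero, so it does not contribute to the ranking, which partly makes up for the missing stopword filter.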



All contents © Alexandre Trilla 2008-2025