Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome.
Post 43
Full-text search ability in the blog posts
21-Dec-2010
As the number of posts in the blog keeps growing,
I thought it would be a good idea to enable a full-text
search option. This way, the posts related to a search
query can be retrieved within a few moments.
To deploy such a text search engine, I have taken
Ian Barber's
Vector Space Model (VSM) implementation (in PHP) as a reference.
This (simple) search method first builds a free-vocabulary
index directly from the post texts, without applying any stopword
filtering, stemming or lemmatisation procedures.
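To make the indexing step concrete, here is a minimal sketch in Python (not the PHP code actually running on the blog). The function names, the lowercasing step and the two toy posts are my own assumptions for illustration; the only point is that every token goes into the index as it is, with no stopword list and no stemmer.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Split into word tokens. Lowercasing is an assumption of this sketch;
    no stopword filtering, stemming or lemmatisation is applied."""
    return re.findall(r"\w+", text.lower())

def build_index(posts):
    """Map each term to {post_id: term frequency in that post}."""
    index = defaultdict(dict)
    for post_id, text in posts.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][post_id] = tf
    return index

# Two hypothetical posts, just to exercise the code
posts = {
    1: "Full-text search in the blog",
    2: "The blog is about data analysis",
}
index = build_index(posts)
```

Note that a function word such as "the" ends up in the index like any other term, since nothing is filtered out.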
Then it weights the terms with the tf-idf method so as to
account for both the local contribution of a term (post-wise) and its
discriminating power within the collection (blog-wise).
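As an illustration of the weighting, the sketch below uses the textbook form of tf-idf, tf * log(N / df). This is an assumption on my side: the reference implementation may smooth or scale the logarithm differently. The counts are hypothetical too.

```python
import math

def tf_idf(tf, df, n_docs):
    """Textbook tf-idf: term frequency in the post (local contribution)
    times log(N / df) (discriminating power across the collection)."""
    return tf * math.log(n_docs / df)

# Hypothetical counts for a blog with 43 posts
n_posts = 43
rare = tf_idf(tf=3, df=2, n_docs=n_posts)     # term found in only 2 posts
common = tf_idf(tf=3, df=43, n_docs=n_posts)  # term found in every post
```

With this form, a term that appears in every post gets a weight of zero, so the idf factor does part of the job that a stopword list would otherwise do.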
Finally, the posts most similar to the query are retrieved and delivered to
the user, ranked by a distributional similarity measure (a pseudo-cosine
distance computed as an averaged sum of the weights of the query terms).
All contents © Alexandre Trilla 2008-2025