Status update Week[2-3]
I've learned a lot over the past couple of weeks.
Let's start with the questions from my previous post that still needed answers:
1-) There is a Python API for interfacing with the Sphinx "searchd" daemon, but not much work has been done on it, and no documentation is available for it.
2-) The API works with both the beta and the stable versions of Sphinx.
3-) Because of the number of enhancements made in the stable beta version, it’s recommended that we implement against the stable beta version.
I’ve bought a book called “Introduction to Search with Sphinx”, written by the lead developer of Sphinx Search.
So far, I’ve had a crash course on linguistics: morphology processing (cats = cat, mice = mouse, going = go/goes/went and so on), lemmatisation (converting a word to its lemma/root) and stemming (intentionally trying to output the stem, even if it’s not necessarily a correct word).
The choice between these affects how effective the matching is and which results the Sphinx search server returns.
It is also critical when indexing records in languages other than American English.
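To make the difference between stemming and lemmatisation concrete, here is a minimal sketch. It is a toy, not how Sphinx or Snowball actually work: the suffix list, the lookup table, and the function names `stem` and `lemmatise` are all made up for illustration.

```python
# Toy illustration only: real stemmers (such as the Snowball ones) use far
# richer rules, and real lemmatisers rely on full dictionaries.

# Hypothetical lookup table mapping word forms to their lemma.
LEMMAS = {
    "cats": "cat",
    "mice": "mouse",
    "going": "go",
    "goes": "go",
    "went": "go",
}

# Hypothetical suffixes, tried in this order.
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Strip the first matching suffix; the result may not be a real word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least two characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def lemmatise(word):
    """Look the word up in the table; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ("cats", "mice", "going", "ponies"):
    print(w, "-> stem:", stem(w), "| lemma:", lemmatise(w))
```

Note how the stemmer leaves "mice" untouched and turns "ponies" into the non-word "poni", while the lemmatiser only handles words it has in its table.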
Aside from the linguistics, I’ve learned the basic mechanics of how “Batch Indexing” and “Real-Time Indexing” need to be done, and the different components of Sphinx search, which are:
1-) Data source (may be a database, files, data through an XML pipe, or a combination of them)
2-) The indexer (has the rules and the queries it needs to know what should be indexed)
3-) “Searchd” (the daemon which receives the search queries from the application, looks for possible matches in the index and returns relevant results to the application)
4-) The application (this is where we are going to use the available Python API to interface with the “Searchd” daemon)
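The way the four components above fit together can be sketched with a tiny in-memory model. This is not Sphinx code and does not use the Sphinx Python API; the sample rows and the names `build_index` and `search` are hypothetical stand-ins for the data source, the indexer and the daemon.

```python
from collections import defaultdict

# 1) Data source: hypothetical rows standing in for a database table.
ROWS = [
    (1, "Invoice for red bicycle"),
    (2, "Red apples purchase order"),
    (3, "Bicycle repair quote"),
]


def build_index(rows):
    """2) Indexer stand-in: map each lower-cased word to the ids containing it."""
    index = defaultdict(set)
    for record_id, text in rows:
        for word in text.lower().split():
            index[word].add(record_id)
    return index


def search(index, query):
    """3) Daemon stand-in: return the sorted ids that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# 4) Application: build the index once (batch indexing), then query it.
index = build_index(ROWS)
print(search(index, "red bicycle"))  # prints [1]
```

In the real setup the index lives on disk and the application talks to the daemon over the network, but the division of work is the same.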
I’ve also learned about the basic configuration of the components above, how to aggregate multiple data sources, and how to create multiple indexes (which I believe will be extremely helpful for indexing data in different languages on Tryton).
In order to improve relevance for more languages, it is necessary to compile Sphinx search from source with support for Snowball, a “string processing language for creating stemming algorithms” ( http://snowball.tartarus.org/ ). This extends support beyond the default English, Russian and Czech languages and gives better relevance matching for French, Spanish, Portuguese, Italian, German, Dutch, Swedish, Norwegian, Danish and Finnish.
I will post an update later on the process of indexing data.
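As a rough sketch of how an application could pick a stemmer per language, the helper below separates the built-in languages from the ones that need Snowball. The function `morphology_for` and the language-code keys are hypothetical. The returned names (`stem_en`, `libstemmer_fr` and so on) follow my understanding of the values Sphinx accepts for its `morphology` setting, so check them against the documentation for the version in use.

```python
# Languages with a stemmer built into Sphinx (assumed setting names).
BUILT_IN = {"en": "stem_en", "ru": "stem_ru", "cs": "stem_cz"}

# Languages that need a Snowball-enabled build (hypothetical two-letter keys).
SNOWBALL = {"fr", "es", "pt", "it", "de", "nl", "sv", "no", "da", "fi"}


def morphology_for(lang):
    """Return an assumed morphology value for a language code, or "none"."""
    lang = lang.lower()
    if lang in BUILT_IN:
        return BUILT_IN[lang]
    if lang in SNOWBALL:
        return "libstemmer_" + lang
    # No stemmer known for this language: index the words as they are.
    return "none"


for code in ("en", "fr", "pt", "ja"):
    print(code, "->", morphology_for(code))
```

With one index per language, each index could then be configured with the value returned for its language.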