Full text search on Tryton ERP

Draft Version of Tryton_Sphinx integration

2011-07-12T20:28:00.000-04:00

I have written a draft version of the module that connects Tryton to the Sphinx Search server.

The module currently walks through all the models registered in the Tryton pool (trytond.pool.Pool). Every field that the model's author marked as indexed (and therefore searchable) in the database also becomes searchable in Sphinx Search. Char and text fields get full-text indexing, while the other fields are exposed as Sphinx attributes.
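Below is a minimal sketch of the field split described above. It assumes each field is reduced to a (name, kind, select) triple. FieldInfo, FULL_TEXT_TYPES, split_fields and the sample product fields are hypothetical names used only for illustration. The real module reads the field definitions from the models in the pool instead.

```python
from collections import namedtuple

# Hypothetical, simplified description of a model field.
# kind:   Tryton field type, e.g. 'char', 'text', 'integer', 'numeric'
# select: Tryton `select` value (0 = not searchable, 1 or 2 = searchable)
FieldInfo = namedtuple('FieldInfo', ['name', 'kind', 'select'])

# Field types that get full-text indexing; other types become attributes.
FULL_TEXT_TYPES = {'char', 'text'}


def split_fields(fields):
    """Return (full_text_fields, attributes) for the searchable fields."""
    full_text, attributes = [], []
    for field in fields:
        if not field.select:
            continue  # not marked as searchable by the module author
        if field.kind in FULL_TEXT_TYPES:
            full_text.append(field.name)
        else:
            attributes.append(field.name)
    return full_text, attributes


# Hypothetical product fields, for illustration only.
product_fields = [
    FieldInfo('name', 'char', 1),
    FieldInfo('description', 'text', 2),
    FieldInfo('list_price', 'numeric', 1),
    FieldInfo('active', 'boolean', 1),
    FieldInfo('note', 'text', 0),
]
print(split_fields(product_fields))
# (['name', 'description'], ['list_price', 'active'])
```

Because the split depends only on the existing `select` flag and the field type, existing modules need no changes to become searchable.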

I chose this approach to stay compatible with the existing modules developed for Tryton 1.8 and 2.0 and with the upcoming Tryton 2.2. This way, module developers do not need to make any additional effort to make their modules work with the new Tryton search method.

I have pushed my code to a GitHub repository, along with basic instructions on how to download and install it, create the Sphinx configuration, index the objects in the Tryton Pool and run the searchd daemon to listen for incoming search queries.

https://github.com/dfamorato/tryton_sphinx

Setting up the sphinx.conf file for indexing Tryton product data

2011-06-20T12:23:00.001-04:00

I've successfully configured Sphinx to index Tryton data from a PostgreSQL database, using inheritance to support different languages.

Here is how I achieved this:

** Again, I am assuming that you are using Ubuntu 10.04 and that you compiled Sphinx from source following the instructions in my previous blog post.

Here is the sample file, which should be saved as:



/etc/sphinx/etc/sphinx.conf

I will post an update later on how to start the indexer and on the changes needed on the PostgreSQL side to make translated product indexing work.

I've been very busy these past couple of weeks with my college finals and problem sets.

How to install Sphinx Search with libstemmer support on Ubuntu 10.04

2011-06-07T15:58:00.001-04:00

I have put together a short recipe for installing Sphinx Search with libstemmer (which adds stemming support for more languages) on Ubuntu 10.04 Lucid, with PostgreSQL support and without MySQL support.

Run the following as the root user in a terminal on your Ubuntu machine:

0-) Before we start, make sure all Sphinx dependencies are met:


aptitude install -y python-software-properties
apt-add-repository ppa:pitti/postgresql
aptitude update
aptitude install -y postgresql-server-dev-9.0 build-essential

1-) Download the current version of Sphinx and extract it:


cd /opt
wget http://sphinxsearch.com/files/sphinx-2.0.1-beta.tar.gz
tar -xvzf sphinx-2.0.1-beta.tar.gz

2-) Download libstemmer (the Snowball stemming library) to add more languages to the stemming process, and extract it:


wget http://snowball.tartarus.org/dist/libstemmer_c.tgz
tar -xvzf libstemmer_c.tgz

3-) Copy all files from libstemmer into the Sphinx source directory so they can be compiled together:


cp -fa libstemmer_c/* sphinx-2.0.1-beta/libstemmer_c/

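4-) Change into the Sphinx source directory and configure the build (PostgreSQL support, no MySQL, 64-bit document IDs, libstemmer):


cd sphinx-2.0.1-beta
./configure --prefix=/etc/sphinx --without-mysql --with-pgsql --enable-id64 --with-libstemmer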

5-) Build and install Sphinx using 4 parallel jobs to speed up compilation (adjust -j to the number of CPU cores on your machine):


make -j4 install

6-) Create symbolic links to the Sphinx executables so they can be run from anywhere:


cd /usr/local/bin
ln -s /etc/sphinx/bin/indexer
ln -s /etc/sphinx/bin/indextool
ln -s /etc/sphinx/bin/search
ln -s /etc/sphinx/bin/searchd
ln -s /etc/sphinx/bin/spelldump

Now follow the instructions in the next blog post to configure your Sphinx installation.

Status update Week[2-3]

2011-05-24T10:05:00.000-04:00

Status update Week[1-2]

I've learned a lot these past couple of weeks.

Let's start with the questions left open in my previous post:

1-) There is a Python API for talking to the Sphinx "searchd" daemon, but little work has been done on it, and it has no documentation.

2-) The API works with both the beta and the stable versions of Sphinx.

3-) Because of the number of enhancements in the stable beta version, it’s recommended that we build on the stable beta version.

I’ve bought a book called “Introduction to Search with Sphinx”, written by the lead developer of Sphinx Search.

So far, I’ve had a crash course on linguistics: morphology processing (cats = cat, mice = mouse, going = go/goes/went and so on), lemmatisation (converting a word to its lemma, i.e. its base dictionary form) and stemming (deliberately reducing a word to its stem, even if the result is not a correct word).

The choice of morphology processing affects how effective the matching is and which results the Sphinx search server returns.

It is also critical when indexing records in languages other than American English.
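The toy example below contrasts lemmatisation (a dictionary lookup) with stemming (suffix stripping). The lemma table and suffix rules are made up for illustration. They are not Sphinx's built-in stemmers or the Snowball algorithms.

```python
# Lemmatisation: look the word up and return its dictionary form (lemma).
LEMMAS = {'cats': 'cat', 'mice': 'mouse', 'going': 'go', 'goes': 'go', 'went': 'go'}


def lemmatise(word):
    return LEMMAS.get(word, word)


# Stemming: strip a known suffix, even if the result is not a real word.
SUFFIXES = ('ing', 'es', 's')


def stem(word):
    for suffix in SUFFIXES:
        # keep at least two characters of stem
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[:-len(suffix)]
    return word


for w in ['cats', 'mice', 'going', 'went', 'studies']:
    print(w, lemmatise(w), stem(w))
# cats cat cat
# mice mouse mice
# going go go
# went go went
# studies studies studi
```

The lookup handles irregular forms such as "mice" and "went" but knows nothing outside its table. The suffix rules generalise to unseen words but produce non-words such as "studi". This trade-off is why per-language stemmers matter.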

Aside from linguistics, I’ve learned the basic mechanics of “Batch Indexing” and “Real-Time Indexing” and the different components of Sphinx Search, which are:
1-) The data source (a database, files, data sent through xmlpipe, or a combination of them)
2-) The indexer (holds the rules and queries that define what should be indexed)
3-) “Searchd” (the daemon that receives search queries from the application, looks for matches in the index and returns relevant results to the application)
4-) The application (this is where we will use the available Python API to talk to the “Searchd” daemon)
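The four components can be pictured with a toy in-memory inverted index:

* a list of rows plays the data source;
* a batch build plays the indexer;
* single-record updates stand in for real-time indexing;
* a search method answers queries as searchd would for the application.

This is only a conceptual model built on those assumptions. ToyIndex and the sample rows are hypothetical, and Sphinx's actual index format, query syntax and ranking are far more sophisticated.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r'\w+', text.lower())


class ToyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids

    def add(self, doc_id, text):
        """Real-time indexing: add a single document as soon as it changes."""
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    @classmethod
    def build(cls, rows):
        """Batch indexing: (re)build the whole index from a data source."""
        index = cls()
        for doc_id, text in rows:
            index.add(doc_id, text)
        return index

    def search(self, query):
        """What searchd does for the application: ids matching all query words."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


# 1) Data source: rows as (id, text), e.g. the result of an SQL query.
rows = [(1, 'Red cotton shirt'), (2, 'Blue cotton trousers'), (3, 'Red wool hat')]
# 2) Indexer: batch build of the whole index.
index = ToyIndex.build(rows)
# Real-time update when a new record is created.
index.add(4, 'Red cotton socks')
# 3-4) searchd answering a query coming from the application.
print(sorted(index.search('red cotton')))  # [1, 4]
```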

I’ve also learned the basic configuration of the components above, how to aggregate multiple data sources and how to create multiple indexes (which I believe will be extremely helpful for indexing data in different languages in Tryton).

To improve relevance in more languages, Sphinx Search must be compiled from source with support for Snowball, a “string processing language for creating stemming algorithms” (http://snowball.tartarus.org/). This extends stemming beyond the built-in English, Russian and Czech languages and gives better relevance matching for French, Spanish, Portuguese, Italian, German, Dutch, Swedish, Norwegian, Danish and Finnish.
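One way to set up one index per language is to generate sphinx.conf index sections. Each section inherits shared settings from a base index and overrides only the source, path and morphology.

The following are sphinx.conf features of the 2.0 series:

* section inheritance (`index child : parent`);
* `charset_type`;
* the `stem_*` and `libstemmer_*` morphology values (`libstemmer_*` requires a build configured with --with-libstemmer).

The following are hypothetical assumptions for this sketch:

* the `product_<lang>` names;
* the /var/lib/sphinx paths;
* matching `source` sections existing elsewhere in the file.

```python
# Built-in Sphinx stemmers exist for English, Russian and Czech;
# other languages use Snowball via libstemmer.
BUILTIN_STEMMERS = {'en': 'stem_en', 'ru': 'stem_ru', 'cs': 'stem_cz'}


def morphology_for(lang):
    return BUILTIN_STEMMERS.get(lang, 'libstemmer_' + lang)


def index_sections(langs, base='en'):
    """Build sphinx.conf index sections; non-base indexes inherit from the base."""
    sections = []
    for lang in [base] + [code for code in langs if code != base]:
        name = 'product_' + lang  # hypothetical naming scheme
        if lang == base:
            header = 'index %s' % name
            shared = ['    charset_type = utf-8']  # shared setting, inherited by children
        else:
            header = 'index %s : product_%s' % (name, base)
            shared = []
        body = [
            '    source = %s' % name,  # assumes a matching source section exists
            '    path = /var/lib/sphinx/%s' % name,
            '    morphology = %s' % morphology_for(lang),
        ]
        sections.append('\n'.join([header, '{'] + shared + body + ['}']))
    return '\n\n'.join(sections)


print(index_sections(['en', 'fr', 'pt']))
```

Keeping shared settings only in the base section means a change there applies to every language index at once.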

I will post an update on the indexing process later.

Status update Week[0-1]

2011-05-07T15:23:00.000-04:00

To summarize this week's work, which was mostly research:

* Tryton uses the field attribute `select`, with values 1 and 2, to indicate whether a field is searchable.
* `select=1` creates an index in the current database backend and makes the field available in the simple search of the view.
* `select=2` does not create an index, but makes the field appear in the advanced search view.

From Sphinx's point of view, Tryton fields with a `select` attribute must also be Sphinx attributes, which allows filtering and quick searching just as it is done now.
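The `select` semantics listed above, plus the proposed Sphinx mapping, can be summarised as a small lookup. `search_behaviour` and its dictionary keys are hypothetical names used only for illustration.

```python
def search_behaviour(select):
    """Summarise what a Tryton `select` value implies (hypothetical helper)."""
    return {
        # select=1 creates an index in the database backend...
        'db_index': select == 1,
        # ...and exposes the field in simple search; select=2 in advanced search.
        'search_view': {1: 'simple', 2: 'advanced'}.get(select),
        # Proposal: every selectable field also becomes a Sphinx attribute.
        'sphinx_attribute': select in (1, 2),
    }


for value in (0, 1, 2):
    print(value, search_behaviour(value))
```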

Furthermore, the current Python APIs for Sphinx are as poorly documented as Sphinx itself. It is yet to be confirmed whether the Python API supports all the features of the Sphinx server API. A good Pythonic API may need to be written from scratch.

After a discussion with Bertrand (my mentor), the best alternative may be to integrate Sphinx directly into the Tryton application layer. That is a sufficient level of abstraction, since Tryton enforces security at the model and record levels in the application rather than in the database. It also keeps the integration transparent to the database backends Tryton supports now and in the future, which are currently PostgreSQL, MySQL and SQLite.
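This sketch shows why the application layer is enough. Sphinx only returns candidate record ids, and the existing Tryton access checks then filter them before anything reaches the user, without touching the database backend. `sphinx_search` and `user_can_read` are hypothetical stand-ins for the Sphinx client call and Tryton's model/record access checks.

```python
def secure_search(query, user, sphinx_search, user_can_read):
    """Search in Sphinx, then apply application-level access rules."""
    candidate_ids = sphinx_search(query)  # ids, already ranked by Sphinx
    # Keep Sphinx's ranking order, drop records the user may not read.
    return [rid for rid in candidate_ids if user_can_read(user, rid)]


# Usage with in-memory stand-ins for Sphinx and Tryton's access rules:
fake_index = {'invoice': [10, 11, 12]}
readable = {('alice', 10), ('alice', 12)}
result = secure_search(
    'invoice', 'alice',
    sphinx_search=lambda q: fake_index.get(q, []),
    user_can_read=lambda u, rid: (u, rid) in readable,
)
print(result)  # [10, 12]
```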

Questions that need answers this week:

1. Do we really need to rewrite the Python API? Is it dead or alive?
2. Does the API work with both the <1.0 and the 2.x releases of Sphinx?
3. Sphinx seems to have both <1.0 and 2.X releases with a `STABLE BETA` tag. Which one do we consider stable enough to use with Tryton?

I will post another update next week.

Project Milestones

2011-04-22T19:10:00.001-04:00

The goal is to have all major development and integration done by midterm evaluation.

So far, the plan is to support UTF-8 in full-text searching, advanced search syntax (Boolean operators, keyword matching), batch and real-time indexing, relevance ranking and non-text attributes.

This project can also be a powerful tool for other ideas proposed for Tryton. I believe the historical time-line, e-commerce integration and email integration will benefit a lot from full-text search. I plan to keep discussing with the other students and mentors how to make this tool capable of indexing the modules developed during this GSoC.

Start of Program (May 24)
· Analyze the current implementation of searchable fields, the functions associated with search and how data is indexed;
· Evaluate design changes between versions in order to support all 3 versions that Tryton supports (as of May 24th: 1.8, 2.0 and 2.1);
· Discuss with my mentor and the core development team which functionality is expected from this module;
· Put together all ideas and feature requests, and set up milestones for each feature in a publicly accessible project management system;
· Discuss with other students implementing Email Integration and Historical Time-line;
· Specify the architecture design with mentor/core developers;
· Start Coding !
Midterm Evaluation (July 12)
· All major (critical) functions should be completed, with unit tests / doctests and a complete set of English documentation generated with the Sphinx documentation generator;
· Start performance testing and tuning;
· Start the Code Review process so this module can be distributed as Tryton Module;
· Develop the necessary changes for advanced search in the GTK interface;
· If the core developers find it useful, I plan to “bundle” Sphinx Search Server with “NESO”, so new users can have a simple test environment to experience the potential of full-text search;
· Discuss with my mentor and the core team the priority of the additional / optional functions and improvements, to make sure I can complete them by the deadline;
Final Evaluation (Aug 16)
· All functionality in the roadmap is coded, covered by unit tests, reviewed by the core team and fully documented;
· I plan to add a page to the Tryton Code Snippets Wiki describing the basic usage and design patterns of this module, linking to a page containing the full Sphinx-generated documentation;
· Any requests from the mentor and core team will be accommodated;

Project Abstract

2011-04-22T19:09:00.000-04:00

This project proposes implementing full-text search of records and attributes in Tryton using Sphinx Search Server (http://sphinxsearch.com/) in a Pythonic way. The idea is to make Tryton's search capability more flexible, scalable and powerful.

I propose Sphinx Search Server for the following reasons:

· It is very fast (written in C++).
· It runs on multiple operating systems and has Windows binaries, which might be bundled with NESO.
· It is very scalable and has proven success cases.
· It can be used in a Pythonic way through SphinxAPI, which avoids having to install and support Apache Tomcat, as Apache Solr would require.
· It supports 2 of the databases used by Tryton (PostgreSQL and MySQL) and may also support SQLite3 through xmlpipe, if needed.
· It also supports NoSQL databases and future integrations (such as MongoDB and CouchDB).

I commit to the following:

· submitting fully functional code;
· using a wiki or project management system (Google Code or Assembla) to keep track of ideas and milestones;
· using Mercurial as the repository;
· writing Pythonic code that follows the PEP 8 style guide;
· writing unit tests or doctests;
· producing a full set of English documentation generated with the Sphinx documentation generator (http://sphinx.pocoo.org/).