Posts

Showing posts from 2011

Draft Version of Tryton_Sphinx integration

I have made a draft version of the module that connects Tryton to a Sphinx Search server. This module currently gets all the objects from the Tryton.pool.Pool and makes any field that the module author originally selected to be indexed (and searchable) in the database also searchable in Sphinx Search, with the benefit of full-text indexing on char and text fields, while making other fields available as attributes. I chose to implement it this way in order to maintain compatibility with the current modules developed for Tryton 1.8, 2.0 and the future Tryton 2.2, so that no module developer has to make any additional effort to make their module compatible with Tryton's new search methodology. I have pushed my code to a GitHub repository, along with basic instructions on how to download, install, create the Sphinx configuration, index the Tryton Pool objects and run a searchd daemon to listen for incoming search queries. https://github.com/dfamorato/try...
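As a rough illustration of the split described above, the sketch below partitions a model's fields into Sphinx full-text fields and attributes. The `FieldInfo` tuple, the `split_fields` helper and the sample field names are hypothetical and are not taken from the module in the repository; a real implementation would read the field definitions from the Tryton pool instead.

```python
from collections import namedtuple

# Hypothetical, simplified description of a field (not the real Tryton API).
FieldInfo = namedtuple("FieldInfo", ["name", "type", "select"])

# Field types that get full-text indexing; everything else becomes an attribute.
FULL_TEXT_TYPES = {"char", "text"}


def split_fields(fields):
    """Split the selected fields into full-text fields and attributes."""
    full_text, attributes = [], []
    for field in fields:
        if not field.select:
            continue  # the module author did not mark this field as searchable
        if field.type in FULL_TEXT_TYPES:
            full_text.append(field.name)
        else:
            attributes.append(field.name)
    return full_text, attributes


# Made-up sample fields, only for demonstration.
sample = [
    FieldInfo("name", "char", 1),
    FieldInfo("description", "text", 1),
    FieldInfo("list_price", "numeric", 1),
    FieldInfo("active", "boolean", 0),
]
full_text, attributes = split_fields(sample)
print(full_text)
print(attributes)
```

Fields that are not selected are skipped entirely, which is what keeps existing modules working without any change on their side.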

Setting up sphinx.conf file for indexing tryton product data

I've successfully configured Sphinx to index Tryton data from a PostgreSQL database, using inheritance to support different languages. Here is how I achieved this. ** Again, I am assuming that you are using Ubuntu 10.04 and that you followed the instructions for compiling Sphinx from source in my previous blog post. Here is the sample file, which should be located at: /etc/sphinx/etc/sphinx.conf I will post an update later on how to start the indexer and on the changes we need to make on the PostgreSQL side for the translated product indexing to work. I've been very busy these past couple of weeks with my finals and the problem sets for my college courses.
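To give an idea of what inheritance in sphinx.conf can look like, here is a minimal sketch that renders one base source with the connection settings and one child source plus index per language. It is not the sample file from this post: the source and index names, the index path, the SQL queries, the credentials and the `libstemmer_<lang>` codes are all assumptions made for illustration.

```python
# Base source: holds the PostgreSQL connection settings shared by all languages.
BASE_SOURCE = """source product_base
{{
    type     = pgsql
    sql_host = {host}
    sql_user = {user}
    sql_pass = {password}
    sql_db   = {database}
}}
"""

# Child source: "name : parent" inherits every setting of the parent source.
LANG_BLOCK = """source product_{lang} : product_base
{{
    sql_query = {query}
}}

index product_{lang}
{{
    source       = product_{lang}
    path         = /var/lib/sphinx/product_{lang}
    morphology   = libstemmer_{lang}
    charset_type = utf-8
}}
"""


def render_conf(connection, queries):
    """Render the configuration text for one base source and one block per language."""
    parts = [BASE_SOURCE.format(**connection)]
    for lang, query in sorted(queries.items()):
        parts.append(LANG_BLOCK.format(lang=lang, query=query))
    return "\n".join(parts)


# Hypothetical connection settings and queries; table names are made up.
connection = {"host": "localhost", "user": "tryton",
              "password": "secret", "database": "tryton"}
queries = {
    "en": "SELECT id, name FROM product_names_en",
    "pt": "SELECT id, name FROM product_names_pt",
}
conf = render_conf(connection, queries)
print(conf)
```

Keeping the connection settings in a single base source means that only the query and the stemmer change from one language to the next.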

How to install Sphinx Search with libstemmer support on Ubuntu 10.04

I have made a little recipe to install Sphinx Search with libstemmer (for a larger number of supported languages) on Ubuntu 10.04 Lucid, with PostgreSQL support and no MySQL support. Here is what needs to be done as the root user in your Ubuntu terminal:

0-) Before we start, we need to make sure all Sphinx dependencies are met:

    aptitude install -y python-software-properties
    apt-add-repository ppa:pitti/postgresql
    aptitude update
    aptitude install -y postgresql-server-dev-9.0 build-essential

1-) Get the current version of Sphinx and uncompress it:

    cd /opt
    wget http://sphinxsearch.com/files/sphinx-2.0.1-beta.tar.gz
    tar -xvzf sphinx-2.0.1-beta.tar.gz

2-) Get libsnowball / libstemmer to add more languages to the stemming process and uncompress it:

    wget http://snowball.tartarus.org/dist/libstemmer_c.tgz
    tar -xvzf libstemmer_c.tgz

3-) Copy all files from libstemmer to the Sphinx directory so they can be compiled together:

    cp -fa libstemmer_c/* sphinx-2.0.1-beta/libstemmer_c/
    ./configure --prefix=/etc/...

Status update Week[2-3]

Status update Week[1-2] I've learned a lot these past couple of weeks. Let's start with the questions from my previous post that needed an answer:

1-) There is a Python API for interfacing with the Sphinx "searchd" daemon, but not much work has been done on it. No documentation is available for the Python API.
2-) The API does work with the beta version of Sphinx and also with the stable versions.
3-) Because of the number of enhancements made in the stable beta version, it's recommended that we implement against the stable beta version.

I've bought a book called “Introduction to Search with Sphinx”, written by the lead developer of Sphinx Search. So far, I've had a crash course on linguistics, morphology processing (cats = cat, mice = mouse, going = go/goes/went and so on), lemmatisation (converting the word to its lemma/root), stemming (intentionally trying to output the stem, even if it's not necessarily a correct word. This impacts the effectiveness of the algorithm and results returned b...
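To make the difference between stemming and lemmatisation more concrete, here is a toy sketch. It is not the Snowball/libstemmer algorithm and not what Sphinx does internally: `naive_stem` just strips a few suffixes, and `lemmatise` uses a tiny hand-written lookup table (both are hypothetical helpers written for this example).

```python
# Tiny lookup table standing in for a real lemmatiser's dictionary.
LEMMAS = {"cats": "cat", "mice": "mouse", "going": "go", "goes": "go", "went": "go"}

# Suffixes tried in order by the toy stemmer.
SUFFIXES = ("ing", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def lemmatise(word):
    """Look the word up in the table; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for word in ("cats", "mice", "going", "ponies"):
    print(word, naive_stem(word), lemmatise(word))
```

The stemmer turns "ponies" into "poni", which is not a real word, and it cannot handle the irregular plural "mice", while the lookup table maps "mice" to "mouse" but knows nothing about words it has never seen.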

Status update Week[0-1]

To summarize, this week's work was mostly spent on research.

* Tryton uses the attribute `select` with values 1 and 2 to indicate whether a field has to be searchable or not.
* Select = 1 ends up creating an index in the current backend and also makes the field valid for simple search in the view.
* Select = 2 does not create an index, but the field appears in the advanced search view.

From the Sphinx point of view, the Tryton fields with a `select` attribute need to be Sphinx attributes as well, which allows filtering and quick searching just like it is done now. Furthermore, the current Python APIs for Sphinx are as poorly documented as Sphinx itself. It is yet to be confirmed whether the Python API also supports all the features of the Sphinx server API. A good pythonic API may need to be written from scratch. After a discussion with Bertrand (my mentor), maybe the best alternative would be to integrate Sphinx directly under the Tryton application layer, which would be a sufficie...
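The notes on `select` above can be summarised in a small sketch. The `describe_select` helper, its dictionary keys and the sample field names are hypothetical; the function only encodes what is stated in the bullets and does not use the Tryton API.

```python
def describe_select(select):
    """Summarise what a `select` value implies, following the notes above."""
    if select == 1:
        # Indexed in the database backend and offered in the simple search.
        return {"db_index": True, "search_view": "simple", "sphinx_attribute": True}
    if select == 2:
        # No database index, but offered in the advanced search view.
        return {"db_index": False, "search_view": "advanced", "sphinx_attribute": True}
    # No `select` value: the field is not searchable.
    return {"db_index": False, "search_view": None, "sphinx_attribute": False}


# Hypothetical field names mapped to their `select` values.
fields = {"code": 1, "description": 2, "notes": 0}
sphinx_attributes = sorted(
    name for name, select in fields.items()
    if describe_select(select)["sphinx_attribute"]
)
print(sphinx_attributes)
```

With these sample values, both "code" and "description" would be exposed to Sphinx, while "notes" would be left out.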

Project Milestones

The goal is to have all major development and integration done by the midterm evaluation. So far, the idea is to support UTF-8 in full-text searching, advanced syntax on search terms (Boolean operators, keyword matching), batch and real-time indexing, relevance ranking and non-text attributes. This project can also be a very powerful tool for other ideas that are proposed for Tryton. I believe the historical time-line, e-commerce integration and email integration will benefit a lot from this full-text search capability. I plan to discuss constantly with the other students and mentors how to make this tool capable of indexing the modules that will be developed during this GSoC.

Start of Program (May 24)
· Analyze the current implementation of searchable fields, the functions associated with search and how data is indexed;
· Evaluate design changes between versions in order to support all 3 versions that Tryton supports (on May 24th, it will be 1.8, 2.0 and 2.1);
· Discuss w...

Project Abstract

This project proposes implementing full-text search of records and attributes in Tryton using Sphinx Search Server ( http://sphinxsearch.com/ ) in a pythonic implementation. The idea is to improve Tryton's search capability, making it more flexible, scalable and powerful. I propose to use Sphinx Search Server for several reasons: it is very fast (written in C++); it works on multiple operating systems and has Windows binaries (which might be included with NESO); it is very scalable and has proven success cases; we can implement it in a pythonic way using SphinxAPI (which eliminates the need to install and support Apache Tomcat, as we would have to if we used Apache Solr); it supports two of the databases used by Tryton (PostgreSQL and MySQL) and may also support SQLite3 through xmlpipe (if needed); and it also supports NoSQL databases and future integrations (like MongoDB and CouchDB). My commitment is to submit fully functional code, using a wiki or project management system to keep track of ideas and milestones (Google Code or Assembl...