- Install and Configure Sphinx on MAC OSX
- Installing Sphinx
- Debian/Ubuntu: Install Sphinx using packaging system
- Other Linux distributions
- Mac OS X: Install Sphinx using MacPorts
- Windows: Install Python and Sphinx
- Install Python
- Install the pip command
- Installing Sphinx with pip
Install and Configure Sphinx on MAC OSX
People open a browser to search for what they need. If you run a website, it has to give visitors what they want as quickly as possible. But how? As a web developer, there comes a moment when you have to decide which search engine to implement for your website. You have three options:
You may build your own search engine, based on your database.
You may integrate Google search.
Or you may use a standalone search engine like Sphinx or Apache Solr.
Each option has its pros and cons. If you build your own search engine, the implementation is quick and simple: you just need good indexes on your database and some SQL queries. But the search features you get are very limited, and as your database grows to, say, millions of rows, query time may become unacceptable. Google Search is definitely quick, but it is neither customizable nor filterable. It may also match your site for wide-ranging searches, which sometimes makes the results inaccurate.
So, to build a professional, quick, customizable search engine, you may want to use a standalone search engine like Sphinx or Apache Solr. It can take a while to implement, but the benefit is well worth it. That’s why in this post I’d like to share my experience with Sphinx Search. I hope that by the end of it you can quickly integrate this great software into your websites.
Introduction
Sphinx is an open source full-text search engine. It is free, and it is extremely quick. I ran a test on a database of 125,000 records with 5 indexed columns; the index takes about 600 MB of disk space (I’m running Mac OS X on a 2.4 GHz Intel Core i5 with 4 GB RAM). Searching for any word of a 15-word phrase took Sphinx about 0.59 sec. For a 4-word phrase, it took just 0.13 sec.
The result is very detailed. For each search, besides the list of matching items, you also get the weight of each item found, the query time, and more. The SphinxQL session in the Components section shows such a result.
Some of Sphinx’s great features
Indexing text columns (of course).
It is fully customizable and can be distributed, so you can tune your search engine in many ways to suit your hardware.
While MySQL cannot create an index on a view, you can create a view joining multiple tables and have Sphinx index that view.
With a real-time index, you can keep your index data in RAM, which can make searches faster.
You can attach attributes to your index to filter search results further.
It supports morphology and Unicode search.
Installation
Sphinx is written in C++, builds with the GNU compilers, supports 64-bit on capable platforms, and runs on Linux, UNIX®, Microsoft® Windows®, and Mac OS X. Installing Sphinx is simple:
If you’re using MAMP: MAMP is missing some of the MySQL library and include files that Sphinx requires, so you have to add them by copying the required files into the MAMP/Library/ directory.
Go to your MAMP/Library directory and create a mysql directory:
Go to http://downloads.mysql.com/archives.php and download the appropriate version. You can check your MAMP’s MySQL version in phpMyAdmin.
Untar it and copy the include/ and lib/ directories into the MAMP/Library/mysql directory you just created.
In a terminal, assuming your MAMP is located in /Applications/, your configure command should look like the one below:
Components
Sphinx has three components: an indexer, a search engine daemon, and a command-line search utility. All of them are located in /usr/local/sphinx/bin/.
The indexer is an index generator. It queries your database, indexes the text columns, and ties each index entry to the row’s primary key. Data is loaded into indexes through data sources, which are defined in sphinx.conf. Sources can fetch data directly from MySQL, PostgreSQL, ODBC-compliant databases (MS SQL, Oracle, etc.), or a pipe in a custom XML format.
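For the last source type, the pipe in a custom XML format, the program at the other end of the pipe only has to print documents in Sphinx's xmlpipe2 format. The sketch below is a minimal Python example of such a generator. The field names (title, content), the category_id attribute, and the sample rows are assumptions made for illustration, not part of my setup.

```python
from xml.sax.saxutils import escape

# Hypothetical sample rows: (document id, title, content, category id).
ARTICLES = [
    (1, "Hello Sphinx", "Full-text search is fast & simple.", 10),
    (2, "Second post", "Indexing <articles> from a pipe.", 20),
]


def render_xmlpipe2(rows):
    """Return an xmlpipe2 document stream for the given rows."""
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="title"/>',
        '<sphinx:field name="content"/>',
        '<sphinx:attr name="category_id" type="int"/>',
        "</sphinx:schema>",
    ]
    for doc_id, title, content, category_id in rows:
        # Each document carries the unique positive integer id that Sphinx
        # ties back to the row's primary key.
        lines.append('<sphinx:document id="%d">' % doc_id)
        lines.append("<title>%s</title>" % escape(title))
        lines.append("<content>%s</content>" % escape(content))
        lines.append("<category_id>%d</category_id>" % category_id)
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_xmlpipe2(ARTICLES))
```

A source of type xmlpipe2 would then point its xmlpipe_command option at a script like this one.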
The search engine daemon is called searchd. It lets applications access Sphinx in three ways: via the native API (SphinxAPI), via Sphinx’s implementation of the MySQL network protocol (SphinxQL), or via a MySQL server with a pluggable storage engine (SphinxSE). I haven’t tried SphinxSE yet, but using the native API or SphinxQL is very easy, especially SphinxQL, which gives you a MySQL-like environment to work with.
The search utility allows you to search from the command line without writing code. It can be useful for debugging your Sphinx configuration. Unfortunately, there’s an issue with this utility in version 2.0.8 (the one I am using), so I’ve never had the chance to use it. However, you can use SphinxQL instead. As mentioned above, since SphinxQL supports the MySQL binary network protocol and can be accessed through the regular MySQL API, it provides the same environment as the MySQL command-line client.
$ mysql -P9306 --protocol=tcp --prompt='sphinxQL> '
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.0.8 (r3831)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SELECT * FROM test1 WHERE MATCH('test')
    -> ORDER BY group_id ASC OPTION ranker=bm25;
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    4 |   1442 |        2 | 1231721236 |
|    2 |   2421 |      123 | 1231721236 |
|    1 |   2421 |      456 | 1231721236 |
+------+--------+----------+------------+
3 rows in set (0.00 sec)
Sphinx Configuration
To give you a clear picture of how to implement Sphinx, I will describe step by step the implementation of my case study: using Sphinx to search the content of articles. To keep it simple, there are just three tables:
With the tables defined above, you can imagine that searching for a phrase within the articles, as in the query below, is very costly:
The JOIN and the LIKE are very expensive, especially when Article is large, say, millions of rows, not to mention that each article can have long content. Even with the MyISAM engine, which supports full-text indexing, the query time is still long. I ran a test myself with 125,000 articles and a full-text index on three columns (title, headline and content), and it took about 35 secs to search for a 4-word phrase (on the same computer mentioned in the Introduction section). Besides, what if I want to find the articles that contain any of the words I search for?
To apply Sphinx, you must define at least one source and one index in sphinx.conf :
A source identifies a database to index, provides the connection and authentication information, and defines the query that fetches data from the database. This is the text data you will search. The query’s SELECT statement must include a unique, unsigned, non-zero integer, usually the document’s primary key. Optionally, a source can also define one or more columns as attributes to filter or sort the results on.
An index is a data structure, optimized for queries, built from the text data that you defined in your source.
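To see why a dedicated index can answer "any of these words" queries without scanning every row the way LIKE does, here is a toy inverted index in Python. It only illustrates the general idea and is not Sphinx's actual index format or ranking. The sample documents, the function names, and the weight (a plain count of matched query words) are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical documents keyed by their primary key.
DOCS = {
    1: "Sphinx is a full-text search engine",
    2: "MySQL full-text index on MyISAM tables",
    3: "Cooking pasta at home",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_any(index, phrase):
    """Return (doc_id, weight) pairs for documents matching any query word.

    The weight is simply how many distinct query words the document
    contains; results are sorted by weight (descending), then by id.
    """
    weights = defaultdict(int)
    for word in set(tokenize(phrase)):
        for doc_id in index.get(word, ()):
            weights[doc_id] += 1
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(search_any(idx, "full-text search index"))
```

Each lookup touches only the entries for the words in the query, which is why the cost does not grow with the length of every article the way a LIKE scan does.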
Before defining a source, we must write the SELECT statement from which Sphinx will fetch data. In this case, we want to search the articles within specific categories, so we have to join the 3 tables. We can create a view like the one below:
And below is an example of sphinx.conf :
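As a rough sketch of what such a file contains, the Python snippet below renders a minimal sphinx.conf with one source, one index, and the searchd settings. Everything specific in it is an assumption for illustration: the view name article_view, the category_id column, the database credentials, and the file paths under /usr/local/sphinx/var/ are placeholders, so adapt them to your own setup.

```python
# Placeholder values; every name, credential and path here is hypothetical.
SETTINGS = {
    "db_host": "localhost",
    "db_user": "root",
    "db_pass": "secret",
    "db_name": "blog",
    "view": "article_view",
    "index_path": "/usr/local/sphinx/var/data/articles",
}

TEMPLATE = """\
source articles_src
{{
    type     = mysql
    sql_host = {db_host}
    sql_user = {db_user}
    sql_pass = {db_pass}
    sql_db   = {db_name}

    # The first column must be the unique, non-zero document id.
    sql_query     = SELECT id, category_id, title, headline, content FROM {view}
    # Attribute column, usable for filtering and sorting.
    sql_attr_uint = category_id
}}

index articles
{{
    source = articles_src
    path   = {index_path}
}}

searchd
{{
    listen   = 9312
    listen   = 9306:mysql41
    log      = /usr/local/sphinx/var/log/searchd.log
    pid_file = /usr/local/sphinx/var/log/searchd.pid
}}
"""


def render_conf(settings):
    """Fill the template with the given settings and return the config text."""
    return TEMPLATE.format(**settings)


if __name__ == "__main__":
    print(render_conf(SETTINGS))
```

The second listen line is what makes searchd answer SphinxQL clients on port 9306, the port used in the mysql session shown earlier.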
Indexing and Querying
So far we have been through the hard part: configuration. Of course there’s a lot more we can do with sphinx.conf, but for now it’s more than enough to get us to the fun part: indexing, querying, and seeing the results!
We run the command below to start indexing:
Note: The --all argument builds every index defined in sphinx.conf when there is more than one. If you want to build only one index, just specify its name instead:
We can now query the index to get what we want by using the search utility. Note that you don’t need searchd running to use search.
Oops, if you’re like me, you will get the error above in the 2.0.8 release. This is a common issue, but there’s still no fix for it so far. Don’t worry, though: this error only occurs in the search utility, which is used just for testing.
To actually use Sphinx in your code, you have to use the Sphinx native API or SphinxQL. And of course, the searchd daemon must be running; start it with the command:
The Sphinx APIs are located in the /usr/local/sphinx/api/ directory. Below is an example using PHP code and the Sphinx API:
Assuming the file that contains the code is named test.php, run the command below to see the result:
As you can see, the test searches for each of the given words and returns a result of 98032 articles in 0.061 sec. The articles are also sorted by their weight.
So, I’ve taken you through the basic implementation of Sphinx, and I hope that helped. Of course, Sphinx has many more features to take advantage of, and there’s still much more to do to apply it to a real-world project. For example:
How are indexes rebuilt when your database is updated? You will have to know how an index is built and stored on disk.
How can you customize the configuration to make queries even faster? You will have to learn about memory usage and many other tips for optimizing Sphinx.
How does it support Unicode searching?
What about SphinxQL and SphinxSE?
Don’t worry: from this starting point, you will answer all of those questions yourself faster than you think. The Sphinx documentation is a good place to continue.
Installing Sphinx
Since Sphinx is written in the Python language, you need to install Python (the required version is at least 2.6) and Sphinx.
Sphinx packages are available on the Python Package Index.
You can also download a snapshot from the Git repository:
There are instructions for several environments:
Debian/Ubuntu: Install Sphinx using packaging system
If you use Debian/Ubuntu, you can install Sphinx with this command:
$ apt-get install python-sphinx
Other Linux distributions
Most Linux distributions have Sphinx in their package repositories. Usually the package is called “python-sphinx”, “python-Sphinx” or “sphinx”. Be aware that there are two other packages with “sphinx” in their name: a speech recognition toolkit (CMU Sphinx) and a full-text search database (Sphinx search).
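Mac OS X: Install Sphinx using MacPorts
If you use MacPorts on Mac OS X, use this command to install all necessary software:
$ sudo port install py27-sphinx
To set up the executable paths, use the port select command:
$ sudo port select --set python python27
$ sudo port select --set sphinx py27-sphinx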
Type which sphinx-quickstart to check if the installation was successful.
Windows: Install Python and Sphinx
Install Python
Most Windows users do not have Python, so we begin with the installation of Python itself. If you have already installed Python, please skip this section.
Go to https://www.python.org/, the main download site for Python. Look at the left sidebar and under “Quick Links”, click “Windows Installer” to download.
Currently, Python offers two major versions, 2.x and 3.x. Sphinx 1.3 can run under Python 2.6, 2.7, 3.3, 3.4, with the recommended version being 2.7. This chapter assumes you have installed Python 2.7.
Follow the Windows installer for Python.
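Once Python is installed, the version requirement above can be checked from Python itself. The snippet below is a small sketch; the supported versions are copied from the list above, and the function name is only an illustrative choice.

```python
import sys

# Python versions that Sphinx 1.3 runs on, per the list above.
SUPPORTED = {(2, 6), (2, 7), (3, 3), (3, 4)}


def is_supported(version_info=None):
    """Return True if the (major, minor) version is in SUPPORTED."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    print("Python %d.%d supported by Sphinx 1.3: %s"
          % (sys.version_info[0], sys.version_info[1], is_supported()))
```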
After installation, you should add the Python executable directories to the PATH environment variable so that you can easily run Python and package commands such as sphinx-build from the Command Prompt.
Right-click the “My Computer” icon and choose “Properties”
Click the “Environment Variables” button under the “Advanced” tab
If “Path” (or “PATH”) is already an entry in the “System variables” list, edit it. If it is not present, add a new variable called “PATH”.
Add these paths, separating entries by ";":
- C:\Python27 – this folder contains the main Python executable
- C:\Python27\Scripts – this folder will contain executables added by Python packages installed with pip (see below)
This is for Python 2.7. If you use another version of Python or installed to a non-default location, change the digits “27” accordingly.
Now run the Command Prompt. After the command prompt window appears, type python and press Enter. If the Python installation was successful, the installed Python version is printed, and you are greeted by the prompt >>> . Type Ctrl+Z and Enter to quit.
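If the python command is found but package commands are not, it can help to check which of the two directories is still missing from PATH. The following is a minimal sketch; the function name is hypothetical, and the comparison assumes Windows conventions (case-insensitive names, optional trailing backslash).

```python
import os


def missing_from_path(required, path_value=None, sep=os.pathsep):
    """Return the directories in `required` that are not listed in PATH.

    The comparison ignores case and a trailing backslash, which is how
    Windows treats directory names.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        return entry.strip().rstrip("\\").lower()

    present = {normalize(e) for e in path_value.split(sep) if e.strip()}
    return [d for d in required if normalize(d) not in present]


if __name__ == "__main__":
    wanted = [r"C:\Python27", r"C:\Python27\Scripts"]
    print(missing_from_path(wanted))
```

An empty list means both directories are already on PATH.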
Install the pip command
Python has a very useful pip command, which can download and install third-party libraries with a single command. It is provided by the Python Packaging Authority (PyPA): https://groups.google.com/forum/#!forum/pypa-dev
To install pip, download https://bootstrap.pypa.io/get-pip.py and save it somewhere. After the download, open the command prompt, go to the directory containing get-pip.py, and run this command:
C:\> python get-pip.py
Now the pip command is installed. From there we can go on to the Sphinx install.
pip has been included in the official Python installers since Python 3.4.0 and Python 2.7.9.
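Because newer installers already ship pip, it is worth checking whether the get-pip.py step is needed at all. The snippet below is a small sketch that simply tries to import the module; the helper name is an assumption made for this example.

```python
def has_module(name):
    """Return True if the named module can be imported."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if has_module("pip"):
        print("pip is already available; get-pip.py is not needed")
    else:
        print("pip is missing; install it with get-pip.py")
```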
Installing Sphinx with pip
If you finished the installation of pip, type this line in the command prompt:
C:\> pip install sphinx
After installation, type sphinx-build -h on the command prompt. If everything worked fine, you will get a Sphinx version number and a list of options for this command.
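The same check can be scripted. The sketch below looks up the Sphinx commands on PATH with shutil.which, which assumes Python 3.3 or newer (the function does not exist in Python 2.7); the helper name is illustrative.

```python
import shutil


def check_commands(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    found = check_commands(["sphinx-build", "sphinx-quickstart"])
    for name, path in found.items():
        print("%-18s %s" % (name, path or "not found"))
```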
That's it. Installation is over. Head to First Steps with Sphinx to make a Sphinx project.