开发者

Does Sphinx auto update is index when you add data to your SQL?

开发者 https://www.devze.com 2023-04-08 16:34 出处:网络
I am curious as to whether or not Sphinx will auto update its index when you add new SQL data or whether you have to tell it specifically to reindex 开发者_StackOverflow社区your db.

I am curious as to whether or not Sphinx will auto update its index when you add new SQL data or whether you have to tell it specifically to reindex 开发者_StackOverflow社区your db.

If it doesn't, does anyone have an example of how to automate this process when the database data changes?


The answer is no and you need to tell sphinx to reindex your db.

There are some steps and requirements which you need to know:

  1. Main and delta are requirement
  2. First run you need to index your main index.
  3. After the first run, you may index delta by rotating it ( to make sure the service is running and the data on the web is can be used at the time )
  4. Before you go further step, you need to create a table to mark your "last indexed rows". THe last indexed rows ID could be used for the next indexing delta and merging delta into main.
  5. You need to merge your delta index to the main index. as inside the sphinx documents http://sphinxsearch.com/docs/current.html#index-merging
  6. Restart sphinx service.

    TIPS: Create your own program that could execute the index by using C# or other languages. You may try the task schedule of windows also can.

Here is my conf:

source Main
{
type            = mysql

sql_host        = localhost
sql_user        = root
sql_pass        = password
sql_db          = table1
sql_port        = 3306  # optional, default is 3306
sql_query_pre = REPLACE INTO table1.sph_counter SELECT 1, MAX(PageID) FROM table1.pages;
sql_query       = \
    SELECT  pd.`PageID`, pd.Status from table1.pages pd
    WHERE pd.PageID>=$start AND pd.PageID<=$end \
    GROUP BY pd.`PageID`

sql_attr_uint       = Status

sql_query_info      = SELECT * FROM table1.`pages` pd WHERE pd.`PageID`=$id
sql_query_range     = SELECT MIN(PageID),MAX(PageID)\
              FROM tabl1.`pages`
sql_range_step      = 1000000
}


source Delta : Main
{
sql_query_pre = SET NAMES utf8

sql_query = \
    SELECT  PageID, Status from pages \
    WHERE PageID>=$start AND PageID<=$end 

sql_attr_uint       = Status

sql_query_info      = SELECT * FROM table1.`pages` pd WHERE pd.`PageID`=$id
sql_query_range     = SELECT (SELECT MaxDoc FROM table1.sph_counter WHERE ID = 1) MinDoc,MAX(PageID) FROM table1.`pages`;
sql_range_step      = 1000000
}


index Main
{
source          = Main
path            = C:/sphinx/data/Main
docinfo         = extern
charset_type        = utf-8
}


index Delta : Main
{
    source = Delta
path = C:/sphinx/data/Delta
charset_type = utf-8
}


As found in the sphinx documentation part about real-time indexes

Real-time indexes (or RT indexes for brevity) are a new backend that lets you insert, update, or delete documents (rows) on the fly.

So to update an index on the fly you would just need to make a query like

{INSERT | REPLACE} INTO index [(column, ...)]
VALUES (value, ...)
[, (...)]


To expand on Anne's answer - if you're using SQL indices, it won't update automatically. You can manage the process of reindexing after every change - but that can be expensive. One way to get around this is have a core index with everything, and then a delta index with the same structure that indexes just the changes (this could be done by a boolean or timestamp column).

That way, you can just reindex the delta index (which is smaller, and thus faster) on a super-regular basis, and then process both core and delta together less regularly (but still, best to do it at least daily).

But otherwise, the new RT indices are worth looking at - you still need to update things yourself, and it's not tied to the database, so it's a different mindset. Also: RT indices don't have all the features that SQL indices do, so you'll need to decide what's more important.

0

精彩评论

暂无评论...
验证码 换一张
取 消