Lucene index backup

Developer https://www.devze.com 2023-03-03 06:33 Source: the Internet



What is the best practice to back up a Lucene index without taking the index offline (a hot backup)?

You don't have to stop your IndexWriter in order to take a backup of the index.

Just use the SnapshotDeletionPolicy, which lets you "protect" a given commit point (and all files it includes) from being deleted. Then, copy the files in that commit point to your backup, and finally release the commit.

It's fine if the backup takes a while to run: as long as you don't release the commit point with SnapshotDeletionPolicy, the IndexWriter will not delete its files (even if, e.g., they have since been merged together).

This gives you a consistent backup which is a point-in-time image of the index without blocking ongoing indexing.
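The snapshot/copy/release procedure above can be sketched in Java as follows. This is a minimal sketch, not the code from the book: it assumes the Lucene 9.x API (the signatures of `snapshot()` and `release()` have changed between major versions, so older code will look different) and an `FSDirectory`, so that commit files can be copied with plain `java.nio.file.Files`. `HotBackup`, `openWriter`, `backup`, `doc` and `demo` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper class; assumes the Lucene 9.x API.
public class HotBackup {

    /** Opens a writer that consults the given snapshot-capable deletion policy. */
    static IndexWriter openWriter(FSDirectory dir, SnapshotDeletionPolicy sdp) throws IOException {
        IndexWriterConfig cfg = new IndexWriterConfig(); // StandardAnalyzer by default
        cfg.setIndexDeletionPolicy(sdp);                 // the writer uses this exact instance
        return new IndexWriter(dir, cfg);
    }

    /** Copies every file of the latest commit into backupDir while the writer stays open. */
    static int backup(SnapshotDeletionPolicy sdp, IndexWriter writer,
                      Path indexDir, Path backupDir) throws IOException {
        IndexCommit commit = sdp.snapshot(); // pins the commit; fails if nothing was committed yet
        try {
            Files.createDirectories(backupDir);
            int copied = 0;
            for (String name : commit.getFileNames()) { // includes the segments_N file
                Files.copy(indexDir.resolve(name), backupDir.resolve(name),
                           StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
            return copied;
        } finally {
            sdp.release(commit);        // un-pin the commit point
            writer.deleteUnusedFiles(); // let the writer remove files nothing references anymore
        }
    }

    static Document doc(String id) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        return d;
    }

    /** Usage sketch: commit, back up, keep indexing. Returns the doc count seen in the backup. */
    static int demo() throws IOException {
        Path indexDir = Files.createTempDirectory("index");
        Path backupDir = Files.createTempDirectory("backup");
        SnapshotDeletionPolicy sdp =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        try (FSDirectory dir = FSDirectory.open(indexDir);
             IndexWriter writer = openWriter(dir, sdp)) {
            writer.addDocument(doc("1"));
            writer.commit();                  // snapshot() needs at least one commit
            backup(sdp, writer, indexDir, backupDir);
            writer.addDocument(doc("2"));     // the writer was never closed or paused
            writer.commit();
        }
        try (FSDirectory copy = FSDirectory.open(backupDir);
             DirectoryReader reader = DirectoryReader.open(copy)) {
            return reader.numDocs();          // point-in-time image of the first commit
        }
    }
}
```

`demo()` indexes a second document after the backup to show that the writer keeps running; the backup should still contain only the first commit. Only the files listed by the commit are copied, so `write.lock` and any uncommitted segment files are skipped.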

I wrote about this in Lucene in Action (2nd edition), and there's a paper excerpted from the book, "Hot Backups with Lucene", available for free from http://www.manning.com/hatcher3, that describes this in more detail.

This answer depends upon (a) how big your index is and (b) what OS you are using. It is suitable for large indexes hosted on Unix operating systems, and is based upon the Solr 1.3 replication strategy.

Once a file has been created, Lucene will not change it; it will only delete it. Therefore, you can use a hard-link strategy to make a backup. The approach would be:

1. Stop indexing (and commit any pending changes), so that you can be sure you won't snapshot mid-write.
2. Create a hard-link copy of your index files (using cp -lr).
3. Restart indexing.

cp -lr only recreates the directory structure and hard-links the files instead of copying their contents, so even a 100 GB index should be "copied" in less than a second.
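The hard-link step can be sketched in Java with `Files.createLink`, which adds a second directory entry for an existing file instead of copying its bytes. This is a minimal sketch assuming a filesystem that supports hard links (source and destination on the same volume) and that indexing has already been stopped as in step 1. `HardLinkSnapshot` and `linkTree` are hypothetical names, and the method only approximates `cp -lr` (for example, it does not reproduce cp's behaviour of nesting the source inside an already existing destination directory).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper approximating `cp -lr src dst` with java.nio.
public class HardLinkSnapshot {

    /**
     * Recreates the directory tree of src under dst and hard-links every
     * regular file instead of copying it. Returns the number of files linked.
     */
    static int linkTree(Path src, Path dst) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(src)) {
            paths = walk.collect(Collectors.toList()); // pre-order: parents before children
        }
        int linked = 0;
        for (Path p : paths) {
            Path target = dst.resolve(src.relativize(p)); // src itself maps to dst
            if (Files.isDirectory(p)) {
                Files.createDirectories(target);          // only the structure is recreated
            } else {
                Files.createLink(target, p);              // same data, new name: no bytes copied
                linked++;
            }
        }
        return linked;
    }
}
```

Because a hard link is just another name for the same data, Lucene later deleting an original file (for example after a merge) leaves the linked copy readable; this is why Lucene's write-once files make the strategy safe.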

In my opinion, it would typically be enough to stop any ongoing indexing operation and simply copy your index files. Also look at the snapshooter script from Solr, which can be found in apache-solr-1.4.1/src/scripts and essentially does:

cp -lr indexLocation backupLocation

Another option might be to look at the Directory.copy(..) routine for a programmatic approach (e.g., using the same Directory that was given as a constructor parameter to the IndexWriter). You might also be interested in Snapshooter.java, which does the equivalent of the script.
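For the programmatic route, here is a minimal sketch assuming the Lucene 9.x API. The static `Directory.copy(..)` mentioned above belongs to older Lucene releases; in recent versions the closest equivalent I'm aware of is the per-file `Directory.copyFrom(...)`. The sketch copies only the files of the newest commit (found with `DirectoryReader.listCommits`), assumes indexing has been stopped as suggested above, and assumes the backup directory starts empty. `DirectoryCopyBackup`, `copyLatestCommit` and `demo` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;

// Hypothetical helper; assumes Lucene 9.x, where Directory.copyFrom copies one file at a time.
public class DirectoryCopyBackup {

    /** Copies the newest commit of src into an empty dest; returns the number of files copied. */
    static int copyLatestCommit(Directory src, Directory dest) throws IOException {
        List<IndexCommit> commits = DirectoryReader.listCommits(src); // sorted oldest first
        IndexCommit latest = commits.get(commits.size() - 1);
        Collection<String> files = latest.getFileNames();
        for (String name : files) {
            dest.copyFrom(src, name, name, IOContext.DEFAULT); // fails if name already exists in dest
        }
        dest.sync(files); // make the copies durable before reporting success
        return files.size();
    }

    /** Usage sketch: index, close the writer (indexing stopped), then copy. */
    static int demo() throws IOException {
        Path indexPath = Files.createTempDirectory("index");
        Path backupPath = Files.createTempDirectory("backup");
        try (Directory src = FSDirectory.open(indexPath)) {
            try (IndexWriter writer = new IndexWriter(src, new IndexWriterConfig())) {
                Document d = new Document();
                d.add(new StringField("id", "1", Field.Store.YES));
                writer.addDocument(d);
            } // close() commits and releases the write lock
            try (Directory dest = FSDirectory.open(backupPath)) {
                copyLatestCommit(src, dest);
                try (DirectoryReader reader = DirectoryReader.open(dest)) {
                    return reader.numDocs();
                }
            }
        }
    }
}
```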

Create a new index with a separate IndexWriter and use addIndexesNoOptimize() to merge the running index into the new one. This is very slow, but it allows you to keep the original index operational while doing the backup.

However, you cannot write to the index while merging: even though the index stays online and can be queried, you cannot write to it during the backup.
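A minimal sketch of the merge-into-a-new-index approach, assuming Lucene 9.x, where `addIndexesNoOptimize()` has been replaced by `IndexWriter.addIndexes(Directory...)`. That method acquires the write lock of each source directory, which matches the caveat above: the live index's own IndexWriter must be closed for the duration, while readers opened on it keep serving queries. `AddIndexesBackup`, `backup` and `demo` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper; assumes Lucene 9.x, where addIndexes(Directory...) succeeded addIndexesNoOptimize().
public class AddIndexesBackup {

    /** Merges the live index (no writer open on it) into a brand-new index at backupPath. */
    static void backup(Directory live, Path backupPath) throws IOException {
        IndexWriterConfig cfg = new IndexWriterConfig()
            .setOpenMode(IndexWriterConfig.OpenMode.CREATE); // always start from an empty backup
        try (Directory dest = FSDirectory.open(backupPath);
             IndexWriter backupWriter = new IndexWriter(dest, cfg)) {
            // Takes the write lock of `live`: no IndexWriter may be open on it meanwhile,
            // but readers already opened on `live` are unaffected.
            backupWriter.addIndexes(live);
            backupWriter.commit();
        }
    }

    /** Usage sketch: a reader stays open on the live index during the backup. */
    static int demo() throws IOException {
        Path livePath = Files.createTempDirectory("live");
        Path backupPath = Files.createTempDirectory("backup");
        try (Directory live = FSDirectory.open(livePath)) {
            try (IndexWriter writer = new IndexWriter(live, new IndexWriterConfig())) {
                Document d = new Document();
                d.add(new StringField("id", "1", Field.Store.YES));
                writer.addDocument(d);
            } // writer closed (commits on close): writes are paused from here on
            try (DirectoryReader searcherSide = DirectoryReader.open(live)) {
                backup(live, backupPath);
                if (searcherSide.numDocs() != 1) throw new IllegalStateException("live index changed");
            }
        }
        try (Directory dest = FSDirectory.open(backupPath);
             DirectoryReader reader = DirectoryReader.open(dest)) {
            return reader.numDocs();
        }
    }
}
```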