I'm unsure of the best solution for this but this is what I've done.
I'm using PHP to look into a directory that contains zip files.
These zip files contain text files that are to be loaded into an Oracle database through SQL*Loader (sqlldr).
I want to be able to start more than one PHP process via the command line to load these zip files into the db.
If other 'php loader' processes are running, they shouldn't overlap and try to load the same zip file. I know I could start one process and let it process each zip file but I'd rather start up a new process for incoming zip files so I can load concurrently.
Right now, I've created a class that will 'lock' a zip file, a directory, or a generic text file by creating a file called 'filename.ext.lock'. Other processes that start up will check to see if a file has been 'locked' in this way; if it has, they will skip that file and move on to another file for processing.
I've made a class that uses a directory and creates 'process id' files so that each PHP process has an id it can use for logging purposes and for identifying which PHP process has locked the file.
I'm on a Windows machine and it isn't in the plan to make this an Ubuntu machine, for those of you that might suggest pcntl.
What other solutions do you see? I know that this isn't truly synchronized: one process might check for the lock file, find none, and be about to create it when a context switch occurs, and another PHP process could then 'lock' the same file before the first one has created its lock file.
Can you please provide me with some ideas about how I can make this solution better? A Java implementation? Erlang?
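The race described here comes from checking for the lock file and creating it in two separate steps. As a minimal sketch of one way to close that gap, PHP's `fopen()` with mode `'x'` creates the file only if it does not already exist, so the check and the creation happen in a single call. The helper names `tryLock` and `unlock` are hypothetical and not part of the class described above, and writing the process id into the lock file is an assumption modelled on the 'process id' files.

```php
<?php
// Hypothetical helper: try to claim $path by creating "$path.lock".
// Returns true only for the one process that actually created the lock file.
function tryLock(string $path): bool
{
    // Mode 'x' fails if the file already exists, so there is no window
    // between "does the lock exist?" and "create the lock".
    $handle = @fopen($path . '.lock', 'x');
    if ($handle === false) {
        return false; // another process holds the lock; skip this file
    }
    // Record the owner for logging purposes (assumption: the pid is enough).
    fwrite($handle, (string) getmypid());
    fclose($handle);
    return true;
}

// Hypothetical helper: release the lock once the file has been loaded.
function unlock(string $path): void
{
    @unlink($path . '.lock');
}
```

A loader would call `tryLock()` on each zip file it finds and move on to the next file whenever it returns `false`. A process that crashes will leave its lock file behind, so stale locks still need some form of cleanup.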
Also forgot to mention, the PHP process connects to the DB to fetch metadata about the files that it is going to load via SqlLoader. I don't think that is important but just in case.
Quick note: I'm aware that sqlldr locks the table it is loading and that if multiple processes try to load into the same table it will become a bottleneck. To alleviate this problem I plan on making a directory that will contain files named after the tables that are currently being loaded. After a table has finished loading, the respective file will be deleted, and other processes will check this directory to see whether it is safe to load that table.
Extra information: I'm using 7zip to unzip the files and PHP's exec to perform these commands.
I'm using exec to call sqlldr as well.
The zip files can be huge (1 GB) and loading one table can take up to an hour.
Rather than creating a .lock file, you can just rename the zip file when a loader starts to process it, e.g. to "foobar.zip.bar". Renaming should be faster than creating a new file on disk.
But it doesn't guarantee that your next loader will start only after the rename has happened. You should at least have another script that controls when new loaders are started.
Also, just a side suggestion: it's possible to emulate threading in PHP using cURL, you might want to try it out.
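The rename approach can be sketched as follows. This is only an illustration under some assumptions: the function name `claimByRename` is hypothetical, and using the process id as the suffix (instead of the ".bar" in the example above) is my own choice so that each loader renames to a different target. If two loaders try to rename the same zip file, the source no longer exists for the slower one, so its `rename()` fails.

```php
<?php
// Hypothetical helper: claim a zip file by renaming it to a name that is
// unique to this process. Returns the new path, or null if the file was
// already claimed (renamed away) by another loader.
function claimByRename(string $zipPath): ?string
{
    // Assumption: the process id is unique among the running loaders and no
    // stale file with this name is left over from an earlier run.
    $claimed = $zipPath . '.' . getmypid();

    // rename() returns false when the source file no longer exists,
    // which is what happens if another loader renamed it first.
    if (!@rename($zipPath, $claimed)) {
        return null;
    }
    return $claimed;
}
```

A loader would call this for each zip file it finds, skip the file when the result is `null`, and otherwise unzip and load the returned path.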
https://web.archive.org/web/20091014034235/http://www.ibuildings.co.uk/blog/archives/811-Multithreading-in-PHP-with-CURL.html
I don't know if I understood correctly, but I have a suggestion: give the lock files a priority prefix.
Example: 10-script.php is started
20-script.php is started (and enters a loop waiting for 10-foobar.ext.lock)
as long as 10-foobar.ext.lock has not been generated by 10-script.php, it keeps waiting
30-script.php will have to wait for 10-foobar.ext.lock and 20-example.ext.lock
I tried to find a pcntl_fork that works with Cygwin, but found nothing that works.