
Concurrent file processing

Developer https://www.devze.com 2023-02-15 23:21 Source: Internet
I have a directory where a lot of files are saved dynamically. Currently there is a task which lists the files from time to time and processes them sequentially (writing to a database). Due to the increasing number of files, it is necessary to process these files in parallel. Can you give me some ideas and a code example in Java, please?


Use an ExecutorService. Create one with Executors.newFixedThreadPool(n), wrap the processing of a single file in a Runnable (or Callable) task, and pass each task the File it should work on.

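import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService service = Executors.newFixedThreadPool(10);

File[] files = directory.listFiles(); // null if the directory cannot be read
if (files != null) {
    for (final File file : files) {
        service.submit(new Runnable() {
            public void run() {
                // do work here on the file object
            }
        });
    }
}
service.shutdown(); // accept no new tasks; tasks already submitted still run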
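If each file produces a result, or the caller needs to notice failures, a Callable variant may fit better. The following is only a sketch: the class name ParallelFileProcessor and the processFile method are hypothetical, and processFile merely returns the file size as a stand-in for the real database write.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFileProcessor {

    // Hypothetical stand-in for the real work: returns the file size
    // instead of parsing the file and writing to a database.
    static long processFile(File file) throws IOException {
        return Files.size(file.toPath());
    }

    // Processes every regular file in the directory on a fixed-size pool
    // and returns the sum of the per-file results.
    static long processDirectory(File directory, int threads)
            throws InterruptedException, ExecutionException {
        File[] files = directory.listFiles(File::isFile);
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + directory);
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (File file : files) {
                tasks.add(() -> processFile(file));
            }
            long total = 0;
            // invokeAll blocks until every task has completed
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get(); // a failed task surfaces here as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("Bytes processed: " + processDirectory(dir, 4));
    }
}
```

Because Future.get() rethrows whatever the task threw, a file that fails to process is reported to the caller instead of being silently dropped, which is the main practical difference from submitting plain Runnables.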


Take a look at the Watch Service API in java.nio.file. Here's documentation and a tutorial: http://download.oracle.com/javase/tutorial/essential/io/notification.html

This service lets you register for change notifications on a directory. For every notification you can do whatever processing you want. Probably a lot easier than implementing your own thing.
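A rough sketch of how the Watch Service could feed a thread pool is shown here. The class name DirectoryWatcher, the helper nextCreated and the placeholder process method are made up for illustration; the pool size and the 60-second poll timeout are arbitrary assumptions.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class DirectoryWatcher {

    // Hypothetical placeholder for the real per-file work (e.g. the database write).
    static void process(Path file) {
        System.out.println(Thread.currentThread().getName() + " processing " + file);
    }

    // Waits up to timeoutSeconds for one batch of events and returns the
    // paths of newly created entries (empty if nothing arrived in time).
    static List<Path> nextCreated(WatchService watcher, Path dir, long timeoutSeconds)
            throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return created;
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            // OVERFLOW events (lost notifications) are skipped in this sketch
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                created.add(dir.resolve((Path) event.context()));
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return created;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) { // runs until the process is stopped
                for (Path file : nextCreated(watcher, dir, 60)) {
                    pool.submit(() -> process(file));
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

One caveat: ENTRY_CREATE can be delivered before the writer has finished the file, so real code may need to wait for the file to be complete (for example, by having the producer write under a temporary name and rename it when done).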


Create a class Saver that extends Thread and handle the file manipulation there (in the run() method)?


http://download.oracle.com/javase/tutorial/essential/concurrency/

http://download.oracle.com/javase/7/docs/api/java/lang/Thread.html
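The Thread-subclass suggestion could look roughly like this. The body of run() is a placeholder, and the saveAll helper and the shared counter are assumptions added only so the example is complete.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class Saver extends Thread {
    private final File file;
    private final AtomicInteger done; // thread-safe counter shared by all savers

    Saver(File file, AtomicInteger done) {
        this.file = file;
        this.done = done;
    }

    @Override
    public void run() {
        // Hypothetical placeholder: real code would parse the file
        // and write its contents to the database here.
        System.out.println(getName() + " saved " + file.getName());
        done.incrementAndGet();
    }

    // Starts one thread per file, waits for all of them, and returns
    // how many files were handled.
    static int saveAll(List<File> files) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        List<Saver> savers = new ArrayList<>();
        for (File f : files) {
            Saver saver = new Saver(f, done);
            saver.start();
            savers.add(saver);
        }
        for (Saver saver : savers) {
            saver.join();
        }
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<File> files = new ArrayList<>();
        for (String name : args) {
            files.add(new File(name));
        }
        System.out.println("Files saved: " + saveAll(files));
    }
}
```

Note that this starts one thread per file, so with a large directory the number of threads is unbounded; a fixed-size pool keeps it capped.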


It's not clear whether you're familiar with concurrency in Java, so I'd start by taking a look at the Java Concurrency Tutorial.

Then keep in mind that any object that needs to be accessed by multiple threads should be immutable or synchronized.

Following that you can have a thread pool using an ExecutorService and have a number of threads run simultaneously.
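To make these two points concrete, here is a small sketch in which an immutable job object is handed to pool threads and a shared, mutable statistics object is guarded by synchronized methods. All names (SharedStateDemo, FileJob, Stats) and the numbers used are invented for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SharedStateDemo {

    // Immutable: all fields are final and there are no setters,
    // so instances can be passed to any thread without locking.
    static final class FileJob {
        final String name;
        final long sizeBytes;

        FileJob(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    // Mutable and shared: every read and write goes through a
    // synchronized method, so updates from different threads do not interleave.
    static final class Stats {
        private long files;
        private long bytes;

        synchronized void record(FileJob job) {
            files++;
            bytes += job.sizeBytes;
        }

        synchronized long files() { return files; }

        synchronized long bytes() { return bytes; }
    }

    // Submits the given number of jobs to a fixed pool and waits for completion.
    static Stats run(int jobs, int threads) throws InterruptedException {
        Stats stats = new Stats();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < jobs; i++) {
            FileJob job = new FileJob("file-" + i, 10);
            pool.submit(() -> stats.record(job));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return stats;
    }

    public static void main(String[] args) throws InterruptedException {
        Stats stats = run(1000, 8);
        System.out.println(stats.files() + " files, " + stats.bytes() + " bytes");
    }
}
```

Without the synchronized keyword on record, the increments from different threads could overwrite each other and the totals would come out too low on some runs.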

It's not quite the same problem, but assuming you know how to handle the files, you can take a look at the following questions about multithreading in a different context:

questions around synchronization in java; when/how/to what extent

Parallel-processing in Java; advice needed i.e. on Runnanble/Callable interfaces


If I understand correctly, your single task currently handles everything from reading the file to loading it into the database. You can break it into separate tasks based on their nature (DB-centric, CPU-centric or IO-centric). For example, you could have the following tasks:

  1. Current task: picks up files from the directory and passes them to the next task.

  2. IO-centric: new task that reads the file into memory, then passes the data to the next task.

  3. DB-centric: new task that loads the data from memory into the database, then frees the memory.

  4. IO-centric: moves the file to some other place.

To further improve performance, you can run tasks 2, 3 and 4 on thread pools. This allows many files to be processed in parallel. Depending on the complexity of the work, you can add or remove tasks from the list to suit your requirements.
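One possible way to wire the four tasks together is to chain them with CompletableFuture, using a separate pool for the IO stages and the DB stage. This is a sketch under assumptions: the class name FilePipeline, the in-memory fakeDb map that stands in for the database, the pool sizes, and the doneDir target directory (which must already exist) are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;

class FilePipeline {
    // Stand-in for the database: file name -> number of bytes loaded.
    static final Map<String, Integer> fakeDb = new ConcurrentHashMap<>();

    // Task 2 (IO): read the whole file into memory.
    static byte[] read(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Task 3 (DB): hypothetical load step; real code would issue INSERTs here.
    static void load(Path file, byte[] data) {
        fakeDb.put(file.getFileName().toString(), data.length);
    }

    // Task 4 (IO): move the processed file out of the input directory.
    static void move(Path file, Path doneDir) {
        try {
            Files.move(file, doneDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void run(Path inputDir, Path doneDir) throws IOException {
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        ExecutorService dbPool = Executors.newFixedThreadPool(2);
        try (Stream<Path> listing = Files.list(inputDir)) {
            List<CompletableFuture<Void>> all = new ArrayList<>();
            // Task 1: pick up each file and hand it to the next stage.
            listing.filter(Files::isRegularFile).forEach(file -> all.add(
                    CompletableFuture.supplyAsync(() -> read(file), ioPool)
                            .thenAcceptAsync(data -> load(file, data), dbPool)
                            .thenRunAsync(() -> move(file, doneDir), ioPool)));
            // Wait for every file to pass through all stages; a failed stage
            // makes join() throw CompletionException.
            CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
        } finally {
            ioPool.shutdown();
            dbPool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        run(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Loaded: " + fakeDb);
    }
}
```

Keeping the DB stage on its own, smaller pool limits the number of concurrent database writers independently of how many files are being read or moved at the same time.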
