开发者

looping for PDF files

开发者 https://www.devze.com 2023-01-27 00:22 出处:网络
My program goes into a directory and searches for pdf files to parse them. This program is always running so I need to make sure to don\'t parse the same file over again.

My program goes into a directory and searches for pdf files to parse them. This program is always running so I need to make sure to don't parse the same file over again.

I used a list to store the file names and then check if they are in there.

My code does not work in respect to that, if anyone can take a look and see whats wrong it would be greatly appreciated.

FileInfo[] filePaths = di.GetFiles("*.pdf");
for (int i = 0; i < filePaths.Length; i++)
{
    foreach (string fileName in usedFileNames)
    {
        if (fileName.Equals(filePaths[i].Name))
        {
            isInList = true;
        }
        else
        {
            isInList = false;
        }
    }
    if (isInList == false)
    {
        PDFReaderChooser chooser = new PDFReaderChooser(fil开发者_运维知识库ePaths[i].Name);
        usedFileNames.Add(filePaths[i].Name);
    }

}


More concise still:

var fileNames = di.GetFiles("*.pdf")
                  .Select(f => f.Name)
                  .Where(n => !usedFileNames.Contains(n));
usedFileNames.AddRange(fileNames);

foreach (var fileName in fileNames)
{
    var chooser = new PDFReaderChooser(fileName);
}

This nicely abstracts away the logic that figures out which file names you need to process (outside the loop), from the logic that processes them (inside the loop).


While the other answers are better solutions to the problem, they don't explain why the original code didn't work. The problem is that the algorithm overwrites the value of the isInList variable, which will therefore only be true for the last file in the list. This would fix that problem:

FileInfo[] filePaths = di.GetFiles("*.pdf"); 
for (int i = 0; i < filePaths.Length; i++) 
{ 
    isInList = false
    foreach (string fileName in usedFileNames) 
    { 
        if (fileName.Equals(filePaths[i].Name)) 
        { 
            isInList = true;
            break;
        } 
    } 
    if (isInList == false) 
    { 
        PDFReaderChooser chooser = new PDFReaderChooser(filePaths[i].Name); 
        usedFileNames.Add(filePaths[i].Name); 
    } 
} 

I would add that it is better to use a HashSet instead of a List for your usedFileNames collection. The hash set is designed to determine efficiently whether it contains a given item. The list, if I recall correctly, does a linear search, which (for large numbers of items) is inefficient.


A LINQ Contains operation would make this much more concise (assuming usedFileNames is a List<string>):

FileInfo[] filePaths = di.GetFiles("*.pdf");
foreach(FileInfo myInfo in filePaths)
{
    if (!usedFileNames.Contains(myInfo.Name))
    {
        PDFReaderChooser chooser = new PDFReaderChooser(myInfo.Name);
        usedFileNames.Add(myInfo.Name);
    }

}


Try this:

FileInfo[] filePaths = di.GetFiles("*.pdf");
foreach(FileInfo fInfo in filePaths)
{
    if (!usedFileNames.Contains(fInfo.Name))
    {
        PDFReaderChooser chooser = new PDFReaderChooser(fInfo.Name);
        usedFileNames.Add(fInfo.Name);
    }
}

As I commented on your question, the code you posted doesn't work because you have to insert a break statement, like this:

for (int i = 0; i < filePaths.Length; i++)
{
    bool isInList = false;

    foreach (string fileName in usedFileNames)
    {
        if (fileName.Equals(filePaths[i].Name))
            isInList = true;
    }

    if (isInList == false)
    {
        Console.WriteLine("Not in list! #{0}", x);
        usedFileNames.Add(filePaths[i].Name);
    }
}

Anyway I recommend you to use one of the techniques shown in this question replies.

0

精彩评论

暂无评论...
验证码 换一张
取 消