Faster way to get multiple FileInfo's?

https://www.devze.com 2023-01-28 02:56 Source: web
This is a long shot, but is there a faster way to get the size, last access time, creation time, etc. for multiple files?

I have a long list of file paths (so I needn't enumerate) and need to look up that information as quickly as possible. Creating FileInfo's in parallel probably won't help much since the bottleneck should be the disk.

Unfortunately the NTFS journal only keeps the file names; otherwise that would be great. I guess the OS doesn't store that metadata somewhere?

One other possible optimization would be a static method or Win32 call that fetches the information without creating a bunch of FileInfo objects (the File methods only let me get one piece of information at a time, though).

Anyway, I'd be glad if anyone knows something that might help. Unfortunately I do have to micro-optimize here, and no, "using a database" isn't a viable answer ;)


There are static methods on System.IO.File to get what you want. It's a micro-optimization, but it might be what you need: GetLastAccessTime, GetCreationTime.

Edit

I'll leave the text above because you specifically asked for static methods. However, I think you are better off using FileInfo (you should measure just to be sure). Both File and FileInfo use an internal method on File called FillAttributeInfo to get the data you are after. For the properties you need, FileInfo only has to call this method once and then caches the result. File has to call it on every call, because the attribute info object is thrown away when the static method returns.

So my hunch is that, when you need multiple attributes, a FileInfo for each file will be faster. But in performance situations, you should always measure! Faced with this problem, I would try both managed options as outlined above and benchmark them, both in serial and in parallel. Then decide if it's fast enough.
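A minimal sketch of such a benchmark, assuming you already have the list of paths in memory. The class and method names (FileInfoBenchmark, ViaFileInfo, ViaStaticFile, Time) are made up for illustration. Since File has no static getter for the size, the sketch compares only the three timestamps so that both variants do the same work; the XOR checksum just keeps the reads from being optimized away.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical benchmark helper, not part of the original answer.
static class FileInfoBenchmark
{
    // One FileInfo per file: the attribute data is fetched once and reused.
    public static long ViaFileInfo(IEnumerable<string> paths)
    {
        long checksum = 0;
        foreach (string path in paths)
        {
            var info = new FileInfo(path);
            checksum ^= info.CreationTimeUtc.Ticks;
            checksum ^= info.LastAccessTimeUtc.Ticks;
            checksum ^= info.LastWriteTimeUtc.Ticks;
        }
        return checksum;
    }

    // Static File methods: each call queries the file system again.
    public static long ViaStaticFile(IEnumerable<string> paths)
    {
        long checksum = 0;
        foreach (string path in paths)
        {
            checksum ^= File.GetCreationTimeUtc(path).Ticks;
            checksum ^= File.GetLastAccessTimeUtc(path).Ticks;
            checksum ^= File.GetLastWriteTimeUtc(path).Ticks;
        }
        return checksum;
    }

    // Times a single pass of the given probe over all paths.
    public static TimeSpan Time(Func<IEnumerable<string>, long> probe, IEnumerable<string> paths)
    {
        var stopwatch = Stopwatch.StartNew();
        probe(paths);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Run each variant several times and discard the first pass, since the OS caches metadata after the first read and a single cold run says little about steady-state cost.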

If it is not fast enough, you need to resort to calling the Win32 API directly. It wouldn't be too hard to look at File.FillAttributeInfo in the reference sources and come up with something similar.

2nd Edit

In fact, if you really need it, this is the code required to call the Win32 API directly, using the same approach as the internal code for File but with a single OS call to get all the attributes. I think you should only use it if it is really necessary. You'll have to convert FILETIME to a usable DateTime yourself, and so on, so you get some more work to do manually.

using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

static class FastFile
{
    private const int MAX_PATH = 260;
    private const int MAX_ALTERNATE = 14;

    // FindFirstFile signals failure with INVALID_HANDLE_VALUE (-1), not NULL.
    private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    public static WIN32_FIND_DATA GetFileData(string fileName)
    {
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(fileName, out data);
        if (handle == INVALID_HANDLE_VALUE)
            throw new IOException("FindFirstFile failed",
                new Win32Exception(Marshal.GetLastWin32Error()));
        FindClose(handle);
        return data;
    }

    // CharSet.Unicode binds to FindFirstFileW, matching the Unicode struct below.
    [DllImport("kernel32", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string fileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32")]
    private static extern bool FindClose(IntPtr hFindFile);


    [StructLayout(LayoutKind.Sequential)]
    public struct FILETIME
    {
        public uint dwLowDateTime;
        public uint dwHighDateTime;
    }
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        // The size halves are unsigned DWORDs; size = (high << 32) | low.
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_ALTERNATE)]
        public string cAlternate;
    }
}
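The manual conversion mentioned above is short. Here is a sketch, with hypothetical helper names (FindDataConverter, ToDateTimeUtc, ToFileSize), that takes the raw 32-bit halves as uint; cast the fields if your struct declares them as int. It relies on DateTime.FromFileTimeUtc, which expects the 64-bit FILETIME value (100-nanosecond ticks since 1601-01-01 UTC).

```csharp
using System;

// Hypothetical helper, not part of the original answer.
static class FindDataConverter
{
    // Joins the two FILETIME halves into one 64-bit value and converts it.
    public static DateTime ToDateTimeUtc(uint low, uint high)
    {
        long fileTime = ((long)high << 32) | low;
        return DateTime.FromFileTimeUtc(fileTime);
    }

    // Joins nFileSizeLow and nFileSizeHigh into the full file size in bytes.
    public static long ToFileSize(uint low, uint high)
    {
        return ((long)high << 32) | low;
    }
}
```

With the struct above, a call would look something like FindDataConverter.ToDateTimeUtc(data.ftLastWriteTime.dwLowDateTime, data.ftLastWriteTime.dwHighDateTime).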


.NET's DirectoryInfo and FileInfo classes are incredibly slow in this respect, especially when used with network shares.

If many of the files to be "scanned" are in the same directory, you'll get much faster results (depending on the situation, orders of magnitude faster) by using the Win32 API's FindFirstFile, FindNextFile and FindClose functions. This is true even if you have to ask for more information than you actually need (e.g. if you ask for all ".log" files in a directory but only need 75% of them).

Actually, .NET's info classes also use these Win32 API functions internally. But they only "remember" the file names. When you ask for more information on a bunch of files (e.g. LastModified), a separate (network) request is made for each file, which takes time.
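A rough sketch of that directory scan, Windows only. DirectoryScanner and Scan are names invented for this example, and the struct is declared with the FILETIME halves flattened into uint pairs to keep it short. One FindFirstFile/FindNextFile pass returns size and last write time for every matching file, so you can then look up the paths you care about in the resulting dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the original answer. Windows only.
static class DirectoryScanner
{
    private static readonly IntPtr InvalidHandle = new IntPtr(-1);
    private const int ERROR_FILE_NOT_FOUND = 2;
    private const int ERROR_NO_MORE_FILES = 18;

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public uint creationLow, creationHigh;
        public uint lastAccessLow, lastAccessHigh;
        public uint lastWriteLow, lastWriteHigh;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string pattern, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr handle, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll")]
    private static extern bool FindClose(IntPtr handle);

    // Returns size and last write time (UTC) for each file matching the pattern.
    public static Dictionary<string, (long Size, DateTime LastWriteUtc)> Scan(
        string directory, string pattern = "*")
    {
        var result = new Dictionary<string, (long Size, DateTime LastWriteUtc)>(
            StringComparer.OrdinalIgnoreCase);

        IntPtr handle = FindFirstFile(Path.Combine(directory, pattern), out WIN32_FIND_DATA data);
        if (handle == InvalidHandle)
        {
            int firstError = Marshal.GetLastWin32Error();
            if (firstError == ERROR_FILE_NOT_FOUND)
                return result; // nothing matched the pattern
            throw new Win32Exception(firstError);
        }

        try
        {
            do
            {
                // Skip subdirectories, including "." and "..".
                if ((data.dwFileAttributes & FileAttributes.Directory) != 0)
                    continue;

                long size = ((long)data.nFileSizeHigh << 32) | data.nFileSizeLow;
                long writeTime = ((long)data.lastWriteHigh << 32) | data.lastWriteLow;
                result[data.cFileName] = (size, DateTime.FromFileTimeUtc(writeTime));
            }
            while (FindNextFile(handle, out data));

            int nextError = Marshal.GetLastWin32Error();
            if (nextError != ERROR_NO_MORE_FILES)
                throw new Win32Exception(nextError);
        }
        finally
        {
            FindClose(handle);
        }
        return result;
    }
}
```

Whether this beats per-file lookups depends on how many of your paths share a directory; group the path list by directory first and measure.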


Is it possible to use the DirectoryInfo class?

 DirectoryInfo d = new DirectoryInfo(@"c:\Temp");
 FileInfo[] f = d.GetFiles();


I think you're looking for the GetFileAttributesEx function (pinvoke.net link). However, the FileInfo class (or rather, its base class) uses this internally anyway, so I doubt you're going to see any performance improvement.
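For reference, a direct call might look roughly like the sketch below (Windows only). AttributeReader and Read are hypothetical names, and the FILETIME halves are again flattened into uint pairs. The info level 0 is GetFileExInfoStandard, which fills a WIN32_FILE_ATTRIBUTE_DATA with attributes, the three timestamps, and the size in a single call.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the original answer. Windows only.
static class AttributeReader
{
    private const int GetFileExInfoStandard = 0;

    [StructLayout(LayoutKind.Sequential)]
    private struct WIN32_FILE_ATTRIBUTE_DATA
    {
        public FileAttributes dwFileAttributes;
        public uint creationLow, creationHigh;
        public uint lastAccessLow, lastAccessHigh;
        public uint lastWriteLow, lastWriteHigh;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetFileAttributesEx(
        string fileName, int infoLevel, out WIN32_FILE_ATTRIBUTE_DATA data);

    // Reads size and all three timestamps (UTC) with one OS call.
    public static (long Size, DateTime CreationUtc, DateTime LastAccessUtc, DateTime LastWriteUtc)
        Read(string path)
    {
        if (!GetFileAttributesEx(path, GetFileExInfoStandard, out WIN32_FILE_ATTRIBUTE_DATA data))
            throw new Win32Exception(Marshal.GetLastWin32Error());

        return (Combine(data.nFileSizeHigh, data.nFileSizeLow),
                ToUtc(data.creationHigh, data.creationLow),
                ToUtc(data.lastAccessHigh, data.lastAccessLow),
                ToUtc(data.lastWriteHigh, data.lastWriteLow));
    }

    private static long Combine(uint high, uint low)
    {
        return ((long)high << 32) | low;
    }

    private static DateTime ToUtc(uint high, uint low)
    {
        return DateTime.FromFileTimeUtc(Combine(high, low));
    }
}
```

This skips the FileInfo allocation and the local-time conversion, but the file system round trip is the same, so any gain is likely to be small.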


If the file system is remote then parallelism may help since the network may be the bottleneck.

This test case showed a ~5x improvement (52s => 11s) for 50k files using 8 threads. Avoiding lock() was also critical, since calling it 50k times has a large impact. The timings were made without the debugger attached.

This also illustrates that the work of getting the file length is not performed until FileInfo.Length is accessed. Accessing Length again after the parallel section is instantaneous. Note that this relies on an implementation detail.

// ~4s
//
List<string> files = Directory.EnumerateFileSystemEntries(directory, "*", SearchOption.AllDirectories)
    .ToList();

// ~0s
// 
Dictionary<string, FileInfo> fileMap = files.Select(file => new
{
    file,
    info = new FileInfo(file)
})
.ToDictionary(f => f.file, f => f.info);

// ~10s
//
Int64 totalSize = fileMap.Where(kv => kv.Value != null)
    .AsParallel() // ~50s w/o this 
    .Select(kv =>
    {
        try
        {
            return kv.Value.Length;
        }
        catch (FileNotFoundException)  // a transient file or directory
        {
        }
        catch (UnauthorizedAccessException)
        {
        }
        return 0;
    })
    .Sum();
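If you prefer an explicit loop over PLINQ, the same lock-free idea can be sketched with Parallel.ForEach and per-thread subtotals. ParallelSizer and TotalSize are names made up for this example. Each worker accumulates into its own subtotal and touches shared state only once at the end, via Interlocked.Add, so there is no lock() per file. It catches IOException, the base class of FileNotFoundException, which is slightly broader than the code above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
static class ParallelSizer
{
    public static long TotalSize(IEnumerable<string> paths, int maxThreads = 8)
    {
        long total = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach(
            paths,
            options,
            () => 0L, // each worker starts with its own subtotal
            (path, loopState, subtotal) =>
            {
                try
                {
                    return subtotal + new FileInfo(path).Length;
                }
                catch (IOException) // missing or transient file, or a directory
                {
                    return subtotal;
                }
                catch (UnauthorizedAccessException)
                {
                    return subtotal;
                }
            },
            // Runs once per worker, not once per file.
            subtotal => Interlocked.Add(ref total, subtotal));

        return total;
    }
}
```

The thread count of 8 mirrors the test above; on a local disk a lower value may do just as well, so treat it as a parameter to tune.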