Java, Linux: how to detect whether two java.io.Files refer to the same physical file

Developer https://www.devze.com 2023-03-02 14:34 Source: web
I'm looking for an efficient way to detect whether two java.io.Files refer to the same physical file. According to the docs, File.equals() should do the job:

Tests this abstract pathname for equality with the given object. Returns true if and only if the argument is not null and is an abstract pathname that denotes the same file or directory as this abstract pathname.

However, given a FAT32 partition (actually a TrueCrypt container) which is mounted at /media/truecrypt1:

new File("/media/truecrypt1/File").equals(new File("/media/truecrypt1/file")) == false

Would you say that this conforms to the specification? And in this case, how to work around that problem?

Update: Thanks to commenters, for Java 7 I've found java.nio.file.Files.isSameFile(), which works for me.


The answer in @Joachim's comment is normally correct. The way to determine if two File objects refer to the same OS file is to use getCanonicalFile() or getCanonicalPath(). The javadoc says this:

"A canonical pathname is both absolute and unique. [...] Every pathname that denotes an existing file or directory has a unique canonical form."

So the following should work:

File f1 = new File("/media/truecrypt1/File");  // different capitalization ...
File f2 = new File("/media/truecrypt1/file");  // ... but same OS file (on Windows)
if (f1.getCanonicalPath().equals(f2.getCanonicalPath())) {
    System.out.println("Files are equal ... no kittens need to die.");
}

However, it would appear that you are viewing a FAT32 file system mounted on UNIX / Linux. AFAIK, Java does not know that this is happening, and is just applying the generic UNIX / Linux rules for file names ... which give the wrong answer in this scenario.

If this is what is really happening, I don't think there is a reliable solution in pure Java 6. However,

  • You could do some hairy JNI stuff; e.g. get the file descriptor numbers and then, in native code, use the fstat(2) system call to get hold of the two files' device and inode numbers and compare those.

  • Java 7 java.nio.file.Path.equals(Object) looks like it might give the right answer if you call toRealPath() on the paths first to resolve symlinks. (It is a little unclear from the javadoc whether each mounted filesystem on Linux will correspond to a distinct FileSystem object.)

  • The Java 7 tutorials have this section on seeing if two Path objects are for the same file ... which recommends using java.nio.file.Files.isSameFile(Path, Path)


Would you say that this conforms to the specification?

No and yes.

  • No in the sense that the getCanonicalPath() method is not returning the same value for each existing OS file ... which is what you'd expect from reading the javadoc.

  • Yes in the technical sense that the Java codebase (not the javadoc) is the ultimate specification ... both in theory and in practice.


You could try to obtain an exclusive write lock on each file, and see whether the second attempt fails:

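// Requires java.io.* and java.nio.channels.*; file1 and file2 are the two java.io.File objects.
// IOException from opening, locking or closing is left to the caller.
boolean isSame = false;
// Open in append mode so that the existing contents are not truncated.
FileOutputStream out1 = new FileOutputStream(file1, true);
FileOutputStream out2 = new FileOutputStream(file2, true);
FileLock lock1 = null;
FileLock lock2 = null;
try {
   lock1 = out1.getChannel().tryLock();
   lock2 = out2.getChannel().tryLock();
   // First lock granted but second refused: both paths probably name one file.
   isSame = lock1 != null && lock2 == null;
} catch (OverlappingFileLockException e) {
   // Thrown when this JVM already holds an overlapping lock on the same file.
   isSame = lock1 != null;
} finally {
   if (lock1 != null) lock1.release();
   if (lock2 != null) lock2.release();
   out1.close();
   out2.close();
}
System.out.println(file1 + " and " + file2 + " are " + (isSame ? "" : "not ") + "the same");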

This is not guaranteed to be correct, though: another process could already hold a lock on one of the files, in which case the check gives the wrong answer. But at least this doesn't require you to shell out to an external process.


There's a chance the same file has two paths (e.g. over the network \\localhost\file and \\127.0.0.1\file would refer to the same file with a different path). I would go with comparing digests of both files to determine whether they are identical or not. Something like this:

public static void main(String args[]) {
    try {
        File f1 = new File("\\\\79.129.94.116\\share\\bots\\triplon_bots.jar");
        File f2 = new File("\\\\triplon\\share\\bots\\triplon_bots.jar");
        System.out.println(f1.getCanonicalPath().equals(f2.getCanonicalPath()));
        System.out.println(computeDigestOfFile(f1).equals(computeDigestOfFile(f2)));
    }
    catch(Exception e) {
        e.printStackTrace();
    }
}

// Requires java.io.*, java.math.BigInteger and java.security.*
private static String computeDigestOfFile(File f) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    InputStream is = new DigestInputStream(new FileInputStream(f), md);
    try {
        byte[] buffer = new byte[1024];
        while (is.read(buffer) != -1) {
            // Nothing to do here: DigestInputStream already feeds every byte it
            // reads into md, so calling md.update(buffer) again would be wrong.
        }
    }
    finally {
        is.close();
    }
    return new BigInteger(1, md.digest()).toString(16);
}

It outputs

false
true

This approach is of course much slower than any sort of path comparison, and its cost grows with the size of the files. Another side effect is that two distinct files with identical content will be considered equal, regardless of their locations.


The Files.isSameFile method was added for exactly this kind of usage - that is, you want to check if two non-equal paths locate the same file.
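As a rough illustration of that method (Java 7 or later), here is a minimal sketch. The class name SameFileDemo, the helper sameFile and the temporary file names are made up for this example, and it assumes the temp directory sits on a file system that supports hard links.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SameFileDemo {

    // Thin wrapper around Files.isSameFile. Unless the two paths are equal,
    // both files generally have to exist, otherwise an IOException is thrown.
    static boolean sameFile(Path a, Path b) throws IOException {
        return Files.isSameFile(a, b);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch files, created only for this demonstration.
        Path dir = Files.createTempDirectory("samefile-demo");
        Path original = Files.createFile(dir.resolve("data.txt"));
        Path hardLink = Files.createLink(dir.resolve("alias.txt"), original);
        Path other = Files.createFile(dir.resolve("other.txt"));

        // The two Path objects differ, yet they locate the same file.
        System.out.println("paths equal:   " + original.equals(hardLink));
        System.out.println("same file:     " + sameFile(original, hardLink));
        System.out.println("same as other: " + sameFile(original, other));
    }
}
```

Whether this also gives the right answer for differently-cased names on a FAT32 mount depends on the platform's file system provider; the question's update reports that it did on the asker's system.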


On *nix systems, case matters: file is not the same as File or fiLe.


The API doc of equals() says (right after your quote):

On UNIX systems, alphabetic case is significant in comparing pathnames; on Microsoft Windows systems it is not.
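To make the practical consequence concrete, here is a small sketch showing that File.equals() compares pathnames rather than files. It assumes a Unix-like system; the class name FileEqualsDemo, the helper sameCanonicalPath and the /tmp paths are only illustrative, and the files do not need to exist.

```java
import java.io.File;
import java.io.IOException;

public class FileEqualsDemo {

    // Compares the canonical forms, which removes "." and ".." and resolves
    // symbolic links, but does not know about case-insensitive mounts.
    static boolean sameCanonicalPath(File a, File b) throws IOException {
        return a.getCanonicalPath().equals(b.getCanonicalPath());
    }

    public static void main(String[] args) throws IOException {
        File plain = new File("/tmp/example.txt");
        File dotted = new File("/tmp/./example.txt");

        // Same location, different spelling: equals() looks only at the pathname.
        System.out.println("equals():        " + plain.equals(dotted));
        System.out.println("canonical match: " + sameCanonicalPath(plain, dotted));

        // On Unix, case is significant for the pathname comparison.
        System.out.println("File vs file:    "
                + new File("/tmp/File").equals(new File("/tmp/file")));
    }
}
```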


You can try Runtime.exec() of

ls -i /fullpath/File # extract the inode number.
df /fullpath/File # extract the "Mounted on" field.

If the mount point and the inode number are both the same, the two paths refer to the same file, regardless of symbolic links or case-insensitive file systems.

Or even

test "file1" -ef "file2"

which, according to its man page, is true if FILE1 and FILE2 have the same device and inode numbers.
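A minimal sketch of calling that check from Java might look like the following. It assumes an external test executable is on the PATH (as it usually is on Linux); the class name TestEfDemo and the helper sameByTestCommand are invented for this example.

```java
import java.io.File;
import java.io.IOException;

public class TestEfDemo {

    // Runs the external command `test A -ef B`. An exit status of 0 means the
    // two paths have the same device and inode numbers.
    static boolean sameByTestCommand(File a, File b)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "test", a.getAbsolutePath(), "-ef", b.getAbsolutePath())
                .redirectErrorStream(true)
                .start();
        return p.waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical scratch file, reached through two different spellings.
        File file = File.createTempFile("same", ".tmp");
        File alias = new File(file.getParentFile(), "./" + file.getName());
        System.out.println(sameByTestCommand(file, alias));
        file.delete();
    }
}
```

Starting a process per comparison is expensive, so this is probably only reasonable when the check is rare.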


The traditional Unix way to test whether two filenames refer to the same underlying filesystem object is to stat them and test whether they have the same [dev,ino] pair.

That does assume no redundant mounts, however. If those are allowed, you have to go about it differently.
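From Java 7 on, something close to the [dev,ino] comparison can be sketched without native code through BasicFileAttributes.fileKey(). The javadoc allows fileKey() to return null on platforms without such an identifier, so the helper below treats that case as "unknown, not the same". The class name FileKeyDemo, the helper sameFileKey and the file names are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class FileKeyDemo {

    // Compares the file keys of two existing paths. On Unix-like platforms the
    // key is typically derived from the device ID and inode number. Symbolic
    // links are followed, because no LinkOption is passed.
    static boolean sameFileKey(Path a, Path b) throws IOException {
        Object keyA = Files.readAttributes(a, BasicFileAttributes.class).fileKey();
        Object keyB = Files.readAttributes(b, BasicFileAttributes.class).fileKey();
        return keyA != null && keyA.equals(keyB);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical scratch files, created only for this demonstration.
        Path dir = Files.createTempDirectory("filekey-demo");
        Path target = Files.createFile(dir.resolve("target.txt"));
        Path link = Files.createSymbolicLink(dir.resolve("link.txt"), target);
        System.out.println(sameFileKey(target, link));
    }
}
```

The same caveat about redundant mounts applies here, since the comparison still rests on the device and inode identity reported by the operating system.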
