How to find if the file is a CSV file?

Developer (https://www.devze.com), 2023-01-03 18:14. Source: web
I have a scenario in which a user uploads a file to the system. The only format the system understands is CSV, but the user can upload any type of file, e.g. JPEG, DOC, or HTML. I need to throw an exception if the user uploads anything other than a CSV file.

Can anybody let me know how I can find out whether the uploaded file is a CSV file or not?


CSV files vary a lot, and all of those variants can legitimately be called CSV files.

I don't think your approach is the best one. Rather than asking whether the file is a CSV or not, the better approach is to determine whether the uploaded file is a text file the application can parse.

You would report errors whenever you can't parse the file, be it a JPG, MP3 or CSV in a format you cannot parse.

To do that, I would look for a library that parses various CSV formats; otherwise you have a long road ahead writing code to parse the many possible kinds of CSV files (or you restrict the application's flexibility by supporting only a few CSV formats).

One such library for Java is opencsv.


If you're using a library CSV parser, all you have to do is catch any errors it throws.

If the CSV parser you're using is remotely robust, it will throw some useful errors in the event that it doesn't understand the file format.
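The "parse it and catch the error" idea can be sketched roughly as follows. To keep the example self-contained, `StrictCsvParser` is a small hand-written stand-in, not a real library; in practice a library such as opencsv would take its place. `CsvFormatException`, `UploadValidator` and `validate` are likewise hypothetical names. The stand-in works line by line, so it would reject quoted fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a library CSV parser; a real library would replace it. */
final class StrictCsvParser {
    /** Thrown when a line cannot be parsed as CSV. */
    static final class CsvFormatException extends RuntimeException {
        CsvFormatException(String message) { super(message); }
    }

    /** Splits one line into fields, honouring double-quoted fields; throws on malformed input. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"');   // a doubled quote is an escaped quote
                    i++;
                } else {
                    inQuotes = false;    // closing quote
                }
            } else if (c == '"') {
                if (field.length() > 0) {
                    throw new CsvFormatException("quote inside unquoted field at column " + i);
                }
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c < 0x20 && c != '\t') {
                // Control characters are typical of binary files such as JPEG or MP3
                throw new CsvFormatException("control character at column " + i);
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            throw new CsvFormatException("unterminated quoted field");
        }
        fields.add(field.toString());
        return fields;
    }
}

final class UploadValidator {
    /** Returns null if every line parses, otherwise a message describing the first failure. */
    static String validate(List<String> lines) {
        try {
            for (String line : lines) {
                StrictCsvParser.parseLine(line);
            }
            return null;
        } catch (StrictCsvParser.CsvFormatException e) {
            return "Not a CSV file this application can parse: " + e.getMessage();
        }
    }
}
```

The caller never asks "is this a JPEG?"; it only reports whatever the parser could not handle, which covers both non-CSV uploads and CSV variants the application does not support.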


I can think of several methods.

One way is to try to decode the file using UTF-8. (This is built into Java and is probably built into .NET too.) If the file decodes properly, then you at least know that it's a text file of some kind.

Once you know it's a text file, parse out the individual fields from each line and check that you get the number of fields that you expect. If the number of fields per line is inconsistent then you might just have a file that contains text but is not organized into lines and fields.

Otherwise you have a CSV. Then you can validate the fields.
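A minimal sketch of those two steps, assuming the whole upload is already in memory as a byte array and that the expected number of fields is known up front. The class and method names (`Utf8CsvCheck`, `decodeUtf8OrNull`, `looksLikeCsv`) are made up for illustration, and the field split is deliberately naive: it assumes fields never contain quoted commas.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8CsvCheck {
    /** Decodes the bytes as strict UTF-8; returns null if they are not valid UTF-8. */
    static String decodeUtf8OrNull(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;   // not text, or at least not UTF-8 text
        }
    }

    /** True if the bytes are valid UTF-8 and every non-empty line has expectedFields fields. */
    static boolean looksLikeCsv(byte[] bytes, int expectedFields) {
        String text = decodeUtf8OrNull(bytes);
        if (text == null) {
            return false;
        }
        int rows = 0;
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty()) {
                continue;
            }
            // Naive split (no quoted commas); the -1 limit keeps trailing empty fields
            if (line.split(",", -1).length != expectedFields) {
                return false;
            }
            rows++;
        }
        return rows > 0;   // a file with no data rows is not accepted
    }
}
```

Files that pass this check still need their individual field values validated, as noted above.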


If it's a web application, you might want to check the Content-Type HTTP header the browser sends when uploading a file through a form. If there's a binding for the language you're using, you might also try libmagic, which is pretty good at recognizing file types. For example, the UNIX tool file uses it.

http://sourceforge.net/projects/libmagic/


I don't know if you can tell for 100% certain in any way, but I'd suggest that the first validations should be:

  1. Is the file extension .csv?
  2. Count the number of commas on each line; a valid CSV file should normally have the same number of commas on every line. (As Jkramer said, this only works if the files can't contain quoted commas.)
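The two checks above might look roughly like this. The names (`CommaCountCheck` and its methods) are hypothetical. As a small extension to the plain count, the sketch skips commas inside double-quoted fields, which softens the quoted-comma limitation but still does not handle quoted fields that span several lines.

```java
import java.util.List;
import java.util.Locale;

final class CommaCountCheck {
    /** Case-insensitive check for a .csv extension. */
    static boolean hasCsvExtension(String fileName) {
        return fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".csv");
    }

    /** Counts the commas in a line, skipping those inside double-quoted fields. */
    static int countUnquotedCommas(String line) {
        int commas = 0;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // an escaped "" toggles twice, so the state is unchanged
            } else if (c == ',' && !inQuotes) {
                commas++;
            }
        }
        return commas;
    }

    /** True if at least one non-empty line exists and all non-empty lines have the same comma count. */
    static boolean hasConsistentCommaCount(List<String> lines) {
        int expected = -1;
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            int count = countUnquotedCommas(line);
            if (expected < 0) {
                expected = count;       // the first non-empty line sets the expectation
            } else if (count != expected) {
                return false;
            }
        }
        return expected >= 0;
    }
}
```

Neither check is conclusive on its own: a renamed JPEG passes the extension test, and a single-column CSV has zero commas on every line.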


Try this:

String type = Files.probeContentType(Paths.get(filepath));
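A slightly fuller sketch of how that call might be used. `Files.probeContentType` can return null when the type cannot be determined, and its result depends on the platform's installed file type detectors (often it is derived from the file extension rather than the content), so treat it as a hint only. The list of accepted MIME types below is an assumption, and `ContentTypeCheck` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ContentTypeCheck {
    /** True if the probed MIME type is one that is commonly reported for CSV files. */
    static boolean isCsvContentType(String type) {
        if (type == null) {
            return false;   // null means the type could not be determined
        }
        // Assumed list; some systems report CSV as an Excel type
        return type.equalsIgnoreCase("text/csv")
                || type.equalsIgnoreCase("application/csv")
                || type.equalsIgnoreCase("application/vnd.ms-excel");
    }

    /** Probes the file and treats I/O errors the same as an unknown type. */
    static boolean probeLooksLikeCsv(Path file) {
        try {
            return isCsvContentType(Files.probeContentType(file));
        } catch (IOException e) {
            return false;
        }
    }
}
```

A call such as `ContentTypeCheck.probeLooksLikeCsv(Paths.get(filepath))` would then replace the bare `probeContentType` call; the file should still be parsed afterwards to confirm it really is CSV.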


I solved it like this: read the file with UTF-16 encoding first. If no comma is found in the decoded content, the UTF-16 decoding didn't work, so the file is read again as plain text with UTF-8. If a comma is found, the CSV file is in Excel format (UTF-16, not plain text) and is kept as is.

      if (fileA.endsWith(".csv") && fileB.endsWith(".csv")) {
          // First attempt: readCSVFile (not shown) is assumed to read the file as UTF-16
          second_list = readCSVFile(fileA);
          new_list = readCSVFile(fileB);
          if (!String.join("", second_list).contains(",") || !String.join("", new_list).contains(",")) {
              // No comma found, so UTF-16 decoding didn't work: read these files again with UTF-8 encoding
              System.out.println("[WARN] csv files will be read like text files. (UTF-16 encoding couldn't find any comma in the file, i.e. UTF-16 encoding didn't work)");
              second_list = readFile(fileA);
              new_list = readFile(fileB);
          } else {
              // keep the CSV as UTF-16 encoded
          }
      }