I need to read a large开发者_JAVA技巧 text file of around 5-6 GB line by line using Java.
How can I do this quickly?
A common pattern is to use
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line;
while ((line = br.readLine()) != null) {
// process the line.
}
}
You can read the data faster if you assume there is no character encoding. e.g. ASCII-7 but it won't make much difference. It is highly likely that what you do with the data will take much longer.
EDIT: A less common pattern to use which avoids the scope of line
leaking.
try(BufferedReader br = new BufferedReader(new FileReader(file))) {
for(String line; (line = br.readLine()) != null; ) {
// process the line.
}
// line is not visible here.
}
UPDATE: In Java 8 you can do
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.forEach(System.out::println);
}
NOTE: You have to place the Stream in a try-with-resource block to ensure the #close method is called on it, otherwise the underlying file handle is never closed until GC does it much later.
Look at this blog:
- Java Read File Line by Line - Java Tutorial
The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.
// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the input stream
fstream.close();
Once Java 8 is out (March 2014) you'll be able to use streams:
try (Stream<String> lines = Files.lines(Paths.get(filename), Charset.defaultCharset())) {
lines.forEachOrdered(line -> process(line));
}
Printing all the lines in the file:
try (Stream<String> lines = Files.lines(file, Charset.defaultCharset())) {
lines.forEachOrdered(System.out::println);
}
Here is a sample with full error handling and supporting charset specification for pre-Java 7. With Java 7 you can use try-with-resources syntax, which makes the code cleaner.
If you just want the default charset you can skip the InputStream and use FileReader.
InputStream ins = null; // raw byte-stream
Reader r = null; // cooked reader
BufferedReader br = null; // buffered for readLine()
try {
String s;
if (true) {
String data = "#foobar\t1234\n#xyz\t5678\none\ttwo\n";
ins = new ByteArrayInputStream(data.getBytes());
} else {
ins = new FileInputStream("textfile.txt");
}
r = new InputStreamReader(ins, "UTF-8"); // leave charset out for default
br = new BufferedReader(r);
while ((s = br.readLine()) != null) {
System.out.println(s);
}
}
catch (Exception e)
{
System.err.println(e.getMessage()); // handle exception
}
finally {
if (br != null) { try { br.close(); } catch(Throwable t) { /* ensure close happens */ } }
if (r != null) { try { r.close(); } catch(Throwable t) { /* ensure close happens */ } }
if (ins != null) { try { ins.close(); } catch(Throwable t) { /* ensure close happens */ } }
}
Here is the Groovy version, with full error handling:
File f = new File("textfile.txt");
f.withReader("UTF-8") { br ->
br.eachLine { line ->
println line;
}
}
I documented and tested 10 different ways to read a file in Java and then ran them against each other by making them read in test files from 1KB to 1GB. Here are the fastest 3 file reading methods for reading a 1GB test file.
Note that when running the performance tests I didn't output anything to the console since that would really slow down the test. I just wanted to test the raw reading speed.
1) java.nio.file.Files.readAllBytes()
Tested in Java 7, 8, 9. This was overall the fastest method. Reading a 1GB file was consistently just under 1 second.
import java.io..File;
import java.io.IOException;
import java.nio.file.Files;
public class ReadFile_Files_ReadAllBytes {
public static void main(String [] pArgs) throws IOException {
String fileName = "c:\\temp\\sample-1GB.txt";
File file = new File(fileName);
byte [] fileBytes = Files.readAllBytes(file.toPath());
char singleChar;
for(byte b : fileBytes) {
singleChar = (char) b;
System.out.print(singleChar);
}
}
}
2) java.nio.file.Files.lines()
This was tested successfully in Java 8 and 9 but it won't work in Java 7 because of the lack of support for lambda expressions. It took about 3.5 seconds to read in a 1GB file which put it in second place as far as reading larger files.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;
public class ReadFile_Files_Lines {
public static void main(String[] pArgs) throws IOException {
String fileName = "c:\\temp\\sample-1GB.txt";
File file = new File(fileName);
try (Stream linesStream = Files.lines(file.toPath())) {
linesStream.forEach(line -> {
System.out.println(line);
});
}
}
}
3) BufferedReader
Tested to work in Java 7, 8, 9. This took about 4.5 seconds to read in a 1GB test file.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class ReadFile_BufferedReader_ReadLine {
public static void main(String [] args) throws IOException {
String fileName = "c:\\temp\\sample-1GB.txt";
FileReader fileReader = new FileReader(fileName);
try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
String line;
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
}
}
You can find the complete rankings for all 10 file reading methods here.
In Java 8, you could do:
try (Stream<String> lines = Files.lines (file, StandardCharsets.UTF_8))
{
for (String line : (Iterable<String>) lines::iterator)
{
;
}
}
Some notes: The stream returned by Files.lines
(unlike most streams) needs to be closed. For the reasons mentioned here I avoid using forEach()
. The strange code (Iterable<String>) lines::iterator
casts a Stream to an Iterable.
What you can do is scan the entire text using Scanner and go through the text line by line. Of course you should import the following:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public static void readText throws FileNotFoundException {
Scanner scan = new Scanner(new File("samplefilename.txt"));
while(scan.hasNextLine()){
String line = scan.nextLine();
//Here you can manipulate the string the way you want
}
}
Scanner basically scans all the text. The while loop is used to traverse through the entire text.
The .hasNextLine()
function is a boolean that returns true if there are still more lines in the text. The .nextLine()
function gives you an entire line as a String which you can then use the way you want. Try System.out.println(line)
to print the text.
Side Note: .txt is the file type text.
FileReader won't let you specify the encoding, use InputStreamReader
instead if you need to specify it:
try {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), "Cp1252"));
String line;
while ((line = br.readLine()) != null) {
// process the line.
}
br.close();
} catch (IOException e) {
e.printStackTrace();
}
If you imported this file from Windows, it might have ANSI encoding (Cp1252), so you have to specify the encoding.
In Java 7:
String folderPath = "C:/folderOfMyFile";
Path path = Paths.get(folderPath, "myFileName.csv"); //or any text file eg.: txt, bat, etc
Charset charset = Charset.forName("UTF-8");
try (BufferedReader reader = Files.newBufferedReader(path , charset)) {
while ((line = reader.readLine()) != null ) {
//separate all csv fields into string array
String[] lineVariables = line.split(",");
}
} catch (IOException e) {
System.err.println(e);
}
In Java 8, there is also an alternative to using Files.lines()
. If your input source isn't a file but something more abstract like a Reader
or an InputStream
, you can stream the lines via the BufferedReader
s lines()
method.
For example:
try (BufferedReader reader = new BufferedReader(...)) {
reader.lines().forEach(line -> processLine(line));
}
will call processLine()
for each input line read by the BufferedReader
.
For reading a file with Java 8
package com.java.java8;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;
/**
* The Class ReadLargeFile.
*
* @author Ankit Sood Apr 20, 2017
*/
public class ReadLargeFile {
/**
* The main method.
*
* @param args
* the arguments
*/
public static void main(String[] args) {
try {
Stream<String> stream = Files.lines(Paths.get("C:\\Users\\System\\Desktop\\demoData.txt"));
stream.forEach(System.out::println);
}
catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
You can use Scanner class
Scanner sc=new Scanner(file);
sc.nextLine();
Java 9:
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.forEach(System.out::println);
}
You need to use the readLine()
method in class BufferedReader
.
Create a new object from that class and operate this method on him and save it to a string.
BufferReader Javadoc
The clear way to achieve this,
For example:
If you have dataFile.txt
on your current directory
import java.io.*;
import java.util.Scanner;
import java.io.FileNotFoundException;
public class readByLine
{
public readByLine() throws FileNotFoundException
{
Scanner linReader = new Scanner(new File("dataFile.txt"));
while (linReader.hasNext())
{
String line = linReader.nextLine();
System.out.println(line);
}
linReader.close();
}
public static void main(String args[]) throws FileNotFoundException
{
new readByLine();
}
}
The output like as below,
BufferedReader br;
FileInputStream fin;
try {
fin = new FileInputStream(fileName);
br = new BufferedReader(new InputStreamReader(fin));
/*Path pathToFile = Paths.get(fileName);
br = Files.newBufferedReader(pathToFile,StandardCharsets.US_ASCII);*/
String line = br.readLine();
while (line != null) {
String[] attributes = line.split(",");
Movie movie = createMovie(attributes);
movies.add(movie);
line = br.readLine();
}
fin.close();
br.close();
} catch (FileNotFoundException e) {
System.out.println("Your Message");
} catch (IOException e) {
System.out.println("Your Message");
}
It works for me. Hope It will help you too.
You can use streams to do it more precisely:
Files.lines(Paths.get("input.txt")).forEach(s -> stringBuffer.append(s);
I usually do the reading routine straightforward:
void readResource(InputStream source) throws IOException {
BufferedReader stream = null;
try {
stream = new BufferedReader(new InputStreamReader(source));
while (true) {
String line = stream.readLine();
if(line == null) {
break;
}
//process line
System.out.println(line)
}
} finally {
closeQuiet(stream);
}
}
static void closeQuiet(Closeable closeable) {
if (closeable != null) {
try {
closeable.close();
} catch (IOException ignore) {
}
}
}
By using the org.apache.commons.io package, it gave more performance, especially in legacy code which uses Java 6 and below.
Java 7 has a better API with fewer exceptions handling and more useful methods:
LineIterator lineIterator = null;
try {
lineIterator = FileUtils.lineIterator(new File("/home/username/m.log"), "windows-1256"); // The second parameter is optionnal
while (lineIterator.hasNext()) {
String currentLine = lineIterator.next();
// Some operation
}
}
finally {
LineIterator.closeQuietly(lineIterator);
}
Maven
<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
You can use this code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class ReadTextFile {
public static void main(String[] args) throws IOException {
try {
File f = new File("src/com/data.txt");
BufferedReader b = new BufferedReader(new FileReader(f));
String readLine = "";
System.out.println("Reading file using Buffered Reader");
while ((readLine = b.readLine()) != null) {
System.out.println(readLine);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
You can also use Apache Commons IO:
File file = new File("/home/user/file.txt");
try {
List<String> lines = FileUtils.readLines(file);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
For Android developers ending up here (who use Kotlin):
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
file
.bufferedReader()
.lineSequence()
.forEach(::println)
Or:
val myFileUrl = object{}.javaClass.getResource("/vegetables.txt")
val file = File(myFileUrl.toURI())
file.useLines { lines ->
lines.forEach(::println)
}
Notes:
The vegetables.txt file should be in your classpath (for example, in src/main/resources directory)
The above solutions all treat the file encodings as
UTF-8
by default. You can specify your desired encoding as the argument for the functions.The above solutions do not need any further action like closing the files or readers. They are automatically taken care of by the Kotlin standard library.
精彩评论