I have a problem processing all rows from a database (PostgreSQL). I get this error: org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
I think I need to read all rows in small pieces, but it doesn't work: it only reads 100 rows (code below). How can I do that?
int i = 0;
Statement s = connection.createStatement();
s.setMaxRows(100); // because of: org.postgresql.util.PSQLException: Ran out of memory retrieving query results.
ResultSet rs = s.executeQuery("select * from " + tabName);
for (;;) {
    while (rs.next()) {
        i++;
        // do something...
    }
    if ((s.getMoreResults() == false) && (s.getUpdateCount() == -1)) {
        break;
    }
}
The short version is: call stmt.setFetchSize(50); and conn.setAutoCommit(false); to avoid reading the entire ResultSet into memory.
Here's what the docs say:
Getting results based on a cursor
By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.
A small number of rows are cached on the client side of the connection and when exhausted the next block of rows is retrieved by repositioning the cursor.
Note:
Cursor based ResultSets cannot be used in all situations. There are a number of restrictions which will make the driver silently fall back to fetching the whole ResultSet at once.
The connection to the server must be using the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later.
The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.
The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.
The query given must be a single statement, not multiple statements strung together with semicolons.
Example 5.2. Setting fetch size to turn cursors on and off.
Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).
// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();

// Turn use of the cursor on.
st.setFetchSize(50);
ResultSet rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
    System.out.print("a row was returned.");
}
rs.close();

// Turn the cursor off.
st.setFetchSize(0);
rs = st.executeQuery("SELECT * FROM mytable");
while (rs.next()) {
    System.out.print("many rows were returned.");
}
rs.close();

// Close the statement.
st.close();
Use a CURSOR in PostgreSQL, or let the JDBC driver handle this for you (see the sketch below).
LIMIT and OFFSET will get slow when handling large datasets.
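If you want to drive the cursor yourself, a minimal sketch of the explicit-CURSOR approach from JDBC could look like the following. The connection details, the table name mytable, the cursor name big_cursor, and the batch size of 1000 are all placeholder assumptions for illustration, not something from the original answer:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplicitCursorExample {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
        conn.setAutoCommit(false); // cursors only live inside a transaction

        try (Statement st = conn.createStatement()) {
            // Declare a server-side cursor for the query.
            st.execute("DECLARE big_cursor CURSOR FOR SELECT * FROM mytable");
            while (true) {
                // Fetch the next batch of rows through the cursor.
                ResultSet rs = st.executeQuery("FETCH FORWARD 1000 FROM big_cursor");
                int rows = 0;
                while (rs.next()) {
                    rows++;
                    // process the current row here
                }
                rs.close();
                if (rows == 0) {
                    break; // cursor exhausted
                }
            }
            st.execute("CLOSE big_cursor");
        }
        conn.commit();
        conn.close();
    }
}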
So it turns out that the crux of the problem is that, by default, a Postgres JDBC connection runs in autoCommit mode, and the driver needs cursors to be able to "page" through data (e.g. read the first 10K rows, then the next block, then the next), but cursors can only exist within a transaction. So the default is to always read all rows into RAM, and only then let your program process "the first result row, then the second", for two reasons: the connection is not in a transaction (so cursors don't work), and no fetch size has been set.
The way the psql command-line tool achieves batched responses for queries (its FETCH_COUNT setting) is to "wrap" its select queries in a short-lived transaction (if one isn't already open), so that cursors can work. You can do something similar with JDBC:
static void readLargeQueryInChunksJdbcWay(Connection conn, String originalQuery, int fetchCount,
        ConsumerWithException<ResultSet, SQLException> consumer) throws SQLException {
    boolean originalAutoCommit = conn.getAutoCommit();
    if (originalAutoCommit) {
        conn.setAutoCommit(false); // start temp transaction
    }
    try (Statement statement = conn.createStatement()) {
        statement.setFetchSize(fetchCount);
        ResultSet rs = statement.executeQuery(originalQuery);
        while (rs.next()) {
            consumer.accept(rs); // or just do your work here
        }
        rs.close();
    } finally {
        if (originalAutoCommit) {
            conn.setAutoCommit(true); // reset it; this also ends (commits) the temp transaction
        }
    }
}

@FunctionalInterface
public interface ConsumerWithException<T, E extends Exception> {
    void accept(T t) throws E;
}
This gives the benefit of requiring less RAM and, in my results, seemed to run faster overall, even if you don't need to save the RAM. Weird. It also gives the benefit that your processing of the first row "starts faster" (since rows are processed a page at a time).
And here's how to do it the "raw postgres cursor" way, along with full demo code, though in my experiments the JDBC way above seemed slightly faster, for whatever reason.
Another option would be to have autoCommit mode off everywhere, though you would still have to manually specify a fetch size for each new Statement (or you can set a default fetch size in the connection URL string).
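For example, with the PostgreSQL JDBC driver the relevant connection parameter is defaultRowFetchSize. A minimal sketch, assuming placeholder host, database, credentials, and table name:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DefaultFetchSizeExample {
    public static void main(String[] args) throws Exception {
        // Every Statement created on this connection defaults to a fetch size of 50.
        String url = "jdbc:postgresql://localhost:5432/mydb?defaultRowFetchSize=50";
        try (Connection conn = DriverManager.getConnection(url, "user", "pass")) {
            conn.setAutoCommit(false); // still required for cursor-based fetching
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT * FROM mytable")) {
                while (rs.next()) {
                    // rows arrive in batches of 50 instead of all at once
                }
            }
            conn.commit();
        }
    }
}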
I think your question is similar to this thread: JDBC Pagination, which contains solutions for your need.
In particular, for PostgreSQL, you can use the LIMIT and OFFSET keywords in your query: http://www.petefreitag.com/item/451.cfm
PS: In Java code, I suggest you use PreparedStatement instead of plain Statements: http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html
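A minimal sketch combining both suggestions (LIMIT/OFFSET paging through a PreparedStatement). The table name mytable, the id ordering column, and the page size are assumptions for illustration, and note that large OFFSETs get slow because the server still has to scan past the skipped rows:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LimitOffsetExample {
    public static void main(String[] args) throws Exception {
        int pageSize = 100;
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM mytable ORDER BY id LIMIT ? OFFSET ?")) {
            ps.setInt(1, pageSize);
            for (int offset = 0; ; offset += pageSize) {
                ps.setInt(2, offset);
                int rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows++;
                        // process the current row here
                    }
                }
                if (rows < pageSize) {
                    break; // last page reached
                }
            }
        }
    }
}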
I did it like below. Not the best way, I think, but it works :)
Connection c = DriverManager.getConnection("jdbc:postgresql://....");
PreparedStatement s = c.prepareStatement("select * from " + tabName + " where id > ? order by id");
s.setMaxRows(100);
int lastId = 0;
for (;;) {
    s.setInt(1, lastId);
    ResultSet rs = s.executeQuery();
    int lastIdBefore = lastId;
    while (rs.next()) {
        lastId = rs.getInt(1); // remember the last id seen, so the next page starts after it
        // ...
    }
    rs.close();
    if (lastIdBefore == lastId) {
        break; // no new rows were returned, so we are done
    }
}
At least in my case, the problem was on the client that tried to fetch the results.
I wanted to get a .csv with ALL the results.
I found the solution by using
psql -U postgres -d dbname -c "COPY (SELECT * FROM T) TO STDOUT WITH DELIMITER ','"
(where dbname is the name of the db...) and redirecting the output to a file.
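For example (the output file name is just a placeholder):

psql -U postgres -d dbname -c "COPY (SELECT * FROM T) TO STDOUT WITH DELIMITER ','" > results.csv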