Thread: Autocommit, isolation level, and vacuum behavior
I'm trying to understand the effect of autocommit on vacuum behavior (Postgres 8.3, if it matters).

Let's suppose you have two tables, BIG and TINY, in a database accessed through JDBC. BIG has lots of rows. There are inserts, updates, and every so often there is a scan of the entire table. The scan is slow, e.g. one row every 30 seconds. TINY has one row, which contains summary information from BIG. It is updated every time BIG is inserted or updated. BIG is vacuumed weekly, and TINY is vacuumed every 1000 updates.

What I'm observing is that as my test program runs, transactions (insert/update BIG; update TINY) get slower and slower, and the file storing the TINY table gets very big. I'm guessing that the long-running scan of BIG forces versions of the one row in TINY to accumulate (just in case the TINY table is viewed, the connection has to have the correct view). As these versions accumulate, each update to TINY takes more and more time, and everything slows down.

I wrote a little JDBC test program to test this theory. Long scans (with the 30 second sleep) with autocommit = false produce the problem described. Shorter scans (e.g. no sleep between rows of the BIG scan) produce better results. Also, if the scan is done on a connection with autocommit = true, everything works fine -- no slowdown, and no bloat of the TINY file.

Am I on the right track -- does autocommit = false for the BIG scan force versions of TINY to accumulate? I played around with a JDBC test program, and so far cannot see how the autocommit mode causes variations in what is seen by the scan. The behavior I've observed is consistent with the SERIALIZABLE isolation level, but 1) I thought the default was READ COMMITTED, and 2) why does the accumulation of row versions have anything to do with autocommit mode (as opposed to isolation level) on a connection used for the scan?

Jack Orenstein
On Wed, Sep 10, 2008 at 09:45:04AM -0400, Jack Orenstein wrote:
> Am I on the right track -- does autocommit = false for the BIG scan force
> versions of TINY to accumulate? I played around with a JDBC test program,
> and so far cannot see how the autocommit mode causes variations in what is
> seen by the scan. The behavior I've observed is consistent with the
> SERIALIZABLE isolation level, but 1) I thought the default was READ
> COMMITTED, and 2) why does the accumulation of row versions have anything
> to do with autocommit mode (as opposed to isolation level) on a connection
> used for the scan?

Vacuum can only clean up stuff older than the oldest open transaction. So if you have a transaction which is open for hours, then anything created since it started can't be vacuumed. The solution is: don't do that.

What I don't understand from your description is why your scan is slow and how the autocommit relates to this. PostgreSQL only cares about when you start and commit transactions, and I can't tell from your description when exactly that happens.

Rule of thumb: don't hold transactions open unnecessarily long.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
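[One way to see whether a long-lived transaction is holding back vacuum is to look at pg_stat_activity. A minimal diagnostic sketch, assuming sufficient privileges to see other sessions; the `procpid` and `current_query` column names are the 8.3-era ones (later releases renamed them `pid` and `query`), and `xact_start` itself is only available from 8.3 onward:]

```sql
-- Show the oldest open transactions; vacuum can only remove row
-- versions newer than the oldest xact_start listed here.
SELECT procpid, usename,
       now() - xact_start AS xact_age,
       current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```

[If the slow BIG scan shows up here with an xact_age of hours, its snapshot is what keeps the dead TINY row versions alive.]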
Martijn van Oosterhout wrote:
> Vacuum can only clean up stuff older than the oldest open transaction.
> So if you have a transaction which is open for hours, then anything
> created since it started can't be vacuumed. The solution is: don't do that.

Actually it's worse than that: older than the oldest transaction that was active at the time when the current oldest transaction created its snapshot.

As for autocommit, my guess is that the driver is doing "COMMIT; BEGIN". This should not cause much of a problem in 8.3 compared to previous releases, because the transaction gets its xid at the time the first write command is run (previously it was grabbed when the transaction started). Also, I thought recent versions of the JDBC driver did not issue the BEGIN right after COMMIT, so I'm surprised that there's any visible difference at all.

-- 
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Martijn van Oosterhout wrote:
> On Wed, Sep 10, 2008 at 09:45:04AM -0400, Jack Orenstein wrote:
>> Am I on the right track -- does autocommit = false for the BIG scan force
>> versions of TINY to accumulate? I played around with a JDBC test program,
>> and so far cannot see how the autocommit mode causes variations in what is
>> seen by the scan. The behavior I've observed is consistent with the
>> SERIALIZABLE isolation level, but 1) I thought the default was READ
>> COMMITTED, and 2) why does the accumulation of row versions have anything
>> to do with autocommit mode (as opposed to isolation level) on a connection
>> used for the scan?
>
> Vacuum can only clean up stuff older than the oldest open transaction.
> So if you have a transaction which is open for hours, then anything
> created since it started can't be vacuumed. The solution is: don't do that.
>
> What I don't understand from your description is why your scan is slow

Application requirement. We need to do something for each row retrieved from BIG, and that something is expensive. We do the scan slowly (30 second sleep inside the loop) to amortize the cost.

> and how the autocommit relates to this. PostgreSQL only cares about
> when you start and commit transactions, and I can't tell from your
> description when exactly that happens.

If the slow scan is done with autocommit = true, then the transactions updating BIG and TINY run with no degradation in performance (as long as TINY is vacuumed frequently). If the slow scan is done with autocommit = false, then the transactions updating BIG and TINY get slower and slower, and the TINY table's file bloats.

I guess the question is this: what are the transaction boundaries for a scan done with autocommit = false? (The connection has autocommit = false, and the connection is used for nothing but the scan.)

Jack
On 2008-09-10 16:46, Jack Orenstein wrote:
> Application requirement. We need to do something for each row retrieved
> from BIG and the something is expensive. We do the scan slowly (30 second
> sleep inside the loop) to amortize the cost.

Then do the processing in separate transactions, like this (in pseudocode):

    $last_id = -1;
    do {
        begin transaction;
        $result = select * from bigtable
                  where id > $last_id and processed = false
                  order by id limit 1;
        if ( empty($result) ) {
            rollback;
            break;
        }
        do_something_expensive_with($result[0]);
        update bigtable set processed = true where id = $result[0][id];
        commit;
        sleep 30;
    } while (true);

Always avoid long running transactions. This is recommended for any transactional database.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were... Winnie the Pooh
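[Since Jack's application is JDBC, Tometzky's pseudocode might translate roughly as follows. This is only a sketch: the connection URL, the `bigtable`/`id`/`processed` names, and `doSomethingExpensiveWith` are placeholders carried over from the pseudocode, not Jack's actual schema.]

```java
import java.sql.*;

public class SlowScan {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret");
        conn.setAutoCommit(false);   // we commit explicitly, once per row
        long lastId = -1;
        while (true) {
            PreparedStatement select = conn.prepareStatement(
                    "SELECT * FROM bigtable WHERE id > ? AND NOT processed " +
                    "ORDER BY id LIMIT 1");
            select.setLong(1, lastId);
            ResultSet rs = select.executeQuery();
            if (!rs.next()) {
                conn.rollback();     // close the final, empty transaction
                break;
            }
            lastId = rs.getLong("id");
            doSomethingExpensiveWith(rs);
            PreparedStatement update = conn.prepareStatement(
                    "UPDATE bigtable SET processed = true WHERE id = ?");
            update.setLong(1, lastId);
            update.executeUpdate();
            conn.commit();           // transaction (and its snapshot) ends here
            Thread.sleep(30000);     // the 30-second pause happens outside
                                     // any transaction, so vacuum never waits
        }
        conn.close();
    }

    static void doSomethingExpensiveWith(ResultSet row) throws SQLException {
        // placeholder for the expensive per-row work
    }
}
```

[The key difference from the problematic version is that the 30-second sleep happens between transactions rather than inside one, so no snapshot outlives a single iteration.]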
On 2008-09-11 17:21, Jack Orenstein wrote:
>> Then do the processing in separate transactions like this (in pseudocode):
> The id > last_id trick doesn't work for me -- I don't have an index that
> would support it efficiently.
>
> Turning on autocommit seems to work, I'm just not clear on the reason why.

Not knowing would bite you some time.

Please provide some (pseudo-)code for what you do. Do you mark rows as processed? Do you save the output of processing to the database? IMHO it is hard to solve the mystery without it, but I'll try below.

> I played around with a JDBC test program, and so far cannot see how
> the autocommit mode causes variations in what is seen by the
> scan. The behavior I've observed is consistent with the SERIALIZABLE
> isolation level, but 1) I thought the default was READ COMMITTED

When you do:

    result = query("select something from sometable")

then all rows of the result will be cached by the client program. To see the effects of isolation levels you would have to issue another query in the same transaction, or use a cursor. Check the memory usage of your client program - you'll see that it needs a lot of memory for query results.

> 2) why does the
> accumulation of row versions have anything to do with autocommit mode (as
> opposed to isolation level) on a connection used for the scan?

I think that after caching the result of the query you start processing your rows. When you finish processing your first row, you update the database to save the results. In autocommit mode, the transaction in which you do this update is automatically committed and ended. When autocommit is turned off, the transaction is not ended, so from then on vacuum cannot work until you finish processing all rows.

Regards
Tometzky

PS. Please keep a CC to the list.
-- 
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were... Winnie the Pooh
Tomasz Ostrowski wrote:
> On 2008-09-11 17:21, Jack Orenstein wrote:
>
>>> Then do the processing in separate transactions like this (in pseudocode):
>> The id > last_id trick doesn't work for me -- I don't have an index that
>> would support it efficiently.
>>
>> Turning on autocommit seems to work, I'm just not clear on the reason why.
>
> Not knowing would bite you some time.
>
> Please provide some (pseudo-)code on what you do. Do you mark rows as
> processed? Do you save output of processing to a database? IMHO without
> it it is hard to solve a mystery but I'll try below.

No, it's really as simple as what I said in earlier email. The scan just walks through BIG very slowly. On another connection, we're inserting/updating the same table, and in each transaction also updating TINY.

>> I played around with a JDBC test program, and so far cannot see how
>> the autocommit mode causes variations in what is seen by the
>> scan. The behavior I've observed is consistent with the SERIALIZABLE
>> isolation level, but 1) I thought the default was READ COMMITTED
>
> When you do:
>     result = query("select something from sometable")
> then all rows of a result will be cached by a client program.

I am very sure this is not happening. Maybe some rows are being cached (specifying fetch size), but certainly not all of them. It used to, with older drivers (7.4?), but I've been using 8.1 drivers (at least) for a long time. Maybe some result set options you're using cause such memory usage?

Jack
On 2008-09-11 18:03, Jack Orenstein wrote:
>> When you do:
>>     result = query("select something from sometable")
>> then all rows of a result will be cached by a client program.
>
> I am very sure this is not happening. Maybe some rows are being
> cached (specifying fetch size), but certainly not all of them. It
> used to, with older drivers (7.4?), but I've been using 8.1 drivers
> (at least) for a long time. Maybe some result set options you're
> using cause such memory usage?

Wanna bet?

http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
| There a number of restrictions which will make the driver silently
| fall back to fetching the whole ResultSet at once. (...) The
| Connection must not be in autocommit mode. The backend closes cursors
| at the end of transactions, so in autocommit mode the backend will
| have closed the cursor before anything can be fetched from it.

So, when you turn on autocommit, it is caching it all. Fetch size is ignored.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were... Winnie the Pooh
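[For reference, the cursor-based mode that the quoted documentation describes requires all of its conditions to hold at once. A sketch of a scan configured that way (the connection details and table name are placeholders); note that it is exactly this mode that keeps a transaction, and therefore a snapshot, open for the duration of the whole scan:]

```java
import java.sql.*;

public class CursorScan {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret");
        conn.setAutoCommit(false);             // required: the backend closes
                                               // cursors at transaction end
        Statement st = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,   // required: cursor-based
                ResultSet.CONCUR_READ_ONLY);   // fetching is forward-only
        st.setFetchSize(100);                  // a non-zero fetch size turns
                                               // cursor-based fetching on
        ResultSet rs = st.executeQuery("SELECT * FROM bigtable");
        while (rs.next()) {
            // rows arrive from the backend 100 at a time; the transaction
            // (and its snapshot) stays open until the commit below, which
            // is what pins old row versions and blocks vacuum
        }
        rs.close();
        conn.commit();                         // snapshot released here
        conn.close();
    }
}
```

[With autocommit on, none of this applies: the driver fetches the entire ResultSet in one go and the transaction ends immediately, which would explain why Jack's autocommit = true runs showed no bloat.]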
On 2008-09-11 17:21, Jack Orenstein wrote:
> The id > last_id trick doesn't work for me -- I don't have an index that
> would support it efficiently.

You do not have a primary key? If you do, then you have an index, as one is automatically created. Watch this:

    test=> create temporary table test ( id int primary key, data text );
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
    CREATE TABLE
    test=> insert into test (select i, 'this is a row number '||i::text
               from (select generate_series(1,1000000) as i) as q);
    INSERT 0 1000000
    test=> explain analyze select * from test where id>500000 order by id limit 1;
                                      QUERY PLAN
     Limit  (cost=0.00..0.07 rows=1 width=36) (actual time=0.150..0.151 rows=1 loops=1)
       ->  Index Scan using test_pkey on test  (cost=0.00..23769.63 rows=322248 width=36) (actual time=0.148..0.148 rows=1 loops=1)
             Index Cond: (id > 500000)
     Total runtime: 0.191 ms
    (4 rows)

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were... Winnie the Pooh
Tomasz Ostrowski wrote:
> On 2008-09-11 18:03, Jack Orenstein wrote:
>
>>> When you do:
>>>     result = query("select something from sometable")
>>> then all rows of a result will be cached by a client program.
>> I am very sure this is not happening. Maybe some rows are being
>> cached (specifying fetch size), but certainly not all of them. It
>> used to, with older drivers (7.4?), but I've been using 8.1 drivers
>> (at least) for a long time. Maybe some result set options you're
>> using cause such memory usage?
>
> Wanna bet?
>
> http://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor
> | There a number of restrictions which will make the driver silently
> | fall back to fetching the whole ResultSet at once. (...) The
> | Connection must not be in autocommit mode. The backend closes cursors
> | at the end of transactions, so in autocommit mode the backend will
> | have closed the cursor before anything can be fetched from it.
>
> So, when you turn on autocommit, it is caching it all. Fetch size is
> ignored.

Well, that explains what I've been seeing (autocommit = false on the scan connection producing behavior that looks like SERIALIZABLE). Not the behavior I would prefer, but I understand it now.

Jack
Tomasz Ostrowski wrote:
> On 2008-09-11 17:21, Jack Orenstein wrote:
>
>> The id > last_id trick doesn't work for me -- I don't have an index that
>> would support it efficiently.
>
> You do not have a primary key? If you do then you have an index as it is
> automatically created.

Sorry, I misspoke. I have an index, but preferred doing a scan without the index in this case.

Jack
On 2008-09-12 15:52, Jack Orenstein wrote:
> Sorry, I misspoke. I have an index, but preferred doing a scan without
> the index in this case.

Why?

The only reason I can think of is that you'd like to avoid disk seeking. But you get at most 1 row in 30 seconds, so disk latency (only several milliseconds) can be ignored.

Regards
Tometzky
-- 
...although Eating Honey was a very good thing to do, there was a moment just before you began to eat it which was better than when you were... Winnie the Pooh
On Sep 13, 2008, at 4:39 AM, Tomasz Ostrowski wrote:
> On 2008-09-12 15:52, Jack Orenstein wrote:
>
>> Sorry, I misspoke. I have an index, but preferred doing a scan
>> without the index in this case.
>
> Why?
>
> The only reason I can think of is that you'd like to avoid disk
> seeking. But you get at most 1 row in 30 seconds, so disk latency
> (only several milliseconds) can be ignored.

Because other parts of our application consume results from the same query at normal speed.

Jack