Thread: Read only transactions - Commit or Rollback
Hello,

We have a database containing PostGIS MAP data; it is accessed mainly via JDBC. There are multiple simultaneous read-only connections taken from the JBoss connection pool, and there usually are no active writers. We use connection.setReadOnly(true).

Now my question is what is best performance-wise, if it makes any difference at all:

- Having autocommit on or off? (I presume "off")
- Using COMMIT or ROLLBACK?
- Committing / rolling back occasionally (e.g. when returning the connection to the pool) or not at all (until the pool closes the connection)?

Thanks,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
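For context, a minimal JDBC sketch of the access pattern described above; the JNDI name of the JBoss pool, the table name, and the query are placeholders, not the actual application code:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.sql.DataSource;

    public class MapReader {
        public void readFeatures() throws NamingException, SQLException {
            // Borrow a connection from the JBoss-managed pool
            // ("java:/MapDS" is a hypothetical JNDI name).
            DataSource ds = (DataSource) new InitialContext().lookup("java:/MapDS");
            try (Connection conn = ds.getConnection()) {
                conn.setReadOnly(true);    // as described above
                conn.setAutoCommit(false); // presumed "off"
                try (Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                             "SELECT id FROM map_features LIMIT 10")) {
                    while (rs.next()) {
                        System.out.println(rs.getLong(1));
                    }
                } finally {
                    conn.rollback(); // or commit() -- exactly the question above
                }
            }
        }
    }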
afaik, this should be completely negligible.

Starting a transaction implies write access. If there is none, you do not need to think about transactions, because there are none. Postgres needs to schedule the writing transactions with the reading ones anyway.

But I am not that much of a performance professional anyway ;-)

regards,
Marcus
Nörder-Tuitje wrote:
|> Now my question is what is best performance-wise, if it makes any
|> difference at all: Having autocommit on or off?
|
| afaik, this should be completely negligible.
|
| starting a transaction implies write access. if there is none, you do
| not need to think about transactions, because there are none.

Hello, Marcus, Nörder, list.

What about isolation? For several dependent calculations, you don't get a single consistent MVCC snapshot with autocommit turned on, right?

Cheers,
-- 
~ Grega Bremec
~ gregab at p0f dot net
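To make the isolation concern concrete, a hypothetical sketch (table and column names are invented): with autocommit on, each query runs in its own transaction and gets its own snapshot, so a writer committing between the two queries can make them disagree; turning autocommit off and requesting SERIALIZABLE (plain snapshot isolation on PostgreSQL of that era) pins one snapshot for both.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class SnapshotDemo {
        // Two dependent reads that should see the same data.
        static long averageLength(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                long sum, count;
                try (ResultSet rs = st.executeQuery(
                        "SELECT coalesce(sum(len), 0) FROM map_features")) {
                    rs.next();
                    sum = rs.getLong(1);
                }
                // With autocommit on, a writer may commit right here, so
                // sum and count would come from two different snapshots.
                try (ResultSet rs = st.executeQuery(
                        "SELECT count(*) FROM map_features")) {
                    rs.next();
                    count = rs.getLong(1);
                }
                return count == 0 ? 0 : sum / count;
            }
        }

        static long consistentAverageLength(Connection conn) throws SQLException {
            conn.setAutoCommit(false);
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            try {
                return averageLength(conn); // both queries share one snapshot
            } finally {
                conn.rollback();
            }
        }
    }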
Mmmm, good question.

MVCC blocks reading processes when data is modified. Using autocommit implies that each modification statement is an atomic operation. On a massive read-only table, where no data is altered, MVCC shouldn't have any effect (but this is only an assumption, based on http://en.wikipedia.org/wiki/Mvcc).

Using row-level locks for write access should leave most of the table available to read-only sessions, but this is an assumption only, too.

Maybe the community knows a little more ;-)

regards,
marcus
Hi, Marcus,

Nörder-Tuitje wrote:
> afaik, this should be completely negligible.
>
> starting a transaction implies write access. if there is none, you do
> not need to think about transactions, because there are none.

Hmm, I always thought that the transaction will be opened at the first statement, because there _could_ be a parallel writing transaction started later.

> postgres needs to schedule the writing transactions with the reading
> ones, anyway.

As I said, there usually are no writing transactions on the same database.

Btw, there's another setting that might make a difference: having the transaction isolation level SERIALIZABLE or READ COMMITTED?

Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Markus Schaber wrote:
> We have a database containing PostGIS MAP data; it is accessed mainly
> via JDBC. [...]
>
> Having autocommit on or off? (I presume "off")

If you are using large ResultSets, it is interesting to know that Statement.setFetchSize() does not do anything as long as you have autocommit on. So you might want to always disable autocommit and set a reasonable fetch size with large results, or otherwise run into serious memory problems in Java/JDBC.
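A minimal sketch of what that looks like with the PostgreSQL JDBC driver; the URL, credentials, table, and batch size are illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class FetchSizeDemo {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mapdb", "user", "secret")) {
                // With autocommit on, the driver materializes the whole
                // ResultSet in memory and ignores the fetch size; with it
                // off, it can use a server-side cursor.
                conn.setAutoCommit(false);
                try (Statement st = conn.createStatement()) {
                    st.setFetchSize(1000); // stream 1000 rows per round trip
                    try (ResultSet rs = st.executeQuery(
                            "SELECT id FROM map_features")) {
                        while (rs.next()) {
                            // process each row without holding all of them
                            long id = rs.getLong(1);
                        }
                    }
                } finally {
                    conn.rollback(); // end the transaction (and the cursor)
                }
            }
        }
    }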
Markus Schaber writes:
> As I said, there usually are no writing transactions on the same database.
>
> Btw, there's another setting that might make a difference: having the
> transaction isolation level SERIALIZABLE or READ COMMITTED?

Well, if nonrepeatable or phantom reads would pose a problem because of those occasional writes, you wouldn't be considering autocommit for performance reasons either, would you?

regards,
Andreas
Hello, Andreas,

Andreas Seltenreich wrote:
>> Btw, there's another setting that might make a difference: having the
>> transaction isolation level SERIALIZABLE or READ COMMITTED?
>
> Well, if nonrepeatable or phantom reads would pose a problem because
> of those occasional writes, you wouldn't be considering autocommit for
> performance reasons either, would you?

Yes, the question is purely performance-wise. We don't care about any read/write conflicts in this special case.

Some time ago, I ran some tests with large bulk insertions, and it turned out that SERIALIZABLE seemed to be 30% faster, which surprised us. That's why I ask these questions, and mainly because we currently cannot do much benchmarking.

Thanks,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
On 12/20/05, Nörder-Tuitje, Marcus <noerder-tuitje@technology.de> wrote:
> MVCC blocks reading processes when data is modified.

That is incorrect. The main difference between 2PL and MVCC is that readers are never blocked under MVCC.

greetings,
Nicolas
-- 
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html
Markus Schaber <schabi@logix-tt.com> writes:
> Some time ago, I ran some tests with large bulk insertions, and it
> turned out that SERIALIZABLE seemed to be 30% faster, which surprised us.

That surprises me too --- can you provide details on the test case so other people can reproduce it? AFAIR the only performance difference between SERIALIZABLE and READ COMMITTED is the frequency with which transaction status snapshots are taken; your report suggests you were spending 30% of the time in GetSnapshotData, which is a lot higher than I've ever seen in a profile.

As to the original question, a transaction that hasn't modified the database does not bother to write either a commit or abort record to pg_xlog. I think you'd be very hard pressed to measure any speed difference between saying COMMIT and saying ROLLBACK after a read-only transaction. It'd be worth your while to let transactions run longer to minimize their startup/shutdown overhead, but there's a point of diminishing returns --- you don't want client code leaving transactions open for hours, because of the negative side-effects of holding locks that long (eg, VACUUM can't reclaim dead rows).

regards, tom lane
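A sketch of that advice under stated assumptions (table name, query, and batch size are invented): amortize the per-transaction overhead over a bounded bunch of queries, then end the transaction so its snapshot doesn't keep VACUUM from reclaiming dead rows.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchedReads {
        static final int QUERIES_PER_TX = 500;

        static void lookupAll(Connection conn, List<Long> ids) throws SQLException {
            conn.setReadOnly(true);
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT name FROM map_features WHERE id = ?")) {
                int inTx = 0;
                for (long id : ids) {
                    ps.setLong(1, id);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getString(1));
                        }
                    }
                    if (++inTx == QUERIES_PER_TX) {
                        conn.rollback(); // read-only: as cheap as COMMIT
                        inTx = 0;
                    }
                }
            } finally {
                conn.rollback(); // never leave the transaction open for hours
            }
        }
    }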
Tom Lane <tgl@sss.pgh.pa.us> writes:
> That surprises me too --- can you provide details on the test case so
> other people can reproduce it? AFAIR the only performance difference
> between SERIALIZABLE and READ COMMITTED is the frequency with which
> transaction status snapshots are taken; your report suggests you were
> spending 30% of the time in GetSnapshotData, which is a lot higher than
> I've ever seen in a profile.

Perhaps it reduced the amount of i/o concurrent vacuums were doing?

-- 
greg
Hi, Tom,

Tom Lane wrote:
>> Some time ago, I ran some tests with large bulk insertions, and it
>> turned out that SERIALIZABLE seemed to be 30% faster, which surprised us.
>
> That surprises me too --- can you provide details on the test case so
> other people can reproduce it?

It was in my previous job two years ago, so I don't have access to the exact code, and my memory is foggy. It was PostGIS 0.8 and PostgreSQL 7.4.

AFAIR, it was inserting into a table with about 6 columns and some indices, some columns having database-provided values (now() and a SERIAL column), while the other columns (a PostGIS Point, a long, and a foreign key into another table) were set via the application.

We tried different insertion methods (INSERT, prepared statements, a pgjdbc patch to allow COPY support), different batch sizes and different numbers of parallel connections to get the highest overall insert speed. However, the project never went productive the way it was designed initially.

As you write about transaction snapshots: it may be that the PostgreSQL config was not optimized well enough, and the hard disk was rather slow.

> As to the original question, a transaction that hasn't modified the
> database does not bother to write either a commit or abort record to
> pg_xlog. I think you'd be very hard pressed to measure any speed
> difference between saying COMMIT and saying ROLLBACK after a read-only
> transaction. It'd be worth your while to let transactions run longer
> to minimize their startup/shutdown overhead, but there's a point of
> diminishing returns --- you don't want client code leaving transactions
> open for hours, because of the negative side-effects of holding locks
> that long (eg, VACUUM can't reclaim dead rows).

Okay, so I'll stick with my current behaviour (autocommit off and ROLLBACK after each bunch of work).

Thanks,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
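Since the exact code is lost, a rough reconstruction of that kind of benchmark, under stated assumptions: the table, columns, values, and SRID are invented, ST_GeomFromText is the modern PostGIS spelling (0.8 used GeomFromText), and the batch size is arbitrary.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class BulkInsertBench {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mapdb", "user", "secret")) {
                conn.setAutoCommit(false);
                // The setting under test: SERIALIZABLE vs READ COMMITTED.
                conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                // id (SERIAL) and ts (default now()) are database-provided;
                // the point, the long and the foreign key come from the app.
                String sql = "INSERT INTO positions (geom, vehicle_id, route_id) "
                           + "VALUES (ST_GeomFromText(?, 4326), ?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (int i = 0; i < 100_000; i++) {
                        ps.setString(1, "POINT(8.4 49.0)");
                        ps.setLong(2, 42L);
                        ps.setInt(3, 7);
                        ps.addBatch();
                        if (i % 1000 == 999) {
                            ps.executeBatch();
                            conn.commit(); // one commit per bunch of rows
                        }
                    }
                    ps.executeBatch();
                    conn.commit();
                }
            }
        }
    }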
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> That surprises me too --- can you provide details on the test case so
>> other people can reproduce it?
>
> Perhaps it reduced the amount of i/o concurrent vacuums were doing?

Can't see how it would do that.

regards, tom lane