Thread: COPY with hints, rebirth
A long time ago, in a galaxy far away, we discussed ways to speed up data loads/COPY. http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php In particular, the idea that we could mark tuples as committed while we are still loading them, to avoid negative behaviour for the first reader. Simple patch to implement this is attached, together with test case. Current behaviour is shown here Run COPY and then... SELECT count(*) FROM table with no indexes 1st SELECT Time: 1518.571 ms <--- slowed dramatically by setting hint bits 2nd SELECT Time: 914.141 ms 3rd SELECT Time: 914.921 ms With this patch I observed the following results 1st SELECT Time: 890.820 ms 2nd SELECT Time: 884.799 ms 3rd SELECT Time: 882.405 ms What exactly does it do? Previously, we optimised COPY when it was loading data into a newly created table or a freshly truncated table. This patch extends that and actually sets the tuple header flag as HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of code. The patch also adds some tests for corner cases that would make that action break MVCC - though those cases are minor and typical data loads will benefit fully from this. In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC() and adding current xid to snapshots. That is an invasive change that I would wish to avoid at any time and explains the long delay in tackling this. The way I've implemented it, is just as a short test during XidInMVCCSnapshot() so that we trap the case when the xid == xmax and so would appear to be running. This is much less invasive and just as performant as Tom's original suggestion. Why do we need this now? Setting checksums on page requires us to write WAL for hints, so the situation of the 1st SELECT after a load would get somewhat worse when page_checksums are enabled, but we already know there is a price. However, this is a situation we can solve, and add value for all cases, not just when checksums enabled. So I'm posting this as a separate patch rather than including that as a tuning feature of the checksums patch. Your input will be generously received, -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Simon Riggs <simon@2ndQuadrant.com> wrote: > This patch extends that and actually sets the tuple header flag as > HEAP_XMIN_COMMITTED during the load. Fantastic! So, without bulk-load conditions, a long-lived tuple in PostgreSQL is written to disk at least five times[1]: (1) The WAL record for the inserted tuple is written. (2) The inserted tuple is written. (3) The HEAP_XMIN_COMMITTED bit is set and the tuple is re-written in place some time after the inserting transaction'sCOMMIT. (4) The WAL record for the "freeze" in write 5 is written. (5) The xmin is set to frozen and the tuple is rewritten in place some time after every other connection can see it. Prior to your patch, bulk load omitted write 1. With your patch we will also omit write 3. Since you've just been looking at this area, do you have any thoughts about writes 4 and 5 being rendered unnecessary by writing bulk-loaded tuples with a frozen xmin, and having transactions with a snapshot which doesn't include the bulk load's transaction just not seeing the table? (Or am I just dreaming about those?) -Kevin [1] If you are archiving, it could be more.
On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Simon Riggs <simon@2ndQuadrant.com> wrote: > >> This patch extends that and actually sets the tuple header flag as >> HEAP_XMIN_COMMITTED during the load. > > Fantastic! > > So, without bulk-load conditions, a long-lived tuple in PostgreSQL > is written to disk at least five times[1]: > > (1) The WAL record for the inserted tuple is written. > (2) The inserted tuple is written. > (3) The HEAP_XMIN_COMMITTED bit is set and the tuple is re-written > in place some time after the inserting transaction's COMMIT. > (4) The WAL record for the "freeze" in write 5 is written. > (5) The xmin is set to frozen and the tuple is rewritten in place > some time after every other connection can see it. > > Prior to your patch, bulk load omitted write 1. With your patch we > will also omit write 3. Yes, well explained. > Since you've just been looking at this area, do you have any > thoughts about writes 4 and 5 being rendered unnecessary by writing > bulk-loaded tuples with a frozen xmin, and having transactions with > a snapshot which doesn't include the bulk load's transaction just > not seeing the table? (Or am I just dreaming about those?) Setting straight to frozen breaks MVCC, unless/until we use MVCC for catalog access because we can see the table immediately and then read the contents as if they had always been there. I think we could add that as an option on COPY, since "breaking MVCC" in that way is only a bad thing if it happens accidentally without the user's permission and knowledge - which they may wish to give in many cases, such as reloading a database from a dump. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 24.02.2012 22:55, Simon Riggs wrote: > A long time ago, in a galaxy far away, we discussed ways to speed up > data loads/COPY. > http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php > > In particular, the idea that we could mark tuples as committed while > we are still loading them, to avoid negative behaviour for the first > reader. > > Simple patch to implement this is attached, together with test case. > > ... > > What exactly does it do? Previously, we optimised COPY when it was > loading data into a newly created table or a freshly truncated table. > This patch extends that and actually sets the tuple header flag as > HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of > code. The patch also adds some tests for corner cases that would make > that action break MVCC - though those cases are minor and typical data > loads will benefit fully from this. This doesn't work with subtransactions: postgres=# create table a as select 1 as id; SELECT 1 postgres=# copy a to '/tmp/a'; COPY 1 postgres=# begin; BEGIN postgres=# truncate a; TRUNCATE TABLE postgres=# savepoint sp1; SAVEPOINT postgres=# copy a from '/tmp/a'; COPY 1 postgres=# select * from a; id ---- (0 rows) The query should return the row copied in the same subtransaction. > In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC() > and adding current xid to snapshots. That is an invasive change that I > would wish to avoid at any time and explains the long delay in > tackling this. The way I've implemented it, is just as a short test > during XidInMVCCSnapshot() so that we trap the case when the xid == > xmax and so would appear to be running. This is much less invasive and > just as performant as Tom's original suggestion. TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a lot of subtransactions open... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sat, Feb 25, 2012 at 6:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, Feb 25, 2012 at 6:24 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >> Simon Riggs <simon@2ndQuadrant.com> wrote: >> >>> This patch extends that and actually sets the tuple header flag as >>> HEAP_XMIN_COMMITTED during the load. >> >> Fantastic! > I think we could add that as an option on COPY, since "breaking MVCC" > in that way is only a bad thing if it happens accidentally without the > user's permission and knowledge - which they may wish to give in many > cases, such as reloading a database from a dump. I've coded up COPY FREEZE to do just that. This version doesn't require any changes to transaction machinery. But it is very effective at avoiding 4 out of the 5 writes you mention. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > But it is very effective at avoiding 4 out of the 5 writes you mention. For the "common case," would we not want to have (1) [WAL] and (2) [Writing the pre-frozen tuple]? If we only write the tuple (2), and don't capture WAL, then the COPY wouldn't be replicable, right? -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Wed, Feb 29, 2012 at 6:14 PM, Christopher Browne <cbbrowne@gmail.com> wrote: > On Wed, Feb 29, 2012 at 11:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> But it is very effective at avoiding 4 out of the 5 writes you mention. > > For the "common case," would we not want to have (1) [WAL] and (2) > [Writing the pre-frozen tuple]? > > If we only write the tuple (2), and don't capture WAL, then the COPY > wouldn't be replicable, right? Well, my answer is a question: how would you like it to work? The way I coded it is that it will still write WAL if wal_level is set, so it would be replicable. So it only works when writing to a newly created table but is otherwise separate to whether WAL is skipped. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 24.02.2012 22:55, Simon Riggs wrote: >> >> A long time ago, in a galaxy far away, we discussed ways to speed up >> data loads/COPY. >> http://archives.postgresql.org/pgsql-hackers/2007-01/msg00470.php >> >> In particular, the idea that we could mark tuples as committed while >> we are still loading them, to avoid negative behaviour for the first >> reader. >> >> Simple patch to implement this is attached, together with test case. >> >> ... >> >> >> What exactly does it do? Previously, we optimised COPY when it was >> loading data into a newly created table or a freshly truncated table. >> This patch extends that and actually sets the tuple header flag as >> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of >> code. The patch also adds some tests for corner cases that would make >> that action break MVCC - though those cases are minor and typical data >> loads will benefit fully from this. > > > This doesn't work with subtransactions: ... > The query should return the row copied in the same subtransaction. Thanks for pointing that out. New patch with corrected logic and test case attached. >> In the link above, Tom suggested reworking HeapTupleSatisfiesMVCC() >> and adding current xid to snapshots. That is an invasive change that I >> would wish to avoid at any time and explains the long delay in >> tackling this. The way I've implemented it, is just as a short test >> during XidInMVCCSnapshot() so that we trap the case when the xid == >> xmax and so would appear to be running. This is much less invasive and >> just as performant as Tom's original suggestion. > > > TransactionIdIsCurrentTransactionId() can be fairly expensive if you have a > lot of subtransactions open... I've put in something to avoid that cost for the common case - just a boolean. This seems like the best plan rather than the explicit FREEZE option. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 01.03.2012 18:40, Simon Riggs wrote: > On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 24.02.2012 22:55, Simon Riggs wrote: >>> >>> What exactly does it do? Previously, we optimised COPY when it was >>> loading data into a newly created table or a freshly truncated table. >>> This patch extends that and actually sets the tuple header flag as >>> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of >>> code. The patch also adds some tests for corner cases that would make >>> that action break MVCC - though those cases are minor and typical data >>> loads will benefit fully from this. >> >> This doesn't work with subtransactions: > ... >> The query should return the row copied in the same subtransaction. > > Thanks for pointing that out. > > New patch with corrected logic and test case attached. It's still broken: -- create test table and file create table a as select 1 as id; copy a to '/tmp/a'; -- start test postgres=# begin; BEGIN postgres=# truncate a; TRUNCATE TABLE postgres=# savepoint sp1; SAVEPOINT postgres=# copy a from '/tmp/a'; COPY 1 postgres=# select * from a; id ---- 1 (1 row) postgres=# rollback to savepoint sp1; ROLLBACK postgres=# select * from a; id ---- 1 (1 row) That last select should not have seen the tuple. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 01.03.2012 18:40, Simon Riggs wrote: >> >> On Sun, Feb 26, 2012 at 7:16 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> >>> On 24.02.2012 22:55, Simon Riggs wrote: >>>> >>>> >>>> What exactly does it do? Previously, we optimised COPY when it was >>>> loading data into a newly created table or a freshly truncated table. >>>> This patch extends that and actually sets the tuple header flag as >>>> HEAP_XMIN_COMMITTED during the load. Doing so is simple 2 lines of >>>> code. The patch also adds some tests for corner cases that would make >>>> that action break MVCC - though those cases are minor and typical data >>>> loads will benefit fully from this. >>> >>> >>> This doesn't work with subtransactions: >> >> ... >>> >>> The query should return the row copied in the same subtransaction. >> >> >> Thanks for pointing that out. >> >> New patch with corrected logic and test case attached. > > > It's still broken: Agreed. Thanks for your detailed input. Performance test results show that playing with XidInMVCCSnapshot() causes a small but growing performance regression that I find unacceptable, so while I might fix the above, I doubt if I can improve this as well. So this approach isn't the one... The COPY FREEZE patch provides a way for the user to say explicitly that they don't really care about these MVCC corner cases and as a result allows us to avoid touching XidInMVCCSnapshot() at all. So there is still a patch on the table. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote: > On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > > It's still broken: [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] > So this approach isn't the one... > > The COPY FREEZE patch provides a way for the user to say explicitly > that they don't really care about these MVCC corner cases and as a > result allows us to avoid touching XidInMVCCSnapshot() at all. So > there is still a patch on the table. You can salvage the optimization by tightening its prerequisite: use it when the current subtransaction or a child thereof created or truncated the table. A parent subtransaction having done so is acceptable for the WAL avoidance optimization but not for this. A COPY FREEZE ignoring that stronger restriction would be an interesting feature in its own right, but it brings up other problems. For example, suppose you write a tuple, then fail while writing its index entries. The tuple is already frozen and visible, but it lacks a full set of live index entries. The subtransaction aborts, but the top transaction commits and makes the situation permanent. Incidentally, I contend that we should write frozen tuples to new/truncated tables unconditionally. The current behavior of making old snapshots see the table as empty violates atomicity at least as badly as letting those snapshots see the future-snapshot contents. But Marti has a sound proposal that would interact with your efforts here to avoid violating atomicity at all: http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com Thanks, nm
Noah Misch <noah@leadboat.com> wrote: > Incidentally, I contend that we should write frozen tuples to > new/truncated tables unconditionally. +1 > The current behavior of making old snapshots see the table as > empty violates atomicity at least as badly as letting those > snapshots see the future-snapshot contents. Right, there was no point where the table existed as empty at the end of a transaction, so it is quite broken as things stand now. > But Marti has a sound proposal that would interact with your > efforts here to avoid violating atomicity at all Well, getting it right is certainly better than moving from a slow non-conforming behavior to a fast non-conforming behavior. ;-) -Kevin
On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch <noah@leadboat.com> wrote: > On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote: >> On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> > It's still broken: > > [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] > >> So this approach isn't the one... >> >> The COPY FREEZE patch provides a way for the user to say explicitly >> that they don't really care about these MVCC corner cases and as a >> result allows us to avoid touching XidInMVCCSnapshot() at all. So >> there is still a patch on the table. > > You can salvage the optimization by tightening its prerequisite: use it when > the current subtransaction or a child thereof created or truncated the table. > A parent subtransaction having done so is acceptable for the WAL avoidance > optimization but not for this. That's already what it does. The problem is what happens after the COPY. If we said "you can't see the rows you just loaded" it would be problem over, but in my opinion that needs an explicit request from the user to show they accept that. > A COPY FREEZE ignoring that stronger restriction would be an interesting > feature in its own right, but it brings up other problems. For example, > suppose you write a tuple, then fail while writing its index entries. The > tuple is already frozen and visible, but it lacks a full set of live index > entries. The subtransaction aborts, but the top transaction commits and makes > the situation permanent. The optimisation only works in the subtransaction that created the relfilenode so that isn't an issue. Thanks for your input. > Incidentally, I contend that we should write frozen tuples to new/truncated > tables unconditionally. The current behavior of making old snapshots see the > table as empty violates atomicity at least as badly as letting those snapshots > see the future-snapshot contents. But Marti has a sound proposal that would > interact with your efforts here to avoid violating atomicity at all: > http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com Personally, that seems a pretty reasonable thing. I like Marti's idea. At present, making his idea work could easily mean checksums sink, so not sure whether to attempt to make that work in detail. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> wrote: > I like Marti's idea. At present, making his idea work could easily > mean checksums sink, so not sure whether to attempt to make that > work in detail. For my part, improving bulk load performance and TRUNCATE transactional semantics would trump checksums. If we have to defer one or the other to a later release, I would prefer that we bump checksums. While I understand the desire for checksums, and understand others have had situations where they would have helped, so far I have yet to run into a situation where I feel they would have helped me. The hint bit and freeze issues require ongoing attention and have real performance consequences on a regular basis. And closing the window for odd transactional semantics on TRUNCATE versus DELETE of all rows in a table would be one less thing to worry about, in my world. -Kevin
On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch <noah@leadboat.com> wrote: > On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote: >> On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> > It's still broken: > > [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] > >> So this approach isn't the one... >> >> The COPY FREEZE patch provides a way for the user to say explicitly >> that they don't really care about these MVCC corner cases and as a >> result allows us to avoid touching XidInMVCCSnapshot() at all. So >> there is still a patch on the table. > > You can salvage the optimization by tightening its prerequisite: use it when > the current subtransaction or a child thereof created or truncated the table. > A parent subtransaction having done so is acceptable for the WAL avoidance > optimization but not for this. I misread your earlier comment. Yes, that will make this work correctly. > Incidentally, I contend that we should write frozen tuples to new/truncated > tables unconditionally. The current behavior of making old snapshots see the > table as empty violates atomicity at least as badly as letting those snapshots > see the future-snapshot contents. But Marti has a sound proposal that would > interact with your efforts here to avoid violating atomicity at all: > http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com Thank you for bringing this to my attention. This will make this optimisation work correctly without adding anything to the main code path of XidInMVCCSnapshot() and without the annoying FREEZE syntax. So this puts the patch squarely back on the table. I'll do another version of this later today designed to work with the StrictMVCC patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Mar 3, 2012 at 1:14 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch <noah@leadboat.com> wrote: >> On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote: >>> On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >>> > It's still broken: >> >> [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] >> >>> So this approach isn't the one... >>> >>> The COPY FREEZE patch provides a way for the user to say explicitly >>> that they don't really care about these MVCC corner cases and as a >>> result allows us to avoid touching XidInMVCCSnapshot() at all. So >>> there is still a patch on the table. >> >> You can salvage the optimization by tightening its prerequisite: use it when >> the current subtransaction or a child thereof created or truncated the table. >> A parent subtransaction having done so is acceptable for the WAL avoidance >> optimization but not for this. > > I misread your earlier comment. Yes, that will make this work correctly. > >> Incidentally, I contend that we should write frozen tuples to new/truncated >> tables unconditionally. The current behavior of making old snapshots see the >> table as empty violates atomicity at least as badly as letting those snapshots >> see the future-snapshot contents. But Marti has a sound proposal that would >> interact with your efforts here to avoid violating atomicity at all: >> http://archives.postgresql.org/message-id/CABRT9RBRMdsoz8KxgeHfb4LG-ev9u67-6DLqvoiibpkKhTLQfw@mail.gmail.com > > Thank you for bringing this to my attention. > > This will make this optimisation work correctly without adding > anything to the main code path of XidInMVCCSnapshot() and without the > annoying FREEZE syntax. So this puts the patch squarely back on the > table. > > I'll do another version of this later today designed to work with the > StrictMVCC patch. OK, so latest version of patch handling the subtransaction issues correctly. Thanks to Noah and Heikki for identifying them and hitting me with the clue stick. So this version of patch has negligible effect on mainline performance though leaves newly loaded data visible in ways that would break MVCC. So this patch relies on the safe_truncate.v2.patch posted on separate thread. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Fri, Mar 2, 2012 at 11:15 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Simon Riggs <simon@2ndQuadrant.com> wrote: > >> I like Marti's idea. At present, making his idea work could easily >> mean checksums sink, so not sure whether to attempt to make that >> work in detail. > > For my part, improving bulk load performance and TRUNCATE > transactional semantics would trump checksums. If we have to defer > one or the other to a later release, I would prefer that we bump > checksums. > > While I understand the desire for checksums, and understand others > have had situations where they would have helped, so far I have yet > to run into a situation where I feel they would have helped me. The > hint bit and freeze issues require ongoing attention and have real > performance consequences on a regular basis. And closing the window > for odd transactional semantics on TRUNCATE versus DELETE of all > rows in a table would be one less thing to worry about, in my world. Since you supported this, I've invested the time to make it work. It doesn't look like it needs to be either-or. Please review the safe_truncate.v2.patch and copy_autofrozen.v359.patch, copied here to assist testing and inspection. At present those patches handle only the TRUNCATE/COPY optimisation but we could easily add CTAS, CREATE/COPY, CLUSTERVACUUM FULL etc.. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
> Simon Riggs wrote: > Kevin Grittner wrote: >> Simon Riggs wrote: >> >>> I like Marti's idea. At present, making his idea work could >>> easily mean checksums sink >> For my part, improving bulk load performance and TRUNCATE >> transactional semantics would trump checksums > It doesn't look like it needs to be either-or. Great news! > Please review the safe_truncate.v2.patch and > copy_autofrozen.v359.patch, copied here to assist testing and > inspection. I'll look at it today. > At present those patches handle only the TRUNCATE/COPY optimisation > but we could easily add CTAS, CREATE/COPY, CLUSTERVACUUM FULL etc.. CREATE/COPY would be important so that pg_dump | psql -1 would benefit. -Kevin