Thread: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
PostgreSQL Development, 24x7 Support, Training & Services
Hello Pavan,

Thank you for the patch. It seems to me that while performing COPY FREEZE, if we've copied tuples into a previously emptied page, we can set PageSetAllVisible(page) in heap_multi_insert only. Something like:

    bool init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
                 PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);

    if (init && (options & HEAP_INSERT_FROZEN))
        PageSetAllVisible(page);

Later, you can skip the pages for CheckAndSetAllVisibleBulkInsertState() where the PD_ALL_VISIBLE flag is already set. Do you think that's correct?

On Thu, Feb 21, 2019 at 11:35 AM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> Hi,
>
> Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly while loading data via COPY FREEZE and had also posted a draft patch.
>
> I now have what I think is a more complete patch. I took a slightly different approach: instead of setting the PD_ALL_VISIBLE bit initially and then not clearing it during insertion, we now recheck the page for all-frozen, all-visible tuples just before switching to a new page. This allows us to then also set the visibility map bit, like we do in vacuumlazy.c.
>
> Some special treatment is required to handle the last page before bulk insert is shut down. We could have chosen not to do anything special for the last page and let it remain unfrozen, but I thought it makes sense to take that extra effort so that we can completely freeze the table and set all VM bits at the end of COPY FREEZE.
>
> Let me know what you think.
>
> Thanks,
> Pavan
>
> [1] https://www.postgresql.org/message-id/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
>
> --
> Pavan Deolasee                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 26, 2019 at 6:46 PM Simon Riggs <simon@2ndquadrant.com> wrote:
> On Thu, 21 Feb 2019 at 15:38, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
>> Thank you for the patch. It seems to me that while performing COPY
>> FREEZE, if we've copied tuples in a previously emptied page
>
> There won't be any previously emptied pages because of the pre-conditions required for FREEZE.

Right, I missed that part. Thanks for pointing that out. But this optimization is still possible when copying frozen tuples into a newly created page, right? If the current backend allocates a new page and copies a bunch of frozen tuples into it, it can set PD_ALL_VISIBLE in the same operation.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Hi,

Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly while loading data via COPY FREEZE and had also posted a draft patch.

I now have what I think is a more complete patch. I took a slightly different approach: instead of setting the PD_ALL_VISIBLE bit initially and then not clearing it during insertion, we now recheck the page for all-frozen, all-visible tuples just before switching to a new page. This allows us to then also set the visibility map bit, like we do in vacuumlazy.c.

Some special treatment is required to handle the last page before bulk insert is shut down. We could have chosen not to do anything special for the last page and let it remain unfrozen, but I thought it makes sense to take that extra effort so that we can completely freeze the table and set all VM bits at the end of COPY FREEZE.

Let me know what you think.
After doing a truncation and '\copy ... with (freeze)' of a table with long data, I find that the associated toast table has a handful of unfrozen blocks. I don't know if that is an actual problem, but it does seem a bit odd, and thus suspicious.
On Thu, Feb 21, 2019 at 3:05 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> Hi,
>
> Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly while loading data via COPY FREEZE and had also posted a draft patch.
>
> I now have what I think is a more complete patch. I took a slightly different approach: instead of setting the PD_ALL_VISIBLE bit initially and then not clearing it during insertion, we now recheck the page for all-frozen, all-visible tuples just before switching to a new page. This allows us to then also set the visibility map bit, like we do in vacuumlazy.c.

I might be missing something, but why do we need to recheck whether each page is all-frozen after insertion? I wonder if we can set all-frozen without checking all tuples again in this case.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Tue, Mar 12, 2019 at 4:54 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> On Mon, Mar 11, 2019 at 1:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> I might be missing something but why do we need to recheck whether
>> each page is all-frozen after insertion? I wonder if we can set
>> all-frozen without checking all tuples again in this case.
>
> It's possible that the user may have inserted unfrozen rows (via regular INSERT or COPY without the FREEZE option) before inserting frozen rows.

I think that since COPY FREEZE can be executed only when the table is created or truncated within the transaction, other users cannot insert any rows during COPY FREEZE.

> So we can't set the all-visible/all-frozen flag unconditionally. I also find it safer to do an explicit check to ensure we never accidentally mark a page as all-frozen. Since the check is performed immediately after the page becomes full and only once per page, there shouldn't be any additional IO cost and the check should be quite fast.

I'd suggest measuring the performance overhead. I can imagine one use case of COPY FREEZE is loading a very large table. Since, in addition to setting visibility map bits, this patch could scan a very large table, I'm concerned about how much performance is degraded by this patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Mar 14, 2019 at 5:17 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> On Wed, Mar 13, 2019 at 11:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> I think that since COPY FREEZE can be executed only when the table is
>> created or truncated within the transaction, other users cannot insert
>> any rows during COPY FREEZE.
>
> Right. But the truncating transaction can insert unfrozen rows into the table before inserting more rows via COPY FREEZE.
>
> postgres=# CREATE EXTENSION pageinspect ;
> CREATE EXTENSION
> postgres=# BEGIN;
> BEGIN
> postgres=# TRUNCATE testtab ;
> TRUNCATE TABLE
> postgres=# INSERT INTO testtab VALUES (100, 200);
> INSERT 0 1
> postgres=# COPY testtab FROM STDIN WITH (FREEZE);
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> 1 2
> >> 2 3
> >> \.
> COPY 2
> postgres=# COMMIT;
>
> postgres=# SELECT lp, to_hex(t_infomask) FROM heap_page_items(get_raw_page('testtab', 0));
>  lp | to_hex
> ----+--------
>   1 | 800
>   2 | b00
>   3 | b00
> (3 rows)
>
> The first row is inserted by a regular INSERT and it's not frozen. The next two are frozen. We can't mark such a page all-visible, all-frozen.

Understood. Thank you for the explanation!

>> I'd suggest measuring the performance overhead. I can imagine one use
>> case of COPY FREEZE is loading a very large table. Since, in addition
>> to setting visibility map bits, this patch could scan a very large
>> table, I'm concerned about how much performance is degraded by this
>> patch.
>
> Ok. I will run some tests. But please note that this patch is a bug fix to address the performance issue that is caused by having to rewrite the entire table when the all-visible bit is set on the page during the first vacuum. So while we may do some more work during COPY FREEZE, we're saving a lot of page writes during the next vacuum. Also, since the scan that we are doing in this patch is done on a page that should be in the buffer cache, we will pay a bit in terms of CPU cost, but not anything in terms of IO cost.

Agreed. I had been misunderstanding this patch. The page scan during COPY FREEZE is necessary and it's much cheaper than doing it in the first vacuum.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi Pavan,

On 3/14/19 2:20 PM, Masahiko Sawada wrote:
> On Thu, Mar 14, 2019 at 5:17 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>>
>> Ok. I will run some tests. But please note that this patch is a bug fix to address the performance issue that is caused by having to rewrite the entire table when the all-visible bit is set on the page during the first vacuum. So while we may do some more work during COPY FREEZE, we're saving a lot of page writes during the next vacuum. Also, since the scan that we are doing in this patch is done on a page that should be in the buffer cache, we will pay a bit in terms of CPU cost, but not anything in terms of IO cost.
>
> Agreed. I had been misunderstanding this patch. The page scan during
> COPY FREEZE is necessary and it's much cheaper than doing it in the first
> vacuum.

I have removed Ibrar as a reviewer since there has been no review from them in three weeks, and to encourage others to have a look.

Regards,
--
-David
david@pgmasters.net
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

This patch is particularly helpful in processing OpenStreetMap data in PostGIS. OpenStreetMap is imported as a stream of 300-900 gigabytes (depending on settings) that needs a VACUUM after a COPY FREEZE. With this patch, the first and usually the last transforming query after the initial load is performed much faster.

I have read this patch and have no outstanding comments on it. Pavan Deolasee demonstrates the expected speed improvement.

The new status of this patch is: Ready for Committer
On Thu, Mar 21, 2019 at 11:27 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
>
> On Thu, Mar 14, 2019 at 3:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> Agreed. I had been misunderstanding this patch. The page scan during
>> COPY FREEZE is necessary and it's much cheaper than doing it in the first
>> vacuum.
>
> Thanks for agreeing to the need of this bug fix. I ran some simple tests anyway and here are the results.
>
> The test consists of a simple table with three columns, two integers and one char(100). I then ran COPY (FREEZE), loading 7M rows, followed by a VACUUM. The total size of the raw data is about 800MB and the table size in Postgres is just under 1GB. The results for 3 runs in milliseconds are:
>
> Master:
>     COPY FREEZE: 40243.725  40309.675  40783.836
>     VACUUM:       2685.871   2517.445   2508.452
>
> Patched:
>     COPY FREEZE: 40942.410  40495.303  40638.075
>     VACUUM:         25.067     35.793     25.390
>
> So there is a slight increase in the time to run COPY FREEZE, but a significant reduction in the time to VACUUM the table. The benefits will only go up if the table is vacuumed much later, when most of the pages are already written to disk and removed from shared buffers and/or the kernel cache.
>
> I hope this satisfies your doubts regarding the performance implications of the patch.

Thank you for the performance testing, that's a great improvement!

I've looked at the patch and have comments and questions.
+	/*
+	 * While we are holding the lock on the page, check if all tuples
+	 * in the page are marked frozen at insertion. We can safely mark
+	 * such page all-visible and set visibility map bits too.
+	 */
+	if (CheckPageIsAllFrozen(relation, buffer))
+		PageSetAllVisible(page);
+
+	MarkBufferDirty(buffer);

Maybe we don't need to mark the buffer dirty if the page is not set all-visible.

-----

+	if (PageIsAllVisible(page))
+	{
+		uint8		vm_status = visibilitymap_get_status(relation,
+												targetBlock, &myvmbuffer);
+		uint8		flags = 0;
+
+		/* Set the VM all-frozen bit to flag, if needed */
+		if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+			flags |= VISIBILITYMAP_ALL_VISIBLE;
+		if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0)
+			flags |= VISIBILITYMAP_ALL_FROZEN;
+
+		Assert(BufferIsValid(myvmbuffer));
+		if (flags != 0)
+			visibilitymap_set(relation, targetBlock, buffer, InvalidXLogRecPtr,
+							  myvmbuffer, InvalidTransactionId, flags);

Since CheckPageIsAllFrozen() is used only when inserting frozen tuples, CheckAndSetPageAllVisible() seems to be implemented for the same situation. If we have CheckAndSetPageAllVisible() only for this situation, shouldn't we rather check that the VM status of the page is 0 and then set both flags on the page? The 'flags' will always be (VISIBILITYMAP_ALL_FROZEN | VISIBILITYMAP_ALL_VISIBLE) in the COPY FREEZE case. I'm confused that this function has both code that assumes some special situations and code that can be used in generic situations.

-----

Perhaps we can add some tests for this feature to the pg_visibility module.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
The patch looks good to me. I have no further comments.
Hi,

I've been looking at this patch for a while, and it seems pretty much RFC, so barring objections I'll take care of that once I do a bit more testing and review. Unless someone else wants to take care of that.

FWIW I wonder if we should add the code for partitioned tables to CopyFrom, considering that's unsupported and so can't be tested etc. It's not a huge amount of code, of course.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2019-04-03 10:19:17 +0530, Pavan Deolasee wrote:
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
> index c1fd7b78ce..09df70a3ac 100644
> --- a/src/backend/commands/copy.c
> +++ b/src/backend/commands/copy.c
> @@ -2833,6 +2833,15 @@ CopyFrom(CopyState cstate)
>  				!has_instead_insert_row_trig &&
>  				resultRelInfo->ri_FdwRoutine == NULL;
>
> +			/*
> +			 * Note: As of PG12, COPY FREEZE is not supported on
> +			 * partitioned table. Nevertheless have this check in place so
> +			 * that we do the right thing if it ever gets supported.
> +			 */
> +			if (ti_options & TABLE_INSERT_FROZEN)
> +				CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
> +													 bistate);
> +
>  			/*
>  			 * We'd better make the bulk insert mechanism gets a new
>  			 * buffer when the partition being inserted into changes.
> @@ -3062,6 +3071,15 @@ CopyFrom(CopyState cstate)
>  										  firstBufferedLineNo);
>  	}
>
> +	/*
> +	 * If we are inserting frozen tuples, check if the last page used can also
> +	 * be marked as all-visible and all-frozen. This ensures that a table can
> +	 * be fully frozen when the data is loaded.
> +	 */
> +	if (ti_options & TABLE_INSERT_FROZEN)
> +		CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
> +											 bistate);
> +
>  	/* Done, clean up */
>  	error_context_stack = errcallback.previous;

I'm totally not OK with this from a layering POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific (without being named such), whereas all the heap specific bits are getting moved below tableam.

Greetings,

Andres Freund
On 2019-Apr-04, Andres Freund wrote:

> I'm totally not OK with this from a layering
> POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific
> (without being named such), whereas all the heap specific bits are
> getting moved below tableam.

This is a fair complaint, but on the other hand the COPY changes for table AM are still being developed, so there's no ground on which to rebase this patch. Do you have a timeline on getting the COPY one committed?

I think it's fair to ask the RMT for an exceptional extension of a couple of working days for this patch.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2019-04-04 16:15:54 -0300, Alvaro Herrera wrote:
> On 2019-Apr-04, Andres Freund wrote:
>
>> I'm totally not OK with this from a layering
>> POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific
>> (without being named such), whereas all the heap specific bits are
>> getting moved below tableam.
>
> This is a fair complaint, but on the other hand the COPY changes for
> table AM are still being developed, so there's no ground on which to
> rebase this patch. Do you have a timeline on getting the COPY one
> committed?

~2h. Just pondering the naming of some functions etc. Don't think there's a large interdependency though. But even if tableam weren't committed, I'd still argue that it's structurally done wrong in the patch right now.

FWIW, I actually think this whole approach isn't quite right - this shouldn't be done as a secondary action after we'd already inserted, with a separate lock-unlock cycle etc.

Also, how is this code even close to correct? CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and there's no WAL logging? Without even a comment arguing why that's OK (I don't think it is)?

Greetings,

Andres Freund
Hi,

On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
> Also, how is this code even close to correct?
> CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
> there's no WAL logging? Without even a comment arguing why that's OK (I
> don't think it is)?

Peter Geoghegan just reminded me over IM that there's actually logging inside log_heap_visible(), called from visibilitymap_set(). Still lacks a critical section though.

I still think this is the wrong architecture.

Greetings,

Andres Freund

PS: We're going to have to revamp visibilitymap_set() soon-ish - the fact that it directly calls heap routines inside is bad; it means that additional AMs, e.g. zheap, have to reimplement that routine.
On 2019-Apr-04, Andres Freund wrote:

> On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
>> Also, how is this code even close to correct?
>> CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
>> there's no WAL logging? Without even a comment arguing why that's OK (I
>> don't think it is)?
>
> Peter Geoghegan just reminded me over IM that there's actually logging
> inside log_heap_visible(), called from visibilitymap_set(). Still lacks
> a critical section though.

Hmm, isn't there already a critical section in visibilitymap_set itself?

> I still think this is the wrong architecture.

Hmm.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
> On 2019-Apr-04, Andres Freund wrote:
>
>> On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
>>> Also, how is this code even close to correct?
>>> CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
>>> there's no WAL logging? Without even a comment arguing why that's OK (I
>>> don't think it is)?
>>
>> Peter Geoghegan just reminded me over IM that there's actually logging
>> inside log_heap_visible(), called from visibilitymap_set(). Still lacks
>> a critical section though.
>
> Hmm, isn't there already a critical section in visibilitymap_set itself?

There is, but the proposed code sets all-visible on the page, and marks the buffer dirty, before calling visibilitymap_set().

Greetings,

Andres Freund
Hi,

On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
> On 2019-Apr-04, Andres Freund wrote:
>
>> I still think this is the wrong architecture.
>
> Hmm.

I think the right approach would be to do all of this in heap_insert and heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN is specified, remember whether it is either currently empty, or is already marked as all-visible. If previously empty, mark it as all-visible at the end. If already all-visible, there's no need to change that. If not yet all-visible, no need to do anything, since it can't have been inserted with COPY FREEZE.

Do you see any problem doing it that way?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I think the right approach would be to do all of this in heap_insert and
> heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> is specified, remember whether it is either currently empty, or is
> already marked as all-visible. If previously empty, mark it as all
> visible at the end. If already all visible, there's no need to change
> that. If not yet all-visible, no need to do anything, since it can't
> have been inserted with COPY FREEZE.

Do we want to add overhead to these hot-spot routines for this purpose?

			regards, tom lane
Hi,

On 2019-04-05 09:20:36 +0530, Pavan Deolasee wrote:
> On Fri, Apr 5, 2019 at 9:05 AM Andres Freund <andres@anarazel.de> wrote:
>> I think the right approach would be to do all of this in heap_insert and
>> heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
>> is specified, remember whether it is either currently empty, or is
>> already marked as all-visible. If previously empty, mark it as all
>> visible at the end. If already all visible, there's no need to change
>> that. If not yet all-visible, no need to do anything, since it can't
>> have been inserted with COPY FREEZE.
>
> We're doing roughly the same. If we are running INSERT_FROZEN, whenever
> we're about to switch to a new page, we check if the previous page should
> be marked all-frozen and do it that way. The special code in copy.c is
> necessary to take care of the last page which we don't get to handle in the
> regular code path.

Well, it's not the same, because you need extra code from copy.c, extra lock cycles, and extra WAL logging.

> Or are you suggesting that we don't even rescan the page for all-frozen
> tuples at the end and just simply mark it all-frozen at the start, when the
> first tuple is inserted, and then don't touch the PD_ALL_VISIBLE/visibility
> map bit as we go on inserting more tuples in the page?

Correct. If done right that should be cheaper (no extra scans, less WAL logging), without requiring some new dispatch logic from copy.c.

> Anyways, if major architectural changes are required then it's probably too
> late to consider this for PG12, even though it's more of a bug fix and a
> candidate for back-patching too.

Let's just see how bad it looks? I don't feel like we'd need to be super strict about it. If it looks simple enough I'd e.g. be OK to merge this soon after freeze, and backpatch around maybe 12.1 or such.

Greetings,

Andres Freund
Hi,

On 2019-04-04 23:57:58 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> I think the right approach would be to do all of this in heap_insert and
>> heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
>> is specified, remember whether it is either currently empty, or is
>> already marked as all-visible. If previously empty, mark it as all
>> visible at the end. If already all visible, there's no need to change
>> that. If not yet all-visible, no need to do anything, since it can't
>> have been inserted with COPY FREEZE. Do you see any problem doing it
>> that way?
>
> Do we want to add overhead to these hot-spot routines for this purpose?

For heap_multi_insert I can't see it being a problem - it's only used from copy.c, and the cost should be "smeared" over many tuples. I'd assume that, compared with locking a page, WAL logging, etc., it'd not even meaningfully show up for heap_insert. Especially because we already have codepaths for options & HEAP_INSERT_FROZEN in heap_prepare_insert(), and I'd assume those could be combined.

I think we should measure it, but I don't think that one or two additional, very well predicted branches are going to be measurable in those routines.

The patch, as implemented, has modifications in RelationGetBufferForTuple() that seem like they'd be more expensive.

Greetings,

Andres Freund
Hi,

On 2019-04-05 08:38:34 +0300, Darafei "Komяpa" Praliaskouski wrote:
> On Fri, Apr 5, 2019 at 6:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Andres Freund <andres@anarazel.de> writes:
>>> I think the right approach would be to do all of this in heap_insert and
>>> heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
>>> is specified, remember whether it is either currently empty, or is
>>> already marked as all-visible. If previously empty, mark it as all
>>> visible at the end. If already all visible, there's no need to change
>>> that. If not yet all-visible, no need to do anything, since it can't
>>> have been inserted with COPY FREEZE. Do you see any problem doing it
>>> that way?
>>
>> Do we want to add overhead to these hot-spot routines for this purpose?
>
> Sizing the overhead: workflows right now don't end with COPY FREEZE - you
> need another VACUUM to set maps.
> Anything that lets you skip that VACUUM (and is faster than that VACUUM
> itself) is helping. You specifically asked for it to be skippable with
> FREEZE anyway.

Tom's point was that the routines I was suggesting to adapt above aren't just used for COPY FREEZE.

Greetings,

Andres Freund
Hi,

On 2019-04-04 21:04:49 -0700, Andres Freund wrote:
> On 2019-04-04 23:57:58 -0400, Tom Lane wrote:
>> Do we want to add overhead to these hot-spot routines for this purpose?
>
> For heap_multi_insert I can't see it being a problem - it's only used
> from copy.c, and the cost should be "smeared" over many tuples. I'd
> assume that compared with locking a page, WAL logging, etc, it'd not
> even meaningfully show up for heap_insert. Especially because we already
> have codepaths for options & HEAP_INSERT_FROZEN in
> heap_prepare_insert(), and I'd assume those could be combined.
>
> I think we should measure it, but I don't think that one or two
> additional, very well predicted, branches are going to be measurable in
> those routines.
>
> The patch, as implemented, has modifications in
> RelationGetBufferForTuple(), that seems like they'd be more expensive.

Here's a *prototype* patch for this. It only implements what I described for heap_multi_insert, not for plain inserts. I wanted to see what others think before investing additional time.

I don't think it's possible to see the overhead of the changed code in heap_multi_insert(), and probably - with less confidence - that it's also going to be OK for heap_insert(). But we've got to measure that.
This avoids an extra WAL record for setting empty pages to all visible,
by adding XLH_INSERT_ALL_VISIBLE_SET & XLH_INSERT_ALL_FROZEN_SET, and
setting those when appropriate in heap_multi_insert.

Unfortunately visibilitymap_set() currently doesn't really allow doing
this properly, as it has embedded WAL logging for heap.

I think we should remove the WAL logging from visibilitymap_set(), and
move it to a separate, heap-specific, function. Right now different
tableams e.g. would have to reimplement visibilitymap_set(), so that's a
second reason to separate that functionality. Let me try to come up with
a proposal.

The patch currently does a vmbuffer_pin() while holding an exclusive
lwlock on the page. That's something we normally try to avoid - but I
think it's probably OK here, because INSERT_FROZEN can only be used to
insert into a new relfilenode (which thus no other session can see). I
think it's preferable to have this logic specific to the INSERT_FROZEN
path, rather than adding nontrivial complications to
RelationGetBufferForTuple().

I noticed that, before this patch, we do a

    if (vmbuffer != InvalidBuffer)
        ReleaseBuffer(vmbuffer);

after every filled page - that doesn't strike me as particularly smart -
it's pretty likely that the next heap page to be filled is going to be on
the same vm page as the previous iteration.

I noticed one small oddity that I think is common to all the approaches
presented in this thread so far.
After

    BEGIN;
    TRUNCATE foo;
    COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
    COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
    COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
    COMMIT;

we currently end up with pages like:

┌───────┬───────────┬──────────┬───────┬───────┬───────┬─────────┬──────────┬─────────┬───────────┐
│ blkno │    lsn    │ checksum │ flags │ lower │ upper │ special │ pagesize │ version │ prune_xid │
├───────┼───────────┼──────────┼───────┼───────┼───────┼─────────┼──────────┼─────────┼───────────┤
│     0 │ 0/50B5488 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     1 │ 0/50B6360 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     2 │ 0/50B71B8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     3 │ 0/50B8028 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     4 │ 0/50B8660 │        0 │     4 │   408 │  5120 │    8192 │     8192 │       4 │         0 │
│     5 │ 0/50B94B8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     6 │ 0/50BA328 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     7 │ 0/50BB180 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     8 │ 0/50BBFD8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     9 │ 0/50BCF88 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    10 │ 0/50BDDE0 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    11 │ 0/50BEC50 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    12 │ 0/50BFAA8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    13 │ 0/50C06F8 │        0 │     4 │   792 │  2048 │    8192 │     8192 │       4 │         0 │
└───────┴───────────┴──────────┴───────┴───────┴───────┴─────────┴──────────┴─────────┴───────────┘
(14 rows)

Note how block 4 has more space available. That's because the
visibilitymap_pin() called in the first COPY has to vm_extend(), which in
turn does:

    /*
     * Send a shared-inval message to force other backends to close any smgr
     * references they may have for this rel, which we are about to change.
     * This is a useful optimization because it means that backends don't have
     * to keep checking for creation or extension of the file, which happens
     * infrequently.
     */
    CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);

which invalidates ->rd_smgr->smgr_targblock *after* the first COPY,
because that's when the pending smgr invalidations are sent out. That's
far from great, but it doesn't seem to be this patch's fault. It seems to
me we need a separate invalidation that doesn't close the whole smgr
relation, but just invalidates the VM-specific fields.

Greetings,

Andres Freund
Attachment
Hi,

On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
> Here's a *prototype* patch for this. It only implements what I
> described for heap_multi_insert, not for plain inserts. I wanted to see
> what others think before investing additional time.

Pavan, are you planning to work on this for v13 CF1? Or have you lost
interest on the topic?

- Andres
On Tue, 28 May 2019 at 4:36 PM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
> > Here's a *prototype* patch for this. It only implements what I
> > described for heap_multi_insert, not for plain inserts. I wanted to see
> > what others think before investing additional time.
>
> Pavan, are you planning to work on this for v13 CF1? Or have you lost
> interest on the topic?

Yes, I plan to work on it.

--
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 27, 2019 at 11:02 AM Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> I am sorry, but I am not able to find time to get back to this because
> of other high priority items. If it still remains unaddressed in the
> next few weeks, I will pick it up again. But for now, I am happy if
> someone wants to pick and finish the work.

Fair enough, I have marked the entry [1] in the coming CF as "Returned
with Feedback". I hope that is okay with you.

[1] - https://commitfest.postgresql.org/23/2009/

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
As Pavan mentioned in his last email, he is no longer working on this, so
I want to take the lead and finish it. It is a bug and needs to be fixed.

I have rebased the patch against the latest master, added some test cases
(borrowed from Pavan's patch), and did some performance testing with a
table size of ...

Master:  COPY WITH FREEZE took 21406.692 ms and VACUUM took 2478.666 ms
Patched: COPY WITH FREEZE took 23095.985 ms and VACUUM took 26.309 ms

The performance decrease in COPY with the patch is only about 7%, but
VACUUM becomes dramatically faster. In any case, this is a bug fix, so
the small performance hit seems acceptable.
There are two issues left to address.
1 - Andres: "It only implements what I described for heap_multi_insert,
not for plain inserts. I wanted to see what others think before investing
additional time."

2 - Andres: "I think we should remove the WAL logging from
visibilitymap_set(), and move it to a separate, heap specific, function.
Right now different tableams e.g. would have to reimplement
visibilitymap_set(), so that's a second need to separate that
functionality. Let me try to come up with a proposal."
Attachment
Thanks for picking up this patch. There's a minor typo:
+ * readable outside of this sessoin. Therefore doing IO here isn't
=> session
--
Justin
Attachment
On Fri, Mar 13, 2020 at 6:58 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Thanks for picking up this patch. There's a minor typo:
>
> + * readable outside of this sessoin. Therefore doing IO here isn't
>
> => session
>
> --
> Justin

Thanks, please see the updated and rebased patch.
(master 17a28b03645e27d73bf69a95d7569b61e58f06eb)

--
Ibrar Ahmed
/*
* FIXME: setting recptr here is a dirty dirty hack, to prevent
* visibilitymap_set() from WAL logging.
This patch incurs a compiler warning, which probably is quite simple to fix:

heapam.c: In function ‘heap_multi_insert’:
heapam.c:2349:4: error: ‘recptr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
    ^
heapam.c:2136:14: note: ‘recptr’ was declared here
   XLogRecPtr recptr;
              ^

Please fix and submit a new version, I'm marking the entry Waiting on
Author in the meantime.

cheers ./daniel
On 01.07.2020 12:38, Daniel Gustafsson wrote:
> This patch incurs a compiler warning, which probably is quite simple to fix:
>
> heapam.c: In function ‘heap_multi_insert’:
> heapam.c:2349:4: error: ‘recptr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>     visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
>     ^
> heapam.c:2136:14: note: ‘recptr’ was declared here
>    XLogRecPtr recptr;
>    ^
>
> Please fix and submit a new version, I'm marking the entry Waiting on
> Author in the meantime.
>
> cheers ./daniel

This patch looks very useful to me, so I want to pick it up. The patch
that fixes the compiler warning is in the attachment. Though, I'm not
entirely satisfied with this fix. Also, the patch contains some FIXME
comments. I'll test it more and send fixes this week.

Questions from the first review pass:

1) Do we need XLH_INSERT_ALL_VISIBLE_SET? IIUC, in the patch it is always
implied by XLH_INSERT_ALL_FROZEN_SET.

2) What does this comment mean? Does XXX refers to the lsn comparison?
Since it is definitely necessary to update the VM.

+ * XXX: This seems entirely unnecessary?
+ *
+ * FIXME: Theoretically we should only do this after we've
+ * modified the heap - but it's safe to do it here I think,
+ * because this means that the page previously was empty.
+ */
+ if (lsn > PageGetLSN(vmpage))
+     visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+                       InvalidTransactionId, vmbits);

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
On Tue, Jul 14, 2020 at 1:51 PM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> Questions from the first review pass:
>
> 1) Do we need XLH_INSERT_ALL_VISIBLE_SET ? IIUC, in the patch it is always
> implied by XLH_INSERT_ALL_FROZEN_SET.

I agree that it looks unnecessary to have two separate bits.

> 2) What does this comment mean? Does XXX refers to the lsn comparison?
> Since it is definitely necessary to update the VM.
>
> + * XXX: This seems entirely unnecessary?
> + *
> + * FIXME: Theoretically we should only do this after we've
> + * modified the heap - but it's safe to do it here I think,
> + * because this means that the page previously was empty.
> + */
> + if (lsn > PageGetLSN(vmpage))
> +     visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
> +                       InvalidTransactionId, vmbits);

I wondered about that too. The comment which precedes it was, I believe,
originally written by me, and copied here from heap_xlog_visible(). But
it's not clear it's very good practice to just copy the comment like
this. If the same logic applies, the code should say that we're doing the
same thing here as in heap_xlog_visible() for consistency, or some such
thing; after all, that's the primary place where that happens. But it
looks like the XXX might have been added by a second person who thought
that we didn't need this logic at all, and the FIXME by a third person
who thought it was in the wrong place, so the whole thing is really
confusing at this point.

I'm pretty worried about this, too:

+ * FIXME: setting recptr here is a dirty dirty hack, to prevent
+ * visibilitymap_set() from WAL logging.

That is indeed a dirty hack, and something needs to be done about it.

I wonder if it was really all that smart to try to make the
HEAP2_MULTI_INSERT do this instead of just issuing separate WAL records
to mark it all-visible afterwards, but I don't see any reason why this
can't be made to work. It needs substantially more polishing than it's
had, though, I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
On 31.07.2020 23:28, Robert Haas wrote:
> On Tue, Jul 14, 2020 at 1:51 PM Anastasia Lubennikova
> <a.lubennikova@postgrespro.ru> wrote:
>> Questions from the first review pass:
>>
>> 1) Do we need XLH_INSERT_ALL_VISIBLE_SET ? IIUC, in the patch it is always
>> implied by XLH_INSERT_ALL_FROZEN_SET.
> I agree that it looks unnecessary to have two separate bits.
>
>> 2) What does this comment mean? Does XXX refers to the lsn comparison?
>> Since it
>> is definitely necessary to update the VM.
>>
>> + * XXX: This seems entirely unnecessary?
>> + *
>> + * FIXME: Theoretically we should only do this after we've
>> + * modified the heap - but it's safe to do it here I think,
>> + * because this means that the page previously was empty.
>> + */
>> + if (lsn > PageGetLSN(vmpage))
>> + visibilitymap_set(reln, blkno, InvalidBuffer, lsn,
>> vmbuffer,
>> + InvalidTransactionId, vmbits);
> I wondered about that too. The comment which precedes it was, I
> believe, originally written by me, and copied here from
> heap_xlog_visible(). But it's not clear very good practice to just
> copy the comment like this. If the same logic applies, the code should
> say that we're doing the same thing here as in heap_xlog_visible() for
> consistency, or some such thing; after all, that's the primary place
> where that happens. But it looks like the XXX might have been added by
> a second person who thought that we didn't need this logic at all, and
> the FIXME by a third person who thought it was in the wrong place, so
> the whole thing is really confusing at this point.
>
> I'm pretty worried about this, too:
>
> + * FIXME: setting recptr here is a dirty dirty hack, to prevent
> + * visibilitymap_set() from WAL logging.
>
> That is indeed a dirty hack, and something needs to be done about it.
>
> I wonder if it was really all that smart to try to make the
> HEAP2_MULTI_INSERT do this instead of just issuing separate WAL
> records to mark it all-visible afterwards, but I don't see any reason
> why this can't be made to work. It needs substantially more polishing
> than it's had, though, I think.
>
New version of the patch is in the attachment.
New design is more conservative and simpler:
- pin the visibility map page in advance;
- set PageAllVisible;
- call visibilitymap_set() with its own XlogRecord, just like in other
places.
It allows to remove most of the "hacks" and keep code clean.
The patch passes tests added in previous versions.
I haven't tested performance yet, though. Maybe after tests, I'll bring
some optimizations back.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
The table used for the test contains three columns (integer, text,
varchar). The total number of rows is 10000000 in total.

Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 9069.432 ms vacuum: 2567.961 ms
COPY: 9004.533 ms vacuum: 2553.075 ms
COPY: 8832.422 ms vacuum: 2540.742 ms

Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 10031.723 ms vacuum: 127.524 ms
COPY: 9985.109 ms vacuum: 39.953 ms
COPY: 9283.373 ms vacuum: 37.137 ms

Time to take the copy slightly increased but the vacuum time
significantly decreased.
Unfortunately the latest patch doesn't apply cleanly on the master branch. Can you please share an updated one.
Attachment
On 2020-Aug-14, Ibrar Ahmed wrote:

> The table used for the test contains three columns (integer, text,
> varchar).
> The total number of rows is 10000000 in total.
>
> Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
> COPY: 9069.432 ms vacuum: 2567.961ms
> COPY: 9004.533 ms vacuum: 2553.075ms
> COPY: 8832.422 ms vacuum: 2540.742ms
>
> Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
> COPY: 10031.723 ms vacuum: 127.524 ms
> COPY: 9985.109 ms vacuum: 39.953 ms
> COPY: 9283.373 ms vacuum: 37.137 ms
>
> Time to take the copy slightly increased but the vacuum time
> significantly decrease.

"Slightly"? It seems quite a large performance drop to me -- more than
10%. Where is that time being spent? Andres said in [1] that he thought
the performance shouldn't be affected noticeably, but this doesn't seem
to hold true. As I understand, the idea was that there would be little or
no additional WAL records .. only flags in the existing record. So what
is happening?

[1] https://postgr.es/m/20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de

Also, when Andres posted this patch first, he said this was only for
heap_multi_insert because it was a prototype. But I think we expect that
the table_insert path (CIM_SINGLE mode in copy) should also receive that
treatment.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 18.08.2020 02:54, Alvaro Herrera wrote:
> On 2020-Aug-14, Ibrar Ahmed wrote:
>
>> The table used for the test contains three columns (integer, text,
>> varchar).
>> The total number of rows is 10000000 in total.
>>
>> Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
>> COPY: 9069.432 ms vacuum; 2567.961ms
>> COPY: 9004.533 ms vacuum: 2553.075ms
>> COPY: 8832.422 ms vacuum: 2540.742ms
>>
>> Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
>> COPY: 10031.723 ms vacuum: 127.524 ms
>> COPY: 9985.109 ms vacuum: 39.953 ms
>> COPY: 9283.373 ms vacuum: 37.137 ms
>>
>> Time to take the copy slightly increased but the vacuum time significantly
>> decrease.
> "Slightly"? It seems quite a large performance drop to me -- more than
> 10%. Where is that time being spent? Andres said in [1] that he
> thought the performance shouldn't be affected noticeably, but this
> doesn't seem to hold true. As I understand, the idea was that there
> would be little or no additional WAL records .. only flags in the
> existing record. So what is happening?
>
> [1] https://postgr.es/m/20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de
I agree that 10% performance drop is not what we expect with this patch.
Ibrar, can you share more info about your tests? I'd like to reproduce
this slowdown and fix it, if necessary.
I've run some tests on my laptop and COPY FREEZE shows the same time for
both versions, while VACUUM is much faster on the patched version. I've
also checked WAL generation and it shows that the patch works correctly
as it doesn't add any records for COPY.
Not patched:
Time: 54883,356 ms (00:54,883)
Time: 65373,333 ms (01:05,373)
Time: 64684,592 ms (01:04,685)
VACUUM Time: 60861,670 ms (01:00,862)
COPY wal_bytes 3765 MB
VACUUM wal_bytes 6015 MB
table size 5971 MB
Patched:
Time: 53142,947 ms (00:53,143)
Time: 65420,812 ms (01:05,421)
Time: 66600,114 ms (01:06,600)
VACUUM Time: 63,401 ms
COPY wal_bytes 3765 MB
VACUUM wal_bytes 30 kB
table size 5971 MB
The test script is attached.
> Also, when Andres posted this patch first, he said this was only for
> heap_multi_insert because it was a prototype. But I think we expect
> that the table_insert path (CIM_SINGLE mode in copy) should also receive
> that treatment.
I am afraid that extra checks for COPY FREEZE in heap_insert() will
slow down normal insertions.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Wed, Aug 19, 2020 at 6:15 PM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> I agree that 10% performance drop is not what we expect with this patch.
> Ibrar, can you share more info about your tests? I'd like to reproduce
> this slowdown and fix it, if necessary.

Here is my test:

postgres=# BEGIN;
BEGIN
postgres=*# TRUNCATE foo;
TRUNCATE TABLE
postgres=*# COPY foo(id, name, address) FROM '/home/ibrar/bar.csv' DELIMITER ',' FREEZE;
COPY 10000000
postgres=*# COMMIT;
COMMIT
postgres=# VACUUM;
VACUUM
postgres=# SELECT count(*) FROM foo;
  count
----------
 10000000
(1 row)

--
Ibrar Ahmed
I've repeated the test and didn't notice any slowdown for COPY FREEZE.
Test data is here [1].
The numbers do fluctuate a bit, but there is no dramatic difference between master and patched version. So I assume that the performance drop in your test has something to do with the measurement error. Unless, you have some non-default configuration that could affect it.
patched:
COPY: 12327,090 ms vacuum: 37,555 ms
COPY: 12939,540 ms vacuum: 35,703 ms
COPY: 12245,819 ms vacuum: 36,273 ms
master:
COPY
COPY: 13253,605 ms vacuum: 3592,849 ms
COPY: 12619,428 ms vacuum: 4253,836 ms
COPY: 12512,940 ms vacuum: 4009,847 ms
I also slightly cleaned up the comments, so the new version of the patch
is attached. As this is just a performance optimization, no documentation
changes are needed. It would be great if other reviewers could run some
independent performance tests, as I believe that this patch is ready for
committer.
[1] https://drive.google.com/file/d/11r19NX6yyPjvxdDub8Ce-kmApRurp4Nx/view
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
Master (ff60394a8c9a7af8b32de420ccb54a20a0f019c1)
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv' DELIMITER ',' FREEZE;
Time: 11824.495 ms (00:11.824)
postgres=*# COMMIT;
Restart
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv' DELIMITER ',' FREEZE;
Time: 14096.987 ms (00:14.097)
postgres=*# commit;
Restart
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv' DELIMITER ',' FREEZE;
Time: 11108.289 ms (00:11.108)
postgres=*# commit;
Patched (ff60394a8c9a7af8b32de420ccb54a20a0f019c1)
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv' DELIMITER ',' FREEZE;
Time: 10749.945 ms (00:10.750)
postgres=*# commit;
Restart
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv' DELIMITER ',' FREEZE;
Time: 14274.361 ms (00:14.274)
postgres=*# commit;
Restart
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv' DELIMITER ',' FREEZE;
Time: 11884.089 ms (00:11.884)
postgres=*# commit;
Status update for a commitfest entry.

This patch is ReadyForCommitter. It applies and passes the CI. There are
no unanswered questions in the discussion.

The discussion started in 2015 with a patch by Jeff Janes. Later it was
revived by Pavan Deolasee. After it was picked up by Ibrar Ahmed and
finally, it was rewritten by me, so I moved myself from reviewers to
authors as well.

The latest version was reviewed and tested by Ibrar Ahmed. The patch
doesn't affect COPY FREEZE performance and significantly decreases the
time of the following VACUUM.
> Status update for a commitfest entry.
>
> This patch is ReadyForCommitter. It applies and passes the CI. There are no unanswered questions in the discussion.
>
> The discussion started in 2015 with a patch by Jeff Janes. Later it was revived by Pavan Deolasee. After that it was picked up by Ibrar Ahmed and finally, it was rewritten by me, so I moved myself from reviewers to authors as well.
>
> The latest version was reviewed and tested by Ibrar Ahmed. The patch doesn't affect COPY FREEZE performance and significantly decreases the time of the following VACUUM.

I have tested the patch on my laptop (mem 16GB, SSD 512GB) using the data introduced upthread and saw that VACUUM after COPY FREEZE is nearly 60 times faster than on the current master branch. Quite impressive.

By the way, I noticed that the following comment:

    + /*
    +  * vmbuffer should be already pinned by RelationGetBufferForTuple,
    +  * Though, it's fine if is not. all_frozen is just an optimization.
    +  */

could be enhanced like below. What do you think?

    + /*
    +  * vmbuffer should be already pinned by RelationGetBufferForTuple.
    +  * Though, it's fine if it is not. all_frozen is just an optimization.
    +  */

Best regards,

-- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Hi,

I might be somewhat late to the party, but I've done a bit of benchmarking too ;-) I used TPC-H data from a 100GB test, and tried different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on current master and with the patch. The results look like this (the columns say what combination of COPY and VACUUM was used - e.g. -/FREEZE means plain COPY and VACUUM FREEZE):

master:

              - / -    FREEZE / -    - / FREEZE    FREEZE / FREEZE
    ----------------------------------------------------------------
    COPY       2471          2477          2486               2484
    VACUUM      228           209           453                206

patched:

              - / -    FREEZE / -    - / FREEZE    FREEZE / FREEZE
    ----------------------------------------------------------------
    COPY       2459          2445          2458               2446
    VACUUM      227             0           467                  0

So I don't really observe any measurable slowdowns in the COPY part (in fact there seems to be a tiny speedup, but it might be just noise). In the VACUUM part, there's a clear speedup when the data was already frozen by COPY. (Yes, those are zeroes, because it took less than 1 second.)

So that looks pretty awesome, I guess. For the record, these tests were run on a server with NVMe SSD, so hopefully reliable and representative numbers.

A couple of minor comments about the code:

1) Maybe add a comment before the block setting xlrec->flags in heap_multi_insert. It's not very complex, but it used to be a bit simpler, and all the other pieces around have comments, so it won't hurt.

2) I find the "if (all_frozen_set)" block a bit strange. It's a matter of personal preference, but I'd just use a single level of nesting, i.e. something like this:

    /* if everything frozen, the whole page has to be visible */
    Assert(!(all_frozen_set && !PageIsAllVisible(page)));

    /*
     * If we've frozen everything on the page, and if we're already
     * holding pin on the vmbuffer, record that in the visibilitymap.
     * If we're not holding the pin, it's OK to skip this - it's just
     * an optimization.
     *
     * It's fine to use InvalidTransactionId here - this is only used
     * when HEAP_INSERT_FROZEN is specified, which intentionally
     * violates visibility rules.
     */
    if (all_frozen_set &&
        visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
        visibilitymap_set(...);

IMHO it's easier to read, but YMMV. I've also reworded the comment a bit to say what we're doing etc. And I've moved the comment from inside the if block into the main comment - that was making it harder to read too.

3) I see RelationGetBufferForTuple does this:

    /*
     * This is for COPY FREEZE needs. If page is empty,
     * pin vmbuffer to set all_frozen bit
     */
    if ((options & HEAP_INSERT_FROZEN) &&
        (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
        visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);

so is it actually possible to get into the (all_frozen_set) block without holding a pin on the visibility map? I haven't investigated this, so maybe there are other ways to get into that code block. But the new additions to hio.c get the pin too.

4) In heap_xlog_multi_insert we now have this:

    if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
        PageClearAllVisible(page);
    if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
        PageSetAllVisible(page);

IIUC it makes no sense to have both flags at the same time, right? So what about adding

    Assert(!(XLH_INSERT_ALL_FROZEN_SET &&
             XLH_INSERT_ALL_VISIBLE_CLEARED));

to check that?

5) Not sure we need to explicitly say this is for COPY FREEZE in all the blocks added to hio.c. IMO it's sufficient to use HEAP_INSERT_FROZEN in the condition, at this level of abstraction.

I wonder what to do about heap_insert - I know there are concerns it would negatively impact regular inserts, but is it really true? I suppose this optimization would be valuable even for cases where multi-insert is not possible.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 30.10.2020 03:42, Tomas Vondra wrote: > Hi, > > I might be somewhat late to the party, but I've done a bit of > benchmarking too ;-) I used TPC-H data from a 100GB test, and tried > different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on > current master and with the patch. > > So I don't really observe any measurable slowdowns in the COPY part (in > fact there seems to be a tiny speedup, but it might be just noise). In > the VACUUM part, there's clear speedup when the data was already frozen > by COPY (Yes, those are zeroes, because it took less than 1 second.) > > So that looks pretty awesome, I guess. > > For the record, these tests were run on a server with NVMe SSD, so > hopefully reliable and representative numbers. > Thank you for the review. > A couple minor comments about the code: > > 2) I find the "if (all_frozen_set)" block a bit strange. It's a matter > of personal preference, but I'd just use a single level of nesting, i.e. > something like this: > > /* if everything frozen, the whole page has to be visible */ > Assert(!(all_frozen_set && !PageIsAllVisible(page))); > > /* > * If we've frozen everything on the page, and if we're already > * holding pin on the vmbuffer, record that in the visibilitymap. > * If we're not holding the pin, it's OK to skip this - it's just > * an optimization. > * > * It's fine to use InvalidTransactionId here - this is only used > * when HEAP_INSERT_FROZEN is specified, which intentionally > * violates visibility rules. > */ > if (all_frozen_set && > visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer)) > visibilitymap_set(...); > > IMHO it's easier to read, but YMMV. I've also reworded the comment a bit > to say what we're doing etc. And I've moved the comment from inside the > if block into the main comment - that was making it harder to read too. > I agree that it's a matter of taste. I've updated comments and left nesting unchanged to keep assertions simple. 
> > 3) I see RelationGetBufferForTuple does this: > > /* > * This is for COPY FREEZE needs. If page is empty, > * pin vmbuffer to set all_frozen bit > */ > if ((options & HEAP_INSERT_FROZEN) && > (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0)) > visibilitymap_pin(relation, BufferGetBlockNumber(buffer), > vmbuffer); > > so is it actually possible to get into the (all_frozen_set) without > holding a pin on the visibilitymap? I haven't investigated this so > maybe there are other ways to get into that code block. But the new > additions to hio.c get the pin too. > I was thinking that GetVisibilityMapPins() can somehow unset the pin. I gave it a second look. And now I don't think it's possible to get into this code block without a pin. So, I converted this check into an assertion. > > 4) In heap_xlog_multi_insert we now have this: > > if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) > PageClearAllVisible(page); > if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) > PageSetAllVisible(page); > > IIUC it makes no sense to have both flags at the same time, right? So > what about adding > > Assert(!(XLH_INSERT_ALL_FROZEN_SET && > XLH_INSERT_ALL_VISIBLE_CLEARED)); > > to check that? > Agree. I placed this assertion to the very beginning of the function. It also helped to simplify the code a bit. I also noticed, that we were not updating visibility map for all_frozen from heap_xlog_multi_insert. Fixed. > > I wonder what to do about the heap_insert - I know there are concerns it > would negatively impact regular insert, but is it really true? I suppose > this optimization would be valuable even for cases where multi-insert is > not possible. > Do we have something like INSERT .. FREEZE? I only see TABLE_INSERT_FROZEN set for COPY FREEZE and for matview operations. Can you explain, what use-case are we trying to optimize by extending this patch to heap_insert()? The new version is attached. I've also fixed a typo in the comment by Tatsuo Ishii suggestion. 
Also, I tested this patch with replication and found no issues. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Hi,

I started looking at this patch again, hoping to get it committed in this CF, but I think there's a regression in handling TOAST tables (compared to the v3 patch submitted by Pavan in February 2019).

The test I'm running is very simple (see test.sql):

1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible, both in the main table and its TOAST table

For the v3 patch (applied on top of 278584b526 and s/hi_options/ti_options) I get this:

              pages    NOT all_visible
    ------------------------------------------
    main        637                 0
    toast     50001                 3

There was some discussion about relcache invalidations causing a couple of TOAST pages not to be marked as all_visible, etc.

However, for this patch on master I get this:

              pages    NOT all_visible
    ------------------------------------------
    main        637                 0
    toast     50001             50001

So no pages in TOAST are marked as all_visible. I haven't investigated what's causing this, but IMO that needs fixing to make this patch RFC.

Attached is the test script I'm using, and a v5 of the patch - rebased on current master, with some minor tweaks to comments etc.

regards

-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 11.01.2021 01:35, Tomas Vondra wrote: > Hi, > > I started looking at this patch again, hoping to get it committed in > this CF, but I think there's a regression in handling TOAST tables > (compared to the v3 patch submitted by Pavan in February 2019). > > The test I'm running a very simple test (see test.sql): > > 1) start a transaction > 2) create a table with a text column > 3) copy freeze data into it > 4) use pg_visibility to see how many blocks are all_visible both in the > main table and it's TOAST table > > For v3 patch (applied on top of 278584b526 and > s/hi_options/ti_options) I get this: > > pages NOT all_visible > ------------------------------------------ > main 637 0 > toast 50001 3 > > There was some discussion about relcache invalidations causing a > couple TOAST pages not be marked as all_visible, etc. > > However, for this patch on master I get this > > pages NOT all_visible > ------------------------------------------ > main 637 0 > toast 50001 50001 > > So no pages in TOAST are marked as all_visible. I haven't investigated > what's causing this, but IMO that needs fixing to make ths patch RFC. > > Attached is the test script I'm using, and a v5 of the patch - rebased > on current master, with some minor tweaks to comments etc. > Thank you for attaching the test script. I reproduced the problem. This regression occurs because TOAST internally uses heap_insert(). You have asked upthread about adding this optimization to heap_insert(). I wrote a quick fix, see the attached patch 0002. The TOAST test passes now, but I haven't tested performance or any other use-cases yet. I'm going to test it properly in a couple of days and share results. With this change a lot of new code is repeated in heap_insert() and heap_multi_insert(). I think it's fine, because these functions already have a lot in common. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 1/11/21 10:00 PM, Anastasia Lubennikova wrote: > On 11.01.2021 01:35, Tomas Vondra wrote: >> Hi, >> >> I started looking at this patch again, hoping to get it committed in >> this CF, but I think there's a regression in handling TOAST tables >> (compared to the v3 patch submitted by Pavan in February 2019). >> >> The test I'm running a very simple test (see test.sql): >> >> 1) start a transaction >> 2) create a table with a text column >> 3) copy freeze data into it >> 4) use pg_visibility to see how many blocks are all_visible both in the >> main table and it's TOAST table >> >> For v3 patch (applied on top of 278584b526 and >> s/hi_options/ti_options) I get this: >> >> pages NOT all_visible >> ------------------------------------------ >> main 637 0 >> toast 50001 3 >> >> There was some discussion about relcache invalidations causing a >> couple TOAST pages not be marked as all_visible, etc. >> >> However, for this patch on master I get this >> >> pages NOT all_visible >> ------------------------------------------ >> main 637 0 >> toast 50001 50001 >> >> So no pages in TOAST are marked as all_visible. I haven't investigated >> what's causing this, but IMO that needs fixing to make ths patch RFC. >> >> Attached is the test script I'm using, and a v5 of the patch - rebased >> on current master, with some minor tweaks to comments etc. >> > > Thank you for attaching the test script. I reproduced the problem. This > regression occurs because TOAST internally uses heap_insert(). > You have asked upthread about adding this optimization to heap_insert(). > > I wrote a quick fix, see the attached patch 0002. The TOAST test passes > now, but I haven't tested performance or any other use-cases yet. > I'm going to test it properly in a couple of days and share results. > Thanks. I think it's important to make this work for TOAST tables - it often stores most of the data, and it was working in v3 of the patch. 
I haven't looked into the details, but if it's really just due to TOAST using heap_insert, I'd say it just confirms the importance of tweaking heap_insert too. > With this change a lot of new code is repeated in heap_insert() and > heap_multi_insert(). I think it's fine, because these functions already > have a lot in common. > Understood. IMHO a bit of redundancy is not a big issue, but I haven't looked at the code yet. Let's get it working first, then we can decide if some refactoring is appropriate. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12.01.2021 00:51, Tomas Vondra wrote: > > > On 1/11/21 10:00 PM, Anastasia Lubennikova wrote: >> On 11.01.2021 01:35, Tomas Vondra wrote: >>> Hi, >>> >>> I started looking at this patch again, hoping to get it committed in >>> this CF, but I think there's a regression in handling TOAST tables >>> (compared to the v3 patch submitted by Pavan in February 2019). >>> >>> The test I'm running a very simple test (see test.sql): >>> >>> 1) start a transaction >>> 2) create a table with a text column >>> 3) copy freeze data into it >>> 4) use pg_visibility to see how many blocks are all_visible both in the >>> main table and it's TOAST table >>> >>> For v3 patch (applied on top of 278584b526 and >>> s/hi_options/ti_options) I get this: >>> >>> pages NOT all_visible >>> ------------------------------------------ >>> main 637 0 >>> toast 50001 3 >>> >>> There was some discussion about relcache invalidations causing a >>> couple TOAST pages not be marked as all_visible, etc. >>> >>> However, for this patch on master I get this >>> >>> pages NOT all_visible >>> ------------------------------------------ >>> main 637 0 >>> toast 50001 50001 >>> >>> So no pages in TOAST are marked as all_visible. I haven't >>> investigated what's causing this, but IMO that needs fixing to make >>> ths patch RFC. >>> >>> Attached is the test script I'm using, and a v5 of the patch - >>> rebased on current master, with some minor tweaks to comments etc. >>> >> >> Thank you for attaching the test script. I reproduced the problem. >> This regression occurs because TOAST internally uses heap_insert(). >> You have asked upthread about adding this optimization to heap_insert(). >> >> I wrote a quick fix, see the attached patch 0002. The TOAST test >> passes now, but I haven't tested performance or any other use-cases yet. >> I'm going to test it properly in a couple of days and share results. >> > > Thanks. 
> I think it's important to make this work for TOAST tables - it
> often stores most of the data, and it was working in v3 of the patch.
> I haven't looked into the details, but if it's really just due to
> TOAST using heap_insert, I'd say it just confirms the importance of
> tweaking heap_insert too.

I've tested performance. All tests were run on my laptop, latest master with and without the patches, all default settings, except for disabled autovacuum and the installed pg_stat_statements extension.

The VACUUM is significantly faster with the patch, as it only checks the visibility map. COPY speed fluctuates a lot between tests, but I didn't notice any trend. I would expect a minor slowdown with the patch, as we need to handle visibility map pages during COPY FREEZE. But in some runs the patched version was faster than current master, so the impact of the patch is insignificant.

I ran 3 different tests:

1) Regular table (final size 5972 MB)

                                  patched      |     master
    COPY FREEZE data 3 times:
                              33384,544 ms        31135,037 ms
                              31666,226 ms        31158,294 ms
                              32802,783 ms        33599,234 ms
    VACUUM                        54,185 ms        48445,584 ms

2) Table with TOAST (final size 1207 MB, where 1172 MB is in the TOAST table)

                                  patched      |     master
    COPY FREEZE data 3 times:
                             368166,743 ms       383231,077 ms
                             398695,018 ms       454572,630 ms
                             410174,682 ms       567847,288 ms
    VACUUM                        43,159 ms         6648,302 ms

3) Table with a trivial BEFORE INSERT trigger (final size 5972 MB)

                                  patched      |     master
    COPY FREEZE data 3 times:
                              73544,225 ms        64967,802 ms
                              90960,584 ms        71826,362 ms
                              81356,025 ms        80023,041 ms
    VACUUM                        49,626 ms        40100,097 ms

I took another look at yesterday's patch and it looks fine to me. So now I am waiting for your review.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Thanks. These patches seem to resolve the TOAST table issue, freezing it as expected. I think the code duplication is not an issue, but I wonder why heap_insert uses this condition:

    /*
     * ...
     *
     * No need to update the visibilitymap if it had all_frozen bit set
     * before this insertion.
     */
    if (all_frozen_set && ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0))

while heap_multi_insert only does this:

    if (all_frozen_set) { ... }

I haven't looked at the details, but shouldn't both do the same thing?

I've done some benchmarks, comparing master and the patched version on a bunch of combinations (logged/unlogged, no before-insert trigger, trigger filtering everything/nothing). On master, the results are:

    group                        copy      vacuum
    -----------------------------------------------
    logged / no trigger          4672         162
    logged / trigger (all)       4914         162
    logged / trigger (none)      1219          11
    unlogged / no trigger        4132         156
    unlogged / trigger (all)     4292         156
    unlogged / trigger (none)    1275          11

and patched looks like this:

    group                        copy      vacuum
    -----------------------------------------------
    logged / no trigger          4669          12
    logged / trigger (all)       4874          12
    logged / trigger (none)      1233          11
    unlogged / no trigger        4054          11
    unlogged / trigger (all)     4185          12
    unlogged / trigger (none)    1222          10

This looks pretty nice - there are no regressions, just speedups in the vacuum step. The SQL script used is attached.
However, I've also repeated the test counting all-frozen pages in both the main table and the TOAST table, and I get this:

master
======

    select count(*) from pg_visibility((select reltoastrelid from pg_class
    where relname = 't'));

     count
    --------
     100000
    (1 row)

    select count(*) from pg_visibility((select reltoastrelid from pg_class
    where relname = 't')) where not all_visible;

     count
    --------
     100000
    (1 row)

patched
=======

    select count(*) from pg_visibility((select reltoastrelid from pg_class
    where relname = 't'));

     count
    --------
     100002
    (1 row)

    select count(*) from pg_visibility((select reltoastrelid from pg_class
    where relname = 't')) where not all_visible;

     count
    --------
          0
    (1 row)

That is - all TOAST pages are frozen (as expected, which is good). But now there are 100002 pages, not just 100000 pages. That is, we're now creating 2 extra pages, for some reason. I recall Pavan reported a similar issue with every 32768-th page not being properly filled, but I'm not sure if that's the same issue.

regards

-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 12.01.2021 22:30, Tomas Vondra wrote: > Thanks. These patches seem to resolve the TOAST table issue, freezing > it as expected. I think the code duplication is not an issue, but I > wonder why heap_insert uses this condition: > > /* > * ... > * > * No need to update the visibilitymap if it had all_frozen bit set > * before this insertion. > */ > if (all_frozen_set && ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)) > > while heap_multi_insert only does this: > > if (all_frozen_set) { ... } > > I haven't looked at the details, but shouldn't both do the same thing? I decided to add this check for heap_insert() to avoid unneeded calls of visibilitymap_set(). If we insert tuples one by one, we can only call this once per page. In my understanding, heap_multi_insert() inserts tuples in batches, so it doesn't need this optimization. > > > However, I've also repeated the test counting all-frozen pages in both > the main table and TOAST table, and I get this: > > patched > ======= > > select count(*) from pg_visibility((select reltoastrelid from pg_class > where relname = 't')); > > count > -------- > 100002 > (1 row) > > > select count(*) from pg_visibility((select reltoastrelid from pg_class > where relname = 't')) where not all_visible; > > count > -------- > 0 > (1 row) > > That is - all TOAST pages are frozen (as expected, which is good). But > now there are 100002 pages, not just 100000 pages. That is, we're now > creating 2 extra pages, for some reason. I recall Pavan reported > similar issue with every 32768-th page not being properly filled, but > I'm not sure if that's the same issue. > > > regards > As Pavan correctly figured it out before the problem is that RelationGetBufferForTuple() moves to the next page, losing free space in the block: > ... I see that a relcache invalidation arrives > after 1st and then after every 32672th block is filled. That clears the > rel->rd_smgr field and we lose the information about the saved target > block. 
> The code then moves to extend the relation again and thus skips the
> previously less-than-half filled block, losing the free space in that block.

The reason for this cache invalidation is the vm_extend() call, which happens every 32672 blocks.

RelationGetBufferForTuple() tries to use the last page, but for some reason this code is under the 'use_fsm' check. And COPY FROM doesn't use the FSM (see TABLE_INSERT_SKIP_FSM).

    /*
     * If the FSM knows nothing of the rel, try the last page before we
     * give up and extend. This avoids one-tuple-per-page syndrome during
     * bootstrapping or in a recently-started system.
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }

I think we can use this code without regard to 'use_fsm'. With this change, the number of TOAST rel pages is correct. The patch is attached.

-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On 1/16/21 4:11 PM, Anastasia Lubennikova wrote: > > ... > > As Pavan correctly figured it out before the problem is that > RelationGetBufferForTuple() moves to the next page, losing free space in > the block: > > > ... I see that a relcache invalidation arrives > > after 1st and then after every 32672th block is filled. That clears the > > rel->rd_smgr field and we lose the information about the saved target > > block. The code then moves to extend the relation again and thus > skips the > > previously less-than-half filled block, losing the free space in that > block. > > The reason of this cache invalidation is vm_extend() call, which happens > every 32762 blocks. > > RelationGetBufferForTuple() tries to use the last page, but for some > reason this code is under 'use_fsm' check. And COPY FROM doesn't use fsm > (see TABLE_INSERT_SKIP_FSM). > > > /* > * If the FSM knows nothing of the rel, try the last page > before we > * give up and extend. This avoids one-tuple-per-page syndrome > during > * bootstrapping or in a recently-started system. > */ > if (targetBlock == InvalidBlockNumber) > { > BlockNumber nblocks = RelationGetNumberOfBlocks(relation); > if (nblocks > 0) > targetBlock = nblocks - 1; > } > > > I think we can use this code without regard to 'use_fsm'. With this > change, the number of toast rel pages is correct. The patch is attached. > Thanks for the updated patch, this version looks OK to me - I've marked it as RFC. I'll do a bit more testing, review, and then I'll get it committed. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/16/21 11:18 PM, Tomas Vondra wrote: > ... > > Thanks for the updated patch, this version looks OK to me - I've marked > it as RFC. I'll do a bit more testing, review, and then I'll get it > committed. > Pushed. Thanks everyone for the effort put into this patch. The first version was sent in 2015, so it took quite a bit of time. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Pushed. Thanks everyone for the effort put into this patch. The first
> version was sent in 2015, so it took quite a bit of time.

Great news. Thanks to everyone who has been working on this.

Best regards,

-- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Pavan Deolasee <pavan.deolasee@gmail.com> writes: > On Mon, Jan 18, 2021 at 3:02 AM Tomas Vondra <tomas.vondra@enterprisedb.com> > wrote: >> Pushed. Thanks everyone for the effort put into this patch. The first >> version was sent in 2015, so it took quite a bit of time. > Thanks Tomas, Anastasia and everyone else who worked on the patch and > ensured that it gets into the tree. Buildfarm results suggest that the test case is unstable under CLOBBER_CACHE_ALWAYS: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-01-19%2020%3A27%3A46 This might mean an actual bug, or just that the test isn't robust. regards, tom lane
On 1/23/21 1:10 AM, Tom Lane wrote: > Pavan Deolasee <pavan.deolasee@gmail.com> writes: >> On Mon, Jan 18, 2021 at 3:02 AM Tomas Vondra <tomas.vondra@enterprisedb.com> >> wrote: >>> Pushed. Thanks everyone for the effort put into this patch. The first >>> version was sent in 2015, so it took quite a bit of time. > >> Thanks Tomas, Anastasia and everyone else who worked on the patch and >> ensured that it gets into the tree. > > Buildfarm results suggest that the test case is unstable under > CLOBBER_CACHE_ALWAYS: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-01-19%2020%3A27%3A46 > > This might mean an actual bug, or just that the test isn't robust. > Yeah :-( It seems I've committed the v5 patch, not the v7 addressing exactly this issue (which I've actually pointed out and asked to be fixed). Oh well ... I'll get this fixed tomorrow - I have the fix, and I have verified that it passes with CLOBBER_CACHE_ALWAYS, but pushing it at 5AM does not seem like a good idea. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, I've pushed the fix, after a couple of extra rounds of careful testing. I noticed that the existing pg_visibility regression tests don't check whether we freeze the TOAST rows too (failing to do that was one of the symptoms). It'd be good to add that, because it would fail even without CLOBBER_CACHE_ALWAYS, so attached is a test I propose to add. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company