Re: Freeze avoidance of very large table. - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Freeze avoidance of very large table.
Date
Msg-id CAD21AoCCq_R-fbdXSXXEV8jAM9C2CD4R=jGu0Z-Y7K7bbFt=xw@mail.gmail.com
Whole thread Raw
In response to Re: Freeze avoidance of very large table.  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Sat, Oct 3, 2015 at 12:23 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Masahiko Sawada wrote:
>

Thank you for taking time to review this feature.
Attached the latest version patch (v15).


>> @@ -2972,10 +2981,15 @@ l1:
>>        */
>>       PageSetPrunable(page, xid);
>>
>> +     /* clear PD_ALL_VISIBLE and PD_ALL_FORZEN flags */
>
> Typo "FORZEN".

Fixed.

>
>>       if (PageIsAllVisible(page))
>>       {
>>               all_visible_cleared = true;
>> +
>> +             /* all-frozen information is also cleared at the same time */
>>               PageClearAllVisible(page);
>> +             PageClearAllFrozen(page);
>
> I wonder if it makes sense to have a macro to clear both in unison,
> which seems a very common pattern.
>

Fixed.

>
>> diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
>> index 7c38772..a284b85 100644
>> --- a/src/backend/access/heap/visibilitymap.c
>> +++ b/src/backend/access/heap/visibilitymap.c
>> @@ -21,33 +21,45 @@
>>   *
>>   * NOTES
>>   *
>> - * The visibility map is a bitmap with one bit per heap page. A set bit means
>> - * that all tuples on the page are known visible to all transactions, and
>> - * therefore the page doesn't need to be vacuumed. The map is conservative in
>> - * the sense that we make sure that whenever a bit is set, we know the
>> - * condition is true, but if a bit is not set, it might or might not be true.
>> + * The visibility map is a bitmap with two bits (all-visible and all-frozen)
>> + * per heap page. A set all-visible bit means that all tuples on the page are
>> + * known visible to all transactions, and therefore the page doesn't need to
>> + * be vacuumed. A set all-frozen bit means that all tuples on the page are
>> + * completely frozen, and therefore the page doesn't need to be vacuumed even
>> + * if whole table scanning vacuum is required (e.g. anti-wraparound vacuum).
>> + * A all-frozen bit must be set only when the page is already all-visible.
>> + * That is, all-frozen bit is always set with all-visible bit.
>
> "A all-frozen" -> "The all-frozen" (but "A set all-xyz" is correct).

Fixed.

>
>>   * When we *set* a visibility map during VACUUM, we must write WAL.  This may
>>   * seem counterintuitive, since the bit is basically a hint: if it is clear,
>> - * it may still be the case that every tuple on the page is visible to all
>> - * transactions; we just don't know that for certain.  The difficulty is that
>> - * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
>> - * on the page itself, and the visibility map bit.  If a crash occurs after the
>> - * visibility map page makes it to disk and before the updated heap page makes
>> - * it to disk, redo must set the bit on the heap page.  Otherwise, the next
>> - * insert, update, or delete on the heap page will fail to realize that the
>> - * visibility map bit must be cleared, possibly causing index-only scans to
>> - * return wrong answers.
>> + * it may still be the case that every tuple on the page is visible or frozen
>> + * to all transactions; we just don't know that for certain.  The difficulty is
>> + * that there are two bits which are typically set together: the PD_ALL_VISIBLE
>> + * or PD_ALL_FROZEN bit on the page itself, and the visibility map bit.  If a
>> + * crash occurs after the visibility map page makes it to disk and before the
>> + * updated heap page makes it to disk, redo must set the bit on the heap page.
>> + * Otherwise, the next insert, update, or delete on the heap page will fail to
>> + * realize that the visibility map bit must be cleared, possibly causing index-only
>> + * scans to return wrong answers.
>
> In the "The difficulty ..." para, I would add the word "corresponding" before
> "visibility".  Otherwise, it is not clear what the plural means exactly.

Fixed.

>>   * VACUUM will normally skip pages for which the visibility map bit is set;
>>   * such pages can't contain any dead tuples and therefore don't need vacuuming.
>> - * The visibility map is not used for anti-wraparound vacuums, because
>> + * The visibility map is not used for anti-wraparound vacuums before 9.5, because
>>   * an anti-wraparound vacuum needs to freeze tuples and observe the latest xid
>>   * present in the table, even on pages that don't have any dead tuples.
>> + * 9.6 or later, the visibility map has a additional bit which indicates all tuple
>> + * on single page has been completely forzen, so the visibility map is also used for
>> + * anti-wraparound vacuums.
>
> This should not mention database versions.  Just explain how the code
> behaves today, not how it behaved in the past.  Those who want to
> understand how it behaved in 9.5 can read the 9.5 code.  (Again typo
> "forzen".)

Changed these comment.
Sorry for the same typo frequently..

>> @@ -1115,6 +1187,14 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
>>                                               tups_vacuumed, vacuumed_pages)));
>>
>>       /*
>> +      * This information would be effective for how much effect all-frozen bit
>> +      * of VM had for freezing tuples.
>> +      */
>> +     ereport(elevel,
>> +                     (errmsg("Skipped %d frozen pages acoording to visibility map",
>> +                                     vacrelstats->vmskipped_frozen_pages)));
>
> Message must start on lowercase letter.  I don't understand what the
> comment means.  Can you rephrase it?

Fixed.

>> @@ -1779,10 +1873,12 @@ vac_cmp_itemptr(const void *left, const void *right)
>>  /*
>>   * Check if every tuple in the given page is visible to all current and future
>>   * transactions. Also return the visibility_cutoff_xid which is the highest
>> - * xmin amongst the visible tuples.
>> + * xmin amongst the visible tuples, and all_forzen which implies that all tuples
>> + * of this page are frozen.
>
> Typo "forzen" here again.

Fixed.

>> @@ -201,6 +239,110 @@ copy_file(const char *srcfile, const char *dstfile, bool force)
>>  #endif
>>
>>
>> +/*
>> + * rewriteVisibilitymap()
>> + *
>> + * A additional bit which indicates that all tuples on page is completely
>> + * frozen is added into visibility map at PG 9.6. So the format of visibiilty
>> + * map has been changed.
>> + * Copies a visibility map file while adding all-frozen bit(0) into each bit.
>> + */
>> +static const char *
>> +rewriteVisibilitymap(const char *fromfile, const char *tofile, bool force)
>> +{
>> +#define REWRITE_BUF_SIZE (50 * BLCKSZ)
>> +#define BITS_PER_HEAPBLOCK 2
>> +
>> +     int                     src_fd, dst_fd;
>> +     uint16          vm_bits;
>> +     ssize_t         nbytes;
>> +     char            *buffer;
>> +     int                     ret = 0;
>> +     int                     save_errno = 0;
>> +
>> +     if ((fromfile == NULL) || (tofile == NULL))
>> +     {
>> +             errno = EINVAL;
>> +             return getErrorText(errno);
>> +     }
>> +
>> +     if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
>> +             return getErrorText(errno);
>> +
>> +     if ((dst_fd = open(tofile, O_RDWR | O_CREAT | (force ? 0 : O_EXCL), S_IRUSR | S_IWUSR)) < 0)
>> +     {
>> +             save_errno = errno;
>> +             if (src_fd != 0)
>> +                     close(src_fd);
>> +
>> +             errno = save_errno;
>> +             return getErrorText(errno);
>> +     }
>> +
>> +     buffer = (char *) pg_malloc(REWRITE_BUF_SIZE);
>> +
>> +     /* Copy page header data in advance */
>> +     if ((nbytes = read(src_fd, buffer, MAXALIGN(SizeOfPageHeaderData))) <= 0)
>> +     {
>> +             save_errno = errno;
>> +             return getErrorText(errno);
>> +     }
>
> Not clear why you bother with save_errno in this path.  Forgot to
> close()?  (Though I wonder why you bother to close() if the program is
> going to exit shortly thereafter anyway.)

Fixed.

>> diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
>> index 13aa891..fc92a5f 100644
>> --- a/src/bin/pg_upgrade/pg_upgrade.h
>> +++ b/src/bin/pg_upgrade/pg_upgrade.h
>> @@ -112,6 +112,11 @@ extern char *output_files[];
>>  #define VISIBILITY_MAP_CRASHSAFE_CAT_VER 201107031
>>
>>  /*
>> + * The format of visibility map changed with this 9.6 commit,
>> + *
>> + */
>> +#define VISIBILITY_MAP_FROZEN_BIT_CAT_VER 201509181
>
> Useless empty line in comment.

Fixed.

>> diff --git a/src/common/relpath.c b/src/common/relpath.c
>> index 66dfef1..52ff14e 100644
>> --- a/src/common/relpath.c
>> +++ b/src/common/relpath.c
>> @@ -30,6 +30,9 @@
>>   * If you add a new entry, remember to update the errhint in
>>   * forkname_to_number() below, and update the SGML documentation for
>>   * pg_relation_size().
>> + * 9.6 or later, the visibility map fork name is changed from "vm" to
>> + * "vfm" bacause visibility map has not only information about all-visible
>> + * but also information about all-frozen.
>>   */
>>  const char *const forkNames[] = {
>>       "main",                                         /* MAIN_FORKNUM */
>
> Drop the change in comment?  There's no "vfm" in this version of the
> patch, is there?

Fixed.

Regards,

--
Masahiko Sawada

Attachment

pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Request for dogfood volunteers (was No Issue Tracker - Say it Ain't So!)
Next
From: Jeff Janes
Date:
Subject: Re: No Issue Tracker - Say it Ain't So!