Re: Freeze avoidance of very large table. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Freeze avoidance of very large table.
Date
Msg-id CAA4eK1+aTdaSwG3u+y8fXxn67Kkj0T1KzRsFDLEi=tQvTYgFrQ@mail.gmail.com
Whole thread Raw
In response to Re: Freeze avoidance of very large table.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Freeze avoidance of very large table.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On Fri, Oct 30, 2015 at 6:03 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>
>
> v20 patch has a bug in result of regression test.
> Attached updated v21 patch.
>

Couple of more review comments:
------------------------------------------------------

1.
@@ -615,6 +617,8 @@ typedef struct PgStat_StatTabEntry
  PgStat_Counter n_dead_tuples;
  PgStat_Counter 
changes_since_analyze;
 
+ int32 n_frozen_pages;
+
  PgStat_Counter blocks_fetched;
  PgStat_Counter 
blocks_hit;

As you are changing above structure, you need to update
PGSTAT_FILE_FORMAT_ID, refer below code:
#define PGSTAT_FILE_FORMAT_ID 0x01A5BC9D

2. It seems that n_frozen_page is not initialized/updated properly
for toast tables:

Try with below steps:

postgres=# create table t4(c1 int, c2 text);
CREATE TABLE
postgres=# select oid, relname from pg_class where relname like '%t4%';
  oid  | relname
-------+---------
 16390 | t4
(1 row)


postgres=# select oid, relname from pg_class where relname like '%16390%';
  oid  |       relname
-------+----------------------
 16393 | pg_toast_16390
 16395 | pg_toast_16390_index
(2 rows)

postgres=# select relname,seq_scan,n_tup_ins,last_vacuum,n_frozen_page from pg_s
tat_all_tables where relname like '%16390%';
    relname     | seq_scan | n_tup_ins | last_vacuum | n_frozen_page
----------------+----------+-----------+-------------+---------------
 pg_toast_16390 |        1 |         0 |             |    -842150451
(1 row)

Note that I have tested above scenario on my Windows 7 m/c. 

3.
 * visibilitymap.c
 *  bitmap for tracking visibility of heap tuples

I think above needs to be changed to:
bitmap for tracking visibility and frozen state of heap tuples


4.
a.
  /*
- * If we froze any tuples, mark the buffer dirty, and write a WAL
- * record recording the changes.  We must log the changes to be
- * crash-safe against future truncation of CLOG.
+ * If we froze any tuples then we mark the buffer dirty, and write a WAL

b.
- * Release any remaining pin on visibility map page.
+ * Release any remaining pin on visibility map.

c.
  * We do update relallvisible even in the corner case, since if the table
- * is all-visible 
we'd definitely like to know that.  But clamp the value
- * to be not more than what we're setting 
relpages to.
+ * is all-visible we'd definitely like to know that.
+ * But clamp the value to be not more 
than what we're setting relpages to.

I don't think you need to change above comments.

5.
+ * Even if scan_all is set so far, we could skip to scan some pages
+ * according by all-frozen 
bit of visibility amp.

/according by/according to
/amp/map

I suggested to modify comment as below:
During full scan, we could skip some pages according to all-frozen
bit of visibility map.

Also no need to start this in new line, start from where the
previous line of comment ends.

6.
/*
 * lazy_scan_heap() -- scan an open heap relation
 *
 * This routine prunes each page in the 
heap, which will among other
 * things truncate dead tuples to dead line pointers, defragment the
 *
page, and set commit status bits (see heap_page_prune).  It also builds
 * lists of dead 
tuples and pages with free space, calculates statistics
 * on the number of live tuples in the 
heap, and marks pages as
 * all-visible if appropriate.

Modify above function header as:

all-visible, all-frozen

7.
lazy_scan_heap()
{
..

if (PageIsEmpty(page))
{
empty_pages++;
freespace = 
PageGetHeapFreeSpace(page);

/* empty pages are always all-visible */
if (!PageIsAllVisible(page))
..
}

Don't we need to ensure that empty pages should get marked as
all-frozen?

8.
lazy_scan_heap()
{
..
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if
* the page-
level bit is clear.  However, it's possible that the bit
* got cleared after we checked it 
and before we took the buffer
* content lock, so we must recheck before jumping to the conclusion
* that something bad has happened.
*/
else if (all_visible_according_to_vm 
&& !PageIsAllVisible(page)
&& visibilitymap_test(onerel, blkno, &vmbuffer, 
VISIBILITYMAP_ALL_VISIBLE))
{
elog(WARNING, "page is not marked all-visible 
but visibility map bit is set in relation \"%s\" page %u",
relname, blkno);
visibilitymap_clear(onerel, blkno, vmbuffer);
}

/*
It's possible for the value returned by GetOldestXmin() to move
* backwards, so it's not wrong for 
us to see tuples that appear to
* not be visible to everyone yet, while PD_ALL_VISIBLE is already
* set. The real safe xmin value never moves backwards, but
* GetOldestXmin() is 
conservative and sometimes returns a value
* that's unnecessarily small, so if we see that 
contradiction it just
* means that the tuples that we think are not visible to everyone yet
 * actually are, and the PD_ALL_VISIBLE flag is correct.
*
* There should never 
be dead tuples on a page with PD_ALL_VISIBLE
* set, however.
*/
else 
if (PageIsAllVisible(page) && has_dead_tuples)
{
elog(WARNING, "page 
containing dead tuples is marked as all-visible in relation \"%s\" page %u",
 
relname, blkno);
PageClearAllVisible(page);
MarkBufferDirty(buf);
visibilitymap_clear(onerel, blkno, vmbuffer);
}

..
}

I think both the above cases could happen for frozen state
as well, unless you think otherwise, we need similar handling
for frozen bit.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: konstantin knizhnik
Date:
Subject: eXtensible Transaction Manager API
Next
From: David Fetter
Date:
Subject: Re: ALTER ... OWNER TO ... vs. ALTER DEFAULT PRIVILEGES