Re: Freeze avoidance of very large table. - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Freeze avoidance of very large table. |
Date | |
Msg-id | 20151105.180329.151649663.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Freeze avoidance of very large table. (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Freeze avoidance of very large table.
(Masahiko Sawada <sawada.mshk@gmail.com>)
|
List | pgsql-hackers |
Hello, I had a look on v21 patch. Though I haven't looked the whole of the patch, I'd like to show you some comments only for visibilitymap.c and a part of the documentation. 1. Patch application patch command complains about offsets for heapam.c at current master. 2. visitibilymap_test() - if (visibilitymap_test(rel, blkno, &vmbuffer)) + if (visibilitymap_test(rel, blkno, &vmbuffer, VISIBILITYMAP_ALL_VISIBLE) The old VM was a simple bitmap so the name _test and thefunction are proper but now the bitmap is quad state so it'd bebetterchainging the function. Alghough it is not so expensiveto call it twice successively, it is a bit uneasy for me doingso.One possible shape would be like the following. lazy_vacuum_page()> int vmstate = visibilitymap_get_status(rel, blkno, &vmbuffer);> if (!(vmstate & VISIBILITYMAP_ALL_VISIBLE))> ...> if (all_frozen && !(vmstate & VISIBILITYMAP_ALL_FROZEN))> ...> if (flags != vmstate)> visibilitymap_set(...., flags); and defining two macros for indivisual tests, > #define VM_ALL_VISIBLE(r, b, v) ((vm_get_status((r), (b), (v)) & .._VISIBLE) != 0)> if (VM_ALL_VISIBLE(rel, blkno, &vmbuffer))and>if (VM_ALL_FROZEN(rel, blkno, &vmbuffer)) How about this? 3. visibilitymap.c - HEAPBLK_TO_MAPBIT In visibilitymap_clear and other functions, mapBit means mapDualBit in the patch, and mapBit always appears in the form"mapBit * BITS_PER_HEAPBLOCK". So, it'd be better to change the definition of HEAPBLK_TO_MAPBIT so that it calculatesreally the bit position in a byte. - #define HEAPBLK_TO_MAPBIT(x) ((x) % HEAPBLOCKS_PER_BYTE) + #define HEAPBLK_TO_MAPBIT(x) (((x) % HEAPBLOCKS_PER_BYTE) *BITS_PER_HEAPBLOCK) - visibilitymap_count() The third argument all_frozen is not necessary in some usage. So this interface would be preferable to be as following, BlockNumber visibilitymap_count(Relation rel, BlockNumber *all_frozen) { BlockNumber all_visible = 0; ... if (all_frozen) *all_frozen = 0; ... something like ... - visibilitymap_set() The check for ALL_VISIBLE is duplicate in the following assertion. > Assert((flags & VISIBILITYMAP_ALL_VISIBLE) || > (flags & (VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN))); 4. documentation - 18.11.1 Statement Hehavior A typo. > VACUUM performs *a* aggressive freezing However I am not a fluent English speaker, and such wordsmithing would be done by someone else, I feel that "eager/greedy"is more suitable for this meaning.., nevertheless, the term "whole-table freezing" that you wrote elsewherein this patch would be usable. "VACUUM performs a whole-table freezing" All "a table scan/sweep"s and something has the similar meaning would be better be changed to "a whole-table freezing" In similar manner, "tuples/rows that are marked as frozen" could be replaced with "unfrozen tuples/rows". - 23.1.5 Preventing Transaction ID Wraparound Failures "The whole table is scanned only when all pages happen to require vacuuming to remove dead row versions." This description looks a bit out-of-point. "the whole table scan" in the original description is what is triggered by relfrozenxid so the correspondent in the revised description is "the whole-table freezing", maybe. "The whole-table feezing takes place when <structfield>relfrozenxid</> is more than <varname>vacuum_freeze_table_age</>transactions old or when <command>VACUUM</>'s <literal>FREEZE</> option is used. The whole-table freezing scans all unfreezed pages." The last sentence might be unnecessary. - 63.4 Visibility Map "pages contain only tuples that are marked as frozen" would be enough to be "pages contain only frozen tuples" and according to the discussion upthread, we might be good to have some desciption that the name is historically omitting the aspect of freezemap. At Sat, 31 Oct 2015 18:07:32 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1+aTdaSwG3u+y8fXxn67Kkj0T1KzRsFDLEi=tQvTYgFrQ@mail.gmail.com> amit.kapila16> On Fri, Oct 30, 2015 at 6:03 AM, Masahiko Sawada <sawada.mshk@gmail.com> > Couple of more review comments: > ------------------------------------------------------ > > 1. > @@ -615,6 +617,8 @@ typedef struct PgStat_StatTabEntry > PgStat_Counter n_dead_tuples; > PgStat_Counter > changes_since_analyze; > > + int32 n_frozen_pages; > + > PgStat_Counter blocks_fetched; > PgStat_Counter > blocks_hit; > > As you are changing above structure, you need to update > PGSTAT_FILE_FORMAT_ID, refer below code: > #define PGSTAT_FILE_FORMAT_ID 0x01A5BC9D > > 2. It seems that n_frozen_page is not initialized/updated properly > for toast tables: > > Try with below steps: > > postgres=# create table t4(c1 int, c2 text); > CREATE TABLE > postgres=# select oid, relname from pg_class where relname like '%t4%'; > oid | relname > -------+--------- > 16390 | t4 > (1 row) > > > postgres=# select oid, relname from pg_class where relname like '%16390%'; > oid | relname > -------+---------------------- > 16393 | pg_toast_16390 > 16395 | pg_toast_16390_index > (2 rows) > > postgres=# select relname,seq_scan,n_tup_ins,last_vacuum,n_frozen_page from > pg_s > tat_all_tables where relname like '%16390%'; > relname | seq_scan | n_tup_ins | last_vacuum | n_frozen_page > ----------------+----------+-----------+-------------+--------------- > pg_toast_16390 | 1 | 0 | | -842150451 > (1 row) > > Note that I have tested above scenario on my Windows 7 m/c. > > 3. > * visibilitymap.c > * bitmap for tracking visibility of heap tuples > > I think above needs to be changed to: > bitmap for tracking visibility and frozen state of heap tuples > > > 4. > a. > /* > - * If we froze any tuples, mark the buffer dirty, and write a WAL > - * record recording the changes. We must log the changes to be > - * crash-safe against future truncation of CLOG. > + * If we froze any tuples then we mark the buffer dirty, and write a WAL > > b. > - * Release any remaining pin on visibility map page. > + * Release any remaining pin on visibility map. > > c. > * We do update relallvisible even in the corner case, since if the table > - * is all-visible > we'd definitely like to know that. But clamp the value > - * to be not more than what we're setting > relpages to. > + * is all-visible we'd definitely like to know that. > + * But clamp the value to be not more > than what we're setting relpages to. > > I don't think you need to change above comments. > > 5. > + * Even if scan_all is set so far, we could skip to scan some pages > + * according by all-frozen > bit of visibility amp. > > /according by/according to > /amp/map > > I suggested to modify comment as below: > During full scan, we could skip some pages according to all-frozen > bit of visibility map. > > Also no need to start this in new line, start from where the > previous line of comment ends. > > 6. > /* > * lazy_scan_heap() -- scan an open heap relation > * > * This routine prunes each page in the > heap, which will among other > * things truncate dead tuples to dead line pointers, defragment the > * > page, and set commit status bits (see heap_page_prune). It also builds > * lists of dead > tuples and pages with free space, calculates statistics > * on the number of live tuples in the > heap, and marks pages as > * all-visible if appropriate. > > Modify above function header as: > > all-visible, all-frozen > > 7. > lazy_scan_heap() > { > .. > > if (PageIsEmpty(page)) > { > empty_pages++; > freespace = > PageGetHeapFreeSpace(page); > > /* empty pages are always all-visible */ > if (!PageIsAllVisible(page)) > .. > } > > Don't we need to ensure that empty pages should get marked as > all-frozen? > > 8. > lazy_scan_heap() > { > .. > /* > * As of PostgreSQL 9.2, the visibility map bit should never be set if > * the page- > level bit is clear. However, it's possible that the bit > * got cleared after we checked it > and before we took the buffer > * content lock, so we must recheck before jumping to the conclusion > * that something bad has happened. > */ > else if (all_visible_according_to_vm > && !PageIsAllVisible(page) > && visibilitymap_test(onerel, blkno, &vmbuffer, > VISIBILITYMAP_ALL_VISIBLE)) > { > elog(WARNING, "page is not marked all-visible > but visibility map bit is set in relation \"%s\" page %u", > relname, blkno); > visibilitymap_clear(onerel, blkno, vmbuffer); > } > > /* > * > It's possible for the value returned by GetOldestXmin() to move > * backwards, so it's not wrong for > us to see tuples that appear to > * not be visible to everyone yet, while PD_ALL_VISIBLE is already > * set. The real safe xmin value never moves backwards, but > * GetOldestXmin() is > conservative and sometimes returns a value > * that's unnecessarily small, so if we see that > contradiction it just > * means that the tuples that we think are not visible to everyone yet > * actually are, and the PD_ALL_VISIBLE flag is correct. > * > * There should never > be dead tuples on a page with PD_ALL_VISIBLE > * set, however. > */ > else > if (PageIsAllVisible(page) && has_dead_tuples) > { > elog(WARNING, "page > containing dead tuples is marked as all-visible in relation \"%s\" page %u", > > relname, blkno); > PageClearAllVisible(page); > MarkBufferDirty(buf); > visibilitymap_clear(onerel, blkno, vmbuffer); > } > > .. > } > > I think both the above cases could happen for frozen state > as well, unless you think otherwise, we need similar handling > for frozen bit. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: