Thread: Dead Space Map version 3 (simplified)

Dead Space Map version 3 (simplified)

From
ITAGAKI Takahiro
Date:
Attached is an updated DSM patch. I've left the core function of DSM only
and dropped other complicated features in this release.

VACUUM finishes faster with the patch, but that is to be expected: DSM vacuum
sweeps only pages that have many dead tuples, so some dead tuples remain
after vacuum.

I'll examine the sweep behavior and the performance next.


* Features
  - DSM tracks pages worth vacuuming using 1 bit per page.
    The threshold is two dead tuples or 2kB of dead space.
  - DSM is constructed at page flush. Almost all of the work is done by
    bgwriter if it is properly configured.
  - 'VACUUM' command uses DSM. 'VACUUM ALL' always scans all pages.
  - This includes the n_dead_tuples statistics fix.
      http://momjian.us/mhonarc/patches/msg00002.html
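
A minimal sketch of the one-bit-per-page idea described in the list above. It is not taken from the patch: the class and function names are hypothetical, and it assumes the thresholds are inclusive and that meeting either one is enough to mark a page.

```python
# Hypothetical names throughout; this is not code from the patch.
DEAD_TUPLE_THRESHOLD = 2          # "two dead tuples"
DEAD_SPACE_THRESHOLD = 2 * 1024   # "2kB of dead space"


def worth_vacuuming(dead_tuples, dead_bytes):
    # Assumption: thresholds are inclusive and either one suffices.
    return (dead_tuples >= DEAD_TUPLE_THRESHOLD
            or dead_bytes >= DEAD_SPACE_THRESHOLD)


class DeadSpaceMap:
    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.bits = bytearray((n_pages + 7) // 8)  # 1 bit per heap page

    def record_flush(self, pageno, dead_tuples, dead_bytes):
        # Called when a page is written out; marks it if worth vacuuming.
        if worth_vacuuming(dead_tuples, dead_bytes):
            self.bits[pageno // 8] |= 1 << (pageno % 8)

    def is_set(self, pageno):
        return bool(self.bits[pageno // 8] & (1 << (pageno % 8)))

    def clear(self, pageno):
        # Called after vacuum has cleaned the page.
        self.bits[pageno // 8] &= ~(1 << (pageno % 8)) & 0xFF

    def pages_to_vacuum(self):
        # Pages a DSM-driven VACUUM would visit, in block-number order.
        return [p for p in range(self.n_pages) if self.is_set(p)]
```

With this sketch, a page flushed with one dead tuple and little dead space is never marked, which matches the note above that some dead tuples remain after a DSM vacuum.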

* Configuration
  - max_dsm_relations (=1000)
        Counterpart to max_fsm_relations, but counts tables only;
        indexes are not tracked by DSM.
  - max_dsm_pages (=1024000)
        Counterpart to max_fsm_pages. The default is set to
        5 times max_fsm_pages at initdb.
  - min_dsm_target (=8MB)
        Minimum size of tables whose dead space is tracked,
        to avoid tracking small tables, including system catalogs.
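
As a rough check of what these defaults imply, the arithmetic can be sketched as follows. The 8kB block size is an assumption (the PostgreSQL default), the function names are hypothetical, and the figures count only the bitmap itself, ignoring any per-relation or per-chunk bookkeeping the patch may add.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size


def dsm_bitmap_bytes(max_dsm_pages):
    # One bit per tracked page, rounded up to whole bytes.
    return (max_dsm_pages + 7) // 8


def tracked_heap_bytes(max_dsm_pages):
    # Total heap size that max_dsm_pages bits can cover.
    return max_dsm_pages * BLOCK_SIZE


def min_target_pages(min_dsm_target_bytes):
    # Smallest table, in pages, that DSM would track.
    return min_dsm_target_bytes // BLOCK_SIZE


def implied_max_fsm_pages(max_dsm_pages):
    # The default is described as 5 times max_fsm_pages.
    return max_dsm_pages // 5
```

Under these assumptions, max_dsm_pages = 1024000 needs 128000 bytes of bitmap and covers 8000 MB of heap, min_dsm_target = 8MB corresponds to 1024 pages, and the implied max_fsm_pages is 204800.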

* Limitation
  - XID-wraparound vacuum is still required. VACUUM with DSM cannot
    update relfrozenxid, so we sometimes need a full scan.
  - No recovery support. All contents of DSM and FSM are lost on crash.
  - DSM uses fixed-size memory allocated at server start. We cannot change
    the size on the fly. If we want that ability, we need something like
    a shared-memory allocator or a swap-supported memory management module.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment

Re: Dead Space Map version 3 (simplified)

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Dead Space Map version 3 (simplified)

From
"Pavan Deolasee"
Date:

On 3/30/07, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
> Attached is an updated DSM patch. I've left the core function of DSM only
> and dropped other complicated features in this release.


I was testing this patch when I got this server crash. The patch is applied
on the current CVS HEAD. I thought you would be interested in this.

The patch worked for smaller scaling factors, and the crash is reproducible.

Test: pgbench -s 90 -i -F 95 postgres

Stack:

(gdb) bt
#0  0x001d37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00213955 in raise () from /lib/tls/libc.so.6
#2  0x00215319 in abort () from /lib/tls/libc.so.6
#3  0x082dc04f in ExceptionalCondition (conditionName=0x83a7ad7 "!(victim)", errorType=0x83a7622 "FailedAssertion",
    fileName=0x83a7487 "deadspace.c", lineNumber=1080) at assert.c:51
#4  0x0821eb29 in dsm_create_chunk (dsmrel=0xb7bcd744, key=0xbff589c0) at deadspace.c:1080
#5  0x0821d473 in dsm_record_state (rnode=0xaee02698, pageno=98304, state=DSM_LOW) at deadspace.c:333
#6  0x0821d29e in DegradeDeadSpaceState (rel=0xaee02698, buffer=10645) at deadspace.c:254
#7  0x0817b542 in lazy_scan_heap (onerel=0xaee02698, vacrelstats=0x9f5f2e0, Irel=0x9f5f4dc, nindexes=1, iter=0x9f5f57c)
    at vacuumlazy.c:586
#8  0x0817a733 in lazy_vacuum_rel (onerel=0xaee02698, vacstmt=0x9f39c94) at vacuumlazy.c:209
#9  0x08174e5c in vacuum_rel (relid=16388, vacstmt=0x9f39c94, expected_relkind=114 'r') at vacuum.c:1107
#10 0x0817421c in vacuum (vacstmt=0x9f39c94, relids=0x0, isTopLevel=1 '\001') at vacuum.c:401
#11 0x0823d90b in ProcessUtility (parsetree=0x9f39c94, queryString=0x9f62f94 "vacuum analyze", params=0x0,
    isTopLevel=1 '\001', dest=0x9f39cf0, completionTag=0xbff5a040 "") at utility.c:929
#12 0x0823bdd6 in PortalRunUtility (portal=0x9f60f8c, utilityStmt=0x9f39c94, isTopLevel=1 '\001', dest=0x9f39cf0,
    completionTag=0xbff5a040 "") at pquery.c:1170
#13 0x0823bf0a in PortalRunMulti (portal=0x9f60f8c, isTopLevel=1 '\001', dest=0x9f39cf0, altdest=0x9f39cf0,
    completionTag=0xbff5a040 "") at pquery.c:1262
#14 0x0823b6df in PortalRun (portal=0x9f60f8c, count=2147483647, isTopLevel=1 '\001', dest=0x9f39cf0, altdest=0x9f39cf0,
    completionTag=0xbff5a040 "") at pquery.c:809
#15 0x082365df in exec_simple_query (query_string=0x9f399d4 "vacuum analyze") at postgres.c:956
#16 0x08239e43 in PostgresMain (argc=4, argv=0x9ecfc94, username=0x9ecfc64 "perf") at postgres.c:3503
#17 0x08204e84 in BackendRun (port=0x9ee3628) at postmaster.c:2987
#18 0x08204493 in BackendStartup (port=0x9ee3628) at postmaster.c:2614
#19 0x0820228b in ServerLoop () at postmaster.c:1214
#20 0x08201c66 in PostmasterMain (argc=3, argv=0x9ecdc50) at postmaster.c:967
#21 0x081a9e0b in main (argc=3, argv=0x9ecdc50) at main.c:188


--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Re: Dead Space Map version 3 (simplified)

From
ITAGAKI Takahiro
Date:
Thank you for reporting!

I noticed that I need to examine more closely the case where DSM relations
or DSM chunks are exhausted. I'll do more tests for DSM.


"Pavan Deolasee" <pavan.deolasee@gmail.com> wrote:

> I was testing this patch when I got this server crash. The patch is applied
> on the current CVS HEAD. I thought you would be interested in this.
>
> The patch worked for smaller scaling factors, and the crash is reproducible.
>
> Test: pgbench -s 90 -i -F 95 postgres
>
> #3  0x082dc04f in ExceptionalCondition (conditionName=0x83a7ad7 "!(victim)",
> errorType=0x83a7622 "FailedAssertion",
>     fileName=0x83a7487 "deadspace.c", lineNumber=1080) at assert.c:51

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: Dead Space Map version 3 (simplified)

From
Heikki Linnakangas
Date:
ITAGAKI Takahiro wrote:
> Attached is an updated DSM patch. I've left the core function of DSM only
> and dropped other complicated features in this release.

We discussed it a long time ago already, but I really wish the DSM
didn't need a fixed-size shared memory area. It's one more thing the
DBA needs to tune manually. It also means we need to have an algorithm
for deciding what to keep in the DSM and what to leave out. And I don't
see a good way to extend the current approach to implement the
index-only-scans that we've been talking about, and the same goes for
recovery. :(

The way you update the DSM is quite interesting. When a page is dirtied,
the BM_DSM_DIRTY flag is set in the buffer descriptor. The corresponding
bit in the DSM is set lazily in FlushBuffer whenever BM_DSM_DIRTY is
set. That's a clever way to avoid contention on updates. But does it
work for tables that have a small hot part that's updated very
frequently? That's exactly the scenario where the DSM is the most
useful. Hot pages stay in the buffer cache because they're frequently
accessed, which means that FlushBuffer isn't getting called for them and
the bits in the DSM aren't getting set until checkpoint. This could lead
to unnecessary bloating of the hot part. A straightforward fix would be
to scan the buffer cache for buffers marked with BM_DSM_DIRTY to update
the DSM before starting the vacuum scan.
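
The lazy update and the proposed fix can be illustrated with a small simulation. This is only a sketch of the behavior described above, not the patch's code: `BufferPool`, its methods, and the use of a Python set for the DSM are all hypothetical stand-ins for the buffer descriptors, FlushBuffer, and the DSM bitmap.

```python
class BufferPool:
    """Toy model: page number -> BM_DSM_DIRTY-style flag for cached pages."""

    def __init__(self):
        self.dsm_dirty = {}   # cached pages and their pending-DSM flag
        self.dsm = set()      # pages currently marked in the DSM

    def dirty_page(self, pageno):
        # Dirtying a page only sets a flag in the buffer descriptor;
        # the shared DSM is not touched, so there is no contention.
        self.dsm_dirty[pageno] = True

    def flush_buffer(self, pageno):
        # The DSM bit is set lazily, when the page is written out.
        if self.dsm_dirty.get(pageno):
            self.dsm.add(pageno)
            self.dsm_dirty[pageno] = False

    def sync_dsm_before_vacuum(self):
        # Proposed fix: before the vacuum scan, sweep the buffer cache
        # and push every pending flag into the DSM.
        for pageno, pending in self.dsm_dirty.items():
            if pending:
                self.dsm.add(pageno)
                self.dsm_dirty[pageno] = False
```

In this model a hot page that is dirtied repeatedly but never flushed stays out of the DSM until `sync_dsm_before_vacuum` runs, which is the gap the extra scan is meant to close.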

It might not be a problem in practice, but it bothers me that the DSM
isn't 100% accurate. A page with dead tuples on it can end up marked as
non-dirty in the DSM, at least when the page is vacuumed while it still
holds RECENTLY_DEAD tuples that become dead later on. There might be
other scenarios as well.

If I'm reading the code correctly, DSM makes no attempt to keep the
chunks ordered by block number. If that's the case, vacuum needs to be
modified because it currently relies on the fact that blocks are scanned
and the dead tuple list is therefore populated in order.
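
To show why the ordering matters, here is a hedged sketch. It assumes the dead tuple list is later probed with a binary search, which only works on a sorted list. The chunk layout (plain lists of block numbers) and both function names are hypothetical simplifications.

```python
import bisect


def blocks_in_scan_order(chunks):
    # chunks: DSM chunks in arbitrary order, each a list of block numbers.
    # Sorting restores the ascending block order that vacuum expects.
    return sorted(block for chunk in chunks for block in chunk)


def tid_is_dead(sorted_dead_tids, tid):
    # Binary search over (block, offset) pairs; correct only if the
    # list was populated in ascending order.
    i = bisect.bisect_left(sorted_dead_tids, tid)
    return i < len(sorted_dead_tids) and sorted_dead_tids[i] == tid
```

Visiting blocks in the order returned by `blocks_in_scan_order` keeps the collected dead tuple list sorted without changing the lookup side. Sorting the list after collection would be an alternative.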

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Dead Space Map version 3 (simplified)

From
Hiroki Kataoka
Date:
Heikki Linnakangas wrote:
> The way you update the DSM is quite interesting. When a page is dirtied,
> the BM_DSM_DIRTY flag is set in the buffer descriptor. The corresponding
> bit in the DSM is set lazily in FlushBuffer whenever BM_DSM_DIRTY is
> set. That's a clever way to avoid contention on updates. But does it
> work for tables that have a small hot part that's updated very
> frequently?

I don't think that is a problem.  Bloat produces pages containing
unneeded space that will no longer be accessed.  Those pages will soon
be registered in the DSM.

Or do you assume that all pages are accessed equally even as the table grows?

--
Hiroki Kataoka <kataoka@interwiz.jp>

Re: Dead Space Map version 3 (simplified)

From
Gregory Stark
Date:
"Hiroki Kataoka" <kataoka@interwiz.jp> writes:

> I don't think that is a problem.  Bloat produces pages containing
> unneeded space that will no longer be accessed.  Those pages will soon
> be registered in the DSM.

Except the whole point of the DSM is to let us vacuum those pages *before*
that happens...

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Dead Space Map version 3 (simplified)

From
Hiroki Kataoka
Date:
Gregory Stark wrote:
> "Hiroki Kataoka" <kataoka@interwiz.jp> writes:
>
>> I don't think that is a problem.  Bloat produces pages containing
>> unneeded space that will no longer be accessed.  Those pages will soon
>> be registered in the DSM.
>
> Except the whole point of the DSM is to let us vacuum those pages *before*
> that happens...

You are right.  However, insisting on perfection often costs
performance.  Delaying some processing can improve it.

Even if a hot page is not vacuumed, dead tuples are not generated
without bound.  For a single hot page, the amount of dead tuples that
persists unnecessarily is at most about one page.  That page is also
soon registered in the DSM by a checkpoint, which acts as a fail-safe.

Isn't some compromise needed for the first version of DSM vacuum?

--
Hiroki Kataoka <kataoka@interwiz.jp>

Re: Dead Space Map version 3 (simplified)

From
Bruce Momjian
Date:
This needs additional changes for memory management, and we don't have
time to do that for 8.3. Sorry.

This has been saved for the 8.4 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +