Re: Vacuuming big btree indexes without pages with deleted items - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Vacuuming big btree indexes without pages with deleted items
Date
Msg-id 5519CCF2.9080006@BlueTreble.com
Whole thread Raw
In response to Vacuuming big btree indexes without pages with deleted items  (Vladimir Borodin <root@simply.name>)
Responses Re: Vacuuming big btree indexes without pages with deleted items
List pgsql-hackers
On 3/27/15 5:15 AM, Vladimir Borodin wrote:
> Hi all.
>
> I have described [0] a problem with delaying replicas after vacuuming a
> relation with big btree index. It stucks in replaying WAL record of
> type XLOG_BTREE_VACUUM like that (with lastBlockVacuumed 0):
>
> rmgr: Btree       len (rec/tot):     20/    52, tx:          0, lsn:
> 4115/56126DC0, prev 4115/56126D90, bkp: 0000, desc: vacuum: rel
> 1663/16420/16796; blk 31222118, lastBlockVacuumed 0
>
> Master writes this record to xlog in btvacuumscan [1] function after
> vacuuming of all index pages. And in case of no pages with deleted items
> xlog record would contain lastBlockVacuumed 0.
>
> In btree_xlog_vacuum [2] replica reads all blocks from lastBlockVacuumed
> to last block of the index while applying this record because there is
> no api in the buffer manager to understand if the page is unpinned.
>
> So if the index is quite big (200+ GB in described case) it takes much
> time to do it. So the questions are:
>
> 1. Aren’t there still any api in buffer manager to understand that the
> page is not in shared_buffers without reading it?
I don't know offhand, but since XLogReadBufferExtended already has a 
mode argument it wouldn't be too hard to add it there.

> 2. Is it possible not to write to xlog record with lastBlockVacuumed 0
> in some cases? For example, in case of not deleting any pages.

Possibly, but that's much higher risk. Without studying it, if we wanted 
to mess around with that it might actually make more sense to XLOG a set 
of blkno's that got vacuumed, but I suspect that wouldn't be a win.

> Or maybe there are some better ways of improving this situation?
>
> [0]
> http://www.postgresql.org/message-id/FE82A9A7-0D52-41B5-A9ED-967F6927CB8A@simply.name
> [1]
>
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813
> [2]
>
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482
>
> --
> May the force be with you…
> https://simply.name
>


-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Ignoring entries generated by autoconf in code tree
Next
From: Michael Paquier
Date:
Subject: Re: Exposing PG_VERSION_NUM in pg_config