Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL - Mailing list pgsql-hackers

From Robert Haas
Subject Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
Date
Msg-id CA+TgmoYBnQfS4ACVmAq9i4KQ+EOJo053oHhsfBkWEnLNCbsu5Q@mail.gmail.com
Whole thread Raw
In response to Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL  (Bruce Momjian <bruce@momjian.us>)
Responses Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL  (Bruce Momjian <bruce@momjian.us>)
Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Wed, Nov 27, 2013 at 4:33 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Sat, Jan 12, 2013 at 02:14:03PM -0500, Kevin Grittner wrote:
>> Amit Kapila wrote:
>> > On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:
>>
>> >> Surely VACUUM FULL should rebuild the visibility map, and make
>> >> tuples in the new relation all-visible, no?
>>
>> Certainly it seems odd to me that VACUUM FULL leaves the the table
>> in a less-well maintained state in terms of visibility than a
>> "normal" vacuum. VACUUM FULL should not need to be followed by
>> another VACUUM.
>>
>> > I think it cannot made all visible.
>>
>> I don't think all tuples in the relation are necessarily visible to
>> all transactions, but the ones which are should probably be flagged
>> that way.
>
> I have developed the attached proof-of-concept patch to fix the problem
> of having no visibility map after CLUSTER or VACUUM FULL.  I tested with
> these queries:
>
>         CREATE TABLE test(x INT PRIMARY KEY);
>         INSERT INTO test VALUES (1);
>         VACUUM FULL test; -- or CLUSTER
>         SELECT relfilenode FROM pg_class WHERE relname = 'test';
>          relfilenode
>         -------------
>                16399
>
> Then 'ls -l data/base/16384/16399*' to see the *_vm file.  I am not sure
> how to test that the vm contents are valid.
>
> This patch is fairly tricky because our CLUSTER/VACUUM FULL behavior
> does not do writes through the shared buffer cache, as outlined in this
> C comment block:
>
>  * We can't use the normal heap_insert function to insert into the new
>  * heap, because heap_insert overwrites the visibility information.
>  * We use a special-purpose raw_heap_insert function instead, which
>  * is optimized for bulk inserting a lot of tuples, knowing that we have
>  * exclusive access to the heap.  raw_heap_insert builds new pages in
>  * local storage.  When a page is full, or at the end of the process,
>  * we insert it to WAL as a single record and then write it to disk
>  * directly through smgr.  Note, however, that any data sent to the new
>  * heap's TOAST table will go through the normal bufmgr.
>
> I originally tried to do this higher up in the stack but ran into
> problems because I couldn't access the new heap page so I had to do it
> at the non-shared-buffer page level.  I reused the lazy vacuum routines.
>
> I need to know this is the right approach, and need to know what things
> are wrong or missing.

The fact that you've needed to modify SetHintBits() to make this work
is pretty nasty.  I'm not exactly sure what to do about that, but it
doesn't seem good.

This patch also has the desirable but surprising consequence that hint
bits will be set as a side effect of update_page_vm(), which means
that you'd better do that BEFORE checksumming the page.

I wonder if we ought to mark each page as all-visible in
raw_heap_insert() when we first initialize it, and then clear the flag
when we come across a tuple that isn't all-visible.  We could try to
set hint bits on the tuple before placing it on the page, too, though
I'm not sure of the details.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: new unicode table border styles for psql
Next
From: Bruce Momjian
Date:
Subject: Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block