Re: GIN data corruption bug(s) in 9.6devel - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: GIN data corruption bug(s) in 9.6devel
Date
Msg-id 568B8F15.9000407@2ndquadrant.com
Whole thread Raw
In response to Re: GIN data corruption bug(s) in 9.6devel  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: GIN data corruption bug(s) in 9.6devel  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 12/23/2015 09:33 PM, Jeff Janes wrote:
> On Mon, Dec 21, 2015 at 11:51 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>> On 12/21/2015 07:41 PM, Jeff Janes wrote:
>>>
>>> On Sat, Dec 19, 2015 at 3:19 PM, Tomas Vondra
>>> <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>> ...
>>
>>>> So both patches seem to do the trick, but (2) is faster. Not sure
>>>> if this is expected. (BTW all the results are without asserts
>>>> enabled).
>>>
>>>
>>> Do you know what the size of the pending list was at the end of each
>>> test?
>>>
>>> I think last one may be faster because it left a large mess behind
>>> that someone needs to clean up later.
>>
>>
>> No. How do I measure it?
>
> pageinspect's gin_metapage_info, or pgstattuple's pgstatginindex

Hmmm, so this turns out not very useful, because at the end the data I 
get from gin_metapage_info is almost exactly the same for both patches 
(more details below).

>>
>>>
>>> Also, do you have the final size of the indexes in each case?
>>
>>
>> No, I haven't realized the patches do affect that, so I haven't measured it.
>
> There shouldn't be a difference between the two approaches (although I
> guess there could be if one left a larger pending list than the other,
> as pending lists is very space inefficient), but since you included
> 9.5 in your test I thought it would be interesting to see how either
> patched version under 9.6 compared to 9.5.

Well, turns out there's a quite significant difference, actually. The 
index sizes I get (quite stable after multiple runs):
   9.5 : 2428 MB   9.6 + alone cleanup : 730 MB   9.6 + pending lock : 488 MB

So that's quite a significant difference, I guess. The load duration for 
each version look like this:
   9.5                 : 1415 seconds   9.6 + alone cleanup : 1310 seconds   9.6 + pending lock  : 1380 seconds

I'd say I'm happy with sacrificing ~5% of time in exchange for ~35% 
reduction of index size.

The size of the index on 9.5 after VACUUM FULL (so pretty much the 
smallest index possible) is 440MB, which suggests the "pending lock" 
patch does a quite good job.

The gin_metapage_info at the end of one of the runs (pretty much all the 
runs look exactly the same) looks like this:
                  pending lock   alone cleanup      9.5
-------------------------------------------------------- pending_head                2               2   310460
pending_tail             338             345   310806 tail_free_size            812             812      812
n_pending_pages          330             339      347 n_pending_tuples         1003            1037     1059
n_total_pages              2               2        2 n_entry_pages               1               1        1
n_data_pages               0               0        0 n_entries                   0               0        0 version
                2               2        2
 

So almost no difference, except for the pending_* attributes, and even 
in that case the values are only different for 9.5 branch. Not sure what 
conclusion to draw from this - maybe it's necessary to collect the 
function input while the load is running (but that'd be tricky to 
process, I guess).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Vitaly Burovoy
Date:
Subject: Re: Extracting fields from 'infinity'::TIMESTAMP[TZ]
Next
From: "Shulgin, Oleksandr"
Date:
Subject: Inconsistent error handling in START_REPLICATION command