Re: GIN pending list pages not recycled promptly (was Re: GIN improvements part 1: additional information) - Mailing list pgsql-hackers

From Amit Langote
Subject Re: GIN pending list pages not recycled promptly (was Re: GIN improvements part 1: additional information)
Date
Msg-id CA+HiwqFm7D2Vrpw4byLERS4306Btd4E6v6yq_MuKyiV1OxZs1g@mail.gmail.com
Whole thread Raw
In response to Re: GIN pending list pages not recycled promptly (was Re: GIN improvements part 1: additional information)  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
On Thu, Jun 19, 2014 at 2:09 PM, Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Jan 22, 2014 at 9:12 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>>
>> I think we need to add the deleted pages to the FSM more aggressively.
>>
>> I tried simply adding calls to RecordFreeIndexPage, after the list pages
>> have been marked as deleted, but unfortunately that didn't help. The problem
>> is that the FSM is organized into a three-level tree, and
>> RecordFreeIndexPage only updates the bottom level. The upper levels are not
>> updated until the FSM is vacuumed, so the pages are still not visible to
>> GetFreeIndexPage calls until next vacuum. The simplest fix would be to add a
>> call to IndexFreeSpaceMapVacuum after flushing the pending list, per
>> attached patch. I'm slightly worried about the performance impact of the
>> IndexFreeSpaceMapVacuum() call. It scans the whole FSM of the index, which
>> isn't exactly free. So perhaps we should teach RecordFreeIndexPage to update
>> the upper levels of the FSM in a retail-fashion instead.
>>
>
> I wonder if you pursued this further?
>
> You recently added a number of TODO items related to GIN index; is it
> worth adding this to the list?
>

In fact, I forgot to mention that I tried your patch and it seems to
be working for the particular example I am working with using pg_bigm
(bigram) (indexed data not so realistic maybe).

postgres=# CREATE TABLE test(contents text);
CREATE TABLE

postgres=# create or replace function rnd_str(length integer) returns text as
$$
declare chars text[] :=

'{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,
あ, い, う, え, お, か, き, く, け, こ, さ, し, す, せ, そ, た, ち, つ, て, と, な, に, ぬ,
ね, の, は, ひ, ふ, へ, ほ, ま, み, む, め, も, や, ゆ, よ, ら, り, る, れ, ろ, わ, ゐ, ゑ,
を}'; result text := ''; i integer := 0; arr_len integer;
begin chars := array_append(chars, ' '); arr_len := array_length(chars, 1); if length < 0 then   raise exception 'Given
lengthcannot be less than 0'; end if; for i in 1..length loop   result := result || chars[1+random()*(arr_len - 1)];
endloop; return result; 
end;
$$ language plpgsql;
CREATE FUNCTION


-- HEAD

postgres=# INSERT INTO test SELECT rnd_str((random() * 500)::int) FROM
generate_series(1, 100000);
INSERT 0 100000

Time: 76296.641 ms

postgres=# SELECT pg_size_pretty(pg_table_size('test'));pg_size_pretty
----------------49 MB
(1 row)

postgres=# CREATE INDEX test_bigm_idx ON test USING GIN (contents gin_bigm_ops);
CREATE INDEX
Time: 50517.912 ms

postgres=# SELECT pg_size_pretty(pg_relation_size('test_bigm_idx'));pg_size_pretty
----------------118 MB
(1 row)

postgres=# TRUNCATE test;
TRUNCATE TABLE

postgres=# INSERT INTO test SELECT rnd_str((random() * 500)::int) FROM
generate_series(1, 100000);
INSERT 0 100000
Time: 233369.366 ms

postgres=# SELECT pg_size_pretty(pg_relation_size('test_bigm_idx'));pg_size_pretty
----------------747 MB
(1 row)


-- Whereas, patched (gin-recycle-deleted-list-pages-1.patch) HEAD

postgres=# INSERT INTO test SELECT rnd_str((random() * 500)::int) FROM
generate_series(1, 100000);
INSERT 0 100000
Time: 32808.779 ms

postgres=# CREATE INDEX test_bigm_idx ON test USING GIN (contents gin_bigm_ops);
CREATE INDEX
Time: 24490.945 ms

postgres=# SELECT pg_size_pretty(pg_relation_size('test_bigm_idx'));pg_size_pretty
----------------118 MB
(1 row)

postgres=# TRUNCATE test;
TRUNCATE TABLE

postgres=# INSERT INTO test SELECT rnd_str((random() * 500)::int) FROM
generate_series(1, 100000);
INSERT 0 100000
Time: 153878.163 ms

postgres=# SELECT pg_size_pretty(pg_relation_size('test_bigm_idx'));pg_size_pretty
----------------119 MB
(1 row)

That sure looks good.

--
Amit



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Removing dependency to wsock32.lib when compiling code on WIndows
Next
From: Ian Barwick
Date:
Subject: Re: replication commands and log_statements