Re: [PATCHES] GIN improvements - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: [PATCHES] GIN improvements
Date
Msg-id 1229910966.2285.76.camel@jdavis
Whole thread Raw
In response to Re: [PATCHES] GIN improvements  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: [PATCHES] GIN improvements
List pgsql-hackers
On Fri, 2008-12-12 at 20:36 +0300, Teodor Sigaev wrote:
> Changes:
>   - added vacuum_delay_point() in gininsertcleanup
>   - add trigger to run vacuum by number of inserted tuples from
>     last vacuum. Number of inserted tuples represents number
>     of really inserted tuples in index and it is calculated as
>     tuples_inserted + tuples_updated - tuples_hot_updated.
>     Trigger fires on instuples > vac_base_thresh because search time is linear
>     on number of pending pages (tuples)

Hi,

Comments:

1.

You use something like the following in a few places:

START_CRIT_SECTION();
...
l = PageAddItem(...);
if (l == InvalidOffsetNumber)
    elog(ERROR, "failed to add item to index page in \"%s\"",
         RelationGetRelationName(index));

It's no use using ERROR, because it will turn into PANIC, which is
obviously unacceptable. It looks to me like those conditions can't
happen anyway, so it's probably better to add a comment explaining why
it can't happen, and Assert().

2. It appears to be properly triggering autovacuum when only inserts
happen, so I think that issue is solved.

3. Simple performance result with autovacuum off:

create table random(i int[]);
insert into random select ARRAY[(random() * 10)::int, (random() *
10)::int, (random() * 10)::int, (random() * 10)::int, (random() *
10)::int, (random() * 10)::int, (random() * 10)::int, (random() *
10)::int, (random() * 10)::int, (random() * 10)::int] from
generate_series(1, 1000000);

\timing on
drop table foo;
create table foo(i int[]);
create index foogin on foo using gin (i);
insert into foo select i from random;
vacuum foo;

Without patch:
  INSERT: 71s
  VACUUM:  2s
  total:  73s

With patch:
  INSERT: 33s
  VACUUM: 12s
  total:  45s

So, there is a performance advantage. This was just a quick test to make
sure the numbers matched my expectations.

4. Heikki mentioned:
http://archives.postgresql.org/pgsql-hackers/2008-11/msg01832.php

"To make things worse, a query will fail if all the matching
fast-inserted tuples don't fit in the non-lossy tid bitmap."

That issue still remains, correct? Is there a resolution to that?

5. I attached a newer version merged with HEAD.

6. You defined:

#define GinPageHasFullRow(page) ( GinPageGetOpaque(page)->flags &
GIN_LIST_FULLROW )

But in many places you still do the same check without using that macro.
The macro has only one call site, so I suggest either removing the macro
entirely, or using it every place you check that flag.

7.  I don't understand this chunk of code:

ItemPointerData item = pos->item;

if ( scanGetCandidate(scan, pos) == false || !
ItemPointerEquals(&pos->item, &item) )
    elog(ERROR,"Could not process tuple");  /* XXX should not be here !
*/

How can (!ItemPointerEquals(&pos->item, &item)) ever happen?

And how can (scanGetCandidate(scan, pos) == false) ever happen? Should
that be an Assert() instead?

If those can happen during normal operation, then we need a better error
message there.

Regards,
    Jeff Davis

Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: GIN index build speed
Next
From: ITAGAKI Takahiro
Date:
Subject: Re: generic reloptions improvement