Re: [PATCHES] GIN improvements - Mailing list pgsql-hackers

From: Teodor Sigaev
Subject: Re: [PATCHES] GIN improvements
Msg-id: 4974B002.3040202@sigaev.ru
In response to: Re: [PATCHES] GIN improvements (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: [PATCHES] GIN improvements (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
Changes:
  Results of the pending-list scan are now placed directly into the resulting
tidbitmap. This saves the cycles spent filtering results and reduces memory
usage. It also removes the need to check whether the tbm is lossy.


> Is this a 100% bulletproof solution, or is it still possible for a query
> to fail due to the pending list? It relies on the stats collector, so
> perhaps in rare cases it could still fail?
Yes :(

> Can you explain why the tbm must not be lossy?

The problem with a lossy tbm has two aspects:
  - The amgettuple interface has no way to work with a page-wide result instead
    of exact ItemPointers: amgettuple cannot return just a block number the way
    amgetbitmap can.
  - Concurrent vacuum: while we scan the pending list, its content could be
    transferred into the regular structure of the index, and then we would find
    the same tuple twice. Again, amgettuple has no protection from that; only
    amgetbitmap does. So we need to filter the results from the regular GIN
    structure by the results from the pending list, and for that filtering we
    can't use a lossy tbm.
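The filtering problem above can be shown with a toy sketch in Python (this is not PostgreSQL's actual TIDBitmap code; the class, the eviction policy, and the memory limit are all invented for illustration). Once a page's entries are collapsed into a single lossy page-level marker, membership of an individual ItemPointer on that page can no longer be decided, so exact duplicate filtering becomes impossible:

```python
# Toy model of a TID bitmap that degrades to lossy (page-level) storage
# when it exceeds its memory budget. All names and limits are invented.

class ToyTidBitmap:
    def __init__(self, max_exact_tids=4):
        self.max_exact_tids = max_exact_tids
        self.exact = set()        # (block, offset) pairs stored exactly
        self.lossy_pages = set()  # blocks whose offsets were discarded

    def add(self, block, offset):
        self.exact.add((block, offset))
        if len(self.exact) > self.max_exact_tids:
            # Over budget: collapse the lowest-numbered block to a lossy
            # entry, discarding its exact offsets (toy eviction policy).
            victim = min(b for b, _ in self.exact)
            self.exact = {t for t in self.exact if t[0] != victim}
            self.lossy_pages.add(victim)

    def contains(self, block, offset):
        # For a lossy page we only know "maybe": the offsets are gone,
        # so an exact membership test is indeterminate.
        if block in self.lossy_pages:
            return None
        return (block, offset) in self.exact

tbm = ToyTidBitmap(max_exact_tids=2)
tbm.add(1, 1)
tbm.add(1, 2)
tbm.add(2, 1)   # exceeds the budget, so block 1 becomes lossy
```

After the third insert, `tbm.contains(1, 1)` returns None rather than a definite answer, which is exactly why a lossy tbm cannot be used to filter out tuples already returned from the pending list.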

v0.21 prevents that failure on a call of gingetbitmap, because now all results
are collected into a single resulting tidbitmap.



> Also, can you clarify why a large update can cause a problem? In the

If the query looks like
UPDATE tbl SET col=... WHERE col ...
and the planner chooses a GIN index scan over col, then there is a chance that
the pending list grows past the non-lossy limit.
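The failure mode can be sketched with made-up numbers (the real cap is derived from work_mem, not the constant used here): while such an UPDATE scans the index, every updated row inserts its new version into the pending list, so the list can outgrow the size that still fits a non-lossy bitmap before the scan finishes.

```python
# Toy illustration (all numbers invented): an UPDATE whose WHERE clause
# uses the same GIN index keeps appending new row versions to the
# pending list while the index scan is still running.

NON_LOSSY_LIMIT = 1000        # stand-in for the work_mem-based cap

pending_list = 0
for updated_row in range(5000):   # rows matched and updated by the query
    pending_list += 1             # each new row version is appended

# The pending list now exceeds what a non-lossy bitmap can represent,
# which is the point at which an amgettuple-based scan would fail.
overflowed = pending_list > NON_LOSSY_LIMIT
```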


> previous discussion, you suggested that it force normal index inserts
> after a threshold based on work_mem:
>
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php

I see only two guaranteed solutions to the problem:
- Once the limit is reached, force normal index inserts. One motivation for the
patch was the frequent question from users: why is updating a whole table with
a GIN index so slow? So this way would not resolve that question.
- Once the limit is reached, force a cleanup of the pending list by calling
gininsertcleanup. Not very good, because users will sometimes see a huge
execution time for a simple insert, although users who run a huge update should
be satisfied.

I have difficulty choosing between them. It seems to me the second way is
better: if a user sees a very long insertion time, then (auto)vacuum on his
installation should be tweaked.


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
