Re: [PATCHES] GIN improvements - Mailing list pgsql-hackers

From: Teodor Sigaev
Subject: Re: [PATCHES] GIN improvements
Msg-id: 4974B002.3040202@sigaev.ru
In response to: Re: [PATCHES] GIN improvements (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: [PATCHES] GIN improvements (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
Changes:
  Results of the pending-list scan are now placed directly into the resulting
tidbitmap. This saves the cycles spent filtering results and reduces memory
usage. It also removes the need to check whether the tbm is lossy.


> Is this a 100% bulletproof solution, or is it still possible for a query
> to fail due to the pending list? It relies on the stats collector, so
> perhaps in rare cases it could still fail?
Yes :(

> Can you explain why the tbm must not be lossy?

The problem with a lossy tbm has two aspects:
  - The amgettuple interface has no way to work with a page-wide result instead
    of exact ItemPointers: amgettuple cannot return just a block number the way
    amgetbitmap can.
  - Concurrent vacuum: while we scan the pending list, its content could be
    transferred into the regular structure of the index, and then we would find
    the same tuple twice. Again, amgettuple has no protection from that; only
    amgetbitmap does. So we need to filter the results from the regular GIN
    structure by the results from the pending list, and for that filtering we
    can't use a lossy tbm.
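The filtering problem above can be shown with a toy sketch in Python (this is not PostgreSQL's actual TIDBitmap code; the class, the eviction policy, and the memory limit are all invented for illustration). Once a page's entries are collapsed into a single lossy page-level marker, membership of an individual ItemPointer on that page can no longer be decided, so exact duplicate filtering becomes impossible:

```python
# Toy model of a TID bitmap that degrades to lossy (page-level) storage
# when it exceeds its memory budget. All names and limits are invented.

class ToyTidBitmap:
    def __init__(self, max_exact_tids=4):
        self.max_exact_tids = max_exact_tids
        self.exact = set()        # (block, offset) pairs stored exactly
        self.lossy_pages = set()  # blocks whose offsets were discarded

    def add(self, block, offset):
        self.exact.add((block, offset))
        if len(self.exact) > self.max_exact_tids:
            # Over budget: collapse the lowest-numbered block to a lossy
            # entry, discarding its exact offsets (toy eviction policy).
            victim = min(b for b, _ in self.exact)
            self.exact = {t for t in self.exact if t[0] != victim}
            self.lossy_pages.add(victim)

    def contains(self, block, offset):
        # For a lossy page we only know "maybe": the offsets are gone,
        # so an exact membership test is indeterminate.
        if block in self.lossy_pages:
            return None
        return (block, offset) in self.exact

tbm = ToyTidBitmap(max_exact_tids=2)
tbm.add(1, 1)
tbm.add(1, 2)
tbm.add(2, 1)   # exceeds the budget, so block 1 becomes lossy
```

After the third insert, `tbm.contains(1, 1)` returns None rather than a definite answer, which is exactly why a lossy tbm cannot be used to filter out tuples already returned from the pending list.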

v0.21 prevents that failure on a call of gingetbitmap, because now all results
are collected into a single resulting tidbitmap.



> Also, can you clarify why a large update can cause a problem? In the

If the query looks like
UPDATE tbl SET col=... WHERE col ...
and the planner chooses a GIN index scan over col, then there is a chance that
the pending list grows past the non-lossy limit.
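The failure mode can be sketched with made-up numbers (the real cap is derived from work_mem, not the constant used here): while such an UPDATE scans the index, every updated row inserts its new version into the pending list, so the list can outgrow the size that still fits a non-lossy bitmap before the scan finishes.

```python
# Toy illustration (all numbers invented): an UPDATE whose WHERE clause
# uses the same GIN index keeps appending new row versions to the
# pending list while the index scan is still running.

NON_LOSSY_LIMIT = 1000        # stand-in for the work_mem-based cap

pending_list = 0
for updated_row in range(5000):   # rows matched and updated by the query
    pending_list += 1             # each new row version is appended

# The pending list now exceeds what a non-lossy bitmap can represent,
# which is the point at which an amgettuple-based scan would fail.
overflowed = pending_list > NON_LOSSY_LIMIT
```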


> previous discussion, you suggested that it force normal index inserts
> after a threshold based on work_mem:
>
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00065.php

I see only two guaranteed solutions to the problem:
- Once the limit is reached, force normal index inserts. One motivation for the
patch was the frequent question from users: why is updating a whole table with
a GIN index so slow? So this way would not resolve that question.
- Once the limit is reached, force a cleanup of the pending list by calling
gininsertcleanup. Not very good, because users will sometimes see a huge
execution time for a simple insert, although users who run a huge update should
be satisfied.

I have difficulty choosing between them. It seems to me the second way is
better: if a user sees a very long insertion time, then (auto)vacuum on his
installation should be tweaked.


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
