Thread: GIN, partial matches, lossy bitmaps

GIN, partial matches, lossy bitmaps

From

Heikki Linnakangas

Date:

02 March 2009, 18:17:08

While reading the GIN code, I just rediscovered that the GIN partial 
match support suffers from the same problem that I criticized the "fast 
insert" patch about, and searching the archives I found that I already 
complained about that back in April:

http://archives.postgresql.org/pgsql-patches/2008-04/msg00157.php

If I'm reading the code correctly, item pointers of all matching heap 
tuples are first collected into a TIDBitmap, and then amgetnext returns 
tuples from that one by one. If the bitmap becomes lossy, an error is 
thrown. gingetbitmap is a dummy implementation: it creates a new 
TIDBitmap and inserts all the tuples from the other TIDBitmap into it 
one by one, and then returns the new TIDBitmap.

If we remove the support for regular, non-bitmap, index scans with GIN, 
that could be cleaned up as well. Even if we don't do that, gingetbitmap 
should not error when the bitmap becomes lossy, but just return the 
lossy bitmap.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: GIN, partial matches, lossy bitmaps

From

Jeff Davis

Date:

02 March 2009, 19:14:32

On Mon, 2009-03-02 at 21:14 +0200, Heikki Linnakangas wrote:
> If I'm reading the code correctly, item pointers of all matching heap 
> tuples are first collected into a TIDBitmap, and then amgetnext returns 
> tuples from that one by one. If the bitmap becomes lossy, an error is 
> thrown. gingetbitmap is a dummy implementation: it creates a new 
> TIDBitmap and inserts all the tuples from the other TIDBitmap into it 
> one by one, and then returns the new TIDBitmap.

Do you think that might be the cause of the extra startup overhead that
Robert Haas observed for bitmap scans?

> If we remove the support for regular, non-bitmap, index scans with GIN, 
> that could be cleaned up as well. Even if we don't do that, gingetbitmap 
> should not error when the bitmap becomes lossy, but just return the 
> lossy bitmap.

That sounds reasonable to me.

Regards,Jeff Davis

Re: GIN, partial matches, lossy bitmaps

From

Robert Haas

Date:

03 March 2009, 00:57:35

On Mon, Mar 2, 2009 at 2:14 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> While reading the GIN code, I just rediscovered that the GIN partial match
> support suffers from the same problem that I criticized the "fast insert"
> patch about, and searching the archives I found that I already complained
> about that back in April:
>
> http://archives.postgresql.org/pgsql-patches/2008-04/msg00157.php
>
> If I'm reading the code correctly, item pointers of all matching heap tuples
> are first collected into a TIDBitmap, and then amgetnext returns tuples from
> that one by one. If the bitmap becomes lossy, an error is thrown.
> gingetbitmap is a dummy implementation: it creates a new TIDBitmap and
> inserts all the tuples from the other TIDBitmap into it one by one, and then
> returns the new TIDBitmap.

The latest version of the path no longer does this - instead, it
flushes the pending list to the main index if the bitmap becomes
lossy.  That strikes me as more tolerable than throwing an error, but
I agree with your criticism: I'm not sure why we are insisting on
using a TIDBitmap (which is designed to be lossy at times) in a
situation where we actually can't tolerate lossiness.  In fact, this
was the main point of my original review of this patch.

http://archives.postgresql.org/message-id/603c8f070902101859j91fb78eg7e0228afe8f2fd36@mail.gmail.com

> If we remove the support for regular, non-bitmap, index scans with GIN, that
> could be cleaned up as well. Even if we don't do that, gingetbitmap should
> not error when the bitmap becomes lossy, but just return the lossy bitmap.

Make sense to me.

...Robert

Re: GIN, partial matches, lossy bitmaps

From

Robert Haas

Date:

03 March 2009, 01:03:32

On Mon, Mar 2, 2009 at 3:02 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2009-03-02 at 21:14 +0200, Heikki Linnakangas wrote:
>> If I'm reading the code correctly, item pointers of all matching heap
>> tuples are first collected into a TIDBitmap, and then amgetnext returns
>> tuples from that one by one. If the bitmap becomes lossy, an error is
>> thrown. gingetbitmap is a dummy implementation: it creates a new
>> TIDBitmap and inserts all the tuples from the other TIDBitmap into it
>> one by one, and then returns the new TIDBitmap.
>
> Do you think that might be the cause of the extra startup overhead that
> Robert Haas observed for bitmap scans?

I don't think this is the same thing.  My point was that an index scan
wins big time over a bitmap index scan when the index scan doesn't
need to be run to completion - that is, when the query is a semi-join
or an anti-join, or when using LIMIT without ORDER BY.
This is true with or without Teodor's patch, and is the reason why I'm
not sure that removing index scan support is such a great idea.

...Robert

Re: GIN, partial matches, lossy bitmaps

From

Teodor Sigaev

Date:

05 March 2009, 12:58:35

> If we remove the support for regular, non-bitmap, index scans with GIN,
> that could be cleaned up as well. Even if we don't do that, gingetbitmap
> should not error when the bitmap becomes lossy, but just return the
> lossy bitmap.
Changes since 28.2
(http://archives.postgresql.org/message-id/499B0FFA.8040608@sigaev.ru)

- fixes/changes pointed by Robert
(http://archives.postgresql.org/pgsql-hackers/2009-02/msg00987.php)
- gingetbitmap will never throw error about lossiness of bitmap, it will return
lossy bitmap even it was a prefix search.
- remove tbm_check_tuple/tbm_has_lossy/tbm_max_non_lossy methods because they
become unused
- add new method tbm_add_page(TIDBitmap*, BlockNumber) to add the whole page to
the TIDBitmap.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

fast_insert_gin-0.29.2.gz

Re: GIN, partial matches, lossy bitmaps

From

Tom Lane

Date:

05 March 2009, 22:35:45

Teodor Sigaev <teodor@sigaev.ru> writes:
> Changes since 28.2
> (http://archives.postgresql.org/message-id/499B0FFA.8040608@sigaev.ru)

> - fixes/changes pointed by Robert
> (http://archives.postgresql.org/pgsql-hackers/2009-02/msg00987.php)
> - gingetbitmap will never throw error about lossiness of bitmap, it will return
> lossy bitmap even it was a prefix search.
> - remove tbm_check_tuple/tbm_has_lossy/tbm_max_non_lossy methods because they
> become unused
> - add new method tbm_add_page(TIDBitmap*, BlockNumber) to add the whole page to
> the TIDBitmap.

I cleaned up and applied the planner part of this, since that seems
reasonably useful in its own right for experimental index AMs,
regardless of where we settle out for GIN.  (The "cleanup" mostly
consisted of fixing it to not make extra calls to find_usable_indexes
--- that's an expensive function, and there's no very good reason to
run it another time rather than separating out the indexes afterwards.)

Attached is the remainder of the patch with relatively minor fixes.
The main change I made is to get rid of the changes in gincostestimate;
I agree with Robert that it's probably inappropriate to consider the
current pending-list size during planning.  I haven't really reviewed
any of the rest of it; this is just to have a clean patch against HEAD.

            regards, tom lane

Attachment

msg-27106-2.bin

Re: GIN, partial matches, lossy bitmaps

From

Robert Haas

Date:

06 March 2009, 01:51:14

On Thu, Mar 5, 2009 at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Attached is the remainder of the patch with relatively minor fixes.
> The main change I made is to get rid of the changes in gincostestimate;
> I agree with Robert that it's probably inappropriate to consider the
> current pending-list size during planning.  I haven't really reviewed
> any of the rest of it; this is just to have a clean patch against HEAD.

The changes to config.sgml are not good English and contain
typographical errors.  It could also be a bit more informatiave, maybe
something like:

This parameter also specifies the number of insert or updated tuples
needed to trigger <command>VACUUM</> on a <acronym>GIN</acronym>
index.   <acronym>GIN</acronym> indexes require <command>VACUUM</>
after insert or update operations because newly inserted tuples are
initially stored in an unsorted pending list.

I still think removing index scans entirely is short-sighted - but I
may be outvoted (then again, no one other than Tom has really
expressed an opinion one way or the other, and I initially agreed with
him until I thought about the performance aspects some more).

...Robert

Re: GIN, partial matches, lossy bitmaps

From

Oleg Bartunov

Date:

13 March 2009, 11:56:39

On Thu, 5 Mar 2009, Robert Haas wrote:

> On Thu, Mar 5, 2009 at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Attached is the remainder of the patch with relatively minor fixes.
>> The main change I made is to get rid of the changes in gincostestimate;
>> I agree with Robert that it's probably inappropriate to consider the
>> current pending-list size during planning.  I haven't really reviewed
>> any of the rest of it; this is just to have a clean patch against HEAD.
>
> The changes to config.sgml are not good English and contain
> typographical errors.  It could also be a bit more informatiave, maybe
> something like:
>
> This parameter also specifies the number of insert or updated tuples
> needed to trigger <command>VACUUM</> on a <acronym>GIN</acronym>
> index.   <acronym>GIN</acronym> indexes require <command>VACUUM</>
> after insert or update operations because newly inserted tuples are
> initially stored in an unsorted pending list.

thanks, will update docs.

>
> I still think removing index scans entirely is short-sighted - but I
> may be outvoted (then again, no one other than Tom has really
> expressed an opinion one way or the other, and I initially agreed with
> him until I thought about the performance aspects some more).

I'm also wonder if we're on the right way, since the only serious
issue with indexscan was possible problem with slaves, but read-only slaves
delayed to 8.5, so this is not an issue now. In 8.5 development cycle we'll
certainly return to this issue, so why do we disable index scan for 8.4 ?
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: GIN, partial matches, lossy bitmaps

From

Tom Lane

Date:

13 March 2009, 12:25:45

Oleg Bartunov <oleg@sai.msu.su> writes:
> I'm also wonder if we're on the right way, since the only serious 
> issue with indexscan was possible problem with slaves, but read-only slaves
> delayed to 8.5, so this is not an issue now. In 8.5 development cycle we'll
> certainly return to this issue, so why do we disable index scan for 8.4 ?

So that we have a trouble-free feature in 8.4.  I have no confidence
in the solution that was proposed, and am more than willing to accept
the loss of plain-indexscan support to avoid the risk of worse bugs.
        regards, tom lane

Re: GIN, partial matches, lossy bitmaps

From

Tom Lane

Date:

23 March 2009, 15:38:13

I wrote:
> Attached is the remainder of the patch with relatively minor fixes.
> The main change I made is to get rid of the changes in gincostestimate;
> I agree with Robert that it's probably inappropriate to consider the
> current pending-list size during planning.  I haven't really reviewed
> any of the rest of it; this is just to have a clean patch against HEAD.

Is that patch (fast_insert_gin-0.30.gz) still the latest version of
the fast-insert patch?  Has anyone else done any further work?
        regards, tom lane