Thread: GIN, partial matches, lossy bitmaps
While reading the GIN code, I just rediscovered that the GIN partial match support suffers from the same problem that I criticized the "fast insert" patch about, and searching the archives I found that I already complained about that back in April: http://archives.postgresql.org/pgsql-patches/2008-04/msg00157.php If I'm reading the code correctly, item pointers of all matching heap tuples are first collected into a TIDBitmap, and then amgetnext returns tuples from that one by one. If the bitmap becomes lossy, an error is thrown. gingetbitmap is a dummy implementation: it creates a new TIDBitmap and inserts all the tuples from the other TIDBitmap into it one by one, and then returns the new TIDBitmap. If we remove the support for regular, non-bitmap, index scans with GIN, that could be cleaned up as well. Even if we don't do that, gingetbitmap should not error when the bitmap becomes lossy, but just return the lossy bitmap. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, 2009-03-02 at 21:14 +0200, Heikki Linnakangas wrote: > If I'm reading the code correctly, item pointers of all matching heap > tuples are first collected into a TIDBitmap, and then amgetnext returns > tuples from that one by one. If the bitmap becomes lossy, an error is > thrown. gingetbitmap is a dummy implementation: it creates a new > TIDBitmap and inserts all the tuples from the other TIDBitmap into it > one by one, and then returns the new TIDBitmap. Do you think that might be the cause of the extra startup overhead that Robert Haas observed for bitmap scans? > If we remove the support for regular, non-bitmap, index scans with GIN, > that could be cleaned up as well. Even if we don't do that, gingetbitmap > should not error when the bitmap becomes lossy, but just return the > lossy bitmap. That sounds reasonable to me. Regards,Jeff Davis
On Mon, Mar 2, 2009 at 2:14 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > While reading the GIN code, I just rediscovered that the GIN partial match > support suffers from the same problem that I criticized the "fast insert" > patch about, and searching the archives I found that I already complained > about that back in April: > > http://archives.postgresql.org/pgsql-patches/2008-04/msg00157.php > > If I'm reading the code correctly, item pointers of all matching heap tuples > are first collected into a TIDBitmap, and then amgetnext returns tuples from > that one by one. If the bitmap becomes lossy, an error is thrown. > gingetbitmap is a dummy implementation: it creates a new TIDBitmap and > inserts all the tuples from the other TIDBitmap into it one by one, and then > returns the new TIDBitmap. The latest version of the path no longer does this - instead, it flushes the pending list to the main index if the bitmap becomes lossy. That strikes me as more tolerable than throwing an error, but I agree with your criticism: I'm not sure why we are insisting on using a TIDBitmap (which is designed to be lossy at times) in a situation where we actually can't tolerate lossiness. In fact, this was the main point of my original review of this patch. http://archives.postgresql.org/message-id/603c8f070902101859j91fb78eg7e0228afe8f2fd36@mail.gmail.com > If we remove the support for regular, non-bitmap, index scans with GIN, that > could be cleaned up as well. Even if we don't do that, gingetbitmap should > not error when the bitmap becomes lossy, but just return the lossy bitmap. Make sense to me. ...Robert
On Mon, Mar 2, 2009 at 3:02 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Mon, 2009-03-02 at 21:14 +0200, Heikki Linnakangas wrote: >> If I'm reading the code correctly, item pointers of all matching heap >> tuples are first collected into a TIDBitmap, and then amgetnext returns >> tuples from that one by one. If the bitmap becomes lossy, an error is >> thrown. gingetbitmap is a dummy implementation: it creates a new >> TIDBitmap and inserts all the tuples from the other TIDBitmap into it >> one by one, and then returns the new TIDBitmap. > > Do you think that might be the cause of the extra startup overhead that > Robert Haas observed for bitmap scans? I don't think this is the same thing. My point was that an index scan wins big time over a bitmap index scan when the index scan doesn't need to be run to completion - that is, when the query is a semi-join or an anti-join, or when using LIMIT without ORDER BY. This is true with or without Teodor's patch, and is the reason why I'm not sure that removing index scan support is such a great idea. ...Robert
> If we remove the support for regular, non-bitmap, index scans with GIN, > that could be cleaned up as well. Even if we don't do that, gingetbitmap > should not error when the bitmap becomes lossy, but just return the > lossy bitmap. Changes since 28.2 (http://archives.postgresql.org/message-id/499B0FFA.8040608@sigaev.ru) - fixes/changes pointed by Robert (http://archives.postgresql.org/pgsql-hackers/2009-02/msg00987.php) - gingetbitmap will never throw error about lossiness of bitmap, it will return lossy bitmap even it was a prefix search. - remove tbm_check_tuple/tbm_has_lossy/tbm_max_non_lossy methods because they become unused - add new method tbm_add_page(TIDBitmap*, BlockNumber) to add the whole page to the TIDBitmap. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
Teodor Sigaev <teodor@sigaev.ru> writes: > Changes since 28.2 > (http://archives.postgresql.org/message-id/499B0FFA.8040608@sigaev.ru) > - fixes/changes pointed by Robert > (http://archives.postgresql.org/pgsql-hackers/2009-02/msg00987.php) > - gingetbitmap will never throw error about lossiness of bitmap, it will return > lossy bitmap even it was a prefix search. > - remove tbm_check_tuple/tbm_has_lossy/tbm_max_non_lossy methods because they > become unused > - add new method tbm_add_page(TIDBitmap*, BlockNumber) to add the whole page to > the TIDBitmap. I cleaned up and applied the planner part of this, since that seems reasonably useful in its own right for experimental index AMs, regardless of where we settle out for GIN. (The "cleanup" mostly consisted of fixing it to not make extra calls to find_usable_indexes --- that's an expensive function, and there's no very good reason to run it another time rather than separating out the indexes afterwards.) Attached is the remainder of the patch with relatively minor fixes. The main change I made is to get rid of the changes in gincostestimate; I agree with Robert that it's probably inappropriate to consider the current pending-list size during planning. I haven't really reviewed any of the rest of it; this is just to have a clean patch against HEAD. regards, tom lane
Attachment
On Thu, Mar 5, 2009 at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Attached is the remainder of the patch with relatively minor fixes. > The main change I made is to get rid of the changes in gincostestimate; > I agree with Robert that it's probably inappropriate to consider the > current pending-list size during planning. I haven't really reviewed > any of the rest of it; this is just to have a clean patch against HEAD. The changes to config.sgml are not good English and contain typographical errors. It could also be a bit more informatiave, maybe something like: This parameter also specifies the number of insert or updated tuples needed to trigger <command>VACUUM</> on a <acronym>GIN</acronym> index. <acronym>GIN</acronym> indexes require <command>VACUUM</> after insert or update operations because newly inserted tuples are initially stored in an unsorted pending list. I still think removing index scans entirely is short-sighted - but I may be outvoted (then again, no one other than Tom has really expressed an opinion one way or the other, and I initially agreed with him until I thought about the performance aspects some more). ...Robert
On Thu, 5 Mar 2009, Robert Haas wrote: > On Thu, Mar 5, 2009 at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Attached is the remainder of the patch with relatively minor fixes. >> The main change I made is to get rid of the changes in gincostestimate; >> I agree with Robert that it's probably inappropriate to consider the >> current pending-list size during planning. I haven't really reviewed >> any of the rest of it; this is just to have a clean patch against HEAD. > > The changes to config.sgml are not good English and contain > typographical errors. It could also be a bit more informatiave, maybe > something like: > > This parameter also specifies the number of insert or updated tuples > needed to trigger <command>VACUUM</> on a <acronym>GIN</acronym> > index. <acronym>GIN</acronym> indexes require <command>VACUUM</> > after insert or update operations because newly inserted tuples are > initially stored in an unsorted pending list. thanks, will update docs. > > I still think removing index scans entirely is short-sighted - but I > may be outvoted (then again, no one other than Tom has really > expressed an opinion one way or the other, and I initially agreed with > him until I thought about the performance aspects some more). I'm also wonder if we're on the right way, since the only serious issue with indexscan was possible problem with slaves, but read-only slaves delayed to 8.5, so this is not an issue now. In 8.5 development cycle we'll certainly return to this issue, so why do we disable index scan for 8.4 ? Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > I'm also wonder if we're on the right way, since the only serious > issue with indexscan was possible problem with slaves, but read-only slaves > delayed to 8.5, so this is not an issue now. In 8.5 development cycle we'll > certainly return to this issue, so why do we disable index scan for 8.4 ? So that we have a trouble-free feature in 8.4. I have no confidence in the solution that was proposed, and am more than willing to accept the loss of plain-indexscan support to avoid the risk of worse bugs. regards, tom lane
I wrote: > Attached is the remainder of the patch with relatively minor fixes. > The main change I made is to get rid of the changes in gincostestimate; > I agree with Robert that it's probably inappropriate to consider the > current pending-list size during planning. I haven't really reviewed > any of the rest of it; this is just to have a clean patch against HEAD. Is that patch (fast_insert_gin-0.30.gz) still the latest version of the fast-insert patch? Has anyone else done any further work? regards, tom lane