Thread: GIN fast insert
Hi there, we present two variants of GIN fast insert patch, since we're not sure what is a the best solution: v0.28.1 - remove disable cost in gincostestimate - per http://archives.postgresql.org/message-id/499466D2.4010808@sigaev.ru gingettuple could force cleanup of pending list if it got a lossy tidbitmap. If such cleanup occurs the gingettuple will rescan pending list again. v0.28.2 - remove disable cost in gincostestimate - per http://archives.postgresql.org/message-id/12795.1234379754@sss.pgh.pa.us AM can now have only one search method: amgettuple or amgetbitmap. - GIN now has only amgetbitmap interface -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
On Tue, Feb 17, 2009 at 2:28 PM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Hi there, > > we present two variants of GIN fast insert patch, since we're not sure > what is a the best solution: > > v0.28.1 > - remove disable cost in gincostestimate > - per http://archives.postgresql.org/message-id/499466D2.4010808@sigaev.ru > gingettuple could force cleanup of pending list if it got a lossy > tidbitmap. > If such cleanup occurs the gingettuple will rescan pending list again. > > v0.28.2 > - remove disable cost in gincostestimate > - per > http://archives.postgresql.org/message-id/12795.1234379754@sss.pgh.pa.us > AM can now have only one search method: amgettuple or amgetbitmap. > - GIN now has only amgetbitmap interface I reviewed v0.28.1. I see that disable_cost is gone from gincostestimate, but you're still using the pending list to set costs, and I still think that's bogus. It seems clear that this is going to change much faster than plans are going to be invalidated, and if autovacuum is doing its job, the pending list won't get long enough to matter much anyway, right? I don't think this patch should be touching gincostestimate at all. I am thinking that it is may not be necessary to remove the gingettuple interface (as you did in v0.28.2). Forcing a cleanup of the pending list seems like a reasonable workaround. We don't expect this situation to come up frequently, so if the method we use to handle it is not terribly efficient, oh well. The one thing that concerns me is - what will happen in a hot standby environment, when that patch is committed? In that situation, I believe that we can't call modify any heap or index pages, so... Some other assorted minor comments on v0.28.1... 1. The description of the "fastupdate" reloption should be reworded for consistency with other options: Enables "fast update" feature for this GIN index 2. Why is this implemented as a string reloption rather than a boolean reloption? It seems like you want to model this on autovacuum_enabled. 3. Documentation wordsmithing. You have the following paragraph: As of <productname>PostgreSQL</productname> 8.4, this problem is alleviated by postponing most of the work until the next <command>VACUUM</>. Newly inserted index entries are temporarily stored in an unsorted list of pending entries <command>VACUUM</> inserts all pending entries into the main <acronym>GIN</acronym> index data structure, using the same bulk insert techniques used during initial index creation. This greatly improves <acronym>GIN</acronym> index update speed, even counting the additional vacuum overhead. Here is my proposed rewrite: As of <productname>PostgreSQL</productname> 8.4, <acronym>GIN</> is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. When the table is vacuumed, or in some cases when the pending list becomes too large, the entries are moved to the main <acronym>GIN</acronym> data structure using the same bulk insert techniques used during initial index creation. This greatly improves <acronym>GIN</acronym> index update speed, even counting the additional vacuum overhead. 4. More wordsmithing. In the following paragraph, you have: It's recommended to use properly-configured autovacuum with tables having <acronym>GIN</acronym> indexes, to keep this overhead to reasonable levels. I think it is clearer and more succinct to write simply: Proper use of autovacuum can minimize this problem. 5. In textsearch.sgml, you say that GIN indexes are moderately slower to update, but about 10x slower without fastupdate. Can we provide a real number in place of "moderately"? I don't know whether to think this means 20% or 2x. 6. One of the comment changes in startScanEntry is simply a correction of a typographical error ("deletion" for "deletition"). You might as well commit this change separately and remove it from this patch. 7. pg_stat_get_fresh_inserted_tuples. I am not crazy about the fact that we call this the pending list in some places, fast update in some places, and now, here, fresh tuples. Let's just call it fast insert tuples. 8. tbm_check_tuple. The opening curly brace should be uncuddled. The curly braces around wordnum = bitnum = 0 are superfluous. 9. gincostestimate. There are a lot of superfluous whitespace changes here, and some debugging code that obviously wasn't fully removed. 10. GinPageOpaqueData. Surely we can come up with a better name than GIN_LIST. This is yet another name for the same concept. Why not call this GIN_FAST_INSERT_LIST? 11. ginInsertCleanup. "Inserion" is a typo. Unfortunately, I don't understand all of this patch well enough to give it as thorough a going-over as it deserves, so my apologies for whatever I've missed. ...Robert
> and I still think that's bogus. It seems clear that this is going to > change much faster than plans are going to be invalidated, and if > autovacuum is doing its job, the pending list won't get long enough to > matter much anyway, right? I don't think this patch should be > touching gincostestimate at all. Why not? For any estimation function it's possible to invent workload which will change cost (or optimal plan) much faster than plan's invalidation. Gincostestimate depends on statistic on table, not on real index's state, and plan's invalidation will occur after analyze run. > I am thinking that it is may not be necessary to remove the > gingettuple interface (as you did in v0.28.2). Forcing a cleanup of > the pending list seems like a reasonable workaround. We don't expect > this situation to come up frequently, so if the method we use to Agree > handle it is not terribly efficient, oh well. The one thing that > concerns me is - what will happen in a hot standby environment, when > that patch is committed? In that situation, I believe that we can't > call modify any heap or index pages, so... I don't see a problems here, because indexes in postgres don't depend on any transaction's ids or modes as heap depends. WAL-logger works without that knowledge too. May be I missed something here or don't understand. Although heap's pages could be changed by setting commit-status bits on tuple even in read-only transaction, but that changes are not WAL-logged. That is correct for index's page too. > 1. The description of the "fastupdate" reloption should be reworded > for consistency with other options: > Enables "fast update" feature for this GIN index Fixed > > 2. Why is this implemented as a string reloption rather than a boolean > reloption? It seems like you want to model this on > autovacuum_enabled. Fixed > 3. Documentation wordsmithing. You have the following paragraph: Done > 4. More wordsmithing. In the following paragraph, you have: Done > 5. In textsearch.sgml, you say that GIN indexes are moderately slower > to update, but about 10x slower without fastupdate. Can we provide a > real number in place of "moderately"? I don't know whether to think > this means 20% or 2x. Will make exact measurement some later :) > 6. One of the comment changes in startScanEntry is simply a correction > of a typographical error ("deletion" for "deletition"). You might as > well commit this change separately and remove it from this patch. Sorry, I don't find what you mean. I found only "begining" (and fixed it) > 7. pg_stat_get_fresh_inserted_tuples. I am not crazy about the fact > that we call this the pending list in some places, fast update in some > places, and now, here, fresh tuples. Let's just call it fast insert > tuples. It's really inserted ("fresh") tuples in table since last vacuum. This number doesn't depend on existence of GIN indexes or its modes. > 8. tbm_check_tuple. The opening curly brace should be uncuddled. The > curly braces around wordnum = bitnum = 0 are superfluous. It's a copy/paste code from tbm_add_tuples, and I'd like if { ... } else { ... } much more than if .. else { } > > 9. gincostestimate. There are a lot of superfluous whitespace changes > here, and some debugging code that obviously wasn't fully removed. Cleaned. > 10. GinPageOpaqueData. Surely we can come up with a better name than > GIN_LIST. This is yet another name for the same concept. Why not call > this GIN_FAST_INSERT_LIST? Like other defines here - GIN_DATA, not a GIN_DATA_PAGE, GIN_LEAF - not a GIN_LEAF_TREE_PAGE. > > 11. ginInsertCleanup. "Inserion" is a typo. fixed > Unfortunately, I don't understand all of this patch well enough to > give it as thorough a going-over as it deserves, so my apologies for > whatever I've missed. Thank you a lot for your reviewing. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
On Thu, Feb 19, 2009 at 8:36 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: >> and I still think that's bogus. It seems clear that this is going to >> change much faster than plans are going to be invalidated, and if >> autovacuum is doing its job, the pending list won't get long enough to >> matter much anyway, right? I don't think this patch should be >> touching gincostestimate at all. > > Why not? For any estimation function it's possible to invent workload which > will change cost (or optimal plan) much faster than plan's invalidation. > Gincostestimate depends on statistic on table, not on real index's state, > and plan's invalidation will occur after analyze run. Hrm, hum, maybe you're right. >> I am thinking that it is may not be necessary to remove the >> gingettuple interface (as you did in v0.28.2). Forcing a cleanup of >> the pending list seems like a reasonable workaround. We don't expect >> this situation to come up frequently, so if the method we use to > > Agree > >> handle it is not terribly efficient, oh well. The one thing that >> concerns me is - what will happen in a hot standby environment, when >> that patch is committed? In that situation, I believe that we can't >> call modify any heap or index pages, so... > > I don't see a problems here, because indexes in postgres don't depend on any > transaction's ids or modes as heap depends. WAL-logger works without that > knowledge too. May be I missed something here or don't understand. > > Although heap's pages could be changed by setting commit-status bits on > tuple even in read-only transaction, but that changes are not WAL-logged. > That is correct for index's page too. It would be helpful if Heikki or Simon could jump in here, but my understanding is that cleaning up the pending list is a read-write operation. I don't think we can do that on a hot standby server. ...Robert
Robert Haas wrote: > On Thu, Feb 19, 2009 at 8:36 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: >>> handle it is not terribly efficient, oh well. The one thing that >>> concerns me is - what will happen in a hot standby environment, when >>> that patch is committed? In that situation, I believe that we can't >>> call modify any heap or index pages, so... >> I don't see a problems here, because indexes in postgres don't depend on any >> transaction's ids or modes as heap depends. WAL-logger works without that >> knowledge too. May be I missed something here or don't understand. >> >> Although heap's pages could be changed by setting commit-status bits on >> tuple even in read-only transaction, but that changes are not WAL-logged. >> That is correct for index's page too. > > It would be helpful if Heikki or Simon could jump in here, but my > understanding is that cleaning up the pending list is a read-write > operation. I don't think we can do that on a hot standby server. Right, can't do that on a hot standby server. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
> Right, can't do that on a hot standby server. Is anywhere applicable hot standby patch? Last version on wiki is 9g and it can't be applied cleanly. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Teodor Sigaev wrote: >> Right, can't do that on a hot standby server. > > Is anywhere applicable hot standby patch? Last version on wiki is 9g and > it can't be applied cleanly. The latest version is in Simon's git repository at: http://git.postgresql.org/?p=~sriggs/simon.git;a=shortlog;h=refs/heads/dev_hot_standby in the dev_hot_standby branch. I don't think he's posted an updated patch based on that work. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2009-02-19 at 22:43 -0500, Robert Haas wrote: > > I don't see a problems here, because indexes in postgres don't > depend on any > > transaction's ids or modes as heap depends. WAL-logger works without > that > > knowledge too. May be I missed something here or don't understand. > > > > Although heap's pages could be changed by setting commit-status bits > on > > tuple even in read-only transaction, but that changes are not > WAL-logged. > > That is correct for index's page too. > > It would be helpful if Heikki or Simon could jump in here, but my > understanding is that cleaning up the pending list is a read-write > operation. I don't think we can do that on a hot standby server. >From reading the docs with the patch the pending list is merged into the main index when a VACUUM is performed. I (think I) can see that additions to the pending list are WAL logged, so that will work in Hot Standby. I also see ginEntryInsert() calls during the move from pending list to main index which means that is WAL logged also. AFAICS this *could* work during Hot Standby mode. Best check here http://wiki.postgresql.org/wiki/Hot_Standby#Usage rather than attempt to read the patch. Teodor, can you confirm * we WAL log the insert into the pending list * we WAL log the move from the pending list to the main index * that we maintain the pending list correctly during redo so that it can be accessed by index scans The main thing with Hot Standby is that we can't do any writes. So a pending list cannot change solely because of a gingettuple call on the *standby*. But that's easy to disable. If all the inserts happened on the primary node and all the reads happened on the standby, then pending list would never be cleaned up if the cleanup is triggered only by read. I would suggest that we trigger cleanup by read at threshold size X and trigger cleanup by insert at threshold size 5X. That avoids the strange case mentioned, but generally ensures only reads trigger cleanup. (But why do we want that??) I found many parts of the patch and docs quite confusing because of the way things are named. For me, this is a deferred or delayed insert technique to allow batching. I would prefer if everything used one description, rather than "fast", "pending", "delayed" etc. Personally, I see ginInsertCleanup() as a scheduled task unrelated to vacuum. Making the deferred tasks happen at vacuum time is just a convenient way of having a background task occur regularly. That's OK for now, but I would like to be able to request a background task without having to hook into AV. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Mon, Feb 23, 2009 at 4:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> It would be helpful if Heikki or Simon could jump in here, but my >> understanding is that cleaning up the pending list is a read-write >> operation. I don't think we can do that on a hot standby server. > > >From reading the docs with the patch the pending list is merged into the > main index when a VACUUM is performed. I (think I) can see that > additions to the pending list are WAL logged, so that will work in Hot > Standby. I also see ginEntryInsert() calls during the move from pending > list to main index which means that is WAL logged also. AFAICS this > *could* work during Hot Standby mode. > Best check here http://wiki.postgresql.org/wiki/Hot_Standby#Usage > rather than attempt to read the patch. > Teodor, can you confirm > * we WAL log the insert into the pending list > * we WAL log the move from the pending list to the main index > * that we maintain the pending list correctly during redo so that it can > be accessed by index scans > > The main thing with Hot Standby is that we can't do any writes. So a > pending list cannot change solely because of a gingettuple call on the > *standby*. That's what I thought. Thanks for confirming. > But that's easy to disable. If all the inserts happened on > the primary node and all the reads happened on the standby, then pending > list would never be cleaned up if the cleanup is triggered only by read. No, because the inserts would trigger VACUUM on the primary. > I would suggest that we trigger cleanup by read at threshold size X and > trigger cleanup by insert at threshold size 5X. That avoids the strange > case mentioned, but generally ensures only reads trigger cleanup. (But > why do we want that??) I think that's actually not what we want. What we want is for VACUUM to deal with it. Unfortunately that's hard to guarantee since, for example, someone might turn autovacuum off. So the issue is what do we do when we're in the midst of an index scan and our TIDBitmap has become lossy. Right now, the answer is that we clean up the pending list from inside the index scan and then retry the index scan. I don't think that's going to work. I'm starting to think that the right thing to do here is to create a non-lossy option for TIDBitmap. Tom has been advocating just losing the index scan AM altogether, but that risks losing performance in cases where a LIMIT would have stopped the scan well prior to completion. > I found many parts of the patch and docs quite confusing because of the > way things are named. For me, this is a deferred or delayed insert > technique to allow batching. I would prefer if everything used one > description, rather than "fast", "pending", "delayed" etc. I mentioned this in my previous review (perhaps not quite so articulately) and I completely agree with you. It's clear enough reading the patch because you know that all the changes in the patch must be related to each other, but once it's applied it's going to be tough to figure out. > Personally, I see ginInsertCleanup() as a scheduled task unrelated to > vacuum. Making the deferred tasks happen at vacuum time is just a > convenient way of having a background task occur regularly. That's OK > for now, but I would like to be able to request a background task > without having to hook into AV. This has been discussed previously and I assume you will be submitting a patch at some point, since no one else has volunteered to implement it. I think autovacuum is the right way to handle this particular case because it is a cleanup operation that is not dependent on time but on write activity and hooks into more or less the same stats infrastructure, but I don't deny the existence of other cases that would benefit from a scheduler. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I'm starting to think that the right thing to do here is to create a > non-lossy option for TIDBitmap. Tom has been advocating just losing > the index scan AM altogether, but that risks losing performance in > cases where a LIMIT would have stopped the scan well prior to > completion. Actually, I'm going to *insist* that we lose the index AM scan altogether. There might be a possibility to put it back in 8.5 or later if anyone actually makes the case (with some evidence) that it's worth the trouble. But right now, what this patch needs is to be made to work reliably, and having a great deal of complexity added by an inessential feature is a good way to make sure it doesn't go in at all. regards, tom lane
On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I'm starting to think that the right thing to do here is to create a >> non-lossy option for TIDBitmap. Tom has been advocating just losing >> the index scan AM altogether, but that risks losing performance in >> cases where a LIMIT would have stopped the scan well prior to >> completion. > > Actually, I'm going to *insist* that we lose the index AM scan > altogether. There might be a possibility to put it back in 8.5 or later > if anyone actually makes the case (with some evidence) that it's worth > the trouble. But right now, what this patch needs is to be made to work > reliably, and having a great deal of complexity added by an inessential > feature is a good way to make sure it doesn't go in at all. Except that the "inessential" feature in question is a feature that currently WORKS, and I don't believe that the testing you've done is anywhere near sufficient to show that no one will be upset if it goes away. Without some convincing evidence to support that proposition, I think it would be better to postpone the whole patch to 8.5 and use that time to fix the problem, rather than slam something through now and then possibly find out that we've introduced a performance regression in cases that used to work well. In fact, as far as I can see, the performance testing that has been done on this patch to date is fairly one-sided. It focuses on the improvement to index build speed, which is a good thing to test, but I don't believe anyone has done any testing at all of the cases where the patch might HURT. Previously, the relevant case was where someone has done a bunch of fast-inserts, but autovacuum has not kicked in yet, or the threshold to trigger autovacuum has not been reached. Now we're adding to that every query that currently uses an index scan, which seemed to be a pretty common plan in the simple cases I tested. Maybe I'm worrying over nothing? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Actually, I'm going to *insist* that we lose the index AM scan >> altogether. > Except that the "inessential" feature in question is a feature that > currently WORKS, and I don't believe that the testing you've done is > anywhere near sufficient to show that no one will be upset if it goes > away. What feature is that --- the ability to get an undefined subset of rows quickly by using LIMIT without ORDER BY? Not much of a feature. > Without some convincing evidence to support that proposition, I > think it would be better to postpone the whole patch to 8.5 and use > that time to fix the problem, Wouldn't bother me any. We are way overdue for 8.4 already. regards, tom lane
On Mon, Feb 23, 2009 at 1:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Feb 23, 2009 at 10:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Actually, I'm going to *insist* that we lose the index AM scan >>> altogether. > >> Except that the "inessential" feature in question is a feature that >> currently WORKS, and I don't believe that the testing you've done is >> anywhere near sufficient to show that no one will be upset if it goes >> away. > > What feature is that --- the ability to get an undefined subset of rows > quickly by using LIMIT without ORDER BY? Not much of a feature. That's a huge over-generalization of the effect of removing index scans, as I am sure you are well aware. It only took me about 5 minutes to come up with a test case against CVS HEAD where disabling index scans resulted in a significant dropoff in performance. Here it is: create table foo (id serial, x int[], primary key (id)); create index foo_gin on foo using gin (x); insert into foo (x) select array[random()*10000::integer, random()*10000::integer, random()*10000::integer, random()*10000::integer] from generate_series(1,10000); analyze foo; OK, now here's the query: select sum(1) from generate_series(1,10000) g left join foo on array[g] <@ x where x is null; On my system this takes about 45 ms to execute with default settings and about 90 ms to execute with index scan disabled. >> Without some convincing evidence to support that proposition, I >> think it would be better to postpone the whole patch to 8.5 and use >> that time to fix the problem, > > Wouldn't bother me any. We are way overdue for 8.4 already. I completely agree, but there are four patches on the CommitFest wiki that still need some committer attention before we close up shop: B-Tree Emulation for GIN Improve Performance of Multi-Batch Hash Join for Skewed Data Sets Proposal of PITR performance improvement SE-PostgreSQL Lite I have reviewed the second of these already and believe it's in pretty good shape and may review one or more of the others as time permits, especially if you or one of the other committers express an opinion that it would be helpful for me to do that and especially if you express an opinion on which of them it would be most helpful for. It is well past time to move "Reducing some DDL Locks to ShareLock" to committed and leave the unapplied portions for 8.5. As much as I would like to have the feature, it is probably also time to think about punting "Hot Standby" unless it's going to get committed RSN. At this point, we are definitely holding up both the release of 8.4 and development for 8.5. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On my system this takes about 45 ms to execute with default settings > and about 90 ms to execute with index scan disabled. [ shrug... ] That's well within my threshold of pain for this. In any case, it might be possible to buy some/all of that back with minor optimization effort on the bitmap-scan code paths; nobody's ever really bothered to profile that AFAIK. There is no real difference in the useful work (page and tuple fetches) getting done in the two cases, so there's no reason in principle for bitmap scan to be much slower than indexscan here. The LIMIT case is the only one I'm aware of where there's a fundamental reason that bitmap scan should be slower. regards, tom lane
On Tue, Feb 24, 2009 at 10:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On my system this takes about 45 ms to execute with default settings >> and about 90 ms to execute with index scan disabled. > > [ shrug... ] That's well within my threshold of pain for this. > In any case, it might be possible to buy some/all of that back with > minor optimization effort on the bitmap-scan code paths; nobody's > ever really bothered to profile that AFAIK. There is no real > difference in the useful work (page and tuple fetches) getting done > in the two cases, so there's no reason in principle for bitmap scan > to be much slower than indexscan here. The LIMIT case is the only > one I'm aware of where there's a fundamental reason that bitmap scan > should be slower. Uh, a semi or anti join stops as soon as one matching row is found, doesn't it? ISTM that a semi or anti join is in essence an iterated limit 1 clause. ...Robert
On Tue, 2009-02-24 at 00:18 -0500, Robert Haas wrote: > It only took me about 5 minutes to come up with a test case against CVS > HEAD where disabling index scans resulted in a significant dropoff in > performance. Here it is: On the other hand, Teodor showed a typical use case and a very substantial performance gain: http://archives.postgresql.org/message-id/497F4606.6070303@sigaev.ru My impression is that GIN is used almost entirely for full text search. Even though GIN is actually quite general (as the BTree patch shows), the current users we have to worry about are a fairly narrow group (correct me if I'm wrong here). I wonder how many people really use GIN with non-bitmap scans for some benefit? And even if the benefit exists, does the planner have a way to identify those cases reliably, or does it have to be done manually? Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Tue, 2009-02-24 at 00:18 -0500, Robert Haas wrote: >> It only took me about 5 minutes to come up with a test case against CVS >> HEAD where disabling index scans resulted in a significant dropoff in >> performance. Here it is: > On the other hand, Teodor showed a typical use case and a very > substantial performance gain: Yeah. Whatever we do here is a tradeoff (and whether Robert likes it or not, reliability and code maintainability weigh heavily in the tradeoff). > I wonder how many people really use GIN with non-bitmap scans for some > benefit? And even if the benefit exists, does the planner have a way to > identify those cases reliably, or does it have to be done manually? A relevant point there is that most of the estimator functions for GIN-amenable operators are just smoke and mirrors; so if the planner is making a good choice between indexscan and bitmapscan at all, it's mostly luck. This might get better someday, but not in 8.4. regards, tom lane
On Tue, Feb 24, 2009 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> On the other hand, Teodor showed a typical use case and a very >> substantial performance gain: > > Yeah. Whatever we do here is a tradeoff (and whether Robert likes it > or not, reliability and code maintainability weigh heavily in the > tradeoff). I have no problem with reliability or code maintainability and I'm not sure what I said that would give that impression. If the consensus of the group is that the performance loss from dropping index scans is not important, then I'm fine with that, especially if that consensus is reached in the context of an educated knowledge of what that performance loss is likely to be. To me, a 2x slowdown on two-table anti-join seems pretty bad, but I just work here. Perhaps nobody else thinks that a semi-join or anti-join against a GIN index is a plausible use case (like, find all of the words from the following list that do not appear in any document)? If everyone agrees that we don't care about that case (or about ORDER-BY-without-LIMIT, which is certainly less compelling), then go ahead and remove it. I have no horse in this race other than having been asked to review the patch, which I did. On the other hand, if a significant number of people think that it might be a bad idea to make that case significantly worse, then some redesign work is called for, and that may mean the patch needs to get bumped. My own opinion is that it is better to decide on the right design and then figure out which release that design can go into than it is to start by deciding this has to go into 8.4 and then figuring out what can be done in that period of time. I don't think there is any question that making GIN continue to support both index scans and bitmap index scans will make the code more complex, but how bad will it be? So far we've ruled out using the planner to prevent index scans when the pending list is long (because it's not reliable) and cleaning up the pending list during insert when needed (because it won't work with Hot Standby). We haven't decided what WILL work, apart from ripping out index scans altogether, so to some degree we're comparing against an unknown. >> I wonder how many people really use GIN with non-bitmap scans for some >> benefit? And even if the benefit exists, does the planner have a way to >> identify those cases reliably, or does it have to be done manually? > > A relevant point there is that most of the estimator functions for > GIN-amenable operators are just smoke and mirrors; so if the planner > is making a good choice between indexscan and bitmapscan at all, it's > mostly luck. This might get better someday, but not in 8.4. Based on the limited testing I've done thus far, it appears to pick an index scan for small numbers of rows and a bitmap index scan for larger number of rows. Index scans will have lower startup costs which can be valuable if you only need to scan part of the index (as in the semi and anti join cases). I haven't done enough testing to see if there is any benefit when scanning the whole index and only returning a few tuples. ...Robert
> Teodor, can you confirm > * we WAL log the insert into the pending list > * we WAL log the move from the pending list to the main index Yes, I can and I confirm > * that we maintain the pending list correctly during redo so that it can > be accessed by index scans Not sure about correct locking in ginxlog.c - but it should fixable rather easy. > The main thing with Hot Standby is that we can't do any writes. So a > pending list cannot change solely because of a gingettuple call on the > *standby*. But that's easy to disable. If all the inserts happened on Right. And suggest to increase work_mem. > the primary node and all the reads happened on the standby, then pending > list would never be cleaned up if the cleanup is triggered only by read. > I would suggest that we trigger cleanup by read at threshold size X and > trigger cleanup by insert at threshold size 5X. That avoids the strange threshold is not helpful here because for gininsert and gingettuple it depends on work_mem setting which could be very different on slave(s) and master. Next, it's possible that master never executes read-only queries (i.e. gingettuple will not be called) by application's design. This is a strong objection for invocation of cleanup from gingettuple. So, I'm inclined to Tom's idea about removing gingettuple interface although GIN with fastupdate=off doesn't affect discussed problem . > I found many parts of the patch and docs quite confusing because of the > way things are named. For me, this is a deferred or delayed insert > technique to allow batching. I would prefer if everything used one > description, rather than "fast", "pending", "delayed" etc. Terminology was taken from inverted index technique. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> it be? So far we've ruled out using the planner to prevent index > scans when the pending list is long (because it's not reliable) and > cleaning up the pending list during insert when needed (because it > won't work with Hot Standby). We haven't decided what WILL work, During insert it will work with Hot Standby because there is no any limitation for number of pages touched or WAL records. There is a problem with cleanup invoked by gingettuple - slave could not start cleanup process at all and hence it could emit an error if tidbitmap becomes lossy. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Thu, Feb 26, 2009 at 11:41 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: >> it be? So far we've ruled out using the planner to prevent index >> scans when the pending list is long (because it's not reliable) and >> cleaning up the pending list during insert when needed (because it >> won't work with Hot Standby). We haven't decided what WILL work, > > During insert it will work with Hot Standby because there is no any > limitation for number of pages touched or WAL records. There is a problem > with cleanup invoked by gingettuple - slave could not start cleanup process > at all and hence it could emit an error if tidbitmap becomes lossy. I didn't think about that option - might be reasonable. ...Robert
While I'm looking at this thing... Why is tbm_add_page() not coded as a simple wrapper around tbm_mark_page_lossy()? As coded, it sometimes forces a whole bunch of pages *other than* the target page to become lossy too, which I cannot see a reason for it to be doing. regards, tom lane
> Why is tbm_add_page() not coded as a simple wrapper around > tbm_mark_page_lossy()? As coded, it sometimes forces a whole bunch of > pages *other than* the target page to become lossy too, which I cannot > see a reason for it to be doing. [after digging in tidbitmap] Oops, I was wrong, I supposed that all pages in chunk should be lossy, but it's true only for chunk page. So, tbm_add_page() should only call tbm_mark_page_lossy()... -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes: > Oops, I was wrong, I supposed that all pages in chunk should be lossy, but it's > true only for chunk page. So, tbm_add_page() should only call > tbm_mark_page_lossy()... OK, thanks, that's what I thought. I've changed it in the copy I'm editing here. I have another question. I added the following comment to ginInsertCleanup(); is it accurate? (If it isn't, I think the code is buggy ...) * This can be called concurrently by multiple backends, so it must cope.* On first glance it looks completely not concurrent-safeand not crash-safe* either. The reason it's okay is that multiple insertion of the same entry* is detectedand treated as a no-op by gininsert.c. If we crash after* posting entries to the main index and before removingthem from the* pending list, it's okay because when we redo the posting later on, nothing* bad will happen. Likewise,if two backends simultaneously try to post* a pending entry into the main index, one will succeed and one will do*nothing. We try to notice when someone else is a little bit ahead of* us in the process, but that's just to avoid wastingcycles. Only the* action of removing a page from the pending list really needs exclusive* lock. regards, tom lane
> ginInsertCleanup(); is it accurate? (If it isn't, I think Exactly, that is right. > > > * This can be called concurrently by multiple backends, so it must cope. > * On first glance it looks completely not concurrent-safe and not crash-safe > * either. The reason it's okay is that multiple insertion of the same entry > * is detected and treated as a no-op by gininsert.c. If we crash after > * posting entries to the main index and before removing them from the > * pending list, it's okay because when we redo the posting later on, nothing > * bad will happen. Likewise, if two backends simultaneously try to post > * a pending entry into the main index, one will succeed and one will do > * nothing. We try to notice when someone else is a little bit ahead of > * us in the process, but that's just to avoid wasting cycles. Only the > * action of removing a page from the pending list really needs exclusive > * lock. > > > regards, tom lane > -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
I've committed this patch with additional editorialization --- mostly cosmetic changes, except for removing the stats-driven cleanup in favor of letting cleanups occur during auto-ANALYZE, as per my suggestion yesterday. regards, tom lane
On 3/24/09 1:18 PM, Tom Lane wrote: > I've committed this patch with additional editorialization --- mostly > cosmetic changes, except for removing the stats-driven cleanup in favor > of letting cleanups occur during auto-ANALYZE, as per my suggestion > yesterday. By my count, this was the last patch left for 8.4. No? --Josh
On Tue, Mar 24, 2009 at 04:55:47PM -0700, Josh Berkus wrote: > On 3/24/09 1:18 PM, Tom Lane wrote: >> I've committed this patch with additional editorialization --- mostly >> cosmetic changes, except for removing the stats-driven cleanup in favor >> of letting cleanups occur during auto-ANALYZE, as per my suggestion >> yesterday. > > By my count, this was the last patch left for 8.4. No? One more patch left according to the Commitfest wiki: B-Tree emulation for GIN. http://wiki.postgresql.org/wiki/CommitFestInProgress Cheers, David <whisper>beta</> -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate