Thread: GIN fast insert
Jeff Davis asked me if I'd be willing to do a review of the GIN fast insert patch about two weeks ago, but I haven't actually had a chance to read through it in detail until tonight. I can't say I really know anything about GIN (though I did take this opportunity to RTM), so apologies in advance if my comments here are totally off base. My basic impression of this code is that it's trying unnaturally hard to avoid creating a lossy TIDBitmap, which seems pretty odd, considering that the whole point of TIDBitmap is, AFAICS, to degrade to lossy mode when the alternative is eating too much memory. I'm particularly dismayed by this hunk: + if ( ntids == NULL && tbm && tbm_has_lossy(tbm) ) + ereport(ERROR, + (errcode(ERRCODE_OUT_OF_MEMORY), + errmsg("not enough memory to store result of pending list or VACUUM table" ), + errhint("Increase the \"work_mem\" parameter."))); The definition of work_mem is the amount of memory that a hash table can use before spilling to disk, NOT the amount of memory a hash table can consume before arbitrarily failing. It's intended to be a soft limit which people can tune to get the best performance out of their system, not a hard limit that interrupts work. Using the limit this way will encourage people to set work_mem too high so that things don't crash, but that in turn will potentially lead to other bad behavior (like swapping). I see that there's already one other place in the GIN code that uses work_mem this way, but I don't think that's a good reason to add another one - I don't see any other place in the system that behaves this way, outside of GIN. I think this is related to the problems with gincostestimate() that Tom Lane was complaining about here: http://archives.postgresql.org/message-id/20441.1234209245@sss.pgh.pa.us I am not 100% sure I'm understanding this correctly, but I think the reason why gincostestimate() is so desperate to avoid index scans when the pending list is long is because it knows that scanFastInsert() will blow up if an index scan is actually attempted because of the aforementioned TIDBitmap problem. This seems unacceptably fragile. Faster insert performance is nice, but not if it means that my indices suddenly start switching themselves off (which is bad) or in the presence of cached plans, crashing (which is worse). I think this code needs to be somehow rewritten to make things degrade gracefully when the pending list is long - I'm not sure what the best way to do that is. Inventing a new data structure to store TIDs that is never lossy seems like it might work, but you'd have to think about what to do if it got too big. I think it's probably best to just go ahead and let it get arbitrarily long (since you have no other option besides crashing) and leave work_mem out of it. Instead, put a reloption it that controls the maximum length of the pending list. This is easy to tune: if you make it bigger, insert performance improves, but queries eat more memory. If you make it smaller, insert performance gets worse, but you bound the memory that queries use more tightly. The other problem with using work_mem is that it can vary between sessions, so one session happily stuffs a lot of data into the pending list and then another session can't scan the index because it has a lower work_mem setting. Using a reloption avoids that problem by making the whole thing symmetric, plus it gives you fine-grained control over the behavior on an index-by-index basis. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > I think this is related to the problems with gincostestimate() that > Tom Lane was complaining about here: > http://archives.postgresql.org/message-id/20441.1234209245@sss.pgh.pa.us > I am not 100% sure I'm understanding this correctly, but I think the > reason why gincostestimate() is so desperate to avoid index scans when > the pending list is long is because it knows that scanFastInsert() > will blow up if an index scan is actually attempted because of the > aforementioned TIDBitmap problem. This seems unacceptably fragile. Yipes. If that's really the reason then I agree, it's a nonstarter. > I think this code needs to be somehow rewritten to make things degrade > gracefully when the pending list is long - I'm not sure what the best > way to do that is. Inventing a new data structure to store TIDs that > is never lossy seems like it might work, but you'd have to think about > what to do if it got too big. What would be wrong with letting it degrade to lossy? I suppose the reason it's trying to avoid that is to avoid having to recheck all the rows on that page when it comes time to do the index insertion; but surely having to do that is better than having arbitrary, unpredictable failure conditions. It strikes me that part of the issue here is that the behavior of this code is much better adapted to the bitmap-scan API than the traditional indexscan API. Since GIN doesn't support ordered scan anyway, I wonder whether it wouldn't be more sensible to simply allow it to not offer the traditional API. It should be easy to make the planner ignore plain indexscan plans for an AM that didn't support them. regards, tom lane
On Tue, Feb 10, 2009 at 10:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think this code needs to be somehow rewritten to make things degrade >> gracefully when the pending list is long - I'm not sure what the best >> way to do that is. Inventing a new data structure to store TIDs that >> is never lossy seems like it might work, but you'd have to think about >> what to do if it got too big. > > What would be wrong with letting it degrade to lossy? I suppose the > reason it's trying to avoid that is to avoid having to recheck all the > rows on that page when it comes time to do the index insertion; but > surely having to do that is better than having arbitrary, unpredictable > failure conditions. No, I don't think that's it. See here, beginning with "the problem with lossy tbm has two aspects": http://archives.postgresql.org/message-id/4974B002.3040202@sigaev.ru > It strikes me that part of the issue here is that the behavior of this > code is much better adapted to the bitmap-scan API than the traditional > indexscan API. Since GIN doesn't support ordered scan anyway, I wonder > whether it wouldn't be more sensible to simply allow it to not offer > the traditional API. It should be easy to make the planner ignore plain > indexscan plans for an AM that didn't support them. If that doesn't lead to a performance degradation, I think it would be a good idea. It certainly seems like it would allow this patch to be a LOT simpler, cleaner, and more robust. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Feb 10, 2009 at 10:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It strikes me that part of the issue here is that the behavior of this >> code is much better adapted to the bitmap-scan API than the traditional >> indexscan API. Since GIN doesn't support ordered scan anyway, I wonder >> whether it wouldn't be more sensible to simply allow it to not offer >> the traditional API. It should be easy to make the planner ignore plain >> indexscan plans for an AM that didn't support them. > If that doesn't lead to a performance degradation, I think it would be > a good idea. For queries that select only a single index entry, there might be some speed degradation, but a quick test suggests it's in the single-digit-percentage range if everything's cached; and of course if you have to go to disk then the extra CPU cycles to push a bitmap around are lost in the noise. In any case, as a wise man once said "I can make this code arbitrarily fast, if it doesn't have to give the right answer". Blowing up in easily foreseeable circumstances isn't my idea of giving the right answer. regards, tom lane
On Tue, Feb 10, 2009 at 11:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > For queries that select only a single index entry, there might be some > speed degradation, but a quick test suggests it's in the > single-digit-percentage range if everything's cached; and of course if > you have to go to disk then the extra CPU cycles to push a bitmap around > are lost in the noise. > > In any case, as a wise man once said "I can make this code arbitrarily > fast, if it doesn't have to give the right answer". Blowing up in easily > foreseeable circumstances isn't my idea of giving the right answer. Sure, but dropping index-scan support is not the only way of fixing that problem. It has a certain elegance to it, but if it produces a perceptible slowdown, it may not be the best approach. ...Robert
>> What would be wrong with letting it degrade to lossy? I suppose the >> reason it's trying to avoid that is to avoid having to recheck all the >> rows on that page when it comes time to do the index insertion; but >> surely having to do that is better than having arbitrary, unpredictable >> failure conditions. > > No, I don't think that's it. See here, beginning with "the problem > with lossy tbm has two aspects": > > http://archives.postgresql.org/message-id/4974B002.3040202@sigaev.ru Right. Some comments to that points: - amgettuple interface hasn't possibility to work with page-wide result instead of exact ItemPointer. amgettuple can notreturn just a block number as amgetbitmap can. It's not so difficult to teach GIN to return ItemPointer one by one from pending list and eliminate this point. GIN will not collect matched ItemPointers in tidbitmap and will return them immediately. But: - Because of concurrent vacuum process: while we scan pending list, it's content could be transferred into regular structureof index and then we will find the same tuple twice. Again, amgettuple hasn't protection from that, only amgetbitmaphas it. So, we need to filter results from regular GIN by results from pending list. ANd for filtering we can'tuse lossy tbm. Again, we talk about amgettuple interface. We need to filter results from regular GIN by results from pending list. Now GIN does it by lookup matched ItemPointer in tidbitmap constructed from pending list. We could use ordered array of ItemPointers instead of tidbitmap, but I don't believe that it will take less memory. It's impossible to rescan pending list for two reasons: a) too slow, b) concurrent cleanup process (vacuum or insert now), because they could move tuples into regular structure. Is it acceptable to add option to tidbitmap which will forbid tidbitmap to become lossy? That removes disabling index scan in gincostestimate and checking of non-lossy tidbitmap. In current version, cleanup of pending list starts much earlier than non-lossy limit is reached in typical use-cases. Insertion process will start cleanup with at least one fired trigger: - number of heap tuples in pending list could produce lossy tidbitmap - total size of pendinglist is greater than work_mem. This trigger is developed to speedup bulk insertion (which is used in cleanup), because it will collect whole pending list in memory at once. And this trigger is more strict than first one because in typical use-case of GIN heap tuple is rather big. I believe that user could get GIN's error about work_mem only intentionally: - turn off autovacuum - set big work_mem - populate table with GIN index (by needed number of insertion) - prepare query which will return a lot of results (possibly, with seqscan=off because cost of scan of pending list grows fast) - decrease work_mem for at least ten times - execute query -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> I believe that user could get GIN's error about work_mem only intentionally: > - turn off autovacuum Meanwhile, in the other thread, we're having a discussion about people wanting to do exactly this on a database-wide basis during peak load hours... > - set big work_mem > - populate table with GIN index (by needed number of insertion) > - prepare query which will return a lot of results (possibly, with > seqscan=off because cost of scan of pending list grows fast) > - decrease work_mem for at least ten times > - execute query Why would the new work_mem need to be 10x smaller than the old work mem? ...Robert
Robert Haas wrote: >> I believe that user could get GIN's error about work_mem only intentionally: >> - turn off autovacuum > Meanwhile, in the other thread, we're having a discussion about people > wanting to do exactly this on a database-wide basis during peak load > hours... >> - decrease work_mem for at least ten times >> - execute query > Why would the new work_mem need to be 10x smaller than the old work mem? That is is way to get GIN's error emitted. Work_mem should be decreased to catch a chance to get lossy tidbitmap.
Teodor Sigaev <teodor@sigaev.ru> writes: > Robert Haas wrote: >> Why would the new work_mem need to be 10x smaller than the old work mem? > That is is way to get GIN's error emitted. Work_mem should be decreased > to catch a chance to get lossy tidbitmap. But it only has to be marginally lower, not 10x lower. And there are plenty of scenarios where different backends might be running with different work_mem settings. But the *real* problem is that you simply can not guarantee that someone doesn't increase the size of the pending list between the time that someone else commits to an indexscan plan and the time that they execute that plan. This scheme will result in random failures for concurrent inserts/selects, and that's not acceptable. What did you think of the idea of simply abandoning support for conventional indexscans in GIN? I agree that we could probably kluge something to make conventional scans still work reliably, but it seems to me that it'd be ugly, fragile, and quite possibly slow enough to not ever beat bitmap scans anyway. regards, tom lane
> But the *real* problem is that you simply can not guarantee that > someone doesn't increase the size of the pending list between the time If insertion process has bigger work_mem. Agree. > What did you think of the idea of simply abandoning support for > conventional indexscans in GIN? I agree that we could probably kluge > something to make conventional scans still work reliably, but it seems > to me that it'd be ugly, fragile, and quite possibly slow enough to not > ever beat bitmap scans anyway. I don't like this idea because it forbids conventional indexscans even with fastupdate=off. May readonly query change the index? Index doesn't use xmin/xmax/cmin/cmax anyhow, so it doesn't depend on transaction state. If so, gingettuple could make cleanup of pending list if it got lossy bitmap and repeat search. Although it could be slow but it will never produce a failures and it will cause very rare (and GIN could emit WARNING/NOTICE/LOG message). And this solution allows to remove disabling of indexscan in gincostestimate. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes: >> What did you think of the idea of simply abandoning support for >> conventional indexscans in GIN? > I don't like this idea because it forbids conventional indexscans even with > fastupdate=off. So? Barring some evidence that there's a significant performance win from a conventional indexscan, this is a weak argument. AFAICS the only significant advantage of the conventional API is to support ordered scans, and GIN doesn't do that anyway. regards, tom lane
On Thu, Feb 12, 2009 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: >>> What did you think of the idea of simply abandoning support for >>> conventional indexscans in GIN? > >> I don't like this idea because it forbids conventional indexscans even with >> fastupdate=off. > > So? Barring some evidence that there's a significant performance win > from a conventional indexscan, this is a weak argument. AFAICS the only > significant advantage of the conventional API is to support ordered > scans, and GIN doesn't do that anyway. Wouldn't it force you to recheck all tuples on the page, instead of just rechecking the one of interest? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Feb 12, 2009 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So? Barring some evidence that there's a significant performance win >> from a conventional indexscan, this is a weak argument. AFAICS the only >> significant advantage of the conventional API is to support ordered >> scans, and GIN doesn't do that anyway. > Wouldn't it force you to recheck all tuples on the page, instead of > just rechecking the one of interest? In the scenario at hand you'd have to do that anyway. Bear in mind that if the query is predicted to return more than a few rows, the planner is always going to pick bitmap scan anyway. So this whole issue is really only going to arise when you have a very bad rowcount prediction (or a very stale plan), leading to a choice of indexscan plan followed by enough rows actually returned to make the TID bitmap become lossy. That's certainly within the realm of possibility, particularly since we often don't have good estimators for GIN-compatible operators. But I think designing to squeeze every last bit of performance out of the case is a mistake. We should be satisfied to have correctness. In the end this is a tradeoff: how much complexity and loss of maintainability are we willing to accept to squeeze out a bit more performance? I'm leaning to the KISS end of that choice. The tests I did yesterday suggested to me that it would be difficult even to measure a performance gain from supporting conventional indexscan in GIN. IMHO the kinds of schemes that are being tossed around here are not remotely sane to choose if they don't lead to *big* wins. regards, tom lane
> So? Barring some evidence that there's a significant performance win > from a conventional indexscan, this is a weak argument. AFAICS the only > significant advantage of the conventional API is to support ordered > scans, and GIN doesn't do that anyway. What about SELECT ... AND EXISTS (SELECT ... t @@ query) ? But I don't believe that is popular use-case. In most cases, GIN is used with bitmap scan. Your emails are so convincing and I'll remove support amgettuple interface in GIN. Do you think we need to add new pg_am boolean option? Like pg_am.amcangettuple or pg_am.amcanpertuplescan -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Teodor Sigaev wrote: >> So? Barring some evidence that there's a significant performance win >> from a conventional indexscan, this is a weak argument. AFAICS the only >> significant advantage of the conventional API is to support ordered >> scans, and GIN doesn't do that anyway. > What about SELECT ... AND EXISTS (SELECT ... t @@ query) ? > But I don't believe that is popular use-case. In most cases, GIN is used > with bitmap scan. Your emails are so convincing and I'll remove support > amgettuple interface in GIN. SELECT * FROM foo WHERE t @@ query LIMIT 100 might be a fairly common use case. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
> SELECT * FROM foo WHERE t @@ query LIMIT 100 > might be a fairly common use case. It seems to me - SELECT * FROM foo WHERE t @@ query *ORDER BY rank* LIMIT 100; -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Fri, Feb 13, 2009 at 8:00 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: >> SELECT * FROM foo WHERE t @@ query LIMIT 100 >> might be a fairly common use case. > > It seems to me - > SELECT * FROM foo WHERE t @@ query *ORDER BY rank* LIMIT 100; I'm not sure you can really assume that the ORDER BY rank will always be in there. ...Robert
Teodor Sigaev <teodor@sigaev.ru> writes: > Do you think we need to add new pg_am boolean option? Like pg_am.amcangettuple > or pg_am.amcanpertuplescan I think it's sufficient to mark this by setting amgettuple to zero. We might as well allow amgetbitmap to be zero, too, to mark the case where the AM doesn't want to support bitmap scans. regards, tom lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Teodor Sigaev wrote: >> But I don't believe that is popular use-case. In most cases, GIN is used >> with bitmap scan. Your emails are so convincing and I'll remove support >> amgettuple interface in GIN. > SELECT * FROM foo WHERE t @@ query LIMIT 100 > might be a fairly common use case. [ shrug... ] The proposed implementation fails to be particularly fast-start anyway, since it will process the entire pending queue before returning anything to the executor. regards, tom lane
> [ shrug... ] The proposed implementation fails to be particularly > fast-start anyway, since it will process the entire pending queue > before returning anything to the executor. That is not true for fastupdate=off. But in any case it could be improved, but improvements doesn't solve the issue with lossy bitmap. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/