Thread: GIN fast insert

GIN fast insert

From
Robert Haas
Date:
Jeff Davis asked me if I'd be willing to do a review of the GIN fast
insert patch about two weeks ago, but I haven't actually had a chance
to read through it in detail until tonight.  I can't say I really know
anything about GIN (though I did take this opportunity to RTM), so
apologies in advance if my comments here are totally off base.

My basic impression of this code is that it's trying unnaturally hard
to avoid creating a lossy TIDBitmap, which seems pretty odd,
considering that the whole point of TIDBitmap is, AFAICS, to degrade
to lossy mode when the alternative is eating too much memory.  I'm
particularly dismayed by this hunk:

+       if ( ntids == NULL && tbm && tbm_has_lossy(tbm) )
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_OUT_OF_MEMORY),
+                                        errmsg("not enough memory to store result of pending list or VACUUM table"),
+                                        errhint("Increase the \"work_mem\" parameter.")));

The definition of work_mem is the amount of memory that a hash table
can use before spilling to disk, NOT the amount of memory a hash table
can consume before arbitrarily failing.  It's intended to be a soft
limit which people can tune to get the best performance out of their
system, not a hard limit that interrupts work.  Using the limit this
way will encourage people to set work_mem too high so that things
don't crash, but that in turn will potentially lead to other bad
behavior (like swapping).  I see that there's already one other place
in the GIN code that uses work_mem this way, but I don't think that's
a good reason to add another one - I don't see any other place in the
system that behaves this way, outside of GIN.
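
For contrast, here is roughly how the rest of the system treats work_mem when
it builds a TID bitmap: the limit is handed to tbm_create(), and the bitmap
quietly degrades to lossy pages instead of raising an error.  (A sketch only,
not code from the patch; the tbm_* calls are the existing tidbitmap API of
that era, while the scan_* helpers are made up for illustration.)

    /* Sketch: how a bitmap index scan bounds its memory use.  work_mem acts
     * as a soft cap handed to tidbitmap.c, which converts exact pages to
     * lossy ones when the cap is reached, rather than failing. */
    TIDBitmap  *tbm = tbm_create(work_mem * 1024L);

    while (scan_has_more_matches(scan))              /* made-up helper */
    {
        ItemPointerData tid = scan_next_match(scan); /* made-up helper */

        /* May lossify internally; never errors out. */
        tbm_add_tuples(tbm, &tid, 1, false);
    }

    /* tbm_has_lossy() only tells the consumer that heap rechecks are needed;
     * it is not meant to be a failure condition. */
    if (tbm_has_lossy(tbm))
        elog(DEBUG1, "bitmap degraded to lossy; rechecks required");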

I think this is related to the problems with gincostestimate() that
Tom Lane was complaining about here:

http://archives.postgresql.org/message-id/20441.1234209245@sss.pgh.pa.us

I am not 100% sure I'm understanding this correctly, but I think the
reason why gincostestimate() is so desperate to avoid index scans when
the pending list is long is because it knows that scanFastInsert()
will blow up if an index scan is actually attempted because of the
aforementioned TIDBitmap problem.  This seems unacceptably fragile.
Faster insert performance is nice, but not if it means that my indices
suddenly start switching themselves off (which is bad) or, in the
presence of cached plans, crashing (which is worse).

I think this code needs to be somehow rewritten to make things degrade
gracefully when the pending list is long - I'm not sure what the best
way to do that is.  Inventing a new data structure to store TIDs that
is never lossy seems like it might work, but you'd have to think about
what to do if it got too big.  I think it's probably best to just go
ahead and let it get arbitrarily long (since you have no other option
besides crashing) and leave work_mem out of it.  Instead, add a reloption
that controls the maximum length of the pending list.
This is easy to tune: if you make it bigger, insert performance
improves, but queries eat more memory.  If you make it smaller, insert
performance gets worse, but you bound the memory that queries use more
tightly.
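
Roughly what I'm imagining (a sketch only; the reloption, the
pendingListLimit field, and the helper names below are hypothetical, nothing
of the sort exists in the patch):

    #define GIN_DEFAULT_PENDING_LIST_LIMIT  64      /* pages; made-up default */

    /* Hypothetical per-index limit, read from the index's reloptions. */
    static int
    GinGetPendingListLimit(Relation index)
    {
        GinOptions *opts = (GinOptions *) index->rd_options;

        return (opts != NULL) ? opts->pendingListLimit
                              : GIN_DEFAULT_PENDING_LIST_LIMIT;
    }

    /* In the insert path, roughly: flush the pending list once the per-index
     * limit is reached, instead of consulting the inserting backend's
     * work_mem setting. */
    if (pending_list_pages(index) > GinGetPendingListLimit(index))
        flush_pending_list(index);                  /* illustrative helper */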

The other problem with using work_mem is that it can vary between
sessions, so one session happily stuffs a lot of data into the pending
list and then another session can't scan the index because it has a
lower work_mem setting.  Using a reloption avoids that problem by
making the whole thing symmetric, plus it gives you fine-grained
control over the behavior on an index-by-index basis.

...Robert


Re: GIN fast insert

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I think this is related to the problems with gincostestimate() that
> Tom Lane was complaining about here:
> http://archives.postgresql.org/message-id/20441.1234209245@sss.pgh.pa.us

> I am not 100% sure I'm understanding this correctly, but I think the
> reason why gincostestimate() is so desperate to avoid index scans when
> the pending list is long is because it knows that scanFastInsert()
> will blow up if an index scan is actually attempted because of the
> aforementioned TIDBitmap problem.  This seems unacceptably fragile.

Yipes.  If that's really the reason then I agree, it's a nonstarter.

> I think this code needs to be somehow rewritten to make things degrade
> gracefully when the pending list is long - I'm not sure what the best
> way to do that is.  Inventing a new data structure to store TIDs that
> is never lossy seems like it might work, but you'd have to think about
> what to do if it got too big.

What would be wrong with letting it degrade to lossy?  I suppose the
reason it's trying to avoid that is to avoid having to recheck all the
rows on that page when it comes time to do the index insertion; but
surely having to do that is better than having arbitrary, unpredictable
failure conditions.
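
For reference, all a lossy page costs the reading side is a whole-page
recheck, roughly like this (a sketch, assuming the iterator-style tidbitmap
API; the two fetch_* helpers are stand-ins for the bitmap heap scan
machinery):

    TBMIterator *iterator = tbm_begin_iterate(tbm);
    TBMIterateResult *tbmres;

    while ((tbmres = tbm_iterate(iterator)) != NULL)
    {
        if (tbmres->ntuples >= 0)
        {
            int         i;

            /* Exact page: visit only the offsets stored in the bitmap. */
            for (i = 0; i < tbmres->ntuples; i++)
                fetch_and_recheck(tbmres->blockno, tbmres->offsets[i]);
        }
        else
        {
            /* Lossy page (ntuples == -1): visit every tuple on the block
             * and recheck the index quals against each one. */
            fetch_whole_page_and_recheck(tbmres->blockno);
        }
    }
    tbm_end_iterate(iterator);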

It strikes me that part of the issue here is that the behavior of this
code is much better adapted to the bitmap-scan API than the traditional
indexscan API.  Since GIN doesn't support ordered scan anyway, I wonder
whether it wouldn't be more sensible to simply allow it to not offer
the traditional API.  It should be easy to make the planner ignore plain
indexscan plans for an AM that didn't support them.
        regards, tom lane


Re: GIN fast insert

From
Robert Haas
Date:
On Tue, Feb 10, 2009 at 10:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think this code needs to be somehow rewritten to make things degrade
>> gracefully when the pending list is long - I'm not sure what the best
>> way to do that is.  Inventing a new data structure to store TIDs that
>> is never lossy seems like it might work, but you'd have to think about
>> what to do if it got too big.
>
> What would be wrong with letting it degrade to lossy?  I suppose the
> reason it's trying to avoid that is to avoid having to recheck all the
> rows on that page when it comes time to do the index insertion; but
> surely having to do that is better than having arbitrary, unpredictable
> failure conditions.

No, I don't think that's it.  See here, beginning with "the problem
with lossy tbm has two aspects":

http://archives.postgresql.org/message-id/4974B002.3040202@sigaev.ru

> It strikes me that part of the issue here is that the behavior of this
> code is much better adapted to the bitmap-scan API than the traditional
> indexscan API.  Since GIN doesn't support ordered scan anyway, I wonder
> whether it wouldn't be more sensible to simply allow it to not offer
> the traditional API.  It should be easy to make the planner ignore plain
> indexscan plans for an AM that didn't support them.

If that doesn't lead to a performance degradation, I think it would be
a good idea.  It certainly seems like it would allow this patch to be
a LOT simpler, cleaner, and more robust.

...Robert


Re: GIN fast insert

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Feb 10, 2009 at 10:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It strikes me that part of the issue here is that the behavior of this
>> code is much better adapted to the bitmap-scan API than the traditional
>> indexscan API.  Since GIN doesn't support ordered scan anyway, I wonder
>> whether it wouldn't be more sensible to simply allow it to not offer
>> the traditional API.  It should be easy to make the planner ignore plain
>> indexscan plans for an AM that didn't support them.

> If that doesn't lead to a performance degradation, I think it would be
> a good idea.

For queries that select only a single index entry, there might be some
speed degradation, but a quick test suggests it's in the
single-digit-percentage range if everything's cached; and of course if
you have to go to disk then the extra CPU cycles to push a bitmap around
are lost in the noise.

In any case, as a wise man once said "I can make this code arbitrarily
fast, if it doesn't have to give the right answer".  Blowing up in easily
foreseeable circumstances isn't my idea of giving the right answer.
        regards, tom lane


Re: GIN fast insert

From
Robert Haas
Date:
On Tue, Feb 10, 2009 at 11:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> For queries that select only a single index entry, there might be some
> speed degradation, but a quick test suggests it's in the
> single-digit-percentage range if everything's cached; and of course if
> you have to go to disk then the extra CPU cycles to push a bitmap around
> are lost in the noise.
>
> In any case, as a wise man once said "I can make this code arbitrarily
> fast, if it doesn't have to give the right answer".  Blowing up in easily
> foreseeable circumstances isn't my idea of giving the right answer.

Sure, but dropping index-scan support is not the only way of fixing
that problem.  It has a certain elegance to it, but if it produces a
perceptible slowdown, it may not be the best approach.

...Robert


Re: GIN fast insert

From
Teodor Sigaev
Date:
>> What would be wrong with letting it degrade to lossy?  I suppose the
>> reason it's trying to avoid that is to avoid having to recheck all the
>> rows on that page when it comes time to do the index insertion; but
>> surely having to do that is better than having arbitrary, unpredictable
>> failure conditions.
> 
> No, I don't think that's it.  See here, beginning with "the problem
> with lossy tbm has two aspects":
> 
> http://archives.postgresql.org/message-id/4974B002.3040202@sigaev.ru
Right. Some comments on those points:
 - The amgettuple interface has no way to work with a page-wide result instead
   of an exact ItemPointer: amgettuple cannot return just a block number the
   way amgetbitmap can.

It's not so difficult to teach GIN to return ItemPointers one by one from the
pending list and eliminate this point: GIN would not collect matched
ItemPointers in a tidbitmap but would return them immediately. But:
 - Because of a concurrent vacuum process: while we scan the pending list, its
   content could be transferred into the regular structure of the index, and
   then we would find the same tuple twice. Again, amgettuple has no protection
   against that; only amgetbitmap has it. So we need to filter results from the
   regular GIN structure by results from the pending list, and for that
   filtering we can't use a lossy tbm.

Again, we are talking about the amgettuple interface. We need to filter
results from the regular GIN structure by results from the pending list.
Currently GIN does this by looking up each matched ItemPointer in a tidbitmap
constructed from the pending list. We could use an ordered array of
ItemPointers instead of a tidbitmap, but I don't believe it would take less
memory. It's impossible to rescan the pending list, for two reasons: a) it is
too slow; b) a concurrent cleanup process (vacuum, or now insert as well)
could move tuples into the regular structure.
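
For concreteness, the ordered-array variant of that filter would look about
like this (a sketch only; pending_tids/npending stand for whatever we collect
from the pending list):

    /* Sketch: check whether a TID returned by the regular GIN structure was
     * already returned from the pending list, using a sorted TID array and
     * the existing ItemPointerCompare() helper. */
    static bool
    tid_in_pending_list(ItemPointer tid,
                        ItemPointerData *pending_tids, int npending)
    {
        int         lo = 0;
        int         hi = npending - 1;

        while (lo <= hi)
        {
            int         mid = lo + (hi - lo) / 2;
            int32       cmp = ItemPointerCompare(tid, &pending_tids[mid]);

            if (cmp == 0)
                return true;    /* already seen via the pending list */
            if (cmp < 0)
                hi = mid - 1;
            else
                lo = mid + 1;
        }
        return false;
    }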

Is it acceptable to add an option to tidbitmap that forbids it from becoming
lossy? That would remove the need to disable index scans in gincostestimate
and to check for a non-lossy tidbitmap.

In the current version, cleanup of the pending list starts much earlier than
the non-lossy limit is reached in typical use cases. The insertion process
will start a cleanup when at least one trigger fires:
 - the number of heap tuples in the pending list could produce a lossy
   tidbitmap;
 - the total size of the pending list is greater than work_mem.

The second trigger is designed to speed up bulk insertion (which is used in
cleanup), because it collects the whole pending list in memory at once. And
this trigger is stricter than the first one, because in the typical GIN use
case a heap tuple is rather big.
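
In pseudo-C, the decision is roughly (a sketch; all the names below are
illustrative rather than the patch's actual ones):

    bool    needCleanup = false;

    /* Trigger 1: so many heap tuples are pending that scanning them could
     * produce a lossy tidbitmap. */
    if (pendingHeapTuples > maxTuplesForNonLossyBitmap)
        needCleanup = true;

    /* Trigger 2: the pending list has outgrown work_mem; cleanup loads the
     * whole list into memory at once, so keep it below that. */
    if (pendingListPages * BLCKSZ > work_mem * 1024L)
        needCleanup = true;

    if (needCleanup)
        flush_pending_list(index);      /* illustrative helper */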

I believe that a user could get GIN's error about work_mem only intentionally:
- turn off autovacuum
- set a big work_mem
- populate a table with a GIN index (with the needed number of insertions)
- prepare a query that will return a lot of results (possibly with seqscan=off,
  because the cost of scanning the pending list grows fast)
- decrease work_mem by at least ten times
- execute the query


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIN fast insert

From
Robert Haas
Date:
> I believe that a user could get GIN's error about work_mem only intentionally:
> - turn off autovacuum

Meanwhile, in the other thread, we're having a discussion about people
wanting to do exactly this on a database-wide basis during peak load
hours...

> - set a big work_mem
> - populate a table with a GIN index (with the needed number of insertions)
> - prepare a query that will return a lot of results (possibly with
> seqscan=off, because the cost of scanning the pending list grows fast)
> - decrease work_mem by at least ten times
> - execute the query

Why would the new work_mem need to be 10x smaller than the old work_mem?

...Robert


Re: GIN fast insert

From
Teodor Sigaev
Date:
Robert Haas wrote:
>> I believe that a user could get GIN's error about work_mem only intentionally:
>> - turn off autovacuum
> Meanwhile, in the other thread, we're having a discussion about people
> wanting to do exactly this on a database-wide basis during peak load
> hours...
>> - decrease work_mem by at least ten times
>> - execute the query
> Why would the new work_mem need to be 10x smaller than the old work_mem?

That is the way to get GIN's error emitted: work_mem has to be decreased to
have a chance of getting a lossy tidbitmap.


Re: GIN fast insert

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> Robert Haas wrote:
>> Why would the new work_mem need to be 10x smaller than the old work_mem?

> That is the way to get GIN's error emitted: work_mem has to be decreased to
> have a chance of getting a lossy tidbitmap.

But it only has to be marginally lower, not 10x lower.  And there are
plenty of scenarios where different backends might be running with
different work_mem settings.

But the *real* problem is that you simply can not guarantee that
someone doesn't increase the size of the pending list between the time
that someone else commits to an indexscan plan and the time that they
execute that plan.  This scheme will result in random failures for
concurrent inserts/selects, and that's not acceptable.

What did you think of the idea of simply abandoning support for
conventional indexscans in GIN?  I agree that we could probably kluge
something to make conventional scans still work reliably, but it seems
to me that it'd be ugly, fragile, and quite possibly slow enough to not
ever beat bitmap scans anyway.
        regards, tom lane


Re: GIN fast insert

From
Teodor Sigaev
Date:
> But the *real* problem is that you simply can not guarantee that
> someone doesn't increase the size of the pending list between the time

That can happen if the inserting process has a bigger work_mem. Agreed.

> What did you think of the idea of simply abandoning support for
> conventional indexscans in GIN?  I agree that we could probably kluge
> something to make conventional scans still work reliably, but it seems
> to me that it'd be ugly, fragile, and quite possibly slow enough to not
> ever beat bitmap scans anyway.

I don't like this idea because it forbids conventional indexscans even with 
fastupdate=off.

May a read-only query change the index? The index doesn't use
xmin/xmax/cmin/cmax anyhow, so it doesn't depend on transaction state. If so,
gingettuple could perform a cleanup of the pending list when it gets a lossy
bitmap, and then repeat the search. That could be slow, but it would never
produce a failure and it would happen very rarely (and GIN could emit a
WARNING/NOTICE/LOG message). This solution would also allow removing the
disabling of index scans in gincostestimate.
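
Something like this (a sketch only; collect_pending_list_bitmap() and
flush_pending_list() are stand-ins for the real code):

    /* Sketch: gingettuple falls back to cleaning up the pending list and
     * retrying when the pending-list bitmap would have become lossy. */
restart:
    tbm = collect_pending_list_bitmap(scan);        /* stand-in */

    if (tbm_has_lossy(tbm))
    {
        /* A read-only query does index maintenance here, hence the
         * question above about whether that is acceptable. */
        ereport(LOG,
                (errmsg("GIN pending list too large for work_mem; "
                        "cleaning it up and retrying the scan")));

        flush_pending_list(scan->indexRelation);    /* stand-in */
        tbm_free(tbm);
        goto restart;
    }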

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIN fast insert

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
>> What did you think of the idea of simply abandoning support for
>> conventional indexscans in GIN?

> I don't like this idea because it forbids conventional indexscans even with 
> fastupdate=off.

So?  Barring some evidence that there's a significant performance win
from a conventional indexscan, this is a weak argument.  AFAICS the only
significant advantage of the conventional API is to support ordered
scans, and GIN doesn't do that anyway.
        regards, tom lane


Re: GIN fast insert

From
Robert Haas
Date:
On Thu, Feb 12, 2009 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
>>> What did you think of the idea of simply abandoning support for
>>> conventional indexscans in GIN?
>
>> I don't like this idea because it forbids conventional indexscans even with
>> fastupdate=off.
>
> So?  Barring some evidence that there's a significant performance win
> from a conventional indexscan, this is a weak argument.  AFAICS the only
> significant advantage of the conventional API is to support ordered
> scans, and GIN doesn't do that anyway.

Wouldn't it force you to recheck all tuples on the page, instead of
just rechecking the one of interest?

...Robert


Re: GIN fast insert

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Feb 12, 2009 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So?  Barring some evidence that there's a significant performance win
>> from a conventional indexscan, this is a weak argument.  AFAICS the only
>> significant advantage of the conventional API is to support ordered
>> scans, and GIN doesn't do that anyway.

> Wouldn't it force you to recheck all tuples on the page, instead of
> just rechecking the one of interest?

In the scenario at hand you'd have to do that anyway.

Bear in mind that if the query is predicted to return more than a few
rows, the planner is always going to pick bitmap scan anyway.  So this
whole issue is really only going to arise when you have a very bad
rowcount prediction (or a very stale plan), leading to a choice of
indexscan plan followed by enough rows actually returned to make the TID
bitmap become lossy.  That's certainly within the realm of possibility,
particularly since we often don't have good estimators for
GIN-compatible operators.  But I think designing to squeeze every last
bit of performance out of the case is a mistake.  We should be satisfied
to have correctness.

In the end this is a tradeoff: how much complexity and loss of
maintainability are we willing to accept to squeeze out a bit more
performance?  I'm leaning to the KISS end of that choice.  The tests
I did yesterday suggested to me that it would be difficult even to
measure a performance gain from supporting conventional indexscan in
GIN.  IMHO the kinds of schemes that are being tossed around here are
not remotely sane to choose if they don't lead to *big* wins.
        regards, tom lane


Re: GIN fast insert

From
Teodor Sigaev
Date:
> So?  Barring some evidence that there's a significant performance win
> from a conventional indexscan, this is a weak argument.  AFAICS the only
> significant advantage of the conventional API is to support ordered
> scans, and GIN doesn't do that anyway.
What about SELECT ... AND EXISTS (SELECT ... t @@ query)?
But I don't believe that is a popular use case. In most cases, GIN is used with
a bitmap scan. Your emails are convincing, so I'll remove support for the
amgettuple interface in GIN.

Do you think we need to add a new pg_am boolean option, like
pg_am.amcangettuple or pg_am.amcanpertuplescan?




-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIN fast insert

From
Heikki Linnakangas
Date:
Teodor Sigaev wrote:
>> So?  Barring some evidence that there's a significant performance win
>> from a conventional indexscan, this is a weak argument.  AFAICS the only
>> significant advantage of the conventional API is to support ordered
>> scans, and GIN doesn't do that anyway.
> What about SELECT ... AND EXISTS (SELECT ... t @@ query)?
> But I don't believe that is a popular use case. In most cases, GIN is used
> with a bitmap scan. Your emails are convincing, so I'll remove support for
> the amgettuple interface in GIN.

SELECT * FROM foo WHERE t @@ query LIMIT 100

might be a fairly common use case.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: GIN fast insert

From
Teodor Sigaev
Date:
> SELECT * FROM foo WHERE t @@ query LIMIT 100
> might be a fairly common use case.

It seems to me -
SELECT * FROM foo WHERE t @@ query *ORDER BY rank* LIMIT 100;



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: GIN fast insert

From
Robert Haas
Date:
On Fri, Feb 13, 2009 at 8:00 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>> SELECT * FROM foo WHERE t @@ query LIMIT 100
>> might be a fairly common use case.
>
> It seems to me -
> SELECT * FROM foo WHERE t @@ query *ORDER BY rank* LIMIT 100;

I'm not sure you can really assume that the ORDER BY rank will always
be in there.

...Robert


Re: GIN fast insert

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> Do you think we need to add a new pg_am boolean option, like
> pg_am.amcangettuple or pg_am.amcanpertuplescan?

I think it's sufficient to mark this by setting amgettuple to zero.

We might as well allow amgetbitmap to be zero, too, to mark the case
where the AM doesn't want to support bitmap scans.
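
On the planner side that reduces to something like this (a sketch; it assumes
the planner's IndexOptInfo carries has-gettuple/has-getbitmap flags derived
from the pg_am entries, and the build_* helpers are stand-ins):

    /* Sketch: build only the path types the access method actually provides. */
    if (index->amhasgettuple)
        build_plain_indexscan_paths(root, rel, index);  /* stand-in */

    if (index->amhasgetbitmap)
        build_bitmap_scan_paths(root, rel, index);      /* stand-in */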
        regards, tom lane


Re: GIN fast insert

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Teodor Sigaev wrote:
>> But I don't believe that is a popular use case. In most cases, GIN is used
>> with a bitmap scan. Your emails are convincing, so I'll remove support for
>> the amgettuple interface in GIN.

> SELECT * FROM foo WHERE t @@ query LIMIT 100
> might be a fairly common use case.

[ shrug... ]  The proposed implementation fails to be particularly
fast-start anyway, since it will process the entire pending queue
before returning anything to the executor.
        regards, tom lane


Re: GIN fast insert

From
Teodor Sigaev
Date:
> [ shrug... ]  The proposed implementation fails to be particularly
> fast-start anyway, since it will process the entire pending queue
> before returning anything to the executor.

That is not true for fastupdate=off. In any case it could be improved, but
such improvements don't solve the issue with the lossy bitmap.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/