Thread: SP-GiST failing to complete SP-GiST index build

SP-GiST failing to complete SP-GiST index build

From
"Jonathan S. Katz"
Date:
Hi,

While preparing for an upcoming presentation, I was playing around
with SP-GiST indexes on tstz ranges and was having an issue where
some would fail to build to completion in a reasonable time, especially
compared to corresponding GiST builds.

Version:

    PostgreSQL 10.4 on x86_64-apple-darwin16.7.0,
    compiled by Apple LLVM version 9.0.0 (clang-900.0.39.2), 64-bit

Relevant Config:

    shared_buffers = 1GB
    work_mem = 64MB
    maintenance_work_mem = 1GB

System was not under unusual load.

I thought perhaps it was a result of my data being relatively sparse,
but then I had an issue getting one scenario to build where the data
was much more common use case. I wanted to run this scenario
by just to ensure there are no bugs, as the “common case” was working
fine for me.

First, see “good.sql” for a base case: 1.2MM rows “densely” clustered
rows inserted, both GiST and SP-GiST index build in reasonable time
periods (< 15s on my machine).

Next, see bad.sql.  1.2MM sparsely clustered rows inserted, GiST indexes
builds in about 30s on my machine.  SP-GiST does not build at all, or at
least I have been composing this email for about 10 minutes since I kicked
off my latest and it has yet to terminate.

I can understand this being an extreme case for SP-GiST as it’s better
for data set that’s more densely clustered, but I wanted to provide
this info to rule out whether or not this is a bug.

Thanks,

Jonathan


Attachment

Re: SP-GiST failing to complete SP-GiST index build

From
Peter Geoghegan
Date:
On Sun, May 27, 2018 at 12:45 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote:
> Next, see bad.sql.  1.2MM sparsely clustered rows inserted, GiST indexes
> builds in about 30s on my machine.  SP-GiST does not build at all, or at
> least I have been composing this email for about 10 minutes since I kicked
> off my latest and it has yet to terminate.
>
> I can understand this being an extreme case for SP-GiST as it’s better
> for data set that’s more densely clustered, but I wanted to provide
> this info to rule out whether or not this is a bug.

While I'm no SP-GiST expert, I strongly suspect that you've identified
a real bug here. Your test case has been running on my development
machine for 20 minutes now (this is server class hardware).

I ran perf with your testcase, and I see that the majority of
instructions are executed within these functions:

  22.88%  postgres  postgres            [.] spgdoinsert
  12.98%  postgres  postgres            [.] range_deserialize
  11.44%  postgres  postgres            [.] FunctionCall2Coll
  10.40%  postgres  postgres            [.] heap_tuple_untoast_attr
   8.62%  postgres  postgres            [.] spgExtractNodeLabels
   5.92%  postgres  postgres            [.] getQuadrant
   4.90%  postgres  postgres            [.] AllocSetAlloc

spgdoinsert() contains the following comment:

/*
 * Bail out if query cancel is pending.  We must have this somewhere
 * in the loop since a broken opclass could produce an infinite
 * picksplit loop.
 */
CHECK_FOR_INTERRUPTS();

Perhaps the problem is in the range type SP-GiST opclass support code
- it could have this exact picksplit infinite loop problem. That's
perfectly consistent with what we see here.

--
Peter Geoghegan


Re: SP-GiST failing to complete SP-GiST index build

From
Peter Geoghegan
Date:
On Sun, May 27, 2018 at 2:09 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Sun, May 27, 2018 at 12:45 PM, Jonathan S. Katz <jkatz@postgresql.org> wrote:
>> Next, see bad.sql.  1.2MM sparsely clustered rows inserted, GiST indexes
>> builds in about 30s on my machine.  SP-GiST does not build at all, or at
>> least I have been composing this email for about 10 minutes since I kicked
>> off my latest and it has yet to terminate.
>>
>> I can understand this being an extreme case for SP-GiST as it’s better
>> for data set that’s more densely clustered, but I wanted to provide
>> this info to rule out whether or not this is a bug.
>
> While I'm no SP-GiST expert, I strongly suspect that you've identified
> a real bug here. Your test case has been running on my development
> machine for 20 minutes now (this is server class hardware).

Looks like I spoke too soon. The SP-GiST index build finished a moment
ago. The index build took a horrifically long time for a 122 MB index,
though.

--
Peter Geoghegan


Re: SP-GiST failing to complete SP-GiST index build

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> Looks like I spoke too soon. The SP-GiST index build finished a moment
> ago. The index build took a horrifically long time for a 122 MB index,
> though.

Instrumenting the test case suggests that getQuadrant pretty much always
returns 1, resulting in a worst-case unbalanced SPGiST tree.  I think this
is related to the fact that the test case inserts the values in increasing
order, so that new values are always greater than existing values in the
index.  SPGiST is unable to rebalance its tree on the fly, so it's pretty
well screwed in this situation.  It does finish eventually, but in about
50x longer than GiST.  I imagine the index's query performance would be
equally awful.

            regards, tom lane


Re: SP-GiST failing to complete SP-GiST index build

From
Peter Geoghegan
Date:
On Sun, May 27, 2018 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Instrumenting the test case suggests that getQuadrant pretty much always
> returns 1, resulting in a worst-case unbalanced SPGiST tree.  I think this
> is related to the fact that the test case inserts the values in increasing
> order, so that new values are always greater than existing values in the
> index.

I suspected the same. It reminded me of the weird behavior that the
Postgres qsort() sometimes exhibits.

> SPGiST is unable to rebalance its tree on the fly, so it's pretty
> well screwed in this situation.  It does finish eventually, but in about
> 50x longer than GiST.  I imagine the index's query performance would be
> equally awful.

Can you think of some way of side-stepping the issue? It's unfortunate
that SP-GiST is potentially so sensitive to input order.

-- 
Peter Geoghegan


Re: SP-GiST failing to complete SP-GiST index build

From
Peter Geoghegan
Date:
On Sun, May 27, 2018 at 5:24 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Sun, May 27, 2018 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Instrumenting the test case suggests that getQuadrant pretty much always
>> returns 1, resulting in a worst-case unbalanced SPGiST tree.  I think this
>> is related to the fact that the test case inserts the values in increasing
>> order, so that new values are always greater than existing values in the
>> index.
>
> I suspected the same. It reminded me of the weird behavior that the
> Postgres qsort() sometimes exhibits.

I confirmed this by using CLUSTER on an index built against a new
column with no physical/logical correlation (a column containing
random() data). This resulted in a dramatically faster build for the
SP-GiST index:

pg@~[31121]=# CREATE INDEX logs2_log_time_spgist_idx ON logs2 USING
spgist (log_time);
DEBUG:  building index "logs2_log_time_spgist_idx" on table "logs2" serially
CREATE INDEX
Time: 3961.815 ms (00:03.962)

Also, the final index is only 88 MB (not 122 MB).

As a point of comparison, this is how a REINDEX of the GiST index went
against the same (CLUSTERed) table:

pg@~[31121]=# REINDEX INDEX logs2_log_time_gist_idx;
DEBUG:  building index "logs2_log_time_gist_idx" on table "logs2" serially
REINDEX
Time: 14652.058 ms (00:14.652)

-- 
Peter Geoghegan


Re: SP-GiST failing to complete SP-GiST index build

From
"Jonathan S. Katz"
Date:
> On May 27, 2018, at 8:24 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Sun, May 27, 2018 at 5:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Instrumenting the test case suggests that getQuadrant pretty much always
>> returns 1, resulting in a worst-case unbalanced SPGiST tree.  I think this
>> is related to the fact that the test case inserts the values in increasing
>> order, so that new values are always greater than existing values in the
>> index.
>
> I suspected the same. It reminded me of the weird behavior that the
> Postgres qsort() sometimes exhibits.
>
>> SPGiST is unable to rebalance its tree on the fly, so it's pretty
>> well screwed in this situation.  It does finish eventually, but in about
>> 50x longer than GiST.  I imagine the index's query performance would be
>> equally awful.
>
> Can you think of some way of side-stepping the issue? It's unfortunate
> that SP-GiST is potentially so sensitive to input order.

To help with the testing, I’ve attached two more scenarios, labeled
“good2” and “bad2” below.  The premise is similar, except that I start
with empty tables with indexes already created.

The workload in “bad2” is what you may see in the real world with
proper DBA planning (i.e. I have my indexes in place before I start
collecting data) with scheduling applications or anything with an increasing
time series.

The timing results I found were similar to the initial example posted, with
me giving up on the last scenario (I do not have the same patience as
Peter).

FWIW I have used SP-GiST indexes before with datasets similar to how
“bad2” is generated (though not nearly as dramatic as the upward increase
seen in the range) and have not run across this issue.

Jonathan



Attachment