Re: Index-only scans for GIST - Mailing list pgsql-hackers

From Thom Brown
Subject Re: Index-only scans for GIST
Date
Msg-id CAA-aLv5cArNV6PtAfkqpro3prV0Y4_r1L47Yie2KMK_C=5K2Kg@mail.gmail.com
Whole thread Raw
In response to Re: Index-only scans for GIST  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 18 August 2014 09:05, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:
2014-08-07 0:30 GMT+04:00 Heikki Linnakangas <hlinnakangas@vmware.com>:

* I'm getting two regression failures with this (opr_sanity and join).


opr_sanity failure is corrected.
But there is remain question with join.
I check the latest version of my github repo and there's no fail in join
regression test
All 145 tests passed.
To tell the truth, I don't understand which changes could led to this
failure.
Could you show me regression diffs?

Sure, here you go. It seems like a change in a plan. At a quick glance it seems harmless: the new plan is identical except that the left and right side of a join have been reversed. But I don't understand either why this patch would change that, so it needs to be investigated.

* The regression test queries that use LIMIT are not guaranteed to always
return the same rows, hence they're not very good regression test cases.
I'd suggest using more restricting WHERE clauses, so that each query only
returns a handful of rows.

Thank you for comment, I rewrote wrong queries. But could you explain why
LIMIT queries may return different results? Is it happens because of
different query optimization?

Imagine that you have a table with two rows, A and B. If you run a query like "SELECT * FROM table LIMIT 1", the system can legally return either row A or B, because there's no ORDER BY.

Now, admittedly you have a similar problem even without the LIMIT - the system can legally return the rows in either order - but it's less of an issue because at least you can quickly see from the diff that the result set is in fact the same, the rows are just in different order. You could fix that by adding an ORDER BY to all test queries, but we haven't done that in the regression suite because then we would not have any test coverage for cases where you don't have an ORDER BY. As a compromise, test cases are usually written without an ORDER BY, but if e.g. the buildfarm starts failing because of differences in the result set order across platforms, then we add an ORDER BY to make it stable.

* I think it's leaking memory, in GIST scan context. I tested this with a
variant of the regression tests:

insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
0.05*i)),
                          point(0.05*i, 0.05*i) FROM generate_series(0,
10000000) as i;
CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze  select p from gist_tbl where p <@ box(point(0,0),
point(9999999,9999999)) and length(p::text) < 10;

while the final query runs, 'top' shows constantly increasing memory usage.

I don't think it's memory leak. After some increasing, memory using remain
the same. It works similar without using indexonlyscan.

No, it's definitely a leak caused by the patch. Test with the attached patch, which prints out to the server log the amount of memory used by the GiST scan memory context every 10000 rows. It clearly shows increasing memory usage all the way to the end of the query.

It's cleaned up at the end of the query, but that's not good enough because for a large query you might accumulate gigabytes of leaked memory until the query has finished. If you (manually) apply the same patch to git master, you'll see that the memory usage stays consistent and small.

Hi Anastasia,

Do you have time to address the issues brought up in Heikki's review?  It would be good if we could your work into PostgreSQL 9.5.

Thanks

Thom

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Directory/File Access Permissions for COPY and Generic File Access Functions
Next
From: Thom Brown
Date:
Subject: Re: KNN searches support for SP-GiST [GSOC'14]