Re: Index-only scans for GIST - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Index-only scans for GIST
Date
Msg-id 53E2906E.8080707@vmware.com
Whole thread Raw
In response to Index-only scans for GIST  (Anastasia Lubennikova <lubennikovaav@gmail.com>)
Responses Re: Index-only scans for GIST
List pgsql-hackers
On 08/01/2014 10:58 AM, Anastasia Lubennikova wrote:
> Hi, hackers!
> I work on a GSoC project "Index-only scans for GIST"
> https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
>
> Repository is
> https://github.com/lubennikovaav/postgres/tree/indexonlygist2
> Patch is in attachments.

Thanks!

Some comments:

* I got this compiler warning:

gistget.c:556:5: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]     ListCell *tmpPageData = so->curPageData;     ^

* I'm getting two regression failures with this (opr_sanity and join).

* After merging with master, build fails because of duplicate OIDs.

* The regression test queries that use LIMIT are not guaranteed to 
always return the same rows, hence they're not very good regression test 
cases. I'd suggest using more restricting WHERE clauses, so that each 
query only returns a handful of rows.

* What's the reason for turning GISTScanOpaqueData.pageData from an 
array to a List?

* I think it's leaking memory, in GIST scan context. I tested this with 
a variant of the regression tests:

insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 
0.05*i)),                         point(0.05*i, 0.05*i) FROM generate_series(0, 
10000000) as i;
CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze  select p from gist_tbl where p <@ box(point(0,0), 
point(9999999,9999999)) and length(p::text) < 10;

while the final query runs, 'top' shows constantly increasing memory usage.

> It includes index-only scans for multicolumn GIST and new regression test.
> Fetch() method is realized for box and point opclasses.

Can we have Fetch functions for all the datatypes in btree_gist contrib 
module, please? Do other contrib modules contain GiST opclasses that 
could have Fetch functions?

> Documentation is not updated yet, but I'm going to do it till the end of
> GSoC.
>
> I've got one question about query with OR condition. It is the last query
> in regression test "gist_indexonly". It doesn't fail but it doensn't use
> index-only scans. Could someone explain to me how it works?
> It seems to depend on build_paths_for_OR
> <http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2>
> function.
> But I couldn't understand how.

The query is:

select * from gist_tbl
where b <@ box(point(5,5),  point(6,6))
or p <@ box(point(0,0),  point(100,100)) limit 10;

It cannot use an index(-only) scan for this, because a single index scan 
can only return rows based on one key. In this case, you need to do two 
scans, and then return the rows returned by either scan, removing 
duplicates. A bitmap scan is possible, because it can remove the 
duplicates, but the planner can't produce a plain index scan plan that 
would do the same.

A common trick when that happens in a real-world application is to 
re-write the query using UNION:

select * from gist_tbl
where b <@ box(point(5,5),  point(6,6))
UNION
select * from gist_tbl
where p <@ box(point(0,0),  point(100,100))
limit 10;

Although that doesn't seem to actually work:

ERROR:  could not identify an equality operator for type box
LINE 1: select * from gist_tbl               ^

but that's not your patch's fault, the same happens with unpatched master.

IOW, you don't need to worry about that case.

- Heikki




pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
Next
From: Bruce Momjian
Date:
Subject: Re: select_common_type()'s behavior doesn't match the documentation