Re: Question about rtrees (overleft replacing left in nodes) - Mailing list pgsql-general
From | William White |
---|---|
Subject | Re: Question about rtrees (overleft replacing left in nodes) |
Date | |
Msg-id | 406B0003.8030902@frognet.net Whole thread Raw |
In response to | Re: Question about rtrees (overleft replacing left in nodes) (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
Tom Lane wrote: > It'd need to be > > S &< T iff inf(S) <= sup(T) > > to satisfy the geometrical intuition. (You could quibble about the > equality case, but box_overlap seems to consider touching boxes to > overlap, so for consistency this should too.) > > However, if this is indeed wrong, why have we not heard bug reports > stating that rtree indexes don't work? Because the positionsel/positionjoinsel estimator functions make it all but impossible to invoke index search on r-trees for position operators; see below. > Can you generate a test case > in which it fails? Yes, actually; I was able to demonstrate there is a problem after I'd already sent off the email. But you have to "cheat", because the default cost estimator functions left, overleft, right, and overright all return a large fixed value (0.1, IIRC two orders of magnitude over areasel/areajoinsel) which all but guarantees sequential scan for any relvar with a reasonable number of tuples (I tried 100,000 tuples and still got sequential scan). These are, of course, constant cost estimator functions. The following Perl script temporarily replaces oprrest with "areasel" and oprjoin with "areajoinsel" (instead of positionsel and positionjoinsel rsp.). WARNING: This assumes oids 493 and 494 correspond to box_ops << and &< respectively, that happens to be true in the pgsql version I'm using at the moment. Could be replaced with test for operator name and data type, of course. Suggested remediation, if this is reproducible: 1. Modify overleft and overright to actually mean "overlap or left" and "overlap or right" respectively, which might break existing code, or 2. Add two new functions, e.g., "overlap_or_left" and "overlap_or_right", and two operators (e.g., "&&||<<" and "&&||>>" ... yes, I'm kidding about the ugly operator names), and modify the operator class definition to use the new operators instead of the old. Existing code would continue to work (since &< and &> would now never invoke index scan), new code could take advantage of index scan on position operators (assuming the cost estimator were modified to permit this). Code follows. Apologies for Perl rather than straight SQL, this was adapted from part of a test suite for an interval (as in "interval in R" not the IMNSHO misnamed SQL "interval" type) extension I'm writing, which is why I ran across the problem. ---- cut here ---- #!/usr/bin/perl open(P,"|psql >/dev/null 2>&1"); print P <<"EOF"; DROP INDEX i_test2; DROP TABLE test2; CREATE TABLE test2 ( i box NOT NULL ); CREATE INDEX i_test2 ON test2 using rtree (i); EOF $t0 = time; for ($i=1;$i<=1000;$i++) { printf(P "INSERT INTO test2 (i) VALUES ('(%d,%d),(%d,%d)');\n", $i+1,1,$i,0); } print P "VACUUM ANALYZE;\n"; close(P); open(P,"|psql"); print P <<"EOF"; -- Temporarily use cost estimators for areasel/areajoinsel on << and &< update pg_operator set oprrest = 'areasel', oprjoin = 'areajoinsel' where oid = 494 or oid = 493; -- This works (at least for 10 results) since it'll be towards the right -- half of leaves SELECT * FROM test2 WHERE i &< ('(999.5,1),(998.5,0)') LIMIT 1; -- This fails since it's to the left half of leaves SELECT * FROM test2 WHERE i &< ('(11.5,1),(10.5,0)') LIMIT 1; -- Restore original cost estimators update pg_operator set oprrest = 'positionsel', oprjoin = 'positionjoinsel' where oid = 494 or oid = 493; -- Now they both work: SELECT * FROM test2 WHERE i &< ('(999.5,1),(998.5,0)') LIMIT 1; SELECT * FROM test2 WHERE i &< ('(11.5,1),(10.5,0)') LIMIT 1; EOF ---- cut here ----
pgsql-general by date: