Thread: Broken selectivity with special inet operators

Broken selectivity with special inet operators

From

Josh Berkus

Date:

21 September 2011, 17:29:32

Summary: special inet operators ( << >> <<= =>> ) are
    up to 1000000X off in estimating rowcounts
Type: performance
Severity: normal
Tested on: 9.1.0
Description:

We've been noticing that row estimates for queries which use the =>> and
<<= operators for inet data were way, way off.  We finally narrowed the
problem down to a simple test:

===========
USING <<= :
===========

explain analyze
SELECT count(*)
FROM partition1 lh
WHERE lh.ip <<= '1.2.3'::cidr;


QUERY PLAN
.....
   ->  Index Scan using partition1_ip on partition1 lh
    (cost=0.00..10.21 rows=6956732 width=0)
    (actual time=0.016..0.016 rows=0 loops=1)
         Index Cond: ((ip >= '1.2.3.0/24'::inet) AND (ip <=
'1.2.3.255'::inet))
         Filter: (ip <<= '1.2.3.0/24'::inet)
.....

explain analyze
SELECT count(*)
FROM partition2 WHERE 1=1 AND ip <<= '87.178.193.0/24'::inet;

               QUERY PLAN
 Aggregate  (cost=18296.78..18296.79 rows=1 width=0) (actual
time=0.037..0.038 rows=1 loops=1)
   ->  Index Scan using partition2_ip on partition2  (cost=0.00..38.36
rows=7303365 width=0) (actual ti
me=0.022..0.031 rows=5 loops=1)
         Index Cond: ((ip >= '87.178.193.0/24'::inet) AND (ip <=
'87.178.193.255'::inet))
         Filter: (ip <<= '87.178.193.0/24'::inet)
 Total runtime: 0.107 ms

============
USING < > :
============

explain analyze
SELECT count(*)
FROM partition1 lh
WHERE lh.ip >= '1.2.3.0/24'::inet and lh.ip <= '1.2.3.255'::inet;

QUERY PLAN
....
   ->  Index Scan using partition1_ip on partition1 lh
    (cost=0.00..10.22 rows=1 width=0)
    (actual time=0.016..0.016 rows=0 loops=1)
         Index Cond: ((ip >= '1.2.3.0/24'::inet) AND (ip <=
'1.2.3.255'::inet))
....

explain analyze
SELECT count(*)
FROM partition2 WHERE 1=1 AND ip > '87.178.193.0'::inet and ip <=
'87.178.193.255'::inet;

            QUERY PLAN

 Aggregate  (cost=26.34..26.35 rows=1 width=0) (actual time=0.033..0.033
rows=1 loops=1)
   ->  Index Scan using partition2_ip on partition2  (cost=0.00..26.33
rows=5 width=0) (actual time=0.0
19..0.029 rows=5 loops=1)
         Index Cond: ((ip > '87.178.193.0'::inet) AND (ip <=
'87.178.193.255'::inet))
 Total runtime: 0.097 ms


====
Note that the mis-estimate of rows returned in each case is almost
exactly 50% of the total rows in the table.  That would suggest that
match_special_index_operator is failing, and not recognizing the <<=
operator for estimation purposes and just going with a default estimate
of 0.5.

I've tried to locate the cause of this problem, but the code involved is
rather convoluted and crusty, and I can't follow the logic.  Help?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Broken selectivity with special inet operators

From

Tom Lane

Date:

21 September 2011, 17:56:32

Josh Berkus <josh@agliodbs.com> writes:
> Summary: special inet operators ( << >> <<= =>> ) are
>     up to 1000000X off in estimating rowcounts

A look in pg_operator will show you that these operators have no
associated selectivity estimators at all.  It's not so much "broken"
as "unimplemented".

regression=# select oid::regoperator,oprcode,oprrest,oprjoin from pg_operator where (oprleft = 869 or oprright = 869)
andoprresult = 16; 
      oid       |    oprcode    |   oprrest   |     oprjoin
----------------+---------------+-------------+-----------------
 =(inet,inet)   | network_eq    | eqsel       | eqjoinsel
 <>(inet,inet)  | network_ne    | neqsel      | neqjoinsel
 <(inet,inet)   | network_lt    | scalarltsel | scalarltjoinsel
 <=(inet,inet)  | network_le    | scalarltsel | scalarltjoinsel
 >(inet,inet)   | network_gt    | scalargtsel | scalargtjoinsel
 >=(inet,inet)  | network_ge    | scalargtsel | scalargtjoinsel
 <<(inet,inet)  | network_sub   | -           | -
 <<=(inet,inet) | network_subeq | -           | -
 >>(inet,inet)  | network_sup   | -           | -
 >>=(inet,inet) | network_supeq | -           | -
(10 rows)


            regards, tom lane

Re: Broken selectivity with special inet operators

From

Josh Berkus

Date:

21 September 2011, 18:18:02

On 9/21/11 1:56 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Summary: special inet operators ( << >> <<= =>> ) are
>>     up to 1000000X off in estimating rowcounts
>
> A look in pg_operator will show you that these operators have no
> associated selectivity estimators at all.  It's not so much "broken"
> as "unimplemented".

Oh!  I was assuming that the special case code kicked in regardless.

So we implemented the rewrite to < and > for the actual execution, but
not for cost estimates?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Broken selectivity with special inet operators

From

Tom Lane

Date:

21 September 2011, 18:58:04

Josh Berkus <josh@agliodbs.com> writes:
> On 9/21/11 1:56 PM, Tom Lane wrote:
>> A look in pg_operator will show you that these operators have no
>> associated selectivity estimators at all.  It's not so much "broken"
>> as "unimplemented".

> Oh!  I was assuming that the special case code kicked in regardless.

> So we implemented the rewrite to < and > for the actual execution, but
> not for cost estimates?

If you mean the indexscan optimization, we do know how to estimate the
cost of the indexscans, because that depends mostly on the behavior of
the added < or > condition(s).  This does not imply knowing how to
estimate the behavior of >>= itself.

            regards, tom lane

Re: Broken selectivity with special inet operators

From

Josh Berkus

Date:

21 September 2011, 21:37:28

> If you mean the indexscan optimization, we do know how to estimate the
> cost of the indexscans, because that depends mostly on the behavior of
> the added < or > condition(s).  This does not imply knowing how to
> estimate the behavior of >>= itself.

If we can estimate the cost of the indexscan, why can't we estimate the
rowcount?  Sorry if I'm being dense here, but I really don't understand
how the special operator code works at all.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Broken selectivity with special inet operators

From

Tom Lane

Date:

21 September 2011, 21:48:16

Josh Berkus <josh@agliodbs.com> writes:
>> If you mean the indexscan optimization, we do know how to estimate the
>> cost of the indexscans, because that depends mostly on the behavior of
>> the added < or > condition(s).  This does not imply knowing how to
>> estimate the behavior of >>= itself.

> If we can estimate the cost of the indexscan, why can't we estimate the
> rowcount?

Because the estimated rowcount is derived long before we consider whether
to use an indexscan at all, and indeed must be computable whether the
table has any related index or not.  The special-indexscan-qual code is
*not* a substitute for providing a selectivity estimator.

It's possible that we could build simple estimators for these operators
that just turn the problem into a range estimation and then pass it off
to somewhere else, but nobody has tried.

            regards, tom lane

Re: Broken selectivity with special inet operators

From

Josh Berkus

Date:

21 September 2011, 21:54:39

> It's possible that we could build simple estimators for these operators
> that just turn the problem into a range estimation and then pass it off
> to somewhere else, but nobody has tried.

Right, that's why I'm asking.  I'd like to reuse code ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com