Re: Gist indexing performance with cidr types - Mailing list pgsql-performance

From Henrik Thostrup Jensen
Subject Re: Gist indexing performance with cidr types
Date
Msg-id alpine.DEB.2.11.1508261245250.35330@pyrite
Whole thread Raw
In response to Re: Gist indexing performance with cidr types  (Emre Hasegeli <emre@hasegeli.com>)
Responses Re: Gist indexing performance with cidr types
Re: Gist indexing performance with cidr types
List pgsql-performance
On Wed, 26 Aug 2015, Emre Hasegeli wrote:

>> Are the coverage operatons just that expensive?
>
> They shouldn't be.  A similar query like yours works in 0.5 second on my laptop:
[snip]

I get the same from your testcase.

> Maybe, something we haven't expected about your dataset causes a
> performance regression on the index.  Did you see anything relevant on
> the server logs on index creation time?

I tried dropping and re-creating the index. The only log entry was for the
drop statement.

The distribution of the data is not uniform like the data set you produce.
Though I find it hard to believe that it would affect this as much.

select masklen(route), count(*) from routes group by masklen(route);

  masklen |  count
---------+---------
        8 |      47
        9 |      30
       10 |      84
       11 |     225
       12 |     580
       13 |    1163
       14 |    2401
       15 |    4530
       16 |   32253
       17 |   20350
       18 |   35583
       19 |   76307
       20 |  111913
       21 |  132807
       22 |  229578
       23 |  286986
       24 | 1149793

Rest is rather small, though with bumps at /32 and /48 (typical IPv6 prefix length).

Real-world address space is very fragmented, where as some is unused.

Then there is the mixed IPv6 and IPv4 data that might factor in.


I tried the approach from your benchmark, to try make a more isolated test
case:

irrexplorer=> SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn = 2914;
SELECT 732

irrexplorer=> explain analyze select routes.route from routes join hmm on routes.route && hmm.route;
                                                                QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.41..511914.27 rows=2558 width=7) (actual time=8.096..17209.778 rows=8127 loops=1)
    ->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual time=0.010..0.609 rows=732 loops=1)
    ->  Index Only Scan using route_gist on routes  (cost=0.41..470.32 rows=22900 width=7) (actual time=4.823..23.502
rows=11loops=732) 
          Index Cond: (route && (hmm.route)::inet)
          Heap Fetches: 0
  Planning time: 0.971 ms
  Execution time: 17210.627 ms
(7 rows)

The only difference in the query plan is that the above used an index
only, where as your test case used index scan (it did this for me as
well). I tried without index only scan:

irrexplorer=> set enable_indexonlyscan =false;
SET

                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.41..571654.27 rows=2558 width=7) (actual time=6.406..15899.791 rows=8127 loops=1)
    ->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual time=0.011..0.615 rows=732 loops=1)
    ->  Index Scan using route_gist on routes  (cost=0.41..551.93 rows=22900 width=7) (actual time=4.490..21.712
rows=11loops=732) 
          Index Cond: ((route)::inet && (hmm.route)::inet)
  Planning time: 0.505 ms
  Execution time: 15900.669 ms
(6 rows)

Slight faster, but nothing significant. Something seems wonky.


     Best regards, Henrik



pgsql-performance by date:

Previous
From: Emre Hasegeli
Date:
Subject: Re: Gist indexing performance with cidr types
Next
From: Emre Hasegeli
Date:
Subject: Re: Gist indexing performance with cidr types