Re: Gist indexing performance with cidr types - Mailing list pgsql-performance
From | Henrik Thostrup Jensen |
---|---|
Subject | Re: Gist indexing performance with cidr types |
Date | |
Msg-id | alpine.DEB.2.11.1508261245250.35330@pyrite Whole thread Raw |
In response to | Re: Gist indexing performance with cidr types (Emre Hasegeli <emre@hasegeli.com>) |
Responses |
Re: Gist indexing performance with cidr types
Re: Gist indexing performance with cidr types |
List | pgsql-performance |
On Wed, 26 Aug 2015, Emre Hasegeli wrote: >> Are the coverage operatons just that expensive? > > They shouldn't be. A similar query like yours works in 0.5 second on my laptop: [snip] I get the same from your testcase. > Maybe, something we haven't expected about your dataset causes a > performance regression on the index. Did you see anything relevant on > the server logs on index creation time? I tried dropping and re-creating the index. The only log entry was for the drop statement. The distribution of the data is not uniform like the data set you produce. Though I find it hard to believe that it would affect this as much. select masklen(route), count(*) from routes group by masklen(route); masklen | count ---------+--------- 8 | 47 9 | 30 10 | 84 11 | 225 12 | 580 13 | 1163 14 | 2401 15 | 4530 16 | 32253 17 | 20350 18 | 35583 19 | 76307 20 | 111913 21 | 132807 22 | 229578 23 | 286986 24 | 1149793 Rest is rather small, though with bumps at /32 and /48 (typical IPv6 prefix length). Real-world address space is very fragmented, where as some is unused. Then there is the mixed IPv6 and IPv4 data that might factor in. I tried the approach from your benchmark, to try make a more isolated test case: irrexplorer=> SELECT DISTINCT route INTO hmm FROM routes_view WHERE asn = 2914; SELECT 732 irrexplorer=> explain analyze select routes.route from routes join hmm on routes.route && hmm.route; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.41..511914.27 rows=2558 width=7) (actual time=8.096..17209.778 rows=8127 loops=1) -> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual time=0.010..0.609 rows=732 loops=1) -> Index Only Scan using route_gist on routes (cost=0.41..470.32 rows=22900 width=7) (actual time=4.823..23.502 rows=11loops=732) Index Cond: (route && (hmm.route)::inet) Heap Fetches: 0 Planning time: 0.971 ms Execution time: 17210.627 ms (7 rows) The only difference in the query plan is that the above used an index only, where as your test case used index scan (it did this for me as well). I tried without index only scan: irrexplorer=> set enable_indexonlyscan =false; SET QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.41..571654.27 rows=2558 width=7) (actual time=6.406..15899.791 rows=8127 loops=1) -> Seq Scan on hmm (cost=0.00..11.32 rows=732 width=7) (actual time=0.011..0.615 rows=732 loops=1) -> Index Scan using route_gist on routes (cost=0.41..551.93 rows=22900 width=7) (actual time=4.490..21.712 rows=11loops=732) Index Cond: ((route)::inet && (hmm.route)::inet) Planning time: 0.505 ms Execution time: 15900.669 ms (6 rows) Slight faster, but nothing significant. Something seems wonky. Best regards, Henrik
pgsql-performance by date: