Gist indexing performance with cidr types - Mailing list pgsql-performance
From | Henrik Thostrup Jensen |
---|---|
Subject | Gist indexing performance with cidr types |
Date | |
Msg-id | alpine.DEB.2.11.1508251504160.31004@pyrite Whole thread Raw |
Responses |
Re: Gist indexing performance with cidr types
|
List | pgsql-performance |
Hi I'm trying to get a query to run fast enough for interactive use. I've gotten some speed-up, but still not there. It is for a tool called IRRExplorer (http://irrexplorer.nlnog.net/) that correlates IP routes between Internet Route Registries and real-world routing information. We landed on PostgreSQL largely due to indexing of the cidr type with gist indexing. * Preliminaries: The query is about getting data from this table: irrexplorer=> \d routes Table "public.routes" Column | Type | Modifiers -----------+---------+----------- route | cidr | not null asn | bigint | not null source_id | integer | not null Indexes: "unique_entry" UNIQUE CONSTRAINT, btree (route, asn, source_id) "route_gist" gist (route inet_ops) Check constraints: "positive_asn" CHECK (asn > 0) Foreign-key constraints: "routes_source_id_fkey" FOREIGN KEY (source_id) REFERENCES sources(id) Complete DDL: https://github.com/job/irrexplorer/blob/master/data/schema.sql Data set: 125 MB on disk, 2.3 million rows. Running stock PostgreSQL 9.4.4 on Ubuntu 14.10 (hosted on VMWare on OS X) Have done VACUUM, ANALYZE, and checked memory settings, and tried to increase work_mem, but with no success. The issue seems cpu bound (100% cpu load during query). * Query When a user inputs an AS number a simple match on the asn column will return the stuff relevant. However, the interesting thing to display is conflicting/rogue routes. This means matching routes with the && operator to find all covered/covering routes. This would look something like this: SELECT rv.route, rv.asn, rv.source FROM routes_view rv LEFT OUTER JOIN routes_view r ON (rv.route && r.route) WHERE rv.route && r.route AND r.asn = %s While this is fairly fast if the initial set of routes is relatively small (<100) it runs with a second or so, but if the number of routes matching the asn is large (>1000), it takes quite a while (+30 seconds). Explain analyze link: http://explain.depesz.com/s/dHqo I am not terribly good at reading the output, but it seem most of the time is actually spend on the bitmap scan for the gist index. It there another type of indexing that would behave better here? Since there often identical routes in the initial set of routes (from the AS number matching), I tried reducing the initialset of matching routes: ORDER BY rv.route; SELECT rv.route, rv.asn, rv.source FROM (SELECT DISTINCT route FROM routes_view WHERE asn = %s) r LEFT OUTER JOIN routes_view rv ON (r.route && rv.route) This often cuts the set of initial matching routes by 25-50%, and cuts a similar amount of time from the query time. Explain analyze link: http://explain.depesz.com/s/kf13 I tried further reducing the routes to the minimal set: WITH distinct_routes AS (SELECT DISTINCT route FROM routes_view WHERE asn = %s) SELECT route FROM distinct_routes EXCEPT SELECT r1.route FROM distinct_routes r1 INNER JOIN distinct_routes r2 ON (r1.route << r2.route); But typically only yields 10-20% reduction of the inital route set, and adds query complexity (putting the above in a CTE/WITH seems to make the query significantly slower for some reason). The main issue seem to be with the gist bitmap index. Is there a better way to approach this style of query? Best regards, Henrik
pgsql-performance by date: