Gist indexing performance with cidr types - Mailing list pgsql-performance

From Henrik Thostrup Jensen
Subject Gist indexing performance with cidr types
Date
Msg-id alpine.DEB.2.11.1508251504160.31004@pyrite
Whole thread Raw
Responses Re: Gist indexing performance with cidr types
List pgsql-performance
Hi

I'm trying to get a query to run fast enough for interactive use. I've gotten
some speed-up, but still not there. It is for a tool called IRRExplorer
(http://irrexplorer.nlnog.net/) that correlates IP routes between Internet
Route Registries and real-world routing information. We landed on PostgreSQL
largely due to indexing of the cidr type with gist indexing.

* Preliminaries:

The query is about getting data from this table:

irrexplorer=> \d routes
       Table "public.routes"
   Column   |  Type   | Modifiers
-----------+---------+-----------
  route     | cidr    | not null
  asn       | bigint  | not null
  source_id | integer | not null
Indexes:
     "unique_entry" UNIQUE CONSTRAINT, btree (route, asn, source_id)
     "route_gist" gist (route inet_ops)
Check constraints:
     "positive_asn" CHECK (asn > 0)
Foreign-key constraints:
     "routes_source_id_fkey" FOREIGN KEY (source_id) REFERENCES sources(id)

Complete DDL: https://github.com/job/irrexplorer/blob/master/data/schema.sql

Data set: 125 MB on disk, 2.3 million rows.

Running stock PostgreSQL 9.4.4 on Ubuntu 14.10 (hosted on VMWare on OS X)

Have done VACUUM, ANALYZE, and checked memory settings, and tried to increase
work_mem, but with no success. The issue seems cpu bound (100% cpu load during
query).


* Query

When a user inputs an AS number a simple match on the asn column will return
the stuff relevant. However, the interesting thing to display is
conflicting/rogue routes. This means matching routes with the && operator to
find all covered/covering routes. This would look something like this:

SELECT rv.route, rv.asn, rv.source
FROM routes_view rv
LEFT OUTER JOIN routes_view r ON (rv.route && r.route)
WHERE rv.route && r.route AND r.asn = %s

While this is fairly fast if the initial set of routes is relatively small
(<100) it runs with a second or so, but if the number of routes matching
the asn is large (>1000), it takes quite a while (+30 seconds). Explain
analyze link:

http://explain.depesz.com/s/dHqo

I am not terribly good at reading the output, but it seem most of the time is
actually spend on the bitmap scan for the gist index. It there another type of
indexing that would behave better here?

Since there often identical routes in the initial set of routes (from the AS number matching), I tried reducing the
initialset of matching routes: 
ORDER BY rv.route;

SELECT rv.route, rv.asn, rv.source
FROM
     (SELECT DISTINCT route FROM routes_view WHERE asn = %s) r
LEFT OUTER JOIN routes_view rv ON (r.route && rv.route)

This often cuts the set of initial matching routes by 25-50%, and cuts a
similar amount of time from the query time. Explain analyze link:

http://explain.depesz.com/s/kf13

I tried further reducing the routes to the minimal set:

WITH distinct_routes AS (SELECT DISTINCT route FROM routes_view WHERE asn = %s)
SELECT route FROM distinct_routes
EXCEPT
SELECT r1.route
FROM distinct_routes r1
     INNER JOIN distinct_routes r2 ON (r1.route << r2.route);

But typically only yields 10-20% reduction of the inital route set, and
adds query complexity (putting the above in a CTE/WITH seems to make the
query significantly slower for some reason).

The main issue seem to be with the gist bitmap index. Is there a better way to
approach this style of query?


     Best regards, Henrik



pgsql-performance by date:

Previous
From: Johann Spies
Date:
Subject: Long running query: How to monitor the progress
Next
From: Tom Lane
Date:
Subject: Re: Long running query: How to monitor the progress