Slow join using network address function - Mailing list pgsql-performance

From Eric Jain
Subject Slow join using network address function
Date
Msg-id 000701c3fa02$e4e3d180$c300000a@caliente
Whole thread Raw
Responses Re: Slow join using network address function
Re: Slow join using network address function
List pgsql-performance
I'm trying to join two tables on an inet column, where one of the
columns may contain a subnet rather than a single host. Somehow the
operation isn't completing quite fast enough, even though neither table
is very large:

       table        |  rows
--------------------+--------
 clients            | 115472
 clients_commercial |  11670


First attempt, cancelled after running for half an hour:

SELECT
  c.address AS address,
  cc.address AS network
FROM
  clients c
  JOIN clients_commercial cc ON (c.address <<= cc.address)
;

 Nested Loop
 (cost=189.00..27359887.76 rows=607947200 width=22)
   Join Filter: ("outer".address <<= "inner".address)
   ->  Seq Scan on clients c
       (cost=0.00..2074.76 rows=102176 width=11)
   ->  Materialize
       (cost=189.00..308.00 rows=11900 width=11)
         ->  Seq Scan on clients_commercial cc
             (cost=0.00..189.00 rows=11900 width=11)



Second attempt, completes within 10 min:

SELECT
  c.address AS address,
  cc.address AS network
FROM
  clients c,
  clients_commercial cc
WHERE
  c.commercial IS NULL
  AND c.address <<= cc.address
;

 Nested Loop
 (cost=189.00..139084.01 rows=3040450 width=22)
   Join Filter: ("outer".address <<= "inner".address)
   ->  Seq Scan on clients c
       (cost=0.00..2074.76 rows=511 width=11)
         Filter: (commercial IS NULL)
   ->  Materialize
       (cost=189.00..308.00 rows=11900 width=11)
         ->  Seq Scan on clients_commercial cc
             (cost=0.00..189.00 rows=11900 width=11)


Third attempt; provided some indexes, which unfortunately don't get
used, making the query twice as slow as the previous one:

SELECT
  c.address AS address,
  cc.address AS network
FROM
  clients c,
  clients_commercial cc
WHERE
  c.commercial IS NULL
  AND set_masklen(c.address, masklen(cc.address)) = cc.address
;

CREATE INDEX clients_commercial_masklen_idx
ON clients_commercial((masklen(address)));

CREATE INDEX clients_32_idx
ON clients((set_masklen(address, 32)));

CREATE INDEX clients_24_idx
ON clients((set_masklen(address, 24)));

CREATE INDEX clients_16_idx
ON clients((set_masklen(address, 16)));

 Nested Loop
 (cost=189.00..169488.51 rows=479 width=22)
   Join Filter: (set_masklen("outer".address, masklen("inner".address))
= "inner".address)
   ->  Seq Scan on clients c
       (cost=0.00..2074.76 rows=511 width=11)
         Filter: (commercial IS NULL)
   ->  Materialize
       (cost=189.00..308.00 rows=11900 width=11)
         ->  Seq Scan on clients_commercial cc
             (cost=0.00..189.00 rows=11900 width=11)


Anything else I could try? BTREE indexes don't seem to work with the <<=
operator; is this not possible in principal, or simply something that
has not been implmented yet?


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Column correlation drifts, index ignored again
Next
From: Steve Atkins
Date:
Subject: Re: Slow join using network address function