Thread: Slow join using network address function
I'm trying to join two tables on an inet column, where one of the columns may contain a subnet rather than a single host. Somehow the operation isn't completing quite fast enough, even though neither table is very large: table | rows --------------------+-------- clients | 115472 clients_commercial | 11670 First attempt, cancelled after running for half an hour: SELECT c.address AS address, cc.address AS network FROM clients c JOIN clients_commercial cc ON (c.address <<= cc.address) ; Nested Loop (cost=189.00..27359887.76 rows=607947200 width=22) Join Filter: ("outer".address <<= "inner".address) -> Seq Scan on clients c (cost=0.00..2074.76 rows=102176 width=11) -> Materialize (cost=189.00..308.00 rows=11900 width=11) -> Seq Scan on clients_commercial cc (cost=0.00..189.00 rows=11900 width=11) Second attempt, completes within 10 min: SELECT c.address AS address, cc.address AS network FROM clients c, clients_commercial cc WHERE c.commercial IS NULL AND c.address <<= cc.address ; Nested Loop (cost=189.00..139084.01 rows=3040450 width=22) Join Filter: ("outer".address <<= "inner".address) -> Seq Scan on clients c (cost=0.00..2074.76 rows=511 width=11) Filter: (commercial IS NULL) -> Materialize (cost=189.00..308.00 rows=11900 width=11) -> Seq Scan on clients_commercial cc (cost=0.00..189.00 rows=11900 width=11) Third attempt; provided some indexes, which unfortunately don't get used, making the query twice as slow as the previous one: SELECT c.address AS address, cc.address AS network FROM clients c, clients_commercial cc WHERE c.commercial IS NULL AND set_masklen(c.address, masklen(cc.address)) = cc.address ; CREATE INDEX clients_commercial_masklen_idx ON clients_commercial((masklen(address))); CREATE INDEX clients_32_idx ON clients((set_masklen(address, 32))); CREATE INDEX clients_24_idx ON clients((set_masklen(address, 24))); CREATE INDEX clients_16_idx ON clients((set_masklen(address, 16))); Nested Loop (cost=189.00..169488.51 rows=479 width=22) Join Filter: (set_masklen("outer".address, masklen("inner".address)) = "inner".address) -> Seq Scan on clients c (cost=0.00..2074.76 rows=511 width=11) Filter: (commercial IS NULL) -> Materialize (cost=189.00..308.00 rows=11900 width=11) -> Seq Scan on clients_commercial cc (cost=0.00..189.00 rows=11900 width=11) Anything else I could try? BTREE indexes don't seem to work with the <<= operator; is this not possible in principal, or simply something that has not been implmented yet?
On Mon, Feb 23, 2004 at 12:48:02PM +0100, Eric Jain wrote: > I'm trying to join two tables on an inet column, where one of the > columns may contain a subnet rather than a single host. Somehow the > operation isn't completing quite fast enough, even though neither table > is very large: > > table | rows > --------------------+-------- > clients | 115472 > clients_commercial | 11670 [snip] > Anything else I could try? BTREE indexes don't seem to work with the <<= > operator; is this not possible in principal, or simply something that > has not been implmented yet? I've been looking at a similar problem for a while. I found that the inet type didn't really give me the flexibility I needed, and indexing it in a way that worked with CIDR blocks didn't seem easy (and maybe not possible). So I rolled my own, based on the seg sample. <http://word-to-the-wise.com/ipr.tgz> is a datatype that contains a range of IPv4 addresses, and which has the various operators to make it GIST indexable. Untar it into contrib and make as usual. Input is of the form '10.11.12.13' or '10.11.12.13.0/25' or '10.11.12.13-10.11.12.13.127'. The function display() takes an ipr type and returns it formatted for display (as a dotted-quad if a /32, as CIDR format if possible, as a range of dotted-quads otherwise). A bunch of operators are included, but '&&' returns true if two ipr fields intersect. Bugs include: 0.0.0.0/0 doesn't do what it should on input. No documentation. No cast operators between ipr and inet types. No documentation. I was planning on doing some docs before releasing it, but here it is anyway. Cheers, Steve -- -- Steve Atkins -- steve@blighty.com
Eric, > Nested Loop > (cost=189.00..27359887.76 rows=607947200 width=22) > Join Filter: ("outer".address <<= "inner".address) > -> Seq Scan on clients c > (cost=0.00..2074.76 rows=102176 width=11) > -> Materialize > (cost=189.00..308.00 rows=11900 width=11) > -> Seq Scan on clients_commercial cc > (cost=0.00..189.00 rows=11900 width=11) To help you, we need EXPLAIN ANALYZE, not just EXPLAIN. Thanks! -- -Josh Berkus Aglio Database Solutions San Francisco
On Пнд, 2004-02-23 at 12:04 -0800, Josh Berkus wrote: > Eric, > > > Nested Loop > > (cost=189.00..27359887.76 rows=607947200 width=22) > > Join Filter: ("outer".address <<= "inner".address) > > -> Seq Scan on clients c > > (cost=0.00..2074.76 rows=102176 width=11) > > -> Materialize > > (cost=189.00..308.00 rows=11900 width=11) > > -> Seq Scan on clients_commercial cc > > (cost=0.00..189.00 rows=11900 width=11) > > To help you, we need EXPLAIN ANALYZE, not just EXPLAIN. Thanks! He said he cancelled the query. -- Markus Bertheau <twanger@bluetwanger.de>
> <http://word-to-the-wise.com/ipr.tgz> is a datatype that contains > a range of IPv4 addresses, and which has the various operators to > make it GIST indexable. Great, this looks very promising. > No cast operators between ipr and inet types. Any way to work around this, short of dumping and reloading tables? SELECT ipr '1.2.3.4'; -- Okay SELECT ipr text(inet '1.2.3.4'); -- Syntax error, of course SELECT ipr(text(inet '1.2.3.4')); -- Function does not exist, of course ...
"Eric Jain" <Eric.Jain@isb-sib.ch> writes: >> <http://word-to-the-wise.com/ipr.tgz> is a datatype that contains >> a range of IPv4 addresses, and which has the various operators to >> make it GIST indexable. > Great, this looks very promising. >> No cast operators between ipr and inet types. > Any way to work around this, short of dumping and reloading tables? Wouldn't it be better to implement the GIST indexing operators of that package on the standard datatypes? It wasn't apparent to me what "range of IP addresses" does for you that isn't covered by "CIDR subnet" for real-world cases. regards, tom lane
Tom Lane wrote: >"Eric Jain" <Eric.Jain@isb-sib.ch> writes: > > >>><http://word-to-the-wise.com/ipr.tgz> is a datatype that contains >>>a range of IPv4 addresses, and which has the various operators to >>>make it GIST indexable. >>> >>> > > > >>Great, this looks very promising. >> >> > > > >>>No cast operators between ipr and inet types. >>> >>> > > > >>Any way to work around this, short of dumping and reloading tables? >> >> > >Wouldn't it be better to implement the GIST indexing operators of that >package on the standard datatypes? It wasn't apparent to me what "range >of IP addresses" does for you that isn't covered by "CIDR subnet" for >real-world cases. > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 8: explain analyze is your friend > > We currently only allow access to one of our apps based on IP address. These IPs are stored one per row in a single table, but often represent a contiguous piece of IP space, but does not represent a full subnet. The current CIDR subnet has the limitation that it will only allow full subnets, i.e. every IP address in 192.168.1.0/24. For example: 192.168.1.15 -> 192.168.1.31 This range cannot be represented by a CIDR subnet, or it might be able to but I really dont want to figure it out each time. However this new type allows us to store this range as one row. It allows an arbitrary range of IP addresses, not just those in a specific subnet. I would see this as a useful inclusion whether in the main src tree or in contrib and we will probably be using it when we get to "mess" with the database schema for this app in the next few months, in fact I have already inserted it into our PG source tree ;-). Nick P.S. We are not responsible for the IP address ranges, we just get told what they are.
On Tue, Feb 24, 2004 at 10:23:22AM -0500, Tom Lane wrote: > "Eric Jain" <Eric.Jain@isb-sib.ch> writes: > >> <http://word-to-the-wise.com/ipr.tgz> is a datatype that contains > >> a range of IPv4 addresses, and which has the various operators to > >> make it GIST indexable. > > > Great, this looks very promising. > > >> No cast operators between ipr and inet types. > > > Any way to work around this, short of dumping and reloading tables? > > Wouldn't it be better to implement the GIST indexing operators of that > package on the standard datatypes? It wasn't apparent to me what "range > of IP addresses" does for you that isn't covered by "CIDR subnet" for > real-world cases. Well, maybe. However, many of the cases where people want to use this sort of functionality (address range ownership, email blacklists etc) an entity is likely to associated with one or a small number of ranges of contiguous addresses. Those ranges are often not simple CIDR blocks, and deaggregating them into a sequence of CIDR blocks doesn't buy anything and complicates the problem. I also managed to convince myself that it wasn't possible to do a useful GIST index of a CIDR datatype - as the union between two adjacent CIDR blocks as a CIDR block is often far, far larger than the actual range involved - consider 63.255.255.255/32 and 64.0.0.0/32. That seemed to break the indexing algorithms. I'd like to be proven wrong on that, but would still find ipr a more useful datatype than inet for my applications. Cheers, Steve
On Tue, Feb 24, 2004 at 01:07:10PM +0100, Eric Jain wrote: > > <http://word-to-the-wise.com/ipr.tgz> is a datatype that contains > > a range of IPv4 addresses, and which has the various operators to > > make it GIST indexable. > > Great, this looks very promising. > > > No cast operators between ipr and inet types. > > Any way to work around this, short of dumping and reloading tables? > > SELECT ipr '1.2.3.4'; -- Okay > SELECT ipr text(inet '1.2.3.4'); -- Syntax error, of course > SELECT ipr(text(inet '1.2.3.4')); -- Function does not exist, of course There's probably some horrible SQL hack that would let you do it, but I should add some casting code anyway. Shouldn't be too painful to do - I'll try and get that, and some minimal documentation out today. Cheers, Steve
On Tue, Feb 24, 2004 at 09:14:42AM -0800, Steve Atkins wrote: > On Tue, Feb 24, 2004 at 01:07:10PM +0100, Eric Jain wrote: > > > <http://word-to-the-wise.com/ipr.tgz> is a datatype that contains > > > a range of IPv4 addresses, and which has the various operators to > > > make it GIST indexable. > > > > Great, this looks very promising. > > > > > No cast operators between ipr and inet types. > > > > Any way to work around this, short of dumping and reloading tables? > > There's probably some horrible SQL hack that would let you do it, but > I should add some casting code anyway. Shouldn't be too painful to do - > I'll try and get that, and some minimal documentation out today. Done. <http://word-to-the-wise.com/ipr/> This really isn't pgsql-performance content, so this is the last time I'll mention it here. Cheers, Steve