RE: FW: Optimisation of IN condition - Mailing list pgsql-general

From Mayers, Philip J
Subject RE: FW: Optimisation of IN condition
Date
Msg-id A0F836836670D41183A800508BAF190B35E3F3@icex1.cc.ic.ac.uk
Whole thread Raw
In response to FW: Optimisation of IN condition  ("Mayers, Philip J" <p.mayers@ic.ac.uk>)
List pgsql-general
And a yet-more efficient system, I hope:

select * from arptable where not exists (select 1 from host where
arptable.mac = host.mac) order by router,interface,ip;

Could someone guarantee me that does what I think it does? If so, sorry for
the verbose emails!

Regards,
Phil

+----------------------------------+
| Phil Mayers, Network Support     |
| Centre for Computing Services    |
| Imperial College                 |
+----------------------------------+

-----Original Message-----
From: Mayers, Philip J [mailto:p.mayers@ic.ac.uk]
Sent: 08 March 2001 12:18
To: 'pgsql-general@postgresql.org'
Subject: RE: [GENERAL] FW: Optimisation of IN condition


Ok, after some suggestions from a colleague, I've refactored the query to
use an outer join, like this:

hdb=> select host.ip as registeredip,arptable.ip as
realip,host.mac,arptable.router,arptable.interface from host,arptable where
host.mac = arptable.mac and host.ip = arptable.ip
hdb-> union
hdb-> select NULL as registeredip,arptable.ip as
realip,arptable.mac,arptable.router,arptable.interface from arptable
hdb-> order by router,interface,mac;

  registeredip   |     realip      |        mac        |          router
| interface
-----------------+-----------------+-------------------+--------------------
------+-----------
                 | 192.168.4.39    | 00:10:5a:bd:79:2f |
a-routername.domain.xx   |        21
                 | 192.168.4.238   | 00:10:5a:bd:79:e8 |
a-routername.domain.xx   |        21
 192.168.4.181   | 192.168.4.181   | 00:10:5a:bd:7b:4e |
a-routername.domain.xx   |        21
                 | 192.168.4.181   | 00:10:5a:bd:7b:4e |
a-routername.domain.xx   |        21
                 | 192.168.4.192   | 00:10:5a:bd:7d:35 |
a-routername.domain.xx   |        21
                 | 192.168.4.239   | 00:10:5a:bd:82:6c |
a-routername.domain.xx   |        21
 192.168.4.171   | 192.168.4.171   | 00:10:5a:bd:84:6d |
a-routername.domain.xx   |        21
                 | 192.168.4.171   | 00:10:5a:bd:84:6d |
a-routername.domain.xx   |        21
                 | 192.168.4.212   | 00:10:5a:bd:84:97 |
a-routername.domain.xx   |        21
                 | 192.168.4.194   | 00:10:5a:bd:84:a1 |
a-routername.domain.xx   |        21
 192.168.4.182   | 192.168.4.182   | 00:10:5a:bd:84:c2 |
a-routername.domain.xx   |        21
                 | 192.168.4.182   | 00:10:5a:bd:84:c2 |
a-routername.domain.xx   |        21

Cool - now I can identify unregistered machines using a NULL. But, I'm
getting repeat columns. Do I have to use CORRESPONDING BY (realip,mac),
which postgres doesn't support, or is there another way?

Regards,
Phil

+----------------------------------+
| Phil Mayers, Network Support     |
| Centre for Computing Services    |
| Imperial College                 |
+----------------------------------+

-----Original Message-----
From: Mayers, Philip J [mailto:p.mayers@ic.ac.uk]
Sent: 08 March 2001 11:22
To: 'pgsql-general@postgresql.org'
Subject: [GENERAL] FW: Optimisation of IN condition



I've got some tables:

create table interface (
  machineid text,
  mac macaddr,
  primary key(mac)
);

create table arptable (
  router text,
  interface int2,
  mac macaddr,
  ip inet
);

They're big, 10k rows in interface, maybe 35k in arptable. I want to do
this:

hdb=> explain select * from arptable where mac not in (select mac from
interface);
NOTICE:  QUERY PLAN:

Seq Scan on arptable  (cost=0.00..407762.81 rows=4292 width=48)
  SubPlan
    ->  Seq Scan on interface  (cost=0.00..189.96 rows=8796 width=6)

But, of course, that a very expensive task. Now, it seems to me that, since
I have an index on mac in interface, I *should* in theory be able to speed
this up, in the following pseudo-code fashion:

foreach mac in arptable:
    if lookup(mac,interface_pkey):
        return *

Do you see what I'm getting at? Can I refashion the query somehow to take
advantage of that? The converse operation, finding registered machines:

hdb=> explain select interface.mac from arptable,interface where
interface.mac = arptable.mac;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..8838.17 rows=4292 width=12)
  ->  Seq Scan on arptable  (cost=0.00..97.92 rows=4292 width=6)
  ->  Index Scan using interface_pkey on interface  (cost=0.00..2.02 rows=1
width=6)

Is, of course, speedy. How can I take advantage of that. EXCEPT doesn't seem
to help:

hdb=> explain select mac from arptable except select interface.mac from
arptable,interface where interface.mac = arptable.mac;
NOTICE:  QUERY PLAN:

Seq Scan on arptable  (cost=0.00..37933516.98 rows=4292 width=6)
  SubPlan
    ->  Materialize  (cost=8838.17..8838.17 rows=4292 width=12)
          ->  Nested Loop  (cost=0.00..8838.17 rows=4292 width=12)
                ->  Seq Scan on arptable  (cost=0.00..97.92 rows=4292
width=6)
                ->  Index Scan using interface_pkey on interface
(cost=0.00..2.02 rows=1 width=6)


Help!

Regards,
Phil

+----------------------------------+
| Phil Mayers, Network Support     |
| Centre for Computing Services    |
| Imperial College                 |
+----------------------------------+

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

pgsql-general by date:

Previous
From: "Mayers, Philip J"
Date:
Subject: RE: FW: Optimisation of IN condition
Next
From: "chris markiewicz"
Date:
Subject: length of insert stmt?