Thread: The most efficient way to put this?

The most efficient way to put this?

From
"Arsalan Zaidi"
Date:
The query I'd like to run is fairly simple. Unfortunately, PG doesn't handle
IN's very well.

SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c
FROM aa
WHERE aa.a NOT IN (SELECT zz.a FROM zz);

This is with two very large tables. Both zz and aa have many millions of
rows.

The best I've come up with so far is...

SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c
FROM aa
WHERE NOT EXISTS (SELECT zz.a FROM zz WHERE aa.a = zz.a);
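
[Editor's note] One caveat when swapping NOT IN for NOT EXISTS: the two forms only return the same rows if zz.a never contains NULL. A minimal sketch of the difference, using hypothetical scratch tables (aa_demo and zz_demo are not part of the original schema):

```sql
-- Hypothetical scratch tables, used only to illustrate NULL handling.
CREATE TEMP TABLE aa_demo (a integer);
CREATE TEMP TABLE zz_demo (a integer);

INSERT INTO aa_demo VALUES (1);
INSERT INTO aa_demo VALUES (2);
INSERT INTO zz_demo VALUES (1);
INSERT INTO zz_demo VALUES (NULL);

-- NOT IN: for a = 2 the comparison 2 = NULL is unknown, so the whole
-- predicate is unknown and the row is filtered out. Returns no rows.
SELECT a FROM aa_demo WHERE a NOT IN (SELECT a FROM zz_demo);

-- NOT EXISTS: only asks whether a matching row exists. Returns a = 2.
SELECT d.a
FROM aa_demo d
WHERE NOT EXISTS (SELECT 1 FROM zz_demo z WHERE z.a = d.a);
```

If zz.a is declared NOT NULL, the two queries are interchangeable as far as results go.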

Does anyone have anything better?

--Arsalan

-------------------------------------------------------------------
People often hate those things which they do not know, or cannot understand.
--Ali Ibn Abi Talib (A.S.)


Re: The most efficient way to put this?

From
"Marc Polatschek"
Date:
Hm, you could try writing the statement with UNION and GROUP BY. I don't know
whether it is faster...
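
[Editor's note] The suggestion above does not spell out a query. One possible reading, offered only as a sketch: tag each key with the table it came from, group by key, and keep the keys that never appear with the zz tag. The in_zz column and the only_aa alias are hypothetical names introduced for illustration.

```sql
-- Keys are tagged 0 when they come from aa and 1 when they come from zz.
-- A key whose maximum tag is 0 occurs in aa only.
SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c
FROM aa
JOIN (
    SELECT u.a
    FROM (
        SELECT a, 0 AS in_zz FROM aa
        UNION ALL
        SELECT a, 1 AS in_zz FROM zz
    ) AS u
    GROUP BY u.a
    HAVING max(u.in_zz) = 0
) AS only_aa ON only_aa.a = aa.a;
```

Note that rows where aa.a is NULL drop out of the final join, so this is not an exact substitute for the NOT EXISTS form if aa.a is nullable. Whether it is any faster would have to be checked with EXPLAIN on the real tables.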

-----Original Message-----
From: Arsalan Zaidi [mailto:azaidi@directi.com]
Sent: Tuesday, 5 March 2002 11:12
To: pgsql-general@postgresql.org
Subject: [GENERAL] The most efficient way to put this?





Re: The most efficient way to put this?

From
Jean-Michel POURE
Date:
On Tuesday 5 March 2002 11:11, Arsalan Zaidi wrote:
> SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN
> (select zz.a from zz);

1) LEFT JOIN
SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c
FROM aa
LEFT JOIN zz ON zz.a = aa.a
WHERE zz.a IS NULL;

Run the query twice, then VACUUM ANALYZE the database and re-run it. Is it
faster now? Make sure the join columns (aa.a and zz.a) are indexed.
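
[Editor's note] The steps above might look like the following sketch. The index names aa_a_idx and zz_a_idx are hypothetical, and the sketch assumes a is the only column involved in the join.

```sql
-- Index the join column on both sides (index names are made up).
CREATE INDEX aa_a_idx ON aa (a);
CREATE INDEX zz_a_idx ON zz (a);

-- Refresh planner statistics so the new indexes are costed realistically.
VACUUM ANALYZE aa;
VACUUM ANALYZE zz;

-- Inspect the plan before running the query against millions of rows.
EXPLAIN
SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c
FROM aa
LEFT JOIN zz ON zz.a = aa.a
WHERE zz.a IS NULL;
```

Comparing the EXPLAIN output of this form with that of the NOT EXISTS form shows which plan the installed PostgreSQL version actually picks for each.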

2) TRIGGER
Another solution is to add a boolean field to aa and:

- create a trigger on aa which stores the result of "NOT IN (select zz.a FROM
zz)" in that field. Select queries then rely only on table aa, which is
"lightning fast". This is the kind of query where PostgreSQL can be 10 times
faster than MySQL.

- create a trigger on zz which updates the calculated values in aa when zz
values are updated.

The trigger solution is only possible if one of the two tables does not
change often (its values are not altered continuously).
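
[Editor's note] A rough sketch of the trigger approach, assuming a reasonably recent PostgreSQL with PL/pgSQL and dollar quoting available. The column name not_in_zz and the function and trigger names are hypothetical, and the sketch uses NOT EXISTS rather than NOT IN to compute the flag.

```sql
-- Hypothetical flag column: true means aa.a has no match in zz.
ALTER TABLE aa ADD COLUMN not_in_zz boolean;

-- One-time backfill for rows that already exist.
UPDATE aa
SET not_in_zz = NOT EXISTS (SELECT 1 FROM zz WHERE zz.a = aa.a);

-- Trigger on aa: compute the flag whenever a row is inserted or updated.
CREATE FUNCTION aa_set_flag() RETURNS trigger AS $$
BEGIN
    NEW.not_in_zz := NOT EXISTS (SELECT 1 FROM zz WHERE zz.a = NEW.a);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER aa_set_flag_trg
BEFORE INSERT OR UPDATE ON aa
FOR EACH ROW EXECUTE PROCEDURE aa_set_flag();

-- Trigger on zz: refresh the flag in aa for the keys that changed.
CREATE FUNCTION zz_sync_flag() RETURNS trigger AS $$
BEGIN
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        -- The old key may no longer exist in zz; recheck it.
        UPDATE aa
        SET not_in_zz = NOT EXISTS (SELECT 1 FROM zz WHERE zz.a = OLD.a)
        WHERE aa.a = OLD.a;
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        -- The new key now exists in zz.
        UPDATE aa SET not_in_zz = false WHERE aa.a = NEW.a;
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER zz_sync_flag_trg
AFTER INSERT OR UPDATE OR DELETE ON zz
FOR EACH ROW EXECUTE PROCEDURE zz_sync_flag();

-- The original query then only touches aa.
SELECT DISTINCT ON (a) a, b, c FROM aa WHERE not_in_zz;
```

Every write to zz now costs an extra update on aa, which is why this only pays off when zz changes rarely.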

Cheers,
Jean-Michel POURE

Re: The most efficient way to put this?

From
rolf.ostvik@axxessit.no
Date:
On 2002-03-05 "Arsalan Zaidi" <azaidi@directi.com> wrote

>SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN
>(select zz.a from zz);

Does this work?

select distinct on (aa.a) aa.a, aa.b, aa.c
  from aa left join zz on aa.a = zz.a
  where zz.a isnull

--
Rolf