Thread: The most efficient way to put this?
The query I'd like to run is fairly simple. Unfortunately, PG doesn't handle IN's very well. SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN (select zz.a from zz); This is with two very large tables. Both zz and aa have many millions of rows. The best I've come up with so far is... SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE NOT EXISTS (SELECT zz.a FROM zz WHERE aa.a = zz.a); Does anyone have anything better? --Arsalan ------------------------------------------------------------------- People often hate those things which they do not know, or cannot understand. --Ali Ibn Abi Talib (A.S.)
hm, you can try to make a statement with UNION and GROUP by. i dont know if it is faster... -----Ursprüngliche Nachricht----- Von: Arsalan Zaidi [mailto:azaidi@directi.com] Gesendet: Dienstag, 5. März 2002 11:12 An: pgsql-general@postgresql.org Betreff: [GENERAL] The most efficient way to put this? The query I'd like to run is fairly simple. Unfortunately, PG doesn't handle IN's very well. SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN (select zz.a from zz); This is with two very large tables. Both zz and aa have many millions of rows. The best I've come up with so far is... SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE NOT EXISTS (SELECT zz.a FROM zz WHERE aa.a = zz.a); Does anyone have anything better? --Arsalan ------------------------------------------------------------------- People often hate those things which they do not know, or cannot understand. --Ali Ibn Abi Talib (A.S.) ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Le Mardi 5 Mars 2002 11:11, Arsalan Zaidi a écrit : > SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN > (select zz.a from zz); 1) LEFT JOIN SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa LEFT JOIN zz ON zz.a = aa.a WHERE zz.a IS NULL; Run the query twice and VACUUM ANALYSE the database. Re-run. Is it faster now? Make sure the required fields are indexed. 2) TRIGGER Another solution if is to add a boolean field in aa and - create a trigger on aa which stores the result of "NOT IN (select zz.a FROM zz)". Then, select queries only rely on table aa, which is "lightning fast". This is the kind of queries where PostgreSQL can be 10 times faster than MySQL. - create a trigger on bb which updates calculated values in aa when bb values are updated. The trigger solution is only possible if one of the two tables does not change ofter (its values are not altered continuously). Cheers, Jean-Michel POURE
On 2002-03-05 "Arsalan Zaidi" <azaidi@directi.com> wrote >SELECT DISTINCT ON (aa.a) aa.a, aa.b, aa.c FROM aa WHERE aa.a NOT IN (select >zz.a from zz); Does this work? select distinct on (aa.a) aa.a, aa.b, aa.c from aa left join zz on aa.a = zz.a where zz.a isnull -- Rolf