"NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2 - Mailing list pgsql-performance

From Achilleas Mantzios
Subject "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
Date
Msg-id 201101171735.16162.achill@matrix.gatewaynet.com
Whole thread Raw
Responses Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2
List pgsql-performance
Hello,
just coming from this thread : http://archives.postgresql.org/pgsql-admin/2011-01/msg00050.php
It started as an admin question and turned out to be a performance question.
You may look at it for a history of this issue. I will repost all data here.

Description of the machines involved:

1) Prod machine (thereafter called LINUX_PROD) :
System: Linux Suse 2.6.16.46-0.12-smp, 16 x Intel Xeon(R) X7350 @ 2.93GHz, 64GB memory
DB: PostgreSQL 8.3.13, shared_buffers=16GB, work_mem=512MB, db size=94GB
2) Dev machine (therafter called FBSD_DEV) :
System : FreeBSD 6.3, Intel(R) Core(TM)2 Duo CPU @ 2.80GHz, 2GB memory
DB: PostgreSQL 8.3.13, shared_buffers=512MB, work_mem=1MB, db size=76GB
3) Test machine (thereafter called FBSD_TEST) :
System: FreeBSD 8.1, 4 x AMD Phenom(tm) 965 @ 3.4 GHz, 8GB memory
DB: PostgreSQL 9.0.2, shared_buffers=5GB, work_mem=512MB, db size=7GB
4) Linux Test machine (thereafter called LINUX_TEST) :
System : Debian GNU/Linux 5.0, 2x AMD athlon @2.2GZ, 4GB Mem
DB: PostgreSQL 9.0.2, shared_buffers=2GB, work_mem=512MB, db size=7GB

(all DBs in the last three systems are identical, originating from FBSD_DEV)
(additiinally no paging or thrashing were observed during the tests)

Query is :
SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm  
where m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15'
and 
ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and m.id not in
(SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where mold.id=msold.marinerid and
msold.vslid=vslold.idand  
msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <= '2007-01-11'
andexists  
(select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and msold2.id <>
msold.idand  
msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months')  and mold.marinertype='Mariner' )
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');

i get the following execution times: (with \timing)
FBSD_DEV : query : 240.419 ms
LINUX_PROD : query : 219.568 ms
FBSD_TEST : query :  2285.509 ms
LINUX_TEST : query : 5788.988 ms

Re writing the query in the "NOT EXIST" variation like:

SELECT distinct m.id,coalesce(m.givenname,''),coalesce(m.midname,''),m.surname from marinerstates ms,vessels
vsl,marinerm where  
m.id=ms.marinerid and ms.vslid=vsl.id and ms.state='Active' and coalesce(ms.endtime,now())::date >= '2006-07-15' and
ms.starttime::date <= '2007-01-11'  and m.marinertype='Mariner'  and NOT EXISTS
   (SELECT distinct mold.id from marinerstates msold,vessels vslold,mariner mold  where mold.id=msold.marinerid and
msold.vslid=vslold.idand  
   msold.state='Active' and coalesce(msold.endtime,now())::date >= '2006-07-15' and msold.starttime::date <=
'2007-01-11'and  
   exists (select 1 from marinerstates msold2 where msold2.marinerid=msold.marinerid and msold2.state='Active' and
msold2.id<> msold.id and  
              msold2.starttime<msold.starttime AND (msold.starttime-msold2.endtime)<='18 months')  
   and mold.marinertype='Mariner' AND mold.id=m.id)
order by m.surname,coalesce(m.givenname,''),coalesce(m.midname,'');
gives:

FBSD_DEV : query : 154.000 ms
LINUX_PROD : query : 153.408 ms
FBSD_TEST : query :  137.000 ms
LINUX_TEST : query : 404.000 ms

I found this query, since i observed that running the calling program was actually the first case that i
encountered FBSD_TEST (while running a bigger database, a recent dump from LINUX_PROD) to be actually
slower than LINUX_PROD.
From the whole set of the tests involved, it seems like the "NOT IN" version of the query runs slow
in any postgresql 9.0.2 tested.
--
Achilleas Mantzios

pgsql-performance by date:

Previous
From: "Ing. Marcos Ortiz Valmaseda"
Date:
Subject: Re: Possible to improve query plan?
Next
From: "Ing. Marcos Ortiz Valmaseda"
Date:
Subject: Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2