BUG: endless lseek(.., SEEK_END) from select queries on x64 builds - Mailing list pgsql-performance

From Markus Schulz
Subject BUG: endless lseek(.., SEEK_END) from select queries on x64 builds
Date
Msg-id 201302220925.26460.msc@antzsystem.de
Whole thread Raw
Responses Re: BUG: endless lseek(.., SEEK_END) from select queries on x64 builds
Re: BUG: endless lseek(.., SEEK_END) from select queries on x64 builds
List pgsql-performance
hello,

i have a strange and reproducible bug with some select queries and 64bit
postgresql builds (works fine on 32bit builds).
The postgres process will run with 100% cpu-load (no io-wait) and strace
will show endless lseek(..., SEEK_END) calls on one table for minutes.
lseek(28, 0, SEEK_END)                  = 26697728
lseek(28, 0, SEEK_END)                  = 26697728
lseek(28, 0, SEEK_END)                  = 26697728
...
the file-descriptor 28 points to the file for the webapps_base.Acquisition
table (see query/plan below).

Now the details:

The Query:
select count(ac.ID) as col_0_0_ from
    webapps_base.Acquisition ac,
    webapps_base.SalesPartnerStructure struc
    where
        struc.fk_SalesPartner_child=ac.fk_SalesPartner_ID
        and struc.fk_SalesPartner_parent=200
        and (ac.CreationDate between '2012-02-01' and '2013-01-31')
        and ac.acquisitiondepot='STANDARD'
        and ('2013-01-31' between struc.ValidFrom
            and coalesce(struc.ValidTo, '2013-01-31'))

The Plan:
"Aggregate  (cost=32617.11..32617.12 rows=1 width=8) (actual time=204.180..204.180 rows=1 loops=1)"
"  ->  Merge Join  (cost=32232.01..32598.26 rows=7543 width=8) (actual time=172.882..202.218 rows=21111 loops=1)"
"        Merge Cond: (ac.fk_salespartner_id = struc.fk_salespartner_child)"
"        ->  Sort  (cost=5582.60..5635.69 rows=21235 width=16) (actual time=28.920..31.121 rows=21204 loops=1)"
"              Sort Key: ac.fk_salespartner_id"
"              Sort Method: quicksort  Memory: 1763kB"
"              ->  Bitmap Heap Scan on acquisition ac  (cost=395.26..4056.43 rows=21235 width=16) (actual
time=3.064..15.868rows=21223 loops=1)" 
"                    Recheck Cond: ((creationdate >= '2012-02-01'::date) AND (creationdate <= '2013-01-31'::date))"
"                    Filter: ((acquisitiondepot)::text = 'STANDARD'::text)"
"                    ->  Bitmap Index Scan on index_acquistion_creationdate  (cost=0.00..389.95 rows=21267 width=0)
(actualtime=2.890..2.890 rows=21514 loops=1)" 
"                          Index Cond: ((creationdate >= '2012-02-01'::date) AND (creationdate <= '2013-01-31'::date))"
"        ->  Sort  (cost=26648.60..26742.61 rows=37606 width=8) (actual time=143.952..152.808 rows=131713 loops=1)"
"              Sort Key: struc.fk_salespartner_child"
"              Sort Method: quicksort  Memory: 8452kB"
"              ->  Bitmap Heap Scan on salespartnerstructure struc  (cost=3976.80..23790.79 rows=37606 width=8) (actual
time=13.279..64.681rows=114772 loops=1)" 
"                    Recheck Cond: (fk_salespartner_parent = 200)"
"                    Filter: (('2013-01-31'::date >= validfrom) AND ('2013-01-31'::date <= COALESCE(validto,
'2013-01-31'::date)))"
"                    ->  Bitmap Index Scan on index_parent_salespartner  (cost=0.00..3967.39 rows=114514 width=0)
(actualtime=13.065..13.065 rows=116479 loops=1)" 
"                          Index Cond: (fk_salespartner_parent = 200)"
"Total runtime: 205.397 ms"

as you can see the query runs fine.
I can run this query from a bash-psql-while-loop/jdbc-cli-tool
endless without any problems.
so far so good.

But now i run the same query from:

JBoss EAP 5.1.2 with connection pooling and xa-datasource/two-phase-commits
(transactions on multiple datasources needed)
*and* *<prepared-statement-cache-size>1000</prepared-statement-cache-size>*

i can run the query four times with good performance and after that postgresql
starts with the strange lseek() behavior.
The query needs more then a minute to complete and during execution the
postgres process runs at 100% cpu load with lseek calls (straced).
If i flush the connection pool (close all open connections from the jboss
jmx-console) it works again for four calls.
These problem applies only to 64bit builds. If i run a 32bit postgresql
server it works fine.

We have tested the following environments:

- Debian Squeeze 64bit with Postgresql 9.1.[5,6,7] -> Bad behavior
- Debian Wheezy 64bit with Postgresql 9.1.8 64bit -> Bad behavior
- Ubuntu 12.04 LTS 64bit with Postgresql 9.1.8 64bit -> Bad behavior
- Windows 7 x64 with Postgresql 9.1.8 64bit -> Bad behavior
- Debian Wheezy 64bit with EnterpriseDB 9.2 64bit -> Bad behavior

- Debian Wheezy 64bit with Postgresql 9.1.8 32bit -> Good behavior
- Debian Wheezy 32bit with Postgresql 9.1.8 32bit -> Good behavior

as you can see all 64bit builds of postgresql are affected (independent from os-arch).

If i disable the prepared-statement-cache-size (remove it from -ds.xml)
it works on 64bit build too.

regards,
msc


pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Are bitmap index scans slow to start?
Next
From: Heikki Linnakangas
Date:
Subject: Re: BUG: endless lseek(.., SEEK_END) from select queries on x64 builds