Home > mailing lists

Re: mis-estimate in nested query causes slow runtimes - Mailing list pgsql-performance

From	Chris Kratz
Subject	Re: mis-estimate in nested query causes slow runtimes
Date	February 18, 2008 15:40:40
Msg-id	3642025c0802181132g425dfb68xec0565e9f99413f8@mail.gmail.com Whole thread Raw
In response to	Re: mis-estimate in nested query causes slow runtimes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: mis-estimate in nested query causes slow runtimes
List	pgsql-performance

Tree view

On 2/11/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Chris Kratz <chris.kratz@vistashare.com> writes:
> The first frustration is that I can't get the transaction details scan
> to get any more accurate.  It thinks it will find 1407 records,
> instead it finds 20,153.  Then for whatever reason it thinks that a
> join between 1 record and 1407 records will return 1 record.  This is
> mainly what I can't understand.  Why does it think it will only get
> one record in response when it's a left join?

I don't see any left join there ...

> PG 8.2.4 on Linux kernel 2.6.9 x64

The first thing you should do is update to 8.2.6; we've fixed a fair
number of problems since then that were fallout from the outer-join
planning rewrite in 8.2.

If it still doesn't work very well, please post the pg_stats rows for
the join columns involved (idatrndtl.ida_trans_match_source_id and
mtchsrcprj3.id).  (You do have up-to-date ANALYZE stats for both
of those tables, right?)

                        regards, tom lane

Hello Tom,

We've updated to Postgres 8.2.6 on our production database over the weekend. Unfortunately, the estimates on this query are no better after the upgrade. Here is just the part of the estimate that is incorrect. (2 vs 20153)

-> Nested Loop (cost=12.68..165.69 rows=2 width=38) (actual time=0.089..29.792 rows=20153 loops=1)

-> Hash Join (cost=12.68..24.37 rows=1 width=24) (actual time=0.064..0.135 rows=1 loops=1)

Hash Cond: (mtchsrcprj3.funding_source_id = mtchsrcprjfs3.nameid)

-> Seq Scan on project mtchsrcprj3 (cost=0.00..11.22 rows=122 width=8) (actual time=0.002..0.053 rows=122 loops=1)

-> Hash (cost=12.61..12.61 rows=6 width=24) (actual time=0.017..0.017 rows=1 loops=1)

-> Index Scan using name_float_lfm_idx on namemaster mtchsrcprjfs3 (cost=0.00..12.61 rows=6 width=24) (actual time=0.012..0.013 rows=1 loops=1)

Index Cond: (name_float_lfm = 'DWS'::text)

-> Index Scan using transaction_details_ida_trans_match_source_id on transaction_details idatrndtl (cost=0.00..123.72 rows=1408 width=22) (actual time=0.023..17.128 rows=20153 loops=1)

(Entire explain analyze posted earlier in thread)

Total Query runtime: 35309.298 ms

Same w/ enable_nestloop off: 761.715 ms

I've tried the stats up to 1000 on both columns which causes no differences. Currently the stats are at 100.

test=# alter table transaction_details alter column ida_trans_match_source_id set statistics 100;

ALTER TABLE

test=# analyze transaction_details;

ANALYZE

test=# alter table project alter column id set statistics 100;

ALTER TABLE

test=# analyze project;

ANALYZE

Stats rows in pg_stats for these two columns:

test=# select * from pg_stats where tablename = 'transaction_details' and attname='ida_trans_match_source_id';

------------+---------------------+---------------------------+-----------+-----------+------------+------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------

public | transaction_details | ida_trans_match_source_id | 0.479533 | 4 | 69 | {818,832,930,937,923,812,931,836,837,829,830,14,809} | {0.1024,0.0991333,0.0408,0.0232,0.0221,0.0219,0.0207,0.0188667,0.0186667,0.0177667,0.0176667,0.0130333,0.0118667} | {6,802,813,813,814,814,815,815,816,816,817,817,827,827,833,835,835,838,838,838,838,838,843,920,921,921,921,921,922,922,924,924,924,924,925,926,926,928,928,934,936,936,936,936,936,938,939,941,941,955,965,967,968,968,974,980} | 0.178655

(1 row)

test=# select * from pg_stats where tablename = 'project' and attname='id';

------------+-----------+---------+-----------+-----------+------------+------------------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------

public | project | id | 0 | 4 | -1 | | | {6,7,8,12,13,15,17,18,19,24,26,27,28,29,30,32,33,34,35,36,41,42,71,72,802,803,809,812,813,815,816,817,818,822,824,825,826,827,828,830,831,832,833,835,836,837,838,839,841,842,843,844,845,847,848,849,920,921,923,924,925,926,928,929,930,931,932,934,935,936,937,938,940,941,942,946,947,949,950,951,952,954,955,956,957,958,960,961,962,963,964,966,967,968,969,970,973,974,975,977,980} | 0.937228

(1 row)

PG 8.2.6 on linux x86_64, 8G ram, 4x15k->db, 2x10k-> OS & WAL

postgresql.conf settings of note:

shared_buffers = 1024MB

work_mem = 246MB

maintenance_work_mem = 256MB

random_page_cost = 1.75

effective_cache_size=2048MB

Any ideas how we can get the query to run faster?

Thanks,

-Chris

pgsql-performance by date:

From: Tobias Brox
Date: 18 February 2008, 12:49:14
Subject: Re: Disable WAL completely

From: Greg Smith
Date: 18 February 2008, 17:24:39
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine

Re: mis-estimate in nested query causes slow runtimes - Mailing list pgsql-performance

Previous

Next