Re: query produces 1 GB temp file - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: query produces 1 GB temp file
Date
Msg-id 1161964168.1655.1517.camel@state.g2switchworks.com
Whole thread Raw
In response to query produces 1 GB temp file  (Dirk Lutzebaeck <lutzeb@aeccom.com>)
List pgsql-performance
On Sat, 2005-02-05 at 11:25, Dirk Lutzebaeck wrote:
> Hi,
>
> here is a query which produces over 1G temp file in pgsql_tmp. This
> is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
> sort_mem and 320MB shared_mem.

First step, upgrade to the latest 7.4.x version.  7.4.2 is an OLD
version of 7.4  I think the latest version is 7.4.13.

> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
> tables have been analyzed before.

SNIP

> EXPLAIN ANALYZE gives:
>
>
>  Limit  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1)
>    ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1)
>          ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
>                Sort Key: ft.val_9, ft.created, ft.flatid
>                ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677
loops=1)
>                      ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563
loops=1)
>                            ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014
rows=48563loops=1) 
>                                  ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8) (actual
time=0.184..46.519rows=5863 loops=1) 
>                                        Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
>                                  ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42) (actual
time=0.052..0.218rows=8 loops=5863) 
>                                        Index Cond: ("outer".en = bi.en)
>                                        Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
>                            ->  Index Scan using en_oid_index on en  (cost=0.00..5.01 rows=1 width=9) (actual
time=0.015..0.019rows=1 loops=48563) 
>                                  Index Cond: ("outer".en = en.oid)
>                      ->  Index Scan using df_en on df ft  (cost=0.00..151.71 rows=49 width=1322) (actual
time=0.038..0.148rows=14 loops=48563) 
>                            Index Cond: ("outer".en = ft.en)
>                            Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
>  Total runtime: 81782.052 ms
> (18 rows)

Why do you have an index scan on en_oid_index that thinks it will return
1 row when it returns 48563, and one on df_en that thinks it will return
49 and returns 48563 as well?   Is this database analyzed often?  Are
oids even analyzed?  I'd really recommend switching off of them as they
complicate backups and restores.

If analyze doesn't help, you can try brute forcing off nested loops for
this query and see if that helps.  nested loop is really slow for large
numbers of rows.

pgsql-performance by date:

Previous
From: Thomas Burdairon
Date:
Subject: Re: query produces 1 GB temp file
Next
From: Tom Lane
Date:
Subject: Re: VACUUMs take twice as long across all nodes