Thread: query produces 1 GB temp file

query produces 1 GB temp file

From

Dirk Lutzebaeck

Date:

27 October 2006, 11:46:30

Hi,

here is a query which produces over 1G temp file in pgsql_tmp. This
is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
sort_mem and 320MB shared_mem.

Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
tables have been analyzed before.

Can some please explain why the temp file is so huge? I understand
there are a lot of rows. All relevant indices seem to be used.

Thanks in advance,

Dirk

EXPLAIN
SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid) ft.docstart, ft.flatobj, bi.oid, bi.en
FROM bi, en, df AS ft, es
WHERE bi.rc=130170467
AND bi.en=ft.en
AND bi.co=117305223
AND bi.hide=FALSE
AND ft.en=en.oid
AND es.en=bi.en
AND es.co=bi.co
AND es.spec=122293729
AND (ft.val_2='DG' OR ft.val_2='SK')
AND ft.docstart=1
ORDER BY ft.val_9 ASC, ft.created DESC
LIMIT 1000 OFFSET 0;

 Limit  (cost=8346.75..8346.78 rows=3 width=1361)
   ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361)
         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361)
               Sort Key: ft.val_9, ft.created, ft.flatid
               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361)
                     ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51)
                           ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42)
                                 ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8)
                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
                                 ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42)
                                       Index Cond: ("outer".en = bi.en)
                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                           ->  Index Scan using en_oid_index on en  (cost=0.00..5.01 rows=1 width=9)
                                 Index Cond: ("outer".en = en.oid)
                     ->  Index Scan using df_en on df ft  (cost=0.00..151.71 rows=49 width=1322)
                           Index Cond: ("outer".en = ft.en)
                           Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
(17 rows)


--------------

EXPLAIN ANALYZE gives:


 Limit  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1)
   ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1)
         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
               Sort Key: ft.val_9, ft.created, ft.flatid
               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677
loops=1)
                     ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563
loops=1)
                           ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014
rows=48563loops=1) 
                                 ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8) (actual
time=0.184..46.519rows=5863 loops=1) 
                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
                                 ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42) (actual
time=0.052..0.218rows=8 loops=5863) 
                                       Index Cond: ("outer".en = bi.en)
                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                           ->  Index Scan using en_oid_index on en  (cost=0.00..5.01 rows=1 width=9) (actual
time=0.015..0.019rows=1 loops=48563) 
                                 Index Cond: ("outer".en = en.oid)
                     ->  Index Scan using df_en on df ft  (cost=0.00..151.71 rows=49 width=1322) (actual
time=0.038..0.148rows=14 loops=48563) 
                           Index Cond: ("outer".en = ft.en)
                           Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
 Total runtime: 81782.052 ms
(18 rows)

Re: query produces 1 GB temp file

From

"Merlin Moncure"

Date:

27 October 2006, 13:52:14

On 2/5/05, Dirk Lutzebaeck <lutzeb@aeccom.com> wrote:
> here is a query which produces over 1G temp file in pgsql_tmp. This
> is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
> sort_mem and 320MB shared_mem.
>
> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
> tables have been analyzed before.
>
> Can some please explain why the temp file is so huge? I understand
> there are a lot of rows. All relevant indices seem to be used.

how much memory have you set aside for sorting? also, this query will
likely run better in a more recent version of postgresql if thats
possible.

merlin

Re: query produces 1 GB temp file

From

Bricklen Anderson

Date:

27 October 2006, 14:34:19

Merlin Moncure wrote:
> On 2/5/05, Dirk Lutzebaeck <lutzeb@aeccom.com> wrote:
<snip>
Was the original message actually from 2/5/05?

Re: query produces 1 GB temp file

From

Alexander Staubo

Date:

27 October 2006, 14:48:44

While I can't explain why PostgreSQL would use that memory, I
recommend looking into tweaking the work_mem parameter. This setting
specifies how much memory PostgreSQL on certain temporary data
structures (hash tables, sort vectors) until it starts using
temporary files. Quoting the docs:

> work_mem (integer)
> Specifies the amount of memory to be used by internal sort
> operations and hash tables before switching to temporary disk
> files. The value is specified in kilobytes, and defaults to 1024
> kilobytes (1 MB). Note that for a complex query, several sort or
> hash operations might be running in parallel; each one will be
> allowed to use as much memory as this value specifies before it
> starts to put data into temporary files. Also, several running
> sessions could be doing such operations concurrently. So the total
> memory used could be many times the value of work_mem; it is
> necessary to keep this fact in mind when choosing the value. Sort
> operations are used for ORDER BY, DISTINCT, and merge joins. Hash
> tables are used in hash joins, hash-based aggregation, and hash-
> based processing of IN subqueries.

Alexander.

On Feb 5, 2005, at 18:25 , Dirk Lutzebaeck wrote:

> Hi,
>
> here is a query which produces over 1G temp file in pgsql_tmp. This
> is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
> sort_mem and 320MB shared_mem.
>
> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
> tables have been analyzed before.
>
> Can some please explain why the temp file is so huge? I understand
> there are a lot of rows. All relevant indices seem to be used.
>
> Thanks in advance,
>
> Dirk
>
> EXPLAIN
> SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid) ft.docstart,
> ft.flatobj, bi.oid, bi.en
> FROM bi, en, df AS ft, es
> WHERE bi.rc=130170467
> AND bi.en=ft.en
> AND bi.co=117305223
> AND bi.hide=FALSE
> AND ft.en=en.oid
> AND es.en=bi.en
> AND es.co=bi.co
> AND es.spec=122293729
> AND (ft.val_2='DG' OR ft.val_2='SK')
> AND ft.docstart=1
> ORDER BY ft.val_9 ASC, ft.created DESC
> LIMIT 1000 OFFSET 0;
>
>  Limit  (cost=8346.75..8346.78 rows=3 width=1361)
>    ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361)
>          ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361)
>                Sort Key: ft.val_9, ft.created, ft.flatid
>                ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361)
>                      ->  Nested Loop  (cost=0.00..5757.17 rows=17
> width=51)
>                            ->  Nested Loop  (cost=0.00..5606.39
> rows=30 width=42)
>                                  ->  Index Scan using es_sc_index
> on es  (cost=0.00..847.71 rows=301 width=8)
>                                        Index Cond: ((spec =
> 122293729) AND (co = 117305223::oid))
>                                  ->  Index Scan using bi_env_index
> on bi  (cost=0.00..15.80 rows=1 width=42)
>                                        Index Cond: ("outer".en =
> bi.en)
>                                        Filter: ((rc =
> 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
>                            ->  Index Scan using en_oid_index on en
> (cost=0.00..5.01 rows=1 width=9)
>                                  Index Cond: ("outer".en = en.oid)
>                      ->  Index Scan using df_en on df ft
> (cost=0.00..151.71 rows=49 width=1322)
>                            Index Cond: ("outer".en = ft.en)
>                            Filter: (((val_2 = 'DG'::text) OR (val_2
> = 'SK'::text)) AND (docstart = 1))
> (17 rows)
>
>
> --------------
>
> EXPLAIN ANALYZE gives:
>
>
>  Limit  (cost=8346.75..8346.78 rows=3 width=1361) (actual
> time=75357.465..75679.964 rows=1000 loops=1)
>    ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361) (actual
> time=75357.459..75675.371 rows=1000 loops=1)
>          ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361)
> (actual time=75357.448..75499.263 rows=22439 loops=1)
>                Sort Key: ft.val_9, ft.created, ft.flatid
>                ->  Nested Loop  (cost=0.00..8346.73 rows=3
> width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
>                      ->  Nested Loop  (cost=0.00..5757.17 rows=17
> width=51) (actual time=0.467..3216.342 rows=48563 loops=1)
>                            ->  Nested Loop  (cost=0.00..5606.39
> rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1)
>                                  ->  Index Scan using es_sc_index
> on es  (cost=0.00..847.71 rows=301 width=8) (actual
> time=0.184..46.519 rows=5863 loops=1)
>                                        Index Cond: ((spec =
> 122293729) AND (co = 117305223::oid))
>                                  ->  Index Scan using bi_env_index
> on bi  (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218
> rows=8 loops=5863)
>                                        Index Cond: ("outer".en =
> bi.en)
>                                        Filter: ((rc =
> 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
>                            ->  Index Scan using en_oid_index on en
> (cost=0.00..5.01 rows=1 width=9) (actual time=0.015..0.019 rows=1
> loops=48563)
>                                  Index Cond: ("outer".en = en.oid)
>                      ->  Index Scan using df_en on df ft
> (cost=0.00..151.71 rows=49 width=1322) (actual time=0.038..0.148
> rows=14 loops=48563)
>                            Index Cond: ("outer".en = ft.en)
>                            Filter: (((val_2 = 'DG'::text) OR (val_2
> = 'SK'::text)) AND (docstart = 1))
>  Total runtime: 81782.052 ms
> (18 rows)
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

Re: query produces 1 GB temp file

From

Dirk Lutzebäck

Date:

27 October 2006, 14:53:25

Hi,

I'm sorry but it look like my computer has resent older posts from me, sigh...

Dirk

Alexander Staubo wrote:

While I can't explain why PostgreSQL would use that memory, I recommend looking into tweaking the work_mem parameter. This setting specifies how much memory PostgreSQL on certain temporary data structures (hash tables, sort vectors) until it starts using temporary files. Quoting the docs:

work_mem (integer)
Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value is specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently. So the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.

Alexander.

On Feb 5, 2005, at 18:25 , Dirk Lutzebaeck wrote:

Hi,

here is a query which produces over 1G temp file in pgsql_tmp. This
is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
sort_mem and 320MB shared_mem.

Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
tables have been analyzed before.

Can some please explain why the temp file is so huge? I understand
there are a lot of rows. All relevant indices seem to be used.

Thanks in advance,

Dirk

EXPLAIN
SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid) ft.docstart, ft.flatobj, bi.oid, bi.en
FROM bi, en, df AS ft, es
WHERE bi.rc=130170467
AND bi.en=ft.en
AND bi.co=117305223
AND bi.hide=FALSE
AND ft.en=en.oid
AND es.en=bi.en
AND es.co=bi.co
AND es.spec=122293729
AND (ft.val_2='DG' OR ft.val_2='SK')
AND ft.docstart=1
ORDER BY ft.val_9 ASC, ft.created DESC
LIMIT 1000 OFFSET 0;

Limit (cost=8346.75..8346.78 rows=3 width=1361)
   -> Unique (cost=8346.75..8346.78 rows=3 width=1361)
         -> Sort (cost=8346.75..8346.76 rows=3 width=1361)
               Sort Key: ft.val_9, ft.created, ft.flatid
               -> Nested Loop (cost=0.00..8346.73 rows=3 width=1361)
                     -> Nested Loop (cost=0.00..5757.17 rows=17 width=51)
                           -> Nested Loop (cost=0.00..5606.39 rows=30 width=42)
                                 -> Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8)
                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
                                 -> Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42)
                                       Index Cond: ("outer".en = bi.en)
                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                           -> Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9)
                                 Index Cond: ("outer".en = en.oid)
                     -> Index Scan using df_en on df ft (cost=0.00..151.71 rows=49 width=1322)
                           Index Cond: ("outer".en = ft.en)
                           Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
(17 rows)

--------------

EXPLAIN ANALYZE gives:

Limit (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1)
   -> Unique (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1)
         -> Sort (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
               Sort Key: ft.val_9, ft.created, ft.flatid
               -> Nested Loop (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
                     -> Nested Loop (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1)
                           -> Nested Loop (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1)
                                 -> Index Scan using es_sc_index on es (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1)
                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
                                 -> Index Scan using bi_env_index on bi (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863)
                                       Index Cond: ("outer".en = bi.en)
                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                           -> Index Scan using en_oid_index on en (cost=0.00..5.01 rows=1 width=9) (actual time=0.015..0.019 rows=1 loops=48563)
                                 Index Cond: ("outer".en = en.oid)
                     -> Index Scan using df_en on df ft (cost=0.00..151.71 rows=49 width=1322) (actual time=0.038..0.148 rows=14 loops=48563)
                           Index Cond: ("outer".en = ft.en)
                           Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
Total runtime: 81782.052 ms
(18 rows)

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Dirk Lutzebäck <lutzeb@aeccom.com> Tel +49.30.5362.1635 Fax .1638
CTO AEC/communications GmbH & Co. KG, Berlin, Germany

Re: query produces 1 GB temp file

From

Thomas Burdairon

Date:

27 October 2006, 15:09:59

He is probably using IPOT (IP Other Time) :

http://kadreg.free.fr/ipot/ :-) (sorry only french page )

On Oct 27, 2006, at 16:33, Bricklen Anderson wrote:

Merlin Moncure wrote:
On 2/5/05, Dirk Lutzebaeck <lutzeb@aeccom.com> wrote:
<snip>
Was the original message actually from 2/5/05?

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Re: query produces 1 GB temp file

From

Scott Marlowe

Date:

27 October 2006, 15:49:43

On Sat, 2005-02-05 at 11:25, Dirk Lutzebaeck wrote:
> Hi,
>
> here is a query which produces over 1G temp file in pgsql_tmp. This
> is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
> sort_mem and 320MB shared_mem.

First step, upgrade to the latest 7.4.x version.  7.4.2 is an OLD
version of 7.4  I think the latest version is 7.4.13.

> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
> tables have been analyzed before.

SNIP

> EXPLAIN ANALYZE gives:
>
>
>  Limit  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1)
>    ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1)
>          ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
>                Sort Key: ft.val_9, ft.created, ft.flatid
>                ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677
loops=1)
>                      ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563
loops=1)
>                            ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014
rows=48563loops=1) 
>                                  ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8) (actual
time=0.184..46.519rows=5863 loops=1) 
>                                        Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
>                                  ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42) (actual
time=0.052..0.218rows=8 loops=5863) 
>                                        Index Cond: ("outer".en = bi.en)
>                                        Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
>                            ->  Index Scan using en_oid_index on en  (cost=0.00..5.01 rows=1 width=9) (actual
time=0.015..0.019rows=1 loops=48563) 
>                                  Index Cond: ("outer".en = en.oid)
>                      ->  Index Scan using df_en on df ft  (cost=0.00..151.71 rows=49 width=1322) (actual
time=0.038..0.148rows=14 loops=48563) 
>                            Index Cond: ("outer".en = ft.en)
>                            Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
>  Total runtime: 81782.052 ms
> (18 rows)

Why do you have an index scan on en_oid_index that thinks it will return
1 row when it returns 48563, and one on df_en that thinks it will return
49 and returns 48563 as well?   Is this database analyzed often?  Are
oids even analyzed?  I'd really recommend switching off of them as they
complicate backups and restores.

If analyze doesn't help, you can try brute forcing off nested loops for
this query and see if that helps.  nested loop is really slow for large
numbers of rows.