Thread: query produces 1 GB temp file

query produces 1 GB temp file

From
Dirk Lutzebaeck
Date:
Hi,

here is a query which produces a temp file of over 1 GB in pgsql_tmp. This
is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
sort_mem and 320MB shared_mem.

Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
tables have been analyzed before.

Can someone please explain why the temp file is so huge? I understand
there are a lot of rows.

Thanks in advance,

Dirk

EXPLAIN
SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid) ft.docstart, ft.docindex, ft.flatobj, bi.oid, bi.en
FROM bi, en, df AS ft, es
WHERE bi.rc=130170467
AND bi.en=ft.en
AND bi.co=117305223
AND bi.hide=FALSE
AND ft.en=en.oid
AND es.en=bi.en
AND es.co=bi.co
AND es.spec=122293729
AND (ft.val_2='DG' OR ft.val_2='SK')
AND ft.docstart=1
ORDER BY ft.val_9 ASC, ft.created DESC
LIMIT 1000 OFFSET 0;

 Limit  (cost=8346.75..8346.78 rows=3 width=1361)
   ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361)
         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361)
               Sort Key: ft.val_9, ft.created, ft.flatid
               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361)
                     ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51)
                           ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42)
                                 ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8)
                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
                                 ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42)
                                       Index Cond: ("outer".en = bi.en)
                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                           ->  Index Scan using en_oid_index on en  (cost=0.00..5.01 rows=1 width=9)
                                 Index Cond: ("outer".en = en.oid)
                     ->  Index Scan using df_en on df ft  (cost=0.00..151.71 rows=49 width=1322)
                           Index Cond: ("outer".en = ft.en)
                           Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
(17 rows)


--------------

EXPLAIN ANALYZE gives:


 Limit  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.465..75679.964 rows=1000 loops=1)
   ->  Unique  (cost=8346.75..8346.78 rows=3 width=1361) (actual time=75357.459..75675.371 rows=1000 loops=1)
         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
               Sort Key: ft.val_9, ft.created, ft.flatid
               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
                     ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1)
                           ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1)
                                 ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1)
                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
                                 ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863)
                                       Index Cond: ("outer".en = bi.en)
                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                           ->  Index Scan using en_oid_index on en  (cost=0.00..5.01 rows=1 width=9) (actual time=0.015..0.019 rows=1 loops=48563)
                                 Index Cond: ("outer".en = en.oid)
                     ->  Index Scan using df_en on df ft  (cost=0.00..151.71 rows=49 width=1322) (actual time=0.038..0.148 rows=14 loops=48563)
                           Index Cond: ("outer".en = ft.en)
                           Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))
 Total runtime: 81782.052 ms
(18 rows)


Re: query produces 1 GB temp file

From
John A Meinel
Date:
Dirk Lutzebaeck wrote:

>Hi,
>
>here is a query which produces a temp file of over 1 GB in pgsql_tmp. This
>is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
>sort_mem and 320MB shared_mem.
>
>Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
>tables have been analyzed before.
>
>Can someone please explain why the temp file is so huge? I understand
>there are a lot of rows.
>
>Thanks in advance,
>
>Dirk
>
>
...

>               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
>
>
Well, there is this particular query where it thinks there will only be
3 rows, but in fact there are 703,677 of them. And the previous line:

>         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
>
>
This seems to indicate that after sorting you still have 22,439 rows, which
then get pared down to 1000.

I'm assuming that the sort you are trying to do is extremely expensive.
You are sorting 700k rows, which takes up too much memory (1GB), which
forces it to create a temporary table, and write it out to disk.

I didn't analyze it a lot, but you might get a lot better performance
from doing a subselect, rather than the query you wrote.

You are joining 4 tables (bi, en, df AS ft, es); I don't know what size
each of those tables is. In the end, though, you don't really care about
the en or es tables (they aren't in your output).

So maybe one of your subselects could be:

where bi.en = (select en from es where es.co = bi.co and es.spec=122293729);

I'm pretty sure the reason you need 1GB of temp space is that at one
point you have 700k rows. Is it possible to rewrite the query so that it
does more filtering earlier? Your DISTINCT criteria seem to filter it
down to about 20k rows, so maybe it's possible to do some sort of DISTINCT
in part of the subselect, before you start joining against the other tables.
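
As a rough, untested sketch of that subselect idea (using IN rather than =,
and assuming every ft.en of interest has a matching row in en, so that join
can simply be dropped):

SELECT DISTINCT ON (ft.val_9, ft.created, ft.flatid)
       ft.docstart, ft.docindex, ft.flatobj, bi.oid, bi.en
FROM bi, df AS ft
WHERE bi.rc=130170467
  AND bi.co=117305223
  AND bi.hide=FALSE
  -- pre-filter on es instead of joining it into the row set
  AND bi.en IN (SELECT es.en FROM es
                WHERE es.co=117305223 AND es.spec=122293729)
  AND ft.en=bi.en
  AND (ft.val_2='DG' OR ft.val_2='SK')
  AND ft.docstart=1
ORDER BY ft.val_9 ASC, ft.created DESC
LIMIT 1000 OFFSET 0;

That at least keeps duplicate es matches from multiplying rows before the
sort; how much it helps depends on how selective the es conditions really are.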

If you have that much redundancy, you might also need to think of doing
a different normalization.

Just some thoughts.

Also, I thought using the "oid" column wasn't really recommended, since
in *high* volume databases they aren't even guaranteed to be unique. (I
think it is a 32-bit number that rolls over.) Also on a database dump
and restore, they don't stay the same, unless you take a lot of extra
care that they are included in both the dump and the restore. I believe
it is better to create your own "id" per table (say SERIAL or BIGSERIAL).
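
For instance (just a sketch with made-up names, not your actual schema):

CREATE TABLE example (
    id   BIGSERIAL PRIMARY KEY,  -- your own surrogate key, instead of relying on oid
    data text
);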

John
=:->



Re: query produces 1 GB temp file

From
Greg Stark
Date:
Dirk Lutzebaeck <lutzeb@aeccom.com> writes:

> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
> tables have been analyzed before.

Really? A lot of the estimates are very far off. If you really just analyzed
these tables immediately prior to the query then perhaps you should try
raising the statistics target on spec and co. Or is the problem that there's a
correlation between those two columns?

>                ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
>                      ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1)
>                            ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1)
>                                  ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1)
>                                        Index Cond: ((spec = 122293729) AND (co = 117305223::oid))

That's the root of your problem. The optimizer is off by a factor of 20:
it thinks these two columns are much more selective than they are.

>                                  ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863)
>                                        Index Cond: ("outer".en = bi.en)
>                                        Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))

It also thinks these three columns are much more selective than they are.

How accurate are its estimates if you just do these?

explain analyze select * from es where spec = 122293729
explain analyze select * from es where co = 117305223::oid
explain analyze select * from bi where rc = 130170467::oid
explain analyze select * from bi where co = 117305223
explain analyze select * from bi where hide = false

If they're individually accurate then you've run into the familiar problem of
needing cross-column statistics. If they're individually inaccurate then you
should try raising the targets on those columns with:

ALTER TABLE [ ONLY ] name [ * ]
    ALTER [ COLUMN ] column SET STATISTICS integer

and reanalyzing.
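
For example, for the two columns I mentioned above (the default target is 10,
so anything noticeably higher is worth trying):

ALTER TABLE es ALTER COLUMN spec SET STATISTICS 100;
ALTER TABLE es ALTER COLUMN co SET STATISTICS 100;
ANALYZE es;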


Dirk Lutzebaeck <lutzeb@aeccom.com> writes:

> Can someone please explain why the temp file is so huge? I understand
> there are a lot of rows.

Well that I can't explain. 22k rows of width 1361 doesn't sound so big to me
either. The temporary table does need to store three copies of the records at
a given time, but still it sounds like an awful lot.


--
greg

Re: query produces 1 GB temp file

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Dirk Lutzebaeck <lutzeb@aeccom.com> writes:
>> Can someone please explain why the temp file is so huge? I understand
>> there are a lot of rows.

> Well that I can't explain. 22k rows of width 1361 doesn't sound so big to me
> either.

It was 700k rows to sort, not 22k.  The Unique/Limit superstructure
only demanded 22k rows out from the sort, but we still had to sort 'em
all to figure out which ones were the first 22k.

> The temporary table does need to store three copies of the records at
> a given time, but still it sounds like an awful lot.

Huh?

            regards, tom lane

Re: query produces 1 GB temp file

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> It was 700k rows to sort, not 22k.

Oops, missed that.

> > The temporary table does need to store three copies of the records at
> > a given time, but still it sounds like an awful lot.
>
> Huh?

Am I wrong? I thought the disk sort algorithm was the polyphase tape sort from
Knuth which is always reading two tapes and writing to a third.

--
greg

Re: query produces 1 GB temp file

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Am I wrong? I thought the disk sort algorithm was the polyphase tape sort from
> Knuth which is always reading two tapes and writing to a third.

It is a polyphase sort, but we recycle the input "tapes" as fast as we
use them, so that the maximum disk space usage is about as much as the
data volume to sort.
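
As a back-of-the-envelope check against the numbers in this thread: the sort
input was about 703,677 rows at an estimated width of 1361 bytes, i.e. roughly
703,677 * 1361 = ~958 MB of tuple data before any per-tuple overhead, so a
temp file on the order of 1 GB is about what you'd expect for that sort.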

            regards, tom lane

Re: query produces 1 GB temp file

From
John A Meinel
Date:
Dirk Lutzebaeck wrote:

> Greg,
>
> Thanks for your analysis. But I don't get any better after bumping
> STATISTICS target from 10 to 200.
> explain analyze shows that the optimizer is still way off estimating
> the rows. Is this normal? It still produces a 1 GB temp file.
> I simplified the query a bit, now only two tables are involved (bi,
> df). I also vacuumed.


Are you just doing VACUUM? Or are you doing VACUUM ANALYZE? You might
also try VACUUM FULL ANALYZE (in the case that you have too many dead
tuples in the table).

VACUUM cleans up, but doesn't adjust any planner statistics without ANALYZE.

John
=:->



Re: query produces 1 GB temp file

From
Greg Stark
Date:
I gave a bunch of "explain analyze select" commands to test estimates for
individual columns. What results do they come up with? If those are inaccurate
then raising the statistics target is a good route. If those are accurate
individually but the combination is inaccurate then you have a more difficult
problem.

--
greg

Re: query produces 1 GB temp file

From
John A Meinel
Date:
Dirk Lutzebaeck wrote:

> Greg Stark wrote:
>
>> I gave a bunch of "explain analyze select" commands to test estimates
>> for
>> individual columns. What results do they come up with? If those are
>> inaccurate
>> then raising the statistics target is a good route. If those are
>> accurate
>> individually but the combination is inaccurate then you have a more
>> difficult
>> problem.
>>
>>
>>
> After  setting the new statistics target to 200 they did slightly
> better but not accurate. The results were attached to my last post.
> Here is a copy:
>
>
It does seem that setting the statistics to a higher value would help,
since rc=130170467 seems to account for almost 1/3 of the data. Probably
you have other values that are much less common, so a high statistics
target would help the planner realize that this value occurs at a
different frequency from the other ones. Can you try other numbers
and see what the counts are?
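
Something like this (untested) would show just how skewed that column is:

SELECT rc, count(*) FROM bi GROUP BY rc ORDER BY count(*) DESC LIMIT 10;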

I assume you did do a vacuum analyze after adjusting the statistics target.

Also interesting that in the time it took you to run these queries,
you had received 26 new rows.

And finally, what is the row count if you do
explain analyze select * from bi where rc=130170467::oid and
co=117305223::oid;

If this is a lot less than say 500k, then probably you aren't going to
be helped a lot. The postgresql statistics engine doesn't generate cross
column statistics. It always assumes random distribution of data. So if
two columns are correlated (or anti-correlated), it won't realize that.

Even so, your original desire was to reduce the size of the intermediate
step (where you have 700k rows). So you need to try and design a
subselect on bi which is as restrictive as possible, so that you don't
get all of these rows. With any luck, the planner will realize ahead of
time that there won't be that many rows, and can use indexes, etc. But
even if it doesn't use an index scan, if you have a query that doesn't
use a lot of rows, then you won't need a lot of disk space.

John
=:->

>
> explain analyze select * from bi where rc=130170467;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
>
> Seq Scan on bi (cost=0.00..41078.76 rows=190960 width=53) (actual
> time=0.157..3066.028 rows=513724 loops=1)
> Filter: (rc = 130170467::oid)
> Total runtime: 4208.663 ms
> (3 rows)
>
>
> explain analyze select * from bi where co=117305223;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
>
> Seq Scan on bi (cost=0.00..41078.76 rows=603988 width=53) (actual
> time=0.021..3692.238 rows=945487 loops=1)
> Filter: (co = 117305223::oid)
> Total runtime: 5786.268 ms
> (3 rows)
>
> Here is the distribution of the data in bi:
> select count(*) from bi;
>
> 1841966
>
>
> select count(*) from bi where rc=130170467::oid;
>
> 513732
>
>
> select count(*) from bi where co=117305223::oid;
>
> 945503
>
>
>



Re: query produces 1 GB temp file

From
Tom Lane
Date:
Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck) writes:
> SELECT DISTINCT ON (df.val_9, df.created, df.flatid) df.docindex,
> df.flatobj, bi.oid, bi.en
> FROM bi,df
> WHERE bi.rc=130170467
> ...
> ORDER BY df.val_9 ASC, df.created DESC
> LIMIT 1000 OFFSET 0

Just out of curiosity, what is this query supposed to *do* exactly?
It looks to me like it will give indeterminate results.  Practical
uses of DISTINCT ON generally specify more ORDER BY columns than
there are DISTINCT ON columns, because the extra columns determine
which rows have priority to survive the DISTINCT filter.  With the
above query, you have absolutely no idea which row will be output
for a given combination of val_9/created/flatid.
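
For example, if (hypothetically) you wanted the row with the largest docindex
out of each val_9/created/flatid group, you would add it as a trailing sort key:

SELECT DISTINCT ON (df.val_9, df.created, df.flatid) df.docindex, df.flatobj, bi.oid, bi.en
FROM bi, df
WHERE bi.rc=130170467
  AND bi.en=df.en
  AND bi.co=117305223
  AND bi.hide=FALSE
  AND (df.val_2='DG' OR df.val_2='SK')
  AND df.docstart=1
ORDER BY df.val_9 ASC, df.created DESC, df.flatid, df.docindex DESC
LIMIT 1000 OFFSET 0;

The first three ORDER BY columns still have to match the DISTINCT ON columns;
everything after them just decides which row wins within each group.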

            regards, tom lane

Re: query produces 1 GB temp file

From
Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Date:
Greg,

Thanks for your analysis. But I don't get any better after bumping
STATISTICS target from 10 to 200.
explain analyze shows that the optimizer is still way off estimating the
rows. Is this normal? It still produces a 1 GB temp file.
I simplified the query a bit, now only two tables are involved (bi, df).
I also vacuumed.


alter table bi alter rc set statistics 200;
alter table bi alter hide set statistics 200;
alter table bi alter co set statistics 200;
alter table bi alter en set statistics 200;
analyze bi;

alter table df alter en set statistics 200;
alter table df alter val_2 set statistics 200;
analyze df;

EXPLAIN ANALYZE
SELECT DISTINCT ON (df.val_9, df.created, df.flatid) df.docindex,
df.flatobj, bi.oid, bi.en
FROM bi,df
WHERE bi.rc=130170467
AND bi.en=df.en
AND bi.co=117305223
AND bi.hide=FALSE
AND (df.val_2='DG' OR df.val_2='SK')
AND df.docstart=1
ORDER BY df.val_9 ASC, df.created DESC
LIMIT 1000 OFFSET 0
;

 Limit  (cost=82470.09..82480.09 rows=1000 width=646) (actual time=71768.685..72084.622 rows=1000 loops=1)
   ->  Unique  (cost=82470.09..82643.71 rows=17362 width=646) (actual time=71768.679..72079.987 rows=1000 loops=1)
         ->  Sort  (cost=82470.09..82513.50 rows=17362 width=646) (actual time=71768.668..71905.138 rows=22439 loops=1)
               Sort Key: df.val_9, df.created, df.flatid
               ->  Merge Join  (cost=80422.51..81247.49 rows=17362 width=646) (actual time=7657.872..18486.551 rows=703677 loops=1)
                     Merge Cond: ("outer".en = "inner".en)
                     ->  Sort  (cost=55086.74..55340.18 rows=101378 width=8) (actual time=5606.137..6672.630 rows=471871 loops=1)
                           Sort Key: bi.en
                           ->  Seq Scan on bi  (cost=0.00..46657.47 rows=101378 width=8) (actual time=0.178..3715.109 rows=472320 loops=1)
                                 Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
                     ->  Sort  (cost=25335.77..25408.23 rows=28982 width=642) (actual time=2048.039..3677.140 rows=706482 loops=1)
                           Sort Key: df.en
                           ->  Seq Scan on df  (cost=0.00..23187.79 rows=28982 width=642) (actual time=0.112..1546.580 rows=71978 loops=1)
                                 Filter: (((val_2 = 'DG'::text) OR (val_2 = 'SK'::text)) AND (docstart = 1))


explain analyze select * from bi where rc=130170467;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on bi (cost=0.00..41078.76 rows=190960 width=53) (actual
time=0.157..3066.028 rows=513724 loops=1)
Filter: (rc = 130170467::oid)
Total runtime: 4208.663 ms
(3 rows)


explain analyze select * from bi where co=117305223;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on bi (cost=0.00..41078.76 rows=603988 width=53) (actual
time=0.021..3692.238 rows=945487 loops=1)
Filter: (co = 117305223::oid)
Total runtime: 5786.268 ms
(3 rows)





Greg Stark wrote:

>Dirk Lutzebaeck <lutzeb@aeccom.com> writes:
>
>
>
>>Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
>>tables have been analyzed before.
>>
>>
>
>Really? A lot of the estimates are very far off. If you really just analyzed
>these tables immediately prior to the query then perhaps you should try
>raising the statistics target on spec and co. Or is the problem that there's a
>correlation between those two columns?
>
>
>
>>               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
>>                     ->  Nested Loop  (cost=0.00..5757.17 rows=17 width=51) (actual time=0.467..3216.342 rows=48563 loops=1)
>>                           ->  Nested Loop  (cost=0.00..5606.39 rows=30 width=42) (actual time=0.381..1677.014 rows=48563 loops=1)
>>                                 ->  Index Scan using es_sc_index on es  (cost=0.00..847.71 rows=301 width=8) (actual time=0.184..46.519 rows=5863 loops=1)
>>                                       Index Cond: ((spec = 122293729) AND (co = 117305223::oid))
>>
>>
>
>That's the root of your problem. The optimizer is off by a factor of 20:
>it thinks these two columns are much more selective than they are.
>
>
>
>>                                 ->  Index Scan using bi_env_index on bi  (cost=0.00..15.80 rows=1 width=42) (actual time=0.052..0.218 rows=8 loops=5863)
>>                                       Index Cond: ("outer".en = bi.en)
>>                                       Filter: ((rc = 130170467::oid) AND (co = 117305223::oid) AND (hide = false))
>>
>>
>
>It also thinks these three columns are much more selective than they are.
>
>How accurate are its estimates if you just do these?
>
>explain analyze select * from es where spec = 122293729
>explain analyze select * from es where co = 117305223::oid
>explain analyze select * from bi where rc = 130170467::oid
>explain analyze select * from bi where co = 117305223
>explain analyze select * from bi where hide = false
>
>If they're individually accurate then you've run into the familiar problem of
>needing cross-column statistics. If they're individually inaccurate then you
>should try raising the targets on those columns with:
>
>ALTER TABLE [ ONLY ] name [ * ]
>    ALTER [ COLUMN ] column SET STATISTICS integer
>
>and reanalyzing.
>
>
>Dirk Lutzebaeck <lutzeb@aeccom.com> writes:
>
>
>
>>Can someone please explain why the temp file is so huge? I understand
>>there are a lot of rows.
>>
>>
>
>Well that I can't explain. 22k rows of width 1361 doesn't sound so big to me
>either. The temporary table does need to store three copies of the records at
>a given time, but still it sounds like an awful lot.
>
>
>
>


Re: query produces 1 GB temp file

From
Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Date:
John,

I'm doing VACUUM ANALYZE once a night. Before the tests I did VACUUM and
then ANALYZE.

Dirk

John A Meinel wrote:

> Dirk Lutzebaeck wrote:
>
>> Greg,
>>
>> Thanks for your analysis. But I don't get any better after bumping
>> STATISTICS target from 10 to 200.
>> explain analyze shows that the optimizer is still way off estimating
>> the rows. Is this normal? It still produces a 1 GB temp file.
>> I simplified the query a bit, now only two tables are involved (bi,
>> df). I also vacuumed.
>
>
>
> Are you just doing VACUUM? Or are you doing VACUUM ANALYZE? You might
> also try VACUUM FULL ANALYZE (in the case that you have too many dead
> tuples in the table).
>
> VACUUM cleans up, but doesn't adjust any planner statistics without
> ANALYZE.
>
> John
> =:->
>


Re: query produces 1 GB temp file

From
Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Date:
Greg Stark wrote:

>I gave a bunch of "explain analyze select" commands to test estimates for
>individual columns. What results do they come up with? If those are inaccurate
>then raising the statistics target is a good route. If those are accurate
>individually but the combination is inaccurate then you have a more difficult
>problem.
>
>
>
After setting the new statistics target to 200 the estimates got slightly
better, but they are still not accurate. The results were attached to my
last post. Here is a copy:



explain analyze select * from bi where rc=130170467;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------

Seq Scan on bi (cost=0.00..41078.76 rows=190960 width=53) (actual
time=0.157..3066.028 rows=513724 loops=1)
Filter: (rc = 130170467::oid)
Total runtime: 4208.663 ms
(3 rows)


explain analyze select * from bi where co=117305223;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------

Seq Scan on bi (cost=0.00..41078.76 rows=603988 width=53) (actual
time=0.021..3692.238 rows=945487 loops=1)
Filter: (co = 117305223::oid)
Total runtime: 5786.268 ms
(3 rows)

Here is the distribution of the data in bi:
select count(*) from bi;

 1841966


select count(*) from bi where rc=130170467::oid;

 513732


select count(*) from bi where co=117305223::oid;

 945503





Re: query produces 1 GB temp file

From
Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Date:
Hi John,

thanks very much for your analysis. I'll probably need to reorganize
some things.

Regards,

Dirk

John A Meinel wrote:

> Dirk Lutzebaeck wrote:
>
>> Hi,
>>
>> here is a query which produces a temp file of over 1 GB in pgsql_tmp. This
>> is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
>> sort_mem and 320MB shared_mem.
>>
>> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
>> tables have been analyzed before.
>>
>> Can someone please explain why the temp file is so huge? I understand
>> there are a lot of rows.
>>
>> Thanks in advance,
>>
>> Dirk
>>
>>
> ...
>
>>               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361)
>> (actual time=34.104..18016.005 rows=703677 loops=1)
>>
>>
> Well, there is this particular query where it thinks there will only
> be 3 rows, but in fact there are 703,677 of them. And the previous line:
>
>>         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual
>> time=75357.448..75499.263 rows=22439 loops=1)
>>
>>
> This seems to indicate that after sorting you still have 22,439 rows, which
> then get pared down to 1000.
>
> I'm assuming that the sort you are trying to do is extremely
> expensive. You are sorting 700k rows, which takes up too much memory
> (1GB), which forces it to create a temporary table, and write it out
> to disk.
>
> I didn't analyze it a lot, but you might get a lot better performance
> from doing a subselect, rather than the query you wrote.
>
> You are joining 4 tables (bi, en, df AS ft, es) I don't know which
> tables are what size. In the end, though, you don't really care about
> the en table or es tables (they aren't in your output).
>
> So maybe one of your subselects could be:
>
> where bi.en = (select en from es where es.co = bi.co and
> es.spec=122293729);
>
> I'm pretty sure the reason you need 1GB of temp space is because at
> one point you have 700k rows. Is it possible to rewrite the query so
> that it does more filtering earlier? Your distinct criteria seems to
> filter it down to 20k rows. So maybe it's possible to do some sort of
> a distinct in part of the subselect, before you start joining against
> other tables.
>
> If you have that much redundancy, you might also need to think of
> doing a different normalization.
>
> Just some thoughts.
>
> Also, I thought using the "oid" column wasn't really recommended,
> since in *high* volume databases they aren't even guaranteed to be
> unique. (I think it is a 32-bit number that rolls over.) Also on a
> database dump and restore, they don't stay the same, unless you take a
> lot of extra care that they are included in both the dump and the
> restore. I believe it is better to create your own "id" per table (say
> SERIAL or BIGSERIAL).
>
> John
> =:->
>


Re: query produces 1 GB temp file

From
Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Date:
John A Meinel wrote:

> Dirk Lutzebaeck wrote:
>
>> Greg Stark wrote:
>>
>>> I gave a bunch of "explain analyze select" commands to test
>>> estimates for
>>> individual columns. What results do they come up with? If those are
>>> inaccurate
>>> then raising the statistics target is a good route. If those are
>>> accurate
>>> individually but the combination is inaccurate then you have a more
>>> difficult
>>> problem.
>>>
>>>
>>>
>> After  setting the new statistics target to 200 they did slightly
>> better but not accurate. The results were attached to my last post.
>> Here is a copy:
>>
>>
> It does seem that setting the statistics to a higher value would help.
> Since rc=130170467 seems to account for almost 1/3 of the data.
> Probably you have other values that are much less common. So setting a
> high statistics target would help the planner realize that this value
> occurs at a different frequency from the other ones. Can you try other
> numbers and see what the counts are?

There is not much effect from increasing the statistics target any higher.
I guess this is because rc=130170467 takes up such a large portion of the
column's distribution.

> I assume you did do a vacuum analyze after adjusting the statistics
> target.

Yes.

> Also interesting that in the time it took you to place these queries,
> you had received 26 new rows.

Yes, it's a live system...

> And finally, what is the row count if you do
> explain analyze select * from bi where rc=130170467::oid and
> co=117305223::oid;

explain analyze select * from bi where rc=130170467::oid and
co=117305223::oid;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on bi  (cost=0.00..43866.19 rows=105544 width=51) (actual
time=0.402..3724.222 rows=513732 loops=1)
   Filter: ((rc = 130170467::oid) AND (co = 117305223::oid))

Well, the data matching both columns together makes up about 1/4 of the whole
table; the values are not very selective, so it needs to do full scans...
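
To put numbers on the correlation: if the two columns were independent, you
would expect roughly (513,732 / 1,841,966) * (945,503 / 1,841,966) * 1,841,966
= ~264k rows to match both conditions, but the combined filter actually
returns 513,732 rows, i.e. practically every rc=130170467 row also has
co=117305223. That is exactly the cross-column correlation the planner
cannot see.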

> If this is a lot less than say 500k, then probably you aren't going to
> be helped a lot. The postgresql statistics engine doesn't generate
> cross column statistics. It always assumes random distribution of
> data. So if two columns are correlated (or anti-correlated), it won't
> realize that.

105k, that seems to be my problem. Not much randomly distributed data. Does 8.0
address this problem?

> Even so, your original desire was to reduce the size of the
> intermediate step (where you have 700k rows). So you need to try and
> design a subselect on bi which is as restrictive as possible, so that
> you don't get all of these rows. With any luck, the planner will
> realize ahead of time that there won't be that many rows, and can use
> indexes, etc. But even if it doesn't use an index scan, if you have a
> query that doesn't use a lot of rows, then you won't need a lot of
> disk space.

I'll try that. What I have already noticed is that one of my output
columns is quite large, which is why it uses so much temp space. I'll
probably need to sort without that output column and read it in
afterwards using a subselect on the limited result.

Thanks for your help,

Dirk

>
> John
> =:->
>
>>
>> explain analyze select * from bi where rc=130170467;
>> QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on bi (cost=0.00..41078.76 rows=190960 width=53) (actual
>> time=0.157..3066.028 rows=513724 loops=1)
>> Filter: (rc = 130170467::oid)
>> Total runtime: 4208.663 ms
>> (3 rows)
>>
>>
>> explain analyze select * from bi where co=117305223;
>> QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on bi (cost=0.00..41078.76 rows=603988 width=53) (actual
>> time=0.021..3692.238 rows=945487 loops=1)
>> Filter: (co = 117305223::oid)
>> Total runtime: 5786.268 ms
>> (3 rows)
>>
>> Here is the distribution of the data in bi:
>> select count(*) from bi;
>>
>> 1841966
>>
>>
>> select count(*) from bi where rc=130170467::oid;
>>
>> 513732
>>
>>
>> select count(*) from bi where co=117305223::oid;
>>
>> 945503
>>
>>
>>
>


Re: query produces 1 GB temp file

From
Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Date:
Tom,

the original query has more output columns; I reduced it for readability.
Specifically, it returns a persistent object (the flatobj column) which needs
to be processed by the application as the returned result. The cause of the
huge sort space usage seems to be that flatobj is part of the row, so I guess
it is always copied around by the sort algorithm. When I drop flatobj from
the output columns, the size of the temp file drops dramatically. So I'll
probably need to read flatobj after the sorting, from the limited result,
using a subselect.
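
Something along these lines might work (an untested sketch; it assumes flatid
plus docindex is enough to re-identify the df row, so substitute whatever the
real key of df is):

SELECT s.docindex, df.flatobj, s.bi_oid, s.en
FROM (
    -- sort and limit on the narrow columns only, leaving flatobj out
    SELECT DISTINCT ON (df.val_9, df.created, df.flatid)
           df.val_9, df.created, df.flatid, df.docindex,
           bi.oid AS bi_oid, bi.en
    FROM bi, df
    WHERE bi.rc=130170467
      AND bi.en=df.en
      AND bi.co=117305223
      AND bi.hide=FALSE
      AND (df.val_2='DG' OR df.val_2='SK')
      AND df.docstart=1
    ORDER BY df.val_9 ASC, df.created DESC
    LIMIT 1000 OFFSET 0
) AS s
JOIN df ON df.flatid = s.flatid AND df.docindex = s.docindex
ORDER BY s.val_9 ASC, s.created DESC;

That way only the 1000 rows that survive the LIMIT ever drag their flatobj
through the sort, instead of all 700k of them.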

Regards,

Dirk

Tom Lane wrote:

>Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck) writes:
>
>
>>SELECT DISTINCT ON (df.val_9, df.created, df.flatid) df.docindex,
>>df.flatobj, bi.oid, bi.en
>>FROM bi,df
>>WHERE bi.rc=130170467
>>...
>>ORDER BY df.val_9 ASC, df.created DESC
>>LIMIT 1000 OFFSET 0
>>
>>
>
>Just out of curiosity, what is this query supposed to *do* exactly?
>It looks to me like it will give indeterminate results.  Practical
>uses of DISTINCT ON generally specify more ORDER BY columns than
>there are DISTINCT ON columns, because the extra columns determine
>which rows have priority to survive the DISTINCT filter.  With the
>above query, you have absolutely no idea which row will be output
>for a given combination of val_9/created/flatid.
>
>            regards, tom lane
>
>
>


Re: query produces 1 GB temp file

From
Christopher Kings-Lynne
Date:
> I'm doing VACUUM ANALYZE once a night. Before the tests I did VACUUM and
> then ANALYZE.

I'd suggest once an hour on any reasonably active database...

Chris