Thread: Retry: Is this possible / slow performance?

Retry: Is this possible / slow performance?

From
"Joost Kraaijeveld"
Date:
Hi all,

A retry of the question asked before. All tables freshly vacuumed an analized.

Two queries: one with "set enable_seqscan = on" , the other with "set enable_seqscan = off". The first query lasts
59403ms, the second query 31 ms ( the desc order variant has the same large difference: 122494 ms vs. 1297 ms). (for
thequery plans see below). 

Can I, without changing the SQL (because it is generated by a tool) or explicitely setting "set enable_seqscan = off"
forthis query, trick PostgreSQL in taking the fast variant of the queryplan? 

TIA


Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl


------------------------------- Query 1

begin;
set enable_seqscan = on;
declare SQL_CUR01 cursor for
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer=B.Klantnummer
ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;
commit;

QUERY PLAN
Sort  (cost=259968.77..262729.72 rows=1104380 width=12)
  Sort Key: a.klantnummer, a.ordernummer
  ->  Hash Left Join  (cost=42818.43..126847.70 rows=1104380 width=12)
        Hash Cond: ("outer".klantnummer = "inner".klantnummer)
        ->  Seq Scan on orders a  (cost=0.00..46530.79 rows=1104379 width=8)
        ->  Hash  (cost=40635.14..40635.14 rows=368914 width=4)
              ->  Seq Scan on klt_alg b  (cost=0.00..40635.14 rows=368914 width=4)

Actual running time:  59403 ms.

------------------------------- Query 2

begin;
set enable_seqscan = off;
declare SQL_CUR01 cursor for
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer=B.Klantnummer
ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;
commit;

QUERY PLAN
Merge Left Join  (cost=0.00..2586604.86 rows=1104380 width=12)
  Merge Cond: ("outer".klantnummer = "inner".klantnummer)
  ->  Index Scan using orders_klantnummer on orders a  (cost=0.00..2435790.17 rows=1104379 width=8)
  ->  Index Scan using klt_alg_klantnummer on klt_alg b  (cost=0.00..44909.11 rows=368914 width=4)

Actual running time: 31 ms.



Re: Retry: Is this possible / slow performance?

From
Tom Lane
Date:
"Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes:
> Two queries: one with "set enable_seqscan = on" , the other with "set enable_seqscan = off". The first query lasts
59403ms, the second query 31 ms ( the desc order variant has the same large difference: 122494 ms vs. 1297 ms). (for
thequery plans see below). 

The reason for the difference is that the mergejoin plan has a much
lower startup cost than the hash plan, and since you're only fetching
100 rows the startup cost is dominant.  IIRC the planner does make some
allowance for this effect when preparing a DECLARE CURSOR plan (ie,
it puts some weight on startup cost rather than considering only total
cost) ... but it's not so optimistic as to assume that you only want 100
out of an estimated 1 million+ result rows.

The best solution is probably to put a LIMIT into the DECLARE CURSOR,
so that the planner can see how much you intend to fetch.

            regards, tom lane

Re: Retry: Is this possible / slow performance?

From
PFC
Date:
    Does the planner also take into account that the Hash Join will need a
huge temporary space which will exist for the whole length of the cursor
existence (which may be quite long if he intends to fetch everything),
whereas the Merge Join should need very little space as it is sending the
rows as it fetches them using the Indexes ?




On Mon, 07 Feb 2005 12:03:56 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes:
>> Two queries: one with "set enable_seqscan = on" , the other with "set
>> enable_seqscan = off". The first query lasts 59403 ms, the second query
>> 31 ms ( the desc order variant has the same large difference: 122494 ms
>> vs. 1297 ms). (for the query plans see below).
>
> The reason for the difference is that the mergejoin plan has a much
> lower startup cost than the hash plan, and since you're only fetching
> 100 rows the startup cost is dominant.  IIRC the planner does make some
> allowance for this effect when preparing a DECLARE CURSOR plan (ie,
> it puts some weight on startup cost rather than considering only total
> cost) ... but it's not so optimistic as to assume that you only want 100
> out of an estimated 1 million+ result rows.
>
> The best solution is probably to put a LIMIT into the DECLARE CURSOR,
> so that the planner can see how much you intend to fetch.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>



Re: Retry: Is this possible / slow performance?

From
"Joost Kraaijeveld"
Date:
>> The best solution is probably to put a LIMIT into the DECLARE CURSOR,
>> so that the planner can see how much you intend to fetch.
I assume that this limits the resultset to a LIMIT. That is not what I was hoping for. I was hoping for a way to
scrolllthrought the whole tables with orders. 

I have tested, and if one really wants the whole table the query with "set enable_seqscan = on" lasts 137 secs, the
querywith "set enable_seqscan = off" lasts 473 secs, so (alas), the planner is right.  

I sure would like to have ISAM like behaviour once in a while.


Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl

Re: Retry: Is this possible / slow performance?

From
"Merlin Moncure"
Date:
> >> The best solution is probably to put a LIMIT into the DECLARE
CURSOR,
> >> so that the planner can see how much you intend to fetch.
> I assume that this limits the resultset to a LIMIT. That is not what I
was
> hoping for. I was hoping for a way to scrolll throught the whole
tables
> with orders.
>
> I have tested, and if one really wants the whole table the query with
"set
> enable_seqscan = on" lasts 137 secs, the query with "set
enable_seqscan =
> off" lasts 473 secs, so (alas), the planner is right.
>
> I sure would like to have ISAM like behaviour once in a while.

Then stop using cursors.  A few months back I detailed the relative
merits of using Cursors v. Queries to provide ISAM like functionality
and Queries win hands down.  Right now I am using pg as an ISAM backend
for a relatively old and large COBOL ERP via a C++ ISAM driver, for
which a publicly available version of the source will be available Real
Soon Now :-).

Merlin