Re: Cursor-based results: bafflingly slow - Mailing list pgsql-jdbc

From Oliver Jowett
Subject Re: Cursor-based results: bafflingly slow
Date
Msg-id 4A4E11EB.4040809@opencloud.com
Whole thread Raw
In response to Cursor-based results: bafflingly slow  (Robin Houston <robin.houston@gmail.com>)
Responses Re: Cursor-based results: bafflingly slow  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-jdbc
Robin Houston wrote:
> 2009/7/3 Oliver Jowett <oliver@opencloud.com <mailto:oliver@opencloud.com>>:
>> They're not really equivalent. A portal with a limited fetchsize is more
>> like DECLARE CURSOR, not LIMIT.
>>
>> What does the plan look like if you plan it without the LIMIT?
>
> Hmm, well... If I do the same as before, but without the limit, then the
> plan is: [.. different plan with an outermost Sort step ..]

> Surely the server is not so stupid as to use an unnecessary server-side
> sort, if I've explicitly asked to fetch the rows 100 at a time?

I'm not too familiar with the planner any more, but as I understand it
this is what's happening:

Think about the difference between

(1) SELECT * FROM hugetable ORDER BY some_key;
(2) SELECT * FROM hugetable ORDER BY some_key LIMIT 100;

where there is a suitable index on some_key.

For (1) a seqscan + local sort is likely to be faster (because a
sequential scan is faster than doing a random-access index scan over the
entire table)
For (2) an index scan is likely to be faster, because you'll stop after
at most 100 random-access lookups, which is cheaper than scanning the
entire table.

If you take case (1) and set the fetchsize to 100, that doesn't actually
change things at all - it doesn't turn into case (2)! Portals let you
grab results a bit at a time (and let you give up on the query halfway
through easily), but they don't change the underlying plan at all; the
server plans on the assumption that the query will run to completion. In
fact, it doesn't even know the fetchsize that will eventually be used
during execution when it is doing the planning step. (If you look back
at your connection tracing, you'll see that the fetchsize is only
provided in the Execute message; query planning happens back when the
Parse message is sent)

> Is there any easy way to find out what execution plan the server is
> actually using? I suppose I could ask the DBAs to enable auto_explain on
> the dev database.

You should be able to issue an EXPLAIN ANALYZE via JDBC; I can't
remember exactly how the results get back to you though (might be
SQLWarnings, or a separate resultset)

> If this *is* the problem, do you know any way round it. Obviously I want
> the plan that *doesn't* involve sorting everything before returning
> anything.

If you want to stream results promptly, you may be stuck with removing
that ORDER BY. I'm not sure if there's any way to say to the planner
"give me a plan with a low startup cost, even if it has a higher total
cost" which seems to be what you want here.

Or if you really do only care about the first N rows, then put a LIMIT
in your query.

> If I explicitly create a cursor, with no limit, then it does return the
> first 100 rows very quickly. On the other hand, I have to hard-code the
> parameter this way (because I don't know any way, in psql, to use a bind
> variable with an explicit cursor.)

I came across some (very old) posts in the archives that suggested that
DECLARE CURSOR has a fudge where it ranks plan costs assuming that only
some rows (10%?) will actually be fetched. I'm not sure if that's true
of the current planner.

-O

pgsql-jdbc by date:

Previous
From: Robin Houston
Date:
Subject: Cursor-based results: bafflingly slow
Next
From: Tom Lane
Date:
Subject: Re: Cursor-based results: bafflingly slow