Alvaro Herrera wrote:
> Tom Lane wrote:
>> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>>> For the record, this patch has a small negative impact on scans like
>>> "SELECT * FROM foo LIMIT 1000". If such a scan is run repeatedly, in CVS
>>> HEAD the first 1000 rows will stay in buffer cache, but with the patch
>>> each scan will start from roughly where the previous one stopped, requiring
>>> more pages to be read from disk each time. I don't think it's something
>>> to worry about in practice, but I thought I'd mention it.
>> Urgh. The answers change depending on (more or less) the phase of the
>> moon? I've got a serious problem with that. You might look back to
>> 1997 when GEQO very nearly got tossed out entirely because it destroyed
>> reproducibility of query results.
>
> What about the simple idea of just disabling the use of a sync scan when
> the query has LIMIT and no ORDER BY, and always starting at block 0 in
> that case?
That handles the LIMIT case, but you would still observe different row
orderings between runs. And some people implement LIMIT-like behavior on
the client side, by opening a cursor and fetching only the first n rows.
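For illustration, a client can get LIMIT-like behavior with no LIMIT
anywhere in the query text (just a sketch, reusing the foo table from the
example upthread):

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM foo;
FETCH 1000 FROM c;  -- client stops after the first n rows
CLOSE c;
COMMIT;

A planner check for LIMIT never sees this, so the scan would still
synchronize and the fetched rows would still vary.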
I don't think anyone can reasonably expect to get the same ordering when
the same query is issued twice in general, but within the same transaction
it wouldn't be that unreasonable. If we care about that, we could keep
track of starting locations per transaction, do the synchronization only
on the first scan of a relation in a transaction, and start subsequent
scans from the same page as the first one. That way, if you issue the
same query twice in a transaction, or do something like:
BEGIN;
SELECT * FROM queue FOR UPDATE LIMIT 10;
-- do stuff..
-- (DELETE takes no LIMIT clause; a ctid subselect stands in for it)
DELETE FROM queue WHERE ctid IN (SELECT ctid FROM queue LIMIT 10);
COMMIT;
you'd get the expected result.
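The same-query-twice case would look like this (again just a sketch of
the proposed per-transaction behavior, nothing that exists today):

BEGIN;
SELECT * FROM foo LIMIT 1000;  -- first scan: synchronize, remember the start page
SELECT * FROM foo LIMIT 1000;  -- restarts from the remembered page, same rows
COMMIT;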
I think the warning on LIMIT without ORDER BY is a good idea, regardless
of the synchronized scans patch.
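Such a warning would fire on the query from the example upthread; making
it deterministic only takes an ORDER BY on a unique key (id is a
hypothetical column here):

-- row set depends on where the scan happens to start:
SELECT * FROM foo LIMIT 1000;
-- stable no matter where the scan starts:
SELECT * FROM foo ORDER BY id LIMIT 1000;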
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com