Re: cost_rescan (was: match_unsorted_outer() vs. cost_nestloop()) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: cost_rescan (was: match_unsorted_outer() vs. cost_nestloop())
Date
Msg-id 27752.1271697885@sss.pgh.pa.us
Whole thread Raw
In response to Re: cost_rescan (was: match_unsorted_outer() vs. cost_nestloop())  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> One problem with the current implementation of cost_rescan() is that
> it ignores caching effects.

Well, that's intentional, per the head comment for the function.
We might want to extend it later but I'd like to get some field
experience with what it's trying to model now.  I believe that it
is covering the first-order effects, and possible cache effects
would be second-order.

> It seems to be faster to rescan a
> materialize node than it is to rescan a seqscan of a table, even if
> there are no restriction clauses, presumably because you get to skip
> tuple visibility checks and maybe some other overhead, too.

Exactly.  IIRC, tuplestore's on-disk representation is also more compact
(less header overhead, no dead tuples, etc) so the amount of I/O needed
will also be less, if you're doing any at all.  But the code already
knows that scanning a tuplestore is cheaper than scanning a table ---
that doesn't seem to me to be relevant to the question of whether we
need to model cache effects in cost_rescan.

> Another potential problem is that materializing a whole-table seqscan
> to avoid repeating the tuple visibility checks may be a win in some
> strict sense, but there are externalities: it's also going to use a
> lot more memory/disk than just rescanning the table.

This is not specific to materialize, it is part of the generic problem
that we don't model the true costs of using work_mem in each of several
parts of a query.  There have been discussions about how to fix that
before, but no particularly good ideas have emerged.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Standalone backends run StartupXLOG in an incorrect environment
Next
From: Simon Riggs
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop