On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> This whole discussion seems to me to be about trying to do things outside
>>> the FDW that should properly be left inside the FDW. Who's to say that
>>> the remote side even *has* statistics of the sort that PG creates?
>>>
>>> We should provide an API that lets the FDW return a cost estimate for a
>>> proposed access path. Where it gets the cost estimate from is not
>>> something that should be presupposed.
>
>> Unless there's some way for the FDW to have local tables for caching
>> its statistics, the chances of this having decent performance seem to
>> be near-zero.
>
> Perhaps, but that would be the FDW's problem to implement. Trying to
> design such tables in advance of actually writing an FDW seems like a
> completely backwards design process.

Oh, I agree. I don't want to dictate the structure of those tables; I
just think it's inevitable that an FDW is going to need the ability to
be bound to some local tables which the admin should set up before
installing it. That is, we need a general capability, not a specific
set of tables.

> (I'd also say that your performance estimate is miles in advance of any
> facts; but even if it's true, the caching ought to be inside the FDW,
> because we have no clear idea of what it will need to cache.)

I can't imagine how an FDW could possibly be expected to perform well
without some persistent local data storage. Even assuming the remote
end is PG, to return a cost it's going to need the contents of
pg_statistic cached locally, for each remote table. Do you really
think it's going to work to incur that overhead once per table per
backend startup? The alternative is that every time we try to plan
against a foreign table, we fire off an SQL query to the remote side
instead of trying to compute the cost locally. That's got to be two
orders of magnitude slower than planning based off local stats.

We could punt the issue of stats altogether for the first version and
simply say, hey, this is only intended for things like reading from
CSV files. But if we're going to have it at all then I can't see how
we're going to get by without persistent local storage.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company