On Mon, Aug 6, 2012 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Aug 5, 2012 at 10:41 PM, Etsuro Fujita
>> <fujita.etsuro@lab.ntt.co.jp> wrote:
>>> I think file_fdw is useful for managing log files such as PG CSV logs. Since
>>> often, such files are sorted by timestamp, I think the patch can improve the
>>> performance of log analysis, though I have to admit my demonstration was not
>>> realistic.
>
>> Hmm, I guess I could buy that as a plausible use case.
>
> In the particular case of PG log files, I'd bet good money against them
> being *exactly* sorted by timestamp. Clock skew between backends, or
> varying amounts of time to construct and send messages, will result in
> small inconsistencies. This would generally not matter, until the
> planner relied on the claim of sortedness for something like a mergejoin
> ... and then it would matter a lot.

Hmm, true.

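To make the failure mode concrete, here is a minimal Python sketch of a
merge join that trusts its left input to be sorted. It is not PostgreSQL's
executor code: merge_join and the sample keys are invented for
illustration, and keys are assumed unique. Two adjacent out-of-order
timestamps are enough to silently drop a matching row:

```python
def merge_join(left, right):
    """Inner merge join on two key lists.

    Both inputs are ASSUMED to be sorted ascending with unique keys;
    nothing here verifies that claim, which is the point of the example.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1          # left key too small: advance left
        elif left[i] > right[j]:
            j += 1          # right key too small: advance right
        else:
            out.append(left[i])
            i += 1
            j += 1
    return out


# Hypothetical log timestamps: 3 and 4 arrived in the wrong order.
almost_sorted = [1, 2, 4, 3, 5]
other_side = [1, 2, 3, 4, 5]

print(merge_join(almost_sorted, other_side))          # key 3 is lost
print(merge_join(sorted(almost_sorted), other_side))  # all five keys match
```

No error is raised in the first call; the result is just wrong, which is
why a false claim of sortedness is worse than no claim at all.
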
> In general I'm quite suspicious of the idea of believing that externally
> supplied data is sorted in exactly the way that PG thinks it should
> sort. If we implement this you can bet that people will screw up, for
> instance by using the wrong locale/collation to sort text data.

I think that optimizations like this are going to be essential for
things like pgsql_fdw (or other_rdbms_fdw).  Despite the thorny
semantic issues, we're just not going to be able to get around it.
There will even be people who want SELECT * FROM ft ORDER BY 1 to
order by the remote side's notion of ordering rather than ours,
despite the fact that the remote side has some insane-by-PG-standards
definition of ordering. People are going to find ways to do that kind
of thing whether we condone it or not, so we might as well start
thinking now about how we're going to live with it. But that doesn't
answer the question of whether or not we ought to support it for
file_fdw in particular, which seems like a more arguable point.
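
As a rough illustration of the ordering mismatch, the Python sketch below
uses case-insensitive ordering as a stand-in for a remote side's collation
and plain code-point ordering as a stand-in for ours (roughly what the C
locale does for ASCII text). Both choices, the helper is_sorted, and the
sample rows are assumptions made for the example only. Data the remote
side considers sorted is not sorted by our rules:

```python
def is_sorted(values, key=lambda v: v):
    """Return True if values are in non-descending order under key."""
    keys = [key(v) for v in values]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Hypothetical rows as a remote side with case-insensitive ordering
# would return them for ORDER BY 1.
remote_rows = sorted(["Zebra", "apple", "Mango"], key=str.casefold)

print(remote_rows)                               # ['apple', 'Mango', 'Zebra']
print(is_sorted(remote_rows, key=str.casefold))  # True: sorted by remote rules
print(is_sorted(remote_rows))                    # False: not sorted by code point
```
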
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company