Re: WIP Patch: Use sortedness of CSV foreign tables for query planning - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: WIP Patch: Use sortedness of CSV foreign tables for query planning
Msg-id 005a01cd737d$06d549c0$147fdd40$@lab.ntt.co.jp
In response to Re: WIP Patch: Use sortedness of CSV foreign tables for query planning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: WIP Patch: Use sortedness of CSV foreign tables for query planning
List pgsql-hackers
Hi Robert,

> From: Robert Haas [mailto:robertmhaas@gmail.com]

> On Thu, Aug 2, 2012 at 7:01 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp> wrote:
> > The following is a comment at fileGetForeignPaths() in contrib/file_fdw.c:
> >
> >     /*
> >      * If data file was sorted, and we knew it somehow, we could insert
> >      * appropriate pathkeys into the ForeignPath node to tell the planner
> >      * that.
> >      */
> >
> > To do this, I would like to propose new generic options for a file_fdw foreign
> > table to specify the sortedness of a data file.  While it would be best to allow
> > specifying the sortedness on multiple columns, the current interface for the
> > generic options does not seem to be suitable for doing it.  As a compromise, I
> > would like to propose single-column sortedness options and insert appropriate
> > pathkeys into the ForeignPath node based on this information:
> 
> I am not sure it is a good idea to complicate file_fdw with frammishes
> of marginal utility.  I guess I tend to view things like file_fdw as a
> mechanism for getting the data into the database, not necessarily
> something that you actually want to keep your data in permanently and
> run complex queries against.

I think file_fdw is useful for managing log files such as PostgreSQL CSV logs.
Since such files are often sorted by timestamp, I think the patch can improve
the performance of log analysis, though I have to admit my demonstration was
not realistic.
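Declaring a file as sorted only helps if the file really is in that order; otherwise the planner would return wrongly ordered results. The following is a minimal sketch, not part of file_fdw or the patch, of how one might check that assumption for a CSV log before declaring it. It assumes the first field is an unquoted timestamp such as "2012-08-02 07:01:00.123 JST" containing no comma, and that every line uses the same format and time zone, so byte-wise comparison matches chronological order. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the first CSV field (up to the first comma). */
size_t first_field_len(const char *line)
{
    const char *comma = strchr(line, ',');
    return comma ? (size_t) (comma - line) : strlen(line);
}

/*
 * Hypothetical helper: return true if the lines are in non-decreasing order
 * of their first field.  Assumes an unquoted, fixed-format timestamp in the
 * first field, so byte-wise order equals chronological order.
 */
bool csv_sorted_by_first_field(const char *const *lines, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        size_t prev_len = first_field_len(lines[i - 1]);
        size_t cur_len = first_field_len(lines[i]);
        size_t min_len = prev_len < cur_len ? prev_len : cur_len;
        int cmp = memcmp(lines[i - 1], lines[i], min_len);

        /* On an equal common prefix, the shorter field sorts first. */
        if (cmp > 0 || (cmp == 0 && prev_len > cur_len))
            return false;
    }
    return true;
}
```

Such a check costs a full pass over the file, so it would make sense as a one-time validation when the log is rotated, not on every query.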

> It seems like that's the direction we're
> headed in here - statistics, indexing, etc.  I am all in favor of
> having some kind of pluggable storage engine as an alternative to our
> heap, but I'm not sure a flat-file is a good choice.

As you pointed out, I would eventually like to allow indexing of CSV foreign
tables, but that is a separate problem.  Neither the submitted patch nor the
comment above is a step toward indexing; it is simply an optimization of the
current file_fdw module.
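To make the scope of that optimization concrete, here is a toy model of the decision that single-column pathkeys would let the planner make: if the declared sort order of the file matches the requested ORDER BY, no explicit Sort step is needed. The struct and function names are hypothetical and for illustration only; PostgreSQL's planner actually represents sort orders as lists of PathKey nodes, and this sketch ignores NULLS FIRST/LAST and collations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical): a single-column sort order declared for a file. */
typedef struct DeclaredSortOrder
{
    const char *column;      /* column the data file is sorted by, or NULL */
    bool        descending;  /* true if sorted in descending order */
} DeclaredSortOrder;

/*
 * Return true if reading the file in its stored order already satisfies
 * "ORDER BY want_column [DESC]", i.e. a Sort step could be omitted.
 * NULLS FIRST/LAST and collations are deliberately ignored here.
 */
bool declared_order_satisfies(const DeclaredSortOrder *decl,
                              const char *want_column, bool want_descending)
{
    if (decl == NULL || decl->column == NULL)
        return false;   /* nothing declared: an explicit sort is required */
    return strcmp(decl->column, want_column) == 0 &&
           decl->descending == want_descending;
}
```

For example, a log file declared as sorted ascending by a column named log_time would satisfy ORDER BY log_time but not ORDER BY log_time DESC, since file_fdw reads the file front to back.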

Thanks,

Best regards,
Etsuro Fujita
