Re: Proposal: Create index on foreign table - Mailing list pgsql-hackers

From David Fetter
Subject Re: Proposal: Create index on foreign table
Date
Msg-id 20120316170743.GE456@fetter.org
Whole thread Raw
In response to Re: Proposal: Create index on foreign table  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Proposal: Create index on foreign table  (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>)
List pgsql-hackers
On Fri, Mar 16, 2012 at 11:58:29AM +0200, Heikki Linnakangas wrote:
> On 16.03.2012 10:44, Etsuro Fujita wrote:
> > I have a plan to support 'Create index on foreign table' for 9.3.
> > Here is my plan.
> > 
> > The index creation is supported for a flat file such as CSV and a

As others, I don't see a reason to restrict this to some particular
type of FDW.

> > remote table on a RDB e.g., Postgres using CREATE INDEX.  (I
> > thought using a new statement, CREATE FOREIGN INDEX, at first, but
> > I think that CREATE INDEX would be sufficient to define an index
> > for the foreign table.) For a flat file, CREATE INDEX constructs
> > an index in the same way as an index for a regular table.
> 
> I think this belongs completely in the remote data source.

I think this needs to be decided on a case-by-case basis instead.

> If you want to index flat files, create an extra file for it or
> something, and enhance the wrapper so that it can take advantage of
> it. Keeping the index inside the database while the data is
> somewhere else creates a whole new class of problems.

How?  These aren't super different from those for, say, unlogged
tables.

> For starters, how would you keep the index up-to-date when the flat
> file is modified?

One way is to poll the remote source for evidence of such changes
during auto_vacuum or with similar daemon processes.  Another is by
waiting for a signal from an external source such as a NOTIFY.  Which
is more appropriate will again depend on circumstances.

> If you want to take advantage of PostgreSQL's indexing, you'll just
> have to just load the data into a regular table.

I disagree.  Indexing in general allows you to store only log-N index
rows for each N rows in an external table, which could be a very big
win.  Deciding in advance for everyone that this is not worthwhile is
not in our purview.

> > On the other hand, for a remote table, CREATE INDEX collects
> > information about the index on the specified column(s) for the
> > specified table that was created on the remote table.
> 
> I don't see the point of this either. The planner asks the FDW for
> cost estimates, and if the FDW knows about indexes in the remote
> server, it can certainly adjust the estimates accordingly. But
> that's all internal to the FDW. It might want delegate the whole
> cost estimation to the remote server by running EXPLAIN there, or it
> could use its knowledge of indexes that exist there, but I don't see
> why the rest of the system would need to know what indexes there are
> in the remote system.

Good point, for the remote index case, which I contend is not every
one :)

> If the FDW needs that information, it can query the remote server
> for it on first access, and cache the information for the lifetime
> of the session.

Of course, a mere few GB of information queried each time couldn't
possibly cause intolerable overheads...

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: patch: autocomplete for functions
Next
From: Jeff Davis
Date:
Subject: Re: initdb and fsync