Re: Proposal: Create index on foreign table - Mailing list pgsql-hackers
From | David Fetter |
---|---|
Subject | Re: Proposal: Create index on foreign table |
Date | |
Msg-id | 20120316170743.GE456@fetter.org Whole thread Raw |
In response to | Re: Proposal: Create index on foreign table (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses |
Re: Proposal: Create index on foreign table
|
List | pgsql-hackers |
On Fri, Mar 16, 2012 at 11:58:29AM +0200, Heikki Linnakangas wrote: > On 16.03.2012 10:44, Etsuro Fujita wrote: > > I have a plan to support 'Create index on foreign table' for 9.3. > > Here is my plan. > > > > The index creation is supported for a flat file such as CSV and a As others, I don't see a reason to restrict this to some particular type of FDW. > > remote table on a RDB e.g., Postgres using CREATE INDEX. (I > > thought using a new statement, CREATE FOREIGN INDEX, at first, but > > I think that CREATE INDEX would be sufficient to define an index > > for the foreign table.) For a flat file, CREATE INDEX constructs > > an index in the same way as an index for a regular table. > > I think this belongs completely in the remote data source. I think this needs to be decided on a case-by-case basis instead. > If you want to index flat files, create an extra file for it or > something, and enhance the wrapper so that it can take advantage of > it. Keeping the index inside the database while the data is > somewhere else creates a whole new class of problems. How? These aren't super different from those for, say, unlogged tables. > For starters, how would you keep the index up-to-date when the flat > file is modified? One way is to poll the remote source for evidence of such changes during auto_vacuum or with similar daemon processes. Another is by waiting for a signal from an external source such as a NOTIFY. Which is more appropriate will again depend on circumstances. > If you want to take advantage of PostgreSQL's indexing, you'll just > have to just load the data into a regular table. I disagree. Indexing in general allows you to store only log-N index rows for each N rows in an external table, which could be a very big win. Deciding in advance for everyone that this is not worthwhile is not in our purview. > > On the other hand, for a remote table, CREATE INDEX collects > > information about the index on the specified column(s) for the > > specified table that was created on the remote table. > > I don't see the point of this either. The planner asks the FDW for > cost estimates, and if the FDW knows about indexes in the remote > server, it can certainly adjust the estimates accordingly. But > that's all internal to the FDW. It might want delegate the whole > cost estimation to the remote server by running EXPLAIN there, or it > could use its knowledge of indexes that exist there, but I don't see > why the rest of the system would need to know what indexes there are > in the remote system. Good point, for the remote index case, which I contend is not every one :) > If the FDW needs that information, it can query the remote server > for it on first access, and cache the information for the lifetime > of the session. Of course, a mere few GB of information queried each time couldn't possibly cause intolerable overheads... Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
pgsql-hackers by date: