
From: Greg Stark
Subject: Re: New hook after raw parsing, before analyze
Date:
Msg-id: CAM-w4HPv6Cpd+LsEEoO8V48TbvAtxhaTTEP1VY4dVR0jD9q4AA@mail.gmail.com
In response to: Re: New hook after raw parsing, before analyze (David Beck <dbeck@starschema.net>)
Responses: Re: New hook after raw parsing, before analyze (David Beck <dbeck@starschema.net>)
List: pgsql-hackers
On Sat, Feb 15, 2014 at 2:06 PM, David Beck <dbeck@starschema.net> wrote:
> - when the query arrives a smart rewrite would know 1) what tables are local 2) what tables need new catalog entries 3) what can be joined on the other side
>
> - the rewriter would potentially add SQL statements in the beginning of the query for creating the missing FDW catalog entries if needed
>
> - the FDW would be handled by the same extension so they can easily talk to each other about the status of the objects, so the rewriter would know if the background transfer of the small table is completed and should do the rewrite accordingly

I think you have a pretty big impedance mismatch with Postgres, which
assumes the schema of the database is fairly static and known when
parsing begins. To do what you describe you pretty much need to write
your own SQL parser.

There is a hook "post_parse_analyze_hook", but I think it comes too
late: it runs after the analyze step, which is when Postgres looks
up the schema information for every relation mentioned in the query.
What you would need is a post_parse_hook which would work on the raw
parse tree before the analyze step. That doesn't seem terribly
controversial to add, though there may be some technical details. The
API would of course be completely unstable from major release to major
release -- the parse tree gets knocked around quite a bit.
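
For what it's worth, this is roughly how an extension installs the
existing hook today (the function name and module boilerplate below are
just illustrative, modelled on what pg_stat_statements does); a
post_parse_hook would presumably be wired up the same way, only it would
be handed the raw parse tree before any catalog lookups happen:

#include "postgres.h"
#include "fmgr.h"
#include "parser/analyze.h"

PG_MODULE_MAGIC;

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

static void
my_post_parse_analyze(ParseState *pstate, Query *query)
{
    /*
     * Too late for your purposes: by the time this runs, every relation
     * mentioned in the query has already been looked up in the catalogs,
     * so a reference to a not-yet-created foreign table has already
     * errored out.
     */
    if (prev_post_parse_analyze_hook)
        prev_post_parse_analyze_hook(pstate, query);
}

void
_PG_init(void)
{
    /* Chain onto any previously installed hook, as extensions usually do. */
    prev_post_parse_analyze_hook = post_parse_analyze_hook;
    post_parse_analyze_hook = my_post_parse_analyze;
}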

But that only gets you so far. I think you would be able to get lazy
creation of schema objects fairly straightforwardly, i.e. any legacy
objects referenced in a query get corresponding FDW schema objects
created before analyze is done -- though I would expect a few gotchas,
possibly deadlocks, with concurrent queries creating the same objects.
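
In such a hook your extension would have to walk the raw tree itself
and pick out the table references, something along these lines. This is
very much a sketch: the two helper functions are hypothetical things
your extension would provide, and a real version would also have to
recurse into joins, sublinks, CTEs, and the other statement types, and
take a lock around the creation step to deal with the concurrent-query
problem mentioned above:

#include "postgres.h"
#include "nodes/parsenodes.h"

/* Hypothetical helpers the extension itself would have to provide. */
extern bool legacy_object_has_foreign_table(const char *relname);
extern void legacy_object_create_foreign_table(const char *relname);

/* Hypothetically called on each raw statement before parse analysis. */
static void
create_missing_fdw_objects(Node *parsetree)
{
    if (IsA(parsetree, SelectStmt))
    {
        SelectStmt *stmt = (SelectStmt *) parsetree;
        ListCell   *lc;

        foreach(lc, stmt->fromClause)
        {
            Node       *item = (Node *) lfirst(lc);

            if (IsA(item, RangeVar))
            {
                RangeVar   *rv = (RangeVar *) item;

                /* Lazily create the foreign table for a legacy object. */
                if (!legacy_object_has_foreign_table(rv->relname))
                    legacy_object_create_foreign_table(rv->relname);
            }
        }
    }
}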

But I don't think that gets you the joins which I think would be quite a battle.

And I have to wonder if you aren't going the long way around to do
something that can be done more simply some other way. If you have
150k objects I wonder if your objects aren't all very similar and
could be handled by a single Postgres schema object. Either a single
FDW object or a simple function.

If they really are different schema objects, perhaps a single function
that returns a more flexible data type like an hstore blob? That has
obvious disadvantages over an object that the planner understands
better, but at least you would be in a well-supported, stable API (well,
for interesting definitions of "stable", given 9.4 will have a whole
new version of hstore).

As a side note, you should evaluate carefully what lazily creating
objects will buy you. Perhaps just creating all 150k objects up front
would be cheaper than maintaining this code. In particular, since the
user *might* access all 150k, you still have to worry about the worst
case anyway, and it might be cheaper to just engineer for it in the
first place.

-- 
greg


