Re: A Replication Idea - Mailing list pgsql-general
From | Medi Montaseri |
---|---|
Subject | Re: A Replication Idea |
Date | |
Msg-id | 3C759F80.9CEA2CB6@cybershell.com |
In response to | Re: A Replication Idea ("Command Prompt, Inc." <pgsql-general@commandprompt.com>) |
List | pgsql-general |
high level SQL language. The proxy should delve into a deeper layer after
the plan has been written and before the execution kicks in.
In other words, you take a PG engine, peel off the front end (the parser and
planner part), and then slip in a layer before the execution.
See your installation docs, "Chap 2, Section 2.1 The Path of a Query"
The path is:
Connection, Parser Stage, Rewrite System, Planner/Optimizer, Executor.
In fact the name is already there: "Planner/Optimizer". What we want is
optimization. I know people usually mean a different thing by that, but why
not? HA is optimization as well...
By the way, I got this idea from the Solaris Virtual File System (VFS); I call
this VDB (Virtual DataBase).
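
To make the layering concrete, here is a rough sketch of where such a layer could sit. It is only an illustration in Python, not PostgreSQL code: `parse`, `rewrite`, `plan`, `Executor` and `VDBLayer` are hypothetical stand-ins for the stages named above, and the "plan" is just the statement text. The only point it tries to show is that the layer receives a finished plan and hands it to every node's executor, much as VFS hands an operation to whichever file system sits underneath.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    """Hypothetical stand-in for a finished execution plan."""
    sql: str


def parse(sql: str) -> str:
    # Placeholder for the Parser Stage; a real parser builds a parse tree.
    return sql.strip()


def rewrite(tree: str) -> str:
    # Placeholder for the Rewrite System (rules, views).
    return tree


def plan(tree: str) -> Plan:
    # Placeholder for the Planner/Optimizer.
    return Plan(sql=tree)


@dataclass
class Executor:
    """Hypothetical per-node executor; it only records what it was given."""
    name: str
    log: List[str] = field(default_factory=list)

    def run(self, p: Plan) -> str:
        self.log.append(p.sql)
        return f"{self.name}: {p.sql}"


class VDBLayer:
    """Assumed layer slipped in between the planner and the executors."""

    def __init__(self, executors: List[Executor]) -> None:
        self.executors = executors

    def execute(self, p: Plan) -> str:
        # Hand the same plan to every node, answer with the first node's result.
        results = [ex.run(p) for ex in self.executors]
        return results[0]


def run_query(sql: str, layer: VDBLayer) -> str:
    # Connection -> Parser -> Rewrite -> Planner/Optimizer -> (VDB) -> Executor
    return layer.execute(plan(rewrite(parse(sql))))


if __name__ == "__main__":
    layer = VDBLayer([Executor("node1"), Executor("node2")])
    print(run_query("SELECT 1", layer))
```

Because the layer works on the plan rather than on the SQL text, it would not need a second parser of its own; whether a real plan can be shipped between nodes like this is an open question that the sketch does not answer.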
"Command Prompt, Inc." wrote:
>How would it handle functions, which could potentially modify data, even
>from a select statement?

It seems that you'd have two options, if you wanted the proxy to be truly
transparent to the client:

1. Send ALL SQL statements down the wire to each node, including SELECT
   statements, since selected functions may modify data.

2. Write a small, fast, reliable parser that checks for criteria which
   would make the statement potentially data-modifying (e.g., the
   existence of a function), and send only data-modifying SELECTs along
   with your standard UPDATEs, DELETEs, etc.

However, it probably just occurred to you all as it just occurred to me
that this is pretty moot, because functions aren't the only concern: you
could have a trigger on a table that would wipe out idea #2. ;)

Really, there are too many transparent ways data can be modified by
seemingly innocuous statements, so parsing a statement for distribution
is right out; it seems as though each node is going to have to require a
copy of EACH statement that the proxy runs into in order to maintain 100%
integrity.

However, that doesn't mean your proxy needs to get answer back from all of
the nodes in terms of result sets. Something as simple as a systemic
packet indicating that the downstream-execution was successful would be
enough data for the proxy to know what's going on, provided it knows it
should get its answer soon from another node (e.g., the node with the
lowest load).

Result sets could still be cached based on a statement, within some
specified degree of accuracy (e.g., how much time elapses before a cached
resultset expires); you'd just need to make sure that even though you're
returning a cached result set, you still send the request to each back-end
to get processed in its own time.

Seems like some *really* careful threading might be called for; one thread
to listen to incoming traffic, from which downstream events are queued up,
another thread sending off those events to the back-end in the order they
were received, and another thread listening for answers from nodes, and
queueing up responses to be sent back to the appropriate client's socket.

Regards,
Jw.
--
jlx@commandprompt.com, by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/
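
The threading arrangement described in the quoted message can be sketched roughly as follows. This is a simplified illustration in Python, not a working proxy: `Node`, `run_proxy` and the `load` figure are hypothetical, the "listener" reads from a list instead of a socket, and the dispatcher calls each node synchronously rather than waiting on the wire. It does keep the two properties discussed above: every statement goes to every node in the order received, and only the node with the lowest load supplies the result set.

```python
import queue
import threading
from typing import List, Tuple


class Node:
    """Hypothetical back-end; a real one would hold a database connection."""

    def __init__(self, name: str, load: int) -> None:
        self.name = name
        self.load = load          # assumed load figure, lower is better
        self.seen: List[str] = []

    def execute(self, stmt: str) -> str:
        self.seen.append(stmt)
        return f"{self.name} ran {stmt}"


def run_proxy(statements: List[Tuple[int, str]], nodes: List[Node]):
    incoming: queue.Queue = queue.Queue()   # listener -> dispatcher
    answers: queue.Queue = queue.Queue()    # dispatcher -> responder
    replies: List[Tuple[int, str]] = []
    stop = object()                         # sentinel to shut the threads down

    def listener() -> None:
        # Stands in for the thread that listens to client traffic.
        for client_id, stmt in statements:
            incoming.put((client_id, stmt))
        incoming.put(stop)

    def dispatcher() -> None:
        # Sends every statement to every node, in the order received.
        while True:
            item = incoming.get()
            if item is stop:
                answers.put(stop)
                return
            client_id, stmt = item
            primary = min(nodes, key=lambda n: n.load)
            result = None
            for node in nodes:
                out = node.execute(stmt)
                if node is primary:
                    result = out    # other nodes only need to succeed
            answers.put((client_id, result))

    def responder() -> None:
        # Stands in for the thread that writes answers to client sockets.
        while True:
            item = answers.get()
            if item is stop:
                return
            replies.append(item)

    threads = [threading.Thread(target=f)
               for f in (listener, dispatcher, responder)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return replies


if __name__ == "__main__":
    nodes = [Node("a", load=5), Node("b", load=1)]
    print(run_proxy([(1, "SELECT f()"), (2, "UPDATE t SET x = 1")], nodes))
```

A single dispatcher thread and FIFO queues are what preserve statement order here; with more than one dispatcher, ordering would need separate handling.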
--
-------------------------------------------------------------------------
Medi Montaseri                               medi@CyberShell.com
Unix Distributed Systems Engineer            HTTP://www.CyberShell.com
CyberShell Engineering
-------------------------------------------------------------------------