Re: A Replication Idea - Mailing list pgsql-general

From Medi Montaseri
Subject Re: A Replication Idea
Msg-id 3C759F80.9CEA2CB6@cybershell.com
In response to Re: A Replication Idea  ("Command Prompt, Inc." <pgsql-general@commandprompt.com>)
List pgsql-general
I don't think we can figure out what the actual plans are from the very
high-level SQL text alone. The proxy should operate at a deeper layer:
after the plan has been produced and before execution begins.

In other words, you take a PG engine, peel off the front end (the parser and
planner) and then slip in a new layer just before the executor.

See your installation docs, Chapter 2, Section 2.1, "The Path of a Query".

The path is:

Connection, Parser Stage, Rewrite System, Planner/Optimizer, Executor.

In fact the name is already there: "Planner/Optimizer". What we want is
optimization. I know people usually mean something different by that word,
but why not? HA is optimization as well...

By the way, I got this idea from the Solaris Virtual File System (VFS); I
call this VDB (Virtual DataBase).
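To make the VFS analogy concrete, here is a minimal sketch of a "virtual executor" that accepts a finished plan and fans it out to several backends, the way a VFS layer dispatches one call to whichever file system sits underneath. All names here (Plan, Executor, LocalExecutor, VirtualExecutor) are hypothetical: PostgreSQL's real executor is C code and exposes no such interface, so this only illustrates where the proposed layer would sit.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for a finished plan handed over by the planner."""
    description: str


class Executor(Protocol):
    """The interface the VDB layer slips in front of (compare a VFS vnode op)."""
    def execute(self, plan: Plan) -> str: ...


class LocalExecutor:
    """Pretend executor for one backend node; it just records what it ran."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.executed: List[str] = []

    def execute(self, plan: Plan) -> str:
        self.executed.append(plan.description)
        return f"{self.name}: ok"


class VirtualExecutor:
    """Sits between planner and executor and sends the same plan to every node.

    Returns the first node's result. A real system would also need error
    handling and agreement across nodes on whether the work committed.
    """
    def __init__(self, nodes: List[Executor]) -> None:
        self.nodes = nodes

    def execute(self, plan: Plan) -> str:
        results = [node.execute(plan) for node in self.nodes]
        return results[0]


if __name__ == "__main__":
    a, b = LocalExecutor("a"), LocalExecutor("b")
    vdb = VirtualExecutor([a, b])
    print(vdb.execute(Plan("SeqScan on accounts")))
```

Because VirtualExecutor satisfies the same interface as a single node, the layer above it does not need to know whether it is talking to one database or many.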
 
 
 

"Command Prompt, Inc." wrote:

>How would it handle functions, which could potentially modify data, even
>from a select statement?

It seems that you'd have two options, if you wanted the proxy to be truly
transparent to the client:

  1. Send ALL SQL statements down the wire to each node, including SELECT
     statements, since selected functions may modify data.

  2. Write a small, fast, reliable parser that checks for criteria which
     would make the statement potentially data-modifying (e.g., the
     presence of a function call), and send only potentially
     data-modifying SELECTs along with your standard UPDATEs, DELETEs, etc.
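A rough sketch of the kind of check option 2 describes, using a regular expression rather than a real SQL parser. The keyword lists and the SAFE_FUNCTIONS allowlist are assumptions made up for illustration. The check errs toward broadcasting: any function it does not recognize marks the statement as possibly data-modifying. It only sees the statement text, so it knows nothing about server-side behavior such as triggers.

```python
import re

# Statement types that always change data (or schema).
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "CREATE", "DROP", "ALTER", "TRUNCATE"}

# Hypothetical allowlist of functions assumed to be read-only.
SAFE_FUNCTIONS = {"count", "sum", "avg", "min", "max", "lower", "upper", "now"}

# SQL keywords that may legitimately be followed by "(" without being a call.
PAREN_KEYWORDS = {"in", "exists", "values", "any", "all", "as", "from", "join",
                  "on", "where", "and", "or", "not", "select"}

# Crude: any identifier directly followed by "(" is treated as a function call.
CALL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def may_modify(sql: str) -> bool:
    """Return True if the statement should be sent to every node."""
    words = sql.split(None, 1)
    if not words:
        return False
    if words[0].upper() in WRITE_KEYWORDS:
        return True
    for name in CALL_RE.findall(sql):
        lowered = name.lower()
        if lowered in PAREN_KEYWORDS or lowered in SAFE_FUNCTIONS:
            continue
        # Unknown function inside a SELECT: it might write, so broadcast it.
        return True
    return False
```

For example, `SELECT count(*) FROM t` would be answered by one node, while `SELECT audit_and_bump(id) FROM t` (a made-up function) would be sent everywhere.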

However, as has probably just occurred to you all, as it just occurred to
me, this is pretty moot, because functions aren't the only concern: a
trigger on a table could defeat idea #2 entirely. ;)

Really, there are too many invisible ways data can be modified by
seemingly innocuous statements, so parsing a statement to decide where to
send it is right out; it seems as though each node is going to need a
copy of EVERY statement the proxy receives in order to maintain 100%
integrity.

However, that doesn't mean your proxy needs to get result sets back from
all of the nodes. Something as simple as a status packet indicating that
the downstream execution succeeded would be enough for the proxy to know
what's going on, provided it knows it will get its actual answer soon from
one particular node (e.g., the node with the lowest load).
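A minimal sketch of that split, assuming a hypothetical Node object with an `execute(sql)` method and a `load` figure the proxy tracks somehow. Every node runs every statement; only the least-loaded node's rows go back to the client, and for the others the proxy cares only that execution returned (the "ack"). In this simplified version an exception from any node simply propagates, which is the crudest possible handling of a missing acknowledgement.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Row = Tuple[object, ...]


@dataclass
class Node:
    """Hypothetical replica; 'load' is whatever metric the proxy tracks."""
    name: str
    load: float
    log: List[str] = field(default_factory=list)

    def execute(self, sql: str) -> List[Row]:
        # Stand-in for running the statement on a real backend.
        self.log.append(sql)
        return [(self.name, sql)]


def run_everywhere(nodes: List[Node], sql: str) -> List[Row]:
    """Send sql to every node; return rows only from the least-loaded one."""
    answerer = min(nodes, key=lambda n: n.load)
    result: List[Row] = []
    for node in nodes:
        rows = node.execute(sql)
        if node is answerer:
            result = rows  # the one result set forwarded to the client
        # For every other node, returning without an exception is the "ack".
    return result
```

A real proxy would send to the nodes concurrently rather than one after another, but the division of labor (one answer, many acks) is the same.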

Result sets could still be cached per statement, within some specified
degree of accuracy (e.g., how much time elapses before a cached result set
expires); you'd just need to make sure that even when you return a cached
result set, you still send the request to each back-end to be processed in
its own time.
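A small sketch of such a cache, keyed on the exact statement text with a time-to-live. The two callbacks are assumptions: `execute` stands for "broadcast and wait for one node's rows" and `enqueue` for "broadcast without waiting". On a hit the cached rows are returned immediately, but the statement is still queued for every backend, as the message requires. The clock is injectable only so the behavior can be tested.

```python
import time
from typing import Callable, Dict, List, Tuple

Rows = List[tuple]


class ForwardingCache:
    """Cache result sets by statement text for `ttl` seconds."""

    def __init__(self, ttl: float,
                 execute: Callable[[str], Rows],
                 enqueue: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.execute = execute  # hypothetical: broadcast, wait for one answer
        self.enqueue = enqueue  # hypothetical: broadcast, don't wait
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Rows]] = {}

    def query(self, sql: str) -> Rows:
        now = self.clock()
        hit = self._entries.get(sql)
        if hit is not None and now - hit[0] < self.ttl:
            # Still goes to every backend, to be processed in its own time.
            self.enqueue(sql)
            return hit[1]  # possibly stale, within the chosen ttl
        rows = self.execute(sql)
        self._entries[sql] = (now, rows)
        return rows
```

The ttl is the "degree of accuracy" knob: a larger value answers more queries from the cache at the cost of staler results.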

Seems like some *really* careful threading might be called for: one thread
to listen to incoming traffic and queue up downstream events, another to
send those events to the back-ends in the order they were received, and a
third to listen for answers from the nodes and queue up responses to be
sent back to the appropriate client's socket.
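The three-thread layout can be sketched with Python's standard `queue` and `threading` modules. This is a simplified model, not a real proxy: client traffic is a list of (client_id, sql) pairs instead of sockets, the backend is a plain function called synchronously by the dispatcher, and each client's "socket" is a list in a dictionary. A single dispatcher draining a FIFO queue is what preserves arrival order; None is used as a shutdown marker.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

Request = Tuple[int, str]  # (client_id, sql)
STOP = None                # shutdown marker passed down the pipeline


def listener(incoming: List[Request], downstream: queue.Queue) -> None:
    """Thread 1: accept client traffic and queue it in arrival order."""
    for req in incoming:
        downstream.put(req)
    downstream.put(STOP)


def dispatcher(downstream: queue.Queue, backend: Callable[[str], str],
               responses: queue.Queue) -> None:
    """Thread 2: send queued statements to the backend strictly in order."""
    while True:
        req = downstream.get()
        if req is STOP:
            responses.put(STOP)
            return
        client_id, sql = req
        responses.put((client_id, backend(sql)))


def responder(responses: queue.Queue, outboxes: Dict[int, List[str]]) -> None:
    """Thread 3: route each node answer back to the right client's outbox."""
    while True:
        item = responses.get()
        if item is STOP:
            return
        client_id, reply = item
        outboxes.setdefault(client_id, []).append(reply)


def run_proxy(incoming: List[Request],
              backend: Callable[[str], str]) -> Dict[int, List[str]]:
    downstream: queue.Queue = queue.Queue()
    responses: queue.Queue = queue.Queue()
    outboxes: Dict[int, List[str]] = {}
    threads = [
        threading.Thread(target=listener, args=(incoming, downstream)),
        threading.Thread(target=dispatcher, args=(downstream, backend, responses)),
        threading.Thread(target=responder, args=(responses, outboxes)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outboxes
```

With several back-ends, the dispatcher would send each event to every node, and the responder would wait on node sockets rather than a single in-process queue; the ordering guarantee still comes from one FIFO feeding one sender.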

Regards,
Jw.
--
jlx@commandprompt.com, by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/


-- 
-------------------------------------------------------------------------
Medi Montaseri                               medi@CyberShell.com
Unix Distributed Systems Engineer            HTTP://www.CyberShell.com
CyberShell Engineering
-------------------------------------------------------------------------
 
