Re: A Replication Idea - Mailing list pgsql-general

From Command Prompt, Inc.
Subject Re: A Replication Idea
Date
Msg-id Pine.LNX.4.30.0202211642550.31949-100000@commandprompt.com
Whole thread Raw
In response to Re: A Replication Idea  ("Steve Wolfe" <steve@iboats.com>)
List pgsql-general
>How would it handle functions, which could potentially modify data, even
>from a select statement?

It seems that you'd have two options, if you wanted the proxy to be truly
transparent to the client:

  1. Send ALL SQL statements down the wire to each node, including SELECT
     statements, since selected functions may modify data.

  2. Write a small, fast, reliable parser that checks for criteria which
     would make the statement potentially data-modifying (e.g., the
     existence of a function), and send only data-modifying SELECTs along
     with your standard UPDATEs, DELETEs, etc.

However, it probably just occurred to you all as it just occurred to me
that this is pretty moot, because functions aren't the only concern: you
could have a trigger on a table that would wipe out idea #2. ;)

Really, there are too many transparent ways data can be modified by
seemingly innocuous statements, so parsing a statement for distribution
is right out; it seems as though each node is going to have to require a
copy of EACH statement that the proxy runs into in order to maintain 100%
integrity.

However, that doesn't mean your proxy needs to get answer back from all of
the nodes in terms of result sets. Something as simple as a systemic
packet indicating that the downstream-execution was successful would be
enough data for the proxy to know what's going on, provided it knows it
should get its answer soon from another node (e.g., the node with the
lowest load).

Result sets could still be cached based on a statement, within some
specified degree of accuracy (e.g., how much time elapses before a cached
resultset expires); you'd just need to make sure that even though you're
returning a cached result set, you still send the request to each back-end
to get processed in its own time.

Seems like some *really* careful threading might be called for; one thread
to listen to incoming traffic, from which downstream events are queued up,
another thread sending off those events to the back-end in the order they
were received, and another thread listening for answers from nodes, and
queueing up responses to be sent back to the appropriate client's socket.

Regards,
Jw.
--
jlx@commandprompt.com, by way of pgsql-general@commandprompt.com
http://www.postgresql.info/
http://www.commandprompt.com/


pgsql-general by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: ANY GOOD USER'S GUIDE ONLINE?? (with simple example
Next
From: Medi Montaseri
Date:
Subject: Re: A Replication Idea