Re: Feature Request for 7.5 - Mailing list pgsql-general

From Jan Wieck
Subject Re: Feature Request for 7.5
Date
Msg-id 3FCDF678.6010504@Yahoo.com
In response to Re: Feature Request for 7.5  ("Chris Travers" <chris@travelamericas.com>)
List pgsql-general
Chris Travers wrote:
> Comments inline
>
> From: "Jan Wieck" <JanWieck@Yahoo.com>:
>> There are many problems with a "proxy" solution. One is that you really
>> don't know if a statement does modify the database or not. A SELECT for
>> example can call a user defined function somewhere and that can do
>> whatever the programmer likes it to do. So you would have to "replicate"
>> all that too. Granted, you can exclude this type of database usage from
>> your supported list.
>
> That is why it would be nice to be able to check for altered tuples on a
> select before deciding whether to replicate...  In this case you could have
> a query->check for altered tuples-> if altered replicate query routine.

Better make that "did dirty any block". Tuples are not the only things a statement can modify.

>> Next you don't have control over sequence allocation. Every application
>> that uses sequence allocated ID's is in danger, because they are not
>> blocking, you cannot force the order of assignments and they don't roll
>> back either.
>
> This is the more serious problem.  I will have to think this one over.  I
> wonder about having cross-proxy sequence generators.
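
The sequence problem can be illustrated with a minimal sketch (plain Python, not PostgreSQL code): a stand-in `Sequence` class mimics the one property that matters here, namely that `nextval()` is non-transactional, so a value consumed by an aborted statement is never given back.

```python
# Hypothetical stand-in for a PostgreSQL sequence: non-transactional.
class Sequence:
    def __init__(self):
        self.value = 0

    def nextval(self):
        self.value += 1
        return self.value

db1_seq, db2_seq = Sequence(), Sequence()

# The proxy replays the same INSERT on both databases. On DB1 the first
# attempt aborts (say, a lock timeout) and is retried; the sequence value
# it consumed does NOT roll back with the transaction.
_ = db1_seq.nextval()            # consumed by the aborted attempt
row_id_db1 = db1_seq.nextval()   # the retry gets the next value
row_id_db2 = db2_seq.nextval()   # DB2 succeeds on the first try

print(row_id_db1, row_id_db2)    # the "same" row gets id 2 on DB1, id 1 on DB2
```

One aborted statement is enough to make the databases hand out different ids for the same logical row from then on.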
>
>>
>> And you get into deadlock problems if you don't guarantee that your
>> proxy uses the same order to access all databases. And you cannot
>> guarantee that if your proxy tries to do it parallel. So it has to do
>> the queries against all databases one by one, that doesn't scale well.
>>
> This is true also, but if the sequence of the queries is similar, then I am
> having trouble seeing how the deadlocks would happen on a server in a case
> where you wouldn't otherwise have one.  Since a deadlock on ONE server would
> force a restore process (with some performance problems in the beginning of
> that process), it would not be too bad.

I don't think you understand the problem completely.

Imagine databases DB1 and DB2, and proxy connections (client->proxy)
PC-A and PC-B, holding database connections PC-A1, PC-A2 and PC-B1, PC-B2.
Now A does "START TRANSACTION" and "UPDATE tab1 SET data=1 WHERE id=666"
while B does "START TRANSACTION" and "UPDATE tab1 SET data=2 WHERE id=666".

If PC-A forwards the statements to PC-A1 first while PC-B forwards them
to PC-B2 first, both proxies will succeed: A holds the row lock in DB1
while B holds it in DB2. Now A tries the UPDATE in DB2 and blocks, while
B does the same in DB1 and blocks. Neither database can detect the
deadlock, because neither can see that the two transactions are waiting
on each other in different databases.

You have a couple of choices to solve this problem.

You can let A complete its transaction on DB1 while buffering B's
remaining actions there, and vice versa on DB2, buffering A's. But then
A will later stomp over B's update in DB2, B will do the same in DB1,
and the end result is DB1: data=2 while DB2: data=1 ... out of sync.

You can avoid this by executing all queries on the same database first,
and repeating them on all other databases in the order they finished on
that leader database. But then the whole system can never be faster than
a single database. There is no scaling here. All you have is backup
servers for failover.
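
A sketch of that leader scheme (hypothetical names, plain Python dicts as "databases"): all writes serialize through the leader, and the leader's finish order becomes the replay order everywhere else.

```python
import threading

leader = {}
replicas = [{}, {}]
replay_log = []
leader_lock = threading.Lock()

def execute_on_leader(key, value):
    # A single lock serializes every writer: the leader caps total
    # throughput, so the cluster can never be faster than one database.
    with leader_lock:
        leader[key] = value
        replay_log.append((key, value))

# Two proxies updating the same row concurrently.
t1 = threading.Thread(target=execute_on_leader, args=(666, 1))
t2 = threading.Thread(target=execute_on_leader, args=(666, 2))
t1.start(); t2.start(); t1.join(); t2.join()

# Replaying in leader finish order keeps every replica identical
# to the leader, whichever update won the race.
for rep in replicas:
    for key, value in replay_log:
        rep[key] = value

assert all(rep == leader for rep in replicas)
```

Consistency is preserved, but only because everything funnels through one serialization point, which is exactly why it does not scale.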

So in order to scale you need to do your own deadlock detection in the
proxy server. Real deadlock detection would mean parsing a substantial
amount of SQL just to run a second level of lock management, which is
probably out of the scope of your proposal. So you can only shoot for
timeouts. Guess what: PostgreSQL once detected deadlocks that way, way
back ... people didn't like it.
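
The timeout approach can be sketched as follows (hypothetical Python, with locks standing in for row locks): if a lock cannot be acquired within a timeout, assume a deadlock, release everything ("roll back") and retry after a randomized backoff.

```python
import random
import threading
import time

def run_transaction(locks, lock_timeout=0.2, max_retries=10):
    """Acquire all row locks in statement order; treat a timeout as a deadlock."""
    for attempt in range(1, max_retries + 1):
        held = []
        for lk in locks:
            if lk.acquire(timeout=lock_timeout):
                held.append(lk)
            else:
                for h in held:          # presumed deadlock: "roll back" ...
                    h.release()
                time.sleep(random.uniform(0, 0.1))  # ... back off, retry
                break
        else:
            for h in held:              # all locks held: "commit"
                h.release()
            return attempt
    return None                         # gave up

a, b = threading.Lock(), threading.Lock()
res = {}
# Two transactions take the same locks in opposite order, as in the
# DB1/DB2 example above; timeouts and backoff eventually break the tie.
t1 = threading.Thread(target=lambda: res.update(t1=run_transaction([a, b])))
t2 = threading.Thread(target=lambda: res.update(t2=run_transaction([b, a])))
t1.start(); t2.start(); t1.join(); t2.join()
assert res["t1"] and res["t2"]          # both eventually commit
```

The obvious cost: every genuine slow lock wait is punished as if it were a deadlock, which is exactly the behavior people disliked.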

>
>> The last thing (I mention for now) is that I cannot imagine any way that
>> such proxy code allows for a new member to join without stopping the
>> whole application, creating an identical copy of one member
>> (dump+restore) and continue. So it is impossible to build 24*7 support
>> that way.
>
> Not too hard.  Read my comments on restoring from failure for details.  The
> same procedure could be used to add a new member.  The only performance
> drawback is that new transactions would have to be queued up (uncommitted)
> while the old ones complete.  If you have hanging transactions, this could
> be a problem.  What we have is the following process:
> 1)  Prepare query queue for storing incoming queries.
> 2)  Request a restore point.
> 3)  At this point, all new queries get queued in the query queue.  No new
> transactions may be committed.
> 4)  When all transactions which were open at the beginning of step 3 are
> closed, give permission to start restore, and address of server to use.
> 5)  Use PGDump to start the restore.
> 6)  New transactions may now be committed.
> 7) when restore finishes, start committing transactions in the query log in
> order of committal.

Since this is one of my remaining problems for Slony-II, I am curious how
exactly you determine the commit order. I know that it is not the order in
which your proxies issue the COMMIT, nor is it necessarily the order in
which they receive the response. So how do you do it?

> 8)  When no closed transactions remain, change status to online.
>>
>> No, separate proxy code doesn't strike me as the superior solution.
>>
> There are advantages and disadvantages to either.  The other option is to
> use some sort of library to handle the additional clustering protocols.
> Either way is limited and difficult.  Still working on these problems.

The real other option is full synchronous multi-master replication. I am
planning that, but it will take me a couple more months to finish Slony-I
before I can really start on it.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

