Thread: AW: Re: MySQL and BerkleyDB (fwd)

AW: Re: MySQL and BerkleyDB (fwd)

From

Zeugswetter Andreas SB

Date:

22 January 2001, 11:29:02

> > Is anyone looking at doing this?  Is this purely a MySQL-ism, or is it
> > something that everyone else has except us?
> 
> It's not required by SQL, that's for sure.  I think in 7.2 we'll tackle
> schema support, which will accomplish the same thing.

It does not (e.g. remote access). 

>  Many people
> (including myself) are of the opinion that not allowing cross-db access is
> in fact a feature.

Can you tell me what that "feature" gains you other than mere inconvenience ?
And are you implying, that all the other db's are misfeatured in this regard?

Andreas

Re: AW: Re: MySQL and BerkleyDB (fwd)

From

Peter Eisentraut

Date:

22 January 2001, 12:07:22

Zeugswetter Andreas SB writes:

> > It's not required by SQL, that's for sure.  I think in 7.2 we'll tackle
> > schema support, which will accomplish the same thing.
>
> It does not (e.g. remote access).

Maybe this is handled better by an external corba server or some such
thing.

> >  Many people
> > (including myself) are of the opinion that not allowing cross-db access is
> > in fact a feature.
>
> Can you tell me what that "feature" gains you other than mere inconvenience ?
> And are you implying, that all the other db's are misfeatured in this regard?

It's a safety/security measure to me.  As long as one backend process can
only touch one database you can control things much better.  This could be
overcome of course, but I'm quite happy with it.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/

RE: Re: AW: Re: MySQL and BerkleyDB (fwd)

From

"Mikheev, Vadim"

Date:

23 January 2001, 16:14:45

>   I had thought that the pre-commit information could be stored in an
> auxiliary table by the middleware program ; we would then have
> to re-implement some sort of higher-level WAL (I thought of the list
> of the commands performed in the current transaction, with a sequence
> number for each of them that would guarantee correct ordering between
> concurrent transactions in case of a REDO). But I fear I am missing

This wouldn't work for READ COMMITTED isolation level.
But why do you want to log commands into WAL where each modification
is already logged in, hm, correct order?
Well, it has sense if you're looking for async replication but
you need not in two-phase commit for this and should aware about
problems with READ COMMITTED isolevel.

Back to two-phase commit - it's easiest part of work required for
distributed transaction processing.
Currently we place single commit record to log and transaction is
committed when this record (and so all other transaction records)
is on disk.
Two-phase commit:

1. For 1st phase we'll place into log "prepared-to-commit" record  and this phase will be accomplished after record is
flushedon disk.  At this point transaction may be committed at any time because of  all its modifications are logged.
Butit still may be rolled back  if this phase failed on other sites of distributed system.
 

2. When all sites are prepared to commit we'll place "committed"  record into log. No need to flush it because of in
theevent of  crash for all "prepared" transactions recoverer will have to  communicate other sites to know their
statusesanyway.
 

That's all! It is really hard to implement distributed lock- and
communication- managers but there is no problem with logging two
records instead of one. Period.

Vadim

Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

From

Bruce Momjian

Date:

23 January 2001, 16:54:34

[ Charset ISO-8859-1 unsupported, converting... ]
> >   I had thought that the pre-commit information could be stored in an
> > auxiliary table by the middleware program ; we would then have
> > to re-implement some sort of higher-level WAL (I thought of the list
> > of the commands performed in the current transaction, with a sequence
> > number for each of them that would guarantee correct ordering between
> > concurrent transactions in case of a REDO). But I fear I am missing
> 
> This wouldn't work for READ COMMITTED isolation level.
> But why do you want to log commands into WAL where each modification
> is already logged in, hm, correct order?
> Well, it has sense if you're looking for async replication but
> you need not in two-phase commit for this and should aware about
> problems with READ COMMITTED isolevel.
> 

I believe the issue here is that while SERIALIZABLE ISOLATION means all
queries can be run serially, our default is READ COMMITTED, meaning that
open transactions see committed transactions, even if the transaction
committed after our transaction started.  (FYI, see my chapter on
transactions for help,  http://www.postgresql.org/docs/awbook.html.)

To do higher-level WAL, you would have to record not only the queries,
but the other queries that were committed at the start of each command
in your transaction.

Ideally, you could number every commit by its XID your log, and then
when processing the query, pass the "committed" transaction ids that
were visible at the time each command began.

In other words, you can replay the queries in transaction commit order,
except that you have to have some transactions committed at specific
points while other transactions are open, i.e.:

XID    Open XIDS    Query
500            UPDATE t SET col = 3;
501    500        BEGIN;
501    500        UPDATE t SET col = 4;
501            UPDATE t SET col = 5;
501            COMMIT;

This is a silly example, but it shows that 500 must commit after the
first command in transaction 501, but before the second command in the
transaction.  This is because UPDATE t SET col = 5 actually sees the
changes made by transaction 500 in READ COMMITTED isolation level.

I am not advocating this.  I think WAL is a better choice.  I just
wanted to outline how replaying the queries in commit order is 
insufficient.

> Back to two-phase commit - it's easiest part of work required for
> distributed transaction processing.
> Currently we place single commit record to log and transaction is
> committed when this record (and so all other transaction records)
> is on disk.
> Two-phase commit:
> 
> 1. For 1st phase we'll place into log "prepared-to-commit" record
>    and this phase will be accomplished after record is flushed on disk.
>    At this point transaction may be committed at any time because of
>    all its modifications are logged. But it still may be rolled back
>    if this phase failed on other sites of distributed system.
> 
> 2. When all sites are prepared to commit we'll place "committed"
>    record into log. No need to flush it because of in the event of
>    crash for all "prepared" transactions recoverer will have to
>    communicate other sites to know their statuses anyway.
> 
> That's all! It is really hard to implement distributed lock- and
> communication- managers but there is no problem with logging two
> records instead of one. Period.

Great.


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

RE: Re: AW: Re: MySQL and BerkleyDB (fwd)

From

"Mikheev, Vadim"

Date:

24 January 2001, 15:03:44

> > 1. For 1st phase we'll place into log "prepared-to-commit" record
> >    and this phase will be accomplished after record is 
> >    flushed on disk.
> >    At this point transaction may be committed at any time because of
> >    all its modifications are logged. But it still may be rolled back
> >    if this phase failed on other sites of distributed system.
> 
> 1st phase will also need to do all the delayed constraint checks,
> and all other work a commit currently does, that could possibly lead 
> to a transaction abort. The 2nd phase of 2phase commit is not 

It was assumed.

Vadim

Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

From

Bruce Momjian

Date:

24 January 2001, 22:46:00

Added to TODO.detail/replication.

[ Charset ISO-8859-1 unsupported, converting... ]
> >   I had thought that the pre-commit information could be stored in an
> > auxiliary table by the middleware program ; we would then have
> > to re-implement some sort of higher-level WAL (I thought of the list
> > of the commands performed in the current transaction, with a sequence
> > number for each of them that would guarantee correct ordering between
> > concurrent transactions in case of a REDO). But I fear I am missing
> 
> This wouldn't work for READ COMMITTED isolation level.
> But why do you want to log commands into WAL where each modification
> is already logged in, hm, correct order?
> Well, it has sense if you're looking for async replication but
> you need not in two-phase commit for this and should aware about
> problems with READ COMMITTED isolevel.
> 
> Back to two-phase commit - it's easiest part of work required for
> distributed transaction processing.
> Currently we place single commit record to log and transaction is
> committed when this record (and so all other transaction records)
> is on disk.
> Two-phase commit:
> 
> 1. For 1st phase we'll place into log "prepared-to-commit" record
>    and this phase will be accomplished after record is flushed on disk.
>    At this point transaction may be committed at any time because of
>    all its modifications are logged. But it still may be rolled back
>    if this phase failed on other sites of distributed system.
> 
> 2. When all sites are prepared to commit we'll place "committed"
>    record into log. No need to flush it because of in the event of
>    crash for all "prepared" transactions recoverer will have to
>    communicate other sites to know their statuses anyway.
> 
> That's all! It is really hard to implement distributed lock- and
> communication- managers but there is no problem with logging two
> records instead of one. Period.
> 
> Vadim
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

From

Bruce Momjian

Date:

24 January 2001, 22:46:01

Added to TODO.detail/replication.

> [ Charset ISO-8859-1 unsupported, converting... ]
> > >   I had thought that the pre-commit information could be stored in an
> > > auxiliary table by the middleware program ; we would then have
> > > to re-implement some sort of higher-level WAL (I thought of the list
> > > of the commands performed in the current transaction, with a sequence
> > > number for each of them that would guarantee correct ordering between
> > > concurrent transactions in case of a REDO). But I fear I am missing
> > 
> > This wouldn't work for READ COMMITTED isolation level.
> > But why do you want to log commands into WAL where each modification
> > is already logged in, hm, correct order?
> > Well, it has sense if you're looking for async replication but
> > you need not in two-phase commit for this and should aware about
> > problems with READ COMMITTED isolevel.
> > 
> 
> I believe the issue here is that while SERIALIZABLE ISOLATION means all
> queries can be run serially, our default is READ COMMITTED, meaning that
> open transactions see committed transactions, even if the transaction
> committed after our transaction started.  (FYI, see my chapter on
> transactions for help,  http://www.postgresql.org/docs/awbook.html.)
> 
> To do higher-level WAL, you would have to record not only the queries,
> but the other queries that were committed at the start of each command
> in your transaction.
> 
> Ideally, you could number every commit by its XID your log, and then
> when processing the query, pass the "committed" transaction ids that
> were visible at the time each command began.
> 
> In other words, you can replay the queries in transaction commit order,
> except that you have to have some transactions committed at specific
> points while other transactions are open, i.e.:
> 
> XID    Open XIDS    Query
> 500            UPDATE t SET col = 3;
> 501    500        BEGIN;
> 501    500        UPDATE t SET col = 4;
> 501            UPDATE t SET col = 5;
> 501            COMMIT;
> 
> This is a silly example, but it shows that 500 must commit after the
> first command in transaction 501, but before the second command in the
> transaction.  This is because UPDATE t SET col = 5 actually sees the
> changes made by transaction 500 in READ COMMITTED isolation level.
> 
> I am not advocating this.  I think WAL is a better choice.  I just
> wanted to outline how replaying the queries in commit order is 
> insufficient.
> 
> > Back to two-phase commit - it's easiest part of work required for
> > distributed transaction processing.
> > Currently we place single commit record to log and transaction is
> > committed when this record (and so all other transaction records)
> > is on disk.
> > Two-phase commit:
> > 
> > 1. For 1st phase we'll place into log "prepared-to-commit" record
> >    and this phase will be accomplished after record is flushed on disk.
> >    At this point transaction may be committed at any time because of
> >    all its modifications are logged. But it still may be rolled back
> >    if this phase failed on other sites of distributed system.
> > 
> > 2. When all sites are prepared to commit we'll place "committed"
> >    record into log. No need to flush it because of in the event of
> >    crash for all "prepared" transactions recoverer will have to
> >    communicate other sites to know their statuses anyway.
> > 
> > That's all! It is really hard to implement distributed lock- and
> > communication- managers but there is no problem with logging two
> > records instead of one. Period.
> 
> Great.
> 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026