Thread: why vacuum

why vacuum

From

Kenneth Gonsalves

Date:

26 October 2005, 01:47:28

hi,
i was in a minor flame war with a mysql guy - his major grouse was that
'I wouldnt commit mission critical data to a database that needs to be
vacuumed once a week'. So why does pg need vacuum?
--
regards
kg

http://www.livejournal.com/users/lawgon
tally ho! http://avsap.org.in
ಇಂಡ್ಲಿನಕ್ಸ வாழ்க!

Re: why vacuum

From

Tom Lane

Date:

26 October 2005, 02:14:43

Kenneth Gonsalves <lawgon@thenilgiris.com> writes:
> i was in a minor flame war with a mysql guy - his major grouse was that 
> 'I wouldnt commit mission critical data to a database that needs to be 
> vacuumed once a week'.

This guy is not worth arguing with.

> So why does pg need vacuum?

Every database needs maintenance operations.  PG is designed in a way
that exposes the maintenance operations to the control of the DBA a bit
more than most other DBMSes do: specifically, you get to decide when
some of the overhead work happens.  We think this is a feature, because
you can schedule the overhead for low-activity periods (nights,
weekends, whatever).  In other DBMSes the equivalent work happens as
part of foreground queries, no matter how time-critical they might be.

Now, there's no doubt that for a database run by a non-expert person
who can't even spell DBA, exposing this sort of knob isn't very helpful.
So there's work afoot to provide automatic maintenance tools (ie,
autovacuum).  Over time I think autovacuum will get smart enough that
even experts will usually use it.  But that point will only be reached
when autovacuum has some idea about doing more work during low-load
periods.

Unless MySQL invents some concept equivalent to VACUUM, they won't have
any prayer at all of being able to shift maintenance overhead to
low-load times.
        regards, tom lane

Re: why vacuum

From

Michael Fuhr

Date:

26 October 2005, 02:57:27

On Wed, Oct 26, 2005 at 10:15:17AM +0530, Kenneth Gonsalves wrote:
> i was in a minor flame war with a mysql guy - his major grouse was that 
> 'I wouldnt commit mission critical data to a database that needs to be 
> vacuumed once a week'.

The use of the word "commit" is amusing, considering that MySQL's
default table type doesn't support transactions.  There's always
InnoDB, but it seems like there was something about that in the
news recently....

Compare the following lists of gotchas and decide which database
*you'd* commit mission-critical data to:

http://sql-info.de/mysql/gotchas.html
http://sql-info.de/postgresql/postgres-gotchas.html

> So why does pg need vacuum?

See the documentation:

http://www.postgresql.org/docs/8.0/interactive/maintenance.html#ROUTINE-VACUUMING

-- 
Michael Fuhr

Re: why vacuum

From

"Bath, David"

Date:

26 October 2005, 03:12:49

On Wed, 26 Oct 2005 15:14, Tom Lane wrote: > Kenneth Gonsalves <lawgon@thenilgiris.com> writes: > > (A MySQueeeel guy
said,not Kenneth)... > > 'I wouldnt commit mission critical data to a database that needs to be  > > vacuumed once a
week'.
My two-penneth worth...<flamage>
I wouldn't commit mission critical data to a database (or DBA) that doesn't
have a concept of vacuuming (or desire to do it regularly).  But, less
flamingly, I wouldn't commit mission-critical data to something that lacked
the ability to have proper constraints, triggers and server-side procedures
to ensure the data actually remains sensible.  Note that Sybase/MS-SQL's
check constraint model asserts the constraint BEFORE the trigger, which
discourages you from attempting to check and handle meaning of data!</flamage>
 > This guy is not worth arguing with.
D'Accord!
 > > So why does pg need vacuum?
For (inter alia) the same reason that
* Oracle has an ANALYZE_SCHEMA and DBMS_SPACE_ADMIN
and (hoist by his own petard)
* MySQueeeeaL has myisamchk --stats_method=method_name --analyze

<flamage>Oh, well: MySQL bigot and internal consistency? whadya expect?</flamage>
Dave Bath
(Oracle DBA for health/telcos way back in 1986: honeywrong GCOS and Pr1mos)
-- 
David T. Bath
dave.bath@unix.net

Re: why vacuum

From

Tom Lane

Date:

26 October 2005, 04:00:28

"Bath, David" <dave.bath@unix.net> writes:
> ... Note that Sybase/MS-SQL's
> check constraint model asserts the constraint BEFORE the trigger, which
> discourages you from attempting to check and handle meaning of data!

Er, doesn't PG do it that way too?
        regards, tom lane

Re: why vacuum

From

Kenneth Gonsalves

Date:

26 October 2005, 04:51:45

On Wednesday 26 Oct 2005 11:52 am, Bath, David wrote:
>   > This guy is not worth arguing with.
> D'Accord!

thanks all for the clarification. in case anyone is interested in the
original conversation it is here:
http://ebergen.net/wordpress/?p=83

--
regards
kg

http://www.livejournal.com/users/lawgon
tally ho! http://avsap.org.in
ಇಂಡ್ಲಿನಕ್ಸ வாழ்க!

Re: why vacuum

From

Scott Marlowe

Date:

26 October 2005, 11:44:52

On Tue, 2005-10-25 at 23:45, Kenneth Gonsalves wrote:
> hi,
> i was in a minor flame war with a mysql guy - his major grouse was that 
> 'I wouldnt commit mission critical data to a database that needs to be 
> vacuumed once a week'. So why does pg need vacuum?

The absolutely funniest thing about what this guy is saying is that he
seems rather ignorant of the behaviour of innodb tables.  They have
another name for the vacuum command there.  It's:

ALTER TABLE tbl_name ENGINE=INNODB

Which rebuilds the whole fraggin's table, with an exclusive lock.

and guess what innodb does if you don't run this command every so often?

Can you guess yet?  Yep, that's right, it just keeps growing and growing
and growing.

Hell, innodb isn't any better than the original mvcc implementation
postgresql had when vacuums were all full and took exclusive locks.

But at least with PostgreSQL it was a well documented issue, and was
mentioned in the administrative section of the docs, so you knew you had
to do it.  It's kind of tucked away in the innodb section of the mysql
docs, and most mysql folks don't even know they need to do it, since
they almost all use myisam table types.

If someone is more worried about postgresql's non-blocking, easily
scheduled vacuuming, but is using myisam tables, and a database that by
default allows numeric overflows to just insert the maximum possible
value, I wouldn't trust them with handling my paycheck with their
fan-boy database.

Re: why vacuum

From

Scott Marlowe

Date:

26 October 2005, 12:19:52

On Tue, 2005-10-25 at 23:45, Kenneth Gonsalves wrote:
> hi,
> i was in a minor flame war with a mysql guy - his major grouse was that 
> 'I wouldnt commit mission critical data to a database that needs to be 
> vacuumed once a week'. So why does pg need vacuum?

Oh man oh man.  After reading the article, I realized he was saying that
he wouldn't trust PostgreSQL to replace Oracle.  He apparently wouldn't
trust MySQL to replace oracle either.

But, the next time someone says that slony is a toy add on, and MySQL
has REAL replication, point them to THIS page on the same blog:

http://ebergen.net/wordpress/?p=70

In short, it basically shows that MySQL replication is incredibly
fragile, and not fit for production on any real system.  The lack of
system wide transaction support, like postgresql has, makes the problem
he outlines that much worse.

The hoops people will jump through to use their favorite toys...

Re: why vacuum

From

Jan Wieck

Date:

26 October 2005, 13:09:19

On 10/26/2005 11:19 AM, Scott Marlowe wrote:

> On Tue, 2005-10-25 at 23:45, Kenneth Gonsalves wrote:
>> hi,
>> i was in a minor flame war with a mysql guy - his major grouse was that 
>> 'I wouldnt commit mission critical data to a database that needs to be 
>> vacuumed once a week'. So why does pg need vacuum?
> 
> Oh man oh man.  After reading the article, I realized he was saying that
> he wouldn't trust PostgreSQL to replace Oracle.  He apparently wouldn't
> trust MySQL to replace oracle either.
> 
> But, the next time someone says that slony is a toy add on, and MySQL
> has REAL replication, point them to THIS page on the same blog:
> 
> http://ebergen.net/wordpress/?p=70
> 

You must have missed the FAQ and other side notes about replication in 
the MySQL manual. Essentially MySQL replication is nothing but a query 
duplicating system, with the added sugar of taking care of now() and 
some other non-deterministic things, but not all of them.

Non-deterministic user defined procedures, functions and triggers will 
simply blow MySQL's sophisticated replication apart.


Jan

> In short, it basically shows that MySQL replication is incredibly
> fragile, and not fit for production on any real system.  The lack of
> system wide transaction support, like postgresql has, makes the problem
> he outlines that much worse.
> 
> The hoops people will jump through to use their favorite toys...
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: why vacuum

From

Scott Marlowe

Date:

26 October 2005, 13:10:32

On Wed, 2005-10-26 at 11:09, Jan Wieck wrote:
> On 10/26/2005 11:19 AM, Scott Marlowe wrote:
> 
> > On Tue, 2005-10-25 at 23:45, Kenneth Gonsalves wrote:
> >> hi,
> >> i was in a minor flame war with a mysql guy - his major grouse was that 
> >> 'I wouldnt commit mission critical data to a database that needs to be 
> >> vacuumed once a week'. So why does pg need vacuum?
> > 
> > Oh man oh man.  After reading the article, I realized he was saying that
> > he wouldn't trust PostgreSQL to replace Oracle.  He apparently wouldn't
> > trust MySQL to replace oracle either.
> > 
> > But, the next time someone says that slony is a toy add on, and MySQL
> > has REAL replication, point them to THIS page on the same blog:
> > 
> > http://ebergen.net/wordpress/?p=70
> > 
> 
> You must have missed the FAQ and other side notes about replication in 
> the MySQL manual. Essentially MySQL replication is nothing but a query 
> duplicating system, with the added sugar of taking care of now() and 
> some other non-deterministic things, but not all of them.
> 
> Non-deterministic user defined procedures, functions and triggers will 
> simply blow MySQL's sophisticated replication apart.


True, but I never expected a CTRL-C to the mysql command line to break
replication.  Even for MySQL's lackadaisical behaviour, that's pretty
far out.

Re: why vacuum

From

Tom Lane

Date:

26 October 2005, 13:13:03

Scott Marlowe <smarlowe@g2switchworks.com> writes:
> On Tue, 2005-10-25 at 23:45, Kenneth Gonsalves wrote:
>> i was in a minor flame war with a mysql guy - his major grouse was that 
>> 'I wouldnt commit mission critical data to a database that needs to be 
>> vacuumed once a week'. So why does pg need vacuum?

> Oh man oh man.  After reading the article, I realized he was saying that
> he wouldn't trust PostgreSQL to replace Oracle.

Well, that's a slightly more respectable point of view, but Oracle has
surely got its own set of gotchas ... doesn't it still have issues if
you run a transaction that's large enough to overrun the fixed-size
rollback areas (or whatever they call them)?
        regards, tom lane

Re: why vacuum

From

Scott Marlowe

Date:

26 October 2005, 13:18:41

On Wed, 2005-10-26 at 11:12, Tom Lane wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
> > On Tue, 2005-10-25 at 23:45, Kenneth Gonsalves wrote:
> >> i was in a minor flame war with a mysql guy - his major grouse was that 
> >> 'I wouldnt commit mission critical data to a database that needs to be 
> >> vacuumed once a week'. So why does pg need vacuum?
> 
> > Oh man oh man.  After reading the article, I realized he was saying that
> > he wouldn't trust PostgreSQL to replace Oracle.
> 
> Well, that's a slightly more respectable point of view, but Oracle has
> surely got its own set of gotchas ... doesn't it still have issues if
> you run a transaction that's large enough to overrun the fixed-size
> rollback areas (or whatever they call them)?

Yep, and it also has the "snapshot too old" issue for long running
transactions (long running meaning how many other things have happened
since it started, not length of time).  Imagine starting a backup in
postgresql, getting an hour into it and suddenly it ends because
10,000,000 rows have been inserted while it was going.  That kind of
error.  My reply on that forum pointed out that EVERY database has
gotchas, and to pretend that your database's gotchas are ok but some
other database's gotchas are unacceptable is disingenuous at best.

Yes, pg does triggers first before asserting check constraints! Was Re: why vacuum

From

"Bath, David"

Date:

26 October 2005, 21:32:17

Tom,

After I wrote
> > Sybase/MS-SQL's check constraint model asserts the constraint
> > BEFORE the trigger, which discourages you from attempting to>
> > check and handle meaning of data! 
you wrote (2005-10-26 17:00)
> Er, doesn't PG do it that way too?

Well, it works for me!  In this case (with examples and caveats
below), postgresql (and Oracle) have got it right, which is a BIG
reason why you never get an Oracle guy to define Sybase/MS-SQL
systems - the other big reason is when you look at the sybase/mssql
nestlevel internal parameter - got the scars to prove it!

Quite frankly, if pg did not allow you to tidy things during
pre-insert and pre-update triggers, I wouldn't be so keen on it,
and stick to Oracle.

My general approach is:
1) Pre-* triggers are for tidying up the data to cover for  what is obviously a typo by user and their intent is clear
2) Post-* triggers are for propagating required data changes,  i.e. implications such as updating the "current balance"
attribute in a "customer account" record whenever the  dollar value in a transaction detail record changes.
 

Let me give a simple example:
1) Define attribute x as a varchar(whatever).
2) Ensure x has no leading/trailing whites  ... CHECK ((NOT (X ~ '^[ \t\n\r]')) AND (NOT (X ~ '[ \t\n\r]$')))
3) During "BEFORE INSERT" and "BEFORE UPDATE" (rowlevel) triggers, include  NEW.x := btrim(NEW.x, ' \t\n\r');
4) INSERT INTO y (x, ...) VALUES ('\tblah blahdy blah\n', ...)
5) SELECT x FROM y WHERE ...  Get 'blah blahdy blah' back.
Any processing overhead is trivial compared to the time wasted by
users, by a dba when the user's complain, or undesired application
behaviour when developers make invalid assumptions about the data.

Another useful example, based on my opinion/experience that
any change of arbitrary primary keys is imnsho wrong-headed
and recoverable, I usually do the following in a pre-update
row-level trigger, especially when pk is set from a sequence: NEW.pk := OLD.pk ;
or are least NEW.pk := coalesce(NEW.pk, OLD.pk)

Caveat:
This approach DOES NOT WORK if we
1) define a domain (z) as a varchar(whatever),
2) put the constraint on z
3) use domain z as the datatype for x
4) attempt to change x during pre* rowlevel triggers as above  as pg barfs as soon as you assign a value to a domain
that breaks its constraint.
 
This caveat prompted my recent question on this list about the
possibility of a "pre-assert trigger" on a domain.  (Version 9?
Pretty please with sugar on top?).

Wow! pg is even smarter than even YOU thought Tom!  For this to
be the case, you guys must be excellent designers and coders,
and I tips my lid to you.  Bugs are common, serendipitous useful
features are almost as rare as neonates with teeth.

-- 
dave.bath@unix.net
Question for the day: delta(dummies_guide, executive_summary)=?

Re: Yes, pg does triggers first before asserting check constraints! Was Re: why vacuum

From

Tom Lane

Date:

26 October 2005, 21:55:34

"Bath, David" <dave.bath@unix.net> writes:
> you wrote (2005-10-26 17:00)
>>> Sybase/MS-SQL's check constraint model asserts the constraint
>>> BEFORE the trigger, which discourages you from attempting to>
>>> check and handle meaning of data! 

>> Er, doesn't PG do it that way too?

> Well, it works for me!

You're right of course.  I was confusing this with a related problem
that people occasionally complain about, which is that datatype-related
errors are thrown before the trigger can do anything about them.  For
instance, you can't stuff "abc" into a int4 field and hope that the
trigger will have a chance to replace it with something valid.  (Yes,
people have asked for that :-()  A less silly example is that domain
constraints on a field of a domain type get checked before the trigger
can run.
        regards, tom lane

Re: why vacuum

From

Igor Shevchenko

Date:

26 October 2005, 22:34:06

On Wednesday 26 October 2005 17:44, Scott Marlowe wrote:
> On Tue, 2005-10-25 at 23:45, Kenneth Gonsalves wrote:
> > hi,
> > i was in a minor flame war with a mysql guy - his major grouse was that
> > 'I wouldnt commit mission critical data to a database that needs to be
> > vacuumed once a week'. So why does pg need vacuum?
>
> The absolutely funniest thing about what this guy is saying is that he
> seems rather ignorant of the behaviour of innodb tables.  They have
> another name for the vacuum command there.  It's:
>
> ALTER TABLE tbl_name ENGINE=INNODB
>
> Which rebuilds the whole fraggin's table, with an exclusive lock.
>
> and guess what innodb does if you don't run this command every so often?
>
> Can you guess yet?  Yep, that's right, it just keeps growing and growing
> and growing.

Not quite so.

I'm running quite a few (>50) mysql/innodb servers with database sizes raging 
from 500mb to 50gb, and I never had to rebuild any innodb tables this way. 
InnoDB uses index-based data storage and rollback segments, which makes it 
harder to add bloat to their databases, as compared to PG (but autovacuum is 
my saviour). Innodb will actually free space when you do DELETE or TRUNCATE, 
but still, it's tables, indexes and tablespaces will get fragmented. This 
gets worse over time, but it had never been a big problem for me. My 
databases do 50 queries/second on average, 24/7. Note - all of this can be 
due to my access and data change patterns; YMMV. The "only" cleanup operation 
I do is CHECK/OPTIMIZE, on monthly basis; it's not much better than old PG's 
VACUUM, as it brings mysql/innodb's performance down by 5x-10x times; same 
goes for almost any long-running query.

I'm moving those servers to PG, due to this (concurrency) and other reasons. 
My top 3 reasons are: a much better concurrency (even with bg vacuums 
running :-), a much better planner, and PG's rich feature set.

-- 
Best Regards,
Igor Shevchenko

Re: why vacuum

From

Mario Splivalo

Date:

27 October 2005, 05:22:39

On Wed, 2005-10-26 at 10:19 -0500, Scott Marlowe wrote:

> But, the next time someone says that slony is a toy add on, and MySQL
> has REAL replication, point them to THIS page on the same blog:
> 
> http://ebergen.net/wordpress/?p=70
> 
> In short, it basically shows that MySQL replication is incredibly
> fragile, and not fit for production on any real system.  The lack of
> system wide transaction support, like postgresql has, makes the problem
> he outlines that much worse.
> 
> The hoops people will jump through to use their favorite toys...

I see no point in blatantly putting 'other' products such shape. Pgsql
offers no replication at all, you need to use slony (wich is also a poor
replacement for a wannabe replication), or some other commercial
products. What about 2PC? What about linking the databases from
different servers?

I've been using MSSQL for over 6 years now, started with MSSQL7 and went
on with MSSQL2000. The replication it offers is superb! It runs
smoothly, you have click-me-click interface with wich you can create
publications and deploy them to subscribers with ease. Ok, there are
gotchas (and they-re ms-style funny), but it's all well documented, and
works most of the time as expected. So what?

Btw, I 'ported' the merge replication from MSSQL to postgres. It
basicaly adds triggers to every table that is 'published' for
replication. There is a separate table to store and calculate the change
differences from several servers (so you could do update on any of the
servers and change will be propagated to the others). I'm missing 2PC
badly here, I wrote some stupid python 'thingie' wich should act as 2PC
serializer, but that's slow as hell. And triggers slow down postgres
quite a bit.

So, to end this 'my father has bigger car than yours' debate, when will
postgres have two phase commit protocol implemented? I presume that
should come after you allow something like SELECT someCol FROM
myServer.myDatabase[Schema].myTable...
Mike
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."

Re: why vacuum

From

Mario Splivalo

Date:

27 October 2005, 05:22:48

On Wed, 2005-10-26 at 12:09 -0400, Jan Wieck wrote:
> > 
> 
> You must have missed the FAQ and other side notes about replication in 
> the MySQL manual. Essentially MySQL replication is nothing but a query 
> duplicating system, with the added sugar of taking care of now() and 
> some other non-deterministic things, but not all of them.
> 
> Non-deterministic user defined procedures, functions and triggers will 
> simply blow MySQL's sophisticated replication apart.
> 

That is just not true. I haven't tried MySQL 4 and above, but in 3.5.x
replication wos working as expected, and the load that replication posed
was insignificant. The only TRUE problem was that replication was
unidirectional. That SAME problem has Slony, and other 'replication
systems' available for postgres.
Mike
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."

Re: why vacuum

From

Andrew Sullivan

Date:

27 October 2005, 07:21:23

On Thu, Oct 27, 2005 at 10:22:41AM +0200, Mario Splivalo wrote:
> offers no replication at all, you need to use slony (wich is also a poor
> replacement for a wannabe replication), or some other commercial
> products. What about 2PC? What about linking the databases from

Slony is in fact a community-supported system; so I don't know why
you think that amounts to "no replication at all".  And since this is
a community-supported system, it'd be nice if you said why it's a
"poor replacement for wannabe replication".  What's wrong with it?

> Btw, I 'ported' the merge replication from MSSQL to postgres. It
> basicaly adds triggers to every table that is 'published' for
> replication. There is a separate table to store and calculate the change
> differences from several servers (so you could do update on any of the
> servers and change will be propagated to the others). I'm missing 2PC
> badly here, I wrote some stupid python 'thingie' wich should act as 2PC
> serializer, but that's slow as hell. And triggers slow down postgres
> quite a bit.

This is interesting.  Care to package it up for others, or write a
proof-of-concept outline for the lists or General Bits or something
like that?  This is a different sort of replication people are asking
for.  Note that you get 2PC in the next Postgres release.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
When my information changes, I alter my conclusions.  What do you do sir?    --attr. John Maynard Keynes

Re: why vacuum

From

Mario Splivalo

Date:

27 October 2005, 09:34:08

On Thu, 2005-10-27 at 06:21 -0400, Andrew Sullivan wrote:
> On Thu, Oct 27, 2005 at 10:22:41AM +0200, Mario Splivalo wrote:
> > offers no replication at all, you need to use slony (wich is also a poor
> > replacement for a wannabe replication), or some other commercial
> > products. What about 2PC? What about linking the databases from
> 
> Slony is in fact a community-supported system; so I don't know why
> you think that amounts to "no replication at all".  And since this is
> a community-supported system, it'd be nice if you said why it's a
> "poor replacement for wannabe replication".  What's wrong with it?

Postgres itself offers no replication. You could achive some sort of
replication by restoring the parts of WAL files, but that's rather
inconvinient. Then, if you want to replicate your data in any way, you
need to take slony, or whatever is existant out there, commercial or
open-free-source.
I appologize here if I insulted the slony developers, I ment nothing
like that :) I am sorry. I was just addressing the issue where
replication to one means 'just move my data here', and at others it
means 'merge my data'.

> > Btw, I 'ported' the merge replication from MSSQL to postgres. It
> > basicaly adds triggers to every table that is 'published' for
> > replication. There is a separate table to store and calculate the change
> > differences from several servers (so you could do update on any of the
> > servers and change will be propagated to the others). I'm missing 2PC
> > badly here, I wrote some stupid python 'thingie' wich should act as 2PC
> > serializer, but that's slow as hell. And triggers slow down postgres
> > quite a bit.
> 
> This is interesting.  Care to package it up for others, or write a
> proof-of-concept outline for the lists or General Bits or something
> like that?  This is a different sort of replication people are asking
> for.  Note that you get 2PC in the next Postgres release.

I'll be glad to, I'm just not that familiar (in fact, i'm not familiar
at all) with the 'proof-of-cocept' or 'General Bits' terms, so if you
could http-redirect me, I'll be  most thankfull.

Mentioning the 2PC, is it available in pg8.1beta4?
Mike
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."

Re: why vacuum

From

Richard Huxton

Date:

27 October 2005, 10:25:25

Mario Splivalo wrote:
> On Wed, 2005-10-26 at 12:09 -0400, Jan Wieck wrote:
>
>>You must have missed the FAQ and other side notes about replication in
>>the MySQL manual. Essentially MySQL replication is nothing but a query
>>duplicating system, with the added sugar of taking care of now() and
>>some other non-deterministic things, but not all of them.
>>
>>Non-deterministic user defined procedures, functions and triggers will
>>simply blow MySQL's sophisticated replication apart.
>
> That is just not true. I haven't tried MySQL 4 and above, but in 3.5.x
> replication wos working as expected, and the load that replication posed
> was insignificant. The only TRUE problem was that replication was
> unidirectional. That SAME problem has Slony, and other 'replication
> systems' available for postgres.

I think you're missing the "non-deterministic" bit. Mysql replication is
based on shipping statements AFAICT

http://dev.mysql.com/doc/refman/5.0/en/binary-log.html
"The binary log contains all statements which updated data or
potentially could have updated it (for example, a DELETE which matched
no rows). Statements are stored in the form of “events” that describe
the modifications."

So - if your statement contains something non-deterministic that isn't
catered for in Mysql's code then it will break.

At it's simplest - if I write a function my_random() and then do:  UPDATE foo SET a=1 WHERE b < my_random();
IF my_random() returns different results on different machines, then the
replication will be broken. See the manual entry below:
http://dev.mysql.com/doc/refman/5.0/en/replication-features.html

That's not to say the system is worthless - it works fine for many
people. But it does have limitations.

--  Richard Huxton  Archonet Ltd

Re: why vacuum

From

Jan Wieck

Date:

27 October 2005, 10:47:40

On 10/27/2005 4:22 AM, Mario Splivalo wrote:

> On Wed, 2005-10-26 at 12:09 -0400, Jan Wieck wrote:
>> > 
>> 
>> You must have missed the FAQ and other side notes about replication in 
>> the MySQL manual. Essentially MySQL replication is nothing but a query 
>> duplicating system, with the added sugar of taking care of now() and 
>> some other non-deterministic things, but not all of them.
>> 
>> Non-deterministic user defined procedures, functions and triggers will 
>> simply blow MySQL's sophisticated replication apart.
>> 
> 
> That is just not true. I haven't tried MySQL 4 and above, but in 3.5.x
> replication wos working as expected, and the load that replication posed
> was insignificant. The only TRUE problem was that replication was
> unidirectional. That SAME problem has Slony, and other 'replication
> systems' available for postgres.

Read again. "Non-deterministic user defined procedures ...". Please give
me *ONE* example of a user defined procedure in 3.5.x at all. You must
have missed the point that stored procedures are a new feature in 5.0.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: why vacuum

From

Jan Wieck

Date:

27 October 2005, 11:13:47

On 10/27/2005 4:22 AM, Mario Splivalo wrote:

> I see no point in blatantly putting 'other' products such shape. Pgsql
> offers no replication at all, you need to use slony (wich is also a poor
> replacement for a wannabe replication), or some other commercial
> products. What about 2PC? What about linking the databases from
> different servers?

I agree that Scott's comment was a bit harsh. But would you please add a 
few details that explain what makes Slony a "poor replacement" in your 
opinion? And please don't repeat that stupid "not builtin". Any add-on 
is as good as its reliability and features. Or would you at the same 
time say that MySQL has only a poor replacement for wannabe transactions 
and foreign keys, because their storage engines are in fact add-ons?

> Btw, I 'ported' the merge replication from MSSQL to postgres. It
> basicaly adds triggers to every table that is 'published' for
> replication. There is a separate table to store and calculate the change
> differences from several servers (so you could do update on any of the
> servers and change will be propagated to the others). I'm missing 2PC
> badly here, I wrote some stupid python 'thingie' wich should act as 2PC
> serializer, but that's slow as hell. And triggers slow down postgres
> quite a bit.

Would you consider publishing that code under the BSD license? It sounds 
very much like one of the "other add-on replication systems" our users 
keep asking for. If you can't publish the code, do you have any design 
papers or a technical concept that could be used as a base for a new 
PostgreSQL community project?

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: why vacuum

From

Andrew Sullivan

Date:

27 October 2005, 12:51:47

On Thu, Oct 27, 2005 at 02:34:13PM +0200, Mario Splivalo wrote:
> 
> Postgres itself offers no replication. You could achive some sort of
> replication by restoring the parts of WAL files, but that's rather
> inconvinient. Then, if you want to replicate your data in any way, you

Well, AFAIK Oracle itself offers no replication, either.  If you want
it, you have to buy a license for it.  Which means it's an add-on. 
Heck, most Linux distributions' kernels don't offer support for
network cards: they're an add-on.  We call them modules.  (In case
it's not clear, I don't buy the "itself/add-on" distinction.  The
point is that the whole system works together.  PostgreSQL most
definitely offers replication.  In fact, you can get warm-standby with
WAL shipping, or read-only capabilities with Slony or some other
tools.)

> like that :) I am sorry. I was just addressing the issue where
> replication to one means 'just move my data here', and at others it
> means 'merge my data'.

Yes: this multiple-meaning "replication" word causes a great deal of
confusion.  But just because one person has need A does not mean that
need B isn't a real one.  My employer, Afilias, sponsors the Slony
work, in the direction of our needs.  We did not initially have a
many-write-nodes scenario in mind, and we had a pressing need for a
"single master" system.  So that's the itch we scratched.

> I'll be glad to, I'm just not that familiar (in fact, i'm not familiar
> at all) with the 'proof-of-cocept' or 'General Bits' terms, so if you
> could http-redirect me, I'll be  most thankfull.

Proof of concept is just a description of what you did, how it
worked, design and limitations, &c.  Post it to the lists (uh,
-general or maybe -hackers, I suppose), or put it on a web page or
whatever.  General Bits is a fairly regular column that Elein Mustain
puts out.  I bet she'd include a submission on this topic, although
you'd have to ask her.  You can find GB at
<http://varlena.com/varlena/GeneralBits/>.

> Mentioning the 2PC, is it available in pg8.1beta4?

Should be.

A
-- 
Andrew Sullivan  | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.    --George Orwell

Re: why vacuum

From

Jan Wieck

Date:

27 October 2005, 14:25:35

On 10/27/2005 8:34 AM, Mario Splivalo wrote:

> Postgres itself offers no replication. 

Oracle itself offers no replication.
IBM DB2 itself offers no replication.

Yet most of the products out there for Oracle, DB2 and PostgreSQL are 
far better than what I read here:
    http://dev.mysql.com/doc/refman/5.0/en/replication-features.html

What a great example of "offering replication itself" ... *cough*

I guess it'd be easier to port a well designed add-on like Slony-I to 
MySQL (now that they have triggers) than fixing their broken design. Not 
that I volunteer to do it, but Slony-I is released under BSD, anyone 
feel free to offer a good solution for MySQL.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: why vacuum

From

"Jim C. Nasby"

Date:

27 October 2005, 16:24:50

On Thu, Oct 27, 2005 at 02:21:15PM +0100, Richard Huxton wrote:
> So - if your statement contains something non-deterministic that isn't 
> catered for in Mysql's code then it will break.
> 
> At it's simplest - if I write a function my_random() and then do:
>   UPDATE foo SET a=1 WHERE b < my_random();
> IF my_random() returns different results on different machines, then the 
> replication will be broken. See the manual entry below:
>   http://dev.mysql.com/doc/refman/5.0/en/replication-features.html
> 
> That's not to say the system is worthless - it works fine for many 
> people. But it does have limitations.

And you can easily have multi-master syncronous replication in
PostgreSQL using the same idea; just see pgCluster.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: why vacuum

From

Mario Splivalo

Date:

28 October 2005, 08:09:37

On Thu, 2005-10-27 at 11:51 -0400, Andrew Sullivan wrote:

> Well, AFAIK Oracle itself offers no replication, either.  If you want
> it, you have to buy a license for it.  Which means it's an add-on. 
> Heck, most Linux distributions' kernels don't offer support for
> network cards: they're an add-on.  We call them modules.  (In case
> it's not clear, I don't buy the "itself/add-on" distinction.  The
> point is that the whole system works together.  PostgreSQL most
> definitely offers replication.  In fact, you can get warm-standby with
> WAL shipping, or read-only capabilities with Slony or some other
> tools.)

I wish I was Dumbo now. I could hide myself with me ears :) 

> > I'll be glad to, I'm just not that familiar (in fact, i'm not familiar
> > at all) with the 'proof-of-cocept' or 'General Bits' terms, so if you
> > could http-redirect me, I'll be  most thankfull.
> 
> Proof of concept is just a description of what you did, how it
> worked, design and limitations, &c.  Post it to the lists (uh,
> -general or maybe -hackers, I suppose), or put it on a web page or
> whatever.  General Bits is a fairly regular column that Elein Mustain
> puts out.  I bet she'd include a submission on this topic, although
> you'd have to ask her.  You can find GB at
> <http://varlena.com/varlena/GeneralBits/>.

I'll go trough my code, it's been a while since I touched it, I'll write
some documentation and I'll inform the comunity. Thnx for the pointouts.
Mike
-- 
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."

Re: why vacuum

From

Andrew Sullivan

Date:

28 October 2005, 10:34:34

On Fri, Oct 28, 2005 at 01:09:43PM +0200, Mario Splivalo wrote:
> I'll go trough my code, it's been a while since I touched it, I'll write
> some documentation and I'll inform the comunity. Thnx for the pointouts.

And thank _you_ for proposing to do this.  If everyone contributes
their discoveries and improvements, we all benefit.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
When my information changes, I alter my conclusions.  What do you do sir?    --attr. John Maynard Keynes