Thread: PostgreSQL on 64 bit Linux

PostgreSQL on 64 bit Linux

From
Naz Gassiep
Date:
I have a PostgreSQL installation on a Debian box that had the 64bit SMP 
kernel installed before PostgreSQL was compiled and installed on it. 
Does PostgreSQL take any advantage of the 64 bit environment or have we 
not done anything to move into the 64 bit world yet?
Regards,
- Naz


Re: PostgreSQL on 64 bit Linux

From
Douglas McNaught
Date:
Naz Gassiep <naz@mira.net> writes:

> I have a PostgreSQL installation on a Debian box that had the 64bit
> SMP kernel installed before PostgreSQL was compiled and installed on
> it. Does PostgreSQL take any advantage of the 64 bit environment or
> have we not done anything to move into the 64 bit world yet?

Depends on whether PG was compiled as 64-bit or 32-bit--is your
toolchain 64-bit all the way, or is it just the kernel?

-Doug


Re: PostgreSQL on 64 bit Linux

From
Naz Gassiep
Date:
Douglas McNaught wrote:
> Naz Gassiep <naz@mira.net> writes:
>
>> I have a PostgreSQL installation on a Debian box that had the 64bit
>> SMP kernel installed before PostgreSQL was compiled and installed on
>> it. Does PostgreSQL take any advantage of the 64 bit environment or
>> have we not done anything to move into the 64 bit world yet?
>
> Depends on whether PG was compiled as 64-bit or 32-bit--is your
> toolchain 64-bit all the way, or is it just the kernel?
>
> -Doug

I just compiled as the manual says. I guess I must have compiled it in
32. I'll recompile in 64 when I upgrade to 8.2 when it's out.
Thanks,
- Naz.

Re: PostgreSQL on 64 bit Linux

From
Douglas McNaught
Date:
Naz Gassiep <naz@mira.net> writes:

> I just compiled as the manual says. I guess I must have compiled it
> in 32. I'll recompile in 64 when I upgrade to 8.2 when it's out.

The 'file' command will tell you whether a binary is 32- or 64-bit.

If you have a full 64-bit install, you'll get a 64-bit compile by
default, but it sounds like you just added a 64-bit kernel to a 32-bit
Debian system?

-Doug


Re: PostgreSQL on 64 bit Linux

From
mark@mark.mielke.cc
Date:
On Sun, Aug 20, 2006 at 04:46:30PM -0400, Douglas McNaught wrote:
> Naz Gassiep <naz@mira.net> writes:
> > I have a PostgreSQL installation on a Debian box that had the 64bit
> > SMP kernel installed before PostgreSQL was compiled and installed on
> > it. Does PostgreSQL take any advantage of the 64 bit environment or
> > have we not done anything to move into the 64 bit world yet?
> Depends on whether PG was compiled as 64-bit or 32-bit--is your
> toolchain 64-bit all the way, or is it just the kernel?

I think he means: have benchmarks or profiling been done with the
goal of specifically improving performance on 64-bit platforms?

For most applications available today, the answer is no. Compiling an
application designed for 32-bit on a 64-bit architecture does not
automatically improve performance. Too frequently, it can actually
reduce performance. Pointers are larger, which means that any
application that is heavily pointer-based can be forced to move around
twice as much pointer data, which reduces the effectiveness of the
various cache levels, and of RAM itself.
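
To illustrate the point, here is a tiny, hypothetical C example (not
from PostgreSQL) showing how a pointer-heavy node grows on an LP64
build, and how many fewer of them fit in a cache line:

#include <stdio.h>

/* Hypothetical node type, just to show the effect described above: the
 * three pointers are 4 bytes each on ILP32 and 8 bytes each on LP64, so
 * the node grows from roughly 16 to 32 bytes and only half as many fit
 * in a 64-byte cache line. */
struct tree_node {
    struct tree_node *left;
    struct tree_node *right;
    struct tree_node *parent;
    int               key;
};

int main(void)
{
    printf("sizeof(struct tree_node)     = %zu bytes\n", sizeof(struct tree_node));
    printf("nodes per 64-byte cache line = %zu\n", 64 / sizeof(struct tree_node));
    return 0;
}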

Hopefully GLIBC helps here, in that it should contain 64-bit-specific
code where it matters, so libc calls should be able to take advantage
of the 64-bit machine instructions.

Is there an interest, or any active project to examine PostgreSQL in
the area of 64-bit processors? Has it already been done? I don't recall
seeing a reference to it in my travels. I'm also not sure what to
expect for results, as the territory is still new. 64-bit processors
have existed for a while, but 32-bit processors have been the popular
choice, making 64-bit support an afterthought?

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: PostgreSQL on 64 bit Linux

From
"Andrej Ricnik-Bay"
Date:
On 8/21/06, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:

> Is there an interest, or any active project to examine PostgreSQL in
> the area of 64-bit processors? Has it already been done? I don't recall
> seeing a reference to it in my travels. I'm also not sure on what to
> expect for results, as the territory is still new. 64-bit processors
> have existed for a while, but 32-bit processors have been the popular
> choice, making 64-bit support an after thought?
That's certainly just a reference to the wintel world? AIX, HP-UX
and Solaris-Sparc have been 64-bit for a while now...


> Cheers,
> mark
Cheers,
Andrej


-- 
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm


Re: PostgreSQL on 64 bit Linux

From
Douglas McNaught
Date:
mark@mark.mielke.cc writes:

> Is there an interest, or any active project to examine PostgreSQL in
> the area of 64-bit processors? Has it already been done? I don't recall
> seeing a reference to it in my travels. I'm also not sure on what to
> expect for results, as the territory is still new. 64-bit processors
> have existed for a while, but 32-bit processors have been the popular
> choice, making 64-bit support an after thought?

I find this question a bit amusing, since PG has run on 64-bit
architectures such as MIPS, Sparc, Alpha and PA-RISC for quite a while
now. :)

As I said in a private email to Naz, the main advantage I think you'd
see from 64-bit is the ability to run with more than 2GB or so of
shared buffers on a system with lots of RAM.  Whether you'd want to do
that, or let the OS do most of the buffering, is an open question...

-Doug


Re: PostgreSQL on 64 bit Linux

From
"Luke Lonergan"
Date:
Naz,

On 8/20/06 12:59 PM, "Naz Gassiep" <naz@mira.net> wrote:

> I have a PostgreSQL installation on a Debian box that had the 64bit SMP
> kernel installed before PostgreSQL was compiled and installed on it.
> Does PostgreSQL take any advantage of the 64 bit environment or have we
> not done anything to move into the 64 bit world yet?

Very likely the default gcc compiles for 64-bit; if not, you need to specify
"-m64".  As another respondent said, do a "file `which initdb`" to find out
whether you have compiled for 64-bit or not.
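
For completeness, here is a trivial check of what the toolchain targets
by default (not PostgreSQL code; just a sketch of the idea):

#include <stdio.h>

/* Build with plain "cc check.c" and run it: if the default target is
 * 64-bit (LP64), pointers and long are 8 bytes; if it prints 4, the
 * toolchain is still producing 32-bit objects and PostgreSQL built
 * with it will be 32-bit too unless -m64 is added. */
int main(void)
{
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    printf("sizeof(long)   = %zu\n", sizeof(long));
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
    return 0;
}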

WRT 64-bit and Postgres, it depends on the CPU as to whether you see a
simple performance benefit.  On the Opteron you will see a benefit when
doing CPU bound work.  When doing the CPU portion, the additional registers
of the Opteron running in 64-bit mode are used by the compiler to produce a
20-30% boost in performance.  On the Xeon in 64-bit mode, the same regions
of execution will slow down by about 5%.

Postgres benefits automatically from the larger memory addressing of the
64-bit kernel by using the larger I/O cache of Linux.

- Luke 




Re: PostgreSQL on 64 bit Linux

From
"Joshua D. Drake"
Date:
> WRT 64-bit and Postgres, it depends on the CPU as to whether you see a
> simple performance benefit.  On the Opteron you will see a benefit when
> doing CPU bound work.  When doing the CPU portion, the additional registers
> of the Opteron running in 64-bit mode are used by the compiler to produce a
> 20-30% boost in performance.  On the Xeon in 64-bit mode, the same regions
> of execution will slow down by about 5%.

Is that true of even Woodcrest?

Joshua D. Drake

> 
> Postgres benefits automatically from the larger memory addressing of the
> 64-bit kernel by using the larger I/O cache of Linux.
> 
> - Luke 
> 


-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
 




Replication

From
mdean
Date:
One person who commented on "The business of PostgreSQL" made this comment:

Posted Aug 3, 2006 8:45 UTC (Thu) by subscriber jgarzik
[Link: <http://lwn.net/Articles/193946/>]

Cluster immaturity. MySQL has been
shipping a workable single-master replication+failover for quite a while 
now in most Linux distros. MySQL's multi-master solution, while 
requiring RAM (not disk) for storage, is also well-integrated and 
deployed in production.

In contrast, the PostgreSQL team has chosen to provide hooks for 
replication and failover. This has led to a situation where there are 
multiple projects supporting replications/failover, none of which are 
production-ready nor shipped in a modern Linux distro.

Modern systems *must* scale beyond a single computer, and the PostgreSQL 
support shipped in modern Linux distros is completely incapable of this.


I really would appreciate a response. Thanks, Michael







Re: Replication

From
"Joshua D. Drake"
Date:
> In contrast, the PostgreSQL team has chosen to provide hooks for 
> replication and failover. This has led to a situation where there are 
> multiple projects supporting replications/failover, none of which are 
> production-ready nor shipped in a modern Linux distro.

And no, we don't really provide hooks :). However there are several 
projects trying to solve different problems with PostgreSQL.

> 
> Modern systems *must* scale beyond a single computer, and the PostgreSQL 
> support shipped in modern Linux distros is completely incapable of this.

Slony-I is quite capable as a production class FOSS replication system 
and is in use widely.

Joshua D. Drake


-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
 




Re: PostgreSQL on 64 bit Linux

From
"Luke Lonergan"
Date:
Josh,

On 8/20/06 8:52 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:

> Is that true of even Woodcrest?
> 
> Joshua D. Drake

Not sure - haven't read anything about the register set on the Core 2 to
make me think it benefits from 64 bit.

The point may be academic from now on though - the comparisons between
Opteron and Core 2 will all likely be in 64-bit mode from now on.

- Luke




Re: Replication

From
Fujii Masao
Date:
Joshua D. Drake wrote:
>> Modern systems *must* scale beyond a single computer, and the PostgreSQL 
>> support shipped in modern Linux distros is completely incapable of this.
> 
> Slony-I is quite capable as a production class FOSS replication system 
> and is in use widely.

Slony-I is not enough because it can lead to data inconsistency between servers.
IMO, log-based replication is also needed for PostgreSQL, just as in MySQL.


Regards;



Re: Replication

From
Stefan Kaltenbrunner
Date:
Fujii Masao wrote:
> Joshua D. Drake wrote:
>>> Modern systems *must* scale beyond a single computer, and the
>>> PostgreSQL support shipped in modern Linux distros is completely
>>> incapable of this.
>>
>> Slony-I is quite capable as a production class FOSS replication system
>> and is in use widely.
> 
> Slony-I is not enough because it can cause the inconsistency of data
> between servers.

Hmm, what are you referring to here? Slony1 does row-level replication
(something that MySQL cannot do until 5.1, which is still beta) - so it
should not be possible to cause data inconsistency.
It is however async replication, so you can lose data committed on the
master but not yet replicated to the slaves in case you lose the master
completely.


Stefan


Re: Replication

From
Fujii Masao
Date:
Stefan Kaltenbrunner wrote:
> It is however async replication so you can loose data commited on the
> master but not yet replicated to the slaves in case you loose the master
>  completely.

Yes, that is a weak point of Slony-I, I think.
Most systems will not permit committed data to be lost, so its use is limited.



>> IMO, log-based replication is needed also for PostgreSQL just like MySQL.

Well, I had misunderstood MySQL. Its replication is also asynchronous.

regards;


Re: PostgreSQL on 64 bit Linux

From
mark@mark.mielke.cc
Date:
On Mon, Aug 21, 2006 at 02:56:10PM +1200, Andrej Ricnik-Bay wrote:
> On 8/21/06, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote:
> >Is there an interest, or any active project to examine PostgreSQL in
> >the area of 64-bit processors? Has it already been done? I don't recall
> >seeing a reference to it in my travels. I'm also not sure on what to
> >expect for results, as the territory is still new. 64-bit processors
> >have existed for a while, but 32-bit processors have been the popular
> >choice, making 64-bit support an after thought?
> That's certainly just a reference to the wintel world? AIX, HP-UX
> and Solaris-Sparc have been 64-bit for a while now...

I don't think so. In the Open Source world, most projects are still 32-bit
centric, regardless of how many years the products have been supported on
64-bit platforms.

What application were you thinking of that takes full advantage of 64-bit,
making the 64-bit application significantly faster than the 32-bit
application? The only area I am aware of is video processing.

It's often a surprise to people that an upgrade to 64-bit, regardless of
CPU architecture, quite often ends up slower rather than faster.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: PostgreSQL on 64 bit Linux

From
mark@mark.mielke.cc
Date:
On Sun, Aug 20, 2006 at 11:00:26PM -0400, Douglas McNaught wrote:
> mark@mark.mielke.cc writes:
> > Is there an interest, or any active project to examine PostgreSQL in
> > the area of 64-bit processors? Has it already been done? I don't recall
> > seeing a reference to it in my travels. I'm also not sure on what to
> > expect for results, as the territory is still new. 64-bit processors
> > have existed for a while, but 32-bit processors have been the popular
> > choice, making 64-bit support an after thought?
> I find this question a bit amusing, since PG has run on 64-bit
> architectures such as MIPS, Sparc, Alpha and PA-RISC for quite a while
> now. :)

I don't think so. Software can be designed to take best advantage of the
hardware. Recompiling it for a different architecture, running test
cases, and declaring support is not the same as optimizing for it.

> As I said in a private email to Naz, the main advantage I think you'd
> see from 64-bit is the ability to run with more than 2GB or so of
> shared buffers on a system with lots of RAM.  Whether you'd want to do
> that, or let the OS do most of the buffering, is an open question...

This is what I mean by afterthought. PostgreSQL is designed for
32-bit processors. Which is fine. I'm not complaining. The question
was whether there is an interest in pursuing 64-bit specific
optimizations. In the PostgreSQL code, a quick check points me only to
"has long int 64" as a 64-bit source code #ifdef. Of the six places
that reference this, five of them actually slow down the code, as they
check for overflow of the 'long int' result beyond 4 bytes of
data. The sixth place is used to define the 64-bit type in use by
PostgreSQL, which I suspect is infrequently used.

I believe the answer is no. No or few 64-bit optimization possibilities
have been chased down, probably because some or many of these would:
   1) require significant re-architecture
   2) reduce the performance in a 32-bit world

It's a question that only half interests me. As with most projects, I
don't think the project is ready to re-architect for this
purpose. Perhaps once 50%+ of people are running PostgreSQL in 64-bit
mode, the question will be more serious to more people.

Even if it only half interests me, I'm defending it as a valid question.
Please don't write it off, but it is fine to say "not yet, we have more
important things to work on".

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: PostgreSQL on 64 bit Linux

From
"Florian G. Pflug"
Date:
mark@mark.mielke.cc wrote:
> This is what I mean by after thought. PostgreSQL is designed for
> 32-bit processors. Which is fine. I'm not complaining. The question
> was whether there is an interest in pursuing 64-bit specific
> optimizations. In the PostgreSQL code, a quick check points me only to
> "has long int 64" as a 64-bit source code #ifdef. Of the six places
> that reference this, five of them actually slow down the code, as they
> check for overflow of the 'long int' result beyond 4 bytes of
> data. The sixth place is used to define the 64-bit type in use by
> PostgreSQL, which I suspect is infrequently used.
> 
> I believe the answer is no. No or few 64-bit optimization possibilities
> have been chased down, probably because some or many of these would:
> 
>     1) require significant re-architecture
> 
>     2) reduce the performance in a 32-bit world

Just out of interest - what areas in postgres do you think could be
improved (performance-wise) on 64-bit machines? The only area that
I can see is the int64 datatype - it's stored in palloc()'ed memory
on 32-bit machines AFAIK - I'm not sure if it uses the "long long"
datatype on 64-bit archs. But I can't imagine any other area that
could be tuned by making use of (native) 64-bit ints.

greetings, Florian Pflug


Re: PostgreSQL on 64 bit Linux

From
Martijn van Oosterhout
Date:
On Mon, Aug 21, 2006 at 09:16:46AM -0400, mark@mark.mielke.cc wrote:
> This is what I mean by after thought. PostgreSQL is designed for
> 32-bit processors. Which is fine. I'm not complaining. The question
> was whether there is an interest in pursuing 64-bit specific
> optimizations. In the PostgreSQL code, a quick check points me only to
> "has long int 64" as a 64-bit source code #ifdef. Of the six places
> that reference this, five of them actually slow down the code, as they
> check for overflow of the 'long int' result beyond 4 bytes of
> data. The sixth place is used to define the 64-bit type in use by
> PostgreSQL, which I suspect is infrequently used.

There are two defines, the end result being to declare an int64 type
which is used a fair bit around the place, bigint and bigserial
being the obvious ones.

The checks I see relate to strtol, where the code only wants an int4.
There's no strtoi, so on 32-bit the range check is built in, but if long
is 64-bit you have to do the check separately.

That's just an interface problem, there's not a lot we can do about
that really.
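
As a rough sketch of the check being described (modeled on the pattern,
not copied from the PostgreSQL source; parse_int4 is a made-up name):

#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* strtol() returns a long, so when long is 64 bits a value that overflows
 * int4 still parses "successfully" and has to be range-checked by hand;
 * with a 32-bit long, strtol itself reports the overflow via ERANGE. */
static int parse_int4(const char *s, int32_t *out)
{
    char *end;
    long  val;

    errno = 0;
    val = strtol(s, &end, 10);
    if (end == s || errno == ERANGE)
        return -1;                      /* not a number, or overflowed long */
#if LONG_MAX > INT32_MAX                /* 64-bit long: extra check needed */
    if (val > INT32_MAX || val < INT32_MIN)
        return -1;
#endif
    *out = (int32_t) val;
    return 0;
}

int main(void)
{
    int32_t v;
    printf("2147483646 -> %s\n", parse_int4("2147483646", &v) == 0 ? "ok" : "out of range");
    printf("9999999999 -> %s\n", parse_int4("9999999999", &v) == 0 ? "ok" : "out of range");
    return 0;
}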

> I believe the answer is no. No or few 64-bit optimization possibilities
> have been chased down, probably because some or many of these would:
>
>     1) require significant re-architecture
>
>     2) reduce the performance in a 32-bit world

Can you think of any places at all where 64-bit would make a difference
to processing? 64-bit gives you more memory, and on some x86 chips, more
registers, but that's it.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.

Re: PostgreSQL on 64 bit Linux

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Can you think of any places at all where 64-bit would make a difference
> to processing? 64-bit gives you more memory, and on some x86 chips, more
> registers, but that's it.

It would be interesting to look into making int8 and float8 be
pass-by-value datatypes to save palloc overhead on machines where
Datum is going to be 8 bytes wide anyway.  Of course this would
only help queries that use those datatypes extensively, and it
might be an insignificant savings anyhow :-(
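
A toy model of that idea (Datum, the conversion functions and the
malloc() stand-in for palloc() below are all simplified, made-up
versions, not the real fmgr code):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Datum;   /* 4 bytes on ILP32, 8 bytes on LP64 */

/* pass-by-reference: always possible, but costs one allocation per value */
static Datum Float8GetDatumByRef(double x)
{
    double *p = malloc(sizeof(double));   /* stands in for palloc() */
    *p = x;
    return (Datum) p;
}

/* pass-by-value: only possible when the Datum is at least 8 bytes wide */
static Datum Float8GetDatumByVal(double x)
{
    Datum d = 0;
    memcpy(&d, &x, sizeof(d) < sizeof(x) ? sizeof(d) : sizeof(x));
    return d;
}

int main(void)
{
    if (sizeof(Datum) >= sizeof(double)) {
        Datum  d = Float8GetDatumByVal(3.14);
        double x;
        memcpy(&x, &d, sizeof(double));
        printf("by value:     %g (no allocation needed)\n", x);
    } else {
        Datum d = Float8GetDatumByRef(3.14);
        printf("by reference: %g (one allocation per datum)\n", *(double *) d);
        free((void *) d);
    }
    return 0;
}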

For the most part though I think that the main advantage of 64-bit
for a database is the ability to address more memory.  We've been
working slowly towards allowing PG to make better use of large
numbers of shared buffers, for instance.
        regards, tom lane


Re: PostgreSQL on 64 bit Linux

From
Douglas McNaught
Date:
mark@mark.mielke.cc writes:

> I believe the answer is no. No or few 64-bit optimization possibilities
> have been chased down, probably because some or many of these would:
>
>     1) require significant re-architecture
>
>     2) reduce the performance in a 32-bit world

Honestly, I think the main "optimizations" happen automatically: the
compiler uses the extra registers (if present) and pointers in LP64
mode are automatically 64-bit, which allows much larger memory areas.
That's probably 95% of the win right there.  What other
"optimizations" need doing?

People have been running PG with big databases on 64-bit iron for
years; don't you think any glaring infelicities would have been
addressed?

> It's a question that only half interests me. As with most projects, I
> don't think the projects are ready to re-architect for this
> purpose.

What re-architecting would be needed?

-Doug


Re: Replication

From
"Joshua D. Drake"
Date:
Fujii Masao wrote:
> Stefan Kaltenbrunner wrote:
>> It is however async replication so you can loose data commited on the
>> master but not yet replicated to the slaves in case you loose the master
>>  completely.
> 
> Yes, here is an insufficient point of Slony-I, i think.
> Most systems will not permit the committed data to be lost, so use is 
> limited.

Wanna bet?

It is very, very common to have asynchronous replication. I would say 
the need for synchronous is far more limited (although greater desired).

Joshua D. Drake


> 
> 
> 
>>> IMO, log-based replication is needed also for PostgreSQL just like 
>>> MySQL.
> 
> Well, I had misunderstood MySQL. Its replication is also asynchronous.
> 
> regards;
> 


-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
 




Re: PostgreSQL on 64 bit Linux

From
"Alexander Kirpa"
Date:
> WRT 64-bit and Postgres, it depends on the CPU as to whether you
> see a simple performance benefit.  On the Opteron you will see a
> benefit when doing CPU bound work.  When doing the CPU portion, the
> additional registers of the Opteron running in 64-bit mode are used
> by the compiler to produce a 20-30% boost in performance.  On the
> Xeon in 64-bit mode, the same regions of execution will slow down
> by about 5%.

> Postgres benefits automatically from the larger memory addressing
> of the 64-bit kernel by using the larger I/O cache of Linux.

The main benefit of Postgres in 64-bit mode is only possible on a
dedicated DB server with more than 3GB of RAM, where most of the RAM is
used for shared buffers, avoiding the constant movement of buffers
between the OS cache and shared memory. On a system with less than
2-3GB of RAM it is difficult to find a serious performance gain.

Best regards,
Alexander Kirpa



Re: PostgreSQL on 64 bit Linux

From
mark@mark.mielke.cc
Date:
On Mon, Aug 21, 2006 at 10:23:16AM -0400, Douglas McNaught wrote:
> mark@mark.mielke.cc writes:
> > I believe the answer is no. No or few 64-bit optimization possibilities
> > have been chased down, probably because some or many of these would:
> >     1) require significant re-architecture
> >     2) reduce the performance in a 32-bit world
> Honestly, I think the main "optimizations" happen automatically: the
> compiler uses the extra registers (if present) and pointers in LP64
> mode are automatically 64-bit, which allows much larger memory areas.
> That's probably 95% of the win right there.  What other
> "optimizations" need doing?

Compilers are getting smarter, but having code profiled, and hand tuned
for 32-bits does not automatically get compiled with a 64-bit compiler,
to code that has been profile, and hand tuned for 64-bit.

An example of this is in how algorithms are implemented. For
Intel-style processors at least, using array offsets is cheaper than
using many pointers. For 64-bit processors, 32-bit array offsets would
use less memory.
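
A small sketch of that offset-versus-pointer trade-off (hypothetical
node types, not taken from the PostgreSQL tree):

#include <stdint.h>
#include <stdio.h>

#define POOL_SIZE 1024
#define NIL UINT32_MAX

/* linking by 32-bit indexes into a pool keeps each link at 4 bytes even
 * on an LP64 build, where a real pointer would cost 8 */
struct idx_node {
    uint32_t next;      /* index into the pool */
    int32_t  value;
};

struct ptr_node {
    struct ptr_node *next;   /* 8 bytes on LP64 */
    int32_t          value;
};

static struct idx_node pool[POOL_SIZE];

int main(void)
{
    /* build a tiny index-linked list: pool[0] -> pool[1] -> end */
    pool[0] = (struct idx_node){ .next = 1,   .value = 10 };
    pool[1] = (struct idx_node){ .next = NIL, .value = 20 };

    for (uint32_t i = 0; i != NIL; i = pool[i].next)
        printf("value = %d\n", pool[i].value);

    printf("sizeof(struct idx_node) = %zu\n", sizeof(struct idx_node));
    printf("sizeof(struct ptr_node) = %zu\n", sizeof(struct ptr_node));
    return 0;
}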

For re-architecture, the first thing that came to mind was Datum
increasing in size. Copy by value instead of allocating a pointer,
and passing the pointer. The pointer will be 64-bits, so Datum
could support 64-bits as well. Tom caught this, but he did point
out what I missed. This would only benefit data types that are
64-bits in length, which are still likely uncommon (most people
would use serial, not bigserial).

Has anybody taken any of the 64-bit processor optimization manuals
and walked through them, cross-referencing them against the bottleneck
code in PostgreSQL? To counter this, I suggest to you that most
every attempt to optimize PostgreSQL for performance reasons, likely
considered variations in algorithms measured on a 32-bit processor,
finally selecting the version that was simplest, and best performing,
on a 32-bit processor. This is normal evolution for a piece of
software. It is naturally optimized for the processor that is most
used by the users.

I'm not the expert on the matter. I don't see evidence that anybody
has taken this route though.

> People have been running PG with big databases on 64-bit iron for
> years; don't you think any glaring infelicities would have been
> addressed?

I doubt there are 'glaring infelicities'. I doubt any changes required
to provide a noticeable speed improvement would be one liners. They will
not be obvious. They will require a complete understanding of the many
architectural differences between common 32-bit processors, and common
64-bit processors. Few have this information in their head. I don't.
I suspect that anybody who claims that the only difference is the amount
of addressable memory, and the number of registers doesn't. :-)

> > It's a question that only half interests me. As with most projects, I
> > don't think the projects are ready to re-architect for this
> > purpose.
> What re-architecting would be needed?

I'm asking that it be admitted that it has not been looked at. Not
seriously. Nor is this unique to PostgreSQL. I expect the performance
for Linux applications in general to slowly improve on 64-bit
processors as more and more people begin to use it. I recall a few
threads on other mailing lists where the primary developers admitted
to not having ready access to a 64-bit machine. Until they do, the
chance that these applications are hand-tuned for 64-bit is unlikely.

I'm not claiming that I have the answers. I am claiming that a few of
you share this weakness, and that an assumption that PostgreSQL is
designed for 64-bit would be incorrect. :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: PostgreSQL on 64 bit Linux

From
mark@mark.mielke.cc
Date:
On Mon, Aug 21, 2006 at 12:00:58PM +0300, Alexander Kirpa wrote:
> > WRT 64-bit and Postgres, it depends on the CPU as to whether you
> > see a simple performance benefit.  On the Opteron you will see a
> > benefit when doing CPU bound work.  When doing the CPU portion, the
> > additional registers of the Opteron running in 64-bit mode are used
> > by the compiler to produce a 20-30% boost in performance.  On the
> > Xeon in 64-bit mode, the same regions of execution will slow down
> > by about 5%.
> 
> > Postgres benefits automatically from the larger memory addressing
> > of the 64-bit kernel by using the larger I/O cache of Linux.
> 
> Main benefit Postgres in 64-bit mode possible only in case dedicated 
> DB server on system with RAM > 3GB and use most part of RAM for 
> shared buffers and avoid persistent moving buffers between OS cache 
> and shared memory. On system with RAM below 2-3GB to difficult found 
> serious gain of performance.

This is the main difference for PostgreSQL today - designed for
32-bit - when it is recompiled with a 64-bit compiler.

The additional registers are barely enough to counter the increased
cost of processing in 64-bits.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: PostgreSQL on 64 bit Linux

From
Martijn van Oosterhout
Date:
On Mon, Aug 21, 2006 at 10:46:56AM -0400, mark@mark.mielke.cc wrote:
> Compilers are getting smarter, but having code profiled, and hand tuned
> for 32-bits does not automatically get compiled with a 64-bit compiler,
> to code that has been profile, and hand tuned for 64-bit.

I don't see any evidence that anyone has done any kind of hand-tuning, not
for 32-bit, not for 64-bit. That's the compiler's job, really.

That said, if you want to go hand-tuning, go for the low-hanging
fruit first: use likely()/unlikely() in often-used code paths
(heap_get_attr for example), find the parts of the code that could
benefit from strict aliasing, etc.
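
For reference, the generic GCC idiom usually behind likely()/unlikely()
(this is the common pattern, not something taken from the PostgreSQL
sources, and fetch_attr is a made-up stand-in):

#include <stdio.h>

#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* __builtin_expect tells the compiler which branch is the common case,
 * so it can lay out the hot path without a taken branch */
static int fetch_attr(int attnum, int natts)
{
    if (unlikely(attnum > natts))   /* rare error path */
    {
        fprintf(stderr, "attribute %d out of range\n", attnum);
        return -1;
    }
    return attnum * 2;              /* stand-in for the common fast path */
}

int main(void)
{
    printf("%d\n", fetch_attr(3, 10));
    printf("%d\n", fetch_attr(30, 10));
    return 0;
}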

In general we avoid architecture-specific code because the cost of
maintenance outweighs the benefits.

> For re-architecture, the first thing that came to mind was Datum
> increasing in size. Copy by value instead of allocating a pointer,
> and passing the pointer. The pointer will be 64-bits, so Datum
> could support 64-bits as well. Tom caught this, but he did point
> out what I missed. This would only benefit data types that are
> 64-bits in length, which are still likely uncommon (most people
> would use serial, not bigserial).

A Datum must be 64-bit on a 64-bit arch, how else is it going to hold
the pointer for a pass-by-ref value?

On the whole, PostgreSQL is still in the phase where we're trying to
find the best algorithms. When that's done (if ever), we can start
worrying about processor optimisations...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.

Re: Replication

From
AgentM
Date:
On Aug 21, 2006, at 10:30 , Joshua D. Drake wrote:

> Fujii Masao wrote:
>> Stefan Kaltenbrunner wrote:
>>> It is however async replication so you can loose data commited on  
>>> the
>>> master but not yet replicated to the slaves in case you loose the  
>>> master
>>>  completely.
>> Yes, here is an insufficient point of Slony-I, i think.
>> Most systems will not permit the committed data to be lost, so use  
>> is limited.
>
> Wanna bet?
>
> It is very, very common to have asynchronous replication. I would  
> say the need for synchronous is far more limited (although greater  
> desired).

I would imagine that multi-master synchronous replication would be  
fairly trivial to implement with 2PC and wal-shipping available, no?


Re: Replication

From
"Joshua D. Drake"
Date:
>>
>> It is very, very common to have asynchronous replication. I would say 
>> the need for synchronous is far more limited (although greater desired).
> 
> I would imagine that multi-master synchronous replication would be 
> fairly trivial to implement with 2PC and wal-shipping available, no?

Trivial? I would say... no. There is a reason it hasn't been done yet, 
and a reason why CMD (for example) has zero plans to even try.

Sincerely,

Joshua D. Drake



-- 
   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
 




Re: Replication

From
Markus Schiltknecht
Date:
Hi,

AgentM wrote:
> I would imagine that multi-master synchronous replication would be 
> fairly trivial to implement with 2PC and wal-shipping available, no?

Yes, that could be done. And AFAIK either pgpool or PgCluster (1) try to
do sync, multi-master replication that way.

The problem is that such an implementation is very network-intensive and
scales very badly for writing transactions. If you're interested in a
good sync, multi-master replication algorithm, you might want to have a
look at what Slony-II [1] or Postgres-R [2] try to do. I also recommend
reading "Don't be lazy, be consistent" by Bettina Kemme [3].

Regards

Markus

[1]: http://www.slony2.org/wiki/index.php?title=Documentation
[2]: http://www.postgres-r.org/
[3]: http://www.cs.mcgill.ca/~kemme/papers/vldb00.html


Re: PostgreSQL on 64 bit Linux

From
mark@mark.mielke.cc
Date:
On Mon, Aug 21, 2006 at 05:13:58PM +0200, Martijn van Oosterhout wrote:
> On the whole, PostgreSQL is still in the phase where we're trying to
> find the best algorithms. When that's done (if ever), we can start
> worrying about processor optimisations...

I don't disagree. :-)

You claimed that PostgreSQL is not tuned to a specific processor, and
relies on the kernel, the compiler, and libc to perform some tuning. I
agree with this as well.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: Replication

From
Stefan Kaltenbrunner
Date:
Fujii Masao wrote:
> Stefan Kaltenbrunner wrote:
>> It is however async replication so you can loose data commited on the
>> master but not yet replicated to the slaves in case you loose the master
>>  completely.
> 
> Yes, here is an insufficient point of Slony-I, i think.
> Most systems will not permit the committed data to be lost, so use is
> limited.

Not sure I agree with "most systems" here - a _LOT_ of use cases
actually want async (and note that slony1 can do a controlled failover
without any transactions lost).

Nevertheless there are also arguments for having sync replication, but
calling slony1 "insufficient" in that regard is a bit much since it is
actually designed to be async and does quite a good job with that.


Stefan


Re: Replication

From
Jeff Davis
Date:
On Mon, 2006-08-21 at 11:33 -0400, AgentM wrote:
> I would imagine that multi-master synchronous replication would be  
> fairly trivial to implement with 2PC and wal-shipping available, no?
> 

How does WAL shipping help synchronous replication?

Regards,
Jeff Davis



Re: PostgreSQL on 64 bit Linux

From
Tom Lane
Date:
mark@mark.mielke.cc writes:
>> What re-architecting would be needed?

> I'm asking that it be admitted that it has not been looked at. Not
> seriously. Nor is this unique to PostgreSQL. I expect the performance
> for Linux applications in general to slowly improve on 64-bit
> processors as more and more people begin to use it. I recall a few
> threads on other mailing lists where the primary developers admitted
> to not having ready access to a 64-bit machine. Until they do, the
> chance that these applications are hand-tuned for 64-bit is unlikely.

What you seem to be asking for is that we engage in a lot of
machine-specific tuning for some particular processor or other.
That has not happened and will not happen.  This project believes
in portable code, not "optimized for Xeon-of-the-month" code.
        regards, tom lane


Re: Replication

From
Markus Schiltknecht
Date:
Jeff Davis wrote:
> How does WAL shipping help synchronous replication?

The WAL is written _before_ commit, logging all the changes the 
transaction wants to write to the disk. This makes it look very similar 
to what is needed for synchronous replication.

Instead of waiting for confirmation from the disk that your WAL has
been written, you would have to wait for the network to confirm that
the changes have been delivered.
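
A conceptual sketch of that commit ordering (every function below is a
hypothetical stub, not a PostgreSQL API; the only point is that the
client sees COMMIT only after both the local flush and the remote
acknowledgement):

#include <stdbool.h>
#include <stdio.h>

static void wal_write(const char *rec)       { printf("local WAL write: %s\n", rec); }
static void wal_fsync(void)                  { printf("local WAL fsynced\n"); }
static bool ship_to_standby(const char *rec)
{
    printf("shipped to standby: %s\n", rec);
    return true;                              /* pretend the standby acked */
}

static bool commit_transaction(const char *commit_rec)
{
    wal_write(commit_rec);
    wal_fsync();                              /* ordinary local durability   */
    if (!ship_to_standby(commit_rec))         /* sync replication: also wait */
        return false;                         /* for the remote confirmation */
    return true;                              /* only now report COMMIT      */
}

int main(void)
{
    if (commit_transaction("COMMIT xid=1234"))
        printf("client sees: COMMIT\n");
    return 0;
}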

Regards

Markus



Re: Replication

From
Alvaro Herrera
Date:
Markus Schiltknecht wrote:
> Jeff Davis wrote:
>  > How does WAL shipping help synchronous replication?
> 
> The WAL is written _before_ commit, logging all the changes the 
> transaction wants to write to the disk. This makes it look very similar 
> to what is needed for synchronous replication.
> 
> Instead of waiting for confirmation from the disk, that your WAL has 
> been written, you would have to wait from the network to confirm that 
> the changes have been delivered via the network.

But the confirmation that needs to come is that the WAL changes have
been applied (fsync'ed), so the performance will be terrible.  So bad,
that I don't think anyone will want to use such a replication system ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PostgreSQL on 64 bit Linux

From
mark@mark.mielke.cc
Date:
On Mon, Aug 21, 2006 at 01:05:13PM -0400, Tom Lane wrote:
> mark@mark.mielke.cc writes:
> >> What re-architecting would be needed?
> 
> > I'm asking that it be admitted that it has not been looked at. Not
> > seriously. Nor is this unique to PostgreSQL. I expect the performance
> > for Linux applications in general to slowly improve on 64-bit
> > processors as more and more people begin to use it. I recall a few
> > threads on other mailing lists where the primary developers admitted
> > to not having ready access to a 64-bit machine. Until they do, the
> > chance that these applications are hand-tuned for 64-bit is unlikely.
> 
> What you seem to be asking for is that we engage in a lot of
> machine-specific tuning for some particular processor or other.
> That has not happened and will not happen.  This project believes
> in portable code, not "optimized for Xeon-of-the-month" code.

Hi Tom.

I'm asking for exactly a statement like the one above, which you have now provided. :-)

The question was whether PostgreSQL was optimized for 64-bit Linux.
The answer is "not specifically".

The answer is quite acceptable to me. It's not the original answer that
was given to the original poster though... :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...
 
                          http://mark.mielke.cc/



Re: PostgreSQL on 64 bit Linux

From
Tom Lane
Date:
mark@mark.mielke.cc writes:
> On Mon, Aug 21, 2006 at 01:05:13PM -0400, Tom Lane wrote:
>> What you seem to be asking for is that we engage in a lot of
>> machine-specific tuning for some particular processor or other.

> The question was whether PostgreSQL was optimized for 64-bit Linux.
> The answer is "not specifically".

Well, my notion of "optimizing for 64 bit" seems to have little to do
with yours.  To me, stuff like making sure we can use lots of buffers
effectively is a 64-bit optimization.  What you seem to have in mind is
assembly-code-level optimization for specific processors.
        regards, tom lane


Re: Replication

From
"Gregory Maxwell"
Date:
On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> But the confirmation that needs to come is that the WAL changes have
> been applied (fsync'ed), so the performance will be terrible.  So bad,
> that I don't think anyone will want to use such a replication system ...

Okay. I give up... Why is waiting for fsync on a fast local network
which takes 15us to send a message (infiniband is cheap..) an
unimaginable delay when we tolerate a local 8ms fsync delay on systems
without writeback cache?


Re: Replication

From
Jeff Davis
Date:
On Mon, 2006-08-21 at 19:42 +0200, Markus Schiltknecht wrote:
> Jeff Davis wrote:
>   > How does WAL shipping help synchronous replication?
> 
> The WAL is written _before_ commit, logging all the changes the 
> transaction wants to write to the disk. This makes it look very similar 
> to what is needed for synchronous replication.
> 
> Instead of waiting for confirmation from the disk, that your WAL has 
> been written, you would have to wait from the network to confirm that 
> the changes have been delivered via the network.

Synchronous replication (to me) means that the data has been written to
permanent storage on all masters and all slaves before any master or
slave reports a successful COMMIT. Are you suggesting that you ship the
WAL over the network, wait for it to be written to the slave, and then
report a successful commit?

Regards,
Jeff Davis





Re: Replication

From
"D'Arcy J.M. Cain"
Date:
On Mon, 21 Aug 2006 14:46:05 -0400
"Gregory Maxwell" <gmaxwell@gmail.com> wrote:
> On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > But the confirmation that needs to come is that the WAL changes have
> > been applied (fsync'ed), so the performance will be terrible.  So bad,
> > that I don't think anyone will want to use such a replication system ...
> 
> Okay. I give up... Why is waiting for fsync on a fast local network
> which takes 15us to send a message (infiniband is cheap..) an
> unimaginable delay when we tolerate a local 8ms fsync delay on systems
> without writeback cache?

OK, that solves your problem.  How about my problem where replication
has to happen on servers in three countries on two continents and
thousands of updates a second have to happen in less than 10ms? This is
the critical issue with replication - one size does not fit all.
Synchronous replication, in particular, fits almost no one.

My experience is that any replication needs to be based on your business
rules which will vary widely.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: Replication

From
Markus Schiltknecht
Date:
Alvaro Herrera wrote:
> But the confirmation that needs to come is that the WAL changes have
> been applied (fsync'ed), so the performance will be terrible.  So bad,
> that I don't think anyone will want to use such a replication system ...

Yeah, that's the big problem of sync, multi-master replication. IMHO the 
key to high performance sync, multi-master replication is to minimize 
the efforts needed to get that confirmation.

A hard drive has the built-in functionality to confirm what has been 
written (and hopefully it does not lie about that). A cluster does not 
have this ability. Now, what does it mean to get a confirmation from the 
cluster at all? First of all, you have to be sure the cluster has 
received your message, then you also need to be sure it can apply your 
changes (the so called WriteSet).

Unfortunately 'the cluster' consists of many independent machines. If 
you wait for every single machine in the cluster to confirm being able 
to apply your WriteSet, you won't get beyond single-machine performance 
for writing transactions.

Postgres-R uses a Group Communication System to get very fast 
'confirmation' by optimizing the logic a little: like with patches, you 
can be sure a WriteSet can be applied if you try to apply it to the same 
base. So Postgres-R uses totally ordered communication to ensure all 
transactions are processed in the very same order on all nodes. Another 
optimization is buried in the depth of the GCS: it only guarantees that 
the message you sent _will_be_delivered_. So thanks to the GCS you don't 
have to wait for all other nodes, but only for the GCS to confirm that 
your message will be delivered on the other nodes. Of course the GCS 
also needs to send messages around to be able to confirm that, but GCSes 
are designed to do exactly that very efficiently.
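
A toy model of the certification idea (greatly simplified: no
networking, no real GCS, and it ignores snapshots by treating every
earlier writeset as concurrent; all names are made up). The array below
stands in for the total order the GCS imposes, and every node walks it
with the same rule, so they all reach the same commit/abort decisions
without any remote locking:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 4

struct writeset {
    int         xid;
    const char *keys[MAX_KEYS];   /* rows the transaction wants to update */
    int         nkeys;
};

/* the totally ordered delivery sequence - identical on every node */
static const struct writeset delivered[] = {
    { 101, { "row:7" },          1 },
    { 102, { "row:7", "row:9" }, 2 },   /* conflicts with 101 on row:7 */
    { 103, { "row:42" },         1 },
};

static bool conflicts(const struct writeset *a, const struct writeset *b)
{
    for (int i = 0; i < a->nkeys; i++)
        for (int j = 0; j < b->nkeys; j++)
            if (strcmp(a->keys[i], b->keys[j]) == 0)
                return true;
    return false;
}

static void certify_on_node(const char *node)
{
    const struct writeset *committed[8];
    int ncommitted = 0;

    for (size_t i = 0; i < sizeof(delivered) / sizeof(delivered[0]); i++) {
        const struct writeset *ws = &delivered[i];
        bool ok = true;
        for (int c = 0; c < ncommitted; c++)
            if (conflicts(ws, committed[c]))
                ok = false;               /* later in the order: abort */
        printf("%s: xid %d -> %s\n", node, ws->xid, ok ? "commit" : "abort");
        if (ok)
            committed[ncommitted++] = ws;
    }
}

int main(void)
{
    certify_on_node("node A");   /* both nodes reach identical decisions */
    certify_on_node("node B");
    return 0;
}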

I hope to have brought some light to the sync, multi-master replication 
problem. Please ask if you have more questions. I propose to continue 
that discussion on the Postgres-R mailing list I have set up.

Regards

Markus


Re: Replication

From
Markus Schiltknecht
Date:
Gregory Maxwell wrote:
> infiniband is cheap..

Can I get one? I'd love to run some tests with Postgres-R ;-)


Re: Replication

From
AgentM
Date:
On Aug 21, 2006, at 15:00 , D'Arcy J.M. Cain wrote:

> On Mon, 21 Aug 2006 14:46:05 -0400
> "Gregory Maxwell" <gmaxwell@gmail.com> wrote:
>> On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>>> But the confirmation that needs to come is that the WAL changes have
>>> been applied (fsync'ed), so the performance will be terrible.  So  
>>> bad,
>>> that I don't think anyone will want to use such a replication  
>>> system ...
>>
>> Okay. I give up... Why is waiting for fsync on a fast local network
>> which takes 15us to send a message (infiniband is cheap..) an
>> unimaginable delay when we tolerate a local 8ms fsync delay on  
>> systems
>> without writeback cache?
>
> OK, that solves your problem.  How about my problem where replication
> has to happen on servers in three countries on two continents and
> thousands of updates a second have to happen in less that 10ms?  
> This is
> the critical issue with replication - one size does not fit all.
> Syncronous replication, in particular, fits almost no one.
>
> My experience is that any replication needs to be based on your  
> business
> rules which will vary widely.

Sure- and more specifically, replication rules may differ on every  
table according to those rules. The current solutions are on/off for  
a list of tables. I wonder if the various pgsql replication engines  
have any problems co-existing...

-M



Re: Replication

From
Markus Schiltknecht
Date:
Jeff Davis wrote:
> Synchronous replication (to me) means that the data has been written to
> permanent storage on all masters and all slaves before any master or
> slave reports a successful COMMIT. Are you suggesting that you ship the
> WAL over the network, wait for it to be written to the slave, and then
> report a successful commit?

I'm not suggesting doing replication like that. But I think that's what 
people mean when they propose that "multi-master synchronous replication 
would be fairly trivial to implement with 2PC and wal-shipping".

This is more or less trivial to implement, yes. But it won't scale.

Regards

Markus


Re: Replication

From
Alvaro Herrera
Date:
AgentM wrote:
> 
> On Aug 21, 2006, at 15:00 , D'Arcy J.M. Cain wrote:
> 
> >On Mon, 21 Aug 2006 14:46:05 -0400
> >"Gregory Maxwell" <gmaxwell@gmail.com> wrote:
> >>On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> >>>But the confirmation that needs to come is that the WAL changes have
> >>>been applied (fsync'ed), so the performance will be terrible.  So  
> >>>bad,
> >>>that I don't think anyone will want to use such a replication  
> >>>system ...
> >>
> >>Okay. I give up... Why is waiting for fsync on a fast local network
> >>which takes 15us to send a message (infiniband is cheap..) an
> >>unimaginable delay when we tolerate a local 8ms fsync delay on  
> >>systems
> >>without writeback cache?
> >
> >OK, that solves your problem.  How about my problem where replication
> >has to happen on servers in three countries on two continents and
> >thousands of updates a second have to happen in less that 10ms?  
> >This is
> >the critical issue with replication - one size does not fit all.
> >Syncronous replication, in particular, fits almost no one.
> >
> >My experience is that any replication needs to be based on your  
> >business
> >rules which will vary widely.
> 
> Sure- and more specifically, replication rules may differ on every  
> table according to those rules. The current solutions are on/off for  
> a list of tables. I wonder if the various pgsql replication engines  
> have any problems co-existing...

Although I have never tried it, I am sure Mammoth Replicator could
coexist relatively sanely with Slony-I.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Replication

From
"D'Arcy J.M. Cain"
Date:
On Mon, 21 Aug 2006 15:14:10 -0400
AgentM <agentm@themactionfaction.com> wrote:
> > My experience is that any replication needs to be based on your  
> > business
> > rules which will vary widely.
> 
> Sure- and more specifically, replication rules may differ on every  
> table according to those rules. The current solutions are on/off for  

In fact this is exactly what I did for my app.  Some tables were
multi-master and some were required to be modified from a single master.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL on 64 bit Linux

From
"Alexander Kirpa"
Date:
On 21 Aug 2006, at 10:48, mark@mark.mielke.cc wrote:

> On Mon, Aug 21, 2006 at 12:00:58PM +0300, Alexander Kirpa wrote:
> > > WRT 64-bit and Postgres, it depends on the CPU as to whether you
> > > see a simple performance benefit.  On the Opteron you will see a
> > > benefit when doing CPU bound work.  When doing the CPU portion,
> > > the additional registers of the Opteron running in 64-bit mode are
> > > used by the compiler to produce a 20-30% boost in performance.  On
> > > the Xeon in 64-bit mode, the same regions of execution will slow
> > > down by about 5%.
> > 
> > > Postgres benefits automatically from the larger memory addressing
> > > of the 64-bit kernel by using the larger I/O cache of Linux.
> > 
> > Main benefit Postgres in 64-bit mode possible only in case dedicated
> > DB server on system with RAM > 3GB and use most part of RAM for
> > shared buffers and avoid persistent moving buffers between OS cache
> > and shared memory. On system with RAM below 2-3GB to difficult found
> > serious gain of performance.
> 
> This is the main difference between PostgreSQL today - designed for
> 32-bit - when recompiled with a 64-bit compiler.
> 
> The additional registers are barely enough to counter the increased
> cost of processing in 64-bits.
> 
> Cheers,
> mark
The current 32-bit Postgres architecture already allows use of the main
benefit of a 64-bit OS - a huge memory size for shared buffers.
At present it is possible to use 2G x 8KB = 16TB as shared memory,
and in that case you would need to use O_DIRECT to avoid the OS cache,
especially when the databases fit into shared memory.

Best regards,
Alexander Kirpa




Re: Replication

From
Hannu Krosing
Date:
On Mon, 2006-08-21 at 21:46, Fujii Masao wrote:
> Stefan Kaltenbrunner wrote:
> > It is however async replication so you can loose data commited on the
> > master but not yet replicated to the slaves in case you loose the master
> >  completely.
> 
> Yes, here is an insufficient point of Slony-I, i think.
> Most systems will not permit the committed data to be lost, so use is limited.

But any sync _replication_ system will have a severe impact on
performance. My guess is that for full sync replication, going from 1
server to 2 will actually lower performance, and some small gains would be
possible only starting from the 3rd server.

This has nothing to do with PostgreSQL, but is just due to latencies of
memory, disk and network and the need to do remote locking.

My guess is based on using standard 100Gb ethernet. It may be possible
to do better on some more advanced interconnects like myrinet.


-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: Replication

From
Hannu Krosing
Date:
On Mon, 2006-08-21 at 15:00, D'Arcy J.M. Cain wrote:
> On Mon, 21 Aug 2006 14:46:05 -0400
> "Gregory Maxwell" <gmaxwell@gmail.com> wrote:
> > On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > > But the confirmation that needs to come is that the WAL changes have
> > > been applied (fsync'ed), so the performance will be terrible.  So bad,
> > > that I don't think anyone will want to use such a replication system ...
> > 
> > Okay. I give up... Why is waiting for fsync on a fast local network
> > which takes 15us to send a message (infiniband is cheap..) an
> > unimaginable delay when we tolerate a local 8ms fsync delay on systems
> > without writeback cache?
> 
> OK, that solves your problem.  How about my problem where replication
> has to happen on servers in three countries on two continents and
> thousands of updates a second have to happen in less that 10ms? 

For this scenario you are far better off with partitioning than
replication. 

That is if your data is partitionable. But geographically distributed
data often is.

> This is
> the critical issue with replication - one size does not fit all.
> Syncronous replication, in particular, fits almost no one.
> 
> My experience is that any replication needs to be based on your business
> rules which will vary widely.
> 
-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: Replication

From
Markus Schiltknecht
Date:
Hannu Krosing wrote:
> But any sync _replication_ system will have severe impact on
> performance. My guess is that for a full sync replication, going from 1
> > server to 2 will actually lower performance and some small gains would be
> possible only starting from 3rd server.

Only testing will show concrete results, but for sure such a general 
answer does not make much sense. It all depends very much on the type of 
your load. Especially the read/write ratio is very important. Another 
important factor is the amount of conflicting transactions.

> My quess is based on using standard 100Gb ethernet. It may be possible
> to do better on some more advanced interconnects like myrinet.

100Gb ethernet already sounds very advanced... ;-)

Regards

Markus



Re: Replication

From
Hannu Krosing
Date:
On Wed, 2006-08-23 at 13:09, Markus Schiltknecht wrote:
> Hannu Krosing wrote:
> > But any sync _replication_ system will have severe impact on
> > performance. My guess is that for a full sync replication, going from 1
> > server to 2 will actually lower performance and some small gains would be
> > possible only starting from 3rd server.
> 
> Only testing will show concrete results, but for sure such a general 
> answer does not make much sense. It all depends very much on the type of 
> your load. Especially the read/write ratio is very important. 

True.

But if you have very few writes, then there seems no reason to do sync
anyway.

> Another important factor is the amount of conflicting transactions.

That too, but just the need to do *any* locking on all nodes will
significantly slow down sync replication.

> > My guess is based on using standard 100Gb ethernet. It may be possible
> > to do better on some more advanced interconnects like Myrinet.
> 
> 100Gb ethernet already sounds very advanced... ;-)

Yeah, I meant some mix of 100Mb and 1Gb ;)

> Regards
> 
> Markus
> 
-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: Replication

From
Markus Schiltknecht
Date:
Hannu Krosing wrote:
> But if you have very few writes, then there seems no reason to do sync
> anyway.

I think there is one: high-availability. A standby-server which can 
continue if your primary fails. Of course sync is only needed if you 
absolutely cannot afford losing any committed transaction.

>> Another important factor is the amount of conflicting transactions.
> 
> That too, but just the need to do *any* locking on all nodes will
> significantly slow down sync replication

If you implement sync replication with locking, yes. But there are 
better ways: the Postgres-R approach does not do network locking, but 
aborts conflicting transactions just before committing. That results in 
much less network traffic (one GCS-message per writing-transaction).
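
To make that concrete, here is a minimal, self-contained sketch of the
certification idea described above (Python, hypothetical names and data
structures, written for illustration; not the actual Postgres-R code):

# Each node receives writesets in the same total order from the GCS and
# runs the same deterministic conflict test, so no network locking is needed.

class Writeset:
    def __init__(self, tx_id, snapshot_seq, touched_rows):
        self.tx_id = tx_id                # originating transaction
        self.snapshot_seq = snapshot_seq  # number of writesets already applied
                                          # when this transaction took its snapshot
        self.touched_rows = touched_rows  # set of (table, primary_key) pairs

class Node:
    def __init__(self):
        self.applied = []                 # certified writesets, in total order

    def certify(self, ws):
        """Called when the GCS delivers a writeset; decides commit/abort locally."""
        # Writesets ordered after our snapshot are the concurrent ones.
        for other in self.applied[ws.snapshot_seq:]:
            if other.touched_rows & ws.touched_rows:
                return "ABORT"            # conflict: the later writeset loses
        self.applied.append(ws)           # applying the changes and the local
                                          # WAL sync would happen here
        return "COMMIT"

Because every node sees the writesets in the same order and the test is
deterministic, all nodes reach the same commit/abort decision without
exchanging locks; that is where the one GCS message per writing transaction
comes from.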

Regards

Markus


Re: Replication

From
"D'Arcy J.M. Cain"
Date:
On Wed, 23 Aug 2006 12:42:11 +0300
Hannu Krosing <hannu@skype.net> wrote:
> > OK, that solves your problem.  How about my problem where replication
> > has to happen on servers in three countries on two continents and
> > thousands of updates a second have to happen in less than 10ms? 
> 
> For this scenario you are far better off with partitioning than
> replication. 
> 
> That is if your data is partitionable. But geographically distributed
> data often is.

I agree but in this case it wasn't.

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: Replication

From
Hannu Krosing
Date:
On Wed, 2006-08-23 at 13:36, Markus Schiltknecht wrote:
> Hannu Krosing wrote:
> > But if you have very few writes, then there seems no reason to do sync
> > anyway.
> 
> I think there is one: high-availability. A standby-server which can 
> continue if your primary fails. Of course sync is only needed if you 
> absolutely cannot afford losing any committed transaction.

Yes. I guess that for that, enhanced WAL shipping would be OK.
Probably one where you require both the sync to local disk and the copy to
the remote (without waiting for the remote sync) to have happened in order
to COMMIT/ROLLBACK successfully.
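
A rough sketch of that commit rule (Python, hypothetical helper names, not
PostgreSQL internals): the transaction may report COMMIT only after its WAL
record is fsync'ed locally and the standby has acknowledged receiving it;
the standby's own fsync happens asynchronously.

import os

def commit_record(wal_file, record, standby_socket):
    # (a) make the record durable on the primary
    wal_file.write(record)
    wal_file.flush()
    os.fsync(wal_file.fileno())

    # (b) ship the record and wait only for a "received" acknowledgement;
    #     the standby syncs it to its own disk in the background
    standby_socket.sendall(record)
    ack = standby_socket.recv(1)
    if ack != b"R":
        raise RuntimeError("standby did not confirm receipt of WAL record")
    return "COMMIT"

The attraction of this rule is that a confirmed commit can only be lost if
the primary's disk and the standby's memory fail at the same moment, while
the commit latency grows by one network round trip rather than by a remote
fsync.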

> >> Another important factor is the amount of conflicting transactions.
> > 
> > That too, but just the need to do *any* locking on all nodes will
> > significantly slow down sync replication
> 
> If you implement sync replication with locking, yes. But there are 
> better ways: the Postgres-R approach does not do network locking, but 
> aborts conflicting transactions just before committing. That results in 
> much less network traffic (one GCS-message per writing-transaction).

but it still needs to do at least one network roundtrip + any needed
testing on all nodes + WAL sync on all nodes before it can COMMIT, no?

And I'm afraid that GCS serialisation will need more than one roundtrip
or risk being out-of-date.

I'm not saying that Postgres-R (or any other sync replication) is not
doable or even useful. I just can't see right away, how it can scale
very well for any significant write load.

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



Re: Replication

From
Markus Schiltknecht
Date:
Hi,

Hannu Krosing wrote:
> but it still needs to do at least one network roundtrip + any needed
> testing on all nodes + WAL sync on all nodes before it can COMMIT, no?

No. It only needs the 'roundtrip' in the sense that a transaction sends 
out its writeset and has to wait for the GCS to have it serialized (i.e. 
the GCS sends the message back to the sender node).

Then all nodes do the testing and WAL sync independently. (As Neil
recently pointed out in [1], this opens a small risk of data loss in the
case that all nodes crash.)

> And I'm afraid that GCS serialisation will need more than one roundtrip
> or risk being out-of-date.

The Spread people did some tests on 20 Pentium machines connected via 
100 Mbit ethernet. In [2] they state that it takes between 1.7 and 2.8 ms 
(depending on the message size) to 'serialize' a message within this 
group of 20 nodes.
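
Plugging those numbers into the earlier discussion gives a rough sense of
scale (illustrative only, ignoring certification cost and the fact that
independent transactions can be pipelined):

order_ms = 2.8        # worst-case total-order latency from [2]
local_fsync_ms = 8.0  # the local fsync figure quoted earlier in this thread

commit_alone = local_fsync_ms                  # no replication
commit_replicated = order_ms + local_fsync_ms  # ordering, then local
                                               # certification and WAL sync
print(commit_alone)        # 8.0 ms
print(commit_replicated)   # 10.8 ms

Under these assumptions the added per-commit latency is a few milliseconds,
not tens.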

> I'm not saying that Postgres-R (or any other sync replication) is not
> doable or even useful. I just can't see right away, how it can scale
> very well for any significant write load.

Sure, sync replication won't solve everybody's problems. But out of all 
the sync replication algorithms, Postgres-R is my clear favorite. ;-)

Regards

Markus

[1]: 
http://pgfoundry.org/pipermail/postgres-r-general/2006-August/000001.html

[2]: The Spread Toolkit: Architecture and Performance by Yair Amir, 
Claudiu Danilov, Michal Miskin-Amir, John Schultz, Jonathan Stanton
http://www.cnds.jhu.edu/pub/papers/cnds-2004-1.pdf



Re: Replication

From
Jeff Davis
Date:
On Wed, 2006-08-23 at 13:36 +0200, Markus Schiltknecht wrote:
> Hannu Krosing wrote:
> > But if you have very few writes, then there seems no reason to do sync
> > anyway.
> 
> I think there is one: high-availability. A standby-server which can 
> continue if your primary fails. Of course sync is only needed if you 
> absolutely cannot afford losing any committed transaction.
> 

I disagree about high-availability. In fact, I would say that sync
replication is trading availability and performance for synchronization
(which is a valid tradeoff, but costly). 

If you have an async system, all nodes must go down for the system to go
down.

If you have a sync system, if any node goes down the system goes down.
If you plan on doing failover, consider this: what if it's not obvious
which system is still up? What if the network route between the two
systems goes down (or just becomes too slow to replicate over), but
clients can still connect to both servers? Then you have two systems
that both think that the other system went down, and both start
accepting transactions. Now you no longer have replication at all.
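
To put rough numbers on this argument (illustrative only, assuming each node
is independently up with probability p, and taking the worst-case reading of
'sync' in which a commit blocks whenever any node is unreachable, which is
exactly the point disputed below):

def async_availability(p, n):
    # the service survives unless all n nodes are down at once
    return 1 - (1 - p) ** n

def all_nodes_sync_availability(p, n):
    # a sync scheme that cannot commit unless every node responds
    return p ** n

p, n = 0.99, 3
print(async_availability(p, n))           # ~0.999999
print(all_nodes_sync_availability(p, n))  # ~0.970299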

Regards,
Jeff Davis



Re: Replication

From
Markus Schiltknecht
Date:
Hi,

Jeff Davis wrote:
> I disagree about high-availability. In fact, I would say that sync
> replication is trading availability and performance for synchronization
> (which is a valid tradeoff, but costly).

In a way, replication is for database systems what RAID1 is for hard 
drives. Having multiple cluster nodes in sync minimizes the risk of a 
complete outage due to hardware failure. Thus maximizing availability. 
Of course, as you say, traded for performance.

> If you have an async system, all nodes must go down for the system to go
> down.

Yes. But it takes only one node to go down to potentially lose committed 
transactions. In contrast to synchronous replication systems, where a 
committed transaction is guaranteed to be 'committed on the cluster'. So 
if at least one node of the cluster is up and running, you can be 
assured to have consistent data.

Please note that the Postgres-R approach does relax some of these 
constraints a little to gain performance. The most obvious result of 
these relaxations is that the nodes may be 'behind' with replaying 
transactions and show a past view of the data.

> If you have a sync system, if any node goes down the system goes down.

That's plain wrong.

> If you plan on doing failover, consider this: what if it's not obvious
> which system is still up? What if the network route between the two
> systems goes down (or just becomes too slow to replicate over), but
> clients can still connect to both servers? Then you have two systems
> that both think that the other system went down, and both start
> accepting transactions. Now you no longer have replication at all.

This problem is often called 'network partitioning', which also refers 
to a more general case: a group of M nodes being split into two groups 
of N and (M-N) nodes (due to network failure or whatever).

In Postgres-R a Group Communication System is used to cover all these 
aspects (error detection, congruent agreement on a major group, etc.).

Regards

Markus



Re: Replication

From
Chris Browne
Date:
pgsql@j-davis.com (Jeff Davis) writes:
> On Wed, 2006-08-23 at 13:36 +0200, Markus Schiltknecht wrote:
>> Hannu Krosing wrote:
>> > But if you have very few writes, then there seems no reason to do sync
>> > anyway.
>> 
>> I think there is one: high-availability. A standby-server which can 
>> continue if your primary fails. Of course sync is only needed if you 
>> absolutely cannot afford losing any committed transaction. 
>> 
>
> I disagree about high-availability. In fact, I would say that sync
> replication is trading availability and performance for synchronization
> (which is a valid tradeoff, but costly). 
>
> If you have an async system, all nodes must go down for the system to go
> down.
>
> If you have a sync system, if any node goes down the system goes down.
> If you plan on doing failover, consider this: what if it's not obvious
> which system is still up? What if the network route between the two
> systems goes down (or just becomes too slow to replicate over), but
> clients can still connect to both servers? Then you have two systems
> that both think that the other system went down, and both start
> accepting transactions. Now you no longer have replication at all.

That is why, for multimaster, there's a need for both an automatic policy
and some human intervention.

- You need an automatic determination of "quorum", where, to be safe, it is
  only permissible for a set of $m$ servers to believe themselves to be
  active if they number more than 1/2 of the total of expected servers.

  Thus, if there are 13 servers in the cluster, then "quorum" is 7 servers.
  If a set of 6 servers get cut off from the rest of the network, they don't
  number at least 7, and thus know that they can't represent a quorum.

- And if conditions change, a human may need to change the quorum number.
 If 4 new nodes get added, quorum moves up to 9.
 If 5 nodes get dropped, quorum moves down to 5.

Deciding when to throw a node out of the quorum because it is
responding too slowly is still not completely trivial, but having a
quorum policy does address your issue.
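
The quorum rule itself is tiny; as a sketch (Python, hypothetical names,
illustrative only):

def quorum(expected_servers):
    # smallest group that is strictly more than half of the expected membership
    return expected_servers // 2 + 1

def may_stay_active(visible_servers, expected_servers):
    return visible_servers >= quorum(expected_servers)

print(quorum(13))              # 7, as in the example above
print(may_stay_active(6, 13))  # False: the cut-off set of 6 must stand down
print(quorum(13 + 4))          # 9 after adding 4 nodes
print(quorum(13 - 5))          # 5 after dropping 5 nodes

Changing the expected membership (the human intervention above) just changes
the argument passed to quorum().
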
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/linux.html
"Be humble.   A lot happened before  you were born."   - Life's Little
Instruction Book


Re: Replication

From
Jeff Davis
Date:
On Thu, 2006-08-24 at 11:18 +0200, Markus Schiltknecht wrote:
> Hi,
> 
> Jeff Davis wrote:
> > I disagree about high-availability. In fact, I would say that sync
> > replication is trading availability and performance for synchronization
> > (which is a valid tradeoff, but costly).
> 
> In a way, replication is for database systems what RAID1 is for hard 
> drives. Having multiple cluster nodes in sync minimizes the risk of a 
> complete outage due to hardware failure. Thus maximizing availability. 
> Of course, as you say, traded for performance.
> 
> > If you have an async system, all nodes must go down for the system to go
> > down.
> 
> Yes. But it takes only one node to go down to potentially lose committed 
> transactions. In contrast to synchronous replication systems, where a 
> committed transaction is guaranteed to be 'committed on the cluster'. So 
> if at least one node of the cluster is up and running, you can be 
> assured to have consistent data.

Right, that's the cost of asynchronous replication.

> Please note that the Postgres-R approach does relax some of these 
> constraints a little to gain performance. The most obvious result of 
> these relaxations is that the nodes may be 'behind' with replaying 
> transactions and show a past view of the data.
> 
> > If you have a sync system, if any node goes down the system goes down.
> 
> That's plain wrong.

Ok, maybe not one node, but I don't think I'm totally off base. See my
explanation below.

> > If you plan on doing failover, consider this: what if it's not obvious
> > which system is still up? What if the network route between the two
> > systems goes down (or just becomes too slow to replicate over), but
> > clients can still connect to both servers? Then you have two systems
> > that both think that the other system went down, and both start
> > accepting transactions. Now you no longer have replication at all.
> 
> This problem is often called 'network partitioning', which also refers 
> to a more general case: a group of M nodes being split into two groups 
> of N and (M-N) nodes (due to network failure or whatever).
> 
> In Postgres-R a Group Communication System is used to cover all these 
> aspects (error detection, congruent agreement on a major group, etc..).
> 

Which doesn't work very well in the case of two groups of servers set up
in two physical locations. I can see two possibilities:
(1) You require a quorum to be effective, in which case your cluster of
databases is only as reliable as the location which holds more servers.
(2) You have another central authority that determines which databases
are up, and which are down. Then your cluster is only as reliable as
that central authority.

Sure, if you have a million groups of servers spread all over the
internet, it works with a very high degree of reliability because you
can likely always form a quorum. However, you then have horrible
performance because the updates need to be spread to so many locations.
And for truly synchronous replication you probably have to serialize the
updates, which is very costly over that many nodes all over a network.

Even if you have a large number of nodes at different locations, then
you end up with strange decisions to make if the network connections are
intermittent or very slow. A temporary slowdown of many nodes could
cause them to be degraded until some kind of human intervention brought
them back. Until that time you might not be able to determine which
nodes make up an authoritative group. This kind of degradation could
happen in the case of a DDoS attack, or perhaps a worm moving around the
internet.

In practice everyone can find a solution that works for them. However,
synchronous replication is not perfect, and there are many failure
scenarios which need to be resolved in a way that fits your business. I
think synchronous replication is inherently less available than
asynchronous.

Regards,
Jeff Davis



Re: Replication

From
Markus Schiltknecht
Date:
Jeff Davis wrote:
> Which doesn't work very well in the case of two groups of servers set up
> in two physical locations. I can see two possibilities:
> (1) You require a quorum to be effective, in which case your cluster of
> databases is only as reliable as the location which holds more servers.
> (2) You have another central authority that determines which databases
> are up, and which are down. Then your cluster is only as reliable as
> that central authority.

Right, the ideal here would be two sync clusters at both locations, 
connected via async replication :-)

> Even if you have a large number of nodes at different locations, then
> you end up with strange decisions to make if the network connections are
> intermittent or very slow. A temporary slowdown of many nodes could
> cause them to be degraded until some kind of human intervention brought
> them back. Until that time you might not be able to determine which
> nodes make up an authoritative group. 

Side note: in such a case, I think a GCS will just choose only one node 
to be the 'authoritative group', because most systems cannot afford to 
have long waits for such decisions. For database replication I also 
think it's better to have at least one node running than none.

> This kind of degradation could
> happen in the case of a DDoS attack, or perhaps a worm moving around the
> internet.

Well, sync replication in general needs a good, low latency and secure 
interconnect. The internet does not seem to be a good fit here.

> In practice everyone can find a solution that works for them. However,
> synchronous replication is not perfect, and there are many failure
> scenarios which need to be resolved in a way that fits your business. I
> think synchronous replication is inherently less available than
> asynchronous.

This surely depends on the environment. With a dedicated (i.e. low-latency 
and secure) interconnect, sync replication is surely more available, 
because your arguments above don't apply, and because sync replication 
guarantees you won't lose committed transactions.

If, however, you want or have to replicate over the internet, it depends. 
Your arguments above also apply to async replication. Only that, because 
of the conflict resolution, async replication systems can continue to 
operate on all the disconnected nodes and merge their work later on, once 
the network is up again. But then again, async still has the danger of 
losing transactions.

So I probably agree: if you are on an unreliable network and if you have 
conflict resolution correctly set up, then async replication is more 
available, but less secure.

As I said above, sync replication needs a reliable interconnect, better 
even have two interconnects, because it's a SPOF for a clustered 
database system.

Regards

Markus


Re: Replication

From
Jeff Davis
Date:
On Fri, 2006-08-25 at 11:23 +0200, Markus Schiltknecht wrote:
> Jeff Davis wrote:
> > Which doesn't work very well in the case of two groups of servers set up
> > in two physical locations. I can see two possibilities:
> > (1) You require a quorum to be effective, in which case your cluster of
> > databases is only as reliable as the location which holds more servers.
> > (2) You have another central authority that determines which databases
> > are up, and which are down. Then your cluster is only as reliable as
> > that central authority.
> 
> Right, the ideal here would be two sync clusters at both locations, 
> connected via async replication :-)
> 
> > Even if you have a large number of nodes at different locations, then
> > you end up with strange decisions to make if the network connections are
> > intermittent or very slow. A temporary slowdown of many nodes could
> > cause them to be degraded until some kind of human intervention brought
> > them back. Until that time you might not be able to determine which
> > nodes make up an authoritative group. 
> 
> Side note: in such a case, I think a GCS will just choose only one node 
> to be the 'authoritative group', because most systems cannot afford to 
> have long waits for such decisions. For database replication I also 
> think it's better to have at least one node running than none.
> 
>  > This kind of degradation could
> > happen in the case of a DDoS attack, or perhaps a worm moving around the
> > internet.
> 
> Well, sync replication in general needs a good, low latency and secure 
> interconnect. The internet does not seem to be a good fit here.
> 
> > In practice everyone can find a solution that works for them. However,
> > synchronous replication is not perfect, and there are many failure
> > scenarios which need to be resolved in a way that fits your business. I
> > think synchronous replication is inherently less available than
> > asynchronous.
> 
> This surely depends on the environment. With a dedicated (i.e. low 
> latency and secure) interconnect sync replication is surely more 
> available because your arguments above don't apply. And because sync 
> replication guarantees you won't lose committed transactions.
> 
> If however you want or have to replicate over the internet it depends. 
> Your arguments above also apply to async replication. Only that because 
> of the conflict resolution, async replication systems can continue to 
> operate on all the disconnected nodes and merge their work later on as 
> the network is up again. But then again, async still has the danger of 
> losing transactions.
> 
> So I probably agree: if you are on an unreliable network and if you have 
> conflict resolution correctly set up, then async replication is more 
> available, but less secure.
> 
> As I said above, sync replication needs a reliable interconnect, better 
> even have two interconnects, because it's a SPOF for a clustered 
> database system.
> 

Ok, I agree with your statements. Async is convenient in many ways, but
has less durable transactions (at least for transactions committed
recently). Sync has some limitations, and is harder to get right (at
least if you want good availability as well), but provides more durable 
transactions and consistency between systems.

Regards,
Jeff Davis