Thread: PostgreSQL on 64 bit Linux
I have a PostgreSQL installation on a Debian box that had the 64-bit SMP kernel installed before PostgreSQL was compiled and installed on it. Does PostgreSQL take any advantage of the 64-bit environment, or have we not done anything to move into the 64-bit world yet? Regards, - Naz
Naz Gassiep <naz@mira.net> writes: > I have a PostgreSQL installation on a Debian box that had the 64-bit > SMP kernel installed before PostgreSQL was compiled and installed on > it. Does PostgreSQL take any advantage of the 64-bit environment, or > have we not done anything to move into the 64-bit world yet? Depends on whether PG was compiled as 64-bit or 32-bit--is your toolchain 64-bit all the way, or is it just the kernel? -Doug
Douglas McNaught wrote: > Naz Gassiep <naz@mira.net> writes: >> I have a PostgreSQL installation on a Debian box that had the 64-bit SMP kernel installed before PostgreSQL was compiled and installed on it. Does PostgreSQL take any advantage of the 64-bit environment, or have we not done anything to move into the 64-bit world yet? > Depends on whether PG was compiled as 64-bit or 32-bit--is your toolchain 64-bit all the way, or is it just the kernel? > -Doug I just compiled as the manual says. I guess I must have compiled it in 32. I'll recompile in 64 when I upgrade to 8.2 when it's out. Thanks, - Naz.
Naz Gassiep <naz@mira.net> writes: > I just compiled as the manual says. I guess I must have compiled it > in 32. I'll recompile in 64 when I upgrade to 8.2 when it's out. The 'file' command will tell you whether a binary is 32- or 64-bit. If you have a full 64-bit install, you'll get a 64-bit compile by default, but it sounds like you just added a 64-bit kernel to a 32-bit Debian system? -Doug
On Sun, Aug 20, 2006 at 04:46:30PM -0400, Douglas McNaught wrote: > Naz Gassiep <naz@mira.net> writes: > > I have a PostgreSQL installation on a Debian box that had the 64-bit > > SMP kernel installed before PostgreSQL was compiled and installed on > > it. Does PostgreSQL take any advantage of the 64-bit environment, or > > have we not done anything to move into the 64-bit world yet? > Depends on whether PG was compiled as 64-bit or 32-bit--is your > toolchain 64-bit all the way, or is it just the kernel? I think he means: have benchmarks or profiling been done with the goal of specifically improving performance on 64-bit platforms? For most applications available today, the answer is no. Compiling an application designed for 32-bit on a 64-bit architecture does not automatically improve performance. Too frequently, it can actually reduce performance. Pointers are larger, which means that any application that is heavily pointer-based can be forced to move twice as much pointer data through memory, which reduces the effectiveness of the various cache levels, and of RAM itself. Hopefully GLIBC counts here, in that it should contain 64-bit-specific code where it might count, so libc calls should be able to take advantage of the 64-bit machine instructions. Is there an interest, or any active project to examine PostgreSQL in the area of 64-bit processors? Has it already been done? I don't recall seeing a reference to it in my travels. I'm also not sure what to expect for results, as the territory is still new. 64-bit processors have existed for a while, but 32-bit processors have been the popular choice, making 64-bit support an afterthought? Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Ottawa, Ontario, Canada
On 8/21/06, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote: > Is there an interest, or any active project to examine PostgreSQL in > the area of 64-bit processors? Has it already been done? I don't recall > seeing a reference to it in my travels. I'm also not sure what to > expect for results, as the territory is still new. 64-bit processors > have existed for a while, but 32-bit processors have been the popular > choice, making 64-bit support an afterthought? Surely that's a reference only to the Wintel world? AIX, HP-UX and Solaris-Sparc have been 64-bit for a while now... > Cheers, > mark Cheers, Andrej -- Please don't top post, and don't use HTML e-Mail :} Make your quotes concise. http://www.american.edu/econ/notes/htmlmail.htm
mark@mark.mielke.cc writes: > Is there an interest, or any active project to examine PostgreSQL in > the area of 64-bit processors? Has it already been done? I don't recall > seeing a reference to it in my travels. I'm also not sure what to > expect for results, as the territory is still new. 64-bit processors > have existed for a while, but 32-bit processors have been the popular > choice, making 64-bit support an afterthought? I find this question a bit amusing, since PG has run on 64-bit architectures such as MIPS, Sparc, Alpha and PA-RISC for quite a while now. :) As I said in a private email to Naz, the main advantage I think you'd see from 64-bit is the ability to run with more than 2GB or so of shared buffers on a system with lots of RAM. Whether you'd want to do that, or let the OS do most of the buffering, is an open question... -Doug
Naz, On 8/20/06 12:59 PM, "Naz Gassiep" <naz@mira.net> wrote: > I have a PostgreSQL installation on a Debian box that had the 64-bit SMP > kernel installed before PostgreSQL was compiled and installed on it. > Does PostgreSQL take any advantage of the 64-bit environment, or have we > not done anything to move into the 64-bit world yet? Very likely the default gcc compiles for 64-bit; if not, you need to specify "-m64". As another respondent said - do a "file `which initdb`" to find out whether you have compiled for 64-bit or not. WRT 64-bit and Postgres, it depends on the CPU as to whether you see a simple performance benefit. On the Opteron you will see a benefit when doing CPU bound work. When doing the CPU portion, the additional registers of the Opteron running in 64-bit mode are used by the compiler to produce a 20-30% boost in performance. On the Xeon in 64-bit mode, the same regions of execution will slow down by about 5%. Postgres benefits automatically from the larger memory addressing of the 64-bit kernel by using the larger I/O cache of Linux. - Luke
> WRT 64-bit and Postgres, it depends on the CPU as to whether you see a > simple performance benefit. On the Opteron you will see a benefit when > doing CPU bound work. When doing the CPU portion, the additional registers > of the Opteron running in 64-bit mode are used by the compiler to produce a > 20-30% boost in performance. On the Xeon in 64-bit mode, the same regions > of execution will slow down by about 5%. Is that true of even Woodcrest? Joshua D. Drake > Postgres benefits automatically from the larger memory addressing of the > 64-bit kernel by using the larger I/O cache of Linux. > - Luke -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
One person who commented on "The business of PostgreSQL" made this comment: Posted Aug 3, 2006 8:45 UTC (Thu) by subscriber jgarzik [Link <http://lwn.net/Articles/193946/>] Cluster immaturity. MySQL has been shipping a workable single-master replication+failover for quite a while now in most Linux distros. MySQL's multi-master solution, while requiring RAM (not disk) for storage, is also well-integrated and deployed in production. In contrast, the PostgreSQL team has chosen to provide hooks for replication and failover. This has led to a situation where there are multiple projects supporting replication/failover, none of which are production-ready nor shipped in a modern Linux distro. Modern systems *must* scale beyond a single computer, and the PostgreSQL support shipped in modern Linux distros is completely incapable of this. I really would appreciate a response. Thanks, Michael
> In contrast, the PostgreSQL team has chosen to provide hooks for > replication and failover. This has led to a situation where there are > multiple projects supporting replication/failover, none of which are > production-ready nor shipped in a modern Linux distro. And no, we don't really provide hooks :). However there are several projects trying to solve different problems with PostgreSQL. > Modern systems *must* scale beyond a single computer, and the PostgreSQL > support shipped in modern Linux distros is completely incapable of this. Slony-I is quite capable as a production-class FOSS replication system and is in wide use. Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Josh, On 8/20/06 8:52 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote: > Is that true of even Woodcrest? > > Joshua D. Drake Not sure - haven't read anything about the register set on the Core 2 to make me think it benefits from 64 bit. The point may be academic from now on though - the comparisons between Opteron and Core 2 will all likely be in 64-bit mode from now on. - Luke
Joshua D. Drake wrote: >> Modern systems *must* scale beyond a single computer, and the PostgreSQL >> support shipped in modern Linux distros is completely incapable of this. > Slony-I is quite capable as a production-class FOSS replication system > and is in wide use. Slony-I is not enough, because it can cause data inconsistency between servers. IMO, log-based replication is also needed for PostgreSQL, just like MySQL's. Regards;
Fujii Masao wrote: > Joshua D. Drake wrote: >>> Modern systems *must* scale beyond a single computer, and the >>> PostgreSQL support shipped in modern Linux distros is completely >>> incapable of this. >> Slony-I is quite capable as a production-class FOSS replication system >> and is in wide use. > Slony-I is not enough, because it can cause data inconsistency between > servers. hmm, what are you referring to here? slony1 does row-level replication (something that MySQL cannot do until 5.1, which is still beta) - so it should not be possible to cause data inconsistency. It is however async replication, so you can lose data committed on the master but not yet replicated to the slaves in case you lose the master completely. Stefan
Stefan Kaltenbrunner wrote: > It is however async replication, so you can lose data committed on the > master but not yet replicated to the slaves in case you lose the master > completely. Yes, that is where Slony-I is insufficient, I think. Most systems will not permit committed data to be lost, so its use is limited. >> IMO, log-based replication is also needed for PostgreSQL, just like MySQL's. Well, I had misunderstood MySQL. Its replication is also asynchronous. regards;
On Mon, Aug 21, 2006 at 02:56:10PM +1200, Andrej Ricnik-Bay wrote: > On 8/21/06, mark@mark.mielke.cc <mark@mark.mielke.cc> wrote: > >Is there an interest, or any active project to examine PostgreSQL in > >the area of 64-bit processors? Has it already been done? I don't recall > >seeing a reference to it in my travels. I'm also not sure what to > >expect for results, as the territory is still new. 64-bit processors > >have existed for a while, but 32-bit processors have been the popular > >choice, making 64-bit support an afterthought? > Surely that's a reference only to the Wintel world? AIX, HP-UX > and Solaris-Sparc have been 64-bit for a while now... I don't think so. In the Open Source world, most projects are still 32-bit centric, regardless of how many years the products have been supported on 64-bit platforms. What application were you thinking of that takes full advantage of 64-bit, making the 64-bit application significantly faster than the 32-bit application? The only area I am aware of is video processing. It's often a surprise to people that an upgrade to 64-bit, regardless of CPU architecture, too often ends up slower, rather than faster. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Ottawa, Ontario, Canada
On Sun, Aug 20, 2006 at 11:00:26PM -0400, Douglas McNaught wrote: > mark@mark.mielke.cc writes: > > Is there an interest, or any active project to examine PostgreSQL in > > the area of 64-bit processors? Has it already been done? I don't recall > > seeing a reference to it in my travels. I'm also not sure what to > > expect for results, as the territory is still new. 64-bit processors > > have existed for a while, but 32-bit processors have been the popular > > choice, making 64-bit support an afterthought? > I find this question a bit amusing, since PG has run on 64-bit > architectures such as MIPS, Sparc, Alpha and PA-RISC for quite a while > now. :) I don't think so. Software can be designed to take best advantage of hardware. Recompiling it for a different architecture, running test cases, and declaring support is not the same as optimizing for it. > As I said in a private email to Naz, the main advantage I think you'd > see from 64-bit is the ability to run with more than 2GB or so of > shared buffers on a system with lots of RAM. Whether you'd want to do > that, or let the OS do most of the buffering, is an open question... This is what I mean by afterthought. PostgreSQL is designed for 32-bit processors. Which is fine. I'm not complaining. The question was whether there is an interest in pursuing 64-bit-specific optimizations. In the PostgreSQL code, a quick check points me only to "has long int 64" as a 64-bit source code #ifdef. Of the six places that reference this, five of them actually slow down the code, as they check for overflow of the 'long int' result beyond 4 bytes of data. The sixth place is used to define the 64-bit type in use by PostgreSQL, which I suspect is infrequently used. I believe the answer is no. No or few 64-bit optimization possibilities have been chased down, probably because some or many of these would: 1) require significant re-architecture 2) reduce the performance in a 32-bit world It's a question that only half interests me. As with most projects, I don't think the projects are ready to re-architect for this purpose. Perhaps once 50%+ of people are running PostgreSQL in 64-bit mode, the question will be more serious to more people. As a half-interesting question, I'm defending it as a valid question. Please don't write it off, but it is fine to say "not yet, we have more important things to work on". Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Ottawa, Ontario, Canada
mark@mark.mielke.cc wrote: > This is what I mean by afterthought. PostgreSQL is designed for > 32-bit processors. Which is fine. I'm not complaining. The question > was whether there is an interest in pursuing 64-bit-specific > optimizations. In the PostgreSQL code, a quick check points me only to > "has long int 64" as a 64-bit source code #ifdef. Of the six places > that reference this, five of them actually slow down the code, as they > check for overflow of the 'long int' result beyond 4 bytes of > data. The sixth place is used to define the 64-bit type in use by > PostgreSQL, which I suspect is infrequently used. > I believe the answer is no. No or few 64-bit optimization possibilities > have been chased down, probably because some or many of these would: > 1) require significant re-architecture > 2) reduce the performance in a 32-bit world Just out of interest - what areas in postgres do you think could be improved (performance-wise) on 64-bit machines? The only area that I can see is the int64 datatype - it's stored in palloc()'ed memory on 32-bit machines AFAIK - I'm not sure if it uses the "long long" datatype on 64-bit archs. But I can't imagine any other area that could be tuned by making use of (native) 64-bit ints. greetings, Florian Pflug
On Mon, Aug 21, 2006 at 09:16:46AM -0400, mark@mark.mielke.cc wrote: > This is what I mean by afterthought. PostgreSQL is designed for > 32-bit processors. Which is fine. I'm not complaining. The question > was whether there is an interest in pursuing 64-bit-specific > optimizations. In the PostgreSQL code, a quick check points me only to > "has long int 64" as a 64-bit source code #ifdef. Of the six places > that reference this, five of them actually slow down the code, as they > check for overflow of the 'long int' result beyond 4 bytes of > data. The sixth place is used to define the 64-bit type in use by > PostgreSQL, which I suspect is infrequently used. There are two defines, the end result being to declare an int64 type which is used a fair bit around the place, bigint and bigserial being the obvious ones. The checks I see relate to strtol, where the code only wants an int4. There's no strtoi, so on 32-bit the range check is built in, but if long is 64-bit you have to do the check separately. That's just an interface problem; there's not a lot we can do about that really. > I believe the answer is no. No or few 64-bit optimization possibilities > have been chased down, probably because some or many of these would: > 1) require significant re-architecture > 2) reduce the performance in a 32-bit world Can you think of any places at all where 64-bit would make a difference to processing? 64-bit gives you more memory, and on some x86 chips, more registers, but that's it. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
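As an illustration of the check Martijn describes, here is a minimal C sketch (parse_int4 is an invented name; this is not the actual PostgreSQL source):

    #include <errno.h>
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Parse an int4 from a string.  Where long is 32 bits, strtol()'s
     * own ERANGE check covers the int4 range for free; where long is
     * 64 bits, the result fits in a long but may still overflow int4,
     * so the range must be checked separately -- the "extra" code on
     * 64-bit builds discussed above. */
    static int
    parse_int4(const char *s)
    {
        char *end;
        long  val;

        errno = 0;
        val = strtol(s, &end, 10);
    #if LONG_MAX > INT_MAX              /* long is wider than int4 */
        if (val < INT_MIN || val > INT_MAX)
            errno = ERANGE;
    #endif
        if (errno == ERANGE || end == s) {
            fprintf(stderr, "\"%s\" is not a valid int4\n", s);
            exit(1);
        }
        return (int) val;
    }

    int
    main(void)
    {
        printf("%d\n", parse_int4("2147483647"));
        return 0;
    }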
Martijn van Oosterhout <kleptog@svana.org> writes: > Can you think of any places at all where 64-bit would make a difference > to processing? 64-bit gives you more memory, and on some x86 chips, more > registers, but that's it. It would be interesting to look into making int8 and float8 be pass-by-value datatypes to save palloc overhead on machines where Datum is going to be 8 bytes wide anyway. Of course this would only help queries that use those datatypes extensively, and it might be an insignificant savings anyhow :-( For the most part though I think that the main advantage of 64-bit for a database is the ability to address more memory. We've been working slowly towards allowing PG to make better use of large numbers of shared buffers, for instance. regards, tom lane
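A rough sketch of what Tom suggests, with deliberately simplified types (PostgreSQL's real Datum machinery lives in postgres.h and differs in detail; this is illustrative only):

    #include <stdint.h>
    #include <stdlib.h>

    typedef uintptr_t Datum;        /* 8 bytes wide on a 64-bit platform */

    /* Pass-by-reference, roughly how int8 works today: the value is
     * copied into allocated memory and the Datum carries only a
     * pointer to it.  (PostgreSQL proper uses palloc(), not malloc().) */
    static Datum
    Int64GetDatumByRef(int64_t x)
    {
        int64_t *p = malloc(sizeof(int64_t));
        *p = x;
        return (Datum) p;
    }

    /* Pass-by-value: when Datum is already 8 bytes wide, the value
     * fits directly and the allocation disappears. */
    static Datum
    Int64GetDatumByVal(int64_t x)
    {
        return (Datum) x;           /* no palloc, no pointer chase */
    }

    int
    main(void)
    {
        Datum byref = Int64GetDatumByRef(42);
        Datum byval = Int64GetDatumByVal(42);
        free((void *) byref);
        (void) byval;
        return 0;
    }

As Tom says, this would only pay off for queries that use int8/float8 heavily.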
mark@mark.mielke.cc writes: > I believe the answer is no. No or few 64-bit optimization possibilities > have been chased down, probably because some or many of these would: > > 1) require significant re-architecture > > 2) reduce the performance in a 32-bit world Honestly, I think the main "optimizations" happen automatically: the compiler uses the extra registers (if present) and pointers in LP64 mode are automatically 64-bit, which allows much larger memory areas. That's probably 95% of the win right there. What other "optimizations" need doing? People have been running PG with big databases on 64-bit iron for years; don't you think any glaring infelicities would have been addressed? > It's a question that only half interests me. As with most projects, I > don't think the projects are ready to re-architect for this > purpose. What re-architecting would be needed? -Doug
Fujii Masao wrote: > Stefan Kaltenbrunner wrote: >> It is however async replication, so you can lose data committed on the >> master but not yet replicated to the slaves in case you lose the master >> completely. > Yes, that is where Slony-I is insufficient, I think. > Most systems will not permit committed data to be lost, so its use is > limited. Wanna bet? It is very, very common to have asynchronous replication. I would say the need for synchronous is far more limited (although greatly desired). Joshua D. Drake >> IMO, log-based replication is also needed for PostgreSQL, just like MySQL's. > Well, I had misunderstood MySQL. Its replication is also asynchronous. > regards; -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
> WRT 64-bit and Postgres, it depends on the CPU as to whether you > see a simple performance benefit. On the Opteron you will see a > benefit when doing CPU bound work. When doing the CPU portion, the > additional registers of the Opteron running in 64-bit mode are used > by the compiler to produce a 20-30% boost in performance. On the > Xeon in 64-bit mode, the same regions of execution will slow down > by about 5%. > Postgres benefits automatically from the larger memory addressing > of the 64-bit kernel by using the larger I/O cache of Linux. The main benefit of Postgres in 64-bit mode is possible only on a dedicated DB server with more than 3GB of RAM, using most of that RAM for shared buffers and avoiding the constant movement of buffers between the OS cache and shared memory. On a system with RAM below 2-3GB it is difficult to find a serious gain in performance. Best regards, Alexander Kirpa
On Mon, Aug 21, 2006 at 10:23:16AM -0400, Douglas McNaught wrote: > mark@mark.mielke.cc writes: > > I believe the answer is no. No or few 64-bit optimization possibilities > > have been chased down, probably because some or many of these would: > > 1) require significant re-architecture > > 2) reduce the performance in a 32-bit world > Honestly, I think the main "optimizations" happen automatically: the > compiler uses the extra registers (if present) and pointers in LP64 > mode are automatically 64-bit, which allows much larger memory areas. > That's probably 95% of the win right there. What other > "optimizations" need doing? Compilers are getting smarter, but code profiled and hand-tuned for 32 bits does not automatically become code profiled and hand-tuned for 64 bits just because it is compiled with a 64-bit compiler. An example of this is in how algorithms are implemented. For Intel-style processors at least, using array offsets is cheaper than using many pointers, and for 64-bit processors, 32-bit array offsets would use less memory (see the sketch below). For re-architecture, the first thing that came to mind was Datum increasing in size. Copy by value instead of allocating a pointer, and passing the pointer. The pointer will be 64 bits, so Datum could support 64 bits as well. Tom caught this, but he did point out what I missed. This would only benefit data types that are 64 bits in length, which are still likely uncommon (most people would use serial, not bigserial). Has anybody taken any of the 64-bit processor optimization manuals and walked through them, cross-referencing against bottleneck programs in PostgreSQL? To counter this, I suggest to you that most every attempt to optimize PostgreSQL for performance likely considered variations in algorithms measured on a 32-bit processor, finally selecting the version that was simplest and best performing on a 32-bit processor. This is normal evolution for a piece of software. It is naturally optimized for the processor most used by its users. I'm not the expert on the matter. I don't see evidence that anybody has taken this route though. > People have been running PG with big databases on 64-bit iron for > years; don't you think any glaring infelicities would have been > addressed? I doubt there are 'glaring infelicities'. I doubt any changes required to provide a noticeable speed improvement would be one-liners. They will not be obvious. They will require a complete understanding of the many architectural differences between common 32-bit processors and common 64-bit processors. Few have this information in their head. I don't. I suspect that anybody who claims that the only difference is the amount of addressable memory and the number of registers doesn't. :-) > > It's a question that only half interests me. As with most projects, I > > don't think the projects are ready to re-architect for this > > purpose. > What re-architecting would be needed? I'm asking that it be admitted that it has not been looked at. Not seriously. Nor is this unique to PostgreSQL. I expect the performance for Linux applications in general to slowly improve on 64-bit processors as more and more people begin to use it. I recall a few threads on other mailing lists where the primary developers admitted to not having ready access to a 64-bit machine. Until they do, the chance that these applications are hand-tuned for 64-bit is unlikely. I'm not claiming that I have the answers. I am claiming that a few of you share this weakness, and that an assumption that PostgreSQL is designed for 64-bit would be incorrect. :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Ottawa, Ontario, Canada
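A sketch of the array-offset idea Mark mentions (illustrative C, not taken from PostgreSQL): on LP64 every pointer costs 8 bytes, while a 32-bit index into a node pool costs 4 bytes and can still address billions of nodes.

    #include <stdint.h>
    #include <stdio.h>

    /* Pointer-chasing layout: the link alone is 8 bytes on LP64. */
    struct node_ptr { struct node_ptr *next; int value; };

    /* Index-based layout: the link is 4 bytes on any platform. */
    struct node_idx { uint32_t next; int value; };

    #define NIL UINT32_MAX
    static struct node_idx pool[1024];   /* nodes live in one array */

    static int
    sum_list(uint32_t head)
    {
        int total = 0;
        for (uint32_t i = head; i != NIL; i = pool[i].next)
            total += pool[i].value;      /* the "pointer" is an offset */
        return total;
    }

    int
    main(void)
    {
        pool[0] = (struct node_idx){ .next = 1,   .value = 40 };
        pool[1] = (struct node_idx){ .next = NIL, .value = 2 };
        printf("%d\n", sum_list(0));     /* prints 42 */
        return 0;
    }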
On Mon, Aug 21, 2006 at 12:00:58PM +0300, Alexander Kirpa wrote: > > WRT 64-bit and Postgres, it depends on the CPU as to whether you > > see a simple performance benefit. On the Opteron you will see a > > benefit when doing CPU bound work. When doing the CPU portion, the > > additional registers of the Opteron running in 64-bit mode are used > > by the compiler to produce a 20-30% boost in performance. On the > > Xeon in 64-bit mode, the same regions of execution will slow down > > by about 5%. > > Postgres benefits automatically from the larger memory addressing > > of the 64-bit kernel by using the larger I/O cache of Linux. > The main benefit of Postgres in 64-bit mode is possible only on a > dedicated DB server with more than 3GB of RAM, using most of that RAM > for shared buffers and avoiding the constant movement of buffers > between the OS cache and shared memory. On a system with RAM below > 2-3GB it is difficult to find a serious gain in performance. This is the main difference for PostgreSQL today - designed for 32-bit - when recompiled with a 64-bit compiler: the additional registers are barely enough to counter the increased cost of processing in 64 bits. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Ottawa, Ontario, Canada
On Mon, Aug 21, 2006 at 10:46:56AM -0400, mark@mark.mielke.cc wrote: > Compilers are getting smarter, but code profiled and hand-tuned for > 32 bits does not automatically become code profiled and hand-tuned > for 64 bits just because it is compiled with a 64-bit compiler. I don't see any evidence that anyone has done any kind of hand-tuning, not for 32-bit, not for 64-bit. That's the compiler's job, really. That said, if you want to go hand-tuning, go for the low-hanging fruit first: using likely()/unlikely() in often-used code paths (heap_get_attr for example), finding the parts of the code that could benefit from strict aliasing, etc. In general we avoid architecture-specific code because the cost of maintenance outweighs the benefits. > For re-architecture, the first thing that came to mind was Datum > increasing in size. Copy by value instead of allocating a pointer, > and passing the pointer. The pointer will be 64 bits, so Datum > could support 64 bits as well. Tom caught this, but he did point > out what I missed. This would only benefit data types that are > 64 bits in length, which are still likely uncommon (most people > would use serial, not bigserial). A Datum must be 64-bit on a 64-bit arch; how else is it going to hold the pointer for a pass-by-ref value? On the whole, PostgreSQL is still in the phase where we're trying to find the best algorithms. When that's done (if ever), we can start worrying about processor optimisations... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
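For reference, likely()/unlikely() are conventionally defined on top of GCC's __builtin_expect, something like the following (shown as the kind of hand-tuning being discussed; these macros were not part of PostgreSQL at the time):

    #include <stdio.h>

    /* Branch hints: tell the compiler which way a test usually goes,
     * so it can lay out the hot path with no taken branch. */
    #ifdef __GNUC__
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)
    #else
    #define likely(x)   (x)
    #define unlikely(x) (x)
    #endif

    static int
    checked_div(int a, int b)
    {
        if (unlikely(b == 0))    /* rare error path, moved out of the way */
            return 0;
        return a / b;            /* hot path */
    }

    int
    main(void)
    {
        printf("%d\n", checked_div(84, 2));
        return 0;
    }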
On Aug 21, 2006, at 10:30, Joshua D. Drake wrote: > Fujii Masao wrote: >> Stefan Kaltenbrunner wrote: >>> It is however async replication, so you can lose data committed on the >>> master but not yet replicated to the slaves in case you lose the master >>> completely. >> Yes, that is where Slony-I is insufficient, I think. >> Most systems will not permit committed data to be lost, so its use is limited. > Wanna bet? > It is very, very common to have asynchronous replication. I would > say the need for synchronous is far more limited (although greatly > desired). I would imagine that multi-master synchronous replication would be fairly trivial to implement with 2PC and wal-shipping available, no?
>> It is very, very common to have asynchronous replication. I would say >> the need for synchronous is far more limited (although greatly desired). > I would imagine that multi-master synchronous replication would be > fairly trivial to implement with 2PC and wal-shipping available, no? Trivial? I would say... no. There is a reason it hasn't been done yet, and a reason why CMD (for example) has zero plans to even try. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Hi, AgentM wrote: > I would imagine that multi-master synchronous replication would be > fairly trivial to implement with 2PC and wal-shipping available, no? Yes, that could be done. And AFAIK either pgpool or PgCluster (1) try to do sync, multi-master replication that way. The problem is that such an implementation is very network-intensive and scales very badly for writing transactions. If you're interested in a good sync, multi-master replication algorithm, you might want to have a look at what Slony-II [1] or Postgres-R [2] try to do. I also recommend reading "Don't be lazy, be consistent" by Bettina Kemme [3]. Regards Markus [1]: http://www.slony2.org/wiki/index.php?title=Documentation [2]: http://www.postgres-r.org/ [3]: http://www.cs.mcgill.ca/~kemme/papers/vldb00.html
On Mon, Aug 21, 2006 at 05:13:58PM +0200, Martijn van Oosterhout wrote: > On the whole, PostgreSQL is still in the phase where we're trying to > find the best algorithms. When that's done (if ever), we can start > worrying about processor optimisations... I don't disagree. :-) You claimed that PostgreSQL is not tuned to a specific processor, and relies on the kernel, the compiler, and libc to perform some tuning. I agree with this as well. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Ottawa, Ontario, Canada
Fujii Masao wrote: > Stefan Kaltenbrunner wrote: >> It is however async replication, so you can lose data committed on the >> master but not yet replicated to the slaves in case you lose the master >> completely. > Yes, that is where Slony-I is insufficient, I think. > Most systems will not permit committed data to be lost, so its use is > limited. Not sure I agree with "most systems" here - a _LOT_ of use cases actually want async (and note that slony1 can do a controlled failover without any transactions lost). Nevertheless there are also points for having sync replication, but calling slony1 "insufficient" in that regard is a bit much, since it is actually designed to be async and does quite a good job with that. Stefan
On Mon, 2006-08-21 at 11:33 -0400, AgentM wrote: > I would imagine that multi-master synchronous replication would be > fairly trivial to implement with 2PC and wal-shipping available, no? How does WAL shipping help synchronous replication? Regards, Jeff Davis
mark@mark.mielke.cc writes: >> What re-architecting would be needed? > I'm asking that it be admitted that it has not been looked at. Not > seriously. Nor is this unique to PostgreSQL. I expect the performance > for Linux applications in general to slowly improve on 64-bit > processors as more and more people begin to use it. I recall a few > threads on other mailing lists where the primary developers admitted > to not having ready access to a 64-bit machine. Until they do, the > chance that these applications are hand-tuned for 64-bit is unlikely. What you seem to be asking for is that we engage in a lot of machine-specific tuning for some particular processor or other. That has not happened and will not happen. This project believes in portable code, not "optimized for Xeon-of-the-month" code. regards, tom lane
Jeff Davis wrote: > How does WAL shipping help synchronous replication? The WAL is written _before_ commit, logging all the changes the transaction wants to write to the disk. This makes it look very similar to what is needed for synchronous replication. Instead of waiting for confirmation from the disk that your WAL has been written, you would wait for confirmation from the network that the changes have been delivered. Regards Markus
Markus Schiltknecht wrote: > Jeff Davis wrote: > > How does WAL shipping help synchronous replication? > The WAL is written _before_ commit, logging all the changes the > transaction wants to write to the disk. This makes it look very similar > to what is needed for synchronous replication. > Instead of waiting for confirmation from the disk that your WAL has > been written, you would wait for confirmation from the network that the > changes have been delivered. But the confirmation that needs to come is that the WAL changes have been applied (fsync'ed), so the performance will be terrible. So bad, that I don't think anyone will want to use such a replication system ... -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
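To make the trade-off concrete, here is a minimal sketch of such a commit path (every function below is an invented stub for illustration; this is not PostgreSQL's actual WAL or replication API):

    #include <stdbool.h>
    #include <stdio.h>

    /* Stubs standing in for the WAL and network machinery. */
    static void wal_write_local(const char *rec)  { (void) rec; }
    static void wal_fsync_local(void)             { }
    static void send_to_standby(const char *rec)  { (void) rec; }
    static bool wait_for_standby_fsync_ack(void)  { return true; }

    /* Synchronous-replication commit: the client sees COMMIT only
     * after the standby has fsync'ed the WAL too (Alvaro's point:
     * mere delivery is not enough).  Every commit therefore pays a
     * network round trip plus a remote fsync. */
    static bool
    commit_transaction(const char *rec)
    {
        wal_write_local(rec);
        wal_fsync_local();                  /* local durability   */
        send_to_standby(rec);               /* network round trip */
        if (!wait_for_standby_fsync_ack())  /* remote durability  */
            return false;                   /* standby lost: cannot commit */
        return true;
    }

    int
    main(void)
    {
        puts(commit_transaction("xlog record") ? "COMMIT" : "ABORT");
        return 0;
    }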
On Mon, Aug 21, 2006 at 01:05:13PM -0400, Tom Lane wrote: > mark@mark.mielke.cc writes: > >> What re-architecting would be needed? > > I'm asking that it be admitted that it has not been looked at. Not > > seriously. Nor is this unique to PostgreSQL. I expect the performance > > for Linux applications in general to slowly improve on 64-bit > > processors as more and more people begin to use it. I recall a few > > threads on other mailing lists where the primary developers admitted > > to not having ready access to a 64-bit machine. Until they do, the > > chance that these applications are hand-tuned for 64-bit is unlikely. > What you seem to be asking for is that we engage in a lot of > machine-specific tuning for some particular processor or other. > That has not happened and will not happen. This project believes > in portable code, not "optimized for Xeon-of-the-month" code. Hi Tom. I'm asking for a statement such as the above, that you provide. :-) The question was whether PostgreSQL was optimized for 64-bit Linux. The answer is "not specifically". The answer is quite acceptable to me. It's not the original answer that was given to the original poster though... :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Ottawa, Ontario, Canada
mark@mark.mielke.cc writes: > On Mon, Aug 21, 2006 at 01:05:13PM -0400, Tom Lane wrote: >> What you seem to be asking for is that we engage in a lot of >> machine-specific tuning for some particular processor or other. > The question was whether PostgreSQL was optimized for 64-bit Linux. > The answer is "not specifically". Well, my notion of "optimizing for 64 bit" seems to have little to do with yours. To me, stuff like making sure we can use lots of buffers effectively is a 64-bit optimization. What you seem to have in mind is assembly-code-level optimization for specific processors. regards, tom lane
On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote: > But the confirmation that needs to come is that the WAL changes have > been applied (fsync'ed), so the performance will be terrible. So bad, > that I don't think anyone will want to use such a replication system ... Okay. I give up... Why is waiting for fsync on a fast local network which takes 15us to send a message (infiniband is cheap..) an unimaginable delay when we tolerate a local 8ms fsync delay on systems without writeback cache?
On Mon, 2006-08-21 at 19:42 +0200, Markus Schiltknecht wrote: > Jeff Davis wrote: > > How does WAL shipping help synchronous replication? > The WAL is written _before_ commit, logging all the changes the > transaction wants to write to the disk. This makes it look very similar > to what is needed for synchronous replication. > Instead of waiting for confirmation from the disk that your WAL has > been written, you would wait for confirmation from the network that the > changes have been delivered. Synchronous replication (to me) means that the data has been written to permanent storage on all masters and all slaves before any master or slave reports a successful COMMIT. Are you suggesting that you ship the WAL over the network, wait for it to be written to the slave, and then report a successful commit? Regards, Jeff Davis
On Mon, 21 Aug 2006 14:46:05 -0400 "Gregory Maxwell" <gmaxwell@gmail.com> wrote: > On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > But the confirmation that needs to come is that the WAL changes have > > been applied (fsync'ed), so the performance will be terrible. So bad, > > that I don't think anyone will want to use such a replication system ... > Okay. I give up... Why is waiting for fsync on a fast local network > which takes 15us to send a message (infiniband is cheap..) an > unimaginable delay when we tolerate a local 8ms fsync delay on systems > without writeback cache? OK, that solves your problem. How about my problem where replication has to happen on servers in three countries on two continents and thousands of updates a second have to happen in less than 10ms? This is the critical issue with replication - one size does not fit all. Synchronous replication, in particular, fits almost no one. My experience is that any replication needs to be based on your business rules which will vary widely. -- D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
Alvaro Herrera wrote: > But the confirmation that needs to come is that the WAL changes have > been applied (fsync'ed), so the performance will be terrible. So bad, > that I don't think anyone will want to use such a replication system ... Yeah, that's the big problem of sync, multi-master replication. IMHO the key to high-performance sync, multi-master replication is to minimize the effort needed to get that confirmation. A hard drive has the built-in functionality to confirm what has been written (and hopefully it does not lie about that). A cluster does not have this ability. Now, what does it mean to get a confirmation from the cluster at all? First of all, you have to be sure the cluster has received your message; then you also need to be sure it can apply your changes (the so-called WriteSet). Unfortunately 'the cluster' consists of many independent machines. If you wait for every single machine in the cluster to confirm being able to apply your WriteSet, you won't get beyond single-machine performance for writing transactions. Postgres-R uses a Group Communication System to get very fast 'confirmation' by optimizing the logic a little: as with patches, you can be sure a WriteSet can be applied if you try to apply it to the same base. So Postgres-R uses totally ordered communication to ensure all transactions are processed in the very same order on all nodes. Another optimization is buried in the depths of the GCS: it only guarantees that the message you sent _will_be_delivered_. So thanks to the GCS you don't have to wait for all the other nodes, only for the GCS to confirm that your message will be delivered on the other nodes. Of course the GCS also needs to send messages around to be able to confirm that, but GCSes are designed to do exactly that very efficiently. I hope to have shed some light on the sync, multi-master replication problem. Please ask if you have more questions. I propose to continue that discussion on the Postgres-R mailing list I have set up. Regards Markus
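For the curious, the shape of a Postgres-R style node can be sketched in terms of a GCS (the API below is an invented stub; real deployments would use a GCS such as Spread). The key property is that totally ordered broadcast delivers every writeset in the same order on every node, so each node can certify writesets deterministically, with no network locking:

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int id; const char *changes; } WriteSet;

    /* GCS stubs: gcs_deliver() returns the same writesets in the
     * same total order on every node in the group. */
    static void     gcs_broadcast(const WriteSet *ws) { (void) ws; }
    static WriteSet gcs_deliver(void) { return (WriteSet){ 1, "..." }; }

    /* Certification: does this writeset conflict with one committed
     * since it was created?  The test is deterministic, so every
     * node reaches the same verdict without exchanging any locks. */
    static bool certify(const WriteSet *ws) { (void) ws; return true; }

    static void
    replica_loop(int rounds)
    {
        while (rounds-- > 0) {
            WriteSet ws = gcs_deliver();   /* same order everywhere */
            if (certify(&ws))
                printf("apply writeset %d\n", ws.id);
            else
                printf("abort writeset %d\n", ws.id);
        }
    }

    int
    main(void)
    {
        WriteSet mine = { 1, "UPDATE ..." };
        gcs_broadcast(&mine);   /* the local tx waits only for ordering */
        replica_loop(1);
        return 0;
    }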
Gregory Maxwell wrote: > infiniband is cheap.. Can I get one? I'd love to run some tests with Postgres-R ;-)
On Aug 21, 2006, at 15:00, D'Arcy J.M. Cain wrote: > On Mon, 21 Aug 2006 14:46:05 -0400 > "Gregory Maxwell" <gmaxwell@gmail.com> wrote: >> On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote: >>> But the confirmation that needs to come is that the WAL changes have >>> been applied (fsync'ed), so the performance will be terrible. So bad, >>> that I don't think anyone will want to use such a replication system ... >> Okay. I give up... Why is waiting for fsync on a fast local network >> which takes 15us to send a message (infiniband is cheap..) an >> unimaginable delay when we tolerate a local 8ms fsync delay on systems >> without writeback cache? > OK, that solves your problem. How about my problem where replication > has to happen on servers in three countries on two continents and > thousands of updates a second have to happen in less than 10ms? This is > the critical issue with replication - one size does not fit all. > Synchronous replication, in particular, fits almost no one. > My experience is that any replication needs to be based on your business > rules which will vary widely. Sure - and more specifically, replication rules may differ on every table according to those rules. The current solutions are on/off for a list of tables. I wonder if the various pgsql replication engines have any problems co-existing... -M
Jeff Davis wrote: > Synchronous replication (to me) means that the data has been written to > permanent storage on all masters and all slaves before any master or > slave reports a successful COMMIT. Are you suggesting that you ship the > WAL over the network, wait for it to be written to the slave, and then > report a successful commit? I'm not suggesting doing replication like that. But I think that's what people mean when they propose that "multi-master synchronous replication would be fairly trivial to implement with 2PC and wal-shipping". This is more or less trivial to implement, yes. But it won't scale. Regards Markus
AgentM wrote: > On Aug 21, 2006, at 15:00, D'Arcy J.M. Cain wrote: > >On Mon, 21 Aug 2006 14:46:05 -0400 > >"Gregory Maxwell" <gmaxwell@gmail.com> wrote: > >>On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote: > >>>But the confirmation that needs to come is that the WAL changes have > >>>been applied (fsync'ed), so the performance will be terrible. So bad, > >>>that I don't think anyone will want to use such a replication system ... > >>Okay. I give up... Why is waiting for fsync on a fast local network > >>which takes 15us to send a message (infiniband is cheap..) an > >>unimaginable delay when we tolerate a local 8ms fsync delay on systems > >>without writeback cache? > >OK, that solves your problem. How about my problem where replication > >has to happen on servers in three countries on two continents and > >thousands of updates a second have to happen in less than 10ms? This is > >the critical issue with replication - one size does not fit all. > >Synchronous replication, in particular, fits almost no one. > >My experience is that any replication needs to be based on your business > >rules which will vary widely. > Sure - and more specifically, replication rules may differ on every > table according to those rules. The current solutions are on/off for > a list of tables. I wonder if the various pgsql replication engines > have any problems co-existing... Although I have never tried, I am sure Mammoth Replicator could coexist relatively sanely with Slony-I. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, 21 Aug 2006 15:14:10 -0400 AgentM <agentm@themactionfaction.com> wrote: > > My experience is that any replication needs to be based on your business > > rules which will vary widely. > Sure - and more specifically, replication rules may differ on every > table according to those rules. The current solutions are on/off for In fact this is exactly what I did for my app. Some tables were multi-master and some were required to be modified from a single master. -- D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On 21 Aug 2006, at 10:48, mark@mark.mielke.cc wrote: > On Mon, Aug 21, 2006 at 12:00:58PM +0300, Alexander Kirpa wrote: > > > WRT 64-bit and Postgres, it depends on the CPU as to whether you > > > see a simple performance benefit. On the Opteron you will see a > > > benefit when doing CPU bound work. When doing the CPU portion, > > > the additional registers of the Opteron running in 64-bit mode are > > > used by the compiler to produce a 20-30% boost in performance. On > > > the Xeon in 64-bit mode, the same regions of execution will slow > > > down by about 5%. > > > Postgres benefits automatically from the larger memory addressing > > > of the 64-bit kernel by using the larger I/O cache of Linux. > > The main benefit of Postgres in 64-bit mode is possible only on a > > dedicated DB server with more than 3GB of RAM, using most of that RAM > > for shared buffers and avoiding the constant movement of buffers > > between the OS cache and shared memory. On a system with RAM below > > 2-3GB it is difficult to find a serious gain in performance. > This is the main difference for PostgreSQL today - designed for > 32-bit - when recompiled with a 64-bit compiler: the additional > registers are barely enough to counter the increased cost of > processing in 64 bits. > Cheers, > mark The current 32-bit Postgres architecture already allows the main benefit of a 64-bit OS to be used: huge memory for shared buffers. At present up to 2G buffers x 8KB = 16TB of shared memory is possible, and in that case O_DIRECT is needed to avoid the OS cache, especially when the databases fit into shared memory. Best regards, Alexander Kirpa
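Alexander's arithmetic checks out: a 32-bit buffer count allows 2^31 buffers of 8KB each, which is 2^44 bytes (a quick check in C):

    #include <stdio.h>

    int
    main(void)
    {
        long long buffers = 1LL << 31;    /* "2G" buffer slots   */
        long long page    = 8192;         /* 8KB PostgreSQL page */

        /* 2^31 * 2^13 = 2^44 bytes = 16 TiB */
        printf("%lld bytes = %lld TiB\n",
               buffers * page, (buffers * page) >> 40);
        return 0;
    }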
On Mon, 2006-08-21 at 21:46, Fujii Masao wrote: > Stefan Kaltenbrunner wrote: > > It is however async replication, so you can lose data committed on the > > master but not yet replicated to the slaves in case you lose the master > > completely. > Yes, that is where Slony-I is insufficient, I think. > Most systems will not permit committed data to be lost, so its use is limited. But any sync _replication_ system will have a severe impact on performance. My guess is that for full sync replication, going from 1 server to 2 will actually lower performance, and some small gains would be possible only starting from the 3rd server. This has nothing to do with PostgreSQL, but is just due to the latencies of memory, disk and network and the need to do remote locking. My guess is based on using standard 100Gb ethernet. It may be possible to do better on some more advanced interconnects like myrinet. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
On Mon, 2006-08-21 at 15:00, D'Arcy J.M. Cain wrote: > On Mon, 21 Aug 2006 14:46:05 -0400 > "Gregory Maxwell" <gmaxwell@gmail.com> wrote: > > On 8/21/06, Alvaro Herrera <alvherre@commandprompt.com> wrote: > > > But the confirmation that needs to come is that the WAL changes have > > > been applied (fsync'ed), so the performance will be terrible. So bad, > > > that I don't think anyone will want to use such a replication system ... > > Okay. I give up... Why is waiting for fsync on a fast local network > > which takes 15us to send a message (infiniband is cheap..) an > > unimaginable delay when we tolerate a local 8ms fsync delay on systems > > without writeback cache? > OK, that solves your problem. How about my problem where replication > has to happen on servers in three countries on two continents and > thousands of updates a second have to happen in less than 10ms? For this scenario you are far better off with partitioning than replication. That is, if your data is partitionable. But geographically distributed data often is. > This is > the critical issue with replication - one size does not fit all. > Synchronous replication, in particular, fits almost no one. > My experience is that any replication needs to be based on your business > rules which will vary widely. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Hannu Krosing wrote: > But any sync _replication_ system will have a severe impact on > performance. My guess is that for full sync replication, going from 1 > server to 2 will actually lower performance, and some small gains would be > possible only starting from the 3rd server. Only testing will show concrete results, but for sure such a general answer does not make much sense. It all depends very much on the type of your load. Especially the read/write ratio is very important. Another important factor is the amount of conflicting transactions. > My guess is based on using standard 100Gb ethernet. It may be possible > to do better on some more advanced interconnects like myrinet. 100Gb ethernet already sounds very advanced... ;-) Regards Markus
On Wed, 2006-08-23 at 13:09, Markus Schiltknecht wrote: > Hannu Krosing wrote: > > But any sync _replication_ system will have a severe impact on > > performance. My guess is that for full sync replication, going from 1 > > server to 2 will actually lower performance, and some small gains would be > > possible only starting from the 3rd server. > Only testing will show concrete results, but for sure such a general > answer does not make much sense. It all depends very much on the type of > your load. Especially the read/write ratio is very important. True. But if you have very few writes, then there seems no reason to do sync anyway. > Another important factor is the amount of conflicting transactions. That too, but just the need to do *any* locking on all nodes will significantly slow down sync replication. > > My guess is based on using standard 100Gb ethernet. It may be possible > > to do better on some more advanced interconnects like myrinet. > 100Gb ethernet already sounds very advanced... ;-) Yeah, I meant some mix of 100Mb and 1Gb ;) > Regards > Markus -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Hannu Krosing wrote: > But if you have very few writes, then there seems no reason to do sync > anyway. I think there is one: high availability. A standby server which can continue if your primary fails. Of course sync is only needed if you absolutely cannot afford losing any committed transaction. >> Another important factor is the amount of conflicting transactions. > That too, but just the need to do *any* locking on all nodes will > significantly slow down sync replication. If you implement sync replication with locking, yes. But there are better ways: the Postgres-R approach does not do network locking, but aborts conflicting transactions just before committing. That results in much less network traffic (one GCS message per writing transaction). Regards Markus
On Wed, 23 Aug 2006 12:42:11 +0300 Hannu Krosing <hannu@skype.net> wrote: > > OK, that solves your problem. How about my problem where replication > > has to happen on servers in three countries on two continents and > > thousands of updates a second have to happen in less than 10ms? > For this scenario you are far better off with partitioning than > replication. > That is, if your data is partitionable. But geographically distributed > data often is. I agree, but in this case it wasn't. -- D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Wed, 2006-08-23 at 13:36, Markus Schiltknecht wrote: > Hannu Krosing wrote: > > But if you have very few writes, then there seems no reason to do sync > > anyway. > I think there is one: high availability. A standby server which can > continue if your primary fails. Of course sync is only needed if you > absolutely cannot afford losing any committed transaction. Yes. I guess that for that an enhanced WAL-shipping would be ok. Probably one where you require both the sync to local disk and the copy to the remote (without waiting for its sync) to have happened in order to COMMIT/ROLLBACK successfully. > >> Another important factor is the amount of conflicting transactions. > > That too, but just the need to do *any* locking on all nodes will > > significantly slow down sync replication. > If you implement sync replication with locking, yes. But there are > better ways: the Postgres-R approach does not do network locking, but > aborts conflicting transactions just before committing. That results in > much less network traffic (one GCS message per writing transaction). but it still needs to do at least one network roundtrip + any needed testing on all nodes + WAL sync on all nodes before it can COMMIT, no? And I'm afraid that GCS serialisation will need more than one roundtrip or risk being out of date. I'm not saying that Postgres-R (or any other sync replication) is not doable or even useful. I just can't see right away how it can scale very well for any significant write load. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
Hi, Hannu Krosing wrote: > but it still needs to do at least one network roundtrip + any needed > testing on all nodes + WAL sync on all nodes before it can COMMIT, no? No. It only needs the 'roundtrip' in the sense that a transaction sends out its writeset and has to wait for the GCS to have it serialized (i.e. the GCS sends the message back to the sender node). Then all nodes do the testing and WAL sync independently. (As Neil recently pointed out in [1], this opens a small risk of data loss in the case where all nodes crash.) > And I'm afraid that GCS serialisation will need more than one roundtrip > or risk being out of date. The Spread people did some tests on 20 Pentium machines connected via 100mbit ethernet. In [2] they state that it takes between 1.7 and 2.8 ms (depending on the message size) to 'serialize' a message within this group of 20 nodes. > I'm not saying that Postgres-R (or any other sync replication) is not > doable or even useful. I just can't see right away how it can scale > very well for any significant write load. Sure, sync replication won't solve everybody's problems. But out of all the sync replication algorithms, Postgres-R is my clear favorite. ;-) Regards Markus [1]: http://pgfoundry.org/pipermail/postgres-r-general/2006-August/000001.html [2]: The Spread Toolkit: Architecture and Performance, by Yair Amir, Claudiu Danilov, Michal Miskin-Amir, John Schultz, Jonathan Stanton http://www.cnds.jhu.edu/pub/papers/cnds-2004-1.pdf
On Wed, 2006-08-23 at 13:36 +0200, Markus Schiltknecht wrote:
> Hannu Krosing wrote:
> > But if you have very few writes, then there seems no reason to do sync
> > anyway.
>
> I think there is one: high availability. A standby server which can
> continue if your primary fails. Of course sync is only needed if you
> absolutely cannot afford losing any committed transaction.

I disagree about high availability. In fact, I would say that sync
replication is trading availability and performance for synchronization
(which is a valid tradeoff, but costly).

If you have an async system, all nodes must go down for the system to go
down. If you have a sync system, if any node goes down the system goes
down.

If you plan on doing failover, consider this: what if it's not obvious
which system is still up? What if the network route between the two
systems goes down (or just becomes too slow to replicate over), but
clients can still connect to both servers? Then you have two systems
that both think the other system went down, and both start accepting
transactions. Now you no longer have replication at all.

Regards,
Jeff Davis
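Jeff's split-brain scenario is easy to reproduce in a toy Python model (all names invented): with a naive "peer unreachable, so I take over" failover rule, both halves of a partition promote themselves and their data diverges.

    class Node:
        """Deliberately naive failover logic, for illustration only."""
        def __init__(self, name: str):
            self.name = name
            self.data = {}
            self.primary = False

        def peer_unreachable(self) -> None:
            # Naive rule: if I can't see my peer, assume it is dead
            # and start accepting writes myself.
            self.primary = True

        def write(self, key: str, value: int) -> None:
            if self.primary:
                self.data[key] = value

    a, b = Node("A"), Node("B")
    a.peer_unreachable()   # the link fails; each node concludes
    b.peer_unreachable()   # the other one is down
    a.write("x", 1)        # a client connected to A
    b.write("x", 2)        # a client connected to B
    print(a.data, b.data)  # {'x': 1} {'x': 2} -- the replicas diverged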
Hi,

Jeff Davis wrote:
> I disagree about high availability. In fact, I would say that sync
> replication is trading availability and performance for synchronization
> (which is a valid tradeoff, but costly).

In a way, replication is for database systems what RAID 1 is for hard
drives. Having multiple cluster nodes in sync minimizes the risk of a
complete outage due to hardware failure, thus maximizing availability.
Of course, as you say, traded for performance.

> If you have an async system, all nodes must go down for the system to
> go down.

Yes. But it takes only one node to go down to potentially lose committed
transactions. In contrast, in synchronous replication systems a
committed transaction is guaranteed to be 'committed on the cluster'. So
if at least one node of the cluster is up and running, you can be
assured to have consistent data.

Please note that the Postgres-R approach does relax some of these
constraints a little to gain performance. The most obvious result of
these relaxations is that the nodes may be 'behind' with replaying
transactions and show a past view of the data.

> If you have a sync system, if any node goes down the system goes down.

That's plain wrong.

> If you plan on doing failover, consider this: what if it's not obvious
> which system is still up? What if the network route between the two
> systems goes down (or just becomes too slow to replicate over), but
> clients can still connect to both servers? Then you have two systems
> that both think the other system went down, and both start accepting
> transactions. Now you no longer have replication at all.

This problem is often called 'network partitioning', which also refers
to a more general case: a group of M nodes being split into two groups
of N and (M-N) nodes (due to network failure or whatever).

In Postgres-R a Group Communication System is used to cover all these
aspects (error detection, agreement on a majority group, etc.).

Regards

Markus
pgsql@j-davis.com (Jeff Davis) writes:
> On Wed, 2006-08-23 at 13:36 +0200, Markus Schiltknecht wrote:
>> Hannu Krosing wrote:
>> > But if you have very few writes, then there seems no reason to do
>> > sync anyway.
>>
>> I think there is one: high availability. A standby server which can
>> continue if your primary fails. Of course sync is only needed if you
>> absolutely cannot afford losing any committed transaction.
>
> I disagree about high availability. In fact, I would say that sync
> replication is trading availability and performance for synchronization
> (which is a valid tradeoff, but costly).
>
> If you have an async system, all nodes must go down for the system to
> go down.
>
> If you have a sync system, if any node goes down the system goes down.
> If you plan on doing failover, consider this: what if it's not obvious
> which system is still up? What if the network route between the two
> systems goes down (or just becomes too slow to replicate over), but
> clients can still connect to both servers? Then you have two systems
> that both think the other system went down, and both start accepting
> transactions. Now you no longer have replication at all.

That is why, for multimaster, there's a need both for an automatic
policy and for some human intervention.

- You need an automatic determination of "quorum": to be safe, it is
  only permissible for a set of $m$ servers to believe themselves to be
  active if they number more than 1/2 of the total of expected servers.

  Thus, if there are 13 servers in the cluster, then "quorum" is 7
  servers. If a set of 6 servers gets cut off from the rest of the
  network, they don't number at least 7, and thus know that they can't
  represent a quorum.

- And if conditions change, a human may need to change the quorum
  number. If 4 new nodes get added, quorum moves up to 9. If 5 nodes
  get dropped, quorum moves down to 5.

Deciding when to throw a node out of the quorum because it is responding
too slowly is still not completely trivial, but having a quorum policy
does address your issue.

-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/linux.html
"Be humble. A lot happened before you were born."
- Life's Little Instruction Book
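The quorum rule above reduces to one line of arithmetic; the small Python check below reproduces the figures from the post (13 servers, then 13 + 4 = 17, then 13 - 5 = 8).

    def quorum(expected_servers: int) -> int:
        """Smallest group strictly larger than half the expected
        membership; two disjoint groups can never both reach it."""
        return expected_servers // 2 + 1

    for n in (13, 17, 8):
        print(f"{n} expected servers -> quorum of {quorum(n)}")
    # 13 -> 7, 17 -> 9, 8 -> 5, matching the numbers given above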
On Thu, 2006-08-24 at 11:18 +0200, Markus Schiltknecht wrote:
> Jeff Davis wrote:
> > I disagree about high availability. In fact, I would say that sync
> > replication is trading availability and performance for
> > synchronization (which is a valid tradeoff, but costly).
>
> In a way, replication is for database systems what RAID 1 is for hard
> drives. Having multiple cluster nodes in sync minimizes the risk of a
> complete outage due to hardware failure, thus maximizing availability.
> Of course, as you say, traded for performance.
>
> > If you have an async system, all nodes must go down for the system
> > to go down.
>
> Yes. But it takes only one node to go down to potentially lose
> committed transactions. In contrast, in synchronous replication systems
> a committed transaction is guaranteed to be 'committed on the cluster'.
> So if at least one node of the cluster is up and running, you can be
> assured to have consistent data.

Right, that's the cost of asynchronous replication.

> Please note that the Postgres-R approach does relax some of these
> constraints a little to gain performance. The most obvious result of
> these relaxations is that the nodes may be 'behind' with replaying
> transactions and show a past view of the data.
>
> > If you have a sync system, if any node goes down the system goes
> > down.
>
> That's plain wrong.

OK, maybe not one node, but I don't think I'm totally off base. See my
explanation below.

> > If you plan on doing failover, consider this: what if it's not
> > obvious which system is still up? What if the network route between
> > the two systems goes down (or just becomes too slow to replicate
> > over), but clients can still connect to both servers? Then you have
> > two systems that both think the other system went down, and both
> > start accepting transactions. Now you no longer have replication at
> > all.
>
> This problem is often called 'network partitioning', which also refers
> to a more general case: a group of M nodes being split into two groups
> of N and (M-N) nodes (due to network failure or whatever).
>
> In Postgres-R a Group Communication System is used to cover all these
> aspects (error detection, agreement on a majority group, etc.).

Which doesn't work very well in the case of two groups of servers set up
in two physical locations. I can see two possibilities:

(1) You require a quorum to be effective, in which case your cluster of
databases is only as reliable as the location which holds more servers.

(2) You have another central authority that determines which databases
are up and which are down. Then your cluster is only as reliable as that
central authority.

Sure, if you have a million groups of servers spread all over the
internet, it works with a very high degree of reliability because you
can likely always form a quorum. However, you then have horrible
performance because the updates need to be spread to so many locations.
And for truly synchronous replication you probably have to serialize the
updates, which is very costly over that many nodes all over a network.

Even if you have a large number of nodes at different locations, you end
up with strange decisions to make if the network connections are
intermittent or very slow. A temporary slowdown of many nodes could
cause them to be degraded until some kind of human intervention brought
them back. Until that time you might not be able to determine which
nodes make up an authoritative group. This kind of degradation could
happen in the case of a DDoS attack, or perhaps a worm moving around the
internet.

In practice everyone can find a solution that works for them. However,
synchronous replication is not perfect, and there are many failure
scenarios which need to be resolved in a way that fits your business. I
think synchronous replication is inherently less available than
asynchronous.

Regards,
Jeff Davis
Jeff Davis wrote:
> Which doesn't work very well in the case of two groups of servers set
> up in two physical locations. I can see two possibilities:
> (1) You require a quorum to be effective, in which case your cluster of
> databases is only as reliable as the location which holds more servers.
> (2) You have another central authority that determines which databases
> are up and which are down. Then your cluster is only as reliable as
> that central authority.

Right, the ideal here would be two sync clusters at both locations,
connected via async replication. :-)

> Even if you have a large number of nodes at different locations, you
> end up with strange decisions to make if the network connections are
> intermittent or very slow. A temporary slowdown of many nodes could
> cause them to be degraded until some kind of human intervention brought
> them back. Until that time you might not be able to determine which
> nodes make up an authoritative group.

Side note: in such a case, I think a GCS will just choose only one node
to be the 'authoritative group', because most systems cannot afford long
waits for such decisions. For database replication I also think it's
better to have at least one node running than none.

> This kind of degradation could happen in the case of a DDoS attack, or
> perhaps a worm moving around the internet.

Well, sync replication in general needs a good, low-latency and secure
interconnect. The internet does not seem to be a good fit here.

> In practice everyone can find a solution that works for them. However,
> synchronous replication is not perfect, and there are many failure
> scenarios which need to be resolved in a way that fits your business.
> I think synchronous replication is inherently less available than
> asynchronous.

This surely depends on the environment. With a dedicated (i.e.
low-latency and secure) interconnect, sync replication is surely more
available, because your arguments above don't apply, and because sync
replication guarantees you won't lose committed transactions.

If, however, you want or have to replicate over the internet, it
depends. Your arguments above also apply to async replication. Only
that, because of the conflict resolution, async replication systems can
continue to operate on all the disconnected nodes and merge their work
later on once the network is up again. But then again, async still has
the danger of losing transactions.

So I probably agree: if you are on an unreliable network and you have
conflict resolution correctly set up, then async replication is more
available, but less safe.

As I said above, sync replication needs a reliable interconnect, better
even two interconnects, because it's a SPOF for a clustered database
system.

Regards

Markus
On Fri, 2006-08-25 at 11:23 +0200, Markus Schiltknecht wrote:
> So I probably agree: if you are on an unreliable network and you have
> conflict resolution correctly set up, then async replication is more
> available, but less safe.
>
> As I said above, sync replication needs a reliable interconnect, better
> even two interconnects, because it's a SPOF for a clustered database
> system.

OK, I agree with your statements. Async is convenient in many ways, but
has less durable transactions (at least for transactions committed
recently). Sync has some limitations and is harder to get right (at
least if you want good availability as well), but provides more durable
transactions and consistency between systems.

Regards,
Jeff Davis