Thread: PostgreSQL crashes with Qmail-SQL
Hi guys,

Michael Devogelaere, the guy who writes the QMail-SQL package, has recently started moving away from using PostgreSQL to MySQL. In his experience, MySQL works better. Now, he's just tested them over the weekend, and what he reports is that PostgreSQL *crashes*. Doesn't just go slow, but *crashes*.

Would someone be able to take a look into this?

???

Regards and best wishes,

Justin Clift

-------- Original Message --------
Subject: Re: Qmail-SQL
Date: Wed, 23 Jan 2002 15:17:12 +0100
From: Michael Devogelaere <michael@digibel.be>
To: Justin Clift <justin@postgresql.org>
References: <3C48F4B9.AC03E251@postgresql.org> <20020119122728.A9661@digibel.be>

> > > > It makes me wonder if the poor performance of PostgreSQL is still
> > > > relevant, and I'm wondering if you tuned the memory size of the
> > > > PostgreSQL database when you tested it. The default memory allocation
> > > > gives really high CPU load and low performance, but this can be adjusted
> > > > much more easily now.
> I didn't tune anything, but i'll redo my tests this weekend and play a little
> with it ;)

Ok. I worked a bit on it this weekend and put the results on http://qmail-sql.digibel.be/testing.html. I'm very sorry, but postgresql was between 3 and 4 times slower than mysql and didn't survive all tests.

Kind regards,
Michael Devogelaere.

-----------------------------------------------------------------------
Some people have told me they don't think a fat penguin really embodies the grace of Linux, which just tells me they have never seen an angry penguin charging at them in excess of 100mph. They'd be a lot more careful about what they say if they had.
   -- Linus Torvalds
On Thu, 24 Jan 2002, Justin Clift wrote:

> Michael Devogelaere, the guy who writes the QMail-SQL package has
> recently started moving away from using PostgreSQL to MySQL.
>
> In his experience, MySQL works better. Now, he's just tested them over
> the weekend, and what he reports is that PostgreSQL *crashes*. Doesn't
> just go slow, but *crashes*.
>
> Would someone be able to take a look into this?

I went to the page mentioned below and wanted to try grabbing the test scripts to see if I could replicate the crash, but got 404's for the testscripts.tgz link on the page. The speed doesn't surprise me as much as the crash does.

> -------- Original Message --------
> Subject: Re: Qmail-SQL
> Date: Wed, 23 Jan 2002 15:17:12 +0100
> From: Michael Devogelaere <michael@digibel.be>
> To: Justin Clift <justin@postgresql.org>
> References: <3C48F4B9.AC03E251@postgresql.org>
> <20020119122728.A9661@digibel.be>
>
> Ok. I worked a bit on it this weekend and put the results on
> http://qmail-sql.digibel.be/testing.html. I'm very sorry but
> postgresql was between 3 and 4 times slower than mysql and didn't
> survive all tests.
>
> Kind regards,
> Michael Devogelaere.
Stephan Szabo wrote:
> On Thu, 24 Jan 2002, Justin Clift wrote:
>
> > Michael Devogelaere, the guy who writes the QMail-SQL package has
> > recently started moving away from using PostgreSQL to MySQL.
> >
> > In his experience, MySQL works better. Now, he's just tested them over
> > the weekend, and what he reports is that PostgreSQL *crashes*. Doesn't
> > just go slow, but *crashes*.
> >
> > Would someone be able to take a look into this?
>
> I went to the page mentioned below and wanted to try grabbing the test
> scripts to see if I could replicate the crash, but got 404's for the
> testscripts.tgz link on the page. The speed doesn't surprise me as much
> as the crash does.

As the doc says, it was all done totally untuned. And "CRASH" by itself doesn't say anything; a little more precision would be good.

Other than that, this is once again one of these mostly read-only scenarios with simple queries, where it is well known that a real database cannot compete.

Jan

> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com
Justin Clift wrote:
> Hi guys,
>
> Michael Devogelaere, the guy who writes the QMail-SQL package has
> recently started moving away from using PostgreSQL to MySQL.
>
> In his experience, MySQL works better. Now, he's just tested them over
> the weekend, and what he reports is that PostgreSQL *crashes*. Doesn't
> just go slow, but *crashes*.

I tried to grab his scripts, but both his "here" links return a 404 Not Found.

One interesting point ... he does:

* A failing query for the user.
* A query for the alias-user.
* A query for alias-username in the dotqmails-table.
* A query for alias-default in the dotqmails-table.

Four queries that most likely can be done with a single query ... gotta wonder about these MySQL-trained wunderkinds and their ability to write decent queries.

-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org
On Wed, Jan 23, 2002 at 04:24:17PM -0800, Stephan Szabo wrote:
> On Thu, 24 Jan 2002, Justin Clift wrote:
>
> > Michael Devogelaere, the guy who writes the QMail-SQL package has
> > recently started moving away from using PostgreSQL to MySQL.

That's not entirely true: i'm still using PostgreSQL and i don't even consider moving to MySQL, since i'm using a more complicated database which doesn't run on MySQL. But - sorry to say it - in my opinion, MySQL is a lot faster for simple queries if one needs to connect/disconnect frequently. I suspect that PostgreSQL connects quite slowly and therefore performs badly in this kind of test.

> > Would someone be able to take a look into this?
>
> I went to the page mentioned below and wanted to try grabbing the test
> scripts to see if I could replicate the crash, but got 404's for the
> testscripts.tgz link on the page. The speed doesn't surprise me as much
> as the crash does.

The link is fixed now.

Kind regards,
Michael Devogelaere.
> As the doc says, all done totally untuned. And CRASH by
> itself doesn't say anything. A little more precise would be
> good.

Ok: the client reported something like "Unexpected EOF from PostgreSQL-backend". When looking with ps aux, i noted that all postmaster-children were <defunct>. I couldn't connect anymore with psql (i aborted the test, and no other processes tried to access the database since my machine was in single-user mode). After killing the master process and restarting, the database worked fine.

> Other than that, once again one of these mostly read only
> scenarios with simple queries where it is well known that a
> real database cannot compete.

True: i planned two tests. One big read-only test, and then another which would add simulation of pop-logins; after a successful pop-login the field 'lastlogin' is updated. But i didn't run that test since postgresql already failed the simple read-only test.

Kind regards,
Michael Devogelaere.
Michael Devogelaere wrote:
> But - sorry to tell it - in my opinion, MySQL is a
> lot faster for simple queries if one needs to connect/disconnect frequently.
> I suspect that PostgreSQL connects quite slowly and therefore performs
> bad in this kind of tests.

Sure, this is known. Serious applications pool persistent connections, though, making it a non-issue for many of us.

-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org
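The pooling Don mentions can be sketched in a few lines. The sketch below is hypothetical and driver-agnostic (the `ConnectionPool` class and the `fake_connect` factory are invented for illustration); a real application would hand the pool its database driver's connect function instead, so each query reuses a persistent backend rather than paying PostgreSQL's per-connection fork cost.

```python
import queue

class ConnectionPool:
    """Minimal connection-pool sketch: hand out persistent connections
    instead of opening a fresh one for every query."""

    def __init__(self, connect, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())   # open all connections up front

    def acquire(self):
        return self._pool.get()         # blocks if every connection is in use

    def release(self, conn):
        self._pool.put(conn)

# Demo with a dummy "connection" object so the sketch is self-contained;
# a real pool would call the driver's connect() here.
opened = []
def fake_connect():
    conn = object()
    opened.append(conn)
    return conn

pool = ConnectionPool(fake_connect, size=1)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()   # reuses the same connection instead of opening another
print(len(opened))    # only 1 connection opened, however many queries run
print(c2 is c1)
```

The design point is simply that connection setup cost is paid once per pool slot, not once per query, which is why the connect-heavy benchmark style in this thread penalizes PostgreSQL so much.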
Michael Devogelaere wrote:
> > As the doc says, all done totally untuned. And CRASH by
> > itself doesn't say anything. A little more precise would be
> > good.
> Ok: the client reported something like:
> "Unexpected EOF from PostgreSQL-backend". When looking with ps aux, i noted
> that all postmaster-childs were <defunct>. I couldn't connect anymore with
> psql (i aborted the test and no other processes tried to access the database
> since my machine was in single user mode). After killing the master process and
> restarting, the database worked fine.

Looks like leftover, or not quickly enough reaped, old connections filling up all available backend slots (default max 32). Persistent connections are definitely something that PostgreSQL likes.

> > Other than that, once again one of these mostly read only
> > scenarios with simple queries where it is well known that a
> > real database cannot compete.
> True: i planned two tests. One big read-only test and then another which would
> add simulation of pop-logins. After a successful pop-login the field
> 'lastlogin' is updated. But i didn't run that test since postgresql already
> failed the simple read-only test.

As said, "simple read-only" is not really something you want a full featured RDBMS for. Maybe you are better off with a simple and stupid system on the feature level of gdbm or MySql.

Jan
Jan Wieck wrote:
> <snip>
>
> As said, "simple read-only" is not really something you want
> a full featured RDBMS for. Maybe you are better off with a
> simple and stupid system on the feature level of gdbm or
> MySql.
>
> Jan

I'd agree with this on the query-level functionality... but...

Michael, does Qmail-SQL *store* the email in the database? (haven't checked)

If so, there's no way I'd want new customer inquiries or other *important* email stored in a system which didn't know how to fully recover if the server crashes.

Imagine... 200,000 customer emails in a busy MySQL 3.23.x database, and the UPS power cuts off. Erk...

+ Justin

-- 
"My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there."
   - Indira Gandhi
On Fri, Jan 25, 2002 at 03:48:40AM +1100, Justin Clift wrote:
> Jan Wieck wrote:
>
> > <snip>
> >
> > As said, "simple read-only" is not really something you want
> > a full featured RDBMS for. Maybe you are better off with a
> > simple and stupid system on the feature level of gdbm or
> > MySql.
> >
> > Jan
>
> I'd agree with this on the query-level functionality... but...
>
> Michael, does Qmail-SQL *store* the email in the database? (haven't
> checked)

No way ;) Only the user/authentication-management moved to the database.

> If so, there's no way I'd want new customer inquiries or other
> *important* email stored in a system which didn't know how to fully
> recover if the server crashes.

Neither do i. But i know there exists a patch (qmail seems to consist merely of patches) which stores the mail in a database. Maybe you can use this to "save" helpdesk-calls ;)

Regards,
Michael.
After reading about the patch, it seems that the database is only used for virtualhost lookups and password/account verification -- it never mentioned doing any more than that... I've only read over the docs once, though, so no one take my word as law yet.

I'm going to install this later and play with it, as I've been looking for a solution like this for a while (though I'm a postfix user, I'd gladly switch if this patch works!). I'll see if I can get PG to crash with it and investigate further.

Just in theory, I don't even trust MySQL to store my usernames and passwords; I've seen it take a dive too many times to use it for much of anything... They've released several versions since I last used it, but it was a lot less stable for me than older 6.X versions of PG when the load got a little high...

If the patch just does a few simple queries, I'd think something along the lines of mSQL might be nice (though I've never used it, I've heard some nice things about it for tiny databases). PG's feature set is grossly underused for applications like this...

If I do use it, I'll probably install another copy of PG and turn down the sort mem and such to get a little better scalability -- spawning a new PG process every time someone checks their mail is going to cost me dearly with the way my PG is set up now.

Well, we'll see how it goes.

-Mitch

> > As said, "simple read-only" is not really something you want
> > a full featured RDBMS for. Maybe you are better off with a
> > simple and stupid system on the feature level of gdbm or
> > MySql.
> >
> > Jan
>
> I'd agree with this on the query-level functionality... but...
>
> Michael, does Qmail-SQL *store* the email in the database? (haven't
> checked)
>
> If so, there's no way I'd want new customer inquiries or other
> *important* email stored in a system which didn't know how to fully
> recover if the server crashes.
>
> Imagine... 200,000 customer emails in a busy MySQL 3.23.x database, and
> the UPS power cuts off.
Michael Devogelaere <michael@digibel.be> writes:
> Ok: the client reported something like:
> "Unexpected EOF from PostgreSQL-backend".

What showed up in the postmaster log when this happened? I would like *exact* error message texts, not approximations.

> When looking with ps aux, i noted
> that all postmaster-childs where <defunct>. I couldn't connect anymore with
> psql

What happened when you tried to connect with psql? Again, exact, not approximate.

It sounds like the postmaster got into a state where it was not responding to SIGCHLD signals. We fixed one possible cause of that between 7.1 and 7.2, but without a more concrete report I have no way to know if you saw the same problem or a different one. I'd have expected connection attempts to unwedge the postmaster in any case.

			regards, tom lane
On Thu, Jan 24, 2002 at 01:11:39PM -0500, Tom Lane wrote:
> Michael Devogelaere <michael@digibel.be> writes:
> > Ok: the client reported something like:
> > "Unexpected EOF from PostgreSQL-backend".
>
> What showed up in the postmaster log when this happened? I would like
> *exact* error message texts, not approximations.

Nothing. I disabled all logging since the database responded too slowly with logging turned on. So i cannot help you on this.

> > When looking with ps aux, i noted
> > that all postmaster-childs where <defunct>. I couldn't connect anymore with
> > psql
>
> What happened when you tried to connect with psql? Again, exact, not
> approximate.

psql: connectDBStart() -- connect() failed: No such file or directory
Is the postmaster running locally
and accepting connections on Unix socket ...

Kind regards,
Michael Devogelaere.
Michael Devogelaere <michael@digibel.be> writes:
> On Thu, Jan 24, 2002 at 01:11:39PM -0500, Tom Lane wrote:
>> What showed up in the postmaster log when this happened? I would like
>> *exact* error message texts, not approximations.

> Nothing. I disabled all logging since the database responded too slowly
> with logging turned on. So i cannot help you on this.

If you're not going to be cooperative, then I don't see how you expect us to fix the problem.

FWIW, I don't believe for a moment that /dev/null'ing the postmaster log improves performance measurably. I've done plenty of profiling in my time, and never seen any indication that it's an issue; at least not at the default verbosity level.

>> What happened when you tried to connect with psql? Again, exact, not
>> approximate.

> psql: connectDBStart() -- connect() failed: No such file or directory
> Is the postmaster running locally
> and accepting connections on Unix socket ...

No such file?? Hard to believe that that could happen while the postmaster was still running. Unless something else had decided to delete the socket file from /tmp. The postmaster certainly would not do it.

			regards, tom lane
Tom Lane wrote:
> Michael Devogelaere <michael@digibel.be> writes:
> > psql: connectDBStart() -- connect() failed: No such file or directory
> > Is the postmaster running locally
> > and accepting connections on Unix socket ...
>
> No such file?? Hard to believe that that could happen while the
> postmaster was still running. Unless something else had decided to
> delete the socket file from /tmp. The postmaster certainly would not
> do it.

Haven't there been some over-enthusiastic cleanup scripts in some Linux distros that removed the socket from /tmp because of its age?

Anyway, in summary:

1. The test case was the *well known* MySQL favorite suite: simple one-table read-only access with myriads of connects.

2. The *well known* fact that PostgreSQL out of the box is not configured for production was ignored.

3. Any possibility to track down the reasons for the problems was disabled.

4. Instead of investigating what the problem is, PostgreSQL was reported to *Crash*.

It cannot get any more obvious.

Jan
Hi Tom,

Tom Lane wrote:
> <snip>
> If you're not going to be cooperative, then I don't see how you expect
> us to fix the problem.

Hey, let's not start a war here! Michael's been using PostgreSQL for a while, but hasn't had a chance to really get into it. When he makes a mistake like this, it's not because of evil intentions!

<snip>
> No such file?? Hard to believe that that could happen while the
> postmaster was still running. Unless something else had decided to
> delete the socket file from /tmp. The postmaster certainly would not
> do it.

This provides an interesting lead. There's at least one Linux distribution which does this. Part of the cron'd maintenance scripts deletes all the files in /tmp, and therefore plays havoc.... But I don't remember which distribution, although I don't think it was RedHat.

???

Regards and best wishes,

Justin Clift

> 			regards, tom lane
Justin Clift <justin@postgresql.org> writes:
>> No such file?? Hard to believe that that could happen while the
>> postmaster was still running. Unless something else had decided to
>> delete the socket file from /tmp. The postmaster certainly would not
>> do it.

> This provides an interesting lead. There's at least one linux
> distribution which does this. Part of the cron'd maintenance scripts
> delete all the files in /tmp, and therefore play havoc....

Yeah, I do recall that some versions had a tmp-scrubber that didn't make any exception for socket files. But it's kind of a big coincidence to assume that would happen just while Michael was running his benchmark. Not sure I credit it.

			regards, tom lane
I said:
> Yeah, I do recall that some versions had a tmp-scrubber that didn't make
> any exception for socket files. But it's kind of a big coincidence to
> assume that would happen just while Michael was running his benchmark.

... or maybe not. I just looked back at Michael's benchmark page and observed that the extrapolated time to complete the run in question was over 24 hours (and the first two parts of the script would've taken more than 12). If he'd left the machine alone for a couple of days while the script ran, maybe it's credible that a /tmp-scrubber did its thing meanwhile.

That still leaves us with all the defunct postmaster children to explain, though. Hmm. I wonder exactly what the postmaster does when someone forcibly removes its socket file... probably system-dependent, but I could certainly believe getting into a busy-wait loop of select/accept. That doesn't look like it should prevent SIGCHLD from getting noticed, though.

			regards, tom lane
Tom Lane wrote:
> Justin Clift <justin@postgresql.org> writes:
> >> No such file?? Hard to believe that that could happen while the
> >> postmaster was still running. Unless something else had decided to
> >> delete the socket file from /tmp. The postmaster certainly would not
> >> do it.
>
> > This provides an interesting lead. There's at least one linux
> > distribution which does this. Part of the cron'd maintenance scripts
> > delete all the files in /tmp, and therefore play havoc....
>
> Yeah, I do recall that some versions had a tmp-scrubber that didn't make
> any exception for socket files. But it's kind of a big coincidence to
> assume that would happen just while Michael was running his benchmark.
> Not sure I credit it.

We added some PostgreSQL code to touch the socket file during checkpoints, and I thought that was in 7.1.

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 853-3000
If your life is a hard drive,        |  830 Blythe Avenue
Christ can be your backup.           |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> We added some PostgreSQL code to touch the socket file during
> checkpoints, and I thought that was in 7.1.

You're thinking about the socket lock file, which is a plain file.

The problem with socket files is that the file mod time usually doesn't change even when it's in active use. That's why things like /tmp-scrubbers need to make an exception for socket files.

			regards, tom lane
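The exception Tom describes is straightforward to implement, since socket files announce themselves in their stat mode bits. A minimal sketch (the `is_socket` helper is invented for illustration) of the check a /tmp scrubber would need before unlinking anything:

```python
import os
import socket
import stat
import tempfile

tmpdir = tempfile.mkdtemp()

# A regular file: the kind of thing a tmp-cleaner may age out.
regular = os.path.join(tmpdir, "scratch.txt")
open(regular, "w").close()

# A Unix-domain socket file, like PostgreSQL's /tmp/.s.PGSQL.5432:
# binding an AF_UNIX socket to a path creates the file.
sock_path = os.path.join(tmpdir, ".s.PGSQL.5432")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)

def is_socket(path):
    """A /tmp scrubber should test this before unlinking, because a
    socket's mtime doesn't change even while it is in active use."""
    return stat.S_ISSOCK(os.lstat(path).st_mode)

print(is_socket(regular))    # regular file: fair game for cleanup
print(is_socket(sock_path))  # socket file: must be left alone
server.close()
```

The same `S_ISSOCK` test is what a shell-level cleaner expresses as `find /tmp ! -type s ...`; the buggy scrubbers discussed here simply omitted it.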
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > We added some PostgreSQL code to touch the socket file during
> > checkpoints, and I thought that was in 7.1.
>
> You're thinking about the socket lock file, which is a plain file.
>
> The problem with socket files is that the file mod time usually doesn't
> change even when it's in active use. That's why things like
> /tmp-scrubbers need to make an exception for socket files.

Hard to imagine how X11 runs in such a case. Does its socket not go into /tmp/.X11-unix?
Justin Clift writes:
> This provides an interesting lead. There's at least one linux
> distribution which does this. Part of the cron'd maintenance scripts
> delete all the files in /tmp, and therefore play havoc....

Deleting a socket file doesn't create havoc, it simply means you can't connect anymore.

-- 
Peter Eisentraut   peter_e@gmx.net
> > > psql: connectDBStart() -- connect() failed: No such file or directory
> > > Is the postmaster running locally
> > > and accepting connections on Unix socket ...
> >
> > No such file?? Hard to believe that that could happen while the
> > postmaster was still running. Unless something else had decided to
> > delete the socket file from /tmp. The postmaster certainly would not
> > do it.
>
> Haven't there been some over enthusiastic cleanup scripts in
> some Linux distro's, that removed the socket from /tmp
> because of it's age?

The system was running in single-user mode when running the tests (i think i already mentioned that). This was done to prevent the results from being influenced by daily/hourly cronjobs such as logrotate. The symptom you're probably referring to is the daily run of 'tmpwatch':
- since cron was disabled, tmpwatch didn't run
- tmpwatch cleans empty directories and regular files. Sockets are left unchanged.

> Anyway, so in summary:
>
> 1. The test case was the *well known* MySQL favorite suite;
>    Simple one-table read-only access with myriads of
>    connect's.

That's correct: qmail-getpw is called myriads of times: each time a mail needs to be delivered locally. I don't care whether this favours one database or not: it's what happens on a busy mailserver.

> 2. The *well known* fact that PostgreSQL out of the box is
>    not configured for production was ignored.

From http://qmail-sql.digibel.be/testing.html: "The first testround uses the database-configurations as shipped by Red Hat: postgresql 7.1.3 and mysql 3.23.41. No additional performance tuning was performed." IMHO i didn't ignore the lack of tuning. And i DO want to tune it, but it seemed fair to me to start the test with the default configuration as shipped by RedHat and then investigate the results of different tunings. But the next tests were skipped since postgresql crashed.

> 3. Any possibility to track down the reasons for problems
>    was disabled.

My first idea was to log connects, queries and disconnects: i imagine most database-admins turn logging on in their databases. With mysql it logs these things when you turn on 'log=/var/log/..'. I configured postgresql to log the same and ran one 'qmail-getpw' testrun:
- with logging: 112 seconds.
- without logging: 32 seconds.
I suspect this has to do with the fact that i configured postgresql to log to syslog. Don't blame me for this: it's the way RedHat ships postgresql. To make things fair i disabled logging on both databases (mysql performance is not affected by logging, but it doesn't use syslog). This also excluded vital debug-messages. Frankly, i didn't expect postgresql to crash, but i'll turn them on next time.

> 4. Instead of investigating what the problem is, PostgreSQL
>    was reported to *Crash*.

Yes: it *crashed*. Since i disabled all debugging i cannot help you with investigating this problem. I hope i won't get the death penalty for this ;)

> It cannot get any more obvious.

Please elaborate.

Michael.
I said:
> That still leaves us with all the defunct postmaster children to explain
> though. Hmm. I wonder exactly what the postmaster does when someone
> forcibly removes its socket file... probably system-dependent, but I
> could certainly believe getting into a busy-wait loop of select/accept.
> That doesn't look like it should prevent SIGCHLD from getting noticed,
> though.

On Linux (at least RH 7.2), the answer to what happens when the socket file is removed is: nothing. Clients can't connect anymore, but the postmaster gets no error indicating that anything is wrong. So it sits.

And that means that the 7.1-to-7.2 change I mentioned before is relevant. In 7.1, the SIGCHLD signal handler blocked signals at its beginning, and didn't think to unblock them on exit. So after servicing one SIGCHLD interrupt, the postmaster would end up sitting at its select() with signals blocked. Further SIGCHLDs would not get serviced until the next spin around the outer loop re-enabled interrupts. Normally, no big deal, but with no new connection requests coming in, the postmaster wouldn't ever get around to wait()ing for its last few children.

(7.2 re-enables signals at exit from the handler, so I don't think it will show this problem; and indeed I don't see any zombies after "rm /tmp/.s.PGSQL.5432" during a run of Michael's benchmark script with 7.2. Not incidentally, I do observe a complete lack of any complaints out of the benchmark script; it keeps flailing along without any sign that all its database connection attempts are failing.)

In short: all the reported facts can be explained by the theory that *something* removed the socket file during that long test run.

			regards, tom lane
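Tom's analysis turns on how a SIGCHLD handler must behave: several child exits can coalesce into one signal delivery, so the handler has to reap in a WNOHANG loop, and signal delivery must be re-enabled afterwards or later children are never waited for and stay <defunct>. A minimal sketch of that reaping loop (not PostgreSQL's actual code; the demo polls instead of installing a real signal handler, purely to stay deterministic):

```python
import os
import time

def reap_children():
    """The body a SIGCHLD handler needs: reap in a loop with WNOHANG,
    since several child exits may coalesce into one signal delivery."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break              # no children left at all
        if pid == 0:
            break              # children remain, but none has exited yet
        reaped.append(pid)
    return reaped

# Fork three children that exit immediately; until reaped, each shows
# up in ps as <defunct>.
kids = []
for _ in range(3):
    pid = os.fork()
    if pid == 0:
        os._exit(0)            # child: exit at once
    kids.append(pid)

# A real server would call reap_children() from its SIGCHLD handler;
# here we poll with a deadline so the demo doesn't depend on timing.
reaped = []
deadline = time.time() + 5
while len(reaped) < len(kids) and time.time() < deadline:
    reaped += reap_children()
    time.sleep(0.01)

print(sorted(reaped) == sorted(kids))
```

The 7.1 failure mode Tom describes corresponds to running this loop once and then leaving the signal blocked: any child that exits afterwards is never passed to waitpid(), exactly the pile of defunct postmaster children Michael observed.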
Peter Eisentraut wrote:
> Justin Clift writes:
>
> > This provides an interesting lead. There's at least one linux
> > distribution which does this. Part of the cron'd maintenance scripts
> > delete all the files in /tmp, and therefore play havoc....
>
> Deleting a socket file doesn't create havoc, it simply means you can't
> connect anymore.

Sorry Peter, I wasn't being entirely accurate; I was more describing that Something Not Wanted Happens. I'll try to be more accurate next time. :)

+ Justin
Michael Devogelaere wrote:
> > 4. Instead of investigating what the problem is, PostgreSQL
> >    was reported to *Crash*.
> Yes: it *crashed*. Since i disabled all debugging i cannot help you
> with investigating this problem. I hope i won't get the death penalty
> for this ;)
>
> > It cannot get any more obvious.
> Please elaborate.

I hope you don't take any of my comments personally. Because they are not! It is just that I am tired and bored of these every-so-often repeated MySQL-optimized "benchmarks".

I see a clear difference between a database server process crash and a disabled service caused by misbehaving sysadmin scripts and/or bad service because of contra-optimized client behaviour.

This is exactly the same style of reporting crashes or bad performance that the MySQL folks have practiced for years. I remember creating and dropping tables a couple thousand times, then VACUUM with a user that doesn't have permissions to vacuum system catalogs, and reporting bad performance because the system cache got successfully screwed up ... there was even a comment in the script saying "this makes Postgres slow" ... haha. Other reported *crashes* have been core dumps of the test scripts, because Postgres dealt with datums bigger than the perl client was able to swallow ... well, the test driver just reported a crash, not exactly where and why. Does that really matter? MySQL shows success and Postgres does not, that's what counts.

The lowest-level still-accepted Transaction Processing Council benchmark, TPC-C, can be implemented with a SUT using MySQL. Do it using LAMP, if you want to learn what a database crash is ;-)

There is a good reason why TPC has abandoned the TPC-1, TPC-A and TPC-B benchmarks. They are "too simple" to be of any meaning for benchmarking purposes these days. Yet all the stuff this huge crowd of MySQL lemmings is constantly babbling about is even more simple than that!

They all have their reasons, the TPC members (basically all serious RDBMS vendors) on one side as well as the MySQL folks on the other. As a matter of fact, the MySQL folks are alone with their point of view that "being fastest on a single-table select" is the most important criterion for a relational database management system. And as another matter of fact, lemmings get what lemmings deserve: MySQL!

Jan
Hello again,

I've rerun the tests and got:

NOTICE: Message from PostgreSQL backend:
  The Postmaster has informed me that some other backend died abnormally
  and possibly corrupted shared memory. I have rolled back the current
  transaction and am going to terminate your database system connection
  and exit. Please reconnect to the database system and repeat your query.

Nothing showed up in the logs, however. This time i aborted the test after the first error messages and could reconnect to the database! I don't know if this has to do with the number of <defunct> postmasters: only 2 showed up now, and not a few screens' worth as in the first crash. When typing 'ps aux' a few minutes later they were all gone!

Kind regards,
Michael Devogelaere.
On Fri, Jan 25, 2002 at 10:20:01AM +0100, Michael Devogelaere wrote:
> I've rerun the tests and got:
> NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend died abnormally
> and possibly corrupted shared memory.
> I have rolled back the current transaction and am going to terminate
> your database system connection and exit.
> Please reconnect to the database system and repeat your query.
>
> Nothing showed up in the logs however. This time i aborted the test after the
> first error messages and could reconnect to the database! I don't know if this
> has to do with the number of <defunct> postmasters: there only showed up 2 now
> and not a few screens as in the first crash. When typing 'ps aux' a few minutes
> later they were all gone!

I know nothing about the reason for your crash, whether the message given above suffices to determine it, or how to configure the database to get more reasonable log messages (actually, about the last point I know something, but you can read it at

* http://developer.postgresql.org/docs/postgres/install-procedure.html
* http://developer.postgresql.org/docs/postgres/runtime-config.html#LOGGING

so I won't repeat that stuff here). But I do know that connection pooling was invented for scenarios like yours. Connection pooling avoids the creation of a new backend process for each single query. Did you ever try your test with connection pooling? Real databases *require* connection pooling in such a case; MySQL or file systems don't. (Actually, when considering MySQL, why not use ReiserFS? It's also kind of a database, with hashes and all this stuff ;-)

I know that another mail in this thread already mentioned connection pooling, but I don't remember any answer from your side. Maybe it's worth an attempt.

--
Holger Krug
hkrug@rationalizer.com
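The pooling idea is simple enough to sketch in a few lines. The pool class and the stub connection factory below are hypothetical illustrations, not qmail-sql code or any particular driver's API; with a real driver the factory would be the driver's connect call:

```python
import queue

class SimplePool:
    """A tiny fixed-size connection pool: connections are created once
    and handed out repeatedly, instead of forking a new backend per query."""

    def __init__(self, factory, size=4):
        self._conns = queue.Queue()
        for _ in range(size):
            self._conns.put(factory())

    def acquire(self):
        return self._conns.get()      # blocks if the pool is exhausted

    def release(self, conn):
        self._conns.put(conn)

# Stub "connection" so the sketch runs without a database.
created = 0
def fake_connect():
    global created
    created += 1
    return object()

pool = SimplePool(fake_connect, size=2)
for _ in range(100):                  # 100 "queries"...
    c = pool.acquire()
    pool.release(c)
print(created)                        # ...but only 2 connections ever made
```

The point is the ratio: one hundred queries cost two backend startups instead of one hundred, which is exactly the overhead the per-query-fork benchmark was measuring.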
At 11:26 AM 1/25/02 +0100, Holger Krug wrote:
>so I won't repeat that stuff here), but I know that for scenarios like
>yours connection pooling was invented. Connection pooling avoids the
>creation of a new backend process for each single query. Did you ever
>try your test with connection pooling ? Real databases *require*
>connection pooling in such a case, MySQL or file systems

If the database crashes are not due to resource limits, connection pooling does not seem to be the real solution. If postgresql crashes after X concurrent backends are respawned Y times, I figure something is wrong and should be fixed.

Regards,
Link.
On Fri, Jan 25, 2002 at 08:46:24PM +0800, Lincoln Yeoh wrote: > If the database crashes are not due to resource limits, connection pooling > does not seem to be the real solution. That's clear. But the other part of his problem (`bad performance') suffers from his setup (no connection pooling). -- Holger Krug hkrug@rationalizer.com
Lincoln Yeoh wrote:
> At 11:26 AM 1/25/02 +0100, Holger Krug wrote:
> >so I won't repeat that stuff here), but I know that for scenarios like
> >yours connection pooling was invented. Connection pooling avoids the
> >creation of a new backend process for each single query. Did you ever
> >try your test with connection pooling ? Real databases *require*
> >connection pooling in such a case, MySQL or file systems
>
> If the database crashes are not due to resource limits, connection pooling
> does not seem to be the real solution.

The crash he reported this time looks like a backend dumping core. I wonder how he killed the postmaster the last time, and whether by doing it with -9 he corrupted the database?

The entire discussion is somehow pointless. Tell some riksha puller to compare his riksha with this brand new Ferrari, and await his comments after the test drive. He probably won't get the damn thing moving, and if he does, it'd be a hell of a ride, so he will tell you that his riksha has much better handling and the Ferrari *crashed*.

Jan
> > The crash he reported this time looks like a backend dumping
> > core. I wonder how he killed the postmaster the last time
> > and if he by doing it with -9 corrupted the database?

He killed the database with -9, rebooted the machine, removed /var/lib/pgsql and created a new database from scratch. Then he populated the database again and ran the test again.

> > The entire discussion is somehow pointless. Tell some Riksha-
> > puller to compare his Riksha with this brand new Ferrari, and
> > wait his comments after the test drive. He'll probably won't
> > get the damn thing moving, and if, it'd be a hell of a ride,
> > so he will tell you that his Riksha has a much better
> > handling and the Ferrari *crashed*.

He doesn't have a riksha or a Ferrari. But he would hate it if the doors fell off his Ferrari after opening and closing them a few times ;)

Michael.
Michael Devogelaere wrote:
> > > The crash he reported this time looks like a backend dumping
> > > core. I wonder how he killed the postmaster the last time
> > > and if he by doing it with -9 corrupted the database?
> He killed the database with -9, rebooted the machine, removed /var/lib/pgsql
> and created a new database from scratch. Then he populated the database again
> and ran the test again.

So you did a new initdb? And then you got a core dump. Ewe, that doesn't seem right. Is the test case still on your web page? I think I should download it and take a closer look.

> He doesn't have a Riksha or a Ferrari. But he would hate it if the doors
> fell off his Ferrari after opening and closing them a few times ;)

That's the problem with Ferrari. What makes them expensive is that you need 3 cars: the Ferrari, a useful reliable one, and the car for the mechanic.

Jan
On Fri, 25 Jan 2002, Michael Devogelaere wrote:
> Hello again,
>
> I've rerun the tests and got:
> NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend died abnormally
> and possibly corrupted shared memory.
> I have rolled back the current transaction and am going to terminate
> your database system connection and exit.
> Please reconnect to the database system and repeat your query.
>
> Nothing showed up in the logs however. This time i aborted the test after the
> first error messages and could reconnect to the database! I don't know if this
> has to do with the number of <defunct> postmasters: there only showed up 2 now
> and not a few screens as in the first crash. When typing 'ps aux' a few minutes
> later they were all gone!

Hmm, did you get any core files in the database directory? And, since I don't know what the test does and I'm not sure if locale is enabled in the default Red Hat package: as a low-probability thing, do you have a version of glibc >= 2.2.3?

As a side question, how far do you need to install qmail to use the test? I'd gotten the test files to run against a 7.1 server I set up, but didn't feel comfortable doing a full install of qmail if I could avoid it.
Michael Devogelaere <michael@digibel.be> writes:
>> The crash he reported this time looks like a backend dumping
>> core. I wonder how he killed the postmaster the last time
>> and if he by doing it with -9 corrupted the database?
> He killed the database with -9, rebooted the machine, removed /var/lib/pgsql
> and created a new database from scratch. Then he populated the database again
> and ran the test again.

In any case, the original report mentioned something that looked suspiciously like a backend crash:

: Ok: the client reported something like: "Unexpected EOF from
: PostgreSQL-backend". When looking with ps aux, i noted that all
: postmaster childs were <defunct>. I couldn't connect anymore with psql
: (i aborted the test and no other processes tried to access the database
: since my machine was in single user mode). After killing the master
: process and restarting, the database worked fine.

The disappearing-socket-file theory explains most of this, but (AFAICS) not the "unexpected EOF" message. That was why I was pressing Michael for more details to begin with.

I've been running Michael's test script on an RH 7.2 box since yesterday afternoon, but have yet to reproduce any failure (other than the expected symptoms from manually removing the socket file).

regards, tom lane
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> As a side question, how far do you need to install qmail to use the test?
> I'd gotten the test files to run against a 7.1 server I set up, but didn't
> feel comfortable doing a full install of qmail if possible.

I didn't want to do that either. I managed to get things running by unpacking the rpm, unpacking the qmail-1.03.tar.gz original sources, applying the qmail-sql-0.19.patch (and not any of the other ones, which might've been a mistake), then setting up the three needed config files by hand:

$ cat sqltype.h
/* Uncomment to choose postgresql */
#define SQLTYPE PGSQLTYPE
/* Uncomment to choose mysql */
// #define SQLTYPE MYSQLTYPE
$ cat sql.headers
-I/usr/include/pgsql
$ cat sql.lib
-lpq
$

and then I was able to do "make qmail-getpw", which is the only executable you need for the benchmark. Put that somewhere, copy the sqlserver.sample file to where qmail-getpw will look for it [1], edit to taste, and you're set.

Caution: I wasted some time running "benchmarks" that proved only to be exercising how fast the client could fail. qmail-getpw's approach to error handling seems to be (a) don't bother testing for very many error conditions (eg, it coredumps on an empty sqlserver control file), and (b) if it does detect a failure, exiting with a nonzero error code is a sufficient way of reporting it. Error messages are for wimps, apparently. So don't bother running the querydb script until you've made *sure* qmail-getpw is working. After running the initdb script (watch out for name conflicts with ours!) you should get

$ ./qmail-getpw alias0 domain0
alias00existance_irrelevant_for_testing-alias00trash@devnull.biz$

if all is well. Lack of output means you have a problem.

regards, tom lane

[1] I didn't bother making the expected /var/qmail/control/sqlserver, but just put the sqlserver file in subdirectory control of where I was running the tests.
Works fine since qmail-getpw doesn't check whether its chdir("/var/qmail") succeeds. [2] No, I don't think I'll be trusting my email to this thing real soon. But I'd like to know whether it really is provoking a PG crash.
On Fri, 25 Jan 2002, Tom Lane wrote:
> [2] No, I don't think I'll be trusting my email to this thing real soon.
> But I'd like to know whether it really is provoking a PG crash.

One thing I've missed in this thread is where qmail-sql is. Since I run nothing but qmail, it's already installed on all of my machines, and I just happen to have a 7.2b5 system pretty much sitting idle.

Vince.

--
==========================================================================
Vince Vielhaber -- KA8CSH   email: vev@michvhf.com   http://www.pop4.net
56K Nationwide Dialup from $16.00/mo at Pop4 Networking
Online Campground Directory    http://www.camping-usa.com
Online Giftshop Superstore     http://www.cloudninegifts.com
==========================================================================
On Fri, Jan 25, 2002 at 12:30:17PM -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I didn't want to do that either. I managed to get things running by
> unpacking the rpm, unpacking the qmail-1.03.tar.gz original sources,
> applying the qmail-sql-0.19.patch (and not any of the other ones,
> which might've been a mistake), then setting up the three needed config
> files by hand:

I run qmail without any of the patches and it works just fine. There could be something about the sql patch that requires some other patch, but hopefully that would have been mentioned in the patch.

> Caution: I wasted some time running "benchmarks" that proved only
> to be exercising how fast the client could fail. qmail-getpw's
> approach to error handling seems to be (a) don't bother testing for
> very many error conditions (eg, it coredumps on an empty sqlserver
> control file), and (b) if it does detect a failure, exiting with a

This is probably something caused by the patch. DJB's stuff doesn't core dump.

> nonzero error code is a sufficient way of reporting it. Error messages
> are for wimps, apparently. So don't bother running the querydb script
> until you've made *sure* qmail-getpw is working. After running the
> initdb script (watch out for name conflicts with ours!) you should get

This (modifying qmail-getpw) seems like a poor way to do things. You could probably do the same thing using a script that queries the database and writes output that can be processed by unpatched qmail-pw2u (if you want some stuff to come from /etc/passwd) and qmail-newu. I use these (but not the sql stuff) since only a couple of accounts should get email, and I have a couple of lists set up as subusers of my normal account to simplify maintenance (i.e. the list files are owned by me instead of their own accounts).

The idea of using a database to help handle local delivery directly is also bad: cdb is much faster for typical kinds of usage (users change MUCH less often than email arrives).
It would make a lot more sense to keep the authoritative information in a database and resync the cdb information after a change.
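That resync strategy can be sketched generically. Everything below is illustrative: a dict stands in for the authoritative database and another dict for the derived cdb-style lookup table; in a real qmail setup the rebuild step would instead dump the database and pipe it through qmail-pw2u and qmail-newu (or a cdb builder):

```python
# Authoritative account data lives in "the database" (a dict stands in
# here); the fast lookup table (cdb in real qmail) is derived from it
# and rebuilt only when accounts change -- not on every delivery.
authoritative = {"alias0@domain0": "/home/alias0/Maildir/"}

def rebuild_lookup(db):
    # Snapshot the authoritative data into a derived read-only table.
    # This is the expensive step, but it runs once per account change.
    return dict(db)

lookup = rebuild_lookup(authoritative)

# Mail delivery reads only the derived table: a local O(1) lookup,
# no database round trip and no backend process per message.
print(lookup["alias0@domain0"])

# An account change touches the database, then triggers one rebuild.
authoritative["alias1@domain0"] = "/home/alias1/Maildir/"
lookup = rebuild_lookup(authoritative)
print(len(lookup))
```

The trade-off is freshness for speed: deliveries between the change and the rebuild see the old table, which is usually acceptable for account data that changes far less often than mail arrives.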
> Caution: I wasted some time running "benchmarks" that proved only
> to be exercising how fast the client could fail. qmail-getpw's
> approach to error handling seems to be (a) don't bother testing for
> very many error conditions (eg, it coredumps on an empty sqlserver
> control file), and (b) if it does detect a failure, exiting with a
> nonzero error code is a sufficient way of reporting it. Error messages
> are for wimps, apparently.

(b) is part of the qmail strategy: qmail is implemented as a set of independent processes with different owners and rights, and they communicate problems through standard exit codes.

We can agree that it should be more forthcoming with meaningful help for people setting up the system, but it can't just write a message to STDOUT, because its caller has probably already set up a pipe to another process; any error message would normally find itself inserted into the mail queue!

(a), in contrast, is definitely not normal for qmail and is the type of thing that should be fixed... even if all it does after detecting a problem is return with a nonzero error code. :-)

Bear
Bear Giles <bear@coyotesong.com> writes: > > Caution: I wasted some time running "benchmarks" that proved only > > to be exercising how fast the client could fail. qmail-getpw's > > approach to error handling seems to be (a) don't bother testing for > > very many error conditions (eg, it coredumps on an empty sqlserver > > control file), and (b) if it does detect a failure, exiting with a > > nonzero error code is a sufficient way of reporting it. Error messages > > are for wimps, apparently. > > (b) is part of the qmail strategy - qmail is implemented as a set > of independent processes with different owners and rights and they > communicate problems through standard exit codes. > > We can agree that it should be more forthcoming with meaningful > help for people setting up the system, but it can't just write an > message to STDOUT because its caller has probably already set up > a pipe to another process - any error message would normally find > itself inserted into the mail queue! Gaah. Has djb ever heard of syslog(3)? Or is that too insecure for him? -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Doug McNaught <doug@wireboard.com> writes: > Bear Giles <bear@coyotesong.com> writes: >> We can agree that it should be more forthcoming with meaningful >> help for people setting up the system, but it can't just write an >> message to STDOUT because its caller has probably already set up >> a pipe to another process - any error message would normally find >> itself inserted into the mail queue! > Gaah. Has djb ever heard of syslog(3)? Or stderr? There's a good reason why Unix has both stdout and stderr as part of the standard process model. stderr is for human-readable error messages. In a noninteractive situation you can send it to /dev/null, if you don't believe in logging; but when a human is running a program it's polite to say something on stderr before going belly-up. regards, tom lane
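The convention being argued over here is easy to demonstrate: machine-readable verdict in the exit status, human-readable complaint on stderr, stdout left clean for the pipeline. The child program below is a made-up stand-in for qmail-getpw (the message text is invented); the 111 status follows qmail's temporary-failure convention:

```python
import subprocess
import sys
import textwrap

# Source for a child process that fails "politely": it writes a
# human-readable error to stderr, keeps stdout empty for the pipe,
# and reports the failure itself through its exit status.
child_src = textwrap.dedent("""
    import sys
    sys.stderr.write("qmail-getpw: cannot read control/sqlserver\\n")
    sys.exit(111)    # qmail convention: 111 = temporary failure
""")

result = subprocess.run([sys.executable, "-c", child_src],
                        capture_output=True, text=True)

print(result.returncode)              # the calling program branches on this
print("sqlserver" in result.stderr)   # a human debugging the setup reads this
print(result.stdout == "")            # nothing leaks into the mail pipeline
```

This is exactly the split Bear and Tom are describing: exit codes for the qmail process chain, stderr for the person trying to figure out why the benchmark produced no output.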
On Fri, Jan 25, 2002 at 11:01:23AM -0700, Bear Giles <bear@coyotesong.com> wrote: > > (a), in contrast, is definitely not normal for qmail and is the type > of thing that should be fixed... even if all it does after detecting > a problem is return with a nonzero error code. :-) This was probably caused by an error in the patch rather than the original code. Unless they have a very high rate of change for accounts, it seems that it would have been better to not use the database directly for account information. When stuff changed in the database, the cdb file could be rebuilt using stock qmail tools.
On 1 Feb 2002, Doug McNaught wrote:
> Bear Giles <bear@coyotesong.com> writes:
> > (b) is part of the qmail strategy - qmail is implemented as a set
> > of independent processes with different owners and rights and they
> > communicate problems through standard exit codes.
> >
> > We can agree that it should be more forthcoming with meaningful
> > help for people setting up the system, but it can't just write an
> > message to STDOUT because its caller has probably already set up
> > a pipe to another process - any error message would normally find
> > itself inserted into the mail queue!
>
> Gaah. Has djb ever heard of syslog(3)? Or is that too insecure for
> him?

Insecure and slow. Check out multilog, which is part of daemontools: http://cr.yp.to/daemontools/multilog.html

Vince.
On Fri, Feb 01, 2002 at 01:58:33PM -0500, Doug McNaught <doug@wireboard.com> wrote:
> Gaah. Has djb ever heard of syslog(3)? Or is that too insecure for
> him?

It's called multilog, and he feels it is much better than syslog. Instead of using the network to send data around, a pipe is used. multilog also does log rotation (with configurable sizes), so that you don't have to worry about that.
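The size-triggered rotation that multilog automates can be sketched in miniature. The function below is an illustration of the general idea only (append to a "current" file, rename it away once it grows past a limit), not multilog's actual file naming or behaviour:

```python
import os
import tempfile

def rotate_on_size(dirpath, line, max_bytes=64):
    """Append a line to 'current'; once it exceeds max_bytes, rename it
    aside and let the next write start a fresh file -- the core of what
    a size-based rotator like multilog does for a logging pipe."""
    current = os.path.join(dirpath, "current")
    with open(current, "a") as f:
        f.write(line + "\n")
    if os.path.getsize(current) > max_bytes:
        # Name the rotated file by a simple counter (multilog actually
        # uses timestamped names; this is just for illustration).
        rotated = os.path.join(dirpath, "old.%d" % len(os.listdir(dirpath)))
        os.rename(current, rotated)

logdir = tempfile.mkdtemp()
for i in range(20):
    rotate_on_size(logdir, "postmaster log line %d" % i)

print(len(os.listdir(logdir)) > 1)   # rotation produced multiple files
```

The appeal over syslog for a database server is that nothing is lost to a lossy datagram socket and the rotation bound caps disk usage without any external cron job.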
Vince Vielhaber <vev@michvhf.com> writes: >> Gaah. Has djb ever heard of syslog(3)? Or is that too insecure for >> him? > Insecure and slow. Check out multilog which is part of daemontools. > http://cr.yp.to/daemontools/multilog.html Hmm, should we (do we?) have a pointer to this in our docs somewhere? I know I wrote a section of the Admin Guide that just handwaves about where to get log-rotation scripts. regards, tom lane
On Fri, 1 Feb 2002, Tom Lane wrote:
> Vince Vielhaber <vev@michvhf.com> writes:
> >> Gaah. Has djb ever heard of syslog(3)? Or is that too insecure for
> >> him?
>
> > Insecure and slow. Check out multilog which is part of daemontools.
> > http://cr.yp.to/daemontools/multilog.html
>
> Hmm, should we (do we?) have a pointer to this in our docs somewhere?
> I know I wrote a section of the Admin Guide that just handwaves about
> where to get log-rotation scripts.

I don't believe so. Lemme look and see where it might fit best.

Vince.