Thread: 2nd update on TOAST
Uh oh - on one side I'm a happy camper. pg_dump already ignored pg_* tables and since the TOAST tables are named pg_toast_..., nothing to be done. I loaded my test DB with TOAST entries, dumped and restored it. Anything is there,works perfectly. But then I added the ALTER TABLE for unlimited rewrite rule size to initdb and the problems started. I can createa table with 500+ attributes. Also I can create a view on it (the rules size is 170K - whow). Anything workspretty well, just a pg_dump output is garbage. Seems the dynamic string buffers used in pg_dump aren't as bullet proof as they should. I'm still busy with other things, so can someone please take a look on it? Attached is an SQL script that creates the table and the view that I cannot dump. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
JanWieck@t-online.de (Jan Wieck) writes: > size to initdb and the problems started. I can create a table > with 500+ attributes. Also I can create a view on it (the > rules size is 170K - whow). Anything works pretty well, just > a pg_dump output is garbage. Are you using current sources? Bruce committed Philip Warner's pg_dump rewrite a day or so ago (which I thought was way premature, but anyway...). Just want to know which pg_dump we're talking about. regards, tom lane
At 22:01 5/07/00 +0200, Jan Wieck wrote: > > Seems the dynamic string buffers used in pg_dump aren't as > bullet proof as they should. I'm still busy with other > things, so can someone please take a look on it? Attached is > an SQL script that creates the table and the view that I > cannot dump. > If the TOAST patch on the FTP site the most recent? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 17:18 5/07/00 -0400, Tom Lane wrote: >JanWieck@t-online.de (Jan Wieck) writes: >> size to initdb and the problems started. I can create a table >> with 500+ attributes. Also I can create a view on it (the >> rules size is 170K - whow). Anything works pretty well, just >> a pg_dump output is garbage. > >Are you using current sources? Bruce committed Philip Warner's >pg_dump rewrite a day or so ago (which I thought was way premature, >but anyway...). Just want to know which pg_dump we're talking about. It's good that any bugs had a chnace to come out ASAP. However, I'd be interested to know what actually happens: I do recall that pg_dump (both versions) have seomething like '#define COPY_BUFFER_SIZE 8192'; if I were to start looking, that's where I'd go first. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 12:50 6/07/00 +1000, Philip Warner wrote: >> >>Are you using current sources? Bruce committed Philip Warner's >>pg_dump rewrite a day or so ago (which I thought was way premature, >>but anyway...). Just want to know which pg_dump we're talking about. > >It's good that any bugs had a chnace to come out ASAP. > >However, I'd be interested to know what actually happens: I do recall that >pg_dump (both versions) have seomething like '#define COPY_BUFFER_SIZE >8192'; if I were to start looking, that's where I'd go first. > Further to this, looking at the code, it now uses 'archputs', which replaces 'fputs', to output the copy buffer (assuming the error is in the copy, not while dumping the definitions); if PQgetline fills the entire buffer (no trailing \0), then I could imagine both would croak. The simplest test would be to tell PQgetline that the buffer size if 1 byte smaller, and see if it fixes the problem. I'd be interested to hear from someone...at least to know a little more of the circumstaces of the error. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 13:15 6/07/00 +1000, Philip Warner wrote: >At 12:50 6/07/00 +1000, Philip Warner wrote: >>> >>>Are you using current sources? Bruce committed Philip Warner's >>>pg_dump rewrite a day or so ago (which I thought was way premature, >>>but anyway...). Just want to know which pg_dump we're talking about. >> OK, I've built from the latest CVS, run initdb etc, created a new DB, created the megatable & megaview, and done a pg_dump. It works, except that I introduced a but that caused the view to be dumped as a table as well. I will keep looking, and submit a patch shortly. In the mean time, if anyone can reproduce Jan's problem, that would help... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > At 22:01 5/07/00 +0200, Jan Wieck wrote: > > > > Seems the dynamic string buffers used in pg_dump aren't as > > bullet proof as they should. I'm still busy with other > > things, so can someone please take a look on it? Attached is > > an SQL script that creates the table and the view that I > > cannot dump. > > > > If the TOAST patch on the FTP site the most recent? No. It's all in the current CVS tree. I removed that patch already. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
At 11:09 6/07/00 +0200, Jan Wieck wrote: > > No. It's all in the current CVS tree. I removed that patch > already. > OK, I've updated from CVS and rebuilt & it worked. The, to be sure, I did a 'make distclean' then a 'make' & 'make install' again, and now postmaster wont start (SIGSEGV). I have rebuild with '-O0 -g', gone into gdb, but the process dies so I cant get a backtrace. I have now rebuilt with SYSLOG support, and get the following: Jul 6 20:54:54 Cerberus2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000038 Jul 6 20:54:54 Cerberus2 kernel: current->tss.cr3 = 02467000, %cr3 = 02467000 Jul 6 20:54:54 Cerberus2 kernel: *pde = 00000000 Jul 6 20:54:54 Cerberus2 kernel: Oops: 0000 Jul 6 20:54:54 Cerberus2 kernel: CPU: 0 Jul 6 20:54:54 Cerberus2 kernel: EIP: 0010:[fcntl_setlk+327/404] Jul 6 20:54:54 Cerberus2 kernel: EFLAGS: 00000202 Jul 6 20:54:54 Cerberus2 kernel: eax: 00000000 ebx: c15485b0 ecx: c2494000 edx: c0d49a50 Jul 6 20:54:54 Cerberus2 kernel: esi: bffff574 edi: 00000004 ebp: fffffff7 esp: c2495f34 Jul 6 20:54:54 Cerberus2 kernel: ds: 0018 es: 0018 ss: 0018 Jul 6 20:54:54 Cerberus2 kernel: Process postmaster (pid: 5661, process nr: 50, stackpage=c2495000) Jul 6 20:54:54 Cerberus2 kernel: Stack: 00000000 c15485b0 c2495f40 00000001 00000000 00000000 4000bc74 00000000 Jul 6 20:54:54 Cerberus2 kernel: 00000000 00000000 00000000 00000000 c0d49a50 0000161d 00000000 c15485b0 Jul 6 20:54:54 Cerberus2 kernel: 00000101 00000000 7fffffff 00000000 00000000 00000000 c012c687 00000004 Jul 6 20:54:54 Cerberus2 kernel: Call Trace: [sys_fcntl+595/772] [sys_open+94/124] [system_call+52/56] Jul 6 20:54:54 Cerberus2 kernel: Code: 8b 50 38 85 d2 74 15 8d 44 24 24 50 ff 74 24 6c 53 ff d2 89 If anyone can give me some tips on tracking this down, I would appreciate it.... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > At 11:09 6/07/00 +0200, Jan Wieck wrote: > > > > No. It's all in the current CVS tree. I removed that patch > > already. > > > > OK, I've updated from CVS and rebuilt & it worked. The, to be sure, I did a > 'make distclean' then a 'make' & 'make install' again, and now postmaster > wont start (SIGSEGV). I have rebuild with '-O0 -g', gone into gdb, but the > process dies so I cant get a backtrace. Have the same symptom with a completely fresh cvs checkout. > If anyone can give me some tips on tracking this down, I would appreciate > it.... Bruce applied a patch to configure.in yesterday. Read the comments from the cvslog. It tells that it triggers abug in the Linux kernels fcntl(SETLK) code when used with unix domain sockets, and that the bug is present inLinux kernels <= 2.2.16. I'm running a 2.2.12 here, and so it exactly dies in pqcomm.c line 229 on fcntl() againstthe socket. Undefining HAVE_FCNTL_SETLK in config.h did it for me temporary. Don't know how to deal with it finally. With this setup I did initdb createdb psql <megaview.sql pg_dump pgsql >megaview.dump In the dump file, the first 2183 bytes look OK. What's following then looks like internal tables where pg_dumpholds the info of the schema analyzing. And don't worry that the view is dumped as table with a later CREATE RULE. That's correct this way. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
At 14:04 6/07/00 +0200, Jan Wieck wrote: >Philip Warner wrote: > > In the dump file, the first 2183 bytes look OK. What's > following then looks like internal tables where pg_dump holds > the info of the schema analyzing. Any chance you could mail it direct to me? > And don't worry that the view is dumped as table with a later > CREATE RULE. That's correct this way. I figured this out the hard way! ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 14:04 6/07/00 +0200, Jan Wieck wrote: > > With this setup I did > > initdb > createdb > psql <megaview.sql > pg_dump pgsql >megaview.dump > > In the dump file, the first 2183 bytes look OK. What's > following then looks like internal tables where pg_dump holds > the info of the schema analyzing. > Just in case there is some other factor, can you let me know what your choices in 'configure' were? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > At 14:04 6/07/00 +0200, Jan Wieck wrote: > >Philip Warner wrote: > > > > In the dump file, the first 2183 bytes look OK. What's > > following then looks like internal tables where pg_dump holds > > the info of the schema analyzing. > > Any chance you could mail it direct to me? Attached. > > > > And don't worry that the view is dumped as table with a later > > CREATE RULE. That's correct this way. > > I figured this out the hard way! :-) Will be off after this until approx. 1:00 UCT. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Philip Warner wrote: > At 14:04 6/07/00 +0200, Jan Wieck wrote: > > > > With this setup I did > > > > initdb > > createdb > > psql <megaview.sql > > pg_dump pgsql >megaview.dump > > > > In the dump file, the first 2183 bytes look OK. What's > > following then looks like internal tables where pg_dump holds > > the info of the schema analyzing. > > > > Just in case there is some other factor, can you let me know what your > choices in 'configure' were? --with-tcl --enable-cassert --enable-debug Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
> Bruce applied a patch to configure.in yesterday. Read the > comments from the cvslog. It tells that it triggers a bug in > the Linux kernels fcntl(SETLK) code when used with unix > domain sockets, and that the bug is present in Linux kernels > <= 2.2.16. I'm running a 2.2.12 here, and so it exactly dies > in pqcomm.c line 229 on fcntl() against the socket. > Undefining HAVE_FCNTL_SETLK in config.h did it for me > temporary. Don't know how to deal with it finally. Thanks Jan for the workaround. I'll see if this gets me up and going again. - Thomas
> JanWieck@t-online.de (Jan Wieck) writes: > > size to initdb and the problems started. I can create a table > > with 500+ attributes. Also I can create a view on it (the > > rules size is 170K - whow). Anything works pretty well, just > > a pg_dump output is garbage. > > Are you using current sources? Bruce committed Philip Warner's > pg_dump rewrite a day or so ago (which I thought was way premature, > but anyway...). Just want to know which pg_dump we're talking about. I committed pg_dump so people could see his changes and start making additions. I can then have him supply incremental patches. Keeping stuff out of the tree usually causes the author to get frustrated. Of course, if it comes in too quickly, it can cause chaos if he needs to make major changes. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > OK, I've updated from CVS and rebuilt & it worked. The, to be sure, I did a > > 'make distclean' then a 'make' & 'make install' again, and now postmaster > > wont start (SIGSEGV). I have rebuild with '-O0 -g', gone into gdb, but the > > process dies so I cant get a backtrace. > > Have the same symptom with a completely fresh cvs checkout. > > > If anyone can give me some tips on tracking this down, I would appreciate > > it.... > > Bruce applied a patch to configure.in yesterday. Read the > comments from the cvslog. It tells that it triggers a bug in > the Linux kernels fcntl(SETLK) code when used with unix > domain sockets, and that the bug is present in Linux kernels > <= 2.2.16. I'm running a 2.2.12 here, and so it exactly dies > in pqcomm.c line 229 on fcntl() against the socket. I thought when he said flock() bug, he meant only on the new IA64 platform, not on all Linux platforms. Yikes, I enable flock(), and it breaks initdb for all the Linux users. This is a problem! Tom was mentioning the configure check for flock() was broken recently, so I was glad to fix it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > > OK, I've updated from CVS and rebuilt & it worked. The, to be sure, I did a > > > 'make distclean' then a 'make' & 'make install' again, and now postmaster > > > wont start (SIGSEGV). I have rebuild with '-O0 -g', gone into gdb, but the > > > process dies so I cant get a backtrace. > > > > Have the same symptom with a completely fresh cvs checkout. > > > > > If anyone can give me some tips on tracking this down, I would appreciate > > > it.... > > > > Bruce applied a patch to configure.in yesterday. Read the > > comments from the cvslog. It tells that it triggers a bug in > > the Linux kernels fcntl(SETLK) code when used with unix > > domain sockets, and that the bug is present in Linux kernels > > <= 2.2.16. I'm running a 2.2.12 here, and so it exactly dies > > in pqcomm.c line 229 on fcntl() against the socket. > > I thought when he said flock() bug, he meant only on the new IA64 > platform, not on all Linux platforms. Yikes, I enable flock(), and it > breaks initdb for all the Linux users. This is a problem! Not initdb, but postmaster. That's the one who tries (after a successful initdb) to do the fcntl(F_SETLK) on the unix domain socket. Causing the kernel saying "go to hell, go directly, don't write a core, don't leave useful info in gdb". The only reason I see for the entire section is to detect if it would be safe to unlink the socket because it's left by another postmaster in case of abnormal termination. Tell me if I've misread it. So why not doing it on the Linux platform different, using a separate file like .s.PGSQL.5432.LCK? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
> Not initdb, but postmaster. That's the one who tries (after a > successful initdb) to do the fcntl(F_SETLK) on the unix > domain socket. Causing the kernel saying "go to hell, go > directly, don't write a core, don't leave useful info in > gdb". > > The only reason I see for the entire section is to detect if > it would be safe to unlink the socket because it's left by > another postmaster in case of abnormal termination. Tell me > if I've misread it. So why not doing it on the Linux > platform different, using a separate file like > .s.PGSQL.5432.LCK? But how do you know if that file still belongs to an active postmaster? What if it exited before removing the file. Seems we would have to write the PID into the file, and do a kill() to see if it is running. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
>> The only reason I see for the entire section is to detect if >> it would be safe to unlink the socket because it's left by >> another postmaster in case of abnormal termination. Tell me >> if I've misread it. That's exactly what it's for. We need to tell whether there is still another postmaster running on the same port number. Too bad the kernel is not bright enough to unlink the socket file automatically when it's no longer in use... >> So why not doing it on the Linux >> platform different, using a separate file like >> .s.PGSQL.5432.LCK? I think it's a bad idea to do it differently on Linux than other platforms. If we fix this (other than by just disabling the fcntl call again on old Linuxen) we should use the new method everywhere. > But how do you know if that file still belongs to an active postmaster? > What if it exited before removing the file. Seems we would have to > write the PID into the file, and do a kill() to see if it is running. Well, if we wanted to continue to depend on fcntl(SETLK) then we could use an empty plain file. I read the bug report as being that old Linux kernels fail if fcntl(SETLK) is applied to a Unix-socket file. They'd surely have noticed long before if the feature didn't work on plain files. But if we are going to change this at all, I'd vote for storing pids in the lock files the way we are now doing in the data-directory pid lock files. Then we wouldn't have to depend on fcntl at all, which would be a Good Thing from a portability point of view. However, I think it would be a really bad idea to keep the lock files in /tmp --- that's way too open to accidental removals, not to mention deliberate denial-of-service attacks. They need to be in a more secure directory; but where? See the past discussions summarized in the TODO.detail file. regards, tom lane
Tom Lane writes: > However, I think it would be a really bad idea to keep the lock files > in /tmp --- that's way too open to accidental removals, not to mention > deliberate denial-of-service attacks. They need to be in a more secure > directory; but where? See the past discussions summarized in the > TODO.detail file. Quoth the file system standard: `sharedstatedir' The directory for installing architecture-independent data files which the programs modify while theyrun. This should normally be `/usr/local/com', but write it as `$(prefix)/com'. (If you are using Autoconf, writeit as `@sharedstatedir@'.) The problem with this approach is making that directory writeable by the server account. Solutions: 1) Making the postmaster executable as root but later drop root privileges. (This looks to be the cleanest solution, butit is probably a security problem waiting to happen.) 2) Making initdb executable as root but with some --user switch. Have it create a subdirectory of $sharedstatedir writableby the server account, possibly with sticky bit and whatnot. Use `su' to invoke `postgres'. This approach might be convenient also in terms of creating the data directory. 3) Making "initialize lock file area" a separate initialization step, possibly encapsulated into a shell script. Btw., what would happen if we did start a second postmaster at the same TCP port? Or more interestingly, what happens if some completely different program already runs at that port? How do we protect against that? This has something to do with SO_REUSEADDR, but I don't understand those things too well. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes: > Quoth the file system standard: > `sharedstatedir' > The directory for installing architecture-independent data files > which the programs modify while they run. This should normally be > `/usr/local/com', but write it as `$(prefix)/com'. (If you are > using Autoconf, write it as `@sharedstatedir@'.) > The problem with this approach is making that directory writeable by the > server account. The lock directory should certainly be one used only for Postgres locks, owned by postgres user and writable only by postgres user. > 2) Making initdb executable as root but with some --user switch. Have it > create a subdirectory of $sharedstatedir writable by the server > account, possibly with sticky bit and whatnot. Use `su' to invoke > `postgres'. > This approach might be convenient also in terms of creating the data > directory. We could do that, or we could just say "you must have arranged for creation of these directories before you run initdb". For the truly lazy, a small script that could be executed as root could be provided. Personally I'd be unwilling to run a script as complex as initdb as root; what if it goes wrong? Keep the stuff that requires root permission separate, and as small as possible. BTW, regardless of where exactly the lock directory lives (and IIRC there were several schools of thought on that), I believe that the lock directory pathname has to be wired in at configure time. It can't be an initdb argument because the whole locking thing is useless unless all the PG installations on a machine agree on where the port locks are. > Btw., what would happen if we did start a second postmaster at the same > TCP port? Or more interestingly, what happens if some completely different > program already runs at that port? How do we protect against that? This > has something to do with SO_REUSEADDR, but I don't understand those things > too well. SO_REUSEADDR solves the problem for TCP sockets. The problem with Unix sockets is that the kernel's detection of conflicts is pretty braindead: if there is an existing socket file of the same name, you get an "address in use" failure from bind(), regardless of whether anyone else is actually using the socket. So, if the previous postmaster died ungracefully and didn't delete its socket file, a new postmaster cannot be started up until the old socket file is removed. What we're trying to do here is automate that removal so the admin doesn't have to do it. The trouble is we can't just unlink() the old socket file because that'll succeed even if there is a postmaster actively using the socket! So we need to find out whether the old postmaster is still alive to decide whether it's OK to remove the old socket file or whether we should abort startup. Bruce and I were just talking by phone about this, and we realized that there is a completely different approach to making that decision: if you want to know whether there's an old postmaster connected to a socket file, try to connect to the old postmaster! In other words, pretend to be a client and see if your connection attempt is answered. (You don't have to try to log in, just see if you get a connection.) This might also answer Peter's concern about socket files that belong to non-Postgres programs, although I doubt that's really a big issue. There are some potential pitfalls here, like what if the old postmaster is there but overloaded? But on the whole it seems like it might be a cleaner answer than fooling around with lockfiles, and certainly safer than relying on fcntl(SETLK) to work on a socket file. Comments anyone? regards, tom lane
Tom Lane wrote: > > Bruce and I were just talking by phone about this, and we realized that > there is a completely different approach to making that decision: if you > want to know whether there's an old postmaster connected to a socket > file, try to connect to the old postmaster! In other words, pretend to > be a client and see if your connection attempt is answered. (You don't > have to try to log in, just see if you get a connection.) This might > also answer Peter's concern about socket files that belong to > non-Postgres programs, although I doubt that's really a big issue. > > There are some potential pitfalls here, like what if the old postmaster > is there but overloaded? But on the whole it seems like it might be > a cleaner answer than fooling around with lockfiles, and certainly safer > than relying on fcntl(SETLK) to work on a socket file. Comments anyone? Like it. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
* Jan Wieck <JanWieck@t-online.de> [000708 05:47] wrote: > Tom Lane wrote: > > > > Bruce and I were just talking by phone about this, and we realized that > > there is a completely different approach to making that decision: if you > > want to know whether there's an old postmaster connected to a socket > > file, try to connect to the old postmaster! In other words, pretend to > > be a client and see if your connection attempt is answered. (You don't > > have to try to log in, just see if you get a connection.) This might > > also answer Peter's concern about socket files that belong to > > non-Postgres programs, although I doubt that's really a big issue. > > > > There are some potential pitfalls here, like what if the old postmaster > > is there but overloaded? But on the whole it seems like it might be > > a cleaner answer than fooling around with lockfiles, and certainly safer > > than relying on fcntl(SETLK) to work on a socket file. Comments anyone? > > Like it. my $pgsocket = "/tmp/.s.PGSQL.5432"; # try to connect to the postmaster socket(SOCK, PF_UNIX, SOCK_STREAM, 0) or die "unable to create unix domain socket: $!"; connect(SOCK, sockaddr_un($pgsocket)) and errexit("postmaster is running you must shut it down"); oh yeah... :) -Alfred
Alfred Perlstein wrote: > > * Jan Wieck <JanWieck@t-online.de> [000708 05:47] wrote: > > Tom Lane wrote: > > > > > > Bruce and I were just talking by phone about this, and we realized that > > > there is a completely different approach to making that decision: if you > > > want to know whether there's an old postmaster connected to a socket > > > file, try to connect to the old postmaster! In other words, pretend to > > > be a client and see if your connection attempt is answered. (You don't > > > have to try to log in, just see if you get a connection.) This might > > > also answer Peter's concern about socket files that belong to > > > non-Postgres programs, although I doubt that's really a big issue. > > > > > > There are some potential pitfalls here, like what if the old postmaster > > > is there but overloaded? But on the whole it seems like it might be > > > a cleaner answer than fooling around with lockfiles, and certainly safer > > > than relying on fcntl(SETLK) to work on a socket file. Comments anyone? > > > > Like it. > > my $pgsocket = "/tmp/.s.PGSQL.5432"; > > # try to connect to the postmaster > socket(SOCK, PF_UNIX, SOCK_STREAM, 0) > or die "unable to create unix domain socket: $!"; > > connect(SOCK, sockaddr_un($pgsocket)) > and errexit("postmaster is running you must shut it down"); > > oh yeah... :) > > -Alfred I don't get this. Isn't there a race condition here? Just curious, Mike Mascari
* Mike Mascari <mascarm@mascari.com> [000708 05:55] wrote: > Alfred Perlstein wrote: > > > > * Jan Wieck <JanWieck@t-online.de> [000708 05:47] wrote: > > > Tom Lane wrote: > > > > > > > > Bruce and I were just talking by phone about this, and we realized that > > > > there is a completely different approach to making that decision: if you > > > > want to know whether there's an old postmaster connected to a socket > > > > file, try to connect to the old postmaster! In other words, pretend to > > > > be a client and see if your connection attempt is answered. (You don't > > > > have to try to log in, just see if you get a connection.) This might > > > > also answer Peter's concern about socket files that belong to > > > > non-Postgres programs, although I doubt that's really a big issue. > > > > > > > > There are some potential pitfalls here, like what if the old postmaster > > > > is there but overloaded? But on the whole it seems like it might be > > > > a cleaner answer than fooling around with lockfiles, and certainly safer > > > > than relying on fcntl(SETLK) to work on a socket file. Comments anyone? > > > > > > Like it. > > > > my $pgsocket = "/tmp/.s.PGSQL.5432"; > > > > # try to connect to the postmaster > > socket(SOCK, PF_UNIX, SOCK_STREAM, 0) > > or die "unable to create unix domain socket: $!"; > > > > connect(SOCK, sockaddr_un($pgsocket)) > > and errexit("postmaster is running you must shut it down"); > > > > oh yeah... :) > > > > -Alfred > > I don't get this. Isn't there a race condition here? > > Just curious, Sure but it's handled, if there's a postmaster starting at this exact instant, however since the script just runs postmaster afterwards the conflict will make postmaster abort and I'll get an error return from my invocation of postmaster. -Alfred
> > my $pgsocket = "/tmp/.s.PGSQL.5432"; > > > > # try to connect to the postmaster > > socket(SOCK, PF_UNIX, SOCK_STREAM, 0) > > or die "unable to create unix domain socket: $!"; > > > > connect(SOCK, sockaddr_un($pgsocket)) > > and errexit("postmaster is running you must shut it down"); > > > > oh yeah... :) > > > > -Alfred > > I don't get this. Isn't there a race condition here? That's a good point. I don't think so because the socket will only create for one user. Basically, we don't need something bulletproof here. We just need something to prevent admins from accidentally starting two postmasters on the same port. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
* Bruce Momjian <pgman@candle.pha.pa.us> [000708 06:02] wrote: > > > my $pgsocket = "/tmp/.s.PGSQL.5432"; > > > > > > # try to connect to the postmaster > > > socket(SOCK, PF_UNIX, SOCK_STREAM, 0) > > > or die "unable to create unix domain socket: $!"; > > > > > > connect(SOCK, sockaddr_un($pgsocket)) > > > and errexit("postmaster is running you must shut it down"); > > > > > > oh yeah... :) > > > > > > -Alfred > > > > I don't get this. Isn't there a race condition here? > > That's a good point. I don't think so because the socket will only > create for one user. Basically, we don't need something bulletproof > here. We just need something to prevent admins from accidentally > starting two postmasters on the same port. Actually I just remebered the issue here, if one wants to start postmaster on an alternate port there will be no conflict and all hell may break loose. -Alfred
> > That's a good point. I don't think so because the socket will only > > create for one user. Basically, we don't need something bulletproof > > here. We just need something to prevent admins from accidentally > > starting two postmasters on the same port. > > Actually I just remebered the issue here, if one wants to start > postmaster on an alternate port there will be no conflict and > all hell may break loose. We already lock the /data directory. This is for the port lock. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > But how do you know if that file still belongs to an active postmaster? > > What if it exited before removing the file. Seems we would have to > > write the PID into the file, and do a kill() to see if it is running. I believe we already do this (SetPidFile() in utils/init/miscinit.c). Isn't it sufficient (1) to prevent starting a new postmaster on the same data dir and (2) to unlink the accidently left socket file? -- Tatsuo Ishii
> > > But how do you know if that file still belongs to an active postmaster? > > > What if it exited before removing the file. Seems we would have to > > > write the PID into the file, and do a kill() to see if it is running. > > I believe we already do this (SetPidFile() in > utils/init/miscinit.c). Isn't it sufficient (1) to prevent starting a > new postmaster on the same data dir and (2) to unlink the accidently > left socket file? I noticed what I was missing after sending the mail. Sorry for the confusion. Seems the idea trying to connect a postmaster looks good. -- Tatsuo Ishii
Tom Lane writes: > Bruce and I were just talking by phone about this, and we realized that > there is a completely different approach to making that decision: if you > want to know whether there's an old postmaster connected to a socket > file, try to connect to the old postmaster! It seems that that would completely reverse the assumption of risk. Currently, the postmaster may fail to start because there's a stale socket file lying around, out of respect to a running colleague. With this idea it would be the running postmaster's job to "defend" his socket against newly starting colleagues. That doesn't seem fair. What are our problems? There's a possible DoS attack when someone else comes first and creates a file /tmp/.s.PGSQL.5432. But detecting whether there's another program running on that socket (if it's a socket) isn't going to help because you most likely won't be able to delete it anyway. The solution to this is to make the path of the socket file configurable more easily so that the administrator has the choice of putting it a safer place that he prepared appropriately. A complementary solution is of course to add an option to run without Unix socket, since we don't rely on the socket file for data directory locking anymore. In fact, does anybody mind if I add such an option? We can have tcpip_socket = yes|no unix_socket = yes|no (Security-conscious users may choose to turn off both. :-)) The other problem is a socket file left behind by a crashed postmaster. I don't consider this such a big problem; a crashed postmaster is not the normal mode of operation. The friendly message we have right now seems alright to me. And it's a way of tell that the postmaster crashed at all. One idea to get the pid in there somewhere is creating a socket file "/tmp/.s.PGSQL.port.pid" and making /tmp/.s.PGSQL.port a symlink to it. Then clients don't know the difference, but the server knows the pid and can take appropriate action. Or make the symlink the other way around, not sure. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
> The other problem is a socket file left behind by a crashed postmaster. I > don't consider this such a big problem; a crashed postmaster is not the > normal mode of operation. The friendly message we have right now seems > alright to me. And it's a way of tell that the postmaster crashed at all. > > One idea to get the pid in there somewhere is creating a socket file > "/tmp/.s.PGSQL.port.pid" and making /tmp/.s.PGSQL.port a symlink to it. > Then clients don't know the difference, but the server knows the pid and > can take appropriate action. Or make the symlink the other way around, not > sure. The symlink is an interesting idea. lstat() on the normal name can give the file name with pid. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> There's a possible DoS attack when someone else comes first and creates a > file /tmp/.s.PGSQL.5432. But detecting whether there's another program > running on that socket (if it's a socket) isn't going to help because you > most likely won't be able to delete it anyway. The solution to this is to > make the path of the socket file configurable more easily so that the > administrator has the choice of putting it a safer place that he prepared > appropriately. If you are worried about DoS, I think the only solution is to figure out a way to be using one of the reserved <1000 ports. I don't think there's any way around that is there? Also presumably not using a reserved port is a security risk. Not that I'm worried.
Mike Mascari <mascarm@mascari.com> writes: > I don't get this. Isn't there a race condition here? Strictly speaking, there is, but the race window is only a couple of kernel calls wide, and as Bruce pointed out we do not need something that is absolutely gold-plated bulletproof. We are just trying to prevent dbadmins from accidentally starting two postmasters on the same port number. The way this would work is that pqcomm.c would do something like if (socketFileAlreadyExists) { try to open connection to existing postmaster; if (successful) { report portconflict and die; } delete existing socket file;}bind(socket); // kernel creates new socket file herelisten(); The race condition here is that if newly-started postmaster A has executed bind() but not yet listen(), then newly-started postmaster B could come along, observe the existing socket file, try to open connection, fail, delete socket file, proceed. AFAIK B will be allowed to bind() and create a new socket file, and A ends up listening to a port that's lost in hyperspace --- no one else can ever connect to it because it has no visible representative in the filesystem. But as soon as A has executed listen() it's safe --- even though it's not really ready to accept connections yet, the attempted connect from B will wait till it does. (We should, therefore, use a plain vanilla connect attempt for the probe --- no non-blocking connect or anything fancy.) The bind-to-listen delay in pqcomm.c is currently several lines long, but there's no reason they couldn't be successive kernel calls with nothing but a test for bind() failure between. That strikes me as plenty close enough... regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > It seems that that would completely reverse the assumption of risk. > Currently, the postmaster may fail to start because there's a stale socket > file lying around, out of respect to a running colleague. With this idea > it would be the running postmaster's job to "defend" his socket against > newly starting colleagues. That doesn't seem fair. True, it would reverse the most probable failure mode, but I'm not sure that's a bad thing. > The other problem is a socket file left behind by a crashed postmaster. I > don't consider this such a big problem; a crashed postmaster is not the > normal mode of operation. The friendly message we have right now seems > alright to me. And it's a way of tell that the postmaster crashed at all. No, actually this is a *big* problem. That friendly message is no help to a system boot script that can't read it (the same point you've made repeatedly w.r.t configure issues; surprised you don't see it here). If I do a fast shutdown of my Unix system (the kind where shutdown does a 'kill -9' on all user processes --- on HPUX systems this is invoked by hitting the power switch or by the power supply overtemperature sensor) then the postmaster doesn't get a chance to clean out its socket file. After reboot, the postmaster fails to start up until I manually intervene by removing the socket file. That's not robust and not acceptable. The way I currently get around this (and I believe it's a pretty popular thing to do) is that my postmaster-start script unconditionally deletes the socket file before launching the postmaster. That's actually far riskier than what we are discussing, because there is *no* safety check for an already-started postmaster. A connection check would be a big improvement. I consider failure-to-start during normal system bootup to be a far graver risk than the possibility that a second postmaster will usurp a first postmaster's Unix socket --- especially since the latter could only happen if the first postmaster isn't answering connections, in which case allowing it to keep the socket is of dubious value anyhow. So reversing the presumption of innocence seems like a good idea to me. > ... The solution to this is to make the path of the socket file > configurable more easily so that the administrator has the choice of > putting it a safer place that he prepared appropriately. We talked about that in the original discussion (you might want to review the flock pghackers thread from late August '98). The trouble is that the socket file path is a critical part of the client-to-postmaster protocol: change the path, and existing clients don't know where to connect. Oops. So even though /tmp is obviously a pretty bogus place to keep the socket, the compatibility headaches of moving it are so great that no one really wants to bite the bullet. We talked about compromises like keeping the real socket in some safer directory, with a symlink from /tmp for old clients, and I think that's what will happen eventually. But please note that if the socket file path is "easily configurable" then the same problem comes right back to bite you again. It's *not* "easy" to change your mind about where the socket files live; on any given platform that decision had better be graven on stone tablets, because you want all your clients of whatever vintage to be able to find your postmaster(s). I'm inclined to think that a configure option might be counterproductive --- nailing it down in the per-OS template file seems much less likely to get screwed up. The major problem with a hard-wired socket path that's not /tmp is that you can't install the socket directory if you're not root, so the ability to fire up a postmaster with no root privs whatever would no longer exist. We could get around that if it were possible to run with only TCP connection support, making Unix-domain connections an option instead of the base requirement. > A complementary solution is of course to add an option to run without Unix > socket, since we don't rely on the socket file for data directory locking > anymore. In fact, does anybody mind if I add such an option? We can have > tcpip_socket = yes|no > unix_socket = yes|no Yup, it would make a lot of sense to have an option for no Unix socket connections (we already have that as an #ifdef for a couple of platforms with no Unix socket support, but not as a postmaster start-time choice). > (Security-conscious users may choose to turn off both. :-)) Uh, not at the moment, because we use the port interlock(s) as a proxy for a shared-memory interlock. Really there are three resources that we must prevent concurrent postmasters from sharing:* data directory;* listen port number;* shared-memory blocks (and semaphoresets). We have a good solution in place now for locking the data directory, but the port interlock still needs work. Currently we use the port number to assign shmem/sema keys, and there is no separate interlock to guard against shmem conflicts. I believe we had a discussion a few months ago about rejiggering the shmem key assignment method so that shmem conflicts would be detected and dealt with cleanly --- might be a good idea to make that happen before we go too far with port interlock changes. regards, tom lane
* Bruce Momjian <pgman@candle.pha.pa.us> [000708 06:40] wrote: > > > That's a good point. I don't think so because the socket will only > > > create for one user. Basically, we don't need something bulletproof > > > here. We just need something to prevent admins from accidentally > > > starting two postmasters on the same port. > > > > Actually I just remebered the issue here, if one wants to start > > postmaster on an alternate port there will be no conflict and > > all hell may break loose. > > We already lock the /data directory. This is for the port lock. The whole process could be locked by an fcntl lock on a seperate file, which I think was already mentioned, however I've deleted most of the thread unfortunatly. /tmp/.l.PGSQL.5432 <- fcntl lockfile, aquired first. /tmp/.s.PGSQL.5432 <- socket. -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk."