Thread: Question about debugging bootstrapping and catalog entries
I've been fooling with catalog entries here and I've obviously done something wrong. But I'm a bit frustrated trying to debug initdb. Because of the way it starts up the database in a separate process I'm finding it really hard to connect to the database and get a backtrace. And the debugging log is being spectacularly unhelpful in not telling me where the problem is.

Are there any tricks people have for debugging bootstrapping processing? I just need to know what index it's trying to build here and that should be enough to point me in the right direction:

creating template1 database in /var/tmp/db7/base/1 ... FATAL: could not create unique index
DETAIL: Table contains duplicated values.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On Mon, Dec 18, 2006 at 11:35:44AM +0000, Gregory Stark wrote:
> Are there any tricks people have for debugging bootstrapping processing? I
> just need to know what index it's trying to build here and that should be
> enough to point me in the right direction:

Here's what I did: you can step over functions in initdb until it fails (although I already know which part is failing, I guess). Restart. Then you go into that function and step until the new backend has been started. At this point you attach another gdb to the backend and let it run. Some steps create multiple backends; a printf() statement sometimes helps determine where to stop.

If the backend process segfaults, the easiest is to enable core dumps, then you can run gdb on the left-overs, so to speak.

If you get an error, you put a breakpoint on errfinish(). Note that it gets called even on messages you don't normally see, so you may have to skip a couple to get the real message.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Re: Question about debugging bootstrapping and catalog entries
From: "Zeugswetter Andreas ADI SD"
> > Are there any tricks people have for debugging bootstrapping processing? I
> > just need to know what index it's trying to build here and that should be
> > enough to point me in the right direction:
>
> Here's what I did: you can step over functions in initdb until it fails
> (although I already know which part it's failing I guess). Restart. Then
> you go into that function and step until the new backend has been
> started. At this point you attach another gdb to the backend and let it
> run.

How do you attach fast enough, so that it is not all over before you are able to attach?

I'd like to debug an initdb failure on Windows (postgres executable not found) when running make check with the is_admin check disabled and --prefix=/postgres in msys.

The program "postgres" is needed by initdb but was not found in the same directory as "j:/postgres/src/test/regress/./tmp_check/install/postgres/bin/initdb".

Andreas
"Martijn van Oosterhout" <kleptog@svana.org> writes: > Here's what I did: you can step over functions in initdb until it fails > (although I alredy know which part it's failing I guess). Restart. Then > you go into that function and step until the new backend has been > started. At this point you attach another gdb to the backend and let it > run. Hm, I suppose. Though starting a second gdb is a pain. What I've done in the past is introduce a usleep(30000000) in strategic points in the backend to give me a chance to attach. Perhaps what would be handy is having an option to initdb to just run the backend under gdb automatically. I'm not sure if initdb runs the backend in the terminal though. Or perhaps initdb should start the backend with an option that instructs it to enter an infinite loop shortly after startup so you can attach with gdb. In the meantime this trivial patch saved my day: diff -c -r1.225 bootstrap.c *** src/backend/bootstrap/bootstrap.c 4 Oct 2006 00:29:49 -0000 1.225 --- src/backend/bootstrap/bootstrap.c 18 Dec 2006 12:11:11 -0000 *************** *** 1293,1298 **** --- 1293,1300 ---- heap = heap_open(ILHead->il_heap, NoLock); ind = index_open(ILHead->il_ind, NoLock); + elog(DEBUG4, "building index %s on %s", NameStr(ind->rd_rel->relname), NameStr(heap->rd_rel->relname)); + index_build(heap, ind, ILHead->il_info, false); index_close(ind, NoLock); -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 12/18/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> If you get an error, you put a breakpoint on errfinish(). Note, that
> gets called even on messages you don't normally see, so you may have to
> skip a couple to get the real message.
You wouldn't need to skip anything if you put the breakpoint inside the ' if (elevel == ERROR)' code-block in errfinish(). It will stop only for an ERROR.
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
On Mon, Dec 18, 2006 at 12:59:28PM +0100, Zeugswetter Andreas ADI SD wrote:
> How do you attach fast enough, so not all is over before you are able to
> attach ?
> I'd like to debug initdb failure on Windows (postgres executable not
> found) when running make check with disabled is_admin check and
> --prefix=/postgres in msys.

When running initdb under gdb, you step over the PG_CMD_OPEN;. At that point the backend is started, but hasn't done anything yet, so you can attach to it. The backend stays until the next PG_CMD_CLOSE;

As someone pointed out, sleep works also.

> The program "postgres" is needed by initdb but was not found in the
> same directory as
> "j:/postgres/src/test/regress/./tmp_check/install/postgres/bin/initdb".

No idea about that, the binary *should* be there...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
On Mon, 18 Dec 2006, Gregory Stark wrote:
> I've been fooling with catalog entries here and I've obviously done something
> wrong. But I'm a bit frustrated trying to debug initdb. Because of the way it
> starts up the database in a separate process I'm finding it really hard to
> connect to the database and get a backtrace. And the debugging log is being
> spectacularly unhelpful in not telling me where the problem is.
>
> Are there any tricks people have for debugging bootstrapping processing? I
> just need to know what index it's trying to build here and that should be
> enough to point me in the right direction:
>
> creating template1 database in /var/tmp/db7/base/1 ... FATAL: could not create unique index
> DETAIL: Table contains duplicated values.

Not much fun. Run src/include/catalog/duplicate_oids first.

Thanks,
Gavin
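For readers unfamiliar with the check Gavin mentions: a duplicated OID in the catalog headers is one common way to end up with "could not create unique index" during bootstrap. The following is only a minimal C sketch of the underlying idea (sort the values, then look for equal neighbours). It is not the duplicate_oids script itself; the function name first_duplicate_oid and the sample values in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback for qsort(): orders OID values ascending. */
int
cmp_oid(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *) a;
    unsigned int y = *(const unsigned int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sorts oids[] in place and returns the smallest value
 * that occurs more than once, or 0 if all values are distinct.
 * 0 is InvalidOid in PostgreSQL, so it can serve as the "none" marker.
 */
unsigned int
first_duplicate_oid(unsigned int *oids, size_t n)
{
    size_t i;

    assert(oids != NULL || n == 0);

    qsort(oids, n, sizeof(oids[0]), cmp_oid);

    /* After sorting, any duplicates sit next to each other. */
    for (i = 1; i < n; i++)
    {
        if (oids[i] == oids[i - 1])
            return oids[i];
    }
    return 0;
}
```

Called on an array such as {1247, 1249, 1255, 1249, 1259}, the helper returns 1249; on an array with no repeats it returns 0.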
Gregory Stark wrote:
> I've been fooling with catalog entries here and I've obviously done something
> wrong. But I'm a bit frustrated trying to debug initdb. Because of the way it
> starts up the database in a separate process I'm finding it really hard to
> connect to the database and get a backtrace. And the debugging log is being
> spectacularly unhelpful in not telling me where the problem is.
>
> Are there any tricks people have for debugging bootstrapping processing? I
> just need to know what index it's trying to build here and that should be
> enough to point me in the right direction:
>
> creating template1 database in /var/tmp/db7/base/1 ... FATAL: could not create unique index
> DETAIL: Table contains duplicated values.

One easy thing to try is to use -n (noclean) and then start a standalone backend on the borked dir and issue the commands that initdb was feeding at that point (usually embedded in the initdb source).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Gregory Stark wrote:
> "Martijn van Oosterhout" <kleptog@svana.org> writes:
>
>> Here's what I did: you can step over functions in initdb until it fails
>> (although I already know which part it's failing I guess). Restart. Then
>> you go into that function and step until the new backend has been
>> started. At this point you attach another gdb to the backend and let it
>> run.
>
> Hm, I suppose. Though starting a second gdb is a pain. What I've done in the
> past is introduce a usleep(30000000) in strategic points in the backend to
> give me a chance to attach.

I use dtrace: it waits for a write syscall on stderr, and when that happens it stops (freezes) the process. I can then attach to the process with a debugger and examine what happened.

Zdenek
Alvaro Herrera wrote:
> Gregory Stark wrote:
>
>> I've been fooling with catalog entries here and I've obviously done something
>> wrong. But I'm a bit frustrated trying to debug initdb. Because of the way it
>> starts up the database in a separate process I'm finding it really hard to
>> connect to the database and get a backtrace. And the debugging log is being
>> spectacularly unhelpful in not telling me where the problem is.
>>
>> Are there any tricks people have for debugging bootstrapping processing? I
>> just need to know what index it's trying to build here and that should be
>> enough to point me in the right direction:
>>
>> creating template1 database in /var/tmp/db7/base/1 ... FATAL: could not create unique index
>> DETAIL: Table contains duplicated values.
>
> One easy thing to try is to use -n (noclean) and then start a standalone
> backend on the borked dir and issue the commands that initdb was feeding
> at that point (usually embedded in the initdb source).

This step actually runs the BKI file, so it's not embedded in the initdb code. The other thing with this procedure is to clean up any partial data left behind first, i.e. clean the global and base/1 directories. Apart from that it should work fine, I think - probably something like:

  gdb postgres
  set args -boot -x1 -F -d 5 template1
  run < /path/to/bkifile

cheers

andrew
Gregory Stark <stark@enterprisedb.com> writes:
> Hm, I suppose. Though starting a second gdb is a pain. What I've done in the
> past is introduce a usleep(30000000) in strategic points in the backend to
> give me a chance to attach.

There is already an option to sleep early in backend startup for the normal case. Not sure if it works for bootstrap, autovacuum, etc, but I could see making it do so.

The suggestion of single-stepping initdb will only work well if you have a version of gdb that can step into a fork, which is something that's never worked for me :-(. Otherwise the backend will free-run until it blocks waiting for input from initdb, which means you are still stuck for debugging startup crashes ...

regards, tom lane
Hello, Mr. Stark

> Are there any tricks people have for debugging bootstrapping processing? I
> just need to know what index it's trying to build here and that should be
> enough to point me in the right direction:

As Mr. Lane says, it would be best to be able to make postgres sleep for an arbitrary time. The mechanism could be either a command line option or an environment variable (like BOOTSTRAP_SLEEP) or both. I think the environment variable is easier to handle in this case.

How about mimicking postgres with a script that starts gdb to run postgres? That is, rename the original postgres module to postgres.org and create a shell script named postgres like this:

#!/bin/bash
gdb postgres $*

Tell me if it works.
From: "Takayuki Tsunakawa" <tsunakawa.takay@jp.fujitsu.com>
> How about mimicking postgres with a script that starts gdb to run
> postgres? That is, rename the original postgres module to
> postgres.org and create a shell script named postgres like this:
>
> #!/bin/bash
> gdb postgres $*

Sorry, this should be postgres.org $*.
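The environment-variable idea from the message above could look roughly like the following C sketch. It is only an illustration under stated assumptions: BOOTSTRAP_SLEEP is the name floated in this thread, not an existing PostgreSQL setting, and parse_delay_seconds, bootstrap_sleep_if_requested and the one-hour cap are hypothetical choices made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical upper bound so that a typo cannot stall startup for days. */
#define MAX_DELAY_SECS 3600L

/*
 * Parse a delay in whole seconds from a string such as getenv() returns.
 * Returns 0 (no delay) for NULL, empty, non-numeric, negative or
 * out-of-range input, so a bad setting never blocks startup.
 */
unsigned int
parse_delay_seconds(const char *value)
{
    char *end;
    long secs;

    if (value == NULL || *value == '\0')
        return 0;

    errno = 0;
    secs = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || secs < 0 || secs > MAX_DELAY_SECS)
        return 0;

    assert(secs >= 0 && secs <= MAX_DELAY_SECS);
    return (unsigned int) secs;
}

/*
 * Sleep for the number of seconds named by BOOTSTRAP_SLEEP, if it is set.
 * Intended to be called as early as possible in process startup, leaving
 * time to attach a debugger to the new process.
 */
void
bootstrap_sleep_if_requested(void)
{
    unsigned int secs = parse_delay_seconds(getenv("BOOTSTRAP_SLEEP"));

    if (secs > 0)
        sleep(secs);
}
```

With such a hook in place, running initdb with BOOTSTRAP_SLEEP=30 in the environment would give each backend it starts a 30-second window for attaching gdb, assuming the variable is passed through to the child process.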
On 12/18/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> > Hm, I suppose. Though starting a second gdb is a pain. What I've done in the
> > past is introduce a usleep(30000000) in strategic points in the backend to
> > give me a chance to attach.
>
> There is already an option to sleep early in backend startup for the
> normal case. Not sure if it works for bootstrap, autovacuum, etc,
> but I could see making it do so.
You are probably referring to the command-line switch -W to postgres, which translates to the 'PostAuthDelay' GUC variable; I think that kicks in a bit too late! Once I was trying to debug check_root() (called by main()), and had to resort to my own pg_usleep() to make the process wait for debugger-attach. We should somehow pull the sleep() code into main() as far up as possible.
BTW, here's how I made PG sleep until I attached to it (should be done only in the function you intend to debug):
{
    /* spin until a debugger sets waitForDebugger to false */
    bool waitForDebugger = true;
    while (waitForDebugger)
        pg_usleep(1000000);
}
It will wait forever here, until you set a breakpoint on 'while' and then set the var to false.
> The suggestion of single-stepping
> initdb will only work well if you have a version of gdb that can step
> into a fork, which is something that's never worked for me :-(.
> Otherwise the backend will free-run until it blocks waiting for input
> from initdb, which means you are still stuck for debugging startup
> crashes ...
>
> regards, tom lane
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
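A self-contained variant of the wait-for-debugger loop from the message above might look like this. It is a sketch, not PostgreSQL code: the names wait_for_debugger and wait_for_debugger_attach are hypothetical, nanosleep() stands in for pg_usleep(), and the poll limit is an addition so the process cannot hang forever if nobody attaches. The flag is declared volatile so that an optimizing compiler reloads it on every iteration and notices a value written from the debugger.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Cleared from the debugger, e.g. in gdb after attaching to the pid:
 *     set var wait_for_debugger = 0
 *     continue
 */
volatile int wait_for_debugger = 1;

/*
 * Poll the flag every poll_ms milliseconds, at most max_polls times.
 * Returns 1 if the flag was cleared, 0 if the poll limit ran out first.
 */
int
wait_for_debugger_attach(unsigned int max_polls, long poll_ms)
{
    struct timespec delay;
    unsigned int i;

    assert(poll_ms >= 0);
    delay.tv_sec = poll_ms / 1000;
    delay.tv_nsec = (poll_ms % 1000) * 1000000L;

    /* Print the pid so it is easy to find the process to attach to. */
    fprintf(stderr, "pid %ld waiting for debugger\n", (long) getpid());

    for (i = 0; i < max_polls && wait_for_debugger; i++)
        nanosleep(&delay, NULL);

    return wait_for_debugger == 0;
}
```

A call such as wait_for_debugger_attach(60, 1000) near the top of the function under investigation would wait up to about a minute. As noted in the message, this kind of hook belongs only in a private debugging build.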
Zdenek Kotala wrote:
> Gregory Stark wrote:
>> "Martijn van Oosterhout" <kleptog@svana.org> writes:
>>
>>> Here's what I did: you can step over functions in initdb until it fails
>>> (although I already know which part it's failing I guess). Restart. Then
>>> you go into that function and step until the new backend has been
>>> started. At this point you attach another gdb to the backend and let it
>>> run.
>>
>> Hm, I suppose. Though starting a second gdb is a pain. What I've done
>> in the past is introduce a usleep(30000000) in strategic points in the
>> backend to give me a chance to attach.
>
> I use dtrace which wait on write syscall for stderr output and if it is
> happen then stop(freeze) the process and I able to connect into the
> process with debugger and examine what happened.

Here is a dtrace script which "sits" on exec: it stops the postgres process right after the exec. It works on Solaris. The kernel function will probably have a different name on other platforms where dtrace is implemented (FreeBSD, Mac OS).

::exec_common:return
/execname == "initdb"/
{
    exec_pg = 1;
}

syscall:::entry
/execname == "postgres" && exec_pg == 1/
{
    stop();
    printf("Postgres is stopped.\n");
    exec_pg = 0;
}
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes: > On 12/18/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There is already an option to sleep early in backend startup for the >> normal case. Not sure if it works for bootstrap, autovacuum, etc, >> but I could see making it do so. > You are probably referring to the command-line switch -W to posrgres, that > translates to 'PostAuthDelay' GUC variable; I think that kicks in a bit too > late! No, I was thinking of PreAuthDelay. There might be cases where even that is too late in the procedure --- probably not on Unix, but on Windows there's a lot that happens before BackendInitialize. But offhand I don't know how we'd have a configurable delay much earlier ... custom insertions of hardwired delays into the source code are probably the only good approach if you find that, say, guc.c initialization fails in individual backends under Windows. Back at the ranch, though, the question was whether it'd be worth honoring PreAuthDelay in the other startup code paths such as BootstrapMain. regards, tom lane