Thread: Backend message type 0x44 arrived while idle
While testing 6.5 CVS to see what progress Postgres has made with big joins, I get the following error messages:

select a.a, at1.a as t1, at2.a as t2, at3.a as t3, at4.a as t4, at5.a as t5,
       at6.a as t6, at7.a as t7, at8.a as t8, at9.a as t9, at10.a as t10
  from t0 a, t1 at1, t2 at2, t3 at3, t4 at4, t5 at5,
       t6 at6, t7 at7, t8 at8, t9 at9, t10 at10
 where at1.a_id = a.a_id and at2.a_id = a.a_id and at3.a_id = a.a_id
   and at4.a_id = a.a_id and at5.a_id = a.a_id and at6.a_id = a.a_id
   and at7.a_id = a.a_id and at8.a_id = a.a_id and at9.a_id = a.a_id
   and at10.a_id = a.a_id;

Backend message type 0x44 arrived while idle
Backend message type 0x44 arrived while idle
We have lost the connection to the backend, so further processing is impossible. Terminating.

Postgres + psql ate all the memory and swap. Are these messages OK in such a situation?

	Oleg

PS. Postgres still can't serve large joins :-(

_____________________________________________________________
Oleg Bartunov, sci. researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes:
> While testing 6.5 CVS to see what progress Postgres has made with
> big joins, I get the following error messages:

I think there are still some nasty bugs in the GEQO planner. (I assume you have the GEQO threshold set to less than the number of tables in your query?) Bruce did a lot of good cleanup work on the main planner, but GEQO is mostly untouched. I've been hoping to poke at it some more before the 6.5 release.

			regards, tom lane
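[For readers reproducing this: the GEQO threshold Tom mentions can be changed per session. A minimal sketch, assuming the 6.5-era SET GEQO syntax -- the same form appears, commented out, in the test script later in this thread:]

    -- Use the genetic optimizer for queries joining 11 or more tables
    -- (the compiled-in default discussed below):
    SET GEQO TO 'ON=11';

    -- Or disable GEQO entirely, so the standard optimizer handles everything:
    SET GEQO TO 'OFF';

    -- Inspect the current setting:
    SHOW GEQO;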
> Oleg Bartunov <oleg@sai.msu.su> writes:
> > While testing 6.5 CVS to see what progress Postgres has made with
> > big joins, I get the following error messages:
>
> I think there are still some nasty bugs in the GEQO planner. (I assume
> you have the GEQO threshold set to less than the number of tables in
> your query?) Bruce did a lot of good cleanup work on the main planner,
> but GEQO is mostly untouched. I've been hoping to poke at it some more
> before the 6.5 release.

I hope I didn't break it.

--
Bruce Momjian                   | http://www.op.net/~candle
maillist@candle.pha.pa.us       | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup.    | Drexel Hill, Pennsylvania 19026
I wrote:
> Oleg Bartunov <oleg@sai.msu.su> writes:
>> While testing 6.5 CVS to see what progress Postgres has made with
>> big joins, I get the following error messages:

> I think there are still some nasty bugs in the GEQO planner.

I have just committed some changes that fix bugs in the GEQO planner and limit its memory usage. It should now be possible to use GEQO even for queries that join a very large number of tables --- at least from the standpoint of not running out of memory during planning. (It can still take a while :-(. I think that the default GEQO parameter settings may be configured to use too many generations, but haven't poked at this yet.)

I have observed that the regular optimizer requires about 50MB to plan some ten-way joins, and can exceed my system's 128MB process data limit on some eleven-way joins. We currently have the GEQO threshold set at 11, which prevents the latter case by default --- but 50MB is a lot. I wonder whether we shouldn't back the GEQO threshold off to 10. (When I suggested setting it to 11, I was only looking at speed relative to GEQO, not memory usage. There is now a *big* difference in memory usage...)

Comments?

			regards, tom lane
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Bruce Momjian
> I have observed that the regular optimizer requires about 50MB to plan
> some ten-way joins, and can exceed my system's 128MB process data limit
> on some eleven-way joins. We currently have the GEQO threshold set at
> 11, which prevents the latter case by default --- but 50MB is a lot.
> I wonder whether we shouldn't back the GEQO threshold off to 10.
> (When I suggested setting it to 11, I was only looking at speed relative
> to GEQO, not memory usage. There is now a *big* difference in memory
> usage...)  Comments?

You chose 11 by comparing GEQO with non-GEQO. I think you will find that with your improved GEQO, GEQO is faster for a smaller number of joins, preventing the memory problem. Can you check the speeds again?

--
Bruce Momjian                   | http://www.op.net/~candle
maillist@candle.pha.pa.us       | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup.    | Drexel Hill, Pennsylvania 19026
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Oleg Bartunov
On Sun, 16 May 1999, Bruce Momjian wrote:

> Date: Sun, 16 May 1999 21:17:30 -0400 (EDT)
> From: Bruce Momjian <maillist@candle.pha.pa.us>
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: Oleg Bartunov <oleg@sai.msu.su>, pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
>
> You chose 11 by comparing GEQO with non-GEQO. I think you will find
> that with your improved GEQO, GEQO is faster for a smaller number of
> joins, preventing the memory problem. Can you check the speeds again?

I confirm that a big join with 11 tables no longer eats all memory and swap on my Linux box as before, but it runs *forever* :-). It has already taken 18 minutes of CPU (P200, 64Mb)! Will wait.

8438 postgres  12   0 11104 3736  2620 R     0 98.6  5.9  18:16 postmaster

This query doesn't (explicitly) use GEQO:

select t0.a, t1.a as t1, t2.a as t2, t3.a as t3, t4.a as t4, t5.a as t5,
       t6.a as t6, t7.a as t7, t8.a as t8, t9.a as t9, t10.a as t10
  from t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10
 where t1.a_id = t0.a_t1_id and t2.a_id = t0.a_t2_id and t3.a_id = t0.a_t3_id
   and t4.a_id = t0.a_t4_id and t5.a_id = t0.a_t5_id and t6.a_id = t0.a_t6_id
   and t7.a_id = t0.a_t7_id and t8.a_id = t0.a_t8_id and t9.a_id = t0.a_t9_id
   and t10.a_id = t0.a_t10_id;

	Regards,
		Oleg

_____________________________________________________________
Oleg Bartunov, sci. researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Tom Lane
Oleg Bartunov <oleg@sai.msu.su> writes:
> I confirm that a big join with 11 tables no longer eats all memory and
> swap on my Linux box as before, but it runs *forever* :-). It has
> already taken 18 minutes of CPU (P200, 64Mb)! Will wait.

18 minutes??? It takes barely over a minute on my aging 75MHz HP-PA box. (Practically all of which is planning time, since there are only 10 tuples to join... or are you doing this on a realistically sized set of tables now?)

> This query doesn't (explicitly) use GEQO:
> [11-way join query snipped]

No, but since there are 11 tables mentioned, it will be sent to the GEQO optimizer anyway with the default GEQO threshold of 11...

			regards, tom lane
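[For comparison's sake, raising the threshold above the number of tables would send the same query through the standard optimizer instead. A sketch, using the 6.5-era SET GEQO syntax that appears (commented out) in the test script later in the thread:]

    -- Plan 11-table joins with the standard optimizer by requiring
    -- 12 or more tables before GEQO kicks in (beware the planning
    -- memory usage Tom reports above):
    SET GEQO TO 'ON=12';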
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Oleg Bartunov
On Mon, 17 May 1999, Tom Lane wrote:

> Date: Mon, 17 May 1999 09:44:18 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Oleg Bartunov <oleg@sai.msu.su>
> Cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
>
> 18 minutes??? It takes barely over a minute on my aging 75MHz HP-PA
> box. (Practically all of which is planning time, since there are only
> 10 tuples to join... or are you doing this on a realistically sized
> set of tables now?)

Oops, I found the problem. I modified my test script to add 'vacuum analyze' after creating the test data, and now it runs really fast! Great!

Now I'm wondering: why do I need vacuum analyze after creating test data and indices? What's the state of the discussion in -hackers?

	Regards,
		Oleg

> [rest of quoted message snipped]

_____________________________________________________________
Oleg Bartunov, sci. researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Oleg Bartunov
Tom,

I was so happy with the problem solved that I decided to play with more joins :-). Queries were processed very quickly, but at 14 tables the backend died:

COPY t13 FROM STDIN USING DELIMITERS '|';
vacuum analyze;
VACUUM

select t0.a, t1.a as t1, t2.a as t2, t3.a as t3, t4.a as t4, t5.a as t5,
       t6.a as t6, t7.a as t7, t8.a as t8, t9.a as t9, t10.a as t10,
       t11.a as t11, t12.a as t12, t13.a as t13
  from t0, t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13
 where t1.a_id = t0.a_t1_id and t2.a_id = t0.a_t2_id and t3.a_id = t0.a_t3_id
   and t4.a_id = t0.a_t4_id and t5.a_id = t0.a_t5_id and t6.a_id = t0.a_t6_id
   and t7.a_id = t0.a_t7_id and t8.a_id = t0.a_t8_id and t9.a_id = t0.a_t9_id
   and t10.a_id = t0.a_t10_id and t11.a_id = t0.a_t11_id
   and t12.a_id = t0.a_t12_id and t13.a_id = t0.a_t13_id;
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
We have lost the connection to the backend, so further processing is impossible. Terminating.

Tom, could you try my script on your machine? I attached it; you need perl to run it:

mkjoindata.pl | psql test

	Regards,
		Oleg

On Mon, 17 May 1999, Tom Lane wrote:
> [Tom's reply of Mon, 17 May 1999 09:44:18 -0400 quoted in full, snipped]

_____________________________________________________________
Oleg Bartunov, sci. researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

#!/usr/local/bin/perl -w
#use strict;

# Number of tables - classificators
my $ntables = 14;
# Number of joins, must be <= $ntables !
my $njoins = 13;
# dimensions of tables
# (NB: only 11 entries, for t0..t10 -- with $ntables = 14, tables
#  t11..t13 get no rows; this is the bug found later in the thread)
my @nitems = ( 200, 200, 10, 10, 10, 10, 10, 10, 10, 10, 10 );

die("njoins ($njoins) must be <= ntables ($ntables) !") if ( $njoins > $ntables );

my $tables = ( $njoins < $ntables ) ? $njoins : $ntables;

my $SQL = '';

# classificators
for ( $i = 0; $i <= $tables; $i++ ) {
    my $table = 't' . $i;
    $SQL .= qq^drop table $table;\ncreate table $table ( a_id int4 NOT NULL, a varchar(10)^;
    if ( $i == 0 ) {    # main table
        for ( $k = 1; $k <= $ntables; $k++ ) {
            $SQL .= ', a_t' . $k . '_id int4';
        }
    }
    $SQL .= qq^);\ncreate index a_id_$table on $table (a_id);\n^;
    if ( $i == 0 ) {    # main table - create additional indices
        for ( $k = 1; $k <= $ntables; $k++ ) {
            my $table = 't' . $k;
            $SQL .= 'create index a_' . $table . '_id_t0 on t0 (a_' . $table . '_id);';
            $SQL .= "\n";
        }
    }
    $SQL .= qq^COPY $table FROM STDIN USING DELIMITERS '|';\n^;
    srand($i);
    for ( $j = 1; $j <= $nitems[$i]; $j++ ) {
        my @cols = ();
        # push @cols, (int(rand($nitems[$i])), 'a'.$table.$j);
        push @cols, ( $j, 'a' . $table );
        if ( $i == 0 ) {    # main table - add cols
            for ( $k = 1; $k <= $ntables; $k++ ) {
                # push @cols, int(rand($nitems[$k]));
                push @cols, $j;
            }
        }
        $SQL .= join( '|', @cols ) . "\n";
    }
    $SQL .= qq^\\.\n^;
}

# vacuum analyze
$SQL .= qq^vacuum analyze;\n^;

print "$SQL\n";

# Query with joins
my $SELECT = '';
# Try GEQO
#$SELECT .= qq^set GEQO to 'ON=$njoins';^;
#$SELECT .= 'explain ';
$SELECT .= 'select t0.a';
for ( $i = 1; $i <= $njoins; $i++ ) {
    $SELECT .= ',t' . $i . '.a as t' . $i;
}
$SELECT .= "\n" . ' from t0 ';
for ( $i = 1; $i <= $njoins; $i++ ) {
    $SELECT .= ',t' . $i;
}
$SELECT .= "\n" . ' where t1.a_id = t0.a_t1_id ';
for ( $i = 2; $i <= $njoins; $i++ ) {
    $SELECT .= 'and t' . $i . '.a_id=t0.a_t' . $i . '_id ';
}
$SELECT .= ';';
print $SELECT, "\n";
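[In hindsight, a quick row-count check against the script's tables would have located the missing data. A sketch, using the table names from the script above:]

    -- t0..t10 should report the sizes given in @nitems;
    -- with the 11-entry @nitems above, t11..t13 come back empty:
    select count(*) from t10;
    select count(*) from t11;
    select count(*) from t13;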
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Oleg Bartunov
Oops, it seems that was my fault: I hadn't specified @nitems (sizes of tables) for all of the tables. Now it works fine.

Tom, even if it was my fault, why did postgres die?

	Regards,
		Oleg

On Mon, 17 May 1999, Oleg Bartunov wrote:
> I was so happy with the problem solved that I decided to play with more
> joins :-). Queries were processed very quickly, but at 14 tables the
> backend died:
>
> [14-way join query and crash output quoted above, snipped]
>
> Tom, could you try my script on your machine?
> I attached it; you need perl to run it:
>
> mkjoindata.pl | psql test

_____________________________________________________________
Oleg Bartunov, sci. researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Tom Lane
Oleg Bartunov <oleg@sai.msu.su> writes:
> I found the problem. I modified my test script to add 'vacuum analyze'
> after creating the test data, and now it runs really fast! Great!
> Now I'm wondering: why do I need vacuum analyze after creating test
> data and indices?

VACUUM ANALYZE would create pg_statistics entries for the tables, which'd allow the optimizer to make better estimates of restriction and join selectivities. I expect that it changes the plan being used; what does EXPLAIN say with and without the analyze?

			regards, tom lane
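[A sketch of the comparison Tom suggests, using two of the test script's tables; the full 11-way query works the same way:]

    -- Plan chosen with only the default statistics:
    EXPLAIN select t0.a, t1.a from t0, t1 where t1.a_id = t0.a_t1_id;

    -- Gather statistics, then look at the plan again:
    vacuum analyze;
    EXPLAIN select t0.a, t1.a from t0, t1 where t1.a_id = t0.a_t1_id;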
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Tom Lane
Oleg Bartunov <oleg@sai.msu.su> writes:
> it seems that was my fault: I hadn't specified @nitems (sizes of
> tables) for all of the tables. Now it works fine.
> Tom, even if it was my fault, why did postgres die?

I don't know --- I don't see it here. I just ran your script as given, and it worked. (It produced zero rows of output, since the missing nitems values meant that no data was loaded into the last few tables ... but there was no backend crash.)

Is it crashing because of running out of memory, or something else? Can you provide a backtrace?

			regards, tom lane
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Oleg Bartunov
On Mon, 17 May 1999, Tom Lane wrote:

> Date: Mon, 17 May 1999 11:24:31 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: Oleg Bartunov <oleg@sai.msu.su>
> Cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
>
> I don't know --- I don't see it here. I just ran your script as given,
> and it worked. (It produced zero rows of output, since the missing
> nitems values meant that no data was loaded into the last few tables ...
> but there was no backend crash.)
>
> Is it crashing because of running out of memory, or something else?

No, memory is fine. It just dies.

> Can you provide a backtrace?

Will try to reproduce the crash. How can I debug psql?

_____________________________________________________________
Oleg Bartunov, sci. researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes:
>> Can you provide a backtrace?
> Will try to reproduce the crash. How can I debug psql?

There should be a core file left in the database subdirectory, eg

    /usr/local/pgsql/data/base/DB/core

where DB represents the name of the database you used. As the postgres user, do this (with appropriate pathname changes of course):

    gdb /usr/local/pgsql/bin/postgres /usr/local/pgsql/data/base/DB/core

and when you get the (gdb) prompt, enter "bt" for backtrace. You should get a few dozen lines of printout, more or less like this:

(gdb) bt
#0  AllocSetAlloc (set=0x40254600, size=1076446936) at aset.c:267
#1  0x169314 in PortalHeapMemoryAlloc (this=0x40254600, size=36)
    at portalmem.c:264
#2  0x168bb4 in MemoryContextAlloc (context=0x4007d940, size=36)
    at mcxt.c:230
#3  0xe4d88 in newNode (size=36, tag=T_Resdom) at nodes.c:41
#4  0xea92c in makeResdom (resno=17920, restype=23, restypmod=-1,
    resname=0x0, reskey=0, reskeyop=0, resjunk=0) at makefuncs.c:102
#5  0x101448 in create_tl_element (var=0x402402e0, resdomno=36)
    at tlist.c:135
#6  0xf689c in new_join_tlist (tlist=0x40254600, first_resdomno=36)
    at joinrels.c:286
...

The "q" command will get you out of gdb after you've copied and pasted this info.

BTW, if you did not build the backend with "-g" included in CFLAGS, you will get a much less complete backtrace ... but it may still tell us something of use.

			regards, tom lane
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Tom Lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
>> I have observed that the regular optimizer requires about 50MB to plan
>> some ten-way joins, and can exceed my system's 128MB process data limit
>> on some eleven-way joins. We currently have the GEQO threshold set at
>> 11, which prevents the latter case by default --- but 50MB is a lot.
>> I wonder whether we shouldn't back the GEQO threshold off to 10.
>> (When I suggested setting it to 11, I was only looking at speed relative
>> to GEQO, not memory usage. There is now a *big* difference in memory
>> usage...)  Comments?

> You chose 11 by comparing GEQO with non-GEQO. I think you will find
> that with your improved GEQO, GEQO is faster for a smaller number of
> joins, preventing the memory problem. Can you check the speeds again?

Bruce, I have rerun a couple of tests and am getting numbers like these (times in seconds):

                     # tables joined
                     ...   10    11  ...

    STD OPTIMIZER          24   115
    GEQO                   45    55

This is after tweaking the GEQO parameters to improve speed slightly in the default case. (Setting EFFORT=LOW reduces the 11-way plan time to about 40 sec; setting EFFORT=HIGH makes it about 70.)

The breakpoint for speed is still clearly at GEQO threshold 11. *However*, the regular optimizer uses close to 120MB of memory to plan these 11-way joins, and that's excessive (especially since that's not even counting the space that will be used for execution...). Until we can do something about reclaiming space more effectively, I recommend reducing the default GEQO threshold to 10.

			regards, tom lane
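[Until such a default change ships, the same effect is available per session. A sketch, assuming the 6.5-era SET GEQO syntax used elsewhere in this thread:]

    -- Hand 10-way and larger joins to GEQO, trading some plan quality
    -- for bounded planning memory:
    SET GEQO TO 'ON=10';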
Re: [HACKERS] GEQO optimizer (was Re: Backend message type 0x44 arrived while idle)
From: Bruce Momjian
> > You chose 11 by comparing GEQO with non-GEQO. I think you will find
> > that with your improved GEQO, GEQO is faster for a smaller number of
> > joins, preventing the memory problem. Can you check the speeds again?
>
> Bruce, I have rerun a couple of tests and am getting numbers like these
> (times in seconds):
>
>                      # tables joined
>                      ...   10    11  ...
>
>     STD OPTIMIZER          24   115
>     GEQO                   45    55
>
> This is after tweaking the GEQO parameters to improve speed slightly
> in the default case. (Setting EFFORT=LOW reduces the 11-way plan time
> to about 40 sec; setting EFFORT=HIGH makes it about 70.)
>
> The breakpoint for speed is still clearly at GEQO threshold 11.
> *However*, the regular optimizer uses close to 120MB of memory to
> plan these 11-way joins, and that's excessive (especially since that's
> not even counting the space that will be used for execution...).
> Until we can do something about reclaiming space more effectively,
> I recommend reducing the default GEQO threshold to 10.

Agreed.

--
Bruce Momjian                   | http://www.op.net/~candle
maillist@candle.pha.pa.us       | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup.    | Drexel Hill, Pennsylvania 19026