Thread: cluster test
For some reason the cluster test fails on my machine due to a different order of the result rows when I run "installcheck" instead of "check". Is there a problem adding an ORDER BY to it? Joachim
Attachment
Joachim Wieland <joe@mcknight.de> writes: > For some reason the cluster test fails on my machine due to a different > order of the result rows when I run "installcheck" instead of "check". Is > there a problem adding an ORDER BY to it? We should find out why that's happening rather than just throwing an ORDER BY at it. Considering the number of buildfarm machines that aren't showing any such problem, there must be something odd about yours. What's the platform? What plan is being chosen for that SELECT? regards, tom lane
On Fri, May 25, 2007 at 10:33:41AM -0400, Tom Lane wrote: > We should find out why that's happening rather than just throwing an > ORDER BY at it. Considering the number of buildfarm machines that > aren't showing any such problem, there must be something odd about > yours. What's the platform? What plan is being chosen for that SELECT? It's regular Debian Linux 2.6 on ix86. EXPLAIN SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; QUERY PLAN --------------------------------------------------------------------------------------------------- Index Scan using pg_constraint_conrelid_index on pg_constraint (cost=0.00..8.27 rows=1 width=64) Index Cond: (conrelid = 170982::oid) (2 rows) SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; conname ---------------- clstr_tst_con clstr_tst_pkey (2 rows) As said before, it only happens with "make installcheck", not "make check". Joachim
Joachim Wieland wrote: > On Fri, May 25, 2007 at 10:33:41AM -0400, Tom Lane wrote: > > We should find out why that's happening rather than just throwing an > > ORDER BY at it. Considering the number of buildfarm machines that > > aren't showing any such problem, there must be something odd about > > yours. What's the platform? What plan is being chosen for that SELECT? > > It's regular Debian Linux 2.6 on ix86. > > EXPLAIN SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; > QUERY PLAN > --------------------------------------------------------------------------------------------------- > Index Scan using pg_constraint_conrelid_index on pg_constraint (cost=0.00..8.27 rows=1 width=64) > Index Cond: (conrelid = 170982::oid) > (2 rows) > > > SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; > conname > ---------------- > clstr_tst_con > clstr_tst_pkey > (2 rows) > > As said before, it only happens with "make installcheck", not "make check". Maybe there's an autovacuum run just before the test that causes pg_constraint entries to be reordered? -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 Oh, oh, las chicas galacianas, lo harán por las perlas, ¡Y las de Arrakis por el agua! Pero si buscas damas Que se consuman como llamas, ¡Prueba una hija de Caladan! (Gurney Halleck)
Joachim Wieland <joe@mcknight.de> writes: > As said before, it only happens with "make installcheck", not "make check". Curious. I'm not sure if the buildfarm tries to isolate the installation against its locale environment --- can you check the locale used by the install case? regards, tom lane
Joachim Wieland <joe@mcknight.de> writes: > EXPLAIN SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; > QUERY PLAN > --------------------------------------------------------------------------------------------------- > Index Scan using pg_constraint_conrelid_index on pg_constraint (cost=0.00..8.27 rows=1 width=64) > Index Cond: (conrelid = 170982::oid) > (2 rows) Actually, can the locale idea --- it looks like a plan-instability thing. On my machines I get results like this: regression=# explain SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; QUERY PLAN -------------------------------------------------------------- Seq Scan on pg_constraint (cost=0.00..7.35 rows=1 width=64) Filter: (conrelid = 28856::oid) (2 rows) regression=# SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; conname ---------------- clstr_tst_pkey clstr_tst_con (2 rows) regression=# set enable_seqscan TO 0; SET regression=# explain SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; QUERY PLAN --------------------------------------------------------------------------------------------------- Index Scan using pg_constraint_conrelid_index on pg_constraint (cost=0.00..8.27 rows=1 width=64) Index Cond: (conrelid = 28856::oid) (2 rows) regression=# SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; conname ---------------- clstr_tst_con clstr_tst_pkey (2 rows) This is in the regression database after a completed regression run, so it's possible that it's a bit different state from what's seen at the instant the cluster test was running, but it sure looks like the "expected" results are what you get from a seqscan. Would you force a seqscan and see what EXPLAIN shows as the cost on your machine? regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > This is in the regression database after a completed regression run, so > it's possible that it's a bit different state from what's seen at the > instant the cluster test was running, but it sure looks like the > "expected" results are what you get from a seqscan. Would you force a > seqscan and see what EXPLAIN shows as the cost on your machine? Perhaps this comes down to 64 vs 32 bit datum and aligments and therefore different size tables which because the planner does the lseek to measure the table size shows up as different estimates for sequential scan costs? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Perhaps this comes down to 64 vs 32 bit datum and aligments and therefore > different size tables which because the planner does the lseek to measure the > table size shows up as different estimates for sequential scan costs? But we've got plenty of both in the buildfarm, and none of them are showing this failure. So I'm curious to know what's really different about Joachim's installation. It seems he must have a pg_constraint table enough larger than "normal" to discourage the seqscan, but where did that come from? There's only one row in pg_constraint in standard template0 --- could he be working with a custom system that has many more? regards, tom lane
On Fri, May 25, 2007 at 12:09:43PM -0400, Tom Lane wrote: > This is in the regression database after a completed regression run, so > it's possible that it's a bit different state from what's seen at the > instant the cluster test was running, but it sure looks like the > "expected" results are what you get from a seqscan. Would you force a > seqscan and see what EXPLAIN shows as the cost on your machine? I have appended the relevant parts of the modified cluster script. I haven't pointed out clear enough that I'm running this against a fresh database cluster, no vacuum, no other databases, no config changes, just initdb, postmaster startup and installcheck. I usually have es_ES@euro here but I get the same with locale C. Feel free to ask for more output. Oh, btw, its a regular 32bit box as well. Joachim
Attachment
Joachim Wieland <joe@mcknight.de> writes: > SET enable_bitmapscan = 0; > EXPLAIN SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass; > QUERY PLAN > --------------------------------------------------------------- > Seq Scan on pg_constraint (cost=0.00..27.15 rows=1 width=64) > Filter: (conrelid = 54538::oid) > (2 rows) Hm, well, that's why it doesn't want to use a seqscan, but why is the estimate so high? I get 7.35 on my boxes, vs 8.27 (which does agree with yours) for the indexscans. Stranger and stranger. Would you try inserting a "vacuum verbose pg_constraint" into the test as well? Maybe that will tell something relevant. regards, tom lane
On Fri, May 25, 2007 at 05:58:58PM -0400, Tom Lane wrote: > Would you try inserting a "vacuum verbose pg_constraint" into the test > as well? Maybe that will tell something relevant. It's weird. I have run a "make check" with the serial schedule and a regular installcheck on a fresh DB. Vacuum verbose shows the same output on both but different plans are chosen. Joachim
Attachment
Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: >> Perhaps this comes down to 64 vs 32 bit datum and aligments and therefore >> different size tables which because the planner does the lseek to measure the >> table size shows up as different estimates for sequential scan costs? > > But we've got plenty of both in the buildfarm, and none of them are > showing this failure. So I'm curious to know what's really different > about Joachim's installation. It seems he must have a pg_constraint > table enough larger than "normal" to discourage the seqscan, but where > did that come from? There's only one row in pg_constraint in standard > template0 --- could he be working with a custom system that has many > more? Or maybe some non-default values in postgresql.conf? Like random_page_cost? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sat, May 26, 2007 at 11:40:52AM +0100, Heikki Linnakangas wrote: > >But we've got plenty of both in the buildfarm, and none of them are > >showing this failure. So I'm curious to know what's really different > >about Joachim's installation. It seems he must have a pg_constraint > >table enough larger than "normal" to discourage the seqscan, but where > >did that come from? There's only one row in pg_constraint in standard > >template0 --- could he be working with a custom system that has many > >more? > Or maybe some non-default values in postgresql.conf? Like random_page_cost? No, I use what initdb creates: $ cat postgresql.conf | grep ^[a-zA-Z] max_connections = 100 # (change requires restart) shared_buffers = 24MB # min 128kB or max_connections*16kB max_fsm_pages = 153600 # min max_fsm_relations*16, 6 bytes each datestyle = 'iso, dmy' lc_messages = 'es_ES@euro' # locale for system error message lc_monetary = 'es_ES@euro' # locale for monetary formatting lc_numeric = 'es_ES@euro' # locale for number formatting lc_time = 'es_ES@euro' # locale for time formatting Joachim
Joachim Wieland <joe@mcknight.de> writes: > On Sat, May 26, 2007 at 11:40:52AM +0100, Heikki Linnakangas wrote: >> Or maybe some non-default values in postgresql.conf? Like random_page_cost? > No, I use what initdb creates: Curiouser and curiouser. You still get the indexscan as preferred if you EXPLAIN the query after the regression tests complete, right? Could you step through cost_seqscan and see how it's arriving at such a high value? regards, tom lane
On Sat, May 26, 2007 at 12:14:14PM -0400, Tom Lane wrote: > Curiouser and curiouser. You still get the indexscan as preferred if > you EXPLAIN the query after the regression tests complete, right? > Could you step through cost_seqscan and see how it's arriving at such > a high value? Ok... I figured it out... When I was doing the guc patch I modified my startdb.sh script such that it passes settings on the command line to postmaster. There I changed seq_page_cost. I know that I have removed those settings on my laptop computer since then but I seem to have copied it here on my other machine at that time and so it started with a different seq_page_cost... *blushing*... Sorry for the noise... Interesting though that it only made a difference to the cluster test... Joachim
Tom Lane wrote: > Joachim Wieland <joe@mcknight.de> writes: > >> As said before, it only happens with "make installcheck", not "make check". >> > > Curious. I'm not sure if the buildfarm tries to isolate the > installation against its locale environment --- can you check the locale > used by the install case? > Buildfarm does: @initout = `"bin/initdb" --no-locale data 2>&1`; If it didn't we'd have had problems with much more for much longer. cheers andrew