Thread: cluster test

cluster test

From

Joachim Wieland

Date:

25 May 2007, 12:26:08

For some reason the cluster test fails on my machine due to a different
order of the result rows when I run "installcheck" instead of "check". Is
there a problem adding an ORDER BY to it?

Joachim

Attachment

cluster-regress.diff

Re: cluster test

From

Tom Lane

Date:

25 May 2007, 14:33:56

Joachim Wieland <joe@mcknight.de> writes:
> For some reason the cluster test fails on my machine due to a different
> order of the result rows when I run "installcheck" instead of "check". Is
> there a problem adding an ORDER BY to it?

We should find out why that's happening rather than just throwing an
ORDER BY at it.  Considering the number of buildfarm machines that
aren't showing any such problem, there must be something odd about
yours.  What's the platform?  What plan is being chosen for that SELECT?

            regards, tom lane

Re: cluster test

From

Joachim Wieland

Date:

25 May 2007, 15:28:44

On Fri, May 25, 2007 at 10:33:41AM -0400, Tom Lane wrote:
> We should find out why that's happening rather than just throwing an
> ORDER BY at it.  Considering the number of buildfarm machines that
> aren't showing any such problem, there must be something odd about
> yours.  What's the platform?  What plan is being chosen for that SELECT?

It's regular Debian Linux 2.6 on ix86.

EXPLAIN SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using pg_constraint_conrelid_index on pg_constraint (cost=0.00..8.27 rows=1 width=64)
   Index Cond: (conrelid = 170982::oid)
(2 rows)


SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
    conname
----------------
 clstr_tst_con
 clstr_tst_pkey
(2 rows)

As said before, it only happens with "make installcheck", not "make check".


Joachim

Re: cluster test

From

Alvaro Herrera

Date:

25 May 2007, 15:39:52

Joachim Wieland wrote:
> On Fri, May 25, 2007 at 10:33:41AM -0400, Tom Lane wrote:
> > We should find out why that's happening rather than just throwing an
> > ORDER BY at it.  Considering the number of buildfarm machines that
> > aren't showing any such problem, there must be something odd about
> > yours.  What's the platform?  What plan is being chosen for that SELECT?
>
> It's regular Debian Linux 2.6 on ix86.
>
> EXPLAIN SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
>                                             QUERY PLAN
> ---------------------------------------------------------------------------------------------------
>  Index Scan using pg_constraint_conrelid_index on pg_constraint (cost=0.00..8.27 rows=1 width=64)
>    Index Cond: (conrelid = 170982::oid)
> (2 rows)
>
>
> SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
>     conname
> ----------------
>  clstr_tst_con
>  clstr_tst_pkey
> (2 rows)
>
> As said before, it only happens with "make installcheck", not "make check".

Maybe there's an autovacuum run just before the test that causes
pg_constraint entries to be reordered?

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
Oh, oh, las chicas galacianas, lo harán por las perlas,
¡Y las de Arrakis por el agua! Pero si buscas damas
Que se consuman como llamas, ¡Prueba una hija de Caladan! (Gurney Halleck)

Re: cluster test

From

Tom Lane

Date:

25 May 2007, 16:00:42

Joachim Wieland <joe@mcknight.de> writes:
> As said before, it only happens with "make installcheck", not "make check".

Curious.  I'm not sure if the buildfarm tries to isolate the
installation against its locale environment --- can you check the locale
used by the install case?

            regards, tom lane

Re: cluster test

From

Tom Lane

Date:

25 May 2007, 16:09:49

Joachim Wieland <joe@mcknight.de> writes:
> EXPLAIN SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
>                                             QUERY PLAN
> ---------------------------------------------------------------------------------------------------
>  Index Scan using pg_constraint_conrelid_index on pg_constraint (cost=0.00..8.27 rows=1 width=64)
>    Index Cond: (conrelid = 170982::oid)
> (2 rows)

Actually, can the locale idea --- it looks like a plan-instability
thing.  On my machines I get results like this:

regression=# explain SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on pg_constraint  (cost=0.00..7.35 rows=1 width=64)
   Filter: (conrelid = 28856::oid)
(2 rows)

regression=# SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
    conname
----------------
 clstr_tst_pkey
 clstr_tst_con
(2 rows)

regression=# set enable_seqscan TO 0;
SET
regression=# explain SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using pg_constraint_conrelid_index on pg_constraint  (cost=0.00..8.27 rows=1 width=64)
   Index Cond: (conrelid = 28856::oid)
(2 rows)

regression=# SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
    conname
----------------
 clstr_tst_con
 clstr_tst_pkey
(2 rows)

This is in the regression database after a completed regression run, so
it's possible that it's a bit different state from what's seen at the
instant the cluster test was running, but it sure looks like the
"expected" results are what you get from a seqscan.  Would you force a
seqscan and see what EXPLAIN shows as the cost on your machine?

            regards, tom lane

Re: cluster test

From

Gregory Stark

Date:

25 May 2007, 16:55:29

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> This is in the regression database after a completed regression run, so
> it's possible that it's a bit different state from what's seen at the
> instant the cluster test was running, but it sure looks like the
> "expected" results are what you get from a seqscan.  Would you force a
> seqscan and see what EXPLAIN shows as the cost on your machine?

Perhaps this comes down to 64 vs 32 bit datum and aligments and therefore
different size tables which because the planner does the lseek to measure the
table size shows up as different estimates for sequential scan costs?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: cluster test

From

Tom Lane

Date:

25 May 2007, 18:02:34

Gregory Stark <stark@enterprisedb.com> writes:
> Perhaps this comes down to 64 vs 32 bit datum and aligments and therefore
> different size tables which because the planner does the lseek to measure the
> table size shows up as different estimates for sequential scan costs?

But we've got plenty of both in the buildfarm, and none of them are
showing this failure.  So I'm curious to know what's really different
about Joachim's installation.  It seems he must have a pg_constraint
table enough larger than "normal" to discourage the seqscan, but where
did that come from?  There's only one row in pg_constraint in standard
template0 --- could he be working with a custom system that has many
more?

            regards, tom lane

Re: cluster test

From

Joachim Wieland

Date:

25 May 2007, 21:21:23

On Fri, May 25, 2007 at 12:09:43PM -0400, Tom Lane wrote:
> This is in the regression database after a completed regression run, so
> it's possible that it's a bit different state from what's seen at the
> instant the cluster test was running, but it sure looks like the
> "expected" results are what you get from a seqscan.  Would you force a
> seqscan and see what EXPLAIN shows as the cost on your machine?

I have appended the relevant parts of the modified cluster script.
I haven't pointed out clear enough that I'm running this against a fresh
database cluster, no vacuum, no other databases, no config changes, just initdb,
postmaster startup and installcheck. I usually have es_ES@euro here but I
get the same with locale C.

Feel free to ask for more output.

Oh, btw, its a regular 32bit box as well.

Joachim

Attachment

cluster.out

Re: cluster test

From

Tom Lane

Date:

25 May 2007, 21:59:08

Joachim Wieland <joe@mcknight.de> writes:
> SET enable_bitmapscan = 0;
> EXPLAIN SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
>                           QUERY PLAN
> ---------------------------------------------------------------
>  Seq Scan on pg_constraint  (cost=0.00..27.15 rows=1 width=64)
>    Filter: (conrelid = 54538::oid)
> (2 rows)

Hm, well, that's why it doesn't want to use a seqscan, but why is the
estimate so high?  I get 7.35 on my boxes, vs 8.27 (which does agree
with yours) for the indexscans.  Stranger and stranger.

Would you try inserting a "vacuum verbose pg_constraint" into the test
as well?  Maybe that will tell something relevant.

            regards, tom lane

Re: cluster test

From

Joachim Wieland

Date:

26 May 2007, 09:58:51

On Fri, May 25, 2007 at 05:58:58PM -0400, Tom Lane wrote:
> Would you try inserting a "vacuum verbose pg_constraint" into the test
> as well?  Maybe that will tell something relevant.

It's weird. I have run a "make check" with the serial schedule and a regular
installcheck on a fresh DB. Vacuum verbose shows the same output on both but
different plans are chosen.

Joachim

Attachment

Re: cluster test

From

Heikki Linnakangas

Date:

26 May 2007, 10:43:29

Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> Perhaps this comes down to 64 vs 32 bit datum and aligments and therefore
>> different size tables which because the planner does the lseek to measure the
>> table size shows up as different estimates for sequential scan costs?
>
> But we've got plenty of both in the buildfarm, and none of them are
> showing this failure.  So I'm curious to know what's really different
> about Joachim's installation.  It seems he must have a pg_constraint
> table enough larger than "normal" to discourage the seqscan, but where
> did that come from?  There's only one row in pg_constraint in standard
> template0 --- could he be working with a custom system that has many
> more?

Or maybe some non-default values in postgresql.conf? Like random_page_cost?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: cluster test

From

Joachim Wieland

Date:

26 May 2007, 11:40:41

On Sat, May 26, 2007 at 11:40:52AM +0100, Heikki Linnakangas wrote:
> >But we've got plenty of both in the buildfarm, and none of them are
> >showing this failure.  So I'm curious to know what's really different
> >about Joachim's installation.  It seems he must have a pg_constraint
> >table enough larger than "normal" to discourage the seqscan, but where
> >did that come from?  There's only one row in pg_constraint in standard
> >template0 --- could he be working with a custom system that has many
> >more?

> Or maybe some non-default values in postgresql.conf? Like random_page_cost?

No, I use what initdb creates:

$ cat postgresql.conf | grep ^[a-zA-Z]
max_connections = 100                   # (change requires restart)
shared_buffers = 24MB                   # min 128kB or max_connections*16kB
max_fsm_pages = 153600                  # min max_fsm_relations*16, 6 bytes each
datestyle = 'iso, dmy'
lc_messages = 'es_ES@euro'                      # locale for system error message
lc_monetary = 'es_ES@euro'                      # locale for monetary formatting
lc_numeric = 'es_ES@euro'                       # locale for number formatting
lc_time = 'es_ES@euro'                          # locale for time formatting


Joachim

Re: cluster test

From

Tom Lane

Date:

26 May 2007, 16:14:44

Joachim Wieland <joe@mcknight.de> writes:
> On Sat, May 26, 2007 at 11:40:52AM +0100, Heikki Linnakangas wrote:
>> Or maybe some non-default values in postgresql.conf? Like random_page_cost?

> No, I use what initdb creates:

Curiouser and curiouser.  You still get the indexscan as preferred if
you EXPLAIN the query after the regression tests complete, right?
Could you step through cost_seqscan and see how it's arriving at such
a high value?

            regards, tom lane

Re: cluster test

From

Joachim Wieland

Date:

26 May 2007, 17:12:35

On Sat, May 26, 2007 at 12:14:14PM -0400, Tom Lane wrote:
> Curiouser and curiouser.  You still get the indexscan as preferred if
> you EXPLAIN the query after the regression tests complete, right?
> Could you step through cost_seqscan and see how it's arriving at such
> a high value?

Ok... I figured it out... When I was doing the guc patch I modified my
startdb.sh script such that it passes settings on the command line to
postmaster. There I changed seq_page_cost. I know that I have removed those
settings on my laptop computer since then but I seem to have copied it here
on my other machine at that time and so it started with a different
seq_page_cost... *blushing*...

Sorry for the noise... Interesting though that it only made a difference to
the cluster test...

Joachim

Re: cluster test

From

Andrew Dunstan

Date:

27 May 2007, 13:51:24


Tom Lane wrote:
> Joachim Wieland <joe@mcknight.de> writes:
>
>> As said before, it only happens with "make installcheck", not "make check".
>>
>
> Curious.  I'm not sure if the buildfarm tries to isolate the
> installation against its locale environment --- can you check the locale
> used by the install case?
>

Buildfarm does:

   @initout = `"bin/initdb" --no-locale data 2>&1`;

If it didn't we'd have had problems with much more for much longer.


cheers

andrew