Thread: bug in join?

bug in join?

From
Laurette Cisneros
Date:
We are running the 7.2b4 beta server.

This join worked last week and today it gets and error:

select * from b, dwhere b.address = d.address;

It now fails with the following error:
ERROR:  join_selectivity: bad value -0.121693

all that is in the pgsql.log file is the same error.

I am working on a reproduction I can give you, but in the mean time I was
wondering if this is a current problem with the server or me?

Thanks,

-- 
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere



Re: bug in join?

From
Tom Lane
Date:
Laurette Cisneros <laurette@nextbus.com> writes:
> This join worked last week and today it gets and error:
> select * from b, d
>  where b.address = d.address;
> It now fails with the following error:
> ERROR:  join_selectivity: bad value -0.121693

Probably what has changed is the pg_statistic data (VACUUM ANALYZE
results).  Please send the results of

select * from pg_stats where tablename = 'b';
select * from pg_stats where tablename = 'd';
        regards, tom lane


Re: bug in join?

From
Tom Lane
Date:
Laurette Cisneros <laurette@nextbus.com> writes:
> I will keep trying to recreate it for you.

You could just try
analyze b;analyze d;explain select * from b,d where b.address = d.address;

and repeat until you see the error from EXPLAIN.  Since ANALYZE takes
a random sampling these days, successive loops will in fact produce
slightly different results, and you may be able to recreate the
erroneous state eventually.

The math in eqjoinsel() is not entirely trivial, but I thought I had
convinced myself it was okay.  I need to see a failing example to 
figure out what's wrong with it.
        regards, tom lane


Re: bug in join?

From
Laurette Cisneros
Date:
Yeah, the culprit appears to be vacuum analyze (vacuum alone doesn't do it).

The problem is that I fixed the original database by dropping and
recreating the tables populating them with backed up data.  And, now it
won't recreate (the values in pg_stats for them is lost).

Ugh.

I will keep trying to recreate it for you.

L.
On Wed, 2 Jan 2002, Tom Lane wrote:

> Laurette Cisneros <laurette@nextbus.com> writes:
> > This join worked last week and today it gets and error:
> > select * from b, d
> >  where b.address = d.address;
> > It now fails with the following error:
> > ERROR:  join_selectivity: bad value -0.121693
>
> Probably what has changed is the pg_statistic data (VACUUM ANALYZE
> results).  Please send the results of
>
> select * from pg_stats where tablename = 'b';
> select * from pg_stats where tablename = 'd';
>
>             regards, tom lane
>

-- 
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere



Re: bug in join?

From
Laurette Cisneros
Date:
Will keep trying.  I also noticed that several triggers were dropped when I
dropped the table. I need to add these back and see if they make a
difference.

More as I have it...

L.
On Wed, 2 Jan 2002, Tom Lane wrote:

> Laurette Cisneros <laurette@nextbus.com> writes:
> > I will keep trying to recreate it for you.
>
> You could just try
>
>     analyze b;
>     analyze d;
>     explain select * from b,d where b.address = d.address;
>
> and repeat until you see the error from EXPLAIN.  Since ANALYZE takes
> a random sampling these days, successive loops will in fact produce
> slightly different results, and you may be able to recreate the
> erroneous state eventually.
>
> The math in eqjoinsel() is not entirely trivial, but I thought I had
> convinced myself it was okay.  I need to see a failing example to
> figure out what's wrong with it.
>
>             regards, tom lane
>

-- 
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere



Re: bug in join?

From
Tom Lane
Date:
Laurette Cisneros <laurette@nextbus.com> writes:
> Will keep trying.  I also noticed that several triggers were dropped when I
> dropped the table. I need to add these back and see if they make a
> difference.

No, join_selectivity isn't going to care about triggers.  What's failing
is the estimation of the fraction of rows that will match on address
between the two tables (join_selectivity is rejecting the result as
obviously bogus, which it is).  That doesn't depend on anything except
the ANALYZE statistics.
        regards, tom lane


Re: bug in join?

From
Laurette Cisneros
Date:
OOps, never mind about the triggers.  But, I will keep trying to induce the
problem for you.

L.
On Wed, 2 Jan 2002, Tom Lane wrote:

> Laurette Cisneros <laurette@nextbus.com> writes:
> > I will keep trying to recreate it for you.
>
> You could just try
>
>     analyze b;
>     analyze d;
>     explain select * from b,d where b.address = d.address;
>
> and repeat until you see the error from EXPLAIN.  Since ANALYZE takes
> a random sampling these days, successive loops will in fact produce
> slightly different results, and you may be able to recreate the
> erroneous state eventually.
>
> The math in eqjoinsel() is not entirely trivial, but I thought I had
> convinced myself it was okay.  I need to see a failing example to
> figure out what's wrong with it.
>
>             regards, tom lane
>

-- 
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere



Re: bug in join?

From
Laurette Cisneros
Date:
OK, I reproduced it again.  I had restored the data for the tables from the
wrong backup.  I restored the two tables from last night's backup (BTW, we use
pg_dump -a -O -Fc...) and ran vacuum analyze and it reproduces (each and
every time).

We've turned off vacuum analyze (we do it every night *after* the backup)
for now.  We would love for this to get fixed asap (of course ;.)

Here's the info. you asked for:

select * from pg_stats where tablesname ='b':
tablename |   attname    | null_frac | avg_width | n_distinct | most_common_vals |
most_common_freqs| histogram_bounds | correlation
 

-----------+--------------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+------------------+-------------b
       | rev          |         0 |         4 |  -0.142857 | {"0","1001","1002"} | {"0.333333","0.333333","0.333333"}
     |                  |           1b         | b_tag        |         0 |         7 |  -0.333333 |
{"S001","S002","S003","S004","S005","VT1","VT2"}|
{"0.142857","0.142857","0.142857","0.142857","0.142857","0.142857","0.142857"}|                  |    0.454545b
|input_tag    |         0 |        16 |          1 | {"AirLinkInput"} | {"1"} |                  |           1b
|address      |         0 |        19 |  -0.333333 |
{"166.128.052.237","166.128.053.084","166.128.054.017","166.128.054.018","166.128.057.250","166.128.058.202","166.128.058.203"}
|{"0.142857","0.142857","0.142857","0.142857","0.142857","0.142857","0.142857"} |                  |    0.454545b
 | b_distance   |         0 |         4 |          1 | {"200"} | {"1"} |                  |           1b         |
b_time       |         0 |         4 |          1 | {"90"} | {"1"} |                  |           1
 
(6 rows)

select * from pg_status where tablename = 'd';tablename |   attname   | null_frac | avg_width | n_distinct |
most_common_vals|                               most_common_freqs | histogram_bounds | correlation
 

-----------+-------------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------d
       | rev         |         0 |         4 |          3 | {"0","1002","1001"} | {"0.45679","0.45679","0.0864198"} | |
  0.912263d         | address     |         0 |        19 |   -0.45679 |
{"166.128.052.237","166.128.053.084","166.128.054.017","166.128.054.018","166.128.057.250","166.128.058.202","166.128.058.203"}
|{"0.037037","0.037037","0.037037","0.037037","0.037037","0.037037","0.037037"} |
{"166.128.169.189","166.128.169.199","166.128.169.236","166.128.171.002","166.129.108.003","166.132.126.184","166.133.174.093","166.133.174.098","166.204.012.171","166.204.045.135","166.204.066.001"}
|   0.451197d         | d_passwd    |         0 |         9 |          1 | {"Czech"} | {"1"} | |           1d         |
d_port     |         0 |         4 |          1 | {"22335"} | {"1"} |                                  |           1d
     | d_type      |         0 |         9 |          2 | {"signs","buses"} | {"0.740741","0.259259"} | |    0.912263d
      | d_status    |         0 |         4 |          1 | {"0"} | {"1"} | |           1
 
(6 rows)

Thanks!

Laurette
On Wed, 2 Jan 2002, Tom Lane wrote:

> Laurette Cisneros <laurette@nextbus.com> writes:
> > This join worked last week and today it gets and error:
> > select * from b, d
> >  where b.address = d.address;
> > It now fails with the following error:
> > ERROR:  join_selectivity: bad value -0.121693
>
> Probably what has changed is the pg_statistic data (VACUUM ANALYZE
> results).  Please send the results of
>
> select * from pg_stats where tablename = 'b';
> select * from pg_stats where tablename = 'd';
>
>             regards, tom lane
>

-- 
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere



Re: bug in join?

From
Tom Lane
Date:
Okay, I've been able to reproduce the problem here.  Looks like the
eqjoinsel math is not wrong, exactly, but small roundoff errors are
causing the logic to do unreasonable things.  I think
get_att_numdistinct needs to round its result to an integer, and
probably there needs to be some clamping of computed probabilities to
the 0..1 range (otherfreq1 is coming out about -4.4703483581542969e-08
in this example, which should be clamped to 0).

Will have a fix late tonight or tomorrow.

Thanks for the example case!
        regards, tom lane


Re: bug in join?

From
Laurette Cisneros
Date:
Thanks Tom!

On Wed, 2 Jan 2002, Tom Lane wrote:

> Okay, I've been able to reproduce the problem here.  Looks like the
> eqjoinsel math is not wrong, exactly, but small roundoff errors are
> causing the logic to do unreasonable things.  I think
> get_att_numdistinct needs to round its result to an integer, and
> probably there needs to be some clamping of computed probabilities to
> the 0..1 range (otherfreq1 is coming out about -4.4703483581542969e-08
> in this example, which should be clamped to 0).
>
> Will have a fix late tonight or tomorrow.
>
> Thanks for the example case!
>
>             regards, tom lane
>

-- 
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere



Re: bug in join?

From
Tom Lane
Date:
Okay, I've committed a fix to CVS.  The easiest way for you to pick it
up is probably to grab tonight's nightly snapshot,
ftp://ftp.us.postgresql.org/dev/postgresql-snapshot.tar.gz
(or a closer ftp mirror if you have one).
        regards, tom lane


Re: bug in join?

From
Laurette Cisneros
Date:
Hi Tom,

I did download this friday morning and compiled and installed it on my
system.  Looks like the bug is fixed!  I also ran vacuum analyze a few
hundred times to make sure ;.)

We were wondering when the next (beta) release is scheduled for?

Thank you for the quick response to this problem!

L.
On Wed, 2 Jan 2002, Tom Lane wrote:

> Okay, I've committed a fix to CVS.  The easiest way for you to pick it
> up is probably to grab tonight's nightly snapshot,
> ftp://ftp.us.postgresql.org/dev/postgresql-snapshot.tar.gz
> (or a closer ftp mirror if you have one).
>
>             regards, tom lane
>

-- 
Laurette Cisneros
(510) 420-3137
NextBus Information Systems, Inc.
www.nextbus.com
Passenger Information Everywhere