Thread: bug in join?
We are running the 7.2b4 beta server. This join worked last week and today it gets and error: select * from b, dwhere b.address = d.address; It now fails with the following error: ERROR: join_selectivity: bad value -0.121693 all that is in the pgsql.log file is the same error. I am working on a reproduction I can give you, but in the mean time I was wondering if this is a current problem with the server or me? Thanks, -- Laurette Cisneros (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com Passenger Information Everywhere
Laurette Cisneros <laurette@nextbus.com> writes: > This join worked last week and today it gets and error: > select * from b, d > where b.address = d.address; > It now fails with the following error: > ERROR: join_selectivity: bad value -0.121693 Probably what has changed is the pg_statistic data (VACUUM ANALYZE results). Please send the results of select * from pg_stats where tablename = 'b'; select * from pg_stats where tablename = 'd'; regards, tom lane
Laurette Cisneros <laurette@nextbus.com> writes: > I will keep trying to recreate it for you. You could just try analyze b;analyze d;explain select * from b,d where b.address = d.address; and repeat until you see the error from EXPLAIN. Since ANALYZE takes a random sampling these days, successive loops will in fact produce slightly different results, and you may be able to recreate the erroneous state eventually. The math in eqjoinsel() is not entirely trivial, but I thought I had convinced myself it was okay. I need to see a failing example to figure out what's wrong with it. regards, tom lane
Yeah, the culprit appears to be vacuum analyze (vacuum alone doesn't do it). The problem is that I fixed the original database by dropping and recreating the tables populating them with backed up data. And, now it won't recreate (the values in pg_stats for them is lost). Ugh. I will keep trying to recreate it for you. L. On Wed, 2 Jan 2002, Tom Lane wrote: > Laurette Cisneros <laurette@nextbus.com> writes: > > This join worked last week and today it gets and error: > > select * from b, d > > where b.address = d.address; > > It now fails with the following error: > > ERROR: join_selectivity: bad value -0.121693 > > Probably what has changed is the pg_statistic data (VACUUM ANALYZE > results). Please send the results of > > select * from pg_stats where tablename = 'b'; > select * from pg_stats where tablename = 'd'; > > regards, tom lane > -- Laurette Cisneros (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com Passenger Information Everywhere
Will keep trying. I also noticed that several triggers were dropped when I dropped the table. I need to add these back and see if they make a difference. More as I have it... L. On Wed, 2 Jan 2002, Tom Lane wrote: > Laurette Cisneros <laurette@nextbus.com> writes: > > I will keep trying to recreate it for you. > > You could just try > > analyze b; > analyze d; > explain select * from b,d where b.address = d.address; > > and repeat until you see the error from EXPLAIN. Since ANALYZE takes > a random sampling these days, successive loops will in fact produce > slightly different results, and you may be able to recreate the > erroneous state eventually. > > The math in eqjoinsel() is not entirely trivial, but I thought I had > convinced myself it was okay. I need to see a failing example to > figure out what's wrong with it. > > regards, tom lane > -- Laurette Cisneros (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com Passenger Information Everywhere
Laurette Cisneros <laurette@nextbus.com> writes: > Will keep trying. I also noticed that several triggers were dropped when I > dropped the table. I need to add these back and see if they make a > difference. No, join_selectivity isn't going to care about triggers. What's failing is the estimation of the fraction of rows that will match on address between the two tables (join_selectivity is rejecting the result as obviously bogus, which it is). That doesn't depend on anything except the ANALYZE statistics. regards, tom lane
OOps, never mind about the triggers. But, I will keep trying to induce the problem for you. L. On Wed, 2 Jan 2002, Tom Lane wrote: > Laurette Cisneros <laurette@nextbus.com> writes: > > I will keep trying to recreate it for you. > > You could just try > > analyze b; > analyze d; > explain select * from b,d where b.address = d.address; > > and repeat until you see the error from EXPLAIN. Since ANALYZE takes > a random sampling these days, successive loops will in fact produce > slightly different results, and you may be able to recreate the > erroneous state eventually. > > The math in eqjoinsel() is not entirely trivial, but I thought I had > convinced myself it was okay. I need to see a failing example to > figure out what's wrong with it. > > regards, tom lane > -- Laurette Cisneros (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com Passenger Information Everywhere
OK, I reproduced it again. I had restored the data for the tables from the wrong backup. I restored the two tables from last night's backup (BTW, we use pg_dump -a -O -Fc...) and ran vacuum analyze and it reproduces (each and every time). We've turned off vacuum analyze (we do it every night *after* the backup) for now. We would love for this to get fixed asap (of course ;.) Here's the info. you asked for: select * from pg_stats where tablesname ='b': tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs| histogram_bounds | correlation -----------+--------------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+------------------+-------------b | rev | 0 | 4 | -0.142857 | {"0","1001","1002"} | {"0.333333","0.333333","0.333333"} | | 1b | b_tag | 0 | 7 | -0.333333 | {"S001","S002","S003","S004","S005","VT1","VT2"}| {"0.142857","0.142857","0.142857","0.142857","0.142857","0.142857","0.142857"}| | 0.454545b |input_tag | 0 | 16 | 1 | {"AirLinkInput"} | {"1"} | | 1b |address | 0 | 19 | -0.333333 | {"166.128.052.237","166.128.053.084","166.128.054.017","166.128.054.018","166.128.057.250","166.128.058.202","166.128.058.203"} |{"0.142857","0.142857","0.142857","0.142857","0.142857","0.142857","0.142857"} | | 0.454545b | b_distance | 0 | 4 | 1 | {"200"} | {"1"} | | 1b | b_time | 0 | 4 | 1 | {"90"} | {"1"} | | 1 (6 rows) select * from pg_status where tablename = 'd';tablename | attname | null_frac | avg_width | n_distinct | most_common_vals| most_common_freqs | histogram_bounds | correlation -----------+-------------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------d | rev | 0 | 4 | 3 | {"0","1002","1001"} | {"0.45679","0.45679","0.0864198"} | | 0.912263d | address | 0 | 19 | -0.45679 | {"166.128.052.237","166.128.053.084","166.128.054.017","166.128.054.018","166.128.057.250","166.128.058.202","166.128.058.203"} |{"0.037037","0.037037","0.037037","0.037037","0.037037","0.037037","0.037037"} | {"166.128.169.189","166.128.169.199","166.128.169.236","166.128.171.002","166.129.108.003","166.132.126.184","166.133.174.093","166.133.174.098","166.204.012.171","166.204.045.135","166.204.066.001"} | 0.451197d | d_passwd | 0 | 9 | 1 | {"Czech"} | {"1"} | | 1d | d_port | 0 | 4 | 1 | {"22335"} | {"1"} | | 1d | d_type | 0 | 9 | 2 | {"signs","buses"} | {"0.740741","0.259259"} | | 0.912263d | d_status | 0 | 4 | 1 | {"0"} | {"1"} | | 1 (6 rows) Thanks! Laurette On Wed, 2 Jan 2002, Tom Lane wrote: > Laurette Cisneros <laurette@nextbus.com> writes: > > This join worked last week and today it gets and error: > > select * from b, d > > where b.address = d.address; > > It now fails with the following error: > > ERROR: join_selectivity: bad value -0.121693 > > Probably what has changed is the pg_statistic data (VACUUM ANALYZE > results). Please send the results of > > select * from pg_stats where tablename = 'b'; > select * from pg_stats where tablename = 'd'; > > regards, tom lane > -- Laurette Cisneros (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com Passenger Information Everywhere
Okay, I've been able to reproduce the problem here. Looks like the eqjoinsel math is not wrong, exactly, but small roundoff errors are causing the logic to do unreasonable things. I think get_att_numdistinct needs to round its result to an integer, and probably there needs to be some clamping of computed probabilities to the 0..1 range (otherfreq1 is coming out about -4.4703483581542969e-08 in this example, which should be clamped to 0). Will have a fix late tonight or tomorrow. Thanks for the example case! regards, tom lane
Thanks Tom! On Wed, 2 Jan 2002, Tom Lane wrote: > Okay, I've been able to reproduce the problem here. Looks like the > eqjoinsel math is not wrong, exactly, but small roundoff errors are > causing the logic to do unreasonable things. I think > get_att_numdistinct needs to round its result to an integer, and > probably there needs to be some clamping of computed probabilities to > the 0..1 range (otherfreq1 is coming out about -4.4703483581542969e-08 > in this example, which should be clamped to 0). > > Will have a fix late tonight or tomorrow. > > Thanks for the example case! > > regards, tom lane > -- Laurette Cisneros (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com Passenger Information Everywhere
Okay, I've committed a fix to CVS. The easiest way for you to pick it up is probably to grab tonight's nightly snapshot, ftp://ftp.us.postgresql.org/dev/postgresql-snapshot.tar.gz (or a closer ftp mirror if you have one). regards, tom lane
Hi Tom, I did download this friday morning and compiled and installed it on my system. Looks like the bug is fixed! I also ran vacuum analyze a few hundred times to make sure ;.) We were wondering when the next (beta) release is scheduled for? Thank you for the quick response to this problem! L. On Wed, 2 Jan 2002, Tom Lane wrote: > Okay, I've committed a fix to CVS. The easiest way for you to pick it > up is probably to grab tonight's nightly snapshot, > ftp://ftp.us.postgresql.org/dev/postgresql-snapshot.tar.gz > (or a closer ftp mirror if you have one). > > regards, tom lane > -- Laurette Cisneros (510) 420-3137 NextBus Information Systems, Inc. www.nextbus.com Passenger Information Everywhere