From: Charles Hornberger
Subject: identical joins on identical tables produce different output
Msg-id: 3.0.5.32.19990114222608.00af0c80@k4azl.net
List: pgsql-hackers
Sorry to intrude on the hackers list, but I've got a problem that I think
might be worth looking at.

I've got two databases, one a humongous DB ported from Solid to Postgres,
the other using just a few tables and indices from the first.  The
databases are called apx3 and apx5; I created apx5 by pg_dumping two tables
from apx3 and running `psql -f apx3.out apx5`.
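
For reference, the dump and reload went more or less like this (I'm
reconstructing the exact commands from memory, so the pg_dump invocations
may not be letter-perfect):

 pg_dump -t articles apx3       >  apx3.out
 pg_dump -t article_types apx3  >> apx3.out
 createdb apx5
 psql -f apx3.out apx5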

Both sets of tables also contain identical sets of data.

But when I run the following join query on the tables in apx3:
 EXPLAIN SELECT a.headline, b.article_type from articles a, article_types b where a.art_type_id = b.art_type_id;

... I get the following output:
 Nested Loop  (cost=3.20 size=2 width=28)
   ->  Seq Scan on articles a  (cost=1.07 size=2 width=14)
   ->  Seq Scan on article_types b  (cost=1.07 size=2 width=14)

When I run it against apx5, it produces:
 Merge Join  (cost=0.00 size=1 width=28)
   ->  Seq Scan  (cost=0.00 size=0 width=0)
         ->  Sort  (cost=0.00 size=0 width=0)
               ->  Seq Scan on articles a  (cost=0.00 size=0 width=14)
   ->  Seq Scan  (cost=0.00 size=0 width=0)
         ->  Sort  (cost=0.00 size=0 width=0)
               ->  Seq Scan on article_types b  (cost=0.00 size=0 width=14)


Like I said, the two sets of tables (articles and article_types) are
identical.  The difference between the two databases is that apx3 contains
a large number of other tables/views.

pg_dumping the two tables from apx3 and apx5 produces identical output:

CREATE TABLE "articles" ("article_id" "int4" NOT NULL, "subject1_id"
"int2", "subject2_id" "int2", "subject3_id" "int2", "headline"
varchar(256), "subhead" varchar(256), "print_page" varchar(64), "byline"
varchar(128), "country_id" "int2", "dateline_id" "int2", "volume" "int2",
"issue" "int2", "publ_type" varchar(32), "publ_date" "date", "art_src_id"
"int2", "art_type_id" "int2", "contributor1" varchar(128), "contributor2"
varchar(128), "contributor3" varchar(128), "editor" varchar(32),
"managing_editor" varchar(32), "copy_editor" varchar(32),
"production_person" varchar(32), "entered_date" "date", "entered_time"
"time", "entered_by" varchar(50), "split600" "int2", "wets_front" "int2",
"wets_priority" "int2", "published" "int2", "article_text" "text",
"wait_for_publish" "int2");
CREATE UNIQUE INDEX "articles_article_id_key" on "articles" using btree (
"article_id" "int4_ops" );
CREATE  INDEX "art_publdate_ind" on "articles" using btree ( "publ_date"
"date_ops" );
CREATE  INDEX "art_wfi_ind" on "articles" using btree ( "wets_front"
"int2_ops" );
CREATE  INDEX "art_date_id_ind" on "articles" using btree ( "dateline_id"
"int2_ops" );
CREATE  INDEX "articles_art_type_ind" on "articles" using btree (
"art_type_id" "int2_ops" );
CREATE TABLE "article_types" ("art_type_id" "int2" NOT NULL, "article_type"
varchar(63), "listed" "int2", "priority" "int2");
CREATE UNIQUE INDEX "article_types_pkey" on "article_types" using btree (
"art_type_id" "int2_ops" );

My uneducated guess is that there's some other factor in apx3 that's
affecting the optimizer's ability to generate the most desirable query plan
(i.e., the efficient Merge Join rather than the expensive Nested Loop of
Sequential Scans).
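
If it would help narrow things down, I can also compare what the planner
thinks it knows about these tables in the two databases.  My (possibly
wrong) understanding is that its size estimates come from the statistics
kept in pg_class and refreshed by VACUUM ANALYZE, so something along these
lines ought to show whether apx3 and apx5 differ:

 SELECT relname, relpages, reltuples
 FROM pg_class
 WHERE relname IN ('articles', 'article_types');

 -- and, in case stale statistics are the culprit:
 VACUUM ANALYZE articles;
 VACUUM ANALYZE article_types;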

The ugly part of this problem is that apx3 is -- at least to my mind --
quite monstrous.  It was built by dumping the Solid DB, then quickly
editing the output file and loading that into Postgres as apx2, then (when
this strange behavior started to appear) dumping apx2 out of Postgres and
reloading it as apx3.  The resulting database is far from perfect:  There
are cases where "primary key" declarations inside CREATE TABLE statements
were followed by redundant CREATE INDEX statements, resulting in multiple
indices being created on single columns. There are about 80 tables/views
and roughly as many indices.
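
In case it's useful, a query along these lines (just a sketch, assuming
pg_index and pg_class are the right catalogs to look at) should list which
indices actually ended up on a given table after all that shuffling:

 SELECT c.relname AS table_name, i.relname AS index_name
 FROM pg_class c, pg_class i, pg_index x
 WHERE x.indrelid = c.oid
   AND x.indexrelid = i.oid
   AND c.relname = 'articles';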

I'd be happy to send dumps of the two databases if anyone is interested in
investigating this problem.  I'd happily do it myself, but beyond
individually dumping tables from apx3, loading them into apx5 and waiting
for the join query to fail, I'm not sure how to proceed.  Or if someone
wants to give me some instructions on how to debug this sort of thing...?

(BTW, I'm not on the hackers list, so please cc me on any replies.)

Thanks,
Charlie

-------------------------------------------
PREVIOUS E-MAIL ABOUT THIS PROBLEM FOLLOWS:
-------------------------------------------

To: Charles Hornberger <charlie@k4azl.net>
Subject: Re: very slow response time on large, multi-table view 
Date: Thu, 14 Jan 1999 20:02:39 -0500
From: Tom Lane <tgl@sss.pgh.pa.us>

> As a matter of fact, when I created new (albeit smaller) tables to
> perform the same joins, I started getting some more logical results
> from optimizer:

That's really weird.  I wonder if the system's info about those original
tables might've gotten corrupted somehow?  It might be worth dumping and
reloading the database just to see... (btw, you did do an initdb and
reload when you upgraded to 6.4 from prior version, I hope...)

> (BTW, what's the difference between a Merge Join and a Hash Join?)

Merge Join works by scanning two pre-sorted tables in sync, like a
traditional list-merge sort.  The particular plan you show did the
sort the hard way --- I'd kind of expect it to read the tables in
order using an index, if there is a btree index available.

Hash Join works by making a hashtable describing all the entries in
one table, then probing the hashtable for each record in the other
table.

I'm not sure about which one is faster, but I expect either one would
beat nested loop handily...

If you don't figure out what's going on, I'd suggest posting the new
info to the hackers list --- this could be a real bug in the optimizer.
        regards, tom lane

---------------------------------------------------------------------

To: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: very slow response time on large, multi-table view 
Date: Thu, 14 Jan 1999 19:09:39 -0900
From: Charles Hornberger <charlie@k4azl.net>

At 08:02 PM 1/14/99 -0500, you wrote:
>> As a matter of fact, when I created new (albeit smaller) tables to
>> perform the same joins, I started getting some more logical results
>> from optimizer:
>
>That's really weird.  I wonder if the system's info about those original
>tables might've gotten corrupted somehow?  It might be worth dumping and
>reloading the database just to see... (btw, you did do an initdb and
>reload when you upgraded to 6.4 from prior version, I hope...)

Actually, this database was created under 6.4.  There's one oddity I can
think of that might've messed up the tables:  When I dumped the Solid DB,
it included a bunch of "primary key" declarations, as well as explicit
CREATE INDEX statements on some of the columns that were already declared
as primary keys.  Perhaps the existence of multiple indices on single
columns caused the problem?  I've since deleted the redundant indices; now
there's one per indexed column.  (As I mentioned in my first post, though
perhaps not clearly, I dumped a Solid DB, did a quick & dirty edit of all
the CREATE TABLE & CREATE VIEW statements to make them compatible with
Postgres, then reloaded the thing into Postgres.)

In any case, I'm going to dump & reload the DB now.  (And about initdb, I
suppose that the guy who did the upgrade did do that, but I'll ask to make
sure.)  But since *some* of the tables in this database work fine, and
others don't, I'd *guess* that it has to do with the table structures. Of
course, that's one of the most uneducated guesses I've ever made....

It just occurred to me that there's NO reason for me to be writing this
e-mail until AFTER I do the dump & reload, so ....

OK, I've dumped and reloaded the original database (apx2) as apx3.  And
nothing seems to have been fixed.

apx3=> explain select * from test_view;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=3.20 size=2 width=28)
  ->  Seq Scan on articles a  (cost=1.07 size=2 width=14)
  ->  Seq Scan on article_types at  (cost=1.07 size=2 width=14)

EXPLAIN
apx3=> explain select * from int2_test;
NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00 size=1 width=28)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on int2_articles a  (cost=0.00 size=0 width=14)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on int2_types at  (cost=0.00 size=0 width=14)

Here are the CREATE RULE statements executed when I created apx3 from
apx2's dump file:

CREATE RULE "_RETint2_test" AS ON SELECT TO "int2_test" DO INSTEAD SELECT
"a"."headline", "at"."article_type" FROM "int2_articles" "a", "int2_types"
"at" WHERE "a"."art_type_id" = "at"."art_type_id";
CREATE
CREATE RULE "_RETtest_view" AS ON SELECT TO "test_view" DO INSTEAD SELECT
"a"."headline", "at"."article_type" FROM "articles" "a", "article_types"
"at" WHERE "a"."art_type_id" = "at"."art_type_id";
CREATE
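
(If I understand right, those _RET rules are just how pg_dump represents
the views; the original definitions would have been along the lines of the
following, modulo quoting:)

 CREATE VIEW test_view AS
   SELECT a.headline, "at".article_type
   FROM articles a, article_types "at"
   WHERE a.art_type_id = "at".art_type_id;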


>> (BTW, what's the difference between a Merge Join and a Hash Join?)
>
>Merge Join works by scanning two pre-sorted tables in sync, like a
>traditional list-merge sort.  The particular plan you show did the
>sort the hard way --- I'd kind of expect it to read the tables in
>order using an index, if there is a btree index available.
>
>Hash Join works by making a hashtable describing all the entries in
>one table, then probing the hashtable for each record in the other
>table.
>
>I'm not sure about which one is faster, but I expect either one would
>beat nested loop handily...
>
>If you don't figure out what's going on, I'd suggest posting the new
>info to the hackers list --- this could be a real bug in the optimizer.

Well, what do you think?  Should I post all this stuff to the hackers list? 

Thanks,
Charlie



