identical joins on identical tables produce different output - Mailing list pgsql-hackers
From:    Charles Hornberger
Subject: identical joins on identical tables produce different output
Msg-id:  3.0.5.32.19990114222608.00af0c80@k4azl.net
List:    pgsql-hackers
Sorry to intrude onto the hackers list, but I've got a problem that I think might be worth looking at. I've got two databases: one a humongous DB ported from Solid to Postgres, the other using just a few tables and indices from the first. The databases are called apx3 and apx5; I created apx5 by pg_dumping two tables from apx3 and running `psql -f apx3.out apx5`. Both sets of tables also contain identical sets of data.

But when I run the following join query on the tables in apx3:

  EXPLAIN SELECT a.headline, b.article_type
  from articles a, article_types b
  where a.art_type_id = b.art_type_id;

... I get the following output:

  Nested Loop  (cost=3.20 size=2 width=28)
    ->  Seq Scan on articles a  (cost=1.07 size=2 width=14)
    ->  Seq Scan on article_types b  (cost=1.07 size=2 width=14)

When I run it against apx5, it produces:

  Merge Join  (cost=0.00 size=1 width=28)
    ->  Seq Scan  (cost=0.00 size=0 width=0)
      ->  Sort  (cost=0.00 size=0 width=0)
        ->  Seq Scan on articles a  (cost=0.00 size=0 width=14)
    ->  Seq Scan  (cost=0.00 size=0 width=0)
      ->  Sort  (cost=0.00 size=0 width=0)
        ->  Seq Scan on article_types b  (cost=0.00 size=0 width=14)

Like I said, the two sets of tables (articles and article_types) are identical. The difference between the two databases is that apx3 contains a large number of other tables/views.
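[For readers skimming the two plans above: a minimal Python sketch of what the Nested Loop plan computes. The rows here are invented purely to illustrate the plan shape; they are not data from the report.]

```python
# Made-up miniature of the two tables (invented rows, for illustration only).
articles = [(1, "Headline A", 10), (2, "Headline B", 20)]   # (article_id, headline, art_type_id)
article_types = [(10, "News"), (20, "Feature")]             # (art_type_id, article_type)

def nested_loop_join(outer, inner):
    """What the Nested Loop plan does: for every row of the outer scan,
    rescan the inner table -- O(len(outer) * len(inner)) comparisons."""
    out = []
    for a in outer:              # Seq Scan on articles a
        for b in inner:          # Seq Scan on article_types b, repeated per outer row
            if a[2] == b[0]:     # a.art_type_id = b.art_type_id
                out.append((a[1], b[1]))
    return out

print(nested_loop_join(articles, article_types))
# -> [('Headline A', 'News'), ('Headline B', 'Feature')]
```

The Merge Join plan produces the same rows, but avoids the repeated inner rescan by sorting both inputs first, which is why the optimizer's choice between the two matters for large tables.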
pg_dumping the two tables from apx3 and apx5 produces identical output:

  CREATE TABLE "articles" (
        "article_id" "int4" NOT NULL,
        "subject1_id" "int2",
        "subject2_id" "int2",
        "subject3_id" "int2",
        "headline" varchar(256),
        "subhead" varchar(256),
        "print_page" varchar(64),
        "byline" varchar(128),
        "country_id" "int2",
        "dateline_id" "int2",
        "volume" "int2",
        "issue" "int2",
        "publ_type" varchar(32),
        "publ_date" "date",
        "art_src_id" "int2",
        "art_type_id" "int2",
        "contributor1" varchar(128),
        "contributor2" varchar(128),
        "contributor3" varchar(128),
        "editor" varchar(32),
        "managing_editor" varchar(32),
        "copy_editor" varchar(32),
        "production_person" varchar(32),
        "entered_date" "date",
        "entered_time" "time",
        "entered_by" varchar(50),
        "split600" "int2",
        "wets_front" "int2",
        "wets_priority" "int2",
        "published" "int2",
        "article_text" "text",
        "wait_for_publish" "int2");
  CREATE UNIQUE INDEX "articles_article_id_key" on "articles" using btree ( "article_id" "int4_ops" );
  CREATE INDEX "art_publdate_ind" on "articles" using btree ( "publ_date" "date_ops" );
  CREATE INDEX "art_wfi_ind" on "articles" using btree ( "wets_front" "int2_ops" );
  CREATE INDEX "art_date_id_ind" on "articles" using btree ( "dateline_id" "int2_ops" );
  CREATE INDEX "articles_art_type_ind" on "articles" using btree ( "art_type_id" "int2_ops" );

  CREATE TABLE "article_types" (
        "art_type_id" "int2" NOT NULL,
        "article_type" varchar(63),
        "listed" "int2",
        "priority" "int2");
  CREATE UNIQUE INDEX "article_types_pkey" on "article_types" using btree ( "art_type_id" "int2_ops" );

My uneducated guess is that there's some other factor in apx3 that's affecting the optimizer's ability to generate the most desirable query plan (i.e., the efficient Merge Join rather than the expensive Nested Loop of Sequential Scans). The ugly part of this problem is that apx3 is -- at least to my mind -- quite monstrous.
It was built by dumping the Solid DB, then quickly editing the output file and loading it into Postgres as apx2, then (when this strange behavior started to appear) dumping apx2 out of Postgres and reloading it as apx3. The resulting database is far from perfect: there are cases where "primary key" declarations inside CREATE TABLE statements were followed by redundant CREATE INDEX statements, resulting in multiple indices being created on single columns. There are about 80 tables/views and roughly as many indices.

I'd be happy to send dumps of the two databases if anyone is interested in investigating this problem. I'd happily do it myself, but beyond individually dumping tables from apx3, loading them into apx5, and waiting for the join query to fail, I'm not sure how to proceed. Or if someone wants to give me some instructions on how to debug this sort of thing...?

(BTW, I'm not on the hackers list, so please cc me on any replies.)

Thanks,
Charlie

-------------------------------------------
PREVIOUS E-MAIL ABOUT THIS PROBLEM FOLLOWS:
-------------------------------------------

To: Charles Hornberger <charlie@k4azl.net>
Subject: Re: very slow response time on large, multi-table view
Date: Thu, 14 Jan 1999 20:02:39 -0500
From: Tom Lane <tgl@sss.pgh.pa.us>

> As a matter of fact, when I created new (albeit smaller) tables to
> perform the same joins, I started getting some more logical results
> from the optimizer:

That's really weird. I wonder if the system's info about those original tables might've gotten corrupted somehow? It might be worth dumping and reloading the database just to see... (btw, you did do an initdb and reload when you upgraded to 6.4 from the prior version, I hope...)

> (BTW, what's the difference between a Merge Join and a Hash Join?)

Merge Join works by scanning two pre-sorted tables in sync, like a traditional list-merge sort.
The particular plan you show did the sort the hard way --- I'd kind of expect it to read the tables in order using an index, if there is a btree index available.

Hash Join works by making a hashtable describing all the entries in one table, then probing the hashtable for each record in the other table.

I'm not sure about which one is faster, but I expect either one would beat nested loop handily...

If you don't figure out what's going on, I'd suggest posting the new info to the hackers list --- this could be a real bug in the optimizer.

			regards, tom lane

---------------------------------------------------------------------

To: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: very slow response time on large, multi-table view
Date: Thu, 14 Jan 1999 19:09:39 -0900
From: Charles Hornberger <charlie@k4azl.net>

At 08:02 PM 1/14/99 -0500, you wrote:
>> As a matter of fact, when I created new (albeit smaller) tables to
>> perform the same joins, I started getting some more logical results
>> from optimizer:
>
> That's really weird. I wonder if the system's info about those original
> tables might've gotten corrupted somehow? It might be worth dumping and
> reloading the database just to see... (btw, you did do an initdb and
> reload when you upgraded to 6.4 from prior version, I hope...)

Actually, this database was created under 6.4. There's one oddity I can think of that might've messed up the tables: when I dumped the Solid DB, it included a bunch of "primary key" declarations, as well as explicit CREATE INDEX statements on some of the columns that were already declared as primary keys. Perhaps the existence of multiple indices on single columns caused the problem? I've since deleted the redundant indices; now there's one per indexed column.

(As I mentioned in my first post, though perhaps not clearly, I dumped a Solid DB, did a quick & dirty edit of all the CREATE TABLE & CREATE VIEW statements to make them compatible with Postgres, then reloaded the thing into Postgres.)
In any case, I'm going to dump & reload the DB now. (And about initdb, I suppose that the guy who did the upgrade did do that, but I'll ask to make sure.) But since *some* of the tables in this database work fine, and others don't, I'd *guess* that it has to do with the table structures. Of course, that's one of the most uneducated guesses I've ever made.... It just occurred to me that there's NO reason for me to be writing this e-mail until AFTER I do the dump & reload, so ....

OK, I've dumped and reloaded the original database (apx2) as apx3. And nothing seems to have been fixed.

  apx3=> explain select * from test_view;
  NOTICE:  QUERY PLAN:

  Nested Loop  (cost=3.20 size=2 width=28)
    ->  Seq Scan on articles a  (cost=1.07 size=2 width=14)
    ->  Seq Scan on article_types at  (cost=1.07 size=2 width=14)

  EXPLAIN
  apx3=> explain select * from int2_test;
  NOTICE:  QUERY PLAN:

  Merge Join  (cost=0.00 size=1 width=28)
    ->  Seq Scan  (cost=0.00 size=0 width=0)
      ->  Sort  (cost=0.00 size=0 width=0)
        ->  Seq Scan on int2_articles a  (cost=0.00 size=0 width=14)
    ->  Seq Scan  (cost=0.00 size=0 width=0)
      ->  Sort  (cost=0.00 size=0 width=0)
        ->  Seq Scan on int2_types at  (cost=0.00 size=0 width=14)

Here are the CREATE RULE statements executed when I created apx3 from apx2's dump file:

  CREATE RULE "_RETint2_test" AS ON SELECT TO "int2_test" DO INSTEAD
      SELECT "a"."headline", "at"."article_type"
      FROM "int2_articles" "a", "int2_types" "at"
      WHERE "a"."art_type_id" = "at"."art_type_id";
  CREATE
  CREATE RULE "_RETtest_view" AS ON SELECT TO "test_view" DO INSTEAD
      SELECT "a"."headline", "at"."article_type"
      FROM "articles" "a", "article_types" "at"
      WHERE "a"."art_type_id" = "at"."art_type_id";
  CREATE

>> (BTW, what's the difference between a Merge Join and a Hash Join?)
>
> Merge Join works by scanning two pre-sorted tables in sync, like a
> traditional list-merge sort.
> The particular plan you show did the
> sort the hard way --- I'd kind of expect it to read the tables in
> order using an index, if there is a btree index available.
>
> Hash Join works by making a hashtable describing all the entries in
> one table, then probing the hashtable for each record in the other
> table.
>
> I'm not sure about which one is faster, but I expect either one would
> beat nested loop handily...
>
> If you don't figure out what's going on, I'd suggest posting the new
> info to the hackers list --- this could be a real bug in the optimizer.

Well, what do you think? Should I post all this stuff to the hackers list?

Thanks,
Charlie
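[The merge-join and hash-join algorithms Tom describes in the quoted text can be sketched in Python. These are simplified illustrations, not PostgreSQL's implementation, and the sample rows are made up; the merge-join sketch assumes the join key is unique on each side.]

```python
def merge_join(left, right, lkey, rkey):
    """Sort both inputs, then scan them in sync like a list-merge sort.
    (Simplified: assumes the join key is unique on each side.)"""
    left, right = sorted(left, key=lkey), sorted(right, key=rkey)   # the "Sort" nodes
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if lkey(left[i]) < rkey(right[j]):
            i += 1
        elif lkey(left[i]) > rkey(right[j]):
            j += 1
        else:
            out.append(left[i] + right[j])
            i += 1
            j += 1
    return out

def hash_join(left, right, lkey, rkey):
    """Build a hashtable describing all entries of one input, then probe
    it for each record of the other input."""
    table = {}
    for row in left:
        table.setdefault(lkey(row), []).append(row)
    return [row + r for r in right for row in table.get(rkey(r), [])]

# Invented sample rows, for illustration only.
articles = [("Headline B", 20), ("Headline A", 10)]   # (headline, art_type_id)
article_types = [(10, "News"), (20, "Feature")]       # (art_type_id, article_type)

print(merge_join(articles, article_types, lambda a: a[1], lambda t: t[0]))
print(hash_join(articles, article_types, lambda a: a[1], lambda t: t[0]))
```

Both produce the same joined rows; either visits each input row a bounded number of times, whereas a nested loop rescans one entire input per row of the other.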