Re: Query plan and Inheritance. Weird behavior - Mailing list pgsql-performance

From John Lange
Subject Re: Query plan and Inheritance. Weird behavior
Date
Msg-id 1043864940.2368.53.camel@johnlaptop.darkcore.net
Whole thread Raw
In response to Re: Query plan and Inheritance. Weird behavior  (Andras Kadinger <bandit@surfnonstop.com>)
List pgsql-performance
On Wed, 2003-01-29 at 01:52, Andras Kadinger wrote:
> John Lange wrote:
> >
> > > I don't see how performance would be significantly better if you stored
> > > the common columns of all rows (parent and children) in the parent
> > > table, in contrast with how it is done now, storing entire rows of child
> > > tables in the child table and omitting them from the parent table.
> >
> > Well there are a couple of points.
> >
> > Firstly, from the simple standpoint of database normalization you
> > shouldn't have tables that have the same columns. The way it is
> > implemented, child tables are copies of parent tables.
>
> As Tom pointed out, only the schema is copied, but not the data.

I guess you are right, strictly speaking this isn't a violation of
normalization since no data is duplicated.

> This has the following advantages:
> - if you select from child tables, PostgreSQL will only have to scan
> rows that belong to that child (and further down), and not all rows in
> all tables of the inheritance hierarchy; so if you have 100 million rows
> in the whole hierarchy, but only have say 1 million in the child you are
> currently interested in, you only have to scan those 1 million rows, and
> not the whole 100 million.
> - all columns of rows are stored together, so to read a row only one
> disk access is needed (your way it would probably need roughly one
> random disk access per each inheritance level upwards, both for
> reads/selects and writes/inserts/updates; with a sizable inheritance
> hierarchy this would be a considerable performance hit)
> - it doesn't really cost much in terms of disk space, only some
> bookkeeping information is needed
>
> I don't think inheritance really fits into 'database normalization'
> itself, but still there are cases where it is more convenient/efficient
> than with traditional database normalization, where you would have to
> either go create completely separate tables for each type (and do UNIONs
> of SELECTs if you are interested in more than one child only), or what's
> even more cumbersome, create a table with common columns ('parent' here)
> and then go create children and children of children that each link
> upwards to their respective parents through some kind of key: in a
> select, you would have to explicitly specify all tables upwards the
> inheritance hierarchy, and specify the respective joins for them.
>
> So I think whether you should choose more traditional database
> normalization or use inheritance depends on what you want to do.
>
> > But more importantly it is bad for performance because selecting from a
> > parent table causes the same select to be done on all the child tables.
> > In my case selecting from the parent causes six selects to be done (one
> > for every child table).
>
> 'causes the same select to be done on all the child tables' - I don't
> agree with that, and I hope this is where the misunderstanding lies.
>
> Consider this:
>
> CREATE TABLE parent ( id integer NOT NULL, text text);
> CREATE TABLE child1 ( child1field text) INHERITS (parent);
> CREATE TABLE child2 ( child2field text) INHERITS (parent);
> CREATE TABLE child3 ( child3field text) INHERITS (parent);
> CREATE TABLE child4 ( child4field text) INHERITS (parent);
>
> CREATE TABLE othertable ( id integer NOT NULL, othertext text);
>
> ALTER TABLE ONLY parent ADD CONSTRAINT parent_pkey PRIMARY KEY (id);
> ALTER TABLE ONLY child1 ADD CONSTRAINT child1_pkey PRIMARY KEY (id);
> ALTER TABLE ONLY child2 ADD CONSTRAINT child2_pkey PRIMARY KEY (id);
> ALTER TABLE ONLY child3 ADD CONSTRAINT child3_pkey PRIMARY KEY (id);
> ALTER TABLE ONLY child4 ADD CONSTRAINT child4_pkey PRIMARY KEY (id);
>
> ALTER TABLE ONLY othertable ADD CONSTRAINT othertable_pkey PRIMARY KEY
> (id);
>
> Then I filled all tables with 10000 rows of synthetic data and ANALYZEd
> just to make sure the optimizer considers indexes to be valuable.
>
> First I tried this:
>
> johnlange=# explain select * from parent where id=13;
>                                           QUERY
> PLAN
> ----------------------------------------------------------------------------------------------
>  Result  (cost=0.00..15.07 rows=5 width=36)
>    ->  Append  (cost=0.00..15.07 rows=5 width=36)
>          ->  Index Scan using parent_pkey on parent  (cost=0.00..3.01
> rows=1 width=36)
>                Index Cond: (id = 13)
>          ->  Index Scan using child1_pkey on child1 parent
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
>          ->  Index Scan using child2_pkey on child2 parent
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
>          ->  Index Scan using child3_pkey on child3 parent
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
>          ->  Index Scan using child4_pkey on child4 parent
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
> (12 rows)
>
> The planner has rightly chosen to use indexes, and as a result the query
> will be pretty fast.
>
> Also, at first sight this might look like the multiple selects you
> mention, but actually it isn't; here's another example to show that:
>
> inh=# explain select * from parent natural join othertable where
> parent.id=13;
>                                           QUERY
> PLAN
> ----------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..30.20 rows=5 width=72)
>    ->  Append  (cost=0.00..15.07 rows=5 width=36)
>          ->  Index Scan using parent_pkey on parent  (cost=0.00..3.01
> rows=1 width=36)
>                Index Cond: (id = 13)
>          ->  Index Scan using child1_pkey on child1 parent
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
>          ->  Index Scan using child2_pkey on child2 parent
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
>          ->  Index Scan using child3_pkey on child3 parent
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
>          ->  Index Scan using child4_pkey on child4 parent
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
>    ->  Index Scan using othertable_pkey on othertable  (cost=0.00..3.01
> rows=1 width=36)
>          Index Cond: ("outer".id = othertable.id)
> (14 rows)
>
> As you can see, the planner decided to use the indexes on parent and
> children here too, retrieved and then collated the resulting rows first
> and only then performed the join against othertable.
>
> In other words, it is not peforming 5 separate selects with their
> respective joins; it collects all qualifying rows first from the
> inheritance hierarchy, and only then performs the join; so the extra
> cost compared to the non-inheriting case is pretty much only the added
> cost of consulting five indexes instead of just one - unless you have
> inheritance hierarchies consisting of several dozen tables or more (and
> even then) I don't think this added cost would be significant.
>
> > This is entirely reasonable and efficient compared to the current model
> > where a select on a parent table requires the same select to be executed
> > on EVERY child table. If it's a large expensive JOIN of some kind then
> > this is verging on un-workable.
>
> Please show us a join that you would like to use and let us see how well
> the planner handles it.

Ok, your reply here is very informative. Firstly, I can see from your
example that I likely don't have my keys and constraints implemented
properly.

However, the issue of indexes is not necessarily that relevant since you
may not be selecting rows based on columns that have indexes.

So the issue of indexes aside, I think some of the misunderstanding is
related to my assumption that the appended operations are relatively
more expensive than scanning the same number of rows in a single select.

Here is the way it looks on my system when I select a single object.

db_drs0001=> explain select * from tbl_objects where id = 1;
NOTICE:  QUERY PLAN:

Result  (cost=0.00..153.70 rows=6 width=111)
 ->  Append  (cost=0.00..153.70 rows=6 width=111)
  ->  Seq Scan on tbl_objects  (cost=0.00..144.35 rows=1 width=107)
  ->  Seq Scan on tbl_viewers tbl_objects  (cost=0.00..1.06 rows=1
width=97)
  ->  Seq Scan on tbl_documents tbl_objects  (cost=0.00..4.91 rows=1
width=111)
  ->  Seq Scan on tbl_formats tbl_objects  (cost=0.00..1.11 rows=1
width=100)
  ->  Seq Scan on tbl_massemails tbl_objects  (cost=0.00..1.01 rows=1
width=90)
  ->  Seq Scan on tbl_icons tbl_objects  (cost=0.00..1.25 rows=1
width=110)

db_drs0001=> select version();
                            version
---------------------------------------------------------------
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.3
(1 row)

The only question here is, if a select requires the scanning of all rows
to return a result set, is it dramatically less efficient to have it
scanning 100,000 rows spread over 6 tables or to scan 100,000 rows in a
single table?

(At the moment I have no where near that amount of data. Side question,
what technique do you use to generate data to fill your tables for
testing?)

I'm now starting to see that it likely isn't that much different either
way so the benefits of the way it's implemented probably out weigh the
negatives. Your end up scanning the same number of rows either way.

On the topic of proper indexes, if you would indulge me, can you show me
where I have gone wrong in that regard? My biggest point of confusion
here is with regards to the sequences that are used in the parent table.

Here is the schema as produced by pg_dump. The original create used the
keyword "serial" or "bigserial" as the case may be. I've edited some of
the columns out just to keep the example shorter:

CREATE SEQUENCE "tbl_objects_id_seq" start 1 increment 1 maxvalue
9223372036854775807 minvalue 1 cache 1;

CREATE TABLE "tbl_objects" (
   "id" bigint DEFAULT nextval('"tbl_objects_id_seq"'::text) NOT NULL,
   "name" text DEFAULT '' NOT NULL,
   "description" text DEFAULT '' NOT NULL,
   "status" smallint DEFAULT '1' NOT NULL,
   "class" text
);

CREATE TABLE "tbl_viewers" (
        "exec" text DEFAULT '' NOT NULL )
INHERITS ("tbl_objects");

CREATE TABLE "tbl_documents" (
        "filename" text DEFAULT '' NOT NULL )
INHERITS ("tbl_objects");

CREATE TABLE "tbl_massemails" (
        "from" text DEFAULT '' NOT NULL,
        "subject" text DEFAULT '' NOT NULL,
        "message" text DEFAULT '' NOT NULL )
INHERITS ("tbl_objects");

CREATE TABLE "tbl_icons" (
        "format_id" bigint DEFAULT '0' NOT NULL )
INHERITS ("tbl_documents");

CREATE TABLE "tbl_formats" (
        "viewer_id" bigint DEFAULT '0' NOT NULL,
        "extension" text DEFAULT '' NOT NULL,
        "contenttype" text DEFAULT '' NOT NULL,
        "upload_class" text )
INHERITS ("tbl_objects");

CREATE UNIQUE INDEX tbl_objects_id_key ON tbl_objects USING btree (id);

Thanks very much for taking the time to look into this with me. It has
been most informative.

John Lange

>
> Regards,
> Andras
>
> PS (John, don't look here :) ): I have found some queries with plans
> that are less efficient than I think they could be.
>
> Changing the where clause in the above query to refer to othertable
> gives:
>
> johnlange=# explain select * from parent natural join othertable where
> othertable.id=13;
>                                           QUERY
> PLAN
> -----------------------------------------------------------------------------------------------
>  Hash Join  (cost=3.02..978.08 rows=5 width=72)
>    Hash Cond: ("outer".id = "inner".id)
>    ->  Append  (cost=0.00..725.00 rows=50000 width=36)
>          ->  Seq Scan on parent  (cost=0.00..145.00 rows=10000 width=36)
>          ->  Seq Scan on child1 parent  (cost=0.00..145.00 rows=10000
> width=36)
>          ->  Seq Scan on child2 parent  (cost=0.00..145.00 rows=10000
> width=36)
>          ->  Seq Scan on child3 parent  (cost=0.00..145.00 rows=10000
> width=36)
>          ->  Seq Scan on child4 parent  (cost=0.00..145.00 rows=10000
> width=36)
>    ->  Hash  (cost=3.01..3.01 rows=1 width=36)
>          ->  Index Scan using othertable_pkey on othertable
> (cost=0.00..3.01 rows=1 width=36)
>                Index Cond: (id = 13)
> (11 rows)
>
> While:
>
> johnlange=# explain select * from ONLY parent natural join othertable
> where othertable.id=13;
>                                        QUERY
> PLAN
> -----------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..6.04 rows=1 width=72)
>    ->  Index Scan using othertable_pkey on othertable  (cost=0.00..3.01
> rows=1 width=36)
>          Index Cond: (id = 13)
>    ->  Index Scan using parent_pkey on parent  (cost=0.00..3.01 rows=1
> width=36)
>          Index Cond: (parent.id = "outer".id)
> (5 rows)
>
> Similarly, as a somewhat more real-life example:
>
> johnlange=# explain select * from parent natural join othertable where
> othertable.othertext='apple';
>                                             QUERY
> PLAN
> --------------------------------------------------------------------------------------------------
>  Hash Join  (cost=131.37..1234.50 rows=250 width=72)
>    Hash Cond: ("outer".id = "inner".id)
>    ->  Append  (cost=0.00..725.00 rows=50000 width=36)
>          ->  Seq Scan on parent  (cost=0.00..145.00 rows=10000 width=36)
>          ->  Seq Scan on child1 parent  (cost=0.00..145.00 rows=10000
> width=36)
>          ->  Seq Scan on child2 parent  (cost=0.00..145.00 rows=10000
> width=36)
>          ->  Seq Scan on child3 parent  (cost=0.00..145.00 rows=10000
> width=36)
>          ->  Seq Scan on child4 parent  (cost=0.00..145.00 rows=10000
> width=36)
>    ->  Hash  (cost=131.25..131.25 rows=50 width=36)
>          ->  Index Scan using othertable_text on othertable
> (cost=0.00..131.25 rows=50 width=36)
>                Index Cond: (othertext = 'alma'::text)
> (11 rows)
>
> What's more strange, that it still does it with enable_seqscan set to
> off:
>
> johnlange=# explain select * from parent natural join othertable where
> othertable.othertext='apple';
>                                             QUERY
> PLAN
> --------------------------------------------------------------------------------------------------
>  Hash Join  (cost=100000131.37..500001234.50 rows=250 width=72)
>    Hash Cond: ("outer".id = "inner".id)
>    ->  Append  (cost=100000000.00..500000725.00 rows=50000 width=36)
>          ->  Seq Scan on parent  (cost=100000000.00..100000145.00
> rows=10000 width=36)
>          ->  Seq Scan on child1 parent  (cost=100000000.00..100000145.00
> rows=10000 width=36)
>          ->  Seq Scan on child2 parent  (cost=100000000.00..100000145.00
> rows=10000 width=36)
>          ->  Seq Scan on child3 parent  (cost=100000000.00..100000145.00
> rows=10000 width=36)
>          ->  Seq Scan on child4 parent  (cost=100000000.00..100000145.00
> rows=10000 width=36)
>    ->  Hash  (cost=131.25..131.25 rows=50 width=36)
>          ->  Index Scan using othertable_text on othertable
> (cost=0.00..131.25 rows=50 width=36)
>                Index Cond: (othertext = 'apple'::text)
> (11 rows)
>
> While:
>
> johnlange=# explain select * from ONLY parent natural join othertable
> where othertable.othertext='apple';
>                                          QUERY
> PLAN
> --------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..282.55 rows=50 width=72)
>    ->  Index Scan using othertable_text on othertable
> (cost=0.00..131.25 rows=50 width=36)
>          Index Cond: (othertext = 'apple'::text)
>    ->  Index Scan using parent_pkey on parent  (cost=0.00..3.01 rows=1
> width=36)
>          Index Cond: (parent.id = "outer".id)
> (5 rows)
>
> If I try to make it more efficient and get rid of the seq scans by
> pushing the condition into a subselect, I get an even more interesting
> plan:
>
> johnlange=# explain select * from parent where id in (select id from
> othertable where othertext='alma');
>                                                   QUERY
> PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Result  (cost=0.00..6563171.43 rows=25000 width=36)
>    ->  Append  (cost=0.00..6563171.43 rows=25000 width=36)
>          ->  Seq Scan on parent  (cost=0.00..1312634.29 rows=5000
> width=36)
>                Filter: (subplan)
>                SubPlan
>                  ->  Materialize  (cost=131.25..131.25 rows=50 width=4)
>                        ->  Index Scan using othertable_text on
> othertable  (cost=0.00..131.25 rows=50 width=4)
>                              Index Cond: (othertext = 'alma'::text)
>          ->  Seq Scan on child1 parent  (cost=0.00..1312634.29 rows=5000
> width=36)
>                Filter: (subplan)
>                SubPlan
>                  ->  Materialize  (cost=131.25..131.25 rows=50 width=4)
>                        ->  Index Scan using othertable_text on
> othertable  (cost=0.00..131.25 rows=50 width=4)
>                              Index Cond: (othertext = 'alma'::text)
>          ->  Seq Scan on child2 parent  (cost=0.00..1312634.29 rows=5000
> width=36)
>                Filter: (subplan)
>                SubPlan
>                  ->  Materialize  (cost=131.25..131.25 rows=50 width=4)
>                        ->  Index Scan using othertable_text on
> othertable  (cost=0.00..131.25 rows=50 width=4)
>                              Index Cond: (othertext = 'alma'::text)
>          ->  Seq Scan on child3 parent  (cost=0.00..1312634.29 rows=5000
> width=36)
>                Filter: (subplan)
>                SubPlan
>                  ->  Materialize  (cost=131.25..131.25 rows=50 width=4)
>                        ->  Index Scan using othertable_text on
> othertable  (cost=0.00..131.25 rows=50 width=4)
>                              Index Cond: (othertext = 'alma'::text)
>          ->  Seq Scan on child4 parent  (cost=0.00..1312634.29 rows=5000
> width=36)
>                Filter: (subplan)
>                SubPlan
>                  ->  Materialize  (cost=131.25..131.25 rows=50 width=4)
>                        ->  Index Scan using othertable_text on
> othertable  (cost=0.00..131.25 rows=50 width=4)
>                              Index Cond: (othertext = 'alma'::text)
> (32 rows)
>
> johnlange=# select version();
>
> version
> --------------------------------------------------------------------------------------------------------
>  PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.1
> (Mandrake Linux 9.1 3.2.1-2mdk)
> (1 row)


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: 1 char in the world
Next
From: Matt Mello
Date:
Subject: Re: 1 char in the world