inaccurate stats on large tables - Mailing list pgsql-performance

From Kiran Mukhyala
Subject inaccurate stats on large tables
Date
Msg-id 1220552512.3853.320.camel@sausalito.gene.com
Whole thread Raw
Responses Re: inaccurate stats on large tables  ("David Wilson" <david.t.wilson@gmail.com>)
List pgsql-performance
Hello,

        I am running a select on a large table with two where
        conditions.
        Explain analyze shows that the estimated number of rows returned
        (190760) is much more than the actual rows returned (58221),
        which is probably the underlying cause for the poor performance
        I am seeing.

        Can someone please tell me how to improve the query planner
        estimate? I did try vacuum analyze. Here are some details:

        Explain plan:
        unison@csb-test=> explain analyze select * from paliasorigin a
        where
        a.origin_id=20 and a.tax_id=9606;


        QUERY PLAN
        --------------------------------------------------------------------------
         Bitmap Heap Scan on paliasorigin a  (cost=4901.38..431029.54
        rows=190760 width=118) (actual time=12.447..112.902 rows=58221
        loops=1)
           Recheck Cond: ((origin_id = 20) AND (tax_id = 9606))
           ->  Bitmap Index Scan on paliasorigin_search3_idx
        (cost=0.00..4853.69 rows=190760 width=0) (actual
        time=11.407..11.407
        rows=58221 loops=1)
                 Index Cond: ((origin_id = 20) AND (tax_id = 9606))

        Schema:
        unison@csb-test=> \d+ paliasorigin
         Column   |           Type           |
        Modifiers                             |
        -----------+--------------------------+------------
         palias_id | integer                  | not null
         origin_id | integer                  | not null
         alias     | text                     | not null
         descr     | text                     |
         tax_id    | integer                  |
         added     | timestamp with time zone | not null default
        timenow()
        Indexes:
            "palias_pkey" PRIMARY KEY, btree (palias_id)
            "paliasorigin_alias_unique_in_origin_idx" UNIQUE, btree
        (origin_id,
        alias)
            "paliasorigin_alias_casefold_idx" btree (upper(alias))
        CLUSTER
            "paliasorigin_alias_idx" btree (alias)
            "paliasorigin_o_idx" btree (origin_id)
            "paliasorigin_search1_idx" btree (palias_id, origin_id)
            "paliasorigin_search3_idx" btree (origin_id, tax_id,
        palias_id)
            "paliasorigin_tax_id_idx" btree (tax_id)
        Foreign-key constraints:
            "origin_id_exists" FOREIGN KEY (origin_id) REFERENCES
        origin(origin_id) ON UPDATE CASCADE ON DELETE CASCADE
        Has OIDs: no


        Number of rows:
        unison@csb-test=> select count(*) from paliasorigin;
          count
        ----------
         37909009
        (1 row)

        Pg version:
        unison@csb-test=> select version();
                                                  version
        --------------------------------------------------------------------------------------------
         PostgreSQL 8.3.3 on x86_64-unknown-linux-gnu, compiled by GCC
        gcc (GCC)
        4.1.0 (SUSE Linux)
        (1 row)


        Info from analyze verbose:
        unison@csb-test=> analyze verbose paliasorigin;
        INFO:  analyzing "unison.paliasorigin"
        INFO:  "paliasorigin": scanned 300000 of 692947 pages,
        containing
        16409041 live rows and 0 dead rows; 300000 rows in sample,
        37901986
        estimated total rows
        ANALYZE
        Time: 21999.506 ms


        Thank you,

        -Kiran Mukhyala




pgsql-performance by date:

Previous
From: "Scott Carey"
Date:
Subject: Re: limit clause breaks query planner?
Next
From: "Matt Smiley"
Date:
Subject: Re: limit clause breaks query planner?