Re: Odd problem with performance in duplicate database - Mailing list pgsql-performance

From Josh Berkus
Subject Re: Odd problem with performance in duplicate database
Date
Msg-id 200308111751.39998.josh@agliodbs.com
In response to Re: Odd problem with performance in duplicate database  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom,

> Partly.  The numbers suggest that in ANALYZE's default sample of 3000
> rows, it's only finding about a dozen non-null tgroup_ids (yielding the
> 0.996 null_frac value); and that in one case all dozen are different and
> in the other case there are two duplicates.  It would help if you
> boosted the stats target for this column by a factor of 10.  (You can
> do that in 7.2, btw --- IIRC the only problem is that a pg_dump won't
> show that you did so.)
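Tom's suggestion can be sketched as follows (the table and column names are taken from this thread; the target of 100, ten times the default of 10 in this era, is illustrative):

```sql
-- Raise the per-column statistics target so ANALYZE samples more rows
-- for this column, then re-analyze so the new sample takes effect.
ALTER TABLE case_clients ALTER COLUMN tgroup_id SET STATISTICS 100;
ANALYZE case_clients;
```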

Hmmm.  No dice.  I raised the statistics target to 1000, which increased
n_distinct to 108 -- pretty close to accurate.  However, the planner still
insists on a seq scan on case_clients unless I drop random_page_cost to
1.5 (up from 1.2, but still somewhat unreasonable).
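For anyone following along, this kind of planner experiment can be done per session before touching postgresql.conf (the join below is a hypothetical stand-in, since the actual query isn't shown in this message):

```sql
-- Lower random_page_cost for this session only and re-check the plan.
SET random_page_cost = 1.5;
EXPLAIN
SELECT *
FROM case_clients c
JOIN tgroups t ON t.id = c.tgroup_id;  -- hypothetical join for illustration
```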

> But the other part of the problem is that in 7.2, the join selectivity
> estimator is way off when you are joining a unique column (like the pkey
> on the other side) to a column with a very large fraction of nulls.
> We only discovered this recently; it's fixed as of 7.3.3:

OK, I'll talk to the client about upgrading.

--
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                        Josh Berkus
   Complete information technology     josh@agliodbs.com
    and data management solutions     (415) 565-7293
   for law firms, small businesses      fax 621-2533
    and non-profit organizations.     San Francisco

