Re: select query does not pick up the right index - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: select query does not pick up the right index
Date
Msg-id 20190105042357.GK25379@telsasoft.com
Whole thread Raw
In response to RE: select query does not pick up the right index  (Abadie Lana <Lana.Abadie@iter.org>)
Responses RE: select query does not pick up the right index
List pgsql-performance
On Fri, Jan 04, 2019 at 08:58:57AM +0000, Abadie Lana wrote:
> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, null_frac, n_distinct,
array_length(most_common_vals,1)n_mcv, array_length(histogram_bounds,1) n_hist FROM pg_stats WHERE attname='...' AND
tablename='...'ORDER BY 1 DESC; 
 
> 
> Hmm. Is it normal that the couple (tablename,attname ) is not unique? I'm surprised to see sample_{ctrl,util,buil}
quotedtwice
 

One of the rows is for "inherited stats" (including child tables) stats and one
is "noninherited stats".

The unique index on the table behind that view is:
    "pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, stainherit)

On the wiki, I added inherited and correlation columns.  Would you rerun that query ?
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

I'm also interested to see \d and channel_id statistics for the channel table.

> explain (analyze, buffers) select
'BUIL-B36-VA-RT-RT1:CL0001-2-ABW',c.smpl_time,c.nanosecs,c.float_val,c.num_val,c.str_val,c.datatype,c.array_valfrom
samplec WHERE c.channel_id = (SELECT channel_id FROM channel WHERE name='BUIL-B36-VA-RT-RT1:CL0001-2-ABW') order by
c.smpl_timedesc limit 5;
 

You originally wrote this as a implicit comma join.  Does the original query
still have an issue ?  The =(subselect query) doesn't allow the planner to
optimize for the given channel, which seems to be a fundamental problem.

On Fri, Jan 04, 2019 at 08:58:57AM +0000, Abadie Lana wrote:
> Based on your feedback...i rerun analyse directly on the two table sample_ctrl_year and sample_buil_year
> [...] Now when running the query again, only for sample_buil_year table the wrong index is picked up...

It looks like statistics on your tables were completely wrong; not just
sample_ctrl_year and sample_buil_year.  Right ?

Autoanalyze would normally handle this on nonempty tables (children or
otherwise) and you should manually run ANALZYE on the parents (both levels of
them) whenever statistics change, like after running a big DELETE or DROP or
after a significant interval of time has passed relative to the range of time
in the table's timestamp columns.

Do you know why autoanalze didn't handle the nonempty tables on its own ?

> Now, the channel name I gave has no entries in sample_buil_year...(and when I run the query directly against
sample_buil_yearthe right index is picked up).... So maybe something related with the partitioning?
 

>                ->  Index Scan Backward using smpl_time_bx2_idx on sample_buil_year c_5  (cost=0.56..2023054.76
rows=665761width=75) (actual time=13216.589..13216.589 rows=0 loops=1)
 
>                      Filter: (channel_id = $0)
>                      Rows Removed by Filter: 50597834
>                      Buffers: shared hit=26626368

So it scanned the entire index expecting to find 5 matching channel IDs "pretty
soon", based on the generic distribution of channel IDs, without the benefit of
knowing that this channel ID doesn't exist at all (due to =(subquery)).

26e6 buffers is 200GB, apparently accessing some pages many
times (even if cached).

    table_name    |     index_name      | table_size | index_size
                                                                                                          
 
 sample_buil_year | smpl_time_bx2_idx   | 4492 MB    | 1084 MB
                                                                                                          
 

General comments:

On Wed, Jan 02, 2019 at 04:28:41PM +0000, Abadie Lana wrote:
>    "sample_time_bm_idx" btree (channel_id, smpl_time)
>    "sample_time_mb1_idx" btree (smpl_time, channel_id)
>    "smpl_time_bx1_idx" btree (smpl_time)

The smpl_time index is loosely redundant with index on (smpl_time,channel_id).
You might consider dropping it, or otherwise dropping the smpl_time,channel_id
index and making two separate indices on smpl_time and channel.  That would
allow bitmap ANDing them together.

Or possibly (depending on detail of your data loading) leaving the composite
index and changing smpl_time to a BRIN index - it's nice to be able to CLUSTER
on the btree index to maximize the efficiency of the brin index.

>Check constraints:
>    "sample_buil_month_smpl_time_check" CHECK (smpl_time >= (now() - '32 days'::interval)::timestamp without time zone
ANDsmpl_time <= now())
 

I'm surprised that works, and not really sure what it's doing..but in any case
it's maybe not doing what you wanted(??).  I'm guessing you never get
constraint exclusion (which is irrelevant for this query but still).

Justin


pgsql-performance by date:

Previous
From: Abadie Lana
Date:
Subject: RE: select query does not pick up the right index
Next
From: Abadie Lana
Date:
Subject: RE: select query does not pick up the right index