Re: Improving Performance of Query ~ Filter by A, Sort by B - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Improving Performance of Query ~ Filter by A, Sort by B
Date
Msg-id CAMkU=1yar573WF7PPCdWWsVOc6sLpWDPdX_-8xC+9guJ033CEA@mail.gmail.com
Whole thread Raw
In response to Re: Improving Performance of Query ~ Filter by A, Sort by B  (Lincoln Swaine-Moore <lswainemoore@gmail.com>)
List pgsql-performance
On Mon, Jul 16, 2018 at 5:29 PM, Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:
Tom and Jeff,

Thanks very much for the suggestions!

Here's what I've found so far after playing around for a few more days:

What is your default_statistics_target?  What can you tell us about the distribution of parent_id?  (exponential, power law, etc?).  Can you show the results for select * from pg_stats where tablename='a' and attname='parent_id' \x\g\x  ?

The default_statistics_target is 500, which I agree seems quite insufficient for these purposes. I bumped this up to 2000, and saw some improvement in the row count estimation, but pretty much the same query plans. Unfortunately the distribution of counts is not intended to be correlated to parent_id, which is one reason I imagine the histograms might not be particularly effective unless theres one bucket for every value. Here is the output you requested:

select * from pg_stats where tablename='a' and attname='parent_id';

schemaname             | public
tablename              | a
attname                | parent_id
inherited              | t
null_frac              | 0
avg_width              | 4
n_distinct             | 18871
most_common_vals       | {15503,49787,49786,24595,49784,17549, ...} (2000 values)
most_common_freqs      | {0.0252983,0.02435,0.0241317,0.02329,0.019095,0.0103967,0.00758833,0.004245, ...} (2000 values)

You showed the 8 most common frequencies.  But could you also show the last couple of them?  When your queried parent_id value is not on the MCV list, it is the frequency of the least frequent one on the list which puts an upper limit on how frequent the one you queried for can be.



A few questions re: statistics:
 1) would it be helpful to bump column statistics to, say, 20k (the number of distinct values of parent_id)?

Only one way to find out...
However you can only go up to 10k, not 20k.

 
 2) is the discrepancy between the statistics on the parent and child table be expected? certainly I would think that the statistics would be different, but I would've imagined they would have histograms of the same size given the settings being the same.

Is the n_distinct estimate accurate for the partition?  There is an algorithm (which will change in v11) to stop the MCV from filling the entire statistics target size if it thinks adding more won't be useful.  But I don't know why the histogram boundary list would be short.  But, I doubt that that is very important here.  The histogram is only used for inequality/range, not for equality/set membership.

 
 3) is there a way to manually specify the the distribution of rows to be even? that is, set the frequency of each value to be ~ n_rows/n_distinct. This isn't quite accurate, but is a reasonable assumption about the distribution, and might generate better query plans.


This would be going in the wrong direction.  Your queries seem to preferentially use rare parent_ids, not typical parent_ids.  In fact, it seems like many of your hard-coded parent_ids don't exist in the table at all.  That certainly isn't going to help the planner any.  Could you somehow remove those before constructing the query?  

You might also take a step back, where is that list of parent_ids coming from in the first place, and why couldn't you convert the list of literals into a query that returns that list naturally?


You could try reversing the order and adding a column to be (tmstmp, parent_id, id) and keeping the table well vacuumed.  This would allow the slow plan to still walk the indexes in tmstmp order but do it as an index-only scan, so it could omit the extra trip to the table. That trip to the table must be awfully slow to explain the numbers you show later in the thread.
 
Just to clarify, do you mean building indexes like:
CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on "a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
That seems promising! Is the intuition here that we want the first key of the index to be the one we are ultimately ordering by? Sounds like I make have had that flipped initially. My understanding of this whole situation (and please do correct me if this doesn't make sense) is the big bottleneck here is reading pages from disk (when looking at stopped up queries, the wait_event is DataFileRead), and so anything that can be done to minimize the pages read will be valuable. Which is why I would love to get the query plan to use the tmstmp index without having to filter thereafter by parent_id.

Yes, that is the index.  

You really want it to filter by parent_id in the index, rather than going to the table to do the filter on parent_id.  The index pages with tmstmp as the leading column are going to be more tightly packed with potentially relevant rows, while the table pages are less likely to be densely packed.  So filtering in the index leads to less IO.  Currently, the only way to make that in-index filtering happen is with an index-only scan.  So the tmstmp needs to be first, and other two need to be present in either order.  Note that if you change select list of your query to select more than just "id", you will again defeat the index-only-scan.  
 

What happens when you remove that extra order by phrase that you added?  The original slow plan should become much faster when the number of parent_ids is large (it has to dig through fewer index entries before accumulating 20 good ones), so you should try going back to that.

Unfortunately, I've found that even when the number of parent_ids is large (2000), it's still prohibitively slow (haven't got it to finish), and maintains a query plan that involves an Index Scan Backward across the a_tmstmp_idxs (with a filter for parent_id).


I think the reason for this is that your 2000 literal parent_id values are preferentially rare or entirely absent from the table.  Therefore adding more parent_ids doesn't cause more rows to meet the filter qualification, so you don't accumulate a LIMIT worth of rows very fast.


The original index you discussed, '(parent_id, tmpstmp desc)', would be ideal if the parent_id were tested for equality rather than set membership.  So another approach to speeding this up would be to rewrite the query to test for equality on each parent_id separately and then combine the results.

select id from (
   (select * from a where parent_id= 34226 order by tmstmp desc limit 20) 
      union all
   (select * from a where parent_id= 24506 order by tmstmp desc limit 20)
      union all
   (select * from a where parent_id= 40986 order by tmstmp desc limit 20) 
      union all
   (select * from a where parent_id= 27162 order by tmstmp desc limit 20) 
) foo
ORDER BY tmstmp DESC LIMIT 20;
   
I don't know what is going to happen when you go from 4 parent_ids up to 2000, though.

It would be nice if PostgreSQL would plan your original query this way itself, but it lacks the machinery to do so.  I think there was a patch to add that machinery, but I don't remember seeing any work on it recently.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Special bloom index of INT, BIGINT, BIT, VARBIT for bitwiseoperation
Next
From: Mark Kirkwood
Date:
Subject: Re: Why HDD performance is better than SSD in this case