Re: Improving Performance of Query ~ Filter by A, Sort by B - Mailing list pgsql-performance

From Lincoln Swaine-Moore
Subject Re: Improving Performance of Query ~ Filter by A, Sort by B
Date
Msg-id CABcidkJoH-Uv+nZ-mYLpfvAzuyOFbeKwYC47XUvsFZt+hFgQcg@mail.gmail.com
Whole thread Raw
In response to Re: Improving Performance of Query ~ Filter by A, Sort by B  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Improving Performance of Query ~ Filter by A, Sort by B
List pgsql-performance
Tom and Jeff,

Thanks very much for the suggestions!

Here's what I've found so far after playing around for a few more days:

What is your default_statistics_target?  What can you tell us about the distribution of parent_id?  (exponential, power law, etc?).  Can you show the results for select * from pg_stats where tablename='a' and attname='parent_id' \x\g\x  ?

The default_statistics_target is 500, which I agree seems quite insufficient for these purposes. I bumped this up to 2000, and saw some improvement in the row count estimation, but pretty much the same query plans. Unfortunately the distribution of counts is not intended to be correlated to parent_id, which is one reason I imagine the histograms might not be particularly effective unless theres one bucket for every value. Here is the output you requested:

select * from pg_stats where tablename='a' and attname='parent_id';

schemaname             | public
tablename              | a
attname                | parent_id
inherited              | t
null_frac              | 0
avg_width              | 4
n_distinct             | 18871
most_common_vals       | {15503,49787,49786,24595,49784,17549, ...} (2000 values)
most_common_freqs      | {0.0252983,0.02435,0.0241317,0.02329,0.019095,0.0103967,0.00758833,0.004245, ...} (2000 values)
histogram_bounds       | {2,12,17,24,28,36,47,59,74,80,86,98,108,121,135,141,147,160,169,177,190,204, ...} (2001 values)
correlation            | -0.161576
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |


Interestingly, the number of elements in these most_common_vals is as expected (2000) for the parent table, but it's lower for the partition tables, despite the statistics level being the same.

SELECT attstattarget
FROM   pg_attribute
WHERE  attrelid in ('a_partition1'::regclass, 'a'::regclass)
AND    attname = 'parent_id';
-[ RECORD 1 ]-+-----
attstattarget | 2000
-[ RECORD 2 ]-+-----
attstattarget | 2000


select * from pg_stats where tablename='a_partition1' and attname='parent_id';

schemaname             | public
tablename              | a_partition1
attname                | parent_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 3969
most_common_vals       | {15503,49787,49786,24595,49784,10451,20136,17604,9683, ...} (400-ish values)
most_common_freqs      | {0.0807067,0.0769483,0.0749433,0.073565,0.0606433,0.0127917,0.011265,0.0112367, ...} (400-ish values)
histogram_bounds       | {5,24,27,27,33,38,41,69,74,74, ...} (1500-ish values)
correlation            | 0.402414
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |

A few questions re: statistics:
 1) would it be helpful to bump column statistics to, say, 20k (the number of distinct values of parent_id)?
 2) is the discrepancy between the statistics on the parent and child table be expected? certainly I would think that the statistics would be different, but I would've imagined they would have histograms of the same size given the settings being the same.
 3) is there a way to manually specify the the distribution of rows to be even? that is, set the frequency of each value to be ~ n_rows/n_distinct. This isn't quite accurate, but is a reasonable assumption about the distribution, and might generate better query plans.

The same thing could be obtained by doing a dummy operation, such as ORDER BY tmstmp + '0 seconds' DESC.  I prefer that method, as it is more obviously a tuning trick.  Adding in "id" looks more like a legitimate desire to break any ties that might occasionally occur in tmstmp.

I 100% agree that that is more clear. Thanks for the suggestion!

Finally, could you rewrite it as a join to a VALUES list, rather than as an in-list?

I should've mentioned this in my initial post, but I tried doing this without much success.

You could try reversing the order and adding a column to be (tmstmp, parent_id, id) and keeping the table well vacuumed.  This would allow the slow plan to still walk the indexes in tmstmp order but do it as an index-only scan, so it could omit the extra trip to the table. That trip to the table must be awfully slow to explain the numbers you show later in the thread.
 
Just to clarify, do you mean building indexes like:
CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on "a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
That seems promising! Is the intuition here that we want the first key of the index to be the one we are ultimately ordering by? Sounds like I make have had that flipped initially. My understanding of this whole situation (and please do correct me if this doesn't make sense) is the big bottleneck here is reading pages from disk (when looking at stopped up queries, the wait_event is DataFileRead), and so anything that can be done to minimize the pages read will be valuable. Which is why I would love to get the query plan to use the tmstmp index without having to filter thereafter by parent_id.

What happens when you remove that extra order by phrase that you added?  The original slow plan should become much faster when the number of parent_ids is large (it has to dig through fewer index entries before accumulating 20 good ones), so you should try going back to that.

Unfortunately, I've found that even when the number of parent_ids is large (2000), it's still prohibitively slow (haven't got it to finish), and maintains a query plan that involves an Index Scan Backward across the a_tmstmp_idxs (with a filter for parent_id).

And if not, could you just get rid of the partitioning altogether?  1e7 row is not all that many and doesn't generally need partitioning.  Unless it is serving a specific purpose, it is probably costing you more than you are getting.
 
Unfortunately, the partitioning is serving a specific purpose (to avoid some writing overhead, it's useful to be able to drop GIN indexes on one partition at a time during heavy writes). But given that the queries are slow on a single partition anyway, I suspect the partitioning isn't the main problem here.

But if they don't, is there any chance you could redesign your partitioning so that all parent_id queries together will always be in the same partition?
 
In effect, the parent_ids are distributed by part_key, so I've found that at least a stopgap solution is manually splitting apart the parent_ids by partition when generating the query. This has given the following query plan, which still doesn't use the tmstmp index (which would be ideal and to my understanding allow proper use of the LIMIT to minimize reads), but it does make things an order of magnitude faster by forcing the use of Bitmap. Apologies for the giant output; this approach balloons the size of the query itself.

SELECT "a"."id"
FROM "a" WHERE (
    ("a"."parent_id" IN (
      14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100
    )
    AND
    "a"."part_key" = 1
  )
  OR (
    "a"."parent_id" IN (
      50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348
    )
    AND
    "a"."part_key" = 2
  )
  OR (
    "a"."parent_id" IN (
      33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156
    )
    AND
    "a"."part_key" = 3
  )
  OR (
    "a"."parent_id" IN (
      42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071
    )
    AND
    "a"."part_key" = 4
  )
  OR (
    "a"."parent_id" IN (
      11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765
    )
    AND
    "a"."part_key" = 5
  )
)
ORDER BY
"a"."tmstmp" DESC, "a"."id" DESC
LIMIT 20;

Limit  (cost=449977.86..449977.91 rows=20 width=1412) (actual time=8967.465..8967.477 rows=20 loops=1)
  Output: a_partition1.id, a_partition1.tmstmp
  Buffers: shared hit=1641 read=125625 written=13428
  ->  Sort  (cost=449977.86..450397.07 rows=167683 width=1412) (actual time=8967.464..8967.468 rows=20 loops=1)
        Output: a_partition1.id, a_partition1.tmstmp
        Sort Key: a_partition1.tmstmp DESC, a_partition1.id DESC
        Sort Method: top-N heapsort  Memory: 85kB
        Buffers: shared hit=1641 read=125625 written=13428
        ->  Append  (cost=2534.33..445515.88 rows=167683 width=1412) (actual time=1231.579..8756.610 rows=145077 loops=1)
              Buffers: shared hit=1641 read=125625 written=13428
              ->  Bitmap Heap Scan on public.a_partition1  (cost=2534.33..246197.54 rows=126041 width=1393) (actual time=1231.578..4414.027 rows=115364 loops=1)
                    Output: a_partition1.id, a_partition1.tmstmp
                    Recheck Cond: ((a_partition1.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) OR (a_partition1.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) OR (a_partition1.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) OR (a_partition1.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) OR (a_partition1.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
                    Filter: (((a_partition1.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) AND (a_partition1.part_key = 1)) OR ((a_partition1.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) AND (a_partition1.part_key = 2)) OR ((a_partition1.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) AND (a_partition1.part_key = 3)) OR ((a_partition1.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) AND (a_partition1.part_key = 4)) OR ((a_partition1.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])) AND (a_partition1.part_key = 5)))
                    Heap Blocks: exact=93942
                    Buffers: shared hit=397 read=94547 written=6930
                    ->  BitmapOr  (cost=2534.33..2534.33 rows=192032 width=0) (actual time=1214.569..1214.569 rows=0 loops=1)
                          Buffers: shared hit=397 read=605 written=10
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..1479.43 rows=126041 width=0) (actual time=1091.952..1091.952 rows=115364 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
                                Buffers: shared hit=82 read=460 written=8
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..193.55 rows=14233 width=0) (actual time=26.911..26.911 rows=0 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
                                Buffers: shared hit=66 read=33
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..275.65 rows=20271 width=0) (actual time=41.874..41.874 rows=0 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
                                Buffers: shared hit=97 read=45 written=1
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..152.49 rows=11214 width=0) (actual time=23.542..23.542 rows=0 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
                                Buffers: shared hit=52 read=26 written=1
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..275.65 rows=20271 width=0) (actual time=30.271..30.271 rows=0 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
                                Buffers: shared hit=100 read=41
              ->  Bitmap Heap Scan on public.a_partition2  (cost=850.51..78634.20 rows=19852 width=1485) (actual time=316.458..2166.105 rows=16908 loops=1)
                    Output: a_partition2.id, a_partition1.tmstmp
                    Recheck Cond: ((a_partition2.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) OR (a_partition2.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) OR (a_partition2.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) OR (a_partition2.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) OR (a_partition2.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
                    Filter: (((a_partition2.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) AND (a_partition2.part_key = 1)) OR ((a_partition2.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) AND (a_partition2.part_key = 2)) OR ((a_partition2.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) AND (a_partition2.part_key = 3)) OR ((a_partition2.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) AND (a_partition2.part_key = 4)) OR ((a_partition2.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])) AND (a_partition2.part_key = 5)))
                    Heap Blocks: exact=17769
                    Buffers: shared hit=402 read=18034 written=3804
                    ->  BitmapOr  (cost=850.51..850.51 rows=59567 width=0) (actual time=313.191..313.191 rows=0 loops=1)
                          Buffers: shared hit=402 read=265 written=40
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..155.81 rows=11177 width=0) (actual time=65.671..65.671 rows=0 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
                                Buffers: shared hit=84 read=57 written=11
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..272.08 rows=19852 width=0) (actual time=116.974..116.974 rows=18267 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
                                Buffers: shared hit=68 read=98 written=18
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..155.81 rows=11177 width=0) (actual time=58.915..58.915 rows=0 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
                                Buffers: shared hit=100 read=41 written=5
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..86.19 rows=6183 width=0) (actual time=25.370..25.370 rows=0 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
                                Buffers: shared hit=53 read=25 written=3
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..155.81 rows=11177 width=0) (actual time=46.254..46.254 rows=0 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
                                Buffers: shared hit=97 read=44 written=3
              ->  Bitmap Heap Scan on public.a_partition3  (cost=692.99..56079.33 rows=13517 width=1467) (actual time=766.172..1555.761 rows=7594 loops=1)
                    Output: a_partition3.id, a_partition1.tmstmp
                    Recheck Cond: ((a_partition3.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) OR (a_partition3.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) OR (a_partition3.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) OR (a_partition3.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) OR (a_partition3.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
                    Filter: (((a_partition3.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) AND (a_partition3.part_key = 1)) OR ((a_partition3.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) AND (a_partition3.part_key = 2)) OR ((a_partition3.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) AND (a_partition3.part_key = 3)) OR ((a_partition3.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) AND (a_partition3.part_key = 4)) OR ((a_partition3.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])) AND (a_partition3.part_key = 5)))
                    Heap Blocks: exact=7472
                    Buffers: shared hit=432 read=7682 written=1576
                    ->  BitmapOr  (cost=692.99..692.99 rows=42391 width=0) (actual time=764.238..764.238 rows=0 loops=1)
                          Buffers: shared hit=432 read=210 written=51
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..138.53 rows=8870 width=0) (actual time=118.907..118.907 rows=0 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
                                Buffers: shared hit=91 read=52 written=7
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..97.27 rows=6228 width=0) (actual time=26.656..26.656 rows=0 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
                                Buffers: shared hit=71 read=28 written=3
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..225.13 rows=13517 width=0) (actual time=393.115..393.115 rows=7594 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
                                Buffers: shared hit=107 read=74 written=25
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..76.64 rows=4907 width=0) (actual time=36.979..36.979 rows=0 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
                                Buffers: shared hit=56 read=22 written=9
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..138.53 rows=8870 width=0) (actual time=188.574..188.574 rows=0 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
                                Buffers: shared hit=107 read=34 written=7
              ->  Bitmap Heap Scan on public.a_partition4  (cost=709.71..64604.81 rows=8273 width=1470) (actual time=268.111..543.570 rows=5211 loops=1)
                    Output: a_partition4.id, a_partition1.tmstmp
                    Recheck Cond: ((a_partition4.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) OR (a_partition4.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) OR (a_partition4.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) OR (a_partition4.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) OR (a_partition4.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
                    Filter: (((a_partition4.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) AND (a_partition4.part_key = 1)) OR ((a_partition4.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) AND (a_partition4.part_key = 2)) OR ((a_partition4.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) AND (a_partition4.part_key = 3)) OR ((a_partition4.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) AND (a_partition4.part_key = 4)) OR ((a_partition4.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])) AND (a_partition4.part_key = 5)))
                    Heap Blocks: exact=5153
                    Buffers: shared hit=410 read=5362 written=1118
                    ->  BitmapOr  (cost=709.71..709.71 rows=48654 width=0) (actual time=267.028..267.028 rows=0 loops=1)
                          Buffers: shared hit=410 read=209 written=50
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..153.69 rows=10908 width=0) (actual time=60.586..60.586 rows=0 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
                                Buffers: shared hit=90 read=52 written=11
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..107.91 rows=7659 width=0) (actual time=47.041..47.041 rows=0 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
                                Buffers: shared hit=64 read=35 written=7
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..153.69 rows=10908 width=0) (actual time=54.352..54.352 rows=0 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
                                Buffers: shared hit=101 read=40 written=8
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..130.39 rows=8273 width=0) (actual time=54.690..54.690 rows=5220 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
                                Buffers: shared hit=56 read=40 written=13
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..153.69 rows=10908 width=0) (actual time=50.353..50.353 rows=0 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
                                Buffers: shared hit=99 read=42 written=11
Planning time: 8.002 ms
Execution time: 8967.750 ms

When I remove the ID sorting hack from this query, it goes back to a nasty index scan on tmstmp with a filter key on the whole WHERE clause.

Thanks again for your help!


On Sat, Jul 14, 2018 at 11:25 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 10, 2018 at 11:07 AM, Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:



Something about the estimated row counts (this problem persisted after I tried ANALYZEing)

What is your default_statistics_target?  What can you tell us about the distribution of parent_id?  (exponential, power law, etc?).  Can you show the results for select * from pg_stats where tablename='a' and attname='parent_id' \x\g\x  ?
 
forces usage of the tmstmp index and Merge Append (which seems wise) but also a filter condition on parent_id over an index condition, which is apparently prohibitively slow.

I tried creating a multicolumn index like:

CREATE INDEX "a_partition1_parent_and_tmstmp_idx" on "a_partition2" USING btree ("parent_id", "tmstmp" DESC);

But this didn't help (it wasn't used).

You could try reversing the order and adding a column to be (tmstmp, parent_id, id) and keeping the table well vacuumed.  This would allow the slow plan to still walk the indexes in tmstmp order but do it as an index-only scan, so it could omit the extra trip to the table. That trip to the table must be awfully slow to explain the numbers you show later in the thread.

...
 
This query plan (which is the same as when LIMIT is removed) has been a good short term solution when the number of "parent_id"s I'm using is still relatively small, but unfortunately queries grow untenably slow as the number of "parent_id"s involved increases:

What happens when you remove that extra order by phrase that you added?  The original slow plan should become much faster when the number of parent_ids is large (it has to dig through fewer index entries before accumulating 20 good ones), so you should try going back to that.

...


I'd be very grateful for help with one or both of these questions:
1) Why is adding an unnecessary (from the perspective of result correctness) ORDER BY valuable for forcing the parent_id index usage, and does that indicate that there is something wrong with my table/indexes/statistics/etc.?

It finds the indexes on tmstmp to be falsely attractive, as it can walk in tmstmp order and so avoid the sort. (Really not the sort itself, but the fact that sort has to first read every row to be sorted, while walking an index can abort once the LIMIT is satisfied).  Adding an extra phrase to the ORDER BY means the index is no longer capable of delivering rows in the needed order, so it no longer looks falsely attractive.  The same thing could be obtained by doing a dummy operation, such as ORDER BY tmstmp + '0 seconds' DESC.  I prefer that method, as it is more obviously a tuning trick.  Adding in "id" looks more like a legitimate desire to break any ties that might occasionally occur in tmstmp.

As Tom pointed out, there clearly is something wrong with your statistics, although we don't know what is causing it to go wrong.  Fixing the statistics isn't guaranteed to fix the problem, but it would be a good start.

 
 
2) Is there any way I can improve my query time when there are many "parent_id"s involved? I seem to only be able to get the query plan to use at most one of the parent_id index and the tmstmp index at a time. Perhaps the correct multicolumn index would help?

A few things mentioned above might help.  

But if they don't, is there any chance you could redesign your partitioning so that all parent_id queries together will always be in the same partition?  And if not, could you just get rid of the partitioning altogether?  1e7 row is not all that many and doesn't generally need partitioning.  Unless it is serving a specific purpose, it is probably costing you more than you are getting.  

Finally, could you rewrite it as a join to a VALUES list, rather than as an in-list?
 
Cheers,

Jeff



--
Lincoln Swaine-Moore

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Improving Performance of Query ~ Filter by A, Sort by B
Next
From: Neto pr
Date:
Subject: Why HDD performance is better than SSD in this case