Thread: Improving Performance of Query ~ Filter by A, Sort by B

Improving Performance of Query ~ Filter by A, Sort by B

From
Lincoln Swaine-Moore
Date:
Hi all,


I'm having a good bit of trouble making production-ready a query I wouldn't have thought would be too challenging.

Below is the relevant portions of my table, which is partitioned on values of part_key from 1-5. The number of rows is on the order of 10^7, and the number of unique "parent_id" values is on the order of 10^4. "tmstmp" is continuous (more or less). Neither parent_id nor tmstmp is nullable. The table receives a fair amount of INSERTS, and also a number of UPDATES.

db=> \d a
                                             Table "public.a"
         Column          |           Type           | Collation | Nullable |                      Default
-------------------------+--------------------------+-----------+----------+----------------------------------------------
 id                      | integer                  |           |          | nextval('a_id_seq'::regclass)
 parent_id               | integer                  |           | not null |
 tmstmp                  | timestamp with time zone |           | not null |
 part_key                | integer                  |           |          |
Partition key: LIST (part_key)
Number of partitions: 5 (Use \d+ to list them.)

db=> \d a_partition1
                                        Table "public.a_partition1"
         Column          |           Type           | Collation | Nullable |                      Default
-------------------------+--------------------------+-----------+----------+----------------------------------------------
 id                      | integer                  |           |          | nextval('a_id_seq'::regclass)
 parent_id               | integer                  |           | not null |
 tmstmp                  | timestamp with time zone |           | not null |
 part_key                | integer                  |           |          |
Partition of: a FOR VALUES IN (1)
Indexes:
    "a_pk_idx1" UNIQUE, btree (id)
    "a_tmstmp_idx1" btree (tmstmp)
    "a_parent_id_idx1" btree (parent_id)
Check constraints:
    "a_partition_check_part_key1" CHECK (part_key = 1)
Foreign-key constraints:
    "a_partition_parent_id_fk_b_id1" FOREIGN KEY (parent_id) REFERENCES b(id) DEFERRABLE INITIALLY DEFERRED

db=> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname like 'a_partition%';
              relname              | relpages |  reltuples  | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
-----------------------------------+----------+-------------+---------------+---------+----------+----------------+------------+---------------
 a_partition1                      |   152146 |      873939 |        106863 | r       |       26 | f              |            |  287197233152
 a_partition2                      |   669987 | 3.62268e+06 |             0 | r       |       26 | f              |            |  161877745664
 a_partition3                      |   562069 | 2.94414e+06 |        213794 | r       |       26 | f              |            |  132375994368
 a_partition4                      |   729880 | 3.95513e+06 |         69761 | r       |       26 | f              |            |  188689047552
 a_partition5                      |   834132 |  4.9748e+06 |         52622 | r       |       26 | f              |            |  218596630528
(5 rows)



I'm interested in filtering by parent_id (an indexed foreign-key relationship, though I think it shouldn't matter in this context), then sorting the result by a timestamp field (also indexed). I only need 20 at a time.


The naive query works well for a very small number of ids (up to 3):

EXPLAIN (ANALYZE, BUFFERS)
SELECT "a"."id"
FROM "a"
WHERE "a"."parent_id" IN (
    37066,41174,28843
)
ORDER BY "a"."tmstmp" DESC
LIMIT 20;

 Limit  (cost=9838.23..9838.28 rows=20 width=12) (actual time=13.307..13.307 rows=0 loops=1)
   Buffers: shared hit=29 read=16
   ->  Sort  (cost=9838.23..9860.12 rows=8755 width=12) (actual time=13.306..13.306 rows=0 loops=1)
         Sort Key: a_partition1.tmstmp DESC
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=29 read=16
         ->  Append  (cost=0.43..9605.26 rows=8755 width=12) (actual time=13.302..13.302 rows=0 loops=1)
               Buffers: shared hit=29 read=16
               ->  Index Scan using a_parent_id_idx1 on a_partition1  (cost=0.43..4985.45 rows=4455 width=12) (actual time=4.007..4.007 rows=0 loops=1)
                     Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[]))
                     Buffers: shared hit=6 read=3
               ->  Index Scan using a_parent_id_idx2 on a_partition2  (cost=0.43..1072.79 rows=956 width=12) (actual time=3.521..3.521 rows=0 loops=1)
                     Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[]))
                     Buffers: shared hit=5 read=4
               ->  Index Scan using a_parent_id_idx3 on a_partition3  (cost=0.43..839.30 rows=899 width=12) (actual time=2.172..2.172 rows=0 loops=1)
                     Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[]))
                     Buffers: shared hit=6 read=3
               ->  Index Scan using a_parent_id_idx4 on a_partition4  (cost=0.43..1041.11 rows=959 width=12) (actual time=1.822..1.822 rows=0 loops=1)
                     Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[]))
                     Buffers: shared hit=6 read=3
               ->  Index Scan using a_parent_id_idx5 on a_partition5  (cost=0.43..1666.61 rows=1486 width=12) (actual time=1.777..1.777 rows=0 loops=1)
                     Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[]))
                     Buffers: shared hit=6 read=3
 Planning time: 0.559 ms
 Execution time: 13.343 ms
(25 rows)


But as soon as the number included in the filter goes up slightly (at about 4), the query plan changes the indexes it uses in a way that makes it impossibly slow. The below is only an EXPLAIN because the query times out:

EXPLAIN
SELECT "a"."id"
FROM "a"
WHERE "a"."parent_id" IN (
    34226,24506,40987,27162
)
ORDER BY "a"."tmstmp" DESC
LIMIT 20;

 Limit  (cost=2.22..8663.99 rows=20 width=12)
   ->  Merge Append  (cost=2.22..5055447.67 rows=11673 width=12)
         Sort Key: a_partition1.tmstmp DESC
         ->  Index Scan Backward using a_tmstmp_idx1 on a_partition1  (cost=0.43..1665521.55 rows=5940 width=12)
               Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[]))
         ->  Index Scan Backward using a_tmstmp_idx2 on a_partition2  (cost=0.43..880517.20 rows=1274 width=12)
               Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[]))
         ->  Index Scan Backward using a_tmstmp_idx3 on a_partition3  (cost=0.43..639224.73 rows=1199 width=12)
               Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[]))
         ->  Index Scan Backward using a_tmstmp_idx4 on a_partition4  (cost=0.43..852881.68 rows=1278 width=12)
               Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[]))
         ->  Index Scan Backward using a_tmstmp_idx5 on a_partition5  (cost=0.43..1017137.75 rows=1982 width=12)
               Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[]))
(13 rows)


Something about the estimated row counts (this problem persisted after I tried ANALYZEing) forces usage of the tmstmp index and Merge Append (which seems wise) but also a filter condition on parent_id over an index condition, which is apparently prohibitively slow.

I tried creating a multicolumn index like:

CREATE INDEX "a_partition1_parent_and_tmstmp_idx" on "a_partition2" USING btree ("parent_id", "tmstmp" DESC);

But this didn't help (it wasn't used).

I also found that removing the LIMIT makes an acceptably fast (though different) query plan which uses the parent_id instead instead of the tmstmp one:

EXPLAIN (ANALYZE, BUFFERS)
SELECT "a"."id"
FROM "a"
WHERE "a"."parent_id" IN (
    17942,32006,36733,9698,27948
)
ORDER BY "a"."tmstmp" DESC;

 Sort  (cost=17004.82..17041.29 rows=14591 width=12) (actual time=109.650..109.677 rows=127 loops=1)
   Sort Key: a_partition1.tmstmp DESC
   Sort Method: quicksort  Memory: 30kB
   Buffers: shared hit=60 read=142 written=12
   ->  Append  (cost=0.43..15995.64 rows=14591 width=12) (actual time=9.206..109.504 rows=127 loops=1)
         Buffers: shared hit=60 read=142 written=12
         ->  Index Scan using a_parent_id_idx1 on a_partition1  (cost=0.43..8301.03 rows=7425 width=12) (actual time=9.205..11.116 rows=1 loops=1)
               Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[]))
               Buffers: shared hit=10 read=6
         ->  Index Scan using a_parent_id_idx2 on a_partition2  (cost=0.43..1786.52 rows=1593 width=12) (actual time=7.116..76.000 rows=98 loops=1)
               Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[]))
               Buffers: shared hit=15 read=97 written=10
         ->  Index Scan using a_parent_id_idx3 on a_partition3  (cost=0.43..1397.74 rows=1498 width=12) (actual time=3.160..3.160 rows=0 loops=1)
               Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[]))
               Buffers: shared hit=10 read=5
         ->  Index Scan using a_parent_id_idx4 on a_partition4  (cost=0.43..1733.77 rows=1598 width=12) (actual time=1.975..16.960 rows=28 loops=1)
               Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[]))
               Buffers: shared hit=14 read=30 written=2
         ->  Index Scan using a_parent_id_idx5 on a_partition5  (cost=0.43..2776.58 rows=2477 width=12) (actual time=2.155..2.155 rows=0 loops=1)
               Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[]))
               Buffers: shared hit=11 read=4
 Planning time: 0.764 ms
 Execution time: 109.748 ms
(23 rows)


Eventually, I stumbled upon these links:

And from these, was able to partially resolve my issue by attaching an extra ORDER BY:

EXPLAIN (ANALYZE, BUFFERS)
SELECT "a"."id"
FROM "a" WHERE "a"."parent_id" IN (3909,26840,32181,22998,9632)
ORDER BY "a"."tmstmp" DESC,
         "a"."id" DESC
LIMIT 20;

 Limit  (cost=16383.91..16383.96 rows=20 width=12) (actual time=29.804..29.808 rows=6 loops=1)
   Buffers: shared hit=39 read=42
   ->  Sort  (cost=16383.91..16420.38 rows=14591 width=12) (actual time=29.803..29.804 rows=6 loops=1)
         Sort Key: a_partition1.tmstmp DESC, a_partition1.id DESC
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=39 read=42
         ->  Append  (cost=0.43..15995.64 rows=14591 width=12) (actual time=12.509..29.789 rows=6 loops=1)
               Buffers: shared hit=39 read=42
               ->  Index Scan using a_parent_id_idx1 on a_partition1  (cost=0.43..8301.03 rows=7425 width=12) (actual time=4.046..4.046 rows=0 loops=1)
                     Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[]))
                     Buffers: shared hit=7 read=8
               ->  Index Scan using a_parent_id_idx2 on a_partition2  (cost=0.43..1786.52 rows=1593 width=12) (actual time=5.447..5.447 rows=0 loops=1)
                     Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[]))
                     Buffers: shared hit=7 read=8
               ->  Index Scan using a_parent_id_idx3 on a_partition3  (cost=0.43..1397.74 rows=1498 width=12) (actual time=3.013..3.618 rows=1 loops=1)
                     Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[]))
                     Buffers: shared hit=9 read=7
               ->  Index Scan using a_parent_id_idx4 on a_partition4  (cost=0.43..1733.77 rows=1598 width=12) (actual time=7.337..7.337 rows=0 loops=1)
                     Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[]))
                     Buffers: shared hit=8 read=7
               ->  Index Scan using a_parent_id_idx5 on a_partition5  (cost=0.43..2776.58 rows=2477 width=12) (actual time=3.401..9.332 rows=5 loops=1)
                     Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[]))
                     Buffers: shared hit=8 read=12
 Planning time: 0.601 ms
 Execution time: 29.851 ms
(25 rows)

This query plan (which is the same as when LIMIT is removed) has been a good short term solution when the number of "parent_id"s I'm using is still relatively small, but unfortunately queries grow untenably slow as the number of "parent_id"s involved increases:

EXPLAIN (ANALYZE, BUFFERS)
SELECT "a"."id"
FROM "a"
WHERE "a"."parent_id" IN (
    49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236
)
ORDER BY "a"."tmstmp" DESC,
         "a"."id" DESC
LIMIT 20;

 Limit  (cost=641000.51..641000.56 rows=20 width=12) (actual time=80813.649..80813.661 rows=20 loops=1)
   Buffers: shared hit=11926 read=93553 dirtied=2063 written=27769
   ->  Sort  (cost=641000.51..642534.60 rows=613634 width=12) (actual time=80813.647..80813.652 rows=20 loops=1)
         Sort Key: a_partition1.tmstmp DESC, a_partition1.id DESC
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=11926 read=93553 dirtied=2063 written=27769
         ->  Append  (cost=0.43..624671.93 rows=613634 width=12) (actual time=2.244..80715.314 rows=104279 loops=1)
               Buffers: shared hit=11926 read=93553 dirtied=2063 written=27769
               ->  Index Scan using a_parent_id_idx1 on a_partition1  (cost=0.43..304861.89 rows=300407 width=12) (actual time=2.243..34766.236 rows=46309 loops=1)
                     Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[]))
                     Buffers: shared hit=9485 read=35740 dirtied=2033 written=12713
               ->  Index Scan using a_parent_id_idx2 on a_partition2  (cost=0.43..80641.75 rows=75349 width=12) (actual time=8.280..12640.675 rows=16628 loops=1)
                     Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[]))
                     Buffers: shared hit=558 read=16625 dirtied=8 written=6334
               ->  Index Scan using a_parent_id_idx3 on a_partition3  (cost=0.43..57551.91 rows=65008 width=12) (actual time=3.721..13759.664 rows=12973 loops=1)
                     Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[]))
                     Buffers: shared hit=592 read=12943 dirtied=2 written=3136
               ->  Index Scan using a_parent_id_idx4 on a_partition4  (cost=0.43..70062.42 rows=67402 width=12) (actual time=5.999..5949.033 rows=7476 loops=1)
                     Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[]))
                     Buffers: shared hit=449 read=7665 dirtied=10 written=1242
               ->  Index Scan using a_parent_id_idx5 on a_partition5  (cost=0.43..111553.96 rows=105468 width=12) (actual time=7.806..13519.564 rows=20893 loops=1)
                     Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[]))
                     Buffers: shared hit=842 read=20580 dirtied=10 written=4344
 Planning time: 8.366 ms
 Execution time: 80813.714 ms
(25 rows)


Though it still finishes, I'd really like to speed this up, especially because there are many situations where I will be using many more than the 200 "parent_id"s above.


I'd be very grateful for help with one or both of these questions:
1) Why is adding an unnecessary (from the perspective of result correctness) ORDER BY valuable for forcing the parent_id index usage, and does that indicate that there is something wrong with my table/indexes/statistics/etc.?
2) Is there any way I can improve my query time when there are many "parent_id"s involved? I seem to only be able to get the query plan to use at most one of the parent_id index and the tmstmp index at a time. Perhaps the correct multicolumn index would help?

A couple of notes (happy to be scolded for either of these things if they're problematic):
1) I've masked these queries somewhat for privacy reasons and cleaned up the table "a" of some extra fields and indexes.
2) I'm using different sets of "parent_id"s to try to overcome caching so the BUFFERS aspect can be as accurate as possible.

As for other maybe relevant information:

- version: 10.3
- hardware is AWS so probably not huge issue.
- work_mem is quite high (order of GBs)


Thanks in advance for your help!

Re: Improving Performance of Query ~ Filter by A, Sort by B

From
legrand legrand
Date:
Hello,

I have tested it with release 11 and limit 20 is pushed to each partition
when using index on tmstmp.

Could you tell us what is the result of your query applyed to one partition 

EXPLAIN ANALYZE
SELECT "a"."id"
FROM  a_partition1 "a"
WHERE "a"."parent_id" IN (
    34226,24506,40987,27162
)
ORDER BY "a"."tmstmp" DESC
LIMIT 20;

May be that limit 20 is not pushed to partitions in your version ?
Regards
PAscal





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html


Re: Improving Performance of Query ~ Filter by A, Sort by B

From
Lincoln Swaine-Moore
Date:
Thanks for looking into this!

Here's the result (I turned off the timeout and got it to finish):

EXPLAIN ANALYZE
SELECT "a"."id"
FROM  a_partition1 "a"
WHERE "a"."parent_id" IN (
    49188,14816,14758,8402
)
ORDER BY "a"."tmstmp" DESC
LIMIT 20;

                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..5710.03 rows=20 width=12) (actual time=1141878.105..1142350.296 rows=20 loops=1)
   ->  Index Scan Backward using a_tmstmp_idx1 on a_partition1 a  (cost=0.43..1662350.21 rows=5823 width=12) (actual time=1141878.103..1142350.274 rows=20 loops=1)
         Filter: (parent_id = ANY ('{49188,14816,14758,8402}'::integer[]))
         Rows Removed by Filter: 7931478
 Planning time: 0.122 ms
 Execution time: 1142350.336 ms
(6 rows)
(Note: I've chosen parent_ids that I know are associated with the part_key 1, but the query plan was the same with the 4 parent_ids in your query.)

Looks like it's using the filter in the same way as the query on the parent table, so seems be a problem beyond the partitioning.

And as soon as I cut it back to 3 parent_ids, jumps to a query plan using a_parent_id_idx1 again:

EXPLAIN ANALYZE
SELECT "a"."id"
FROM  a_partition1 "a"
WHERE "a"."parent_id" IN (
    19948,21436,41220
)
ORDER BY "a"."tmstmp" DESC
LIMIT 20;

                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5004.57..5004.62 rows=20 width=12) (actual time=36.329..36.341 rows=20 loops=1)
   ->  Sort  (cost=5004.57..5015.49 rows=4367 width=12) (actual time=36.328..36.332 rows=20 loops=1)
         Sort Key: tmstmp DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Index Scan using a_parent_id_idx1 on a_partition1 a  (cost=0.43..4888.37 rows=4367 width=12) (actual time=5.581..36.270 rows=50 loops=1)
               Index Cond: (parent_id = ANY ('{19948,21436,41220}'::integer[]))
 Planning time: 0.117 ms
 Execution time: 36.379 ms
(8 rows)


Thanks again for your help!




On Wed, Jul 11, 2018 at 5:41 PM, legrand legrand <legrand_legrand@hotmail.com> wrote:
Hello,

I have tested it with release 11 and limit 20 is pushed to each partition
when using index on tmstmp.

Could you tell us what is the result of your query applyed to one partition

EXPLAIN ANALYZE
SELECT "a"."id"
FROM  a_partition1 "a"
WHERE "a"."parent_id" IN (
    34226,24506,40987,27162
)
ORDER BY "a"."tmstmp" DESC
LIMIT 20;

May be that limit 20 is not pushed to partitions in your version ?
Regards
PAscal





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html




--
Lincoln Swaine-Moore

Re: Improving Performance of Query ~ Filter by A, Sort by B

From
Tom Lane
Date:
Lincoln Swaine-Moore <lswainemoore@gmail.com> writes:
> Here's the result (I turned off the timeout and got it to finish):
> ...

I think the core of the problem here is bad rowcount estimation.  We can't
tell from your output how many rows really match

> WHERE "a"."parent_id" IN (
>     49188,14816,14758,8402
> )

but the planner is guessing there are 5823 of them.  In the case with
only three IN items, we have

>          ->  Index Scan using a_parent_id_idx1 on a_partition1 a (cost=0.43..4888.37 rows=4367 width=12) (actual
time=5.581..36.270rows=50 loops=1) 
>                Index Cond: (parent_id = ANY ('{19948,21436,41220}'::integer[]))

so the planner thinks there are 4367 matching rows but there are only 50.
Anytime you've got a factor-of-100 estimation error, you're going to be
really lucky if you get a decent plan.

I suggest increasing the statistics target for the parent_id column
in hopes of getting better estimates for the number of matches.

            regards, tom lane


Re: Improving Performance of Query ~ Filter by A, Sort by B

From
Jeff Janes
Date:
On Tue, Jul 10, 2018 at 11:07 AM, Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:



Something about the estimated row counts (this problem persisted after I tried ANALYZEing)

What is your default_statistics_target?  What can you tell us about the distribution of parent_id?  (exponential, power law, etc?).  Can you show the results for select * from pg_stats where tablename='a' and attname='parent_id' \x\g\x  ?
 
forces usage of the tmstmp index and Merge Append (which seems wise) but also a filter condition on parent_id over an index condition, which is apparently prohibitively slow.

I tried creating a multicolumn index like:

CREATE INDEX "a_partition1_parent_and_tmstmp_idx" on "a_partition2" USING btree ("parent_id", "tmstmp" DESC);

But this didn't help (it wasn't used).

You could try reversing the order and adding a column to be (tmstmp, parent_id, id) and keeping the table well vacuumed.  This would allow the slow plan to still walk the indexes in tmstmp order but do it as an index-only scan, so it could omit the extra trip to the table. That trip to the table must be awfully slow to explain the numbers you show later in the thread.

...
 
This query plan (which is the same as when LIMIT is removed) has been a good short term solution when the number of "parent_id"s I'm using is still relatively small, but unfortunately queries grow untenably slow as the number of "parent_id"s involved increases:

What happens when you remove that extra order by phrase that you added?  The original slow plan should become much faster when the number of parent_ids is large (it has to dig through fewer index entries before accumulating 20 good ones), so you should try going back to that.

...


I'd be very grateful for help with one or both of these questions:
1) Why is adding an unnecessary (from the perspective of result correctness) ORDER BY valuable for forcing the parent_id index usage, and does that indicate that there is something wrong with my table/indexes/statistics/etc.?

It finds the indexes on tmstmp to be falsely attractive, as it can walk in tmstmp order and so avoid the sort. (Really not the sort itself, but the fact that sort has to first read every row to be sorted, while walking an index can abort once the LIMIT is satisfied).  Adding an extra phrase to the ORDER BY means the index is no longer capable of delivering rows in the needed order, so it no longer looks falsely attractive.  The same thing could be obtained by doing a dummy operation, such as ORDER BY tmstmp + '0 seconds' DESC.  I prefer that method, as it is more obviously a tuning trick.  Adding in "id" looks more like a legitimate desire to break any ties that might occasionally occur in tmstmp.

As Tom pointed out, there clearly is something wrong with your statistics, although we don't know what is causing it to go wrong.  Fixing the statistics isn't guaranteed to fix the problem, but it would be a good start.

 
 
2) Is there any way I can improve my query time when there are many "parent_id"s involved? I seem to only be able to get the query plan to use at most one of the parent_id index and the tmstmp index at a time. Perhaps the correct multicolumn index would help?

A few things mentioned above might help.  

But if they don't, is there any chance you could redesign your partitioning so that all parent_id queries together will always be in the same partition?  And if not, could you just get rid of the partitioning altogether?  1e7 row is not all that many and doesn't generally need partitioning.  Unless it is serving a specific purpose, it is probably costing you more than you are getting.  

Finally, could you rewrite it as a join to a VALUES list, rather than as an in-list?
 
Cheers,

Jeff

Re: Improving Performance of Query ~ Filter by A, Sort by B

From
Lincoln Swaine-Moore
Date:
Tom and Jeff,

Thanks very much for the suggestions!

Here's what I've found so far after playing around for a few more days:

What is your default_statistics_target?  What can you tell us about the distribution of parent_id?  (exponential, power law, etc?).  Can you show the results for select * from pg_stats where tablename='a' and attname='parent_id' \x\g\x  ?

The default_statistics_target is 500, which I agree seems quite insufficient for these purposes. I bumped this up to 2000, and saw some improvement in the row count estimation, but pretty much the same query plans. Unfortunately the distribution of counts is not intended to be correlated to parent_id, which is one reason I imagine the histograms might not be particularly effective unless theres one bucket for every value. Here is the output you requested:

select * from pg_stats where tablename='a' and attname='parent_id';

schemaname             | public
tablename              | a
attname                | parent_id
inherited              | t
null_frac              | 0
avg_width              | 4
n_distinct             | 18871
most_common_vals       | {15503,49787,49786,24595,49784,17549, ...} (2000 values)
most_common_freqs      | {0.0252983,0.02435,0.0241317,0.02329,0.019095,0.0103967,0.00758833,0.004245, ...} (2000 values)
histogram_bounds       | {2,12,17,24,28,36,47,59,74,80,86,98,108,121,135,141,147,160,169,177,190,204, ...} (2001 values)
correlation            | -0.161576
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |


Interestingly, the number of elements in these most_common_vals is as expected (2000) for the parent table, but it's lower for the partition tables, despite the statistics level being the same.

SELECT attstattarget
FROM   pg_attribute
WHERE  attrelid in ('a_partition1'::regclass, 'a'::regclass)
AND    attname = 'parent_id';
-[ RECORD 1 ]-+-----
attstattarget | 2000
-[ RECORD 2 ]-+-----
attstattarget | 2000


select * from pg_stats where tablename='a_partition1' and attname='parent_id';

schemaname             | public
tablename              | a_partition1
attname                | parent_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 3969
most_common_vals       | {15503,49787,49786,24595,49784,10451,20136,17604,9683, ...} (400-ish values)
most_common_freqs      | {0.0807067,0.0769483,0.0749433,0.073565,0.0606433,0.0127917,0.011265,0.0112367, ...} (400-ish values)
histogram_bounds       | {5,24,27,27,33,38,41,69,74,74, ...} (1500-ish values)
correlation            | 0.402414
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |

A few questions re: statistics:
 1) would it be helpful to bump column statistics to, say, 20k (the number of distinct values of parent_id)?
 2) is the discrepancy between the statistics on the parent and child table be expected? certainly I would think that the statistics would be different, but I would've imagined they would have histograms of the same size given the settings being the same.
 3) is there a way to manually specify the the distribution of rows to be even? that is, set the frequency of each value to be ~ n_rows/n_distinct. This isn't quite accurate, but is a reasonable assumption about the distribution, and might generate better query plans.

The same thing could be obtained by doing a dummy operation, such as ORDER BY tmstmp + '0 seconds' DESC.  I prefer that method, as it is more obviously a tuning trick.  Adding in "id" looks more like a legitimate desire to break any ties that might occasionally occur in tmstmp.

I 100% agree that that is more clear. Thanks for the suggestion!

Finally, could you rewrite it as a join to a VALUES list, rather than as an in-list?

I should've mentioned this in my initial post, but I tried doing this without much success.

You could try reversing the order and adding a column to be (tmstmp, parent_id, id) and keeping the table well vacuumed.  This would allow the slow plan to still walk the indexes in tmstmp order but do it as an index-only scan, so it could omit the extra trip to the table. That trip to the table must be awfully slow to explain the numbers you show later in the thread.
 
Just to clarify, do you mean building indexes like:
CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on "a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
That seems promising! Is the intuition here that we want the first key of the index to be the one we are ultimately ordering by? Sounds like I make have had that flipped initially. My understanding of this whole situation (and please do correct me if this doesn't make sense) is the big bottleneck here is reading pages from disk (when looking at stopped up queries, the wait_event is DataFileRead), and so anything that can be done to minimize the pages read will be valuable. Which is why I would love to get the query plan to use the tmstmp index without having to filter thereafter by parent_id.

What happens when you remove that extra order by phrase that you added?  The original slow plan should become much faster when the number of parent_ids is large (it has to dig through fewer index entries before accumulating 20 good ones), so you should try going back to that.

Unfortunately, I've found that even when the number of parent_ids is large (2000), it's still prohibitively slow (haven't got it to finish), and maintains a query plan that involves an Index Scan Backward across the a_tmstmp_idxs (with a filter for parent_id).

And if not, could you just get rid of the partitioning altogether?  1e7 row is not all that many and doesn't generally need partitioning.  Unless it is serving a specific purpose, it is probably costing you more than you are getting.
 
Unfortunately, the partitioning is serving a specific purpose (to avoid some writing overhead, it's useful to be able to drop GIN indexes on one partition at a time during heavy writes). But given that the queries are slow on a single partition anyway, I suspect the partitioning isn't the main problem here.

But if they don't, is there any chance you could redesign your partitioning so that all parent_id queries together will always be in the same partition?
 
In effect, the parent_ids are distributed by part_key, so I've found that at least a stopgap solution is manually splitting apart the parent_ids by partition when generating the query. This has given the following query plan, which still doesn't use the tmstmp index (which would be ideal and to my understanding allow proper use of the LIMIT to minimize reads), but it does make things an order of magnitude faster by forcing the use of Bitmap. Apologies for the giant output; this approach balloons the size of the query itself.

SELECT "a"."id"
FROM "a" WHERE (
    ("a"."parent_id" IN (
      14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100
    )
    AND
    "a"."part_key" = 1
  )
  OR (
    "a"."parent_id" IN (
      50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348
    )
    AND
    "a"."part_key" = 2
  )
  OR (
    "a"."parent_id" IN (
      33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156
    )
    AND
    "a"."part_key" = 3
  )
  OR (
    "a"."parent_id" IN (
      42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071
    )
    AND
    "a"."part_key" = 4
  )
  OR (
    "a"."parent_id" IN (
      11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765
    )
    AND
    "a"."part_key" = 5
  )
)
ORDER BY
"a"."tmstmp" DESC, "a"."id" DESC
LIMIT 20;

Limit  (cost=449977.86..449977.91 rows=20 width=1412) (actual time=8967.465..8967.477 rows=20 loops=1)
  Output: a_partition1.id, a_partition1.tmstmp
  Buffers: shared hit=1641 read=125625 written=13428
  ->  Sort  (cost=449977.86..450397.07 rows=167683 width=1412) (actual time=8967.464..8967.468 rows=20 loops=1)
        Output: a_partition1.id, a_partition1.tmstmp
        Sort Key: a_partition1.tmstmp DESC, a_partition1.id DESC
        Sort Method: top-N heapsort  Memory: 85kB
        Buffers: shared hit=1641 read=125625 written=13428
        ->  Append  (cost=2534.33..445515.88 rows=167683 width=1412) (actual time=1231.579..8756.610 rows=145077 loops=1)
              Buffers: shared hit=1641 read=125625 written=13428
              ->  Bitmap Heap Scan on public.a_partition1  (cost=2534.33..246197.54 rows=126041 width=1393) (actual time=1231.578..4414.027 rows=115364 loops=1)
                    Output: a_partition1.id, a_partition1.tmstmp
                    Recheck Cond: ((a_partition1.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) OR (a_partition1.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) OR (a_partition1.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) OR (a_partition1.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) OR (a_partition1.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
                    Filter: (((a_partition1.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) AND (a_partition1.part_key = 1)) OR ((a_partition1.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) AND (a_partition1.part_key = 2)) OR ((a_partition1.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) AND (a_partition1.part_key = 3)) OR ((a_partition1.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) AND (a_partition1.part_key = 4)) OR ((a_partition1.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])) AND (a_partition1.part_key = 5)))
                    Heap Blocks: exact=93942
                    Buffers: shared hit=397 read=94547 written=6930
                    ->  BitmapOr  (cost=2534.33..2534.33 rows=192032 width=0) (actual time=1214.569..1214.569 rows=0 loops=1)
                          Buffers: shared hit=397 read=605 written=10
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..1479.43 rows=126041 width=0) (actual time=1091.952..1091.952 rows=115364 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
                                Buffers: shared hit=82 read=460 written=8
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..193.55 rows=14233 width=0) (actual time=26.911..26.911 rows=0 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
                                Buffers: shared hit=66 read=33
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..275.65 rows=20271 width=0) (actual time=41.874..41.874 rows=0 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
                                Buffers: shared hit=97 read=45 written=1
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..152.49 rows=11214 width=0) (actual time=23.542..23.542 rows=0 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
                                Buffers: shared hit=52 read=26 written=1
                          ->  Bitmap Index Scan on a_parent_id_idx1  (cost=0.00..275.65 rows=20271 width=0) (actual time=30.271..30.271 rows=0 loops=1)
                                Index Cond: (a_partition1.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
                                Buffers: shared hit=100 read=41
              ->  Bitmap Heap Scan on public.a_partition2  (cost=850.51..78634.20 rows=19852 width=1485) (actual time=316.458..2166.105 rows=16908 loops=1)
                    Output: a_partition2.id, a_partition1.tmstmp
                    Recheck Cond: ((a_partition2.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) OR (a_partition2.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) OR (a_partition2.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) OR (a_partition2.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) OR (a_partition2.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
                    Filter: (((a_partition2.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) AND (a_partition2.part_key = 1)) OR ((a_partition2.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) AND (a_partition2.part_key = 2)) OR ((a_partition2.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) AND (a_partition2.part_key = 3)) OR ((a_partition2.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) AND (a_partition2.part_key = 4)) OR ((a_partition2.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])) AND (a_partition2.part_key = 5)))
                    Heap Blocks: exact=17769
                    Buffers: shared hit=402 read=18034 written=3804
                    ->  BitmapOr  (cost=850.51..850.51 rows=59567 width=0) (actual time=313.191..313.191 rows=0 loops=1)
                          Buffers: shared hit=402 read=265 written=40
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..155.81 rows=11177 width=0) (actual time=65.671..65.671 rows=0 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
                                Buffers: shared hit=84 read=57 written=11
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..272.08 rows=19852 width=0) (actual time=116.974..116.974 rows=18267 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
                                Buffers: shared hit=68 read=98 written=18
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..155.81 rows=11177 width=0) (actual time=58.915..58.915 rows=0 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
                                Buffers: shared hit=100 read=41 written=5
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..86.19 rows=6183 width=0) (actual time=25.370..25.370 rows=0 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
                                Buffers: shared hit=53 read=25 written=3
                          ->  Bitmap Index Scan on a_parent_id_idx2  (cost=0.00..155.81 rows=11177 width=0) (actual time=46.254..46.254 rows=0 loops=1)
                                Index Cond: (a_partition2.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
                                Buffers: shared hit=97 read=44 written=3
              ->  Bitmap Heap Scan on public.a_partition3  (cost=692.99..56079.33 rows=13517 width=1467) (actual time=766.172..1555.761 rows=7594 loops=1)
                    Output: a_partition3.id, a_partition1.tmstmp
                    Recheck Cond: ((a_partition3.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) OR (a_partition3.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) OR (a_partition3.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) OR (a_partition3.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) OR (a_partition3.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
                    Filter: (((a_partition3.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) AND (a_partition3.part_key = 1)) OR ((a_partition3.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) AND (a_partition3.part_key = 2)) OR ((a_partition3.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) AND (a_partition3.part_key = 3)) OR ((a_partition3.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) AND (a_partition3.part_key = 4)) OR ((a_partition3.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])) AND (a_partition3.part_key = 5)))
                    Heap Blocks: exact=7472
                    Buffers: shared hit=432 read=7682 written=1576
                    ->  BitmapOr  (cost=692.99..692.99 rows=42391 width=0) (actual time=764.238..764.238 rows=0 loops=1)
                          Buffers: shared hit=432 read=210 written=51
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..138.53 rows=8870 width=0) (actual time=118.907..118.907 rows=0 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
                                Buffers: shared hit=91 read=52 written=7
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..97.27 rows=6228 width=0) (actual time=26.656..26.656 rows=0 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
                                Buffers: shared hit=71 read=28 written=3
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..225.13 rows=13517 width=0) (actual time=393.115..393.115 rows=7594 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
                                Buffers: shared hit=107 read=74 written=25
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..76.64 rows=4907 width=0) (actual time=36.979..36.979 rows=0 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
                                Buffers: shared hit=56 read=22 written=9
                          ->  Bitmap Index Scan on a_parent_id_idx3  (cost=0.00..138.53 rows=8870 width=0) (actual time=188.574..188.574 rows=0 loops=1)
                                Index Cond: (a_partition3.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
                                Buffers: shared hit=107 read=34 written=7
              ->  Bitmap Heap Scan on public.a_partition4  (cost=709.71..64604.81 rows=8273 width=1470) (actual time=268.111..543.570 rows=5211 loops=1)
                    Output: a_partition4.id, a_partition1.tmstmp
                    Recheck Cond: ((a_partition4.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) OR (a_partition4.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) OR (a_partition4.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) OR (a_partition4.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) OR (a_partition4.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])))
                    Filter: (((a_partition4.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[])) AND (a_partition4.part_key = 1)) OR ((a_partition4.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[])) AND (a_partition4.part_key = 2)) OR ((a_partition4.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[])) AND (a_partition4.part_key = 3)) OR ((a_partition4.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[])) AND (a_partition4.part_key = 4)) OR ((a_partition4.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[])) AND (a_partition4.part_key = 5)))
                    Heap Blocks: exact=5153
                    Buffers: shared hit=410 read=5362 written=1118
                    ->  BitmapOr  (cost=709.71..709.71 rows=48654 width=0) (actual time=267.028..267.028 rows=0 loops=1)
                          Buffers: shared hit=410 read=209 written=50
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..153.69 rows=10908 width=0) (actual time=60.586..60.586 rows=0 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{14467,30724,3976,13323,23971,44688,1938,275,19931,26527,42529,19491,11428,39110,9260,22446,44253,6705,15033,6461,34878,26687,45248,12739,24518,12359,12745,33738,32590,11856,14802,10451,16697,18904,41273,3547,12637,20190,36575,16096,19937,39652,2278,5616,5235,8570,15100}'::integer[]))
                                Buffers: shared hit=90 read=52 written=11
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..107.91 rows=7659 width=0) (actual time=47.041..47.041 rows=0 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{50309,37126,36876,41878,9112,43673,10782,1696,11042,11168,34632,31157,7156,49213,29504,7200,38594,38598,5064,8783,27862,38201,473,19687,8296,49641,28394,29803,31597,19313,33395,32244,4348}'::integer[]))
                                Buffers: shared hit=64 read=35 written=7
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..153.69 rows=10908 width=0) (actual time=54.352..54.352 rows=0 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{33024,9092,28678,29449,15757,36366,39963,46737,17156,39226,25628,14237,13125,17569,10914,39075,49734,40999,41756,8751,45490,29365,9143,5050,2463,35267,1220,12869,20294,24776,16329,5578,46605,30545,3544,37341,37086,28383,42527,45027,44292,35849,46314,2540,19696,21876,49156}'::integer[]))
                                Buffers: shared hit=101 read=40 written=8
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..130.39 rows=8273 width=0) (actual time=54.690..54.690 rows=5220 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{42242,50388,21916,13987,28708,4136,24617,31789,36533,28854,24247,15455,20805,38728,8908,20563,13908,20438,21087,4329,1131,46837,6505,16724,39675,35071}'::integer[]))
                                Buffers: shared hit=56 read=40 written=13
                          ->  Bitmap Index Scan on a_parent_id_idx4  (cost=0.00..153.69 rows=10908 width=0) (actual time=50.353..50.353 rows=0 loops=1)
                                Index Cond: (a_partition4.parent_id = ANY ('{11522,8964,47879,10380,46970,11278,6543,9489,27283,30958,40214,35076,21023,19746,11044,45605,41259,35245,36911,7344,20276,35126,3257,49134,1091,24388,5447,35017,4042,7627,42061,5582,30419,7508,19278,34778,38394,34153,20714,10860,7917,21614,10223,50801,28629,6782,7765}'::integer[]))
                                Buffers: shared hit=99 read=42 written=11
Planning time: 8.002 ms
Execution time: 8967.750 ms

When I remove the ID sorting hack from this query, it goes back to a nasty index scan on tmstmp with a filter key on the whole WHERE clause.

Thanks again for your help!


On Sat, Jul 14, 2018 at 11:25 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 10, 2018 at 11:07 AM, Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:



Something about the estimated row counts (this problem persisted after I tried ANALYZEing)

What is your default_statistics_target?  What can you tell us about the distribution of parent_id?  (exponential, power law, etc?).  Can you show the results for select * from pg_stats where tablename='a' and attname='parent_id' \x\g\x  ?
 
forces usage of the tmstmp index and Merge Append (which seems wise) but also a filter condition on parent_id over an index condition, which is apparently prohibitively slow.

I tried creating a multicolumn index like:

CREATE INDEX "a_partition1_parent_and_tmstmp_idx" on "a_partition2" USING btree ("parent_id", "tmstmp" DESC);

But this didn't help (it wasn't used).

You could try reversing the order and adding a column to be (tmstmp, parent_id, id) and keeping the table well vacuumed.  This would allow the slow plan to still walk the indexes in tmstmp order but do it as an index-only scan, so it could omit the extra trip to the table. That trip to the table must be awfully slow to explain the numbers you show later in the thread.

...
 
This query plan (which is the same as when LIMIT is removed) has been a good short term solution when the number of "parent_id"s I'm using is still relatively small, but unfortunately queries grow untenably slow as the number of "parent_id"s involved increases:

What happens when you remove that extra order by phrase that you added?  The original slow plan should become much faster when the number of parent_ids is large (it has to dig through fewer index entries before accumulating 20 good ones), so you should try going back to that.

...


I'd be very grateful for help with one or both of these questions:
1) Why is adding an unnecessary (from the perspective of result correctness) ORDER BY valuable for forcing the parent_id index usage, and does that indicate that there is something wrong with my table/indexes/statistics/etc.?

It finds the indexes on tmstmp to be falsely attractive, as it can walk in tmstmp order and so avoid the sort. (Really not the sort itself, but the fact that sort has to first read every row to be sorted, while walking an index can abort once the LIMIT is satisfied).  Adding an extra phrase to the ORDER BY means the index is no longer capable of delivering rows in the needed order, so it no longer looks falsely attractive.  The same thing could be obtained by doing a dummy operation, such as ORDER BY tmstmp + '0 seconds' DESC.  I prefer that method, as it is more obviously a tuning trick.  Adding in "id" looks more like a legitimate desire to break any ties that might occasionally occur in tmstmp.

As Tom pointed out, there clearly is something wrong with your statistics, although we don't know what is causing it to go wrong.  Fixing the statistics isn't guaranteed to fix the problem, but it would be a good start.

 
 
2) Is there any way I can improve my query time when there are many "parent_id"s involved? I seem to only be able to get the query plan to use at most one of the parent_id index and the tmstmp index at a time. Perhaps the correct multicolumn index would help?

A few things mentioned above might help.  

But if they don't, is there any chance you could redesign your partitioning so that all parent_id queries together will always be in the same partition?  And if not, could you just get rid of the partitioning altogether?  1e7 row is not all that many and doesn't generally need partitioning.  Unless it is serving a specific purpose, it is probably costing you more than you are getting.  

Finally, could you rewrite it as a join to a VALUES list, rather than as an in-list?
 
Cheers,

Jeff



--
Lincoln Swaine-Moore

Re: Improving Performance of Query ~ Filter by A, Sort by B

From
Jeff Janes
Date:
On Mon, Jul 16, 2018 at 5:29 PM, Lincoln Swaine-Moore <lswainemoore@gmail.com> wrote:
Tom and Jeff,

Thanks very much for the suggestions!

Here's what I've found so far after playing around for a few more days:

What is your default_statistics_target?  What can you tell us about the distribution of parent_id?  (exponential, power law, etc?).  Can you show the results for select * from pg_stats where tablename='a' and attname='parent_id' \x\g\x  ?

The default_statistics_target is 500, which I agree seems quite insufficient for these purposes. I bumped this up to 2000, and saw some improvement in the row count estimation, but pretty much the same query plans. Unfortunately the distribution of counts is not intended to be correlated to parent_id, which is one reason I imagine the histograms might not be particularly effective unless theres one bucket for every value. Here is the output you requested:

select * from pg_stats where tablename='a' and attname='parent_id';

schemaname             | public
tablename              | a
attname                | parent_id
inherited              | t
null_frac              | 0
avg_width              | 4
n_distinct             | 18871
most_common_vals       | {15503,49787,49786,24595,49784,17549, ...} (2000 values)
most_common_freqs      | {0.0252983,0.02435,0.0241317,0.02329,0.019095,0.0103967,0.00758833,0.004245, ...} (2000 values)

You showed the 8 most common frequencies.  But could you also show the last couple of them?  When your queried parent_id value is not on the MCV list, it is the frequency of the least frequent one on the list which puts an upper limit on how frequent the one you queried for can be.



A few questions re: statistics:
 1) would it be helpful to bump column statistics to, say, 20k (the number of distinct values of parent_id)?

Only one way to find out...
However you can only go up to 10k, not 20k.

 
 2) is the discrepancy between the statistics on the parent and child table be expected? certainly I would think that the statistics would be different, but I would've imagined they would have histograms of the same size given the settings being the same.

Is the n_distinct estimate accurate for the partition?  There is an algorithm (which will change in v11) to stop the MCV from filling the entire statistics target size if it thinks adding more won't be useful.  But I don't know why the histogram boundary list would be short.  But, I doubt that that is very important here.  The histogram is only used for inequality/range, not for equality/set membership.

 
 3) is there a way to manually specify the the distribution of rows to be even? that is, set the frequency of each value to be ~ n_rows/n_distinct. This isn't quite accurate, but is a reasonable assumption about the distribution, and might generate better query plans.


This would be going in the wrong direction.  Your queries seem to preferentially use rare parent_ids, not typical parent_ids.  In fact, it seems like many of your hard-coded parent_ids don't exist in the table at all.  That certainly isn't going to help the planner any.  Could you somehow remove those before constructing the query?  

You might also take a step back, where is that list of parent_ids coming from in the first place, and why couldn't you convert the list of literals into a query that returns that list naturally?


You could try reversing the order and adding a column to be (tmstmp, parent_id, id) and keeping the table well vacuumed.  This would allow the slow plan to still walk the indexes in tmstmp order but do it as an index-only scan, so it could omit the extra trip to the table. That trip to the table must be awfully slow to explain the numbers you show later in the thread.
 
Just to clarify, do you mean building indexes like:
CREATE INDEX "a_tmstmp_parent_id_id_idx_[PART_KEY]" on "a_partition[PART_KEY]" USING btree("tmstmp", "parent_id", "id")
That seems promising! Is the intuition here that we want the first key of the index to be the one we are ultimately ordering by? Sounds like I make have had that flipped initially. My understanding of this whole situation (and please do correct me if this doesn't make sense) is the big bottleneck here is reading pages from disk (when looking at stopped up queries, the wait_event is DataFileRead), and so anything that can be done to minimize the pages read will be valuable. Which is why I would love to get the query plan to use the tmstmp index without having to filter thereafter by parent_id.

Yes, that is the index.  

You really want it to filter by parent_id in the index, rather than going to the table to do the filter on parent_id.  The index pages with tmstmp as the leading column are going to be more tightly packed with potentially relevant rows, while the table pages are less likely to be densely packed.  So filtering in the index leads to less IO.  Currently, the only way to make that in-index filtering happen is with an index-only scan.  So the tmstmp needs to be first, and other two need to be present in either order.  Note that if you change select list of your query to select more than just "id", you will again defeat the index-only-scan.  
 

What happens when you remove that extra order by phrase that you added?  The original slow plan should become much faster when the number of parent_ids is large (it has to dig through fewer index entries before accumulating 20 good ones), so you should try going back to that.

Unfortunately, I've found that even when the number of parent_ids is large (2000), it's still prohibitively slow (haven't got it to finish), and maintains a query plan that involves an Index Scan Backward across the a_tmstmp_idxs (with a filter for parent_id).


I think the reason for this is that your 2000 literal parent_id values are preferentially rare or entirely absent from the table.  Therefore adding more parent_ids doesn't cause more rows to meet the filter qualification, so you don't accumulate a LIMIT worth of rows very fast.


The original index you discussed, '(parent_id, tmpstmp desc)', would be ideal if the parent_id were tested for equality rather than set membership.  So another approach to speeding this up would be to rewrite the query to test for equality on each parent_id separately and then combine the results.

select id from (
   (select * from a where parent_id= 34226 order by tmstmp desc limit 20) 
      union all
   (select * from a where parent_id= 24506 order by tmstmp desc limit 20)
      union all
   (select * from a where parent_id= 40986 order by tmstmp desc limit 20) 
      union all
   (select * from a where parent_id= 27162 order by tmstmp desc limit 20) 
) foo
ORDER BY tmstmp DESC LIMIT 20;
   
I don't know what is going to happen when you go from 4 parent_ids up to 2000, though.

It would be nice if PostgreSQL would plan your original query this way itself, but it lacks the machinery to do so.  I think there was a patch to add that machinery, but I don't remember seeing any work on it recently.

Cheers,

Jeff