Re: Optimizer misses big in 10.4 with BRIN index - Mailing list pgsql-hackers
From: Arcadiy Ivanov
Subject: Re: Optimizer misses big in 10.4 with BRIN index
Msg-id: af24721c-8ad4-8edd-fd25-24818684b669@gmail.com
In response to: Re: Optimizer misses big in 10.4 with BRIN index (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: Optimizer misses big in 10.4 with BRIN index
List: pgsql-hackers
On 07/26/2018 07:27 AM, Tomas Vondra wrote:
> Arcadiy, can you provide plans with parallel query disabled? Or even
> better, produce a test case that reproduces this (using synthetic
> data, anonymized data or something like that, if needed).

So I basically spent most of the time trying to create a reproducible case, and I have to say I failed. I can, however, reproduce this with a specific large data set with a specific data distribution, but not with an artificial one. Unfortunately the data is restricted, so I have to obfuscate the data structures. That said, I am able to reproduce this sporadically on a local machine after exporting from RDS 10.4. On some days reproduction happens every time; on others it's hard to reproduce at all.

So here it goes, in sequential order:

** Data is imported into the database
=======================
db0=# \d+ schema0.table0
                                   Table "schema0.table0"
   Column   |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 field1     | character varying(128) |           | not null |         | extended |              |
 field2     | character varying(128) |           | not null |         | extended |              |
 field3     | character varying(128) |           | not null |         | extended |              |
 field4     | bigint                 |           | not null |         | plain    |              |
 field5     | bigint                 |           | not null |         | plain    |              |
 field6     | bigint                 |           | not null |         | plain    |              |
 data       | jsonb                  |           | not null |         | extended |              |
Indexes:
    "date_idx" brin (((data ->> 'date'::text)::numeric)) WITH (autosummarize='true')
Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=5000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=5000, fillfactor=75

** Data distribution in the table
=======================
db0=# SELECT (data->>'date')::numeric as date, count(1) FROM schema0.table0 GROUP BY date ORDER BY date;
    date    |  count
------------+---------
 1527580800 |       5
 1527753600 |      72
 1527840000 |      36
 1528012800 |      18
 1528099200 |      72
 1528185600 |    7266
 1528272000 |     336
 1528358400 |     230
 1528444800 |      90
 1528704000 |      76
 1528790400 |       4
 1528876800 |    9060
 1528934400 |       6
 1528963200 |      33
 1529193600 |       6
 1529280000 |      18
 1529294400 |  541344
 1529380800 | 1113121
 1529467200 |  794082
 1529553600 |  604752
 1529812800 |  135252
 1529899200 |   63222
 1529985600 |   31134
 1530072000 |   25392
 1530158400 |      90
 1530417600 |      48
 1530504000 |     144
 1530518400 |       1
 1530590400 |    1958
 1530604800 |       1
 1530763200 |   48614
 1530849600 |       9
 1531022400 |       3
 1531108800 |       3
 1531195200 |       1
 *** This is the range we're interested in
 1531281600 |    3713   *******
 1531627200 |     300
 1531713600 |    1113
 1531800000 |      30
 1531886400 |   36000
 1532232000 |   24900
 1532577600 |  409364
 1532664000 |    2050
 1532836800 |  334782
 1532923200 |    6771
(45 rows)
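** Aside, for anyone who wants to attempt a synthetic reproduction: the sketch below is roughly the shape of what I tried (schema as above; the generator is a made-up stand-in for the real data, and it did NOT reproduce the problem for me, so treat it as a starting point only).
=======================
-- Hypothetical generator: ~45 distinct epoch dates assigned at random, so it
-- only loosely mimics the real (heavily skewed, partially ordered) data.
CREATE SCHEMA IF NOT EXISTS schema0;

CREATE TABLE schema0.table0 (
    field1 character varying(128) NOT NULL,
    field2 character varying(128) NOT NULL,
    field3 character varying(128) NOT NULL,
    field4 bigint NOT NULL,
    field5 bigint NOT NULL,
    field6 bigint NOT NULL,
    data   jsonb NOT NULL
) WITH (fillfactor = 75);

INSERT INTO schema0.table0
SELECT 'f1', 'f2', 'f3', i, i, i,
       jsonb_build_object('date', 1527580800 + 86400 * floor(random() * 45)::int)
FROM generate_series(1, 4195522) AS i;

CREATE INDEX date_idx ON schema0.table0
    USING brin (((data ->> 'date')::numeric)) WITH (autosummarize = 'true');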
** No analysis was run after import. Let's see how we do.
=======================
db0=# EXPLAIN (analyze, verbose, costs, buffers) SELECT ctid, (data->>'date')::numeric as date FROM schema0.table0 WHERE (data->>'date')::numeric >= '1531267200' AND (data->>'date')::numeric <= '1531353600' ORDER BY ctid, (data->>'date')::numeric DESC LIMIT 1000;
                                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1102962.86..1103079.54 rows=1000 width=38) (actual time=7234.091..7234.697 rows=1000 loops=1)
   Output: ctid, (((data ->> 'date'::text))::numeric)
   Buffers: shared hit=9 read=350293
   ->  Gather Merge  (cost=1102962.86..1105002.57 rows=17482 width=38) (actual time=7234.089..7234.589 rows=1000 loops=1)
         Output: ctid, (((data ->> 'date'::text))::numeric)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=9 read=350293
         ->  Sort  (cost=1101962.84..1101984.69 rows=8741 width=38) (actual time=7227.938..7227.985 rows=964 loops=3)
               Output: ctid, (((data ->> 'date'::text))::numeric)
               Sort Key: table0.ctid, (((table0.data ->> 'date'::text))::numeric) DESC
               Sort Method: quicksort  Memory: 102kB
               Buffers: shared hit=144 read=1048851
               Worker 0: actual time=7223.770..7223.820 rows=1272 loops=1
                 Buffers: shared hit=68 read=348293
               Worker 1: actual time=7226.520..7226.589 rows=1299 loops=1
                 Buffers: shared hit=67 read=350265
               ->  Parallel Seq Scan on schema0.table0  (cost=0.00..1101390.58 rows=8741 width=38) (actual time=68.862..7227.366 rows=1238 loops=3)
                     Output: ctid, ((data ->> 'date'::text))::numeric
                     Filter: ((((table0.data ->> 'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->> 'date'::text))::numeric <= '1531353600'::numeric))
                     Rows Removed by Filter: 1397270
                     Buffers: shared hit=32 read=1048849
                     Worker 0: actual time=50.494..7223.107 rows=1272 loops=1
                       Buffers: shared hit=11 read=348293
                     Worker 1: actual time=53.408..7225.849 rows=1299 loops=1
                       Buffers: shared hit=12 read=350263
 Planning time: 2.096 ms
 Execution time: 7235.867 ms
(28 rows)

** Ok, no statistics, Parallel Seq Scan is selected, that's expected.
=======================
db0=# VACUUM ANALYZE VERBOSE schema0.table0;
INFO:  vacuuming "schema0.table0"
INFO:  index "date_idx" now contains 8194 row versions in 45 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "table0": found 0 removable, 4195522 nonremovable row versions in 1048881 out of 1048881 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1243
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 2.36 s, system: 4.84 s, elapsed: 7.48 s.
INFO:  vacuuming "pg_toast.pg_toast_7888880"
INFO:  index "pg_toast_7888880_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_toast_7888880": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1243
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  analyzing "schema0.table0"
INFO:  "table0": scanned 30000 of 1048881 pages, containing 120000 live rows and 0 dead rows; 30000 rows in sample, 4195524 estimated total rows
VACUUM
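** Aside: at this point the actual min/max stored for each BRIN range can be dumped with pageinspect (a sketch, assuming the extension is available; in a BRIN index page 0 is the metapage and page 1 the first revmap page, so page 2 is typically the first page of range summaries).
=======================
CREATE EXTENSION IF NOT EXISTS pageinspect;
-- One row per summarized range on that page: blknum is the first heap block
-- of the range, value the {min .. max} of the indexed expression.
SELECT itemoffset, blknum, value
  FROM brin_page_items(get_raw_page('schema0.date_idx', 2), 'schema0.date_idx');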
** Statistics are collected. Let's explain again.
=======================
db0=# EXPLAIN (analyze, verbose, costs, buffers) SELECT ctid, (data->>'date')::numeric as date FROM schema0.table0 WHERE (data->>'date')::numeric >= '1531267200' AND (data->>'date')::numeric <= '1531353600' ORDER BY ctid, (data->>'date')::numeric DESC LIMIT 1000;
                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2393.45..2394.66 rows=482 width=38) (actual time=7498.674..7498.791 rows=1000 loops=1)
   Output: ctid, (((data ->> 'date'::text))::numeric)
   Buffers: shared hit=74 read=411764
   ->  Sort  (cost=2393.45..2394.66 rows=482 width=38) (actual time=7498.672..7498.731 rows=1000 loops=1)
         Output: ctid, (((data ->> 'date'::text))::numeric)
         Sort Key: table0.ctid, (((table0.data ->> 'date'::text))::numeric) DESC
         Sort Method: top-N heapsort  Memory: 95kB
         Buffers: shared hit=74 read=411764
         ->  Bitmap Heap Scan on schema0.table0  (cost=162.16..2371.97 rows=482 width=38) (actual time=58.322..7497.656 rows=3713 loops=1)
               Output: ctid, ((data ->> 'date'::text))::numeric
               Recheck Cond: ((((table0.data ->> 'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->> 'date'::text))::numeric <= '1531353600'::numeric))
               Rows Removed by Index Recheck: 1643390
               Heap Blocks: lossy=411776
               Buffers: shared hit=74 read=411764
               ->  Bitmap Index Scan on date_idx  (cost=0.00..162.03 rows=557 width=0) (actual time=40.243..40.243 rows=4117760 loops=1)
                     Index Cond: ((((table0.data ->> 'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->> 'date'::text))::numeric <= '1531353600'::numeric))
                     Buffers: shared hit=62
 Planning time: 1.388 ms
 Execution time: 7498.907 ms
(19 rows)

** Wait... What??? "Rows Removed by Index Recheck: 1643390", but the data is almost sequential! Let's take a look at it.
=======================
db0=# SELECT ctid, (data->>'date')::numeric as date FROM schema0.table0 WHERE (data->>'date')::numeric >= '1531267200' AND (data->>'date')::numeric <= '1531353600' ORDER BY ctid, (data->>'date')::numeric DESC LIMIT 1000;
   ctid    |    date
-----------+------------
 (4456,2)  | 1531281600
 (4468,2)  | 1531281600
 (9301,2)  | 1531281600
 (9310,3)  | 1531281600
 ...
 (9317,4)  | 1531281600
 (9325,3)  | 1531281600
 (9325,4)  | 1531281600
 (9326,1)  | 1531281600
 (9328,1)  | 1531281600
 (9328,2)  | 1531281600
 (9345,1)  | 1531281600
 ...
 (9365,2)  | 1531281600
 (9386,2)  | 1531281600
 (9386,3)  | 1531281600
 (9386,4)  | 1531281600
 (9418,4)  | 1531281600
 ...
 (9419,4)  | 1531281600
 (12921,2) | 1531281600
 ...
 (12962,2) | 1531281600
 (12983,1) | 1531281600
 (12983,2) | 1531281600
 (12983,3) | 1531281600
 (12983,4) | 1531281600
 (13028,3) | 1531281600
 (13028,4) | 1531281600
 (14113,1) | 1531281600
 (14113,2) | 1531281600
 (15297,1) | 1531281600
 ...
 (15317,2) | 1531281600
 (15350,1) | 1531281600
 (15350,2) | 1531281600
 (15350,3) | 1531281600
 (15350,4) | 1531281600
 (15386,3) | 1531281600
 (15386,4) | 1531281600
 (17620,3) | 1531281600
 ...
 (17648,4) | 1531281600
 (17652,1) | 1531281600
 ...
 (17653,2) | 1531281600
 (17658,1) | 1531281600
 ...
 (17659,2) | 1531281600
 (21229,2) | 1531281600
 (21243,1) | 1531281600
 ...
 (21246,4) | 1531281600
 (35794,2) | 1531281600
 ...
 (35798,1) | 1531281600
 (35827,4) | 1531281600
 ...
 (35887,2) | 1531281600
 (37006,3) | 1531281600
 ...
 (37080,1) | 1531281600
 (37157,2) | 1531281600
 (37157,3) | 1531281600
 (37157,4) | 1531281600
 (39475,1) | 1531281600
 ...
 (39552,4) | 1531281600
 (39623,4) | 1531281600
 (39653,1) | 1531281600
 (39653,2) | 1531281600
 (39653,3) | 1531281600
 (39653,4) | 1531281600
 (39674,1) | 1531281600
 (39674,2) | 1531281600
 (39674,3) | 1531281600
 (39674,4) | 1531281600
 (40829,3) | 1531281600
 (40829,4) | 1531281600
 (41946,4) | 1531281600
 ...
 (41962,2) | 1531281600
(1000 rows)

** Ok, fine, there are large gaps.
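** Aside, one cheap thing to rule out before going further (an assumption of mine, not something demonstrated above): ranges that were never summarized are also returned as lossy matches, and even with autosummarize='true' summarization can lag behind the inserts. Forcing it is a one-liner:
=======================
-- Returns the number of ranges newly summarized; 0 means every range already
-- had a summary, so lagging autosummarization is not what we're seeing here.
SELECT brin_summarize_new_values('schema0.date_idx');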
** Let's make **absolutely sure** the data is sequential.
=======================
db0=# CREATE INDEX date_ord_idx ON schema0.table0 (((table0.data ->> 'date')::numeric));
CREATE INDEX

** Not running any analysis on the new index; it shouldn't matter if the stats are wrong, since we expect a scan and sort.
=======================
db0=# CLUSTER VERBOSE schema0.table0 USING date_ord_idx;
INFO:  clustering "schema0.table0" using sequential scan and sort
INFO:  "table0": found 0 removable, 4195522 nonremovable row versions in 1048881 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 873.12 s, system: 19.18 s, elapsed: 1019.36 s.
CLUSTER

** Ok, now we're clustered. Dropping the ordering b-tree index.
=======================
db0=# DROP INDEX schema0.date_ord_idx ;
DROP INDEX

** Let's explain again.
=======================
db0=# EXPLAIN (analyze, verbose, costs, buffers) SELECT ctid, (data->>'date')::numeric as date FROM schema0.table0 WHERE (data->>'date')::numeric >= '1531267200' AND (data->>'date')::numeric <= '1531353600' ORDER BY ctid, (data->>'date')::numeric DESC LIMIT 1000;
                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2393.45..2394.66 rows=482 width=38) (actual time=7339.584..7339.703 rows=1000 loops=1)
   Output: ctid, (((data ->> 'date'::text))::numeric)
   Buffers: shared hit=56 read=410752
   ->  Sort  (cost=2393.45..2394.66 rows=482 width=38) (actual time=7339.583..7339.634 rows=1000 loops=1)
         Output: ctid, (((data ->> 'date'::text))::numeric)
         Sort Key: table0.ctid, (((table0.data ->> 'date'::text))::numeric) DESC
         Sort Method: top-N heapsort  Memory: 95kB
         Buffers: shared hit=56 read=410752
         ->  Bitmap Heap Scan on schema0.table0  (cost=162.16..2371.97 rows=482 width=38) (actual time=59.252..7338.635 rows=3713 loops=1)
               Output: ctid, ((data ->> 'date'::text))::numeric
               Recheck Cond: ((((table0.data ->> 'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->> 'date'::text))::numeric <= '1531353600'::numeric))
               Rows Removed by Index Recheck: 1639294
               Heap Blocks: lossy=410752
               Buffers: shared hit=56 read=410752
               ->  Bitmap Index Scan on date_idx  (cost=0.00..162.03 rows=557 width=0) (actual time=24.545..24.545 rows=4107520 loops=1)
                     Index Cond: ((((table0.data ->> 'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->> 'date'::text))::numeric <= '1531353600'::numeric))
                     Buffers: shared hit=56
 Planning time: 0.515 ms
 Execution time: 7339.772 ms
(19 rows)

** "Rows Removed by Index Recheck: 1639294" - OK, statistics are pooched post-clustering. Let's reanalyze.
=======================
db0=# ANALYZE VERBOSE schema0.table0;
INFO:  analyzing "schema0.table0"
INFO:  "table0": scanned 30000 of 1048881 pages, containing 120000 live rows and 0 dead rows; 30000 rows in sample, 4195524 estimated total rows
ANALYZE
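** Aside: before explaining again, the physical order can be sanity-checked directly, since the block number is the first component of ctid (a sketch; if CLUSTER worked, consecutive blocks should cover non-overlapping, increasing date ranges):
=======================
-- Per-block min/max of the indexed expression, lowest blocks first.
SELECT (ctid::text::point)[0]::bigint AS block,
       min((data ->> 'date')::numeric) AS min_date,
       max((data ->> 'date')::numeric) AS max_date
  FROM schema0.table0
 GROUP BY 1
 ORDER BY 1
 LIMIT 20;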
** Let's try this again...
=======================
db0=# EXPLAIN (analyze, verbose, costs, buffers) SELECT ctid, (data->>'date')::numeric as date FROM schema0.table0 WHERE (data->>'date')::numeric >= '1531267200' AND (data->>'date')::numeric <= '1531353600' ORDER BY ctid, (data->>'date')::numeric DESC LIMIT 1000;
                                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5444.33..5445.71 rows=555 width=38) (actual time=6533.824..6533.927 rows=1000 loops=1)
   Output: ctid, (((data ->> 'date'::text))::numeric)
   Buffers: shared hit=410808
   ->  Sort  (cost=5444.33..5445.71 rows=555 width=38) (actual time=6533.822..6533.862 rows=1000 loops=1)
         Output: ctid, (((data ->> 'date'::text))::numeric)
         Sort Key: table0.ctid, (((table0.data ->> 'date'::text))::numeric) DESC
         Sort Method: top-N heapsort  Memory: 95kB
         Buffers: shared hit=410808
         ->  Bitmap Heap Scan on schema0.table0  (cost=162.22..5419.03 rows=555 width=38) (actual time=59.316..6532.913 rows=3713 loops=1)
               Output: ctid, ((data ->> 'date'::text))::numeric
               Recheck Cond: ((((table0.data ->> 'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->> 'date'::text))::numeric <= '1531353600'::numeric))
               Rows Removed by Index Recheck: 1639294
               Heap Blocks: lossy=410752
               Buffers: shared hit=410808
               ->  Bitmap Index Scan on date_idx  (cost=0.00..162.08 rows=1339 width=0) (actual time=30.859..30.859 rows=4107520 loops=1)
                     Index Cond: ((((table0.data ->> 'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->> 'date'::text))::numeric <= '1531353600'::numeric))
                     Buffers: shared hit=56
 Planning time: 1.100 ms
 Execution time: 6534.026 ms
(19 rows)

** This cannot be possible! The table is now clustered on the very expression the BRIN index is built on! Let's look at the data.
=======================
db0=# SELECT ctid, (data->>'date')::numeric as date FROM schema0.table0 WHERE (data->>'date')::numeric >= '1531267200' AND (data->>'date')::numeric <= '1531353600' ORDER BY ctid, (data->>'date')::numeric DESC LIMIT 1000;
   ctid    |    date
-----------+------------
 (4155,2)  | 1531281600
 ...
 (4159,1)  | 1531281600
 (4188,4)  | 1531281600
 (4189,1)  | 1531281600
 (4189,2)  | 1531281600
 (4189,3)  | 1531281600
 (4189,4)  | 1531281600
 (4231,4)  | 1531281600
 (4247,3)  | 1531281600
 (4247,4)  | 1531281600
 (4248,1)  | 1531281600
 (4248,2)  | 1531281600
 (5367,3)  | 1531281600
 ...
 (5368,4)  | 1531281600
 (5376,3)  | 1531281600
 ...
 (5441,1)  | 1531281600
 (5518,2)  | 1531281600
 (5518,3)  | 1531281600
 (5518,4)  | 1531281600
 (7836,1)  | 1531281600
 (7836,2)  | 1531281600
 (7851,3)  | 1531281600
 ...
 (7913,4)  | 1531281600
 (7984,4)  | 1531281600
 (8014,1)  | 1531281600
 (8014,2)  | 1531281600
 (8014,3)  | 1531281600
 (8014,4)  | 1531281600
 (8035,1)  | 1531281600
 (8035,2)  | 1531281600
 (8035,3)  | 1531281600
 (8035,4)  | 1531281600
 (9190,3)  | 1531281600
 (9190,4)  | 1531281600
 (10307,4) | 1531281600
 ...
 (10349,3) | 1531281600
 (10371,3) | 1531281600
 ...
 (10392,2) | 1531281600
 (10445,4) | 1531281600
 (10446,1) | 1531281600
 (10446,2) | 1531281600
 (11580,3) | 1531281600
 ...
 (11586,2) | 1531281600
 (11595,3) | 1531281600
 ...
 (11598,2) | 1531281600
 (11600,1) | 1531281600
 ...
 (11601,2) | 1531281600
 (11609,1) | 1531281600
 ...
 (11621,2) | 1531281600
 (11621,3) | 1531281600
 (11664,3) | 1531281600
 ...
 (11684,3) | 1531281600
 (11726,2) | 1531281600
 ...
 (11770,4) | 1531281600
 (12891,2) | 1531281600
 (14057,1) | 1531281600
 ...
 (14058,2) | 1531281600
 (14064,3) | 1531281600
 ...
 (14071,3) | 1531281600
(1000 rows)
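** Aside, a knob I have not yet verified against this data set (sketch only): shrinking the range size from the default 128 pages should at least bound the damage of each lossy match, at the cost of a larger index.
=======================
-- Hypothetical index name; pages_per_range defaults to 128.
CREATE INDEX date_idx_16 ON schema0.table0
    USING brin (((data ->> 'date')::numeric))
    WITH (pages_per_range = 16, autosummarize = 'true');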
**** The issue seems to be with both clustering and BRIN stats.

Fillfactor on the table is not a factor: it can be 100 or 75; I'm able to reproduce, and fail to reproduce, with both. Clustering on `date_ord_idx` can be done with or without a prior ANALYZE; in some cases PG picks scan-and-sort, sometimes it uses the index, regardless of stats. Sometimes the table ends up properly clustered afterwards, sometimes it doesn't. Sometimes, if you cluster first on the unanalyzed `date_ord_idx`, clustering doesn't fix the problem, but then analyzing and clustering on `date_ord_idx` again does fix the issue. Sometimes neither helps. A full VACUUM ANALYZE may or may not help, depending on the phase of the moon.

When BRIN is actually fixed, the explain shows a few hundred rows eliminated by the recheck, not >1M, and queries for a date range take 50-70 ms.

If anybody can give a suggestion on where in the code to start looking, I'll be extra grateful.

-- 
Arcadiy Ivanov
arcadiy@gmail.com | @arcivanov | https://ivanov.biz
https://github.com/arcivanov