Re: document the need to analyze partitioned tables - Mailing list pgsql-hackers
From:           Bruce Momjian
Subject:        Re: document the need to analyze partitioned tables
Msg-id:         Y8hjXaVPc9rSkBQw@momjian.us
In response to: Re: document the need to analyze partitioned tables (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses:      Re: document the need to analyze partitioned tables
List:           pgsql-hackers
On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:
> On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
> > On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
> > > Maybe (all?) the clarification the docs need is to say:
> > > "Partitioned tables are not *themselves* processed by autovacuum."
> >
> > Yes, I think the lack of autovacuum needs to be specifically mentioned
> > since most people assume autovacuum handles _all_ statistics updating.
> >
> > Can someone summarize how bad it is we have no statistics on partitioned
> > tables?  It sounds bad to me.
>
> Andrey Lepikhov had an example earlier in this thread[1].  It doesn't take
> an exotic query.
>
> Attached is a new version of my patch that tries to improve the wording.

Ah, yes, that is the example I saw but could not re-find.  Here is the
output:

	CREATE TABLE test (id integer, val integer) PARTITION BY hash (id);
	CREATE TABLE test_0 PARTITION OF test
		FOR VALUES WITH (modulus 2, remainder 0);
	CREATE TABLE test_1 PARTITION OF test
		FOR VALUES WITH (modulus 2, remainder 1);

	INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q);
	VACUUM ANALYZE test;

	INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q);
	VACUUM ANALYZE test_0, test_1;

	EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
	SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;

	                                                QUERY PLAN
	---------------------------------------------------------------------------------------------------------
	 Hash Join  (cost=7.50..15.25 rows=200 width=16) (actual rows=105 loops=1)
	   Hash Cond: (t1.id = t2.val)
	   ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
	         ->  Seq Scan on test_0 t1_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
	         ->  Seq Scan on test_1 t1_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)
	   ->  Hash  (cost=5.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
	         Buckets: 1024  Batches: 1  Memory Usage: 16kB
	         ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
	               ->  Seq Scan on test_0 t2_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
	               ->  Seq Scan on test_1 t2_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)

	VACUUM ANALYZE test;

	EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
	SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;

	                                                QUERY PLAN
	---------------------------------------------------------------------------------------------------------
	 Hash Join  (cost=7.50..15.25 rows=200 width=16) (actual rows=105 loops=1)
	   Hash Cond: (t2.val = t1.id)
	   ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
	         ->  Seq Scan on test_0 t2_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
	         ->  Seq Scan on test_1 t2_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)
	   ->  Hash  (cost=5.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
	         Buckets: 1024  Batches: 1  Memory Usage: 16kB
	         ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
	               ->  Seq Scan on test_0 t1_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
	               ->  Seq Scan on test_1 t1_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)

I see the inner side uses 'val' in the first EXPLAIN and 'id' in the
second, and you are right that 'val' has mostly 0/1.  Is it possible to
document when partitioned table statistics help?

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.
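[Editor's note: a minimal sketch of how to observe the behavior discussed in this thread — autovacuum analyzes only the leaf partitions, so the partitioned parent has no optimizer statistics until it is analyzed manually. This uses only the standard `pg_stats` view and the `test` tables from the example above; it is an illustration, not part of the original message.]

```sql
-- Before a manual ANALYZE on the parent, only the leaf partitions
-- (test_0, test_1) have rows in pg_stats; the parent "test" has none.
SELECT tablename, attname, n_distinct
FROM pg_stats
WHERE tablename IN ('test', 'test_0', 'test_1')
ORDER BY tablename, attname;

-- A manual ANALYZE on the parent collects inheritance statistics
-- covering the whole partition tree, which the planner uses for
-- joins and aggregates over the parent table.
ANALYZE test;
```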