Re: document the need to analyze partitioned tables - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: document the need to analyze partitioned tables
Date
Msg-id Y8hjXaVPc9rSkBQw@momjian.us
Whole thread Raw
In response to Re: document the need to analyze partitioned tables  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: document the need to analyze partitioned tables
List pgsql-hackers
On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:
> On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
> > On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
> > > Maybe (all?) the clarification the docs need is to say:
> > > "Partitioned tables are not *themselves* processed by autovacuum."
> > 
> > Yes, I think the lack of autovacuum needs to be specifically mentioned
> > since most people assume autovacuum handles _all_ statistics updating.
> > 
> > Can someone summarize how bad it is we have no statistics on partitioned
> > tables?  It sounds bad to me.
> 
> Andrey Lepikhov had an example earlier in this thread[1].  It doesn't take
> an exotic query. 
> 
> Attached is a new version of my patch that tries to improve the wording.

Ah, yes, that is the example I saw but could not re-find.  Here is the
output:

    CREATE TABLE test (id integer, val integer) PARTITION BY hash (id);
    
    CREATE TABLE test_0 PARTITION OF test
       FOR VALUES WITH (modulus 2, remainder 0);
    CREATE TABLE test_1 PARTITION OF test
       FOR VALUES WITH (modulus 2, remainder 1);
    
    INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q);
    
    VACUUM ANALYZE test;
    
    INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q);
    
    VACUUM ANALYZE test_0,test_1;
    
    EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
    SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
                                                   QUERY PLAN                                                
    ---------------------------------------------------------------------------------------------------------
     Hash Join  (cost=7.50..15.25 rows=200 width=16) (actual rows=105 loops=1)
       Hash Cond: (t1.id = t2.val)
       ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
             ->  Seq Scan on test_0 t1_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
             ->  Seq Scan on test_1 t1_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)
       ->  Hash  (cost=5.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 16kB
             ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
                   ->  Seq Scan on test_0 t2_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
                   ->  Seq Scan on test_1 t2_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)
    
    VACUUM ANALYZE test;
    
    EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
    SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
                                                   QUERY PLAN                                                
    ---------------------------------------------------------------------------------------------------------
     Hash Join  (cost=7.50..15.25 rows=200 width=16) (actual rows=105 loops=1)
       Hash Cond: (t2.val = t1.id)
       ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
             ->  Seq Scan on test_0 t2_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
             ->  Seq Scan on test_1 t2_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)
       ->  Hash  (cost=5.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 16kB
             ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
                   ->  Seq Scan on test_0 t1_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
                   ->  Seq Scan on test_1 t1_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)

I see the inner side uses 'val' in the first EXPLAIN and 'id' in the
second, and you are right that 'val' has mostly 0/1.

Is it possible to document when partition table statistics helps?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Extracting cross-version-upgrade knowledge from buildfarm client
Next
From: Sandro Santilli
Date:
Subject: Re: Ability to reference other extensions by schema in extension scripts