Thread: ANALYZE'ing table hierarchies
Hello,
I have a question about running ANALYZE on table hierarchies.
The documentation page for partitioning mentions the need to manually issue VACUUM and ANALYZEs for each partition (http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html#DDL-PARTITIONING-CAVEATS), while the page for ANALYZE (http://www.postgresql.org/docs/9.4/static/sql-analyze.html) mentions it does gather statistics for the master table and its children.
When I ANALYZE a master table PostgreSQL says it is working on the entire hierarchy and automatically ANALYZEs the child tables. However, the last_analyze column stays empty for the child tables on pg_stat_user_tables.
I find this somewhat confusing, what is the expected behavior? I figure the child tables have been properly ANALYZEd, but I can't find it registered anywhere. It would be nice to be able to parse pg_stat_user_tables for analyze statistics without having to worry about hierarchies, since each partition seems to have its own statistics for everything else.
nunks=# analyze verbose tb05;
INFO: analyzing "public.tb05"
INFO: "tb05": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.tb05" inheritance tree
INFO: "tb05_2016": scanned 165 of 165 pages, containing 7465 live rows and 455 dead rows; 7465 rows in sample, 7465 estimated total rows
INFO: "tb05_2015": scanned 381 of 381 pages, containing 16281 live rows and 1504 dead rows; 16281 rows in sample, 16281 estimated total rows
ANALYZE
nunks=# select relname, last_analyze from pg_stat_user_tables where relname like 'tb05%';
relname | last_analyze
-----------+-------------------------------
tb05 | 2016-02-18 22:47:32.770076-02
tb05_2016 |
tb05_2015 |
(3 rows)
nunks=# \d+ tb05;
Table "public.tb05"
Column | Type | Modifiers | Storage | Stats target | Description
---------------------+-----------------------------+-----------+----------+--------------+-------------
a | character varying(8) | not null | extended | |
b | date | not null | plain | |
c | timestamp without time zone | not null | plain | |
d | timestamp without time zone | not null | plain | |
e | bigint | not null | plain | |
f | bigint | not null | plain | |
g | bigint | not null | plain | |
h | bigint | not null | plain | |
i | bigint | not null | plain | |
j | bigint | not null | plain | |
k | bigint | not null | plain | |
l | bigint | not null | plain | |
m | bigint | not null | plain | |
n | bigint | not null | plain | |
o | bigint | not null | plain | |
p | bigint | not null | plain | |
q | bigint | not null | plain | |
r | bigint | not null | plain | |
Indexes:
"tb05_pkey" PRIMARY KEY, btree (a, b)
Triggers:
tr_partition_tb05 BEFORE INSERT ON tb05 FOR EACH ROW EXECUTE PROCEDURE fn_partition_tb05()
Child tables: tb05_2015,
tb05_2016
Thanks!
Nunks
music=> create table music(id int,name text,style text);
CREATE TABLE
music=> create table rock (check(style = 'rock')) inherits(music);
CREATE TABLE
music=> create table pop (check(style = 'pop')) inherits(music);
CREATE TABLE
music=> create table classic (check(style = 'classic')) inherits(music);
CREATE TABLE
music=> create table jazz (check(style = 'jazz')) inherits(music);
CREATE TABLE
music=> \dS+ music
Table "eric.music"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
id | integer | | plain | |
name | text | | extended | |
style | text | | extended | |
Child tables: classic,
jazz,
pop,
rock
music=> create index music_pop_id on pop (id);
CREATE INDEX
music=> create index music_rock_id on rock (id);
CREATE INDEX
music=> create index music_jazz_id on jazz (id);
CREATE INDEX
music=> create index music_classic_id on classic (id);
CREATE INDEX
music=> CREATE OR REPLACE FUNCTION music_insert_trigger()
music-> RETURNS TRIGGER AS
music-> $$
music$> BEGIN
music$> IF (NEW.style = 'rock') THEN
music$> INSERT INTO rock VALUES (NEW.*);
music$> ELSEIF (NEW.style = 'pop') THEN
music$> INSERT INTO pop VALUES (NEW.*);
music$> ELSEIF (NEW.style = 'pop') THEN
music$> INSERT INTO pop VALUES (NEW.*);
music$> ELSEIF (NEW.style = 'jazz') THEN
music$> INSERT INTO jazz VALUES (NEW.*);
music$> ELSEIF (NEW.style = 'classic') THEN
music$> INSERT INTO classic VALUES (NEW.*);
music$> END IF;
music$> RETURN NULL;
music$> END;
music$> $$
music-> LANGUAGE plpgsql ;
CREATE FUNCTION
music=> CREATE TRIGGER insert_music_trigger
music-> BEFORE INSERT ON music
music-> FOR EACH row EXECUTE PROCEDURE music_insert_trigger() ;
CREATE TRIGGER
music=> insert into music values(2,'Have a Nice Day','pop');
INSERT 0 0
music=> insert into music values(1,'21 Gun','rock');
INSERT 0 0
music=> select * from music;
id | name | style
----+-----------------+-------
1 | 21 Gun | rock
2 | Have a Nice Day | pop
(2 rows)
music=> select * from pop
music-> ;
id | name | style
----+-----------------+-------
2 | Have a Nice Day | pop
(1 row)
music=> select * from rock;
id | name | style
----+--------+-------
1 | 21 Gun | rock
(1 row)
music=> analyze verbose music
music-> ;
INFO: analyzing "eric.music"
INFO: "music": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "eric.music" inheritance tree
INFO: "rock": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
INFO: "pop": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
ANALYZE
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'music';
relname | last_analyze
---------+-------------------------------
music | 2016-02-18 22:29:56.528758-08
(1 row)
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'pop';
relname | last_analyze
---------+--------------
pop |
(1 row)
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'rock';
relname | last_analyze
---------+--------------
rock |
(1 row)
music=> analyze verbose pop;
INFO: analyzing "eric.pop"
INFO: "pop": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
ANALYZE
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'rock';
relname | last_analyze
---------+--------------
rock |
(1 row)
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'pop';
relname | last_analyze
---------+-------------------------------
pop | 2016-02-18 22:31:55.666556-08
(1 row)
music=> analyze verbose rock;
INFO: analyzing "eric.rock"
INFO: "rock": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
ANALYZE
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'rock';
relname | last_analyze
---------+-------------------------------
rock | 2016-02-18 22:34:16.526558-08
(1 row)
Hello,
I have a question about running ANALYZE on table hierarchies.
The documentation page for partitioning mentions the need to manually issue VACUUM and ANALYZEs for each partition (http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html#DDL-PARTITIONING-CAVEATS), while the page for ANALYZE (http://www.postgresql.org/docs/9.4/static/sql-analyze.html) mentions it does gather statistics for the master table and its children.
When I ANALYZE a master table PostgreSQL says it is working on the entire hierarchy and automatically ANALYZEs the child tables. However, the last_analyze column stays empty for the child tables on pg_stat_user_tables.
I find this somewhat confusing, what is the expected behavior? I figure the child tables have been properly ANALYZEd, but I can't find it registered anywhere. It would be nice to be able to parse pg_stat_user_tables for analyze statistics without having to worry about hierarchies, since each partition seems to have its own statistics for everything else.
nunks=# analyze verbose tb05;
INFO: analyzing "public.tb05"
INFO: "tb05": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.tb05" inheritance tree
INFO: "tb05_2016": scanned 165 of 165 pages, containing 7465 live rows and 455 dead rows; 7465 rows in sample, 7465 estimated total rows
INFO: "tb05_2015": scanned 381 of 381 pages, containing 16281 live rows and 1504 dead rows; 16281 rows in sample, 16281 estimated total rows
ANALYZE
nunks=# select relname, last_analyze from pg_stat_user_tables where relname like 'tb05%';
relname | last_analyze
-----------+-------------------------------
tb05 | 2016-02-18 22:47:32.770076-02
tb05_2016 |
tb05_2015 |
(3 rows)
nunks=# \d+ tb05;
Table "public.tb05"
Column | Type | Modifiers | Storage | Stats target | Description
---------------------+-----------------------------+-----------+----------+--------------+-------------
a | character varying(8) | not null | extended | |
b | date | not null | plain | |
c | timestamp without time zone | not null | plain | |
d | timestamp without time zone | not null | plain | |
e | bigint | not null | plain | |
f | bigint | not null | plain | |
g | bigint | not null | plain | |
h | bigint | not null | plain | |
i | bigint | not null | plain | |
j | bigint | not null | plain | |
k | bigint | not null | plain | |
l | bigint | not null | plain | |
m | bigint | not null | plain | |
n | bigint | not null | plain | |
o | bigint | not null | plain | |
p | bigint | not null | plain | |
q | bigint | not null | plain | |
r | bigint | not null | plain | |
Indexes:
"tb05_pkey" PRIMARY KEY, btree (a, b)
Triggers:
tr_partition_tb05 BEFORE INSERT ON tb05 FOR EACH ROW EXECUTE PROCEDURE fn_partition_tb05()
Child tables: tb05_2015,
tb05_2016
Thanks!
Nunks
Yes, that's the same behavior I'm seeing in my database. Analyzing the master table works on the inheritance tree but doesn't register to pg_stat_user_tables unless you explicitly analyze each child table on its own.
What eludes me is that part of the documentation says we should analyze each table individually, and another part says we can count on ANALYZE to work on the entire inheritance tree...
Could it be that when we analyze the master the scanning of child tables just means it gather statistics for the master alone, as if it were a single big table consisting of its contents plus the contents of its children? Then I'd expect those statistics to not be used for planning queries that are made directly to a child table, and the behavior we see on pg_stat_user_tables is correct in that the statistics gathered only work for queries made to the master table.
Thanks! =)
Nunks
- Stella Adler |
Hi,NunksI have a test just now,as follows:Hope to help you!
music=> create table music(id int,name text,style text);
CREATE TABLE
music=> create table rock (check(style = 'rock')) inherits(music);
CREATE TABLE
music=> create table pop (check(style = 'pop')) inherits(music);
CREATE TABLE
music=> create table classic (check(style = 'classic')) inherits(music);
CREATE TABLE
music=> create table jazz (check(style = 'jazz')) inherits(music);
CREATE TABLE
music=> \dS+ music
Table "eric.music"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
id | integer | | plain | |
name | text | | extended | |
style | text | | extended | |
Child tables: classic,
jazz,
pop,
rock
music=> create index music_pop_id on pop (id);
CREATE INDEX
music=> create index music_rock_id on rock (id);
CREATE INDEX
music=> create index music_jazz_id on jazz (id);
CREATE INDEX
music=> create index music_classic_id on classic (id);
CREATE INDEX
music=> CREATE OR REPLACE FUNCTION music_insert_trigger()
music-> RETURNS TRIGGER AS
music-> $$
music$> BEGIN
music$> IF (NEW.style = 'rock') THEN
music$> INSERT INTO rock VALUES (NEW.*);
music$> ELSEIF (NEW.style = 'pop') THEN
music$> INSERT INTO pop VALUES (NEW.*);
music$> ELSEIF (NEW.style = 'pop') THEN
music$> INSERT INTO pop VALUES (NEW.*);
music$> ELSEIF (NEW.style = 'jazz') THEN
music$> INSERT INTO jazz VALUES (NEW.*);
music$> ELSEIF (NEW.style = 'classic') THEN
music$> INSERT INTO classic VALUES (NEW.*);
music$> END IF;
music$> RETURN NULL;
music$> END;
music$> $$
music-> LANGUAGE plpgsql ;
CREATE FUNCTION
music=> CREATE TRIGGER insert_music_trigger
music-> BEFORE INSERT ON music
music-> FOR EACH row EXECUTE PROCEDURE music_insert_trigger() ;
CREATE TRIGGER
music=> insert into music values(2,'Have a Nice Day','pop');
INSERT 0 0
music=> insert into music values(1,'21 Gun','rock');
INSERT 0 0
music=> select * from music;
id | name | style
----+-----------------+-------
1 | 21 Gun | rock
2 | Have a Nice Day | pop
(2 rows)
music=> select * from pop
music-> ;
id | name | style
----+-----------------+-------
2 | Have a Nice Day | pop
(1 row)
music=> select * from rock;
id | name | style
----+--------+-------
1 | 21 Gun | rock
(1 row)
music=> analyze verbose music
music-> ;
INFO: analyzing "eric.music"
INFO: "music": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "eric.music" inheritance tree
INFO: "rock": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
INFO: "pop": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
ANALYZE
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'music';
relname | last_analyze
---------+-------------------------------
music | 2016-02-18 22:29:56.528758-08
(1 row)
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'pop';
relname | last_analyze
---------+--------------
pop |
(1 row)
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'rock';
relname | last_analyze
---------+--------------
rock |
(1 row)
music=> analyze verbose pop;
INFO: analyzing "eric.pop"
INFO: "pop": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
ANALYZE
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'rock';
relname | last_analyze
---------+--------------
rock |
(1 row)
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'pop';
relname | last_analyze
---------+-------------------------------
pop | 2016-02-18 22:31:55.666556-08
(1 row)
music=> analyze verbose rock;
INFO: analyzing "eric.rock"
INFO: "rock": scanned 1 of 1 pages, containing 1 live rows and 0 dead rows; 1 rows in sample, 1 estimated total rows
ANALYZE
music=> select relname, last_analyze from pg_stat_user_tables where relname = 'rock';
relname | last_analyze
---------+-------------------------------
rock | 2016-02-18 22:34:16.526558-08
(1 row)Hello,
I have a question about running ANALYZE on table hierarchies.
The documentation page for partitioning mentions the need to manually issue VACUUM and ANALYZEs for each partition (http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html#DDL-PARTITIONING-CAVEATS), while the page for ANALYZE (http://www.postgresql.org/docs/9.4/static/sql-analyze.html) mentions it does gather statistics for the master table and its children.
When I ANALYZE a master table PostgreSQL says it is working on the entire hierarchy and automatically ANALYZEs the child tables. However, the last_analyze column stays empty for the child tables on pg_stat_user_tables.
I find this somewhat confusing, what is the expected behavior? I figure the child tables have been properly ANALYZEd, but I can't find it registered anywhere. It would be nice to be able to parse pg_stat_user_tables for analyze statistics without having to worry about hierarchies, since each partition seems to have its own statistics for everything else.
nunks=# analyze verbose tb05;
INFO: analyzing "public.tb05"
INFO: "tb05": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing "public.tb05" inheritance tree
INFO: "tb05_2016": scanned 165 of 165 pages, containing 7465 live rows and 455 dead rows; 7465 rows in sample, 7465 estimated total rows
INFO: "tb05_2015": scanned 381 of 381 pages, containing 16281 live rows and 1504 dead rows; 16281 rows in sample, 16281 estimated total rows
ANALYZE
nunks=# select relname, last_analyze from pg_stat_user_tables where relname like 'tb05%';
relname | last_analyze
-----------+-------------------------------
tb05 | 2016-02-18 22:47:32.770076-02
tb05_2016 |
tb05_2015 |
(3 rows)
nunks=# \d+ tb05;
Table "public.tb05"
Column | Type | Modifiers | Storage | Stats target | Description
---------------------+-----------------------------+-----------+----------+--------------+-------------
a | character varying(8) | not null | extended | |
b | date | not null | plain | |
c | timestamp without time zone | not null | plain | |
d | timestamp without time zone | not null | plain | |
e | bigint | not null | plain | |
f | bigint | not null | plain | |
g | bigint | not null | plain | |
h | bigint | not null | plain | |
i | bigint | not null | plain | |
j | bigint | not null | plain | |
k | bigint | not null | plain | |
l | bigint | not null | plain | |
m | bigint | not null | plain | |
n | bigint | not null | plain | |
o | bigint | not null | plain | |
p | bigint | not null | plain | |
q | bigint | not null | plain | |
r | bigint | not null | plain | |
Indexes:
"tb05_pkey" PRIMARY KEY, btree (a, b)
Triggers:
tr_partition_tb05 BEFORE INSERT ON tb05 FOR EACH ROW EXECUTE PROCEDURE fn_partition_tb05()
Child tables: tb05_2015,
tb05_2016
Thanks!
Nunks
nunks <nunks.lol@gmail.com> writes: > What eludes me is that part of the documentation says we should analyze > each table individually, and another part says we can count on ANALYZE to > work on the entire inheritance tree... Whole-tree stats are a different thing from per-table stats. What ANALYZE on a parent table produces is (1) stats for that table alone and (2) summary stats for the whole inheritance tree rooted at that table. If you want (3) stats for a child table alone then you need to ANALYZE that child table by name. regards, tom lane
I deal with some big (TB-level) partitioned tables where the application does a hybrid of querying both the master and the child tables directly. This clarification will be very useful.
- Stella Adler |
nunks <nunks.lol@gmail.com> writes:
> What eludes me is that part of the documentation says we should analyze
> each table individually, and another part says we can count on ANALYZE to
> work on the entire inheritance tree...
Whole-tree stats are a different thing from per-table stats. What
ANALYZE on a parent table produces is (1) stats for that table alone
and (2) summary stats for the whole inheritance tree rooted at that table.
If you want (3) stats for a child table alone then you need to ANALYZE
that child table by name.
regards, tom lane
nunks <nunks.lol@gmail.com> writes: > I deal with some big (TB-level) partitioned tables where the application > does a hybrid of querying both the master and the child tables directly. > This clarification will be very useful. BTW, the autovacuum daemon is normally fairly good about analyzing tables when the individual-table stats need updating (or if you think it isn't, you can adjust its parameters to make it more or less aggressive about that). Where it falls down is actually in maintenance of whole-tree stats: it does not realize that a bunch of updates on a child table should prompt auto-analyze of the parent table as well as the child. So I'd actually not think that you need to worry too much about manual ANALYZEs of the children. You might indeed need to help things out with periodic manual ANALYZEs of the parent table, though. regards, tom lane