Thread: Statistic Estimation in PostgreSQL

Statistic Estimation in PostgreSQL

From
rahulg
Date:
I am facing problem in tracing in what events the selectivity
histogram in pg_statistic is stored/updated.

I went through the code in src/backend/commands/analyze.c and got to
see the code computing the histogram but when I tried to trace the
caller of analyze_rel or compute_index_stats, I find out that only
during Vaccum command, these functions are invoked. I am little
surprised as I did get to see PostgreSQL giving good selectivity
estimations despite me running Vaccuum command even once.

I did try to trace the control flow during table operations
(creation / insertion) but couldn't find any histogram updation
functions being invoked though I did get to know that some functions
related to catalog updations is being invoked.

Can anybody tell me in what all events is the pg_statistic table
exactly updated?


Re: Statistic Estimation in PostgreSQL

From
Alvaro Herrera
Date:
rahulg escribió:

> 
> Can anybody tell me in what all events is the pg_statistic table
> exactly updated?

only ANALYZE

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Statistic Estimation in PostgreSQL

From
Jan Urbański
Date:
rahulg wrote:
> I am facing problem in tracing in what events the selectivity
> histogram in pg_statistic is stored/updated.
> 
> I went through the code in src/backend/commands/analyze.c and got to
> see the code computing the histogram but when I tried to trace the
> caller of analyze_rel or compute_index_stats, I find out that only
> during Vaccum command, these functions are invoked. I am little
> surprised as I did get to see PostgreSQL giving good selectivity
> estimations despite me running Vaccuum command even once.
> 
> I did try to trace the control flow during table operations
> (creation / insertion) but couldn't find any histogram updation
> functions being invoked though I did get to know that some functions
> related to catalog updations is being invoked.
> 
> Can anybody tell me in what all events is the pg_statistic table
> exactly updated?

The flow is like this:
analyze_rel() determines the type-specific analyze function to call for 
each column:

http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l270

the function OID gets pulled from pg_type, from the typanalyze column:
http://www.postgresql.org/docs/8.3/static/catalog-pg-type.html

currently there is only one datatype that has a custom analyze function 
(tsvector). All others get analyzed using the standard one:

http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l1489

depending on whether the datatype has a < operator or not, the column is 
analyzed using compute_scalar_stats() or compute_minimal_stats().
The actual call of the chosen function happens here:

http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l425

And the write to pg_statistic happens here:

http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l446

HTH,
Jan

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin