Thread: Statistic Estimation in PostgreSQL
I am facing problem in tracing in what events the selectivity histogram in pg_statistic is stored/updated. I went through the code in src/backend/commands/analyze.c and got to see the code computing the histogram but when I tried to trace the caller of analyze_rel or compute_index_stats, I find out that only during Vaccum command, these functions are invoked. I am little surprised as I did get to see PostgreSQL giving good selectivity estimations despite me running Vaccuum command even once. I did try to trace the control flow during table operations (creation / insertion) but couldn't find any histogram updation functions being invoked though I did get to know that some functions related to catalog updations is being invoked. Can anybody tell me in what all events is the pg_statistic table exactly updated?
rahulg escribió: > > Can anybody tell me in what all events is the pg_statistic table > exactly updated? only ANALYZE -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
rahulg wrote: > I am facing problem in tracing in what events the selectivity > histogram in pg_statistic is stored/updated. > > I went through the code in src/backend/commands/analyze.c and got to > see the code computing the histogram but when I tried to trace the > caller of analyze_rel or compute_index_stats, I find out that only > during Vaccum command, these functions are invoked. I am little > surprised as I did get to see PostgreSQL giving good selectivity > estimations despite me running Vaccuum command even once. > > I did try to trace the control flow during table operations > (creation / insertion) but couldn't find any histogram updation > functions being invoked though I did get to know that some functions > related to catalog updations is being invoked. > > Can anybody tell me in what all events is the pg_statistic table > exactly updated? The flow is like this: analyze_rel() determines the type-specific analyze function to call for each column: http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l270 the function OID gets pulled from pg_type, from the typanalyze column: http://www.postgresql.org/docs/8.3/static/catalog-pg-type.html currently there is only one datatype that has a custom analyze function (tsvector). All others get analyzed using the standard one: http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l1489 depending on whether the datatype has a < operator or not, the column is analyzed using compute_scalar_stats() or compute_minimal_stats(). The actual call of the chosen function happens here: http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l425 And the write to pg_statistic happens here: http://git.postgresql.org/?p=postgresql.git;a=blob;f=src/backend/commands/analyze.c;h=d96b7666f08d9ac74cccf05851fa8fd8fc2d4948;hb=HEAD#l446 HTH, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin