Home > mailing lists

Re: Bad plan on a huge table query - Mailing list pgsql-general

From	Daniel Cristian Cruz
Subject	Re: Bad plan on a huge table query
Date	March 26, 2013 15:08:49
Msg-id	CACffM9FbTy_J1SuH_cc3LTDetFKh7NV7OpSG9k_NDEdnPKbZjw@mail.gmail.com Whole thread Raw
In response to	Re: Bad plan on a huge table query (Jeff Janes <jeff.janes@gmail.com>)
Responses	Re: Bad plan on a huge table query
List	pgsql-general

Tree view

Well, I did it:

explain (analyze, buffers)

select count(*) from turma.aula_confirmacao where

inicio_aula::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP; -- changed name because of a conflict in some queries

http://explain.depesz.com/s/Fzr

And just to update, this is the actual query and plan:

EXPLAIN ANALYZE

SELECT

aluno_mensal.id_matricula,

aluno_mensal.id_turma,

aluno_mensal.turma,

aluno_mensal.id_produto_educacao,

aluno_mensal.produto_educacao,

aluno_mensal.unidade,

aluno_mensal.unidade_execucao,

aluno_mensal.modalidade,

aluno_mensal.id_pessoa,

aluno_mensal.nome_pessoa,

presenca.id_diario,

aula_confirmacao.inicio_aula::date AS data_aula,

presenca.justificativa_falta,

SUM(aula_confirmacao.termino_aula - aula_confirmacao.inicio_aula) AS carga_faltas,

mensal.ano AS ano_apuracao,

mensal.mes AS mes_apuracao

FROM indicadores.aluno_mensal

JOIN indicadores.mensal

ON mensal.id_mensal = aluno_mensal.id_mensal

JOIN turma.presenca

ON presenca.id_matricula = aluno_mensal.id_matricula

JOIN turma.aula_confirmacao

ON aula_confirmacao.id_evento = presenca.id_evento

JOIN turma.estudante_periodo

estudante_periodo.id_matricula = presenca.id_matricula AND

estudante_periodo.id_diario = presenca.id_diario AND

aula_confirmacao.inicio_aula::date BETWEEN estudante_periodo.inicio AND estudante_periodo.termino

WHERE

presenca.inicio_aula::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP AND

NOT presenca.presente AND

mensal.ano = EXTRACT(YEAR FROM CURRENT_TIMESTAMP) AND

aula_confirmacao.inicio_aula::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP AND

aula_confirmacao.confirmada AND

aluno_mensal.id_medicao = 7

GROUP BY

aluno_mensal.id_matricula,

aluno_mensal.id_turma,

aluno_mensal.turma,

aluno_mensal.id_produto_educacao,

aluno_mensal.produto_educacao,

aluno_mensal.unidade,

aluno_mensal.unidade_execucao,

aluno_mensal.modalidade,

aluno_mensal.id_pessoa,

aluno_mensal.nome_pessoa,

presenca.id_diario,

aula_confirmacao.inicio_aula::date,

presenca.justificativa_falta,

mensal.ano,

mensal.mes;

http://explain.depesz.com/s/YfXr

I guess that, there is something with estudante_periodo, because there is 24% with only one row and 50% with 5 or less rows on it:

with distr as (select id_matricula, count(*) from turma.estudante_periodo group by id_matricula)

select count as rows_on_estudante_periodo, count(*), (100 * count(*) / sum(count(*)) over ())::numeric(5,2) as percent from distr group by count order by 1;

rows_on_estudante_periodo | count | percent

---------------------------+-------+---------

1 | 24941 | 23.92

2 | 5720 | 5.49

3 | 5220 | 5.01

4 | 8787 | 8.43

5 | 7908 | 7.58

6 | 7357 | 7.06

7 | 4896 | 4.70

8 | 3076 | 2.95

9 | 2963 | 2.84

10 | 2679 | 2.57

11 | 6613 | 6.34

12 | 8708 | 8.35

13 | 4448 | 4.27

14 | 1411 | 1.35

15 | 2137 | 2.05

16 | 1219 | 1.17

17 | 2269 | 2.18

18 | 627 | 0.60

19 | 332 | 0.32

20 | 325 | 0.31

21 | 213 | 0.20

22 | 127 | 0.12

23 | 113 | 0.11

24 | 144 | 0.14

25 | 862 | 0.83

26 | 784 | 0.75

27 | 131 | 0.13

28 | 79 | 0.08

29 | 35 | 0.03

30 | 136 | 0.13

31 | 1 | 0.00

33 | 1 | 0.00

36 | 1 | 0.00

38 | 1 | 0.00

39 | 1 | 0.00

40 | 1 | 0.00

(36 rows)

After the refactoring, idx_aula_confirmacao_2 became idx_aula_confirmacao_1:

select * from pg_stats where tablename = 'idx_aula_confirmacao_1';

-[ RECORD 1 ]----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

schemaname | turma

tablename | idx_aula_confirmacao_1

attname | inicio_aula

inherited | f

null_frac | 0.996792

avg_width | 4

n_distinct | 24

most_common_vals |

most_common_freqs |

histogram_bounds | {2013-02-04,2013-02-25,2013-03-12,2013-03-15,2013-03-19,2013-03-21,2013-03-22,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25}

correlation | 0.433041

most_common_elems |

most_common_elem_freqs |

elem_count_histogram |

I've run:

analyze turma.aula_confirmacao ;

select * from pg_stats where tablename = 'idx_aula_confirmacao_1';

schemaname | turma

tablename | idx_aula_confirmacao_1

attname | inicio_aula

inherited | f

null_frac | 0.996927

avg_width | 4

n_distinct | 24

most_common_vals |

most_common_freqs |

histogram_bounds | {2013-02-04,2013-03-04,2013-03-08,2013-03-11,2013-03-13,2013-03-18,2013-03-20,2013-03-22,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25}

correlation | 0.208954

most_common_elems |

most_common_elem_freqs |

elem_count_histogram |

and a third time:

analyze turma.aula_confirmacao ;

select * from pg_stats where tablename = 'idx_aula_confirmacao_1';

schemaname | turma

tablename | idx_aula_confirmacao_1

attname | inicio_aula

inherited | f

null_frac | 0.997112

avg_width | 4

n_distinct | 17

most_common_vals |

most_common_freqs |

histogram_bounds | {2013-02-13,2013-03-11,2013-03-15,2013-03-21,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25,2013-03-25}

correlation | 0.459312

most_common_elems |

most_common_elem_freqs |

elem_count_histogram |

Thanks,

2013/3/21 Jeff Janes <jeff.janes@gmail.com>

On Thu, Mar 21, 2013 at 12:30 PM, Daniel Cristian Cruz <danielcristian@gmail.com> wrote:

Are you using autovacuum? If so, you probably need to tune it more aggressively. For the short term, running an ANALYSE on those tables should at least get you more accurate query plans.

I've done it; with default_statistics_target on 1000, 100 and 200 (left it on 200, which was production config too).

You are doing an manual analyze each time you change default_statistics_target, right?

Can you do an "analyze verbose aula_confirmacao" and see if the output is as expected?

what happens if you do:

explain (analyze, buffers)
select count(*) from aula_confirmacao where
inicio::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP;

From your explain plan http://explain.depesz.com/s/GDJn, step 9, the row estimates for that simple query should be off by a factor of 23 (101508/4442), yet there is no apparent reason for that to give a bad estimate, other than bad statistics. There are no filters so cross-column correlations can't be throwing it off, so why is it so bad?

Also, it would be nice to see:

select * from pg_stats where tablename = 'idx_aula_confirmacao_2' \x\g\x

(which I assume is a function-based index)

Cheers,

Jeff

--
Daniel Cristian Cruz
クルズクリスチアンダニエル

pgsql-general by date:

From: Richard Huxton
Date: 26 March 2013, 14:29:04
Subject: Re: PostgreSQL service terminated by query

From: Clemens Eisserer
Date: 26 March 2013, 16:25:00
Subject: Re: Do "after update" trigger block the current transaction?

Re: Bad plan on a huge table query - Mailing list pgsql-general

Previous

Next