Home > mailing lists

Re: Bad plan on a huge table query - Mailing list pgsql-general

From	Daniel Cristian Cruz
Subject	Re: Bad plan on a huge table query
Date	March 21, 2013 22:37:50
Msg-id	CACffM9EOQuUuJf1JGT35ps_OQ-i_uu64CB6nSVB+sv6cGb70qA@mail.gmail.com Whole thread Raw
In response to	Re: Bad plan on a huge table query (Daniel Cristian Cruz <danielcristian@gmail.com>)
List	pgsql-general

Tree view

Ooops, no query... Now it goes (Jeff, types in each line):

SELECT

aluno_mensal.id_matricula, --integer

aluno_mensal.id_turma, --integer

aluno_mensal.turma, --text

aluno_mensal.id_produto_educacao, --integer

aluno_mensal.produto_educacao, --text

aluno_mensal.unidade, --text

aluno_mensal.unidade_execucao, --text

aluno_mensal.modalidade, --text

aluno_mensal.id_pessoa, --integer

aluno_mensal.nome_pessoa, --text

presenca.id_diario, --integer

aula_confirmacao.inicio::date AS data_aula, --timestamp to date

presenca.justificativa_falta, --text

SUM(aula_confirmacao.termino - aula_confirmacao.inicio) AS carga_faltas, --interval

mensal.ano AS ano_apuracao, --integer

mensal.mes AS mes_apuracao --integer

FROM indicadores.aluno_mensal

JOIN indicadores.mensal

ON mensal.id_mensal = aluno_mensal.id_mensal

JOIN turma.presenca

ON presenca.id_matricula = aluno_mensal.id_matricula

JOIN turma.aula_confirmacao

ON aula_confirmacao.id_evento = presenca.id_evento

JOIN turma.estudante_periodo

estudante_periodo.id_matricula = presenca.id_matricula AND

estudante_periodo.id_diario = presenca.id_diario AND

aula_confirmacao.inicio::date BETWEEN estudante_periodo.inicio AND estudante_periodo.termino -- timestamp, date, date

WHERE

presenca.inicio::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP AND -- timestamp

NOT presenca.presente AND --boolean

mensal.ano = EXTRACT(YEAR FROM CURRENT_TIMESTAMP) AND -- integer

aula_confirmacao.inicio::DATE BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP AND --timestamp to date

aula_confirmacao.confirmada AND -- boolean

aluno_mensal.id_medicao = 7

GROUP BY

aluno_mensal.id_matricula,

aluno_mensal.id_turma,

aluno_mensal.turma,

aluno_mensal.id_produto_educacao,

aluno_mensal.produto_educacao,

aluno_mensal.unidade,

aluno_mensal.unidade_execucao,

aluno_mensal.modalidade,

aluno_mensal.id_pessoa,

aluno_mensal.nome_pessoa,

presenca.id_diario,

aula_confirmacao.inicio::date,

presenca.justificativa_falta,

mensal.ano,

mensal.mes;

2013/3/21 Daniel Cristian Cruz <danielcristian@gmail.com>

2013/3/21 Alban Hertroys <haramrae@gmail.com>
On Mar 20, 2013, at 22:36, Daniel Cristian Cruz <danielcristian@gmail.com> wrote:

Hi,

I'm trying to figure out why does the planner found 1 row estimate using nested loops over a big table. There is no return from it:

http://explain.depesz.com/s/GRs

That plan contains no actual statistics, which makes it difficult to say anything about it. And you didn't provide any info on table definitions or indexes whatsoever, we have to put that together from the generated query plans. Not great...

My bad... I guess the plan could do it. And now I figured out that I lost the first query... Now the query looks like this:

It returns if disable nested loops, but the plan still poor:

http://explain.depesz.com/s/fMY

You could probably gain some here by adding an index on aluno_mensal.id_medicao. In step 14 the lack thereof causes a seqscan over more than a million rows.

There is already an index on id_medicao. It used a hashjoin because I disable mergejoin which uses the index, instead there is no return.

What I also find a bit peculiar is that the filter in step 7 appears to apply a function (date_part(text, date)) on every row in that heap. Do you perhaps have a functional index on that table that makes that operation efficient?

Yes, tried to improve performance creating a index on inicio using CAST(inicio TO DATE). The real filter here is aula_confirmacao.inicio::date BETWEEN DATE_TRUNC('YEAR', CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP (filtering all rows from year's beginning until now).

Besides, now() results in a timestamp, which in this query needs to be converted to date; it's perhaps better to use CURRENT_DATE there, although the benefits are probably immeasurable since it only needs to be calculated once for all rows it's compared against.

DATE_TRUNC expect a text and a timestamp.

I'm using PostgreSQL 9.2.3, default_statistics_target on 1000.

I can't remember what to make PostgreSQL sees a better estimate in the scan of aula_confirmacao and the join with presenca. I got rusty after a long time just doing modeling.

Does someone has some idea on that?

Are you asking about vacuum? You're definitely not vacuuming enough, your statistics and your actual numbers of rows differ by enormous amounts (that's why depesz colours them red).

autovacuum is running on production and the develop database. This is happening at develop database, fresh restore.

Are you using autovacuum? If so, you probably need to tune it more aggressively. For the short term, running an ANALYSE on those tables should at least get you more accurate query plans.

I've done it; with default_statistics_target on 1000, 100 and 200 (left it on 200, which was production config too).

Thank you and sorry about the broken english, there was a long time since the last time I wrote...

--
Daniel Cristian Cruz
クルズクリスチアンダニエル

--
Daniel Cristian Cruz
クルズクリスチアンダニエル

pgsql-general by date:

From: Daniel Cristian Cruz
Date: 21 March 2013, 22:30:47
Subject: Re: Bad plan on a huge table query

From: Rob Sargent
Date: 22 March 2013, 00:24:00
Subject: Re: regexp_replace failing on 9.0.4

Re: Bad plan on a huge table query - Mailing list pgsql-general

Previous

Next