Home > mailing lists

Re: Slow query - lots of temporary files. - Mailing list pgsql-performance

From	Johann Spies
Subject	Re: Slow query - lots of temporary files.
Date	June 12, 2015 15:04:54
Msg-id	CAGZ55DR2T6PVA8Zf4R-G5VXz25Q4VLGXVATJfWL2y0hwGD40CQ@mail.gmail.com Whole thread Raw
In response to	Re: Slow query - lots of temporary files. (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List	pgsql-performance

Tree view

On 10 June 2015 at 16:50, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

The problematic piece of the explain plan is this:

-> Merge Join (cost=4384310.92..21202716.78 rows=6664163593
width=390)"
Output: a.ut, c.gt, b.go, b.gn, d.au"
Merge Cond: ((c.ut)::text = (d.rart_id)::text)"

That is, the planner expects ~6.7 billion rows, each ~390B wide. That's ~2.5TB of data that needs to be stored to disk (so that the sort can process it).

The way the schema is designed might be one of the issues - ISTM the 'ut' column is somehow universal, mixing values referencing different columns in multiple tables. Not only that's utterly misleading for the planner (and may easily cause issues with huge intermediate results), but it also makes formulating the queries very difficult. And of course, the casting between text and int is not very good either.

Fix the schema to follow relational best practices - separate the values into multiple columns, and most of this will go away.

Thanks for your reply Tomas.

I do not understand what the problem with the 'ut' column is. It is a unique identifier in the first table(africa_uts) and is used in the other tables to establish joins and does have the same type definition in all the tables. Is the problem in the similar name. The data refers in all the 'ut' columns of the different tables to the same data. I do not casting of integers into text in this case. I don't know why the planner is doing it. The field 'rart_id' in isi.rauthor is just another name for 'ut' in the other tables and have the same datatype.

I do not understand your remark: "separate the values into multiple columns". I cannot see which values can be separated into different columns in the schema. Do you mean in the query? Why?

Johann

Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)

pgsql-performance by date:

From: Sasa Vilic
Date: 12 June 2015, 00:18:37
Subject: Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated

From: "Sheena, Prabhjot"
Date: 12 June 2015, 17:57:34
Subject: pg bouncer issue what does sv_used column means

Re: Slow query - lots of temporary files. - Mailing list pgsql-performance

Previous

Next