Re: Slow query - lots of temporary files. - Mailing list pgsql-performance

From Johann Spies
Subject Re: Slow query - lots of temporary files.
Date
Msg-id CAGZ55DR2T6PVA8Zf4R-G5VXz25Q4VLGXVATJfWL2y0hwGD40CQ@mail.gmail.com
Whole thread Raw
In response to Re: Slow query - lots of temporary files.  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-performance


On 10 June 2015 at 16:50, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


The problematic piece of the explain plan is this:

 ->  Merge Join  (cost=4384310.92..21202716.78 rows=6664163593
                  width=390)"
       Output: a.ut, c.gt, b.go, b.gn, d.au"
       Merge Cond: ((c.ut)::text = (d.rart_id)::text)"

That is, the planner expects ~6.7 billion rows, each ~390B wide. That's ~2.5TB of data that needs to be stored to disk (so that the sort can process it).

The way the schema is designed might be one of the issues - ISTM the 'ut' column is somehow universal, mixing values referencing different columns in multiple tables. Not only that's utterly misleading for the planner (and may easily cause issues with huge intermediate results), but it also makes formulating the queries very difficult. And of course, the casting between text and int is not very good either.

Fix the schema to follow relational best practices - separate the values into multiple columns, and most of this will go away.

Thanks for your reply Tomas.

I do not understand what the problem with the 'ut' column is.  It is a unique identifier in the first table(africa_uts) and is used in the other tables to establish joins and does have the same type definition in all the tables.  Is the problem in the similar name.  The data refers in all the 'ut' columns of the different tables to the same data.  I do not casting of integers into text in this case.  I don't know why the planner is doing it.  The field 'rart_id' in isi.rauthor is just another name for 'ut' in the other tables and have the same datatype.

I do not understand your remark: "separate the values into multiple columns". I cannot see which values can be separated into different columns in the schema.  Do you mean in the query?  Why?


Johann
--
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)

pgsql-performance by date:

Previous
From: Sasa Vilic
Date:
Subject: Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated
Next
From: "Sheena, Prabhjot"
Date:
Subject: pg bouncer issue what does sv_used column means