Home > mailing lists

Re: UNION ALL - Mailing list pgsql-hackers

From	Ibrar Ahmed
Subject	Re: UNION ALL
Date	August 15, 2019 19:21:17
Msg-id	CALtqXTfMFEte-JDcAJPsYW1TU2CJdECNM5mHHtgn4C9fEtwEpw@mail.gmail.com Whole thread Raw
In response to	Re: UNION ALL (066ce286@free.fr)
Responses	Re: UNION ALL
List	pgsql-hackers

Tree view

On Fri, Aug 16, 2019 at 12:16 AM <066ce286@free.fr> wrote:

Generally speaking, when executing UNION ; a DISTINCT is run afterward on the resultset.

So, if you're sure that each part of UNION cannot return a line returned by another one, you may use UNION ALL, you'll cut the cost of the final implicit DISTINCT.

----- Mail original -----
De: "Mark Pasterkamp" <markpasterkamp1994@gmail.com>
À: pgsql-hackers@lists.postgresql.org
Envoyé: Jeudi 15 Août 2019 20:37:06
Objet: UNION ALL

Dear all,

I was wondering if someone could help me understands what a union all actually does.

For my thesis I am using Apache Calcite to rewrite queries into using materialized views which I then give to a Postgres database.
For some queries, this means that they will be rewritten in a UNION ALL style query between an expression and a table scan of a materialized view.
However, contrary to what I expected, the UNION ALL query is actually a lot slower.

As an example, say I have 2 tables: actor and movie. Furthermore, there is also a foreign key index on movie to actor.
I also have a materialized view with the join of these 2 tables for all movies <= 2015 called A.
Now, if I want to query all entries in the join between actor and movie, I would assume that a UNION ALL between the join of actor and movie for movies >2015 and A is faster than executing the original query..
If I look at the explain analyze part, I can certainly see a reduction in cost up until the UNION ALL part, which carries a respective cost more than negating the cost reduction up to a point where I might as well not use the existing materialized view.

I have some trouble understanding this phenomenon.
One thought which came to my mind was that perhaps UNION ALL might create a temporary table containing both result sets, and then do a table scan and return that result.

this would greatly increase IO cost which could attribute to the problem.
However, I am really not sure what UNION ALL actually does to append both result sets so I was wondering if someone would be able to help me out with this.

Mark

066ce286@free.fr: Please, avoid top-posting. It makes harder to follow the
discussion.

Ibrar Ahmed

pgsql-hackers by date:

From: 066ce286@free.fr
Date: 15 August 2019, 19:15:50
Subject: Re: UNION ALL

From: Tomas Vondra
Date: 15 August 2019, 20:53:51
Subject: Re: Extension development

Re: UNION ALL - Mailing list pgsql-hackers

Previous

Next