Thread: comparison between 2 execution plans

comparison between 2 execution plans

From
Neto pr
Date:
Dear all

Could you help me understand these two execution plans for the same query (query 3 benchmark TPCH www.tpc.org/tpch), executed in two different environments of Postgresql, as described below:
Execution Plan 1:
- https://explain.depesz.com/s/Ughh
- Postgresql version 10.1 (default) with index on l_shipdate (table lineitem)

Execution Plan 2:
- https://explain.depesz.com/s/7Zb7
- Postgresql version 9.5 (version with source code changed by me) with index on l_orderkey (table lineitem).

Some doubts
- Difference between GroupAggregate and Finalize GroupAggregate
- because some algorithms show measurements on "Disk" and others on "Memory" example:
     - External sort Disk: 52784kB
     - quicksort Memory: 47770kB

Because one execution plan was much smaller than the other, considering that the query is the same and the data are the same.
--------------------------------------------------
select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'HOUSEHOLD'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-21'
    and l_shipdate > date '1995-03-21'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
--------------------------------------------------

best regards
Neto

Re: comparison between 2 execution plans

From
Neto pr
Date:
Further information is These plans were generated by the EXPLAIN ANALYZE command, and the time of plan 1 (Postgresql 10.1 )  was 4.7 minutes and plan 2 (postgresql 9.5 changed)  was 2.95 minutes.

2018-05-05 6:26 GMT-07:00 Neto pr <netoprbr9@gmail.com>:
Dear all

Could you help me understand these two execution plans for the same query (query 3 benchmark TPCH www.tpc.org/tpch), executed in two different environments of Postgresql, as described below:
Execution Plan 1:
- https://explain.depesz.com/s/Ughh
- Postgresql version 10.1 (default) with index on l_shipdate (table lineitem)

Execution Plan 2:
- https://explain.depesz.com/s/7Zb7
- Postgresql version 9.5 (version with source code changed by me) with index on l_orderkey (table lineitem).

Some doubts
- Difference between GroupAggregate and Finalize GroupAggregate
- because some algorithms show measurements on "Disk" and others on "Memory" example:
     - External sort Disk: 52784kB
     - quicksort Memory: 47770kB

Because one execution plan was much smaller than the other, considering that the query is the same and the data are the same.
--------------------------------------------------
select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'HOUSEHOLD'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-21'
    and l_shipdate > date '1995-03-21'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
--------------------------------------------------

best regards
Neto

Re: comparison between 2 execution plans

From
Adrian Klaver
Date:
On 05/05/2018 06:26 AM, Neto pr wrote:
> Dear all
> 
> Could you help me understand these two execution plans for the same 
> query (query 3 benchmark TPCH www.tpc.org/tpch 
> <http://www.tpc.org/tpch>), executed in two different environments of 
> Postgresql, as described below:
> Execution Plan 1:
> - https://explain.depesz.com/s/Ughh
> - Postgresql version 10.1 (default) with index on l_shipdate (table 
> lineitem)
> 
> Execution Plan 2:
> - https://explain.depesz.com/s/7Zb7
> - Postgresql version 9.5 (version with source code changed by me) with 

It might help if you explained what 'version with source code changed by 
me' means?

Also the schema for the table lineitem from both instances might help 
shed light.

Any reason why the index changed between instances?


> index on l_orderkey (table lineitem).
> 
> Some doubts
> - Difference between GroupAggregate and Finalize GroupAggregate
> - because some algorithms show measurements on "Disk" and others on 
> "Memory" example:
>       - External sort Disk: 52784kB
>       - quicksort Memory: 47770kB
> 
> Because one execution plan was much smaller than the other, considering 
> that the query is the same and the data are the same.
> --------------------------------------------------
> select
>      l_orderkey,
>      sum(l_extendedprice * (1 - l_discount)) as revenue,
>      o_orderdate,
>      o_shippriority
> from
>      customer,
>      orders,
>      lineitem
> where
>      c_mktsegment = 'HOUSEHOLD'
>      and c_custkey = o_custkey
>      and l_orderkey = o_orderkey
>      and o_orderdate < date '1995-03-21'
>      and l_shipdate > date '1995-03-21'
> group by
>      l_orderkey,
>      o_orderdate,
>      o_shippriority
> order by
>      revenue desc,
>      o_orderdate
> --------------------------------------------------
> 
> best regards
> Neto


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: comparison between 2 execution plans

From
Neto pr
Date:
Dear,


2018-05-05 9:57 GMT-07:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 05/05/2018 06:26 AM, Neto pr wrote:
Dear all

Could you help me understand these two execution plans for the same query (query 3 benchmark TPCH www.tpc.org/tpch <http://www.tpc.org/tpch>), executed in two different environments of Postgresql, as described below:
Execution Plan 1:
- https://explain.depesz.com/s/Ughh
- Postgresql version 10.1 (default) with index on l_shipdate (table lineitem)

Execution Plan 2:
- https://explain.depesz.com/s/7Zb7
- Postgresql version 9.5 (version with source code changed by me) with

It might help if you explained what 'version with source code changed by me' means?

Postgresql with modified source code, is that I modified some internal functions of cost (source code) and parameters in Postgresql.conf so that it is possible for the DBMS to differentiate cost of read (random and sequence) and write (random and sequence), this is because reading in SSDs' and more than 400 times faster than HDD. This is due to academic research that I am doing.

 

Also the schema for the table lineitem from both instances might help shed light.

I am using 40g scale, in this way the lineitem table has (40 * 6 million) 240 million of the rows.


Any reason why the index changed between instances?



index on l_orderkey (table lineitem).

Some doubts
- Difference between GroupAggregate and Finalize GroupAggregate
- because some algorithms show measurements on "Disk" and others on "Memory" example:
      - External sort Disk: 52784kB
      - quicksort Memory: 47770kB

Because one execution plan was much smaller than the other, considering that the query is the same and the data are the same.
--------------------------------------------------
select
     l_orderkey,
     sum(l_extendedprice * (1 - l_discount)) as revenue,
     o_orderdate,
     o_shippriority
from
     customer,
     orders,
     lineitem
where
     c_mktsegment = 'HOUSEHOLD'
     and c_custkey = o_custkey
     and l_orderkey = o_orderkey
     and o_orderdate < date '1995-03-21'
     and l_shipdate > date '1995-03-21'
group by
     l_orderkey,
     o_orderdate,
     o_shippriority
order by
     revenue desc,
     o_orderdate
--------------------------------------------------

best regards
Neto


--
Adrian Klaver
adrian.klaver@aklaver.com

Regards
Neto

Re: comparison between 2 execution plans

From
Adrian Klaver
Date:
On 05/05/2018 10:51 AM, Neto pr wrote:
> Dear,
> 
> 
> 2018-05-05 9:57 GMT-07:00 Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>>:
> 
>     On 05/05/2018 06:26 AM, Neto pr wrote:

> 
>     It might help if you explained what 'version with source code
>     changed by me' means?
> 
> 
> Postgresql with modified source code, is that I modified some internal 
> functions of cost (source code) and parameters in Postgresql.conf so 
> that it is possible for the DBMS to differentiate cost of read (random 
> and sequence) and write (random and sequence), this is because reading 
> in SSDs' and more than 400 times faster than HDD. This is due to 
> academic research that I am doing.

Seems to me that the above may enter into the difference between 
instances. Someone with more knowledge of the Postgres internals then I 
would need to comment, contingent on seeing the changes I would presume.

Also from my previous post:

Any reason why the index changed between instances?

> 
> 
>     Also the schema for the table lineitem from both instances might
>     help shed light.
> 
> 
> see schema of the tables below:
> https://docs.snowflake.net/manuals/_images/sample-data-tpch-schema.png
> 
> I am using 40g scale, in this way the lineitem table has (40 * 6 
> million) 240 million of the rows.
> 
> 
>     Any reason why the index changed between instances?
> 
> 
> 
>         index on l_orderkey (table lineitem).
> 
>         Some doubts
>         - Difference between GroupAggregate and Finalize GroupAggregate
>         - because some algorithms show measurements on "Disk" and others
>         on "Memory" example:
>                - External sort Disk: 52784kB
>                - quicksort Memory: 47770kB
> 
>         Because one execution plan was much smaller than the other,
>         considering that the query is the same and the data are the same.
>         --------------------------------------------------
>         select
>               l_orderkey,
>               sum(l_extendedprice * (1 - l_discount)) as revenue,
>               o_orderdate,
>               o_shippriority
>         from
>               customer,
>               orders,
>               lineitem
>         where
>               c_mktsegment = 'HOUSEHOLD'
>               and c_custkey = o_custkey
>               and l_orderkey = o_orderkey
>               and o_orderdate < date '1995-03-21'
>               and l_shipdate > date '1995-03-21'
>         group by
>               l_orderkey,
>               o_orderdate,
>               o_shippriority
>         order by
>               revenue desc,
>               o_orderdate
>         --------------------------------------------------
> 
>         best regards
>         Neto
> 
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 
> 
> Regards
> Neto


-- 
Adrian Klaver
adrian.klaver@aklaver.com