BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function
Date
Msg-id 16625-07403f18463add24@postgresql.org
Whole thread Raw
Responses Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function
Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16625
Logged by:          XINYU LIU
Email address:      xinyuliu@umich.edu
PostgreSQL version: 12.3
Operating system:   Ubuntu 19.10
Description:

Hello,

We are developing a tool for automatically finding performance bugs in
PostgreSQL. Our key insight is that given a pair of semantic equivalent
queries, a robust DBMS should return the same result within a similar
execution time. Significant time difference suggests a potential performance
bug in the DBMS.

We are sharing a pair of TPC-H queries that exhibit a potential performance
bug in this report:

First query:
SELECT "s_suppkey" 
FROM   "supplier" 
WHERE  s_suppkey > 100;  

Second query:
SELECT "s_suppkey" 
FROM   "supplier" 
WHERE  s_suppkey > 100 
GROUP  BY s_suppkey; 

[Actual Behavior]
We executed both queries on the TPC-H benchmark of scale factor 5: the first
query takes only 17 millisecond, while the second query takes 42
millisecond. We think the time difference results from different plans
selected.

[Query Execution Plan]
First query:
                                                       QUERY PLAN
                                      
--------------------------------------------------------------------------------------------------------------
 Seq Scan on supplier  (cost=0.00..1734.00 rows=49909 width=4) (actual
time=0.047..14.898 rows=49900 loops=1)
   Filter: (s_suppkey > 100)
   Rows Removed by Filter: 100
 Planning Time: 0.639 ms
 Execution Time: 17.469 ms
(5 rows)




Second query:
                                                                      QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1858.77..2357.86 rows=49909 width=4) (actual
time=30.093..38.541 rows=49900 loops=1)
   Group Key: s_suppkey
   ->  Seq Scan on supplier  (cost=0.00..1734.00 rows=49909 width=4) (actual
time=0.047..13.656 rows=49900 loops=1)
         Filter: (s_suppkey > 100)
         Rows Removed by Filter: 100
 Planning Time: 0.669 ms
 Execution Time: 42.270 ms
(7 rows)


[Expected Behavior]
Since these two queries are semantically equivalent, we were hoping that
PostgreSQL will return the same results in roughly the same amount of
time.


[Test Environment]
Ubuntu 19.10
PostgreSQL v12.3
Database: TPC-H benchmark (with scale factor 5)


[Steps for reproducing our observations]
    
* Download the dataset from the link:
https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing
* Set up TPC-H benchmark
tar xzvf tpch5_postgresql.tar.gz
cd tpch5_postgresql
db=tpch5
createdb $db
psql -d $db < dss.ddl
for i in `ls *.tbl`
do
    echo $i
    name=`echo $i|cut -d'.' -f1`
    psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING
'LATIN1';"
done
psql -d $db < dss_postgres.ri
*Execute the queries


pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16624: Query Optimizer - Performance bug related to predicate simplification
Next
From: Christophe Pettus
Date:
Subject: Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function