Re: Postgresql Sort cost Poor performance? - Mailing list pgsql-performance

From David Rowley
Subject Re: Postgresql Sort cost Poor performance?
Date
Msg-id CAKJS1f8aMw0Fbd7K1D4n08fRMUS_19D9pTNXd6u9Qi+eMj6wOQ@mail.gmail.com
Whole thread Raw
In response to Re: Postgresql Sort cost Poor performance?  ("tank.zhang" <6220104@qq.com>)
Responses Re: Postgresql Sort cost Poor performance?
List pgsql-performance
On Tue, 2 Apr 2019 at 20:00, tank.zhang <6220104@qq.com> wrote:
> 2、 Adding a DISTINCT response time was very slow
>
> qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM
> QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO
> IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
> TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
> TO_DATE('2019-03-11','YYYY-MM-DD');
>  checkcarnum
> -------------
>      1071367
> (1 row)

That's because of how DISTINCT is implemented within an aggregate
function in PostgreSQL. Internally within the aggregate code in the
executor, a sort is performed on the entire input to the aggregate
node.  The planner is currently unable to make use of any indexes that
provide pre-sorted input.

One way to work around this would be to perform the DISTINCT and
COUNT(*) in separate stages using a subquery.

From your original query, something like:

SELECT COUNT(SVIN) AS CHECKCARNUM,SMTOC
FROM (
SELECT SMTOC,SVIN
FROM QIS_CARPASSEDSTATION A
WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG'
AND A.SLINENO IN ( '1F' , '2F' , '3F' )
AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD')
AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')
GROUP BY  SMTOC,SVIN
) A GROUP BY SMTOC;

An index something like:
CREATE INDEX ON QIS_CARPASSEDSTATION (SMTOC, SVIN, SSTATIONCD, DWORKDATE);

Should help speed up the subquery and provide pre-sorted input to the
outer aggregate.  If you like, you could add SLINENO to the end of the
index to allow an index-only scan which may result in further
performance improvements.

Without the index, you're forced to sort, but at least it's just one
sort instead of two.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



pgsql-performance by date:

Previous
From: "tank.zhang"
Date:
Subject: Re: Postgresql Sort cost Poor performance?
Next
From: "tank.zhang"
Date:
Subject: Re: Postgresql Sort cost Poor performance?