Home > mailing lists

Re: Distinct + Limit - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: Distinct + Limit
Date	March 28, 2012 11:13:26
Msg-id	22625.1332943981@sss.pgh.pa.us Whole thread Raw
In response to	Re: Distinct + Limit (Francois Deliege <fdeliege@gmail.com>)
Responses	Re: Distinct + Limit
List	pgsql-performance

Tree view

Francois Deliege <fdeliege@gmail.com> writes:
> I have the following table with millions of rows:

> CREATE TABLE table1
> (
>   col1 text,
>   col2 text,
>   col3 text,
>   col4 text,
>   col5 text,
>   col6 text
> )

> select col1 from table1 group by col1 limit 1;
> select distinct on (col1) col1 from table1 limit 1;

> select col1 from table1 group by col1 limit 2;
> select distinct on (col1) col1 from table1 limit 2;

> Performing any of these following queries results in a full sequential
scan, followed by a hash aggregate, and then the limit.

Well, if you had an index on the column, you would get a significantly
better plan ...

> Similarly, the following query results in a sequential scan:

> select * from table1 where col1 <> col1;

> This query is generated by the Sequel library abstraction layer in Ruby when filtering record based on a empty array
ofvalues. We fixed this by handling that case on the client side, but originally thought the server would have
rewrittenit and sent a empty result set. 

It does not, and never will, because that would be an incorrect
optimization.  "col1 <> col1" isn't constant false, it's more like
"col1 is not null".  I'd suggest "WHERE FALSE", or "WHERE 1 <> 1"
if you must, to generate a provably false constraint.

            regards, tom lane

pgsql-performance by date:

From: Ants Aasma
Date: 27 March 2012, 21:19:53
Subject: Re: Distinct + Limit

From: Merlin Moncure
Date: 28 March 2012, 13:40:12
Subject: Re: Distinct + Limit

Re: Distinct + Limit - Mailing list pgsql-performance

Previous

Next