Home > mailing lists

Re: Distinct + Limit - Mailing list pgsql-performance

From	Francois Deliege
Subject	Re: Distinct + Limit
Date	March 27, 2012 17:58:54
Msg-id	16737833.463.1332881676120.JavaMail.geo-discussion-forums@pbcpw7 Whole thread
Responses	Re: Distinct + Limit Re: Distinct + Limit
List	pgsql-performance

Tree view

Hi list,

I have the following table with millions of rows:

CREATE TABLE table1
(
  col1 text,
  col2 text,
  col3 text,
  col4 text,
  col5 text,
  col6 text
)

select col1 from table1 group by col1 limit 1;
select distinct on (col1) col1 from table1 limit 1;

select col1 from table1 group by col1 limit 2;
select distinct on (col1) col1 from table1 limit 2;

Performing any of these following queries results in a full sequential scan, followed by a hash aggregate, and then the
limit. An optimization could be to stop the sequential scan as soon as the limit of results has been reached.  Am I
missingsomething? 

Limit  (cost=2229280.06..2229280.08 rows=2 width=8)
  ->  HashAggregate  (cost=2229280.06..2229280.21 rows=15 width=8)
        ->  Seq Scan on table1  (cost=0.00..2190241.25 rows=15615525 width=8)

Similarly, the following query results in a sequential scan:

select * from table1 where col1 <> col1;

This query is generated by the Sequel library abstraction layer in Ruby when filtering record based on a empty array of
values.We fixed this by handling that case on the client side, but originally thought the server would have rewritten
itand sent a empty result set. 

I would greatly appreciate any help on speeding up these without having to rewrite the queries on the client side.

Thanks,

Francois

pgsql-performance by date:

From: Dave Crooke
Date: 27 March 2012, 17:40:49
Subject: Re: Linux machine aggressively clearing cache

From: Ants Aasma
Date: 27 March 2012, 21:19:53
Subject: Re: Distinct + Limit

Re: Distinct + Limit - Mailing list pgsql-performance

Previous

Next