Optimizing DISTINCT with LIMIT - Mailing list pgsql-hackers

From tmp
Subject Optimizing DISTINCT with LIMIT
Date
Msg-id gh8m5v$7oj$1@news.hub.org
Whole thread Raw
Responses Re: Optimizing DISTINCT with LIMIT  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
As far as I have understood the following query  SELECT DISTINCT foo  FROM bar  LIMIT baz
is done by first sorting the input and then traversing the sorted data, 
ensuring uniqueness of output and stopping when the LIMIT threshold is 
reached. Furthermore, a part of the sort procedure is to traverse input 
at least one time.

Now, if the input is large but the LIMIT threshold is small, this 
sorting step may increase the query time unnecessarily so here is a 
suggestion for optimization:  If the input is "sufficiently" large and the LIMIT threshold 
"sufficiently" small, maintain the DISTINCT output by hashning while 
traversing the input and stop when the LIMIT threshold is reached. No 
sorting required and *at* *most* one read of input.

Use case: Websites that needs to present small samples of huge queries fast.


pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Assertion failure in new outer/semi/anti join code
Next
From: Gregory Stark
Date:
Subject: Re: In-place upgrade: catalog side