Thread: Quota query with decent performance?

Quota query with decent performance?

From
Troels Arvin
Date:
Hello,

I'm researching how "quota queries" (a term used by Fabian Pascal) may be
performed in various DBMSes with acceptable performance:
http://troels.arvin.dk/db/rdbms/#select-limit-simple-note

An example of a quota query could be to get the top-3 youngest people from
a collection of people. The complicated part is that such a query might
return more than 3 rows in some tie situations.

In MSSQL and DB2 there are very efficient facilities for such queries, but
I can't find any well-performing declarative methods for PostgreSQL. I
have tried a couple of different strategies, and I currently get the best
results from a correlated subquery like

SELECT * FROM person AS px
WHERE ( SELECT COUNT(*) FROM person AS py WHERE py.age < px.age
) < 3;

When my base table has 4000 rows, my query takes 27 seconds in PostgreSQL
7.2.3 (PIII 1000MHz) which is clearly unacceptable, especially comparing
to the same query in DB2 which only takes 1.4 seconds (on the same server)
- or to this non-standard-SQL DB2-query which only takes 0.02 seconds to
calculate the same result:

SELECT *
FROM ( SELECT firstname,age,RANK() OVER (ORDER BY age ASC) AS rank FROM person
) AS foo
WHERE rank<=3;

Test-files with table definitions and randomly generated rows:
http://troels.arvin.dk/db/tests/quota.1/

Any suggestions on how to perform fast "quota queries" in PostgreSQL?

-- 
Greetings from Troels Arvin, Copenhagen, Denmark




Re: Quota query with decent performance?

From
Josh Berkus
Date:
Troels,

Thank you for contacting us before publishing your results.   Please ignore
any list-trolls who criticize your methodology; there are a few cranks on
every list.   The important thing is your contacted us.

> In MSSQL and DB2 there are very efficient facilities for such queries, but
> I can't find any well-performing declarative methods for PostgreSQL. I
> have tried a couple of different strategies, and I currently get the best
> results from a correlated subquery like
>
> SELECT * FROM person AS px
> WHERE (
>   SELECT COUNT(*)
>   FROM person AS py
>   WHERE py.age < px.age
> ) < 3;
>
> When my base table has 4000 rows, my query takes 27 seconds in PostgreSQL
> 7.2.3 (PIII 1000MHz) which is clearly unacceptable,

Well, first off 7.2.3 is a two-year old version.   There have been performance
improvements specificially on these sorts of issues since then.  Please use
at least 7.3.4, and we would prefer that you use 7.4RC2, which is available
now.  7.4, in particular, introduces significant improvements in "group by"
queries. If for some reason you have to compare against this old version,
please be fair to us and note somewhere that you were using an old version of
PostgreSQL.

Second, the query you post is one "SQL Standard" way, which is good for
portability but not for speed.  Frankly, I'm not convinced that it's even the
best SQL standard way.  On the other databases, you seem happy to use
non-SQL-standard syntax, so let me give you one such solution in PostgreSQL:

SELECT * FROM person
WHERE person.age >= (SELECT p2.agefrom person p2order by p2.age DESC LIMIT 1 OFFSET 2)

also try:

SELECT *
FROM person, (SELECT p2.agefrom person p2order by p2.age DESC LIMIT 1 OFFSET 2) as prank
WHERE person.age >= prank.age

This should give you all of the rows whose ages are in the top 3 ranks much
faster.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: Quota query with decent performance?

From
Chester Kustarz
Date:
maybe:

select *
from person
where age <=
(select age from person order by age limit 1 offset 2);

7.20 msec

assuming it does what you want.

On Tue, 11 Nov 2003, Troels Arvin wrote:
> An example of a quota query could be to get the top-3 youngest people from
> a collection of people. The complicated part is that such a query might
> return more than 3 rows in some tie situations.



Re: Quota query with decent performance?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Second, the query you post is one "SQL Standard" way, which is good for 
> portability but not for speed.  Frankly, I'm not convinced that it's even the
> best SQL standard way.  On the other databases, you seem happy to use 
> non-SQL-standard syntax, so let me give you one such solution in PostgreSQL:
> [snip]

I don't know of any very good solution in bog-standard SQL either.

Aside from the LIMIT-based solution that Josh offered, I recall that
Oleg Bartunov and Teodor Sigaev had some ideas about top-N-aggregate
solutions.  We didn't accept those into the main distribution (yet)
but if you dig in the PG list archives I think there is working code
available.  Try searching for "partial sorting".
        regards, tom lane


Re: Quota query with decent performance?

From
Troels Arvin
Date:
On Tue, 11 Nov 2003 18:49:31 -0500, Chester Kustarz wrote:

[... cut efficient top-3 youngest query wanted ...]

> select *
> from person
> where age <=
> (select age from person order by age limit 1 offset 2);

Thanks - it works; why didn't I think of that?

-- 
Greetings from Troels Arvin, Copenhagen, Denmark




Re: Quota query with decent performance?

From
Troels Arvin
Date:
On Tue, 11 Nov 2003 18:49:31 -0500, Chester Kustarz wrote:

> select *
> from person
> where age <=
> (select age from person order by age limit 1 offset 2);

Integrated into http://troels.arvin.dk/db/rdbms/#select-top-n

-- 
Greetings from Troels Arvin, Copenhagen, Denmark




Re: Quota query with decent performance?

From
Troels Arvin
Date:
On Tue, 11 Nov 2003 18:49:31 -0500, Chester Kustarz wrote:

[... discussion of top-n query (where n=3) ...]

> select *
> from person
> where age <=
> (select age from person order by age limit 1 offset 2);

It fails when the cardinality of person is less than 3 (returns empty
set). My solution is this, which is not as beautiful any more:

SELECT *
FROM person
WHERE age <= COALESCE ( (   SELECT age FROM person   ORDER BY age ASC   LIMIT 1 OFFSET 2       -- 2=n-1 ),(   SELECT
ageFROM person   ORDER BY age DESC      -- note: opposite of ASC   LIMIT 1 )
 
);

-- 
Greetings from Troels Arvin, Copenhagen, Denmark




Re: Quota query with decent performance?

From
Bruno Wolff III
Date:
On Mon, Nov 17, 2003 at 23:55:53 +0100, Troels Arvin <troels@arvin.dk> wrote:
> On Tue, 11 Nov 2003 18:49:31 -0500, Chester Kustarz wrote:
> 
> [... discussion of top-n query (where n=3) ...]
> 
> > select *
> > from person
> > where age <=
> > (select age from person order by age limit 1 offset 2);
> 
> It fails when the cardinality of person is less than 3 (returns empty
> set). My solution is this, which is not as beautiful any more:

Used that way the subselect value will be null if there are no matching
rows. This should allow you to do something like:

select *
from person
where not isfalse (age <=
(select age from person order by age limit 1 offset 2));


Re: Quota query with decent performance?

From
Troels Arvin
Date:
On Tue, 18 Nov 2003 06:56:42 -0600, Bruno Wolff III wrote:

> select *
> from person
> where not isfalse (age <=
> (select age from person order by age limit 1 offset 2));

Thanks; much nicer than my COALESCE-variant.
http://troels.arvin.dk/db/rdbms/#select-top-n-postgresql updated.

-- 
Greetings from Troels Arvin, Copenhagen, Denmark




Re: Quota query with decent performance?

From
Troels Arvin
Date:
On Tue, 18 Nov 2003 06:56:42 -0600, Bruno Wolff III wrote:

[...]
> where not isfalse (age <=
[...]

Strange. I can't find the ISFALSE in neither PostgreSQL or standard SQL
documentation. How can that be?

-- 
Greetings from Troels Arvin, Copenhagen, Denmark




Re: Quota query with decent performance?

From
Bruno Wolff III
Date:
On Tue, Nov 18, 2003 at 15:21:43 +0100, Troels Arvin <troels@arvin.dk> wrote:
> On Tue, 18 Nov 2003 06:56:42 -0600, Bruno Wolff III wrote:
> 
> [...]
> > where not isfalse (age <=
> [...]
> 
> Strange. I can't find the ISFALSE in neither PostgreSQL or standard SQL
> documentation. How can that be?

I didn't remember the exact syntax and found the isfalse function by
guessing. The standard syntax is expression is false instead of
isfalse(expression), though the function does work (in 7.4 at least).
area=> select isfalse(null);isfalse
---------f
(1 row)



Re: Quota query with decent performance?

From
Bruno Wolff III
Date:
On Tue, Nov 18, 2003 at 15:21:43 +0100, Troels Arvin <troels@arvin.dk> wrote:
> On Tue, 18 Nov 2003 06:56:42 -0600, Bruno Wolff III wrote:
> 
> [...]
> > where not isfalse (age <=
> [...]
> 
> Strange. I can't find the ISFALSE in neither PostgreSQL or standard SQL
> documentation. How can that be?

I forgot to mention that the is false operator is described under
the section on comparison operators.


Re: Quota query with decent performance?

From
Troels Arvin
Date:
On Tue, 18 Nov 2003 08:36:27 -0600, Bruno Wolff III wrote:

> The standard syntax is expression is false instead of
> isfalse(expression)

OK, so I guess that a 'better' (closer to standard) version of your query
would be:

SELECT *
FROM pview
WHERE ( age <= (   SELECT age FROM pview   ORDER BY age ASC   LIMIT 1 OFFSET 20       -- 2=n-1 )
) IS NOT FALSE;

-- 
Greetings from Troels Arvin, Copenhagen, Denmark