Thread: Quota query with decent performance?
Hello, I'm researching how "quota queries" (a term used by Fabian Pascal) may be performed in various DBMSes with acceptable performance: http://troels.arvin.dk/db/rdbms/#select-limit-simple-note An example of a quota query could be to get the top-3 youngest people from a collection of people. The complicated part is that such a query might return more than 3 rows in some tie situations. In MSSQL and DB2 there are very efficient facilities for such queries, but I can't find any well-performing declarative methods for PostgreSQL. I have tried a couple of different strategies, and I currently get the best results from a correlated subquery like SELECT * FROM person AS px WHERE ( SELECT COUNT(*) FROM person AS py WHERE py.age < px.age ) < 3; When my base table has 4000 rows, my query takes 27 seconds in PostgreSQL 7.2.3 (PIII 1000MHz) which is clearly unacceptable, especially comparing to the same query in DB2 which only takes 1.4 seconds (on the same server) - or to this non-standard-SQL DB2-query which only takes 0.02 seconds to calculate the same result: SELECT * FROM ( SELECT firstname,age,RANK() OVER (ORDER BY age ASC) AS rank FROM person ) AS foo WHERE rank<=3; Test-files with table definitions and randomly generated rows: http://troels.arvin.dk/db/tests/quota.1/ Any suggestions on how to perform fast "quota queries" in PostgreSQL? -- Greetings from Troels Arvin, Copenhagen, Denmark
Troels, Thank you for contacting us before publishing your results. Please ignore any list-trolls who criticize your methodology; there are a few cranks on every list. The important thing is your contacted us. > In MSSQL and DB2 there are very efficient facilities for such queries, but > I can't find any well-performing declarative methods for PostgreSQL. I > have tried a couple of different strategies, and I currently get the best > results from a correlated subquery like > > SELECT * FROM person AS px > WHERE ( > SELECT COUNT(*) > FROM person AS py > WHERE py.age < px.age > ) < 3; > > When my base table has 4000 rows, my query takes 27 seconds in PostgreSQL > 7.2.3 (PIII 1000MHz) which is clearly unacceptable, Well, first off 7.2.3 is a two-year old version. There have been performance improvements specificially on these sorts of issues since then. Please use at least 7.3.4, and we would prefer that you use 7.4RC2, which is available now. 7.4, in particular, introduces significant improvements in "group by" queries. If for some reason you have to compare against this old version, please be fair to us and note somewhere that you were using an old version of PostgreSQL. Second, the query you post is one "SQL Standard" way, which is good for portability but not for speed. Frankly, I'm not convinced that it's even the best SQL standard way. On the other databases, you seem happy to use non-SQL-standard syntax, so let me give you one such solution in PostgreSQL: SELECT * FROM person WHERE person.age >= (SELECT p2.agefrom person p2order by p2.age DESC LIMIT 1 OFFSET 2) also try: SELECT * FROM person, (SELECT p2.agefrom person p2order by p2.age DESC LIMIT 1 OFFSET 2) as prank WHERE person.age >= prank.age This should give you all of the rows whose ages are in the top 3 ranks much faster. -- -Josh BerkusAglio Database SolutionsSan Francisco
maybe: select * from person where age <= (select age from person order by age limit 1 offset 2); 7.20 msec assuming it does what you want. On Tue, 11 Nov 2003, Troels Arvin wrote: > An example of a quota query could be to get the top-3 youngest people from > a collection of people. The complicated part is that such a query might > return more than 3 rows in some tie situations.
Josh Berkus <josh@agliodbs.com> writes: > Second, the query you post is one "SQL Standard" way, which is good for > portability but not for speed. Frankly, I'm not convinced that it's even the > best SQL standard way. On the other databases, you seem happy to use > non-SQL-standard syntax, so let me give you one such solution in PostgreSQL: > [snip] I don't know of any very good solution in bog-standard SQL either. Aside from the LIMIT-based solution that Josh offered, I recall that Oleg Bartunov and Teodor Sigaev had some ideas about top-N-aggregate solutions. We didn't accept those into the main distribution (yet) but if you dig in the PG list archives I think there is working code available. Try searching for "partial sorting". regards, tom lane
On Tue, 11 Nov 2003 18:49:31 -0500, Chester Kustarz wrote: [... cut efficient top-3 youngest query wanted ...] > select * > from person > where age <= > (select age from person order by age limit 1 offset 2); Thanks - it works; why didn't I think of that? -- Greetings from Troels Arvin, Copenhagen, Denmark
On Tue, 11 Nov 2003 18:49:31 -0500, Chester Kustarz wrote: > select * > from person > where age <= > (select age from person order by age limit 1 offset 2); Integrated into http://troels.arvin.dk/db/rdbms/#select-top-n -- Greetings from Troels Arvin, Copenhagen, Denmark
On Tue, 11 Nov 2003 18:49:31 -0500, Chester Kustarz wrote: [... discussion of top-n query (where n=3) ...] > select * > from person > where age <= > (select age from person order by age limit 1 offset 2); It fails when the cardinality of person is less than 3 (returns empty set). My solution is this, which is not as beautiful any more: SELECT * FROM person WHERE age <= COALESCE ( ( SELECT age FROM person ORDER BY age ASC LIMIT 1 OFFSET 2 -- 2=n-1 ),( SELECT ageFROM person ORDER BY age DESC -- note: opposite of ASC LIMIT 1 ) ); -- Greetings from Troels Arvin, Copenhagen, Denmark
On Mon, Nov 17, 2003 at 23:55:53 +0100, Troels Arvin <troels@arvin.dk> wrote: > On Tue, 11 Nov 2003 18:49:31 -0500, Chester Kustarz wrote: > > [... discussion of top-n query (where n=3) ...] > > > select * > > from person > > where age <= > > (select age from person order by age limit 1 offset 2); > > It fails when the cardinality of person is less than 3 (returns empty > set). My solution is this, which is not as beautiful any more: Used that way the subselect value will be null if there are no matching rows. This should allow you to do something like: select * from person where not isfalse (age <= (select age from person order by age limit 1 offset 2));
On Tue, 18 Nov 2003 06:56:42 -0600, Bruno Wolff III wrote: > select * > from person > where not isfalse (age <= > (select age from person order by age limit 1 offset 2)); Thanks; much nicer than my COALESCE-variant. http://troels.arvin.dk/db/rdbms/#select-top-n-postgresql updated. -- Greetings from Troels Arvin, Copenhagen, Denmark
On Tue, 18 Nov 2003 06:56:42 -0600, Bruno Wolff III wrote: [...] > where not isfalse (age <= [...] Strange. I can't find the ISFALSE in neither PostgreSQL or standard SQL documentation. How can that be? -- Greetings from Troels Arvin, Copenhagen, Denmark
On Tue, Nov 18, 2003 at 15:21:43 +0100, Troels Arvin <troels@arvin.dk> wrote: > On Tue, 18 Nov 2003 06:56:42 -0600, Bruno Wolff III wrote: > > [...] > > where not isfalse (age <= > [...] > > Strange. I can't find the ISFALSE in neither PostgreSQL or standard SQL > documentation. How can that be? I didn't remember the exact syntax and found the isfalse function by guessing. The standard syntax is expression is false instead of isfalse(expression), though the function does work (in 7.4 at least). area=> select isfalse(null);isfalse ---------f (1 row)
On Tue, Nov 18, 2003 at 15:21:43 +0100, Troels Arvin <troels@arvin.dk> wrote: > On Tue, 18 Nov 2003 06:56:42 -0600, Bruno Wolff III wrote: > > [...] > > where not isfalse (age <= > [...] > > Strange. I can't find the ISFALSE in neither PostgreSQL or standard SQL > documentation. How can that be? I forgot to mention that the is false operator is described under the section on comparison operators.
On Tue, 18 Nov 2003 08:36:27 -0600, Bruno Wolff III wrote: > The standard syntax is expression is false instead of > isfalse(expression) OK, so I guess that a 'better' (closer to standard) version of your query would be: SELECT * FROM pview WHERE ( age <= ( SELECT age FROM pview ORDER BY age ASC LIMIT 1 OFFSET 20 -- 2=n-1 ) ) IS NOT FALSE; -- Greetings from Troels Arvin, Copenhagen, Denmark