Re: selects with large offset really slow - Mailing list pgsql-general

From greg@turnstep.com
Subject Re: selects with large offset really slow
Date
Msg-id 447588d7993c074f6afce4a01454461d@biglumber.com
Whole thread Raw
In response to selects with large offset really slow  (John Smith <john_smith_45678@yahoo.com>)
List pgsql-general
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> There are 90K-100K records in each of two tables. This simple join
> is really slow and the larger the offset, the longer it takes.
> Anything I can do to speed it up (a lot)? I've double-checked and
> there are indexes on everything used for joins and ordering.

Indexes won't really help here, and it's not the offset that is
killing you, it's the sort. There are some possibilities however,
depending on the nature of the tables. If they are fairly static,
you can speed it up drastically by removing the WHERE clause
and the ORDER BY clause.

Since all we care about is if referrer_id is "1", we can do this:

CREATE INDEX stats_id ON stats(link_id);
ALTER TABLE links ADD ref BOOL;
BEGIN;
UPDATE links SET ref='true' WHERE EXISTS
  (SELECT 1 FROM stats WHERE link_id=links.id AND referrer_id=1);
COMMIT;

If you don't care about the "non 1" referrer_ids, you can do this instead:

CREATE INDEX stats_id ON stats(link_id);
BEGIN;
DELETE FROM links WHERE NOT EXISTS
  (SELECT 1 FROM stats WHERE link_id=links.id AND referrer_id=1);
COMMIT;


Now we can do something like this:

CREATE INDEX links_url on links(url);
SELECT id, url FROM links ORDER BY url LIMIT 100 OFFSET 80000;

This is pretty fast, as it uses the links_url index.

We can tweak a little more speed out of it by making the
ORDER BY permanent with a CLUSTER command:

CLUSTER links_url ON links;

Now as long as you don't change that table, you can do this:

SELECT id, url FROM links LIMIT 100 OFFSET 90000;

Very fast, and probably worth all the overhead if you are
making multiple queries. Moving referrer_id into the links
table would be a good thing as well, as it would allow you
to use the pre-cluster SELECT above with the links_url
index and still get a good response.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302071246

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+RAAlvJuQZxSWSsgRAqM+AKDX5d7vCxFDRDybNgXinuq9coF/SgCg4OFy
qRQKb6w693Yyt1dZfCFKCpQ=
=q0og
-----END PGP SIGNATURE-----



pgsql-general by date:

Previous
From: Dennis Gearon
Date:
Subject: user account uniqueness
Next
From: Mikhail Terekhov
Date:
Subject: Re: How to make a dictoinary list of a query result