sequence query performance issues - Mailing list pgsql-performance
From | Peter Koczan |
---|---|
Subject | sequence query performance issues |
Date | |
Msg-id | 4544e0330709271504g364fd042u161966fa448934f6@mail.gmail.com Whole thread Raw |
Responses |
Re: sequence query performance issues
|
List | pgsql-performance |
Hello, I have a weird performance issue with a query I'm testing. Basically, I'm trying to port a function that generates user uids, and since postgres offers a sequence generator function, I figure I'd take advantage of that. Basically, I generate our uid range, filter out those which are in use, and randomly pick however many I need. However, when I run it it takes forever (>10 minutes and I get nothing so I cancelled the query) and cpu usage on the server is maxed out. Here's my query (I'll post the explain output later so as not to obscure my question): => select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select uid from people) order by random() limit 1; I thought that nulls were a problem, so I tried: => select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select coalesce(uid,0) from people) order by random() limit 1; And that finished in less than a second. I then tried: => select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select coalesce(uid,0) from people where uid is not null) order by random() limit 1; And we're back to taking forever. So I have 2 questions: - Is there a better query for this purpose? Mine works when coalesced, but it seems a little brute-force and the random() sorting, while kinda nice, is slow. - Is this in any way expected? I know that nulls sometimes cause problems, but why is it taking forever even when trying to filter those out? Thanks. Peter The gory details: - There is an btree index on people(uid), and there are ~6300 rows, of which ~1300 have null uids. - EXPLAIN output (I couldn't get EXPLAIN ANALYZE output from the first two queries since they took too long): => explain select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select uid from people) order by random() limit 1; QUERY PLAN ------------------------------------------------------------------------------------------ Limit (cost=40025.57..40025.60 rows=10 width=4) -> Sort (cost=40025.57..40026.82 rows=500 width=4) Sort Key: random() -> Function Scan on generate_series a (cost=693.16..40003.16 rows=500 width=4) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=693.16..756.03 rows=6287 width=2) -> Seq Scan on people (cost=0.00..686.87 rows=6287 width=2) (8 rows) => explain select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select uid from people where uid is not null) order by random() limit 1; QUERY PLAN ------------------------------------------------------------------------------------------ Limit (cost=31486.71..31486.73 rows=10 width=4) -> Sort (cost=31486.71..31487.96 rows=500 width=4) Sort Key: random() -> Function Scan on generate_series a (cost=691.79..31464.29 rows=500 width=4) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=691.79..741.00 rows=4921 width=2) -> Seq Scan on people (cost=0.00..686.87 rows=4921 width=2) Filter: (uid IS NOT NULL) (9 rows) => explain select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select coalesce(uid, 0) from people) order by random() limit 1; QUERY PLAN ---------------------------------------------------------------------------------------- Limit (cost=756.97..756.99 rows=10 width=4) -> Sort (cost=756.97..758.22 rows=500 width=4) Sort Key: random() -> Function Scan on generate_series a (cost=718.30..734.55 rows=500 width=4) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on people (cost=0.00..702.59 rows=6287 width=2) (7 rows) => explain analyze select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select coalesce(uid, 0) from people) order by random() limit 1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=756.97..756.99 rows=10 width=4) (actual time=370.444..370.554 rows=10 loops=1) -> Sort (cost=756.97..758.22 rows=500 width=4) (actual time=370.434..370.472 rows=10 loops=1) Sort Key: random() -> Function Scan on generate_series a (cost=718.30..734.55 rows=500 width=4) (actual time=70.018..199.540 rows=26808 loops=1) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on people (cost=0.00..702.59 rows=6287 width=2) (actual time=0.023..29.167 rows=6294 loops=1) Total runtime: 372.224 ms (8 rows)
pgsql-performance by date: