Thread: Query optimization
Greetings! I am trying to find a way to optimize this query and have hit a
wall. The database size is 2.9 GB and contains 1 million records. The system
is a dual Xeon 1 GHz P3 with 4 GB RAM, 2 GB of it shared memory, running Red
Hat Linux kernel 2.4.18-5 with ext3fs. I'm hoping I haven't hit the limit of
the hardware or OS. Here's all the relevant info; questions, comments, and
solutions would be greatly appreciated.

top output for the backend:

11696 postgres  25  0 1084M 1.1G  562M R  99.9 28.6  2:36 postmaster

postgresql.conf settings:

shared_buffers = 250000
sort_mem = 1048576        # min 32
vacuum_mem = 128000       # min 1024
wal_files = 64            # range 0-64
enable_seqscan = false
enable_indexscan = true
enable_tidscan = true
enable_sort = true
enable_nestloop = true
enable_mergejoin = true
enable_hashjoin = true

[postgres@db1 base]$ cat /proc/sys/kernel/shmmax
2192000000

database=# explain analyze SELECT active,registrant,name FROM person WHERE
object.active = 1 AND object.registrant = 't' ORDER BY UPPER(object.name)
DESC LIMIT 10 OFFSET 0;
NOTICE:  QUERY PLAN:

Limit  (cost=nan..nan rows=10 width=2017) (actual
time=204790.82..204790.84 rows=10 loops=1)
  ->  Sort  (cost=nan..nan rows=1032953 width=2017) (actual
time=204790.81..204790.82 rows=11 loops=1)
        ->  Index Scan using registrant__object__idx on object
(cost=0.00..81733.63 rows=1032953 width=2017) (actual
time=0.14..94509.14 rows=1032946 loops=1)
Total runtime: 205125.75 msec
On Saturday 07 Dec 2002 2:16 am, Fred Moyer wrote:
> database=# explain analyze SELECT active,registrant,name FROM person WHERE
> object.active = 1 AND object.registrant = 't' ORDER BY UPPER(object.name)
> DESC LIMIT 10 OFFSET 0;
> NOTICE: QUERY PLAN:

What's the connection between "person" and "object"? It looks like an
unconstrained join from here. Schema and count(*) for both tables, and
details of the indexes, would be useful.

> Limit (cost=nan..nan rows=10 width=2017) (actual
        ^^^^^^^^
I've never seen this "nan" before - presumably Not a Number, but I don't
know why the planner generates it.

> time=204790.82..204790.84 rows=10 loops=1)
>   -> Sort (cost=nan..nan rows=1032953 width=2017) (actual
> time=204790.81..204790.82 rows=11 loops=1)
>     -> Index Scan using registrant__object__idx on object
> (cost=0.00..81733.63 rows=1032953 width=2017) (actual
> time=0.14..94509.14 rows=1032946 loops=1)
> Total runtime: 205125.75 msec

Without seeing the schema details it's difficult to suggest much. If it's
this particular query that's the problem, you might try a partial index:

CREATE INDEX foo_object_idx ON object (upper(object.name))
    WHERE active = 1 AND registrant = 't';

See CREATE INDEX in the manuals for details.

--
Richard Huxton
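A sketch of the partial-index approach Richard describes, with the matching query spelled out (index name is illustrative; the query's WHERE clause must imply the index's predicate for the planner to consider it):

```sql
-- Partial functional index: only rows satisfying the predicate are
-- indexed, and entries are ordered by upper(name).
CREATE INDEX foo_object_idx ON object (upper(name))
    WHERE active = 1 AND registrant = 't';

-- With the predicate matching the index's WHERE clause, this LIMIT query
-- can be answered by walking a small slice of the index instead of
-- sorting ~1M rows.
SELECT active, registrant, name
  FROM object
 WHERE active = 1 AND registrant = 't'
 ORDER BY upper(name) DESC
 LIMIT 10 OFFSET 0;
```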
Yikes, they are the same - a cut-and-paste error, sorry about that. No joins
involved: one table with 1 million records and about 255 columns, and only
about 10% of the rows contain data in this particular instance.

object is indexed on active, registrant, and name, as well as UPPER(name).
Postgres version is 7.2.3.

Here is the relevant table info (some schema details omitted for brevity):

id         | numeric(10,0)         | not null default nextval('seq_object'::text)
name       | character varying(64) |
registrant | boolean               |
active     | numeric(1,0)          | not null default 1

Indexes: registrant__object__idx, active__object__idx, name__object__idx,
upper_name__object__idx, id__object__idx
Primary key: pk_object__id

db=# select count(*) from object;
  count
---------
 1032953
(1 row)

db=# explain analyze select count(*) from object;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100073270.91..100073270.91 rows=1 width=0) (actual
time=3085.51..3085.51 rows=1 loops=1)
  ->  Seq Scan on object  (cost=100000000.00..100070688.53 rows=1032953
width=0) (actual time=0.01..2008.51 rows=1032953 loops=1)
Total runtime: 3085.62 msec
EXPLAIN

> On Saturday 07 Dec 2002 2:16 am, Fred Moyer wrote:
>> database=# explain analyze SELECT active,registrant,name FROM object
>> WHERE object.active = 1 AND object.registrant = 't' ORDER BY
>> UPPER(object.name) DESC LIMIT 10 OFFSET 0;
>> NOTICE: QUERY PLAN:
>
> What's the connection between "person" and "object"? Looks like an
> unconstrained join from here. Schema and count(*) for both and details
> of indexes would be useful.
>> Limit (cost=nan..nan rows=10 width=2017) (actual
>          ^^^^^^^^
> Never seen this "nan" before - presumably Not a Number, but I don't know
> why the planner generates it
>
>> time=204790.82..204790.84 rows=10 loops=1)
>>   -> Sort (cost=nan..nan rows=1032953 width=2017) (actual
>> time=204790.81..204790.82 rows=11 loops=1)
>>     -> Index Scan using registrant__object__idx on object
>> (cost=0.00..81733.63 rows=1032953 width=2017) (actual
>> time=0.14..94509.14 rows=1032946 loops=1)
>> Total runtime: 205125.75 msec
>
> Without seeing schema details difficult to suggest much. If it's this
> particular query that's the problem you might try a partial index
>
> CREATE INDEX foo_object_idx ON object (upper(object.name))
>     WHERE active = 1 AND registrant = 't';
>
> See CREATE INDEX in the manuals for details.
>
> --
> Richard Huxton

Fred Moyer
Digital Campaigns, Inc.
Fred Moyer wrote:
> I am trying to find a way to optimize this query and have hit a wall. The
> database size is 2.9 GB and contains 1 million records.

> Postgresql.conf settings
> shared_buffers = 250000

This looks awfully high to me. 25000 might be better, to give more room to
the OS disk cache - it's a bit of a waste if PostgreSQL and the OS end up
caching exactly the same blocks. Experimenting is the only way to find a
good setting.

> sort_mem = 1048576        # min 32
> vacuum_mem = 128000       # min 1024
> wal_files = 64            # range 0-64
> enable_seqscan = false

Why disable seqscan? For any query that is not particularly selective this
will mean a performance hit.

> enable_indexscan = true
> enable_tidscan = true
> enable_sort = true
> enable_nestloop = true
> enable_mergejoin = true
> enable_hashjoin = true

> database=# explain analyze SELECT active,registrant,name FROM person WHERE
> object.active = 1 AND object.registrant = 't' ORDER BY UPPER(object.name)
> DESC LIMIT 10 OFFSET 0;
> NOTICE: QUERY PLAN:
>
> Limit (cost=nan..nan rows=10 width=2017) (actual
> time=204790.82..204790.84 rows=10 loops=1)
>   -> Sort (cost=nan..nan rows=1032953 width=2017) (actual
> time=204790.81..204790.82 rows=11 loops=1)
>     -> Index Scan using registrant__object__idx on object
> (cost=0.00..81733.63 rows=1032953 width=2017) (actual
> time=0.14..94509.14 rows=1032946 loops=1)
> Total runtime: 205125.75 msec

I think this is an example of a not particularly selective query. If I read
it correctly, pretty much every row satisfies the predicates object.active
= 1 AND object.registrant = 't' (how many rows do not satisfy them?).

Jochem
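Jochem's point can be checked with quick arithmetic, assuming PostgreSQL's default 8 KB buffer page size and that sort_mem is expressed in kilobytes (as documented). The posted settings pin roughly 1.9 GiB in shared buffers - consistent with the 1.1G resident postmaster in the top output - and allow a full 1 GiB per sort:

```python
# Back-of-envelope check of the posted postgresql.conf settings.
PAGE_SIZE = 8192   # bytes per shared_buffers page (PostgreSQL default)
KB = 1024

shared_buffers = 250_000   # pages, as posted
sort_mem = 1_048_576       # kilobytes, per sort operation, as posted

shared_buffers_bytes = shared_buffers * PAGE_SIZE
sort_mem_bytes = sort_mem * KB

print(f"shared_buffers ~= {shared_buffers_bytes / 2**30:.1f} GiB")   # ~1.9 GiB
print(f"sort_mem       ~= {sort_mem_bytes / 2**30:.1f} GiB per sort") # 1.0 GiB
```

On a 4 GB box that leaves little room for the kernel's own page cache, which is exactly the double-caching waste described above.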
On Saturday 07 Dec 2002 8:10 pm, Fred Moyer wrote:
> Ikes, they are the same, a cut and paste error. Sorry about that. No
> joins involved, one table with 1 million records, about 255 rows, only
> about 10% of the rows contain data in this particular instance.
>
> object is indexed on active, registrant, and name as well as UPPER(name).
> Postgres version is 7.2.3

I think Jochem's got it with "enable_seqscan" - you've disabled sequential
scans, so the planner is checking one million index entries, which is a bad
idea. Try Jochem's suggestion of re-enabling seqscan and see if that helps
things along.

> db=# select count(*) from object;
>   count
> ---------
>  1032953

>> time=204790.82..204790.84 rows=10 loops=1)
>>   -> Sort (cost=nan..nan rows=1032953 width=2017) (actual
>> time=204790.81..204790.82 rows=11 loops=1)
>>     -> Index Scan using registrant__object__idx on object
>> (cost=0.00..81733.63 rows=1032953 width=2017) (actual
>> time=0.14..94509.14 rows=1032946 loops=1)
>> Total runtime: 205125.75 msec

--
Richard Huxton
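A sketch of the re-test Richard suggests - re-enable sequential scans for the session only (no restart needed) and compare the plan, using the query as posted in the thread:

```sql
-- Session-level override of the postgresql.conf setting.
SET enable_seqscan = true;

-- Re-run the problem query and see whether the planner now picks a
-- cheaper plan than walking ~1M index entries.
EXPLAIN ANALYZE
SELECT active, registrant, name
  FROM object
 WHERE object.active = 1
   AND object.registrant = 't'
 ORDER BY UPPER(object.name) DESC
 LIMIT 10 OFFSET 0;
```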