Thread: Query optimization

Query optimization

From
"Fred Moyer"
Date:
Greetings!

I am trying to find a way to optimize this query and have hit a wall.  The
database size is 2.9 GB and it contains 1 million records.  The system is a
dual 1 GHz Xeon (P3) with 4 GB of RAM, 2 GB of it shared memory.  Red Hat
Linux, kernel 2.4.18-5, ext3.

I'm hoping I haven't hit the limit of the hardware or the OS, but here's all
the relevant info.  Questions, comments, and solutions would be greatly
appreciated.

11696 postgres  25   0 1084M 1.1G  562M R    99.9 28.6   2:36 postmaster

Postgresql.conf settings
shared_buffers = 250000
sort_mem = 1048576            # min 32
vacuum_mem = 128000          # min 1024
wal_files = 64 # range 0-64
enable_seqscan = false
enable_indexscan = true
enable_tidscan = true
enable_sort = true
enable_nestloop = true
enable_mergejoin = true
enable_hashjoin = true

[postgres@db1 base]$ cat /proc/sys/kernel/shmmax
2192000000

database=# explain analyze SELECT active,registrant,name FROM person WHERE
object.active = 1 AND object.registrant = 't' ORDER BY UPPER(object.name)
DESC LIMIT 10 OFFSET 0;
NOTICE:  QUERY PLAN:

Limit  (cost=nan..nan rows=10 width=2017) (actual
time=204790.82..204790.84 rows=10 loops=1)
  ->  Sort  (cost=nan..nan rows=1032953 width=2017) (actual
time=204790.81..204790.82 rows=11 loops=1)
        ->  Index Scan using registrant__object__idx on object
(cost=0.00..81733.63 rows=1032953 width=2017) (actual
time=0.14..94509.14 rows=1032946 loops=1)
Total runtime: 205125.75 msec



Re: Query optimization

From
Richard Huxton
Date:
On Saturday 07 Dec 2002 2:16 am, Fred Moyer wrote:
>
> database=# explain analyze SELECT active,registrant,name FROM person WHERE
> object.active = 1 AND object.registrant = 't' ORDER BY UPPER(object.name)
> DESC LIMIT 10 OFFSET 0;
> NOTICE:  QUERY PLAN:

What's the connection between "person" and "object"? Looks like an
unconstrained join from here. Schema and count(*) for both and details of
indexes would be useful.

> Limit  (cost=nan..nan rows=10 width=2017) (actual
               ^^^^^^^^
Never seen this "nan" before - presumably Not A Number, but I don't know why
the planner generates it.

> time=204790.82..204790.84 rows=10 loops=1)
>   ->  Sort  (cost=nan..nan rows=1032953 width=2017) (actual
> time=204790.81..204790.82 rows=11 loops=1)
>         ->  Index Scan using registrant__object__idx on object
> (cost=0.00..81733.63 rows=1032953 width=2017) (actual
> time=0.14..94509.14 rows=1032946 loops=1)
> Total runtime: 205125.75 msec

Without seeing schema details it's difficult to suggest much. If it's this
particular query that's the problem, you might try a partial index:

CREATE INDEX foo_object_idx ON object (upper(object.name)) WHERE active=1 AND
registrant='t';

See CREATE INDEX in the manuals for details.
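As a sketch (using the table and column names from the query above, and the
illustrative index name from the suggestion; note the planner only considers
a partial index when the query repeats the index's WHERE predicates):

```sql
-- Illustrative only: partial functional index matching the query's filter.
CREATE INDEX foo_object_idx ON object (upper(name))
    WHERE active = 1 AND registrant = 't';

-- The query must repeat the same predicates so the planner can match
-- the partial index:
EXPLAIN ANALYZE
SELECT active, registrant, name
  FROM object
 WHERE active = 1 AND registrant = 't'
 ORDER BY upper(name) DESC
 LIMIT 10 OFFSET 0;
```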

--
  Richard Huxton

Re: Query optimization

From
"Fred Moyer"
Date:
Yikes, they are the same, a cut-and-paste error.  Sorry about that.  No
joins involved: one table with 1 million records and about 255 columns, and
only about 10% of the rows contain data in this particular instance.

object is indexed on active, registrant, and name as well as UPPER(name).
Postgres version is 7.2.3

Here is the relevant table info (some schema details omitted for brevity):

 id         | numeric(10,0)         | not null default nextval('seq_object'::text)
 name       | character varying(64) |
 registrant | boolean               |
 active     | numeric(1,0)          | not null default 1

Indexes: registrant__object__idx,
         active__object__idx,
         name__object__idx,
         upper_name__object__idx,
         id__object__idx
Primary key: pk_object__id

db=# select count(*) from object;
  count
---------
 1032953
(1 row)

db=# explain analyze select count(*) from object;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100073270.91..100073270.91 rows=1 width=0) (actual
time=3085.51..3085.51 rows=1 loops=1)
  ->  Seq Scan on object  (cost=100000000.00..100070688.53 rows=1032953
width=0) (actual time=0.01..2008.51 rows=1032953 loops=1)
Total runtime: 3085.62 msec

EXPLAIN

> On Saturday 07 Dec 2002 2:16 am, Fred Moyer wrote:
>>
>> database=# explain analyze SELECT active,registrant,name FROM object
>> WHERE object.active = 1 AND object.registrant = 't' ORDER BY
>> UPPER(object.name) DESC LIMIT 10 OFFSET 0;
>> NOTICE:  QUERY PLAN:
>
> What's the connection between "person" and "object"? Looks like an
> unconstrained join from here. Schema and count(*) for both and details
> of  indexes would be useful.
>
>> Limit  (cost=nan..nan rows=10 width=2017) (actual
>                ^^^^^^^^
> Never seen this "nan" before - presumably Not A Number, but I don't know
> why  the planner generates it
>
>> time=204790.82..204790.84 rows=10 loops=1)
>>   ->  Sort  (cost=nan..nan rows=1032953 width=2017) (actual
>> time=204790.81..204790.82 rows=11 loops=1)
>>         ->  Index Scan using registrant__object__idx on object
>> (cost=0.00..81733.63 rows=1032953 width=2017) (actual
>> time=0.14..94509.14 rows=1032946 loops=1)
>> Total runtime: 205125.75 msec
>
> Without seeing schema details difficult to suggest much. If it's this
> particular query that's the problem you might try a partial index
>
> CREATE INDEX foo_object_idx ON object (upper(object.name)) WHERE
> active=1 AND  registrant='t';
>
> See CREATE INDEX in the manuals for details.
>
> --
>   Richard Huxton
>


Fred Moyer
Digital Campaigns, Inc.



Re: Query optimization

From
Jochem van Dieten
Date:
Fred Moyer wrote:
>
> I am trying to find a way to optimize this query and have hit a wall.  The
> database size is 2.9 GB and contains 1 million records.

> Postgresql.conf settings
> shared_buffers = 250000

This looks awfully high to me. 25000 might be better, to give more room to
the OS disk cache; it's a bit of a waste if PostgreSQL and the OS end up
caching exactly the same blocks.
Experimenting is the only way to find a good setting.
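For instance, a more conservative sketch (the numbers are illustrative, not
tuned for this machine):

```
# postgresql.conf sketch - leave most RAM to the OS page cache
shared_buffers = 25000     # ~200 MB at 8 kB pages, instead of ~2 GB
sort_mem = 8192            # kB per sort; 1048576 means ~1 GB per sort
```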


> sort_mem = 1048576            # min 32
> vacuum_mem = 128000          # min 1024
> wal_files = 64 # range 0-64
> enable_seqscan = false

Why disable seqscan? For any query that is not particularly selective, this
will mean a performance hit.
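Re-enabling it can also be tried per session, without touching
postgresql.conf, e.g.:

```sql
SET enable_seqscan = true;   -- session-local; the config file is unchanged
EXPLAIN ANALYZE
SELECT active, registrant, name
  FROM object
 WHERE active = 1 AND registrant = 't'
 ORDER BY upper(name) DESC
 LIMIT 10 OFFSET 0;
```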


> enable_indexscan = true
> enable_tidscan = true
> enable_sort = true
> enable_nestloop = true
> enable_mergejoin = true
> enable_hashjoin = true

> database=# explain analyze SELECT active,registrant,name FROM person WHERE
> object.active = 1 AND object.registrant = 't' ORDER BY UPPER(object.name)
> DESC LIMIT 10 OFFSET 0;
> NOTICE:  QUERY PLAN:
>
> Limit  (cost=nan..nan rows=10 width=2017) (actual
> time=204790.82..204790.84 rows=10 loops=1)
>   ->  Sort  (cost=nan..nan rows=1032953 width=2017) (actual
> time=204790.81..204790.82 rows=11 loops=1)
>         ->  Index Scan using registrant__object__idx on object
> (cost=0.00..81733.63 rows=1032953 width=2017) (actual
> time=0.14..94509.14 rows=1032946 loops=1)
> Total runtime: 205125.75 msec

I think this is an example of a not particularly selective query. If I
read it correctly, pretty much every row satisfies the predicates
object.active = 1 AND object.registrant = 't' (how many rows do not satisfy
them?).
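One way to measure that directly (a sketch against the table from the
thread):

```sql
-- Row counts per (active, registrant) combination show how selective
-- the predicates really are.
SELECT active, registrant, count(*)
  FROM object
 GROUP BY active, registrant;
```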

Jochem


Re: Query optimization

From
Richard Huxton
Date:
On Saturday 07 Dec 2002 8:10 pm, Fred Moyer wrote:
> Yikes, they are the same, a cut-and-paste error.  Sorry about that.  No
> joins involved: one table with 1 million records and about 255 columns,
> and only about 10% of the rows contain data in this particular instance.
>
> object is indexed on active, registrant, and name as well as UPPER(name).
> Postgres version is 7.2.3

I think Jochem's got it with "enable_seqscan" - you've disabled sequential
scans, so the planner is walking one million index entries, which is a bad
idea. Try Jochem's suggestion of re-enabling seqscan and see if that helps
things along.

> db=# select count(*) from object;
>   count
> ---------
>  1032953

> >> time=204790.82..204790.84 rows=10 loops=1)
> >>   ->  Sort  (cost=nan..nan rows=1032953 width=2017) (actual
> >> time=204790.81..204790.82 rows=11 loops=1)
> >>         ->  Index Scan using registrant__object__idx on object
> >> (cost=0.00..81733.63 rows=1032953 width=2017) (actual
> >> time=0.14..94509.14 rows=1032946 loops=1)
> >> Total runtime: 205125.75 msec

--
  Richard Huxton