Re: Operator performance question - Mailing list pgsql-general

From Brandon Aiken
Subject Re: Operator performance question
Date
Msg-id F8E84F0F56445B4CB39E019EF67DACBA4020FF@exchsrvr.winemantech.com
Whole thread Raw
In response to Operator performance question  (Alban Hertroys <alban@magproductions.nl>)
Responses Re: Operator performance question  (Alban Hertroys <alban@magproductions.nl>)
List pgsql-general
Shouldn't these be using HAVING?

SELECT COUNT(max_persons) ...
GROUP BY NULL
HAVING max_persons >= 5 AND max_persons <= 8;

--
Brandon Aiken
CS/IT Systems Engineer

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Alban Hertroys
Sent: Tuesday, January 09, 2007 11:07 AM
To: Postgres General
Subject: [GENERAL] Operator performance question

Hi all,

I need your help on a small performance problem.

I have a table of which I have to do a bunch of counts of various
conditions. The worst case scenario where I have to iterate over every
record in the table performs just a little bit too slow (800ms). That
particular query will be hit a lot (it will be on the index of our web
app).

PostgreSQL uses a sequential scan (it should IMO) - I think my
bottleneck is in the operators on the various columns.

My queries look like this:

SELECT COUNT(NULLIF(max_persons BETWEEN 5 AND 8, false)) AS "persons
5-8",
-- And other variations

COUNT(NULLIF(country_id = 74, false)) AS "LOCATION_NETHERLANDS",
-- Basically for every country in Europe

COUNT(NULLIF(specifications & '00000000000000000000000001000000',
0::bit(32))) AS "washing machine",
-- And a bunch more of these; the bit mask is almost fully covered

COUNT(*) AS all
FROM table;

The plan is:
                                                            QUERY PLAN

------------------------------------------------------------------------
-----------------------------------------------------------
 Aggregate  (cost=7371.23..7371.55 rows=1 width=18) (actual
time=803.374..803.376 rows=1 loops=1)
   ->  Seq Scan on fewo_property_location  (cost=0.00..828.84 rows=41538
width=18) (actual time=0.036..147.211 rows=41492 loops=1)
         Filter: ((location_id IS NOT NULL) AND (property_state_id = 3))
 Total runtime: 804.398 ms
(4 rows)

The table definition is like:
      Column       |   Type   |      Modifiers
-------------------+----------+----------------------
 property_id       | integer  | not null
 property_state_id | integer  | not null
 location_id       | integer  |
 min_persons       | smallint | not null
 max_persons       | smallint | not null
 specifications    | bit(32)  | default (0)::bit(32)
 country_id        | integer  |
Indexes:
    "fewo_property_location_pkey" PRIMARY KEY, btree (property_id)
    "fewo_property_location_country_idx" btree (country_id) WHERE
location_id IS NOT NULL
    "fewo_property_location_country_location_idx" btree (country_id,
location_id) CLUSTER
    "fewo_property_location_location_online_idx" btree (location_id)
WHERE location_id IS NOT NULL AND property_state_id = 3
    "fewo_property_location_property_location_idx" btree (property_id,
location_id) WHERE location_id IS NOT NULL AND property_state_id = 3
    "fewo_property_location_specifications_idx" btree (specifications)
Foreign-key constraints:
    "fewo_property_location_location_id_fkey" FOREIGN KEY (location_id)
REFERENCES fewo_location(location_id) MATCH FULL
    "fewo_property_location_property_state_id_fkey" FOREIGN KEY
(property_state_id) REFERENCES fewo_property_state(property_state_id)
MATCH FULL

My conclusion is that this query time is mostly limited to the somewhat
complex COUNT expressions. Is there any way to do this more efficiently?

For the record, if I constrain this query to specific countries it
performs in about 80ms (10x as fast).

The hardware is a dual Opteron64x2, 4G RAM and some kind of RAID setup
(software, don't know what type) running in a Xen host - it's our
development DB-server.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

pgsql-general by date:

Previous
From: "Jeanna Geier"
Date:
Subject: Re: SELECT INTO using Views?
Next
From: Tom Lane
Date:
Subject: Re: Operator performance question