Re: select off of a view going slowly - Mailing list pgsql-general

From Tom Lane
Subject Re: select off of a view going slowly
Date
Msg-id 26728.989289295@sss.pgh.pa.us
Whole thread Raw
In response to Re: select off of a view going slowly  (Fran Fabrizio <ffabrizio@exchange.webmd.net>)
List pgsql-general
Fran Fabrizio <ffabrizio@exchange.webmd.net> writes:
> Does a view run its query every time I select from the view?

Yes.  That's sort of the point.

> View definition: SELECT count(*) AS count, log.site_id, log.host_id FROM log
> WHERE (((log.status = 'CRIT'::"varchar") OR (log.status = 'EMERG'::"varchar"))
> AND (log.tstamp > (now() - '1 day'::"interval"))) GROUP BY log.site_id,
> log.host_id;

> monitoring=# \d log
>                     Table "log"
>   Attribute   |           Type           | Modifier
> --------------+--------------------------+----------
>  site_id      | bigint                   |
>  host_id      | bigint                   |
>  fqdn         | character varying        | not null
>  site         | character varying        | not null
>  region       | character varying        | not null
>  hostname     | character varying        | not null
>  product      | character varying        | not null
>  class        | character varying        | not null
>  subclass     | character varying        | not null
>  status       | character varying        | not null
>  msg          | character varying        | not null
>  remote_stamp | timestamp with time zone | not null
>  tstamp       | timestamp with time zone | not null
> Indices: log_hostid_index,
>          log_siteid_hostid_index,
>          log_siteid_index,
>          log_status_index,
>          log_tstamp_index

> monitoring=# explain select * from error_log where site_id=34 and host_id=88;

> NOTICE:  QUERY PLAN:

> Subquery Scan error_log  (cost=33145.20..33145.21 rows=1 width=16)
>   ->  Aggregate  (cost=33145.20..33145.21 rows=1 width=16)
>         ->  Group  (cost=33145.20..33145.21 rows=1 width=16)
>               ->  Sort  (cost=33145.20..33145.20 rows=1 width=16)
>                     ->  Seq Scan on log  (cost=0.00..33145.19 rows=1
> width=16)

You need to get rid of that bottom seqscan.  Unfortunately, you're
running into a couple of deficiencies that prevent Postgres from
choosing either of the reasonable indexes (siteid_hostid or tstamp).
The one that's easiest to get around is siteid/hostid: try

explain select * from error_log where site_id=34::int8 and host_id=88::int8;

As you've written it, the WHERE clause contains int8 vs. int4
comparisons, which the planner is not smart about turning into
operations that have something to do with an int8 index.  So
you need to make the comparisons look like int8 vs. int8.
(Alternatively: do the siteid and hostid really need to be bigint?
Could you get by with int4, at least for a release or two more?)

The other possibility is to use the tstamp index with the
log.tstamp > (now() - '1 day'::"interval") clause.  But this is not
considered indexable because "now() - '1 day'::interval" is not known
to reduce to a constant.  It's possible to hack up your own function
to allow this to be treated as a constant, see previous discussions eg
http://www.geocrawler.com/archives/3/10/2000/9/0/4324734/

We have plans to address both of these issues in future releases...

            regards, tom lane

pgsql-general by date:

Previous
From: Vince Vielhaber
Date:
Subject: Re: New mirrors on web page
Next
From: Tom Lane
Date:
Subject: Re: poor performance on 7.1 vs 7.0