Re: inconsistent/weird index usage - Mailing list pgsql-performance

From John Meinel
Subject Re: inconsistent/weird index usage
Date
Msg-id 415D614D.7010001@johnmeinel.com
Whole thread Raw
In response to inconsistent/weird index usage  (Dustin Sallings <dustin@spy.net>)
List pgsql-performance
Dustin Sallings wrote:
>

[...]

>     OK, now that that's over with, I have this one particular query that
> I attempt to run for a report from my phone that no longer works because
> it tries to do a table scan on *some* of the tables.  Why it chooses
> this table scan, I can't imagine.  The query is as follows:
>
> select
>         s.serial as serial_num,
>         s.name as name,
>         date(ts) as day,
>         min(sample) as min_temp,
>         avg(sample) as avg_temp,
>         stddev(sample) as stddev_temp,
>         max(sample) as max_temp
>     from
>         samples inner join sensors s using (sensor_id)
>     where
>         ts > current_date - 7
>     group by
>         serial_num, name, day
>     order by
>         serial_num, day desc
>
>

[ next section heavily clipped for clarity ]

->  Seq Scan on samples_1999  (cost rows=103677) (actual rows=0 loops=1)

->  Index Scan using samples_2000_bytime on samples_2000  (cost
rows=714082 (actual rows=0 loops=1)


->  Seq Scan on samples_2001  (cost rows=902191) (actual rows=0 loops=1)

->  Seq Scan on samples_2002  (cost rows=1037201) (actual rows=0 loops=1)

->  Index Scan using samples_2003_bytime on samples_2003  (cost
rows=1049772) (actual rows=0 loops=1)

->  Index Scan using samples_2004_bytime on samples_2004  (cost
rows=791991) (actual rows=66389 loops=1)

[...]
>
>
>     Essentially, what you can see here is that it's doing an index scan
> on samples_2000, samples_2003, and samples_2004, but a sequential scan
> on samples_1999, samples_2001, and samples_2002.  It's very strange to
> me that it would make these choices.  If I disable sequential scans
> altogether for this session, the query runs in under 4 seconds.
>
>     This is a very cool solution for long-term storage, and isn't
> terribly hard to manage.  I actually have other report queries that seem
> to be making pretty good index selection currently...but I always want
> more!  :)  Does anyone have any suggestions as to how to get this to do
> what I want?
>
>     Of course, ideally, it would ignore five of the tables altogether.  :)
>
> --
> SPY                      My girlfriend asked me which one I like better.
> pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
> |    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
> L_______________________ I hope the answer won't upset her. ____________
>
>

Just as a heads up. You have run vacuum analyze before running this
query, correct?

Because you'll notice that the query planner is thinking that it will
have 103677 rows from 1999, 700,000 rows from 2000, 900,000 rows from
2001, etc, etc. Obviously the query planner is not planning well
considering it there are only 60,000 rows from 2004, and no rows from
anything else.

It just seems like it hasn't updated it's statistics to be aware of when
the time is on most of the tables.

(By the way, an indexed scan returning 0 entries is *really* fast, so I
wouldn't worry about ignoring the extra tables. :)

I suppose the other question is whether this is a prepared or stored
query. Because sometimes the query planner cannot do enough optimization
in a stored query. (I ran into this problem where I had 1 column with
500,000+ entries referencing 1 number. If I ran manually, the time was
much better because I wasn't using *that* number. With a stored query,
it had to take into account that I *might* use that number, and didn't
want to do 500,000+ indexed lookups)

The only other thing I can think of is that there might be some
collision between datetime and date. Like it is thinking it is looking
at the time of day when it plans the queries (hence why so many rows),
but really it is looking at the date. Perhaps a cast is in order to make
it work right. I don't really know.

Interesting problem, though.
John
=:->


Attachment

pgsql-performance by date:

Previous
From: Dustin Sallings
Date:
Subject: inconsistent/weird index usage
Next
From: Tom Lane
Date:
Subject: Re: inconsistent/weird index usage