Thread: query plan weirdness?

query plan weirdness?

From
Joel McGraw
Date:
Can someone explain what I'm missing here?   This query does what I
expect--it uses the "foo" index on the openeddatetime, callstatus,
calltype, callkey fields:

elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by openeddatetime desc, callstatus desc, calltype
desc, callkey desc limit 26;

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
-----------------------------
 Limit  (cost=0.00..103.76 rows=26 width=297) (actual time=0.07..0.58
rows=26 loops=1)
   ->  Index Scan Backward using foo on call  (cost=0.00..1882805.77
rows=471781 width=297) (actual time=0.06..0.54 rows=27 loops=1)
         Index Cond: ((openeddatetime >= '2000-01-01
00:00:00-07'::timestamp with time zone) AND (openeddatetime <=
'2004-06-24 23:59:59.999-07'::timestamp with time zone))
         Filter: (aspid = '123C'::bpchar)
 Total runtime: 0.66 msec
(5 rows)


However, this query performs a sequential scan on the table, ignoring the
call_idx13 index (the only difference is the addition of the aspid field
in the order by clause):

elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
calltype desc, callkey desc limit 26;

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------
 Limit  (cost=349379.41..349379.48 rows=26 width=297) (actual
time=32943.52..32943.61 rows=26 loops=1)
   ->  Sort  (cost=349379.41..350558.87 rows=471781 width=297) (actual
time=32943.52..32943.56 rows=27 loops=1)
         Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
         ->  Seq Scan on call  (cost=0.00..31019.36 rows=471781
width=297) (actual time=1.81..7318.13 rows=461973 loops=1)
               Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >=
'2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime
<= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
 Total runtime: 39353.86 msec
(6 rows)


Here's the structure of the table in question:


                   Table "public.call"
      Column      |           Type           | Modifiers
------------------+--------------------------+-----------
 aspid            | character(4)             |
 lastmodifiedtime | timestamp with time zone |
 moduser          | character(13)            |
 callkey          | character(13)            |
 calltype         | text                     |
 callqueueid      | text                     |
 openeddatetime   | timestamp with time zone |
 assigneddatetime | timestamp with time zone |
 closeddatetime   | timestamp with time zone |
 reopeneddatetime | timestamp with time zone |
 openedby         | text                     |
 callstatus       | character(1)             |
 callpriority     | text                     |
 callreasontext   | text                     |
 keyword1         | text                     |
 keyword2         | text                     |
 callername       | text                     |
 custfirstname    | text                     |
 custlastname     | text                     |
 custssntin       | character(9)             |
 custssnseq       | text                     |
 custdbccode      | character(9)             |
 custlongname     | text                     |
 custtypecode     | character(2)             |
 custphone        | text                     |
 custid           | character(9)             |
 assigneduserid   | character varying(30)    |
 historyitemcount | integer                  |
 callertype       | text                     |
 callerphoneext   | text                     |
 followupdate     | text                     |
 hpjobnumber      | character(11)            |
Indexes: call_idx1 unique btree (aspid, callkey),
         call_aspid btree (aspid),
         call_aspid_opendedatetime btree (aspid, openeddatetime),
         call_idx10 btree (aspid, keyword1, openeddatetime, callstatus,
                           calltype, custtypecode, custid, callkey),
         call_idx11 btree (aspid, keyword2, openeddatetime, callstatus,
                           calltype, custtypecode, custid, callkey),
         call_idx12 btree (aspid, custtypecode, custid, openeddatetime,
                           callstatus, calltype, callkey),
         call_idx13 btree (aspid, openeddatetime, callstatus, calltype,
                           callkey),
         call_idx14 btree (aspid, callqueueid, callstatus, callkey),
         call_idx2 btree (aspid, callqueueid, openeddatetime,
                          custtypecode, callstatus, callkey),
         call_idx3 btree (aspid, assigneduserid, openeddatetime,
                          custtypecode, callstatus, callkey),
         call_idx4 btree (aspid, custid, custtypecode, callkey,
                          callstatus),
         call_idx7 btree (aspid, calltype, custtypecode, custid,
                          callstatus, callkey),
         call_idx9 btree (aspid, assigneduserid, callstatus,
                          followupdate),
         foo btree (openeddatetime, callstatus, calltype, callkey)




TIA,

-Joel

-- CONFIDENTIALITY NOTICE --

This message is intended for the sole use of the individual and entity to whom it is addressed, and may contain
information that is privileged, confidential and exempt from disclosure under applicable law. If you are not the
intended addressee, nor authorized to receive for the intended addressee, you are hereby notified that you may not use,
copy, disclose or distribute to anyone the message or any information contained in the message. If you have received
this message in error, please immediately advise the sender by reply email, and delete the message. Thank you.

Re: query plan weirdness?

From
Joel McGraw
Date:
Well, you're kind of right.  I removed the limit, and now _both_
versions of the query perform a sequential scan!

Oh, I forgot to include in my original post: this is PostgreSQL 7.3.4
(on x86 Linux and sparc Solaris 6)

-Joel

-----Original Message-----
From: Guido Barosio [mailto:gbarosio@sinectis.com.ar]
Sent: Wednesday, July 07, 2004 2:46 PM
To: Joel McGraw
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] query plan weirdness?

The limit is tricking you.
I guess a sequential scan is cheaper than an index scan with the limit
26 found there.

Am I wrong?

Greets

--
-------------------------------------------
Guido Barosio
Buenos Aires, Argentina
-------------------------------------------


Re: query plan weirdness?

From
Stephan Szabo
Date:
On Wed, 7 Jul 2004, Joel McGraw wrote:

> However, this query performs a sequential scan on the table, ignoring the
> call_idx13 index (the only difference is the addition of the aspid field
> in the order by clause):
>
> elon2=# explain analyse select * from call where aspid='123C' and
> OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
> 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
> calltype desc, callkey desc limit 26;
>
> QUERY PLAN
>
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> ------------------------------------------------------------
>  Limit  (cost=349379.41..349379.48 rows=26 width=297) (actual
> time=32943.52..32943.61 rows=26 loops=1)
>    ->  Sort  (cost=349379.41..350558.87 rows=471781 width=297) (actual
> time=32943.52..32943.56 rows=27 loops=1)
>          Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
>          ->  Seq Scan on call  (cost=0.00..31019.36 rows=471781
> width=297) (actual time=1.81..7318.13 rows=461973 loops=1)
>                Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >=
> '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime
> <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
>  Total runtime: 39353.86 msec
> (6 rows)


Hmm, what does it say after a set enable_seqscan=off?

Also, what does it say if you use aspid desc rather than just aspid in the
order by?
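For reference, the two experiments suggested above would look something
like this (a sketch using the table and columns from the original post;
untested here):

```sql
-- 1) Discourage sequential scans for this session only, then re-run:
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM call
WHERE aspid = '123C'
  AND openeddatetime BETWEEN '2000-01-01 00:00:00.0'
                         AND '2004-06-24 23:59:59.999'
ORDER BY aspid, openeddatetime DESC, callstatus DESC,
         calltype DESC, callkey DESC
LIMIT 26;

-- 2) Make every sort key descending, so one backward index scan
--    could satisfy the whole ORDER BY:
EXPLAIN ANALYZE
SELECT * FROM call
WHERE aspid = '123C'
  AND openeddatetime BETWEEN '2000-01-01 00:00:00.0'
                         AND '2004-06-24 23:59:59.999'
ORDER BY aspid DESC, openeddatetime DESC, callstatus DESC,
         calltype DESC, callkey DESC
LIMIT 26;
```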

Re: query plan weirdness?

From
Guido Barosio
Date:
The limit is tricking you.
I guess a sequential scan is cheaper than an index scan with the limit 26 found there.

Am I wrong?

Greets

--
-------------------------------------------
Guido Barosio
Buenos Aires, Argentina
-------------------------------------------


Re: query plan weirdness?

From
Rod Taylor
Date:
> However, this query performs a sequential scan on the table, ignoring the
> call_idx13 index (the only difference is the addition of the aspid field
> in the order by clause):

You do not have an index which matches the ORDER BY, so PostgreSQL
cannot simply scan the index for the data you want. Thus it needs to
find all matching rows, order them, etc.

> 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
> calltype desc, callkey desc limit 26;

aspid ASC, openeddatetime DESC, callstatus DESC, calltype DESC

>          call_idx13 btree (aspid, openeddatetime, callstatus, calltype,
> callkey),

This index is: aspid ASC, openeddatetime ASC, callstatus ASC, calltype
ASC, callkey ASC

A reverse scan would, of course, be DESC, DESC, DESC, DESC, DESC --
neither of which matches your requested ORDER BY, and thus cannot help
reduce the lines looked at to 26.

This leaves your WHERE clause to restrict the dataset and it doesn't do
a very good job of it. There are more than 450000 rows matching the
where clause, which means the sequential scan was probably the right
choice (unless you have over 10 million entries in the table).


Since your WHERE clause contains a single aspid, an improvement to the
PostgreSQL optimizer may be to ignore that field in the ORDER BY as
order is no longer important since there is only one possible value. If
it did ignore aspid, it would use a plan similar to the first one you
provided.

You can accomplish the same thing by leaving out aspid ASC or by setting
it to aspid DESC in the ORDER BY. Leaving it out entirely will be
slightly faster, but DESC will cause PostgreSQL to use index
"call_idx13".
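As a sketch (untested; based on the query from the original post),
either rewrite below should let a backward scan of call_idx13 satisfy
the ORDER BY directly and stop after 26 rows:

```sql
-- Option 1: drop the constant column from the ORDER BY entirely.
SELECT * FROM call
WHERE aspid = '123C'
  AND openeddatetime BETWEEN '2000-01-01 00:00:00.0'
                         AND '2004-06-24 23:59:59.999'
ORDER BY openeddatetime DESC, callstatus DESC, calltype DESC, callkey DESC
LIMIT 26;

-- Option 2: flip aspid to DESC so all five keys match a backward scan of
-- call_idx13 (aspid, openeddatetime, callstatus, calltype, callkey).
SELECT * FROM call
WHERE aspid = '123C'
  AND openeddatetime BETWEEN '2000-01-01 00:00:00.0'
                         AND '2004-06-24 23:59:59.999'
ORDER BY aspid DESC, openeddatetime DESC, callstatus DESC,
         calltype DESC, callkey DESC
LIMIT 26;
```

Since aspid is pinned to a single value by the WHERE clause, both
orderings return rows in the same sequence.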



vacuum_mem

From
Litao Wu
Date:
Hi,

I tested the vacuum_mem setting on a
4-CPU, 4GB-RAM machine. I am the only person
on that machine.

The table:
         tablename         | size_kb |  reltuples
---------------------------+---------+-------------
 big_t                     | 2048392 | 7.51515e+06

Case 1:
1. vacuum full big_t;
2. begin;
   update big_t set email = lpad('a', 255, 'b');
   rollback;
3. set vacuum_mem=655360; -- 640M
4. vacuum big_t;
It takes 1415,375 ms
Also from top, the max SIZE is 615M while
SHARE is always 566M

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM
  TIME COMMAND
5914 postgres  16   0   615M 615M  566M D     7.5 15.8
 21:21 postgres: postgres mydb xxx.xxx.xxx.xxx:34361
VACUUM

Case 2:
1. vacuum full big_t;
2. begin;
   update big_t set email = lpad('a', 255, 'b');
   rollback;
3. set vacuum_mem=65536; -- 64M
4. vacuum big_t;
It takes 1297,798 ms
Also from top, the max SIZE is 615M while
SHARE is always 566M

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM
  TIME COMMAND
 3613 postgres  15   0  615M 615M  566M D    17.1 15.8
  9:04 postgres: postgres mydb xxx.xxx.xxx.xxx:34365
VACUUM

It seems vacuum_mem does not have performance
effect at all.

In reality, we vacuum nightly, and I want to find out
which vacuum_mem value is
best to shorten vacuum time.

Any thoughts?

Thanks,




Re: vacuum_mem

From
Rod Taylor
Date:
> It seems vacuum_mem does not have performance
> effect at all.

Wrong conclusion. It implies that your test case takes less than 64M of
memory to track your removed tuples. I think it takes 8 bytes to track a
tuple for vacuuming an index, which means 64M should be able to track
roughly 8 million deletions. Since your demonstration had about 7.5
million rows for removal, it's under the limit.

Try your test again with 32MB; it should make a single sequential pass
on the table, and 2 passes on each index for that table.

Either that, or do a few more aborted updates.
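The back-of-the-envelope arithmetic above can be sketched as follows
(assuming vacuum_mem is measured in kilobytes and roughly 8 bytes are
needed per dead tuple -- both figures are working assumptions from this
thread, not documented constants):

```python
# Estimate how many dead tuples a given vacuum_mem setting can track.
# Assumptions (from the discussion above, not authoritative):
#   - vacuum_mem is expressed in kilobytes
#   - each dead tuple costs about 8 bytes to remember
def dead_tuple_capacity(vacuum_mem_kb, bytes_per_tuple=8):
    return vacuum_mem_kb * 1024 // bytes_per_tuple

# The test table held ~7.5 million tuples, all updated then rolled back,
# so roughly that many dead tuples needed tracking.
dead_tuples = 7_515_150

print(dead_tuple_capacity(65536))    # capacity at the 64MB setting
print(dead_tuple_capacity(655360))   # capacity at the 640MB setting
print(dead_tuple_capacity(65536) >= dead_tuples)
```

Both tested settings can hold the whole dead-tuple list in one pass,
which is why they timed out the same; a 32MB setting would not.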



Re: query plan weirdness?

From
Joel McGraw
Date:
>
> > However, this query performs a sequential scan on the table, ignoring
> > the call_idx13 index (the only difference is the addition of the aspid
> > field in the order by clause):
>
> You do not have an index which matches the ORDER BY, so PostgreSQL
> cannot simply scan the index for the data you want. Thus it needs to
> find all matching rows, order them, etc.
>
> > 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
> > calltype desc, callkey desc limit 26;
>
> aspid ASC, openeddatetime DESC, callstatus DESC, calltype DESC
>
> >          call_idx13 btree (aspid, openeddatetime, callstatus,
> > calltype, callkey),
>
> This index is: aspid ASC, openeddatetime ASC, callstatus ASC, calltype
> ASC, callkey ASC
>

OK, that makes sense; however, this doesn't:

elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by aspid asc, openeddatetime asc, callstatus asc,
calltype asc, callkey asc;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------
 Sort  (cost=342903.52..344071.99 rows=467384 width=295) (actual
time=33159.38..33897.22 rows=461973 loops=1)
   Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
   ->  Seq Scan on call  (cost=0.00..31019.36 rows=467384 width=295)
(actual time=1.80..7373.75 rows=461973 loops=1)
         Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >=
'2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime
<= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
 Total runtime: 38043.03 msec
(5 rows)


I've modified the "order by" to reflect the call_idx13 index, yet the
query still causes a sequential scan of the table.



> A reverse scan would, of course, be DESC, DESC, DESC, DESC, DESC --
> neither of which matches your requested ORDER BY, and thus cannot help
> reduce the lines looked at to 26.


To clarify, the query that the programmer wants is:

select * from call where aspid='123C' and OpenedDateTime between
'2000-01-01 00:00:00.0' and '2004-06-24 23:59:59.999' order by aspid,
openeddatetime desc, callstatus desc, calltype desc, callkey desc;

We had started playing with placing limits on the query to address
another, unrelated problem.

However, out of curiosity I did some testing with varying limits to see
at which point the planner decided to do a sequential scan instead of
using the index.



elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by aspid, openeddatetime , callstatus , calltype ,
callkey limit 92785;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------------------------------
 Limit  (cost=0.00..343130.36 rows=92785 width=295) (actual
time=0.17..1835.55 rows=92785 loops=1)
   ->  Index Scan using call_idx13 on call  (cost=0.00..1728444.76
rows=467384 width=295) (actual time=0.17..1699.56 rows=92786 loops=1)
         Index Cond: ((aspid = '123C'::bpchar) AND (openeddatetime >=
'2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime
<= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
 Total runtime: 1901.43 msec
(4 rows)



elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by aspid, openeddatetime , callstatus , calltype ,
callkey limit 92786;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------------------------------
 Limit  (cost=0.00..343134.06 rows=92786 width=295) (actual
time=0.17..1834.09 rows=92786 loops=1)
   ->  Index Scan using call_idx13 on call  (cost=0.00..1728444.76
rows=467384 width=295) (actual time=0.17..1698.16 rows=92787 loops=1)
         Index Cond: ((aspid = '123C'::bpchar) AND (openeddatetime >=
'2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime
<= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
 Total runtime: 1899.97 msec
(4 rows)


elon2=# select count(*) from call;
 count
--------
 507392
(1 row)


>
> This leaves your WHERE clause to restrict the dataset and it doesn't
> do a very good job of it. There are more than 450000 rows matching the
> where clause, which means the sequential scan was probably the right
> choice (unless you have over 10 million entries in the table).
>
>
> Since your WHERE clause contains a single aspid, an improvement to the
> PostgreSQL optimizer may be to ignore that field in the ORDER BY as
> order is no longer important since there is only one possible value.
> If it did ignore aspid, it would use a plan similar to the first one
> you provided.
>
> You can accomplish the same thing by leaving out aspid ASC or by
> setting it to aspid DESC in the ORDER BY. Leaving it out entirely will
> be slightly faster, but DESC will cause PostgreSQL to use index
> "call_idx13".
>
>

Again, that makes sense to me, but if I remove aspid from the query it
still ignores the index....


elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by openeddatetime desc, callstatus desc, calltype
desc, callkey desc;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------
 Sort  (cost=342903.52..344071.99 rows=467384 width=295) (actual
time=17598.31..18304.26 rows=461973 loops=1)
   Sort Key: openeddatetime, callstatus, calltype, callkey
   ->  Seq Scan on call  (cost=0.00..31019.36 rows=467384 width=295)
(actual time=1.78..7337.85 rows=461973 loops=1)
         Filter: ((aspid = '123C'::bpchar) AND (openeddatetime >=
'2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime
<= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
 Total runtime: 21665.43 msec
(5 rows)


Setting enable_seqscan=off still doesn't cause the desired index to be
selected:

elon2=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by aspid desc, openeddatetime desc, callstatus desc,
calltype desc, callkey desc;

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
-------------------------
 Sort  (cost=355314.41..356482.87 rows=467384 width=295) (actual
time=33382.92..34088.10 rows=461973 loops=1)
   Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
   ->  Index Scan using call_aspid on call  (cost=0.00..43430.25
rows=467384 width=295) (actual time=0.24..7915.21 rows=461973 loops=1)
         Index Cond: (aspid = '123C'::bpchar)
         Filter: ((openeddatetime >= '2000-01-01 00:00:00-07'::timestamp
with time zone) AND (openeddatetime <= '2004-06-24
23:59:59.999-07'::timestamp with time zone))
 Total runtime: 39196.39 msec




Thanks for your help (and sorry for the long post),

-Joel


Re: query plan weirdness?

From
Rod Taylor
Date:
> OK, that makes sense; however, this doesn't:
>
> elon2=# explain analyse select * from call where aspid='123C' and
> OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
> 23:59:59.999' order by aspid asc, openeddatetime asc, callstatus asc,
> calltype asc, callkey asc;

> I've modified the "order by" to reflect the call_idx13 index, yet the
> query still causes a sequential scan of the table.

The query shown above does not have a limit, whereas the original one had
LIMIT 26. PostgreSQL has determined that pulling out all the table rows,
and sorting them in CPU is cheaper than pulling out all index rows, then
randomly pulling out all table rows.

Normally, that would be well on the mark: you can sort a large number of
tuples for the cost of a single random disk seek. But that is not true
in your case.

Considering you're pulling out 450k rows in 8 seconds, I'd also guess
the data is mostly in memory. Is that normal? Or is this a result of
having run several test queries against the same data multiple times?

If it's normal, bump your effective_cache_size parameter higher to move
the sort vs. scan threshold.
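For example (a sketch -- the parameter is spelled effective_cache_size
and is measured in 8KB disk pages, so the value below assumes roughly
1GB of OS file cache on your box):

```sql
-- ~1GB of OS cache expressed in 8KB pages: 1GB / 8KB = 131072
SET effective_cache_size = 131072;
-- or persistently, in postgresql.conf:
--   effective_cache_size = 131072
```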

> Again, that makes sense to me, but if I remove aspid from the query it
> still ignores the index....

You've changed 2 variables. You removed the aspid AND removed the LIMIT.
Add back the limit of 26 like you originally showed, and it'll do what I
described.

> Setting enable_seqscan=off still doesn't cause the desired index to be
> selected:
>
> elon2=# explain analyse select * from call where aspid='123C' and
> OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
> 23:59:59.999' order by aspid desc, openeddatetime desc, callstatus desc,
> calltype desc, callkey desc;
>
> QUERY PLAN
>
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> -------------------------
>  Sort  (cost=355314.41..356482.87 rows=467384 width=295) (actual
> time=33382.92..34088.10 rows=461973 loops=1)
>    Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
>    ->  Index Scan using call_aspid on call  (cost=0.00..43430.25
> rows=467384 width=295) (actual time=0.24..7915.21 rows=461973 loops=1)
>          Index Cond: (aspid = '123C'::bpchar)
>          Filter: ((openeddatetime >= '2000-01-01 00:00:00-07'::timestamp
> with time zone) AND (openeddatetime <= '2004-06-24
> 23:59:59.999-07'::timestamp with time zone))
>  Total runtime: 39196.39 msec

I'm a little surprised at this. It should have done a reverse index scan
and skipped the sort step. In fact, with a very simple select, I get
this:

rbt=# \d t
                  Table "public.t"
 Column |              Type              | Modifiers
--------+--------------------------------+-----------
 col1   | bpchar                         |
 col2   | timestamp(0) without time zone |
 col3   | integer                        |
 col4   | integer                        |
 col5   | integer                        |
Indexes:
    "t_idx" btree (col1, col2, col3, col4, col5)

rbt=# set enable_seqscan = false;
SET
rbt=# explain analyze select * from t order by col1 desc, col2 desc,
col3 desc, col4 desc, col5 desc;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan Backward using t_idx on t  (cost=0.00..6.20 rows=18
width=52) (actual time=0.046..0.219 rows=18 loops=1)
 Total runtime: 1.813 ms
(2 rows)

Any chance you could put together a test case demonstrating the above
behaviour? Everything from CREATE TABLE, through dataload to the EXPLAIN
ANALYZE.


Re: query plan weirdness?

From
Joel McGraw
Date:
>
> Considering you're pulling out 450k rows in 8 seconds, I'd also guess
> the data is mostly in memory. Is that normal? Or is this a result of
> having run several test queries against the same data multiple times?
>

Ah yes, that would have been the result of running the query several
times...


Oddly enough, I put the same database on a different machine, and the
query now behaves as I hoped all along.  Notice that I'm using the
"real" query, with the aspid in asc and the other fields in desc order,
yet the query does use the call_idx13 index:


csitech=# explain analyse select * from call where aspid='123C' and
OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24
23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
calltype desc, callkey desc;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------------------------------------
 Sort  (cost=60.01..60.05 rows=14 width=696) (actual
time=42393.56..43381.85 rows=510705 loops=1)
   Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
   ->  Index Scan using call_idx13 on call  (cost=0.00..59.74 rows=14
width=696) (actual time=0.33..19679.01 rows=510705 loops=1)
         Index Cond: ((aspid = '123C'::bpchar) AND (openeddatetime >=
'2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime
<= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
 Total runtime: 43602.05 msec


FWIW, this is different hardware (Solaris 9/Sparc), but the same version
of Postgres (7.3.4).  The data is a superset of the data in the other
database (they are both snapshots taken from production).

I dropped and recreated the index on the other (Linux) machine, ran
vacuum analyse, then tried the query again.  It still performs a
sequential scan on the call table. :(


>
> Any chance you could put together a test case demonstrating the above
> behaviour? Everything from CREATE TABLE, through dataload to the
> EXPLAIN ANALYZE.


Forgive me for being thick: what exactly would be involved?  Due to
HIPAA regulations, I cannot "expose" any of the data.

<background>
I hesitated to bring this up because I wanted to focus on the technical
issues rather than have this degenerate into a religious war.  The chief
developer in charge of the project brought this query to my attention.
He has a fair amount of political sway in the company, and is now
lobbying to switch to MySQL because he maintains that PostgreSQL is
broken and/or too slow for our needs.  He has apparently benchmarked the
same query using MySQL and gotten much more favorable results (I have
been unable to corroborate this yet).
</background>


-Joel


Re: query plan weirdness?

From
Rod Taylor
Date:
> Oddly enough, I put the same database on a different machine, and the
> query now behaves as I hoped all along.  Notice that I'm using the
> "real" query, with the aspid in asc and the other fields in desc order,
> yet the query does use the call_idx13 index:

Notice that while it only takes 19 seconds to pull the data out of the
table, it is spending 30 seconds sorting it -- so the index scan isn't
buying you very much.

Try it again with ORDER BY aspid DESC and you should get the query down
to 20 seconds in total on that Sparc; so I wouldn't call it exactly what
you wanted.

The decision about whether to use an index or not is borderline, and as
you've shown, they take approximately the same amount of time. Use of an
index will not necessarily be faster than a sequential scan -- but the
penalty for accidentally selecting one when it shouldn't have been is
much higher.

> > Any chance you could put together a test case demonstrating the above
> > behaviour? Everything from CREATE TABLE, through dataload to the
> EXPLAIN
> > ANALYZE.
>
>
> Forgive me for being thick: what exactly would be involved?  Due to
> HIPAA regulations, I cannot "expose" any of the data.

Of course. But that doesn't mean you couldn't create a table with a
different name and muck around with the values. But you're getting what
you want, so it isn't a problem anymore.

> <background>
> I hesitated to bring this up because I wanted to focus on the technical
> issues rather than have this degenerate into a religious war.  The chief
> developer in charge of the project brought this query to my attention.
> He has a fair amount of political sway in the company, and is now
> lobbying to switch to MySQL because he maintains that PostgreSQL is
> broken and/or too slow for our needs.  He has apparently benchmarked the
> same query using MySQL and gotten much more favorable results (I have
> been unable to corroborate this yet).
> </background>

I wouldn't be surprised if MySQL did run this single query faster with
nothing else going on during that time. MySQL was designed primarily
with a single user in mind, but it is unlikely this will be your
production situation, so the benchmark is next to useless.

Connect 50 clients to the databases running this (and a mixture of other
selects) while another 20 clients are firing off updates, inserts,
deletes on these and other structures -- or whatever matches your full
production load.

This is what PostgreSQL (and a number of other DBs) are designed for:
typical production loads.
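As a rough sketch of such a benchmark, contrib/pgbench (shipped with
PostgreSQL) can drive many concurrent clients against custom SQL
scripts; the script file names below are hypothetical placeholders:

```shell
# 50 clients running a read-mostly script, 1000 transactions each,
# started in the background
pgbench -c 50 -t 1000 -f select_mix.sql mydb &

# meanwhile, 20 clients firing updates/inserts/deletes
pgbench -c 20 -t 1000 -f write_mix.sql mydb
```

Compare the transactions-per-second numbers under that mixed load, not
a single cold query, before drawing conclusions about either database.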