Thread: Re: PostgreSQL does CAST implicitely between int and a domain derived from int

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
"Kevin Grittner"
Date:
Jean-Michel Pouré<jm@poure.com> wrote:

> [no postgresql.conf changes except]
> shared_buffer 24M.

That's part of your problem.  (Well, that's understating it; we don't
have any real evidence that you have any performance problems *not*
resulting from failure to do normal configuration.)  If you download
the configurator tool I referenced in another email you can get a
start on fixing this.  Or search the lists -- this stuff had been
discussed many, many times.  Or do a google search; I typed

postgresql performance configuration

and got many hits.  If you spend even one hour closely reviewing any
of the top few hits and testing a more optimal configuration for your
hardware and workload, most of your performance problems would
probably go away.

Then, if you *still* have any performance problems, people could help
you diagnose and fine-tune from a base which should be much closer.

The truth is, with a proper configuration your "biggest problem",
which you've misdiagnosed as the result of casting, would probably go
away.  I will take another look at it now that you have the results of
EXPLAIN ANALYZE posted, but I seriously doubt that it's going to do
well without tuning the configuration.

-Kevin

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
"Kevin Grittner"
Date:
I wrote:

> I will take another look at it now that you have the results of
> EXPLAIN ANALYZE posted

Could you run this?:

set work_mem = '50MB';
set effective_cache_size = '3GB';
EXPLAIN ANALYZE <your query>
begin transaction;
drop index node_comment_statistics_node_comment_timestamp_idx;
EXPLAIN ANALYZE <your query>
rollback transaction;

The BEGIN TRANSACTION and ROLLBACK TRANSACTION will prevent the index
from actually being dropped; it just won't be visible to your query
during that second run.  I'm kinda curious what plan it chooses
without it.

Some configuration options can be dynamically overridden for a
particular connection.  This is not a complete list of what you might
want to use in your postgresql.conf file, but it might turn up an
interesting plan for diagnostic purposes.

-Kevin


Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
Jean-Michel Pouré
Date:
Le mercredi 26 août 2009 à 18:03 -0500, Kevin Grittner a écrit :
> That's part of your problem.

Sorry, I wrote that too quickly.

My configuration is (Quad core, 8Gb):
shared_buffers = 2GB (WAS 1GB)
temp_buffers = 128MB (modified after reading your message)
work_mem = 512MB (modified after reading your message)

Still casting.

> If you download
> the configurator tool I referenced in another email you can get a
> start on fixing this.

I do that immediately.
Then I get back to you.

Why not include a configurator in PostgreSQL core?

Kind regards,
Jean-Michel

Attachment

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
Jean-Michel Pouré
Date:
Dear Kevin,

Thanks for help!

Could you run this?:
>
> set work_mem = '50MB';
> set effective_cache_size = '3GB';
> EXPLAIN ANALYZE <your query>
> begin transaction;
> drop index node_comment_statistics_node_comment_timestamp_idx;
> EXPLAIN ANALYZE <your query>
> rollback transaction;

set work_mem = '50MB';
set effective_cache_size = '1GB';

EXPLAIN ANALYSE
SELECT ncs.last_comment_timestamp, IF (ncs.last_comment_uid != 0,
u2.name, ncs.last_comment_name) AS last_comment_name,
ncs.last_comment_uid
FROM node n
INNER JOIN users u1 ON n.uid = u1.uid
INNER JOIN term_node tn ON n.vid = tn.vid
INNER JOIN node_comment_statistics ncs ON n.nid = ncs.nid
INNER JOIN users u2 ON ncs.last_comment_uid=u2.uid
WHERE n.status = 1 AND tn.tid = 3
ORDER BY ncs.last_comment_timestamp DESC LIMIT 1 OFFSET 0 ;

"Limit  (cost=0.00..544.67 rows=1 width=17) (actual
time=455.234..455.234 rows=0 loops=1)"
"  ->  Nested Loop  (cost=0.00..49565.19 rows=91 width=17) (actual
time=455.232..455.232 rows=0 loops=1)"
"        ->  Nested Loop  (cost=0.00..49538.56 rows=91 width=21) (actual
time=455.232..455.232 rows=0 loops=1)"
"              ->  Nested Loop  (cost=0.00..49512.17 rows=91 width=13)
(actual time=455.232..455.232 rows=0 loops=1)"
"                    ->  Nested Loop  (cost=0.00..27734.58 rows=67486
width=17) (actual time=0.027..264.540 rows=67486 loops=1)"
"                          ->  Index Scan Backward using
node_comment_statistics_node_comment_timestamp_idx on
node_comment_statistics ncs  (cost=0.00..3160.99 rows=67486 width=13)
(actual time=0.014..40.618 rows=67486 loops=1)"
"                          ->  Index Scan using node_pkey on node n
(cost=0.00..0.35 rows=1 width=12) (actual time=0.002..0.003 rows=1
loops=67486)"
"                                Index Cond: (n.nid =
(ncs.nid)::integer)"
"                                Filter: (n.status = 1)"
"                    ->  Index Scan using term_node_vid_idx on term_node
tn  (cost=0.00..0.31 rows=1 width=4) (actual time=0.002..0.002 rows=0
loops=67486)"
"                          Index Cond: ((tn.vid)::integer =
(n.vid)::integer)"
"                          Filter: ((tn.tid)::integer = 3)"
"              ->  Index Scan using users_pkey on users u2
(cost=0.00..0.28 rows=1 width=12) (never executed)"
"                    Index Cond: (u2.uid = ncs.last_comment_uid)"
"        ->  Index Scan using users_pkey on users u1  (cost=0.00..0.28
rows=1 width=4) (never executed)"
"              Index Cond: (u1.uid = n.uid)"
"Total runtime: 455.311 ms"

> begin transaction;
> drop index node_comment_statistics_node_comment_timestamp_idx;
> EXPLAIN ANALYZE <your query>
> rollback transaction;

begin transaction;
drop index node_comment_statistics_node_comment_timestamp_idx;

EXPLAIN ANALYSE
SELECT ncs.last_comment_timestamp, IF (ncs.last_comment_uid != 0, u2.name, ncs.last_comment_name) AS last_comment_name,
ncs.last_comment_uid
FROM node n
INNER JOIN users u1 ON n.uid = u1.uid
INNER JOIN term_node tn ON n.vid = tn.vid
INNER JOIN node_comment_statistics ncs ON n.nid = ncs.nid
INNER JOIN users u2 ON ncs.last_comment_uid=u2.uid
WHERE n.status = 1 AND tn.tid = 3
ORDER BY ncs.last_comment_timestamp DESC LIMIT 1 OFFSET 0 ;

rollback transaction;

Does not show any result because of ROLLBACK;
The query executes in 89 ms.

> Some configuration options can be dynamically overridden for a
> particular connection.  This is not a complete list of what you might
> want to use in your postgresql.conf file, but it might turn up an
> interesting plan for diagnostic purposes.

I am turning to configurator, stay tuned.
Again,thanks for your help.

Bye, Jean-Michel

Attachment

Re: PostgreSQL does CAST implicitely between int and adomain derived from int

From
"Kevin Grittner"
Date:
Jean-Michel Pouré<jm@poure.com> wrote:
> Still casting.

For about the tenth time on the topic -- YOUR PROBLEM HAS NOTHING
WHATSOEVER TO DO WITH CASTING!  Let that go so you can look for the
real problem.

Just as an example, look at this closely:

test=# create table t2 (c1 int not null primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"t2_pkey" for table "t2"
CREATE TABLE
test=# insert into t2 select * from generate_series(1,10000);
INSERT 0 10000
test=# vacuum analyze;
VACUUM
test=# explain analyze select count(*) from t1 where c1 between 200
and 400;
                                                      QUERY PLAN
----------------------------------------------------------------------
-------------------------------------------------
 Aggregate  (cost=12.75..12.76 rows=1 width=0) (actual
time=0.764..0.765 rows=1 loops=1)
   ->  Index Scan using t1_pkey on t1  (cost=0.00..12.25 rows=200
width=0) (actual time=0.095..0.470 rows=201 loops=1)
         Index Cond: (((c1)::integer >= 200) AND ((c1)::integer <=
400))
 Total runtime: 0.827 ms
(4 rows)

The type is always put in there so that you can see what it's doing;
it doesn't reflect anything which is actually taking any time.

Let it go.

-Kevin

Re: PostgreSQL does CAST implicitely between int and adomain derived from int

From
Jean-Michel Pouré
Date:
Le jeudi 27 août 2009 à 09:01 -0500, Kevin Grittner a écrit :
> The type is always put in there so that you can see what it's doing;
> it doesn't reflect anything which is actually taking any time.

My query plan for the same query is:

"Aggregate  (cost=12.75..12.76 rows=1 width=0) (actual time=0.094..0.094
rows=1 loops=1)"
"  ->  Index Scan using t2_pkey on t2  (cost=0.00..12.25 rows=200
width=0) (actual time=0.016..0.068 rows=201 loops=1)"
"        Index Cond: ((c1 >= 200) AND (c1 <= 400))"
"Total runtime: 0.142 ms"

So I don't see any :: in my results.

In my various query plans on my database, the :: is only displayed when
comparing int and int_unsigned. So I interpreted the :: as a cast.

Are you sure that :: does not inform of a cast? Do we have documentation
about that?

Kind regards,
Jean-Michel

Attachment

Re: PostgreSQL does CAST implicitely between int and adomain derived from int

From
"Kevin Grittner"
Date:
>Jean-Michel Pouré<jm@poure.com> wrote:
> Does not show any result because of ROLLBACK;

Then you need to use a better tool to run it.  For example, in psql:

test=# create table t2 (c1 int not null);
CREATE TABLE
test=# insert into t2 select * from generate_series(1,10000);
INSERT 0 10000
test=# create unique index t2_c1 on t2 (c1);
CREATE INDEX
test=# explain analyze select count(*) from t2 where c1 between 200 and
400;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=47.77..47.78 rows=1 width=0) (actual
time=0.668..0.669 rows=1 loops=1)
   ->  Bitmap Heap Scan on t2  (cost=4.76..47.64 rows=50 width=0)
(actual time=0.091..0.380 rows=201 loops=1)
         Recheck Cond: ((c1 >= 200) AND (c1 <= 400))
         ->  Bitmap Index Scan on t2_c1  (cost=0.00..4.75 rows=50
width=0) (actual time=0.080..0.080 rows=201 loops=1)
               Index Cond: ((c1 >= 200) AND (c1 <= 400))
 Total runtime: 0.722 ms
(6 rows)

test=# begin transaction;
BEGIN
test=# drop index t2_c1;
DROP INDEX
test=# explain analyze select count(*) from t2 where c1 between 200 and
400;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Aggregate  (cost=190.50..190.51 rows=1 width=0) (actual
time=3.324..3.325 rows=1 loops=1)
   ->  Seq Scan on t2  (cost=0.00..190.00 rows=200 width=0) (actual
time=0.053..3.036 rows=201 loops=1)
         Filter: ((c1 >= 200) AND (c1 <= 400))
 Total runtime: 3.366 ms
(4 rows)

test=# rollback transaction;
ROLLBACK

-Kevin

Re: PostgreSQL does CAST implicitely between int and adomain derived from int

From
Jean-Michel Pouré
Date:
Le jeudi 27 août 2009 à 09:16 -0500, Kevin Grittner a écrit :
> Then you need to use a better tool to run it.

Understood, thanks.

cms=# set work_mem = '50MB';
SET
cms=# set effective_cache_size = '1GB';
SET
cms=# begin transaction;
BEGIN
cms=# drop index node_comment_statistics_node_comment_timestamp_idx;
DROP INDEX
cms=#
cms=# EXPLAIN ANALYSE
cms-# SELECT ncs.last_comment_timestamp, IF (ncs.last_comment_uid != 0,
u2.name, ncs.last_comment_name) AS last_comment_name,
ncs.last_comment_uid
cms-# FROM node n
cms-# INNER JOIN users u1 ON n.uid = u1.uid
cms-# INNER JOIN term_node tn ON n.vid = tn.vid
cms-# INNER JOIN node_comment_statistics ncs ON n.nid = ncs.nid
cms-# INNER JOIN users u2 ON ncs.last_comment_uid=u2.uid
cms-# WHERE n.status = 1 AND tn.tid = 3
cms-# ORDER BY ncs.last_comment_timestamp DESC LIMIT 1 OFFSET 0 ;
cms=# rollback transaction;


QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=972.82..972.82 rows=1 width=17) (actual time=0.018..0.018
rows=0 loops=1)
   ->  Sort  (cost=972.82..973.04 rows=91 width=17) (actual
time=0.018..0.018 rows=0 loops=1)
         Sort Key: ncs.last_comment_timestamp
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=4.96..972.36 rows=91 width=17) (actual
time=0.010..0.010 rows=0 loops=1)
               ->  Nested Loop  (cost=4.96..945.74 rows=91 width=21)
(actual time=0.010..0.010 rows=0 loops=1)
                     ->  Nested Loop  (cost=4.96..919.34 rows=91
width=13) (actual time=0.010..0.010 rows=0 loops=1)
                           ->  Nested Loop  (cost=4.96..890.02 rows=91
width=8) (actual time=0.009..0.009 rows=0 loops=1)
                                 ->  Bitmap Heap Scan on term_node tn
(cost=4.96..215.63 rows=91 width=4) (actual time=0.009..0.009 rows=0
loops=1)
                                       Recheck Cond: ((tid)::integer =
3)
                                       ->  Bitmap Index Scan on
term_node_tid_idx  (cost=0.00..4.94 rows=91 width=0) (actual
time=0.008..0.008 rows=0 loops=1)
                                             Index Cond: ((tid)::integer
= 3)
                                 ->  Index Scan using node_vid_idx on
node n  (cost=0.00..7.40 rows=1 width=12) (never executed)
                                       Index Cond: ((n.vid)::integer =
(tn.vid)::integer)
                                       Filter: (n.status = 1)
                           ->  Index Scan using
node_comment_statistics_pkey on node_comment_statistics ncs
(cost=0.00..0.31 rows=1 width=13) (never executed)
                                 Index Cond: ((ncs.nid)::integer =
n.nid)
                     ->  Index Scan using users_pkey on users u2
(cost=0.00..0.28 rows=1 width=12) (never executed)
                           Index Cond: (u2.uid = ncs.last_comment_uid)
               ->  Index Scan using users_pkey on users u1
(cost=0.00..0.28 rows=1 width=4) (never executed)
                     Index Cond: (u1.uid = n.uid)
 Total runtime: 0.092 ms
(22 lignes)


Does it mean my index is broken and should be rebuilt?

Kind regards,
Jean-Michel

Attachment

Re: PostgreSQL does CAST implicitely between int andadomain derived from int

From
"Kevin Grittner"
Date:
Jean-Michel Pouré<jm@poure.com> wrote:

> ... Index Cond: ((tid)::integer = 3)

> ... Index Cond: ((n.vid)::integer = (tn.vid)::integer)

> ... Index Cond: ((ncs.nid)::integer = n.nid)

>  Total runtime: 0.092 ms

Sorry, but I just had to point that out.
I feel much better now.  ;-)

> Does it mean my index is broken and should be rebuilt?

No, probably not.

Just to get another data point, what happens if you run the same query
without taking the index out of the picture, but without the LIMIT or
OFFSET clauses?  An EXPLAIN ANALYZE of that would help understand it
more fully.

-Kevin

Re: PostgreSQL does CAST implicitely between int andadomain derived from int

From
Jean-Michel Pouré
Date:
Le jeudi 27 août 2009 à 09:52 -0500, Kevin Grittner a écrit :
> Just to get another data point, what happens if you run the same query
> without taking the index out of the picture, but without the LIMIT or
> OFFSET clauses?  An EXPLAIN ANALYZE of that would help understand it
> more fully.

Also, just a short notice that this SELECT returns no result.

You were right: adding LIMIT 1 changes speed from O.090 ms to 420 ms.
This has nothing to do with casting.

EXPLAIN ANALYSE
SELECT ncs.last_comment_timestamp, IF (ncs.last_comment_uid != 0,
u2.name, ncs.last_comment_name) AS last_comment_name,
ncs.last_comment_uid
FROM node n
INNER JOIN users u1 ON n.uid = u1.uid
INNER JOIN term_node tn ON n.vid = tn.vid
INNER JOIN node_comment_statistics ncs ON n.nid = ncs.nid
INNER JOIN users u2 ON ncs.last_comment_uid=u2.uid
WHERE n.status = 1 AND tn.tid = 3
ORDER BY ncs.last_comment_timestamp DESC

"Sort  (cost=975.32..975.55 rows=91 width=17) (actual time=0.021..0.021
rows=0 loops=1)"
"  Sort Key: ncs.last_comment_timestamp"
"  Sort Method:  quicksort  Memory: 25kB"
"  ->  Nested Loop  (cost=4.96..972.36 rows=91 width=17) (actual
time=0.016..0.016 rows=0 loops=1)"
"        ->  Nested Loop  (cost=4.96..945.74 rows=91 width=21) (actual
time=0.016..0.016 rows=0 loops=1)"
"              ->  Nested Loop  (cost=4.96..919.34 rows=91 width=13)
(actual time=0.016..0.016 rows=0 loops=1)"
"                    ->  Nested Loop  (cost=4.96..890.02 rows=91
width=8) (actual time=0.016..0.016 rows=0 loops=1)"
"                          ->  Bitmap Heap Scan on term_node tn
(cost=4.96..215.63 rows=91 width=4) (actual time=0.016..0.016 rows=0
loops=1)"
"                                Recheck Cond: ((tid)::integer = 3)"
"                                ->  Bitmap Index Scan on
term_node_tid_idx  (cost=0.00..4.94 rows=91 width=0) (actual
time=0.014..0.014 rows=0 loops=1)"
"                                      Index Cond: ((tid)::integer = 3)"
"                          ->  Index Scan using node_vid_idx on node n
(cost=0.00..7.40 rows=1 width=12) (never executed)"
"                                Index Cond: ((n.vid)::integer =
(tn.vid)::integer)"
"                                Filter: (n.status = 1)"
"                    ->  Index Scan using node_comment_statistics_pkey
on node_comment_statistics ncs  (cost=0.00..0.31 rows=1 width=13) (never
executed)"
"                          Index Cond: ((ncs.nid)::integer = n.nid)"
"              ->  Index Scan using users_pkey on users u2
(cost=0.00..0.28 rows=1 width=12) (never executed)"
"                    Index Cond: (u2.uid = ncs.last_comment_uid)"
"        ->  Index Scan using users_pkey on users u1  (cost=0.00..0.28
rows=1 width=4) (never executed)"
"              Index Cond: (u1.uid = n.uid)"
"Total runtime: 0.090 ms"

EXPLAIN ANALYSE
SELECT ncs.last_comment_timestamp, IF (ncs.last_comment_uid != 0,
u2.name, ncs.last_comment_name) AS last_comment_name,
ncs.last_comment_uid
FROM node n
INNER JOIN users u1 ON n.uid = u1.uid
INNER JOIN term_node tn ON n.vid = tn.vid
INNER JOIN node_comment_statistics ncs ON n.nid = ncs.nid
INNER JOIN users u2 ON ncs.last_comment_uid=u2.uid
WHERE n.status = 1 AND tn.tid = 3
ORDER BY ncs.last_comment_timestamp DESC
LIMIT 1

"Limit  (cost=0.00..544.67 rows=1 width=17) (actual
time=435.715..435.715 rows=0 loops=1)"
"  ->  Nested Loop  (cost=0.00..49565.19 rows=91 width=17) (actual
time=435.713..435.713 rows=0 loops=1)"
"        ->  Nested Loop  (cost=0.00..49538.56 rows=91 width=21) (actual
time=435.713..435.713 rows=0 loops=1)"
"              ->  Nested Loop  (cost=0.00..49512.17 rows=91 width=13)
(actual time=435.713..435.713 rows=0 loops=1)"
"                    ->  Nested Loop  (cost=0.00..27734.58 rows=67486
width=17) (actual time=0.029..252.443 rows=67486 loops=1)"
"                          ->  Index Scan Backward using
node_comment_statistics_node_comment_timestamp_idx on
node_comment_statistics ncs  (cost=0.00..3160.99 rows=67486 width=13)
(actual time=0.014..40.583 rows=67486 loops=1)"
"                          ->  Index Scan using node_pkey on node n
(cost=0.00..0.35 rows=1 width=12) (actual time=0.002..0.003 rows=1
loops=67486)"
"                                Index Cond: (n.nid =
(ncs.nid)::integer)"
"                                Filter: (n.status = 1)"
"                    ->  Index Scan using term_node_vid_idx on term_node
tn  (cost=0.00..0.31 rows=1 width=4) (actual time=0.002..0.002 rows=0
loops=67486)"
"                          Index Cond: ((tn.vid)::integer =
(n.vid)::integer)"
"                          Filter: ((tn.tid)::integer = 3)"
"              ->  Index Scan using users_pkey on users u2
(cost=0.00..0.28 rows=1 width=12) (never executed)"
"                    Index Cond: (u2.uid = ncs.last_comment_uid)"
"        ->  Index Scan using users_pkey on users u1  (cost=0.00..0.28
rows=1 width=4) (never executed)"
"              Index Cond: (u1.uid = n.uid)"
"Total runtime: 435.788 ms"

EXPLAIN ANALYSE
SELECT ncs.last_comment_timestamp, IF (ncs.last_comment_uid != 0,
u2.name, ncs.last_comment_name) AS last_comment_name,
ncs.last_comment_uid
FROM node n
INNER JOIN users u1 ON n.uid = u1.uid
INNER JOIN term_node tn ON n.vid = tn.vid
INNER JOIN node_comment_statistics ncs ON n.nid = ncs.nid
INNER JOIN users u2 ON ncs.last_comment_uid=u2.uid
WHERE n.status = 1 AND tn.tid = 3
ORDER BY ncs.last_comment_timestamp DESC LIMIT 1 OFFSET 0

"Limit  (cost=0.00..544.67 rows=1 width=17) (actual
time=541.488..541.488 rows=0 loops=1)"
"  ->  Nested Loop  (cost=0.00..49565.19 rows=91 width=17) (actual
time=541.486..541.486 rows=0 loops=1)"
"        ->  Nested Loop  (cost=0.00..49538.56 rows=91 width=21) (actual
time=541.485..541.485 rows=0 loops=1)"
"              ->  Nested Loop  (cost=0.00..49512.17 rows=91 width=13)
(actual time=541.485..541.485 rows=0 loops=1)"
"                    ->  Nested Loop  (cost=0.00..27734.58 rows=67486
width=17) (actual time=0.024..307.341 rows=67486 loops=1)"
"                          ->  Index Scan Backward using
node_comment_statistics_node_comment_timestamp_idx on
node_comment_statistics ncs  (cost=0.00..3160.99 rows=67486 width=13)
(actual time=0.012..62.504 rows=67486 loops=1)"
"                          ->  Index Scan using node_pkey on node n
(cost=0.00..0.35 rows=1 width=12) (actual time=0.003..0.003 rows=1
loops=67486)"
"                                Index Cond: (n.nid =
(ncs.nid)::integer)"
"                                Filter: (n.status = 1)"
"                    ->  Index Scan using term_node_vid_idx on term_node
tn  (cost=0.00..0.31 rows=1 width=4) (actual time=0.003..0.003 rows=0
loops=67486)"
"                          Index Cond: ((tn.vid)::integer =
(n.vid)::integer)"
"                          Filter: ((tn.tid)::integer = 3)"
"              ->  Index Scan using users_pkey on users u2
(cost=0.00..0.28 rows=1 width=12) (never executed)"
"                    Index Cond: (u2.uid = ncs.last_comment_uid)"
"        ->  Index Scan using users_pkey on users u1  (cost=0.00..0.28
rows=1 width=4) (never executed)"
"              Index Cond: (u1.uid = n.uid)"
"Total runtime: 541.568 ms"


Attachment

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
"Kevin Grittner"
Date:
Jean-Michel Pouré<jm@poure.com> wrote:

> Also, just a short notice that this SELECT returns no result.

Once you posted EXPLAIN ANALYZE results, that was clear because actual
rows on the top line is zero.

> You were right: adding LIMIT 1 changes speed from O.090 ms to 420
> ms.

In summary, what's happening is that when the LIMIT 1 is there, the
optimizer sees that the index will return rows in the order you
requested, and thinks that it won't have to read very far to get a
match, at which point it would be able to stop.  There are no matches,
but it has to read all the way through the index, pulling related rows
to check for matches, before it can know that.  Without the limit, it
optimizes for the fastest plan which will scan all the rows.  The
first test returns nothing, so all the joins become very cheap -- they
are never exercised.

This is related to a topic recently discussed on the hackers list --
whether the optimizer should be modified to recognize "risky" plans,
and try to avoid them.  This is another example of a query which might
benefit from such work.

It's also possible that this is another manifestation of an issue
about which there has been some dispute -- the decision to always
round up any fraction on expected rows to the next whole number.  I
don't know without doing more research, but it wouldn't shock me if
this rounding contributed to the optimizer's expectations that it
would get a match soon enough to make the problem plan a good one.

It is *possible* that if you boost your default_statistics_target and
run ANALYZE (or VACUUM ANALYZE), it will recognize that it isn't a
good idea to read backwards on that index.  I would try it and see, if
that's practical for you.  If not, you might be able to limit the
plans that the optimizer considers using various techniques, but
that's a bit of a kludge; I'd save it for a last resort.

> This has nothing to do with casting.

Yeah, that much was pretty apparent to many people from the start.  It
was rather frustrating that you weren't listening on that point; I
think that resulted in you wasting time focusing on the wrong things
and not moving in a productive direction sooner.  As has been
suggested by someone else, you'll get better results presenting your
problem with as much relevant detail as possible and asking for help
sorting it out, rather than putting too much emphasis on your
preliminary guess as to the cause.

-Kevin

Re: PostgreSQL

From
Jean-Michel Pouré
Date:
Le jeudi 27 août 2009 à 11:36 -0500, Kevin Grittner a écrit :
> It is *possible* that if you boost your default_statistics_target and
> run ANALYZE (or VACUUM ANALYZE), it will recognize that it isn't a
> good idea to read backwards on that index.  I would try it and see, if
> that's practical for you.  If not, you might be able to limit the
> plans that the optimizer considers using various techniques, but
> that's a bit of a kludge; I'd save it for a last resort.

I will try that.

> Yeah, that much was pretty apparent to many people from the start.  It
> was rather frustrating that you weren't listening on that point; I
> think that resulted in you wasting time focusing on the wrong things
> and not moving in a productive direction sooner.  As has been
> suggested by someone else, you'll get better results presenting your
> problem with as much relevant detail as possible and asking for help
> sorting it out, rather than putting too much emphasis on your
> preliminary guess as to the cause.

Yeah. I will keep that in mind, don't worry.

This kind of slow queries on LIMIT seems to happen all the time on
Drupal. Maybe it is because the site is not yet going live.

Also this means that Drupal on PostgreSQL could rock completely if/when
the optimizer has enough information to find the correct plan.

If you are interested, I can post on performance ML strange queries with
LIMIT that may be interesting after we go life and have enough
statistics.

Many thanks and bye,
Jean-Michel

Attachment
2009/8/27 Kevin Grittner <Kevin.Grittner@wicourts.gov>:
> It is *possible* that if you boost your default_statistics_target and
> run ANALYZE (or VACUUM ANALYZE), it will recognize that it isn't a
> good idea to read backwards on that index.  I would try it and see, if
> that's practical for you.

I notice this in one of the plans:

->  Bitmap Index Scan on term_node_tid_idx  (cost=0.00..4.94 rows=91
width=0) (actual time=0.014..0.014 rows=0 loops=1)
     Index Cond: ((tid)::integer = 3)

That's a pretty bad estimate for a scan of a single relation with a
filter on one column.

I'd like to see the output of:

SELECT MIN(tid), MAX(tid), SUM(1) FROM term_node;
SHOW default_statistics_target;

By the way, why does EXPLAIN not display the name of the table as well
as the index when it performs a bitmap index scan?  It does do so for
a regular index scan.

What version of PG is this again?

...Robert

Robert Haas <robertmhaas@gmail.com> writes:
> By the way, why does EXPLAIN not display the name of the table as well
> as the index when it performs a bitmap index scan?

Because that plan node is not in fact touching the table.  The table
name is shown in the BitmapHeapScan node that *does* touch the table.

            regards, tom lane

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
Jean-Michel Pouré
Date:
Le jeudi 27 août 2009 à 13:35 -0400, Robert Haas a écrit :
> SELECT MIN(tid), MAX(tid), SUM(1) FROM term_node;
> SHOW default_statistics_target;

SELECT MIN(tid), MAX(tid), SUM(1) FROM term_node;
6;56;67479

SHOW default_statistics_target;
100

For information, if some hackers are interested and they belong to the
community for a long time, I can provide tables with data.

Kind regards,
Jean-Michel

Attachment

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
Jean-Michel Pouré
Date:
Le jeudi 27 août 2009 à 14:05 -0400, Tom Lane a écrit :
> tom lane

Dear Tom,

Why is the query planner displaying ::integer
What does it mean?

Kind regards,
Jean-Michel

Attachment

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
Bruce Momjian
Date:
Jean-Michel Pour� wrote:
-- Start of PGP signed section.
> Le jeudi 27 ao?t 2009 ? 14:05 -0400, Tom Lane a ?crit :
> > tom lane
>
> Dear Tom,
>
> Why is the query planner displaying ::integer
> What does it mean?

::integer casts a data type to INTEGER.  It is the same as CAST().

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
Jean-Michel Pouré
Date:
Le samedi 29 août 2009 à 11:16 -0400, Bruce Momjian a écrit :
> > Why is the query planner displaying ::integer
> > What does it mean?
>
> ::integer casts a data type to INTEGER.  It is the same as CAST().

In Drupal database, we have two types:

integer
int_unsigned

CREATE DOMAIN int_unsigned
  AS integer
CONSTRAINT int_unsigned_check CHECK ((VALUE >= 0));

Why do queries cast between integer and int_unsigned?

Kind regards,
Jean-Michel

Attachment
Jean-Michel =?ISO-8859-1?Q?Pour=E9?= <jm@poure.com> writes:
> In Drupal database, we have two types:
> CREATE DOMAIN int_unsigned
>   AS integer
> CONSTRAINT int_unsigned_check CHECK ((VALUE >= 0));

> Why do queries cast between integer and int_unsigned?

That domain doesn't have any operators of its own.  To compare to
another value, or use an index, you have to cast it to integer which
does have operators.  It's a no-op cast, but logically necessary.

            regards, tom lane

Re: PostgreSQL does CAST implicitely between int and a domain derived from int

From
Jean-Michel Pouré
Date:
Le samedi 29 août 2009 à 13:44 -0400, Tom Lane a écrit :
> That domain doesn't have any operators of its own.  To compare to
> another value, or use an index, you have to cast it to integer which
> does have operators.  It's a no-op cast, but logically necessary.

Dear Tom,

Thanks for answering. On more question:

Drupal makes use these no-op CREATE DOMAINs in the database schema :

CREATE DOMAIN int_unsigned
  AS integer
   CONSTRAINT int_unsigned_check CHECK ((VALUE >= 0));

CREATE DOMAIN bigint_unsigned
  AS bigint
   CONSTRAINT bigint_unsigned_check CHECK ((VALUE >= 0));

CREATE DOMAIN smallint_unsigned
  AS smallint
   CONSTRAINT smallint_unsigned_check CHECK ((VALUE >= 0));

CREATE DOMAIN varchar_ci
  AS character varying(255)
  DEFAULT ''::character varying
  NOT NULL;

In my slow queries, I can notice excessive no-op casts. Do you think
this could lead to excessive sequential scans?

What do you recommend: using normal types and moving constraints in the
Drupal database? Is PostgreSQL domain broken as it forces casting or is
this a no-op for performance?

What do you recommend?

Kind regards,
Jean-Michel

Attachment
Jean-Michel =?ISO-8859-1?Q?Pour=E9?= <jm@poure.com> writes:
> What do you recommend: using normal types and moving constraints in the
> Drupal database? Is PostgreSQL domain broken as it forces casting or is
> this a no-op for performance?

In principle it should be an unnoticeable slowdown.  In the past we've
had some issues with the planner failing to recognize possible
optimizations when there was a cast in the way, but I'm not aware of
any such bugs at the moment.

            regards, tom lane

On Sat, Aug 29, 2009 at 10:45 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Jean-Michel =?ISO-8859-1?Q?Pour=E9?= <jm@poure.com> writes:
>> What do you recommend: using normal types and moving constraints in the
>> Drupal database? Is PostgreSQL domain broken as it forces casting or is
>> this a no-op for performance?
>
> In principle it should be an unnoticeable slowdown.  In the past we've
> had some issues with the planner failing to recognize possible
> optimizations when there was a cast in the way, but I'm not aware of
> any such bugs at the moment.

In particular since your plan nodes are all index scans it's clear
that the casts are not getting in the way. The symptom when there were
problems was that the planner was forced to use sequential scans
because it couldn't match the casted expressionto the index expression
or some variant of that.


--
greg
http://mit.edu/~gsstark/resume.pdf