Re: 1 char in the world - Mailing list pgsql-performance

From alien
Subject Re: 1 char in the world
Date
Msg-id 3E3D5ED9.3050608@spaceship.com
Whole thread Raw
In response to Re: 1 char in the world  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane wrote:
> I don't believe those numbers for a moment.  All else being equal,
> comparing a "char" field to a literal should be exactly the same speed
> as comparing a bool field to a literal (and if you'd just said "where bool",
> the bool field would be faster).  Both ought to be markedly faster than
> text.
>
> Look for errors in your test procedure.  One thing I'd particularly
> wonder about is whether the query plans are the same.  In the absence of
> any VACUUM ANALYZE data, I'd fully expect the planner to pick a
> different plan for a bool field than text/char --- because even without
> ANALYZE data, it knows that a bool column has only two possible values.

Well, the previous test was done on REAL data.  Everything was indexed
and vacuum analyzed as it should be.

However, I generated some test data under "controlled" circumstances and
did get different results.  Bear in mind, though, that the data is no
longer "real", and doesn't represent the system I am concerned about.

[Someone requested some tests with int4/int8, too, so I included them,
as well.  However, I would never use 4 or 8 bytes to store one bit.
Since a byte is platform-atomic, however, I will use a whole byte for a
single bit, as bit packing is too expensive.]

create table booltest (
   boo boolean,
   cha "char",
   txt text,
   in4 int4,
   in8 int8
);

Insert lots of data here, but stay consistent between fields.  [If you
insert a TRUE into a boolean, put a 'Y' into a text or "char" field and
a 1 into an int type.]  So, I basically had 2 different insert
statements (one for true and one for false), and I used a random number
generator to get a good distribution of them.

create index booidx on booltest(boo);
create index chaidx on booltest(cha);
create index txtidx on booltest(txt);
create index in4idx on booltest(in4);
create index in8idx on booltest(in8);

vacuum full verbose analyze booltest;
INFO:  --Relation public.booltest--
INFO:  Pages 6897: Changed 0, reaped 0, Empty 0, New 0; Tup 1000000: Vac
0, Keep/VTL 0/0, UnUsed 0, MinLen 52, MaxLen 52; Re-using: Free/Avail.
Space 362284/362284; EndEmpty/Avail. Pages 0/6897.
         CPU 0.53s/0.41u sec elapsed 18.69 sec.
INFO:  Index booidx: Pages 2193; Tuples 1000000.
         CPU 0.24s/0.11u sec elapsed 3.33 sec.
INFO:  Index chaidx: Pages 2193; Tuples 1000000.
         CPU 0.23s/0.19u sec elapsed 4.01 sec.
INFO:  Index txtidx: Pages 2745; Tuples 1000000.
         CPU 0.51s/0.14u sec elapsed 4.07 sec.
INFO:  Index in4idx: Pages 2193; Tuples 1000000.
         CPU 0.20s/0.17u sec elapsed 3.51 sec.
INFO:  Index in8idx: Pages 2745; Tuples 1000000.
         CPU 0.26s/0.04u sec elapsed 1.92 sec.
INFO:  Rel booltest: Pages: 6897 --> 6897; Tuple(s) moved: 0.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  --Relation pg_toast.pg_toast_4327226--
INFO:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0,
Keep/VTL 0/0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space
0/0; EndEmpty/Avail. Pages 0/0.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  Index pg_toast_4327226_index: Pages 1; Tuples 0.
         CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  Analyzing public.booltest
VACUUM


Count our test set:
select count(*) from booltest; [ALL] output:1000000
select count(*) from booltest where boo; [TRUES] output:498649

TESTS .....

1) INT8=1
explain analyze select count(*) from booltest where in8 = '1';
Aggregate  (cost=1342272.26..1342272.26 rows=1 width=0) (actual
time=3434.37..3434.37 rows=1 loops=1)
->  Index Scan using in8idx on booltest  (cost=0.00..1340996.42
rows=510333 width=0) (actual time=6.96..2704.45 rows=498649 loops=1)
Index Cond: (in8 = 1::bigint)
Total runtime: 3434.50 msec

2) INT4=1
explain analyze select count(*) from booltest where in4 = 1;
Aggregate  (cost=1341990.26..1341990.26 rows=1 width=0) (actual
time=3219.24..3219.24 rows=1 loops=1)
->  Index Scan using in4idx on booltest  (cost=0.00..1340714.42
rows=510333 width=0) (actual time=12.92..2548.20 rows=498649 loops=1)
Index Cond: (in4 = 1)
Total runtime: 3219.35 msec

3) TEXT='Y'
explain analyze select count(*) from booltest where txt = 'Y';
Aggregate  (cost=1342272.26..1342272.26 rows=1 width=0) (actual
time=4820.06..4820.06 rows=1 loops=1)
->  Index Scan using txtidx on booltest  (cost=0.00..1340996.42
rows=510333 width=0) (actual time=15.83..4042.07 rows=498649 loops=1)
Index Cond: (txt = 'Y'::text)
Total runtime: 4820.18 msec

4) BOOLEAN=true
explain analyze select count(*) from booltest where boo = true;
Aggregate  (cost=1341990.26..1341990.26 rows=1 width=0) (actual
time=3437.30..3437.30 rows=1 loops=1)
->  Index Scan using booidx on booltest  (cost=0.00..1340714.42
rows=510333 width=0) (actual time=28.16..2751.38 rows=498649 loops=1)
Index Cond: (boo = true)
Total runtime: 3437.42 msec

5) BOOLEAN [implied = true]
explain analyze select count(*) from booltest where boo;
Aggregate  (cost=100018172.83..100018172.83 rows=1 width=0) (actual
time=2775.40..2775.40 rows=1 loops=1)
->  Seq Scan on booltest  (cost=100000000.00..100016897.00 rows=510333
width=0) (actual time=0.10..2138.11 rows=498649 loops=1)
Filter: boo
Total runtime: 2775.50 msec

6) "char"='Y'
explain analyze select count(*) from booltest where cha = 'Y';
Aggregate  (cost=1341990.26..1341990.26 rows=1 width=0) (actual
time=3379.71..3379.71 rows=1 loops=1)
->  Index Scan using chaidx on booltest  (cost=0.00..1340714.42
rows=510333 width=0) (actual time=32.77..2695.77 rows=498649 loops=1)
Index Cond: (cha = 'Y'::"char")
Total runtime: 3379.82 msec


Average ms over 42 attempts per test, some one-after-the-other, others
mixed with other queries, was:

1) INT8=1          3229.76
2) INT4=1          3194.45
3) TEXT='Y'        4799.23
4) BOOLEAN=true    3283.30
5) BOOLEAN         2801.83
6) "char"='Y'      3290.15

The straight boolean test was the fastest at 2.8 secs, and the TEXT was
the slowest at 4.8 secs.  Everything else settled in the same pot at
3.25 secs.

I wasn't too impressed with any of these times, actually, but I'm
bearing in mind that we are talking about an aggregate, which I have
learned much about in the last few days from the mailing list, and which
I expect to be slow in PG.

Since straight-BOOLEAN [not adding "= true" to the condition] is about
15% faster than "char", I will stick with BOOLEAN.

My immediate future plans also include recoding my system to never use
aggregates in "live" queries.  I will be tracking statistics in real
time in statistics tables instead of hitting the database with an
aggregate.  It is much cheaper to add a few milliseconds per insert than
to slow my whole system down for several seconds during an aggregate query.

In fact, if you have a sizable table, and especially if you are running
an OLTP server, unless you are manually investigating something in the
table, I recommend never using aggregates.  Your SQL queries should be
aggregate-free for large tables, if possible.

Thanks!

--
Matt Mello


pgsql-performance by date:

Previous
From: Curt Sampson
Date:
Subject: Re: Postgres 7.3.1 poor insert/update/search performance
Next
From: Josh Berkus
Date:
Subject: Re: Postgres 7.3.1 poor insert/update/search performance