Re: 1 char in the world - Mailing list pgsql-performance
From | alien |
---|---|
Subject | Re: 1 char in the world |
Date | |
Msg-id | 3E3D5ED9.3050608@spaceship.com Whole thread Raw |
In response to | Re: 1 char in the world (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
Tom Lane wrote: > I don't believe those numbers for a moment. All else being equal, > comparing a "char" field to a literal should be exactly the same speed > as comparing a bool field to a literal (and if you'd just said "where bool", > the bool field would be faster). Both ought to be markedly faster than > text. > > Look for errors in your test procedure. One thing I'd particularly > wonder about is whether the query plans are the same. In the absence of > any VACUUM ANALYZE data, I'd fully expect the planner to pick a > different plan for a bool field than text/char --- because even without > ANALYZE data, it knows that a bool column has only two possible values. Well, the previous test was done on REAL data. Everything was indexed and vacuum analyzed as it should be. However, I generated some test data under "controlled" circumstances and did get different results. Bear in mind, though, that the data is no longer "real", and doesn't represent the system I am concerned about. [Someone requested some tests with int4/int8, too, so I included them, as well. However, I would never use 4 or 8 bytes to store one bit. Since a byte is platform-atomic, however, I will use a whole byte for a single bit, as bit packing is too expensive.] create table booltest ( boo boolean, cha "char", txt text, in4 int4, in8 int8 ); Insert lots of data here, but stay consistent between fields. [If you insert a TRUE into a boolean, put a 'Y' into a text or "char" field and a 1 into an int type.] So, I basically had 2 different insert statements (one for true and one for false), and I used a random number generator to get a good distribution of them. create index booidx on booltest(boo); create index chaidx on booltest(cha); create index txtidx on booltest(txt); create index in4idx on booltest(in4); create index in8idx on booltest(in8); vacuum full verbose analyze booltest; INFO: --Relation public.booltest-- INFO: Pages 6897: Changed 0, reaped 0, Empty 0, New 0; Tup 1000000: Vac 0, Keep/VTL 0/0, UnUsed 0, MinLen 52, MaxLen 52; Re-using: Free/Avail. Space 362284/362284; EndEmpty/Avail. Pages 0/6897. CPU 0.53s/0.41u sec elapsed 18.69 sec. INFO: Index booidx: Pages 2193; Tuples 1000000. CPU 0.24s/0.11u sec elapsed 3.33 sec. INFO: Index chaidx: Pages 2193; Tuples 1000000. CPU 0.23s/0.19u sec elapsed 4.01 sec. INFO: Index txtidx: Pages 2745; Tuples 1000000. CPU 0.51s/0.14u sec elapsed 4.07 sec. INFO: Index in4idx: Pages 2193; Tuples 1000000. CPU 0.20s/0.17u sec elapsed 3.51 sec. INFO: Index in8idx: Pages 2745; Tuples 1000000. CPU 0.26s/0.04u sec elapsed 1.92 sec. INFO: Rel booltest: Pages: 6897 --> 6897; Tuple(s) moved: 0. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: --Relation pg_toast.pg_toast_4327226-- INFO: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: Index pg_toast_4327226_index: Pages 1; Tuples 0. CPU 0.00s/0.00u sec elapsed 0.02 sec. INFO: Analyzing public.booltest VACUUM Count our test set: select count(*) from booltest; [ALL] output:1000000 select count(*) from booltest where boo; [TRUES] output:498649 TESTS ..... 1) INT8=1 explain analyze select count(*) from booltest where in8 = '1'; Aggregate (cost=1342272.26..1342272.26 rows=1 width=0) (actual time=3434.37..3434.37 rows=1 loops=1) -> Index Scan using in8idx on booltest (cost=0.00..1340996.42 rows=510333 width=0) (actual time=6.96..2704.45 rows=498649 loops=1) Index Cond: (in8 = 1::bigint) Total runtime: 3434.50 msec 2) INT4=1 explain analyze select count(*) from booltest where in4 = 1; Aggregate (cost=1341990.26..1341990.26 rows=1 width=0) (actual time=3219.24..3219.24 rows=1 loops=1) -> Index Scan using in4idx on booltest (cost=0.00..1340714.42 rows=510333 width=0) (actual time=12.92..2548.20 rows=498649 loops=1) Index Cond: (in4 = 1) Total runtime: 3219.35 msec 3) TEXT='Y' explain analyze select count(*) from booltest where txt = 'Y'; Aggregate (cost=1342272.26..1342272.26 rows=1 width=0) (actual time=4820.06..4820.06 rows=1 loops=1) -> Index Scan using txtidx on booltest (cost=0.00..1340996.42 rows=510333 width=0) (actual time=15.83..4042.07 rows=498649 loops=1) Index Cond: (txt = 'Y'::text) Total runtime: 4820.18 msec 4) BOOLEAN=true explain analyze select count(*) from booltest where boo = true; Aggregate (cost=1341990.26..1341990.26 rows=1 width=0) (actual time=3437.30..3437.30 rows=1 loops=1) -> Index Scan using booidx on booltest (cost=0.00..1340714.42 rows=510333 width=0) (actual time=28.16..2751.38 rows=498649 loops=1) Index Cond: (boo = true) Total runtime: 3437.42 msec 5) BOOLEAN [implied = true] explain analyze select count(*) from booltest where boo; Aggregate (cost=100018172.83..100018172.83 rows=1 width=0) (actual time=2775.40..2775.40 rows=1 loops=1) -> Seq Scan on booltest (cost=100000000.00..100016897.00 rows=510333 width=0) (actual time=0.10..2138.11 rows=498649 loops=1) Filter: boo Total runtime: 2775.50 msec 6) "char"='Y' explain analyze select count(*) from booltest where cha = 'Y'; Aggregate (cost=1341990.26..1341990.26 rows=1 width=0) (actual time=3379.71..3379.71 rows=1 loops=1) -> Index Scan using chaidx on booltest (cost=0.00..1340714.42 rows=510333 width=0) (actual time=32.77..2695.77 rows=498649 loops=1) Index Cond: (cha = 'Y'::"char") Total runtime: 3379.82 msec Average ms over 42 attempts per test, some one-after-the-other, others mixed with other queries, was: 1) INT8=1 3229.76 2) INT4=1 3194.45 3) TEXT='Y' 4799.23 4) BOOLEAN=true 3283.30 5) BOOLEAN 2801.83 6) "char"='Y' 3290.15 The straight boolean test was the fastest at 2.8 secs, and the TEXT was the slowest at 4.8 secs. Everything else settled in the same pot at 3.25 secs. I wasn't too impressed with any of these times, actually, but I'm bearing in mind that we are talking about an aggregate, which I have learned much about in the last few days from the mailing list, and which I expect to be slow in PG. Since straight-BOOLEAN [not adding "= true" to the condition] is about 15% faster than "char", I will stick with BOOLEAN. My immediate future plans also include recoding my system to never use aggregates in "live" queries. I will be tracking statistics in real time in statistics tables instead of hitting the database with an aggregate. It is much cheaper to add a few milliseconds per insert than to slow my whole system down for several seconds during an aggregate query. In fact, if you have a sizable table, and especially if you are running an OLTP server, unless you are manually investigating something in the table, I recommend never using aggregates. Your SQL queries should be aggregate-free for large tables, if possible. Thanks! -- Matt Mello
pgsql-performance by date: