Thread: number of rows estimation for bit-AND operation
Hi,
I am using an int8 field to pack a number of error flags. This is a very common technique for large tables: packing multiple flags into one integer field.
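For example, setting and testing flags looks like this (a minimal sketch; the flag names and the update examples are made up, not from our real schema):

-- hypothetical meanings: 2 = delivery_failed, 4 = throttled, 8 = expired, 128 = carrier_error
select count(*) from mt__20090801 where mt_flags & 8 <> 0;          -- flag 8 is set
select count(*) from mt__20090801 where mt_flags & (2|4|128) = 0;   -- none of 2, 4, 128 set
update mt__20090801 set mt_flags = mt_flags | 8  where mt_id = 42;  -- set flag 8
update mt__20090801 set mt_flags = mt_flags & ~8 where mt_id = 42;  -- clear flag 8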
For most records, the mt_flags field is 0. Here are the statistics (taken from the pgAdmin Statistics tab for the mt_flags column):
Most common Values: {0,128,2,4,8}
Most common Frequencies: {0.96797,0.023,0.0076,0.0005,0.00029}
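(The same numbers can be pulled straight from the catalog; a sketch, assuming the table lives in schema dw as shown in the P.S. below:)

select most_common_vals, most_common_freqs
from pg_stats
where schemaname = 'dw' and tablename = 'mt__20090801' and attname = 'mt_flags';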
What I notice is that when the bit-AND operator is used, Postgres significantly underestimates the number of rows:
explain analyze select count(*) from mt__20090801 where mt_flags&8=0;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2883.154..2883.154 rows=1 loops=1)
   ->  Seq Scan on mt__20090801  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.008..2100.390 rows=2439435 loops=1)
         Filter: ((mt_flags & 8) = 0)
 Total runtime: 2883.191 ms
(4 rows)
This is not an issue for the particular query above, but I noticed that due to that misestimate Postgres often chooses a plan with Nested Loops for other queries. I can fix it by setting enable_nestloop to off, but it's not something I should set for all queries.
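(For reference, the workaround can at least be scoped to one transaction instead of the whole server; a sketch:)

begin;
set local enable_nestloop = off;  -- applies only until commit/rollback
select count(*) from mt__20090801 where mt_flags & 8 = 0;
commit;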
Is there any way to help Postgres make a better estimate of the number of rows returned by a bit function?
Thanks,
-Slava Moudry, Senior DW Engineer. 4Info Inc.
P.S. table definition:
\d mt__20090801
Table "dw.mt__20090801"
Column | Type | Modifiers
--------------------------+-----------------------------+-----------
mt_id | bigint | not null
mt_ts | timestamp without time zone |
ad_cost | numeric(10,5) |
short_code | integer |
message_id | bigint | not null
mp_code | character(1) | not null
al_id | integer | not null
cust_id | integer |
device_id | integer | not null
broker_id | smallint |
partner_id | integer |
ad_id | integer |
keyword_id | integer |
sc_id | integer |
cp_id | integer |
src_alertlog_id | bigint |
src_query_id | bigint |
src_response_message_num | smallint |
src_gateway_message_id | bigint |
mt_flags | integer |
message_length | integer | not null
created_etl | timestamp without time zone |
Indexes:
"mt_device_id__20090801" btree (device_id) WITH (fillfactor=100), tablespace "index2"
"mt_ts__20090801" btree (mt_ts) WITH (fillfactor=100) CLUSTER, tablespace "index2"
Check constraints:
"mt__20090801_mt_ts_check" CHECK (mt_ts >= '2009-08-01 00:00:00'::timestamp without time zone AND mt_ts < '2009-08-02 00:00:00'::timestamp without time
zone)
Inherits: mt
Tablespace: "dw_tables3"
On Mon, Aug 17, 2009 at 2:07 PM, Slava Moudry <smoudry@4info.net> wrote:
> Is there any way to help Postgres make a better estimate of the number
> of rows returned by a bit function?

You can index on the function. For instance:

create table t (mt_flags int);
create index t_mtflags_bit on t ((mt_flags&8));
insert into t select case when random() > 0.95
                          then case when random() > 0.5 then 8 else 12 end
                          else 0 end
         from generate_series(1,10000);
analyze t;

explain select * from t where mt_flags&8=8;
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using t_mtflags_bit on t  (cost=0.00..52.17 rows=467 width=4)
   Index Cond: ((mt_flags & 8) = 8)
(2 rows)

Hope that helps a little.
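Part of why the expression index helps estimation: once it exists, ANALYZE gathers statistics on the indexed expression itself. You can inspect them with something like this (a sketch; expression-index statistics appear in pg_stats under the index's name):

analyze t;
select tablename, attname, most_common_vals, most_common_freqs
from pg_stats
where tablename = 't_mtflags_bit';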
2009/8/18 Slava Moudry <smoudry@4info.net>:
> Hi Scott,
> Thank you for the reply.
> I am using Postgres 8.4.0 (btw - great release -- very happy about it)
> and I got a different plan after following your advice:

Yeah, you're returning most of the rows, so a seq scan makes sense.
Try indexing / matching on something more uncommon and you should get
an index scan.

> The seq scan is OK, since I don't expect Postgres to use an index scan
> for such a low-selectivity condition.
> It would be tough for me to support indexes for each bit flag value and
> their combinations. E.g. in the query below it is again 200x off on the
> number of rows.

increase default stats target, analyze, try again.

> explain analyze select count(*) from staging.tmp_t where mt_flags&134=0;
>                                                        QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2964.960..2964.960 rows=1 loops=1)
>    ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.014..2152.031 rows=2362257 loops=1)
>          Filter: ((mt_flags & 134) = 0)
>  Total runtime: 2965.009 ms
> (4 rows)
>
> I still wonder if it's something I could/should report as a bug? I've
> been struggling with this issue in 8.2 and 8.3.x (now using 8.4.0).
> We can more or less work around this by disabling nestloop in our
> analytics queries, but I have problems enforcing this in reporting
> applications.

Looks more like a low stats target. Try increasing that first.
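For the archives, raising the target for just this column would look like this (a sketch, using the table from the quoted query):

alter table staging.tmp_t alter column mt_flags set statistics 1000;
analyze staging.tmp_t;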
2009/8/18 Slava Moudry <smoudry@4info.net>:
>> increase default stats target, analyze, try again.
> This field has only 5 values. I had put values/frequencies in my first post.

Sorry, kinda missed that. Anyway, there's no way for pg to know which
operation is gonna match without an index on it, so my guess is that it
just guesses some fixed value. With an index it might be able to get it
right, but you'll need an index for each type of match you're looking
for, I think. Maybe someone else on the list has a better idea.
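Concretely, that means one expression index per mask the queries actually use, e.g. (a sketch; index names invented):

create index tmp_t_flags8 on staging.tmp_t ((mt_flags & 8));
create index tmp_t_flags134 on staging.tmp_t ((mt_flags & 134));
analyze staging.tmp_t;  -- gathers statistics on each indexed expression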
Hi Scott,
Thank you for the reply.
I am using Postgres 8.4.0 (btw - great release -- very happy about it)
and I got a different plan after following your advice:

create index t_mtflags_bit on staging.tmp_t ((mt_flags&8));
analyze staging.tmp_t;
explain analyze select count(*) from staging.tmp_t where mt_flags&8=0;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=89122.78..89122.79 rows=1 width=0) (actual time=2994.970..2994.971 rows=1 loops=1)
   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=2439541 width=0) (actual time=0.012..2161.886 rows=2439435 loops=1)
         Filter: ((mt_flags & 8) = 0)
 Total runtime: 2995.017 ms
(4 rows)

The seq scan is OK, since I don't expect Postgres to use an index scan
for such a low-selectivity condition.
It would be tough for me to support indexes for each bit flag value and
their combinations. E.g. in the query below it is again 200x off on the
number of rows.

explain analyze select count(*) from staging.tmp_t where mt_flags&134=0;
                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=83054.43..83054.44 rows=1 width=0) (actual time=2964.960..2964.960 rows=1 loops=1)
   ->  Seq Scan on tmp_t  (cost=0.00..83023.93 rows=12200 width=0) (actual time=0.014..2152.031 rows=2362257 loops=1)
         Filter: ((mt_flags & 134) = 0)
 Total runtime: 2965.009 ms
(4 rows)

I still wonder if it's something I could/should report as a bug? I've
been struggling with this issue in 8.2 and 8.3.x (now using 8.4.0).
We can more or less work around this by disabling nestloop in our
analytics queries, but I have problems enforcing this in reporting
applications.

Thanks,
-Slava Moudry.
> increase default stats target, analyze, try again.

This field has only 5 values. I had put values/frequencies in my first
post. Based on the values (see below) there is no reason for the planner
to think that mt_flags&134=0 should return 12200 rows.

select mt_flags, count(*) from staging.tmp_t group by 1;
 mt_flags |  count
----------+---------
      128 |   57362
        4 |    1371
        8 |     627
        2 |   19072
        0 | 2361630
(5 rows)

In fact, if I rewrite the query using value matching, the estimate is
right on:

explain analyze select count(*) from staging.tmp_t where mt_flags not in (128,2,4);
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=85878.63..85878.64 rows=1 width=0) (actual time=2904.005..2904.005 rows=1 loops=1)
   ->  Seq Scan on tmp_t  (cost=0.00..79973.85 rows=**2361910** width=0) (actual time=0.008..2263.983 rows=2362257 loops=1)
         Filter: (mt_flags <> ALL ('{128,2,4}'::integer[]))
 Total runtime: 2904.038 ms
(4 rows)

Anyways, I've been using a statistics target of 100 in 8.3, and in 8.4
the default is 100. I am currently using default_statistics_target=1000.
Do you think the bit-AND function might be skewing the statistics for the
execution plan somehow?

Thanks,
-Slava.
On Tue, Aug 18, 2009 at 6:34 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Anyway, there's no way for pg to know which operation is gonna match
> without an index on it, so my guess is that it just guesses some fixed
> value.

The best way to handle this is probably to not cram multiple values into
a single field. Just use one boolean for each flag. It won't even cost
you any space, because right now you are using 8 bytes to store 5
booleans, and 5 booleans will (I believe) only require 5 bytes. Even if
you were using enough of the bits for the space usage to be higher with
individual booleans, the overall performance is likely to be better that
way. This is sort of stating the obvious, but it doesn't make it any
less true.

Unfortunately, PG's selectivity estimator can't handle cases like this.
Tom Lane recently made some noises about trying to improve it, but it's
not clear whether that will go anywhere, and in any event it won't happen
before 8.5.0 comes out next spring/summer.

...Robert
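A sketch of the split-out design (hypothetical flag names, not the real schema):

create table mt_flags_split (
    mt_id       bigint not null,
    f_delivery  boolean not null default false,
    f_throttled boolean not null default false,
    f_expired   boolean not null default false,
    f_carrier   boolean not null default false
);
-- each flag gets its own column statistics, so conditions like this
-- estimate well without expression indexes:
select count(*) from mt_flags_split where not f_expired;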
2009/8/20 Slava Moudry <smoudry@4info.net>:
> Hi,
> Yes, I thought about putting the bit-flags in separate fields.
> Unfortunately, I expect to have quite a lot of these, and space is an
> issue when you are dealing with billions of records in a fact table, so
> I prefer to pack them into one int8.

For giggles I created two test tables, one with a single int, one with 8
bools, and put 100M entries in each. The table with 8 bools took up
approx. 3560616 bytes, while the one with a single int took up approx.
3544212.

I.e., they're about the same. You should really test to see if having a
lot of bools costs more than mangling ints around. I'm guessing I could
fit a lot more bools in the test table due to alignment issues than just 8.
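Roughly the test, for anyone who wants to reproduce it (a sketch; table and column names invented here):

create table t_int  (flags int8);
create table t_bool (f1 bool, f2 bool, f3 bool, f4 bool,
                     f5 bool, f6 bool, f7 bool, f8 bool);
insert into t_int  select 0 from generate_series(1,100000000);
insert into t_bool select false, false, false, false,
                          false, false, false, false
                     from generate_series(1,100000000);
select pg_total_relation_size('t_int')  as int_size,
       pg_total_relation_size('t_bool') as bool_size;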
On Thu, Aug 20, 2009 at 7:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> I'm guessing I could fit a lot more bools in the test table due to
> alignment issues than just 8.

So, I made a table with 26 bool fields and added 100M rows to it, and
that table took up about 5906028 bytes. So yea, the storage is greater
for boolean fields, but only if they aren't null. Making them null would
save a lot of space, so if null bits fit your model, then it might be
worth looking into. Certainly they're not so much bigger as to be
unmanageable.
Hi,
Yes, I thought about putting the bit-flags in separate fields.
Unfortunately, I expect to have quite a lot of these, and space is an
issue when you are dealing with billions of records in a fact table, so
I prefer to pack them into one int8.
For users it's also much easier to write "where mt_flags&134=0" instead
of "where f_2=false and f_4=false and f_128=false".
In Teradata that worked just fine, but it costs millions vs. zero cost
for Postgres, so I am not really complaining out loud :)
Hopefully Tom or other bright folks at PG could take a look at this for
the next patch/release.
Btw, can you send me the link to the "PG's selectivity estimator"
discussion - I'd like to provide feedback if I can.
Thanks,
-Slava.
On Thu, Aug 20, 2009 at 9:58 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> So yea, the storage is greater for boolean fields, but only if they
> aren't null. Making them null would save a lot of space, so if null
> bits fit your model, then it might be worth looking into.

This is a clever idea. Rows with any null columns have a null bitmap
with 1 bit per field, followed by the actual values of the non-null
fields. So if the OP arranges to use true and null as the values instead
of true and false, and uses null for the flag value that is most often
wanted, it will pack down pretty tight.

Scott, did you check whether a toast table got created here and what the
size of it was?

...Robert
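A sketch of that true/null encoding (hypothetical table; NULL is the common "not set" state and costs only its bit in the null bitmap):

create table t_nullbool (f1 boolean, f2 boolean, f3 boolean);
insert into t_nullbool values (true, null, null);  -- only flag 1 set
select count(*) from t_nullbool where f2 is null;  -- flag 2 not set
select count(*) from t_nullbool where f2;          -- flag 2 set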
Robert Haas wrote:
> Scott, did you check whether a toast table got created here and what
> the size of it was?

A table with only bool columns (and, say, one int8 column) would not
have a toast table. Only varlena columns produce toast tables.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
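Easy to confirm from the catalog (a sketch, using the hypothetical test-table names from upthread):

select relname, reltoastrelid <> 0 as has_toast
from pg_class
where relname in ('t_int', 't_bool');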
Hi,
Sorry, I don't understand how the numbers came out so low. If you assume
that 8 boolean fields take 1 byte each, then for a 100M-row table that's
800M bytes alone. How did your table fit in 3560616 bytes?

Using Postgres 8.4.0 on Linux x64:

create table staging.tmp_t1(a1 boolean, a2 boolean, a3 boolean, a4 boolean,
  a5 boolean, a6 boolean, a7 boolean, a8 boolean) tablespace stage3;
insert into staging.tmp_t1
  select 1::boolean, 1::boolean, 1::boolean, 1::boolean,
         1::boolean, 1::boolean, 1::boolean, 1::boolean
  from generate_series(1,100000000);
select pg_total_relation_size('staging.tmp_t1');
 pg_total_relation_size
------------------------
          3,625,689,088
(1 row)

The table with 16 booleans took just 766MB more, so the growth appears
to be non-linear. Most likely Postgres does some compression... I can't
tell for sure without looking into the source code.
Anyway, given that an int8 can accommodate 64 flags, the space saving
can be substantial.
Thanks,
-Slava.
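For what it's worth, the measured size is consistent with fixed per-row overhead dominating rather than compression (a back-of-envelope sketch; the 23-byte tuple header and 4-byte item pointer are from the PostgreSQL storage documentation):

-- per row: 23-byte tuple header, padded to 24, + 8 one-byte bools = 32 bytes,
-- plus a 4-byte line pointer in the page's item array ~= 36 bytes/row
-- 100,000,000 rows * ~36 bytes ~= 3.6 GB, matching the 3,625,689,088 above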