Re: number of rows estimation for bit-AND operation - Mailing list pgsql-performance

From Slava Moudry
Subject Re: number of rows estimation for bit-AND operation
Date
Msg-id 622F69662CFE9F4182958973F99F3F151515E74E32@EXVMBX017-12.exch017.msoutlookonline.net
Whole thread Raw
In response to Re: number of rows estimation for bit-AND operation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
Hi,
Yes, I thought about putting the bit-flags in separate fields.
Unfortunately - I expect to have quite a lot of these and space is an issue when you are dealing with billions of
recordsin fact table, so I prefer to pack them into one int8. 
For users it's also much easier to write "where mt_flags&134=0" instead of "where f_2=false and f4=false and
f_128=false".
In Teradata - that worked just fine, but it costs millions vs. zero cost for Postgres, so I am not really complaining
outloud :) 

Hopefully Tom or other bright folks at PG could take a look at this for the next patch/release.
Btw, can you send me the link to " PG's selectivity estimator" discussion - I'd like to provide feedback if I can.
Thanks,
-Slava.


-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, August 20, 2009 10:55 AM
To: Scott Marlowe
Cc: Slava Moudry; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] number of rows estimation for bit-AND operation

On Tue, Aug 18, 2009 at 6:34 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> 2009/8/18 Slava Moudry <smoudry@4info.net>:
>>> increase default stats target, analyze, try again.
>> This field has only 5 values. I had put values/frequencies in my first post.
>
> Sorry, kinda missed that.  Anyway, there's no way for pg to know which
> operation is gonna match.  Without an index on it.  So my guess is
> that it just guesses some fixed value.  With an index it might be able
> to get it right, but you'll need an index for each type of match
> you're looking for.  I think.  Maybe someone else on the list has a
> better idea.

The best way to handle this is probably to not cram multiple vales
into a single field.  Just use one boolean for each flag.  It won't
even cost you any space, because right now you are using 8 bytes to
store 5 booleans, and 5 booleans will (I believe) only require 5
bytes.  Even if you were using enough of the bits for the space usage
to be higher with individual booleans, the overall performance is
likely to be better that way.

This is sort of stating the obvious, but it doesn't make it any less
true.  Unfortunately, PG's selectivity estimator can't handle cases
like this.  Tom Lane recently made some noises about trying to improve
it, but it's not clear whether that will go anywhere, and in any event
it won't happen before 8.5.0 comes out next spring/summer.

...Robert

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: number of rows estimation for bit-AND operation
Next
From: Jeff Janes
Date:
Subject: [PERFORMANCE] how to set wal_buffers