Re: Bit datatype performance? - Mailing list pgsql-general
From | Antonio Vieiro |
---|---|
Subject | Re: Bit datatype performance? |
Date | |
Msg-id | 4E70F388.7030403@antonioshome.net Whole thread Raw |
In response to | Re: Bit datatype performance? (Radosław Smogura <rsmogura@softperience.eu>) |
List | pgsql-general |
Hi, Thanks for the tip. Maybe two UUIDs are a best approach. I'll see which is more performant. Kind regards, Antonio El 14/09/11 19:32, Radosław Smogura escribió: > Hi, > > I think it's not bad approach if performance is important. I don't know > how b-tree index will work with bitset datatype, but I assume it should > treat is as 256bit number (maybe someone more competive in internals > will answer this). > > Please bear in mind, that this approach will work well only on query You > have written. > > Because You ask on performance, I will add this topic You may want to > test and think about it > > PG by default uses text transfer mode, so if you transfer your data > from/to server those will be transferred as 256 0/1 character string. > You may to think about storing tags as e.g. 4 long (64bit) fields, or 2 > type 4 UUIDs (128bit) and use composite index. If you have ability to > use binary transfer and on your client side bitest will be mapped to > some "reasonable" type, then You won, otherwise (in binary mode) you > should get nice boost when you will store, those values in types I have > wrote. > > Of course those are only some concepts, personally I have never made > such things. > > Regards, > Radek > > On Wed, 14 Sep 2011 17:58:58 +0200, Antonio Vieiro wrote: >> Hi again, >> >> Thanks for the tip. In fact I was thinking of creating an index on >> the bitmask, so I could use: >> >> ... where t.bits = :mymask >> >> directly, avoiding a full table scan. I assume this is possible >> (indexing bit and comparing bits), isn't it? >> >> Thanks, >> Antonio >> >> El 14/09/11 15:58, Radosław Smogura escribió: >>> On Wed, 14 Sep 2011 12:00:35 +0200, Antonio Vieiro wrote: >>>> Hi all, >>>> >>>> One of my entities 'E' may be 'tagged' with an arbitrary set of 256 >>>> tags 'T'. >>>> >>>> A first approach could be to add a M:N relationship between 'E' and >>>> 'T'. >>>> >>>> A second way to do this could be to add a BIT(256) datatype to 'E', >>>> setting bits to '1' if the entity is tagged with each one of the 256 >>>> tags (i.e. using a 'bitmask' on the set of tags). >>>> >>>> Since querying entities 'E' with a certain set of tags 'T' must be >>>> very fast I was wondering if the second approach would be faster. What >>>> do you think? >>>> >>>> Thanks for any hints, >>>> Antonio >>> >>> I assume each entity may have one or more different tags. >>> >>> Actually performing test like >>> ... where (t.bits & :mymask) = :mymask >>> should be quite fast and faster then creating additional relations, but >>> only if it's highly probable that your query will almost always scan >>> whole table. >>> >>> The advantage of indexes is that the index is used 1st and tail (slower) >>> parts of query will always get "subset" of table. In bitset, You will >>> probably scan whole table. >>> >>> So I think, you should do some performance test for large number of >>> data, and compare both ways. I think bitset will be fast for really >>> small data, but M:N relations may be faster for really large data sets. >>> >>> You need to measure size of your database too, in M:N case with 256 tags >>> it may be quite large. >
pgsql-general by date: