Re: Bit datatype performance? - Mailing list pgsql-general

From Antonio Vieiro
Subject Re: Bit datatype performance?
Date
Msg-id 4E70F388.7030403@antonioshome.net
Whole thread Raw
In response to Re: Bit datatype performance?  (Radosław Smogura <rsmogura@softperience.eu>)
List pgsql-general
Hi,

Thanks for the tip. Maybe two UUIDs are a best approach. I'll see which
is more performant.

Kind regards,
Antonio

El 14/09/11 19:32, Radosław Smogura escribió:
> Hi,
>
> I think it's not bad approach if performance is important. I don't know
> how b-tree index will work with bitset datatype, but I assume it should
> treat is as 256bit number (maybe someone more competive in internals
> will answer this).
>
> Please bear in mind, that this approach will work well only on query You
> have written.
>
> Because You ask on performance, I will add this topic You may want to
> test and think about it
>
> PG by default uses text transfer mode, so if you transfer your data
> from/to server those will be transferred as 256 0/1 character string.
> You may to think about storing tags as e.g. 4 long (64bit) fields, or 2
> type 4 UUIDs (128bit) and use composite index. If you have ability to
> use binary transfer and on your client side bitest will be mapped to
> some "reasonable" type, then You won, otherwise (in binary mode) you
> should get nice boost when you will store, those values in types I have
> wrote.
>
> Of course those are only some concepts, personally I have never made
> such things.
>
> Regards,
> Radek
>
> On Wed, 14 Sep 2011 17:58:58 +0200, Antonio Vieiro wrote:
>> Hi again,
>>
>> Thanks for the tip. In fact I was thinking of creating an index on
>> the bitmask, so I could use:
>>
>> ... where t.bits = :mymask
>>
>> directly, avoiding a full table scan. I assume this is possible
>> (indexing bit and comparing bits), isn't it?
>>
>> Thanks,
>> Antonio
>>
>> El 14/09/11 15:58, Radosław Smogura escribió:
>>> On Wed, 14 Sep 2011 12:00:35 +0200, Antonio Vieiro wrote:
>>>> Hi all,
>>>>
>>>> One of my entities 'E' may be 'tagged' with an arbitrary set of 256
>>>> tags 'T'.
>>>>
>>>> A first approach could be to add a M:N relationship between 'E' and
>>>> 'T'.
>>>>
>>>> A second way to do this could be to add a BIT(256) datatype to 'E',
>>>> setting bits to '1' if the entity is tagged with each one of the 256
>>>> tags (i.e. using a 'bitmask' on the set of tags).
>>>>
>>>> Since querying entities 'E' with a certain set of tags 'T' must be
>>>> very fast I was wondering if the second approach would be faster. What
>>>> do you think?
>>>>
>>>> Thanks for any hints,
>>>> Antonio
>>>
>>> I assume each entity may have one or more different tags.
>>>
>>> Actually performing test like
>>> ... where (t.bits & :mymask) = :mymask
>>> should be quite fast and faster then creating additional relations, but
>>> only if it's highly probable that your query will almost always scan
>>> whole table.
>>>
>>> The advantage of indexes is that the index is used 1st and tail (slower)
>>> parts of query will always get "subset" of table. In bitset, You will
>>> probably scan whole table.
>>>
>>> So I think, you should do some performance test for large number of
>>> data, and compare both ways. I think bitset will be fast for really
>>> small data, but M:N relations may be faster for really large data sets.
>>>
>>> You need to measure size of your database too, in M:N case with 256 tags
>>> it may be quite large.
>


pgsql-general by date:

Previous
From: Steve Crawford
Date:
Subject: Re: Arrays
Next
From: Brian Fehrle
Date:
Subject: Re: Query performance help with 'shadow table' approach.