Re: Could use some advice on search architecture - Mailing list pgsql-general

From Jov
Subject Re: Could use some advice on search architecture
Date
Msg-id CADyrUxNkXo3zNxXf_GpG=acWWJoT4mSK8Q+5MebW6B7Rypxkug@mail.gmail.com
Whole thread Raw
In response to Re: Could use some advice on search architecture  (Dorian Hoxha <dorian.hoxha@gmail.com>)
List pgsql-general

the other is named imcs.
imcs is mainly a memory database,it is very fast for olap,because of multi threads parallel query plan and column storage.for billions of data,most group by like querys return within 2s.
we use it for several weeks and happy for the performance.

jov

在 2014-4-19 下午8:27,"Dorian Hoxha" <dorian.hoxha@gmail.com>写道:
Postgresql has 2 column store, 1-in memory(cant remember the name) and http://www.citusdata.com/blog/76-postgresql-columnar-store-for-analytics


On Sat, Apr 19, 2014 at 2:10 PM, Robin <robinstc@live.co.uk> wrote:
bottom post
On 19/04/2014 12:46, R. Pasch wrote:
On 19-4-2014 9:38, Robin wrote:

Well, given that there are known limited attributes, this is the type of application that really really suits a column oriented database, such as Sybase IQ (now sold by SAP). Its a neat product that scales. Great performance with drag'n'drop analytics.

Unless you can charm IQ out of SAP (it has been known to happen), you might have to look at some other techniques

So consider some binary data representation
Red - 1 (0000 0001)
Orange - 2 (0000 0010)
Yellow - 4 (0000 0100)
Green - 8 (0000 1000)
Blue - 16 (0001 0000)
Indigo - 32 (0010 0000)
Violet - 64 (0100 0000)

This way, you can encode several colours in 1 value
Red or Green or Indigo = 1 + 8  + 32 = 41 = 0010 1001


Robin


I stopped reading when I heard the word "sold by SAP" ;-) This project is solely build with open-source and freely available software.

I've been thinking about using a binary data representation but didn't come to a solution to this specific problem quite yet. Per property of a product, only one bit would be 1 and the rest would be 0. What would a query look like to match all products that have a bit in the correct position?

Say for instance these are a couple records (and yes, property values can be null as well)

title, property1, property2, property3
================================
product1, 0000 0001, 0000 0010, NULL
product2, 0000 0100, 0100 0000, 0010 0000
product3, 0010 0000, 0010 0000, 0100 0000

Say that I would like to retrieve the products that either have property1 as 0010 0000, 1000 000 or 0000 0001. Combined that would be 0010 1001 and would have to match product1 and product3 as they both have their individual bit matching one of the bits being asked for. What would a where statement look like using this type of binary representation?

If that would be fairly simple to do and fast (most important factor) then I could do an OR construction on all property columns and have something count the amount of properties that actually matched. Is that something you can do with a binary operator of some sort as well? Count the amount of overlapping bits?

Say for instance I have a binary value of 0110 0101 and another binary value of 1100 0100, how could I found out how many bits matched? (in this case the number of matching bits would be 2)


I understand the reluctance to pay SAP-style rates, as a longtime DB user, I have learned some 'charm' techniques.

However, I poked around a bit for alternatives, as I do like the column-oriented approach, and found something called - MonetDB - it apparently has a column-store db kernel, and is open source - I suggest you have a look, if it does what it says on the label, then it looks like a find.

There is a discussion of bitmask-trickiness here also dealing with colours

Robin



pgsql-general by date:

Previous
From: Sergey Konoplev
Date:
Subject: Re: [PERFORM] Hot standby 9.2.1 PANIC: WAL contains references to invalid pages
Next
From: Sergey Konoplev
Date:
Subject: Re: Disable an index temporarily