Re: KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation) - Mailing list pgsql-performance

From Stefan Keller
Subject Re: KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
Date
Msg-id BANLkTikchwYQUBUNuZ7m57B7FPXdhjCg_Q@mail.gmail.com
Whole thread Raw
In response to Re: KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)  (Jim Nasby <jim@nasby.net>)
List pgsql-performance
Hi Jim

You actually made me think about the schema Michel and I are using:

> And KVP is? ;)

CREATE TABLE mykvpstore( id bigint PRIMARY KEY )
CREATE TABLE kvp ( id bigint REFERENCES mykvpstore(id), key text NOT
NULL, value text, );
-- with index on key

And the table with the associative array type (hstore) is:
CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps hstore NOT NULL );
-- with GIST index on obj

It seems to me that in the mykvpstore-kvp there is also some overhead.

And yes, we have no clue what keys to anticipate, except for some
common ones like 'name': The use case is coming from OpenStreetMap
(http://wiki.openstreetmap.org/wiki/Database_schema ).

Yours, Stefan


2011/5/17 Jim Nasby <jim@nasby.net>:
> On May 16, 2011, at 8:47 AM, Merlin Moncure wrote:
>> On Sat, May 14, 2011 at 5:10 AM, Stefan Keller <sfkeller@gmail.com> wrote:
>>> Hi,
>>>
>>> I am conducting a benchmark to compare KVP table vs. hstore and got
>>> bad hstore performance results when the no. of records is greater than
>>> about 500'000.
>>>
>>> CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text );
>>> -- with index on key
>>> CREATE TABLE myhstore ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
>>> -- with GIST index on obj
>>>
>>> Does anyone have experience with that?
>>
>> hstore is not really designed for large-ish sets like that.
>
> And KVP is? ;)
>
> IIRC hstore ends up just storing everything as text, with pointers to know where things start and end. There's no
realindexing inside hstore, so basically the only thing it can do is scan the entire hstore. 
>
> That said, I would strongly reconsider using KVP for anything except the most trivial of data sets. It is *extremely*
inefficient.Do you really have absolutely no idea what *any* of your keys will be? Even if you need to support a
certainamount of non-deterministic stuff, I would put everything you possibly can into real fields and only use KVP or
hstorefor things that you really didn't anticipate. 
>
> Keep in mind that for every *value*, your overhead is 24 bytes for the heap header, 2+ varlena bytes in the heap,
plusthe length of the key. In the index you're looking at 6+ bytes of overhead, 1+ byte for varlena, plus the length of
thekey. The PK will cost you an additional 16-24 bytes, depending on alignment. So that's a *minimum* of ~50 bytes per
value,and realistically the overhead will be closer to 65-70 bytes, *per value*. Unless your values are decent-sized
strings,the overhead is going to be many times larger than the actual data! 
> --
> Jim C. Nasby, Database Architect                   jim@nasby.net
> 512.569.9461 (cell)                         http://jim.nasby.net
>
>
>

pgsql-performance by date:

Previous
From: Jim Nasby
Date:
Subject: Re: KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
Next
From: Clemens Eisserer
Date:
Subject: Re: hash semi join caused by "IN (select ...)"