Re: tables with lots of columns - what alternative from - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: tables with lots of columns - what alternative from
Date
Msg-id Pine.GSO.4.63.0512071146320.13553@ra.sai.msu.su
Whole thread Raw
In response to tables with lots of columns - what alternative from performance point of view?  (hubert depesz lubaczewski <depesz@gmail.com>)
Responses Re: tables with lots of columns - what alternative from performance point of view?
List pgsql-general
contrib/hstore will save you.
See http://www.sai.msu.su/~megera/postgres/gist/hstore/README.hstore
for details.

     Oleg
On Wed, 7 Dec 2005, hubert depesz lubaczewski wrote:

> hi
> jus recently there were some thread on postgresql list with people asying :
> i have 700 columns, i have 1000 columns and so on.
> some people, imediatelly responded: change your schema.
> this is what forced to me ask:
>
> i have a situation where i ahve to store a number of "objects" in database.
> all objects have 3 specific attributes (which go into objects table), and
> may have a lot of "custom fields".
> basically - lsit of accessible custom fields for object depends on which
> object-category this object belongs to.
> now.
> i know, i could have written it in this way:
>
> create table object_custom_fields (id serial primary key, object_id int8,
> field_id int8, field_value text);
> but:
> this approach has two very big drawbacks (for me):
> 1. the table cannot differentiate between custom fields of type "date",
> "number" and so on. - everything is stored as text.
> 2. it is rather slow. i have to do a non-unique index scan over
> object_custom_fields, get all records, and pivot it (on the client side of
> curse) to make it usable.
>
> i did it differently, definitelly not nicely, but i dont see any other way
> to get this performance with unknown list of custom fields:
> 1. create table cf_types (id serial, codename text, representation text);
> 2. create table cf_definitions (id serial, category_id int8, type_id int8,
> field-number int4);
> 3. create table cf_values (id serial, object_id int8 (unique),
> ...................................................);
>
> where
> cf_types store information like this:
> id |  codename  | representation
> ----+------------+----------------
>  1 | bool       | boolean
>  2 | integer    | integer
>  3 | number     | number
>  4 | text       | text
>  5 | note       | text
>  6 | date       | date
> ...
> basically - there might be many "types" with the same representation.
> then
> cf_values have a lot of (128 at the moment) fields for all possible
> representations.
> basically it looks like:
> id, object_id, boolean_1 ... boolean_128, integer_1..integer_128, ...
> the datatypes of this fields relate to their content (integer_* fields have
> datatype int8, and so on).
>
> now.
> in cf_definitions i specify, category, field_type_id, and a field-number -
> which relates to _NUMBER in fields in cf_values.
>
> what i did achive is *very* fast retrieval of data for any given object.
> the schema of cf_values table is absolutelly awful, and i will never say
> differently.
> my point is - if somebody (tom lane for example) says - redesign your schema
> - whenever he reads about table with 700 column (i have more :) - then i
> must have missed something absolutelyl simple, fast and elegant. what is
> this?
>
> depesz
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

pgsql-general by date:

Previous
From: Thomas Hallgren
Date:
Subject: Re: Graphical modelling tool
Next
From: go KEY802207
Date:
Subject: MySQL dump