Re: Column as arrays.. more efficient than columns? - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Column as arrays.. more efficient than columns?
Date 2007-09-07
Msg-id b42b73150709070549m2b919184i80e04b1a2e5fa557@mail.gmail.com
In response to Re: Column as arrays.. more efficient than columns?  (Ron Johnson <ron.l.johnson@cox.net>)
Responses Re: Column as arrays.. more efficient than columns?  (Ron Johnson <ron.l.johnson@cox.net>)
List pgsql-general
On 9/7/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> On 09/06/07 20:53, Merlin Moncure wrote:
> [snip]
> >
> > arrays are interesting and have some useful properties.  however, we
> > must first discuss the problems... first and foremost, if you need to
> > read any particular item off the array you must read the entire array
> > from disk, and you must write all items back to disk for writes.
>
> Reads and writes are done at the page level, so I'm not sure this is
> valid.

Sure it is... since the denormalized record is much larger (especially
in array scenarios), the tuple is much larger, meaning each update
leaves behind a much bigger dead tuple, pages fill up much more
quickly, and you get more dead pages, more vacuuming, etc.  Besides
that, the server has to do some work presenting the array as part of
the read, which is overhead.  I didn't go into a lot of detail, but
the reasoning is sound.  Here is a quick example showing the problem.


create table denormalized
(
  data int[]  -- all values held in a single array column, one row total
);

create table normalized
(
  id int primary key,  -- element position
  datum int            -- one value per row
);

insert into normalized select v, v from generate_series(1, 100) v;
insert into denormalized select array(select generate_series(1,100));
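
(A quick sanity check on the per-row size difference; pg_column_size()
is not part of the original benchmark, just illustration:)

-- the array row carries all 100 elements as one value, several hundred bytes
select pg_column_size(data) from denormalized;
-- a normalized row carries a single 4-byte int
select pg_column_size(datum) from normalized where id = 1;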

create sequence rotator maxvalue 100 cycle;  -- cycles 1..100, picking which element/row to update

-- bench denormalized (d.sql) --
-- each transaction updates one array element, but MVCC rewrites the whole row
update denormalized set data[n] = data[n] + 1
  from (select nextval('rotator') as n) q;

merlin@mernix:~$ pgbench -c 4 -t 1000 -f d.sql
starting vacuum...end.
transaction type: Custom query
scaling factor: 1
number of clients: 4
number of transactions per client: 1000
number of transactions actually processed: 4000/4000
tps = 2452.188456 (including connections establishing)
tps = 2465.262905 (excluding connections establishing)

INFO:  "normalized": found 0 removable, 100 nonremovable row versions
in 38 pages


-- bench normalized (n.sql) --
-- each transaction updates one small row
update normalized set datum = datum + 1 where id = (select nextval('rotator'));

merlin@mernix:~$ pgbench -c 4 -t 1000 -f n.sql
starting vacuum...end.
transaction type: Custom query
scaling factor: 1
number of clients: 4
number of transactions per client: 1000
number of transactions actually processed: 4000/4000
tps = 6494.402637 (including connections establishing)
tps = 6594.087741 (excluding connections establishing)

INFO:  "denormalized": found 0 removable, 1 nonremovable row versions
in 223 page
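
(The INFO lines are VACUUM VERBOSE output; the commands themselves
weren't shown, but something like this after each benchmark should
reproduce them:)

vacuum verbose denormalized;
vacuum verbose normalized;

-- or compare on-disk size directly (also not in the original run):
select pg_relation_size('denormalized') as denorm_bytes,
       pg_relation_size('normalized') as norm_bytes;

Bottom line from the numbers above: same data, but the normalized
layout gets roughly 2.6x the tps, and its 100 rows sit in 38 pages
while the single array row has bloated to 223.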

merlin
