Thread: splitting up tables based on read/write frequency of columns

splitting up tables based on read/write frequency of columns

From: Jonathan Vanasco
Date:
This is really a theoretical/anecdotal question, as I'm not at a scale yet where this would be measurable.  I want to
investigate while this is fresh in my mind...

I recall reading that unless a row has columns that are TOASTed, an `UPDATE` is essentially an `INSERT + DELETE`, with
the previous row marked for vacuuming.

A few of my tables have the following characteristics:
    - The Primary Key has many other tables/columns that FKEY onto it.
    - Many columns (30+) of small data size
    - Most columns (90%) are 1 WRITE(UPDATE) for 1000 READS
    - Some columns (10%) do a bit of internal bookkeeping and are 1 WRITE(UPDATE) for 50 READS

Has anyone done testing/benchmarking on potential efficiency/savings by consolidating the frequent UPDATE columns into
their own table?
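
Roughly, the split in question would look something like the following sketch (table and column names are hypothetical, just for illustration): the wide table keeps the rarely-updated columns, and the bookkeeping columns move to a narrow table sharing the same key.

    CREATE TABLE account (
        id     serial PRIMARY KEY,
        name   text NOT NULL,
        email  text NOT NULL
        -- ... the other ~30 small, rarely-updated columns ...
    );

    CREATE TABLE account_activity (
        account_id  integer PRIMARY KEY REFERENCES account(id) ON DELETE CASCADE,
        login_count integer NOT NULL DEFAULT 0,
        last_seen   timestamptz
    );

    -- A frequent bookkeeping UPDATE then rewrites only the narrow row:
    UPDATE account_activity
       SET login_count = login_count + 1, last_seen = now()
     WHERE account_id = 42;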




Re: splitting up tables based on read/write frequency of columns

From: Stefan Keller
Date:
Hi

I'm pretty sure PostgreSQL can handle this.
But since you asked with a theoretical background,
it's probably worthwhile to look at column stores (like [1]).

-S.

[1] http://citusdata.github.io/cstore_fdw/
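
For reference, setting up cstore_fdw looks roughly like this (a sketch based on the project's README; the foreign table and its columns are hypothetical):

    CREATE EXTENSION cstore_fdw;

    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

    -- Columnar storage suits append-mostly, analytics-style data rather than
    -- frequently UPDATEd rows; cstore_fdw tables are loaded in bulk.
    CREATE FOREIGN TABLE page_views_columnar (
        page_id   integer,
        viewed_at timestamptz,
        referrer  text
    )
    SERVER cstore_server
    OPTIONS (compression 'pglz');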

2015-01-19 22:47 GMT+01:00 Jonathan Vanasco <postgres@2xlp.com>:
> This is really a theoretical/anecdotal question, as I'm not at a scale yet where this would be measurable.  I want to
> investigate while this is fresh in my mind...
>
> I recall reading that unless a row has columns that are TOASTed, an `UPDATE` is essentially an `INSERT + DELETE`,
> with the previous row marked for vacuuming.
>
> A few of my tables have the following characteristics:
>         - The Primary Key has many other tables/columns that FKEY onto it.
>         - Many columns (30+) of small data size
>         - Most columns (90%) are 1 WRITE(UPDATE) for 1000 READS
>         - Some columns (10%) do a bit of internal bookkeeping and are 1 WRITE(UPDATE) for 50 READS
>
> Has anyone done testing/benchmarking on potential efficiency/savings by consolidating the frequent UPDATE columns
> into their own table?


Re: splitting up tables based on read/write frequency of columns

From: Jonathan Vanasco
Date:
On Jan 19, 2015, at 5:07 PM, Stefan Keller wrote:

> Hi
>
> I'm pretty sure PostgreSQL can handle this.
> But since you asked with a theoretic background,
> it's probably worthwhile to look at column stores (like [1]).


Wow. I didn't know there was a column store extension for PG -- this would come in handy for some analytic stuff we
run!

I know that PG can handle my current system "at scale".  I'm really just wondering what the possible
slowdowns/improvements will be.

Doing a rewrite of the entire row + updating the various indexes seems like a lot of unnecessary IO.  At some point it
will make sense to minimize that and isolate the heavy-write columns so they don't impact the rest of the table's performance.
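
As a first measurement, the cumulative statistics views already show how much of a table's UPDATE traffic avoids index maintenance (HOT updates), which is the part a split would be trying to save.  A quick sketch (the table name is hypothetical):

    SELECT relname,
           n_tup_upd,       -- total row updates
           n_tup_hot_upd,   -- updates that did not have to touch any index
           round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct,
           n_dead_tup       -- dead row versions awaiting vacuum
      FROM pg_stat_user_tables
     WHERE relname = 'account';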

Re: splitting up tables based on read/write frequency of columns

From: David G Johnston
Date:
Jonathan Vanasco wrote
> This is really a theoretical/anecdotal question, as I'm not at a scale yet
> where this would be measurable.  I want to investigate while this is fresh in
> my mind...
>
> I recall reading that unless a row has columns that are TOASTed, an
> `UPDATE` is essentially an `INSERT + DELETE`, with the previous row marked
> for vacuuming.
>
> A few of my tables have the following characteristics:
>     - The Primary Key has many other tables/columns that FKEY onto it.
>     - Many columns (30+) of small data size
>     - Most columns (90%) are 1 WRITE(UPDATE) for 1000 READS
>     - Some columns (10%) do a bit of internal bookkeeping and are 1
> WRITE(UPDATE) for 50 READS
>
> Has anyone done testing/benchmarking on potential efficiency/savings by
> consolidating the frequent UPDATE columns into their own table?

Consider another in-database attempt to mitigate the need to do this
manually:

HOT (heap-only-tuple)

http://pgsql.tapoueh.org/site/html/misc/hot.html
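
For context: a HOT update applies only when none of the updated columns are indexed and the new row version fits on the same heap page, so no index entries need to be rewritten.  A hedged sketch of encouraging that (table and column names are hypothetical, assuming the bookkeeping columns stay in the wide table and are not indexed):

    -- Leave ~20% free space on each heap page so new row versions can stay on
    -- the same page as the old ones (a prerequisite for a HOT update).
    -- Note: fillfactor only affects pages written after the setting changes.
    ALTER TABLE account SET (fillfactor = 80);

    -- Updating only non-indexed columns can then be HOT, skipping index writes:
    UPDATE account SET last_login_at = now() WHERE id = 42;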

I haven't done any benchmarking, but I do currently use this idea to
segregate read-only fields from read-write fields.  Likely there is also a
data-model reason why these fields have different update frequencies, so they
can be partitioned out both logically and physically.

The only harm I see is that it can make using the schema more difficult -
though that can be somewhat mitigated by using (updateable) views in front
of the partitioned tables.
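
A rough sketch of that idea, reusing the hypothetical split tables from earlier in the thread: a view over the join for reads, plus an INSTEAD OF trigger for writes (a view over a join is not automatically updatable, so the trigger does the routing).

    CREATE VIEW account_full AS
    SELECT a.id, a.name, a.email,
           act.login_count, act.last_seen
      FROM account a
      JOIN account_activity act ON act.account_id = a.id;

    -- Route UPDATEs against the view to the underlying tables.
    -- (A production version would touch each table only when its columns
    -- actually changed, to avoid needless row rewrites.)
    CREATE FUNCTION account_full_upd() RETURNS trigger AS $$
    BEGIN
        UPDATE account
           SET name = NEW.name, email = NEW.email
         WHERE id = OLD.id;
        UPDATE account_activity
           SET login_count = NEW.login_count, last_seen = NEW.last_seen
         WHERE account_id = OLD.id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER account_full_upd
        INSTEAD OF UPDATE ON account_full
        FOR EACH ROW EXECUTE PROCEDURE account_full_upd();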

If you can logically partition them I would go for it; if it is a purely
physical concern then I'd probably ponder it for another couple of days and
then go for it anyway.  The main additional thing to ponder is the cost of
additional parsing and the additional join.  Neither is super expensive but
if only doing this for physical reasons you need to evaluate your planned
usage patterns.  With a logical split you are more likely to find situations
where you do not even care about one table or the other and so can avoid the
join entirely.

David J.


