On Mon, Feb 10, 2014 at 9:20 AM, CS DBA <cs_dba@consistentstate.com> wrote:
> I've done a lot of DSS architecture. A couple of thoughts:
>
> - in most cases the ETL process figures out the time IDs as part of the
> preparation and then does bulk loads into the fact tables.
> I would be very concerned about the performance of a trigger that
> fires for every row inserted into the fact table.
>
> you mention you want to do data streaming instead of bulk loads,
> can you elaborate?
We have processes inserting data from log files as they are written.
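To make the trigger concern concrete, the per-row approach being discussed would look something like this (a minimal sketch; `fact`, `time_dim`, and the column names are illustrative, and it assumes `time_dim` is keyed by a truncated timestamp):

```sql
-- Hypothetical per-row trigger: look up the time_dim surrogate key
-- for each inserted fact row. This lookup fires once per row, which
-- is the performance concern with streaming inserts.
CREATE FUNCTION set_time_id() RETURNS trigger AS $$
BEGIN
    SELECT time_id INTO NEW.time_id
      FROM time_dim
     WHERE time_dim.ts = date_trunc('minute', NEW.event_ts);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER fact_set_time_id
    BEFORE INSERT ON fact
    FOR EACH ROW EXECUTE PROCEDURE set_time_id();
```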
> - When querying a star schema one of the performance features is the
> fact that all joins to the dimension tables are performed via a numeric
> key, such as:
> "select * from fact, time_dim, geo_dim
> where fact.time_id = time_dim.time_id..."
>
> In the case of this being a timestamp I suspect the performance would
> take a hit. Depending on the size of your fact table and the
> scope/volume of your DSS queries this could easily be a show stopper,
> based on the assumption that the database can do a binary search on a
> numeric key much faster than on a timestamp.
I was hoping the extra 4 bytes of a timestamp, compared to a plain
integer, wouldn't be too significant, but I hadn't considered that
Postgres might do a binary search faster on an integer type than on a
timestamp, even with 1 billion rows. Maybe that's wishful thinking.
Perhaps a regular 4-byte integer would be good enough. I would estimate
a query would touch on the order of 1 million rows at a time.
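For concreteness, the two layouts being weighed might look like this (table and column names are illustrative; in PostgreSQL, integer is 4 bytes while bigint and timestamp are both 8 bytes):

```sql
-- Option A: conventional star schema with a 4-byte surrogate key.
CREATE TABLE time_dim (
    time_id integer PRIMARY KEY,
    ts      timestamp NOT NULL
);
CREATE TABLE fact_a (
    time_id integer NOT NULL REFERENCES time_dim (time_id),
    value   numeric NOT NULL
);

-- Option B: skip the surrogate key and store/join on the
-- 8-byte timestamp directly.
CREATE TABLE fact_b (
    event_ts timestamp NOT NULL,
    value    numeric   NOT NULL
);
```

Option A keeps the fact rows and their index entries narrower and joins on an integer key; Option B avoids the dimension lookup at load time, which is the attraction for streaming inserts.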
Regards,
Mark