Home > mailing lists

Real vs Int performance - Mailing list pgsql-performance

From	David Greco
Subject	Real vs Int performance
Date	January 26, 2011 17:27:32
Msg-id	CDFA9340E95A764E9366B4EDF3A43125BCACF7E68D@VA3DIAXVS091.RED001.local Whole thread
Responses	Re: Real vs Int performance Re: Real vs Int performance
List	pgsql-performance

Tree view

New to Postgres and am prototyping a migration from Oracle to Postgres 9.0.1 on Linux. Starting with the data warehouse. Current process is to load the data from

our OLTP (Oracle), dump it into another instance of Oracle for staging and manipulation, then extract it and load it into Infobright. I am trying

to replace the Oracle instance used for staging and manipulation with Postgres. Kettle (PDI), a Java ETL tool, is used for this process.

Came across a problem I find perplexing. I recreated the dimensional tables in Oracle and the fields that are integers in Oracle became integers

in Postgres. Was experiencing terrible performance during the load and narrowed down to a particular dimensional lookup problem. The table

dim_carrier holds about 80k rows. You can see the actual query issued by Kettle below, but basically I am looking up using the business key from

our OLTP system. This field is carrier_source_id and is indexed as you can see below. If I change this field from an integer to a real, I get

about a 70x increase in performance of the query. The EXPLAIN ANALYZE output is nearly identical, except for the casting of 1 to a real when the column

is a real. In real life, this query is actually bound and parameterized, but I wished to simplify things a bit here (and don't yet know how to EXPLAIN ANALYZE a parameterized

query). Now in terms of actual performance, the same query executed about 25k times takes 7 seconds with the real column, and 500 seconds with the integer column.

What gives here? Seems like integer (or serial) is a pretty common choice for primary key columns, and therefore what I'm experiencing must be an anomoly.

Table "hits_olap.dim_carrier"

Column | Type | Modifiers

-------------------+-----------------------------+-----------

carrier_id | integer | not null

dim_version | smallint |

dim_effect_date | timestamp without time zone |

dim_expire_date | timestamp without time zone |

carrier_source_id | integer |

carrier_name | character varying(30) |

carrier_type | character varying(30) |

carrier_scac | character varying(4) |

carrier_currency | character varying(3) |

current_row | smallint | default 0

Indexes:

"dim_carrier_pk" PRIMARY KEY, btree (carrier_id)

"idx_dim_carrier_lookup" btree (carrier_source_id)

VACUUM

ANALYZE

REINDEX

EXPLAIN ANALYZE SELECT CARRIER_ID, DIM_VERSION FROM HITS_OLAP.DIM_CARRIER WHERE CARRIER_SOURCE_ID = '1' AND now() >= DIM_EFFECT_DATE

AND now() < DIM_EXPIRE_DATE;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------

Index Scan using idx_dim_carrier_lookup on dim_carrier (cost=0.00..12.10 rows=2 width=6) (actual time=0.076..0.077 rows=1 loops=1)

Index Cond: (carrier_source_id = 1)

Filter: ((now() >= dim_effect_date) AND (now() < dim_expire_date)) Total runtime: 0.108 ms

(4 rows)

ALTER TABLE

Table "hits_olap.dim_carrier"

Column | Type | Modifiers

-------------------+-----------------------------+-----------

carrier_id | integer | not null

dim_version | smallint |

dim_effect_date | timestamp without time zone |

dim_expire_date | timestamp without time zone |

carrier_source_id | real |

carrier_name | character varying(30) |

carrier_type | character varying(30) |

carrier_scac | character varying(4) |

carrier_currency | character varying(3) |

current_row | smallint | default 0

Indexes:

"dim_carrier_pk" PRIMARY KEY, btree (carrier_id)

"idx_dim_carrier_lookup" btree (carrier_source_id)

VACUUM

ANALYZE

REINDEX

EXPLAIN ANALYZE SELECT CARRIER_ID, DIM_VERSION FROM HITS_OLAP.DIM_CARRIER WHERE CARRIER_SOURCE_ID = '1' AND now() >= DIM_EFFECT_DATE

AND now() < DIM_EXPIRE_DATE;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------

Index Scan using idx_dim_carrier_lookup on dim_carrier (cost=0.00..12.10 rows=2 width=6) (actual time=0.068..0.069 rows=1 loops=1)

Index Cond: (carrier_source_id = 1::real)

Filter: ((now() >= dim_effect_date) AND (now() < dim_expire_date)) Total runtime: 0.097 ms

(4 rows)

Thanks for the help,

Dave Greco

pgsql-performance by date:

From: Shaun Thomas
Date: 26 January 2011, 13:16:49
Subject: Re: FW: Queries becoming slow under heavy load

From: "Kevin Grittner"
Date: 26 January 2011, 17:53:04
Subject: Re: Real vs Int performance

Real vs Int performance - Mailing list pgsql-performance

Previous

Next