horizontal partition - Mailing list pgsql-performance

From Gaetano Mendola
Subject horizontal partition
Date
Msg-id ctp64h$l4d$1@floppy.pyrenet.fr
Whole thread Raw
Responses Re: horizontal partition  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Hi all,
I have a big table with ~ 10 Milion rows, and is a very
pain administer it, so after years I convinced my self
to partition it and replace the table usage ( only for reading )
with a view.

Now my user_logs table is splitted in 4:

user_logs
user_logs_2002
user_logs_2003
user_logs_2004

and the view v_user_logs is builded on top of these tables:


CREATE OR REPLACE VIEW v_user_logs AS
   SELECT * FROM user_logs
   UNION ALL
   SELECT * FROM user_logs_2002
   UNION ALL
   SELECT * FROM user_logs_2003
   UNION ALL
   SELECT * FROM user_logs_2004
;


the view is performing really well:


empdb=# explain analyze select * from v_user_logs where id_user = sp_id_user('kalman');
                                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Subquery Scan v_user_logs  (cost=0.00..895.45 rows=645 width=88) (actual time=17.039..2345.388 rows=175 loops=1)
    ->  Append  (cost=0.00..892.23 rows=645 width=67) (actual time=17.030..2344.195 rows=175 loops=1)
          ->  Subquery Scan "*SELECT* 1"  (cost=0.00..120.70 rows=60 width=67) (actual time=17.028..17.036 rows=1
loops=1)
                ->  Index Scan using idx_user_user_logs on user_logs  (cost=0.00..120.40 rows=60 width=67) (actual
time=17.012..17.018rows=1 loops=1) 
                      Index Cond: (id_user = 4185)
          ->  Subquery Scan "*SELECT* 2"  (cost=0.00..475.44 rows=316 width=67) (actual time=49.406..1220.400 rows=79
loops=1)
                ->  Index Scan using idx_user_user_logs_2004 on user_logs_2004  (cost=0.00..473.86 rows=316 width=67)
(actualtime=49.388..1219.386 rows=79 loops=1) 
                      Index Cond: (id_user = 4185)
          ->  Subquery Scan "*SELECT* 3"  (cost=0.00..204.33 rows=188 width=67) (actual time=59.375..1068.806 rows=95
loops=1)
                ->  Index Scan using idx_user_user_logs_2003 on user_logs_2003  (cost=0.00..203.39 rows=188 width=67)
(actualtime=59.356..1067.934 rows=95 loops=1) 
                      Index Cond: (id_user = 4185)
          ->  Subquery Scan "*SELECT* 4"  (cost=0.00..91.75 rows=81 width=67) (actual time=37.623..37.623 rows=0
loops=1)
                ->  Index Scan using idx_user_user_logs_2002 on user_logs_2002  (cost=0.00..91.35 rows=81 width=67)
(actualtime=37.618..37.618 rows=0 loops=1) 
                      Index Cond: (id_user = 4185)
  Total runtime: 2345.917 ms
(15 rows)



the problem is now if this view is used in others views like this:



CREATE OR REPLACE VIEW v_ua_user_login_logout_tmp AS
   SELECT
      u.login,
      ul.*
   FROM user_login      u,
        v_user_logs     ul
   WHERE
        u.id_user = ul.id_user
;



empdb=# explain analyze select * from v_ua_user_login_logout_tmp where login = 'kalman';
                                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=4.01..228669.81 rows=173 width=100) (actual time=1544.784..116490.363 rows=175 loops=1)
    Hash Cond: ("outer".id_user = "inner".id_user)
    ->  Subquery Scan ul  (cost=0.00..193326.71 rows=7067647 width=88) (actual time=5.677..108190.096 rows=7067831
loops=1)
          ->  Append  (cost=0.00..157988.47 rows=7067647 width=67) (actual time=5.669..77109.995 rows=7067831 loops=1)
                ->  Subquery Scan "*SELECT* 1"  (cost=0.00..8158.48 rows=362548 width=67) (actual time=5.666..3379.178
rows=362862loops=1) 
                      ->  Seq Scan on user_logs  (cost=0.00..6345.74 rows=362548 width=67) (actual time=5.645..1395.673
rows=362862loops=1) 
                ->  Subquery Scan "*SELECT* 2"  (cost=0.00..93663.88 rows=4191588 width=67) (actual
time=9.149..35094.798rows=4191580 loops=1) 
                      ->  Seq Scan on user_logs_2004  (cost=0.00..72705.94 rows=4191588 width=67) (actual
time=9.117..16531.486rows=4191580 loops=1) 
                ->  Subquery Scan "*SELECT* 3"  (cost=0.00..44875.33 rows=2008233 width=67) (actual
time=0.562..24017.680rows=2008190 loops=1) 
                      ->  Seq Scan on user_logs_2003  (cost=0.00..34834.17 rows=2008233 width=67) (actual
time=0.542..13224.265rows=2008190 loops=1) 
                ->  Subquery Scan "*SELECT* 4"  (cost=0.00..11290.78 rows=505278 width=67) (actual time=7.100..3636.163
rows=505199loops=1) 
                      ->  Seq Scan on user_logs_2002  (cost=0.00..8764.39 rows=505278 width=67) (actual
time=6.446..1474.709rows=505199 loops=1) 
    ->  Hash  (cost=4.00..4.00 rows=1 width=16) (actual time=0.083..0.083 rows=0 loops=1)
          ->  Index Scan using user_login_login_key on user_login u  (cost=0.00..4.00 rows=1 width=16) (actual
time=0.064..0.066rows=1 loops=1) 
                Index Cond: ((login)::text = 'kalman'::text)
  Total runtime: 116491.056 ms
(16 rows)



as you can see the index scan is not used anymore.
Do you see any problem on this approach ?



Regards
Gaetano Mendola
















pgsql-performance by date:

Previous
From: Cosimo Streppone
Date:
Subject: Re: High end server and storage for a PostgreSQL OLTP system
Next
From: William Yu
Date:
Subject: Re: High end server and storage for a PostgreSQL OLTP system