Re: Partition table in 9.0.x? - Mailing list pgsql-performance

From AJ Weber
Subject Re: Partition table in 9.0.x?
Date
Msg-id 50EC5FC7.7000902@comcast.net
Whole thread Raw
In response to Re: Partition table in 9.0.x?  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Partition table in 9.0.x?
List pgsql-performance
> It does if you use it without an argument, to display all the tables
> in the search path:
>
> jjanes=# \d+
>                           List of relations
>   Schema |       Name       | Type  | Owner  |  Size   | Description
> --------+------------------+-------+--------+---------+-------------
>   public | pgbench_accounts | table | jjanes | 128 MB  |
>   public | pgbench_branches | table | jjanes | 40 kB   |
>   public | pgbench_history  | table | jjanes | 0 bytes |
>   public | pgbench_tellers  | table | jjanes | 40 kB   |
> (4 rows)
>
> It rather annoys me that you actually get less information (no size,
> no owner) when you use \d+ on a named table.  I don't know if there is
> a reason for that feature, or if it was just an oversight.
That is rather peculiar.  Sorry for that.
Table in question is 9284MB
(Parent table is 621MB)

>
> The current constraint exclusion code is quite simple-minded and
> doesn't know how to make use of check constraints that use the mod
> function, so the indexes of all partitions would have to be searched
> for each order_num-driven query, even though we know the data could
> only exist in one of them.  The constraint exclusion codes does
> understand check constraints that involve ranges.
Hmm.  That's a bit of a limitation I didn't know about.  I assume it
doesn't understand the percent (mod operator) just the same as not
understanding the MOD() function?  Either way, I guess this strategy
does not pan-out.
> There could still be some benefit as the table data would be
> concentrated, even if the index data is not.
I'm reaching way, way back in my head, but I think _some_ RDBMS I worked
with previously had a way to "cluster" the rows around a single one of
the indexes on the table, thus putting the index and the row-data
"together" and reducing the number of IO's to retrieve the row if that
index was used.  Am I understanding that PG's "cluster" is strictly to
group like rows together logically -- table data only, not to coordinate
the table row with the index upon which you clustered them?

>
>> and all rows for the same order would stay
>> within the same partition-table.
> But usually a given order_num would only be of interest for a fraction
> of a second before moving on to some other order_num of interest, so
> by the time the relevant partition become fully cached, it would no
> longer be hot.  Or, if the partitions were small enough, you could
> assume that all rows would be dragged into memory when the first one
> was requested because they lay so close to each other.  But it is not
> feasible to have a large enough number of partitions to make that
> happen.  But if the table is clustered, this is exactly what you would
> get--the trouble would be keeping it clustered.   If most of the
> line-items are inserted at the same time as each other, they probably
> should be fairly well clustered to start with.
Does decreasing the fill to like 90 help keep it clustered in-between
times that I could shutdown the app and perform a (re-) cluster on the
overall table?  Problem is, with a table that size, and the hardware I'm
"blessed with", the cluster takes quite a bit of time. :(



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Partition table in 9.0.x?
Next
From: "Midge Brown"
Date:
Subject: Re: Two Necessary Kernel Tweaks for Linux Systems