Thread: Tracking Full Table Scans

Tracking Full Table Scans

From
Venkat Balaji
Date:
Hello Everyone,

I am preparing a plan to track the tables that undergo full table scans most often.

If I track seq_scan from pg_stat_user_tables, will that help (considering the most recently analyzed tables)?

Please help!

Thanks
VB

Re: Tracking Full Table Scans

From
"Kevin Grittner"
Date:
Venkat Balaji <venkat.balaji@verse.in> wrote:

> I am preparing a plan to track the tables that undergo full
> table scans most often.
>
> If I track seq_scan from pg_stat_user_tables, will that help
> (considering the most recently analyzed tables)?

Well, yeah; but be careful not to assume that a sequential scan is
always a bad thing.  Here's our top ten tables for sequential scans
in a database which is performing quite well:

cc=> select seq_scan, n_live_tup, relname
cc->   from pg_stat_user_tables
cc->   order by seq_scan desc
cc->   limit 10;
 seq_scan | n_live_tup |      relname
----------+------------+--------------------
 81264339 |         20 | MaintCode
 16840299 |          3 | DbTranImageStatus
 14905181 |         18 | ControlFeature
 11908114 |         10 | AgingBoundary
  8789288 |         22 | CtofcTypeCode
  7786110 |          6 | PrefCounty
  6303959 |          9 | ProtOrderHistEvent
  5835430 |          1 | ControlRecord
  5466806 |          1 | ControlAccounting
  5202028 |         12 | ProtEventOrderType
(10 rows)

You'll notice that they are all very small tables.  In all cases the
entire heap fits in one page, so any form of indexed scan would at
least double the number of pages visited, and slow things down.
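To separate these benign cases from seq scans on large tables, the same view can be filtered by relation size. A minimal sketch (the one-megabyte cutoff is an arbitrary example threshold, not a recommendation):

```sql
-- Seq-scanned tables that are NOT tiny; these are the ones
-- worth a closer look.  pg_relation_size() reports the size
-- of the table's main fork in bytes.
SELECT relname, seq_scan, seq_tup_read, n_live_tup,
       pg_size_pretty(pg_relation_size(relid)) AS size
  FROM pg_stat_user_tables
 WHERE seq_scan > 0
   AND pg_relation_size(relid) > 1024 * 1024
 ORDER BY seq_scan DESC
 LIMIT 10;
```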

If you have queries which are not performing to expectations, your
best bet might be to pick one of them and post it here, following
the advice on this page:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin

Re: Tracking Full Table Scans

From
Venkat Balaji
Date:
Thanks a lot Kevin !!

Yes. I intended to track full table scans first to ensure that only small tables, or tables with very few pages, are (as you said) being scanned in full.

I am yet to identify slow-running queries. I will surely follow up with them in the future.

Thanks
VB




Re: Tracking Full Table Scans

From
Venkat Balaji
Date:
I would like to know the difference between "n_tup_upd" and "n_tup_hot_upd".

Thanks
VB



Re: Tracking Full Table Scans

From
"Kevin Grittner"
Date:
Venkat Balaji <venkat.balaji@verse.in> wrote:

> I would like to know the difference between "n_tup_upd" and
> "n_tup_hot_upd".

A HOT update is used when none of the updated columns are used in an
index and there is room for the new tuple (version of the row) on
the same page as the old tuple.  This is faster for a number of
reasons, and cleanup of the old tuple is a little different.
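The two counters can be compared per table to see what fraction of updates were HOT; a sketch against pg_stat_user_tables:

```sql
-- Fraction of updates that were HOT, per table.  A low
-- percentage on a heavily updated table may mean the updated
-- columns are indexed, or the pages have no free room.
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / n_tup_upd, 1) AS hot_pct
  FROM pg_stat_user_tables
 WHERE n_tup_upd > 0
 ORDER BY n_tup_upd DESC
 LIMIT 10;
```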

If you want the gory implementation details, take a look at this
file in the source tree:

src/backend/access/heap/README.HOT

-Kevin

Re: Tracking Full Table Scans

From
Craig Ringer
Date:
On 09/28/2011 12:26 AM, Venkat Balaji wrote:
> Thanks a lot Kevin !!
>
> Yes. I intended to track full table scans first to ensure that only
> small tables or tables with very less pages are (as you said) getting
> scanned full.

It can also be best to do a full table scan of a big table for some
queries. If the query needs to touch all the data in a table - for
example, for an aggregate - then the query will often complete fastest
and with less disk use by using a sequential scan.

I guess what you'd really want is to find queries that do seqscans
to match relatively small fractions of the total tuples scanned,
i.e. low-selectivity seqscans. I'm not sure whether it's possible to
gather this data with PostgreSQL's current level of stats detail.
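A partial proxy is available from the statistics views: seq_tup_read divided by seq_scan gives the average number of tuples read per sequential scan. It still says nothing about how many of those tuples actually matched, so the selectivity problem above remains; a sketch:

```sql
-- Average tuples read per seq scan; large values on large
-- tables suggest scans worth investigating.  Note this counts
-- tuples READ, not tuples matched, so true selectivity is
-- still invisible here.
SELECT relname, seq_scan,
       seq_tup_read / seq_scan AS avg_tup_per_scan,
       n_live_tup
  FROM pg_stat_user_tables
 WHERE seq_scan > 0
 ORDER BY avg_tup_per_scan DESC
 LIMIT 10;
```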

--
Craig Ringer

Re: Tracking Full Table Scans

From
Venkat Balaji
Date:
Thanks Kevin !!

I will have a look at the source tree.

Regards
VB


Re: Tracking Full Table Scans

From
Venkat Balaji
Date:
Yes. I am looking to identify the justified full table scans.

If bigger tables are being scanned, I would like to know the percentage of rows scanned against the percentage of rows returned as output.

If the query needs 80% of the rows as output, then a full table scan is generally better.

I believe this is possible in Postgres. I think we can get this using pg_stat_user_tables, pg_statio_user_tables and pg_stats.

I will post the calculation once I get it.

Thanks
VB
