Thread: Tracking Full Table Scans

Tracking Full Table Scans

From
Venkat Balaji
Date:
Hello Everyone,

I am preparing a plan to track the tables that undergo full table scans most often.

If I track seq_scan from pg_stat_user_tables, will that help (considering the most recently analyzed tables)?

Please help!

Thanks
VB

Re: Tracking Full Table Scans

From
"Kevin Grittner"
Date:
Venkat Balaji <venkat.balaji@verse.in> wrote:

> I am preparing a plan to track the tables that undergo full
> table scans most often.
>
> If I track seq_scan from pg_stat_user_tables, will that help
> (considering the most recently analyzed tables)?

Well, yeah; but be careful not to assume that a sequential scan is
always a bad thing.  Here's our top ten tables for sequential scans
in a database which is performing quite well:

cc=> select seq_scan, n_live_tup, relname
cc->   from pg_stat_user_tables
cc->   order by seq_scan desc
cc->   limit 10;
 seq_scan | n_live_tup |      relname
----------+------------+--------------------
 81264339 |         20 | MaintCode
 16840299 |          3 | DbTranImageStatus
 14905181 |         18 | ControlFeature
 11908114 |         10 | AgingBoundary
  8789288 |         22 | CtofcTypeCode
  7786110 |          6 | PrefCounty
  6303959 |          9 | ProtOrderHistEvent
  5835430 |          1 | ControlRecord
  5466806 |          1 | ControlAccounting
  5202028 |         12 | ProtEventOrderType
(10 rows)

You'll notice that they are all very small tables.  In all cases the
entire heap fits in one page, so any form of indexed scan would at
least double the number of pages visited, and slow things down.
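To separate these benign cases from seq scans on large tables, the same view can be filtered by relation size. A minimal sketch (the one-megabyte cutoff is an arbitrary example threshold, not a recommendation):

```sql
-- Seq-scanned tables that are NOT tiny; these are the ones
-- worth a closer look.  pg_relation_size() reports the size
-- of the table's main fork in bytes.
SELECT relname, seq_scan, seq_tup_read, n_live_tup,
       pg_size_pretty(pg_relation_size(relid)) AS size
  FROM pg_stat_user_tables
 WHERE seq_scan > 0
   AND pg_relation_size(relid) > 1024 * 1024
 ORDER BY seq_scan DESC
 LIMIT 10;
```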

If you have queries which are not performing to expectations, your
best bet might be to pick one of them and post it here, following
the advice on this page:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin

Re: Tracking Full Table Scans

From
Venkat Balaji
Date:
Thanks a lot Kevin !!

Yes. I intended to track full table scans first to ensure that only small tables, or tables with very few pages, are (as you said) being scanned in full.

I am yet to identify slow-running queries. I will surely follow up with them in the future.

Thanks
VB




Re: Tracking Full Table Scans

From
Venkat Balaji
Date:
I would like to know the difference between "n_tup_upd" and "n_tup_hot_upd".

Thanks
VB



Re: Tracking Full Table Scans

From
"Kevin Grittner"
Date:
Venkat Balaji <venkat.balaji@verse.in> wrote:

> I would like to know the difference between "n_tup_upd" and
> "n_tup_hot_upd".

A HOT update is used when none of the updated columns are used in an
index and there is room for the new tuple (version of the row) on
the same page as the old tuple.  This is faster for a number of
reasons, and cleanup of the old tuple is a little different.
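The two counters can be compared per table to see what fraction of updates were HOT; a sketch against pg_stat_user_tables:

```sql
-- Fraction of updates that were HOT, per table.  A low
-- percentage on a heavily updated table may mean the updated
-- columns are indexed, or the pages have no free room.
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / n_tup_upd, 1) AS hot_pct
  FROM pg_stat_user_tables
 WHERE n_tup_upd > 0
 ORDER BY n_tup_upd DESC
 LIMIT 10;
```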

If you want the gory implementation details, take a look at this
file in the source tree:

src/backend/access/heap/README.HOT

-Kevin

Re: Tracking Full Table Scans

From
Craig Ringer
Date:
On 09/28/2011 12:26 AM, Venkat Balaji wrote:
> Thanks a lot Kevin !!
>
> Yes. I intended to track full table scans first to ensure that only
> small tables or tables with very less pages are (as you said) getting
> scanned full.

It can also be best to do a full table scan of a big table for some
queries. If the query needs to touch all the data in a table - for
example, for an aggregate - then the query will often complete fastest
and with less disk use by using a sequential scan.

I guess what you'd really want is to find queries that do seqscans
to match relatively small fractions of the total tuples scanned,
i.e. low-selectivity seqscans. I'm not sure whether it's possible to
gather this data with PostgreSQL's current level of stats detail.
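A partial proxy is available from the statistics views: seq_tup_read divided by seq_scan gives the average number of tuples read per sequential scan. It still says nothing about how many of those tuples actually matched, so the selectivity problem above remains; a sketch:

```sql
-- Average tuples read per seq scan; large values on large
-- tables suggest scans worth investigating.  Note this counts
-- tuples READ, not tuples matched, so true selectivity is
-- still invisible here.
SELECT relname, seq_scan,
       seq_tup_read / seq_scan AS avg_tup_per_scan,
       n_live_tup
  FROM pg_stat_user_tables
 WHERE seq_scan > 0
 ORDER BY avg_tup_per_scan DESC
 LIMIT 10;
```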

--
Craig Ringer

Re: Tracking Full Table Scans

From
Venkat Balaji
Date:
Thanks Kevin !!

I will have a look at the source tree.

Regards
VB


Re: Tracking Full Table Scans

From
Venkat Balaji
Date:
Yes. I am looking to identify the justified full table scans.

If bigger tables are being scanned, I would like to know the percentage of rows scanned against the percentage of rows returned as output.

If the query needs 80% of the rows as output, then a full table scan is generally better.

I believe this is possible in Postgres. I think we can get this using pg_stat_user_tables, pg_statio_user_tables and pg_stats.

I will post the calculation once I get it.

Thanks
VB
