Thread: How can I manually alter the statistics for a column?

How can I manually alter the statistics for a column?

From
Douglas Alan
Date:
I'd like to manually alter the statistics for a column, as for the column in question the statistics are causing Postgres to do the wrong thing for my purposes. (I.e., a Seq Scan, rather than an Index Scan.)  If someone can tell me how to achieve this, I would quite grateful.

Thanks!
|>ouglas


P.S. Actually, for this particular problem, just deleting the statistics would be fine.  I've tried doing:

alter table maindb_astobject alter column survey_id set statistics 0;

And then analyzing the column, but when "statistics"  for a column are set to 0, Postgres seems to leave the current statistics in place, which is not the right thing for me at all.  I can successfully set "statistics" to 1, but that turns out to be one statistic too many.

I've tried settings the statistics via the table "pg_stats", but that turns out to be a view, and Postgres won't allow to me to alter it.

Perhaps I can achieve the end by altering the "pg_statistic" table instead, but that table is more than a bit opaque to me.

P.P.S The Seq Scan is 2-4 orders of magnitude slower than the Index Scan.



Re: How can I manually alter the statistics for a column?

From
Douglas Alan
Date:
If you want something done right, I guess you have to do it yourself!

Here's the answer to my question.  It works great! Or so it seems to:

delete from pg_statistic s
where exists ( select 1
from pg_class as c, pg_attribute as a
where a.attrelid = c.relfilenode
and s.starelid = c.relfilenode
and s.staattnum = a.attnum
and c.relname = 'maindb_astobject'
and attname = 'survey_id'
);

In the above SQL statement, "maindb_astobject" is the name of the table and "survey_id" is the name of the column.  The statement deletes all the statistics for the specified column in the specified table.

|>ouglas

On Mon, Jun 1, 2009 at 2:20 PM, Douglas Alan <darkwater42@gmail.com> wrote:
I'd like to manually alter the statistics for a column, as for the column in question the statistics are causing Postgres to do the wrong thing for my purposes. (I.e., a Seq Scan, rather than an Index Scan.)  If someone can tell me how to achieve this, I would quite grateful.

Thanks!
|>ouglas


P.S. Actually, for this particular problem, just deleting the statistics would be fine.  I've tried doing:

alter table maindb_astobject alter column survey_id set statistics 0;

And then analyzing the column, but when "statistics"  for a column are set to 0, Postgres seems to leave the current statistics in place, which is not the right thing for me at all.  I can successfully set "statistics" to 1, but that turns out to be one statistic too many.

I've tried settings the statistics via the table "pg_stats", but that turns out to be a view, and Postgres won't allow to me to alter it.

Perhaps I can achieve the end by altering the "pg_statistic" table instead, but that table is more than a bit opaque to me.

P.P.S The Seq Scan is 2-4 orders of magnitude slower than the Index Scan.

Re: How can I manually alter the statistics for a column?

From
Isak Hansen
Date:
On Mon, Jun 1, 2009 at 8:20 PM, Douglas Alan <darkwater42@gmail.com> wrote:
> I'd like to manually alter the statistics for a column, as for the column in
> question the statistics are causing Postgres to do the wrong thing for my
> purposes. (I.e., a Seq Scan, rather than an Index Scan.)  If someone can
> tell me how to achieve this, I would quite grateful.
>

Assuming estimates are off, wouldn't increasing the stats target be a
better approach? Then adjust the planner cost constants, in particular
seq_page_cost, random_page_cost and effective_cache_size, to prevent a
seq scan?

Considering you've pulled off the workaround, you must know what
you're doing, but I'd be worried about it backfiring in the future.


Regards,
Isak


> Thanks!
> |>ouglas
>
>
> P.S. Actually, for this particular problem, just deleting the statistics
> would be fine.  I've tried doing:
>
> alter table maindb_astobject alter column survey_id set statistics 0;
>
> And then analyzing the column, but when "statistics"  for a column are set
> to 0, Postgres seems to leave the current statistics in place, which is not
> the right thing for me at all.  I can successfully set "statistics" to 1,
> but that turns out to be one statistic too many.
>
> I've tried settings the statistics via the table "pg_stats", but that turns
> out to be a view, and Postgres won't allow to me to alter it.
>
> Perhaps I can achieve the end by altering the "pg_statistic" table instead,
> but that table is more than a bit opaque to me.
>
> P.P.S The Seq Scan is 2-4 orders of magnitude slower than the Index Scan.
>
>
>
>

Re: How can I manually alter the statistics for a column?

From
Tom Lane
Date:
Douglas Alan <darkwater42@gmail.com> writes:
> delete from pg_statistic s
> where exists ( select 1
> from pg_class as c, pg_attribute as a
> where a.attrelid = c.relfilenode
> and s.starelid = c.relfilenode
> and s.staattnum = a.attnum
> and c.relname = 'maindb_astobject'
> and attname = 'survey_id'
> );

Use c.oid, not c.relfilenode.

Also, the join to pg_class is both more and less than needed --- it
won't handle the situation where there are multiple tables of the same
name in different schemas.  You could add pg_namespace into the join,
but it's the hard way.  The way I'd do it is probably

delete from pg_statistic
where (starelid, staattnum) in
  (select attrelid, attnum from pg_attribute
   where attrelid = 'my_relation'::regclass and attname = 'my_attribute');

regclass knows about schemas and search paths, so stuff like
'my_schema.my_relation'::regclass will work unsurprisingly.

            regards, tom lane

Re: How can I manually alter the statistics for a column?

From
Douglas Alan
Date:
On Tue, Jun 2, 2009 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

delete from pg_statistic
where (starelid, staattnum) in
 (select attrelid, attnum from pg_attribute
  where attrelid = 'my_relation'::regclass and attname = 'my_attribute');

regclass knows about schemas and search paths, so stuff like
'my_schema.my_relation'::regclass will work unsurprisingly.


Thanks!  That's very helpful.

Hey, while I have you on the line, might you be so kind as to explain why this query is so slow?  Shouldn't it just fetch the first row in the table?  What could be faster than that?

explain analyze select * from maindb_astobject limit 1;

                                                             QUERY PLAN                                                            
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=78) (actual time=8091.962..8091.965 rows=1 loops=1)
   ->  Seq Scan on maindb_astobject  (cost=0.00..3358190.12 rows=75426912 width=78) (actual time=8091.955..8091.955 rows=1 loops=1)
 Total runtime: 8092.040 ms
(3 rows)

The query runs perfectly fast, on the other hand, if I encourage it to use an index like so:

 explain analyze select * from maindb_astobject order by id limit 1;

                                                                         QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.10 rows=1 width=78) (actual time=0.203..0.205 rows=1 loops=1)
   ->  Index Scan using maindb_astobject_pkey on maindb_astobject  (cost=0.00..7650690.77 rows=75426912 width=78) (actual time=0.196..0.196 rows=1 loops=1)
 Total runtime: 0.292 ms
(3 rows)

|>ouglas



Re: How can I manually alter the statistics for a column?

From
Alvaro Herrera
Date:
Douglas Alan escribió:

> Hey, while I have you on the line, might you be so kind as to explain why
> this query is so slow?  Shouldn't it just fetch the first row in the table?
> What could be faster than that?
>
> explain analyze select * from maindb_astobject limit 1;
> >
> >                                                              QUERY
> > PLAN
> >
> >
------------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=0.00..0.04 rows=1 width=78) (actual time=8091.962..8091.965
> > rows=1 loops=1)
> >    ->  Seq Scan on maindb_astobject  (cost=0.00..3358190.12 rows=75426912
> > width=78) (actual time=8091.955..8091.955 rows=1 loops=1)
> >  Total runtime: 8092.040 ms
> > (3 rows)

Maybe there's a bunch of pages with only dead tuples at the start of the
table?  Maybe a lot of empty pages at the start of the table  (If this
is 8.3 you have to consider sync_seqscan as well)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: How can I manually alter the statistics for a column?

From
Tom Lane
Date:
Douglas Alan <darkwater42@gmail.com> writes:
> Hey, while I have you on the line, might you be so kind as to explain why
> this query is so slow?  Shouldn't it just fetch the first row in the table?
>
> explain analyze select * from maindb_astobject limit 1;

Yeah ...

>> QUERY
>> PLAN
>>
>>
------------------------------------------------------------------------------------------------------------------------------------
>> Limit  (cost=0.00..0.04 rows=1 width=78) (actual time=8091.962..8091.965
>> rows=1 loops=1)
>> ->  Seq Scan on maindb_astobject  (cost=0.00..3358190.12 rows=75426912
>> width=78) (actual time=8091.955..8091.955 rows=1 loops=1)
>> Total runtime: 8092.040 ms
>> (3 rows)

Ouch.

I'm betting that this table is horrendously bloated and the seqscan is
picking through many thousands of dead rows before it finds the first
live one.  Cue standard questions about your vacuuming habits ...

            regards, tom lane