Thread: How can I manually alter the statistics for a column?
I'd like to manually alter the statistics for a column, as for the column in question the statistics are causing Postgres to do the wrong thing for my purposes. (I.e., a Seq Scan, rather than an Index Scan.) If someone can tell me how to achieve this, I would quite grateful.
Thanks!
|>ouglas
P.S. Actually, for this particular problem, just deleting the statistics would be fine. I've tried doing:
I've tried settings the statistics via the table "pg_stats", but that turns out to be a view, and Postgres won't allow to me to alter it.
Perhaps I can achieve the end by altering the "pg_statistic" table instead, but that table is more than a bit opaque to me.
P.P.S The Seq Scan is 2-4 orders of magnitude slower than the Index Scan.
Thanks!
|>ouglas
P.S. Actually, for this particular problem, just deleting the statistics would be fine. I've tried doing:
alter table maindb_astobject alter column survey_id set statistics 0;
And then analyzing the column, but when "statistics" for a column are set to 0, Postgres seems to leave the current statistics in place, which is not the right thing for me at all. I can successfully set "statistics" to 1, but that turns out to be one statistic too many.I've tried settings the statistics via the table "pg_stats", but that turns out to be a view, and Postgres won't allow to me to alter it.
Perhaps I can achieve the end by altering the "pg_statistic" table instead, but that table is more than a bit opaque to me.
P.P.S The Seq Scan is 2-4 orders of magnitude slower than the Index Scan.
If you want something done right, I guess you have to do it yourself!
Here's the answer to my question. It works great! Or so it seems to:
|>ouglas
Here's the answer to my question. It works great! Or so it seems to:
delete from pg_statistic s
where exists ( select 1
from pg_class as c, pg_attribute as a
where a.attrelid = c.relfilenode
In the above SQL statement, "maindb_astobject" is the name of the table and "survey_id" is the name of the column. The statement deletes all the statistics for the specified column in the specified table.where exists ( select 1
from pg_class as c, pg_attribute as a
where a.attrelid = c.relfilenode
and s.starelid = c.relfilenode
and s.staattnum = a.attnum
and c.relname = 'maindb_astobject'
and attname = 'survey_id'
);and s.staattnum = a.attnum
and c.relname = 'maindb_astobject'
and attname = 'survey_id'
|>ouglas
On Mon, Jun 1, 2009 at 2:20 PM, Douglas Alan <darkwater42@gmail.com> wrote:
I'd like to manually alter the statistics for a column, as for the column in question the statistics are causing Postgres to do the wrong thing for my purposes. (I.e., a Seq Scan, rather than an Index Scan.) If someone can tell me how to achieve this, I would quite grateful.
Thanks!
|>ouglas
P.S. Actually, for this particular problem, just deleting the statistics would be fine. I've tried doing:alter table maindb_astobject alter column survey_id set statistics 0;And then analyzing the column, but when "statistics" for a column are set to 0, Postgres seems to leave the current statistics in place, which is not the right thing for me at all. I can successfully set "statistics" to 1, but that turns out to be one statistic too many.
I've tried settings the statistics via the table "pg_stats", but that turns out to be a view, and Postgres won't allow to me to alter it.
Perhaps I can achieve the end by altering the "pg_statistic" table instead, but that table is more than a bit opaque to me.
P.P.S The Seq Scan is 2-4 orders of magnitude slower than the Index Scan.
On Mon, Jun 1, 2009 at 8:20 PM, Douglas Alan <darkwater42@gmail.com> wrote: > I'd like to manually alter the statistics for a column, as for the column in > question the statistics are causing Postgres to do the wrong thing for my > purposes. (I.e., a Seq Scan, rather than an Index Scan.) If someone can > tell me how to achieve this, I would quite grateful. > Assuming estimates are off, wouldn't increasing the stats target be a better approach? Then adjust the planner cost constants, in particular seq_page_cost, random_page_cost and effective_cache_size, to prevent a seq scan? Considering you've pulled off the workaround, you must know what you're doing, but I'd be worried about it backfiring in the future. Regards, Isak > Thanks! > |>ouglas > > > P.S. Actually, for this particular problem, just deleting the statistics > would be fine. I've tried doing: > > alter table maindb_astobject alter column survey_id set statistics 0; > > And then analyzing the column, but when "statistics" for a column are set > to 0, Postgres seems to leave the current statistics in place, which is not > the right thing for me at all. I can successfully set "statistics" to 1, > but that turns out to be one statistic too many. > > I've tried settings the statistics via the table "pg_stats", but that turns > out to be a view, and Postgres won't allow to me to alter it. > > Perhaps I can achieve the end by altering the "pg_statistic" table instead, > but that table is more than a bit opaque to me. > > P.P.S The Seq Scan is 2-4 orders of magnitude slower than the Index Scan. > > > >
Douglas Alan <darkwater42@gmail.com> writes: > delete from pg_statistic s > where exists ( select 1 > from pg_class as c, pg_attribute as a > where a.attrelid = c.relfilenode > and s.starelid = c.relfilenode > and s.staattnum = a.attnum > and c.relname = 'maindb_astobject' > and attname = 'survey_id' > ); Use c.oid, not c.relfilenode. Also, the join to pg_class is both more and less than needed --- it won't handle the situation where there are multiple tables of the same name in different schemas. You could add pg_namespace into the join, but it's the hard way. The way I'd do it is probably delete from pg_statistic where (starelid, staattnum) in (select attrelid, attnum from pg_attribute where attrelid = 'my_relation'::regclass and attname = 'my_attribute'); regclass knows about schemas and search paths, so stuff like 'my_schema.my_relation'::regclass will work unsurprisingly. regards, tom lane
On Tue, Jun 2, 2009 at 9:52 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thanks! That's very helpful.
Hey, while I have you on the line, might you be so kind as to explain why this query is so slow? Shouldn't it just fetch the first row in the table? What could be faster than that?
The query runs perfectly fast, on the other hand, if I encourage it to use an index like so:
|>ouglas
delete from pg_statistic
where (starelid, staattnum) in
(select attrelid, attnum from pg_attribute
where attrelid = 'my_relation'::regclass and attname = 'my_attribute');
regclass knows about schemas and search paths, so stuff like
'my_schema.my_relation'::regclass will work unsurprisingly.
Thanks! That's very helpful.
Hey, while I have you on the line, might you be so kind as to explain why this query is so slow? Shouldn't it just fetch the first row in the table? What could be faster than that?
explain analyze select * from maindb_astobject limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.04 rows=1 width=78) (actual time=8091.962..8091.965 rows=1 loops=1)
-> Seq Scan on maindb_astobject (cost=0.00..3358190.12 rows=75426912 width=78) (actual time=8091.955..8091.955 rows=1 loops=1)
Total runtime: 8092.040 ms
(3 rows)
The query runs perfectly fast, on the other hand, if I encourage it to use an index like so:
explain analyze select * from maindb_astobject order by id limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.10 rows=1 width=78) (actual time=0.203..0.205 rows=1 loops=1)
-> Index Scan using maindb_astobject_pkey on maindb_astobject (cost=0.00..7650690.77 rows=75426912 width=78) (actual time=0.196..0.196 rows=1 loops=1)
Total runtime: 0.292 ms
(3 rows)
|>ouglas
Douglas Alan escribió: > Hey, while I have you on the line, might you be so kind as to explain why > this query is so slow? Shouldn't it just fetch the first row in the table? > What could be faster than that? > > explain analyze select * from maindb_astobject limit 1; > > > > QUERY > > PLAN > > > > ------------------------------------------------------------------------------------------------------------------------------------ > > Limit (cost=0.00..0.04 rows=1 width=78) (actual time=8091.962..8091.965 > > rows=1 loops=1) > > -> Seq Scan on maindb_astobject (cost=0.00..3358190.12 rows=75426912 > > width=78) (actual time=8091.955..8091.955 rows=1 loops=1) > > Total runtime: 8092.040 ms > > (3 rows) Maybe there's a bunch of pages with only dead tuples at the start of the table? Maybe a lot of empty pages at the start of the table (If this is 8.3 you have to consider sync_seqscan as well) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Douglas Alan <darkwater42@gmail.com> writes: > Hey, while I have you on the line, might you be so kind as to explain why > this query is so slow? Shouldn't it just fetch the first row in the table? > > explain analyze select * from maindb_astobject limit 1; Yeah ... >> QUERY >> PLAN >> >> ------------------------------------------------------------------------------------------------------------------------------------ >> Limit (cost=0.00..0.04 rows=1 width=78) (actual time=8091.962..8091.965 >> rows=1 loops=1) >> -> Seq Scan on maindb_astobject (cost=0.00..3358190.12 rows=75426912 >> width=78) (actual time=8091.955..8091.955 rows=1 loops=1) >> Total runtime: 8092.040 ms >> (3 rows) Ouch. I'm betting that this table is horrendously bloated and the seqscan is picking through many thousands of dead rows before it finds the first live one. Cue standard questions about your vacuuming habits ... regards, tom lane