Re: ALTER TABLE ... ALTER COLUMN ... SET DISTINCT - Mailing list pgsql-hackers

From Robert Haas
Subject Re: ALTER TABLE ... ALTER COLUMN ... SET DISTINCT
Msg-id 603c8f070904041856i4e2b895dxd316daa3642d6a40@mail.gmail.com
In response to Re: ALTER TABLE ... ALTER COLUMN ... SET DISTINCT  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: ALTER TABLE ... ALTER COLUMN ... SET DISTINCT  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: ALTER TABLE ... ALTER COLUMN ... SET DISTINCT  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Apr 4, 2009 at 7:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Per previous discussion.
>> http://archives.postgresql.org/message-id/8066.1229106059@sss.pgh.pa.us
>> http://archives.postgresql.org/message-id/603c8f070904021926g92eb55sdfc68141133957c1@mail.gmail.com
>
> I'm not thrilled about adding a column to pg_attribute for this.
> Isn't there some way of keeping it in pg_statistic?

I don't like the idea of keeping it in pg_statistic.  Right now, all
of the data in pg_statistic is transient, so you could theoretically
truncate the table at any time without losing anything permanent.
It's true that we don't do that right now, but it seems cleaner to
keep the data generated by the analyzer separate from the stuff we
consider part of the structure of the database.  If we did put the
data in pg_statistic, then we'd have to teach ANALYZE that when it
writes out new statistics, it also has to copy over this setting from
the previous version of the tuple.  And that means it would have to
lock the tuples against concurrent updates while ANALYZE is running.
Also, if someone happened to run ALTER TABLE SET DISTINCT before the
first run of ANALYZE on that table (for example, while reloading a
dump), there'd be no existing row in pg_statistic for the DDL command
to update, so we'd need to create and insert a fake row.  Incidentally,
that would blow up any concurrent ANALYZE already in progress: when it
finished and tried to insert its resulting rows into pg_statistic, it
would violate the unique constraint.  All in all it seems rather messy.

What is the specific nature of your concern?  I thought about the
possibility that enlarging pg_attribute might impose a small performance
penalty spread across the whole system, but growing a structure that is
already 112 bytes by another 4 doesn't seem likely to be significant,
especially since we're not crossing a power-of-two
boundary.  It might be possible to reclaim 4 bytes by changing
attstattarget and attndims from int4 to int2, but I'd rather do that
as a separate patch.
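The size argument is simple arithmetic, sketched below in Python. The helper names are hypothetical. `pow2_bucket` assumes an allocator that rounds small requests up to the next power of two, which is the usual reason such a boundary matters (the post does not say which allocator it has in mind). `int4_to_int2_savings` ignores alignment padding, which in a real struct could absorb or change the savings. The 112-byte figure is taken from the post, not measured here.

```python
def pow2_bucket(nbytes: int) -> int:
    """Size a power-of-two-rounding allocator would hand out for nbytes (> 0)."""
    return 1 << (nbytes - 1).bit_length()


def int4_to_int2_savings(ncolumns: int) -> int:
    """Bytes saved by narrowing ncolumns fixed-width fields from 4 to 2 bytes.

    Ignores alignment padding, which in a real struct could absorb or change
    the savings.
    """
    return ncolumns * (4 - 2)


CURRENT_SIZE = 112           # figure cited in the post
NEW_SIZE = CURRENT_SIZE + 4  # after adding one 4-byte column

# Both sizes land in the same power-of-two bucket.
print(pow2_bucket(CURRENT_SIZE), pow2_bucket(NEW_SIZE))
# Narrowing attstattarget and attndims from int4 to int2.
print(int4_to_int2_savings(2))
```

Both 112 and 116 round up to 128, and narrowing the two columns saves 2 × (4 − 2) = 4 bytes, exactly offsetting the new column under the no-padding assumption.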

...Robert

