This is bad for pg_dump on a big database: pg_dump also uses repeatable read isolation, so while pg_dump is backing up the database, dead tuples cannot be reclaimed and the database will bloat.
Can we optimize this?
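For reference, a minimal sketch of the scenario under discussion (the table name t is from the original report; the exact VACUUM VERBOSE wording varies by version):

```sql
-- SESSION A: open a repeatable read transaction and take a snapshot
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;  -- the snapshot is taken at the first query, not at BEGIN

-- SESSION B: create the table after A's snapshot, churn rows, then vacuum
CREATE TABLE t (id int);
INSERT INTO t SELECT generate_series(1, 10000);
DELETE FROM t;
VACUUM VERBOSE t;  -- reports that the dead row versions cannot be
                   -- removed yet, because A's snapshot is still open
```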
-- Public welfare is a lifelong cause. I'm Digoal, Just Do It.
At 2014-04-29 02:53:33,"Heikki Linnakangas" <hlinnakangas@vmware.com> wrote:
>On 04/28/2014 11:37 AM, digoal@126.com wrote:
>> SESSION B:
>> but B can't reclaim rows from table t.
>> why?
>> I thought PostgreSQL couldn't reclaim tuples that already existed before
>> the repeatable read transaction started; so why, in this case, can't t's
>> tuples be reclaimed, when they were created after session A began?
>
>I think what you're arguing is that the system should be smarter and be
>able to reclaim the dead tuples. Because session A began before the
>table was even created, and there are no other backends that would need
>to see them either, they could indeed be safely vacuumed. The system
>just isn't smart enough to distinguish the case.
>
>The short answer is that such an optimization just doesn't exist in
>PostgreSQL. It's certainly not a bug.
>
>The long answer is that actually, even though the table was created
>after the transaction in session A began, session A *can* access the
>table. Schema changes don't follow the normal MVCC rules. If you do
>"SELECT * FROM t" in session A, it will work. However, the rows still
>won't be visible to session A, because they were inserted after the
>snapshot was taken, so they could still be vacuumed if the system
>tracked the snapshots more carefully and was able to deduce that. But
>the fact that a new table was created is not relevant.
>
>- Heikki