Thread: CLUSTER and synchronized scans and pg_dump et al

CLUSTER and synchronized scans and pg_dump et al

From
Gregory Stark
Date:
It occurred to me the other day that synchronized scans could play havoc with
clustered tables. When you dump and reload a table even if it was recently
clustered if any other sequential scans are happening in the system at the
time you dump it the dump could shuffle the records out of order. 

Now the records would still be effectively ordered for most purposes but our
statistics can't detect that. Since the correlation would be poor the restored
database would have markedly different statistics showing virtually no
correlation on the clustered column.

Perhaps we should have some form of escape hatch for pg_dump to request real
physical order when dumping clustered tables.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support!


Re: CLUSTER and synchronized scans and pg_dump et al

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Perhaps we should have some form of escape hatch for pg_dump to request real
> physical order when dumping clustered tables.

Yeah, Rae Steining was complaining to me about that off-list a few weeks
ago.  The whole syncscan behavior risks breaking many apps that "always
worked before", even if they were disregarding the letter of the SQL spec.

Maybe a GUC variable to enable/disable syncscan?
        regards, tom lane


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Guillaume Smet"
Date:
On Jan 27, 2008 6:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, Rae Steining was complaining to me about that off-list a few weeks
> ago.  The whole syncscan behavior risks breaking many apps that "always
> worked before", even if they were disregarding the letter of the SQL spec.
>
> Maybe a GUC variable to enable/disable syncscan?

I'm not sure it's really a good reason for that because it's just a
matter of time for them to be broken anyway.

But it seems at least a good idea to have a way to build reproducible
test cases on production boxes without being perturbed by the other
scans running. Would it need a restart and be a global GUC variable or
could it be set temporarily per session?

--
Guillaume


Re: CLUSTER and synchronized scans and pg_dump et al

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> Would it need a restart and be a global GUC variable or
> could it be set temporarily per session?

It could be PGC_USERSET, afaics.
        regards, tom lane


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Guillaume Smet"
Date:
On Jan 27, 2008 7:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It could be PGC_USERSET, afaics.

If so, it seems like a good idea even if it's just for debugging purposes.

--
Guillaume


Re: CLUSTER and synchronized scans and pg_dump et al

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
>>> Maybe a GUC variable to enable/disable syncscan?

> If so, it seems like a good idea even if it's just for debugging purposes.

Do we have nominations for a name?  The first idea that comes to mind
is "synchronized_scanning" (defaulting to ON).

Also, does anyone object to making pg_dump just disable it
unconditionally?  Greg's original gripe only mentioned the case of
clustered tables, but it'd be kind of a pain to make pg_dump turn it
on and off again for different tables.  And I could see people
complaining about pg_dump failing to preserve row order even in
unclustered tables.
        regards, tom lane


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Guillaume Smet"
Date:
On Jan 27, 2008 7:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, does anyone object to making pg_dump just disable it
> unconditionally?  Greg's original gripe only mentioned the case of
> clustered tables, but it'd be kind of a pain to make pg_dump turn it
> on and off again for different tables.  And I could see people
> complaining about pg_dump failing to preserve row order even in
> unclustered tables.

+1. I don't think it's worth it to enable it for non clustered tables
and it's always better to keep the order if we can.

--
Guillaume


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Markus Bertheau"
Date:
2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:
>
> Do we have nominations for a name?  The first idea that comes to mind
> is "synchronized_scanning" (defaulting to ON).

"synchronized_sequential_scans" is a bit long, but contains the
keyword "sequential scans", which will ring a bell with many, more so
than "synchronized_scanning".

Markus


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Guillaume Smet"
Date:
On Jan 27, 2008 9:07 PM, Markus Bertheau <mbertheau.pg@googlemail.com> wrote:
> 2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:
> >
> > Do we have nominations for a name?  The first idea that comes to mind
> > is "synchronized_scanning" (defaulting to ON).
>
> "synchronized_sequential_scans" is a bit long, but contains the
> keyword "sequential scans", which will ring a bell with many, more so
> than "synchronized_scanning".

synchronize_seqscans?


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Florian G. Pflug"
Date:
Guillaume Smet wrote:
> On Jan 27, 2008 9:07 PM, Markus Bertheau
> <mbertheau.pg@googlemail.com> wrote:
>> 2008/1/28, Tom Lane <tgl@sss.pgh.pa.us>:
>>> Do we have nominations for a name?  The first idea that comes to
>>> mind is "synchronized_scanning" (defaulting to ON).
>> "synchronized_sequential_scans" is a bit long, but contains the 
>> keyword "sequential scans", which will ring a bell with many, more
>> so than "synchronized_scanning".
> 
> synchronize_seqscans?

How about enable_syncscan, or enable_seqscan_sync? It's not strictly
something the influences the planner, but maybe it's similar enough to
justify a similar naming?

regards, Florian Pflug


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Guillaume Smet"
Date:
Hi Florian,

Glad to see you back!

On Jan 28, 2008 3:25 PM, Florian G. Pflug <fgp@phlo.org> wrote:
> How about enable_syncscan, or enable_seqscan_sync? It's not strictly
> something the influences the planner, but maybe it's similar enough to
> justify a similar naming?

It was my first idea but I didn't propose it as it's really a
different thing IMHO. enable_* variables don't change the way
PostgreSQL really does the job as synchronize_scans (or whatever the
name will be) does.
And it's not very consistent with the other GUC variables (most of
them could have "enable" in their name) but we limited the usage of
enable_* to planner variables. I don't know if it's on purpose though.

--
Guillaume


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Kevin Grittner"
Date:
>>> On Sun, Jan 27, 2008 at  9:02 AM, in message
<87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
wrote:
> Perhaps we should have some form of escape hatch for pg_dump to request real
> physical order when dumping clustered tables.
It would seem reasonable to me for pg_dump to use ORDER BY to select
data from clustered tables.
I don't see a general case for worrying about the order of rows
returned by queries which lack an ORDER BY clause.
-Kevin




Re: CLUSTER and synchronized scans and pg_dump et al

From
Andrew Dunstan
Date:

Kevin Grittner wrote:
>>>> On Sun, Jan 27, 2008 at  9:02 AM, in message
>>>>         
> <87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
> wrote: 
>  
>   
>> Perhaps we should have some form of escape hatch for pg_dump to request real
>> physical order when dumping clustered tables.
>>     
>  
> It would seem reasonable to me for pg_dump to use ORDER BY to select
> data from clustered tables.
>  
> I don't see a general case for worrying about the order of rows
> returned by queries which lack an ORDER BY clause.
>  
>
>   

What will be the performance hit from doing that?

cheers

andrew


Re: CLUSTER and synchronized scans and pg_dump et al

From
"Kevin Grittner"
Date:
>>> On Mon, Jan 28, 2008 at  9:00 AM, in message <479DEDF5.4090909@dunslane.net>,
Andrew Dunstan <andrew@dunslane.net> wrote:
> Kevin Grittner wrote:
>>>>> On Sun, Jan 27, 2008 at  9:02 AM, in message
>> <87odb7s45i.fsf@oxford.xeocode.com>, Gregory Stark <stark@enterprisedb.com>
>> wrote:
>>
>>> Perhaps we should have some form of escape hatch for pg_dump to request real
>>> physical order when dumping clustered tables.
>>
>> It would seem reasonable to me for pg_dump to use ORDER BY to select
>> data from clustered tables.
>
> What will be the performance hit from doing that?
If the rows actually are in order of the clustered index, it
shouldn't add much more than the time needed to sequentially pass
the clustered index, should it?  Even so, perhaps there should be a
command-line option on pg_dump to control whether it does this.
-Kevin




Re: CLUSTER and synchronized scans and pg_dump et al

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> It was my first idea but I didn't propose it as it's really a
> different thing IMHO. enable_* variables don't change the way
> PostgreSQL really does the job as synchronize_scans (or whatever the
> name will be) does.
> And it's not very consistent with the other GUC variables (most of
> them could have "enable" in their name) but we limited the usage of
> enable_* to planner variables. I don't know if it's on purpose though.

Yeah, it is a more or less deliberate policy to use enable_ only for
planner control variables, which this one certainly isn't.  I seem
to recall an argument also that prefixing enable_ is just noise; it
doesn't add anything to your understanding of what the variable does.

So far I think "synchronize_seqscans" is the best proposal.
        regards, tom lane


Re: CLUSTER and synchronized scans and pg_dump et al

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Kevin Grittner wrote:
>> It would seem reasonable to me for pg_dump to use ORDER BY to select
>> data from clustered tables.

> What will be the performance hit from doing that?

That worries me too.  Also, in general pg_dump's charter is to reproduce
the state of the database as best it can, not to "improve" it.
        regards, tom lane


Re: CLUSTER and synchronized scans and pg_dump et al

From
Steve Atkins
Date:
On Jan 28, 2008, at 8:36 AM, Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>> Kevin Grittner wrote:
>>> It would seem reasonable to me for pg_dump to use ORDER BY to select
>>> data from clustered tables.
>
>> What will be the performance hit from doing that?
>
> That worries me too.  Also, in general pg_dump's charter is to  
> reproduce
> the state of the database as best it can, not to "improve" it.

One common use of cluster around here is to act as a faster version
of vacuum full when there's a lot of dead rows in a table. There's no
intent to keep the table clustered on that index, and the cluster flag
isn't removed with alter table (why bother, the only thing it affects is
the cluster command).

I'm guessing that's not unusual, and it'd lead to sorting tables as part
of pg_dump.

Cheers,  Steve



Re: CLUSTER and synchronized scans and pg_dump et al

From
"Kevin Grittner"
Date:
>>> On Mon, Jan 28, 2008 at 10:36 AM, in message <3001.1201538162@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> in general pg_dump's charter is to reproduce
> the state of the database as best it can, not to "improve" it.
Seems that I've often seen it recommended as a way to eliminate bloat.
It seems like there are some practical use cases where it would be
a pain to have to do a CLUSTER right on the heels of having used
pg_dump to psql.
This does seem like the right way to do it where a user really wants
to maintain the physical sequence; my biggest concern is that
CLUSTER is sometimes used to eliminate bloat, and there is no real
interest in maintaining that sequence later.  I'd bet that people
generally do not alter the table to remove the clustered index
choice, so this option could be rather painful somewhere
downstream, when the sequence has become pretty random.
Maybe it would make sense if it was not the default, and the issues
were properly documented under the description of the option?
-Kevin




Re: CLUSTER and synchronized scans and pg_dump et al

From
"Florian G. Pflug"
Date:
Steve Atkins wrote:
> On Jan 28, 2008, at 8:36 AM, Tom Lane wrote:
> 
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> Kevin Grittner wrote:
>>>> It would seem reasonable to me for pg_dump to use ORDER BY to select
>>>> data from clustered tables.
>>
>>> What will be the performance hit from doing that?
>>
>> That worries me too.  Also, in general pg_dump's charter is to reproduce
>> the state of the database as best it can, not to "improve" it.
> 
> One common use of cluster around here is to act as a faster version
> of vacuum full when there's a lot of dead rows in a table. There's no
> intent to keep the table clustered on that index, and the cluster flag
> isn't removed with alter table (why bother, the only thing it affects is
> the cluster command).
> 
> I'm guessing that's not unusual, and it'd lead to sorting tables as part
> of pg_dump.

I've done that too - and every time I typed that "CLUSTER ... " I 
thought why, oh why isn't there something like REWRITE TABLE <table>", 
which would work just like CLUSTER, but without the sorting ;-) Maybe 
something to put on the TODO list...

We might even call it "VACCUM REWRITE" ;-)

regards, Florian Pflug


Re: CLUSTER and synchronized scans and pg_dump et al

From
Jeff Davis
Date:
On Sun, 2008-01-27 at 12:45 -0500, Tom Lane wrote:
> Maybe a GUC variable to enable/disable syncscan?

The first iterations of the patch included a GUC. 

I don't have any objection to re-introducing a GUC to enable/disable it.
However, I would suggest that it defaults to "on", because:

1. There aren't many cases where you'd want it to be off, and this
particular case with pg_dump is the best one that I've heard of (thanks
Greg). We want people who install 8.3 to see a boost without lots of
tuning, if possible.
2. It only turns on for tables over 25% of shared buffers anyway.

Introducing GUCs reintroduces the same questions that were discussed
before. 

1. Should the 25% figure be tunable as well? 
2. Remember that the 25% figure is also tied to Simon and Heikki's
buffer recycling patch (buffer ring patch). Should they be configurable
independently? Should they be tied together, but configurable?

The simplest solution, in my opinion, is something like:

large_scan_threshold = 0.25 # set to -1 to disable

Where a scan of any table larger than (large_scan_threshold *
shared_buffers) employs both synchronized scans and buffer recycling. We
may implement other large scan strategies in the future.

Regards,Jeff Davis



Re: CLUSTER and synchronized scans and pg_dump et al

From
Jeff Davis
Date:
On Sun, 2008-01-27 at 15:02 +0000, Gregory Stark wrote:
> It occurred to me the other day that synchronized scans could play havoc with
> clustered tables. When you dump and reload a table even if it was recently
> clustered if any other sequential scans are happening in the system at the
> time you dump it the dump could shuffle the records out of order. 
> 
> Now the records would still be effectively ordered for most purposes but our
> statistics can't detect that. Since the correlation would be poor the restored
> database would have markedly different statistics showing virtually no
> correlation on the clustered column.
> 
> Perhaps we should have some form of escape hatch for pg_dump to request real
> physical order when dumping clustered tables.
> 

Thank you for bringing this up, it's an interesting point.

Keep in mind that this only matters if you are actually running pg_dump
concurrently with another scan, because a scan will reset the starting
point after completing.

Regards,Jeff Davis



Re: CLUSTER and synchronized scans and pg_dump et al

From
Jeff Davis
Date:
On Sun, 2008-01-27 at 13:37 -0500, Tom Lane wrote:
> Also, does anyone object to making pg_dump just disable it
> unconditionally?  Greg's original gripe only mentioned the case of
> clustered tables, but it'd be kind of a pain to make pg_dump turn it
> on and off again for different tables.  And I could see people
> complaining about pg_dump failing to preserve row order even in
> unclustered tables.
> 

If you are running pg_dump, that increases the likelihood that multiple
sequential scans will be reading the same large table at the same time.
Sync scans prevent that additional scan from bringing your active
database to a halt during your dump (due to horrible seeking and poor
cache efficiency).

I think that pg_dump is a good use case for synchronized scans. Assuming
it doesn't hold up 8.3, I think it's worthwhile to consider only
disabling it for tables that have been clustered.

That being said, this isn't a strong objection. Having a painless 8.3
release is the top priority, of course.

Regards,Jeff Davis