Thread: Why does slony use a cursor? Anyone know?

Why does slony use a cursor? Anyone know?

From
Shaun Thomas
Date:
Hey everyone,

Frankly, I'm shocked at what I just found.

We did a delete last night of a few million rows, and come back this
morning to find that slony is 9-hours behind. After some investigation,
it became apparent that slony opens up a cursor and orders it by the
log_actionseq column. Then it fetches 500 rows, and closes the cursor.
So it's fetching several million rows into a cursor, to fetch 500, and
then throw the rest away.

That is quite possibly the least efficient manner I could think of to
build a sync system, so maybe someone knows why they did it that way?

At least with a regular query, it could sort by the column it wanted,
and put a nifty index on it for those 500-row chunks it's grabbing. I
must be missing something...

Yeah, I know... millions of rows + slony = bad. But it doesn't have to
be *this* bad. Even a few hundred thousand rows would be terrible with
this design.

I plan on asking the slony guys too, but I figured someone on this list
might happen to know.

Looks like I might have to look into Bucardo or Londiste.

Thanks!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Why does slony use a cursor? Anyone know?

From
Martijn van Oosterhout
Date:
On Tue, Mar 05, 2013 at 08:51:11AM -0600, Shaun Thomas wrote:
> Hey everyone,
>
> Frankly, I'm shocked at what I just found.
>
> We did a delete last night of a few million rows, and come back this
> morning to find that slony is 9-hours behind. After some
> investigation, it became apparent that slony opens up a cursor and
> orders it by the log_actionseq column. Then it fetches 500 rows, and
> closes the cursor. So it's fetching several million rows into a
> cursor, to fetch 500, and then throw the rest away.

I once had a similar problem and it looked like they were missing an
index. I asked about it and it turns out that I was running a somewhat
old version, and it was fixed in later versions.

Check that first. But ask anyway, I've always found the Slony guys very
helpful.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Attachment

Re: Why does slony use a cursor? Anyone know?

From
Glyn Astill
Date:
> From: Shaun Thomas <sthomas@optionshouse.com
> To: PostgreSQL General <pgsql-general@postgresql.org>
> Cc:
> Sent: Tuesday, 5 March 2013, 14:51
> Subject: [GENERAL] Why does slony use a cursor? Anyone know?
>
> Hey everyone,
>
> Frankly, I'm shocked at what I just found.
>
> We did a delete last night of a few million rows, and come back this morning to
> find that slony is 9-hours behind. After some investigation, it became apparent
> that slony opens up a cursor and orders it by the log_actionseq column. Then it
> fetches 500 rows, and closes the cursor. So it's fetching several million
> rows into a cursor, to fetch 500, and then throw the rest away.
>
> That is quite possibly the least efficient manner I could think of to build a
> sync system, so maybe someone knows why they did it that way?
>
> At least with a regular query, it could sort by the column it wanted, and put a
> nifty index on it for those 500-row chunks it's grabbing. I must be missing
> something...

What version of slony are you on?  The specifics of what you mention don't sound quite right, but it sounds very much
likebug 167 which was fixed in 2.1.2 if I remember correctly. 

Glyn



Re: Why does slony use a cursor? Anyone know?

From
Shaun Thomas
Date:
On 03/06/2013 04:49 AM, Glyn Astill wrote:

> What version of slony are you on?  The specifics of what you mention
> don't sound quite right, but it sounds very much like bug 167 which
> was fixed in 2.1.2 if I remember correctly.

We're on 2.1.2. Presumably, anyway. I didn't encounter the problem in
stage when I set up a testbed. But it also might not be related. The
problem I can tell from the logs, is that it was closing the cursor
pretty much right as soon as it got the results. 75 seconds to set up a
cursor of that size and then an hour to sync all the data isn't a
problem. 75 seconds for every 500 rows *is*.

The stage test I did didn't do that when I deleted 20M rows from a 50M
row table, but I also only set it up with a single replication set. My
next test will be to test with two or three replication sets that all
get big deletes like that. My guess is that it can't adequately swap
between them on SYNC events, so it has to rebuild the cursor every time.

Either way, we're likely to be switching to an ETL system because we
need to start scaling horizontally soon. Unless I want to set up a bunch
of partition targets, we'll pretty much have to drop Slony then. I just
want to keep it working until then. :)

Thanks for the info!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Why does slony use a cursor? Anyone know?

From
Dean Rasheed
Date:
On 6 March 2013 14:35, Shaun Thomas <sthomas@optionshouse.com> wrote:
> On 03/06/2013 04:49 AM, Glyn Astill wrote:
>
>> What version of slony are you on?  The specifics of what you mention
>> don't sound quite right, but it sounds very much like bug 167 which
>> was fixed in 2.1.2 if I remember correctly.
>
>
> We're on 2.1.2. Presumably, anyway. I didn't encounter the problem in stage
> when I set up a testbed. But it also might not be related. The problem I can
> tell from the logs, is that it was closing the cursor pretty much right as
> soon as it got the results. 75 seconds to set up a cursor of that size and
> then an hour to sync all the data isn't a problem. 75 seconds for every 500
> rows *is*.
>
> The stage test I did didn't do that when I deleted 20M rows from a 50M row
> table, but I also only set it up with a single replication set. My next test
> will be to test with two or three replication sets that all get big deletes
> like that. My guess is that it can't adequately swap between them on SYNC
> events, so it has to rebuild the cursor every time.
>
> Either way, we're likely to be switching to an ETL system because we need to
> start scaling horizontally soon. Unless I want to set up a bunch of
> partition targets, we'll pretty much have to drop Slony then. I just want to
> keep it working until then. :)
>

A cursor can make use of indexes for sorting, so an index on
sl_log_1/2(log_actionseq) may help.

Regards,
Dean


Re: Why does slony use a cursor? Anyone know?

From
Glyn Astill
Date:
> From: Shaun Thomas <sthomas@optionshouse.com>

> To: Glyn Astill <glynastill@yahoo.co.uk>
> Cc: PostgreSQL General <pgsql-general@postgresql.org>
> Sent: Wednesday, 6 March 2013, 14:35
> Subject: Re: [GENERAL] Why does slony use a cursor? Anyone know?
>
> On 03/06/2013 04:49 AM, Glyn Astill wrote:
>
>>  What version of slony are you on?  The specifics of what you mention
>>  don't sound quite right, but it sounds very much like bug 167 which
>>  was fixed in 2.1.2 if I remember correctly.
>
> We're on 2.1.2. Presumably, anyway. I didn't encounter the problem in
> stage when I set up a testbed. But it also might not be related. The problem I
> can tell from the logs, is that it was closing the cursor pretty much right as
> soon as it got the results. 75 seconds to set up a cursor of that size and then
> an hour to sync all the data isn't a problem. 75 seconds for every 500 rows
> *is*.
>
> The stage test I did didn't do that when I deleted 20M rows from a 50M row
> table, but I also only set it up with a single replication set. My next test
> will be to test with two or three replication sets that all get big deletes like
> that. My guess is that it can't adequately swap between them on SYNC events,
> so it has to rebuild the cursor every time.
>

Yeah, you'd expect the reason for using the cursor would be to pull those 500 lines into memory, process them and then get the next 500 etc.  I've not seen any such lags on our systems, that doesn't mean it's not happening with much milder symptoms.

You say it happened on your production setup but not when you tried to reproduce it in your test environment, so is there anything useful in the slony logs to suggest things were not quite right at the time? I'm guessing your slons were running and generating syncs.

I'd definitely be asking on the slony lists about this, either something isn't right with your setup or it's something they can resolve.



Re: Why does slony use a cursor? Anyone know?

From
Filip Rembiałkowski
Date:


On Tue, Mar 5, 2013 at 3:51 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
Hey everyone,

Frankly, I'm shocked at what I just found.

We did a delete last night of a few million rows, and come back this morning to find that slony is 9-hours behind. After some investigation, it became apparent that slony opens up a cursor and orders it by the log_actionseq column. Then it fetches 500 rows, and closes the cursor. So it's fetching several million rows into a cursor, to fetch 500, and then throw the rest away.



Why do you assume that opening a cursor with ORDER BY is equivalent to fetching all rows? It is not.

Just curious what made you think so,


thanks

 

Re: Why does slony use a cursor? Anyone know?

From
Shaun Thomas
Date:
On 03/07/2013 07:40 AM, Filip Rembiałkowski wrote:

> Why do you assume that opening a cursor with ORDER BY is equivalent
> to fetching all rows? It is not.

It is when 90% of a table's 65M rows are part of the result set.
Calculating the cursor puts the retrieved rows in a temporary space
until the cursor is freed. Calculating that many rows with that order
by, on our particular system took about 75 seconds. The problem wasn't
really that it was fetching the rows, but that it was closing the cursor
*right* afterwards. Like this:

2013-03-04 07:46:06 CST DEBUG1 remoteHelperThread_1_1: 78.010 seconds
until close cursor

We saw that in the slave log about every 80 seconds. When I set up slony
on our staging system, it only closed the cursor once all 20M rows were
fetched and deleted from the subscriber node. Clearly whatever was
happening in our production environment wasn't normal. But at the time,
it *looked* like slony was forcing it to calculate and fill the cursor,
getting the top 500 rows, and closing it right away on purpose. Turns
out, that was abnormal. :)

So, I'm not shocked anymore. A one time cursor calculation is fine. Now
I'm confused as to what broke it or why it was acting that way when that
clearly isn't the design goal.

Ah well.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Why does slony use a cursor? Anyone know?

From
Kevin Grittner
Date:
Shaun Thomas <sthomas@optionshouse.com> wrote:

> It is when 90% of a table's 65M rows are part of the result set.
> Calculating the cursor puts the retrieved rows in a temporary space
> until the cursor is freed. Calculating that many rows with that order
> by, on our particular system took about 75 seconds. The problem wasn't
> really that it was fetching the rows, but that it was closing the cursor
> *right* afterwards. Like this:
>
> 2013-03-04 07:46:06 CST DEBUG1 remoteHelperThread_1_1: 78.010 seconds
> until close cursor
>
> We saw that in the slave log about every 80 seconds. When I set up slony
> on our staging system, it only closed the cursor once all 20M rows were
> fetched and deleted from the subscriber node. Clearly whatever was
> happening in our production environment wasn't normal. But at the time,
> it *looked* like slony was forcing it to calculate and fill the cursor,
> getting the top 500 rows, and closing it right away on purpose. Turns
> out, that was abnormal. :)
>
> So, I'm not shocked anymore. A one time cursor calculation is fine. Now
> I'm confused as to what broke it or why it was acting that way when that
> clearly isn't the design goal.

Exactly what version was this?  Just about any queuing system (and
this definitely includes Slony and some JMS implementations) were
vulnerable to an autovacuum bug fixed with this patch from Jan
Wieck:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b19e4250b45e91c9cbdd18d35ea6391ab5961c8d

This fix was backpatched as far as 9.0 and is present in the latest
minor releases, but not earlier ones.  It can cause symptoms such
as you describe.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Why does slony use a cursor? Anyone know?

From
Shaun Thomas
Date:
On 03/07/2013 08:23 AM, Kevin Grittner wrote:

> Exactly what version was this?  Just about any queuing system (and
> this definitely includes Slony and some JMS implementations) were
> vulnerable to an autovacuum bug fixed with this patch from Jan
> Wieck:

Hmm. It's a 9.1.7 release, so if the latest is 9.1.8, that might be it.

It would explain why my contrived scenario in stage couldn't replicate
it, too. I'm going to see if I can trigger this behavior in stage by
creating a bunch of dead tuples in the slony log table while it's working.

Thanks, Kevin!

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email