Thread: Planner performance extremely affected by an hanging transaction (20-30 times)?

Planner performance extremely affected by an hanging transaction (20-30 times)?

From: Bartłomiej Romański
Hi all

We're experiencing a very strange performance issue. Our setup is a bit more complicated, but we've managed to isolate and replicate the core problem. Here's what we observe:

We took a strong machine (128 GB RAM, 8-core CPU, SSD drives...) and installed a fresh copy of PostgreSQL 9.2 (Ubuntu Server 12.04 LTS, default configuration).

Then, we created a test database with the following schema:

CREATE TABLE offers
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  category_id bigint NOT NULL,
  CONSTRAINT offers_pkey PRIMARY KEY (id)
);

CREATE TABLE categories
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  CONSTRAINT categories_pkey PRIMARY KEY (id)
);

and populated it in the following way:

insert into categories (name) select 'category_' || x from generate_series(1,1000) as x;
insert into offers (name, category_id) select 'offer_' || x, floor(random() * 1000) + 1 from generate_series(1,1000*1000) as x;

Finally, we created a python script to make simple queries in a loop:

while True:
    id = random.randrange(1, 1000 * 1000)
    db.execute('select offers.id, offers.name, categories.id, categories.name from offers left join categories on categories.id = offers.category_id where offers.id = %s', (id,))
    print db.fetchall()

We start 20 instances simultaneously and measure performance:

parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null

Normally we observe about 30k QPS, which is a satisfying result (without any tuning at all).

The problem occurs when we open a second console, start psql and type:

pgtest=> begin; insert into categories (name) select 'category_' || x from generate_series(1,1000) as x;

We start a transaction and insert 1k records into the 'categories' table. After that, the performance of the benchmark described above immediately drops to about 1-2k QPS. That's 20-30 times slower! After closing the transaction (committing or aborting - doesn't matter) it immediately jumps back to 30k QPS.
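
(If you want to script the blocking side as well, a rough psycopg2 equivalent of the psql session above would be something like this; we did it interactively in psql, so treat it as a sketch:)

import time
import psycopg2

# psycopg2 (with autocommit off, the default) starts a transaction implicitly
# on the first statement, so this reproduces the open 'begin; insert ...' above
blocker = psycopg2.connect('host=localhost dbname=pgtest user=pgtest password=pgtest')
cur = blocker.cursor()
cur.execute("insert into categories (name) select 'category_' || x from generate_series(1,1000) as x")
time.sleep(300)     # keep the transaction open; the benchmark QPS stays low during this window
blocker.rollback()  # or blocker.commit() - either way QPS recovers immediately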

Restarting the running script and other simple tricks do not change anything. The hanging, open transaction is causing a huge slowdown. What's more, when doing a similar insert (even a much larger one) into the 'offers' table we do not observe this effect.

We analyzed the problem a bit deeper and found out that the actual query execution times are not affected that much. They are consistently close to 0.5 ms. This can be observed in the server log (after enabling the appropriate option) and in the 'explain analyze...' result. Also, the query plan returned does not change and looks optimal (pkey scan for 'offers' + pkey scan for 'categories').
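
For reference, EXPLAIN ANALYZE can be checked from the same Python cursor; a sketch of what such a check looks like inside the benchmark loop (it reuses the loop's 'db' cursor and 'id' variable; on 9.2 the last line of the output carries the total runtime):

    db.execute('explain analyze select offers.id, offers.name, categories.id, categories.name from offers left join categories on categories.id = offers.category_id where offers.id = %s', (id,))
    plan_lines = [row[0] for row in db.fetchall()]
    print plan_lines[-1]   # e.g. "Total runtime: 0.5xx ms", with or without the hanging transaction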

After a few random thoughts we finally turned on the 'log_planner_stats' option and found out that the planner execution times are highly affected by the hanging transaction. Here's the typical output in the initial situation:

2013-09-17 21:54:59 UTC LOG:  PLANNER STATISTICS
2013-09-17 21:54:59 UTC DETAIL:  ! system usage stats:
        !       0.000137 elapsed 0.000000 user 0.000000 system sec
        !       [2.169670 user 0.383941 sys total]
        !       0/0 [0/11520] filesystem blocks in/out
        !       0/0 [0/7408] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       0/0 [1362/7648] voluntary/involuntary context switches

And here's a typical output with a hanging transaction:

2013-09-17 21:56:12 UTC LOG:  PLANNER STATISTICS
2013-09-17 21:56:12 UTC DETAIL:  ! system usage stats:
        !       0.027251 elapsed 0.008999 user 0.001000 system sec
        !       [32.722025 user 3.550460 sys total]
        !       0/0 [0/115128] filesystem blocks in/out
        !       0/0 [0/7482] page faults/reclaims, 0 [0] swaps
        !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
        !       1/6 [12817/80202] voluntary/involuntary context switches

As you can see, it takes over 100 times longer when the extra transaction is open!

Any ideas why that is? It dramatically affects total query time.

I know that using prepared statements solves the problem completely, so we can deal with it, but we've already spent a lot of time getting this far and I'm personally a bit curious what the fundamental reason for such behavior is.
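
For completeness, the prepared-statement workaround looks roughly like this (a sketch using server-side PREPARE/EXECUTE; the statement name is arbitrary):

# Prepare once per connection; the server can then reuse a cached plan
# instead of planning every SELECT from scratch.
db.execute('prepare get_offer (bigint) as '
           'select offers.id, offers.name, categories.id, categories.name '
           'from offers left join categories on categories.id = offers.category_id '
           'where offers.id = $1')

while True:
    id = random.randrange(1, 1000 * 1000)
    db.execute('execute get_offer (%s)', (id,))
    print db.fetchall()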

I'll be very thankful for any explanation of what's going on here!

Thanks,
BR
Hi Bart,

You are doing heavy random reads in addition to a 1000k row insert
within a single transaction.

Also, it is not clear whether your query within the Python loop is itself
running inside a transaction (i.e. setting psycopg2.connection.autocommit to
True disables transactional queries).

Depending on your pg adapter, it may open a transaction by default even
though it may not be required.

So please clarify whether this is the case.

Regards,

Julian.


Re: Planner performance extremely affected by an hanging transaction (20-30 times)?

From: Bartłomiej Romański
Hi Julian,

Here's my complete python script:

import psycopg2
import random
import math
import time

connection = psycopg2.connect('host=localhost dbname=pgtest user=pgtest password=pgtest')
cursor = connection.cursor()

while True:
    id = random.randrange(1, 1000 * 1000)
    cursor.execute('select offers.id, offers.name, categories.id, categories.name from offers left join categories on categories.id = offers.category_id where offers.id = %s', (id,))
    print cursor.fetchall()

So I assume that each of the 20 instances opens and uses a single transaction. I've just tested the option of setting connection.autocommit = True at the beginning, but it does not change anything. Also, in production (where we first noticed the problem) we use a new transaction for every select.
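
For completeness, the autocommit variant differs from the script above only in the connection setup, roughly:

connection = psycopg2.connect('host=localhost dbname=pgtest user=pgtest password=pgtest')
connection.autocommit = True   # every SELECT then runs outside an explicit transaction
cursor = connection.cursor()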

I start 20 instances of this python script (I use pv to measure performance):

parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null

And then in a second console, I open psql and type: 

pgtest=> begin; insert into categories (name) select 'category_' || x from generate_series(1,1000) as x;

The QPS displayed by the first command drops immediately by 20-30 times and stays low as long as the transaction with the insert is open.

Here's the script that I use to initialize the database. You should be able to replicate the situation easily yourself if you want.

drop database pgtest;
drop user pgtest;
create user pgtest with password 'pgtest';
create database pgtest;
grant all privileges on database pgtest to pgtest;
\connect pgtest
CREATE TABLE categories
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  CONSTRAINT categories_pkey PRIMARY KEY (id)
);
CREATE TABLE offers
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  category_id bigint NOT NULL,
  CONSTRAINT offers_pkey PRIMARY KEY (id)
);
insert into categories (name) select 'category_' || x from generate_series(1,1000) as x;
insert into offers (name, category_id) select 'offer_' || x, floor(random() * 1000) + 1 from generate_series(1,1000*1000) as x;
alter table categories owner to pgtest;
alter table offers owner to pgtest;

Thanks,
Bartek





On Thu, Sep 19, 2013 at 5:49 PM, Bartłomiej Romański <br@sentia.pl> wrote:

Finally, we created a python script to make simple queries in a loop:

while True:
    id = random.randrange(1, 1000 * 1000)
    db.execute('select offers.id, offers.name, categories.id, categories.name from offers left join categories on categories.id = offers.category_id where offers.id = %s', (id,))
    print db.fetchall()

We start 20 instances simultaneously and measure performance:

parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null

Normally we observe about 30k QPS, which is a satisfying result (without any tuning at all).

The problem occurs when we open a second console, start psql and type:

pgtest=> begin; insert into categories (name) select 'category_' || x from generate_series(1,1000) as x;


Related topics have been discussed recently, but without much apparent resolution.

See "In progress INSERT wrecks plans on table" and "Performance bug in prepared statement binding in 9.2" also on this list

The issues are:

1) The planner actually queries the relation to find the end points of the variable ranges, rather than using potentially out-of-date statistics.

2) When doing so, it needs to wade through the "in-progress" rows, figuring out whether the owning transaction is still in progress or has already committed or aborted.  If the owning transaction *has* committed or rolled back, then it can set hint bits so that future executions don't need to do this.  But if the owning transaction is still open, then the querying transaction has done the work but is not able to set any hint bits, so other executions also need to do the work, repeatedly, until the other transaction finishes.

3) Even worse, asking if a given transaction has finished yet can be a serious point of system-wide contention, because it takes the ProcArrayLock, once per row which needs to be checked.  So you have 20 processes all fighting over the ProcArrayLock, each doing so 1000 times per query.

One idea (from Simon, I think) was to remember that a transaction was just checked and was in progress, and not check it again for future rows.  Later the transaction might have committed, but since it would have committed after we took the snapshot, thinking it is still in progress would not be a correctness problem; it would just needlessly delay setting the hint bits.

Another idea was not to check whether it is in progress at all, because if it is in the snapshot it doesn't matter whether it is still in progress.  This would be a slightly more aggressive way to delay setting the hint bits (but it would also delay doing the work needed to figure out how to set them).

Items 2 and 3 can also arise in situations other than paired with 1.


Cheers,

Jeff

On 21/09/2013, at 00.01, Jeff Janes <jeff.janes@gmail.com> wrote:
See "In progress INSERT wrecks plans on table" and "Performance bug in prepared statement binding in 9.2" also on this list

This feels like the same issue.




The issues are:

1) The planner actually queries the relation to find the end points of the variable ranges, rather than using potentially out-of-date statistics.


In my app I would prefer potentially out-of-date statistics instead.

Jesper

Re: Planner performance extremely affected by an hanging transaction (20-30 times)?

From: Kevin Grittner
Jeff Janes <jeff.janes@gmail.com> wrote:

> 1) The planner actually queries the relation to find the end
> points of the variable ranges, rather than using potentially
> out-of-date statistics.

Are we talking about the probe for the end (or beginning) of an
index?  If so, should we even care about visibility of the row
related to the most extreme index entry?  Should we even go to the
heap during the plan phase?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Kevin Grittner <kgrittn@ymail.com> writes:
> Are we talking about the probe for the end (or beginning) of an
> index?  If so, should we even care about visibility of the row
> related to the most extreme index entry?  Should we even go to the
> heap during the plan phase?

Consider the case where some transaction inserted a wildly out-of-range
value, then rolled back.  If we don't check validity of the heap row,
we'd be using that silly endpoint value for planning purposes ---
indefinitely.  That's not an improvement over the situation that the
probe is meant to fix.

            regards, tom lane


Re: Planner performance extremely affected by an hanging transaction (20-30 times)?

From: jesper@krogh.cc
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Are we talking about the probe for the end (or beginning) of an
>> index?  If so, should we even care about visibility of the row
>> related to the most extreme index entry?  Should we even go to the
>> heap during the plan phase?
>
> Consider the case where some transaction inserted a wildly out-of-range
> value, then rolled back.  If we don't check validity of the heap row,
> we'd be using that silly endpoint value for planning purposes ---
> indefinitely.  That's not an improvement over the situation that the
> probe is meant to fix.

Apparently it is waiting for locks; can't the check be made in a
"non-blocking" way, so that if it ends up waiting for a lock it just
assumes the row is non-visible and moves on to the next one without blocking?

This stuff is a 9.2 feature, right? What was the original problem to be
addressed?

--
Jesper



On 09/24/2013 08:01 AM, jesper@krogh.cc wrote:
> This stuff is a 9.2 feature, right? What was the original problem to be
> addressed?

Earlier, actually.  9.1?  9.0?

The problem addressed was that, for tables with a "progressive" value
like a sequence or a timestamp, the planner tended to estimate 1 row any
time the user queried the 10,000 most recent rows due to the stats being
out-of-date.  This resulted in some colossally bad query plans for a
very common situation.

So there's no question that the current behavior is an improvement,
since it affects *only* users who have left an idle transaction open for
long periods of time, something you're not supposed to do anyway.  Not
that we shouldn't fix it (and backport the fix), but we don't want to
regress to the prior planner behavior.

However, a solution is not readily obvious:

On 09/24/2013 03:35 AM, Tom Lane wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> > Are we talking about the probe for the end (or beginning) of an
>> > index?  If so, should we even care about visibility of the row
>> > related to the most extreme index entry?  Should we even go to the
>> > heap during the plan phase?
> Consider the case where some transaction inserted a wildly out-of-range
> value, then rolled back.  If we don't check validity of the heap row,
> we'd be using that silly endpoint value for planning purposes ---
> indefinitely.  That's not an improvement over the situation that the
> probe is meant to fix.

Agreed.  And I'll also attest that the patch did fix a chronic bad
planner issue.

On 09/20/2013 03:01 PM, Jeff Janes wrote:
> 3) Even worse, asking if a given transaction has finished yet can be a
> serious point of system-wide contention, because it takes the
> ProcArrayLock, once per row which needs to be checked.  So you have 20
> processes all fighting over the ProcArrayLock, each doing so 1000 times per
> query.

Why do we need a procarraylock for this?  Seems like the solution would
be not to take a lock at all; the information on transaction commit is
in the clog, after all.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Hi


On Tue, Sep 24, 2013 at 5:01 PM, <jesper@krogh.cc> wrote:

Apparently it is waiting for locks; can't the check be made in a
"non-blocking" way, so that if it ends up waiting for a lock it just
assumes the row is non-visible and moves on to the next one without blocking?

Not only; it's one reason, but you can get the same slowdown with only one client and a bigger insert.

The issue is that a btree search for one value degenerates into a linear search over 1000 or more tuples.

As a matter of fact you get the same slowdown after a rollback until autovacuum runs, and if autovacuum can't keep up...

Didier

On Tue, Sep 24, 2013 at 11:03 AM, didier <did447@gmail.com> wrote:
Hi


On Tue, Sep 24, 2013 at 5:01 PM, <jesper@krogh.cc> wrote:

Apparently it is waiting for locks; can't the check be made in a
"non-blocking" way, so that if it ends up waiting for a lock it just
assumes the row is non-visible and moves on to the next one without blocking?

Not only; it's one reason, but you can get the same slowdown with only one client and a bigger insert.

The issue is that a btree search for one value degenerates into a linear search over 1000 or more tuples.

As a matter of fact you get the same slowdown after a rollback until autovacuum runs, and if autovacuum can't keep up...

Have you experimentally verified the last part?  btree indices have some special kill-tuple code which should remove aborted tuples from the index the first time they are encountered, without need for a vacuum.

Cheers,

Jeff 

Re: Planner performance extremely affected by an hanging transaction (20-30 times)?

From: Bartłomiej Romański
As a matter of fact you get the same slowdown after a rollback until autovacuum runs, and if autovacuum can't keep up...

Actually, this is not what we observe. The performance goes back to the normal level immediately after committing or aborting the transaction.




Hi


On Wed, Sep 25, 2013 at 1:30 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Sep 24, 2013 at 11:03 AM, didier <did447@gmail.com> wrote:

As a matter of fact you get the same slowdown after a rollback until autovacuum runs, and if autovacuum can't keep up...

Have you experimentally verified the last part?  btree indices have some special kill-tuple code which should remove aborted tuples from the index the first time they are encountered, without need for a vacuum.

Cheers,

Jeff
 
 
 

Yes, my bad: it works, but there's leftover junk and a vacuum is still needed.

Running the above test with autovacuum off, 1 client, and a 50,000-row insert on a PostgreSQL 9.4 git build:

Before the insert: 2,000 queries/s
After the insert: 80/s
After the rollback: 800/s (back to 2,000/s if committed)
After vacuum: 2,000/s again. Vacuum output:

INFO:  vacuuming "public.categories"                                                                                                                
INFO:  scanned index "categories_pkey" to remove 50000 row versions                                                                                 
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.                                                                                                      
INFO:  "categories": removed 50000 row versions in 319 pages                                                                                        
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.                                                                                                      
INFO:  index "categories_pkey" now contains 1000 row versions in 278 pages                                                                          
DETAIL:  50000 index row versions were removed.                                                                                                     
272 index pages have been deleted, 136 are currently reusable.                                                                                      
CPU 0.00s/0.00u sec elapsed 0.00 sec.                                                                                                               
INFO:  "categories": found 50000 removable, 1000 nonremovable row versions in 325 out of 325 pages                                                  
DETAIL:  0 dead row versions cannot be removed yet.                                                                                                 
There were 0 unused item pointers.                                                                                                                  
0 pages are entirely empty.                                                                                                                         
CPU 0.00s/0.01u sec elapsed 0.02 sec.                                                                                                               
INFO:  "categories": stopping truncate due to conflicting lock request                                                                              
INFO:  vacuuming "pg_toast.pg_toast_16783"                                                                                                          
INFO:  index "pg_toast_16783_index" now contains 0 row versions in 1 pages                                                                          
DETAIL:  0 index row versions were removed.                                                                                                         
0 index pages have been deleted, 0 are currently reusable.                                                                                          
CPU 0.00s/0.00u sec elapsed 0.00 sec.                                                                                                               
INFO:  "pg_toast_16783": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages                                                         
DETAIL:  0 dead row versions cannot be removed yet.                                                                                                 
There were 0 unused item pointers.                                                                                                                  
0 pages are entirely empty.                                                                                                                         
CPU 0.00s/0.00u sec elapsed 0.00 sec.                                                                                                               
INFO:  analyzing "public.categories"                                                                                                                
INFO:  "categories": scanned 325 of 325 pages, containing 1000 live rows and 0 dead rows; 1000 rows in sample, 1000 estimated total rows         
  
perf output after rollback but before vacuum
    93.36%         postgres  /var/lib/vz/root/165/usr/local/pgsql/bin/postgres    
                |         
                |--41.51%-- _bt_checkkeys
                |          |         
                |          |--93.03%-- _bt_readpage
                |          |          |         
                |          |          |--97.46%-- _bt_steppage
                |          |          |          _bt_first
                |          |          |          btgettuple
                |          |          |          FunctionCall2Coll
                |          |          |          index_getnext_tid


Didier
On Tue, Sep 24, 2013 at 3:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kevin Grittner <kgrittn@ymail.com> writes:
> Are we talking about the probe for the end (or beginning) of an
> index?  If so, should we even care about visibility of the row
> related to the most extreme index entry?  Should we even go to the
> heap during the plan phase?

Consider the case where some transaction inserted a wildly out-of-range
value, then rolled back.  If we don't check validity of the heap row,
we'd be using that silly endpoint value for planning purposes ---
indefinitely.


Would it really be indefinite?  Would it be any different from the case where someone inserted a wild value, committed, then deleted it and committed that?  It seems like eventually the histogram would have to get rebuilt with the ability to shrink the range.

To get really complicated, it could stop at an in-progress tuple and use its value for immediate purposes, but suppress storing it in the histogram (storing only committed, not in-progress, values).

Cheers,

Jeff
On Tue, Sep 24, 2013 at 10:43 AM, Josh Berkus <josh@agliodbs.com> wrote:
On 09/24/2013 08:01 AM, jesper@krogh.cc wrote:
> This stuff is a 9.2 feature, right? What was the original problem to be
> addressed?

Earlier, actually.  9.1?  9.0?

The problem addressed was that, for tables with a "progressive" value
like a sequence or a timestamp, the planner tended to estimate 1 row any
time the user queried the 10,000 most recent rows due to the stats being
out-of-date.  This resulted in some colossally bad query plans for a
very common situation.

So there's no question that the current behavior is an improvement,
since it affects *only* users who have left an idle transaction open for
long periods of time, something you're not supposed to do anyway.

Some transactions just take a long time to complete their work.  If the first thing such a transaction does is insert these poisoned values and then it goes on to do other intensive work on other tables, it can do some serious damage without being idle.

Not that we shouldn't fix it (and backport the fix), but we don't want to
regress to the prior planner behavior.

However, a solution is not readily obvious:

The mergejoinscansel code is almost pathologically designed to exercise this case (which seems to be what is happening for the original poster), because it systematically probes the highest and lowest values from one table against the other.  If they have the same range, that means it will always be testing the upper limit.  Perhaps mergejoinscansel could pass a flag to prevent the look-up from happening.  My gut feeling is that mergejoin itself would not be very sensitive to the progressive-value issue, but I can't really back that up.  On the other hand, if we could just make getting the actual value faster then everyone would be better off.

On 09/20/2013 03:01 PM, Jeff Janes wrote:
> 3) Even worse, asking if a given transaction has finished yet can be a
> serious point of system-wide contention, because it takes the
> ProcArrayLock, once per row which needs to be checked.  So you have 20
> processes all fighting over the ProcArrayLock, each doing so 1000 times per
> query.

Why do we need a procarraylock for this?  Seems like the solution would
be not to take a lock at all; the information on transaction commit is
in the clog, after all.

My understanding is that you are not allowed to check the clog until after you verify the transaction is no longer in progress, otherwise you open up race conditions.
 
Cheers,

Jeff
On 09/25/2013 12:06 AM, Jeff Janes wrote:
>> Why do we need a procarraylock for this?  Seems like the solution would
>> be not to take a lock at all; the information on transaction commit is
>> in the clog, after all.
>>
>
> My understanding is that you are not allowed to check the clog until after
> you verify the transaction is no longer in progress, otherwise you open up
> race conditions.

In this particular case, I'd argue that we don't care about race
conditions -- it's a plan estimate.  We certainly care about them a lot
less than lock-blocks.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Planner performance extremely affected by an hanging transaction (20-30 times)?

From: Andres Freund
On 2013-09-25 00:06:06 -0700, Jeff Janes wrote:
> > On 09/20/2013 03:01 PM, Jeff Janes wrote:
> > > 3) Even worse, asking if a given transaction has finished yet can be a
> > > serious point of system-wide contention, because it takes the
> > > ProcArrayLock, once per row which needs to be checked.  So you have 20
> > > processes all fighting over the ProcArrayLock, each doing so 1000 times per
> > > query.

That should be gone in master, we don't use SnapshotNow anymore which
had those TransactionIdIsInProgress() calls you're probably referring
to. The lookups discussed in this thread now use the statement's
snapshot. And all those have their own copy of the currently running
transactions.

> > Why do we need a procarraylock for this?  Seems like the solution would
> > be not to take a lock at all; the information on transaction commit is
> > in the clog, after all.

More clog accesses would hardly improve the situation.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


On Wed, Sep 25, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-25 00:06:06 -0700, Jeff Janes wrote:
> > On 09/20/2013 03:01 PM, Jeff Janes wrote:
> > > 3) Even worse, asking if a given transaction has finished yet can be a
> > > serious point of system-wide contention, because it takes the
> > > ProcArrayLock, once per row which needs to be checked.  So you have 20
> > > processes all fighting over the ProcArrayLock, each doing so 1000 times per
> > > query.

That should be gone in master, we don't use SnapshotNow anymore which
had those TransactionIdIsInProgress() calls you're probably referring
to. The lookups discussed in this thread now use the statement's
snapshot. And all those have their own copy of the currently running
transactions.


See HeapTupleSatisfiesMVCC, near line 943 of tqual.c:

        else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
            return false;
        else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
            SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                        HeapTupleHeaderGetXmin(tuple));
 

If we guarded that check by moving up line 961 to before 943:

    if (XidInMVCCSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot))
        return false;           /* treat as still in progress */

Then we could avoid the contention, as that check only refers to local memory. 

As far as I can tell, the only downside of doing that is that, since hint bits might be set later, it is possible some dirty pages will get written unhinted and then re-dirtied by the hint bit setting, when more aggressive setting would have only one combined dirty write instead.  But that seems rather hypothetical, and if it really is a problem we should probably tackle it directly rather than by barring other optimizations.

Cheers,

Jeff

Re: Planner performance extremely affected by an hanging transaction (20-30 times)?

From: Andres Freund
On 2013-09-25 11:17:51 -0700, Jeff Janes wrote:
> On Wed, Sep 25, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> > On 2013-09-25 00:06:06 -0700, Jeff Janes wrote:
> > > > On 09/20/2013 03:01 PM, Jeff Janes wrote:
> > > > > 3) Even worse, asking if a given transaction has finished yet can be a
> > > > > serious point of system-wide contention, because it takes the
> > > > > ProcArrayLock, once per row which needs to be checked.  So you have 20
> > > > > processes all fighting over the ProcArrayLock, each doing so 1000 times per
> > > > > query.
> >
> > That should be gone in master, we don't use SnapshotNow anymore which
> > had those TransactionIdIsInProgress() calls you're probably referring
> > to. The lookups discussed in this thread now use the statement's
> > snapshot. And all those have their own copy of the currently running
> > transactions.

> See HeapTupleSatisfiesMVCC, near line 943 of tqual.c:
>
>         else if (TransactionIdIsInProgress(HeapTupleHeaderGetXmin(tuple)))
>             return false;
>         else if (TransactionIdDidCommit(HeapTupleHeaderGetXmin(tuple)))
>             SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
>                         HeapTupleHeaderGetXmin(tuple));
>

Hm, sorry, misremembered things a bit there.

> If we guarded that check by moving up line 961 to before 943:
>
>     if (XidInMVCCSnapshot(HeapTupleHeaderGetXmin(tuple), snapshot))
>         return false;           /* treat as still in progress */
>
> Then we could avoid the contention, as that check only refers to local
> memory.

That wouldn't be correct afaics - the current TransactionIdIsInProgress
callsite is only called when no HEAP_XMIN_COMMITTED was set. So you
would have to duplicate it.

> As far as I can tell, the only downside of doing that is that, since hint
> bits might be set later, it is possible some dirty pages will get written
> unhinted and then re-dirtied by the hint bit setting, when more aggressive
> setting would have only one combined dirty write instead.  But that seems
> rather hypothetical, and if it really is a problem we should probably
> tackle it directly rather than by barring other optimizations.

I am - as evidenced - too tired to think about this properly, but I
think you might be right here.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Andres, Jeff,


>> As far as I can tell, the only downside of doing that is that, since hint
>> bits might be set later, it is possible some dirty pages will get written
>> unhinted and then re-dirtied by the hint bit setting, when more aggressive
>> setting would have only one combined dirty write instead.  But that seems
>> rather hypothetical, and if it really is a problem we should probably
>> tackle it directly rather than by barring other optimizations.
>
> I am - as evidenced - too tired to think about this properly, but I
> think you might be right here.

Any thoughts on a fix for this we could get into 9.2.5?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Planner performance extremely affected by an hanging transaction (20-30 times)?

From: Andres Freund
On 2013-09-27 13:57:02 -0700, Josh Berkus wrote:
> Andres, Jeff,
>
>
> >> As far as I can tell, the only downside of doing that is that, since hint
> >> bits might be set later, it is possible some dirty pages will get written
> >> unhinted and then re-dirtied by the hint bit setting, when more aggressive
> >> setting would have only one combined dirty write instead.  But that seems
> >> rather hypothetical, and if it really is a problem we should probably
> >> tackle it directly rather than by barring other optimizations.
> >
> > I am - as evidenced - too tired to think about this properly, but I
> > think you might be right here.
>
> Any thoughts on a fix for this we could get into 9.2.5?

I don't see much chance to apply anything like this in a
backbranch. Changing IO patterns in a noticeable way in a minor release
is just asking for trouble.

Also, this really isn't going to fix the issue discussed here - this was
just about the additional ProcArrayLock contention. I don't think it
would change anything dramatical in your case.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Andres Freund <andres@2ndquadrant.com> writes:
> Also, this really isn't going to fix the issue discussed here - this was
> just about the additional ProcArrayLock contention. I don't think it
> would change anything dramatical in your case.

All of these proposals are pretty scary for back-patching purposes,
anyway.  I think what we should consider doing is just changing
get_actual_variable_range() to use a cheaper snapshot type, as in
the attached patch (which is for 9.3 but applies to 9.2 with slight
offset).  On my machine, this seems to make the pathological behavior
in BR's test case go away just fine.  I'd be interested to hear what
it does in the real-world scenarios being complained of.

            regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d8c1a88..d1b9473 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** get_actual_variable_range(PlannerInfo *r
*** 4951,4956 ****
--- 4951,4957 ----
              HeapTuple    tup;
              Datum        values[INDEX_MAX_KEYS];
              bool        isnull[INDEX_MAX_KEYS];
+             SnapshotData SnapshotDirty;

              estate = CreateExecutorState();
              econtext = GetPerTupleExprContext(estate);
*************** get_actual_variable_range(PlannerInfo *r
*** 4973,4978 ****
--- 4974,4980 ----
              slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRel));
              econtext->ecxt_scantuple = slot;
              get_typlenbyval(vardata->atttype, &typLen, &typByVal);
+             InitDirtySnapshot(SnapshotDirty);

              /* set up an IS NOT NULL scan key so that we ignore nulls */
              ScanKeyEntryInitialize(&scankeys[0],
*************** get_actual_variable_range(PlannerInfo *r
*** 4989,4996 ****
              /* If min is requested ... */
              if (min)
              {
!                 index_scan = index_beginscan(heapRel, indexRel, SnapshotNow,
!                                              1, 0);
                  index_rescan(index_scan, scankeys, 1, NULL, 0);

                  /* Fetch first tuple in sortop's direction */
--- 4991,5013 ----
              /* If min is requested ... */
              if (min)
              {
!                 /*
!                  * In principle, we should scan the index with our current
!                  * active snapshot, which is the best approximation we've got
!                  * to what the query will see when executed.  But that won't
!                  * be exact if a new snap is taken before running the query,
!                  * and it can be very expensive if a lot of uncommitted rows
!                  * exist at the end of the index (because we'll laboriously
!                  * fetch each one and reject it).  What seems like a good
!                  * compromise is to use SnapshotDirty.    That will accept
!                  * uncommitted rows, and thus avoid fetching multiple heap
!                  * tuples in this scenario.  On the other hand, it will reject
!                  * known-dead rows, and thus not give a bogus answer when the
!                  * extreme value has been deleted; that case motivates not
!                  * using SnapshotAny here.
!                  */
!                 index_scan = index_beginscan(heapRel, indexRel,
!                                              &SnapshotDirty, 1, 0);
                  index_rescan(index_scan, scankeys, 1, NULL, 0);

                  /* Fetch first tuple in sortop's direction */
*************** get_actual_variable_range(PlannerInfo *r
*** 5021,5028 ****
              /* If max is requested, and we didn't find the index is empty */
              if (max && have_data)
              {
!                 index_scan = index_beginscan(heapRel, indexRel, SnapshotNow,
!                                              1, 0);
                  index_rescan(index_scan, scankeys, 1, NULL, 0);

                  /* Fetch first tuple in reverse direction */
--- 5038,5045 ----
              /* If max is requested, and we didn't find the index is empty */
              if (max && have_data)
              {
!                 index_scan = index_beginscan(heapRel, indexRel,
!                                              &SnapshotDirty, 1, 0);
                  index_rescan(index_scan, scankeys, 1, NULL, 0);

                  /* Fetch first tuple in reverse direction */

I wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> Also, this really isn't going to fix the issue discussed here - this was
>> just about the additional ProcArrayLock contention. I don't think it
>> would change anything dramatical in your case.

> All of these proposals are pretty scary for back-patching purposes,
> anyway.  I think what we should consider doing is just changing
> get_actual_variable_range() to use a cheaper snapshot type, as in
> the attached patch (which is for 9.3 but applies to 9.2 with slight
> offset).  On my machine, this seems to make the pathological behavior
> in BR's test case go away just fine.  I'd be interested to hear what
> it does in the real-world scenarios being complained of.

Well, it's three months later, and none of the people who were complaining
so vociferously in this thread seem to have bothered to test the proposed
solution.

However, over at
http://www.postgresql.org/message-id/CAFj8pRDHyAK_2JHSVKZ5YQNGQmFGVcJKcpBXhFaS=vSSCH-vNw@mail.gmail.com
Pavel did test it and reported that it successfully alleviates his
real-world problem.  So I'm now inclined to commit this.  Objections?

            regards, tom lane


On 02/25/2014 08:06 AM, Tom Lane wrote:
> Well, it's three months later, and none of the people who were complaining
> so vociferously in this thread seem to have bothered to test the proposed
> solution.

Sorry about that.  The client lost interest once they had a workaround
(fixing the hanging transaction), and I don't have direct access to
their test workload -- nor was I able to reproduce the issue on a purely
synthetic workload.

> However, over at
> http://www.postgresql.org/message-id/CAFj8pRDHyAK_2JHSVKZ5YQNGQmFGVcJKcpBXhFaS=vSSCH-vNw@mail.gmail.com
> Pavel did test it and reported that it successfully alleviates his
> real-world problem.  So I'm now inclined to commit this.  Objections?

None from me.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com