Re: Planner performance extremely affected by an hanging transaction (20-30 times)? - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Planner performance extremely affected by an hanging transaction (20-30 times)?
Date
Msg-id CAMkU=1wjnoFOrzVqLpUg2Ja=+kd_sv2HP4+cF-Y7r4a3uHT-tg@mail.gmail.com
Whole thread Raw
In response to Planner performance extremely affected by an hanging transaction (20-30 times)?  (Bartłomiej Romański <br@sentia.pl>)
Responses Re: Planner performance extremely affected by an hanging transaction (20-30 times)?  (Jesper Krogh <jesper@krogh.cc>)
Re: Planner performance extremely affected by an hanging transaction (20-30 times)?  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-performance
On Thu, Sep 19, 2013 at 5:49 PM, Bartłomiej Romański <br@sentia.pl> wrote:

Finally, we created a python script to make simple queries in a loop:

while True:
    id = random.randrange(1, 1000 * 1000)
    db.execute('select offers.idoffers.namecategories.idcategories.name from offers left join categories on categories.id = offers.category_id where offers.id = %s', (id,))
    print db.fetchall()

We start 20 instances simultaneously and measure performance:

parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null

Normally we observe about 30k QPS what's a satisfying result (without any tuning at all).

The problem occurs when we open a second console, start psql and type:

pgtest=> begin; insert into categories (name) select 'category_' || x from generate_series(1,1000) as x;


Related topics have been discussed recently, but without much apparent resolution.

See "In progress INSERT wrecks plans on table" and "Performance bug in prepared statement binding in 9.2" also on this list

The issues are:

1) The planner actually queries the relation to find the end points of the variable ranges, rather than using potentially out-of-date statistics.

2) When doing so, it needs to wade through the "in-progress" rows, figuring out whether the owning transaction is still in progress or has already committed or aborted.  If the owning transaction *has* committed or rolled back, then it can set hint bits so that future executions don't need to do this.  But if the owning transaction is still open, then the querying transaction has done the work, but is not able to set any hint bits so other executions also need to do the work, repeatedly until the other transactions finishes.

3) Even worse, asking if a given transaction has finished yet can be a serious point of system-wide contention, because it takes the ProcArrayLock, once per row which needs to be checked.  So you have 20 processes all fighting over the ProcArrayLock, each doing so 1000 times per query.

One idea (from Simon, I think) was to remember that a transaction was just checked and was in progress, and not checking it again for future rows.  In the future the transaction might have committed, but since it would have committed after we took the snapshot, thinking it is still in progress would not be a correctness problem, it would just needlessly delay setting the hint bits.  

Another idea was not to check if it were in progress at all, because if it is in the snapshot it doesn't matter if it is still in progress.  This would a slightly more aggressive way to delay setting the hint bit (but also delay doing the work needed to figure out how to set them).

Items 2 and 3 and can also arise in situations other than paired with 1.


Cheers,

Jeff

pgsql-performance by date:

Previous
From: David Whittaker
Date:
Subject: Re: Intermittent hangs with 9.2
Next
From: Jesper Krogh
Date:
Subject: Re: Planner performance extremely affected by an hanging transaction (20-30 times)?