Re: Queue table that quickly grows causes query planner to choose poor plan - Mailing list pgsql-performance

From David Wheeler
Subject Re: Queue table that quickly grows causes query planner to choose poor plan
Msg-id DEEA145E-68E0-4644-9105-66343FBA71A6@dgitsystems.com
In response to Re: Queue table that quickly grows causes query planner to choose poor plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hi Tom, 

Thanks for your reply, that’s very helpful and informative. 

> Although there's no way to have any useful pg_statistic stats if you won't
> do an ANALYZE, the planner nonetheless can see the table's current
> physical size, and what it normally does is to multiply the last-reported
> tuple density (reltuples/relpages) by the current size.  So if you're
> getting an "empty table" estimate anyway, I have to suppose that the
> table's state involves reltuples = 0 and relpages > 0.  That's not a
> good place to be in; it constrains the planner to believe that the table
> is in fact devoid of tuples, because that's what the last ANALYZE saw.

That appears to be correct. I had assumed that because the table was analyzed and found to be empty, autovacuum would probably have cleared all the tuples too, but that's not the case.

 relpages |  reltuples
----------+-------------
        0 | 2.33795e+06
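
(Those figures come straight from pg_class; something like the query below should reproduce them. The queue table name is from my original report.)

    SELECT relpages, reltuples
    FROM pg_class
    WHERE oid = 'queue'::regclass;

Per your explanation, when relpages > 0 the planner scales the reltuples/relpages density by the table's current physical size to estimate the row count.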

> I am not following your aversion to sticking an ANALYZE in there,
> either.  It's not like inserting 30 million rows would be free.

There are many usage profiles for these tables. Sometimes there will be a single insert of 30 million rows; sometimes several inserts of up to 100 million rows each, in different threads; and sometimes many (~80,000) inserts of 0 rows, for which an ANALYZE is simply a waste. I don't want to cause an undue performance penalty for the other usage profiles.

But as Justin rightly points out, I can selectively ANALYZE only when more than x rows are inserted, which I think is the best way forward (sketched below).
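
A minimal sketch of that approach, for what it's worth. The table definition, the generate_series stand-in for the real load, and the 100,000-row threshold are all assumptions for illustration, not our actual loading code:

    CREATE TABLE IF NOT EXISTS queue (id bigint);

    DO $$
    DECLARE
        n bigint;
    BEGIN
        -- stand-in for the real bulk load
        INSERT INTO queue (id)
        SELECT g FROM generate_series(1, 30000000) AS g;

        GET DIAGNOSTICS n = ROW_COUNT;

        -- Skip the ANALYZE for the frequent 0-row batches; only pay
        -- for it when the batch is large enough to matter to the
        -- planner. ANALYZE (unlike VACUUM) can run inside a
        -- transaction, so it works from plpgsql.
        IF n > 100000 THEN
            ANALYZE queue;
        END IF;
    END;
    $$;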

David Wheeler
Software developer



dwheeler@dgitsystems.com
D +61 3 9663 3554  W http://dgitsystems.com
Level 8, 620 Bourke St, Melbourne VIC 3000.


On 28 Jun 2018, at 4:27 am, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> David Wheeler <dwheeler@dgitsystems.com> writes:
>> I'm having performance trouble with a particular set of queries. It goes a bit like this
>>
>> 1) queue table is initially empty, and very narrow (1 bigint column)
>> 2) we insert ~30 million rows into queue table
>> 3) we do a join with queue table to delete from another table (delete from a using queue where a.id = queue.id), but postgres stats say that queue table is empty, so it uses a nested loop over all 30 million rows, taking forever
>
> Although there's no way to have any useful pg_statistic stats if you won't
> do an ANALYZE, the planner nonetheless can see the table's current
> physical size, and what it normally does is to multiply the last-reported
> tuple density (reltuples/relpages) by the current size.  So if you're
> getting an "empty table" estimate anyway, I have to suppose that the
> table's state involves reltuples = 0 and relpages > 0.  That's not a
> good place to be in; it constrains the planner to believe that the table
> is in fact devoid of tuples, because that's what the last ANALYZE saw.
>
> Now, the initial state for a freshly-created or freshly-truncated table
> is *not* that.  It is reltuples = 0 and relpages = 0, representing an
> undefined tuple density.  Given that, the planner will make some guess
> about average tuple size --- which is likely to be a very good guess,
> for a table with only fixed-width columns --- and then compute a rowcount
> estimate using that plus the observed physical size.
>
> So I think your problem comes from oscillating between really-empty
> and not-at-all-empty, and not using an idiomatic way of going back
> to the empty state.  Have you tried using TRUNCATE instead of DELETE?

>> This queue table is empty 99% of the time, and the query in (3) runs immediately after step 2. Is there any likelihood that tweaking the autoanalyze params would help in this case? I don't want to explicitly analyze the table between steps 2 and 3 either, as there are other patterns of use where, for example, 0 rows are inserted in step 2 and this is expected to run very quickly. Do I have any other options?

> I am not following your aversion to sticking an ANALYZE in there,
> either.  It's not like inserting 30 million rows would be free.
>
>             regards, tom lane

