Re: Question about optimizing access to a table. - Mailing list pgsql-general

From Jeff Janes
Subject Re: Question about optimizing access to a table.
Date
Msg-id CAMkU=1xZAtKddZngO4=YW7okHD06weYKBg53hEuEcGzXteKwyw@mail.gmail.com
In response to Question about optimizing access to a table.  (Herouth Maoz <herouth@unicell.co.il>)
Responses Re: Question about optimizing access to a table.  (Herouth Maoz <herouth@unicell.co.il>)
List pgsql-general

On Tue, Dec 10, 2013 at 8:23 AM, Herouth Maoz <herouth@unicell.co.il> wrote:

Hello.

I have one particular table with a very specialized use. I am sending messages to some partner. The partner processes them asynchronously, and then returns the status report to me. The table stores a serialized version of the message object, together with a few identifiers, an expiry date, and a reference number from the partner. When the status report comes back from the partner, we:

  1. Select the record using the partner's reference number.
  2. Reconstruct and queue the object.
  3. Delete the record from the database using the serial primary key ID.

Where do you "queue" the object?  Back into a different table within the same database?  Why not use "DELETE ... RETURNING"?
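Something along these lines would do steps 1 and 3 in one round trip (a sketch: the table name "transient" is inferred from the index names below, "message" stands in for whatever column actually holds the serialized object, and $1/$2 are prepared-statement parameters):

    -- look up the row by the partner's reference and delete it,
    -- handing back the columns needed to reconstruct the object
    DELETE FROM transient
    WHERE msisdn = $1 AND smsc_reference = $2
    RETURNING id, message;

That also closes the race window between the separate SELECT and DELETE.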
 

Every once in a while we run a garbage collection process which selects expired messages, reconstructs and queues them for processing, and deletes the records using their primary keys.

This works wonderfully as long as the table remains small - a few thousand records waiting for their status reports, and that's it. The table is set to have frequent auto-analyze runs.

You might want to turn up autovacuum on this table as well, not just autoanalyze, so that the dead rows left behind by all those deletes are reclaimed promptly.
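Per-table settings can be used for that, along these lines (illustrative values only, and again assuming the table is named "transient"):

    -- vacuum and analyze after roughly 1% of the table changes,
    -- rather than the much laxer global defaults
    ALTER TABLE transient SET (
        autovacuum_vacuum_scale_factor  = 0.01,
        autovacuum_analyze_scale_factor = 0.01
    );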
 

The problem starts when our partner has some glitch, under high load, and fails to send back a few hundred thousand reports. In that case, the table grows to a few hundred thousand records, and they are not deleted until they hit their expiry date, at which point the "garbage collector" takes care of them and everything goes back to normal. When it contains hundreds of thousands of records, performance deteriorates considerably.

There is no inherent reason the performance needs to degrade.  Can you give the specific queries that perform worse?

 
I am trying to figure out a solution that will keep the system working well even when there is a large number of records in the table. At first I thought of partitioning the table on the partner's reference field. But the DELETE would actually slow down if I do this, right?

Not necessarily, but partitioning should be your last resort, not your first, and I don't see any reason it would be needed here.

Indexes:
    "transient_pkey" PRIMARY KEY, btree (id)
    "transient_msisdn_ref" btree (msisdn, smsc_reference)
    "transient_sm_vp" btree (validity)

What is the partner reference?  If it is smsc_reference, then you probably need an index in which that is the leading (or only) column.
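If so, something like this would cover it (again assuming the table is named "transient"; the index name is arbitrary):

    -- CONCURRENTLY avoids blocking writes while the index is built
    CREATE INDEX CONCURRENTLY transient_smsc_reference
        ON transient (smsc_reference);

The existing (msisdn, smsc_reference) index can't be used efficiently for lookups on smsc_reference alone, since smsc_reference is not its leading column.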

Cheers,

Jeff
 
