Re: UPDATE on two large datasets is very slow - Mailing list pgsql-general

From Scott Marlowe
Subject Re: UPDATE on two large datasets is very slow
Date
Msg-id 1175615047.22459.9.camel@state.g2switchworks.com
Whole thread Raw
In response to UPDATE on two large datasets is very slow  (Steve Gerhardt <ocean@ocean.fraknet.org>)
Responses Re: UPDATE on two large datasets is very slow  (Jonathan Vanasco <postgres@2xlp.com>)
Re: UPDATE on two large datasets is very slow  (Listmail <lists@peufeu.com>)
List pgsql-general
On Mon, 2007-04-02 at 22:24, Steve Gerhardt wrote:
> I've been working for the past few weeks on porting a closed source
> BitTorrent tracker to use PostgreSQL instead of MySQL for storing
> statistical data, but I've run in to a rather large snag. The tracker in
> question buffers its updates to the database, then makes them all at
> once, sending anywhere from 1-3 MiB of query data. With MySQL, this is
> accomplished using the INSERT INTO...ON DUPLICATE KEY UPDATE query,
> which seems to handle the insert/update very quickly; generally it only
> takes about a second for the entire set of new data to be merged.
>
> The problem I am encountering is that when I attempt to duplicate this
> functionality in Postgres, it is terrifically slow to a point of utter
> unusability. The tracker currently handles around 10,000-40,000 client
> updates per minute, which translates roughly to the same number of rows
> in the database. Part of the issue is that some of those rows cannot be
> updated because they do not yet exist in the database, but there is
> likely around a 100:1 ratio on updates to inserts.
>
> After consulting with some of the folks on the PostgreSQL IRC channel on
> freenode.net, I was left with this idea to try:

I can't help but think that the way this application writes data is
optimized for MySQL's transactionless table type, where lots of
simultaneous input streams writing at the same time to the same table
would be death.

Can you step back and work on how the app writes out data, so that it
opens a persistent connection, and then sends in the updates one at a
time, committing every couple of seconds while doing so?

pgsql-general by date:

Previous
From: Jonathan Vanasco
Date:
Subject: Re: sql schema advice sought
Next
From: Thorsten Kraus
Date:
Subject: Re: Webappication and PostgreSQL login roles