Thread: Deleting 173000 records takes forever, blocks async queries for unrelated records

Deleting 173000 records takes forever, blocks async queries for unrelated records

From
Alexander Farber
Date:
Hello,

in a Facebook game running on
PostgreSQL 8.4.13 and having so many players:

#  select count(*) from pref_users;
 count
--------
 223964

I am trying to get rid of inactive users,
who just visited the canvas page, but
never played (I'm sure, Facebook has
a clever-sounding name for them):

# select count(*) from pref_users
where id not in (select distinct id from pref_money);
 count
--------
 173936
(1 row)

So I call:

# delete from pref_users
where id not in (select distinct id from pref_money);

but that query lasts forever and
what's more troubling me - it blocks
the async queries of my game daemon
(the Perl function pg_ready starts returning
false all the time and my game accumulates
thousands of yet-to-be-executed SQL queries).

The good news is, that my quad server
doesn't hang - I just see 1 postmaster
process at 90-100% CPU but total load is 20%.

Also my game daemon in Perl recovers
and executes the thousands of queued
up async queries, when I interrupt the
above DELETE query with CTRL-C at
the pgsql prompt - i.e. my game is not buggy.

My question is how handle this?

Why does deleting takes so long,
is it because of CASCADES?

And why does it make the pg_ready
calls of my game daemon return false?
The users I'm deleting aren't active,
they shouldn't "intersect" with the
async queries of my game daemon.

Below are the both SQL tables involved,
thank you for any insights.

Regards
Alex

#  \d pref_money
                        Table "public.pref_money"
 Column |         Type          |                Modifiers
--------+-----------------------+-----------------------------------------
 id     | character varying(32) |
 money  | integer               | not null
 yw     | character(7)          | default to_char(now(), 'IYYY-IW'::text)
Indexes:
    "pref_money_money_index" btree (money DESC)
    "pref_money_yw_index" btree (yw)
Foreign-key constraints:
    "pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
DELETE CASCADE

#  \d pref_users
                   Table "public.pref_users"
   Column   |            Type             |     Modifiers
------------+-----------------------------+--------------------
 id         | character varying(32)       | not null
 first_name | character varying(64)       |
 last_name  | character varying(64)       |
 female     | boolean                     |
 avatar     | character varying(128)      |
 city       | character varying(64)       |
 login      | timestamp without time zone | default now()
 last_ip    | inet                        |
 logout     | timestamp without time zone |
 vip        | timestamp without time zone |
 mail       | character varying(254)      |
 medals     | integer                     | not null default 0
Indexes:
    "pref_users_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "pref_cards" CONSTRAINT "pref_cards_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_catch" CONSTRAINT "pref_catch_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_chat" CONSTRAINT "pref_chat_id_fkey" FOREIGN KEY (id)
REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_discuss" CONSTRAINT "pref_discuss_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_game" CONSTRAINT "pref_game_id_fkey" FOREIGN KEY (id)
REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_luck" CONSTRAINT "pref_luck_id_fkey" FOREIGN KEY (id)
REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_match" CONSTRAINT "pref_match_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_misere" CONSTRAINT "pref_misere_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_money" CONSTRAINT "pref_money_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_pass" CONSTRAINT "pref_pass_id_fkey" FOREIGN KEY (id)
REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_payment" CONSTRAINT "pref_payment_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_rep" CONSTRAINT "pref_rep_author_fkey" FOREIGN KEY
(author) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_rep" CONSTRAINT "pref_rep_id_fkey" FOREIGN KEY (id)
REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_scores" CONSTRAINT "pref_scores_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_status" CONSTRAINT "pref_status_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
    TABLE "pref_votes" CONSTRAINT "pref_votes_id_fkey" FOREIGN KEY
(id) REFERENCES pref_users(id) ON DELETE CASCADE
Alexander Farber wrote:
> in a Facebook game running on
> PostgreSQL 8.4.13 and having so many players:
>=20
> #  select count(*) from pref_users;
>  count
> --------
>  223964
>=20
> I am trying to get rid of inactive users,
> who just visited the canvas page, but
> never played (I'm sure, Facebook has
> a clever-sounding name for them):
>=20
> # select count(*) from pref_users
> where id not in (select distinct id from pref_money);
>  count
> --------
>  173936
> (1 row)
>=20
> So I call:
>=20
> # delete from pref_users
> where id not in (select distinct id from pref_money);
>=20
> but that query lasts forever and
> what's more troubling me - it blocks
> the async queries of my game daemon
> (the Perl function pg_ready starts returning
> false all the time and my game accumulates
> thousands of yet-to-be-executed SQL queries).
>=20
> The good news is, that my quad server
> doesn't hang - I just see 1 postmaster
> process at 90-100% CPU but total load is 20%.

I hope it's a backend and not the postmaster.

> Also my game daemon in Perl recovers
> and executes the thousands of queued
> up async queries, when I interrupt the
> above DELETE query with CTRL-C at
> the pgsql prompt - i.e. my game is not buggy.
>=20
> My question is how handle this?
>=20
> Why does deleting takes so long,
> is it because of CASCADES?
>=20
> And why does it make the pg_ready
> calls of my game daemon return false?
> The users I'm deleting aren't active,
> they shouldn't "intersect" with the
> async queries of my game daemon.
>=20
> Below are the both SQL tables involved,
> thank you for any insights.

It would help if you send EXPLAIN output for the DELETE statement.

> #  \d pref_money
>                         Table "public.pref_money"
>  Column |         Type          |                Modifiers
> --------+-----------------------+----------------------------------------=
-
>  id     | character varying(32) |
>  money  | integer               | not null
>  yw     | character(7)          | default to_char(now(), 'IYYY-IW'::text)
> Indexes:
>     "pref_money_money_index" btree (money DESC)
>     "pref_money_yw_index" btree (yw)
> Foreign-key constraints:
>     "pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
> DELETE CASCADE

The problem is very likely that you have no index on the "id" column.

That means that for each id deleted from pref_users, the cascading
delete has to perform a full table scan on pref_money to find the
corresponding rows.  This is very likely the cause of your problem.

Indeed, this table does not have a primary key.  That is usually a bad
idea.  How about PRIMARY KEY (id, yw)?
That should speed up the query.

> #  \d pref_users
>                    Table "public.pref_users"
>    Column   |            Type             |     Modifiers
> ------------+-----------------------------+--------------------
>  id         | character varying(32)       | not null
[...]
> Indexes:
>     "pref_users_pkey" PRIMARY KEY, btree (id)
> Referenced by:
[many tables]

While you are at it, check all the other tables referencing pref_users
and make sure that they have an index on the referencing column.

Yours,
Laurenz Albe

Re: Deleting 173000 records takes forever, blocks async queries for unrelated records

From
Michal Politowski
Date:
On Fri,  1 Feb 2013 10:38:26 +0100, Alexander Farber wrote:
> Hello,
>=20
> in a Facebook game running on
> PostgreSQL 8.4.13 and having so many players:
>=20
> #  select count(*) from pref_users;
>  count
> --------
>  223964
>=20
> I am trying to get rid of inactive users,
> who just visited the canvas page, but
> never played (I'm sure, Facebook has
> a clever-sounding name for them):
>=20
> # select count(*) from pref_users
> where id not in (select distinct id from pref_money);
>  count
> --------
>  173936
> (1 row)
>=20
> So I call:
>=20
> # delete from pref_users
> where id not in (select distinct id from pref_money);
>=20
> but that query lasts forever and
> what's more troubling me - it blocks
> the async queries of my game daemon
> (the Perl function pg_ready starts returning
> false all the time and my game accumulates
> thousands of yet-to-be-executed SQL queries).
>=20
> The good news is, that my quad server
> doesn't hang - I just see 1 postmaster
> process at 90-100% CPU but total load is 20%.
>=20
> Also my game daemon in Perl recovers
> and executes the thousands of queued
> up async queries, when I interrupt the
> above DELETE query with CTRL-C at
> the pgsql prompt - i.e. my game is not buggy.
>=20
> My question is how handle this?
>=20
> Why does deleting takes so long,
> is it because of CASCADES?
>=20
> And why does it make the pg_ready
> calls of my game daemon return false?
> The users I'm deleting aren't active,
> they shouldn't "intersect" with the
> async queries of my game daemon.
>=20
> Below are the both SQL tables involved,
> thank you for any insights.
>=20
> Regards
> Alex
>=20
> #  \d pref_money
>                         Table "public.pref_money"
>  Column |         Type          |                Modifiers
> --------+-----------------------+-----------------------------------------
>  id     | character varying(32) |
>  money  | integer               | not null
>  yw     | character(7)          | default to_char(now(), 'IYYY-IW'::text)
> Indexes:
>     "pref_money_money_index" btree (money DESC)
>     "pref_money_yw_index" btree (yw)
> Foreign-key constraints:
>     "pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
> DELETE CASCADE

I believe an index on pref_money.id could help.
Without it Postgres is, if I understand correctly, making 173936 table scan=
s on pref_money
to try (and fail) to find for each deleted row the referencing row in that =
table.
=20
> #  \d pref_users
[...]

--=20
Micha=B3 Politowski

Re: Deleting 173000 records takes forever, blocks async queries for unrelated records

From
Bèrto ëd Sèra
Date:
Hi,

What Albe said about PKs. I'm also not very fond of people using text
in keys (even if it's a varchar or a char field). Test the same thing
with a numeric key and you are likely to see a difference.

Apart from making sure your design is ok, you might want to keep this
stuff well in the background, firing it as a frequent (and small) job.
Sort of "your own private vacuum job".

Something like

delete from pref_users
where  id not in ( select distinct id                 from pref_money )
limit 100;

Where 100 may be any number of records you find to be compatible with
a smooth performance. Keep calling this stuff at a suitable interval
(X secs in between each call), and it will silently do the cleaning
without creating giant transactions.

You definitely want to make a good design BEFORE doing this, though.

Cheers
Bèrto

On 1 February 2013 09:38, Alexander Farber <alexander.farber@gmail.com> wrote:
> Hello,
>
> in a Facebook game running on
> PostgreSQL 8.4.13 and having so many players:
>
> #  select count(*) from pref_users;
>  count
> --------
>  223964
>
> I am trying to get rid of inactive users,
> who just visited the canvas page, but
> never played (I'm sure, Facebook has
> a clever-sounding name for them):
>
> # select count(*) from pref_users
> where id not in (select distinct id from pref_money);
>  count
> --------
>  173936
> (1 row)
>
> So I call:
>
> # delete from pref_users
> where id not in (select distinct id from pref_money);
>
> but that query lasts forever and
> what's more troubling me - it blocks
> the async queries of my game daemon
> (the Perl function pg_ready starts returning
> false all the time and my game accumulates
> thousands of yet-to-be-executed SQL queries).
>
> The good news is, that my quad server
> doesn't hang - I just see 1 postmaster
> process at 90-100% CPU but total load is 20%.
>
> Also my game daemon in Perl recovers
> and executes the thousands of queued
> up async queries, when I interrupt the
> above DELETE query with CTRL-C at
> the pgsql prompt - i.e. my game is not buggy.
>
> My question is how handle this?
>
> Why does deleting takes so long,
> is it because of CASCADES?
>
> And why does it make the pg_ready
> calls of my game daemon return false?
> The users I'm deleting aren't active,
> they shouldn't "intersect" with the
> async queries of my game daemon.
>
> Below are the both SQL tables involved,
> thank you for any insights.
>
> Regards
> Alex
>
> #  \d pref_money
>                         Table "public.pref_money"
>  Column |         Type          |                Modifiers
> --------+-----------------------+-----------------------------------------
>  id     | character varying(32) |
>  money  | integer               | not null
>  yw     | character(7)          | default to_char(now(), 'IYYY-IW'::text)
> Indexes:
>     "pref_money_money_index" btree (money DESC)
>     "pref_money_yw_index" btree (yw)
> Foreign-key constraints:
>     "pref_money_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
> DELETE CASCADE
>
> #  \d pref_users
>                    Table "public.pref_users"
>    Column   |            Type             |     Modifiers
> ------------+-----------------------------+--------------------
>  id         | character varying(32)       | not null
>  first_name | character varying(64)       |
>  last_name  | character varying(64)       |
>  female     | boolean                     |
>  avatar     | character varying(128)      |
>  city       | character varying(64)       |
>  login      | timestamp without time zone | default now()
>  last_ip    | inet                        |
>  logout     | timestamp without time zone |
>  vip        | timestamp without time zone |
>  mail       | character varying(254)      |
>  medals     | integer                     | not null default 0
> Indexes:
>     "pref_users_pkey" PRIMARY KEY, btree (id)
> Referenced by:
>     TABLE "pref_cards" CONSTRAINT "pref_cards_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_catch" CONSTRAINT "pref_catch_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_chat" CONSTRAINT "pref_chat_id_fkey" FOREIGN KEY (id)
> REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_discuss" CONSTRAINT "pref_discuss_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_game" CONSTRAINT "pref_game_id_fkey" FOREIGN KEY (id)
> REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_luck" CONSTRAINT "pref_luck_id_fkey" FOREIGN KEY (id)
> REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_match" CONSTRAINT "pref_match_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_misere" CONSTRAINT "pref_misere_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_money" CONSTRAINT "pref_money_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_pass" CONSTRAINT "pref_pass_id_fkey" FOREIGN KEY (id)
> REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_payment" CONSTRAINT "pref_payment_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_rep" CONSTRAINT "pref_rep_author_fkey" FOREIGN KEY
> (author) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_rep" CONSTRAINT "pref_rep_id_fkey" FOREIGN KEY (id)
> REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_scores" CONSTRAINT "pref_scores_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_status" CONSTRAINT "pref_status_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>     TABLE "pref_votes" CONSTRAINT "pref_votes_id_fkey" FOREIGN KEY
> (id) REFERENCES pref_users(id) ON DELETE CASCADE
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a
darkened room munching pills and listening to repetitive music.



Re: Deleting 173000 records takes forever, blocks async queries for unrelated records

From
Alexander Farber
Date:
Thanks, I will add an index(id, yw) and check.

The limit suggestion (yes, I already have a "vacuum"
cronjob to purge spam-users from drupal_users etc.)
doesn't work:

# delete from pref_users
where id not in (select distinct id from pref_money) limit 10;
ERROR:  syntax error at or near "limit"
LINE 2: ...ere id not in (select distinct id from pref_money) limit 10;

Re: Deleting 173000 records takes forever, blocks async queries for unrelated records

From
Bèrto ëd Sèra
Date:
righto. You need a CTE to do that

create table deletable ( id bigint not null primary key);

create table condition ( id bigint not null primary key);

insert into deletable select generate_series(1,500);

insert into condition select generate_series(1,50);


WITH target AS (
select id
from deletable d
where d.id not in (select id from condition)
limit 10 )
delete
from deletable
where id in (select id from target);

This will open a hole from 51 to 60, next time from 61 to 70 etc...

Cheers
Bèrto

On 1 February 2013 11:30, Alexander Farber <alexander.farber@gmail.com> wrote:
> Thanks, I will add an index(id, yw) and check.
>
> The limit suggestion (yes, I already have a "vacuum"
> cronjob to purge spam-users from drupal_users etc.)
> doesn't work:
>
> # delete from pref_users
> where id not in (select distinct id from pref_money) limit 10;
> ERROR:  syntax error at or near "limit"
> LINE 2: ...ere id not in (select distinct id from pref_money) limit 10;
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a
darkened room munching pills and listening to repetitive music.



Re: Deleting 173000 records takes forever, blocks async queries for unrelated records

From
"Carlo Stonebanks"
Date:
>># delete from pref_users
>>where id not in (select distinct id from pref_money) limit 10;
>>ERROR:  syntax error at or near "limit"
>>LINE 2: ...ere id not in (select distinct id from pref_money) limit 10;

Or this?

DELETE FROM pref_users
WHERE id IN (
    SELECT id
    FROM pref_users
    WHERE id NOT IN (
        SELECT DISTINCT id
        FROM pref_money
    )
    LIMIT 10
);

Saw elsewhere another suggestion with EXCEPT, which is also sexy.