Thread: Speedup hint needed, if available? :)

Speedup hint needed, if available? :)

From
Mario Splivalo
Date:
Hello again.

I have to track user subscriptions to certain mailinglists, and I also
need to track credits users have on those mailinglists. On one side I
have procedures that add credits, on other side I have procedures that
subtract available credits. Add/subtract is pretty intensive, around
30-50 adds per minute (usualy 10 or 100 credits), and around 200-500
subtracts per minute (usualy by one or two credits).

I have created table user_subscriptions to track user subscriptions to
certain mailing list. I have derived subscription_id as primary key. I
have two other tables, user_subscription_credits_given, and
_credits_taken, wich track credits for subscription added or subtracted
to or from certain subscription. I created those two tables so I could
eliminate a lot of UPDATES on user_subscriptions table (if I were to
have a column 'credits' in that table). user_subscriptions table is
projected to have around 100.000 rows, and _credits_given/_credits_taken
table is projected to have around 10.000.000 rows.

Now, I have filled the tables with test data, and the query results is
kind of poor. It takes almost 50 seconds to get the data for the
particular subscription. Now, is there a way to speed this up, or I need
different approach?

Here is the DDL/DML:

CREATE TABLE user_subscriptions
(
  subscription_id int4 NOT NULL DEFAULT
nextval('user_subscriptions_id_seq'::regclass),
  user_id int4 NOT NULL,
  mailinglist_id int4 NOT NULL,
  valid_from timestamptz NOT NULL,
  valid_to timestamptz,
  CONSTRAINT user_subscriptions_pkey PRIMARY KEY (subscription_id)
);

CREATE TABLE user_subscription_credits_given
(
  subscription_id int4 NOT NULL,
  credits int4 NOT NULL,
  CONSTRAINT user_subscription_credits_given_fk__subscription_id FOREIGN
KEY (subscription_id)
      REFERENCES user_subscriptions (subscription_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
);


CREATE INDEX fki_user_subscriptions_fk__mailinglist_id
  ON user_subscriptions
  USING btree
  (mailinglist_id);

CREATE INDEX fki_user_subscriptions_fk__users_id
  ON user_subscriptions
  USING btree
  (user_id);

CREATE INDEX fki_user_subscription_credits_given_fk__subscription_id
  ON user_subscription_credits_given
  USING btree
  (subscription_id);

CREATE INDEX fki_user_subscription_credits_taken_fk__subscription_id
  ON user_subscription_credits_taken
  USING btree
  (subscription_id);


Here is the query which gets information on particular user, shows
subscriptions to mailinglists and available credits on those
mailinglists:

SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
u.valid_to, sum(credits.credits_given - credits.credits_taken)::integer
AS credits
FROM user_subscriptions u
LEFT JOIN
    (SELECT user_subscription_credits_given.subscription_id,
user_subscription_credits_given.credits AS credits_given, 0 AS
credits_taken
           FROM user_subscription_credits_given
    UNION ALL
         SELECT user_subscription_credits_taken.subscription_id, 0 AS
credits_given, user_subscription_credits_taken.credits AS credits_taken
           FROM user_subscription_credits_taken) credits
    ON u.subscription_id = credits.subscription_id
where
    u.user_id = 1
GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
u.valid_to

And here is the 'explain analyze' of the above query:


QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=200079055.24..200079055.28 rows=2 width=36)
(actual time=56527.153..56527.163 rows=2 loops=1)
   ->  Nested Loop Left Join  (cost=200033690.72..200078931.34 rows=8260
width=36) (actual time=0.432..54705.844 rows=275366 loops=1)
         Join Filter: ("outer".subscription_id =
"inner".subscription_id)
         ->  Index Scan using fki_user_subscriptions_fk__users_id on
user_subscriptions u  (cost=0.00..3.03 rows=2 width=28) (actual
time=0.030..0.055 rows=2 loops=1)
               Index Cond: (user_id = 1)
         ->  Materialize  (cost=200033690.72..200045984.63 rows=825991
width=12) (actual time=0.043..22404.107 rows=826032 loops=2)
               ->  Subquery Scan credits
(cost=100000000.00..200028830.73 rows=825991 width=12) (actual
time=0.050..31500.589 rows=826032 loops=1)
                     ->  Append  (cost=100000000.00..200020570.82
rows=825991 width=8) (actual time=0.041..22571.540 rows=826032 loops=1)
                           ->  Subquery Scan "*SELECT* 1"
(cost=100000000.00..100001946.96 rows=78148 width=8) (actual
time=0.031..1226.640 rows=78148 loops=1)
                                 ->  Seq Scan on
user_subscription_credits_given  (cost=100000000.00..100001165.48
rows=78148 width=8) (actual time=0.022..404.253 rows=78148 loops=1)
                           ->  Subquery Scan "*SELECT* 2"
(cost=100000000.00..100018623.86 rows=747843 width=8) (actual
time=0.032..12641.705 rows=747884 loops=1)
                                 ->  Seq Scan on
user_subscription_credits_taken  (cost=100000000.00..100011145.43
rows=747843 width=8) (actual time=0.023..4386.769 rows=747884 loops=1)
 Total runtime: 56536.774 ms
(13 rows)


Thank you all in advance,

    Mario
--
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."



Re: Speedup hint needed, if available? :)

From
Tom Lane
Date:
Mario Splivalo <mario.splivalo@mobart.hr> writes:
> Here is the query which gets information on particular user, shows
> subscriptions to mailinglists and available credits on those
> mailinglists:

> SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
> u.valid_to, sum(credits.credits_given - credits.credits_taken)::integer
> AS credits
> FROM user_subscriptions u
> LEFT JOIN
>     (SELECT user_subscription_credits_given.subscription_id,
> user_subscription_credits_given.credits AS credits_given, 0 AS
> credits_taken
>            FROM user_subscription_credits_given
>     UNION ALL
>          SELECT user_subscription_credits_taken.subscription_id, 0 AS
> credits_given, user_subscription_credits_taken.credits AS credits_taken
>            FROM user_subscription_credits_taken) credits
>     ON u.subscription_id = credits.subscription_id
> where
>     u.user_id = 1
> GROUP BY u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
> u.valid_to

Do you have realistic test data?  The EXPLAIN shows that this is pulling
275366 of the 826032 rows in the two tables, which seems like rather a
lot for a single user.  If it's reasonable that the query needs to fetch
one-third of the data, then you should resign yourself to it taking
awhile :-(

If the expected number of matching rows were much smaller, it would
make sense to use indexscans over the two big tables, but unfortunately
existing PG releases don't know how to generate an indexscan join
with a UNION ALL in between :-(.  FWIW, 8.2 will be able to do it.
In current releases the only thing I can suggest is to merge
user_subscription_credits_given and user_subscription_credits_taken
into one table so you don't need the UNION ALL.

            regards, tom lane

Re: Speedup hint needed, if available? :)

From
Mario Splivalo
Date:
On Tue, 2006-05-30 at 11:05 -0400, Tom Lane wrote:

> Do you have realistic test data?  The EXPLAIN shows that this is pulling
> 275366 of the 826032 rows in the two tables, which seems like rather a
> lot for a single user.  If it's reasonable that the query needs to fetch
> one-third of the data, then you should resign yourself to it taking
> awhile :-(

I'd say so, yes. The user_subscription table now has only six rows, but
the number of actions (giving/taking credits) for a user could easily be
as high as 50.000.

> If the expected number of matching rows were much smaller, it would
> make sense to use indexscans over the two big tables, but unfortunately
> existing PG releases don't know how to generate an indexscan join
> with a UNION ALL in between :-(.  FWIW, 8.2 will be able to do it.
> In current releases the only thing I can suggest is to merge
> user_subscription_credits_given and user_subscription_credits_taken
> into one table so you don't need the UNION ALL.

See, that's an idea! :) Thnx, I'll try that.

Is it inapropriate to ask about rough estimate on availableness of
8.2? :)

    Mario
--
Mario Splivalo
Mob-Art
mario.splivalo@mobart.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."



Re: Speedup hint needed, if available? :)

From
Nis Jorgensen
Date:
Mario Splivalo wrote:
> Hello again.
>
> I have to track user subscriptions to certain mailinglists, and I also
> need to track credits users have on those mailinglists. On one side I
> have procedures that add credits, on other side I have procedures that
> subtract available credits. Add/subtract is pretty intensive, around
> 30-50 adds per minute (usualy 10 or 100 credits), and around 200-500
> subtracts per minute (usualy by one or two credits).
>
> I have created table user_subscriptions to track user subscriptions to
> certain mailing list. I have derived subscription_id as primary key. I
> have two other tables, user_subscription_credits_given, and
> _credits_taken, wich track credits for subscription added or subtracted
> to or from certain subscription. I created those two tables so I could
> eliminate a lot of UPDATES on user_subscriptions table (if I were to
> have a column 'credits' in that table).

It sounds to me like you have decided beforehand that the obvious
solution (update a credit field in the user_subscriptions table) is not
going to perform well. Have you tried it? How does it perform?

If it does indeed give you performance problems, you could instead run
some kind of batch job to update the credits field (and delete the
/given/taken records).

Finally: You could refactor the query to get rid of the union:

SELECT u.subscription_id, u.user_id, u.mailinglist_id, u.valid_from,
u.valid_to, (
   SELECT sum(credits)
   FROM credits_given
   WHERE subscription_id = u.subscription_id
) - (
   SELECT sum(credits)
   FROM credits_taken
   WHERE subscription_id = u.subscription_id)
) AS credits
FROM user_subscriptions u
WHERE u.user_id = 1

(Not tested).

You will probably need a COALESCE around each of the subqueries to avoid
problems with nulls. <rant>The sum of an empty set of numbers is 0. The
conjunction of an empty set of booleans is true. The SQL standard
somehow manages to get this wrong</rant>

/Nis