Thread: From Simple to Complex

From Simple to Complex

From: Alessandro Gagliardi
Date:
My slow query today is somewhat more complex than yesterday's, but I'm
hopeful it can be improved. Here's the query:

SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted > 'today' AND moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;

As you'll see below, a number of other tables inherit from moments, and
the purpose of relname is to show which child table each row came from.
Meanwhile, emotions inherits from feedback.
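
Incidentally, I understand tableoid can also be cast to regclass to get the
relation name directly, which would avoid the explicit pg_class join. Just a
sketch of the idea, untested against this schema:

-- Sketch only: same breakdown without joining pg_class, via tableoid::regclass.
SELECT moments.tableoid::regclass AS child_table, emotion, COUNT(feedback_id)
  FROM moments
  JOIN emotions USING (moment_id)
 WHERE moments.inserted > 'today'
 GROUP BY 1, emotion
 ORDER BY 1, emotion;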

Here's the Full Table and Index Schema:

CREATE TABLE moments
(
  moment_id character(24) NOT NULL DEFAULT to_char(now(), 'JHH24MISSUS'::text),
  block_id character(24) NOT NULL,
  inserted timestamp without time zone NOT NULL DEFAULT now(),
  CONSTRAINT moments_pkey PRIMARY KEY (moment_id )
)
WITH (
  OIDS=FALSE
);

CREATE INDEX moments_block_id_idx
  ON moments
  USING btree
  (block_id );

CREATE INDEX moments_inserted_idx
  ON moments
  USING btree
  (inserted );

CREATE TABLE feedback
(
  feedback_id character(24) NOT NULL,
  user_id character(24) NOT NULL,
  moment_id character(24) NOT NULL,
  created timestamp without time zone,
  inserted timestamp without time zone NOT NULL DEFAULT now(),
  lnglat point,
  CONSTRAINT feedback_pkey PRIMARY KEY (feedback_id )
)
WITH (
  OIDS=FALSE
);

CREATE INDEX feedback_lnglat_idx
  ON feedback
  USING gist
  (lnglat );

CREATE INDEX feedback_moment_id_idx
  ON feedback
  USING btree
  (moment_id );

CREATE TABLE emotions
(
-- Inherited from table feedback:  feedback_id character(24) NOT NULL,
-- Inherited from table feedback:  user_id character(24) NOT NULL,
-- Inherited from table feedback:  moment_id character(24) NOT NULL,
-- Inherited from table feedback:  created timestamp without time zone,
-- Inherited from table feedback:  inserted timestamp without time zone NOT NULL DEFAULT now(),
  emotion character varying NOT NULL,
-- Inherited from table feedback:  lnglat point,
  CONSTRAINT emotions_pkey PRIMARY KEY (feedback_id )
)
INHERITS (feedback)
WITH (
  OIDS=FALSE
);

CREATE INDEX emotions_emotion_idx
  ON emotions
  USING btree
  (emotion );

Here are the results from EXPLAIN ANALYZE:

"Sort  (cost=309717.70..309718.43 rows=1460 width=94) (actual
time=60462.534..60462.544 rows=25 loops=1)"
"  Sort Key: pg_class.relname, emotions.emotion"
"  Sort Method:  quicksort  Memory: 20kB"
"  ->  HashAggregate  (cost=309697.24..309702.35 rows=1460 width=94)
(actual time=60462.457..60462.476 rows=25 loops=1)"
"        ->  Hash Join  (cost=133154.62..308963.70 rows=489024
width=94) (actual time=26910.488..60031.589 rows=194642 loops=1)"
"              Hash Cond: (public.moments.tableoid = pg_class.oid)"
"              ->  Hash Join  (cost=133144.72..307119.96 rows=489024
width=34) (actual time=26909.984..59434.137 rows=194642 loops=1)"
"                    Hash Cond: (public.moments.moment_id = emotions.moment_id)"
"                    ->  Append  (cost=0.00..114981.64 rows=119665
width=29) (actual time=883.153..21696.939 rows=357565 loops=1)"
"                          ->  Seq Scan on moments  (cost=0.00..0.00
rows=1 width=104) (actual time=0.000..0.000 rows=0 loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on thoughts moments
(cost=0.00..38856.88 rows=44388 width=29) (actual
time=883.150..9040.959 rows=115436 loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on photos moments
(cost=0.00..29635.78 rows=194 width=29) (actual
time=5329.700..5827.447 rows=116420 loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on music moments
(cost=0.00..9371.88 rows=19070 width=29) (actual time=354.147..383.266
rows=37248 loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on people moments
(cost=0.00..5945.26 rows=27 width=29) (actual time=185.393..185.393
rows=0 loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on places moments
(cost=0.00..24551.03 rows=54961 width=29) (actual
time=5224.044..5324.517 rows=85564 loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on videos moments
(cost=0.00..981.31 rows=734 width=29) (actual time=21.075..28.735
rows=2897 loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on facebook_people moments
(cost=0.00..10.84 rows=80 width=104) (actual time=0.001..0.001 rows=0
loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on address_people moments
(cost=0.00..10.84 rows=80 width=104) (actual time=0.005..0.005 rows=0
loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on path_people moments
(cost=0.00..5606.79 rows=30 width=29) (actual time=211.166..211.166
rows=0 loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                          ->  Seq Scan on sleep moments
(cost=0.00..11.05 rows=100 width=104) (actual time=0.002..0.002 rows=0
loops=1)"
"                                Filter: (inserted > '2012-01-31
00:00:00'::timestamp without time zone)"
"                    ->  Hash  (cost=79292.49..79292.49 rows=4059496
width=55) (actual time=25757.998..25757.998 rows=4058642 loops=1)"
"                          Buckets: 262144  Batches: 4  Memory Usage: 75211kB"
"                          ->  Seq Scan on emotions
(cost=0.00..79292.49 rows=4059496 width=55) (actual
time=0.012..15969.981 rows=4058642 loops=1)"
"              ->  Hash  (cost=8.88..8.88 rows=292 width=68) (actual
time=0.487..0.487 rows=319 loops=1)"
"                    Buckets: 1024  Batches: 1  Memory Usage: 28kB"
"                    ->  Seq Scan on pg_class  (cost=0.00..8.88
rows=292 width=68) (actual time=0.013..0.234 rows=319 loops=1)"
"Total runtime: 60601.612 ms"

Postgres version: still 9.0.5

History: N/A (This is the first time I've run this query.)

Hardware: 1.7 GB Cache and other things you'd expect from a Ronin
instance of a Heroku Postgres database.

Maintenance Setup: What Heroku does. As before, vacuum should not be
relevant as there are no deletes or even updates (just inserts and
selects).

WAL Configuration: I still don't know. Heroku hosts the database on
Amazon's servers, so maybe that answers the question?

GUC Settings: As per yesterday's discussion, I reduced
random_page_cost to 2. Other than that, it's all default.

Bonus question: If that was too simple, here's something even more
complex I'd like to do. I have another table that inherits feedback,
called "comments". Ideally, rather than an "emotion" column coming
out, I would like a "feedback_type" column that holds either the value
of the emotion column from the emotions table, or "comment" if the row
comes from the comments table. I'm thinking I'll have to simply put
that together on the client, but if I can do it in a single query
(that doesn't take an hour to run), that would be super cool. But
that's definitely secondary compared to getting the above query to
run faster.
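
A rough sketch of the kind of query I have in mind, assuming comments has the
same inherited feedback columns (completely untested):

-- Sketch (untested): fold emotions and comments into a single feedback_type column.
SELECT relname, feedback_type, COUNT(feedback_id)
  FROM pg_class, moments
  JOIN (SELECT moment_id, feedback_id, emotion AS feedback_type FROM emotions
        UNION ALL
        SELECT moment_id, feedback_id, 'comment' FROM comments) AS all_feedback
    USING (moment_id)
 WHERE moments.inserted > 'today' AND moments.tableoid = pg_class.oid
 GROUP BY relname, feedback_type
 ORDER BY relname, feedback_type;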

Thank you very much for any help!
-Alessandro Gagliardi

Re: From Simple to Complex

From: Alessandro Gagliardi
Date:
Looks like I missed a key sentence in http://www.postgresql.org/docs/9.0/static/ddl-inherit.html which states: "A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children."
I should have realized that as I exploited that "limitation" in three of my tables. Gradually adding those indices now; will report on what kind of difference it makes....
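
For the record, the additions are roughly along these lines, one per child
table (the names below match the indexes that show up in the next plan):

-- Sketch of the per-child indexes being added; the parent's indexes don't
-- cover the inheritance children.
CREATE INDEX thoughts_inserted_idx ON thoughts USING btree (inserted);
CREATE INDEX photos_inserted_idx ON photos USING btree (inserted);
-- ...and likewise for music, people, places, videos, address_people,
-- path_people and sleep...
CREATE INDEX emotions_moment_id_idx ON emotions USING btree (moment_id);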

Re: From Simple to Complex

From: Alessandro Gagliardi
Date:
I changed the query a bit so the results would not change over the
course of the day to:

SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted BETWEEN 'yesterday' AND 'today' AND
moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;

Adding the indices means that I am now getting index scans instead of
seq scans, but it doesn't seem to help with speed. Here are the new
EXPLAIN ANALYZE results:

"Sort  (cost=174432.85..174433.58 rows=1460 width=94) (actual
time=73440.079..73440.088 rows=25 loops=1)"
"  Sort Key: pg_class.relname, emotions.emotion"
"  Sort Method:  quicksort  Memory: 20kB"
"  ->  HashAggregate  (cost=174412.39..174417.50 rows=1460 width=94)
(actual time=73437.905..73437.940 rows=25 loops=1)"
"        ->  Merge Join  (cost=27888.98..172032.86 rows=1586355
width=94) (actual time=65563.027..72763.848 rows=245917 loops=1)"
"              Merge Cond: (emotions.moment_id = public.moments.moment_id)"
"              ->  Index Scan using emotions_moment_id_idx on emotions
 (cost=0.00..135759.78 rows=4077358 width=55) (actual
time=1.283..43894.799 rows=3841095 loops=1)"
"              ->  Sort  (cost=27888.98..28083.07 rows=388184
width=89) (actual time=16556.348..17384.537 rows=521025 loops=1)"
"                    Sort Key: public.moments.moment_id"
"                    Sort Method:  quicksort  Memory: 60865kB"
"                    ->  Hash Join  (cost=9.90..20681.81 rows=388184
width=89) (actual time=2.612..4309.131 rows=396594 loops=1)"
"                          Hash Cond: (public.moments.tableoid = pg_class.oid)"
"                          ->  Append  (cost=0.00..19216.22
rows=388184 width=29) (actual time=2.066..2851.885 rows=396594
loops=1)"
"                                ->  Seq Scan on moments
(cost=0.00..0.00 rows=1 width=104) (actual time=0.002..0.002 rows=0
loops=1)"
"                                      Filter: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using
thoughts_inserted_idx on thoughts moments  (cost=0.00..6146.96
rows=136903 width=29) (actual time=2.063..606.584 rows=130884
loops=1)"
"                                      Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using
photos_inserted_idx on photos moments  (cost=0.00..4975.46 rows=109900
width=29) (actual time=1.542..836.063 rows=128286 loops=1)"
"                                      Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using
music_inserted_idx on music moments  (cost=0.00..3102.69 rows=40775
width=29) (actual time=0.756..308.031 rows=41176 loops=1)"
"                                      Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using
people_inserted_idx on people moments  (cost=0.00..4.07 rows=1
width=29) (actual time=0.015..0.015 rows=0 loops=1)"
"                                      Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using
places_inserted_idx on places moments  (cost=0.00..4125.65 rows=96348
width=29) (actual time=0.066..263.853 rows=92756 loops=1)"
"                                      Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Bitmap Heap Scan on videos
moments  (cost=29.56..835.20 rows=3660 width=29) (actual
time=3.122..87.889 rows=3492 loops=1)"
"                                      Recheck Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                      ->  Bitmap Index Scan on
videos_inserted_idx  (cost=0.00..29.37 rows=3660 width=0) (actual
time=0.696..0.696 rows=3492 loops=1)"
"                                            Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Seq Scan on facebook_people
moments  (cost=0.00..1.04 rows=1 width=104) (actual time=0.040..0.040
rows=0 loops=1)"
"                                      Filter: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using
address_people_inserted_idx on address_people moments
(cost=0.00..4.06 rows=1 width=29) (actual time=0.017..0.017 rows=0
loops=1)"
"                                      Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using
path_people_inserted_idx on path_people moments  (cost=0.00..17.03
rows=593 width=29) (actual time=1.758..1.758 rows=0 loops=1)"
"                                      Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                                ->  Index Scan using
sleep_inserted_idx on sleep moments  (cost=0.00..4.06 rows=1 width=29)
(actual time=0.012..0.012 rows=0 loops=1)"
"                                      Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
"                          ->  Hash  (cost=8.88..8.88 rows=292
width=68) (actual time=0.520..0.520 rows=334 loops=1)"
"                                Buckets: 1024  Batches: 1  Memory Usage: 29kB"
"                                ->  Seq Scan on pg_class
(cost=0.00..8.88 rows=292 width=68) (actual time=0.007..0.257 rows=334
loops=1)"
"Total runtime: 73511.072 ms"

Please let me know if there is any way to make this more efficient.

Thank you,
-Alessandro

Re: From Simple to Complex

From: Alessandro Gagliardi
Date:
I just got a pointer on presenting EXPLAIN ANALYZE in a more human-friendly fashion (thanks, Agent M!): http://explain.depesz.com/s/A9S

From this it looks like the bottleneck happens when Postgres does an Index Scan using emotions_moment_id_idx on emotions before filtering on moments.inserted, so I thought I'd try filtering on emotions.inserted instead, but that only made it worse. At the same time, I noticed that "FROM pg_class, moments WHERE moments.tableoid = pg_class.oid" tends to run a bit faster than "FROM pg_class JOIN moments ON moments.tableoid = pg_class.oid". So I tried:

SELECT relname, emotion, COUNT(feedback_id) 
  FROM pg_class, moments, emotions
 WHERE moments.tableoid = pg_class.oid 
   AND emotions.inserted > 'yesterday' 
   AND moments.inserted BETWEEN 'yesterday' AND 'today' 
   AND emotions.moment_id = moments.moment_id
 GROUP BY relname, emotion 
 ORDER BY relname, emotion;

That was a bit faster, but still very slow. Here's the EXPLAIN: http://explain.depesz.com/s/ZdF

Re: From Simple to Complex

From: Alessandro Gagliardi
Date:
Final update on this thread: since I only need a rough ratio of the distribution (not the absolute count), I refactored the query to include a subquery that samples from the moments table, thus: SELECT moment_id, block_id FROM moments WHERE inserted BETWEEN 'yesterday' AND 'today' ORDER BY RANDOM() LIMIT 10000; I also took advantage of another table called blocks that happens to contain the moment_type as well (which means I don't need to reference pg_class). The final query looks like:

SELECT moment_type, emotion, COUNT(feedback_id) 
  FROM (SELECT moment_id, block_id 
          FROM moments 
         WHERE inserted BETWEEN 'yesterday' AND 'today' 
         ORDER BY RANDOM() LIMIT 10000) AS sample_moments
  JOIN blocks USING (block_id)
  JOIN emotions USING (moment_id)
 GROUP BY moment_type, emotion
 ORDER BY moment_type, emotion

The explain is at http://explain.depesz.com/s/lYh

Interestingly, increasing the limit does not seem to increase the runtime in a linear fashion. When I run it with a limit of 60000 I get a runtime of 14991 ms. But if I run it with a limit of 70000 I get a runtime of 77744 ms. I assume that that's because I'm hitting a memory limit and paging out. Is that right?

Re: From Simple to Complex

From: Scott Marlowe
Date:
On Wed, Feb 1, 2012 at 11:19 AM, Alessandro Gagliardi
<alessandro@path.com> wrote:
> Interestingly, increasing the limit does not seem to increase the runtime in
> a linear fashion. When I run it with a limit of 60000 I get a runtime
> of 14991 ms. But if I run it with a limit of 70000 I get a runtime of 77744
> ms. I assume that that's because I'm hitting a memory limit and paging out.
> Is that right?

Hard to say.  More likely your query plan changes at that point.  Run
the queries with "explain analyze" in front of them to find out.
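
For instance, a sketch using the sampled query from upthread, at the limit
where the runtime jumped:

-- Sketch: prefix the query with EXPLAIN ANALYZE to see which plan the larger
-- limit actually produces.
EXPLAIN ANALYZE
SELECT moment_type, emotion, COUNT(feedback_id)
  FROM (SELECT moment_id, block_id
          FROM moments
         WHERE inserted BETWEEN 'yesterday' AND 'today'
         ORDER BY RANDOM() LIMIT 70000) AS sample_moments
  JOIN blocks USING (block_id)
  JOIN emotions USING (moment_id)
 GROUP BY moment_type, emotion
 ORDER BY moment_type, emotion;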

Re: From Simple to Complex

From: Alessandro Gagliardi
Date:
LIMIT 65536: Total query runtime: 14846 ms. - http://explain.depesz.com/s/I3E
LIMIT 69632: Total query runtime: 80141 ms. - http://explain.depesz.com/s/9hp

So it looks like when the limit crosses a certain threshold (somewhere north of 2^16), Postgres decides to do a Seq Scan instead of an Index Scan. 
I've already lowered random_page_cost to 2. Maybe I should lower it to 1.5? Actually 60K should be plenty for my purposes anyway.

Re: From Simple to Complex

From: Scott Marlowe
Date:
On Wed, Feb 1, 2012 at 11:48 AM, Alessandro Gagliardi
<alessandro@path.com> wrote:
> So it looks like when the limit crosses a certain threshold (somewhere north
> of 2^16), Postgres decides to do a Seq Scan instead of an Index Scan.
> I've already lowered random_page_cost to 2. Maybe I should lower it to 1.5?
> Actually 60K should be plenty for my purposes anyway.

It's important to set random_page_cost according to more than just one
query, but yeah, at this point it's likely a good idea to set it
closer to 1.0.  You're on Heroku, right?  If so, something closer to
1.0 is likely called for: 1.2 to 1.4 or so.

If you've got other queries you can test the change on, all the better.
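
A sketch of how that might be applied; the database-level form needs the
appropriate privileges on Heroku, and "your_database" is just a placeholder:

-- Sketch: try the lower value for the current session first...
SET random_page_cost = 1.4;
-- ...and if it helps across queries, persist it for the whole database
-- ("your_database" is a placeholder name).
ALTER DATABASE your_database SET random_page_cost = 1.4;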

Re: From Simple to Complex

From: Bob Lunney
Date:
Possibly.  What does

  psql > show work_mem;

say?

Bob Lunney

Re: From Simple to Complex

From: Alessandro Gagliardi
Date:
On Wed, Feb 1, 2012 at 11:04 AM, Bob Lunney <bob_lunney@yahoo.com> wrote:
Possibly.  What does

  psql > show work_mem;

say?

100MB


Re: From Simple to Complex

From: Merlin Moncure
Date:
also, is effective_cache_size set to a reasonable value?

merlin

Re: From Simple to Complex

From: Alessandro Gagliardi
Date:
On Thu, Feb 2, 2012 at 6:52 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
also, is effective_cache_size set to a reasonable value?

Yeah, it's 1530000kB
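
For reference, a sketch of how to read the value (and its unit) back from the
server:

-- Sketch: show the current setting, and its unit as reported by pg_settings.
SHOW effective_cache_size;
SELECT name, setting, unit FROM pg_settings WHERE name = 'effective_cache_size';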