Re: query issue - Mailing list pgsql-general

From David Rowley
Subject Re: query issue
Date
Msg-id CAApHDvqiOm17J47ohbs4UfmTQv-OTLvYg0-=RFK9JoQU0po+QQ@mail.gmail.com
Whole thread Raw
In response to Re: query issue  (Atul Kumar <akumar14871@gmail.com>)
List pgsql-general
On Wed, 16 Jun 2021 at 18:29, Atul Kumar <akumar14871@gmail.com> wrote:
> QUERY PLAN
> Limit  (cost=0.43..5529.03 rows=10 width=37) (actual
> time=0.974..12911.087 rows=10 loops=1)
>   Output: items._id
>   Buffers: shared hit=4838 read=3701
>   ->  Subquery Scan on items  (cost=0.43..1622646.30 rows=2935
> width=37) (actual time=0.972..12911.078 rows=10 loops=1)
>         Output: items._id
>         Buffers: shared hit=4838 read=3701
>         ->  Index Scan using sort on
> "op_KFDaBAZDSXc4YYts9"."UserFeedItems"  (cost=0.43..1622616.95
> rows=2935 width=50) (actual time=0.970..12911.070 rows=10 loops=1)
>               Output: "UserFeedItems"._id, "UserFeedItems".score,
> "UserFeedItems"."updatedAt"
>               Filter: (("UserFeedItems".is_deleted = ANY
> ('{t,f}'::boolean[])) AND ("UserFeedItems"."userId" =
> '5d230d67bd99c5001b1ae757'::text) AND ("UserFeedItems"."itemType" <>
> ALL ('{WELCOME_POST,UPLOAD_CONTACTS,BROADCAST_POST}'::text[])))
>               Rows Removed by Filter: 15478
>               Buffers: shared hit=4838 read=3701
> Planning time: 100.949 ms
> Execution time: 12930.302 ms

It seems to me that this system is pretty slow on I/O.  I imagine if
you do: SET track_io_timing = ON:  then run EXPLAIN (ANALYZE, BUFFERS)
on the query that you'll see that most of the time is spent doing I/O.

If you're unable to make I/O faster then you might want to upgrade to
a machine that's more likely to be able to keep the working set of
your database in memory.

From looking at your earlier queries:

>         ->  Bitmap Heap Scan on "UserFeedItems"
> (cost=131.33..10994.60 rows=2935 width=1304) (actual
> time=26.245..6093.680 rows=3882 loops=1)
>               Recheck Cond: ("userId" = '5d230d67bd99c5001b1ae757'::text)
>               Filter: ((is_deleted = ANY ('{t,f}'::boolean[])) AND
> ("itemType" <> ALL
> ('{WELCOME_POST,UPLOAD_CONTACTS,BROADCAST_POST}'::text[])))
>               Rows Removed by Filter: 1
>               Heap Blocks: exact=3804

The bitmap index scan matched 3804 pages and there are only 3882 rows.
That's an average of just over 1 tuple per page.  If the table was
clustered by that index then that might help speed up your query, but
since the cluster order is not maintained then it'll likely go out
over time and the query will just become slow again.

If you had been using at least PostgreSQL 12 then you could have
looked into using partitioning.  Partitioning does exist before 12,
but it became much better in that version.  Partitioning might help
you here if you partitioned by HASH (userid) as you might average a
few more matched rows per page in the bitmap scan and reduce the
number of pages that need to be read from disk. I'm not really sure
how many partitions you'd have to make to get a meaningful improvement
there though.  That'll depend on how many users there are and how big
the rows are.

There are also some pretty bad design choices with your table. You
seem to have lots of IDs which are TEXT fields.  It's fairly normal
practice in databases not to do that and to use something like INT or
BIGINT. Using TEXT is pretty bad for a few reasons. 1) it makes your
data bigger and reduces cache hit ratios. 2) variable length fields at
the start of tables is not so great as tuple deforming becomes slower
due to there being no fixed offset into columns that come after a
variable-length field.

Anyway, there's quite a lot you could do here to make this query run faster.

David



pgsql-general by date:

Previous
From: "Holtgrewe, Manuel"
Date:
Subject: Re: [ext] Re: Losing data because of problematic configuration?
Next
From: "Peter J. Holzer"
Date:
Subject: Re: clear cache in postgresql