Re: (VERY) Slow Query - PostgreSQL 9.2 - Mailing list pgsql-general

From Bill Moran
Subject Re: (VERY) Slow Query - PostgreSQL 9.2
Date
Msg-id 20160503063332.42dd365fa687af5bd8c4e862@potentialtech.com
Whole thread Raw
In response to (VERY) Slow Query - PostgreSQL 9.2  ("drum.lucas@gmail.com" <drum.lucas@gmail.com>)
List pgsql-general
On Tue, 3 May 2016 21:55:21 +1200
"drum.lucas@gmail.com" <drum.lucas@gmail.com> wrote:

> Hi all,
>
> I'm trying to get the query below a better performance.. but just don't
> know what else I can do...
>
> Please, have a look and let me know if you can help somehow.. also.. if you
> need some extra data jet ask me please.
>
> * Note that the gorfs.inode_segments table is 1.7TB size
>
> I have the following Query:
>
> explain analyzeSELECT split_part(full_path, '/', 4)::INT AS account_id,
>        split_part(full_path, '/', 6)::INT AS note_id,
>        split_part(full_path, '/', 9)::TEXT AS variation,
>        st_size,
>        segment_index,
>        reverse(split_part(reverse(full_path), '/', 1)) as file_name,
>        i.st_ino,
>        full_path,
>        (i.st_size / 1000000::FLOAT)::NUMERIC(5,2) || 'MB' AS
> size_mbFROM gorfs.inodes iJOIN gorfs.inode_segments s
>   ON i.st_ino = s.st_ino_targetWHERE
>       i.checksum_md5 IS NOT NULL
>   AND
>   AND i.st_size > 0;
>   split_part(s.full_path, '/', 4)::INT IN (
> SELECT account.id
>         FROM public.ja_clients AS account
>         WHERE
>         NOT (
>                 ((account.last_sub_pay > EXTRACT('epoch' FROM
> (transaction_timestamp() - CAST('4 Months' AS INTERVAL)))) AND
> (account.price_model > 0)) OR
>                 (account.regdate > EXTRACT('epoch' FROM
> (transaction_timestamp() - CAST('3 Month' AS INTERVAL)))) OR
>                 (((account.price_model = 0) AND (account.jobcredits >
> 0)) AND (account.last_login > EXTRACT('epoch' FROM
> (transaction_timestamp() - CAST('4 Month' AS INTERVAL)))))
>         ) LIMIT 100);
>
>
>    - Explain analyze link: http://explain.depesz.com/s/Oc6
>
> The query is taking ages, and I can't get the problem solved.
>
> These are the index I've already created on the inode_segments table:
>
> Indexes:
>     "ix_account_id_from_full_path" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE
> "full_path"::"text" ~
> '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'::"text"
>     "ix_inode_segments_ja_files_lookup" "btree" ((CASE
>     WHEN "full_path"::"text" ~ '/[^/]*\.[^/]*$'::"text" THEN
> "upper"("regexp_replace"("full_path"::"text", '.*\.'::"text",
> ''::"text", 'g'::"text"))
>     ELSE NULL::"text"END)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
>     "ix_inode_segments_notes_clientids" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
>     "ix_inode_segments_notes_clientids2" "btree" ("full_path")
>     "ix_inode_segments_notes_fileids" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 8)::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
>     "ix_inode_segments_notes_noteids" "btree"
> ((NULLIF("split_part"("full_path"::"text", '/'::"text", 6),
> 'unassigned'::"text")::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
>
> These are the index I've already created on the inodes table:
>
>  Indexes:
>     "ix_inodes_checksum_st_size" "btree" ("checksum_md5", "st_size")
> WHERE "checksum_md5" IS NOT NULL
>
> *Question:*
>
> What else can I do to improve the Performance of the Query?

From the explain, it looks like the biggest pain point is the
inode_segments table, specifically, this condition:
s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
It's doing a full scan of every record in that table, which is
a large number, and that regex can't be cheap over that kind of
volume.

If you do:
SELECT count(*)
FROM inode_segments
WHERE full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+';
how many tuples actually match that condition? If the number
is a large percentage of the total table, then I'm not sure
how to help you, but if the percentage is small, you might
be able to speed things up by adding an index:

CREATE INDEX is_fp_trunc_idx ON inode_segments(substring(full_path FROM 1 FOR 19));

Then adding this condition to the where clause:
substring(s.full_path FROM 1 FOR 19) = '/userfiles/account/'

There are other index combinations that may help as well,
depending on the nature of the values in that table, but,
in general, anything you can do to reduce the number of
records that have to be examined in that table is liable
to speed things up.

--
Bill Moran


pgsql-general by date:

Previous
From: Maxim Boguk
Date:
Subject: Insert only table and size of GIN index JSONB field.
Next
From: Maxim Boguk
Date:
Subject: Re: Insert only table and size of GIN index JSONB field.