Thread: (VERY) Slow Query - PostgreSQL 9.2

(VERY) Slow Query - PostgreSQL 9.2

From
"drum.lucas@gmail.com"
Date:

Hi all,


I'm trying to improve the performance of the query below, but I just don't know what else I can do...

Please have a look and let me know if you can help somehow. If you need any extra data, just ask.

* Note that the gorfs.inode_segments table is 1.7TB in size

I have the following Query:

explain analyze
SELECT split_part(full_path, '/', 4)::INT AS account_id,
       split_part(full_path, '/', 6)::INT AS note_id,
       split_part(full_path, '/', 9)::TEXT AS variation,
       st_size,
       segment_index,
       reverse(split_part(reverse(full_path), '/', 1)) AS file_name,
       i.st_ino,
       full_path,
       (i.st_size / 1000000::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
  ON i.st_ino = s.st_ino_target
WHERE i.checksum_md5 IS NOT NULL
  AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
  AND i.st_size > 0
  AND split_part(s.full_path, '/', 4)::INT IN (
        SELECT account.id
        FROM public.ja_clients AS account
        WHERE NOT (
                ((account.last_sub_pay > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('4 Months' AS INTERVAL)))) AND (account.price_model > 0)) OR
                (account.regdate > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('3 Month' AS INTERVAL)))) OR
                (((account.price_model = 0) AND (account.jobcredits > 0)) AND (account.last_login > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('4 Month' AS INTERVAL)))))
        ) LIMIT 100
  );

The query is taking ages, and I can't get the problem solved.

These are the indexes I've already created on the inode_segments table:

Indexes:
    "ix_account_id_from_full_path" btree ((split_part(full_path::text, '/'::text, 4)::integer)) WHERE full_path::text ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'::text
    "ix_inode_segments_ja_files_lookup" btree ((CASE WHEN full_path::text ~ '/[^/]*\.[^/]*$'::text THEN upper(regexp_replace(full_path::text, '.*\.'::text, ''::text, 'g'::text)) ELSE NULL::text END)) WHERE gorfs.is_kaminski_note_path(full_path::text)
    "ix_inode_segments_notes_clientids" btree ((split_part(full_path::text, '/'::text, 4)::integer)) WHERE gorfs.is_kaminski_note_path(full_path::text)
    "ix_inode_segments_notes_clientids2" btree (full_path)
    "ix_inode_segments_notes_fileids" btree ((split_part(full_path::text, '/'::text, 8)::integer)) WHERE gorfs.is_kaminski_note_path(full_path::text)
    "ix_inode_segments_notes_noteids" btree ((NULLIF(split_part(full_path::text, '/'::text, 6), 'unassigned'::text)::integer)) WHERE gorfs.is_kaminski_note_path(full_path::text)

These are the indexes I've already created on the inodes table:

Indexes:
    "ix_inodes_checksum_st_size" btree (checksum_md5, st_size) WHERE checksum_md5 IS NOT NULL

Question:

What else can I do to improve the performance of this query?

Re: (VERY) Slow Query - PostgreSQL 9.2

From
Bill Moran
Date:
On Tue, 3 May 2016 21:55:21 +1200
"drum.lucas@gmail.com" <drum.lucas@gmail.com> wrote:

> Hi all,
>
> I'm trying to improve the performance of the query below, but I just
> don't know what else I can do...
>
> Please have a look and let me know if you can help somehow. If you
> need any extra data, just ask.
>
> * Note that the gorfs.inode_segments table is 1.7TB in size
>
> I have the following Query:
>
> explain analyze
> SELECT split_part(full_path, '/', 4)::INT AS account_id,
>        split_part(full_path, '/', 6)::INT AS note_id,
>        split_part(full_path, '/', 9)::TEXT AS variation,
>        st_size,
>        segment_index,
>        reverse(split_part(reverse(full_path), '/', 1)) as file_name,
>        i.st_ino,
>        full_path,
>        (i.st_size / 1000000::FLOAT)::NUMERIC(5,2) || 'MB' AS
> size_mb
> FROM gorfs.inodes i
> JOIN gorfs.inode_segments s
>   ON i.st_ino = s.st_ino_target
> WHERE i.checksum_md5 IS NOT NULL
>   AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
>   AND i.st_size > 0
>   AND split_part(s.full_path, '/', 4)::INT IN (
> SELECT account.id
>         FROM public.ja_clients AS account
>         WHERE
>         NOT (
>                 ((account.last_sub_pay > EXTRACT('epoch' FROM
> (transaction_timestamp() - CAST('4 Months' AS INTERVAL)))) AND
> (account.price_model > 0)) OR
>                 (account.regdate > EXTRACT('epoch' FROM
> (transaction_timestamp() - CAST('3 Month' AS INTERVAL)))) OR
>                 (((account.price_model = 0) AND (account.jobcredits >
> 0)) AND (account.last_login > EXTRACT('epoch' FROM
> (transaction_timestamp() - CAST('4 Month' AS INTERVAL)))))
>         ) LIMIT 100);
>
>
>    - Explain analyze link: http://explain.depesz.com/s/Oc6
>
> The query is taking ages, and I can't get the problem solved.
>
> These are the indexes I've already created on the inode_segments table:
>
> Indexes:
>     "ix_account_id_from_full_path" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE
> "full_path"::"text" ~
> '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'::"text"
>     "ix_inode_segments_ja_files_lookup" "btree" ((CASE
>     WHEN "full_path"::"text" ~ '/[^/]*\.[^/]*$'::"text" THEN
> "upper"("regexp_replace"("full_path"::"text", '.*\.'::"text",
> ''::"text", 'g'::"text"))
>     ELSE NULL::"text"END)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
>     "ix_inode_segments_notes_clientids" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
>     "ix_inode_segments_notes_clientids2" "btree" ("full_path")
>     "ix_inode_segments_notes_fileids" "btree"
> (("split_part"("full_path"::"text", '/'::"text", 8)::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
>     "ix_inode_segments_notes_noteids" "btree"
> ((NULLIF("split_part"("full_path"::"text", '/'::"text", 6),
> 'unassigned'::"text")::integer)) WHERE
> "gorfs"."is_kaminski_note_path"("full_path"::"text")
>
> These are the indexes I've already created on the inodes table:
>
>  Indexes:
>     "ix_inodes_checksum_st_size" "btree" ("checksum_md5", "st_size")
> WHERE "checksum_md5" IS NOT NULL
>
> *Question:*
>
> What else can I do to improve the performance of this query?

From the explain, it looks like the biggest pain point is the
inode_segments table, specifically, this condition:
s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
It's doing a full scan of every record in that table, which is
a large number, and that regex can't be cheap over that kind of
volume.

If you do:
SELECT count(*)
FROM inode_segments
WHERE full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+';
how many tuples actually match that condition? If the number
is a large percentage of the total table, then I'm not sure
how to help you, but if the percentage is small, you might
be able to speed things up by adding an index:

CREATE INDEX is_fp_trunc_idx
    ON gorfs.inode_segments (substring(full_path FROM 1 FOR 19));

Then adding this condition to the where clause:
substring(s.full_path FROM 1 FOR 19) = '/userfiles/account/'

There are other index combinations that may help as well,
depending on the nature of the values in that table, but,
in general, anything you can do to reduce the number of
records that have to be examined in that table is liable
to speed things up.
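Putting that together, the rewritten query would look something like this
(a sketch; index and column names as above -- the cheap equality test lets
the planner use the new index, so the expensive regex only runs against
the rows that survive the prefix filter):

SELECT s.full_path, s.segment_index
FROM gorfs.inode_segments s
WHERE substring(s.full_path FROM 1 FOR 19) = '/userfiles/account/'  -- matches is_fp_trunc_idx
  AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+';     -- original condition, now over fewer rows

Note the literal is exactly 19 characters, matching the indexed expression;
the planner will only consider the index if the two expressions are identical.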

--
Bill Moran


Re: (VERY) Slow Query - PostgreSQL 9.2

From
Alban Hertroys
Date:
> On 03 May 2016, at 11:55, drum.lucas@gmail.com wrote:
>
> Hi all,
>
> I'm trying to get the query below a better performance.. but just don't know what else I can do...
>
> Please, have a look and let me know if you can help somehow.. also.. if you need some extra data jet ask me please.
>
> * Note that the gorfs.inode_segments table is 1.7TB size
>
> I have the following Query:
>
> explain analyze
>
> SELECT split_part(full_path, '/', 4)::INT AS account_id,
>        split_part(full_path, '/', 6)::INT AS note_id,
>        split_part(full_path, '/', 9)::TEXT AS variation,
>        st_size,
>        segment_index,
>        reverse(split_part(reverse(full_path), '/', 1)) as file_name,
>        i.st_ino,
>        full_path,
>        (i.st_size / 1000000::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
> FROM gorfs.inodes i
> JOIN gorfs.inode_segments s
>   ON i.st_ino = s.st_ino_target
> WHERE i.checksum_md5 IS NOT NULL
>   AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
>   AND i.st_size > 0;

(Stripped the 1-and-a-half extra queries in there, but that incomplete one might be why you're waiting?)

>     • Explain analyze link: http://explain.depesz.com/s/Oc6
> The query is taking ages, and I can't get the problem solved.
>
> These are the index I've already created on the inode_segments table:


> What else can I do to improve the Performance of the Query?

The first thing I notice in your query is that you're making use of hierarchically organised data without storing it
hierarchically, namely that full_path field. The result of that is that both your table and your index contain a lot of
redundant information.

Now I'm not so sure a hierarchical table + query are going to help get you much performance out of this (probably worth
an experiment or two; mind that O/S's usually use inode trees for such things), but reducing the redundancy in the index
would probably help:

create index inode_segments_accounts_idx on gorfs.inode_segments (substring(full_path from 20))
    where full_path like '/userfiles/account/%';

(The index name can't be schema-qualified; the index is created in the same schema as its table. And note the column lives in inode_segments, not inodes.)

and then use similar expressions in your query of course:

where full_path like '/userfiles/account/%'
  and substring(full_path from 20) ~ '^[0-9]+/[a-z]+/[0-9]+';
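Applied to the original join, that would look roughly like the following
(a sketch reusing the expressions above; the LIKE matches the partial
index predicate, and the substring() matches the indexed expression):

SELECT i.st_ino, s.full_path, s.segment_index
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
  ON i.st_ino = s.st_ino_target
WHERE i.checksum_md5 IS NOT NULL
  AND i.st_size > 0
  AND s.full_path LIKE '/userfiles/account/%'                    -- partial index predicate
  AND substring(s.full_path FROM 20) ~ '^[0-9]+/[a-z]+/[0-9]+';  -- indexed expression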

Good luck!
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: (VERY) Slow Query - PostgreSQL 9.2

From
"Mike Sofen"
Date:

 

From: drum.lucas@gmail.com  Sent: Tuesday, May 03, 2016 2:55 AM
I'm trying to improve the performance of the query below, but I just don't know what else I can do...

Please have a look and let me know if you can help somehow. If you need any extra data, just ask.

* Note that the gorfs.inode_segments table is 1.7TB in size

I have the following Query:

explain analyze
SELECT split_part(full_path, '/', 4)::INT AS account_id,
       split_part(full_path, '/', 6)::INT AS note_id,
       split_part(full_path, '/', 9)::TEXT AS variation,
       st_size,
       segment_index,
       reverse(split_part(reverse(full_path), '/', 1)) as file_name,
       i.st_ino,
       full_path,
       (i.st_size / 1000000::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
  ON i.st_ino = s.st_ino_target
WHERE
      i.checksum_md5 IS NOT NULL
  AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
  AND i.st_size > 0
  AND split_part(s.full_path, '/', 4)::INT IN (
 
SELECT account.id
        FROM public.ja_clients AS account
        WHERE
        NOT (
                ((account.last_sub_pay > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('4 Months' AS INTERVAL)))) AND (account.price_model > 0)) OR
                (account.regdate > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('3 Month' AS INTERVAL)))) OR
                (((account.price_model = 0) AND (account.jobcredits > 0)) AND (account.last_login > EXTRACT('epoch' FROM (transaction_timestamp() - CAST('4 Month' AS INTERVAL)))))
        ) LIMIT 100
);

 

There is one obvious solution: restructure your data. It is not in a “standard” form, yet you're querying it as if it were, turning your long full_path string into columns on the fly. If performance is a concern, that parsing overhead has to be eliminated.

Your two choices would be either to restructure this table directly (requiring a change in the app code that fills it), or to use it to fill a properly decomposed table, with everything extracted from the long full_path string by post-processing after the insert. A third consideration would be to archive older/unneeded rows off to a history table to reduce row counts. This is about proper structure.
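A minimal sketch of the decomposed table described above (all names are
hypothetical; the columns mirror the split_part() calls in the original
query, so the hot filters become plain indexed integer comparisons):

-- Hypothetical decomposed table: each component of full_path becomes a
-- plain column, so queries no longer need split_part() or regexes.
CREATE TABLE gorfs.inode_segments_decomposed (
    st_ino_target  BIGINT NOT NULL,
    account_id     INT    NOT NULL,  -- was split_part(full_path, '/', 4)
    note_id        INT,              -- was split_part(full_path, '/', 6)
    variation      TEXT,             -- was split_part(full_path, '/', 9)
    file_name      TEXT,             -- was the last path component
    segment_index  INT,
    full_path      TEXT   NOT NULL
);
CREATE INDEX ON gorfs.inode_segments_decomposed (account_id);

It could be filled by post-processing, e.g. an INSERT ... SELECT using the
same split_part() expressions, paying the parsing cost once at load time
instead of on every query.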

Mike Sofen