Thread: Hibernate generated query slow compared to 'equivalent' hand written one

Hibernate generated query slow compared to 'equivalent' hand written one

From
Kyle Moser
Date:
The depesz link for explain (analyze, buffers) is shown below for 3 different queries. The first two queries show a log dump of the postgres log, showing a query that was generated by Java Hibernate. The third query was one I wrote and ran in pgadmin that I think is similar to what Hibernate is doing. You can see in the first link query plan output that the query shown below is doing a sequential scan on REMINDERS which I do not understand since REMINDERS has an index on FK_USER. That isn't the focus of my question. In the second query I forced it to use the index but it wasn't any faster. Now, if I go into pgadmin and execute what I thought was an equivalent query it is much faster (less than 2ms vs. 2000 ms). My question is basically: why is the query I wrote so much faster than the hibernate ones, when the query appears essentially the same?

Schema for REMINDERS table (SQL shown):

CREATE TABLE pgsch.reminders
(
    id bigint NOT NULL,
    fk_user bigint NOT NULL,
    date timestamp without time zone NOT NULL,
    finished character(1) COLLATE "default".pg_catalog NOT NULL DEFAULT 'F'::bpchar,
    subject character varying(255) COLLATE "default".pg_catalog NOT NULL,
    is_on character(1) COLLATE "default".pg_catalog NOT NULL DEFAULT 'T'::bpchar,
    body text COLLATE "default".pg_catalog,
    CONSTRAINT reminders_pkey PRIMARY KEY (id),
    CONSTRAINT usr_cstr FOREIGN KEY (fk_user)
        REFERENCES pgsch.user (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
        DEFERRABLE INITIALLY DEFERRED
)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX remidx
    ON pgsch.reminders USING btree
    (fk_user)
    TABLESPACE pg_default;

__________________________________________________________________


Additional info:
PostgreSQL 9.5.3
reminders table has 66K rows
Settings used in postgres conf: 
auto_explain.log_analyze = true, auto_explain.log_buffers = true, track_io_timing = true, work_mem = 256MB, shared_buffers = 8GB, effective_cache_size = 25GB

Additional info from the postgres log for the first query:

2016-10-14 11:23:38.107 EDT >LOG:  duration: 0.798 ms  parse <unnamed>: select distinct this_.FK_USER as y0_ from pgsch.REMINDERS this_ where this_.FK_USER in ($1, ..., $999)
2016-10-14 11:23:38.109 EDT >LOG:  duration: 1.164 ms  bind <unnamed>: select distinct this_.FK_USER as y0_ from pgsch.REMINDERS this_ where this_.FK_USER in ($1, ..., $999)

 >DETAIL:  parameters: $1 = '213', $2 = '382', $3 = '131', $4 = '174', $5 = '885', ..., $992 = '830', $993 = '333', $994 = '414', $995 = '481', $996 = '454', $997 = '728', $998 = '281', $999 = '717'
  duration: 1571.404 ms  execute <unnamed>: select distinct this_.FK_USER as y0_ from pgsch.REMINDERS this_ where this_.FK_USER in ($1, ..., $999)
  parameters: $1 = '213', $2 = '382', $3 = '131', $4 = '174', $5 = '885', ..., $992 = '830', $993 = '333', $994 = '414', $995 = '481', $996 = '454', $997 = '728', $998 = '281', $999 = '717'
 
 2016-10-14 11:23:39.682 EDT >LOG:  duration: 1571.388 ms  plan:
        Query Text: select distinct this_.FK_USER as y0_ from pgsch.REMINDERS this_ where this_.FK_USER in ($1, ..., $999)

Then I put an additional setting in my postgres.conf (enable_seqscan = off) and got this for the second query:
 
 So as you can see it didn't really improve anything even though it was using the index.
 
 However, if I run the following query in pgadmin: "explain(analyze, buffers) select distinct this_.FK_USER as y0_ from pgsch.REMINDERS this_ where this_.FK_USER in (1...1000);" 
 I get the following output. Note that 1...1000 here are 1000 values I generated. I ran this a bunch of times and the randomness or values themselves don't seem to make much difference (always faster than 2ms). I also ran this query as a prepared statement with no noticeable difference, since I figured that's what Hibernate is doing.
 
 
Why does the third query (pgadmin one) show 270 for the rows whereas the query from the postgres log (initiated by hibernate) shows 65597 rows, which is close to the real number of rows? 
Any other tips why the hibernate query is so slow compared to the pgadmin one?  I am totally stumped here and out of ideas.

Thanks for any help or suggestions anyone can provide.
Kyle Moser <moser.kyle@gmail.com> writes:
> The depesz link for explain (analyze, buffers) is shown below for 3
> different queries. The first two queries show a log dump of the postgres
> log, showing a query that was generated by Java Hibernate. The third query
> was one I wrote and ran in pgadmin that I think is similar to what
> Hibernate is doing.

It's not all that similar: according to the EXPLAIN output, the condition
Hibernate is generating is

Filter: ((FK_USER)::numeric = ANY ('{213,382,131,...,717}'::numeric[]))

whereas your handwritten query is generating

Index Cond: (fk_user = ANY ('{70,150,1248,1269,1530,...,199954}'::bigint[]))

IOW, Hibernate is telling the server that the parameters it's supplying
are NUMERIC not INTEGER, which results in a query using numeric_eq, which
can't be indexed by a bigint index.

If you can't find a hammer big enough to persuade Hibernate that it's
dealing with integers/bigints rather than numerics, you could probably
regain most of the performance by creating an index on (FK_USER::numeric).

BTW, why is one of your EXPLAINs showing the identifiers in upper case
and the other in lower case?  One could be forgiven for wondering if
these were really against the same data.

            regards, tom lane


Re: Hibernate generated query slow compared to 'equivalent' hand written one

From
Kyle Moser
Date:
Tom,

Thanks so much for the response. They are the same data, that was due to deidentification on my part. So even though the second Hibernate query says "index only scan" (in addition to the filter, as you said) it is inefficient. Why does it say index only scan if it can't use the index due to the types being numeric and the index being bigint? (I suppose my question here is how to interpret the output properly - so I don't make this mistake again). 

On Fri, Oct 14, 2016 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kyle Moser <moser.kyle@gmail.com> writes:
> The depesz link for explain (analyze, buffers) is shown below for 3
> different queries. The first two queries show a log dump of the postgres
> log, showing a query that was generated by Java Hibernate. The third query
> was one I wrote and ran in pgadmin that I think is similar to what
> Hibernate is doing.

It's not all that similar: according to the EXPLAIN output, the condition
Hibernate is generating is

Filter: ((FK_USER)::numeric = ANY ('{213,382,131,...,717}'::numeric[]))

whereas your handwritten query is generating

Index Cond: (fk_user = ANY ('{70,150,1248,1269,1530,...,199954}'::bigint[]))

IOW, Hibernate is telling the server that the parameters it's supplying
are NUMERIC not INTEGER, which results in a query using numeric_eq, which
can't be indexed by a bigint index.

If you can't find a hammer big enough to persuade Hibernate that it's
dealing with integers/bigints rather than numerics, you could probably
regain most of the performance by creating an index on (FK_USER::numeric).

BTW, why is one of your EXPLAINs showing the identifiers in upper case
and the other in lower case?  One could be forgiven for wondering if
these were really against the same data.

                        regards, tom lane

Kyle Moser <moser.kyle@gmail.com> writes:
> Thanks so much for the response. They are the same data, that was due to
> deidentification on my part. So even though the second Hibernate query says
> "index only scan" (in addition to the filter, as you said) it is
> inefficient. Why does it say index only scan if it can't use the index due
> to the types being numeric and the index being bigint? (I suppose my
> question here is how to interpret the output properly - so I don't make
> this mistake again).

The key thing to notice about that is that it says "Filter" not
"Index Cond".  That means it's pulling data from the index but
not making use of the index's search ability --- that is, it's
scanning every index entry and applying the "IN" condition to the
value, in much the same way as it'd do with heap entries in a plain
seqscan.  That's a pretty silly plan, which in most cases you would
not get if you hadn't forced it.

            regards, tom lane