Go Back   Rhinocerus > Newsgroup > Newsgroup comp.databases.mysql

Reply
 
Thread Tools Display Modes
  #1 (permalink)  
Old 06-04-2008, 05:05 AM
John Nagle
Guest
 
Posts: n/a
Default Slow query with MEMORY engine - why?

I'm doing this query repeatedly on a MEMORY database.

SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
rating_state = 'queued' AND NOT EXISTS
(SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
!= 'queued') ORDER BY request_timestamp LIMIT 1

Here's the table:

describe ratingqueue;
+-------------------+-------------------------------------+------+-----+---------------------+-------+
| Field | Type | Null | Key | Default
| Extra |
+-------------------+-------------------------------------+------+-----+---------------------+-------+
| domain | varchar(255) | NO | PRI |
| |
| requestor_ip_hash | int(11) | YES | MUL | NULL
| |
| rating_state | enum('queued','starting','running') | NO | MUL | queued
| |
| server | varchar(63) | YES | | NULL
| |
| priority | smallint(6) | YES | | 0
| |
| update_timestamp | timestamp | NO | |
CURRENT_TIMESTAMP | |
| request_timestamp | timestamp | NO | MUL |
0000-00-00 00:00:00 | |
+-------------------+-------------------------------------+------+-----+---------------------+-------+


It's actually a scheduler; a number of machines are looking for the next
work queue item. There's no need for the table to persist over restarts,
so it's in the MEMORY engine. The point of doing this in the database
is that multiple machines can use the same work queue.

This works fine when the table is small, but with
about 1000 entries, it seems to choke. Each query takes about 2 seconds.
And that's in the MEMORY database. This suggests that it's doing an O(N^2)
operation.

Here's an EXPLAIN:

explain SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
rating_state = 'queued' AND NOT EXISTS
(SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
!= 'queued') ORDER BY request_timestamp LIMIT 1;
+----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys
| key | key_len | ref | rows | Extra |
+----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
| 1 | PRIMARY | ratingqueue | ref | rating_state
| rating_state | 1 | const | 1097 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | ratingqueue | ALL | requestor_ip_hash,rating_state
| NULL | NULL | NULL | 823 | Using where |
+----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+

All the fields in the subquery are keys. What's the problem?

There must be a more efficient way to express this query.
Suggestions?

John Nagle

Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #2 (permalink)  
Old 06-04-2008, 08:49 AM
Captain Paralytic
Guest
 
Posts: n/a
Default Re: Slow query with MEMORY engine - why?

On 4 Jun, 06:05, John Nagle <na...@animats.com> wrote:
> I'm doing this query repeatedly on a MEMORY database.
>
> SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
> rating_state = 'queued' AND NOT EXISTS
> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
> != 'queued') ORDER BY request_timestamp LIMIT 1
>
> Here's the table:
>
> describe ratingqueue;
> +-------------------+-------------------------------------+------+-----+---------------------+-------+
> | Field | Type | Null | Key | Default
> | Extra |
> +-------------------+-------------------------------------+------+-----+---------------------+-------+
> | domain | varchar(255) | NO | PRI |
> | |
> | requestor_ip_hash | int(11) | YES | MUL | NULL
> | |
> | rating_state | enum('queued','starting','running') | NO | MUL | queued
> | |
> | server | varchar(63) | YES | | NULL
> | |
> | priority | smallint(6) | YES | | 0
> | |
> | update_timestamp | timestamp | NO | |
> CURRENT_TIMESTAMP | |
> | request_timestamp | timestamp | NO | MUL |
> 0000-00-00 00:00:00 | |
> +-------------------+-------------------------------------+------+-----+---------------------+-------+
>
> It's actually a scheduler; a number of machines are looking for the next
> work queue item. There's no need for the table to persist over restarts,
> so it's in the MEMORY engine. The point of doing this in the database
> is that multiple machines can use the same work queue.
>
> This works fine when the table is small, but with
> about 1000 entries, it seems to choke. Each query takes about 2 seconds.
> And that's in the MEMORY database. This suggests that it's doing an O(N^2)
> operation.
>
> Here's an EXPLAIN:
>
> explain SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
> rating_state = 'queued' AND NOT EXISTS
> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
> != 'queued') ORDER BY request_timestamp LIMIT 1;
> +----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
> | id | select_type | table | type | possible_keys
> | key | key_len | ref | rows | Extra |
> +----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
> | 1 | PRIMARY | ratingqueue | ref | rating_state
> | rating_state | 1 | const | 1097 | Using where; Using filesort |
> | 2 | DEPENDENT SUBQUERY | ratingqueue | ALL | requestor_ip_hash,rating_state
> | NULL | NULL | NULL | 823 | Using where |
> +----+--------------------+-------------+------+--------------------------------+--------------+---------+-------+------+-----------------------------+
>
> All the fields in the subquery are keys. What's the problem?
>
> There must be a more efficient way to express this query.
> Suggestions?
>
> John Nagle


Having all the fields as keys may not help that much when you have
more than one criteria in a WHERE or JOIN clause. In this case an
appropriate composite key is much more useful.

Also, I would rewrite this as a JOIN instead of a dependent subquery.
Reply With Quote
  #3 (permalink)  
Old 06-05-2008, 01:25 AM
John Nagle
Guest
 
Posts: n/a
Default Re: Slow query with MEMORY engine - why?

Captain Paralytic wrote:
> On 4 Jun, 06:05, John Nagle <na...@animats.com> wrote:
>> I'm doing this query repeatedly on a MEMORY database.
>>
>> SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
>> rating_state = 'queued' AND NOT EXISTS
>> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
>> != 'queued') ORDER BY request_timestamp LIMIT 1

>
> Having all the fields as keys may not help that much when you have
> more than one criteria in a WHERE or JOIN clause. In this case an
> appropriate composite key is much more useful.


> Also, I would rewrite this as a JOIN instead of a dependent subquery.


How do you express "not exists" in a JOIN?

John Nagle
Reply With Quote
  #4 (permalink)  
Old 06-05-2008, 09:05 AM
Captain Paralytic
Guest
 
Posts: n/a
Default Re: Slow query with MEMORY engine - why?

On 5 Jun, 02:25, John Nagle <na...@animats.com> wrote:
> Captain Paralytic wrote:
> > On 4 Jun, 06:05, John Nagle <na...@animats.com> wrote:
> >> I'm doing this query repeatedly on a MEMORY database.

>
> >> SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
> >> rating_state = 'queued' AND NOT EXISTS
> >> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
> >> != 'queued') ORDER BY request_timestamp LIMIT 1

>
> > Having all the fields as keys may not help that much when you have
> > more than one criteria in a WHERE or JOIN clause. In this case an
> > appropriate composite key is much more useful.
> > Also, I would rewrite this as a JOIN instead of a dependent subquery.

>
> How do you express "not exists" in a JOIN?
>
> John Nagle


I'm not totally sure about what the data rules are in your table, but
something this should do it.

SELECT
r1.domain,
r1.requestor_ip_hash
FROM ratingqueue r1
LEFT JOIN ratingqueue r2 ON r1.requestor_ip_hash =
r2.requestor_ip_hash AND r2.rating_state != 'queued'
WHERE r1.rating_state = 'queued' AND r2.requestor_ip_hash IS NULL

Note that setting your query out nicely as above before posting makes
it much easier for us to offer help.
Reply With Quote
  #5 (permalink)  
Old 06-10-2008, 07:03 PM
John Nagle
Guest
 
Posts: n/a
Default Re: Slow query with MEMORY engine - why?

Captain Paralytic wrote:
> On 5 Jun, 02:25, John Nagle <na...@animats.com> wrote:
>> Captain Paralytic wrote:
>>> On 4 Jun, 06:05, John Nagle <na...@animats.com> wrote:
>>>> I'm doing this query repeatedly on a MEMORY database.
>>>> SELECT domain, requestor_ip_hash AS requestor FROM ratingqueue WHERE
>>>> rating_state = 'queued' AND NOT EXISTS
>>>> (SELECT * FROM ratingqueue WHERE requestor_ip_hash = requestor AND rating_state
>>>> != 'queued') ORDER BY request_timestamp LIMIT 1
>>> Having all the fields as keys may not help that much when you have
>>> more than one criteria in a WHERE or JOIN clause. In this case an
>>> appropriate composite key is much more useful.
>>> Also, I would rewrite this as a JOIN instead of a dependent subquery.

>> How do you express "not exists" in a JOIN?
>>
>> John Nagle

>
> I'm not totally sure about what the data rules are in your table, but
> something this should do it.
>
> SELECT
> r1.domain,
> r1.requestor_ip_hash
> FROM ratingqueue r1
> LEFT JOIN ratingqueue r2 ON r1.requestor_ip_hash =
> r2.requestor_ip_hash AND r2.rating_state != 'queued'
> WHERE r1.rating_state = 'queued' AND r2.requestor_ip_hash IS NULL


That's worse. Tried the query in this form:

SELECT
r1.domain,
r1.requestor_ip_hash
FROM ratingqueue r1
LEFT JOIN ratingqueue r2 ON r1.requestor_ip_hash =
r2.requestor_ip_hash AND r2.rating_state != 'queued'
WHERE r1.rating_state = 'queued' AND r2.requestor_ip_hash IS NULL
ORDER BY request_timestamp LIMIT 1;

Timing results: (MySQL query browser, idle Windows PC, idle
database, MySQL 5 running on same machine as query browser.)

Original query: 0 rows fetched in 0.0007s. (4.3616s)
Left join query: 0 rows fetched in 0.0165s. (5.0268s)

In either case, it's taking 4 to 5 seconds to process this
query on a MEMORY table of modest size.

Table def is:

CREATE TABLE ratingqueue
(
domain VARCHAR(255) NOT NULL UNIQUE PRIMARY KEY,
requestor_ip_hash INT, rating_state ENUM ('queued','starting','running')
NOT NULL DEFAULT 'queued',
server VARCHAR(63) NULL, priority SMALLINT DEFAULT 0,
update_timestamp TIMESTAMP NOT NULL, request_timestamp TIMESTAMP NOT NULL,
INDEX(requestor_ip_hash, rating_state), INDEX(request_timestamp)
) ENGINE=MEMORY;

The table contains 1222 entries, 2 of which are 'rating_state="running"'
and all of which have the same 'requestor_ip_hash'. That's the problem.
The query is to return a row for which the requestor_ip_hash does not
match any row that is in "running" state. (It's a fair load balancer.)
When no such row exists, the query is incredibly inefficient.

There must be some better way to do this.

John Nagle
Reply With Quote
 
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: SAS Memory Phil Rack Newsgroup comp.soft-sys.sas 0 03-19-2009 10:03 PM
Re: SAS Memory Thomas Schmitt Newsgroup comp.soft-sys.sas 0 03-19-2009 09:46 PM
Re: How to incorporate/assign variables within SQL query? Howard Schreier Newsgroup comp.soft-sys.sas 0 04-25-2008 03:04 AM
Re: z/OS memory questions Charles Harbour Newsgroup comp.soft-sys.sas 1 11-03-2005 04:37 PM
Proc Report using RTF Out of Memory (8.1) on Mainframe McDonald, John M Newsgroup comp.soft-sys.sas 0 08-06-2005 06:25 PM



All times are GMT. The time now is 01:37 PM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.