MySQL – rich text https://www.lafferty.ca Rich Lafferty's OLD blog Wed, 17 Sep 2008 04:20:20 +0000 en-US hourly 1 https://wordpress.org/?v=5.9.2 Fun with ANALYZE TABLE https://www.lafferty.ca/2008/09/17/fun-with-analyze-table/ https://www.lafferty.ca/2008/09/17/fun-with-analyze-table/#comments Wed, 17 Sep 2008 04:11:48 +0000 http://www.lafferty.ca/?p=943 MySQL has been naughty for me lately.

First, I ran into a neat little issue on FreshBooks’ production servers last week involving the table cache and an O(n) algorithm for selecting a table to close. I wrote up a little explanation over on the FreshBooks blog that you might find interesting if you find any of this interesting. The short version is that if you’re going to be running with a full table cache and still opening tables regularly, you’ll be better off with a much smaller table cache, because finding the least-recently-used table to close is big-O of the size of the table cache. Smaller table cache = fewer tables to determine the LRU.

And then last night, out of the blue, a web forum about tinwhistles that I host hit a wall. About 8:30, my mostly-idle Linode went heavily IO-bound — as in one of the four CPUs spinning in diskwait all the time. What had originally been complex but fast (and common) queries were suddenly taking minutes and minutes to run: things like “get a list of topics in a forum”, and especially “get a list of posts for a forum’s RSS feed”.

There’s a lot of EXPLAIN output here, so I’d better put this behind a cut.

I took a look at the RSS feed query, which seemed the worst not only because it took a long time — 600+ seconds — but because it read-locked every important table in the database. Here’s the query, a big inner join:

SELECT t.topic_title, t.topic_last_post_id, t.forum_id,
    f.forum_name, p.post_time, pt.post_text, pt.bbcode_uid,
    u.username, u.user_id
FROM phpbb_topics t, phpbb_posts p, phpbb_posts_text pt,
    phpbb_users u, phpbb_forums f
WHERE t.topic_status != 1
  AND p.post_id = t.topic_last_post_id
  AND pt.post_id = p.post_id
  AND u.user_id = p.poster_id
  AND t.forum_id = f.forum_id
ORDER BY t.topic_last_post_id DESC
LIMIT 0, 15;

Here’s the mk-visual-explain output. I’ve replaced the table aliases with readable things:

Filesort
+- TEMPORARY
   table          temporary(forums,topics,posttext,posts,users)
   +- JOIN
      +- Bookmark lookup
      |  +- Table
      |  |  table          users
      |  |  possible_keys  PRIMARY
      |  +- Unique index lookup
      |     key            users->PRIMARY
      |     possible_keys  PRIMARY
      |     key_len        3
      |     ref            chiffbb.posts.poster_id
      |     rows           1
      +- JOIN
         +- Bookmark lookup
         |  +- Table
         |  |  table          posts
         |  |  possible_keys  PRIMARY,poster_id
         |  +- Unique index lookup
         |     key            posts->PRIMARY
         |     possible_keys  PRIMARY,poster_id
         |     key_len        3
         |     ref            chiffbb.topics.topic_last_post_id
         |     rows           1
         +- JOIN
            +- Bookmark lookup
            |  +- Table
            |  |  table          posttext
            |  |  possible_keys  PRIMARY
            |  +- Unique index lookup
            |     key            posttext->PRIMARY
            |     possible_keys  PRIMARY
            |     key_len        3
            |     ref            chiffbb.topics.topic_last_post_id
            |     rows           1
            +- JOIN
               +- Filter with WHERE
               |  +- Bookmark lookup
               |     +- Table
               |     |  table          topics
               |     |  possible_keys  forum_id,topic_status,topic_last_post_id
               |     +- Index lookup
               |        key            topics->forum_id
               |        possible_keys  forum_id,topic_status,topic_last_post_id
               |        key_len        2
               |        ref            chiffbb.forums.forum_id
               |        rows           2579
               +- Table scan
                  rows           23
                  +- Table
                     table          forums
                     possible_keys  PRIMARY

See that temporary table at the top that gets used in a filesort? Well…

The whole thing was multiple joins which were then ORDERed and LIMITed. So that meant that it had to find all posts to the forum, ever, and shove them in a temporary table, sort that, and take the 15 most recent posts.

“All posts to the forum, ever” is about 500MB of data. That made the temporary table big enough to go to disk. So every time this query ran and couldn’t be answered from the query cache, it had to write that 500MB file. And the cached query was invalidated whenever someone posted to the forum, which is pretty often.

The problem in this case wasn’t (entirely) the SQL. MySQL was optimizing the query poorly because the key distribution statistics were off. An ANALYZE TABLE on the affected tables fixed that, and gave us:

JOIN
+- Bookmark lookup
|  +- Table
|  |  table          forums
|  |  possible_keys  PRIMARY
|  +- Unique index lookup
|     key            forum->PRIMARY
|     possible_keys  PRIMARY
|     key_len        2
|     ref            chiffbb.topics.forum_id
|     rows           1
+- JOIN
   +- Bookmark lookup
   |  +- Table
   |  |  table          users
   |  |  possible_keys  PRIMARY
   |  +- Unique index lookup
   |     key            users->PRIMARY
   |     possible_keys  PRIMARY
   |     key_len        3
   |     ref            chiffbb.posts.poster_id
   |     rows           1
   +- JOIN
      +- Bookmark lookup
      |  +- Table
      |  |  table          posttext
      |  |  possible_keys  PRIMARY
      |  +- Unique index lookup
      |     key            posttext->PRIMARY
      |     possible_keys  PRIMARY
      |     key_len        3
      |     ref            chiffbb.posttext.post_id
      |     rows           1
      +- JOIN
         +- Bookmark lookup
         |  +- Table
         |  |  table          posts
         |  |  possible_keys  PRIMARY,poster_id
         |  +- Unique index lookup
         |     key            posts->PRIMARY
         |     possible_keys  PRIMARY,poster_id
         |     key_len        3
         |     ref            chiffbb.topics.topic_last_post_id
         |     rows           1
         +- Filesort
            +- Filter with WHERE
               +- Bookmark lookup
                  +- Table
                  |  table          topics
                  |  possible_keys  forum_id,topic_status,topic_last_post_id
                  +- Index range scan
                     key            topics->topic_status
                     possible_keys  forum_id,topic_status,topic_last_post_id
                     key_len        1
                     rows           57912

There’s still a filesort, but it’s now a filesort of a single 57k-row table that’s already been filtered. That table is about 5MB, and fits in tmp_table_size, so doesn’t go to disk. The joins all stack, and the ORDER BY just follows that one-table filesort. The query takes about 0.15s now, or about 4000x as fast.

Incidentally, it can still be improved: that “filter with WHERE” is because of the “WHERE t.topic_status != 1” in the query, and that means “where the topic is not locked”. The idea was that locked topics aren’t going to appear in the last-15-posts anyhow, so may as well exclude them. But if they’re not going to appear because of the sorting, and since you’re sorting anyhow, unless there are a LOT of locked posts that doesn’t matter. Taking out that restriction gets us:

JOIN
+- Bookmark lookup
|  +- Table
|  |  table          users
|  |  possible_keys  PRIMARY
|  +- Unique index lookup
|     key            users->PRIMARY
|     possible_keys  PRIMARY
|     key_len        3
|     ref            chiffbb.posts.poster_id
|     rows           1
+- JOIN
   +- Bookmark lookup
   |  +- Table
   |  |  table          posts
   |  |  possible_keys  PRIMARY,poster_id
   |  +- Unique index lookup
   |     key            posts->PRIMARY
   |     possible_keys  PRIMARY,poster_id
   |     key_len        3
   |     ref            chiffbb.topics.topic_last_post_id
   |     rows           1
   +- JOIN
      +- Bookmark lookup
      |  +- Table
      |  |  table          posttext
      |  |  possible_keys  PRIMARY
      |  +- Unique index lookup
      |     key            posttext->PRIMARY
      |     possible_keys  PRIMARY
      |     key_len        3
      |     ref            chiffbb.topics.topic_last_post_id
      |     rows           1
      +- JOIN
         +- Bookmark lookup
         |  +- Table
         |  |  table          forums
         |  |  possible_keys  PRIMARY
         |  +- Unique index lookup
         |     key            forums->PRIMARY
         |     possible_keys  PRIMARY
         |     key_len        2
         |     ref            chiffbb.t.forum_id
         |     rows           1
         +- Bookmark lookup
            +- Table
            |  table          topics
            |  possible_keys  forum_id,topic_last_post_id
            +- Index scan
               key            topics->topic_last_post_id
               possible_keys  forum_id,topic_last_post_id
               key_len        3
               rows           59900

And with that there isn’t even a filesort and the query finishes in <0.01 seconds, 60000x as fast as the original problem and 15x as fast as the post-ANALYZE optimization. Nice.

]]>
https://www.lafferty.ca/2008/09/17/fun-with-analyze-table/feed/ 1
mysql argh https://www.lafferty.ca/2005/10/12/mysql-argh/ https://www.lafferty.ca/2005/10/12/mysql-argh/#comments Wed, 12 Oct 2005 16:18:00 +0000 Fun MySQL problems at work today. An application update introduced a handful of regularly-run but poorly optimized queries — full table scans, temporary tables, the works — on a box that was already pretty heavily taxed. The end result was a run queue of 20-60 when things were bad, alternating with inexplicable periods of idleness. Cranked up temporary table and cache sizes to no avail; everything got a little bit faster, but context switches went through the roof and things were still getting hung up.

The problem was lock contention — we’re still on MyISAM and occasionally a query would obtain a read lock on a table and hold it for 30-120 seconds. In that period a query would request a write lock, and after that some other queries would request read locks. But MySQL prioritizes write locks over read, so all of the queries requesting read locks had to wait for the writing process to get its lock and finish its write before they could proceed. Get a few of the problem queries in that queue and everything would repeat.

Luckily I remembered seeing a switch for that behavior — setting low-priority-updates prioritizes read locks over write. The writing process still has to wait for the pathological reader to release its lock, but in the meantime other processes get concurrent read locks and go about their business. Load average down to 4-8, which is pretty much all application, and context switching down to levels below what we saw before the update. Yay! I wish I’d made that change ages ago.

I seriously need to start planning an InnoDB migration soon (even with Oracle’s purchase of Innobase).

]]>
https://www.lafferty.ca/2005/10/12/mysql-argh/feed/ 3