Comments on: mysql replication

By: Rich

Rich — Thu, 15 Apr 2004 09:32:53 +0000

I figured you’d show up eventually. :-) Have I mentioned that they’ve got me working on HP/UX and Win2k as well now? Everything’s going pretty well, though. It’s odd to have Sun kit under my desk again though.

(And yes, you should, it’s a pain to have to unscreen your comments every time. :-) Or open a new one, since you don’t need invite codes anymore.)

By: anonymous

anonymous — Thu, 15 Apr 2004 04:00:34 +0000

Reading this, I suddenly am reminded exactly *why* I’ve spent all day studying magnetic fields in preparation for an upcoming exam.

I dance the happy dance of disregard! Wheeee!

(I *really* should reactivate that old LJ account)

By: Rich

Rich — Thu, 15 Apr 2004 00:26:37 +0000

It randomly broke on us for long enough that we stopped trusting built-in things and started reinventing wheels, which is what I’m trying to undo, because the breakage seems to have stopped since we moved from 3.x to 4. We’re running 4.0.14 across the board right now.

We’ve tried a few different ways of monitoring slaves in the past. Our usage means that there is a guaranteed constant stream of updates, and that if there isn’t, the replication alert will be lost in all of the other significantly more important alerts we’re getting. :-) Right now we’re watching that Read_Master_Log_Pos and Relay_Log_Pos are incrementing or rolling over and that Slave_IO_Running and Slave_SQL_Running are “Yes”; in the past we’ve watched the questions counter in “mysqladmin stat” with some success, but wanted the finer granularity (particularly since some experimental maintenance tasks occasionally involve stopping SQL but not IO).

Incidentally — and mostly for my own reference since I’m about to go to bed and I need to remember this tomorrow — what I ended up with for bringing a broken slave back in sync was:

On the slave, if necessary: SLAVE STOP
copy the tables on master and slave to a local staging area
On the master: FLUSH TABLES WITH READ LOCK (downtime begins)
rsync the tables on master to local staging area again
master: UNLOCK TABLES (downtime ends, should be seconds long)
master: RESET MASTER
rsync the master’s staging area over the slave’s
slave: FLUSH TABLES WITH READ LOCK
rsync from the slave’s staging area over the slave’s datadir
slave: UNLOCK TABLES
slave: RESET SLAVE
slave: SLAVE START

(Since all of our slaves are hot spares or backups, we can afford to RESET MASTER and bring them all back up together, instead of counting on starting replication at the right point. That doesn’t mean I won’t try to refine things further, just that that was the minimum requirement that I worked out right now.)

By: Rich

Rich — Thu, 15 Apr 2004 00:12:05 +0000

mysqlhotcopy is locky, but anything that copies the full set of data out while maintaining something vaguely resembling integrity while still keeping the whole lot on disk has to be locky. It’s not sqldumpish, though — it’s copying the actual files out of the datadir (which is why it’s no good with InnoDB). It’s also thoroughly frightening perl (although it’s Tim Bunce’s, and later Jeremy Zawodny’s, thoroughly frightening perl), and doubly so once you make it work with rsync, but it works for taking snapshots.

I like the dummy machine idea, and we have such a machine already, but I can’t go introducing an extra site into the picture — the site with the dummy machine is the unreliable one that led the live machines to be colocated elsewhere (one with some yahoos in Fisher Plaza, incidentally.)

Mostly I was having a hard time convincing myself that these two well-documented and well-established features were both impractical and that I should stop paying attention to them. :-)

Nonetheless I think I’ve managed to get rid of some messy manual steps in our process tonight. There was a great deal of wheel-reinventing going on here.

By: brad

brad — Thu, 15 Apr 2004 00:02:45 +0000

What’s wrong with MySQL replication, as far as you’re concerned?

And what’s so hard/different about replicating InnoDB? I’ve replicated every combination of {MyISAM,InnoDB} to {MyISAM,InnoDB} without problems. (well, obviously can’t do InnoDB to MyISAM if preserving transactions…)

How do you find MySQL randomly breaking? With what errors?

By: Rich

Rich — Wed, 14 Apr 2004 23:59:45 +0000

It’s struck me as something that had to be there for some general sense of completeness (“What, you can only sync up a slave out-of-band? We need to fix that, practicality be damned.”)

I think the mysqldump problem is a different cause to the same end. Yes, there needs to be a tool to go from a database to a bunch of SQL that can recreate the database, but that doesn’t mean that you want to do that.

On the Big Database side, one common practice seems to be backing up snapshots infrequently and redo logs regularly, which is a practice you don’t hear about much on the OSS database side of things. (I doubt many think about MySQL binlogs as “redo logs”.) If I had tape onsite I’d think about that, but tape is a few online backups away which makes things that amount to “just keep writing this to tape indefinitely” a bit inconvenient.

By: euqsam

euqsam — Wed, 14 Apr 2004 23:29:51 +0000

I always saw LOAD DATA FROM MASTER as really intended for very small tables. I have always wondered, though; if your tables are that small, what's replication doing on your menu? I like the dummy/backup machine idea. Pity we don't have machines for that, but it's a fantastic idea....

By: euqsam

euqsam — Wed, 14 Apr 2004 23:24:54 +0000

Amen to that, brother. Lock the tables. Copy the data. Unlock the tables. Move the data. Start the slave. Repeat. There is no other good way, I think. I just spent this past weekend building and rebuilding slaves, checking on replication status, tweaking indices, etc. I've come to these conclusions:

Machine loads in the 300s affect replication pretty seriously
Maybe we should STOP THAT
We finally stopped that.
Replication on MySQL is really pretty good, for the price
...especially considering the (free) competition, which really isn't in this case.

We're on MyISAM as well; InnoDB looks tasty but the headaches involved in replication with InnoDB (at least for free) are not as nice. Your setup sounds eerily similar to ours, even in terms of the database size I was dealing with. Hey, while we're on the topic, do you find MySQL replication randomly breaking on you? How do you monitor your slaves? Oh, and what version are you running?

By: brad

brad — Wed, 14 Apr 2004 23:21:29 +0000

I thought hotcopy was pretty locky and sqldump-ish.

Just replicate to a dummy machines for backups, and also use the dummy machine as a machine you can use to start up fresh slaves. So you have a new machine on the WAN that needs to be brought into the replication cluster: rsync the dummy data to it, along with the master.info, and start it up.

I’ve never used LOAD DATA FROM MASTER. They keep talking about making it not suck, but I haven’t heard any progress on it in a while.