Dreamhost: a comedy of errors

You may have noticed that this place was hard to get to for the last week or so.

I’ve hosted this blog and a bunch of other websites on Dreamhost since 2004, and I’ve referred enough people to them that my hosting there has been free for years. But most of those four years have been spent just below the “I need to do something about this” level of dissatisfaction.

As of this last week, though, I’ve had enough. It featured a 12-hour planned outage, followed by the rest of the week spent trying to recover from NFS problems that left sites unresponsive or just plain missing, so I’ve bought a virtual private server at Linode instead. I’ll post more about Linode later on, but last night Dreamhost resolved their NFS issues and I had a brief moment of reconsideration. After all, it’s free.

So I brought up my support history and read through it, and once again I’ve convinced myself it’s time to move critical services away from there. But the more I read it, the more I realized I should share the highlights of my experience. Like last time with IStop, my awful Ottawa ISP, I’m left wondering why I stuck around so long!

The details are after the cut.

May 21, 2004: Apache configuration prevents files named “README.txt” from showing in directory listings. Support writes:

Can’t you just rename it to something that will not be filtered like that?

May 23, 2004: I complain that the installed SpamAssassin is ancient. Support writes:

[W]e may upgrade at some point, but you’d really want to install your own version if you want to stay current at all. […] I would definitely not suggest using 2.20 for anything at this point.

Sep 15, 2004: Payment fails:

Current Balance: -$9.94
Amount Due: $9.94
Due Date: 2004-09-15

Failure! Please correct the errors below.

Feb 5, 2005: Fileserver damage. I get five copies of a “Your data has been restored!” form mail. I point this out to support, since I assumed at that point I’d keep getting copies of the form mail forever. They reply:

One of our server clusters was having fileserver issues on Friday. The account you’re writing in from was not affected.

I point out that no, it was affected, the restore was indeed successful, and I want to stop receiving mail about it. They reply apologizing for the multiple messages but continue to insist that my data (which was missing 24h previously) was not affected.

May 8, 2005: I get mail at 2AM:

I had to disable your database chiffbb. It was using enough of the CPUs on the server to justify having its own server. If you want to continue running your bulletin board, you should consider a dedicated server.

I point out that it has been running at a constant load for months, except that recently they replaced the database server:

So, now I have no access to the data, no more access to the conuery statistics to see if something went wrong recently or if it’s been building up slowly, no idea what the problem queries were, no way to make sure the indices I needed were there: nothing at all to work with aside from “enough of the CPUs”. Nothing has changed on the forums in the past year, so I’m a bit confused as to what might have happened overnight to get your attention.

Please let me know what I’m supposed to do at this point to figure out if things can simply be scaled back, given that “blindly trust you that I have to give you more money to get my data back” is an unacceptable option.

(Incidentally, shutting things down with no grace period at 2 AM on Sunday, and Mother’s Day no less, doesn’t really seem to match the whole “we’ll be nice about it” from your conueries knowledge base page. I’d recommend either going back to a hard quota which people can compare to their usage, or giving a grace period for this sort of thing; otherwise it is impossible for users to actually manage their usage.)

They eventually re-enable the service and tell me their MySQL guy will get in touch with me to figure out what’s going on. That never happens and I don’t hear anything about the forums again for a while.

Nov 15, 2005: One of four IMAP servers isn’t authenticating. Mail problems are the new black.

Jan 16, 2006: Someone’s added “dnsalias.com” and a dozen other dyndns.org second-level domains to their account, making Dreamhost’s nameservers (which are both their customer authoritative nameservers and their resolvers) refuse to believe that anyone else’s dyndns hostnames exist. They remove “dnsalias.com”.
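For anyone wondering how one customer's claim can break lookups for everyone else, here is a toy model in Python. It is only a sketch, not Dreamhost's actual DNS software: the `CombinedNameserver` class, the zone data, and the hostnames and addresses are all made up for illustration. The point is that a server acting as both authoritative nameserver and resolver stops asking the rest of the DNS about any zone it has been told it owns.

```python
class CombinedNameserver:
    """Toy model of a server that is both authoritative and a resolver."""

    def __init__(self, local_zones, upstream):
        self.local_zones = local_zones  # zone name -> {hostname: address}
        self.upstream = upstream        # stands in for the rest of the DNS

    def resolve(self, hostname):
        for zone, records in self.local_zones.items():
            if hostname == zone or hostname.endswith("." + zone):
                # The server believes it owns this zone, so it never asks upstream.
                return records.get(hostname, "NXDOMAIN")
        return self.upstream.get(hostname, "NXDOMAIN")


# Hypothetical data: the wider DNS knows about someone-else.dnsalias.com.
upstream = {"someone-else.dnsalias.com": "203.0.113.7"}

before = CombinedNameserver({}, upstream)
after = CombinedNameserver(
    {"dnsalias.com": {"www.dnsalias.com": "198.51.100.1"}}, upstream
)

print(before.resolve("someone-else.dnsalias.com"))  # 203.0.113.7
print(after.resolve("someone-else.dnsalias.com"))   # NXDOMAIN
```

Once a customer adds the zone, every name under it that isn't in that customer's records vanishes for anyone using the same servers as resolvers.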

Apr 26, 2006: Home directories under /home disappear on the mail server cluster, although the actual mountpoint at some long undocumented path still works.

Jun 27, 2006: “crontab -e” reports “Permission denied”.

Jun 30, 2006: Home directories under /home disappear on the mail cluster again.

Aug 3, 2006: Home directories under /home disappear on the mail cluster AGAIN.

Aug 5, 2006: Internal reverse DNS fails, and suddenly nothing can authenticate to MySQL, which has grants to ‘user’@’hostname’.
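A rough Python sketch of why this breaks everything at once. It assumes the usual MySQL behaviour of looking up the connecting IP's hostname and falling back to the bare IP when that lookup fails; real grant matching also handles wildcards and more, and the function names, hostnames, and addresses here are hypothetical.

```python
def effective_host(client_ip, reverse_dns):
    """Return the hostname for client_ip, falling back to the IP if lookup fails."""
    return reverse_dns.get(client_ip, client_ip)


def is_allowed(user, client_ip, grants, reverse_dns):
    """Check whether (user, host) matches a grant; grants is a set of (user, host) pairs."""
    return (user, effective_host(client_ip, reverse_dns)) in grants


# Hypothetical grant of the 'user'@'hostname' form.
grants = {("blog", "web01.example.internal")}

working_dns = {"10.0.0.5": "web01.example.internal"}
broken_dns = {}  # reverse DNS is down: no IP maps back to a name

print(is_allowed("blog", "10.0.0.5", grants, working_dns))  # True
print(is_allowed("blog", "10.0.0.5", grants, broken_dns))   # False
```

With the reverse lookup gone, the server sees 'blog'@'10.0.0.5', which matches no hostname-based grant, so every client is refused even though no password or grant has changed.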

Aug 11, 2006: Home directories under /home disappear on the mail cluster. Again.

Aug 16, 2006: Home directories… yeah. Support reply begins:

I was actually going to email you to let you know that we had a problems with those machines, however, I couldn’t remember your ID.

Aug 20, 2006: The mail problem from the 16th is resolved four days later.

Oct 2, 2006: Remember back in January where someone claimed “dnsalias.com”? It happened again. They ask me to provide a full list of “the domains I own”, even though I explained what dyndns.org was in my request. At least this time they add all of dyndns.org’s domains to their list.

Feb 24, 2007: Home directories. Mail. Yep.

Mar 16, 2007: Remember back in 2005 when they shut down the whistle forums because of load? Guess what! Again I point out that the load has been constant for months and that the only change was a new server on their end. They’re less angry this time, at least, and they again reconsider:

Actually, I’m still digging into the load on this server, and the more I dig, the more I see that the throttle on your site is pointless :) I’m very sorry about that, I actually went ahead and removed the throttle as it wasn’t bringing the load down at all. I’m still looking into the load and will let you know when I pin it down. Don’t worry though, you were a false alarm, at the time, you were the busiest site, and you were the best candidate, however, the wrong one. I’m very sorry about that!

This time I convince them to put a note on my account which basically says “This is the first site you’ll notice when this cluster gets slow, but it’s not the root cause”.

May 8, 2007: Dreamhost gets listed in the CBL.

May 18, 2007: Mail server won’t accept mail. “450 Server configuration problem.”

Jun 8, 2007: Mail bounces with “unknown user”.

Jun 9, 2007: Mail server home directories again. Not the usual root cause, though:

We had an issue with our mail updating system where the server responsible for password updates cut off the password file short.

Jul 18, 2007: They accidentally disable relaying from localhost on the webserver’s mail server. Suddenly no web apps that use SMTP can send mail. Reply in part:

Looks like this was an accidental change made to the mail config when one of the admins altered something else.

Aug 11, 2007: Mail server home directories.

Sep 19, 2007: Mail spool fills up. “452 Insufficient system storage”, complains Postfix. Reply in part:

Sorry about that! Our systems normally notify us before there are any problems, but it’s been rather busy lately with larger issues.

Oct 25, 2007: pop3 logins failing. I love this part of the (auto)reply:

Note: This was an announcement due to a large support incident. Sorry if you did not get callback support.

Jan 18, 2008: Dreamhost accidentally bills customers for their entire next year of hosting, a $7.5-million error that ends up costing DreamHost over $500,000.

Mar 21, 2008: The outage that prompted this post and my move: growth issues necessitate moving my webserver’s cluster to a new data centre, which involves a 12-hour scheduled outage. As if that were not enough, following the move, load hovers around 10, and disk operations take seconds to complete. The load/IO problem isn’t resolved until Mar 26.

Dreamhost will still be handy for the whistle forums and anything I need to host a lot of noncritical but big data for, but I’m looking forward to a change.

8 responses to “Dreamhost: a comedy of errors”

  1. It’s sad to hear how incompetent DH has become of late; I used to colo there and had a very good friend who was a senior sysadmin there, and they were good at the time. Requiring a 12-hour scheduled outage to move customer data between facilities can ONLY be evidence of lack of sufficient planning or proper architecture. =\ (we moved 100+ TB, several hundred domains, and 500+ Mbps Internet traffic from one datacenter to another in 2006 without an outage, so I know it can be done.)

    glad to hear you got your stability issues sorted though. :)

  2. I’m seriously considering just stopping my dreamhost service and directing my domain to a googlepages site or something. I rarely even use it for anything other than email, which just ends up being picked up by gmail anyway due to gmail’s superior spam filtering.

    I’ve had a lot of problems with them over the years too, but have only been bothered enough to submit a support request maybe twice.

  3. “Load hovers around 10”
    I’m on the ermac machine and the load there is consistently around 50 with 1-,2-hour long spikes of up to 400 every day. Unfortunately, I’m paid for until next January. :(

  4. I’ve been a customer of theirs for a few years now, and I’ve only had a couple of inconvenient problems — but then, I’m a fairly low-impact user, and mostly use it for remote backup disk space. Now, after all I’ve heard, I’d be hesitant to run anything big on their servers.

  5. I think that their main problem is that they oversell like crazy.
    This leads to poor server performance and stability.
    If you want a host that doesn’t oversell I highly recommend WebFaction ( http://www.webfaction.com ). They have great customer service too!

  6. Rich,
    I just signed up for Linode because of this, too. I haven’t had nearly as many problems as you’ve listed, but my sites were down long enough for them to drop out of Google. I’m still working on getting all of my sites moved over, but I’ve got backups running now (S3 FTW!) so I should be able to get everything over fairly quickly.