This page is updated manually with the status of current and recent (roughly the last 30 days) events.
(Times are US/Arizona, UTC-7.)
Current status is: Green: I am completely operational, and all my circuits are functioning perfectly normally.
20170217 @ 2:37AM – We did a quick reboot of the server that crashed on the 13th so that it is now running its updated kernel.
20170213 @ 2:07AM – Found the cause. That server is running an older kernel – one with a bug that occasionally manifests during very high I/O (such as backups). The kernel has been updated, so the server will run the correct one after its next reboot. We’ll do that reboot late one weeknight this coming week.
20170213 @ 1:45AM – One server crashed and restarted during backups. Everything came back up, but we’re rerunning backups and looking for the cause. We’ll post it here if we find anything.
20170207 @ 7:15AM – The migrations needed to bring the server load down have been completed. We’re still migrating sites to empty the old Phoenix facility – that will be completed by March 1st.
20170205 @ 2:10PM – One server in Quebec is in the midst of a load spike of real traffic. Its load had been creeping up, so now seems like a fine time to move a few sites off it.
20170124 @ 1:06PM – The email server is being a little odd about allowing logins. Not sure why yet.
20170124 @ 3:20AM – On a few servers, the web server process didn’t cleanly reset its log files as it is supposed to do each night. The cause was an updated “library file”, which prevented the existing process from properly resetting itself. Manually killing the process and restarting it solved the issue. Estimated downtime was 6 minutes on 5 servers.
Green: I am completely operational, and all my circuits are functioning perfectly normally.
AMBER: External network issues.
RED: Zombie Apocalypse