Status

This page is updated manually with the status of current and recent events.

(Times are US/Arizona UTC-7)

Current status is: Green: I am completely operational, and all my circuits are functioning perfectly normally.

20161127 @ 11:53PM – Had a server get very confused about who it was allowed to talk to. If you saw weirdness, that’s what it was. Instead of it using DNS to figure out what it’s supposed to do, we’re now hard coding it. (Better to be bulletproof than easier to manage.)

20161122 @8:15 AM – Looks like Charter fixed their DNS servers. All looks good on their network.

20161116 @1:00PM – For about the last week, there’s been some significant slowness between Telus and the greater Internet. It’s impacting site performance for folk in (at least) Western Canada. If desired, we can migrate client sites hosted in Canada to our new Phoenix facility (without downtime), as it is not affected.

20161115 @ 1:18AM – Server is back up. All sites testing fine.

20161115 @ 1:10AM – We’re going to do a quick memory upgrade in one server. 5 minutes downtime tops.

20161114 @ 3:29AM – Everything has been stable. The replacement cable has had zero packet errors – even after we performed some artificial extreme load testing on it. With the fixes to the network and the interface upgrades, backups finished in 2.5 hours instead of the usual 3.5 hours.

20161113 @10:45PM – (Sorry for the late update) Our backup on-call tech went to the data center while our primary tech attempted to remotely rebalance traffic and thwart the DDoS attacks. Those efforts were slowly succeeding, but while at the data center the tech discovered that an intermittently failing fiber optic cable was causing traffic on the same server to flap between the primary link and secondary link under load. That cable has been replaced.

The secondary link was not fast enough to carry all the traffic of the server under load. It has been upgraded from 1Gbps to 10Gbps to match the primary link. Tonight, we’re doing the same with one other server with a slower backup network link. No downtime is expected.

20161113 @7:25PM – Sorry for the late update – both the TS office and the on-call employee lost internet access. Access regained via hotspot.

20161113 @7:15PM – One server in Phoenix is overloaded. We’re diverting some of its traffic elsewhere.

20161112 @ 11:50PM – Yoast seems to have fixed the WordPress SEO plugin.

20161109 @ 3:03PM – We’re (hopefully) disabling updates on the wordpress-seo plugin for the moment. It has a bug that causes sites to white screen on PHP 7.0 – our default version.

20161108 @ 6:05 PM – Server is back up

20161108 @ 5:55 PM – A bad drive was causing a server to slow down. Pulled it and am bringing the server back up.

20161108 @ 8:22 AM – All network interfaces up.

20161107 @ 7:54 AM – At least one network connection is live for all sites. As part of the next maintenance window, we’ll also divide traffic across both NICs instead of running active/failover as we do now: one provider on one NIC and the other provider on the other.

20161107 @ 7:32 AM – The network dropped on 1 server in Phoenix – we originally thought it was 2, but we just read it wrong. The issue was a bug in a Brocade 10G Network Interface that caused it to stop passing traffic on all VLANs. We’re going to be swapping those NICs for Intel ones at our next maintenance window.

20161107 @ 9:33 AM – Everything is back up. Analyzing logs to try to determine why the node lost its networking. (So glad we’re retiring Quebec 2 soon.)

20161107 @ 9:30 AM – Node Rebooted – two of three webhosts on node are back up. Third almost up.

20161107 @ 9:20 AM – Quebec 2 Dropped. We can see its console interface, but it’s not talking on the Internet at all. An on-location tech is investigating.

20161104 @ 6:40 PM – No further issues. Going green.

20161104 @ 2:40 PM – Filters have been installed on all Quebec servers. All looks good. The bozos are being blocked at the cost of a small increase (5ms) of latency.

20161104 @ 2:15 PM – Since that worked, we’re going to do the same blocks on WebhostQC7.  10-15 minutes until it’s in place.

20161104 @ 1:05 PM – And boom! Blocked go the bozos.

20161104 @ 12:55 PM – So much for that. Attack is back – we’re coordinating with our network provider on a fix to block the bad IP address ranges. 5-10 minutes
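
For the curious, blocking by address range rather than one address at a time looks roughly like the sketch below. This is only an illustration in Python, not the filter our network provider actually runs: the ranges are reserved documentation blocks standing in for the real ones, and `BLOCKED_RANGES` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Hypothetical block list. These are reserved documentation ranges (RFC 5737),
# used here as placeholders for the real attacker ranges.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_RANGES)
```

One range entry covers every address inside it, which is why this holds up better than chasing individual addresses as they pop up.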

20161104 @ 11:30 AM – Looks like we’ve been able to mitigate most of the attack. For now. Leaving status as red.

20161104 @ 10:30 AM – WebhostQC1 and WebhostQC7 are under heavy attack. It’s coming from many different IP addresses – primarily cloud services from Amazon, GoDaddy, and Rackspace. We’re blocking the IP addresses as they pop up.

20161002 @ 6:00 PM – The sites hosted on WebhostQC5 have grown and at peak times exceed our comfortable load threshold. We’ll be bringing up new servers in the next week and will divide the sites on this server to two new ones. In the meantime we’ll be shunting what traffic we can to the CDN, and possibly moving a couple of sites to existing servers.

Green: I am completely operational, and all my circuits are functioning perfectly normally.

Amber: External network issues.