Last week, one of the ad companies that a bunch of our clients use had a really hard time. However, instead of denying the problem or putting the blame on something else, Mediavine owned it, and are doing everything in their power to make things right.

This is too rare in their industry, and we have been so impressed by how they handled the situation, that it’s worthy of a post. (Especially since we grumped about the problem when we were dealing with it.)

For many of our clients, their site is how they earn their living. Some earn revenue through their stores, but most use advertising to earn revenue. Unfortunately, placing ads on a site introduces a lot of complexity to allow for tracking, changing out ads, and preventing fraud. And with greater complexity comes a greater chance of failure. (This complexity can often manifest as slow page load times, but that’s a topic for a different day.)

In this case, the issue was their plugin. One of the things it does is schedule daily and weekly tasks. However, there was a bug: with every hit to a site, it added a new set of daily and weekly tasks for that site to run.

Uh oh. You can imagine what happened with very busy sites. Yup, they were very busy scheduling new tasks, and then trying to run all the scheduled tasks. One site had over 900,000 tasks scheduled. The hamsters powering the servers were very busy – we noticed higher database & server loads than normal, had just figured out the culprit, and were executing a fix when a client forwarded the most amazing message with the subject…

“IMPORTANT – ACTION REQUIRED IF YOU ARE RUNNING MCP PLUGIN!”

Whoa! An email from Mediavine sent to their clients admitting the problem, giving a quick explanation of what happened, providing instructions on how to fix it, offering help if necessary, and promising to make things right.

Wow! That’s awesome if true! Yeah, I admit it, I’m a bit cynical at times. Those times being days which end in “y”.

The instructions matched up with what we had been doing – turn off the plugin and clean up all the scheduled tasks. Score one for them in properly analyzing the problem and providing useful info to fix it.

They continued to communicate well regarding what pitfalls folk might encounter trying to fix the problem, what each subsequent iteration of the plugin did, and ideas for how to resolve them. Score another point for them.

We were pretty worn out from cleaning things up, so I wasn’t tallying the points they were scoring. I somewhat sarcastically (You, sarcastic?) tweeted to ask if they’d at least send us pizza, and emailed a crankyish note to see if the “make things right” included us, as we really didn’t want to bill clients for spending so much time cleaning up something not in their control.

Amber at Mediavine responded pretty quickly in the affirmative. She wasn’t defensive and didn’t try to put any spin on what happened. So different from others we’ve dealt with!

Okay, now I’m starting to feel more confident about them. They’re doing the same things we try to do in the event of an emergency: “Be transparent about issues, get fixes out, keep communicating, and figure out how to not have the same problem again.”

Too many companies don’t realize how important this is.

People don’t really expect perfection. We all gripe and grumble when things aren’t working right. But so long as we’re treated with respect and know that the people dealing with the problem are doing their best to get it solved, are communicating the issues and possibilities, and then making an effort to address consequences afterward, it’s easier to accept those (hopefully few) imperfect times.

Mediavine has done all we could want, and more.

Troubleshooting live server / software events is a bit like dealing with a high-speed tire blowout while driving a school bus full of kids – but thankfully, without the physical risk.

  • First you need to get the bus safely stopped on the side of the road.
  • Then you need to figure out how to change the tire – maybe enlist some passengers to help.
  • Simultaneously, you need to reassure everyone that they’re safe & will be back on the road soon.
  • Then, before going full speed, you test that the new tire can handle freeway speeds.
  • And then afterwards, get everyone cleaned up, and figure out if there are better tires that won’t blow out.

As a result of how Mediavine handled the plugin ‘blowout’, we’re now comfortable recommending them to clients looking for an ad service. They earned our trust. And we’re looking forward to working with them on ways we can better care for our mutual clients.

-TS Jay

P.S. Yes, we’re getting pizza. 🙂

The Mediavine control panel has been crushing sites running it.

We’re happy to say that it didn’t bring any of our hosted sites down, or slow servers down significantly. 🙂 But it did create a major mess for us, and sites running it.

Version 1.4.x has a terrible bug that continually adds cron (scheduled) jobs to a site’s list of things to run. Ultimately, it chokes the site’s and server’s database, and causes the server to run the daily & weekly jobs multiple times a second.
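For the curious, here’s a rough sketch of how this kind of bug behaves. It’s in Python rather than the plugin’s PHP, and the `Scheduler` class, the hook names, and both functions are made up for illustration; this is not Mediavine’s code. The buggy version schedules a new daily and weekly job on every page hit, while the guarded version first checks whether the job is already queued (WordPress has `wp_next_scheduled()` for that kind of check).

```python
# Hypothetical in-memory scheduler; all names are illustrative, not the plugin's code.
class Scheduler:
    def __init__(self):
        self.tasks = []  # list of (hook, interval) tuples

    def schedule(self, hook, interval):
        self.tasks.append((hook, interval))

    def is_scheduled(self, hook):
        return any(existing == hook for existing, _ in self.tasks)


JOBS = (("daily_sync", "daily"), ("weekly_sync", "weekly"))


def on_page_hit_buggy(scheduler):
    # Adds a fresh daily and weekly job on every single request.
    for hook, interval in JOBS:
        scheduler.schedule(hook, interval)


def on_page_hit_guarded(scheduler):
    # Only schedules a job that is not already in the queue.
    for hook, interval in JOBS:
        if not scheduler.is_scheduled(hook):
            scheduler.schedule(hook, interval)


buggy, guarded = Scheduler(), Scheduler()
for _ in range(1000):  # simulate 1,000 page hits
    on_page_hit_buggy(buggy)
    on_page_hit_guarded(guarded)

print(len(buggy.tasks), len(guarded.tasks))  # 2000 2
```

A thousand hits leave the buggy queue with 2,000 entries and the guarded one with 2, which is why the busiest sites were hit the hardest.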

They sent out an emergency alert to their clients about it. The alert says to deactivate the plugin, delete it, reinstall the newest version (1.5.1), and then activate it.

We’ve followed the process for our clients, and have at least installed (but not activated) version 1.5.1.

Since the plugin added so many lines to the cron part of a site’s database, not all plugin re-activations have been successful. If your site is one of them, please let us know, and we can try to manually clean out the database to see if that fixes it.

-TS

P.S. Send chocolate.

Hi all,

If you use some of the most popular themes and plugins, it’s been a rough couple weeks in WordPress land.

First the Genesis theme had a bad update – 2.5.1. Next a new major version of WordPress was released – 4.8. That’s not a bad thing in itself, but it takes a bit for plugin authors to catch up with changes in WordPress. Automattic released a new major version of Jetpack. And finally the Social Warfare plugin broke hard.

If you want to read about the Social Warfare bug, here’s a link to the report – http://bit.ly/2soiNL9. We recommend disabling the plugin until the next update. (For those keeping track, this is the second time the plugin has broken in a bad way.)

We’re not convinced Social Warfare is the only source of the 50x errors (Server Unavailable or Internal Server errors), so we’ll continue to seek out the gremlins.

Thanks for hosting with us,
-TS

Hi all,

After a crazy last 7 months which saw us move hundreds of sites behind the scenes to our new Phoenix facility, while simultaneously growing faster than ever (thank you!), we’ve fallen behind on some important projects. Plus, we’ve worked so many late hours, ‘Walkers’ have been telling us how terrible we look.

The Pause That Refreshes…

To that end, we’re mostly closing the front door to new clients until July 1st to allow us to catch up on projects & recharge our batteries. Keeping you & your sites happy is our highest priority, above even growing the company.

This pause won’t change anything for existing clients. We’ll still happily add whatever sites you need, and provide the same (hopefully) fantastic service you deserve. (The same applies to our web developer friends.)

For folk not yet hosting with us, if you have an existing site and can hold off for a couple of months, that would be wonderful. Since we do all the work to migrate your site from your old host to us, migrations are terribly time intensive, so pausing them is the easiest way to free up time. If you’re starting a brand new site, ping us through our contact page.

What we’ll be up to…

Here’s a partial list of the projects we’ll be working on:

  • Upgrading the TS web & mail server to SSD – 75% complete
    This will speed up our own sites, and make email syncing much faster – especially for webmail.
  • Roll out PHP 7.1 as an option – not yet started
    PHP 7.1 is about 7% faster than PHP 7.0. This will reduce the time it takes for our servers to do the work to build your pages. (That’s one piece of many that affect page load times.)
  • Improve our CDN – not yet started
    For our busiest sites, and those with heavy traffic in Europe, we’ll be rolling out a CDN server in Paris. And behind the scenes, we have some improvements planned to eliminate some quirks common to CDNs.
  • Implement a CPU pool for super-viral posts – 25% complete
    The goal is to make it so we can quickly bring in extra servers to help share the workload when a post goes so viral that it overwhelms the site’s normal server.
  • “Cold Spare” server – longer term project, but worthy of putting here
    This is a spare powered-off (“cold”) server that is cloned from the ‘real’ server to allow recovery more quickly than restoring from backups. (It’s effectively a third type of backup.)
  • Plus a whole lot of other behind the scenes improvements to make it easier to identify and block bozos, manage your sites, and keep things chugging along happily.

Remember, check our live status page if something ever seems wrong – bookmark it. It’s where we’ll post our hot takes about incidents and issues we see (not just ours), and items not worthy of a post.


We are hiring for a junior to mid-level IT position. If you know a sharp, smart geek with about 2-3 years of Linux experience, who is reliable, friendly, and (somewhat) patient, we’d love to talk with them. Preference is for someone in the Phoenix metro area, but remote is okay too.


Thanks so much for hosting with us!

-TS

Summary:

The TechSurgeons web & email server will be moved to our new facility March 9th, starting at 11PM Arizona time. Client sites will not be impacted by this move – just *.techsurgeons.com sites and services.

Estimated downtime is 4 hours. Status will be updated as we can at www2.techsurgeons.com and on Facebook.

If the email server move goes well, we may update the web servers and reboot them. Estimated downtime is 10 minutes each, and would affect client sites.


All the details:

We’ve been dissatisfied with the Phoenix facility we’ve colocated our servers at for a while. The original company we chose had been acquired, and the new company has not been performing. Which is why we’d been putting almost everyone on servers at our Quebec facility. (Which we still like for sites where most traffic goes to the UK/Europe.)

On October 7th, we started moving into our new Phoenix facility with all new servers. Since then, we’ve migrated over 350 sites from the old facility to the new Phoenix one, and an additional 100ish sites from Quebec.

All of these migrations have been done one at a time, by hand, completely behind the scenes to minimize any impact on you or your site’s visitors. While incredibly time consuming, it was more successful than we expected, with only a few sites having issues. (Early on, we kinda overloaded the network connection on a couple servers, and had to bump them from a gigabit connection to a ten gigabit connection.)

One of the advantages of the new facility is that we can add additional ISPs for network redundancy. Almost all sites now have multiple “phone numbers” (IP addresses) split between two providers – “multi-homed” in geek speak. If someone tries to connect to a site on one, and it’s not working right, browsers are smart enough to try the second. This has worked amazingly well, but confuses some monitoring tools (sorry Pingdom) which don’t know how to deal with “multi-homed” sites.
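Here’s a minimal sketch of that “try the second number” behavior, in Python. The addresses come from ranges reserved for documentation, and `connect_with_fallback` and `fake_connect` are names made up for this example; real browsers are more sophisticated about it.

```python
from typing import Callable, Iterable, Tuple


def connect_with_fallback(
    addresses: Iterable[str], connect: Callable[[str], object]
) -> Tuple[str, object]:
    """Try each address in order; return (address, connection) for the first that works."""
    last_error = None
    for address in addresses:
        try:
            return address, connect(address)
        except OSError as error:
            last_error = error  # remember the failure and move on to the next address
    raise ConnectionError("no address answered") from last_error


# Hypothetical addresses, one per provider.
ADDRESSES = ["192.0.2.10", "198.51.100.10"]


def fake_connect(address: str) -> str:
    # Stand-in for a real network connection: pretend the first provider is down.
    if address == "192.0.2.10":
        raise OSError("provider A unreachable")
    return f"connected to {address}"


used, connection = connect_with_fallback(ADDRESSES, fake_connect)
print(used)  # 198.51.100.10
```

A monitoring tool that only ever checks the first address would report the site as down here, even though visitors get through on the second.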

Now that all the production web sites have been moved, we’re down to the stuff that can’t be moved without downtime.

The primary goal for Thursday’s maintenance window is the move of our original infrastructure server. It’s been 711 days since that server was last rebooted. That’s been too long, but we had a bad experience last time, and are wary of rebooting it unless we’re physically there. The poor thing has been a little flaky the last couple weeks, but we’ve been keeping it going until Thursday.

This server is responsible for:

  • mail.techsurgeons.com
  • webmail.techsurgeons.com
  • portal.techsurgeons.com
  • www.techsurgeons.com
  • support.techsurgeons.com

We will start the shutdown process at 11PM. It will take about 30 minutes to shut the server down and disassemble it. Then 30 minutes to drive it to the new facility. Once we get it there, figure 45 minutes to reassemble and mount it. And then something between 15 minutes and 2 hours for it to come back up.
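If you want to check our math against the 4 hour estimate, here’s the timeline added up in a few lines of Python. The step names and the `as_hours` helper are just labels for this example; the durations are the ones listed above.

```python
# Step durations in minutes as (best case, worst case), taken from the plan above.
STEPS = {
    "shut down and disassemble": (30, 30),
    "drive to the new facility": (30, 30),
    "reassemble and mount": (45, 45),
    "boot back up": (15, 120),
}
ESTIMATE_MINUTES = 4 * 60  # the 4 hour downtime estimate


def as_hours(minutes: int) -> str:
    """Format a number of minutes as, for example, 3h45m."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h{mins:02d}m"


best = sum(low for low, _ in STEPS.values())
worst = sum(high for _, high in STEPS.values())

print("best case: ", as_hours(best))   # 2h00m
print("worst case:", as_hours(worst))  # 3h45m
print("slack:     ", as_hours(ESTIMATE_MINUTES - worst))  # 0h15m
```

So the estimate covers the slowest expected boot with about 15 minutes to spare, assuming nothing else goes sideways.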

Yes, we’ll drive carefully, have a fresh backup of the server, and have extra hardware on hand in case the server doesn’t come back up correctly. Those should mitigate the worst case scenarios. (In a future maintenance window, we’ll upgrade the server itself.)

During the downtime, mail destined for us should just queue up. The email protocol says that sending mail servers should retry sending mail for at least 3 days.
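As a rough illustration of why a few hours of downtime shouldn’t lose mail, here’s a tiny Python simulation of a sending server’s retry loop. The 30 minute retry interval is an assumption (every mail server picks its own schedule), and `first_successful_attempt` is a made-up helper, not part of any mail software.

```python
from datetime import timedelta


def first_successful_attempt(outage, retry_every, give_up_after):
    """Return how long a queued message waits before it gets through,
    or None if the sender gives up first."""
    elapsed = timedelta(0)
    while elapsed <= give_up_after:
        if elapsed >= outage:  # our server is back, so this attempt succeeds
            return elapsed
        elapsed += retry_every  # still down: wait and try again
    return None


delivered_after = first_successful_attempt(
    outage=timedelta(hours=4),         # our estimated downtime
    retry_every=timedelta(minutes=30), # assumed retry interval
    give_up_after=timedelta(days=3),   # the retry window mentioned above
)
print(delivered_after)  # 4:00:00
```

With these numbers, a message sent the moment we go offline arrives as soon as the server is back. The sender would only give up if we stayed down for longer than the full 3 days.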

If the mail server move goes well, there’s some other maintenance we’d like to do, but nothing is as urgent.

Please understand that we won’t be very reachable during this time. We’ll put up a status page at the soon to be created www2.techsurgeons.com, and try to post updates on FB to keep folk informed.

Wish us luck!

-TS

Emergency Update: Security flaw in NextGen Gallery Plugin – all sites updated

Hi all,

We’ve updated all hosted sites with the NextGen Gallery plugin installed to the latest version. There was a significant security hole in older versions, which would have allowed an attacker to retrieve info from the site’s database.

More here: https://blog.sucuri.net/2017/02/sql-injection-vulnerability-nextgen-gallery-wordpress.html

-TS

Problem with latest Yoast SEO plugin (wordpress-seo)

Howdy all,

The latest update to Yoast SEO will break sites running on PHP 7.0 – our default version. We’ve (hopefully) disabled updates for that plugin by changing permissions on the folder to “read-only”. As soon as we hear of a fix, we’ll change its permissions back, so it can be updated.

-TS

How your computer can find a website by its name. – DeGreekifying Technology #1

This is the first in a series of posts designed to help explain how computers and the Internet work in plain English. This series was inspired by our friend Mercedes M. Yardley. Have you ever wondered how your web browser can find a website like Google automatically? You can’t just key a name into your […]

DeGreekifying Technology

I was talking with my friend Mercedes M. Yardley a few days ago about all the spam she receives and when I asked if her ISP used any tools to block some of it, she didn’t know and said that “computer talk is Greek to me”. Now she’s a brilliant author that could master Greek […]

Welcome to the new Techsurgeons Site

Thanks for taking the time to visit us. After helping many of our customers implement websites of their own, we figured that we should redo our archaic site. The new site is pretty basic. We’re using WordPress with the wonderful Thesis theme. We chose WordPress and Thesis because we needed a simple solution, we’re far […]
