Category Archives: Maintenance

RESOLVED: Nationwide Comcast Outage

At 11:51 am CST (GMT -6), Comcast, one of the major ISPs in the USA, experienced a peering issue that made a portion of the Internet inaccessible to Comcast users. Comcast has such a wide reach that both individuals and businesses were impacted. To those affected, it seemed that some websites had gone off the air and simply did not respond.

Although the routing issues were resolved by 12:30 pm CST, we are still hearing from customers who are responding to delayed alerts.

The good news is that only a small portion of the Internet was impacted.  We did not see a significant dip on our own charts, and traffic from all other carriers was not affected.

Check http://downdetector.com/status/comcast-xfinity/map/ for comments from those impacted.

 

RESOLVED – Network outage: 1-5 minutes – 9/3/2015

[Image: XO outage map, 9/3/2015]

Thursday, September 3, 2015: 14:30 – 14:35 CST (GMT -6)

XO, a Tier 1 backbone provider, experienced trouble in the Midwest region of the USA, impacting BGP peers in the Chicago area. Because Chicago is a regional hub, traffic disruptions could be seen across a wider area.

Parts of the eBoundHost network experienced an outage of up to five (5) minutes while our traffic reconverged onto other providers.

We are working with XO to ensure service is fully restored before re-establishing peering.

Current status: https://downdetector.com/status/xo-communications/map/

 

Comcast “blinks” and we get calls!

A few minutes ago Comcast went down. Boy, did we start receiving calls from worried clients who thought we were at fault! Everything looks good on our side, folks! There is nothing we can do to help until Comcast gets its network back up.

You really don’t appreciate a good ISP until you realize how rarely they have issues.

Final maintenance window

This weekend, Saturday 11/22/2008, we have a scheduled outage. Some customers will receive the notice below; to make sure everyone knows what is happening, we are posting it here as well.

If you don’t hear from us in the next 24 hours, your account is not affected.

==============================

Dear eBoundHost Customer,

This upcoming Saturday night, November 22nd, 2008, the server that hosts your account is scheduled for maintenance. The service window is between 9pm (21:00) and 3am (03:00), with an expected actual outage of 30-60 minutes. This impacts shared hosting, VPS and dedicated hosting customers. All times are in the Central Time Zone, GMT -6.

Over the past few weeks, some servers have been upgraded in anticipation of this event. We appreciate your patience through this final scheduled outage. After this maintenance window, we do not foresee any more wide-ranging outages, only the occasional kernel upgrade that requires a quick reboot.

Due to phenomenal growth in the past two quarters, we are upgrading our hosting facilities. This will allow us to provide faster throughput, lower network latency and the ability to scale with demand. Additionally, our equipment is being consolidated into a layout that will make it more resilient to power failure and less susceptible to overheating. In a nutshell, things are getting even better.

As always, we recommend that you keep a full backup of your entire account, including website, email and databases. We provide excellent tools for backing up your account; check your control panel. Data loss is not anticipated, but you can never have too many backups!

VPS maintenance complete

Several VPS servers were down tonight for software and hardware maintenance. If you did not receive a notification ahead of the maintenance, your system was not affected, and you can stop reading here. Everything is now back online and the arrays are happily rebuilding. Most servers experienced 15-20 minutes of total downtime, but one server in particular took over 45 minutes to finish the process.

Beyond the outage itself, customers were not impacted. No data was lost, and there is nothing you need to do after the upgrade.

Great success!

Never a dull moment

Not a week goes by without some kind of emergency: hackers, backup server woes, operating system issues, hardware trouble, software trouble, spammers, integration of new technologies. Round and round it goes.

To start with, hackers. A long, long time ago, EBOUNDHOST acquired a smaller hosting outfit to broaden its offerings with cPanel. Until 2005, EBOUNDHOST was a Plesk-only outfit. cPanel and Plesk are two competing hosting control panel systems that run on Unix-like servers. Both systems have their raison d’être: one is better suited to power users, the other to SOHO and non-professional website owners.

Unfortunately, one of the acquired cPanel servers had a serious vulnerability that was inherited with the machine. The rootkit survived our admins’ sweeps and lockdowns and lay dormant for at least a year. When the time was right, our friendly hacker, or should I say “cracker” (hackers generally don’t damage systems), sprang into action. By the time the situation was finally under control, several clients had lost their databases and files were missing. Worse, the attack struck while the server backup was in progress and corrupted the backup. This was a glaring oversight; our team took ownership of the problem and helped our customers rebuild their websites from older backups, with help from pieces recovered through the Data Recovery procedure.

After this event, a new backup strategy was deployed to production almost immediately. Client data is now archived in snapshot style for several weeks on our new Backup Server cluster. All of our Shared servers and many Dedicated/VPS hosting customers make daily backups to this system. Additionally, databases are archived in a separate structure. Previously, recovering a single client meant unpacking an entire server backup onto a dedicated machine and then moving the files back into place; now one of our techs can mount the image and copy the files back within minutes. This is possible thanks to some very cool technologies that became available recently, but that is geek talk.

In real-world terms, this requires a tremendous amount of storage: lots of spinning hard drives RAIDed together into mammoth terabytes of backup space. But it’s never a dull moment; just two months later, we’re almost out of room.

Not all emergencies are of the bad kind; some are just exciting opportunities. But one common thread emerges: in hindsight, they are all valuable learning experiences. Once you clear one hurdle, the next one seems more approachable.

 

more work on server ORANGE

Server ORANGE is having serious issues with its RAID array; the hardware is being replaced and the array rebuilt. Techs are working on it as I write this message. There may be a series of brief outages, but the system should be completely solid during business hours tomorrow morning.

It is 10:35 PM CST on Thursday, February 2nd, 2007

emergency server outage

Server ORANGE is going to be brought down for emergency hardware maintenance in two hours (Feb 15, 2007 @ 1pm). The outage should last for no more than 20 minutes, possibly less.

My apologies for doing this in the middle of business hours, but my tech says it’s a real emergency.

Network Maintenance @ midnight

Last night, January 25th, 2007, there was a brief outage from 2 am to 2:30 am that affected our entire network.

Our network people had to take down both of the redundant gateway switches (big iron 1 and big iron 2) for maintenance, which stopped all external traffic completely. The switches were dropping packets and had to have some hardware replaced.

Some of our customers did not know that this was a scheduled outage, and we were flooded with email as soon as the services were back up. So I want to take this opportunity to point out the Network Status page, which has a running schedule of all upcoming maintenance (network and server).

The vast majority of such repairs do not result in outages, since usually only one switch at a time is worked on, but this was a very special event. They explained why both switches had to be worked on at the same time, but not being a network person, I found it a bit over my head.

Server ORANGE upgrade

If I could pick one thing to do all day long for the next 10 years, without a moment’s hesitation, it would be tinkering with server hardware. I’m talking about the kind of stuff that your IT department geeks are too ashamed to talk about by the water cooler.

Unfortunately (or fortunately), my role in the company has drifted into another arena and I rarely get to geek out. Yesterday (Wednesday, Nov 1st) was an exception. Shared server ORANGE is a mature server that has been running for almost two years without any major issues. It hosts several hundred websites and is by no means overloaded. But over the past two weeks it had been experiencing high server load that could not be explained.

While only a handful of customers noticed that email was arriving more slowly than usual, our network admin was busy putting out this “fire” by following every procedure to decrease load. This helped stabilize the situation somewhat, but it was a patch at best and by no means a solution.

When all possibilities were exhausted, I was called in to make the final call. My directive was to upgrade the hardware to take our users out of the line of fire, and then troubleshoot the old parts to see what was causing the problem. To make a long story short, that’s exactly what they did, and it made a tremendous difference.

So if you ever wondered what these systems consist of, here is a baseline shared hosting system that we use today:

  • Two dual-core Xeon 5130 (2.00 GHz) Woodcrest CPUs, 4 cores total
  • 1333 MHz front-side bus
  • ECC registered RAM
  • Hardware RAID controller for SATA drives
  • WD Raptor WD1500ADFD hard drives at 10,000 RPM, the fastest in their class

Normally our shared servers use SCSI hard drives, which are much faster than SATA, but to ease the transition of an already-running server, we used WD Raptors: we could simply take out one old drive at a time and rebuild the array onto the new (much faster) Raptors.

It sure worked well! No delays of any sort! And the culprit turned out to be a bad drive in the array that could not keep up with the rest of the healthy drives and was slowing down write operations.

So the moral of the story: most problems disappear if you just throw money at them.

One rough Monday

Last Friday went by without any problems, but today is Monday, so something was to be expected. No wonder it is supposedly the most common day of the week for heart attacks.

As the day began and every system component came under the highest load of the week (Monday morning), the weakest link failed: the power unit. All 4 RAID drives were spinning, and CPU usage (and power consumption) shot through the roof as email began to pour in and tens of thousands of little files were written to the mail system. But to be fair, these things happen; nobody is immune to hardware failure, which is why I always stress having a full backup of all your content.

In a little less than an hour, everything was back to normal: RAID array checked out, file system checked out, temperature nominal on the new power unit. All was well until the hour-long outage began to catch up with us in the form of an avalanche of email that had not been delivered on time. When it began to pour in, it really hit hard.

The bad news is that it took a few hours to work through the queue, and real-time messages were delayed. The good news is that, as a result of this experience, a new mail system is being implemented that will eliminate 90% of all spam before it arrives at the hosting servers. This mail will be filtered out as it enters the network, reducing the stress on machines that have other tasks to handle, such as serving websites.

Testing will begin early next week, and the new system should be fully implemented before the end of the month.

Server maintenance WHITE and ORANGE

Dear eBoundHost Customers,

This is an emergency notification to let you know that servers WHITE and ORANGE, which host your website and email, were upgraded overnight to the latest software to patch several critical vulnerabilities.

While the majority of users will experience no problems, we ask that you review your websites to make sure that there are no apparent errors.

There are several new and very exciting features in your control panel that were not previously available, and some of the older features are now easier to use. Please log in to familiarize yourself with the updated interface.

Some users may need to retrain their spam filter, as it may have been affected by the upgrade. We encourage you to visit the anti-spam tutorial and follow these steps:
http://www.eboundhost.com/helpcenter/tutorials/spamfilter/