Minimize downtime in Azure?

There was an outage in the European data center today affecting SQL Azure. Some of our clients got hit and had to move to another data center. If you are running mission-critical applications that cannot be down, I would deploy the application into multiple regions.

DNS resolution is obviously a weak link right now in Azure, but it can be worked around (if you only run a website, it can be done very simply using Response.Redirect or similar). There is now also a data synchronization service from Microsoft that will sync multiple SQL Azure databases.
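As a rough illustration of the redirect workaround, here is a minimal Python sketch (the answer itself is talking about ASP.NET's Response.Redirect). It assumes you keep an ordered list of regional hostnames and have some health check of your own; the hostnames and the `is_healthy` callback are hypothetical, not anything Azure provides.

```python
from typing import Callable, Optional, Sequence
from urllib.parse import urlsplit, urlunsplit


def pick_region(regions: Sequence[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first regional host that passes the health check, or None."""
    for host in regions:
        if is_healthy(host):
            return host
    return None


def redirect_target(request_url: str,
                    regions: Sequence[str],
                    is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Rewrite request_url so it points at a healthy region.

    Returns None when no redirect is needed (the request is already on the
    chosen region) or when no region is healthy at all.
    """
    parts = urlsplit(request_url)
    target = pick_region(regions, is_healthy)
    if target is None or target == parts.netloc:
        return None
    # Keep scheme, path, query and fragment; swap only the host.
    return urlunsplit((parts.scheme, target, parts.path,
                       parts.query, parts.fragment))
```

The site would call `redirect_target` at the start of a request and issue an HTTP redirect when it returns a URL. Note that this only helps while the instance receiving the request is itself still up.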

This way, you can have mirror sites up in different regions and keep them in sync from a SQL Azure perspective. It would also be a good idea to employ a third-party monitoring service that detects problems with your deployed instances externally. AzureWatch can notify you, or even deploy new nodes if you choose, when some of the instances turn "Unresponsive". Hope this helps.
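To make the external monitoring idea concrete, here is a small sketch of the kind of logic such a monitor might apply. It is not how AzureWatch works internally; the `probe` callback, the instance names and the threshold of three consecutive failures are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List


class InstanceMonitor:
    """Flags an instance as unresponsive after `threshold` failed probes in a row."""

    def __init__(self, probe: Callable[[str], bool], threshold: int = 3) -> None:
        self.probe = probe          # hypothetical: returns True if the instance answered
        self.threshold = threshold
        self.failures: Dict[str, int] = {}

    def check(self, instances: Iterable[str]) -> List[str]:
        """Probe each instance once and return those currently considered unresponsive."""
        unresponsive = []
        for name in instances:
            if self.probe(name):
                self.failures[name] = 0  # any success resets the failure streak
            else:
                self.failures[name] = self.failures.get(name, 0) + 1
            if self.failures[name] >= self.threshold:
                unresponsive.append(name)
        return unresponsive
```

Requiring several failures in a row avoids raising an alert (or deploying a new node) on a single dropped request. The probe should run from outside the data center, so that a network problem at the data center itself is also noticed.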

Thanks for the reply. Deploying to multiple regions seems to be the only solution that I have influence over, although it's not something they talk about in the docs. I guess this is still an immature platform and it will be some time before it gets to the levels of reliability that other cloud providers give. – Peter McEvoy Dec 10 '10 at 9:59

This is just about a programming/architecture issue, but you may also want to ask the question on webmasters.stackexchange.com.

You need to find out the root cause before drawing any conclusions. However, my guess is that one of two things was the problem:

The ISP connectivity differs for the test system and your production system. Either they use different ISPs, or different lines from the same ISP. When I worked in a hosting company we made sure that our IP connectivity went through at least two different ISPs who did not share fibre to our premises (and where we could, they had different physical routes to the building; the homing ability of backhoes when there's a critical piece of fibre to dig up is well proven).

Your datacentre had an issue with some shared production infrastructure. These might be edge routers, firewalls, load balancers, intrusion detection systems, traffic shapers etc. These are typically installed only on production systems.

Defences here involve understanding the architecture and making sure the provider has a (tested!) DR plan for restoring SOME service when things go pear-shaped. The neatest hack I saw here was persuading an IPS (intrusion prevention system) that its own management servers were malicious, so you couldn't reconfigure it at all.

Just a thought: your DC doesn't host any of the Wikileaks mirrors, or Paypal/Mastercard/Amazon (who are getting DDoS'd by Wikileaks supporters at the moment)?

Thanks for your reply. I'm using Windows Azure, so it's not an ISP in the classic sense; it's cloud computing, so I hope they have that kind of redundancy. I doubt any of the Wikileaks sites are on Azure, and I doubt that Mastercard etc. use Azure. – Peter McEvoy Dec 9 '10 at 16:44

Can I move the question, or do I need to old-school copy/paste to the other stackexchange sites? – Peter McEvoy Dec 9 '10 at 16:46

True, but it's still internet service provision :) I don't know about moving the question. I didn't realise it was an MS DC, rather than one you had contracted with directly. I think either of my possibilities is still, er, possible. – Paul Dec 9 '10 at 16:52

Agreed, your suggestions are possible and I appreciate them, but there is little I can do about that from a programming or deployment sense WRT Azure. I would like to see if there are any dedicated Azure strategies that would mitigate this downtime. – Peter McEvoy Dec 9 '10 at 17:02

How does Azure manage geographical distribution? It really shouldn't be vulnerable to any single DC having problems, although you may need to co-locate the application and the database just to get RTT low enough. I'm curious to know what the problem turns out to be... – Paul Dec 9 '10 at 17:04

As you're deploying to Azure, you don't have much control over how SQL Server is set up. MS have already set it up so that it is highly available. Having said that, it seems that MS has been having some issues with SQL Azure over the last few days.

We've been told that it only affected "a small number of users". At one point the service dashboard had 5 data centres affected by a problem. I had 3 databases in one of those data centres down twice for about an hour each time, but one database in another affected data centre that had no interruption.

If having a database connection is critical to your app, then the only way in the Azure environment to insure against problems that MS haven't prepared for (this latest technical problem, earthquakes, meteor strikes) would be to keep a copy of your SQL data in another data centre. At the moment the most practical way to do this is to use the Sync Framework. There is an ability to copy SQL Azure databases, but this only works within a data centre.

With your data located elsewhere, you could then point your app at the new database if the main one becomes unavailable. While this looks good on paper, it may not have helped you with the latest problem, as it affected multiple data centres. If you'd just been making database copies on a regular basis, that might have been enough to get you through.
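Pointing the app at a second database could look roughly like the sketch below. It is deliberately driver-agnostic: `connect` stands in for whatever function your database library uses to open a connection, and the ordered list of connection strings (primary first, synced copy second) is an assumption about how the app is configured.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def connect_with_failover(connection_strings: Sequence[str],
                          connect: Callable[[str], T]) -> Tuple[str, T]:
    """Try each connection string in order and return the first that connects.

    Returns (connection_string_used, connection). Raises ConnectionError
    if every database in the list is unavailable.
    """
    errors: List[str] = []
    for conn_str in connection_strings:
        try:
            return conn_str, connect(conn_str)
        except Exception as exc:  # a real app would catch the driver's own error type
            errors.append(f"{conn_str}: {exc}")
    raise ConnectionError("all databases unavailable: " + "; ".join(errors))
```

Bear in mind that a synced copy can lag behind the primary, and anything written to the secondary during an outage has to find its way back once the primary returns. Failing over for reads is therefore much simpler than failing over for writes.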

Or not. (I would have posted this answer on Server Fault, but I couldn't find the question.)

Thanks for the reply. I was aware of the SQL Azure issues you refer to, as they hit us as well at the weekend, and we were assured that there was a manual fix in place. But the incident above was separate and limited to West Europe only, and did not seem to have anything to do with SQL Azure (that was still available).

I appreciate that the only solution that I have any influence over is to have our own fail-over instance in a different region and do something clever with DNS. – Peter McEvoy Dec 10 '10 at 9:56.

