My new favorite subreddit: Tales From Tech Support. Lots of plucky little heroes fighting mighty battles for the good. I am learning how to be a better netizen and help-desk user, by inference. My favorites are the stone-bad-ass ISTJ buck-stops-here people who know how to absolutely make things secure and plan for disaster. Like this quote from one participant, which is his base recommendation for if your facility had to have -- like, HAD to have, no expense spared, no gear too costly -- nine 9s of guaranteed stability and uptime. Not a 99% guarantee, but a 99.999999999% guarantee of no failure.

ANSWER: This is a difficult question, because the answer is highly dependent on the services offered:

- Web server? Database? eCommerce?
- How is downtime reported? (When you call to report downtime, or when the service is unavailable?)
- What is downtime? (Can't complete a transaction? Can't access data? Can't access the server?)
- Who is the customer? (Businesses or consumers?)

Noting that 100% uptime isn't realistically achievable, I would start with quadruple-redundant, geographically distributed environments with cross-site load balancing, and the following at each site:

- On-site water and power (minimum 72 hours)
- Full, independent infrastructure at each site (DNS, DHCP, etc.)
- Replicated servers in a virtual environment with dual NICs, dual power supplies, etc., with automatic failover (all hardware must have a hot spare and a cold spare on-site)
- Dual business-class SLA ISPs with automatic failover
- On-line disaster recovery
- All configuration and customer data must be replicated at each site with parity
- All services must be distributed, scalable, and stateless
- The client application must automatically upgrade upon detecting a client-server mismatch
- Automated scripting to deploy new systems on demand with minimal manual intervention
- Tools to migrate any data and services from failing systems on demand with minimal manual intervention

I love this kind of thinking.
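As a back-of-the-envelope aside (my own arithmetic, not from the quoted answer): the reason quadruple redundancy is so powerful is that, if you assume the four sites fail independently -- which is exactly what the geographic distribution and independent infrastructure are trying to buy you -- the whole service is down only when all four sites are down at once. A quick sketch of the math, using the quoted 99.999999999% target:

```python
# Hedged sketch: per-site availability needed for four independent,
# geographically distributed sites to hit the quoted target.
# Assumes site failures are statistically independent, which real
# deployments only approximate.

TARGET = 0.99999999999   # the availability figure quoted in the post
SITES = 4                # quadruple redundancy

# Down only if every site is down at once:
#   1 - (1 - a)**SITES >= TARGET   =>   a >= 1 - (1 - TARGET)**(1/SITES)
per_site = 1 - (1 - TARGET) ** (1 / SITES)
print(f"each site needs roughly {per_site:.4%} availability")

def combined(a: float, n: int = SITES) -> float:
    """Availability of n independent sites, any one of which suffices."""
    return 1 - (1 - a) ** n
```

The striking part: each individual site only needs about 99.8% availability (hours of downtime per month!) for the combined system to reach the target, provided the failures really are independent. That independence assumption is why the answer insists on separate power, water, ISPs, and infrastructure per site.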
Posted on: Fri, 12 Dec 2014 08:27:40 +0000