Questions tagged [high-availability]
High availability is an architectural consideration often involving degrees of redundancy to ensure availability in case of system or component failure.
980
questions
80
votes
11
answers
27k
views
Multiple data centers and HTTP traffic: DNS Round Robin is the ONLY way to assure instant fail-over?
Multiple A records pointing to the same domain seem to be used almost exclusively to implement DNS Round Robin as a cheap load balancing technique.
The usual warning against DNS RR is that it is not ...
38
votes
6
answers
10k
views
Windows 2008 ignores Gratuitous ARP requests
We recently saw an issue after a failover of our router where our Windows 2008 boxes didn't start talking to the primary router after fail-back.
When we did some digging they still had the ARP ...
27
votes
9
answers
43k
views
Alternatives to Heartbeat, Pacemaker and CoroSync?
Are there any major alternatives for automatic failover on Linux besides the typical Heartbeat/Pacemaker/CoroSync combinations? In particular, I'm setting up failover on EC2 instances, which only ...
23
votes
3
answers
14k
views
What is the difference between Anycast and GeoDNS / GeoIP wrt HA?
Based on the Wikipedia description of Anycast, it includes both the distribution of a domain-name-to-many-IP-mapping across many DNS servers as well as replying to clients with the most geographically ...
22
votes
6
answers
21k
views
DNS Round Robin: Do browsers stick to one IP as long as it is online?
How do most browsers behave if they get multiple A records from the DNS server? Do they stick to one IP as long as it is reachable (and only use another if the IP is down)? Or do they switch all the ...
22
votes
8
answers
33k
views
Avoiding DNS timeouts when a DNS server fails
We have a small datacenter with about a hundred hosts pointing to 3 internal DNS servers (bind 9). Our problem comes when one of the internal DNS servers becomes unavailable.
At that point all the ...
21
votes
2
answers
34k
views
What is the difference between keepalive and heartbeat?
I want to set up a highly available server cluster. I would like to know in detail what the difference between keepalive and heartbeat is, and how to choose one.
17
votes
1
answer
1k
views
Highly-available, Web-accessible and scalable deployment of statsd and graphite
I'd like to set up statsd/graphite so that I can log JS apps running on HTML devices (i.e. not in a contained LAN environment, and possibly with a large volume of incoming data that I don't directly ...
16
votes
8
answers
2k
views
When is the right time to introduce high availability for a web site?
There are many articles on High Availability options.
It's not that obvious, however, WHEN is the right time to switch from single ...
16
votes
3
answers
8k
views
Multi-site high availability
We have a SaaS application that we need to be highly available. We already have an expensive, well-maintained Hyper-V failover cluster, but today the datacenter where we host that cluster had a five-...
15
votes
2
answers
8k
views
Options for Multisite High Availability with Puppet
I maintain two datacenters, and as more of our important infrastructure starts to get controlled via Puppet, it is important that the puppet master work at the second site should our primary site fail. ...
15
votes
5
answers
3k
views
When my A web server gets unplugged, how do I automatically redirect all the users to my B web server in another city, and vice versa?
A load-balancing switch does what I want, except I can't figure ...
15
votes
1
answer
6k
views
Replicating beanstalkd for High Availability
Title says it all.
Does anyone know of a way to replicate beanstalkd such that if a beanstalk server went down, other slaves could take over?
Here's one approach I thought of:
I could make ...
14
votes
5
answers
41k
views
How to setup STONITH in a 2-node active/passive linux HA pacemaker cluster?
I am trying to set up an active/passive (2-node) Linux-HA cluster with corosync and pacemaker to keep a PostgreSQL database up and running. It works via DRBD and a service IP. If node1 fails, node2 ...
13
votes
8
answers
4k
views
Load balancing Apache on a budget?
I am trying to get my head around the concept of load balancing to ensure availability and redundancy to keep users happy when things go wrong, rather than load balancing for the sake of offering ...
13
votes
2
answers
29k
views
Keepalived for more than 20 virtual addresses
I have set up keepalived on two Debian machines for high availability, but I've run into the maximum number of virtual IPs I can assign to my vrrp_instance. How would I go about configuring and ...
12
votes
7
answers
13k
views
Mathematically, how to calculate an uptime percentage based on a number of nodes and their respective uptime percentage?
This question is more of a math question than a server question, but it is strongly server related.
If I have a server for which I can guarantee 95% uptime and I put that server in a ...
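A rough sketch of the arithmetic behind this kind of question, assuming node failures are statistically independent (real clusters rarely satisfy this exactly) and that a redundant group counts as "up" while at least one node is up. The function names are illustrative and do not come from the question:

```python
from math import prod


def parallel_availability(availabilities):
    """Availability of a redundant group: up if at least one node is up.

    Assumes independent failures, so the group is down only when every
    node is down at the same time.
    """
    return 1 - prod(1 - a for a in availabilities)


def serial_availability(availabilities):
    """Availability of a chain where every component must be up."""
    return prod(availabilities)


if __name__ == "__main__":
    # Two nodes at 95% each, either one sufficient to serve traffic.
    print(f"redundant pair: {parallel_availability([0.95, 0.95]):.4f}")
    # Two components at 95% each that are both required.
    print(f"serial chain:   {serial_availability([0.95, 0.95]):.4f}")
```

Under these assumptions redundancy multiplies the downtime fractions, whereas chained dependencies multiply the uptime fractions, so adding a required component always lowers the overall figure.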
12
votes
2
answers
24k
views
What's the difference between keepalived and corosync, others?
I'm building a failover firewall for a server cluster and started looking at the various options. I'm more familiar with CARP on FreeBSD, but need to use Linux for this project.
Searching Google has ...
12
votes
3
answers
2k
views
The downsides of using nginx as a primary web server?
I've seen millions of websites using nginx as a proxying web server working together with Apache. But I've seen very few servers running only nginx as their default web server. What are the main ...
12
votes
4
answers
14k
views
What exactly does Gluster do?
I've been playing with gluster for the last 2 days and been asking questions here and on their questions system.
I really don't understand some of the stuff. I see people saying stuff like
Set up ...
12
votes
2
answers
16k
views
How to do client side NFS failover in Linux?
I have a CentOS 6.3 client that needs to access NFS storage. There are two NFS servers that serve up the same content stored on a SAN with a clustered filesystem. How do I set up CentOS to fail over ...
12
votes
2
answers
9k
views
Avoiding SPOFS with GlusterFS and Windows
We have a GlusterFS cluster we use for our processing function. We want to get Windows integrated into it, but are having some trouble figuring out how to avoid the single-point-of-failure that is a ...
12
votes
3
answers
2k
views
How can I load-balance a load balancer?
I'm about to convert a single-server single-database web application into a physically distributed, highly available configuration with servers in two physical locations (for now). Now, obviously, I need ...
12
votes
3
answers
2k
views
RabbitMQ - How do I configure servers for zero-downtime upgrades?
Having read through the docs and RabbitMQ in Action, creating a RabbitMQ cluster seems straightforward enough, but upgrading or patching an existing RabbitMQ cluster seems to require the whole cluster ...
12
votes
5
answers
5k
views
what cluster management software to use for linux?
I have found the following cluster management software tools:
pacemaker (clusterlabs.org): originally a Heartbeat project, focused on high availability, will be in the next Debian version
openqrm (...
11
votes
8
answers
8k
views
High Availability DNS Hosting Strategy?
I'm trying to find a few options for high availability DNS hosting for a few existing websites. This morning, the company I work for was brought to its knees because the DNS hosting we have ...
11
votes
1
answer
2k
views
How to design/ensure high-availability of web servers?
I have been given a dedicated server by 1&1 internet, which has two hard drives in a RAID1 configuration. I expected this would be good enough as if one disk fails, the other can take over until ...
11
votes
5
answers
679
views
High server availability for a small business
After having a bit of a scare with a server that wouldn't come up one morning, the higher-ups have decided that the business needs a high availability / failover setup.
We have 5 main servers (4x ...
11
votes
3
answers
27k
views
Can not switch drbd to secondary
I'm running drbd83 with ocfs2 on CentOS 5 and planning to use pacemaker with them.
After some time, I'm facing a DRBD split-brain problem.
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: ...
11
votes
7
answers
1k
views
Looking for a recommendation on measuring a high availability app that is using a CDN
I work for a Fortune 500 company that struggles with accurately measuring performance and availability for high availability applications (i.e., apps that are up 99.5% with 5 seconds page to page ...
10
votes
3
answers
13k
views
Is round-robin DNS a possible solution for high availability?
Let's say I have 2 IPs for a given domain (round-robin DNS).
If one of the IPs becomes unresponsive, will clients try to connect to the other IP? Or will they just fail to establish communication with the ...
10
votes
5
answers
1k
views
Setup for high availability virtualized environment
For a project I have the task of planning a high availability setup for a web shop and CMS system. However, of course the project is on a tight budget. So a high end solution might not be in the ...
10
votes
1
answer
23k
views
Keepalived send gratuitous ARP periodically
Is there a way for keepalived to send gratuitous ARP periodically?
We had the following situation:
switch failure (VLAN setup)
keepalived failed over to the backup instance
backup instance sent gratuitous ...
10
votes
3
answers
8k
views
High Availability Cron Jobs
Information
We are currently in the process of creating a high availability cluster for NGINX (on CentOS 7) running PHP. Most of the configuration has been mapped and it should work nicely in a ...
9
votes
4
answers
8k
views
Ganeti vs Proxmox [closed]
I'm a system administrator in a small software house. I'm going to virtualise our servers. The main reason for doing this is to provide the highest possible uptime, but it will probably also increase resources ...
9
votes
11
answers
18k
views
good failover / high availability solutions for linux? [closed]
I have several cases where I need applications to be migrated from one server to another in the event of a failure (server hang or crash).
On Solaris we do this with VCS (Veritas Cluster Server).
...
9
votes
9
answers
2k
views
Questions about single point of failure for small operations
If you can't afford or don't need a cluster or spare server waiting to come online in the event of a failure, it seems like you might split the services provided by one beefy server onto two less ...
9
votes
1
answer
29k
views
Keepalived's virtual_router_id - should it be unique per node?
I have two nodes running keepalived, and two VIPs, e.g.
Node 1 Node 2
VIP1 VIP2
So on each node I have two definitions of vrrp_instance, so I assume the two vrrp_instance in my keepalived....
9
votes
3
answers
1k
views
Global high availability setup question
I own and operate visualwebsiteoptimizer.com/. The app provides a code snippet which my customers insert in their websites to track certain metrics. Since the code snippet is external JavaScript (at ...
9
votes
3
answers
5k
views
SQL Server - Cluster vs Mirror for high availability?
I have been doing research into various high availability options for SQL Server 2005. With regard to high availability, what circumstances would favor clustering over mirroring as an option?
From ...
9
votes
1
answer
941
views
Using ZFS head node as database server?
I'm using a dual-head ZFS-backed NAS for high availability cluster shared storage, based on Nexenta's recommended architecture as seen here:
The disks in 1 JBOD will store the database files for a ...
8
votes
3
answers
7k
views
Shared storage options for ESXi HA cluster
I am seeking recommendations for shared storage options to support an ESXi HA cluster (note I'm NOT asking for product/brand/model recommendations - I know this is against the rules here). I am asking ...
8
votes
1
answer
10k
views
How to lower Gluster FS down peer timeout / reduce down peer impact?
The setting:
Two fresh CentOS 6.5 servers with the latest updates. Both have a fresh install of Gluster 3.5.2.
What I did (from the perspective of server 2; shared1 and shared2 are logical volumes):
...
8
votes
2
answers
4k
views
DNS issue with Failover IP from Hetzner
Assume we have two servers A and B with 'real' external IPs, and we can switch the so-called 'failover IP' (W.X.Y.Z) to point to a specific external IP of A or B. This works from the 'outside' and ...
8
votes
1
answer
11k
views
RDS snapshot: how long does I/O suspension occur?
As we're relying on RDS PostgreSQL manual backups for our backup strategy, we encountered the issue of possible downtime of the RDS instance (single AZ) during snapshot creation. According to AWS:...
8
votes
7
answers
11k
views
How to perform cron jobs failover?
Using two Debian servers, I need to set up a robust failover environment for cron jobs that can only be run on one server at a time.
Moving a file in /etc/cron.d should do the trick, but is there ...
8
votes
1
answer
17k
views
OpenVPN timeout to reconnect on fail takes a long time
I'm trying to create a high-availability environment for my OpenVPN servers. I do this by having two identical VPN servers and specifying multiple remote entries in my client config:
# The hostname/IP and ...
8
votes
3
answers
7k
views
Is there a way to force heartbeat to add new ip addresses to the system without a full restart?
We use heartbeat for high availability. I'd like to add an additional IP address to the heartbeat cluster, but I don't want to do a full restart of the cluster in the process. Is there a signal ...
8
votes
3
answers
14k
views
Any problems with having an active/active HAProxy setup with Keepalived
Apologies if this has been asked before, but I can't seem to find much on it.
We're going to be using HAProxy to load balance our MariaDB Galera Cluster. All the articles/tutorials I have seen on ...
8
votes
2
answers
11k
views
How to setup Traefik for HA? Need a reverse-proxy in front of Traefik?
I am trying to set up Traefik on a production site, and I'm struggling with some high availability issues. I think we still need a reverse proxy in front of the Traefik cluster. Here are the potential ...