Maxloaded - Load Balancing

July 1, 2009

Load Balancing Procedures

Filed under: Base — Tags: — maxerg @ 12:52 pm

Different load balancing procedures are used to distribute network traffic across multiple servers. Typically the same service, such as a web service, runs on several servers at once. The goal is to spread the traffic across these servers so that a hardware failure does not bring down the whole system. Additional computing power can easily be added as needed by adding new hardware.

In contrast to distributed computing systems, load balancing has to ensure that a user always reaches the same server, so that an established connection is maintained. This matters, for example, for SSL-secured connections and for transactions tracked by a session ID, as in e-mail, online banking and online shopping.

Load balancing systems are available as hardware, but there are also pure software solutions. Which product is appropriate depends on the situation and your needs. Hardware load balancers are generally Layer 4 to 7 switches. The procedures explained below are used by expensive load balancing products as well as by cheap software solutions. Software solutions usually support only one of the procedures and depend on a single server operating system. Different operating systems, such as Windows and Linux, cannot be combined in such a setup or given different tasks.

For Linux there is the Linux Virtual Server project, which relies on a specially adapted Linux kernel. Microsoft has offered the Windows Load Balancing Service (WLBS) since Windows NT 4.0.

DNS variant

The classic form of load balancing uses DNS itself. In the DNS server for your own domain, several IP addresses are registered under the same host name. Behind every IP address is a separate and independent server. The DNS server answers client requests with the registered IP addresses in turn. This procedure is very easy to set up.

Problems with interactive sessions and secure connections are avoided by the way DNS resolution works. The client sends its DNS request only when it first connects. All of the client’s further requests go to the IP address of that server, so it does not end up on another server.

Proper load balancing does not take place with this method, however. The clients are merely distributed evenly across the available servers. Whether a client generates a high load or not is not detected. In the worst case, one server runs constantly at full load while another sits idle. In addition, each server requires its own valid IP address on the Internet.

1. The client sends a DNS query for the host name
2. The DNS server resolves the name and answers with one of the registered IP addresses
3. The client connects to the server at that address
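
The rotation behind these three steps can be sketched in a few lines of Python. This is only a toy model, not a real DNS server: the class name `RoundRobinDNS`, the host name and the addresses are made up for illustration, and it hands out one address per query, whereas real DNS servers commonly return the whole record set in rotated order.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy resolver that hands out a host's registered addresses in turn."""

    def __init__(self, records):
        # records: host name -> list of IP addresses registered for it
        self._rotations = {host: cycle(ips) for host, ips in records.items()}

    def resolve(self, host):
        # Each query gets the next address in the rotation.
        return next(self._rotations[host])


# Hypothetical zone data with three independent servers behind one name.
dns = RoundRobinDNS({"www.example.com": ["192.0.2.10", "192.0.2.11", "192.0.2.12"]})
first_client = dns.resolve("www.example.com")
second_client = dns.resolve("www.example.com")
```

Each client resolves the name once and then keeps talking to the address it received, so the first and second client above end up on different servers.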

Round-Robin method

The round-robin procedure gets by with a single IP address. Instead of the DNS server, a NAT proxy distributes the requests. Rather than working from a list of available servers, the proxy forwards all requests in turn to the target systems it knows. It remembers which client IP address was connected to which server and sends a new request from that client to the same server.

The advantage is that only one IP address on the Internet is needed and that this variant involves little administrative effort. Among other things, it doesn’t require a list of servers to be maintained. However, this is not real load balancing either: the state of the individual servers is not taken into account.
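
A minimal sketch of this behaviour, assuming the proxy identifies clients by their IP address only. The class name `RoundRobinProxy` and its method are hypothetical and not taken from any product.

```python
from itertools import cycle


class RoundRobinProxy:
    """Assigns new clients to backends in turn and keeps known clients on their backend."""

    def __init__(self, backends):
        self._rotation = cycle(backends)
        self._assigned = {}  # client IP -> backend chosen on first contact

    def pick_backend(self, client_ip):
        # A new client gets the next backend in the rotation;
        # a returning client is sent to the backend it used before.
        if client_ip not in self._assigned:
            self._assigned[client_ip] = next(self._rotation)
        return self._assigned[client_ip]
```

Note that nothing in this sketch looks at how busy a backend is, which is exactly the weakness described above.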

NAT with feedback

The step to effective load balancing is only possible through an active exchange of load information between the servers and the load balancer. The NAT proxy is already a step in the right direction. It receives information about the actual utilization of each server and uses this data to build a list from which it picks the next target server.

The communication between the servers and the load balancer can take place over serial lines, periodically running batch jobs or SNMP queries. The installation and configuration effort is greater. The advantage of this method is the exchange between the servers and the load balancer. If a server is no longer reachable, the load balancer simply removes its IP address from the list. When the server comes back, its IP address is put back on the list.
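
The list handling described above could look roughly like this. It is a sketch under assumptions: utilization is reported as a number between 0.0 and 1.0, how the figures arrive (SNMP, batch job, serial line) is left out, and the names `FeedbackBalancer`, `report` and `mark_unreachable` are invented for the example.

```python
class FeedbackBalancer:
    """Picks the least-loaded reachable server based on reported utilization."""

    def __init__(self):
        self._load = {}  # server IP -> last reported utilization (0.0 to 1.0)

    def report(self, server_ip, utilization):
        # Called whenever fresh load information arrives for a server.
        # A server that comes back simply reports again and rejoins the list.
        self._load[server_ip] = utilization

    def mark_unreachable(self, server_ip):
        # Drop the server from the list so it receives no further requests.
        self._load.pop(server_ip, None)

    def pick_backend(self):
        if not self._load:
            raise RuntimeError("no reachable servers")
        # Choose the server with the lowest reported utilization.
        return min(self._load, key=self._load.get)
```

Picking the minimum is just one possible policy; a real balancer might weight servers by capacity or smooth the reported values over time.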

URL-based procedure

The URL-based method of load balancing is designed specifically for web or FTP servers. The load balancer decides, based on the URL, which server is responsible for a request. The directories are stored on different computers. Beforehand, a traffic analysis has to be carried out to determine which areas need more computing power and bandwidth. The analysis must be repeated regularly during operation, since visitor behavior can change and the lists may then have to be adapted.

Because the load is distributed according to the requested target directory, the data has to be filtered, which requires special hardware or a very fast computer. The procurement of expensive or special hardware is often unavoidable.

This procedure is meant purely for web content and is not suitable for e-mail servers or for services that depend on transactions.
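
As an illustration, URL-based routing can be reduced to a lookup by path prefix. The routing table below is entirely hypothetical, and longest-prefix matching is one reasonable choice rather than the only way to do it.

```python
from urllib.parse import urlsplit

# Hypothetical routing table: URL path prefix -> server responsible for it.
ROUTES = {
    "/images/": "10.0.0.11",
    "/downloads/": "10.0.0.12",
    "/": "10.0.0.10",  # default server for everything else
}


def route(url, routes=ROUTES):
    """Return the server for a URL by longest matching path prefix."""
    path = urlsplit(url).path or "/"
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path}")
    return routes[max(matches, key=len)]
```

The table is what would have to be revised after each traffic analysis, for example by moving a busy directory to a server of its own.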

Service-based procedures

Most sites run several services, such as HTTP, FTP and e-mail, on the same server. Under load this parallel operation becomes a drawback, because one service that needs a lot of computing power steals performance from all the others.

Every service uses a TCP port, by which a data packet is assigned to an application or service. If the different services are operated on separate servers, the load can be distributed by service. Beforehand, a traffic analysis should be carried out to find the services with a high consumption of computing power and bandwidth. This procedure is very easy to install, because each server only needs the software for the service it operates. The routing is handled, for example, by a NAT router configured with port forwarding, in which each port on the router is assigned a fixed IP address.
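
In code, such a port-forwarding setup is little more than a table lookup. The addresses below are made up; the port numbers are the well-known ones for HTTP, FTP and SMTP.

```python
# Hypothetical port-forwarding table of a NAT router: TCP port -> internal server.
PORT_FORWARDS = {
    80: "10.0.0.20",  # HTTP
    21: "10.0.0.21",  # FTP control connection
    25: "10.0.0.22",  # SMTP (e-mail)
}


def forward_target(dst_port, table=PORT_FORWARDS):
    """Return the internal server for a destination port, or None if the port is not forwarded."""
    return table.get(dst_port)
```

A request to port 80 would go to the web server and one to port 25 to the mail server, while traffic to a port that is not in the table is simply not forwarded.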

Which procedure to choose

None of the procedures above can serve as the sole solution in every situation. Usually a combination of two or more procedures is used, either nested or integrated with one another. In any case, the result is a complex system that has to be monitored constantly and adapted to new requirements.

Because the data on each server must always be in sync, a central storage solution for all servers is recommended. These NAS or SAN systems are anything but cheap, and they come in different variants with SCSI, FireWire, Gigabit Ethernet or Fibre Channel. Before a load balancer is deployed, the existing programs and applications should be examined. Badly programmed and slow applications push a load balancer to its limits. A thorough analysis beforehand is absolutely necessary.

Server Load Balancing

Filed under: Base — Tags: , — maxerg @ 12:10 pm

In network engineering, the term Server Load Balancing (SLB) describes methods for distributing load across several separate server computers in a network.

Server Load Balancing is used wherever many clients generate such a high density of requests that a single server computer would be overloaded. Typical criteria for identifying the need for SLB are the data rate, the number of clients and the request rate.

Another aspect is increasing data availability through SLB. Using multiple systems makes redundant data storage possible. The role of SLB here is to assign clients to the individual servers.

SLB is used on large portals such as Wikipedia, marketplaces or online shops. As a rule, the user does not notice whether SLB is used on the other side.

SLB can be applied on different layers of the ISO/OSI reference model. There are three fundamentally different approaches:

DNS Based SLB

DNS-based SLB takes place at the application layer and is based on the Domain Name System protocol. It is the simplest and cheapest way to implement SLB, since only a router and a switch are needed. Several scenarios are possible.

Example:

A client connects to example.com

The client can then be passed on to one of these hosts:

mirror1.example.com
mirror2.example.com
mirror3.example.com

NAT based SLB (Server Load Balancing)

More expensive but more powerful is so-called NAT-based SLB. Two networks must first be set up: a private network that contains the servers, and a public network that is connected to the public Internet via routers. Between these two networks sits a load balancer, that is, a router that receives requests from the public network, evaluates them and then decides which computer on the private network gets the connection. This happens at the network layer of the OSI reference model. The NAT technique is used for this: the load balancer manipulates incoming and outgoing IP packets so that the client has the impression of always communicating with one and the same computer, namely the load balancer. The servers in the private network all share, so to speak, the same virtual IP address.
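
The address rewriting can be illustrated with a strongly simplified packet model. Everything here is an assumption for the sake of the example: the `Packet` type carries only a source and destination IP, the virtual IP is made up, and new clients are assigned to servers in simple rotation.

```python
from dataclasses import dataclass, replace

VIRTUAL_IP = "203.0.113.5"  # assumed public address of the load balancer


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: str = ""


class NatBalancer:
    def __init__(self, servers):
        self._servers = list(servers)
        self._next = 0
        self._flows = {}  # client IP -> private server handling that client

    def inbound(self, packet):
        """Rewrite the destination from the virtual IP to a private server."""
        if packet.src not in self._flows:
            self._flows[packet.src] = self._servers[self._next % len(self._servers)]
            self._next += 1
        return replace(packet, dst=self._flows[packet.src])

    def outbound(self, packet):
        """Rewrite the source so the reply appears to come from the virtual IP."""
        return replace(packet, src=VIRTUAL_IP)
```

Because both directions pass through `inbound` and `outbound`, the client only ever sees the virtual IP address and never the private server addresses.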

The problem with this method is that all traffic flows through the load balancer, which sooner or later becomes a bottleneck if it is undersized or not set up redundantly.

One advantage of NAT-based SLB is that the individual servers are additionally protected by the load balancer. Numerous vendors of load balancer solutions offer additional security modules that can filter out attacks or malformed requests before they reach the server cluster. Terminating SSL sessions on the load balancer, and thereby relieving the HTTP servers in the cluster, is another significant advantage of server-based load balancing.

In addition to active health checks, which the other procedures also require, passive health checks have increasingly been used in large web clusters for some time. Here the load balancer monitors the incoming and outgoing traffic. As soon as a computer in the server cluster runs into a timeout while answering a request, the same request can be sent to another server in the cluster without the client noticing.
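
The passive check can be sketched as a retry loop around the forwarding step. The function below is illustrative only: `send` stands for whatever mechanism actually forwards a request and is assumed to raise `TimeoutError` when a server does not answer in time.

```python
def send_with_failover(request, servers, send):
    """Try each server in order; on a timeout, replay the same request on the next one.

    `send(server, request)` is a caller-supplied function (hypothetical here)
    that returns a response or raises TimeoutError.
    """
    failed = []
    for server in servers:
        try:
            return send(server, request), failed
        except TimeoutError:
            # Passive check: real traffic just showed that this server is unresponsive.
            failed.append(server)
    raise RuntimeError("all servers timed out")
```

The list of failed servers is returned alongside the response so that the caller can take those servers out of rotation. Replaying a request like this is only safe if the request can be repeated without side effects.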

Flat based SLB

This procedure requires only one network. The servers and the load balancer must be connected to the same switch. When the client sends a request (to the load balancer), the corresponding Ethernet frame is manipulated so that it looks like a direct request from the client to one of the servers: the load balancer replaces its own MAC address with that of the chosen server and forwards the frame. The IP address remains unchanged. This approach is called MAT (MAC Address Translation). The server that receives the frame sends its response directly to the IP address of the sender, that is, the client. The client has the impression of communicating with a single machine, namely the load balancer, while in fact the server communicates directly with the client. This procedure is known as DSR (Direct Server Return).

The advantage of flat-based SLB is that it relieves the load balancer. The return traffic, which usually carries far more data, goes directly to the client instead.
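
The key point of MAC Address Translation is that only the destination MAC address changes. A minimal sketch with a made-up `Frame` type that models just the fields relevant here:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_ip: str
    dst_ip: str


def mac_translate(frame, server_mac):
    """Swap only the destination MAC; the IP addresses stay untouched."""
    return replace(frame, dst_mac=server_mac)
```

Since the source IP address in the forwarded frame is still the client’s, the server can answer the client directly without going back through the load balancer.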

Problems of Practice

Applications such as online shops often manage client requests through sessions. For an existing session, for example, the contents of the shopping cart are stored. This presupposes, however, that a client which has already opened a session keeps communicating with the same server, if client-based sessions are used. The load balancer must therefore be able to work at the application layer of the OSI reference model as well, extracting cookies and session IDs from the packets and evaluating them in order to make its placement decision. Forwarding a session to the same backend server every time is called “affinity”. In practice, Layer 4-7 switches are therefore used as load balancers.

This problem can, however, also be solved by designing and programming the software appropriately, so that a request can be answered by any computer in the server pool.
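
To make the affinity idea concrete, here is a sketch that derives the backend from a session cookie. The cookie name `SESSIONID` and the hashing scheme are assumptions chosen for the example. Real load balancers often keep a session table or set a cookie of their own instead, because a plain hash reshuffles sessions whenever the server pool changes.

```python
from http.cookies import SimpleCookie
import hashlib


def backend_for(cookie_header, backends, cookie_name="SESSIONID"):
    """Map a session cookie to a backend so the same session always lands on the same server."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if cookie_name not in cookie:
        return None  # no session yet; the caller may pick any backend
    session_id = cookie[cookie_name].value
    # Hash the session ID and use it as a stable index into the backend list.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

As long as the backend list stays the same, every request carrying the same session ID is sent to the same server, regardless of which other cookies come with it.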

January 13, 2009

DNS Round Robin vs Hardware Load Balancers

Filed under: Base — Tags: , , , , , , — maxerg @ 1:53 pm

In the previous article we talked about two major load balancing methods that are widely accepted and used by server administrators. One of them is the DNS round robin method, which is rather easy to set up and doesn’t require any third-party device; the other is using hardware load balancers.

There are certain advantages and disadvantages of using the DNS round robin method or a hardware load balancer.

DNS Round Robin

The main advantage of the DNS round robin method is that it is cheap and easy to set up. The administrator only needs to change the DNS settings of the web application. With this method, you don’t need to make any changes to the code of your application.

Since this system doesn’t require any changes to the code, you don’t need a network expert to set up or debug the system in case of a problem.

However, the DNS method has some important disadvantages. With this method, requests are evenly distributed among the available machines. This means you cannot manage the distribution based on user session data.

After you establish a session with one server, your browser makes another DNS request once its cache expires, and if a different IP address is returned, your session data on the first server is lost.

Another problem is that users may hit a dead server if there is a failure in one machine. DNS keeps sending users to the dead server and they cannot reach your web application. However, an advanced router solves this problem by checking if the server is up and running at regular intervals.

Yet another disadvantage of the DNS round robin method is that it doesn’t actually perform real load balancing. It just distributes the users evenly between your machines and doesn’t know which users are creating a heavy load on your application and which are not. So you might still get bottlenecks, with some of your machines serving very active users.
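
A small simulation shows how this can happen. The load figures are invented, and the function simply assigns users to servers in turn, the way round robin would.

```python
from collections import defaultdict
from itertools import cycle


def distribute(user_loads, servers):
    """Assign users to servers round robin and total the load each server ends up with."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    rotation = cycle(servers)
    for load in user_loads:
        server = next(rotation)
        counts[server] += 1
        totals[server] += load
    return dict(counts), dict(totals)


# Hypothetical per-user load: heavy and light users happen to alternate.
counts, totals = distribute([100, 1, 90, 2, 95, 1], ["s1", "s2"])
```

Both servers get three users each, yet `s1` carries a total load of 285 and `s2` only 4. The user count is balanced while the actual load is not.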

Hardware Load Balancers

Hardware load balancers solve many problems that can be encountered with the DNS round robin method. They use a single virtual IP address for all your servers.

A hardware load balancer reads the cookies and URLs on each request and rewrites the header based on this information. Then, it sends the request to the appropriate machine where the user session data is maintained.

When using hardware load balancers, users don’t hit dead servers. When a node in the cluster cannot respond to a request, the request is passed along to another one. In simple terms, when one of your servers goes down, all subsequent users will be redirected to a working one.

The main disadvantages of using hardware load balancers are the high cost, the complexity of setting them up and the risk of a single point of failure. If your load balancer device fails for some reason, your whole site will go down and be unavailable to all users until the problem is resolved.

January 12, 2009

Load Balancing Overview

What is load balancing?

Load balancing is the process of distributing the web traffic across multiple servers in order to optimize the performance of the servers and the web application using those servers.

Websites with a great deal of traffic often utilize load balancing in order to ensure the optimal performance of any single server and get the job done under any condition.

Every web server can respond to only a certain number of requests at a time. When more requests are directed to a server than it can handle, the server becomes overloaded and cannot respond to them. In such cases, the server goes down and the website becomes unavailable.

In order to avoid downtime on a high-traffic website, using multiple servers with load balancing is the way to go. As the name suggests, load balancing is performed between two or more servers. When a server gets overloaded, the requests are forwarded to another server and the website remains available to all users.

In order to determine which server is busy and which one has the appropriate capacity to respond to the traffic, generally a load balancer is used. The request for the web page is first sent to the load balancer and the load balancer forwards the request to an appropriate server.

In its simplest form, load balancing is used to avoid the downtime that comes with relying on a single server. If you are using multiple servers with load balancing, your website doesn’t go down even if one of your servers goes down completely due to a failure.

However, there are more complex uses of load balancing for large scale websites. Most big players use geographically scattered server farms and perform load balancing based on different factors such as geographical location of the request, server proximity, server health or load.

Load Balancing and Scalability

Scalability is the capability of a web application to respond to a large number of users simultaneously without deteriorating the user experience or going down as more and more users are connected to the system.

To do this, server clusters are used for large-scale web applications. A cluster is a group of servers supporting a website or web application simultaneously as if it were a single server. In a cluster, if a server fails, another one takes over to process the request, and this is carried out as transparently as possible so that the end user doesn’t notice anything and doesn’t even know which server is used to process his or her request.

There are two major load balancing methods widely used by server administrators. These are the DNS round robin method and hardware load balancers.

DNS ROUND ROBIN

The DNS database is the system used to match domain names with IP addresses. When you enter a domain name in your browser, a request is sent to the DNS database. The DNS database then answers with the IP address of that domain, and you are directed to the server with the correct IP address for that website.

Normally, the DNS database contains a single IP address for a domain name. In the DNS load balancing method, however, the database contains several IP addresses for a domain name. In this case, DNS holds the IP addresses of all the machines in the cluster, which map to a single site name.

In the DNS round robin method, when a request is made to the DNS database, it returns the IP address of the first server; on the second request the second IP address is returned, and so on. In this way, requests are distributed evenly among the available machines.

This method is cheap and easy to set up, but it has certain downsides. We will come to the advantages and disadvantages of the DNS round robin method in a later in-depth article.

HARDWARE LOAD BALANCERS

Hardware load balancers use virtual IP addresses and solve many problems that can be encountered with the DNS method. In this case, the load balancer uses a single virtual IP address that maps to each server in the cluster.

When using a hardware load balancer, all the servers in the cluster look as if they have the same IP address. If a server is down due to a failure, requests don’t run the risk of hitting a dead server, which provides higher availability for the web application.

With the use of a load balancer, the end user deals with a single machine: the load balancer. This way, the session information between the user and the machine can be preserved regardless of the changes in the server used for the requests coming from the same user.
