
Wrapping multiple backend Hadoop web applications with HAProxy

Written by Robert Gibbon | Dec 26, 2017 2:41:27 PM

Authorizing access to multiple Hadoop applications on different nodes of the cluster can be complex and troublesome for some organizations.

To ensure a consistent access path, we ideally want to expose all web applications via a single entry point. In this example, we will use HAProxy to aggregate several Hadoop backend web applications and expose them from a single host and port.

HAProxy

HAProxy is a free, open source software load balancer and proxy for TCP and HTTP traffic. In HTTP mode it can inspect and rewrite requests and responses, which is the feature we rely on in this example.

http://www.haproxy.org/

Most Linux distributions include a version of HAProxy, and while it might not be the latest and greatest, the default version that comes with your Linux distro is probably going to be sufficient for what we want to do in this example. 

Setting Up

We will simulate the Hue, Oozie and Cloudera Manager backend web apps using the Python SimpleHTTPServer module, which serves a listing of the directory it is started in. (SimpleHTTPServer is the Python 2 module name; on Python 3 the equivalent command is python3 -m http.server.)

We will create some placeholder content and serve it up on the ports used by the real applications.

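cd ~

# Create one directory with a placeholder file per application.
mkdir -p hue
echo "hello hue" > hue/hellohue.txt

mkdir -p oozie
echo "hello oozie" > oozie/hellooozie.txt

mkdir -p cm
echo "hello Cloudera Manager" > cm/hellocm.txt

# Start one placeholder web server per application, in the background,
# on the port the real application normally uses.
cd hue
python -m SimpleHTTPServer 8888 &
cd ..

cd oozie
python -m SimpleHTTPServer 11000 &
cd ..

cd cm
python -m SimpleHTTPServer 7180 &
cd ..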
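If only Python 3 is available, the same placeholder backends can be sketched with the standard library's http.server module. This is a minimal sketch rather than part of the original setup: the helper names make_placeholder, start_backend and main are our own, and it binds to 127.0.0.1 only, on the assumption that HAProxy runs on the same host.

```python
import functools
import http.server
import pathlib
import threading

# Ports used by the real applications, as in the shell commands above.
BACKENDS = {"hue": 8888, "oozie": 11000, "cm": 7180}
GREETINGS = {
    "hue": "hello hue",
    "oozie": "hello oozie",
    "cm": "hello Cloudera Manager",
}


def make_placeholder(root, name, greeting):
    """Create <root>/<name>/hello<name>.txt and return the directory."""
    directory = pathlib.Path(root) / name
    directory.mkdir(parents=True, exist_ok=True)
    (directory / ("hello" + name + ".txt")).write_text(greeting + "\n")
    return directory


def start_backend(directory, port):
    """Serve `directory` on 127.0.0.1:<port> from a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def main(root=None):
    """Start all three placeholder backends and block until Ctrl-C."""
    root = pathlib.Path.home() if root is None else pathlib.Path(root)
    servers = [
        start_backend(make_placeholder(root, name, GREETINGS[name]), port)
        for name, port in BACKENDS.items()
    ]
    try:
        threading.Event().wait()
    except KeyboardInterrupt:
        for server in servers:
            server.shutdown()
```

Calling main() from a script starts all three servers in one process; stop them with Ctrl-C.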

In order to route to the right backend, we need a way to tell HAProxy which backend to use. One way is to use alternative host names and have HAProxy inspect the hostname. However, this may be unacceptably complex for some organizations, so instead we will rely on the root path.

So if, for example, the user enters the path http://loadbalancer.fqdn.org:8080/cm then we want HAProxy to route this request and all subsequent requests to Cloudera Manager.

If, on the other hand, the user enters the path http://loadbalancer.fqdn.org:8080/hue then we want HAProxy to route this request and all subsequent requests to Hue.

Finally if the user enters the path http://loadbalancer.fqdn.org:8080/oozie then we want HAProxy to route this request and all subsequent requests to Oozie.

Seems simple, right? Well, no, because the backend applications are not listening at /oozie, /cm and /hue. They are all listening at the root of the given backend web server, /.

Furthermore, subrequests, for example for JavaScript, CSS, and images, might be on other paths beneath /. How will the load balancer know to send them to the right backend?

Lastly, when the user follows a link in the application, how will the load balancer know which backend to send the request to?

The answer to these questions is to set a cookie. When the first request comes in, we strip the application identifier from the path and then send the request to the appropriate backend, setting a cookie that identifies the current application at the same time.

When the next request comes in, the application identifier won’t be on the path, but we know which backend to send the request to - based on the cookie.

When the user wants to access another application, they just have to enter the application path for the other application, and HAProxy will know to first strip the application identifier from the path, then set a cookie, and forward this and subsequent requests to the other application backend.
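The decision described above can be sketched in a few lines of Python. This is only an illustration of the routing idea, not HAProxy's implementation: the function name route and the tuple it returns are our own, and unlike a plain prefix strip it normalizes the remaining path so that it always starts with a single slash.

```python
# Application identifiers, checked in this order.
APPS = ("hue", "cm", "oozie")


def route(path, cookies):
    """Decide where a request goes.

    Returns (backend, upstream_path, set_cookie), or None if neither the
    path nor the BACKEND cookie identifies an application.
    """
    lowered = path.lower()

    # 1. An application prefix on the path always wins, so the user can
    #    switch applications by entering /hue, /cm or /oozie.
    #    This is a plain, case-insensitive prefix match, so a path such
    #    as /cmf would also match the "cm" prefix.
    for app in APPS:
        prefix = "/" + app
        if lowered.startswith(prefix):
            rest = path[len(prefix):]
            upstream_path = rest if rest.startswith("/") else "/" + rest
            # Only send a cookie if the client does not already carry
            # the right one.
            already_set = cookies.get("BACKEND") == app
            set_cookie = None if already_set else "BACKEND=" + app
            return app, upstream_path, set_cookie

    # 2. Otherwise fall back to the cookie set by an earlier request.
    app = cookies.get("BACKEND")
    if app in APPS:
        return app, path, None

    return None


# First request, follow-up request, then a switch to another application.
for request in (("/hue", {}),
                ("/hellohue.txt", {"BACKEND": "hue"}),
                ("/cm", {"BACKEND": "hue"})):
    print(request, "->", route(*request))
```

Checking the path before the cookie is what lets a user who already holds a cookie for one application move to another.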

Here’s a simple example of how the HAProxy configuration file would look:

defaults
    log     global
    mode    http
    timeout connect 5000
    timeout client  50000
    timeout server  50000

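frontend webfe
    bind *:8080
    mode http

    # Requests that start with an application identifier
    acl is_hue_path path_beg -i /hue
    acl is_cm_path path_beg -i /cm
    acl is_oozie_path path_beg -i /oozie

    # Requests that carry the BACKEND cookie from an earlier response
    acl is_hue_cookie hdr_sub(cookie) BACKEND=hue
    acl is_cm_cookie hdr_sub(cookie) BACKEND=cm
    acl is_oozie_cookie hdr_sub(cookie) BACKEND=oozie

    # The first matching rule wins, so the path rules are listed before
    # the cookie rules: an explicit prefix overrides an existing cookie.
    use_backend hue if is_hue_path
    use_backend cm if is_cm_path
    use_backend oozie if is_oozie_path
    use_backend hue if is_hue_cookie
    use_backend cm if is_cm_cookie
    use_backend oozie if is_oozie_cookie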

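backend hue
    mode http
    balance roundrobin
    option forwardfor

    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    # Set a BACKEND cookie whose value is the server's cookie name below.
    cookie BACKEND insert indirect nocache

    # Note: reqirep and rspirep are HAProxy 1.x directives; they were
    # removed in HAProxy 2.1.
    # Strip the /hue prefix from the request line.
    reqirep ^([^\ :]*)\ /hue([^\ ]*)\ (.*)$          \1\ /\2\ \3
    # Put the prefix back on redirects and cookie paths from the backend.
    rspirep ^(Location:)\ http://([^/]*)/(.*)$       \1\ http://\2/hue/\3
    rspirep ^(Set-Cookie:.*\ path=)([^\ ]+)(.*)$     \1/hue\2\3

    server hue01 localhost:8888 cookie hue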

backend cm
    mode http
    balance roundrobin
    option forwardfor

    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    cookie BACKEND insert indirect nocache

    reqirep ^([^\ :]*)\ /cm([^\ ]*)\ (.*)$           \1\ /\2\ \3
    rspirep ^(Location:)\ http://([^/]*)/(.*)$       \1\ http://\2/cm/\3
    rspirep ^(Set-Cookie:.*\ path=)([^\ ]+)(.*)$     \1/cm\2\3

    server cm01 localhost:7180 cookie cm

backend oozie
    mode http
    balance roundrobin
    option forwardfor

    http-request set-header X-Forwarded-Port %[dst_port]
    http-request add-header X-Forwarded-Proto https if { ssl_fc }
    cookie BACKEND insert indirect nocache

    reqirep ^([^\ :]*)\ /oozie([^\ ]*)\ (.*)$        \1\ /\2\ \3
    rspirep ^(Location:)\ http://([^/]*)/(.*)$       \1\ http://\2/oozie/\3
    rspirep ^(Set-Cookie:.*\ path=)([^\ ]+)(.*)$     \1/oozie\2\3

    server oozie01 localhost:11000 cookie oozie
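To see what the three rewrite rules in each backend do, it can help to try the same patterns outside HAProxy. The sketch below is an approximate translation into Python's re module (HAProxy writes a literal space as "\ ", and its regex engine is not Python's), and the function names and sample header lines are our own.

```python
import re


def strip_prefix(request_line, app):
    """Counterpart of the reqirep rule: drop /<app> from the request line."""
    pattern = r"^([^ :]*) /" + re.escape(app) + r"([^ ]*) (.*)$"
    return re.sub(pattern, r"\1 /\2 \3", request_line, flags=re.IGNORECASE)


def restore_location(header_line, app):
    """Counterpart of the first rspirep rule: prefix redirect targets."""
    pattern = r"^(Location:) http://([^/]*)/(.*)$"
    replacement = r"\1 http://\2/" + app + r"/\3"
    return re.sub(pattern, replacement, header_line, flags=re.IGNORECASE)


def restore_cookie_path(header_line, app):
    """Counterpart of the second rspirep rule: prefix cookie paths."""
    pattern = r"^(Set-Cookie:.* path=)([^ ]+)(.*)$"
    replacement = r"\1/" + app + r"\2\3"
    return re.sub(pattern, replacement, header_line, flags=re.IGNORECASE)


# Hypothetical sample lines, for illustration only.
for line in (
    strip_prefix("GET /hue HTTP/1.1", "hue"),
    strip_prefix("GET /hue/hellohue.txt HTTP/1.1", "hue"),
    restore_location("Location: http://localhost:8888/logs/", "hue"),
    restore_cookie_path("Set-Cookie: sessionid=abc; Path=/; HttpOnly", "hue"),
):
    print(line)
```

In this translation, a sub-path such as /hue/hellohue.txt comes out as //hellohue.txt, because the captured remainder keeps its own leading slash. Also note that the Location rule re-adds the prefix but leaves the host part of the redirect untouched.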

 

To test, start HAProxy in the foreground with the config file:

haproxy -f our_test_haproxy.cfg

 

Then try browsing to the HAProxy paths, for example http://loadbalancer.fqdn.org:8080/hue.

Cool, looks like that works. Will browsing to the file work?

What about now browsing to our Cloudera Manager URL?

Looks good.

Now on to Oozie.

Yes, that seems to work too.
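The same walkthrough can also be scripted instead of clicked through in a browser. The following is a minimal sketch using only the Python 3 standard library; it assumes HAProxy is reachable at http://localhost:8080 (adjust base_url otherwise), and the helper names make_session, fetch, current_backend and walk_through are our own.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Return (opener, jar): an HTTP client that keeps cookies like a browser."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # talk to HAProxy directly
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def fetch(opener, url):
    """GET `url` and return (status, body text)."""
    with opener.open(url, timeout=5) as response:
        return response.status, response.read().decode("utf-8", "replace")


def current_backend(jar):
    """Return the value of the BACKEND cookie, or None if it is not set."""
    for cookie in jar:
        if cookie.name == "BACKEND":
            return cookie.value
    return None


def walk_through(base_url="http://localhost:8080"):
    """Visit each application path, then a file served by that backend."""
    opener, jar = make_session()
    for path in ("/hue", "/hellohue.txt",
                 "/cm", "/hellocm.txt",
                 "/oozie", "/hellooozie.txt"):
        status, _body = fetch(opener, base_url + path)
        print(path, status, "BACKEND cookie:", current_backend(jar))
```

Call walk_through() while HAProxy and the three placeholder servers are running. If the routing works as described, each file request should succeed and the BACKEND cookie should change whenever an application path is requested.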