
Wrapping multiple backend Hadoop web applications with HAProxy

Written by Robert Gibbon | Dec 26, 2017 2:41:27 PM

Authorizing access to multiple Hadoop applications on different nodes of the cluster can be complex and troublesome for some organizations.

To ensure a consistent access path, ideally we want to expose all web applications via a single entry point. In this example, we will use HAProxy to aggregate several Hadoop backend web applications and expose them from a single host and port.


HAProxy is a free, open source software load balancer with some nice features, including some features specifically for HTTP traffic.


Most Linux distributions include a version of HAProxy, and while it might not be the latest and greatest, the default version that comes with your Linux distro is probably going to be sufficient for what we want to do in this example. 

Setting Up

We will simulate the Hue, Oozie and Cloudera Manager backend web apps using the Python SimpleHTTPServer module, which serves up a directory listing of the working directory it was started from.

We will create some placeholder content and serve it up on the ports used by the real applications.

cd ~

mkdir -p hue

echo "hello hue" > hue/hellohue.txt


mkdir -p oozie

echo "hello oozie" > oozie/hellooozie.txt


mkdir -p cm

echo "hello Cloudera Manager" > cm/hellocm.txt


cd hue

python -m SimpleHTTPServer 8888 &

cd ..


cd oozie

python -m SimpleHTTPServer 11000 &

cd ..


cd cm

python -m SimpleHTTPServer 7180 &

cd ..
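The SimpleHTTPServer module used above exists only in Python 2; in Python 3 it became part of http.server. As a rough sketch for readers on Python 3, the helper below serves a directory on a given port from a background thread. The function name start_placeholder is our own, purely for illustration, and the directory argument to the handler assumes Python 3.7 or later.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_placeholder(directory, port):
    """Serve `directory` on localhost:`port` from a background thread."""
    # Bind the handler to a fixed directory instead of the current one.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("localhost", port), handler)
    # A daemon thread stops with the interpreter, much like a shell job
    # started with "&" stops when it is killed.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_placeholder("hue", 8888), start_placeholder("oozie", 11000) and start_placeholder("cm", 7180) from an interactive session in the home directory should give roughly the same three listeners as the shell commands above. On the command line, python3 -m http.server 8888 is the direct replacement for the Python 2 one-liner.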

To route to the right backend, we need a way to tell HAProxy which backend a request belongs to. One way is to use alternative host names and have HAProxy inspect the hostname. However, this may be unacceptably complex for some organizations, so instead we will rely on the root path.

So if, for example, the user enters the path http://loadbalancer.fqdn.org:8080/cm then we want HAProxy to route this request and all subsequent requests to Cloudera Manager.

If, on the other hand, the user enters the path http://loadbalancer.fqdn.org:8080/hue then we want HAProxy to route this request and all subsequent requests to Hue.

Finally, if the user enters the path http://loadbalancer.fqdn.org:8080/oozie then we want HAProxy to route this request and all subsequent requests to Oozie.

Seems simple, right? Well, no, because the backend applications are not listening at /oozie, /cm and /hue. They are all listening at the root of their respective backend web server, /.

Furthermore, subrequests, for example for JavaScript, CSS, and images, might be on other paths beneath /. How will the load balancer know to send them to the right backend?

Lastly, when the user follows a link in the application, how will the load balancer know which backend to send the request to?

The answer to these questions is to set a cookie. When the first request comes in, we strip the application identifier from the path and then send the request to the appropriate backend, setting a cookie that identifies the current application at the same time.

When the next request comes in, the application identifier won’t be on the path, but we know which backend to send the request to - based on the cookie.

When the user wants to access another application, they just enter the path for that application, and HAProxy will first strip the application identifier from the path, then set a cookie, and forward this and subsequent requests to the other application backend.
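The decision the load balancer has to make for each request can be sketched in a few lines of Python. This only illustrates the logic and is not how HAProxy is implemented. The function name choose_backend is hypothetical, and the cookie name BACKEND is taken from the configuration that follows.

```python
APPS = ("hue", "cm", "oozie")  # same order as the routing rules


def choose_backend(path, cookie_header=""):
    """Return the backend name for a request, or None if nothing matches."""
    lowered = path.lower()
    # 1. An application prefix on the path always wins, so the user can
    #    switch applications simply by entering a different prefix.
    for app in APPS:
        if lowered.startswith("/" + app):
            return app
    # 2. Otherwise fall back to the application remembered in the cookie.
    for app in APPS:
        if "BACKEND=" + app in cookie_header:
            return app
    # 3. No prefix and no cookie: there is nowhere to send the request.
    return None
```

For instance, choose_backend("/static/app.js", "BACKEND=cm") gives "cm", while choose_backend("/oozie", "BACKEND=cm") gives "oozie", because the path prefix takes priority over the cookie.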

Here’s a simple example of how the HAProxy configuration file would look:


defaults

    log     global

    mode    http

    timeout connect 5000

    timeout client  50000

    timeout server  50000


frontend webfe

    bind *:8080

    mode http


    acl is_hue_path path_beg -i /hue

    acl is_cm_path path_beg -i /cm

    acl is_oozie_path path_beg -i /oozie


    acl is_hue_cookie hdr_sub(cookie) BACKEND=hue

    acl is_cm_cookie hdr_sub(cookie) BACKEND=cm

    acl is_oozie_cookie hdr_sub(cookie) BACKEND=oozie


    use_backend hue if is_hue_path

    use_backend cm if is_cm_path

    use_backend oozie if is_oozie_path

    use_backend hue if is_hue_cookie

    use_backend cm if is_cm_cookie

    use_backend oozie if is_oozie_cookie


backend hue

    mode http

    balance roundrobin

    option forwardfor


    http-request set-header X-Forwarded-Port %[dst_port]

    http-request add-header X-Forwarded-Proto https if { ssl_fc }

    cookie BACKEND insert indirect nocache


    reqirep ^([^\ :]*)\ /hue([^\ ]*)\ (.*)$       \1\ /\2\ \3

    rspirep ^(Location:)\ http://([^/]*)/(.*)$    \1\ http://\2/hue/\3

    rspirep ^(Set-Cookie:.*\ path=)([^\ ]+)(.*)$       \1/hue\2\3


    server hue01 localhost:8888 cookie hue   


backend cm

    mode http

    balance roundrobin

    option forwardfor


    http-request set-header X-Forwarded-Port %[dst_port]

    http-request add-header X-Forwarded-Proto https if { ssl_fc }

    cookie BACKEND insert indirect nocache


    reqirep ^([^\ :]*)\ /cm([^\ ]*)\ (.*)$       \1\ /\2\ \3

    rspirep ^(Location:)\ http://([^/]*)/(.*)$    \1\ http://\2/cm/\3

    rspirep ^(Set-Cookie:.*\ path=)([^\ ]+)(.*)$       \1/cm\2\3


    server cm01 localhost:7180 cookie cm


backend oozie

    mode http

    balance roundrobin

    option forwardfor


    http-request set-header X-Forwarded-Port %[dst_port]

    http-request add-header X-Forwarded-Proto https if { ssl_fc }

    cookie BACKEND insert indirect nocache


    reqirep ^([^\ :]*)\ /oozie([^\ ]*)\ (.*)$       \1\ /\2\ \3

    rspirep ^(Location:)\ http://([^/]*)/(.*)$    \1\ http://\2/oozie/\3

    rspirep ^(Set-Cookie:.*\ path=)([^\ ]+)(.*)$       \1/oozie\2\3


    server oozie01 localhost:11000 cookie oozie
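To see what the three rewrite rules in each backend do, it can help to replay them outside HAProxy. The sketch below translates the Hue rules into Python regular expressions (the escaped spaces become plain spaces) and applies them to sample lines. The sample header values and the names REQ, LOC, COOKIE and rewrite are made up for illustration. Note also that, to our knowledge, newer HAProxy releases (2.1 and later) no longer accept reqirep and rspirep, so this configuration targets the older versions shipped by distributions at the time of writing.

```python
import re

APP = "hue"  # the same patterns apply to cm and oozie

# reqirep: strip the application prefix from the request line.
REQ = (re.compile(r"^([^ :]*) /%s([^ ]*) (.*)$" % APP, re.IGNORECASE),
       r"\1 /\2 \3")
# rspirep: put the prefix back into redirects sent by the backend.
LOC = (re.compile(r"^(Location:) http://([^/]*)/(.*)$", re.IGNORECASE),
       r"\1 http://\2/%s/\3" % APP)
# rspirep: put the prefix in front of the path of backend cookies.
COOKIE = (re.compile(r"^(Set-Cookie:.* path=)([^ ]+)(.*)$", re.IGNORECASE),
          r"\1/%s\2\3" % APP)


def rewrite(rule, line):
    """Apply one (pattern, replacement) rule to a single header line."""
    pattern, replacement = rule
    return pattern.sub(replacement, line)


print(rewrite(REQ, "GET /hue HTTP/1.1"))
# GET / HTTP/1.1
print(rewrite(REQ, "GET /hue/hellohue.txt HTTP/1.1"))
# GET //hellohue.txt HTTP/1.1  (note the double slash)
print(rewrite(LOC, "Location: http://lb:8080/accounts/login/"))
# Location: http://lb:8080/hue/accounts/login/
print(rewrite(COOKIE, "Set-Cookie: sessionid=abc; path=/; HttpOnly"))
# Set-Cookie: sessionid=abc; path=/hue/; HttpOnly
```

Lines that do not match a pattern, such as a request line without the /hue prefix, pass through unchanged.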


To test, fire up HAProxy in the foreground with the config file:

haproxy -f our_test_haproxy.cfg


And try to browse to the HAProxy paths:


Cool, looks like that works. Will browsing to the file work?



What about browsing to our Cloudera Manager URL now?



Looks good.



Now on to Oozie:



Yes! That seems to work too.
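The same checks can be scripted rather than clicked through in a browser. Below is a minimal sketch using only the Python 3 standard library. It keeps cookies between requests, which matters because the second request in each pair relies on the BACKEND cookie set by the first. The helper names make_browser and fetch are our own, and the URLs in the comments assume HAProxy is running on the same machine on port 8080, as in the configuration above.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_browser():
    """Return an opener that remembers cookies, as a browser would."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch(browser, url):
    """GET `url` and return (status code, body text)."""
    with browser.open(url, timeout=5) as response:
        return response.status, response.read().decode("utf-8", "replace")


# Example session (needs HAProxy and the three placeholder servers running):
#   browser = make_browser()
#   fetch(browser, "http://localhost:8080/hue")           # directory listing
#   fetch(browser, "http://localhost:8080/hellohue.txt")  # routed by cookie
#   fetch(browser, "http://localhost:8080/cm")            # switches backend
#   fetch(browser, "http://localhost:8080/hellocm.txt")
```

Using one opener per session mirrors a single browser window; a fresh make_browser() call starts again with no cookie.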