Using prerender.io with haproxy

One of the main downsides of developing single page applications using frameworks like angularjs is that crawlers can’t read content that is rendered by javascript. This can seriously impact SEO, indexing and general linkability. If you rely heavily on rendering content via AJAX, you will definitely have an issue.

There are a few techniques that you can use to provide content to the crawlers and one I particularly like is prerender.io. It basically solves the problem by caching prerendered content and returning that to the crawlers rather than the AJAX enabled single page applications.

You need to configure your application to use prerender.io based upon some simple rules based upon user agents and prerender.io has many examples of configuration in the documentation for middleware such as nginx and apache. Unfortunately at the time of writing there is no configuration sample for haproxy, so I had to figure it out for myself.

Haproxy configuration for prerender.io

So here’s the haproxy.cfg. I came up with for redirecting crawlers to prerender.io.

# Change YOUR_TOKEN to your prerender token
# Change http://example.com (server_name) to your website url

frontend my-frontend
  mode http
  bind :80

  # prerender.io
  acl user-agent-bot hdr_sub(User-Agent) -i baiduspider twitterbot facebookexternalhit rogerbot linkedinbot embedly showyoubot outbrain pinterest slackbot vkShare W3C_Validator
  acl url-asset path_end js css xml less png jpg jpeg gif pdf doc txt ico rss zip mp3 rar exe wmv doc avi ppt mpg mpeg tif wav mov psd ai xls mp4 m4a swf dat dmg iso flv m4v torrent ttf woff
  acl url-escaped-fragment url_sub _escaped_fragment_

  use_backend prerender if user-agent-bot !url-asset
  use_backend prerender if url-escaped-fragment !url-asset

backend prerender
  mode http
  timeout server 20s
  server prerender service.prerender.io:443 check ssl verify none
  http-request set-header X-Prerender-Token YOUR_TOKEN
  reqrep ^([^\ ]*)\ /(.*)$ \1\ /http://example.com/\2

For brevity I’ve left out any other configuration from the haproxy configuration file. As you can see it will redirect the crawlers to the prerender.io service. Assets and other agents will not be redirected.

You can download the full haproxy.cfg file here.