Using prerender.io with haproxy
One of the main downsides of developing single page applications with frameworks like AngularJS is that crawlers can’t read content that is rendered by JavaScript. This can seriously hurt SEO, indexing and general linkability. If you rely heavily on rendering content via AJAX, the pages that crawlers see will be missing most of that content.
There are a few techniques for serving content to crawlers, and one I particularly like is prerender.io. It solves the problem by caching prerendered content and returning that to crawlers instead of the AJAX-enabled single page application.
You need to configure your application to send requests to prerender.io according to some simple rules based on user agents. The prerender.io documentation has many example configurations for middleware such as nginx and apache. Unfortunately, at the time of writing there is no sample configuration for haproxy, so I had to figure it out for myself.
Haproxy configuration for prerender.io
So here’s the haproxy.cfg I came up with for sending crawlers to prerender.io.
# Change YOUR_TOKEN to your prerender token
# Change http://example.com (server_name) to your website url
frontend my-frontend
    mode http
    bind :80
    # prerender.io
    acl user-agent-bot hdr_sub(User-Agent) -i baiduspider twitterbot facebookexternalhit rogerbot linkedinbot embedly showyoubot outbrain pinterest slackbot vkShare W3C_Validator
    acl url-asset path_end js css xml less png jpg jpeg gif pdf doc txt ico rss zip mp3 rar exe wmv doc avi ppt mpg mpeg tif wav mov psd ai xls mp4 m4a swf dat dmg iso flv m4v torrent ttf woff
    acl url-escaped-fragment url_sub _escaped_fragment_
    use_backend prerender if user-agent-bot !url-asset
    use_backend prerender if url-escaped-fragment !url-asset

backend prerender
    mode http
    timeout server 20s
    server prerender service.prerender.io:443 check ssl verify none
    http-request set-header X-Prerender-Token YOUR_TOKEN
    reqrep ^([^\ ]*)\ /(.*)$ \1\ /http://example.com/\2
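
The reqrep line at the end of the backend is the least obvious part, so here is a rough Python sketch of what that rewrite does to the request line. It is only an illustration: the function name rewrite_request_line and the sample request are made up for this example, and HAProxy's escaped space ("\ ") is written as a plain space in the Python pattern.

```python
import re

# Same pattern as the reqrep rule, with HAProxy's "\ " written as a plain space.
# Group 1 is the HTTP method, group 2 is everything after the leading slash.
REQUEST_LINE = re.compile(r"^([^ ]*) /(.*)$")


def rewrite_request_line(line: str, site: str = "http://example.com") -> str:
    """Prefix the requested path with the site URL (hypothetical helper)."""
    return REQUEST_LINE.sub(
        lambda m: f"{m.group(1)} /{site}/{m.group(2)}", line
    )


if __name__ == "__main__":
    print(rewrite_request_line("GET /products/42?page=2 HTTP/1.1"))
    # GET /http://example.com/products/42?page=2 HTTP/1.1
```

In other words, prerender.io receives the full URL of your page as the path of the request, which is how it knows which page to return.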
For brevity I’ve left everything else out of the haproxy configuration file. As you can see, it sends crawlers, and any request containing _escaped_fragment_, to the prerender.io service. Requests for assets and requests from other user agents are not sent there.
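
To make the routing rules easier to follow, here is a small Python sketch of the decision the three ACLs and two use_backend lines add up to. It is an approximation for illustration, not something HAProxy runs: the function name use_prerender is invented, and the matching simply mirrors a case-insensitive substring match (hdr_sub -i), a suffix match on the path (path_end) and a substring match on the URL (url_sub).

```python
from urllib.parse import urlsplit

# Lower-cased copies of the user-agent substrings from the acl line.
BOT_AGENTS = (
    "baiduspider", "twitterbot", "facebookexternalhit", "rogerbot",
    "linkedinbot", "embedly", "showyoubot", "outbrain", "pinterest",
    "slackbot", "vkshare", "w3c_validator",
)

# Path suffixes from the url-asset acl (note: no leading dot).
ASSET_SUFFIXES = (
    "js", "css", "xml", "less", "png", "jpg", "jpeg", "gif", "pdf", "doc",
    "txt", "ico", "rss", "zip", "mp3", "rar", "exe", "wmv", "avi", "ppt",
    "mpg", "mpeg", "tif", "wav", "mov", "psd", "ai", "xls", "mp4", "m4a",
    "swf", "dat", "dmg", "iso", "flv", "m4v", "torrent", "ttf", "woff",
)


def use_prerender(user_agent: str, url: str) -> bool:
    """Return True if the request would go to the prerender backend."""
    path = urlsplit(url).path
    is_bot = any(bot in user_agent.lower() for bot in BOT_AGENTS)
    is_asset = path.endswith(ASSET_SUFFIXES)          # case-sensitive suffix
    has_fragment = "_escaped_fragment_" in url        # path or query string
    return (is_bot or has_fragment) and not is_asset


if __name__ == "__main__":
    print(use_prerender("Twitterbot/1.0", "/about"))
    print(use_prerender("Twitterbot/1.0", "/static/app.js"))
    print(use_prerender("Mozilla/5.0", "/?_escaped_fragment_=/about"))
```

One thing the sketch makes visible: because the suffixes have no leading dot, a page whose path happens to end in one of them (for example /wireless, which ends in "less") is treated as an asset and is not prerendered.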
You can download the full haproxy.cfg file here.