Avoiding SEO Duplicate Content Issues with Ruby and Rack Middleware
Duplicate content is identical content that appears online at more than one URL. It matters because search engines have difficulty deciding which location is canonical, and may penalize site owners by lowering a page's ranking or omitting it from the index.
Below are a few example URLs that could present duplicate content issues.
- http://www.example.com
- http://example.com
- http://www.example.com/contact/
- http://www.example.com/contact
- http://example.com/contact/
- http://example.com/contact
Note the 'www' and the trailing '/'. From these, it is easy to see how minute differences could cause the same content to be available at several locations. Is your site affected?
There are several common methods to tackle this issue. The one that we'll show here is to simply redirect the duplicate page to the canonical page.
We'll do this using Ruby and a little bit of Rack middleware. Rack provides a standard interface for a webserver to interact with a web application. Rack middleware is a filter that can be used to intercept a request and alter the response as a request is made to an application. Check out the resources section below for more information on Rack and Rack middleware.
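To make the Rack interface a little more concrete, here is a minimal sketch in plain Ruby, with no gems required. The names HelloApp and PassThrough and the "x-seen-by" header are made up for illustration; they are not part of Rack or of this article's middleware.

```ruby
# A Rack application is any object that responds to call(env) and returns
# a three-element array: [status, headers, body].
HelloApp = lambda do |env|
  [200, { "content-type" => "text/plain" }, ["Hello from #{env['PATH_INFO']}"]]
end

# A middleware wraps an app. It receives the app in initialize and exposes
# the same call(env) interface, so it can act before or after the app runs.
class PassThrough
  def initialize(app)
    @app = app
  end

  def call(env)
    # Hand the request to the wrapped application...
    status, headers, body = @app.call(env)
    # ...then tag the response (hypothetical header, for illustration only).
    [status, headers.merge("x-seen-by" => "PassThrough"), body]
  end
end

# Drive the stack by hand with a hash standing in for the request env.
stack = PassThrough.new(HelloApp)
status, headers, body = stack.call({ "PATH_INFO" => "/contact" })
puts status
puts headers.inspect
puts body.join
```

A real server such as Puma or WEBrick builds the env hash from the HTTP request and calls the outermost object in exactly the same way.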
The Code
The middleware code is contained in a single plain old Ruby object (PORO). All Rack middleware classes have an initialize and a call method. The initialize method simply sets up the @app instance variable, which holds a reference to the web application so that it can be used elsewhere. It is the call method, however, that we are more interested in, for it is here that we can inspect and alter the request before it reaches the web application.
Essentially, this middleware will receive and inspect an incoming request, remove a leading 'www.' and trim a trailing '/' as needed, then either perform a permanent (301) redirect to the updated location or, if no changes were made, simply allow the original request to pass through.
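What follows is a sketch of a middleware that behaves as described, written from the description alone rather than copied from the original listing. The class name Rack::SeoRedirect and the file name seo_redirect.rb come from the setup steps in this article; everything else is an assumption, including leaving the root path "/" untouched, preserving the query string, and relying on the HTTP_HOST header being present.

```ruby
# seo_redirect.rb (illustrative reconstruction, not the original listing)
module Rack
  class SeoRedirect
    def initialize(app)
      @app = app # reference to the next app or middleware in the stack
    end

    def call(env)
      host = env["HTTP_HOST"].to_s # e.g. "www.lvh.me:3000" (port included)
      path = env["PATH_INFO"].to_s # e.g. "/someotherpage/"

      # Strip a leading "www." from the host.
      canonical_host = host.sub(/\Awww\./i, "")
      # Leave the root path "/" alone; otherwise drop one trailing slash.
      canonical_path = path == "/" ? path : path.chomp("/")

      # Nothing changed: let the original request pass through.
      return @app.call(env) if canonical_host == host && canonical_path == path

      location = "#{env['rack.url_scheme']}://#{canonical_host}#{canonical_path}"
      query = env["QUERY_STRING"].to_s
      location += "?#{query}" unless query.empty?

      # 301 is a permanent redirect. Header names are lowercase, which is
      # valid HTTP and is what Rack 3 expects.
      [301,
       { "location" => location, "content-type" => "text/plain", "content-length" => "0" },
       []]
    end
  end
end
```

Because the class only touches the env hash, it can be exercised without a server by calling it with a hand-built hash, for example `Rack::SeoRedirect.new(app).call("rack.url_scheme" => "http", "HTTP_HOST" => "www.lvh.me:3000", "PATH_INFO" => "/someotherpage/", "QUERY_STRING" => "")`, where app is any object that responds to call.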
How do you configure this in your app?
It's almost painfully easy to get things set up. Below are examples for Rails and Sinatra.
Rails
- Copy seo_redirect.rb to your Rails app in /lib/rack/seo_redirect.rb
- Add the following line to config/application.rb
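One plausible form of that configuration is sketched here; it is an assumption, not the article's original line. YourApp is a placeholder for your application's module name, and the explicit require is there because files under lib/ are not loaded automatically at this point in boot. Passing the class itself (rather than a string) and an integer position to insert_before assumes a reasonably recent Rails.

```ruby
# config/application.rb (excerpt)
require_relative "../lib/rack/seo_redirect" # load the middleware class by hand

module YourApp # placeholder: use your application's own module name
  class Application < Rails::Application
    # Position 0 places the middleware at the very top of the stack,
    # so the redirect happens before any other middleware runs.
    config.middleware.insert_before 0, Rack::SeoRedirect
  end
end
```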
TIP: Run "rake middleware" (or "bin/rails middleware" on Rails 5 and later) to view the middleware used by your app. Rack::SeoRedirect should now be at the top.
Sinatra
- Copy seo_redirect.rb to your Sinatra app.
- Put this in your pipe and smoke it.
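A minimal sketch of the Sinatra side, assuming a classic-style app in a file called app.rb (a hypothetical name) with seo_redirect.rb sitting next to it; the /contact route is only there so there is something to request:

```ruby
# app.rb
require "sinatra"
require_relative "seo_redirect" # assumes seo_redirect.rb is in the same directory

# Add the middleware to Sinatra's Rack stack.
use Rack::SeoRedirect

get "/contact" do
  "Contact us"
end
```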
How do you try this out locally when you don't have a www subdomain?
There are several publicly available loopback wildcard domains that can help out here. While that is quite a mouthful, these are simply domain names that people have registered that point to 127.0.0.1, aka your local server. They also support wildcard subdomains (*.example.com). These include http://lvh.me, http://vcap.me, and http://42foo.com.
So, fire up your application, and using one of the domains above, verify the behavior in your application.
Try it in your browser...
- http://www.lvh.me:3000 should redirect to http://lvh.me:3000
- http://www.lvh.me:3000/someotherpage/ should redirect to http://lvh.me:3000/someotherpage
NOTE: Your port number may differ.
TIP: Check the network tab in Chrome developer tools or in Firebug to watch the redirect happen.
Or via curl...
- curl -I -L http://www.lvh.me:3000
- curl -I -L http://www.lvh.me:3000/someotherpage/