Faceted navigation is the collection of UI elements and functionality that lets users filter and refine category views. There is some debate in the SEO, UX, and general web development communities about the best way to represent facets in the URL.
Faceted navigation, such as filtering by color or price range, can be helpful for your visitors, but it’s often not search-friendly since it creates many combinations of URLs with duplicative content.
The correct way to denote facets in a URL is through the use of query parameters. However, some believe that virtual subdirectories present a better alternative for SEO and UX. We’ll compare the different options for including facets in a URL, starting with simple examples of each method.
- Query parameters
- Virtual subdirectories
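As a rough illustration of the two styles, the sketch below builds both URL forms for a single facet. The domain example.com, the /shirts category, and the color facet are hypothetical and chosen only for demonstration.

```python
from urllib.parse import urlencode

BASE = "https://example.com/shirts"  # hypothetical category URL


def query_parameter_url(facets: dict[str, str]) -> str:
    """Append facets as standard key=value pairs in the query string."""
    return f"{BASE}?{urlencode(facets)}" if facets else BASE


def virtual_subdirectory_url(facets: dict[str, str]) -> str:
    """Append facets as extra path segments (virtual subdirectories)."""
    segments = [f"{key}/{value}" for key, value in facets.items()]
    return "/".join([BASE, *segments])


print(query_parameter_url({"color": "blue"}))       # https://example.com/shirts?color=blue
print(virtual_subdirectory_url({"color": "blue"}))  # https://example.com/shirts/color/blue
```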
Tip: Both query parameters and virtual subdirectories (path segments) are susceptible to duplicate content issues if not managed appropriately; canonicalizing URLs can help avoid duplicate content penalties.
To determine which URL structure is best for SEO, let’s weigh the pros and cons of each option according to the “best and worst” practices of faceted navigation as defined by Google.
In an ideal state, unique content – whether an individual product/article or a category of products/articles – would have only one accessible URL.
As it pertains to URL uniqueness, neither query parameters nor virtual subdirectories are superior.
Google treats each distinct URL (including those with query parameters) as a unique URL with unique content. This is because it is technically possible for a website to serve different content at each distinct URL, and for URLs that include query parameters, the content is likely to change based on these values.
Tip: This also applies to http and https URLs and to URLs that can be visited with and without a subdomain (e.g. www); canonicalizing or redirecting these URLs will help avoid duplicate content penalties.
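One way to picture the redirect target is a small normalizer that maps every scheme and host variant to a single preferred origin. This is only a sketch: the choice of https with a www host is an assumption for the example, and `preferred_origin` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_origin(url: str) -> str:
    """Rewrite a URL to the assumed preferred form: https with a www host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    return urlunsplit(parts._replace(scheme="https", netloc=host))


print(preferred_origin("http://example.com/shirts?color=blue"))
# https://www.example.com/shirts?color=blue
```

A server would typically issue a permanent redirect to the returned URL whenever it differs from the requested one.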
One challenge faced when structuring faceted URLs is correctly denoting taxonomy for multiple values and multiple options. As more facets are added, the URL structure becomes increasingly complicated. The following examples illustrate two common approaches to this problem:
- Query parameters
- Virtual subdirectories
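To make the multiple-value case concrete, here is a minimal sketch of how each approach might encode two colors and one size. The facet names and values are hypothetical, and the comma-joined path segment is just one possible convention for the subdirectory style.

```python
from urllib.parse import urlencode

# Hypothetical selection: two colors and one size.
selected = {"color": ["blue", "red"], "size": ["m"]}

# Query parameters: repeat the key once per selected value.
query = urlencode(selected, doseq=True)
print(f"/shirts?{query}")  # /shirts?color=blue&color=red&size=m

# Virtual subdirectories: join values with commas inside one path segment.
path = "/".join(f"{key}/{','.join(values)}" for key, values in selected.items())
print(f"/shirts/{path}")  # /shirts/color/blue,red/size/m
```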
These approaches raise concerns about valid URL encoding.
… [The danger of hierarchical classification as a general solution lies] in the philosophy of meaning. … Because the relationships between subjects are web-like rather than tree-like, even for people who agree on a web may pick a different tree representation.
– Cool URIs don’t change, Tim Berners-Lee
Non-Standard URL Encoding
Google advises against using “non-standard URL encoding for parameters… instead of ‘key=value&’ pairs.”
Google provides two “worst practice” examples where key-value pairs are marked incorrectly with “,” rather than “=”, and where multiple parameters are appended with “,,” rather than “&”.
If key-value pairs are marked correctly, there are valid special characters that may be used within a URL.
The comma, for example, is an allowed path character (i.e. “pchar”) as part of the sub delimiters (i.e. “sub-delims”) defined in RFC 3986 § 2.2 for both query parameters and path segments. As RFC 3986 § 3.3 states:
the semicolon (“;”) and equals (“=”) reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (“,”) reserved character is often used for similar purposes.
While these characters may be used in both path and query segments, it is uncommon to see these characters in path segments because delimiting options in a path segment does not impart a clear hierarchy.
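The difference is easy to see with a standard parser. The sketch below uses Python's urllib.parse with hypothetical parameter names: well-formed key=value pairs parse cleanly, the non-standard form yields nothing, and a comma inside a value can be left unescaped when declared safe.

```python
from urllib.parse import parse_qs, quote

# Standard encoding: key=value pairs joined with "&".
standard = parse_qs("color=blue,red&size=m")
print(standard)  # {'color': ['blue,red'], 'size': ['m']}

# Non-standard encoding: "," in place of "=" and ",," in place of "&".
# A standard parser finds no key=value pairs at all.
non_standard = parse_qs("color,blue,,size,m")
print(non_standard)  # {}

# The comma is a sub-delim, so it may stay unescaped when marked as safe.
print(quote("blue,red", safe=","))  # blue,red
print(quote("blue,red"))            # blue%2Cred
```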
Are Facets Hierarchical?
The heart of the question is whether or not facets are hierarchical data.
According to the Wikipedia entry on “Faceted classification”, facets are not hierarchical:
Hierarchical classification refers to the classification of objects using one single hierarchical taxonomy. Faceted classification may actually employ hierarchy in one or more of its facets, but allows for the use of more than one taxonomy to classify objects.
As we’ve seen in our examples above, the “multiple taxonomies” presented by facets are not well suited to inclusion in the path segment of the URL. Returning to the point of using non-standard URL encoding, according to RFC 3986 § 3.3:
The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource…
RFC 3986 § 3.4 continues:
The query component contains non-hierarchical data
This makes it clear that facets should not appear in path segments, but should appear as query parameters. In fact, Google instructs:
Use parameters (when possible) with standard encoding and key=value pairs.
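Standard URL parsers reflect the same split between a hierarchical path and a non-hierarchical query. A short sketch, using a hypothetical URL:

```python
from urllib.parse import parse_qsl, urlsplit

parts = urlsplit("https://example.com/clothing/shirts?color=blue&size=m")

# Hierarchical data: each path segment narrows the one before it.
print(parts.path.strip("/").split("/"))  # ['clothing', 'shirts']

# Non-hierarchical data: independent key=value pairs.
print(parse_qsl(parts.query))  # [('color', 'blue'), ('size', 'm')]
```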
Using virtual subdirectories to denote facets is non-standard, and Google is able to make better assumptions about page contents when facets are conveyed through query parameters. Google even provides a URL Parameters tool in Google Search Console that allows site administrators to instruct Google on how to interpret query parameters; no such tool exists for virtual subdirectories.
Another “worst practice” as defined by Google is “using directories or file paths rather than parameters to list values that don’t change page content.”
In a directory based faceted URL structure, the following URLs would all serve the same content:
This issue is less apparent when using query parameters because key-value pairs clearly delineate hierarchy from facets. The issue may be resolved via canonicalization, but Google does not consider that best practice; best practice is to use query parameters, because “URL parameters allow more flexibility for search engines to determine how to crawl efficiently.”
Regardless of which URL structure is used, facets should always be presented in a unified manner (e.g. alphabetical order), so multiple URLs are not indexed for the same content. Take the following URLs as an example:
Both of these URLs would display the same content. To reduce the total number of unique links on a site, and thus prevent duplicate content from being indexed, only one of the above orderings should be used consistently across a site.
Again, redirection or canonicalization can help search engines index this content correctly if it is referenced elsewhere.
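A unified ordering can be enforced in code before links are rendered or redirects are issued. The following is a minimal sketch assuming alphabetical order is the chosen convention; `canonical_facet_order` and the example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_facet_order(url: str) -> str:
    """Return the URL with its query parameters sorted alphabetically."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


a = canonical_facet_order("https://example.com/shirts?size=m&color=blue")
b = canonical_facet_order("https://example.com/shirts?color=blue&size=m")
print(a)       # https://example.com/shirts?color=blue&size=m
print(a == b)  # True
```

The same output could feed a canonical link element or a redirect target, so that both orderings resolve to one indexed URL.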
As a front-facing component of websites, URLs are an important part of the user experience. The URL acts as a reference point for the current view, and advanced users may use the URL as a “virtual breadcrumb trail” to navigate backwards through your site’s hierarchy. Maintaining a human-parsable URL is an important and non-trivial endeavor.
Note: I’ve written an extensive post on the importance of designing URLs as part of a site’s UI.
Google advises against “appending URL parameters without logic”. Unnecessary parameters should be stripped, when possible, to maintain a human-parsable URL structure. Google recommends removing user session information from the URL and storing that data in cookies instead. Keeping the URL free of unnecessary data not only helps users understand the content present in the current view, it also aids SEO, as Google notes:
Extraneous URL parameters only increase duplication, causing less efficient crawling and indexing.
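Stripping such parameters can be sketched in a few lines. The parameter names in `EXTRANEOUS` are assumptions for the example; a real site would list whichever of its own parameters never change page content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that never change the page content.
EXTRANEOUS = {"sessionid", "utm_source"}


def strip_extraneous(url: str) -> str:
    """Drop parameters listed in EXTRANEOUS and keep the rest in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in EXTRANEOUS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_extraneous("https://example.com/shirts?color=blue&sessionid=abc123"))
# https://example.com/shirts?color=blue
```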