Marian Steinbach: Blog

Six Principles of URL Design

2011/01/19

Without any notice here on my own blog, I offered a session on URL design at the User Experience Bar Camp (UXCamp) in Berlin last year. Now that this space serves as a spot worthy of content, I am re-posting it here.

Unfortunately, Myriad renders horribly when embedded in a PowerPoint presentation / PDF and then uploaded to SlideShare. So please do your eyes a favor and view the slides as large as you can. Here you can find the presentation on SlideShare: Sexy URLs don’t end in .aspx?id=23859

As a sneak preview, these are the six principles:

  1. Simplicity
  2. Meaningfulness
  3. Hackability
  4. Unambiguousness
  5. Persistence
  6. Canonicalization

If you’re interested in my background thoughts, find them after the break.

Transcript of what I might have said while presenting these slides:

Slide 4: This is how you put it on a barcamp. Here I would ask: Why should we deal with a topic that’s so basic and already existed with the very first incarnation of the web?

Slide 5: Instead of an answer, please try to tell me what these figures are. Or what they represent.

Slide 6: The figures are actually a GTIN number. That’s a globally unique code for a product. You often find them with bar codes attached on all kinds of goods. These bar codes are just another machine readable way to encode the figures.

Slide 7: The GTIN actually represents the Mars candy bar. More precisely, the 51 gram variant. You could have figured that out, couldn’t you? Well… The point I’m trying to make is that URLs are just another way to identify something in a globally unique way. And the nice thing is, we can design these identifiers so that they make sense not only to machines, but to humans as well. And there are a whole lot of reasons why we should do that.

Slide 8: To answer the question from slide 4: We should consciously think about the design of URLs because they are the key to pretty much everything on the web.

Slide 9: And if we don’t design URLs, nobody will and they’ll look all bad. We all know those.

Slide 10: So these are the principles I think we should think of when it comes to consciously designing URLs: Simplicity, Meaningfulness, Hackability, Unambiguousness, Persistence and Canonicalization. Each will be explained in more detail.

Slide 11+12: First to Simplicity. There is an ancient Nielsen Alertbox requesting URLs to be easy to remember, easy to spell and easy to type. And why is that?

Slide 13+14: Because URLs are tools for communication. And they are communicated in various ways. When I think about all the ways to communicate a URL, only very few do NOT involve exposing the URL itself to the recipient. The exceptions are hyperlinks and QR codes. And this often means that you have to write these URLs down, and someone else has to read them.

Slide 15 to 20: These real world examples should demonstrate that URLs oftentimes carry useless stuff. Simplicity is mainly about getting rid of this useless payload.

Slide 21: Since the last simplified example is a mixed case URL, I’d like to address the question whether or not uppercase characters should be encouraged. I’d say, in order to be as simple as possible and raise no questions whatsoever, rather stick with all lowercase letters.

Slide 22: Not only do we have to decide whether upper- or lowercase letters are better; the question of which characters are fine for URLs is important overall. With UTF-8 becoming more and more the standard for encoding content on the web, should we make use of this power when it comes to URLs? In order to make URLs as simple as possible, I’d say no. Actually, I’d recommend sticking to a very limited character set for URLs. Letters like the German umlauts (e.g. ä) should be encoded using the closest acceptable ASCII alternative (which would be “ae”). There might be tools to help you with that. For example, Ruby on Rails has FriendlyId, where this is taken care of.
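
If you have no such tool at hand, the idea is easy to sketch yourself. The following Python snippet is only an illustration, not how FriendlyId works: the function name slugify and the replacement table are my own assumptions, and the table only covers German letters.

```python
import re
import unicodedata

# Assumed replacement table: German letters and their usual ASCII spellings.
# Extend it for other languages as needed.
GERMAN_REPLACEMENTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, ASCII-only path segment."""
    # Compose characters first so that "a" + combining diaeresis becomes "ä".
    title = unicodedata.normalize("NFC", title)
    for char, replacement in GERMAN_REPLACEMENTS.items():
        title = title.replace(char, replacement)
    # Decompose remaining accented letters (é -> e + accent) and drop
    # everything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then collapse every run of other characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```

With this, a title like “Möbel für Zuhause” would end up as the path segment moebel-fuer-zuhause, which is all lowercase and safe to type on any keyboard.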

Slide 23: File endings were already unnecessary when websites were still a bunch of HTML files. They make the URL more complex than necessary, and by the way, what if you switch from PHP to JSP? And they have the bad habit of making a visible distinction between a page and a section of a site.

Slide 24: If you have a page /mypage.html and want to add sub-pages to that page, there is no really nice URL scheme to communicate the new structure. Whereas, if you started with calling the page /mypage in the first place, you could grow the structure by adding /mypage/subpage and so on.

Slide 25: Simplicity can be reduced by the will to be over-explicit. Fortunately, only the URL as a whole has to be unique, not every path segment of it. So instead of imitating this real-world example from the Star Alliance website or doing things like “/games/game-24/game-details” let’s just make it brief.

Slide 26: Should these nice path-style URLs always end in a slash? Or can we drop that character, too? I’d say it doesn’t really matter. I’d only advise to be consistent here. Pick one way and redirect the other.

Slide 27: Home page URLs deserve special attention. The goal should be to let your home page URL end after the third slash. Period.

Slide 28: So far for simplicity. But there is more to come, because obviously not every URL that is simple is also cool in every way it could be.

Slide 29: Principle no. two is Meaningfulness. The term sounded awkward to me, but I didn’t find an alternative that sounded better or more meaningful, so I kept this one.

Slide 30 to 32: We’ve seen before that URLs are often exposed in communication. Another important aspect is that users actually try to interpret them. In 2007, Edward Cutrell of Microsoft Research and Zhiwei Guan extensively studied users’ attention in web search. Not only do users attribute higher importance to the URL than they do to the search snippet. The researchers also point out the importance of the URL for assessing the credibility of a resource.

Slide 33: When users try to understand URLs, what do they conclude? It’s pretty much up to us who build sites and applications. The Craigslist example contains the term “Austin”, but what’s “fua”? The URL is actually about furniture sales in Austin, Texas (Go, Longhorns!). And what about the IMDB URL? It actually represents 2001 – A Space Odyssey by Stanley Kubrick. But the URL doesn’t convey that. It’s not much better than the Mars barcode.

Slide 34: Small things can make a big difference. To those who noticed that I used the plural form for the IMDB link instead of “title”: Only the plural form can convey the fact that there is a multitude of titles hidden in that collection.

Slide 35: Meaning can, of course, best be conveyed in the language the user understands. That will most likely be the language of the content. This, as a result, means that different language versions of a page would ideally have different URLs.

Slide 36 and 37: Binding the URL to the content of a page, or to other aspects which are subject to change, might mean that a URL may change. But that can usually be handled by redirects. In many cases this choice is far better than the alternative of having never-changing, but completely cryptic IDs. BTW, do you remember ICQ? That was the instant messaging service that thought it was clever to give users numeric IDs instead of self-chosen user names. As a result, not even the account owners could remember their own user names, which led to countless abandoned accounts and buddy lists. And how is ICQ doing now? Side note: Twitter has numeric user IDs as well, they just don’t expose them in the UI or the URL.

Slide 38 and 39: Meet principle number three: Hackability.

Slide 40: Flickr (and many other sites) has nice examples of hackable URLs. Knowing that you look at interesting photos of May 2010, you can easily tell how to get to those of April 2010.
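
What makes such a URL hackable is that the next URL is a simple function of the current one. Here is a rough Python sketch of that idea. The path pattern (something like /explore/interesting/2010/05/) and the function name are assumptions made for illustration, not something taken from the slides.

```python
import re
from typing import Optional

# Assumed pattern for a monthly archive path: any prefix followed by
# a four-digit year and a two-digit month, each ending in a slash.
MONTH_PATH = re.compile(r"^(?P<prefix>.*/)(?P<year>\d{4})/(?P<month>\d{2})/$")


def previous_month_path(path: str) -> Optional[str]:
    """Return the path for the month before the one in `path`.

    Returns None if the path does not look like a monthly archive.
    """
    match = MONTH_PATH.match(path)
    if match is None:
        return None
    year, month = int(match["year"]), int(match["month"])
    if not 1 <= month <= 12:
        return None
    if month == 1:
        # January rolls back to December of the previous year.
        year, month = year - 1, 12
    else:
        month -= 1
    return f"{match['prefix']}{year:04d}/{month:02d}/"
```

A user does exactly this in their head when they change “05” to “04” in the address bar. If a machine can do it with a few lines of code, a human can probably guess it, too.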

Slide 41: Without meaningfulness, URLs probably won’t be hackable, because URL hacking would be just random if the user didn’t know what the URL meant.

Slide 42 and 43: Enter Unambiguousness. Another clumsy word. What it means is: No two resources should share the same URL. And it also means that no two states of an application should share the same URL. In other words: Each state of your application deserves a unique URL.

Slide 44: The reasons are quite simple: Because people want to use the URL in order to show others stuff they found. Or keep it for later. Or complain about. Or write a feedback email. Or whatever. As discussed before, URLs universally represent stuff on the Web. People want to communicate about stuff on the web, so they want to use the URL in communication.

Slide 45 to 46: The example under that URL doesn’t exist any longer, but instead redirects to http://www.mini.de/mini_cabrio/cooper/index.html which doesn’t have as many problems as the former version. The former site was built using Flash. Whenever I clicked on a “link”, the URL didn’t change. Someone trying to point someone else to the nice blue cabrio interior view couldn’t use the URL for that. What a fail in times of Social Media!

Slide 47: Besides Flash, there are many technologies that often prevent us from having one URL per state. That’s basically a problem of knowledge and effort on the implementor’s side. For example, if a search form enforces the HTTP POST method instead of GET, the search result doesn’t have a URL that represents this exact search. Or if a site uses frames or iframes, the URL of the outermost page/frameset doesn’t represent what resources are to be loaded in the inner frames. And with AJAX, this problem has become prevalent. Pages can load all sorts of content dynamically, based on various decisions made. None of this shows up in the URL unless developers decide to put effort into making it visible there.
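
To illustrate the search form case: with GET, the whole state of a search fits into the query string, so the result page can be bookmarked and shared. The following Python sketch uses only the standard library. The parameter names q and page, the function names and the example.com address are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base: str, query: str, page: int = 1) -> str:
    """Build a URL that fully describes one state of a search."""
    params = {"q": query}
    if page > 1:
        # Leave out default values to keep the URL short.
        params["page"] = str(page)
    return f"{base}?{urlencode(params)}"


def read_search_state(url: str) -> dict:
    """Recover the search state from such a URL, e.g. on page load."""
    params = parse_qs(urlsplit(url).query)
    return {
        "query": params.get("q", [""])[0],
        "page": int(params.get("page", ["1"])[0]),
    }
```

Calling search_url("https://example.com/search", "blue cabrio", page=2) gives https://example.com/search?q=blue+cabrio&page=2, and read_search_state turns that URL back into the same query and page number. The same round trip is what an AJAX application has to provide if each of its states is supposed to have a URL.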

Slide 48 and 49: Principle no. five is: Persistence. In other terms: As long as a resource still exists, its URL should persist.

Slide 50: Why do we want persistence? Because URLs live very long lives outside your website. They are linked to from other websites. They are archived in search engine indexes, in emails, in forum posts, in PDF documents, nearly everywhere. And since we as site owners have a vivid interest in getting users to visit our site, we should keep an eye on those URLs. In fact, we sometimes spend lots of money to get our URL out in the world, build incoming hyperlinks and stuff. But these links are only good if they do not lead to a 404 error.

Slide 51: URLs can be designed for persistence, to avoid the need of changing a URL. If you use a term like “charts-of-the-month” in a URL, be aware that “the-month” means something different as time goes by and the content on that page will change. So how do people keep a URL of the specific charts of a specific month? However, also keep in mind that this might contradict Meaningfulness. You have to think about tradeoffs. As mentioned above, I think Twitter made a good decision in favor of Meaningfulness when they designed the user page URLs. They valued persistence less in that moment, which is OK. If they had then thought about proper redirects, everything would have been great.
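
One possible way out of the “charts-of-the-month” problem, sketched in Python: give every month its own dated URL that never changes, and treat the convenient name as a temporary redirect to the current month. This is my own suggestion rather than something from the slides, and the paths /charts/current and /charts/YYYY/MM are made up for the example.

```python
from datetime import date
from typing import Tuple


def charts_path(month: date) -> str:
    """Persistent path for the charts of one specific month."""
    return f"/charts/{month.year:04d}/{month.month:02d}"


def resolve_current_charts(today: date) -> Tuple[int, str]:
    """Answer a request for the assumed alias /charts/current.

    The target changes every month, so the redirect is temporary (302),
    not permanent (301).
    """
    return 302, charts_path(today)
```

Someone who wants to keep the charts of January 2011 then bookmarks /charts/2011/01, while the alias keeps pointing to whatever is current.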

Slide 52 to 55: Last but not least is Canonicalization. This is mostly a matter of the right process. What it means is: Choose one (and only one) correct URL for each resource and ensure that this one is used. Because if several URLs spread, users will always wonder if the two represent the same thing. It will also mean that some sites link to one URL, some others link to the other URL, which results in two of your URLs competing for PageRank instead of one getting all the Mojo. It might even mean that caches aren’t used because caches use URLs as a key.

Slide 56: Canonicalization actually starts when deciding on a domain and a hostname. www.yoursite.com isn’t the same as yoursite.com.

Slide 57: If you use tracking tools like Google Analytics you can find out which domains and hostnames people already use to access your content. Often it’s much more than you could have imagined (Analytics: Visitors / Network Properties / Hostnames).

Slide 58: The web content management system TYPO3 is especially bad at giving you multiple choices for accessing one page, without ever redirecting to a canonical version.

Slide 59: The way you should enforce access to the canonical URL is redirecting from other URLs with a proper HTTP redirect. Make sure to send the HTTP code 301 which stands for “permanent redirect” as opposed to temporary (302). The canonical tag is a workaround if you cannot provide redirects.
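
Here is a minimal Python sketch of such a redirect rule. It assumes that www.yoursite.com is the canonical hostname, that every other hostname reaching the application is an alias of the same site, and that URLs without a trailing slash are the chosen style. All of these are arbitrary picks for the example, and ports are ignored to keep it short.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumption: this is the one hostname we decided on.
CANONICAL_HOST = "www.yoursite.com"


def canonical_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, canonical_url) if `url` is not canonical, else None."""
    parts = urlsplit(url)
    # An empty path means the home page, which ends after the third slash.
    path = parts.path or "/"
    if len(path) > 1:
        # Chosen style: no trailing slash, except for the home page.
        path = path.rstrip("/") or "/"
    canonical = urlunsplit(
        (parts.scheme, CANONICAL_HOST, path, parts.query, parts.fragment)
    )
    if canonical == url:
        return None
    # 301 tells browsers, caches and search engines the move is permanent.
    return 301, canonical
```

A request for http://yoursite.com/products/ would be answered with a 301 to http://www.yoursite.com/products, while a request for the canonical URL itself passes through untouched. In practice you would rather configure this in the web server or framework than write it by hand, but the rule stays the same.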

Slide 60: The popular European site Qype.com has had (or still has) problems with duplicate content, as Sistrix pointed out in his blog.

Slide 61 and 62: Even Flickr, which has kick-ass URLs, isn’t perfect. They let the same photo appear in various contexts (group pools, albums, photostream etc.). Each appearance has a different URL. Since the main content of the page is always the same (photo plus info and comments), I would consider these duplicates which should be eliminated. Flickr could have done that using hash-extensions, as shown in slide 60.
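
To show what I mean by hash-extensions: the part of a URL after the “#” (the fragment) is not sent to the server, so a page can carry its context there without becoming a different resource. A small Python sketch follows; the “in-” prefix, the function names and the example.com address are invented for illustration and have nothing to do with how Flickr actually builds its URLs.

```python
from urllib.parse import urldefrag


def photo_url_in_context(photo_url: str, context: str) -> str:
    """Attach a viewing context (pool, album, ...) as a URL fragment."""
    base = urldefrag(photo_url).url
    return f"{base}#in-{context}"


def resource_url(url: str) -> str:
    """The URL a server, cache or search engine sees: no fragment."""
    return urldefrag(url).url
```

No matter how many contexts a photo appears in, resource_url gives the same result for all of them, so there is only one URL to link to, cache and rank. The price is that the context has to be handled on the client side, for example with JavaScript.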

Slide 63: When (and if) you flipped through these slides, you might have come to the conclusion that it’s not easy to follow these principles. It means that everybody involved in creating a web site or web application, from interaction designer to information architect to developer to server admin, has to be on the same page. But it’s important, because URLs are one of the few things that make the web great. And the more usable they are, the better for your sites and applications.

If you care, I’d be happy to continue the discussion in the comments!

2 Comments

johan on 2012/04/08 at 11:21h GMT:

I hate that ugly id= urls, its no good for the search engines, nice post thanks for sharing.
