In Anatomy Of A URI (Part 1), we took a look at the various components of a web address.  In this post, we’ll look at some of those components in greater detail.  For reference, here again is our handy diagram (slightly modified) from the technical specification for URIs (RFC3986).  A URI will not necessarily contain all of these components, and it may contain others not shown here.

  http://www.example.com/animals/mammals?name=ferret#nose
  \__/   \_____________/\______________/\__________/ \__/
    |           |              |             |         |
scheme      authority         path         query   fragment
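
If you’d like to see this breakdown without counting characters, here is a minimal sketch using Python’s standard urllib.parse module.  It is only an illustration of the diagram above, not part of the specification.  Note that Python calls the authority component "netloc".

```python
from urllib.parse import urlsplit

# The same example address used in the diagram above.
parts = urlsplit("http://www.example.com/animals/mammals?name=ferret#nose")

print(parts.scheme)    # http
print(parts.netloc)    # www.example.com  (the authority)
print(parts.path)      # /animals/mammals
print(parts.query)     # name=ferret
print(parts.fragment)  # nose
```

The separators themselves (the colon and slashes, the question mark, the number sign) are not included in any of the returned components.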

Scheme – This is the protocol.  HTTP is the protocol used for websites, but there are many other official and unofficial schemes in use.  Terminology note:  When a scheme other than HTTP (or HTTPS) is used, the URI really can’t be called a web address anymore, since it doesn’t involve the World Wide Web.  See this post for a simple analogy regarding the WWW vs. the Internet.

Authority – This is the entity to whom you are making the request, the Internet domain name.  See this Wikipedia article for an explanation of domain names.  Basically, a domain name is read from right to left.  The last part is the top-level domain (com, net, org, etc.), the part before it is the second-level domain, the main entity (Google, Yahoo, WordPress, etc.), and anything further to the left is something specific to that entity.  The use of www as a third-level domain is merely a convention.  A website or other service could just as easily use www2.example.com, w3.example.com, or happy-dude.example.com.  An authority may contain multiple sub-level domains in succession, e.g., http://www.animals.example.com, or none at all, e.g., example.com.  Often, different sub-level domains will be employed for different purposes.  For example, http://www.google.com is Google’s main search page, while mail.google.com is Google’s main Gmail page.  As another example, consider wordpress.com.  Htipe.wordpress.com is my little corner at wordpress.com, and the thousands of other blogs hosted here each have their own third-level domain name.
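
To make the right-to-left reading concrete, here is a rough sketch in Python that splits a host name on its dots.  This is a deliberately naive illustration.  It assumes a simple one-part top-level domain like com, and it would mislabel names under multi-part suffixes such as co.uk.

```python
from urllib.parse import urlsplit

# Pull the host name out of the URI, then split it into its dot-separated parts.
host = urlsplit("http://www.animals.example.com").hostname
labels = host.split(".")

top_level = labels[-1]     # last part: the top-level domain
second_level = labels[-2]  # the part before it: the main entity
sub_levels = labels[:-2]   # everything further to the left

print(top_level)     # com
print(second_level)  # example
print(sub_levels)    # ['www', 'animals']
```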

The authority part of a URI can contain more information, such as user info or a port number.  For example, username:password@example.com:8080.  A full discussion of these is outside the scope of this article, but here’s a quick analogy.  If example.com is a restaurant, http://www.example.com might be the front door.  The port number is an alternate door to the same place.  Perhaps example.com:8080 (with or without the initial www, depending on how things are configured) is a side door specifically for to-go orders, and maybe example.com:8081 is an employee entrance where you need to provide credentials in order to get in, chef:yummy@example.com:8081.
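
As a small illustration of the employee entrance, here is how Python’s urllib.parse separates the pieces of that authority.  The http:// scheme and trailing slash are added here only so the example is a complete URI.  The credentials are the made-up ones from the analogy.

```python
from urllib.parse import urlsplit

# The "employee entrance" from the restaurant analogy.
parts = urlsplit("http://chef:yummy@example.com:8081/")

print(parts.netloc)    # chef:yummy@example.com:8081  (the whole authority)
print(parts.username)  # chef
print(parts.password)  # yummy
print(parts.hostname)  # example.com
print(parts.port)      # 8081

# When no port is written in the URI, none is reported.
front_door = urlsplit("http://www.example.com/")
print(front_door.port)  # None
```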

Path – An authority will often organize its pages in a kind of hierarchical structure.  You can think of it like folders, sub-folders and files, on your own computer.   If you look at the path for this article, you’ll see that it is arranged by year, month, day, and finally this article itself.  Sometimes the path will end in something like page.html, or song.mp3, but not always.
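
The folder comparison can be sketched in a few lines of Python.  This example uses the ferrets page from the hypothetical example.com site and simply splits the path on its slashes.

```python
from urllib.parse import urlsplit

# Take only the path component of the URI.
path = urlsplit("http://www.example.com/animals/mammals/ferrets.html").path
print(path)  # /animals/mammals/ferrets.html

# Split on slashes and drop the empty piece before the leading slash.
segments = [segment for segment in path.split("/") if segment]
print(segments)  # ['animals', 'mammals', 'ferrets.html']
```

Here animals and mammals play the role of folders, and ferrets.html plays the role of a file, although a web server is free to interpret the path however it likes.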

Query – The beginning of a query is marked by a question mark (?).  It is a way to include information about what you’re requesting.  You’ll often notice query strings on websites that are fairly interactive.  Consider the following comparison.  Let’s say example.com provides pages about different animals.  If you want to read its page about ferrets, you can go to http://www.example.com/animals/mammals/ferrets.html.  You are requesting the page about ferrets, no query string necessary.  On the other hand, maybe you don’t know about example.com and so you go to your favorite search engine, e.g., Google or Yahoo, and search for ferret.  Take a glance at the URI of the page that displays your results.  It will probably look something like http://www.google.com/search?hl=en&q=ferret&btnG=Google+Search&aq=f&oq=, and quite possibly longer and crazier.  Here’s the deal.  Google, or whatever search engine you used, does not have a search results page about ferret in the same way that example.com had a page about ferrets.  There is no way that Google can have a results page ready for everything that everyone might search for, so it creates one for you on the fly.  If you browse to the URI above, Google will look at the query string, and create a web page just for you based on the query information.
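
To see what a server might read out of that query string, here is a short sketch using Python’s urllib.parse.  It only shows how the name=value pairs are separated.  What Google actually does with each parameter is not something this example tries to describe.

```python
from urllib.parse import urlsplit, parse_qs

uri = "http://www.google.com/search?hl=en&q=ferret&btnG=Google+Search&aq=f&oq="

# Everything after the question mark is the query.
query = urlsplit(uri).query
print(query)  # hl=en&q=ferret&btnG=Google+Search&aq=f&oq=

# Pairs are separated by "&" and each name is separated from its value by "=".
# keep_blank_values=True keeps parameters with an empty value, such as oq.
params = parse_qs(query, keep_blank_values=True)

print(params["q"])     # ['ferret']
print(params["btnG"])  # ['Google Search']
print(params["oq"])    # ['']
```

Each value comes back as a list because the same name may appear more than once in a query string.  Notice also that the plus sign in Google+Search is decoded as a space.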

Further Reading

RFC3986 – The official standard

Wikipedia URI Article

Wikipedia URI Scheme Article

Wikipedia Domain Name Article

Wikipedia Query String Article

How To Obscure Any URL – A little dated, but lots of good info