Rails Router: How it Do

Anything becomes interesting if you look at it long enough

-Gustave Flaubert

I recently decided that I would spend some time investigating how the Rails Router works. I figured I’d read some source code, scan some documentation, and then move on with my life, having gained a slightly better understanding of this chunk of Rails I take for granted.

WRONG.

Turns out, the deeper you go with this, the more interesting it gets.  Fast forward a week, and I’m reading books about tokenization, spending my spare time pondering parsing, and watching YouTube videos about constructing Deterministic Finite Automata.  Much more challenging and enthralling than I originally anticipated.

If you don’t have a CS background, those terms may seem completely foreign.  Don’t worry, they were gibberish to me as well.  I’m going to do my best to explain the basic concepts and hopefully we can all learn something together.

Let’s get started.

Ch.1 – A simple router

One of the main functions of the Rails Router is to match an incoming URL to an existing route.  The router looks at the URL, sees if it matches any routes that it knows about, and if it does, it then sends the appropriate information about the route to a controller action associated with that route.

Here’s an example of pretty much the most basic route you can have:


get '/users' => 'users#index'

Breaking this route down into its constituent parts, it’s basically saying, “Hey Router, when you receive a GET request, and the URL matches /users, call the index method on the UsersController.”

So, imagine for a moment we were writing our own router.  What’s the easiest way you can think of to check if our incoming request URL string matches the string of /users? If you thought “regular expression”, I just awarded you 11 bonus points.

We could totally write an ‘if’ statement using a regex that would do exactly that.


if url_path.match /\/users/
  # call users#index
else
  # 404 not found
end

It’s highly unlikely that our app would ever need just one route, and the problems with this approach start to become clear as we add more. If we add a show route for users to our routes file, it becomes:


get '/users' => 'users#index'
get '/users/:id' => 'users#show'

Our if statement would then become:


if url_path.match /^\/users$/
  # call users#index
elsif url_path.match /^\/users\/[^\.\/\?]+$/
  # call users#show
else
  # 404 not found
end

(If you’re not super familiar with regex, the part in the brackets means ‘anything that isn’t a dot, slash, or question mark’.  Rubular is a great resource I use for figuring out regexes for Ruby).

You can probably start to see the problem here.  With a non-trivial amount of routes, this if block becomes enormous.  And it’s not just about calling the right controller method once we find a match.  We also need keep track of and pass along other important information.  For instance, we need the id of the user that we parsed for the users#show route.

So, what if we could make our own regular-expression-like parser engine that knew all the rules for parsing URL fragments, and could also keep track of important pieces of that URL as it parsed it?

OMG… that’s exactly what the Journey module in Rails does.

1iyojy

Fig 1 – Aaron Patterson (unconfirmed)

Journey is a module written by Aaron Patterson (AKA @tenderlove) that became the routing solution for Rails as of Rails 3.2, and was officially merged with Rails 4 and up.  The readme file provides a very helpful explanation of how everything works, so we’re pretty much done here.

Just kidding.  It says:

Journey is a router. It routes requests.

And the synopsis says:

Too complex right now. 🙁

He’s not wrong.  Getting into the nitty gritty of exactly how everything works would take way too much time and page space to explain (and to be honest, I don’t fully understand all of it myself).  However, I’m confident we can do a high-level flyover of the process and get a feel for the basic steps.

Ch.2 – How to read

Journey needs to be able to read and understand URL paths, so how does it know how to do that?  Turns out, it’s not much different than the way you or I read a sentence.  Let’s take a look at a very simple English sentence as an example:

My cat eats the food

The first step, even though you probably don’t consciously think about it, is to break up the sentence by reading each word individually.  This sentence could be represented as:


[:ADJECTIVE, 'My'],
[:NOUN, 'cat'],
[:VERB, 'eats'],
[:ADJECTIVE, 'the']
[:NOUN, 'food']

This process of breaking an expression into its minimally significant parts is called tokenization, and the individual pieces are called tokens. In Journey’s case, it converts URL paths to tokens utilizing a Scanner class that in turn uses Ruby’s StringScanner class.  In fact, if you’d like to see this process in action, you can open up Rails console and try it out:


journey_scanner = ActionDispatch::Journey::Scanner.new
journey_scanner.scan_setup("/users/:id")
journey_scanner.next_token
  => [:SLASH, "/"]
journey_scanner.next_token
  => [:LITERAL, "users"]
journey_scanner.next_token
  => [:SLASH, "/"]
journey_scanner.next_token
  => [:SYMBOL, ":id"]

The tokens that Journey recognizes are :SLASH, :LITERAL, :SYMBOL, :LPAREN, :RPAREN, :DOT, :OR, :STAR.

Cool.  Now we know how Journey breaks up URLs into “words” AKA “tokens”.  But how does that translate into knowing what to do with them?  To understand that, let’s go back to our English sentence example.

Right now we have the word-tokens “My”, “cat”, “eats”, “the”, and “food.”  While these words do have some meaning in isolation, they don’t provide the meaning of the sentence as a whole.  The next step is to begin grouping these tokens according to grammar rules.  The word “My” is an adjective that describes “cat”, so we can group them together as the Noun Phrase (NP) “My cat”.  The verb, “eats”, and the object of that verb, the noun phrase made by the words “the” and “food”, can be grouped together in a Verb Phrase (VP).  We also know that when we have a Noun Phrase + Verb Phrase, that meets the grammar rules of a Sentence, a group of words that conveys a complete idea.  That was a lot of words, so here’s a picture of what we just said:

screen-shot-2017-02-05-at-4-31-08-pm

Fig 2 – Flashbacks of elementary school

If we were to write out the rules we just used to make this diagram in a code-like way, it might look something like this:


class KittySentenceParser

token NOUN VERB ADJECTIVE

rule
  sentence
    : noun_phrase verb_phrase { Sentence.new(val[0], val[1]) }
    ;
  noun_phrase
    : ADJECTIVE NOUN          { NounPhrase.new(val[0], val[1]) }
    | NOUN                    { NounPhrase.new(val[0]) }
    ;
  verb_phrase
    : VERB noun_phrase        { VerbPhrase.new(val[0], val[1]) }
    ;
end

Ok, so this isn’t really imaginary syntax, it’s Racc syntax. Racc is a LALR parser generator included in the Ruby standard library, and the same parser generator that Journey uses.  We’ll get to what LALR means in a minute, but first let’s look at what we just wrote.

The first line of code is the class name, followed by the list of all the token types that we’ll be using in our rules.  After that, we start our actual list of rules with the rule keyword. In our group of rules, we have rules named sentence, noun_phrase, and verb_phrase. The rule definition starts with the : and ends with the ;, and if there is more than one way to match a rule, those ways are separated by the | operator. (For instance, a noun_phrase can either be an adjective and a noun or just a noun by itself. “Cat eats the food” is just as valid of a sentence as “My cat eats the food.”)

Also, notice that for each rule, you can also provide a block that runs when that particular rule is matched.  For example, when we match the sentence rule, it runs a block that creates a new Sentence object, passing the values of the noun_phrase and the verb_phrase as parameters.

Still with me?  7 more bonus points.

Now, as promised, here is the meaning of LALR: Look-Ahead Left Reversed Rightmost Derivation.  As you may have noticed, that is way too many words for that to be a proper abbreviation, but it’s 33% shorter than LALRRD, so I say we let it slide.  In addition to not being a very representative abbreviation, those words are not particularly meaningful on an initial read, so what the heck do they mean?

In layman’s terms, it means our parser is going to read our tokens from left to right, trying to match our rules along the way.  If it can’t match a rule with the tokens it has read so far, it will look ahead to the next ones to try and find a rule match.

Let’s walk through how our parser handles our sample sentence.

Step 1:  Parser consumes the first token


Grammar Stack                 Tokens
['My']                        ['cat'] ['eats'] ['the'] ['food']

My is an ADJECTIVE token, and if we check our rules we can see there is no rule that matches an adjective by itself. This means our parser will look ahead to the next token to see if it can find a match

Step 2:  Second token consumed


Grammar Stack                   Tokens
['My'] ['cat']                  ['eats'] ['the'] ['food']

Now we have a combination of ADJECTIVE and NOUN, which does match a rule – the rule for a noun_phrase. Our parser then replaces the two tokens, with the rule match that represents both of them.  This is known as a reduce operation

Step 3: First two tokens reduced by rule match


Grammar Stack                   Tokens
[noun_phrase]                   ['eats'] ['the'] ['food']

Step 4: Consume third token


  Grammar Stack                   Tokens
[noun_phrase] ['eats']          ['the'] ['food']

Our grammar stack now has a noun_phrase and a VERB.  There’s no direct match for that, so we’ll look ahead.

Step 5: Consume fourth token


Grammar Stack                   Tokens
[noun_phrase] [VERB] ['the']    ['food']

Still no match, look ahead again.

Step 6: Consume last token


Grammar Stack                   
[noun_phrase] [VERB] [ADJECTIVE] ['food']
[noun_phrase] [VERB] [noun_phrase]       # reduce
[noun_phrase] [verb_phrase]              # reduce again
[sentence]                               # final answer

When the last token is consumed, we have an ADJECTIVE followed by NOUN, which reduces to noun_phrase. That in turn gives a VERB followed by noun_phrase, which further reduces to verb_phrase. Then we are left with a definitive noun_phrase verb_phrase combo which finally reduces to sentence.

Ok, everybody see how our parsing rules lead to that tree-like diagram of the sentence in Fig 2?  Excellent… 17 points for everyone.  I say ‘tree-like’, but that’s actually what it’s called – a syntax tree.  It’s a pretty common thing in computer science, particularly when talking about compilers.  In fact, so far we’ve covered the first two steps of any compiler, tokenization followed by parsing of the tokens.

Not surprisingly, Journey does this same step, and in fact provides us with a way we can see the resulting syntax trees (Note:  you’ll need the gem graphviz installed).  Feel free to try this out in your Rails console:


# Make an instance of Parser
parser = ActionDispatch::Journey::Parser.new
# Use the parser to make a syntax tree
syntax_tree = parser.parse("/users/:id(.:format)")
# Write the tree data to a dot file
File.write("syntax_tree.dot", syntax_tree.to_dot)

# exit console and convert dot file to png
dot -Tpng syntax_tree.dot > syntax_tree.png
open syntax_tree.png

If all went well, this should give you the following image…

tree

Fig 3 – super hi-res syntax tree

If you would like to see the actual Racc rules that Journey uses to come up with this tree, they are in the parser.y file in ActionDispatch::Journey. It’s a fun exercise take a look at the rules and see if you can figure out which ones it is matching as it builds the tree. Pro tip: It’s generally easiest to start at the bottom of tree and work your way up.

Ch.3 – Tiny Traveling Robots

Imagine, if you will, that we can somehow convert the syntax tree from Fig 3 into a map.  The nodes on the tree are cities, and the tokens are the names of the roads that take us from one city to the next.  Such a map might look something like this:

gtg

Fig 4 – Weirdest map I’ve ever seen

We’ve got seven cities (numbered starting with 0) with roads connecting them.  The road names are a little odd, but whatevs.  Also, notice that cities 4 and 6 are shown with double circles around them.  These are the cool cities we want to end up in (the double circle is probably an awesome moat or something).  If we make it to one of those, we’ll consider the trip a success.

Now imagine there is a tiny driving robot hanging out in city 0.  He’s quite enthusiastic, but he also has no idea how to get anywhere.  He needs some instructions.  We hand him a URL path – “/users/22”.  He begins reading the path, token by token.  The first token is a “/”.   Sweet, that’s the name of the road that connects city 0 and city 1, so he drives to city 1.  The next token is the literal “users”, and he now knows how to proceed from city 1 to city 2.  The next token is a “/” again, and he makes the short hop from city 2 to city 3.  His last token is the symbol “22”.  Hmm… well he doesn’t have a road named “22”, but he does have a road named “anything that isn’t DOT, SLASH, or QUESTION MARK”, which seems like a pretty good match to his tiny binary brain.  He has now made it to city 4, and he has also run out of instructions, so he is quite content to stay there and consider it a job well done.

As you may have guessed, our “map” is actually something a bit more technical.  Its actual name is a Generalized Transition Graph or GTG.  A GTG is defined as follows:

  1.  It has a finite set of states (cities).
  2.  It has a finite set of input symbols (characters that can be used in a URL)
  3.  It has a non-empty set of start states (city 0).
  4.  It has a set of final or accepting states (cool cities 4 and 6).
  5.  It has a finite set of transitions, where the edges (roads) are regular expressions.

Our GTG was created using Journey’s GTG Builder class.  You can try it out by using the following instructions:


parser = ActionDispatch::Journey::Parser.new
tree = parser.parse("/users/:id(.:format)")
gtg = ActionDispatch::Journey::GTG::Builder.new(tree).transition_table
File.write("gtg.dot", gtg.to_dot)

dot -Tpng gtg.dot > gtg.png

Our tiny little robot is something a bit more technical as well.  It’s actually a Deterministic Finite Automaton, or DFA.  Basically, it’s a state machine that consumes input one chunk at a time, and makes transitions of its state depending on what input it just consumed.  If it ends up in an accepting state when all the input is consumed, it’s happy and accepts.  If it ends up in a non-accepting state or receives input that it doesn’t recognize, it’s unhappy and rejects.  In our case, the accept states occur when we have a valid URL that matches one of our routes (state 4 is where we have a url that matches “/users/:id”, and state 6 is where we have that same route plus the optional format argument).

Sweet, so now we finally have a way we can match URLs to our routes that we declared in our routes file.  But wait… remember how we started this whole story by making a theoretical router with regular expressions and a giant ‘if’ statement?  Are we any better off if we have to make a different DFA / GTG for each route?  What if there was a way to smash multiple route syntax trees together into one efficient transition graph?  Let’s give it a shot:


parser = ActionDispatch::Journey::Parser.new
tree1 = parser.parse("/users/:id(.:format)")
tree2 = parser.parse("/users(.:format)")
gtg = ActionDispatch::Journey::GTG::Builder.new(
  ActionDispatch::Journey::Nodes::Or.new([tree1, tree2])  
).transition_table
File.write("combined_gtg.dot", gtg.to_dot)

dot -Tpng combined_gtg.dot > combined_gtg.png

This is essentially the same thing we did previously, except this time we are building it with TWO syntax trees combined in an OR node. Here is the output of the combined GTG:

combined_gtg

Fig 5 – So many choices

Awwww dang, that’s pretty!  As you can see, it’s smart enough to figure out that the paths are identical up until state 2, and then diverge into two paths depending on what the next token is after that.  We also now have 4 accept states, one for each of the following valid route matches:

  1. “/users”
  2. “/users(.:format)”
  3. “/users/:id”
  4. “/users/:id(.:format)”

Now I know what you’re thinking… that’s only two routes.  What does it look like when we have LOTS of routes like a typical Rails app?  I’m glad you asked, because one of the coolest things about the Journey module is that it has a built-in routes visualizer that builds an interactive HTML page where you can see all of your app’s routes!  To show this, I’ve made a new Rails app and generated scaffolds for Users and Cats to give us some routes.  Once we’ve got those created, we can pop into Rails console and type:


File.write("routes_visualizer.html", Rails.application.routes.router.visualizer)

Then, when we open the generated routes_visualizer.html file in the browser, we get:

screen-shot-2017-02-07-at-5-32-17-pm

Fig 6 – Route map for the greatest Rails app ever built

All routes share the route from state 0 to 1, the initial “/”, then begin branching from there.  The bottom branch is all built-in Rails routes, and the top two branches our the routes added for our “Users” and “Cats”.  You’ll also notice that there is an input box at the top of the screen where we can type in a route and simulate walking the graph.  For instance, if we type in “/cats/22/edit”, then hit “simulate”:

screen-shot-2017-02-07-at-5-31-48-pm

Fig 7 – editing a cat

And an invalid route like “/users/32/asdfkadsf”:

screen-shot-2017-02-07-at-5-33-20-pm

Fig 8 – failing to accept my state

Pretty awesome, right?

Well, that pretty much wraps up this high-level overview of how routing actually gets done in Rails.  Here’s a quick recap of what we’ve learned:

  1. Journey uses its own regular expression engine to match routes.
  2. It first tokenizes the routes into individual components.
  3. It then parses the tokens into a syntax tree using a parser generated by Racc.
  4. The syntax trees are all mashed together into a Generalized Transition Graph.
  5. Finite Automaton walks the states on the graph by consuming incoming URLs.
  6. If the automaton ends up in an accept state, we have a valid URL and can resolve to a controller action.

I hope you have enjoyed this journey into Journey as much as I have.  #dadjokes

Additional Resources

Ruby Under a Microscope by Pat Shaughnessy.  The first two chapters cover tokenization, parsing, and compilation quite nicely.

A Journey into the Rails Router, Lexers, parsers, and automata with Andy Lindeman – ATLRUG 2015.  A lovely video available on YouTube about this subject.  It would have taken me far longer to understand much of this material without it.

The Journey Module in the Rails Repository.  The source code for everything discussed here.

Latest posts by Seth Baughman (see all)

permalink

Post a Comment