What else is new in Razor v2?

In my last post I discussed some of the big new features in Razor v2. In this post I'm going to talk a bit about some of the other (admittedly smaller) new features. So, let's get right to it!

Void Elements

The HTML spec defines a certain type of element called a "Void Element" as

[An] element whose content model never allows it to have contents under any circumstance

- W3C HTML 5 Spec section 4.3

Put more simply, a void element is a type of element that can NEVER EVER have contents. It can come in three different forms:

  <!-- Self-closing -->
  <input name="foo" /><p>A different tag</p>
  <!-- Closed -->
  <input name="foo"></input><p>A different tag</p>
  <!-- Unclosed -->
  <input name="foo"><p>A different tag</p>

Any of the forms above is considered valid HTML5. However, Razor v1 only allowed the first two, because it had a much simpler parser. In Razor v2, you can now use the third form as well. This works because if a void element's start tag is not self-closed AND is not IMMEDIATELY followed by an end tag (whitespace is allowed), then the element is considered closed at the ">" of the start tag.

So in Razor, when we parse a void element and reach the ">", we look ahead and check whether we see "</[tagname]>" (we allow whitespace between the start and end tags). If we do NOT see it, we consider the tag closed. This means that if you typed the following inside a code block (i.e. @if() {}):

<p><input name="foo">Some content</input></p>

Razor would end the markup block at the "</input>" tag. Why? Because the input element was closed by the ">" of its start tag, so the "</input>" tag is considered an extraneous end tag. Since it has no matching start tag (remember, the "<input>" is already closed), we assume that it belongs to a start tag Razor can't see (because it's outside of the code block this markup is within, or even in a different document) and we end the block there.
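
To make the look-ahead rule concrete, here is a minimal sketch in Python. It is not Razor's actual implementation (Razor is written in C#); the function name void_tag_end and its parameters are hypothetical, and the sketch only models the decision described above: given the position of the ">" that ends a void element's start tag, where is the element considered closed?

```python
import re

# The void elements listed in the HTML5 spec, as quoted in this post.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "command", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
}


def void_tag_end(text, tag_name, gt_index):
    """Return the index just past the point where a void element is closed.

    gt_index is the position of the ">" that ends the start tag.
    Hypothetical helper for illustration only; not part of Razor's API.
    """
    if tag_name.lower() not in VOID_ELEMENTS:
        raise ValueError(f"<{tag_name}> is not a void element")
    after_start = gt_index + 1
    # Self-closed start tag ("/>"): the element is already closed.
    if gt_index > 0 and text[gt_index - 1] == "/":
        return after_start
    # Look ahead: optional whitespace, then "</tagname>".
    end_tag = re.compile(r"\s*</" + re.escape(tag_name) + r"\s*>", re.IGNORECASE)
    match = end_tag.match(text, after_start)
    if match:
        return match.end()  # the explicit end tag closes the element
    return after_start      # no end tag follows: closed at the ">"


sample = '<input name="foo">Some content</input>'
end = void_tag_end(sample, "input", sample.index(">"))
print(repr(sample[end:]))  # whatever follows the closed element
```

In the sample, the element is closed at the ">" of the start tag, so "Some content</input>" is left over and the trailing "</input>" has no matching start tag, which is exactly the situation described above.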

For the most part, this shouldn't affect you adversely, since syntax like the sample above is invalid HTML, but please let me know if you end up in an edge case where this is occurring in legal HTML.

Finally, what are the elements HTML5 considers void? The spec lists them off for us and Razor uses this exact same list:

area base br col command embed hr img input keygen link meta param source track wbr

- W3C HTML 5 Spec section 4.3

Syntax Tree and Internals Overhaul

NOTE: This part is going to dive into parser internals a bit. Feel free to skim ;)

The last thing I'm going to talk about is a behind-the-scenes change that's mostly relevant to people hosting the Razor parser. In order to support these exciting new features, we had to overhaul our parser internals and syntax tree structure. Razor now uses full HTML, C# and VB tokenizers and the parse tree gives you access to that granularity. For example, given the code:

@foo.bar.baz

In Razor v1, this would be two Spans (a type of parse tree node), one for "@" and one for "foo.bar.baz". Each span would contain the string pulled from the input file. In Razor v2, we produce the same two spans, but now each span is a collection of Symbols. In this case, the first Span contains one symbol ("@") and the second Span contains 5 symbols ("foo", ".", "bar", ".", "baz"). This allows us to perform more advanced analysis of the input document without having to reparse strings over and over.
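
As a rough illustration of the difference, here is a small Python sketch that splits the example into two spans made of symbols. The Span class, the "transition"/"expression" labels, and parse_implicit_expression are all hypothetical names for this sketch, not Razor's real types, and the tokenizer only understands dotted identifiers.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Span:
    kind: str                                    # "transition" or "expression" (labels assumed for this sketch)
    symbols: list = field(default_factory=list)  # the tokens that make up the span

    @property
    def content(self):
        # The v1-style view: the raw string, rebuilt from the symbols.
        return "".join(self.symbols)


# A symbol here is either an identifier or a single "." character.
SYMBOL = re.compile(r"[A-Za-z_]\w*|\.")


def parse_implicit_expression(source):
    """Split '@foo.bar.baz' into a transition span and an expression span."""
    if not source.startswith("@"):
        raise ValueError("expected the source to start with '@'")
    body = source[1:]
    symbols = SYMBOL.findall(body)
    if "".join(symbols) != body:
        raise ValueError("this sketch only handles dotted identifiers")
    return [Span("transition", ["@"]), Span("expression", symbols)]


for span in parse_implicit_expression("@foo.bar.baz"):
    print(span.kind, span.symbols)
```

Because the symbols are kept on the span, later analysis can walk the list of tokens directly instead of re-scanning the "foo.bar.baz" string.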

We also reworked our higher-level Syntax Tree nodes, Span and Block. In Razor v1, we broke the file into chunks called spans and used sub-classes of the Span class to mark their type (for example: MarkupSpan, CodeSpan, HelperHeaderSpan, TransitionSpan). In v2, we removed all of those sub-classes, moving to an "Annotations" model. Going back to our previous example, in Razor v1 we would have produced a TransitionSpan and an ImplicitExpressionSpan (a kind of CodeSpan). In v2, both are concrete instances of Span; however, they have various properties which attach annotations to control how they behave. For example, each Span has a CodeGenerator annotation which indicates how we generate C#/VB code from the node. They also have an EditHandler annotation which indicates how the editor should behave around this Span. By doing this, we (and even you if you want!) can add new syntax without having to dramatically overhaul all the various pieces of our infrastructure.
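
The shape of the annotations model can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the field names mirror the CodeGenerator and EditHandler annotations mentioned above, but the function signatures, the generated "Write(...)" text, and the edit rules are invented for this sketch and do not reflect Razor's real API.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Span:
    content: str
    code_generator: Callable[[str], str]  # how to emit code for this span
    edit_handler: Callable[[str], bool]   # whether an edit can be accepted in place (assumed semantics)


def emit_nothing(content):
    return ""                      # a transition produces no code of its own


def emit_write(content):
    return f"Write({content});"    # hypothetical generated statement


def reject_edits(new_content):
    return False                   # any edit forces a reparse in this sketch


def accept_identifier_edits(new_content):
    # Accept the edit only if the span still looks like a dotted identifier.
    return re.fullmatch(r"[A-Za-z_][\w.]*", new_content) is not None


# Both nodes are plain Span instances; only their annotations differ.
transition = Span("@", emit_nothing, reject_edits)
expression = Span("foo.bar.baz", emit_write, accept_identifier_edits)


def generate(spans):
    return "".join(span.code_generator(span.content) for span in spans)


print(generate([transition, expression]))
```

The point of the design is visible in the last few lines: supporting a new kind of syntax means supplying a new pair of annotations, not adding another Span sub-class that every consumer of the tree has to learn about.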

In future posts, I'm going to use some of this information to show you how to create a new kind of directive that works at runtime AND design-time (i.e. in the Editor).

Conclusion

Well, that's basically it for Razor v2. It's not a super-long list but believe me it was a lot of work. In many ways, Razor v1 was our "hey, check out this cool language!" release. Razor v2 has primarily been about internal clean-up and future-proofing (not that one can ever be totally future-proof). We added some cool little features, but the work done to the underlying structures should make it easier for us to add even more features and extensibility points in the future.

Enjoy Razor 2, Web Pages 2 and MVC 4!