CSS syntax

In this article I’ll explain the core syntax of CSS, with particular reference to the rules for handling parsing errors. I’ll outline the main components of the CSS grammar through examples of correct and incorrect coding practices.

Rules

At a basic level, CSS is a set of rules. A rule (R) consists of two parts: a selector (S) and a declaration block (B). The selector consists of everything up to (but not including) the first left curly brace ({) and is always followed by a declaration block. Formally speaking, a selector is a pattern that may range from a simple element name (such as E) to a more complex contextual pattern (such as E1 > E2). Case-sensitivity in selectors depends on the document language (e.g. in HTML, element names are case-insensitive, but in XML they are case-sensitive).

A declaration block is delimited by a left curly brace ({) and a matching right curly brace (}). It contains a list of zero or more declarations separated by semicolons (;). A declaration consists of a property (P), followed by a colon (:), followed in turn by a value (V). Whitespace may occur around these components. Because the semicolon is a separator, it is optional after the last (or only) declaration in a block.

We can summarize this schema with a syntagmatic tree, as shown in Figure 1.

Figure 1. Anatomy of a CSS rule

Here’s a more practical example:

Listing 1. A CSS rule

    p {color: green;}

Properties are identifiers. Values are specified separately for each property. They are made up of identifiers, strings, numbers, lengths, percentages, URIs, and colors. Properties and values are strictly related: each property accepts only its own set of values, so a value that is valid for one property may be invalid for another.
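As a rough illustration of these value types, the sketch below pairs each one with a property that accepts it. The selector and the file name bg.png are placeholders chosen for the example.

```css
p {
  float: left;                      /* identifier */
  font-family: "Times New Roman";   /* string */
  line-height: 1.5;                 /* number */
  margin-left: 2em;                 /* length */
  width: 50%;                       /* percentage */
  background-image: url("bg.png");  /* URI */
  color: #008000;                   /* color */
}
```

Moving a value to a different property shows how the two are tied together: a string is well-formed as a value, but width does not accept strings, so width: "Times New Roman" would be an invalid declaration.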

Characters and case

CSS is generally case-insensitive, except for parts that lie outside the scope of CSS, such as the values of the (X)HTML attributes id and class, font names and URIs. For example, when we’re working with XML documents, we should always keep in mind that element names are case-sensitive. In CSS, element names, classes and IDs in selectors can’t start with a digit or with a hyphen followed by a digit. However, some browsers accept these illegal names when their rendering engine is in quirks mode.
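These naming rules can be sketched with a few selectors. The class names are made up for the example, and the comments describe how a parser that follows the CSS 2.1 rules would be expected to treat each rule.

```css
P       {color: green}  /* in an HTML document this matches p elements, whatever their case */
.note   {color: green}  /* valid: the class name starts with a letter */
.1note  {color: red}    /* invalid: the class name starts with a digit */
.-1note {color: red}    /* invalid: the class name starts with a hyphen followed by a digit */
```

Since an invalid selector makes the whole rule invalid, the last two rules should be ignored together with their declaration blocks.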

When we have to specify special characters, we must escape them by using a backslash (\). In CSS, a backslash indicates three types of character escapes:

  1. Inside a string, a backslash followed by a newline is ignored.
  2. A backslash cancels the meaning of special CSS characters. Any character (except a hexadecimal digit, a line feed, a carriage return or a form feed) can be escaped with a backslash to remove its special meaning. For example, a CSS scanner reads “pr\operty” as “property”: since “o” is not a hexadecimal digit, the escape simply stands for the character itself.
  3. Backslash escapes also allow authors to refer to characters that can’t easily be typed in a document. To insert a character by its Unicode (ISO 10646) code point, the backslash is followed by at most six hexadecimal digits (0…9A…F). Since the Unicode character set is very large, care must be taken when dealing with such characters: it’s very likely that browsers won’t cover the entire Unicode range. Alan Wood provides an excellent resource at http://www.alanwood.net/unicode/. Point your browser there and test its support. Testing is always a good idea.
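The three kinds of escape can be sketched as follows. The selectors and strings are invented for the example, and the comments describe the result expected from a scanner that follows the CSS 2.1 rules.

```css
/* 1. Backslash + newline inside a string is ignored:
      the value below is the same as "Hello world" */
p:before {
  content: "Hello \
world";
}

/* 2. "s" is not a hexadecimal digit, so "te\st" is the identifier "test" */
.te\st {color: green}

/* 2. The escaped colon loses its special meaning: this matches class="a:b" */
.a\:b {color: green}

/* 3. 26 is the hexadecimal code point of "&": both values below are "&B".
      A short escape is ended by one whitespace character,
      a six-digit escape needs no terminator. */
p:after {content: "\26 B"}
p:after {content: "\000026B"}
```

Note that the continuation line in the first rule starts at the left margin, because any leading spaces would become part of the string.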

At-rules

An at-rule starts with an at-keyword: an @ character followed immediately by an identifier (e.g. media or import). In W3C terminology, “immediately” means that no whitespace or comments may occur between the components. An at-rule either ends with a semicolon (;) or is followed by a block that may contain further rules. Some examples:

Listing 2. At-rules

    @import "style.css";
    @media print {
      body {font-size: 12pt}
    }

Comments

CSS comments begin with the characters /* and end with the characters */. Comments may be placed anywhere within a style sheet, and their content has no influence on the parsing (i.e. they are ignored). SGML comment delimiters (<!-- and -->) are allowed only in the style element of an (X)HTML document, in order to hide style rules from pre-HTML 3.2 browsers. However, the style rules are then applied only when the document is served as text/html. When the document is served as application/xhtml+xml, the SGML comments cause the enclosed style rules to be ignored.
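A minimal sketch of CSS comments in use follows. The rule itself is arbitrary and serves only to show where comments can go.

```css
/* A comment placed before a rule */
p {
  color: green; /* a comment placed after a declaration */
  /* margin: 0; (commented out, so this declaration is ignored) */
}
```

Comments cannot be nested: the first */ that follows an opening /* ends the comment.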

Rules for handling parsing errors

Since parsing should be interrupted only by a fatal error (e.g. unavailable data or transmission corruption), browsers must ignore only the invalid part of a style sheet rather than reject it as a whole. In CSS terminology, “ignore” means that a browser parses the invalid part (in order to find its beginning and end) but otherwise acts as if that part didn’t exist. The possible scenarios are as follows.

Unknown properties

Browsers ignore a declaration with an unknown property. Example:

Listing 3. Unknown property

    p {
      color: green;
      bgcolor: white;
    }
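As a sketch of the outcome, assuming a browser that follows this rule and does not recognize bgcolor, the style sheet in Listing 3 is treated as if it had been written like this:

```css
p {
  color: green;
}
```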

Invalid values

Browsers ignore a declaration with an invalid value. For example:

Listing 4. Invalid value

    p {
      padding: 1em;
      display: inner-box;
    }

In this case, only the first declaration is valid. However, the expression “invalid value” is rather ambiguous, especially in contexts such as the following:

Listing 5. A value not supported by all browsers

    h2 {
      margin: 0 0.5em 0 0;
      display: run-in;
    }

In the above example, the second declaration was supported and applied only by Opera at the time of writing. The run-in value, which is valid according to the specifications, is simply “unknown” to the other browsers, which therefore ignore the declaration.
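One practical consequence, sketched here under the assumption that a browser ignores any value it does not recognize, is that a widely supported declaration can be placed before a less widely supported one. The choice of block as the fallback value is only an example.

```css
h2 {
  margin: 0 0.5em 0 0;
  display: block;   /* applied by browsers that do not recognize run-in */
  display: run-in;  /* overrides the previous declaration where run-in is recognized */
}
```

A browser that rejects the second display declaration keeps the first one, while a browser that accepts it lets the later declaration win.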

Malformed declarations

CSS specifications state that:

User agents must handle unexpected tokens encountered while parsing a declaration by reading until the end of the declaration, while observing the rules for matching pairs of (), [], {}, "", and '', and correctly handling escapes.

This is the most difficult part of the list of algorithms provided for the correct handling of parsing errors. Before going on, just a quick note on the term “token” encountered above.

Note

Formally speaking, every component or indivisible part of an expression is called a token. For example:

    A + 100 - ( B * C ) / 2

The tokens used in the above example can be summarized in the following table.

    Token   Type
    A       VARIABLE
    +       DELIM
    100     NUMBER
    -       DELIM
    (       DELIM
    B       VARIABLE
    *       DELIM
    C       VARIABLE
    )       DELIM
    /       DELIM
    2       NUMBER
    null    null

A CSS scanner receives the character sequence from the source code. It checks whether the characters form valid tokens and passes them to the parser. A CSS parser should be able to skip certain token sequences and recognize the end of an expression. CSS uses Lex-style regular expressions to define tokens. The following table lists some of the regular expression notations.

    Notation   Matches
    […]        Any one of the characters in the set between brackets
    *          Zero or more times
    ?          Zero or one time
    +          One or more times
    ^          At the beginning of a string or line

The token S in the CSS grammar stands for whitespace. Only the characters “space”, “tab”, “line feed”, “carriage return” and “form feed” can occur in whitespace. Names enclosed in curly braces in the token definitions are macros.

Some simple examples of malformed declarations are provided below.

Listing 6. Malformed declaration. Missing ‘:’ and value

    p {
      color: green;
      color
    }

In Listing 6 only the first declaration will be applied, since the parser expects the second property name to be followed by ‘:’ and a value.

Listing 7. Malformed declaration. Missing ‘:’ and value, with expected recovery

    p {
      color: red;
      color;
      color: green;
    }

This example is similar to the previous one, but in this case error recovery is simpler: the semicolon after the property name marks the end of the malformed declaration, so the parser can resume with the declaration that follows.

Listing 8. Malformed declaration. Missing value

    p {
      color: green;
      color:
    }

This example is similar to Listing 6. In this case the parser expects a value after the ‘:’ token.

Listing 9. Malformed declaration. Missing value, with expected recovery

    p {
      color: red;
      color:;
      color: green;
    }

This example is similar to Listing 7. In this case error recovery is simpler because a ‘;’ token follows the colon.
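Assuming a parser that recovers as the specification describes, Listings 6 to 9 all produce the same result: the malformed declaration is dropped and the last valid color declaration wins. Each of them should therefore be equivalent to this rule:

```css
p {
  color: green;
}
```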

Listing 10. Malformed declaration. Unexpected tokens

    p {
      color: green;
      color
      {; color: red
      }
    }

The case of unexpected tokens is the most complicated one. In this example, the sequence of tokens after the first declaration tries to emulate a pseudo-block by inserting matching curly braces in place of the usual (and expected) sequence. In particular, the occurrence of a matching pair of curly braces can be confusing for some parsers. Consider the following example:

Listing 11. Malformed declaration. Unexpected tokens. Nested curly braces

    p {color: green;}
    p {{color: red;}}

In this case, the last rule will be applied only by Internet Explorer 7 and 6. A similar example is the following:

Listing 12. Malformed declaration. Unexpected tokens. Nested @media rules

    @media all {
      @media screen {
        p {color: red}
      }
    }

Again, Internet Explorer 6 and 7 apply the rule. In my opinion, this is due to the occurrence of well-formed declaration blocks, although their nesting violates the syntax of CSS 2.1.

Invalid at-keywords

Browsers ignore an invalid at-keyword together with everything following it, up to and including the next semicolon or block. For example:

Listing 13. Invalid at-keywords

    @media movie {
      p {color: red}
    }
    p {color: green}

In this case, only the last rule will be applied. However, an invalid construct inside an at-rule doesn’t make the entire at-rule invalid. There are also rules about where at-rules may validly occur. Example:

Listing 14. Invalid occurrence of at-rules

    @import "style.css";
    p {color: green}
    @import "main.css";

The second at-rule is invalid, since an @import rule cannot occur after any valid rule other than a @charset or another @import rule. Another example:

Listing 15. Invalid occurrence of at-rules

    @import "style.css";
    @media screen {
      @import "main.css";
      p {color: green}
    }

In the example above, the second @import rule is invalid, since it cannot occur inside a block.
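For comparison, here is a sketch of an ordering that respects these constraints, reusing the file names from Listings 14 and 15. The @charset line is an assumption added only to show where such a rule would go.

```css
@charset "UTF-8";     /* optional; if present, it must come first in the style sheet */
@import "style.css";  /* @import rules precede every other kind of rule */
@import "main.css";
@media screen {
  p {color: green}
}
```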

Unexpected end of style sheet

Browsers close all open constructs (blocks, parentheses, brackets, rules, strings and comments) at the end of the style sheet. For example:

Listing 16. An example of not-closed constructs

    p {color: green}
    p:before {
      content: "Note

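Although the last rule is left open, the specification requires a conforming browser to close the string, the declaration and the block at the end of the style sheet, so the rule is treated as if it had been written as follows:

Listing 17. Error recovery of an open construct

    p:before {
      content: "Note";
    }

This allows browsers to keep the preceding rule and apply the recovered one as well.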

Unexpected end of string

Browsers close an unterminated string when they reach the end of the line, but they then drop the construct (declaration or rule) in which the string was found. Example:

Listing 18. Unexpected end of string

    p {
      text-indent: 1em;
      font-family: "Times New Roman serif
      padding: 0.5em;
      color: green;
    }

The parser drops everything from font-family up to the next semicolon, so neither the second nor the third declaration is applied. The parsing error is recovered as follows:

Listing 19. Error recovery of an open string

    p {
      text-indent: 1em;
      color: green;
    }

This allows browsers to keep the valid declarations and apply the specified styles.
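For reference, here is a sketch of what Listing 18 presumably intended, assuming the goal was a Times New Roman font with a generic serif fallback. With the string closed, all four declarations are well-formed and can be applied.

```css
p {
  text-indent: 1em;
  font-family: "Times New Roman", serif;
  padding: 0.5em;
  color: green;
}
```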
