
Having trouble grokking CSS 2.1 grammar

Developer https://www.devze.com 2023-03-06 04:41 Source: web
I am writing a hand-coded CSS 2.1 parsing engine (in C#), and I'm working directly off the W3C CSS 2.1 grammar (http://www.w3.org/TR/CSS21/grammar.html). However, there's a token that I just don't quite get:

url     ([!#$%&*-~]|{nonascii}|{escape})*

...

"url("{w}{url}{w}")"    {return URI;}
"url("{w}{string}{w}")" {return URI;}

I don't get what the URL production is supposed to do. It appears to be a string of only !#$%&*-~, non-ascii, or escaped unicode code points. How is that a URL? Is this production just really badly named, and what purpose is it supposed to serve?

Any help appreciated. FYI, I've added the C# tag only to increase the audience to actual programmers who might have encountered this or have insights - I apologize if you think I shouldn't apply.


Dude, did you read the CONTEXT surrounding that expression?

baduri1         url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}
baduri2         url\({w}{string}{w}
baduri3         url\({w}{badstring}

Hmmm... Bad, bad, bad. Bit of a giveaway, eh what? Generally, if something in the doco doesn't make sense to you, or appears just plain wrong, maybe it isn't supposed to make sense? So you read around it to acquire the correct context.


[!#$%&*-~] breaks down to:

!, #, $, %, &, plus the character range * - ~.

This takes in most printable ASCII characters, including uppercase, lowercase, digits and a range of punctuation characters.

It's easier to list the printable ASCII characters which this regex doesn't match:

Double quote ", single quote ', and parentheses (, ), plus the space character; i.e. printable ASCII characters minus delimiters and whitespace. This makes it possible to parse URLs that are not wrapped in quotation marks, e.g. url(http://example.com) instead of url("http://example.com").

Concise, but tricky!
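
To double-check that breakdown, here is a small Python sketch (Python rather than C# purely for brevity) that runs every printable ASCII character through the class exactly as quoted in the question and lists the ones it rejects. The names URL_CHAR, printable and rejected are made up for this illustration. Note that the space character shows up too; the grammar deals with whitespace around the URL separately, via {w}.

```python
import re

# The character class exactly as quoted in the question.
URL_CHAR = re.compile(r"[!#$%&*-~]")

# Printable ASCII runs from space (0x20) through tilde (0x7E).
printable = [chr(code) for code in range(0x20, 0x7F)]

# Keep only the characters the class does NOT accept.
rejected = [ch for ch in printable if not URL_CHAR.fullmatch(ch)]

print(rejected)  # [' ', '"', "'", '(', ')']
```

So the range * - ~ does nearly all the work, and the five explicitly listed characters fill in the part of the gap between space and * that isn't a quote or a parenthesis.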

P.S. The token name is confusing as well. A better name would have been something like: url_string or url_arg.

EDIT (Feb 2015): The latest CSS3 Syntax spec names the token url-unquoted.


I don't get what the URL production is supposed to do. It appears to be a string of only !#$%&*-~, non-ascii, or escaped unicode code points. How is that a URL? Is this production just really badly named, and what purpose is it supposed to serve?

The first line defines url as a regular expression:

url     ([!#$%&*-~]|{nonascii}|{escape})*

The second line defines URI as a token which can be produced/returned by the lexer:

"url("{w}{url}{w}")"    {return URI;}

The second line says that if the lexer sees url( then {w} then {url} then {w} then ) then it has found a URI.

The {w} expression is optional whitespace.

So according to the definition, {url} is a regular expression which defines what characters are allowed inside a URI token, between the initial url( and the final ).
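
As a rough illustration of how that rule reads once the macros are expanded, here is a Python sketch of the unquoted form only. Several things in it are assumptions rather than anything taken from this thread: the definitions of {w}, {h}, {unicode} and {escape} are paraphrased from the CSS 2.1 grammar and should be checked against the spec, {nonascii} is simplified to "anything outside ASCII", and the helper name match_unquoted_uri is invented for the example. The character class used is the variant from the baduri1 line quoted earlier in this thread, which leaves out the backslash so that a backslash can only start an {escape}.

```python
import re

# Macro definitions, paraphrased from the CSS 2.1 lexical grammar (assumed;
# verify against the spec before relying on them).
H = r"[0-9a-f]"
W = r"[ \t\r\n\f]*"                       # {w}: optional whitespace
NONASCII = r"[^\x00-\x7f]"                # simplification of {nonascii}
UNICODE = rf"\\{H}{{1,6}}(?:\r\n|[ \t\r\n\f])?"
ESCAPE = rf"(?:{UNICODE}|\\[^\r\n\f0-9a-f])"

# {url}: class variant from the baduri1 line (backslash excluded).
URL = rf"(?:[!#$%&*-\[\]-~]|{NONASCII}|{ESCAPE})*"

# "url("{w}{url}{w}")"  ->  URI token (unquoted form only).
URI_UNQUOTED = re.compile(rf"url\({W}(?P<url>{URL}){W}\)", re.IGNORECASE)


def match_unquoted_uri(text, pos=0):
    """Return the raw text between url( and ), or None if the rule fails.

    Escapes are returned undecoded; a real lexer would decode them.
    """
    m = URI_UNQUOTED.match(text, pos)
    return m.group("url") if m else None


print(match_unquoted_uri("url(http://example.com)"))  # http://example.com
print(match_unquoted_uri("url(  foo.png  )"))         # foo.png
print(match_unquoted_uri('url("foo.png")'))           # None
```

The quoted input fails here because a double quote is not a {url} character; that case belongs to the second rule, the one built on {string}. One caveat for a hand-written lexer: flex picks the longest match across all rules, whereas a backtracking regex engine tries alternatives in order, so this sketch mirrors the single rule and not the whole tokenizer.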
