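A question for people who know XML well:
https://www.w3.org/TR/REC-xml/#NT-CharData
[14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
This is not defined in terms of [2] Char, so it reads as if control characters and other non-Char characters were allowed. Is that intentional? Or should [^<&] simply be interpreted as Char - ('<' | '&')?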
I could only find [14] CharData referenced from [43] content.
And [2] Char appears in the productions for markup.
So I think the characters that [2] Char disallows, such as the surrogate blocks, are allowed in [14] CharData.
@lemon
So [2] Char is only used in some special contexts (inside PIs, tags, comments, CDATA sections...), and bare text is not restricted that way...
Makes sense. Thank you
@lo48576 I think so
@lemon I finally got the correct answer: non-`Char` characters are prohibited anywhere in an XML document (even if they are escaped as character references).
https://github.com/sparklemotion/nokogiri/issues/1581#issuecomment-272515864
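A quick way to see this behavior in practice, as a minimal sketch: it only shows what Python's standard-library parser (`xml.etree.ElementTree`, backed by expat) accepts, which is evidence about one implementation rather than a proof about the spec. The helper name `is_well_formed` and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the stdlib parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Illustrative samples: U+0001 is not a Char, U+0009 (tab) is.
samples = {
    "plain text": "<a>ok</a>",
    "raw U+0001": "<a>\x01</a>",
    "escaped U+0001": "<a>&#x1;</a>",
    "escaped tab": "<a>&#x9;</a>",
}

for name, doc in samples.items():
    print(f"{name}: {is_well_formed(doc)}")
```

The interesting case is the escaped U+0001: writing the character as a reference does not make it acceptable to this parser.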
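The EBNF notation used in the spec itself requires matched characters to also match `Char` (for example, `[^abc]` is defined as matching any `Char` not among the characters given), so every production rule implicitly uses only `Char` characters.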
(BTW it seems `Char` is defined recursively lol)
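To make that reading concrete, here is a small sketch of [14] CharData with `[^<&]` interpreted as `Char - ('<' | '&')`. The ranges are copied from production [2] Char in XML 1.0; the function names `is_xml_char` and `is_char_data` are hypothetical, and this is an illustration of the interpretation, not a validator.

```python
# Code point ranges from production [2] Char in XML 1.0.
_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(ch: str) -> bool:
    """True if the single character matches [2] Char."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _CHAR_RANGES)


def is_char_data(text: str) -> bool:
    """True if text matches [14] CharData, reading [^<&] as Char - ('<' | '&')."""
    # The production subtracts any string containing the CDATA end marker.
    if "]]>" in text:
        return False
    return all(is_xml_char(c) and c not in "<&" for c in text)
```

Under this reading, `is_char_data("a\x01b")` is rejected for the same reason `is_char_data("a<b")` is: the offending character is simply not in the set that `[^<&]` can match.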
@lo48576 Thank you for the correction