An entity may refer to other entities to cause their inclusion in the document. A document begins in a root or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly, as described in 4.3.2 Well-Formed Parsed Entities.

2.1 Well-Formed XML Documents

A textual object is a well-formed XML document if:

- Taken as a whole, it matches the production labeled document.
- It meets all the well-formedness constraints given in this specification.
- Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.

Document

[1] document ::= prolog element Misc*

Matching the document production implies that:

- It contains one or more elements.
- There is exactly one element, called the root, or document element, no part of which appears in the content of any other element. For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.
- As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.

2.2 Characters

A parsed entity contains text, a sequence of characters, which may represent markup or character data. A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. The use of "compatibility characters", as defined in section 6.8 of [Unicode], is discouraged.

Character Range

[2] Char ::= #x9
    | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]   /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3 Character Encoding in Entities.

2.3 Common Syntactic Constructs

This section defines some symbols used widely in the grammar.

S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.

White Space

[3] S ::= (#x20 | #x9 | #xD | #xA)+

Characters are classified for convenience as letters, digits, or other characters. Letters consist of an alphabetic or syllabic base character possibly followed by one or more combining characters, or of an ideographic character. Full definitions of the specific characters in each class are given in B. Character Classes.

A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. Names beginning with the string "xml", or any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

Note: The colon character within XML names is reserved for experimentation with name spaces. Its meaning is expected to be standardized at some future point, at which point those documents using the colon for experimental purposes may need to be updated. (There is no guarantee that any name-space mechanism adopted for XML will in fact use the colon as a name-space delimiter.)
In practice, this means that authors should not use the colon in XML names except as part of name-space experiments, but that XML processors should accept the colon as a name character.

An Nmtoken (name token) is any mixture of name characters.

Names and Tokens

[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[6] Names ::= Name (S Name)*
[7] Nmtoken ::= (NameChar)+
[8] Nmtokens ::= Nmtoken (S Nmtoken)*

Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.

Literals

[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
[10] AttValue ::= ([^

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

Character Data

[14] CharData ::= [^
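
As an illustration of productions [2] (Char) and [3] (S) and of the reserved-name rule in section 2.3, the checks can be sketched in Python roughly as follows. This is a minimal sketch, not a conforming XML processor. The function names (is_xml_char, first_illegal_char, is_white_space, is_reserved_name) are hypothetical and do not come from the specification. Production [5] (Name) is deliberately left out, because it depends on the Letter, Digit, CombiningChar, and Extender classes defined in B. Character Classes.

```python
import re
from typing import Optional


def is_xml_char(ch: str) -> bool:
    """Check one character against production [2] Char."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # up to, but excluding, the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # excludes FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # code points beyond the 16-bit range
    )


def first_illegal_char(text: str) -> Optional[int]:
    """Return the index of the first character not matching Char, or None."""
    for index, ch in enumerate(text):
        if not is_xml_char(ch):
            return index
    return None


def is_white_space(text: str) -> bool:
    """Check a string against production [3] S: one or more of #x20, #x9, #xD, #xA."""
    return len(text) > 0 and all(ch in "\x20\x09\x0D\x0A" for ch in text)


# Mirrors the pattern (('X'|'x') ('M'|'m') ('L'|'l')) from section 2.3.
_RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def is_reserved_name(name: str) -> bool:
    """Report whether a name starts with a reserved 'xml' prefix in any letter case."""
    return _RESERVED_PREFIX.match(name) is not None
```

For example, first_illegal_char("ok\x01") returns 2 because #x1 is outside every range of Char, and is_reserved_name("XmLfoo") returns True. Note that is_white_space("") returns False, since S requires at least one character.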