This module reads and writes tokens from and to XML documents. Tokens are defined for each basic structure you can find in XML.
You can read a document as tokens using the powerful tokenize function, which calls a user-defined function for each token it encounters. You can also use the generally more convenient XMLForwardRange class to expose a document as a range of tokens that you can loop over easily.
You can also write a document by putting tokens into an XMLWriter range.
Token for regular character data in the document.
Actual text value.
Token for a comment.
Text content of the comment.
Token for a processing instruction.
Processor target identifier.
Text content of the processing instruction.
Content for a CData section.
Character data content.
Content for an attribute inside an open element tag.
Attribute's name.
Attribute's value.
Gives the parsed content of an XML declaration.
Note: The tokenizer doesn't parse the XML declaration. Call readXMLDecl before calling tokenize.
Document's XML version.
Document's character encoding, defaults to UTF-8.
Indicates whether the document is standalone.
Scan the start of an XML document for an XML declaration and skip it if found.
input | input text for the document |
decl | data extracted from the XML declaration, or default values if not found. |
Returns: true if an XML declaration is found, false otherwise.
Start of a document type declaration. This token is emitted when encountering a DOCTYPE markup declaration.
Document type name.
Public identifier literal.
System identifier literal.
End of a document type declaration. This token is emitted when encountering the final ">" of a DOCTYPE declaration.
Note: For now, this token will always directly follow a DoctypeToken since we do not currently support the internal subset. Adding support for the internal subset in the parser will make other tokens appear between a DoctypeToken and a DoctypeDoneToken.
Indicate that we're opening an element of the given name. Attributes will follow in separate tokens.
Empty token indicating that we are done parsing an open tag and its attributes. Only used by the callback API,
Empty token indicating that an open tag has been closed with '/>', making it an empty element. Used as a replacement for OpenTagDoneToken.
Indicate that we're closing an element of the given name.
Parsing state flag allowing the tokenizer to stop and later resume from where it left off.
Searching for tags.
Searching for attributes inside a tag.
Searching for the internal subset inside a doctype.
Tokenize the input string by calling output for each encountered token. Stop when reaching the end of input or when output returns true.
output | alias to a callable object or overloaded function or template function to call after each token. |
state | alias to a ParsingState variable for holding the state of the parser when tokenize returns before the input's end. |
input | reference to string input which will contain the remaining text after parsing. |
Returns: true if there is still content to parse (tokenizing was stopped by a callback) or false if the end of input was reached.
Throws: for any tokenizer-level well-formedness error.
Note: The tokenizer is not a full XML parser in the sense that it cannot check all well-formedness constraints of an XML document.
Example:
// Parse up to the first caption open element token.
bool skipUpToCaption(ref string input)
{
    bool isCaption(TokenType)(TokenType token)
    {
        static if (is(TokenType : OpenElementToken))
            return token.name == "caption"; // stop if tag name matches
        else
            return false; // continue tokenizing
    }
    return tokenize!isCaption(input);
}
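The callback contract described above (one call per token, stop when the callback returns true, leave the remainder in input) can be illustrated with a self-contained toy model. This is only a sketch and not this module's implementation: toyTokenize, OpenTag, CloseTag, Text, isCaption and skipToCaption are hypothetical names invented for the illustration, and the toy handles only bare open tags, close tags and text.

```d
import std.string : indexOf;

// Hypothetical token types, for this sketch only.
struct OpenTag  { string name; }
struct CloseTag { string name; }
struct Text     { string data; }

// Toy tokenizer: recognizes only <name>, </name> and text (no attributes,
// comments or entities). Calls `output` once per token and stops early when
// it returns true, leaving the unparsed remainder in `input`.
bool toyTokenize(alias output)(ref string input)
{
    while (input.length > 0)
    {
        bool stop;
        if (input[0] == '<')
        {
            auto end = input.indexOf('>');
            if (end < 0)
                throw new Exception("unterminated tag");
            string inner = input[1 .. end];
            input = input[end + 1 .. $];
            if (inner.length > 0 && inner[0] == '/')
                stop = output(CloseTag(inner[1 .. $]));
            else
                stop = output(OpenTag(inner));
        }
        else
        {
            auto next = input.indexOf('<');
            size_t len = next < 0 ? input.length : cast(size_t) next;
            string data = input[0 .. len];
            input = input[len .. $];
            stop = output(Text(data));
        }
        if (stop)
            return true;  // stopped by the callback
    }
    return false;         // reached the end of input
}

// Template callback: the static if selects behavior per token type.
bool isCaption(T)(T token)
{
    static if (is(T == OpenTag))
        return token.name == "caption"; // stop at <caption>
    else
        return false;                   // keep going
}

bool skipToCaption(ref string input)
{
    return toyTokenize!isCaption(input);
}
```

Because the callback is passed as an alias, each call site is resolved at compile time for the concrete token type, which is why a single template function can serve as the handler for every kind of token.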
Abstract XML writer class for writing tokens to something.
See Also: XMLWriter
Serialize given token in XML form to writer's output.
XML writer taking tokens as input. Output is expected to be a character stream with a write function.
Example:
void writeHello(ref File file)
{
    Writer!file writer;
    CommentToken comment;
    comment.content = "hello world";
    writer(comment);
}
void stripComments(string input, ref File file)
{
    Writer!file writer;
    bool passToken(TokenType)(TokenType token)
    {
        static if (!is(TokenType : CommentToken))
            writer(token); // pass token to writer
        return false; // never stop early; comment tokens are simply skipped
    }
    tokenize!passToken(input);
}
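To make the idea of serializing tokens concrete, here is a minimal, self-contained sketch of a token writer. It is an assumption-laden illustration rather than this module's writer: ToyWriter, its put overloads, the token structs and renderSample are all hypothetical names, and the output is accumulated in a string instead of a character stream.

```d
import std.array : replace;

// Hypothetical token types, for this sketch only.
struct OpenTag  { string name; }
struct CloseTag { string name; }
struct Text     { string data; }
struct Comment  { string content; }

// Toy writer: one overload per token type, each appending the token's
// XML form to `output`.
struct ToyWriter
{
    string output;

    void put(OpenTag t)  { output ~= "<" ~ t.name ~ ">"; }
    void put(CloseTag t) { output ~= "</" ~ t.name ~ ">"; }

    // Note: does not check that the content is free of "--".
    void put(Comment t)  { output ~= "<!--" ~ t.content ~ "-->"; }

    // Character data must escape '&' and '<'.
    void put(Text t)
    {
        output ~= t.data.replace("&", "&amp;").replace("<", "&lt;");
    }
}

string renderSample()
{
    ToyWriter w;
    w.put(OpenTag("p"));
    w.put(Text("a < b"));
    w.put(Comment("note"));
    w.put(CloseTag("p"));
    return w.output;
}
```

Overloading on the token type keeps each serialization rule in one place, and a filtering callback like the one in stripComments only has to decide which tokens to forward.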
Algebraic type capable of containing any kind of XML token. This is used by XMLForwardRange.
Range interface for iterating over tokens. Each token is encapsulated in the XMLToken Algebraic type defined above, which can contain any token type.
XMLForwardRange tokens(input);
foreach (ref XMLToken token; tokens)
{
    // FIXME: Algebraic should work with a switch statement.
    if (token.peek!OpenElementToken)
        writefln("<%s>", token.peek!OpenElementToken.name);
    else if (token.peek!CloseElementToken)
        writefln("</%s>", token.peek!CloseElementToken.name);
}
Current token at the front of the range. This is only valid when empty returns false.
Remaining unparsed XML input after parsing current token.
Create a range using the given XML input. This will also parse the first token and make it available to front.
string input | XML input to parse using this range. |
Throws: for any tokenizer-level well-formedness error.
Advance the range by one token.
Throws: for any tokenizer-level well-formedness error.
Tell if we are finished parsing.
Returns: true if the last popFront did not find any more tokens, or false if more tokens can be found.
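The range behavior documented above (the constructor parses the first token, front is valid only while empty is false, input holds the unparsed remainder) can be sketched with a small self-contained model built on std.variant's Algebraic. This is an illustration under stated assumptions, not the module's XMLForwardRange: ToyTokenRange, ToyToken, the token structs and openTagNames are hypothetical names, and only bare tags and text are recognized.

```d
import std.string : indexOf;
import std.variant : Algebraic;

// Hypothetical token types, for this sketch only.
struct OpenTag  { string name; }
struct CloseTag { string name; }
struct Text     { string data; }
alias ToyToken = Algebraic!(OpenTag, CloseTag, Text);

struct ToyTokenRange
{
    ToyToken front;  // current token; valid only while empty is false
    string input;    // remaining unparsed input after the current token
    bool empty;      // set once popFront finds no more tokens

    this(string input)
    {
        this.input = input;
        popFront();  // parse the first token eagerly
    }

    void popFront()
    {
        if (input.length == 0)
        {
            empty = true;
            return;
        }
        if (input[0] == '<')
        {
            auto end = input.indexOf('>');
            if (end < 0)
                throw new Exception("unterminated tag");
            string inner = input[1 .. end];
            input = input[end + 1 .. $];
            if (inner.length > 0 && inner[0] == '/')
                front = CloseTag(inner[1 .. $]);
            else
                front = OpenTag(inner);
        }
        else
        {
            auto next = input.indexOf('<');
            size_t len = next < 0 ? input.length : cast(size_t) next;
            front = Text(input[0 .. len]);
            input = input[len .. $];
        }
    }
}

// Collect the names of all open tags, using peek to test the token type.
string[] openTagNames(string xml)
{
    string[] names;
    foreach (tok; ToyTokenRange(xml))
        if (auto open = tok.peek!OpenTag)
            names ~= open.name;
    return names;
}
```

Parsing one token ahead is what lets empty answer without doing any work: it only reports whether the last popFront produced a token.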