
This module reads and writes tokens from and to XML documents. Tokens are defined for each basic structure you can find in XML.

You can read a document as tokens using the powerful tokenize function, which calls a user-defined function for each token it encounters. You can also use the generally more convenient XMLForwardRange struct, which exposes a document as a range of tokens that you can loop over easily.

You can also write a document by putting tokens into an XMLWriter.

struct CharDataToken;

Token for regular character data in the document.

string data;

Actual text value.

struct CommentToken;

Token for a comment.

string content;

Text content of the comment.

struct PIToken;

Token for a processing instruction.

string target;

Processor target identifier.

string content;

Text content of the processing instruction.

struct CDataSectionToken;

Token for a CDATA section.

string content;

Character data content.

struct AttrToken;

Token for an attribute inside an open element tag.

string name;

Attribute's name.

string value;

Attribute's value.

struct XMLDecl;

Gives the parsed content of an XML declaration.

Note: The tokenizer doesn't parse the XML declaration. Call readXMLDecl before calling tokenize.

string versionNum;

Document's XML version.

string encName;

Document's character encoding, defaults to UTF-8.

bool standalone;

Indicates whether the document is standalone or not.

bool readXMLDecl(CharType)(ref immutable(CharType)[] input, out XMLDecl decl);

Scan the start of an XML document for an XML declaration and skip it if found.

input input text for the document
decl data extracted from the XML declaration, or default values if not found.

Returns: true if an XML declaration is found, false otherwise

struct DoctypeToken;

Start of a document type declaration. This token is emitted when encountering a DOCTYPE markup declaration.

string name;

Document type name.

string pubidLiteral;

Public identifier literal.

string systemLiteral;

System identifier literal.

struct DoctypeDoneToken;

End of a document type declaration. This token is emitted when encountering the final ">" of a DOCTYPE declaration.

Note: For now, this token will always directly follow a DoctypeToken since we do not currently support the internal subset. Adding support for the internal subset in the parser will make other tokens appear between a DoctypeToken and a DoctypeDoneToken.

struct OpenElementToken;

Indicate that we're opening an element of the given name. Attributes will follow in separate tokens.

struct OpenTagDoneToken;

Empty token indicating that we are done parsing an open tag and its attributes. Only used by the callback API.

struct EmptyOpenTagDoneToken;

Empty token indicating that an open tag has been closed with '/>', making it an empty element. Used as a replacement for OpenTagDoneToken.

struct CloseElementToken;

Indicate that we're closing an element of the given name.

enum ParsingState;

Parsing state flag allowing the tokenizer to stop and later restart from where it left off.


Searching for tags.


Searching for attributes inside a tag.


Searching for the internal subset inside a doctype.

void tokenize(alias output)(string input);
bool tokenize(alias output, alias state)(ref string input);

Tokenize input string by calling output for each encountered token. Stop when reaching the end of input or when output returns true.

output alias to a callable object or overloaded function or template function to call after each token.
state alias to a ParsingState variable for holding the state of the parser when tokenize returns before the input's end.
input reference to string input which will contain the remaining text after parsing.

Returns: true if there is still content to parse (was stopped by a callback) or false if the end of input was reached.

Throws: for any tokenizer-level well-formedness error.

Note: The tokenizer is not a full XML parser, in the sense that it cannot check all the well-formedness constraints of an XML document.


// Parse up to the first caption open element token.
bool skipUpToCaption(ref string input)
{
    bool isCaption(TokenType)(TokenType token)
    {
        static if (is(TokenType : OpenElementToken))
            return token.name == "caption"; // stop if tag name matches
        else
            return false; // continue tokenizing
    }

    // The two-alias overload takes input by reference and returns bool.
    ParsingState state;
    return tokenize!(isCaption, state)(input);
}

abstract class Writer;

Abstract XML writer class for writing tokens to something.

See Also: XMLWriter

abstract void opCall(DoctypeToken token);
abstract void opCall(DoctypeDoneToken token);
abstract void opCall(CharDataToken token);
abstract void opCall(OpenElementToken token);
abstract void opCall(CloseElementToken token);
abstract void opCall(AttrToken token);
abstract void opCall(OpenTagDoneToken token);
abstract void opCall(EmptyOpenTagDoneToken token);
abstract void opCall(PIToken token);
abstract void opCall(CommentToken token);
abstract void opCall(CDataSectionToken token);

Serialize given token in XML form to writer's output.
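The overload set above lets calling code write writer(token) and have the compiler pick the right serializer from the token's static type. The following is a minimal self-contained sketch of that pattern, not the module's implementation: CommentTok, TextTok, TokenSink and StringSink are hypothetical stand-ins defined here for illustration, and the sketch does no escaping of special characters.

```d
// Stand-in token types; the module's real tokens may differ.
struct CommentTok { string content; }
struct TextTok    { string data; }

// Stand-in for an abstract writer: one opCall overload per token type.
abstract class TokenSink
{
    abstract void opCall(CommentTok token);
    abstract void opCall(TextTok token);
}

// Concrete sink that appends the serialized form to a string.
// Illustration only: no escaping of '<', '&' or "--" is attempted.
class StringSink : TokenSink
{
    string result;

    override void opCall(CommentTok token)
    {
        result ~= "<!--" ~ token.content ~ "-->";
    }

    override void opCall(TextTok token)
    {
        result ~= token.data;
    }
}

void main()
{
    auto sink = new StringSink;
    sink(TextTok("hi "));      // resolves to the TextTok overload
    sink(CommentTok("note"));  // resolves to the CommentTok overload
    assert(sink.result == "hi <!--note-->");
}
```

Because the overload is chosen at compile time, a template callback that receives a token of any type can forward it with a single writer(token) call.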

class XMLWriter(alias output): Writer;

XML writer taking tokens as input. Output is expected to be a character stream with a write function.


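void writeHello(ref File file)
{
    // XMLWriter is the concrete Writer; Writer itself is abstract.
    auto writer = new XMLWriter!file;

    CommentToken comment;
    comment.content = "hello world";
    writer(comment); // serialize the comment to file
}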

void stripComments(string input, ref File file)
{
    auto writer = new XMLWriter!file;

    void passToken(TokenType)(TokenType token)
    {
        static if (!is(TokenType : CommentToken))
            writer(token); // pass token to writer
        else
            return; // do nothing: skip comment token
    }

    tokenize!passToken(input);
}

alias XMLToken;

Algebraic type capable of containing any kind of XML token. This is used by XMLForwardRange.
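As a rough illustration of how an Algebraic value is inspected, here is a self-contained sketch using std.variant.Algebraic from Phobos. The OpenTok, CloseTok and TextTok structs, the Token alias and the describe function are hypothetical stand-ins, not the module's own definitions; only Algebraic and its peek member are real library names.

```d
import std.stdio : writeln;
import std.variant : Algebraic;

// Stand-in token types for illustration only.
struct OpenTok  { string name; }
struct CloseTok { string name; }
struct TextTok  { string data; }

// An Algebraic can hold exactly one value of any of the listed types.
alias Token = Algebraic!(OpenTok, CloseTok, TextTok);

// peek!T returns a pointer to the held value if it has type T, else null.
string describe(Token token)
{
    if (auto open = token.peek!OpenTok)
        return "<" ~ open.name ~ ">";
    if (auto close = token.peek!CloseTok)
        return "</" ~ close.name ~ ">";
    if (auto text = token.peek!TextTok)
        return text.data;
    return "";
}

void main()
{
    Token[] tokens = [Token(OpenTok("p")), Token(TextTok("hi")), Token(CloseTok("p"))];
    foreach (t; tokens)
        writeln(describe(t));
}
```

Testing the result of peek in an if condition both checks the held type and gives access to the value, which avoids a separate type query followed by a get.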

struct XMLForwardRange;

Range interface for iterating over tokens. Each token is encapsulated in the XMLToken Algebraic type defined above, which can contain any token type.

auto tokens = XMLForwardRange(input);
foreach (ref XMLToken token; tokens)
{
    // FIXME: Algebraic should work with a switch statement.
    if (token.peek!OpenElementToken)
        writefln("<%s>", token.peek!OpenElementToken.name);
    else if (token.peek!CloseElementToken)
        writefln("</%s>", token.peek!CloseElementToken.name);
}

XMLToken front;

Current token at the front of the range. This is only valid when empty returns false.

string unparsedInput;

Remaining unparsed XML input after parsing current token.

this(string input);

Create a range using the given XML input. This will also parse the first token and make it available to front.

string input XML input to parse using this range.

Throws: for any tokenizer-level well-formedness error.

void popFront();

Advance the range by one token.

Throws: for any tokenizer-level well-formedness error.

bool empty();

Tell if we are finished parsing.

Returns: true if the last popFront did not find any more tokens, or false if more tokens can be found.
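The front, popFront and empty members follow the usual D range protocol, with the constructor loading the first item. The sketch below mimics that shape with a hypothetical WordRange that yields space-separated words instead of XML tokens; it is only meant to show the protocol and is not how XMLForwardRange is implemented.

```d
// Hypothetical stand-in range: yields space-separated words.
struct WordRange
{
    string front;          // current item, valid only while !empty
    string unparsedInput;  // text remaining after the current item
    private bool done;

    this(string input)
    {
        unparsedInput = input;
        popFront(); // load the first item so front is ready immediately
    }

    void popFront()
    {
        // Skip leading spaces.
        size_t i = 0;
        while (i < unparsedInput.length && unparsedInput[i] == ' ')
            ++i;
        if (i == unparsedInput.length)
        {
            // Nothing left: the range becomes empty.
            done = true;
            unparsedInput = null;
            return;
        }
        // Take characters up to the next space.
        size_t j = i;
        while (j < unparsedInput.length && unparsedInput[j] != ' ')
            ++j;
        front = unparsedInput[i .. j];
        unparsedInput = unparsedInput[j .. $];
    }

    bool empty() const { return done; }
}

// Drain the range manually, the same way foreach does internally.
string[] collect(string input)
{
    string[] words;
    for (auto r = WordRange(input); !r.empty; r.popFront())
        words ~= r.front;
    return words;
}

void main()
{
    assert(collect("a bb  c") == ["a", "bb", "c"]);
}
```

Note that empty only becomes true after a popFront call fails to find another item, which matches the documented behavior of empty above.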