mfr.xmltok

This module reads and writes tokens from and to XML documents. Tokens are defined for each basic structure you can find in XML.

You can read a document as tokens using the powerful tokenize function, which calls a user-defined function for each token it encounters. You can also use the generally more convenient XMLForwardRange struct to expose a document as a range of tokens that you can loop over easily.

You can also write a document by putting tokens into an XMLWriter range.

struct CharDataToken;

Token for regular character data in the document.

string data;: Actual text value.

struct CommentToken;

Token for a comment.

string content;: Text content of the comment.

struct PIToken;

Token for a processing instruction.

string target;: Processor target identifier.
string content;: Text content of the processing instruction.

struct CDataSectionToken;

Content for a CData section.

string content;: Character data content.

struct AttrToken;

Content for an attribute inside an open element tag.

string name;: Attribute's name.
string value;: Attribute's value.

struct XMLDecl;

Gives the parsed content of an XML declaration.

Note: The tokenizer doesn't parse the XML declaration. Call readXMLDecl before calling tokenize.

string versionNum;: Document's XML version.
string encName;: Document's character encoding, defaults to UTF-8.
bool standalone;: Indicates whether the document is standalone or not.

bool readXMLDecl(CharType)(ref immutable(CharType)[] input, out XMLDecl decl);

Scan the start of an XML document for an XML declaration and skip it if found.

**Params:**
input	input text for the document
decl	data extracted from the XML declaration, or default values if not found.

Returns: true if an XML declaration is found, false otherwise
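
The call order described in the note on XMLDecl can be sketched as follows. This is only an illustration: the sample document and the countElements callback are hypothetical, and it assumes that readXMLDecl leaves input positioned just past the declaration, as "skip it if found" suggests.

```d
import mfr.xmltok;
import std.stdio;

void main()
{
    // Hypothetical sample document.
    string input = `<?xml version="1.0" encoding="UTF-8"?><doc><p/></doc>`;

    XMLDecl decl;
    if (readXMLDecl(input, decl))
        writefln("XML %s, encoding %s", decl.versionNum, decl.encName);
    else
        writeln("no XML declaration; decl holds default values");

    int elements;

    // Hypothetical callback: counts open element tokens and never asks to stop.
    bool countElements(TokenType)(TokenType token)
    {
        static if (is(TokenType : OpenElementToken))
            ++elements;
        return false; // continue tokenizing
    }

    // Assumed: input now starts after the declaration, so tokenize never sees it.
    tokenize!countElements(input);
    writefln("%s open element tokens", elements);
}
```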

struct DoctypeToken;

Start of a document type declaration. This token is emitted when encountering a DOCTYPE markup declaration.

string name;: Document type name.
string pubidLiteral;: Public identifier literal.
string systemLiteral;: System identifier literal.

struct DoctypeDoneToken;

End of a document type declaration. This token is emitted when encountering the final ">" of a DOCTYPE declaration.

Note: For now, this token will always directly follow a DoctypeToken since we do not currently support the internal subset. Adding support for the internal subset in the parser will make other tokens appear between a DoctypeToken and a DoctypeDoneToken.

struct OpenElementToken;

Indicate that we're opening an element of the given name. Attributes will follow in separate tokens.

struct OpenTagDoneToken;

Empty token indicating that we are done parsing an open tag and its attributes. Only used by the callback API.

struct EmptyOpenTagDoneToken;

Empty token indicating that an open tag has been closed with '/>', making it an empty element. Used as a replacement for OpenTagDoneToken.

struct CloseElementToken;

Indicate that we're closing an element of the given name.
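
To make the order of the element tokens concrete, here is a rough sketch that emits one element through the XMLWriter class documented further down. The writeGreeting helper is hypothetical. The sketch assumes that XMLWriter can be constructed without arguments and that CloseElementToken has a name field like OpenElementToken; neither is confirmed by the reference above.

```d
import mfr.xmltok;
import std.stdio : File;

// Hypothetical helper: emits one element with an attribute and text content.
void writeGreeting(ref File file)
{
    // Assumption: XMLWriter needs no constructor arguments.
    auto writer = new XMLWriter!file;

    OpenElementToken open;
    open.name = "p";
    writer(open);               // start of the open tag

    AttrToken attr;
    attr.name = "class";
    attr.value = "greeting";
    writer(attr);               // attributes follow the open element token

    writer(OpenTagDoneToken()); // end of the open tag

    CharDataToken text;
    text.data = "hello world";
    writer(text);

    CloseElementToken close;
    close.name = "p";           // assumed field, see the note above
    writer(close);
}
```

For an empty element, an EmptyOpenTagDoneToken would replace the OpenTagDoneToken, and presumably no CloseElementToken would be needed.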

enum ParsingState;

Parsing state flag allowing the tokenizer to stop and restart from where it left off.

TAGS: Searching for tags.
ATTRS: Searching for attributes inside a tag.
IN_DOCTYPE: Searching for the internal subset inside a doctype.

void tokenize(alias output)(string input);

bool tokenize(alias output, alias state)(ref string input);

Tokenize input string by calling output for each encountered token. Stop when reaching the end of input or when output returns true.

**Params:**
output	alias to a callable object or overloaded function or template function to call after each token.
state	alias to a ParsingState variable for holding the state of the parser when tokenize returns before the input's end.
input	reference to string input which will contain the remaining text after parsing.

Returns: true if there is still content to parse (was stopped by a callback) or false if the end of input was reached.

Throws: for any tokenizer-level well-formedness error.

Note: The tokenizer is not a full XML parser in the sense that it cannot check all the well-formedness constraints of an XML document.

Example:

// Parse up to the first caption open element token.
bool skipUpToCaption(ref string input)
{
    bool isCaption(TokenType)(TokenType token)
    {
        static if (is(TokenType : OpenElementToken))
            return token.name == "caption"; // stop if tag name matches
        else
            return false; // continue tokenizing
    }

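    // Only the overload taking a state alias returns bool and updates input.
    ParsingState state;
    return tokenize!(isCaption, state)(input);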
}
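
Because the two-alias overload keeps its position in state and in input, it can be called again to resume after a callback has stopped it. The following is a minimal sketch under that reading of the documentation. The sample input and the stopAtCaption callback are hypothetical, and it assumes a default-initialized ParsingState starts in TAGS.

```d
import mfr.xmltok;
import std.stdio;

void main()
{
    // Hypothetical sample input (no XML declaration).
    string input = "<doc><caption>One</caption><caption>Two</caption></doc>";

    ParsingState state; // assumed to default to TAGS
    int captions;

    // Hypothetical callback: asks the tokenizer to stop at each caption element.
    bool stopAtCaption(TokenType)(TokenType token)
    {
        static if (is(TokenType : OpenElementToken))
            return token.name == "caption";
        else
            return false;
    }

    // Each true result means a callback stopped the tokenizer;
    // calling again resumes from the remaining input.
    while (tokenize!(stopAtCaption, state)(input))
    {
        ++captions;
        writefln("caption %s found, %s characters left", captions, input.length);
    }
}
```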

abstract class Writer;

Abstract XML writer class for writing tokens to something.

See Also: XMLWriter

abstract void opCall(DoctypeToken token);
abstract void opCall(DoctypeDoneToken token);
abstract void opCall(CharDataToken token);
abstract void opCall(OpenElementToken token);
abstract void opCall(CloseElementToken token);
abstract void opCall(AttrToken token);
abstract void opCall(OpenTagDoneToken token);
abstract void opCall(EmptyOpenTagDoneToken token);
abstract void opCall(PIToken token);
abstract void opCall(CommentToken token);
abstract void opCall(CDataSectionToken token);: Serialize given token in XML form to writer's output.

class XMLWriter(alias output): Writer;

XML writer taking tokens as input. Output is expected to be a character stream with a write function.

Example:

void writeHello(ref File file)
{
    // Writer is the abstract base class; XMLWriter is the concrete template.
    auto writer = new XMLWriter!file;

    CommentToken comment;
    comment.content = "hello world";
    writer(comment);
}

void stripComments(string input, ref File file)
{
    // Writer is the abstract base class; XMLWriter is the concrete template.
    auto writer = new XMLWriter!file;

    void passToken(TokenType)(TokenType token)
    {
        static if (!is(TokenType : CommentToken))
            writer(token); // pass token to writer
        else
            return; // do nothing: skip comment token
    }

    return tokenize!passToken(input);
}

alias XMLToken;

Algebraic type capable of containing any kind of XML token. This is used by XMLForwardRange.

struct XMLForwardRange;

Range interface for iterating over tokens. Each token is encapsulated in the XMLToken Algebraic type defined above, which can contain any token type.

auto tokens = XMLForwardRange(input);
foreach (ref XMLToken token; tokens)
{
    // FIXME: Algebraic should work with a switch statement.
    if (token.peek!OpenElementToken)
        writefln("<%s>", token.peek!OpenElementToken.name);
    else if (token.peek!CloseElementToken)
        writefln("</%s>", token.peek!CloseElementToken.name);
}

XMLToken front;

Current token at the front of the range. This is only valid when empty returns false.

string unparsedInput;

Remaining unparsed XML input after parsing current token.

this(string input);

Create a range using the given XML input. This will also parse the first token and make it available to front.

**Params:**
input	XML input to parse using this range.

Throws: for any tokenizer-level well-formedness error.

void popFront();

Advance the range by one token.

Throws: for any tokenizer-level well-formedness error.

bool empty();

Tell if we are finished parsing.

Returns: true if the last popFront did not find any more tokens, or false if more tokens can be found.
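
The three range primitives can also be driven by hand instead of through foreach. The sketch below is illustrative only: the listElements helper is hypothetical, and it assumes that peek behaves as in the example above, returning a null pointer when the token holds another type.

```d
import mfr.xmltok;
import std.stdio;

// Hypothetical helper: prints the name of every element opened in input.
void listElements(string input)
{
    auto tokens = XMLForwardRange(input); // the constructor parses the first token
    while (!tokens.empty)
    {
        // front is only valid while empty returns false.
        if (auto open = tokens.front.peek!OpenElementToken)
            writeln(open.name);
        tokens.popFront(); // may throw on a well-formedness error
    }
}
```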