Ant

Load XHTML into an XDocument?

I thought XLinq would be a convenient way to load and manipulate XHTML data, but I was scuppered: the document wouldn't parse because of undeclared entities, &nbsp; for example.

In order to load/parse an XHTML document using XLinq you must do two things.

  1. Declare the DOCTYPE in your source
  2. Tell XLinq which entities to expect, such as &nbsp; and &pound; and a few more besides. For this you need to create an XmlResolver:
using System;
using System.Xml;

/// <summary>
/// <para>XmlResolver for XHTML</para>
/// </summary>
public class XhtmlResolver : XmlUrlResolver {

    /// <summary>
    /// <para>URN string to identify all the Entities</para>
    /// </summary>
    private const string ENTITIES_URN = "urn:Entities";

    /// <summary>
    /// <para>Get Entity</para>
    /// </summary>
    public override object GetEntity(
            Uri absoluteUri, string role, 
            Type ofObjectToReturn) {

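        if (absoluteUri.AbsoluteUri == ENTITIES_URN) {

            // the XML reader expects a Stream here, not a string,
            // so wrap the entity declaration in a MemoryStream
            return new System.IO.MemoryStream(
                System.Text.Encoding.UTF8.GetBytes(
                    "<!ENTITY nbsp \"&#x000A0;\" >"));
        }

        // anything else is not resolved by this class
        return null;
    }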

    /// <summary>
    /// <para>Resolves XHTML DOCTYPE</para>
    /// </summary>
    public override Uri ResolveUri(
            Uri baseUri, string relativeUri) {

        // make sure the doc is declared as XHTML
        if (relativeUri.Equals("-//W3C//DTD XHTML 1.0 Transitional//EN", StringComparison.OrdinalIgnoreCase)
            || relativeUri.Equals("-//W3C//DTD XHTML 1.0 Strict//EN", StringComparison.OrdinalIgnoreCase)
            || relativeUri.Equals("-//W3C//DTD XHTML 1.0 Frameset//EN", StringComparison.OrdinalIgnoreCase)
            || relativeUri.Equals("-//W3C//DTD XHTML 1.1//EN", StringComparison.OrdinalIgnoreCase)) {

            return new Uri(ENTITIES_URN);
        }

        return base.ResolveUri(baseUri, relativeUri);
    }
}
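
To show where the resolver plugs in, here is a minimal sketch of loading a document through it. It assumes .NET 4 or later, where DTD handling is switched on with DtdProcessing.Parse. MinimalXhtmlResolver and XhtmlLoader are hypothetical names; the resolver is a trimmed-down stand-in for the class above so that the snippet compiles on its own.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Trimmed-down stand-in for the resolver above (hypothetical name):
// declares only &nbsp; and recognises only the XHTML 1.0 Strict public id.
public class MinimalXhtmlResolver : XmlUrlResolver {

    private static readonly Uri EntitiesUri = new Uri("urn:Entities");

    public override Uri ResolveUri(Uri baseUri, string relativeUri) {
        if (string.Equals(relativeUri, "-//W3C//DTD XHTML 1.0 Strict//EN",
                StringComparison.OrdinalIgnoreCase)) {
            return EntitiesUri;
        }
        return base.ResolveUri(baseUri, relativeUri);
    }

    public override object GetEntity(
            Uri absoluteUri, string role, Type ofObjectToReturn) {
        if (EntitiesUri.Equals(absoluteUri)) {
            // the reader expects a Stream, so wrap the declaration text
            return new MemoryStream(
                Encoding.UTF8.GetBytes("<!ENTITY nbsp \"&#xA0;\">"));
        }
        return null;
    }
}

public static class XhtmlLoader {

    // Hypothetical helper: parses an XHTML string into an XDocument.
    public static XDocument Load(string xhtml) {
        var settings = new XmlReaderSettings {
            // DTDs are prohibited by default, so allow them explicitly
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = new MinimalXhtmlResolver()
        };
        using (var reader = XmlReader.Create(new StringReader(xhtml), settings)) {
            return XDocument.Load(reader);
        }
    }
}
```

Passing a string that starts with the XHTML 1.0 Strict DOCTYPE and contains a&nbsp;b to XhtmlLoader.Load should give back an XDocument in which the entity has been expanded to a non-breaking space. Because the public identifier is resolved locally, the reader should not need to fetch the DTD from w3.org.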

This takes care of the &nbsp; entity. To handle all of them, either list every entity in the GetEntity method or create an embedded resource file with the contents of http://www.w3.org/2003/entities/2007/isonum.ent.
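
For the first option, listing the entities by hand, a small lookup table keeps the declarations manageable. This is only a sketch: XhtmlEntities and BuildDeclarations are hypothetical names, and the table holds just four entities as an illustration.

```csharp
using System.Collections.Generic;
using System.Text;

public static class XhtmlEntities {

    // A few named entities and their Unicode code points (illustrative
    // subset only); extend the table with whatever your documents use.
    private static readonly Dictionary<string, int> CodePoints =
        new Dictionary<string, int> {
            { "nbsp", 160 },
            { "pound", 163 },
            { "copy", 169 },
            { "euro", 8364 }
        };

    // Builds one <!ENTITY ...> declaration per table entry.
    public static string BuildDeclarations() {
        var sb = new StringBuilder();
        foreach (var pair in CodePoints) {
            sb.AppendFormat("<!ENTITY {0} \"&#{1};\">", pair.Key, pair.Value);
            sb.AppendLine();
        }
        return sb.ToString();
    }
}
```

GetEntity could then return the result of BuildDeclarations(), wrapped in a stream, in place of the single hard-coded nbsp declaration.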

Now to find out if XLinq is a convenient way to play with XHTML or not ...

References:

http://www.w3.org/2003/entities/2007/isonum.ent
http://www.martinwilley.com/net/code/xhtmldoc.html
