The tags package contains specific tags.

This package has implementations of tags with functionality beyond that of a generic tag. For example, the {@.html <META>} tag has methods to get the {@link org.htmlparser.tags.MetaTag#getMetaContent CONTENT} and {@link org.htmlparser.tags.MetaTag#getMetaTagName NAME} attributes (although this could also be done with generic attribute manipulation), and an implementation of {@link org.htmlparser.tags.MetaTag#doSemanticAction doSemanticAction} that alters the lexer's encoding.
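As a rough illustration of that difference, here is a standalone Java sketch that does not use htmlparser at all; GenericTag, MetaLikeTag and HypotheticalLexer are hypothetical stand-ins. It assumes, for illustration only, that the semantic action reacts to an HTTP-EQUIV attribute of "Content-Type" and takes the charset from CONTENT. The specific getters are thin wrappers over generic attribute lookup, while the semantic action changes lexer state.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MetaTagSketch {

    /** Hypothetical stand-in for the lexer whose encoding a META tag may change. */
    static class HypotheticalLexer {
        String encoding = "ISO-8859-1"; // assumed default, for illustration
    }

    /** Hypothetical generic tag: attributes only, no tag-specific behaviour. */
    static class GenericTag {
        private final Map<String, String> attributes = new HashMap<>();

        void setAttribute(String name, String value) {
            attributes.put(name.toUpperCase(Locale.ROOT), value);
        }

        String getAttribute(String name) {
            return attributes.get(name.toUpperCase(Locale.ROOT));
        }

        /** Generic tags do nothing special once parsed. */
        void doSemanticAction(HypotheticalLexer lexer) {
        }
    }

    /** Hypothetical META-like tag: convenience getters plus a semantic action. */
    static class MetaLikeTag extends GenericTag {
        String getMetaContent() {
            return getAttribute("CONTENT"); // same as generic attribute access
        }

        String getMetaTagName() {
            return getAttribute("NAME");
        }

        @Override
        void doSemanticAction(HypotheticalLexer lexer) {
            // Assumed rule: HTTP-EQUIV="Content-Type" with "...; charset=X" in CONTENT
            // switches the lexer to encoding X.
            String equiv = getAttribute("HTTP-EQUIV");
            String content = getMetaContent();
            if (equiv == null || content == null || !equiv.equalsIgnoreCase("Content-Type")) {
                return;
            }
            for (String part : content.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    lexer.encoding = p.substring("charset=".length()).trim();
                }
            }
        }
    }

    public static void main(String[] args) {
        HypotheticalLexer lexer = new HypotheticalLexer();
        MetaLikeTag meta = new MetaLikeTag();
        meta.setAttribute("http-equiv", "Content-Type");
        meta.setAttribute("content", "text/html; charset=UTF-8");
        meta.doSemanticAction(lexer);
        System.out.println(meta.getMetaContent()); // text/html; charset=UTF-8
        System.out.println(lexer.encoding);        // UTF-8
    }
}
```

The point of the design is that callers get readable accessors, and parsing can react to the tag as soon as it is seen, without every consumer re-implementing the same attribute handling.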

The classes in this package have been added in an ad-hoc fashion: the most useful ones have existed for a long time, while some obvious ones are rather new. Please feel free to add your own custom tags and register them with the {@link org.htmlparser.PrototypicalNodeFactory PrototypicalNodeFactory}; they will then be treated like any other built-in tag. Custom tags do not even need to reside in this package.
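A node factory based on prototypes can treat custom and built-in tags identically because it only looks up a name and copies whatever prototype was registered for it. The standalone sketch below models that idea without the library; TagRegistry, Prototype, copy() and MyHeadingTag are hypothetical names, not the htmlparser API. Each prototype is registered under every UPPERCASE name its getIds() reports, and unregistered names fall back to a generic tag.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TagRegistrySketch {

    /** Hypothetical minimal tag: knows its ids and can produce a fresh copy. */
    interface Prototype {
        String[] getIds();            // UPPERCASE names this tag handles
        Prototype copy(String name);  // new instance for a parsed tag
        String describe();
    }

    /** Fallback used for names nobody registered. */
    static class GenericTag implements Prototype {
        private final String name;
        GenericTag(String name) { this.name = name; }
        public String[] getIds() { return new String[0]; }
        public Prototype copy(String n) { return new GenericTag(n); }
        public String describe() { return "generic " + name; }
    }

    /** Hypothetical custom tag, registered exactly like a built-in one would be. */
    static class MyHeadingTag implements Prototype {
        private final String name;
        MyHeadingTag() { this("H1"); }
        MyHeadingTag(String name) { this.name = name; }
        public String[] getIds() { return new String[] {"H1", "H2", "H3"}; }
        public Prototype copy(String n) { return new MyHeadingTag(n); }
        public String describe() { return "heading " + name; }
    }

    static class TagRegistry {
        private final Map<String, Prototype> prototypes = new HashMap<>();

        /** Registers the prototype under every id it reports. */
        void register(Prototype prototype) {
            for (String id : prototype.getIds()) {
                prototypes.put(id, prototype);
            }
        }

        /** Creates a tag for a parsed name; lookup uses the UPPERCASE form. */
        Prototype create(String tagName) {
            String key = tagName.toUpperCase(Locale.ROOT);
            Prototype proto = prototypes.get(key);
            return proto != null ? proto.copy(key) : new GenericTag(key);
        }
    }

    public static void main(String[] args) {
        TagRegistry registry = new TagRegistry();
        registry.register(new MyHeadingTag());
        System.out.println(registry.create("h2").describe());   // heading H2
        System.out.println(registry.create("span").describe()); // generic SPAN
    }
}
```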


Custom Tags

Creating custom tags is fairly straightforward. Simply copy one of the simpler tags in this package and alter it as follows.

If the tag can contain other nodes, e.g. {@.html <H1>My Heading</H1>}, then it should derive from (i.e. be a subclass of) {@link org.htmlparser.tags.CompositeTag}. It will then inherit the {@link org.htmlparser.scanners.CompositeTagScanner CompositeTagScanner}, and nodes between the start and end tag will be gathered into its list of children. Most of the tags in this package derive from CompositeTag, which is why the nodes returned from the Parser are nested.

If it is a simple tag, i.e. one that does not contain other nodes, then it should derive from {@link org.htmlparser.nodes.TagNode TagNode}. See for example {@link org.htmlparser.tags.MetaTag} or {@link org.htmlparser.tags.ImageTag}.

To be registered with {@link org.htmlparser.PrototypicalNodeFactory#registerTag}, and especially if it is a composite tag, the tag needs to implement getIds(), which returns the list of UPPERCASE names for the tag (usually only one), for example "HTML". If the tag knows which other tags cannot be contained within it, it should also implement {@link org.htmlparser.nodes.TagNode#getEnders getEnders()}, which returns the list of other tags that should cause this tag to close itself, and {@link org.htmlparser.nodes.TagNode#getEndTagEnders getEndTagEnders()}, which returns the list of end tags, other than its own, that should cause this tag to close itself.

When these 'ender' lists cause a tag to end before its own end tag is seen, a virtual end tag is created and 'inserted' at the location where the end tag should have been. These virtual end tags can be distinguished because their {@link org.htmlparser.Node#getStartPosition starting} and {@link org.htmlparser.Node#getEndPosition ending} locations are the same (i.e. they take up no character length in the HTML stream).
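The closing rules above can be modelled without the library. The standalone Java sketch below is a simplified assumption, not the actual algorithm of CompositeTagScanner: a hand-written token list stands in for the lexer, the hypothetical ENDERS and END_TAG_ENDERS maps play the role of getEnders() and getEndTagEnders() for just two tags, and any tag closed by an ender (or by end of input) receives a virtual end tag whose start and end positions are equal.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EnderSketch {
    enum Kind { START, END, TEXT }

    /** A token from a hypothetical lexer, with its character span. */
    static final class Token {
        final Kind kind;
        final String name; // UPPERCASE tag name, or the text itself
        final int start, end;
        Token(Kind kind, String name, int start, int end) {
            this.kind = kind; this.name = name; this.start = start; this.end = end;
        }
    }

    /** A composite node with children and a real or virtual end tag. */
    static final class Composite {
        final String name;
        final List<Object> children = new ArrayList<>();
        int endTagStart = -1, endTagEnd = -1;
        Composite(String name) { this.name = name; }

        /** A virtual end tag takes up no characters: start == end. */
        boolean hasVirtualEnd() { return endTagStart >= 0 && endTagStart == endTagEnd; }

        String render() {
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(',');
                Object child = children.get(i);
                sb.append(child instanceof Composite ? ((Composite) child).render() : child);
            }
            sb.append(')');
            if (hasVirtualEnd()) sb.append('*'); // '*' marks a virtual end tag
            return sb.toString();
        }
    }

    // Hypothetical rules playing the role of getEnders() and getEndTagEnders().
    static final Map<String, Set<String>> ENDERS = Map.of("LI", Set.of("LI"), "UL", Set.of());
    static final Map<String, Set<String>> END_TAG_ENDERS = Map.of("LI", Set.of("UL"), "UL", Set.of());

    static void closeVirtually(Composite c, int position) {
        c.endTagStart = position;
        c.endTagEnd = position;
    }

    static Composite parse(List<Token> tokens, int inputLength) {
        Composite root = new Composite("#root");
        Deque<Composite> open = new ArrayDeque<>();
        open.push(root);
        for (Token t : tokens) {
            switch (t.kind) {
                case START:
                    // A start tag listed in the open tag's enders closes it first.
                    while (open.size() > 1 && ENDERS.get(open.peek().name).contains(t.name)) {
                        closeVirtually(open.pop(), t.start);
                    }
                    if (ENDERS.containsKey(t.name)) { // composite: gathers children
                        Composite c = new Composite(t.name);
                        open.peek().children.add(c);
                        open.push(c);
                    } else {
                        open.peek().children.add("<" + t.name + ">"); // simple tag
                    }
                    break;
                case END:
                    while (open.size() > 1) {
                        Composite top = open.peek();
                        if (top.name.equals(t.name)) { // its own, real end tag
                            open.pop();
                            top.endTagStart = t.start;
                            top.endTagEnd = t.end;
                            break;
                        }
                        if (!END_TAG_ENDERS.get(top.name).contains(t.name)) {
                            break; // stray end tag: ignored in this sketch
                        }
                        closeVirtually(open.pop(), t.start);
                    }
                    break;
                case TEXT:
                    open.peek().children.add(t.name);
                    break;
            }
        }
        while (open.size() > 1) { // anything still open ends at end of input
            closeVirtually(open.pop(), inputLength);
        }
        return root;
    }

    public static void main(String[] args) {
        // Tokens for "<UL><LI>one<LI>two</UL>": neither LI has its own end tag.
        List<Token> tokens = List.of(
                new Token(Kind.START, "UL", 0, 4),
                new Token(Kind.START, "LI", 4, 8),
                new Token(Kind.TEXT, "one", 8, 11),
                new Token(Kind.START, "LI", 11, 15),
                new Token(Kind.TEXT, "two", 15, 18),
                new Token(Kind.END, "UL", 18, 23));
        System.out.println(parse(tokens, 23).render()); // #root(UL(LI(one)*,LI(two)*))
    }
}
```

In the example in main, the first LI is closed by the second LI start tag (a virtual end tag at offset 11), the second LI is closed by the UL end tag (a virtual end tag at offset 18), and UL keeps its real end tag spanning offsets 18 to 23.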

For example, the {@.html