roxen.lists.pike.general

Subject: Re: submit modify of Parser.SGML
Author: Martin Stjernholm <mast[at]lysator[dot]liu[dot]se>
Date: 05-04-2009
"PeterPan" <zenothing[at]hotmail.com> wrote:

> Sorry, The description of open is a bit confusing.
>
> open!=0 means: the tag itself is not such as "<p/>" and there is not a
> paired "</p>" exists.

Ok. Then I think it's best to not print "/>" for all non-open tags.
It's confusing.

Btw, if you just want to extract stuff out of an XML/HTML page, then
maybe Parser.XML.SloppyDOM is an alternative. It provides a function
simple_path which lets you pick out nodes and subtrees very
conveniently using an XPath subset.
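
A minimal sketch of that approach, untested and with made-up sample
input. It assumes Parser.XML.SloppyDOM.parse() returns a document node,
and that passing a nonzero second argument to simple_path makes it
return the matched nodes as an XML-formatted string; check the module
documentation for the exact path subset and return values.

```pike
int main()
{
  // Hypothetical sample page, not taken from this thread.
  string page =
    "<html><body>"
    "<h1>News</h1>"
    "<p>first</p><p>second</p>"
    "</body></html>";

  // Build the (lazily generated) node tree from the source string.
  object doc = Parser.XML.SloppyDOM.parse(page);

  // Pick out all <p> children of <body> with a relative path.
  // The nonzero second argument asks for the matches formatted as XML.
  write("%O\n", doc->simple_path("html/body/p", 1));

  return 0;
}
```

Without the second argument, simple_path hands back nodes or values
instead of a string, which is handier if you want to walk the subtree
further.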

> By the way, something more about the compatibility of SGML(or HTML):
> 
> <a href=http://www.somethin.com/?q=abc&arg1=efg&arg2=hij>haha</a>
> 
> In a real browser this is ok, but SGML recognizes it as:
> 
> ({ /* 1 element */
>    SGMLatom(<a ="hij" href="http://www.somethin.com/?q"/>
>      "haha")
> })

Although that's incorrect HTML (at least - I don't know about SGML
really), the parser could indeed treat it better. I've got a fix, but
I won't risk 7.8 stability and compatibility with it, so it'll go into
7.9 as well.

It's a bit odd that such a fairly obvious case has gone by undetected
for so long.