Roxen Mailing List Mirror

roxen.lists.pike.general

Subject	Author	Date
Re: submit modify of Parser.SGML	Martin Stjernholm <mast[at]lysator[dot]liu[dot]se>	05-04-2009

"PeterPan" <zenothing[at]hotmail.com> wrote:

> Sorry, the description of open is a bit confusing.
>
> open != 0 means: the tag itself is not of the form "<p/>" and no
> paired "</p>" exists.

Ok. Then I think it's best to not print "/>" for all non-open tags.
It's confusing.

Btw, if you just want to extract stuff out of an XML/HTML page, then
maybe Parser.XML.SloppyDOM is an alternative. It provides a function
simple_path which lets you pick out nodes and subtrees very
conveniently using an XPath subset.
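
A minimal sketch of what that could look like. The sample document and
the path expressions are made up for illustration. It assumes
Parser.XML.SloppyDOM.parse() is used to build the tree, and that a
nonzero second argument to simple_path returns the matches as an
XML-formatted string rather than as node objects:

```pike
int main()
{
  // Hypothetical sample input, not taken from the original question.
  string src =
    "<doc>"
    "<item id=\"1\">first</item>"
    "<item id=\"2\">second</item>"
    "</doc>";

  // Build the lazily evaluated node tree from the source string.
  object doc = Parser.XML.SloppyDOM.parse(src);

  // Pick the second <item> child of <doc>. The nonzero second
  // argument asks for the result as an XML-formatted string.
  write("%O\n", doc->simple_path("doc/item[2]", 1));

  // Select the id attributes of all <item> elements. Without the
  // second argument, attribute matches are returned as mappings.
  write("%O\n", doc->simple_path("doc/item/@id"));

  return 0;
}
```

Since the underlying parser is error tolerant, the same kind of lookup
should also be usable on HTML that isn't well-formed XML, though how
well that works depends on the page.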

> By the way, something more about the compatibility of SGML (or HTML):
>
> <a href=http://www.somethin.com/?q=abc&arg1=efg&arg2=hij>haha</a>
>
> In a real browser this is OK, but SGML recognizes it as:
> 
> ({ /* 1 element */
>    SGMLatom(<a ="hij" href="http://www.somethin.com/?q"/>
>      "haha")
> })

Although that's incorrect HTML (at least - I don't know about SGML
really), the parser could indeed treat it better. I've got a fix, but
I won't risk 7.8 stability and compatibility with it, so it'll go into
7.9 as well.

It's a bit odd that such a fairly obvious case has gone by undetected
for so long.
