Roxen Mailing List Mirror

roxen.lists.pike.general

Subject	Author	Date
Re: submit modify of Parser.SGML	PeterPan <zenothing[at]hotmail[dot]com>	05-04-2009

> Ok. Then I think it's best to not print "/>" for all non-open tags.
> It's confusing.

Do you mean never print "/>" in any case?

For "<haha><p><x/></haha></p>":

({ /* 2 elements */
    SGMLatom(<haha>
      SGMLatom(<p>
        SGMLatom(<x>))),
    SGMLatom(</p>)
})

Or do you mean never print "/>" except for <x/>? Like this:

({ /* 2 elements */
    SGMLatom(<haha>
      SGMLatom(<p>
        SGMLatom(<x/>))),
    SGMLatom(</p>)
})

I'm puzzled because my code is:
    res="<"+res+(open?">":"/>");
If the request is to "not print '/>' for all non-open tags", the result is
no "/>" at all.
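
A minimal sketch of the second reading, assuming the parser could record whether a tag was written as "<x/>" in the source. The helper render_tag and the flag self_closed are hypothetical names used only for illustration; they are not part of Parser.SGML.

```pike
// Hypothetical helper, not part of Parser.SGML: renders only the tag itself.
//   name:        tag name, plus any already formatted attributes
//   self_closed: non-zero if the source tag was written as <x/>
string render_tag(string name, int self_closed)
{
  // Print "/>" only for tags that were self-closed in the source;
  // every other tag, open or properly paired, gets a plain ">".
  return "<" + name + (self_closed ? "/>" : ">");
}

int main()
{
  write("%s\n", render_tag("haha", 0));
  write("%s\n", render_tag("x", 1));
  return 0;
}
```

With a separate flag like this, "/>" would appear only for tags such as <x/>, while <haha> and <p> would both print with ">".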

--------------------------------------------------
From: "Martin Stjernholm" <mast[at]lysator.liu.se>
Sent: Sunday, April 05, 2009 7:29 PM
To: "PeterPan" <zenothing[at]hotmail.com>
Cc: <pike[at]roxen.com>
Subject: Re: submit modify of Parser.SGML

> "PeterPan" <zenothing[at]hotmail.com> wrote:
>
>> Sorry, the description of open is a bit confusing.
>>
>> open!=0 means: the tag itself is not of the form "<p/>" and no paired
>> "</p>" exists.
>
> Ok. Then I think it's best to not print "/>" for all non-open tags.
> It's confusing.
>
> Btw, if you just want to extract stuff out of an xml/html page, then
> maybe Parser.XML.SloppyDOM is an alternative. It provides a function
> simple_path which lets you pick out nodes and subtrees very
> conveniently using an XPath subset.
>
>> By the way, something more about the compatibility of SGML (or HTML):
>>
>> <a href=http://www.somethin.com/?q=abc&arg1=efg&arg2=hij>haha</a>
>>
>> In a real browser this is ok, but SGML recognizes it as:
>>
>> ({ /* 1 element */
>>    SGMLatom(<a ="hij" href="http://www.somethin.com/?q"/>
>>      "haha")
>> })
>
> Although that's incorrect HTML (at least - I don't know about SGML
> really), the parser could indeed treat it better. I've got a fix, but
> I won't risk 7.8 stability and compatibility with it, so it'll go into
> 7.9 as well.
>
> It's a bit odd that such a fairly obvious case has gone by undetected
> for so long.
>
>
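
The unquoted href quoted above could be treated more leniently by reading the value up to the next whitespace or ">". A minimal sketch under that assumption follows; unquoted_value is a hypothetical helper for illustration, and this is not the fix mentioned for 7.9.

```pike
// Hypothetical helper, not Parser.SGML code: returns the leading run of
// characters in s that are neither whitespace nor ">".
string unquoted_value(string s)
{
  string value = "";
  // %[^...] collects characters until one from the listed set is found.
  sscanf(s, "%[^ \t\r\n>]", value);
  return value;
}

int main()
{
  // The text that follows "href=" in the example tag.
  string rest = "http://www.somethin.com/?q=abc&arg1=efg&arg2=hij>haha</a>";
  write("%s\n", unquoted_value(rest));
  return 0;
}
```

Under this rule the "=" and "&" characters inside the URL stay part of the href value instead of being split into separate attributes.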
