encoding of Mirrors.masterlist (was: Flaw in Mirrors.masterlist)

Peter Palfrader
On Mon, 27 Feb 2017, Frans Spiesschaert wrote:

> For several days the Dutch translation team has been receiving "Tidy
> validation failed" emails.
> This is due to a flaw in webwml/english/mirror/Mirrors.masterlist
> Please find attached a patch that solves the problem.

Thanks for your patch.  I think it might be nicer to declare the
masterlist to be utf-8, and then deal with entity encoding in the
script.  Options?

Cheers,

> Index: webwml/english/mirror/Mirrors.masterlist
> ===================================================================
> RCS file: /cvs/webwml/webwml/english/mirror/Mirrors.masterlist,v
> retrieving revision 1.2665
> diff -u -r1.2665 Mirrors.masterlist
> --- webwml/english/mirror/Mirrors.masterlist 18 Feb 2017 10:17:03 -0000 1.2665
> +++ webwml/english/mirror/Mirrors.masterlist 27 Feb 2017 19:35:00 -0000
> @@ -1117,7 +1117,7 @@
>  Maintainer: Jakob-Tobias Winter <[hidden email]>, [hidden email]
>  Country: US United States
>  Location: Lenexa, KS
> -Sponsor: 1&1 Internet http://1and1.com
> +Sponsor: 1&amp;1 Internet http://1and1.com
>  
>  Site: mirror.it.ubc.ca
>  Type: Push-Secondary




--
                            |  .''`.       ** Debian **
      Peter Palfrader       | : :' :      The  universal
 https://www.palfrader.org/ | `. `'      Operating System
                            |   `-    https://www.debian.org/

Re: encoding of Mirrors.masterlist (was: Flaw in Mirrors.masterlist)

Bastian Blank
Hi Peter

On Mon, Feb 27, 2017 at 08:10:30PM +0000, Peter Palfrader wrote:
> On Mon, 27 Feb 2017, Frans Spiesschaert wrote:
> > For several days the Dutch translation team has been receiving "Tidy
> > validation failed" emails.
> > This is due to a flaw in webwml/english/mirror/Mirrors.masterlist
> > Please find attached a patch that solves the problem.
> Thanks for your patch.  I think it might be nicer to declare the
> masterlist to be utf-8, and then deal with entity encoding in the
> script.  Options?

HTML::Entities?  WML can't handle this for us?

Bastian

--
A woman should have compassion.
                -- Kirk, "Catspaw", stardate 3018.2

Re: encoding of Mirrors.masterlist (was: Flaw in Mirrors.masterlist)

Peter Palfrader
On Mon, 27 Feb 2017, Bastian Blank wrote:

> Hi Peter
>
> On Mon, Feb 27, 2017 at 08:10:30PM +0000, Peter Palfrader wrote:
> > On Mon, 27 Feb 2017, Frans Spiesschaert wrote:
> > > For several days the Dutch translation team has been receiving "Tidy
> > > validation failed" emails.
> > > This is due to a flaw in webwml/english/mirror/Mirrors.masterlist
> > > Please find attached a patch that solves the problem.
> > Thanks for your patch.  I think it might be nicer to declare the
> > masterlist to be utf-8, and then deal with entity encoding in the
> > script.  Options?
>
> HTML::Entities?  WML can't handle this for us?

I don't know.  Maybe?


Re: encoding of Mirrors.masterlist

Laura Arjona Reina
On 27/02/17 at 21:30, Peter Palfrader wrote:

> On Mon, 27 Feb 2017, Bastian Blank wrote:
>
>> Hi Peter
>>
>> On Mon, Feb 27, 2017 at 08:10:30PM +0000, Peter Palfrader wrote:
>>> On Mon, 27 Feb 2017, Frans Spiesschaert wrote:
>>>> For several days the Dutch translation team has been receiving "Tidy
>>>> validation failed" emails.
>>>> This is due to a flaw in webwml/english/mirror/Mirrors.masterlist
>>>> Please find attached a patch that solves the problem.
>>> Thanks for your patch.  I think it might be nicer to declare the
>>> masterlist to be utf-8, and then deal with entity encoding in the
>>> script.  Options?
>>
>> HTML::Entities?  WML can't handle this for us?
>
> I don't know.  Maybe?
>

Hmm. But we already escape & in mirror_list.pl, on lines 601, 621
and 675:

https://anonscm.debian.org/viewvc/webwml/webwml/english/mirror/mirror_list.pl?view=markup


601  $sponsorname =~ s/&(\s+)/&amp;$1/g;

And from my command line:

$ ./mirror_list.pl --type sponsors | grep und

produces:

<a href="http://www.1und1.de/">1&amp;1 Internet AG</a>

So how does 1&amp;1 become 1&1 again?

Cheers
--
Laura Arjona Reina
https://wiki.debian.org/LauraArjona

Re: encoding of Mirrors.masterlist

Peter Palfrader
On Mon, 27 Feb 2017, Laura Arjona Reina wrote:

> Hmm. But we already escape & in mirror_list.pl, on lines 601, 621
> and 675:
>
> https://anonscm.debian.org/viewvc/webwml/webwml/english/mirror/mirror_list.pl?view=markup

Yes, and I argue we shouldn't do that.

> $ ./mirror_list.pl --type sponsors | grep und
> produces:
> <a href="http://www.1und1.de/">1&amp;1 Internet AG</a>
>
> So how does 1&amp;1 become 1&1 again?

There are two mirrors provided by 1&1:

| [git|master] weasel@orinoco:~/projects/debian/mirror/masterlist$ ./mirror_list.pl --type sponsors | grep -i internet | grep -i '1[au]nd1'
| <a href="http://www.1und1.de/">1&amp;1 Internet AG</a>
| <a href="http://1and1.com">1&1 Internet</a>


Re: encoding of Mirrors.masterlist

Adam D. Barratt
In reply to this post by Laura Arjona Reina
On 2017-02-27 22:16, Laura Arjona Reina wrote:
> Hmm. But we already escape & in mirror_list.pl, on lines 601, 621
> and 675:
>
> https://anonscm.debian.org/viewvc/webwml/webwml/english/mirror/mirror_list.pl?view=markup
>
>
> 601  $sponsorname =~ s/&(\s+)/&amp;$1/g;

That's "ampersand followed by some whitespace", which will match e.g.
"fish & chips", but not "1&1". (I assume the intent was to avoid
accidentally double-escaping.)

Regards,

Adam
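
Adam's reading of the regex can be checked directly. The following is a
minimal sketch in Python rather than Perl, assuming that Python's re.sub
with the same pattern behaves like the substitution on line 601 for these
inputs. The function name escape_amp_before_space is made up for
illustration and is not part of mirror_list.pl.

```python
import re


def escape_amp_before_space(text: str) -> str:
    """Rough Python equivalent of the Perl substitution s/&(\\s+)/&amp;$1/g.

    Hypothetical helper for illustration only.
    """
    # Rewrite "&" only when it is immediately followed by whitespace,
    # and put the captured whitespace back after the entity.
    return re.sub(r"&(\s+)", r"&amp;\1", text)


if __name__ == "__main__":
    for sample in ("fish & chips", "1&1 Internet", "1&amp;1 Internet AG"):
        print(repr(sample), "->", repr(escape_amp_before_space(sample)))
```

Only an ampersand followed by whitespace is rewritten, so "1&1" passes
through unescaped, while an already-escaped "1&amp;1" is left alone. That
is consistent with the two different lines of output Peter showed.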

Re: encoding of Mirrors.masterlist

Frans Spiesschaert
In reply to this post by Peter Palfrader
Peter Palfrader wrote on Tue 28-02-2017 at 07:04 [+0000]:

> On Mon, 27 Feb 2017, Laura Arjona Reina wrote:
>
> > Hmm. But we already escape & in mirror_list.pl, on lines 601, 621
> > and 675:
> >
> > https://anonscm.debian.org/viewvc/webwml/webwml/english/mirror/mirror_list.pl?view=markup
>
> Yes, and I argue we shouldn't do that.
>
> > $ ./mirror_list.pl --type sponsors | grep und
> > produces:
> > <a href="http://www.1und1.de/">1&amp;1 Internet AG</a>
> >
> > So how does 1&amp;1 become 1&1 again?
>
> There are two mirrors provided by 1&1:
>
> | [git|master] weasel@orinoco:~/projects/debian/mirror/masterlist$ ./mirror_list.pl --type sponsors | grep -i internet | grep -i '1[au]nd1'
> | <a href="http://www.1und1.de/">1&amp;1 Internet AG</a>
> | <a href="http://1and1.com">1&1 Internet</a>
>
So, what action should be taken?
For the moment the "Tidy validation failed" messages keep coming on a
daily basis.
Nothing really grave, but still a little bit annoying.


--
Regards,
Frans


Re: encoding of Mirrors.masterlist

Paul Wise
On Tue, Mar 21, 2017 at 12:42 AM, Frans Spiesschaert wrote:

> So, what action should be taken?

The mirrors masterlist should have all of the HTML entities converted
to UTF-8. There are a number of them in the
Sponsor/Location/Maintainer fields. I've done this just now:

https://anonscm.debian.org/cgit/mirror/mirror-masterlist.git/commit/?id=f7402a9685b7e6174f126b335534de20396d0919

The mirror_list.pl script should be rewritten to use a templating system
that understands HTML and escapes any unsafe characters.

If that is going to take a while, then a short-term solution could be
to remove any custom manual encoding from the mirror_list.pl script and
use an HTML entity escaping library like HTML::Entities from
libhtml-parser-perl. I've done this just now:

https://anonscm.debian.org/viewvc/webwml/webwml/english/mirror/mirror_list.pl?r1=1.170&r2=1.171

--
bye,
pabs

https://wiki.debian.org/PaulWise
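
The two steps Paul describes (keep plain UTF-8 in the masterlist, escape
once when generating HTML) can be sketched as follows. This is a Python
illustration, not the actual change to mirror_list.pl: html.unescape and
html.escape from the Python standard library stand in for HTML::Entities,
and the function names and the "K&ouml;ln" sample value are hypothetical.

```python
import html


def normalise_field(raw: str) -> str:
    """Turn entities stored in a masterlist-style field into plain Unicode.

    Hypothetical helper: "1&amp;1" becomes "1&1", "K&ouml;ln" becomes "Köln".
    """
    return html.unescape(raw)


def render_sponsor(name: str, url: str) -> str:
    """Build a sponsor link, escaping both values exactly once at output time."""
    # quote=True also escapes double and single quotes, which matters
    # because the URL ends up inside a quoted attribute value.
    safe_url = html.escape(url, quote=True)
    safe_name = html.escape(name, quote=False)
    return '<a href="{}">{}</a>'.format(safe_url, safe_name)


if __name__ == "__main__":
    # Data is stored unescaped, so both spellings of the sponsor name
    # come out identically escaped in the generated HTML.
    print(render_sponsor("1&1 Internet", "http://1and1.com"))
    print(render_sponsor(normalise_field("1&amp;1 Internet AG"),
                         "http://www.1und1.de/"))
```

Keeping the stored data free of entities means the output code never has
to guess whether a given "&" has already been escaped, which is the
ambiguity the whitespace-only regex was working around.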

Re: encoding of Mirrors.masterlist

Frans Spiesschaert
Hi Paul,

Thank you very much.

Paul Wise wrote on Tue 21-03-2017 at 12:05 [+0800]:

> On Tue, Mar 21, 2017 at 12:42 AM, Frans Spiesschaert wrote:
>
> > So, what action should be taken?
>
> The mirrors masterlist should have all of the HTML entities converted
> to UTF-8. There are a number of them in the
> Sponsor/Location/Maintainer fields. I've done this just now:
>
> https://anonscm.debian.org/cgit/mirror/mirror-masterlist.git/commit/?id=f7402a9685b7e6174f126b335534de20396d0919
>
> The mirror_list.pl script should be rewritten to use a templating system
> that understands HTML and escapes any unsafe characters.
>
> If that is going to take a while, then a short-term solution could be
> to remove any custom manual encoding from the mirror_list.pl script and
> use an HTML entity escaping library like HTML::Entities from
> libhtml-parser-perl. I've done this just now:
>
> https://anonscm.debian.org/viewvc/webwml/webwml/english/mirror/mirror_list.pl?r1=1.170&r2=1.171
>
--
Regards,
Frans

