[UDD] Is there some effort to port UDD to Python3?

[UDD] Is there some effort to port UDD to Python3?

Andreas Tille-5
Hi,

we all know that Python2 is end of life, but much of the UDD code still
uses Python2.  Is there any effort to port it to Python3?  If not, are
there any volunteers to do this?

Kind regards

      Andreas.

--
http://fam-tille.de

Re: [UDD] Is there some effort to port UDD to Python3?

Lucas Nussbaum-4
Hi,

On 01/04/20 at 15:24 +0200, Andreas Tille wrote:
> Hi,
>
> we all know that Python2 is end of life, but much of the UDD code still
> uses Python2.  Is there any effort to port it to Python3?

Not as far as I know. I suspect that, once it becomes necessary, it will
be easy to do given the codebase is relatively small.

Lucas

Re: [UDD] Is there some effort to port UDD to Python3?

Andreas Tille-2
Hi Lucas,

On Mon, Apr 13, 2020 at 10:40:07PM +0200, Lucas Nussbaum wrote:
> > we all know that Python2 is end of life, but much of the UDD code still
> > uses Python2.  Is there any effort to port it to Python3?
>
> Not as far as I know. I suspect that, once it becomes necessary, it will
> be easy to do given the codebase is relatively small.

I agree that the small code base probably makes it easy.  But I'm
worried about the "once it becomes necessary" part.  We all know that
Python2 is only alive due to our security team and we should actively
work on getting rid of the dependency sooner rather than later.  Working
"under pressure" always makes things harder - no matter how easy it
would be in principle.

I know probably nobody will stop me from doing it - but I'm hesitant to
add another item to my table which is full of Debian Med - Covid-19
stuff.  I'd volunteer to port those importers I've written myself once
somebody gives the signal - but I'd love it if those who have written the
core parts would take the lead (sooner rather than later).

Kind regards

      Andreas.

--
http://fam-tille.de

Re: [UDD] Is there some effort to port UDD to Python3?

Andreas Tille-2
Hi Lucas,

On Tue, Apr 14, 2020 at 08:47:11AM +0200, Andreas Tille wrote:

> >
> > Not as far as I know. I suspect that, once it becomes necessary, it will
> > be easy to do given the codebase is relatively small.
>
> I agree that the small code base probably makes it easy.  But I'm
> worried about the "once it becomes necessary" part.  We all know that
> Python2 is only alive due to our security team and we should actively
> work on getting rid of the dependency sooner rather than later.  Working
> "under pressure" always makes things harder - no matter how easy it
> would be in principle.
>
> I know probably nobody will stop me from doing it - but I'm hesitant to
> add another item to my table which is full of Debian Med - Covid-19
> stuff.  I'd volunteer to port those importers I've written myself once
> somebody gives the signal - but I'd love it if those who have written the
> core parts would take the lead (sooner rather than later).

I need to come back to this topic since I like to test the importers on
my local machines which are usually running testing.  I now get a
conflict since python-debian is needed but can no longer be installed:
it would need python-chardet, which in turn conflicts with the
latest python3-chardet.  So simply picking from snapshot.d.o is not an
option and I think it's time to do the Python3 port.  There are code
contributions from:

$ git log --pretty=format:"%an <%ae>" udd/*.py | sed 's/@3b15d4d3-bb24-0410-9696-dc0fab150647/@debian.org/' | sort | uniq | grep -v -e 'Akshita Jha' -e 'Emmanouil Kiagias' -e ^lucas -e ^tille
Andreas Tille <[hidden email]>
Bas Couwenberg <[hidden email]>
Gianfranco Costamagna <[hidden email]>
Ivo De Decker <[hidden email]>
kroeckx <[hidden email]>
laney <[hidden email]>
Lucas Nussbaum <[hidden email]>
Mattia Rizzolo <[hidden email]>
Ole Streicher <[hidden email]>
Paul Wise <[hidden email]>
themill-guest <[hidden email]>
zack <[hidden email]>

(I left out former GSoC students of mine where I can take over the code
as well as duplicates that are obvious to me.)

So how can we organise the Python3 port of the UDD code base?

Kind regards

      Andreas.

--
http://fam-tille.de

Re: [UDD] Is there some effort to port UDD to Python3?

Lucas Nussbaum
On 13/05/20 at 16:38 +0200, Andreas Tille wrote:

> Hi Lucas,
>
> On Tue, Apr 14, 2020 at 08:47:11AM +0200, Andreas Tille wrote:
> > >
> > > Not as far as I know. I suspect that, once it becomes necessary, it will
> > > be easy to do given the codebase is relatively small.
> >
> > I agree that the small code base probably makes it easy.  But I'm
> > worried about the "once it becomes necessary" part.  We all know that
> > Python2 is only alive due to our security team and we should actively
> > work on getting rid of the dependency sooner rather than later.  Working
> > "under pressure" always makes things harder - no matter how easy it
> > would be in principle.
> >
> > I know probably nobody will stop me from doing it - but I'm hesitant to
> > add another item to my table which is full of Debian Med - Covid-19
> > stuff.  I'd volunteer to port those importers I've written myself once
> > somebody gives the signal - but I'd love it if those who have written the
> > core parts would take the lead (sooner rather than later).
>
> I need to come back to this topic since I like to test the importers on
> my local machines which are usually running testing.

Use the Vagrant development environment?

Lucas

Re: [UDD] Is there some effort to port UDD to Python3?

Andreas Tille-2
On Wed, May 13, 2020 at 05:09:02PM +0200, Lucas Nussbaum wrote:

> On 13/05/20 at 16:38 +0200, Andreas Tille wrote:
> > Hi Lucas,
> >
> > On Tue, Apr 14, 2020 at 08:47:11AM +0200, Andreas Tille wrote:
> > > >
> > > > Not as far as I know. I suspect that, once it becomes necessary, it will
> > > > be easy to do given the codebase is relatively small.
> > >
> > > I agree that the small code base probably makes it easy.  But I'm
> > > worried about the "once it becomes necessary" part.  We all know that
> > > Python2 is only alive due to our security team and we should actively
> > > work on getting rid of the dependency sooner rather than later.  Working
> > > "under pressure" always makes things harder - no matter how easy it
> > > would be in principle.
> > >
> > > I know probably nobody will stop me from doing it - but I'm hesitant to
> > > add another item to my table which is full of Debian Med - Covid-19
> > > stuff.  I'd volunteer to port those importers I've written myself once
> > > somebody gives the signal - but I'd love it if those who have written the
> > > core parts would take the lead (sooner rather than later).
> >
> > I need to come back to this topic since I like to test the importers on
> > my local machines which are usually running testing.
>
> Use the Vagrant development environment?

I admit I've never worked with this.  You said it's pretty simple and I
would guess a python3 branch where everybody commits the code he feels
responsible for and tests it would be sufficient.  But I'm fine to adapt
(if you point me to some doc).

Kind regards

      Andreas.

--
http://fam-tille.de

Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Andreas Tille-5
On Wed, May 13, 2020 at 07:40:56PM +0200, Andreas Tille wrote:
> >
> > Use the Vagrant development environment?
>
> I admit I've never worked with this.  You said it's pretty simple and I
> would guess a python3 branch where everybody commits the code he feels
> responsible for and tests it would be sufficient.  But I'm fine to adapt
> (if you point me to some doc).

I've just followed my proposal to create a python3 branch, fired up 2to3
and fixed some issues manually.  My gatherers blends-prospective, ftpnew
and screenshots should work.  I'm now stumbling upon:

udd(python3) $ ./only-run.sh ddtp
Traceback (most recent call last):
  File "/srv/udd.debian.org/udd//udd.py", line 88, in <module>
    exec("gatherer.%s()" % command)
  File "<string>", line 1, in <module>
  File "/srv/udd.debian.org/udd/udd/ddtp_gatherer.py", line 125, in run
    h.update(f.read())
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 11: invalid continuation byte


I tried to fix this using this patch:


$ git diff
diff --git a/udd/ddtp_gatherer.py b/udd/ddtp_gatherer.py
index 46e588d..7cc625c 100644
--- a/udd/ddtp_gatherer.py
+++ b/udd/ddtp_gatherer.py
@@ -117,7 +117,7 @@ class ddtp_gatherer(gatherer):
           trfile = trfilepath + file
           # check whether hash recorded in index file fits real file
           try:
-            f = open(trfile)
+            f = open(trfile, encoding='utf-8')
           except IOError as err:
             self.log.error("%s: %s.", str(err), trfile)
             continue


but it does not help.  Any hint would be welcome.

Kind regards

       Andreas.

--
http://fam-tille.de

Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Stéphane Blondon
On 14/05/2020 11:43, Andreas Tille wrote:
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 11: invalid continuation byte

The error looks like the one in [1], where the file is not encoded in UTF-8.

1:
https://stackoverflow.com/questions/5552555/unicodedecodeerror-invalid-continuation-byte


> +            f = open(trfile, encoding='utf-8')

`f = open(trfile, encoding='latin-1')`

could be a (temporary?) solution.
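
Why Latin-1 silences the exception, and why it is only a stopgap, can be
shown with a minimal sketch.  The byte strings below are made up for
illustration and are not taken from the Translation files.  Latin-1 maps
every byte value to a character, so decoding never fails, but genuine UTF-8
text is then silently misread.

```python
# A 0xc5 lead byte followed by a non-continuation byte triggers the same
# class of error as in the traceback above.
bad = b"\xc5 "

try:
    bad.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Latin-1 accepts any byte sequence, so the exception disappears ...
as_latin1 = bad.decode("latin-1")

# ... but real UTF-8 text is then mis-decoded without any warning.
mojibake = "é".encode("utf-8").decode("latin-1")
```

So a Latin-1 open would presumably let the gatherer run, at the price of
possibly wrong characters for any input that really is UTF-8.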


Regards,
Stéphane


Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Mattia Rizzolo-5
And, ideally, somebody would contact whoever is providing that file so that they re-encode it with utf8...

On Thu, 14 May 2020, 9:16 pm Stéphane Blondon, <[hidden email]> wrote:
On 14/05/2020 11:43, Andreas Tille wrote:
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 11: invalid continuation byte

The error looks like the one in [1], where the file is not encoded in UTF-8.

1:
https://stackoverflow.com/questions/5552555/unicodedecodeerror-invalid-continuation-byte


> +            f = open(trfile, encoding='utf-8')

`f = open(trfile, encoding='latin-1')`

could be a (temporary?) solution.


Regards,
Stéphane

Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Stéphane Blondon
On 14/05/2020 21:25, Mattia Rizzolo wrote:
> And, ideally, somebody would contact whoever is providing that file so that
> they re-encode it with utf8...

Yes, it's the best long term solution.


> On Thu, 14 May 2020, 9:16 pm Stéphane Blondon, <[hidden email]>
> wrote:
>>
>> `f = open(trfile, encoding='latin-1')`
>>
>> could be a (temporary?) solution.

Andreas, it's possible that changing the encoding will fix the bug for
some files but you will get new errors on other files (encoded in
utf-8). Trying several encodings or using the 'chardet' library could be a
better workaround.
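
The "try several encodings" idea could look roughly like the sketch below.
The helper name `read_text_with_fallback` and the default encoding list are
hypothetical, not part of UDD, and the sketch assumes the file holds
uncompressed text.

```python
def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes the file.

    Hypothetical helper: the order matters, because latin-1 accepts any
    byte sequence and therefore has to come last.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of %r could decode %s" % (encodings, path))
```

Reading the bytes once and decoding in memory avoids opening the file a
second time for every candidate encoding.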


Regards,
Stéphane


Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Andreas Tille-5
On Fri, May 15, 2020 at 08:51:05PM +0200, Stéphane Blondon wrote:
> > And, ideally, somebody would contact whoever is providing that file so that
> > they re-encode it with utf8...
>
> Yes, it's the best long term solution.

Definitely.  But who is providing that file?
 
> >> `f = open(trfile, encoding='latin-1')`
> >>
> >> could be a (temporary?) solution.
>
> Andreas, it's possible that changing the encoding will fix the bug for
> some files but you will get new errors on other files (encoded in
> utf-8). Trying several encodings or using the 'chardet' library could be a
> better workaround.

Would you mind providing a patch with chardet?

Kind regards

      Andreas.

--
http://fam-tille.de

Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Stéphane Blondon
Hello,

On 15/05/2020 21:10, Andreas Tille wrote:
> Would you mind providing a patch with chardet?

There is a patch attached to this e-mail.

I used [1] as the base file. I don't think the patch is great (because
there are two 'open()' calls) but it makes minimal modifications to the
current source code. I think that's the better approach for a successful
migration to python3 (because it avoids introducing bugs during the
migration).


Feel free to ask for more explanations or other stuff if you need.

1: https://salsa.debian.org/qa/udd/-/blob/master/udd/ddtp_gatherer.py

--
Stéphane

Attachment: ddtp_gatherer.py.diff (991 bytes)

Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Andreas Tille-6
Hi Stéphane,

thanks for your patch which I applied in the python3 branch.  Unfortunately
it does not solve the issue:


udd(python3) $ ./update-and-run.sh ddtp
Traceback (most recent call last):
  File "/srv/udd.debian.org/udd//udd.py", line 88, in <module>
    exec("gatherer.%s()" % command)
  File "<string>", line 1, in <module>
  File "/srv/udd.debian.org/udd/udd/ddtp_gatherer.py", line 127, in run
    h.update(f.read())
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 11: invalid continuation byte


Thanks a lot anyway

      Andreas.

On Mon, May 18, 2020 at 01:15:11PM +0200, Stéphane Blondon wrote:

> Hello,
>
> On 15/05/2020 21:10, Andreas Tille wrote:
> > Would you mind providing a patch with chardet?
>
> There is a patch attached to this e-mail.
>
> I used [1] as the base file. I don't think the patch is great (because
> there are two 'open()' calls) but it makes minimal modifications to the
> current source code. I think that's the better approach for a successful
> migration to python3 (because it avoids introducing bugs during the
> migration).
>
>
> Feel free to ask for more explanations or other stuff if you need.
>
> 1: https://salsa.debian.org/qa/udd/-/blob/master/udd/ddtp_gatherer.py
>
> --
> Stéphane

> --- ddtp_gatherer.py.orig 2020-05-17 22:54:21.793075000 +0200
> +++ ddtp_gatherer.py 2020-05-18 13:02:47.210764004 +0200
> @@ -25,6 +25,8 @@
>  import logging
>  import logging.handlers
>  
> +import chardet
> +
>  debug=0
>  
>  def get_gatherer(connection, config, source):
> @@ -117,7 +119,7 @@
>            trfile = trfilepath + file
>            # check whether hash recorded in index file fits real file
>            try:
> -            f = open(trfile)
> +            f = _open_file(trfile)
>            except IOError, err:
>              self.log.error("%s: %s.", str(err), trfile)
>              continue
> @@ -236,6 +238,13 @@
>          except IOError, err:
>            self.log.exception("Error reading %s%s", dir, filename)
>  
> +def _open_file(path):
> +    with open(path, 'rb') as f:
> +        raw_content = f.read()
> +        encoding = chardet.detect(raw_content)["encoding"]
> +    return open(path, encoding=encoding)
> +
> +
>  if __name__ == '__main__':
>    main()
>  





--
http://fam-tille.de

Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Stéphane Blondon
On 18/05/2020 14:59, Andreas Tille wrote:
> thanks for your patch which I applied in the python3 branch.  Unfortunately
> it does not solve the issue

Can you send me the file 'gatherer.${I_dont_know_the_command}' which
raises the UnicodeDecodeError exception? I will try to write a working
patch.


Regards,
Stéphane


Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Andreas Tille-5
On Mon, May 18, 2020 at 08:35:33PM +0200, Stéphane Blondon wrote:
>
> Can you send me the file 'gatherer.${I_dont_know_the_command}' which
> raises the UnicodeDecodeError exception? I will try to write a working
> patch.

I simply added a debug line:

udd(python3) $ git diff
diff --git a/udd/ddtp_gatherer.py b/udd/ddtp_gatherer.py
index bbf041b..d32b85f 100644
--- a/udd/ddtp_gatherer.py
+++ b/udd/ddtp_gatherer.py
@@ -239,6 +239,7 @@ class ddtp_gatherer(gatherer):
           self.log.exception("Error reading %s%s", dir, filename)
 
 def _open_file(path):
+    print(path)
     with open(path, 'rb') as f:
         raw_content = f.read()
         encoding = chardet.detect(raw_content)["encoding"]


which leads to


udd(python3) $ ./update-and-run.sh ddtp
/srv/mirrors/debian/dists/squeeze-proposed-updates/main/i18n/Translation-en.bz2
/srv/mirrors/debian/dists/squeeze-proposed-updates/non-free/i18n/Translation-en.bz2
/srv/mirrors/debian/dists/squeeze-proposed-updates/contrib/i18n/Translation-en.bz2
/srv/mirrors/debian/dists/stretch-proposed-updates/main/i18n/Translation-en.bz2
Traceback (most recent call last):
  File "/srv/udd.debian.org/udd//udd.py", line 88, in <module>
    exec("gatherer.%s()" % command)
  File "<string>", line 1, in <module>
  File "/srv/udd.debian.org/udd/udd/ddtp_gatherer.py", line 127, in run
    h.update(f.read())
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 11: invalid continuation byte


While you can download the files from any Debian mirror I've attached
   /srv/mirrors/debian/dists/stretch-proposed-updates/main/i18n/Translation-en.bz2
to this mail.  My guess is that translations from stretch will not be
touched any more and thus we need to cope somehow with the existing
encoding.
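
The paths printed above all end in .bz2, so these files are bzip2-compressed
binary data rather than text in any particular encoding, which would explain
why no `encoding=` value helps.  A minimal sketch under stated assumptions:
the digest is computed over the raw bytes on disk (hashlib objects in
Python 3 accept bytes, not str), and the text is only obtained after
decompressing.  The function names and the choice of SHA-256 are
hypothetical; the thread does not show which hash the gatherer uses.

```python
import bz2
import hashlib


def file_digest(path, algorithm="sha256"):
    """Hash the file exactly as stored on disk (the compressed bytes)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:  # binary mode: no decoding is attempted
        h.update(f.read())
    return h.hexdigest()


def read_translation(path, encoding="utf-8"):
    """Decompress a .bz2 file and decode its content as text."""
    with bz2.open(path, "rt", encoding=encoding) as f:
        return f.read()
```

If `chardet.detect()` returned `None` as the encoding for such binary input,
`open()` would fall back to the locale default, which might be why the
traceback still names utf-8 after the chardet patch.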

Thanks a lot for your help

    Andreas.

--
http://fam-tille.de

Attachment: Translation-en.bz2 (144K)

Re: Started porting UDD to Python3 (Was: [UDD] Is there some effort to port UDD to Python3?)

Lucas Nussbaum-4
Hi,

Do all those people in Cc need to read this? If you really want to keep
this public, maybe debian-qa@ is enough?  (I personally don't feel I
need to read this at this time; if I had time to spend on UDD, I would
fix actual bugs)

Thanks

Lucas


On 18/05/20 at 21:57 +0200, Andreas Tille wrote:

> On Mon, May 18, 2020 at 08:35:33PM +0200, Stéphane Blondon wrote:
> >
> > Can you send me the file 'gatherer.${I_dont_know_the_command}' which
> > raises the UnicodeDecodeError exception? I will try to write a working
> > patch.
>
> I simply added a debug line:
>
> udd(python3) $ git diff
> diff --git a/udd/ddtp_gatherer.py b/udd/ddtp_gatherer.py
> index bbf041b..d32b85f 100644
> --- a/udd/ddtp_gatherer.py
> +++ b/udd/ddtp_gatherer.py
> @@ -239,6 +239,7 @@ class ddtp_gatherer(gatherer):
>            self.log.exception("Error reading %s%s", dir, filename)
>  
>  def _open_file(path):
> +    print(path)
>      with open(path, 'rb') as f:
>          raw_content = f.read()
>          encoding = chardet.detect(raw_content)["encoding"]
>
>
> which leads to
>
>
> udd(python3) $ ./update-and-run.sh ddtp
> /srv/mirrors/debian/dists/squeeze-proposed-updates/main/i18n/Translation-en.bz2
> /srv/mirrors/debian/dists/squeeze-proposed-updates/non-free/i18n/Translation-en.bz2
> /srv/mirrors/debian/dists/squeeze-proposed-updates/contrib/i18n/Translation-en.bz2
> /srv/mirrors/debian/dists/stretch-proposed-updates/main/i18n/Translation-en.bz2
> Traceback (most recent call last):
>   File "/srv/udd.debian.org/udd//udd.py", line 88, in <module>
>     exec("gatherer.%s()" % command)
>   File "<string>", line 1, in <module>
>   File "/srv/udd.debian.org/udd/udd/ddtp_gatherer.py", line 127, in run
>     h.update(f.read())
>   File "/usr/lib/python3.8/codecs.py", line 322, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 11: invalid continuation byte
>
>
> While you can download the files from any Debian mirror I've attached
>    /srv/mirrors/debian/dists/stretch-proposed-updates/main/i18n/Translation-en.bz2
> to this mail.  My guess is that translations from stretch will not be
> touched any more and thus we need to cope somehow with the existing
> encoding.
>
> Thanks a lot for your help
>
>     Andreas.
>
> --
> http://fam-tille.de