Bug#841499: uscan: support searching in multiple directories for matching files

Paul Wise via nm
Package: devscripts
Severity: wishlist
User: [hidden email]
Usertags: uscan

For upstreams that store their downloads in a directory tree with one
branch per directory, maintainers might want to exclude release
candidates. However, uscan currently only considers the newest directory,
which might contain only release candidates, so it cannot see the latest
stable release. uscan should scan each directory in descending order of
version until at least one matching file is found.

When this is fixed, this sentence needs removing from the manual:

(If multiple directories match, the highest version is picked.)

Here is an example of a watch file that would be fixed by this:

pabs@chianamo ~ $ cat watch
version=3
https://cmake.org/files/(v[\d.]+)/cmake-([\d.]+).tar.gz
pabs@chianamo ~ $ uscan --watchfile watch --verbose --package cmake --upstream-version 3.5
uscan info: uscan (version 2.16.8) See uscan(1) for help
uscan info: Option --watchfile=watch used
uscan info: Process ./watch (package=cmake version=3.5)
uscan info: Last orig.tar.* tarball version (from debian/changelog): 3.5
uscan info: Last orig.tar.* tarball version (dversionmangled): 3.5
uscan info: dir=>/files/  dirpattern=>(v[\d.]+)
uscan info: Requesting URL:
   https://cmake.org/files/
uscan info: Matching pattern:
   (?:(?:https://cmake.org)?\/files\/)?(v[\d.]+)
uscan info: Matching target for dirversionmangle:   ?C=N;O=D
uscan info: Matching target for dirversionmangle:   ?C=M;O=A
uscan info: Matching target for dirversionmangle:   ?C=S;O=A
uscan info: Matching target for dirversionmangle:   ?C=D;O=A
uscan info: Matching target for dirversionmangle:   /
uscan info: Matching target for dirversionmangle:   LatestRelease/
uscan info: Matching target for dirversionmangle:   Tutorial.tar.gz
uscan info: Matching target for dirversionmangle:   contracts/
uscan info: Matching target for dirversionmangle:   contrib/
uscan info: Matching target for dirversionmangle:   cygwin/
uscan info: Matching target for dirversionmangle:   dev/
uscan info: Matching target for dirversionmangle:   lapack_test.tar.gz
uscan info: Matching target for dirversionmangle:   logos/
uscan info: Matching target for dirversionmangle:   mongochem-sample.json.bz2
uscan info: Matching target for dirversionmangle:   radiance/
uscan info: Matching target for dirversionmangle:   temdata/
uscan info: Matching target for dirversionmangle:   tmp/
uscan info: Matching target for dirversionmangle:   tpl/
uscan info: Matching target for dirversionmangle:   v0.5/
uscan info: Matching target for dirversionmangle:   v0.6/
uscan info: Matching target for dirversionmangle:   v0.7/
uscan info: Matching target for dirversionmangle:   v0.8/
uscan info: Matching target for dirversionmangle:   v1.2/
uscan info: Matching target for dirversionmangle:   v1.4/
uscan info: Matching target for dirversionmangle:   v1.6/
uscan info: Matching target for dirversionmangle:   v1.8/
uscan info: Matching target for dirversionmangle:   v2.0/
uscan info: Matching target for dirversionmangle:   v2.2/
uscan info: Matching target for dirversionmangle:   v2.3/
uscan info: Matching target for dirversionmangle:   v2.4/
uscan info: Matching target for dirversionmangle:   v2.6/
uscan info: Matching target for dirversionmangle:   v2.8/
uscan info: Matching target for dirversionmangle:   v3.0/
uscan info: Matching target for dirversionmangle:   v3.1/
uscan info: Matching target for dirversionmangle:   v3.2/
uscan info: Matching target for dirversionmangle:   v3.3/
uscan info: Matching target for dirversionmangle:   v3.4/
uscan info: Matching target for dirversionmangle:   v3.5/
uscan info: Matching target for dirversionmangle:   v3.6/
uscan info: Matching target for dirversionmangle:   v3.7/
uscan info: Matching target for dirversionmangle:   vCVS/
uscan info: Found the following matching directories (newest first):
   v3.7/ (v3.7)
   v3.6/ (v3.6)
   v3.5/ (v3.5)
   v3.4/ (v3.4)
   v3.3/ (v3.3)
   v3.2/ (v3.2)
   v3.1/ (v3.1)
   v3.0/ (v3.0)
   v2.8/ (v2.8)
   v2.6/ (v2.6)
   v2.4/ (v2.4)
   v2.3/ (v2.3)
   v2.2/ (v2.2)
   v2.0/ (v2.0)
   v1.8/ (v1.8)
   v1.6/ (v1.6)
   v1.4/ (v1.4)
   v1.2/ (v1.2)
   v0.8/ (v0.8)
   v0.7/ (v0.7)
   v0.6/ (v0.6)
   v0.5/ (v0.5)
uscan info: newest_dir => 'v3.7'
uscan info: Requesting URL:
   https://cmake.org/files/v3.7/
uscan info: Matching pattern:
   (?:(?:https://cmake.org)?\/files\/v3\.7\/)?cmake-([\d.]+).tar.gz
uscan warn: In watch no matching files for watch line
  https://cmake.org/files/(v[\d.]+)/cmake-([\d.]+).tar.gz
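To illustrate why the v3.7/ directory yields no match: the file part of the
watch line only accepts dotted numeric versions, so a release-candidate name
cannot satisfy it. Below is a minimal Python sketch, assuming the pattern is
matched against the whole link target. The function name and the two file
names are hypothetical and are not copied from the real directory listing.

```python
import re

# File part of the watch line above (kept verbatim, including unescaped dots).
FILE_PATTERN = re.compile(r"cmake-([\d.]+).tar.gz")

def upstream_version(filename):
    """Return the captured version, or None if the name does not match."""
    match = FILE_PATTERN.fullmatch(filename)
    return match.group(1) if match else None

# Hypothetical names: a release candidate and a stable release.
print(upstream_version("cmake-3.7.0-rc1.tar.gz"))  # None
print(upstream_version("cmake-3.6.2.tar.gz"))      # 3.6.2
```

If the newest directory holds only names of the first kind, uscan reports
"no matching files" even though an older directory has a stable tarball.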

--
bye,
pabs

https://wiki.debian.org/PaulWise

Osamu Aoki-7
Hi,

On Fri, Oct 21, 2016 at 04:28:05PM +0800, Paul Wise wrote:
> Package: devscripts
> Severity: wishlist
> User: [hidden email]
> Usertags: uscan
>
> For upstreams that store their downloads in a directory tree with one
> branch per directory, maintainers might want to exclude release
> candidates but uscan currently only considers the newest directory,

Yes, I am aware of this limitation.

> which might only contain release candidates, resulting in not being
> able to see the latest stable release. uscan should scan each directory
> in descending order of version until at least one file was found.
>
> When this is fixed, this sentence needs removing from the manual:
>
> (If multiple directories match, the highest version is picked.)

If we do not do this, we need to loop over scanning many pages... Not a
good idea.  Can you think of a non-invasive change?

> Here is an example of a watch file that would be fixed by this:
...
> uscan info: Requesting URL:
>    https://cmake.org/files/

How about scanning https://cmake.org/download/ instead?

Most HTTP sites have this kind of page.  I think further complicating the
page scanning mechanism for FTP is not a good idea.

Osamu

Paul Wise via nm
On Tue, 2016-10-25 at 01:54 +0900, Osamu Aoki wrote:

> If we do not do this, we need to loop over scanning many pages... Not a
> good idea.  Can you think of non-invasive change?

As I said in the original bug report, scan each directory in descending
order of version until at least one file was found. 

In the normal case this change will not change the behaviour of uscan
at all since a file will be matched on the first directory.

Only in watch files where uscan fails to find a file in the first
directory will my proposal change the behaviour.

For the most common case (RCs in the first directory and releases in
the second), uscan will only download one extra page.

For the cases where the file part of the regex does not match any file
in any subdirectory, we can limit it to 5 requests by default, with a
0.5 second delay between them to reduce impact.
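The proposal so far can be sketched in Python as follows. This is only an
illustration of the idea, not uscan's code: the function names, the
`list_files` callback (standing in for one HTTP request per directory), the
simplified numeric version key, and the sample listing are all assumptions.
The defaults of 5 directories and 0.5 seconds are the values suggested above.

```python
import re
import time

def version_key(name):
    """Simplified sort key: the numeric components of a directory name."""
    return [int(part) for part in re.findall(r"\d+", name)]

def find_in_newest_dir(dirs, list_files, file_pattern, max_dirs=5, delay=0.5):
    """Scan directories newest first; stop at the first one with a match.

    dirs         -- directory names that matched the directory pattern
    list_files   -- callable returning the file names in a directory
    file_pattern -- regex for the file part of the watch line
    Returns (directory, matching_files), or (None, []) if nothing is found
    within max_dirs directories.
    """
    pattern = re.compile(file_pattern)
    ordered = sorted(dirs, key=version_key, reverse=True)
    for index, directory in enumerate(ordered[:max_dirs]):
        if index > 0:
            time.sleep(delay)  # pause between requests to reduce impact
        matches = [f for f in list_files(directory) if pattern.fullmatch(f)]
        if matches:
            return directory, matches
    return None, []

# Hypothetical listing: the newest directory holds only release candidates.
listing = {
    "v3.7": ["cmake-3.7.0-rc1.tar.gz"],
    "v3.6": ["cmake-3.6.2.tar.gz"],
    "v3.5": ["cmake-3.5.2.tar.gz"],
}
result = find_in_newest_dir(
    listing.keys(),
    lambda d: listing[d],
    r"cmake-([\d.]+).tar.gz",
    delay=0,
)
print(result)  # ('v3.6', ['cmake-3.6.2.tar.gz'])
```

When the first directory already contains a match, the loop returns on its
first iteration, so only one directory is requested, just as today.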

> How about scanning https://cmake.org/download/

That is only a workaround for this uscan flaw.

> Most HTTP site has this kind of page.

I've encountered a number of cases over the years on mentors IRC and
other places where this wasn't possible.

The cmake one and most others only show the latest release, which means
that I can't use uscan to download a particular version.

> I think complicating page scanning mechanism

It isn't much of a complication at all really:

On error, if we scanned a directory, go back and scan the next
directory. Possibly with a configurable limit of scanned dirs.

> FTP

FTP has nothing to do with this issue, why do you mention it?

--
bye,
pabs

https://wiki.debian.org/PaulWise

Osamu Aoki-7
Hi,

On Tue, Oct 25, 2016 at 08:41:35AM +0800, Paul Wise wrote:
> On Tue, 2016-10-25 at 01:54 +0900, Osamu Aoki wrote:
>
> > If we do not do this, we need to loop over scanning many pages... Not a
> > good idea.  Can you think of non-invasive change?
...
> It isn't much of a complication at all really:
>
> On error, if we scanned a directory, go back and scan the next
> directory. Possibly with a configurable limit of scanned dirs.

I was thinking of bunching up all possible URL results by scanning all
directories from the lowest version to the highest.  But you have a point.
Scan from the highest version and pick the page which has a matching URL.

This makes sense and is not as bad a situation as I thought.

Just push down all the directories.  Scan from the latest one.

> > FTP
>
> FTP has nothing to do with this issue, why do you mention it?

Yes.  I meant an HTTP site which looks like an old FTP site in terms of its
directory and page structure.

Osamu

Paul Wise via nm
On Tue, 2016-10-25 at 22:39 +0900, Osamu Aoki wrote:

> I was thinking to bunch up all possible URL results by scanning all
> directory from low version to the high version.  But you have a point.
> Scan from high version and pick page which has matching URL.
>
> This makes sense and not as bad situation as I thought. 
>
> Just push down all the directories.  Scan from the latest one.

I'm glad you agree. I had trouble understanding the code, otherwise I
would have just committed this myself.

> Yes.  I meant HTTP site which looks like old FTP site in terms of its
> directory and page structure.

Ah, ok.

--
bye,
pabs

https://wiki.debian.org/PaulWise

Paul Gevers-4
On Tue, 25 Oct 2016 22:39:40 +0900 Osamu Aoki <[hidden email]> wrote:

> Hi,
>
> On Tue, Oct 25, 2016 at 08:41:35AM +0800, Paul Wise wrote:
> > On Tue, 2016-10-25 at 01:54 +0900, Osamu Aoki wrote:
> >
> > > If we do not do this, we need to loop over scanning many pages... Not a
> > > good idea.  Can you think of non-invasive change?
> ...
> > It isn't much of a complication at all really:
> >
> > On error, if we scanned a directory, go back and scan the next
> > directory. Possibly with a configurable limit of scanned dirs.
>
> I was thinking to bunch up all possible URL results by scanning all
> directory from low version to the high version.  But you have a point.
> Scan from high version and pick page which has matching URL.
>
> This makes sense and not as bad situation as I thought.
>
> Just push down all the directories.  Scan from the latest one.

This would fix my current issue with uscan: the version I am looking for
is in 1.95. I don't think upstream wants to publish a newer version, but
nevertheless I'd like to add a watch file to be sure. There are multiple
festvox voices that have the same issue (because of the same upstream).

paul@testavoira ~/packages/festvox/festvox-ellpc11k $ uscan -v
uscan info: uscan (version 2.17.11) See uscan(1) for help
uscan info: Scan watch files in .
uscan info: Check debian/watch and debian/changelog in .
uscan info: package="festvox-ellpc11k" version="1.95-1" (as seen in debian/changelog)
uscan info: package="festvox-ellpc11k" version="1.95" (no epoch/revision)
uscan info: Check debian/watch and debian/changelog in ./.git/refs/tags
uscan info: Check debian/watch and debian/changelog in ./.git/dgit/unpack/festvox-ellpc11k-1.4.0
uscan info: ./debian/changelog sets package="festvox-ellpc11k" version="1.95"
uscan info: Process ./debian/watch (package=festvox-ellpc11k version=1.95)
uscan info: opts:
filenamemangle=s#.*/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/festvox_ellpc11k#festvox-ellpc11k_$1#
uscan info: line:
http://festvox.org/packed/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/
festvox_ellpc11k(?i)\.(?:tar\.xz|tar\.bz2|tar\.gz|zip)
uscan info: Parsing
filenamemangle=s#.*/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/festvox_ellpc11k#festvox-ellpc11k_$1#
uscan info: line:
http://festvox.org/packed/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/
festvox_ellpc11k(?i)\.(?:tar\.xz|tar\.bz2|tar\.gz|zip)
uscan info: Last orig.tar.* tarball version (from debian/changelog): 1.95
uscan info: Last orig.tar.* tarball version (dversionmangled): 1.95
uscan info: dir=>/packed/festival/  dirpattern=>[-_]?(\d[\-+\.:\~\da-zA-Z]*)
uscan info: Requesting URL:
   http://festvox.org/packed/festival/
uscan info: Matching pattern:

(?:(?:http://festvox.org)?\/packed\/festival\/)?[-_]?(\d[\-+\.:\~\da-zA-Z]*)
uscan info: Matching target for dirversionmangle:   ?C=N;O=D
uscan info: Matching target for dirversionmangle:   ?C=M;O=A
uscan info: Matching target for dirversionmangle:   ?C=S;O=A
uscan info: Matching target for dirversionmangle:   ?C=D;O=A
uscan info: Matching target for dirversionmangle:   /packed/
uscan info: Matching target for dirversionmangle:   1.4.1/
uscan info: Matching target for dirversionmangle:   1.4.2/
uscan info: Matching target for dirversionmangle:   1.4.3/
uscan info: Matching target for dirversionmangle:   1.95/
uscan info: Matching target for dirversionmangle:   1.96/
uscan info: Matching target for dirversionmangle:   2.0.95/
uscan info: Matching target for dirversionmangle:   2.1/
uscan info: Matching target for dirversionmangle:   2.4/
uscan info: Matching target for dirversionmangle:   Linux-1.4.1/
uscan info: Matching target for dirversionmangle:   Linux-1.4.2/
uscan info: Matching target for dirversionmangle:   free-1.4.1/
uscan info: Matching target for dirversionmangle:   free-1.4.2/
uscan info: Matching target for dirversionmangle:   free-1.4.3/
uscan info: Matching target for dirversionmangle:   latest/
uscan info: Found the following matching directories (newest first):
   2.4/ (2.4)
   2.1/ (2.1)
   2.0.95/ (2.0.95)
   1.96/ (1.96)
   1.95/ (1.95)
   1.4.3/ (1.4.3)
   1.4.2/ (1.4.2)
   1.4.1/ (1.4.1)
uscan info: newest_dir => '2.4'
uscan info: Requesting URL:
   http://festvox.org/packed/festival/2.4/
uscan info: Matching pattern:

(?:(?:http://festvox.org)?\/packed\/festival\/2\.4\/)?festvox_ellpc11k(?i)\.(?:tar\.xz|tar\.bz2|tar\.gz|zip)
uscan warn: In debian/watch no matching files for watch line
  http://festvox.org/packed/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/
festvox_ellpc11k(?i)\.(?:tar\.xz|tar\.bz2|tar\.gz|zip)
uscan info: opts:
filenamemangle=s#.*/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/voices/festvox_ellpc11k#festvox-ellpc11k_$1#
uscan info: line:
http://festvox.org/packed/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/voices/
festvox_ellpc11k(?i)\.(?:tar\.xz|tar\.bz2|tar\.gz|zip)
uscan info: Parsing
filenamemangle=s#.*/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/voices/festvox_ellpc11k#festvox-ellpc11k_$1#
uscan info: line:
http://festvox.org/packed/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/voices/
festvox_ellpc11k(?i)\.(?:tar\.xz|tar\.bz2|tar\.gz|zip)
uscan warn: more than one main upstream tarballs listed.
uscan info: Last orig.tar.* tarball version (from debian/changelog): 1.95
uscan info: Last orig.tar.* tarball version (dversionmangled): 1.95
uscan info: dir=>/packed/festival/  dirpattern=>[-_]?(\d[\-+\.:\~\da-zA-Z]*)
uscan info: Requesting URL:
   http://festvox.org/packed/festival/
uscan info: Matching pattern:

(?:(?:http://festvox.org)?\/packed\/festival\/)?[-_]?(\d[\-+\.:\~\da-zA-Z]*)
uscan info: Matching target for dirversionmangle:   ?C=N;O=D
uscan info: Matching target for dirversionmangle:   ?C=M;O=A
uscan info: Matching target for dirversionmangle:   ?C=S;O=A
uscan info: Matching target for dirversionmangle:   ?C=D;O=A
uscan info: Matching target for dirversionmangle:   /packed/
uscan info: Matching target for dirversionmangle:   1.4.1/
uscan info: Matching target for dirversionmangle:   1.4.2/
uscan info: Matching target for dirversionmangle:   1.4.3/
uscan info: Matching target for dirversionmangle:   1.95/
uscan info: Matching target for dirversionmangle:   1.96/
uscan info: Matching target for dirversionmangle:   2.0.95/
uscan info: Matching target for dirversionmangle:   2.1/
uscan info: Matching target for dirversionmangle:   2.4/
uscan info: Matching target for dirversionmangle:   Linux-1.4.1/
uscan info: Matching target for dirversionmangle:   Linux-1.4.2/
uscan info: Matching target for dirversionmangle:   free-1.4.1/
uscan info: Matching target for dirversionmangle:   free-1.4.2/
uscan info: Matching target for dirversionmangle:   free-1.4.3/
uscan info: Matching target for dirversionmangle:   latest/
uscan info: Found the following matching directories (newest first):
   2.4/ (2.4)
   2.1/ (2.1)
   2.0.95/ (2.0.95)
   1.96/ (1.96)
   1.95/ (1.95)
   1.4.3/ (1.4.3)
   1.4.2/ (1.4.2)
   1.4.1/ (1.4.1)
uscan info: newest_dir => '2.4'
uscan info: Requesting URL:
   http://festvox.org/packed/festival/2.4/voices/
uscan info: Matching pattern:

(?:(?:http://festvox.org)?\/packed\/festival\/2\.4\/voices\/)?festvox_ellpc11k(?i)\.(?:tar\.xz|tar\.bz2|tar\.gz|zip)
uscan warn: In debian/watch no matching files for watch line

http://festvox.org/packed/festival/[-_]?(\d[\-+\.:\~\da-zA-Z]*)/voices/
festvox_ellpc11k(?i)\.(?:tar\.xz|tar\.bz2|tar\.gz|zip)
uscan info: Scan finished
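
The directory selection in the log above can be approximated with a short
Python sketch. It assumes the directory pattern must match the whole link
target, and it orders versions by their numeric components only, which is a
simplification of whatever comparison uscan really applies. The function
names are hypothetical; the link targets are taken from the log.

```python
import re

# Directory pattern from the watch line in the log above.
DIR_PATTERN = re.compile(r"[-_]?(\d[\-+\.:\~\da-zA-Z]*)")

def version_key(version):
    """Simplified key: compare only the numeric components."""
    return [int(part) for part in re.findall(r"\d+", version)]

def candidate_dirs(hrefs):
    """Return the versions of matching directories, newest first."""
    versions = []
    for href in hrefs:
        match = DIR_PATTERN.fullmatch(href.rstrip("/"))
        if match:
            versions.append(match.group(1))
    return sorted(versions, key=version_key, reverse=True)

# Link targets as listed in the log above.
hrefs = ["?C=N;O=D", "/packed/", "1.4.1/", "1.4.2/", "1.4.3/", "1.95/",
         "1.96/", "2.0.95/", "2.1/", "2.4/", "Linux-1.4.1/", "free-1.4.1/",
         "latest/"]
print(candidate_dirs(hrefs))
# ['2.4', '2.1', '2.0.95', '1.96', '1.95', '1.4.3', '1.4.2', '1.4.1']
```

In this ordering 1.95 is the fifth directory, so a newest-first fallback
would need to look at five directories before reaching it.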

