Bug#881692: command-not-found: I re-wrote command-not-found

Shawn Landden
Package: command-not-found
Severity: wishlist

I rewrote command-not-found to get rid of the Python dependency and
to reduce the database size, so as to reduce memory usage.

https://github.com/shawnl/command-not-found

I was preparing to upload it to mentors as command-not-found-ng

-- System Information:
Debian Release: buster/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386, arm64, armhf

Kernel: Linux 4.14.0-rc7-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8), LANGUAGE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages command-not-found depends on:
ii  apt-file     3.1.5
ii  lsb-release  9.20170808
ii  python       2.7.14-1
ii  python-gdbm  2.7.14-1

command-not-found recommends no packages.

command-not-found suggests no packages.

Shawn Landden
Oh, I forgot about the spelling feature....

On Nov 13, 2017 22:36, "Shawn Landden" <[hidden email]> wrote:
> Package: command-not-found
> Severity: wishlist
>
> I re-wrote command-not-found to get rid of the python dependency, and
> to reduce the database size, as to reduce memory usage.
>
> https://github.com/shawnl/command-not-found
>
> I was preparing to upload it to mentors as command-not-found-ng


Shawn Landden
close 881692
thanks

On Mon, Nov 13, 2017 at 10:53 PM, Shawn Landden <[hidden email]> wrote:
> Oh, I forgot about the spelling feature....
>
> On Nov 13, 2017 22:36, "Shawn Landden" <[hidden email]> wrote:
> > Package: command-not-found
> > Severity: wishlist
> >
> > I re-wrote command-not-found to get rid of the python dependency, and
> > to reduce the database size, as to reduce memory usage.
> >
> > https://github.com/shawnl/command-not-found
> >
> > I was preparing to upload it to mentors as command-not-found-ng

Closing, as I forgot to code the spelling feature.

Julian Andres Klode
(forwarding this to ubuntu-devel-discuss and Zygmunt)

On Mon, Nov 13, 2017 at 10:33:39PM -0800, Shawn Landden wrote:
> Package: command-not-found
> Severity: wishlist
>
> I re-wrote command-not-found to get rid of the python dependency, and
> to reduce the database size, as to reduce memory usage.
>
> https://github.com/shawnl/command-not-found
>
> I was preparing to upload it to mentors as command-not-found-ng

I also rewrote it years ago, using the same database format, just in
C. It was a lot faster. I don't understand the memory-usage part: the
database is memory-mapped, not read into memory, so memory usage
should be roughly constant no matter how large the database is.
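A minimal Python sketch of why a memory-mapped lookup keeps memory use flat: a binary search over a sorted, memory-mapped file only touches the few pages it probes. The file layout (bytewise-sorted "command<TAB>package" lines) and the lookup() helper are hypothetical illustrations, not the format used by either C implementation:

```python
import mmap


def lookup(path, command):
    """Return the packages listed for `command` in a sorted database file.

    Hypothetical format: one "command<TAB>package" entry per line, lines
    sorted bytewise. Only the pages touched by the search are read.
    Note: mmap cannot map an empty file, so the file must be non-empty.
    """
    key = command.encode() + b"\t"
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        size = len(m)
        # Binary search for the first line that sorts >= key; lo and hi
        # always point at line starts (or past the end of the file).
        lo, hi = 0, size
        while lo < hi:
            mid = (lo + hi) // 2
            start = m.rfind(b"\n", 0, mid) + 1  # start of the line containing mid
            end = m.find(b"\n", start)
            if end == -1:
                end = size
            if m[start:end] < key:
                lo = end + 1
            else:
                hi = start
        # All entries for the command are contiguous from here on.
        packages = []
        pos = lo
        while pos < size:
            end = m.find(b"\n", pos)
            if end == -1:
                end = size
            line = m[pos:end]
            if not line.startswith(key):
                break
            packages.append(line[len(key):].decode())
            pos = end + 1
        return packages
```

Each probe reads one line, so a lookup touches O(log n) pages regardless of file size; whether those pages are already cached is what dominates the cold-cache timings discussed later in the thread.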

Questions/Comments for your approach:

* Did you test your format on a slow HDD with caches dropped? It
  must not be slower than the Python one (that one is way too slow
  already). I did: yours seems to be faster (0.4 vs 0.68 seconds),
  though I believe the database-based C rewrite was faster still.
* update-command-not-found should use apt-get indextargets
* You don't store components, hence you cannot tell people to enable a
  component. That's a very important use case for Ubuntu, where
  not all components are enabled by default, but the database is
  shipped in the package.

  You could just append /<component> to each package name I think,
  and strip it away when displaying.
* You should use getopt_long() to parse command-line options, and
  support -h, --help :)
* pts_lbsearch belongs in usr/lib/..., not usr/share/...

* You don't implement a closest matches function:

        $ command-not-found thunderbrd
        No command 'thunderbrd' found, did you mean:
         Command 'thunderbird' from package 'thunderbird' (main)
        thunderbrd: command not found
        $ ./command-not-found thunderbrd
        thunderbrd: command not found

   This one is really important. People do make typos or misremember
   command names, so the tool needs to be able to deal with that

   It should be easy to implement, though you might have to search
   multiple times - once for each alternative. All you need is

        def similar_words(word):
            """Return a set of alternative spellings at edit distance 1.

            Based on http://norvig.com/spell-correct.html"""
            alphabet = 'abcdefghijklmnopqrstuvwxyz-_'
            s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
            deletes    = [a + b[1:] for a, b in s if b]
            transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
            replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
            inserts    = [a + c + b     for a, b in s for c in alphabet]
            return set(deletes + transposes + replaces + inserts)

    Then search for each word it returns. You don't need to search for
    those at all if you have a direct match.

* It needs to be translated - also very important.

* You need to Conflict with command-not-found and not Break AFAIUI

* You should not depend on grep, sed or coreutils, as they are Essential.

* You do have to Depend on apt-file, as that configures apt to download
  the Contents files

* You should not have identifiers starting with _ in the program, these
  are reserved for the C implementation (like _cleanup_free_).
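A minimal Python sketch of how the closest-matches lookup could fit together, reusing the similar_words() function above (repeated so the block runs on its own). The db dict stands in for the real database and is hypothetical; it maps each command to "package/component" strings, following the suggestion to append /<component>, and the component is split off only when printing a line in the Python tool's output style:

```python
def similar_words(word):
    """Return the set of strings at edit distance 1 from `word` (after Norvig)."""
    alphabet = "abcdefghijklmnopqrstuvwxyz-_"
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, db):
    """Return (command, package, component) tuples for `word`.

    `db` is a hypothetical dict: command -> list of "package/component".
    An exact match wins; alternatives are searched only when there is none.
    """
    if word in db:
        candidates = [word]
    else:
        candidates = sorted(w for w in similar_words(word) if w in db)
    results = []
    for cmd in candidates:
        for entry in db[cmd]:
            package, _, component = entry.partition("/")
            results.append((cmd, package, component))
    return results


db = {"thunderbird": ["thunderbird/main"], "vim": ["vim/main", "vim-tiny/main"]}
for cmd, package, component in suggest("thunderbrd", db):
    # prints:  Command 'thunderbird' from package 'thunderbird' (main)
    print(f" Command '{cmd}' from package '{package}' ({component})")
```

With a sorted on-disk database, each candidate becomes one lookup, so a word of length n costs roughly 56n + 27 probes in the worst case; skipping them on a direct match keeps the common path cheap.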

Yes, and these are basically the same reasons my C prototype is
not in the archive. Also, I did not put a lot of work into it, as
I was waiting for PackageKit to take that over, but that has not
happened yet.

I think it's a worthwhile approach, and I can see it replacing
command-not-found if those tiny issues have been fixed. Then you
could also avoid the -ng moniker, and just take over the main
package (if Zygmunt does not mind), which also avoids a month-long
NEW process :)

--
Debian Developer - deb.li/jak | jak-linux.org - free software dev
Ubuntu Core Developer                              de, en speaker

Zygmunt Krynicki
Hey everyone.

Thank you for your interest in command-not-found.

On Tue, Nov 14, 2017 at 8:50 AM, Julian Andres Klode <[hidden email]> wrote:

> (forwarding this to ubuntu-devel-discuss and Zygmunt)
>
> On Mon, Nov 13, 2017 at 10:33:39PM -0800, Shawn Landden wrote:
>> Package: command-not-found
>> Severity: wishlist
>>
>> I re-wrote command-not-found to get rid of the python dependency, and
>> to reduce the database size, as to reduce memory usage.
>>
>> https://github.com/shawnl/command-not-found
>>
>> I was preparing to upload it to mentors as command-not-found-ng
>
> I also rewrote it years ago, but using the same database format,
> just in C. It was a lot faster. I don't understand the memory usage
> bit - it should not matter how large the database is, it's memory
> mapped, and not read into memory, as such memory usage should be
> roughly constant.
>
> Questions/Comments for your approach:
>
> * Did you test your format on a slow HDD with caches dropped? It
>   must not be slower than the Python one (that one is way too slow
>   already) - I did, it seems to be faster (0.4 vs 0.68 seconds)
>   - I believe the database-based C rewrite was even much faster,
>   though.

> * update-command-not-found should use apt-get indextargets

> * You don't store components, hence you cannot tell people to enable
>   component. That's a very important use case for Ubuntu, where
>   not all components are enabled by default, but the database is
>   shipped in the package.
>
>   You could just append /<component> to each package name I think,
>   and strip it away when displaying.

I would love it if we had a compact representation of a mapping from a
name to a list of bits of information, where each bit can be a small
structure with some data. Apart from components for the Ubuntu archive,
it could be used to store facts about snap packages, flatpaks, etc. I
would try to avoid a simplistic command -> package mapping, as that
would force us to encode things into strings in an ad-hoc way.

> * You should use getopt_long() to parse command-line options, and
>   support -h, --help :)

> * pts_lbsearch belongs into usr/lib/..., not usr/share/...
>
> * You don't implement a closest matches function:
>
>         $ command-not-found thunderbrd
>         No command 'thunderbrd' found, did you mean:
>          Command 'thunderbird' from package 'thunderbird' (main)
>         thunderbrd: command not found
>         $ ./command-not-found thunderbrd
>         thunderbrd: command not found
>
>    This one is really important. People do make typos or misremember
>    command names, so the tool needs to be able to deal with that

+1 on this; the function should not be too hard to implement in C.

>    Should be easy to implement though, although you might hav

> * You need to Conflict with command-not-found and not Break AFAIUI

Ideally, to ease the transition, you should do something about the
Python APIs. If you can keep them (either as pure-Python bindings or
just as a compatible implementation) that would be a plus. If you want
to drop them then please announce that and see if anything rdepends on
it.

> * You do have to Depend on apt-file, as that configures apt to download
>   the Contents files

I didn't look at the details, but I hope this is a build dependency
and that it will be processed somewhere on the archive side.

> I think it's a worthwhile approach, and I can see it replacing
> command-not-found if those tiny issues have been fixed. Then you
> could also avoid the -ng moniker, and just take over the main
> package (if Zygmunt does not mind), which also avoids a month
> long NEW process :)

Yes, though I'd like to participate as we're working on
command-not-found improvements in snapd and would like to have
something that fits Debian, Ubuntu as well as (eventually but not
conflicting at least) Fedora and openSUSE (at least the snapd part).

Best regards
ZK

Julian Andres Klode
On Tue, Nov 14, 2017 at 01:00:54PM +0100, Zygmunt Krynicki wrote:

> Hey everyone.
>
> Thank you for your interest in command-not-found.
>
> On Tue, Nov 14, 2017 at 8:50 AM, Julian Andres Klode <[hidden email]> wrote:
> > (forwarding this to ubuntu-devel-discuss and Zygmunt)
> >
> > On Mon, Nov 13, 2017 at 10:33:39PM -0800, Shawn Landden wrote:
> >> Package: command-not-found
> >> Severity: wishlist
> >>
> >> I re-wrote command-not-found to get rid of the python dependency, and
> >> to reduce the database size, as to reduce memory usage.
> >>
> >> https://github.com/shawnl/command-not-found
> >>
> >> I was preparing to upload it to mentors as command-not-found-ng
> >
> > I also rewrote it years ago, but using the same database format,
> > just in C. It was a lot faster. I don't understand the memory usage
> > bit - it should not matter how large the database is, it's memory
> > mapped, and not read into memory, as such memory usage should be
> > roughly constant.
> >
> > Questions/Comments for your approach:
> >
> > * Did you test your format on a slow HDD with caches dropped? It
> >   must not be slower than the Python one (that one is way too slow
> >   already) - I did, it seems to be faster (0.4 vs 0.68 seconds)
> >   - I believe the database-based C rewrite was even much faster,
> >   though.
>
> > * update-command-not-found should use apt-get indextargets
>
> > * You don't store components, hence you cannot tell people to enable
> >   component. That's a very important use case for Ubuntu, where
> >   not all components are enabled by default, but the database is
> >   shipped in the package.
> >
> >   You could just append /<component> to each package name I think,
> >   and strip it away when displaying.
>
> I would love if we have a compact representation of mapping from name
> to list of bits of information where each bit can be a small structure
> with some data. Apart from components for ubuntu archive it could be
> used to store facts about snap packages, flatpaks, etc. I would try to
> avoid a simplistic command -> package mapping as that will force us to
> encode things into strings in an ad-hoc way.

That makes sense to me. But then we're back to a db, I guess. I sort
of like this minimal approach. One approach, of course, would be to
store a key/value map after the package, something like:

        file<SEP><LEN>package name
        followed by multiple:
                <LEN>key
                <LEN>value

        where lengths are 32-bit (16 bit?) integers.

Should not be too hard. Alternatively, this also works:

        file<SEP><LEN>packagename
                 <LEN> for each field
                 value for each field

        and then you can index stuff

                offset(attr_{i+1}) = offset(attr_i) + attrs[i]

Lots of options to extend.
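A minimal Python sketch of the second layout (length table, then values, with offsets as a running sum), assuming 16-bit little-endian lengths (so at most 65535 bytes per item), a NUL byte as <SEP>, and a hypothetical fixed field list (component, section) known to the program. How consecutive records are delimited within the file is left out:

```python
import struct
from itertools import accumulate

SEP = b"\0"                        # hypothetical <SEP> byte after the command name
FIELDS = ("component", "section")  # hypothetical fixed field order known to the program


def encode(command, package, values):
    """Encode: command SEP <len>package, one <len> per field, then the values."""
    fields = [values[name].encode() for name in FIELDS]
    pkg = package.encode()
    return (command.encode() + SEP
            + struct.pack("<H", len(pkg)) + pkg                   # 16-bit length + name
            + struct.pack(f"<{len(fields)}H", *map(len, fields))  # length table
            + b"".join(fields))                                   # values, back to back


def decode(record):
    """Inverse of encode(); returns (command, package, {field: value})."""
    command, _, rest = record.partition(SEP)
    (pkg_len,) = struct.unpack_from("<H", rest, 0)
    package = rest[2:2 + pkg_len].decode()
    table = 2 + pkg_len
    lengths = struct.unpack_from(f"<{len(FIELDS)}H", rest, table)
    base = table + 2 * len(FIELDS)  # first value starts right after the length table
    # offset(attr_{i+1}) = offset(attr_i) + len(attr_i): a running sum of lengths.
    offsets = [base] + [base + n for n in accumulate(lengths)]
    values = {name: rest[offsets[i]:offsets[i + 1]].decode()
              for i, name in enumerate(FIELDS)}
    return command.decode(), package, values
```

Because field positions are fixed, no field names are stored per record; adding a field means appending to FIELDS and bumping a format version.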

>
> > * You should use getopt_long() to parse command-line options, and
> >   support -h, --help :)
>
> > * pts_lbsearch belongs into usr/lib/..., not usr/share/...
> >
> > * You don't implement a closest matches function:
> >
> >         $ command-not-found thunderbrd
> >         No command 'thunderbrd' found, did you mean:
> >          Command 'thunderbird' from package 'thunderbird' (main)
> >         thunderbrd: command not found
> >         $ ./command-not-found thunderbrd
> >         thunderbrd: command not found
> >
> >    This one is really important. People do make typos or misremember
> >    command names, so the tool needs to be able to deal with that
>
> +1 on this, the function should be not too hard to implement in C.
>
> >    Should be easy to implement though, although you might hav
>
> > * You need to Conflict with command-not-found and not Break AFAIUI
>
> Ideally, to ease the transition, you should do something about the
> python APIs. If you can keep them (either as pure-python bindings or
> just as a compatible implementation) that would be a plus. If you want
> to drop them then please announce that and see if anything rdepends on
> it.

Oh, hmm.

>
> > * You do have to Depend on apt-file, as that configures apt to download
> >   the Contents files
>
> I didn't look at the details but I (hope) this is a build dependency
> and this will be processed somewhere on the archive side.

That's a Debian-only dependency forced upon us by ftpmaster; on Ubuntu
we can ship the data in the package (or, preferably, in a separate
command-not-found-data source package).


>
> > I think it's a worthwhile approach, and I can see it replacing
> > command-not-found if those tiny issues have been fixed. Then you
> > could also avoid the -ng moniker, and just take over the main
> > package (if Zygmunt does not mind), which also avoids a month
> > long NEW process :)
>
> Yes, though I'd like to participate as we're working on
> command-not-found improvements in snapd and would like to have
> something that fits Debian, Ubuntu as well as (eventually but not
> conflicting at least) Fedora and openSUSE (at least the snapd part).

I'd like that.

John R. Lenton
On 14 November 2017 at 12:34, Julian Andres Klode <[hidden email]> wrote:

> On Tue, Nov 14, 2017 at 01:00:54PM +0100, Zygmunt Krynicki wrote:
>> I would love if we have a compact representation of mapping from name
>> to list of bits of information where each bit can be a small structure
>> with some data. Apart from components for ubuntu archive it could be
>> used to store facts about snap packages, flatpaks, etc. I would try to
>> avoid a simplistic command -> package mapping as that will force us to
>> encode things into strings in an ad-hoc way.
>
> That makes sense to me. But then we're back on a db, I guess. I sort
> like this minimal approach.

I was thinking in the other direction: I was going to see how it
behaved with SQLite as the store. Would that be objectionable?

Julian Andres Klode
On Tue, Nov 14, 2017 at 03:35:02PM +0000, John Lenton wrote:

> On 14 November 2017 at 12:34, Julian Andres Klode <[hidden email]> wrote:
> > On Tue, Nov 14, 2017 at 01:00:54PM +0100, Zygmunt Krynicki wrote:
> >> I would love if we have a compact representation of mapping from name
> >> to list of bits of information where each bit can be a small structure
> >> with some data. Apart from components for ubuntu archive it could be
> >> used to store facts about snap packages, flatpaks, etc. I would try to
> >> avoid a simplistic command -> package mapping as that will force us to
> >> encode things into strings in an ad-hoc way.
> >
> > That makes sense to me. But then we're back on a db, I guess. I sort
> > like this minimal approach.
>
> I was thinking in the other direction, was going to see how it behaved
> with sqlite as the store. Would that be objectionable?

Using a relational database for a simple key -> structure mapping seems
overkill and a mismatch for the problem, and the SQL does not make it
more readable.

I'd play with LMDB and Kyoto Cabinet, two high-performance key-value
file databases, and then encode a structure as mentioned before.

For the text file approach, we can even go human-readable, like git:

git just encodes lengths as fixed-width ASCII numbers (its pkt-line
format uses four hex digits); we can do the same with decimal digits,
and then just encode (length, key), (length, data) pairs one after
another (or, as mentioned, just use the "index" as the field id and
store field ids in the program). That uses a bit more space, but it
encodes everything in a format you could read with a text editor, and
should not be terribly less efficient.
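A minimal Python sketch of that text-readable layout, assuming four-digit zero-padded decimal lengths (so each key or value is limited to 9999 bytes) and (length, key), (length, data) pairs written back to back; WIDTH and the helper names are hypothetical:

```python
WIDTH = 4  # hypothetical: four decimal digits, so at most 9999 bytes per item


def encode_pairs(pairs):
    """Encode (key, value) pairs as <len>key<len>value..., readable as plain text."""
    out = []
    for key, value in pairs:
        for item in (key, value):
            data = item.encode()
            if len(data) >= 10 ** WIDTH:
                raise ValueError(f"{item!r} is too long for a {WIDTH}-digit length")
            out.append(f"{len(data):0{WIDTH}d}".encode() + data)
    return b"".join(out)


def decode_pairs(blob):
    """Inverse of encode_pairs(); returns a list of (key, value) tuples."""
    items, pos = [], 0
    while pos < len(blob):
        length = int(blob[pos:pos + WIDTH])  # e.g. b"0011" -> 11
        pos += WIDTH
        items.append(blob[pos:pos + length].decode())
        pos += length
    return list(zip(items[0::2], items[1::2]))


encoded = encode_pairs([("package", "thunderbird"), ("component", "main")])
print(encoded)  # b'0007package0011thunderbird0009component0004main'
```

The fixed width is what makes the format seekable: a reader always knows that the next WIDTH bytes are a length, so it can skip a value without scanning it for a delimiter.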

The thing is: This needs to be as efficient as possible: it should
be below 100ms (or better 50ms), regardless of whether caches are dropped
or not.

                        Python code    Shawn's code
SSD, cache                    50ms             5ms
SSD, cache dropped           256ms            15ms
HDD, cache                    50ms             5ms
HDD, cache dropped           530ms            15ms
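Numbers like those in the table can be collected with a small harness. A minimal Python sketch, assuming Linux, where writing 3 to /proc/sys/vm/drop_caches (as root, after syncing) drops the page cache; the time_command() and drop_caches() helpers are hypothetical:

```python
import os
import statistics
import subprocess
import time


def drop_caches():
    """Write dirty pages back, then drop the page cache (Linux; needs root)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def time_command(argv, runs=5, cold=False):
    """Return the median wall-clock time, in ms, of running argv `runs` times."""
    samples = []
    for _ in range(runs):
        if cold:
            drop_caches()
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)
```

For example, run as root, time_command(["command-not-found", "thunderbrd"], cold=True) approximates a "cache dropped" row, and the same call with cold=False approximates a "cache" row. The median damps one-off outliers from other disk activity.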

I guess Shawn's code could even be improved in performance by
avoiding the subprocess execution, avoiding various ld cache
lookups and library loads.

That said, space requirements might matter too.

Shawn Landden


On Nov 14, 2017 8:15 AM, "Julian Andres Klode" <[hidden email]> wrote:
> On Tue, Nov 14, 2017 at 03:35:02PM +0000, John Lenton wrote:
> > On 14 November 2017 at 12:34, Julian Andres Klode <[hidden email]> wrote:
> > > On Tue, Nov 14, 2017 at 01:00:54PM +0100, Zygmunt Krynicki wrote:
> > >> I would love if we have a compact representation of mapping from name
> > >> to list of bits of information where each bit can be a small structure
> > >> with some data. Apart from components for ubuntu archive it could be
> > >> used to store facts about snap packages, flatpaks, etc. I would try to
> > >> avoid a simplistic command -> package mapping as that will force us to
> > >> encode things into strings in an ad-hoc way.
> > >
> > > That makes sense to me. But then we're back on a db, I guess. I sort
> > > like this minimal approach.
> >
> > I was thinking in the other direction, was going to see how it behaved
> > with sqlite as the store. Would that be objectionable?
>
> Using a relational database for a simple key -> structure mapping seems
> overkill and a mismatch for the problem, and the SQL does not make it
> more readable.
>
> I'd play with lmdb and kyotocabinet, these are two high-performance
> key-value file databases and then encode a structure as mentioned
> before.

I had some Kyoto Cabinet code (I maintain that package, which, by the way, is in mentors), but this way is at least half the size. Kyoto Cabinet is 1 MB, and it almost doubles the size of the database, even with the lower-overhead B-tree back end; these entries are just very small.

> For the text file approach, we can even go human, readable, like git:
>
> git just encodes a number in a fixed-length decimal number, we can do
> the same, and then just encode (length, key), (length, data) pairs after
> each other (or as mentioned, just use the "index" as the field id, and
> store field ids in the program). Uses a bit more space, but encodes
> everything in a format you could read with a text editor, and should
> not be terribly less efficient.
>
> The thing is: This needs to be as efficient as possible: it should
> be below 100ms (or better 50ms), regardless of whether caches are dropped
> or not.
>
>                 Python code     |       Shawn's code
>
> SSD, cache              50ms                    5ms
> SSD, " dropped         256ms                   15ms
> HDD, cache              50ms                    5ms
> HDD, " dropped         530ms                   15ms
>
> I guess Shawn's code could even be improved in performance by
> avoiding the subprocess execution, avoiding various ld cache
> lookups and library loads.

I am going to have to bring it in-process to add the spell-check code.

> That said, space requirements might matter too.

Shawn Landden
On Mon, Nov 13, 2017 at 11:50 PM, Julian Andres Klode <[hidden email]> wrote:
> (forwarding this to ubuntu-devel-discuss and Zygmunt)
>
> On Mon, Nov 13, 2017 at 10:33:39PM -0800, Shawn Landden wrote:
> > Package: command-not-found
> > Severity: wishlist
> >
> > I re-wrote command-not-found to get rid of the python dependency, and
> > to reduce the database size, as to reduce memory usage.
> >
> > https://github.com/shawnl/command-not-found
> >
> > I was preparing to upload it to mentors as command-not-found-ng
>
> I also rewrote it years ago, but using the same database format,
> just in C. It was a lot faster. I don't understand the memory usage
> bit - it should not matter how large the database is, it's memory
> mapped, and not read into memory, as such memory usage should be
> roughly constant.
>
> Questions/Comments for your approach:
>
> * Did you test your format on a slow HDD with caches dropped? It
>   must not be slower than the Python one (that one is way too slow
>   already) - I did, it seems to be faster (0.4 vs 0.68 seconds)
>   - I believe the database-based C rewrite was even much faster,
>   though.

Yes. As disk I/O is where all the time goes, I think it's best to keep the file small; then it has a better chance of staying in memory.

> * update-command-not-found should use apt-get indextargets

Fixed.

> * You don't store components, hence you cannot tell people to enable
>   component. That's a very important use case for Ubuntu, where
>   not all components are enabled by default, but the database is
>   shipped in the package.
>
>   You could just append /<component> to each package name I think,
>   and strip it away when displaying.

Fixed.

> * You should use getopt_long() to parse command-line options, and
>   support -h, --help :)

Fixed.

> * pts_lbsearch belongs into usr/lib/..., not usr/share/...

The separate binary is gone.

> * You don't implement a closest matches function:
>
>         $ command-not-found thunderbrd
>         No command 'thunderbrd' found, did you mean:
>          Command 'thunderbird' from package 'thunderbird' (main)
>         thunderbrd: command not found
>         $ ./command-not-found thunderbrd
>         thunderbrd: command not found
>
>    This one is really important. People do make typos or misremember
>    command names, so the tool needs to be able to deal with that
>
>    Should be easy to implement though, although you might have to
>    search multiple times - once for each alternative. All you need is
>
>         def similar_words(word):
>             """ return a set with spelling1 distance alternative spellings
>
>                 based on http://norvig.com/spell-correct.html"""
>             alphabet = 'abcdefghijklmnopqrstuvwxyz-_'
>             s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
>             deletes    = [a + b[1:] for a, b in s if b]
>             transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
>             replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
>             inserts    = [a + c + b     for a, b in s for c in alphabet]
>             return set(deletes + transposes + replaces + inserts)
>
>     And search for what that returns. And you don't need to search for those
>     at all if you have a direct match.

Fixed, and I believe the output is bit-for-bit identical.

> * It needs to be translated - also very important.

I made a .pot file and used the translations from the Python version, but I can't get my program to look for translations (as observed through strace). I have read the gettext manual and do not know what I am doing wrong.

> * You need to Conflict with command-not-found and not Break AFAIUI

Fixed.

> * You should not depend on grep, sed, coreutils, they are Essential.

Fixed; it now uses Ruby, as my shell script was hacky.

> * You do have to Depend on apt-file, as that configures apt to download
>   the Contents files

Fixed.

> * You should not have identifiers starting with _ in the program, these
>   are reserved for the C implementation (like _cleanup_free_).

Fixed.

> Yes, and these are basically the same reasons my C prototype is
> not in the archive. Also, I did not put a lot of work into it, as
> I was waiting for PackageKit to take that over, but that was not
> done yet.
>
> I think it's a worthwhile approach, and I can see it replacing
> command-not-found if those tiny issues have been fixed. Then you
> could also avoid the -ng moniker, and just take over the main
> package (if Zygmunt does not mind), which also avoids a month
> long NEW process :)

Shawn Landden

> * Did you test your format on a slow HDD with caches dropped? It
>   must not be slower than the Python one (that one is way too slow
>   already) - I did, it seems to be faster (0.4 vs 0.68 seconds)
>   - I believe the database-based C rewrite was even much faster,
>   though

I tested a Kyoto Cabinet backend, and it was slower with dropped caches on a hard drive (1 second), which is the slow case I am most concerned with. Small size makes a difference. The code is at https://github.com/shawnl/command-not-found/tree/kyotocabinet

Colin Watson
On Thu, Nov 16, 2017 at 05:10:19PM -0800, Shawn Landden wrote:
> On Mon, Nov 13, 2017 at 11:50 PM, Julian Andres Klode <[hidden email]>
> wrote:
> > * It needs to be translated - also very important.
>
> I made a pot file and used translations from the python version, but I
> can't get my app to look for translations (as examined through strace). I
> read the gettext manual and do not know what I am doing wrong.

Looking at
https://github.com/shawnl/command-not-found/blob/master/command-not-found.c,
your problem appears to be that you aren't calling setlocale().  You
should normally call this before calling bindtextdomain() and
textdomain():

  setlocale(LC_ALL, "");

(The gettext manual does cover this, but possibly you were looking at
some different bit of it.)

--
Colin Watson                                       [[hidden email]]

Shawn Landden
On Thu, Nov 16, 2017 at 6:44 PM, Colin Watson <[hidden email]> wrote:
> On Thu, Nov 16, 2017 at 05:10:19PM -0800, Shawn Landden wrote:
> > On Mon, Nov 13, 2017 at 11:50 PM, Julian Andres Klode <[hidden email]>
> > wrote:
> > > * It needs to be translated - also very important.
> >
> > I made a pot file and used translations from the python version, but I
> > can't get my app to look for translations (as examined through strace). I
> > read the gettext manual and do not know what I am doing wrong.
>
> Looking at
> https://github.com/shawnl/command-not-found/blob/master/command-not-found.c,
> your problem appears to be that you aren't calling setlocale().  You
> should normally call this before calling bindtextdomain() and
> textdomain():
>
>   setlocale(LC_ALL, "");
>
> (The gettext manual does cover this, but possibly you were looking at
> some different bit of it.)

I managed to reuse all of the existing command-not-found's translations from Launchpad.

Xen
Julian Andres Klode wrote on 14-11-2017 8:50:

> * You should not depend on grep, sed, coreutils, they are Essential.

Can I ask what this means?

I actually assume that these dependencies are not *required*, not that
you can't use the tools.

Shawn Landden

On Thu, Nov 16, 2017 at 11:39 PM, Xen <[hidden email]> wrote:
> Julian Andres Klode wrote on 14-11-2017 8:50:
>
> > * You should not depend on grep, sed, coreutils, they are Essential.
>
> Can I ask what this means?
>
> I actually assume that these dependencies are not *required*, not that you can't use the tools.

Required: yes. The highest priority. sysvinit was Required: yes until systemd came along https://www.debian.org/doc/debian-policy/#priorities

Speaking of which, I couldn't use 'apt-get indextargets' from the shell and had to rewrite it in Ruby, because sed does not support lazy matching, and I don't know how else to match "anything except \n\n". (It also doesn't seem to support multiple submatches.) Old regular expression implementations are showing their age (not to mention Perl's non-regular features).
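Splitting on blank lines first sidesteps the need for lazy matching altogether. A minimal Python sketch, assuming the default deb822-style output of "Field: value" lines with records separated by a blank line; continuation lines are skipped, and the field names and paths in the sample are illustrative only:

```python
import re


def parse_stanzas(text):
    """Split deb822-style text into a list of {field: value} dicts."""
    records = []
    for block in re.split(r"\n[ \t]*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            if not line or line[0].isspace():
                continue  # continuation lines are not handled in this sketch
            field, sep, value = line.partition(":")
            if sep:
                record[field] = value.strip()
        if record:
            records.append(record)
    return records


sample = """\
MetaKey: main/Contents-amd64
Created-By: Contents
Filename: /var/lib/apt/lists/example_main_Contents-amd64.lz4

MetaKey: main/binary-amd64/Packages
Created-By: Packages
Filename: /var/lib/apt/lists/example_main_binary-amd64_Packages
"""

contents = [r["Filename"] for r in parse_stanzas(sample) if r.get("Created-By") == "Contents"]
print(contents)
```

Once each record is a dict, selecting the Contents files is a plain filter rather than a multi-line regular expression.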

Xen
Shawn Landden wrote on 17-11-2017 8:55:

> On Thu, Nov 16, 2017 at 11:39 PM, Xen <[hidden email]> wrote:
> Julian Andres Klode wrote on 14-11-2017 8:50:
>
> * You should not depend on grep, sed, coreutils, they are Essential.
> Can I ask what this means?
>
> I actually assume that these dependencies are not *required*, not that
> you can't use the tools.
>
> Required: yes. The highest priority. sysvinit was Required: yes until
> systemd came along https://www.debian.org/doc/debian-policy/#priorities

----------

What I mean is that since they are already on every system, you don't have
to declare them as dependencies.

"Packages are not required to declare any dependencies they have on
other packages which are marked Essential (see below), and should not do
so unless they depend on a particular version of that package. [4]"

From the same document.

As to sed, yeah, it has issues with multi-line matching anyway.

Some people use perl to achieve the same.

Julian Andres Klode-4
In reply to this post by Shawn Landden-5
On Thu, Nov 16, 2017 at 11:55:07PM -0800, Shawn Landden wrote:

> On Thu, Nov 16, 2017 at 11:39 PM, Xen <[hidden email]> wrote:
>
> > Julian Andres Klode wrote on 14-11-2017 8:50:
> >
> > * You should not depend on grep, sed, coreutils, they are Essential.
> >>
> >
> > Can I ask what this means?
> >
> > I actually assume that these dependencies are not *required*, not that you
> > can't use the tools.
>
> Required: yes. The highest priority. sysvinit was Required: yes until
> systemd came along https://www.debian.org/doc/debian-policy/#priorities

What it actually means is that you don't have to declare them in Depends
fields. And required is a priority, that's distinct. Essential basically
is the set of packages dpkg needs for its own operation.
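To make the distinction concrete: `Essential` and `Priority` are two separate fields of a package's control stanza, as stored in `/var/lib/dpkg/status`. A package's priority therefore says nothing by itself about whether it is Essential. Below is a hedged C sketch of reading a single field from one stanza. `field_value` is hypothetical, and for brevity it ignores continuation lines and case-insensitive field names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of `field` from one stanza of
   "Name: value" lines into out (outsz must be at least 1).
   Returns 1 if the field was found, 0 otherwise. */
static int field_value(const char *stanza, const char *field,
                       char *out, size_t outsz)
{
    size_t flen = strlen(field);
    const char *line = stanza;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t llen = eol ? (size_t)(eol - line) : strlen(line);

        /* Match "Field:" exactly at the start of the line. */
        if (llen > flen && strncmp(line, field, flen) == 0 &&
            line[flen] == ':') {
            const char *v = line + flen + 1;
            while (v < line + llen && *v == ' ')
                v++;                          /* skip space after ':' */
            size_t vlen = (size_t)(line + llen - v);
            if (vlen >= outsz)
                vlen = outsz - 1;             /* truncate to fit */
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return 1;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return 0;
}
```

Only `Essential: yes` brings a package under the Policy rule Xen quotes about not declaring dependencies. `Priority: required` alone does not.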

>
> Speaking of, I can't use 'apt-get indextargets' from shell and had to
> rewrite in ruby, because sed doesn't support lazy matching, and I don't
> know how else to match NOT \n\n. (it also doesn't seem to support multiples
> of submatches.) Old regular expression implementations are showing their
> age (not to mention perl's non-regular features).

Ruby is just a major no go. At that system level, the best choices
are Perl, Shell, and C++. Maybe Python (on Ubuntu it's in ubuntu-minimal,
but in Debian it's only used by standard priority and less, perl on the
other hand is required and essential). Ruby has the lowest priority
- optional.

--
Debian Developer - deb.li/jak | jak-linux.org - free software dev
Ubuntu Core Developer                              de, en speaker

Shawn Landden-5

> Ruby is just a major no go.

Re-written in C.

And in the future, what about Lua? It is only 300KB.

> At that system level, the best choices
> are Perl, Shell, and C++. Maybe Python (on Ubuntu it's in ubuntu-minimal,
> but in Debian it's only used by standard priority and less, perl on the
> other hand is required and essential). Ruby has the lowest priority
> - optional.
>
> --
> Debian Developer - deb.li/jak | jak-linux.org - free software dev
> Ubuntu Core Developer                              de, en speaker

Shawn Landden-5
In reply to this post by Julian Andres Klode-4
On Mon, Nov 13, 2017 at 11:50 PM, Julian Andres Klode <[hidden email]> wrote:

> (forwarding this to ubuntu-devel-discuss and Zygmunt)
>
> On Mon, Nov 13, 2017 at 10:33:39PM -0800, Shawn Landden wrote:
>> Package: command-not-found
>> Severity: wishlist
>>
>> I re-wrote command-not-found to get rid of the python dependancy, and
>> to reduce the database size, as to reduce memory usage.
>>
>> https://github.com/shawnl/command-not-found
>>
>> I was preparing to upload it to mentors as command-not-found-ng
>
> I also rewrote it years ago, but using the same database format,
> just in C. It was a lot faster. I don't understand the memory usage
> bit - it should not matter how large the database is, it's memory
> mapped, and not read into memory, as such memory usage should be
> roughly constant.
>
> Questions/Comments for your approach:
>
> * Did you test your format on a slow HDD with caches dropped? It
>   must not be slower than the Python one (that one is way too slow
>   already) - I did, it seems to be faster (0.4 vs 0.68 seconds)
>   - I believe the database-based C rewrite was even much faster,
>   though.
I switched it to mmap() and am now getting 0.27-0.45 seconds with caches
dropped, even after adding translations. It is 100% C and sh. (same
postinst and postrm)
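Here is a minimal sketch of the read-only mapping under discussion, using only POSIX calls. `map_db` and the database path its caller passes in are hypothetical and are not taken from either rewrite. Pages are faulted in only when touched, so resident memory depends on how much of the file a lookup reads, not on the file's total size.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only. On success stores the size
   in *size and returns the mapping; returns NULL on any error, including
   an empty file (mmap() cannot map zero bytes). */
static const char *map_db(const char *path, size_t *size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *size = (size_t)st.st_size;
    return p;
}
```

`munmap((void *)p, size)` releases the mapping. For a one-shot program like command-not-found, exiting does the same.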

Ping.

Shawn Landden-5
I re-wrote command-not-found in C. It consists of two C programs: command-not-found, which gets triggered by bash, and update-command-not-found, which digests the data obtained with apt-file update.
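On the lookup side, a sorted database needs no separate index. The hedged sketch below assumes a hypothetical format, not necessarily the one in the repository. Each entry is one `command<TAB>package` line, sorted bytewise (e.g. with `LC_ALL=C sort`), and the whole table is held in memory or mapped with mmap(). `find_package` and `cmp_line` are illustrative names. The search runs over byte offsets and backs up to the start of whichever line each probe lands in, so it touches only the pages around roughly log2(size) probe points.

```c
#include <stddef.h>
#include <string.h>

/* Compare the command field of the line at `line` (ended by '\t', '\n'
   or `end`) with `cmd`, strcmp-style. */
static int cmp_line(const char *line, const char *end, const char *cmd)
{
    for (size_t i = 0;; i++) {
        int a = 0;
        if (line + i < end && line[i] != '\t' && line[i] != '\n')
            a = (unsigned char)line[i];
        int b = (unsigned char)cmd[i];
        if (a != b || a == 0)
            return a - b;
    }
}

/* Hypothetical lookup in db[0..size). On a hit, *pkg/*pkglen describe
   the package name (not NUL-terminated) and 1 is returned. */
static int find_package(const char *db, size_t size, const char *cmd,
                        const char **pkg, size_t *pkglen)
{
    size_t lo = 0, hi = size;   /* window [lo, hi), both at line starts */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        /* Back up to the start of the line containing `mid`. */
        size_t start = mid;
        while (start > lo && db[start - 1] != '\n')
            start--;
        const char *nl = memchr(db + start, '\n', size - start);
        size_t next = nl ? (size_t)(nl - db) + 1 : size;

        int c = cmp_line(db + start, db + size, cmd);
        if (c == 0) {
            const char *tab = memchr(db + start, '\t', next - start);
            if (tab == NULL)
                return 0;            /* malformed line */
            *pkg = tab + 1;
            *pkglen = (size_t)(db + next - *pkg) - (nl ? 1 : 0);
            return 1;
        }
        if (c < 0)
            lo = next;               /* entry sorts before cmd */
        else
            hi = start;              /* entry sorts after cmd */
    }
    return 0;
}
```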

AFAIK there is only one rough edge, in that the parsing of /etc/apt/sources.list is not the same as apt's parsing. I do not know enough C++ to use libapt to do this.
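The deliberately simplified C sketch below shows why a hand-written parser is a rough edge. It splits one one-line-style entry (`deb [options] uri suite [components...]`). `struct source_entry` and `parse_source_line` are hypothetical. The parser treats everything after `#` as a comment and skips a bracketed option block without interpreting it. It knows nothing about deb822 `.sources` files in `/etc/apt/sources.list.d/`, which is exactly where it would diverge from apt. libapt-pkg remains the authoritative route.

```c
#include <string.h>

#define MAX_COMPONENTS 8

/* Hypothetical result type: pointers into the caller's modified line. */
struct source_entry {
    const char *type;                        /* "deb" or "deb-src" */
    const char *uri;
    const char *suite;
    const char *components[MAX_COMPONENTS];
    int ncomponents;
};

/* Hypothetical parser. Modifies `line` in place and uses strtok(), so it
   is not reentrant. Returns 1 on success, 0 for blank, comment or
   malformed lines. */
static int parse_source_line(char *line, struct source_entry *e)
{
    const char *sep = " \t\r\n";

    char *hash = strchr(line, '#');
    if (hash != NULL)
        *hash = '\0';                        /* drop the comment */

    char *tok = strtok(line, sep);
    if (tok == NULL)
        return 0;                            /* blank or comment-only */
    if (strcmp(tok, "deb") != 0 && strcmp(tok, "deb-src") != 0)
        return 0;
    e->type = tok;

    tok = strtok(NULL, sep);
    /* Skip an "[ key=value ... ]" block without interpreting it. */
    if (tok != NULL && tok[0] == '[') {
        while (tok != NULL && tok[strlen(tok) - 1] != ']')
            tok = strtok(NULL, sep);
        if (tok != NULL)
            tok = strtok(NULL, sep);
    }
    if (tok == NULL)
        return 0;
    e->uri = tok;

    e->suite = strtok(NULL, sep);
    if (e->suite == NULL)
        return 0;

    /* Zero components is accepted (flat repositories). */
    e->ncomponents = 0;
    while (e->ncomponents < MAX_COMPONENTS &&
           (tok = strtok(NULL, sep)) != NULL)
        e->components[e->ncomponents++] = tok;
    return 1;
}
```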

https://github.com/shawnl/command-not-found/

-Shawn Landden