our repository grew to 0.5 gig - can we reduce that somehow?

our repository grew to 0.5 gig - can we reduce that somehow?

Steffen Möller
Hello,

We should probably be proud of it, but I admit I also find it rather irritating that we have now accumulated half a gig of Debian
folders for Debian Med.

$ du -sh debian-med/
404M    debian-med/

I just checked out the whole tree and, well, a couple of years ago this was my month's download limit. In short: I think we are in
dire need to cut bits out.

My suggestion is to remove the tags, which I never understood or liked, particularly since today (many thanks to Charles) I made
good use of Debian's snapshot machine http://snapshot.debian.org/ for the first time. This would bring it all down to around 130MB.

I don't know how much bandwidth this would save our servers, but the least I can say is that it would save me some time, and
it would probably scare off newbies a bit less when they put the full thing on their laptop.

Other ideas? Or should we just not care?

Best,

Steffen


--
To UNSUBSCRIBE, email to [hidden email]
with a subject of "unsubscribe". Trouble? Contact [hidden email]
Archive: http://lists.debian.org/4C44BE88.4070906@...


Re: our repository grew to 0.5 gig - can we reduce that somehow?

Andreas Tille-6
On Mon, Jul 19, 2010 at 11:07:20PM +0200, Steffen Möller wrote:
> We should probably be proud of it, but I admit I also find it rather irritating that we have now accumulated half a gig of Debian
> folders for Debian Med.
>
> $ du -sh debian-med/
> 404M    debian-med/
>
> I just checked out the whole tree and, well, a couple of years ago this was my month's download limit. In short: I think we are in
> dire need to cut bits out.

Well, comparing with "a couple of years ago" just sucks in computer
science, right? ;-)
However, I see your point.

So I did a

  du -s * | grep -e "^[5-9][0-9]\{3\}" -e "^[0-9]\{5\}"

in the packages dir to find out which packages actually need the most
space.  I decided to keep only the last two or three tags of libgenome,
arb, gnumed-client and gnumed-server, which I have more or less
maintained on my own.
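
A more readable way to get the same list, as a minimal sketch: the grep
pattern above keeps entries of 5000 or more blocks, so the function below
applies the same cut-off numerically and sorts the result.  The name
big_dirs and its optional threshold argument are made up for illustration.

```shell
# List the directories below the current directory that use at least
# MIN_KB kilobytes, largest first.  MIN_KB defaults to 5000, the
# cut-off that the grep pattern encodes.
big_dirs() {
    min_kb="${1:-5000}"
    # du -sk prints "<size in KB><TAB><name>" for each directory.
    du -sk -- */ | awk -v min="$min_kb" '$1 >= min' | sort -rn
}
```

Run it from inside the packages dir, e.g. "big_dirs" or "big_dirs 20000".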
 
> My suggestion is to remove the tags, which I never understood or liked, particularly since today (many thanks to Charles) I made
> good use of Debian's snapshot machine http://snapshot.debian.org/ for the first time. This would bring it all down to around 130MB.

While I know snapshot.debian.org, it sometimes comes in handy to have old
packaging stuff when being offline.  However, I certainly agree that this
very old stuff is probably rarely used, and if you sometimes grep the
SVN for some solution it just consumes time and bloats the output.  If I
remember correctly, emboss is now in git and can be removed from SVN
anyway.  If we use the "keep only the latest tags" approach we probably
have a reasonable compromise.
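
The "keep only the latest tags" approach could be sketched roughly as
below.  The function name old_tags is invented here, and the ordering
relies on GNU "sort -V", which only approximates Debian version ordering
(it knows nothing about epochs or "~"), so the resulting list should be
reviewed by hand before anything is deleted.

```shell
# Read tag names (one per line) on stdin and print every tag except
# the newest KEEP ones (default 3).  Requires GNU sort and GNU head.
old_tags() {
    keep="${1:-3}"
    sort -V | head -n "-$keep"
}

# Hypothetical dry run; $url stands for one package's tags directory:
#   svn ls "$url" | old_tags 3 | while read -r tag; do
#       echo svn rm "$url/$tag" -m "Remove old tag $tag"
#   done
```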
 
> I don't know how much bandwidth this would save our servers, but the least I can say is that it would save me some time, and
> it would probably scare off newbies a bit less when they put the full thing on their laptop.
>
> Other ideas?

I hope you like my idea and it helps a bit.

> Or should we just not care?

We should care if a member of our team sees a problem. ;-)

Kind regards

     Andreas.

--
http://fam-tille.de


--
Archive: http://lists.debian.org/20100719212858.GB13473@...


Re: our repository grew to 0.5 gig - can we reduce that somehow?

Charles Plessy-12
In reply to this post by Steffen Möller
Dear Steffen and Andreas,

I do not recommend removing the tags: they are useful information and, on the
server side, they do not duplicate data. The problem is how to have lean
checkouts without the tags.

Some packaging teams (pkg-perl, in particular) use an alternate structure for
their repository, with an early split for tags and trunks. In that case, it is
possible to check out all the trunks for all the packages, but it is more
difficult to check out the tags and the trunk together for one single package.
I think that we already discussed the possibility of migrating to this
layout, and concluded that nobody has time for this.

There are actually only a couple of packages that take a lot of space. In the
case of EMBOSS I transferred SVN's commits into a git repository, and then
continued the development there. I just deleted the remaining SVN contents
(~60 MB). Sorry for having forgotten: I wanted to wait for the package with the
updated VCS fields to migrate to testing, and then lost the momentum.

Back to the problem of checking out packages without tags, I think that we can
use the 'mr' tool to achieve this. I have done a bit of experimentation in the
context of the Eucalyptus packaging team. If you are interested, you can have
a look at the following wiki page:

http://wiki.debian.org/pkg-eucalyptus#Cross-repositorycheckout

We could also have for Debian Med a single command that would check out all our
packages, regardless of the repository. In parallel, the Blends metapackages
could also include an mr configuration file to check out all the packages
installed by the Blends metapackage. There would be a couple of design
decisions to make (for instance, would we require that the environment variables
DEBEMAIL and DEBFULLNAME are available?), and this could be discussed
on the Blends mailing list.
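
As a rough illustration of the idea, an mr configuration with one section
per package, each checking out only that package's trunk, could be
generated like this.  The function name mrconfig_for, the BASE_URL value
and the <package>/trunk layout are placeholders and assumptions, not the
actual repository address or an agreed design.

```shell
# Print an mr configuration that checks out only the trunk of each
# package named on the command line.  BASE_URL is a placeholder.
BASE_URL="${BASE_URL:-svn://svn.example.org/debian-med/trunk/packages}"

mrconfig_for() {
    for pkg in "$@"; do
        # One "[directory]" section per package with its checkout command.
        printf '[%s]\n' "$pkg"
        printf 'checkout = svn checkout %s/%s/trunk %s\n\n' \
            "$BASE_URL" "$pkg" "$pkg"
    done
}
```

The output of e.g. "mrconfig_for arb libgenome" is meant to be saved as a
.mrconfig file and used with "mr checkout"; depending on the mr version,
the file may first have to be marked as trusted.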

Have a nice day,

--
Charles


--
Archive: http://lists.debian.org/20100719233531.GA25751@...


Re: our repository grew to 0.5 gig - can we reduce that somehow?

Andreas Tille-5
On Tue, Jul 20, 2010 at 08:35:31AM +0900, Charles Plessy wrote:
> I do not recommend removing the tags: they are useful information and, on the
> server side, they do not duplicate data.

Well, this might differ from case to case.  I'd call a sequence of
tags containing just a new changelog entry "New upstream version" more
or less redundant.  If you care about historical data, I would
consider snapshot.debian.org more reliable, because only what showed
up there was really released.  People might have forgotten to tag a
release, or a tagged release might have been rejected for whatever reason.

For me our SVN as a whole is rather some kind of knowledge base of how
people have solved problems similar to ones I might have.  Most of the
tags do not really add knowledge and thus are noise rather than
information if you are seeking some solution.  So handling the tagging
just for the sake of completeness does not make much sense to me.  It
would probably be better to keep only those tags where some basic
restructuring in the packaging happened.  IMHO the Debian
packaging repository is not really a continuous development tree as in
software development but a discrete number of development states which
frequently do not have too many changes from step to step.  Not storing some
steps in between while being able to reconstruct them from a different
source (snapshot.debian.org) seems to be a reasonable compromise to
me (unless I have overlooked something important).

> The problem is how to have lean checkouts without the tags.

That's correct in any case.
 
> Some packaging teams (pkg-perl, in particular) use an alternate structure for
> their repository, with an early split for tags and trunks. In that case, it is
> possible to check out all the trunks for all the packages, but it is more
> difficult to check out the tags and the trunk together for one single package.

But IMHO this is not the problem here.  Steffen would probably be happy
to have only trunk.  And we actually *have* the trunk directory in the
root of our SVN and thus we could move to the layout as it is used in
other teams.

> I think that we already discussed the possibility of migrating to this
> layout, and concluded that nobody has time for this.

Well, thinking again about this:  I would not have time to do this
migration manually, but there are two options:

  1. Somebody writes some code which does the migration automatically.
  2. We might consider a "soft migration" by just moving the tags for every
     package you are touching anyway.  I somehow have some preference
     for this kind of move because it isolates those packages which have
     not been touched for a long time and probably need some care.
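
Either option comes down to one server-side move per package.  A dry-run
sketch is given below; both the source and the target path, the ROOT_URL
value and the function name plan_tag_move are assumptions for
illustration only.  It prints the command instead of running it.

```shell
# Print (do not run) the svn command that would move one package's
# tags into a separate top-level tags/ tree.  ROOT_URL and both path
# layouts are placeholders.
ROOT_URL="${ROOT_URL:-svn+ssh://svn.example.org/svn/debian-med}"

plan_tag_move() {
    pkg="$1"
    # --parents creates missing intermediate directories at the target.
    printf 'svn mv --parents -m "Move %s tags to top-level tags/" %s/trunk/packages/%s/tags %s/tags/packages/%s\n' \
        "$pkg" "$ROOT_URL" "$pkg" "$ROOT_URL" "$pkg"
}
```

For option 1 this could be called in a loop over all package names; for
option 2 it would be called by hand for the package being touched.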
 

> Back to the problem of checking out packages without tags, I think that we can
> use the 'mr' tool to achieve this. I have done a bit of experimentation in the
> context of the Eucalyptus packaging team. If you are interested, you can have
> a look at the following wiki page:
>
> http://wiki.debian.org/pkg-eucalyptus#Cross-repositorycheckout
>
> We could also have for Debian Med a single command that would check out all our
> packages, regardless of the repository. In parallel, the Blends metapackages
> could also include an mr configuration file to check out all the packages
> installed by the Blends metapackage.

Care to provide such a configuration file template?  I have never dived
into mr and will not have time to do so in the near future but the idea
sounds neat.

> There would be a couple of design
> decisions to make (for instance, would we require that the environment variables
> DEBEMAIL and DEBFULLNAME are available?), and this could be discussed
> on the Blends mailing list.

Yes.  I would be happy if you would forward this issue to the Blends list.

Kind regards

         Andreas.

--
http://fam-tille.de


--
Archive: http://lists.debian.org/20100720072726.GA1097@...


Re: our repository grew to 0.5 gig - can we reduce that somehow?

Michael Banck
In reply to this post by Charles Plessy-12
On Tue, Jul 20, 2010 at 08:35:31AM +0900, Charles Plessy wrote:
> I do not recommend removing the tags: they are useful information and, on the
> server side, they do not duplicate data. The problem is how to have lean
> checkouts without the tags.

If you are using 'UNRELEASED' as distribution during development, and
just finalize the changelog before upload (e.g. via dch -r) and commit
that changelog edit with a standardized commit message (e.g. "Final
changelog for foo_1.2-3"), then it is also reasonably easy to check out
a particular revision without needing tags.  Having tags doesn't hurt,
though (except for the reasons mentioned in this thread), of course.
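
A minimal sketch of how such a revision could be looked up, assuming the
commit message convention above is followed exactly.  The function name
release_rev is invented; it reads plain "svn log" output and does a
simple substring match, so "foo_1.2-3" would also match "foo_1.2-30".

```shell
# Read "svn log" output on stdin and print the revision whose commit
# message contains "Final changelog for <release>".
release_rev() {
    awk -v want="Final changelog for $1" '
        # Header lines look like "r123 | user | date | 1 line".
        /^r[0-9]+ \|/ { rev = $1 }
        # First message line containing the wanted text wins.
        index($0, want) && rev { print rev; exit }
    '
}
```

Typical use would be "svn log | release_rev foo_1.2-3" inside a working
copy; the number after the "r" can then be passed to "svn checkout -r"
or "svn update -r".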


Michael


--
Archive: http://lists.debian.org/20100720081347.GD20556@...