Данилово блогче

Wed, 28 Apr 2010

Last year we introduced a feature we internally called 'message sharing': basically a mechanism to directly share translations between different releases of the same project or of the same distribution.

That was a huge improvement in usability (IMO, at least: you translate a string in one release and it's instantly translated in all the others), and it also made Launchpad Translations much more scalable (this one is very tangible). E.g., compared to the full week it used to take us to "open" a new Ubuntu release for translations, it took us just 25 minutes to do that for Karmic and 45 minutes for Lucid.
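To make the idea concrete, here is a minimal sketch of message sharing. It is a toy model, not Launchpad's actual data model: the class name `SharedTranslations` and all of its methods are made up for illustration. The point it shows is that translations live in one store keyed by the string itself, so every release whose template contains that string sees the translation immediately.

```python
from collections import defaultdict


class SharedTranslations:
    """Toy model: one translation store shared by every release of a project.

    Hypothetical illustration only; not how Launchpad stores its data.
    """

    def __init__(self):
        # (msgid, language) -> translated text, shared across all releases.
        self._shared = {}
        # release name -> set of msgids present in that release's template.
        self._templates = defaultdict(set)

    def add_template(self, release, msgids):
        """Record which strings a release's template contains."""
        self._templates[release].update(msgids)

    def translate(self, msgid, language, text):
        """Store a translation once; no per-release copy is made."""
        self._shared[(msgid, language)] = text

    def lookup(self, release, msgid, language):
        """Return the shared translation if the release uses this string."""
        if msgid not in self._templates[release]:
            return None
        return self._shared.get((msgid, language))


store = SharedTranslations()
store.add_template("karmic", ["Open", "Save"])
store.add_template("lucid", ["Open", "Close"])
# Translating once makes the string translated in both releases.
store.translate("Open", "sr", "Отвори")
```

In this model, "opening" a new release only means registering its template, which hints at why the real operation became so much cheaper than copying every translation.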

However, while 'message sharing' has reduced duplication of efforts a lot, it still happens: translators work at the same time upstream and in Ubuntu, and might be translating exactly the same strings.

What can we do to solve that?

Importing latest upstream translations

Well, first off, Launchpad doesn't even know about latest upstream translations. What it gets is upstream translations as they were packaged in a tarball that is the base of an Ubuntu package.

However, that might mean very old translations. For instance, perhaps there was no Ubuntu package re-upload for 3 months. Translations upstream usually get committed directly to a VCS. They'll flow into Ubuntu only when they get packaged into tarballs, and those tarballs become the basis for a new package in Ubuntu.

Today, maintainers decide when to release translations to the world, and packagers decide what upstream releases go to Ubuntu users

This means that there are two high bars for translations to flow over before they can get into Ubuntu:

  • Upstream maintainers need to release a tarball with updated translations
  • Ubuntu packagers need to prepare updated packages from these tarballs, and sometimes they can't even do that (without merging from VCS directly, because upstream might not be releasing 'translations updates' tarballs)

How about we eliminate these blockers with Launchpad?

So, we want Launchpad to directly import upstream translations from their VCS of choice. Luckily, we can depend on our amazing Launchpad Code team and the Bazaar community to provide us with a bzr branch no matter what the upstream VCS of choice is. And we already have imports from bzr branches, so we are all set, right?

Well, not exactly. Projects don't like to keep their generated files in their repos. And for upstream projects, we can't really ask them to (since we know it's a bad idea anyway). So, we need to be able to generate templates (POT files) on the fly.

However, that is a very touchy job which depends on the upstream. I.e. it's not the same thing if you are generating a template for GNOME, KDE or a regular GNU (gettext-using) project. And many a script that needs to be run to do this could be very risky: intltool itself has a number of obvious implementation details that would let any upstream committer take over the machine it is run on. So, this has to happen in a safe, sandboxed environment.

Not surprisingly, Launchpad already has this with Soyuz. We just need to slightly modify it so we can run template generation jobs on it.

We've split this into two separate steps: developing a library that allows us to generate templates for a particular source code layout (module named "pottery" inside the LP tree, currently only supporting intltool layouts), and working on the infrastructure to run these on the existing Launchpad build farm.
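As a rough illustration of what "recognising an intltool layout" involves, here is a minimal sketch. It is not pottery's actual code, and the function names `find_intltool_dirs` and `parse_potfiles` are made up for this example. It assumes only that an intltool-based project keeps a `POTFILES.in` file listing its translatable sources, where a line may carry a bracketed prefix such as `[type: gettext/glade]`.

```python
import os
import re

# intltool allows a bracketed prefix such as "[type: gettext/glade]"
# in front of a path, and a bare "[encoding: UTF-8]" line.
_PREFIX = re.compile(r"^\[[^\]]*\]\s*")


def find_intltool_dirs(tree_root):
    """Yield directories under tree_root that contain a POTFILES.in file."""
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        if "POTFILES.in" in filenames:
            yield dirpath


def parse_potfiles(text):
    """Return the source paths listed in the text of a POTFILES.in file."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        line = _PREFIX.sub("", line)
        if line:  # a bare "[encoding: UTF-8]" line leaves nothing behind
            paths.append(line)
    return paths
```

Note that this sketch only reads files. Actually producing a POT file means running the project's own tooling, which is exactly the part that needs the sandboxed build farm.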

After translations are committed to upstream VCS, we should import them into Launchpad asap

We are in the process of doing extensive QA on this code, and we expect to roll it out next week. But, this is just a step of our bigger vision.

As a side-note, this feature will also be useful for intltool-based projects hosting their code and translations in Launchpad: they won't have to keep POT files committed either.

In Ubuntu or in Launchpad

We could have gone one route and simply imported these upstream translations directly into Ubuntu. It'd be a big win, but it wouldn't work very well for those upstreams which are already in Launchpad. And, since we are looking a bit further into the future, there are other drawbacks to that approach as well (like being able to send translations back upstream).

So, we decided that it's best to import them directly into Launchpad projects, keep their upstream templates there for the future, but keep those translations read-only.

Now, Launchpad's internal database model already has a sort of definition of "upstream", though it was never exactly so (which is why we always struggled with the name: over time, the term went from "published" to "imported", and now finally to "upstream").

After many discussions of different approaches, we decided to go with the one that fixes the is_imported flag.

This will enable us to share translations directly between upstreams in LP (and because of the feature that we are QAing right now, we'll have the latest upstream translations in there already, no matter where the project is hosted) and Ubuntu source packages.

The way we are going about this is very similar to the message sharing we have today. It's just that now different privileges come into play as well, making it all suitably more complex to handle.

This is something that we are actively working on, and something that we hope to deliver in May.

Pushing latest imported translations into regular Ubuntu language pack updates is the final stage

Before we can even consider calling this done, we'll have to do a lot of testing. And we'll need help from the community to get everything set up. The first thing to do is to go around Launchpad and make sure that for every source package with translations in Ubuntu there is a linked upstream project, and that the upstream project has a trunk branch that syncs with the latest upstream source code.

Next, we'll really need some serious QA to happen. If you are no stranger to Python code, checking out the Launchpad tree and trying out pottery on all the intltool branches you can think of would give us very useful input.

Or, if there is a favourite i18n layout of yours that you'd like us to support, extending pottery and our auto-approver to deal with it would be a very welcome addition.

Even going ahead and splitting pottery into a separate branch and module would be nice, because it would make it more re-usable (for instance, it could then be used in GNOME's damned-lies) and easier to extend for people not directly interested in Launchpad.

And... How about giving back?

Ubuntu will get latest translations from upstreams then, which is all pretty neat. But, how about contributing the translation fixes back as well?

That is a natural next step. Having the latest templates and translations in Launchpad will allow us to generate very precise diffs between Ubuntu and upstream translations (i.e. we'll know which strings are Ubuntu-specific, and we'll know which translations are newer). Then, we'll have to figure out how to submit those upstream.
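A minimal sketch of such a diff, under loud assumptions: each side is represented as a plain mapping from msgid to a (text, last-changed timestamp) pair, and the function name `diff_translations` is hypothetical. Launchpad's real model is considerably richer than this.

```python
from datetime import datetime


def diff_translations(ubuntu, upstream):
    """Compare two {msgid: (text, last_changed)} mappings.

    Returns a dict with three sorted lists of msgids:
      'ubuntu_only'    - strings that exist only on the Ubuntu side
      'ubuntu_newer'   - texts differ and the Ubuntu one changed later
      'upstream_newer' - texts differ and upstream changed later (or a tie)
    """
    result = {"ubuntu_only": [], "ubuntu_newer": [], "upstream_newer": []}
    for msgid in sorted(ubuntu):
        text, changed = ubuntu[msgid]
        if msgid not in upstream:
            result["ubuntu_only"].append(msgid)
            continue
        up_text, up_changed = upstream[msgid]
        if text == up_text:
            continue  # identical translations need no action
        if changed > up_changed:
            result["ubuntu_newer"].append(msgid)
        else:
            result["upstream_newer"].append(msgid)
    return result


# Made-up sample data, purely for illustration.
ubuntu = {
    "Open": ("Отвори", datetime(2010, 4, 1)),
    "Save": ("Сачувај", datetime(2010, 4, 20)),
    "Ubuntu Software Center": ("Убунтуов софтверски центар", datetime(2010, 4, 5)),
}
upstream = {
    "Open": ("Отвори", datetime(2010, 3, 1)),
    "Save": ("Сними", datetime(2010, 3, 15)),
}
diff = diff_translations(ubuntu, upstream)
```

Only the 'ubuntu_only' and 'ubuntu_newer' buckets would be candidates for sending back upstream. How to send them is the open question.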

Should that happen automatically or should it be user-initiated? How will Launchpad talk to each of the upstreams? Launchpad should talk to every upstream the way they prefer, and that may mean per-project, per-translation-team policies. But, I'll come back to this topic once we have the foundation done with getting the latest upstream translations into Ubuntu.

Nice to see this moving forward! :)
— Posted by Peteris Krisjanis at Wed Apr 28 12:16:41 2010
In my view, the failure is built into the very foundations here.

The idea that there are 'Ubuntu translations' vs 'upstream translations' is the wrong premise on which you build this house of cards.

Translations are an integral part of development, and should happen upstream, just like everything else.

Building an elaborate workflow to move translations back and forth is just an attempt to cover that up.
— Posted by Matthias Clasen at Wed Apr 28 12:52:09 2010
Matthias, well, it's as wrong as saying that 'Ubuntu code' vs 'upstream code' is wrong. Yes, ideally, all development would happen upstream, but unfortunately, it doesn't happen.

At least not with "bigger" distributions like Ubuntu, Fedora, Debian, OpenSuse: they all have their code patches against the upstream tarball releases.

One can decide they don't like the reality, but it doesn't make it not be real.
— Posted by Danilo at Wed Apr 28 13:00:43 2010
And not to mention that upstreams never "maintain" an old stable release after a new one comes out. With this complex "unnecessary" machinery, we can get latest translation updates even into Ubuntu Hardy which includes Gnome 2.22.
— Posted by Danilo at Wed Apr 28 13:09:13 2010

You provide translation updates as part of LTS updates on a regular basis?

For gnome... how many times have you rolled updated translations into packages for gnome packages in Hardy?  Once a quarter? Once every 6 months?

Or more generally speaking... how much of the translation work that is going on in Launchpad would not be acceptable to external upstream translation teams? 10%? 40%? Does the LTS relevance of this machinery live in the 1% of launchpad translator activity that is currently happening?  if so...then yes.. you are building an overly complex machinery for dealing with LTS translation "patchsets"

Speaking of patchsets.  As a percentage of lines of code.. if you took all the published packages and examined the applied patches...what is the percentage of distro specific patching that any distributor does? 1%? 10%

In the gnome project codebase right now... how much patching does ubuntu do prior to release versus post release? How much of that is submitted to upstream?

Now how much translation patching as a percentage of translatable strings does Ubuntu do? How much of that is prior to Ubuntu release versus via updates? How much of that is submitted to upstream?

Let's have a data-centric view of what the actual workflow is, to ground some of the discussion about what an efficient workflow should actually look like.
— Posted by Jef Spaleta at Wed Apr 28 19:57:41 2010
Jef, those are all good questions to ask, but I only have some of the data you are wondering about. Hardy translation updates happened more often early on after the release, and only maybe once every six months today (e.g., the last one we did was in January: http://packages.ubuntu.com/hardy-updates/language-pack-sr-base). However, this is not only about LTSes: other distribution releases get translation updates way after their initial release as well, and way after upstream decides to stop releasing translation updates.

With "message sharing", we also increase the activity over all distribution releases (because translators have to fix stuff only once in any series). That means that we get translation updates to Karmic today even though you might be working only on Lucid.

As for how much of the translation activity going on in Launchpad would not be acceptable to upstream, there is no metric you can have (it involves knowing each and every language, and fighting history). You can guess, though. It's likely more than 10%, but also a lot less than 40%. There are things we can (and do) measure that will implicitly tell you about these things, but nothing will give you a clear-cut answer.

As far as the amount of translations goes, though, a lot more than 1% of them come through Launchpad. It's more like 10%. For example, in our database we have something like 5% of strings that first appeared in Launchpad, and then appeared upstream as well (this is across all of LP, and when restricted to Lucid as well). Of all the translation submissions we have, another 20% originate in Launchpad (for Lucid at least).

Today most of it goes upstream as well, but it's hard to know exactly where something got in first (i.e. it might be that someone has uploaded a translation to LP and committed it upstream at about the same time, and actually, I think that's quite common). I also know quite a few people that don't work in Launchpad because it's so arcane to get translations back upstream. They still do prefer the Launchpad workflow over the upstream one. Most of the very active Launchpad translators are upstream translators as well, so it's hard to come up with good numbers about where they do most of their translations.

I don't have data on the actual code patches.
— Posted by Danilo at Thu Apr 29 10:03:08 2010
A note on "upstream" wrt. KDE. Some languages use internal workflow similar to message sharing in intent, but implemented such that packaged POs (or branch POs within VCS) are not the actual source POs ("summit" POs within VCS). A summit PO is most of the time named same as its dependent branch POs, but not always. When deriving branch POs from a summit PO, messages may undergo some automatic processing, which is language specific and generally not reversible. In this setup, I think that automatic upstreaming of translations (based on packaged/VCS branch POs) is not practically feasible.

One probably quick and easy (though not totally trivial) helper for upstreams could be making available the up-to-date compendium PO of all Launchpad translations for a given language, at a given fixed link. Possibly also project-specific compendiums for sufficiently big aggregations (Gnome, KDE, etc.) Then upstream translators could use them in various workflows as fitting.
— Posted by Chusslove Illich at Thu Apr 29 11:35:28 2010
Chusslove, thanks for the input: sounds very interesting. Is this something like conversion to Latin and Jekavian (Ijekavian?) for Serbian? Or is that post-processing that happens as part of the built-in scripting language for translations in KDE?
(I don't know enough about it, so I might be totally off-base here :)
— Posted by Danilo at Thu Apr 29 16:47:37 2010
For Serbian, the summit-to-branches ("scatter") processing is indeed primarily about conversion to script and dialect combinations, but there are a few other things, like expansion of XML-like entities or substitution of ordinary with nobreak hyphens on case endings (to acronyms). And, last I've seen, in Norwegian they even do some orthographic modifications. But, this processing on scatter is only a secondary convenience, the primary purpose of the summit system is to smooth out handling of translation branches (http://techbase.kde.org/Localization/Workflows/PO_Summit), like translation sharing in Launchpad.

The (runtime) translation scripting bit is technically orthogonal to previous, and thus also to Launchpad operations, no issues there.
— Posted by Chusslove Illich at Thu Apr 29 18:28:11 2010
Oh, that sounds pretty cool. It should be technically possible to import directly from the "summit" POs into Launchpad, though it will involve a bit more work.

The conversion itself would be a different topic, though, and we are not yet in a position to tackle such things.
— Posted by Danilo at Thu Apr 29 19:48:41 2010





Danilo Segan

This is the blog (web log) of Danilo Šegan (or Данило Шеган).

