Discussion:
proposal for splitting translation tests into a separate project
David Shea
2015-11-19 16:29:40 UTC
Permalink
Sometimes we make a Fedora release in which translations crash anaconda,
and that sucks, and I'd like us not to do that. I have the beginnings of
an idea of how not to do that and I'd like some feedback.

We have a handful of tests against translations in anaconda, but it's
not just anaconda that's translated. Given that errors in translations
in any library used by anaconda can crash anaconda, I feel that it's
important to test the translations in all of the code we maintain to
make sure that mistakes from translators aren't going to make anything
blow up.

The transifex/zanata model is nice in that the code repository isn't
crudded up with constant translation commits, but its biggest pitfall is
that translations, and mistakes in translations, can be changed right up
until the moment we create a release tarball. For that reason I'd like
the tests to be run both as part of the jenkins CI and just before a
release is built. The input would be the source tarball: the output of
make dist for anaconda, make archive for blivet, etc.

This would not be as simple as moving tests/gettext/ into its own repo
and adding a script or two. Most of the tests in tests/gettext/ test
translatable strings rather than translated strings, and tests against
translatable strings are mostly anaconda-specific.

The tests I would like to move out of anaconda and run against every
project are:
- gettext/gettext_warnings.sh, which mostly catches busted format
strings (including errors on our end where the string needs named
parameters)
- the translation portion of glade/check_markup.py
(I could go either way on this, since it should really only be
anaconda that has markup strings, but I think it should at least be
something that is run at package build time)
- a new test in reaction to 1283599 in which the translator screwed
up the Plural-Forms header in lt.po in blivet.

Which leaves these staying behind:
- click.py, which ensures "clickable" anaconda info bars have
something to click
- contexts.py, which ensures that strings with keyboard accelerators
have a translation context. Stuff from outside anaconda shouldn't be
adding keyboard accelerators
- gettext_potfiles.py, which ensures we keep POTFILES.in up to date.
anaconda is the only project using autotools, so it is the only one with
a POTFILES.in
- style_guide.py, which checks that we say "host name" instead of
"hostname" and stuff like that. I dunno, maybe every project should get
that treatment? Argue your case.

Discussion question: should this stuff go in a for-real package in
Fedora? The pros would be easier release-time testing (could be an
automatic part of rc-release or similar), integration with make ci, and
Chris wouldn't need to configure any new stuff in jenkins. Cons would be
that there's another package in Fedora and that testing translations in
make ci would be adding yet more things out of our control.

As far as how all of this looks, I was thinking just a directory of
checkers (maybe convert gettext_warnings.sh to python for consistency)
and a script that runs them all. I suppose how the script finds the
checkers depends on how real of a program we want it to be, whether it
would need an argument to the directory or could just search via python
imports.
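
To make the "search via python imports" option concrete, here's one way
the runner could find and invoke checkers. Everything in this sketch (the
check_*.py naming, the check(srcdir) entry point) is invented for
illustration:

```python
import importlib.util
import pathlib


def run_checkers(checks_dir, srcdir):
    """Run every check_*.py module in checks_dir against an unpacked
    source tarball, collecting error strings from each one."""
    failures = []
    for path in sorted(pathlib.Path(checks_dir).glob("check_*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Each checker is assumed to expose check(srcdir) -> list of errors
        failures.extend(module.check(srcdir))
    return failures
```

A release script would then unpack the make dist output and reject the
tarball if run_checkers() returns anything.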
Leslie S Satenstein
2015-11-19 16:53:31 UTC
Permalink
Hi David
Yes, you are right. But is it necessary to test all languages? I think
not. One should test a Latin language like French or Spanish, simply
because it takes up to 1/3 more characters to express oneself in French
than it does in English. The five Latin languages can be compared to see
which of them takes the most text, and that language can be used as
representative of the others.

For the non-Latin languages, one should choose the wordiest language.
How to choose? Compare the lengths in words/pages of the release notes
or system-admin guides. The longest one should be the third language
chosen for testing.

And perhaps our coding practices should include rules for text buffers
to prevent overflows. Those overflows, I assume, are the reason for the
crashes you mentioned.
 Regards
 Leslie
Mr. Leslie Satenstein
Montréal Québec, Canada



David Shea
2015-11-19 16:55:50 UTC
Permalink
Post by Leslie S Satenstein
Hi David
Yes, you are right. But is it necessary to test all languages?
Yes. This is not about testing whether the language works with the
interface or anything like that. This is testing whether the
translations will work at all. Currently, in Fedora 23, if you attempt
to install in Lithuanian it will crash. This is avoidable. This needs to
test every language.
David Shea
2015-11-19 17:03:30 UTC
Permalink
Post by David Shea
The tests I would like to move out of anaconda and run against every
One more for this list: give the boot to languages whose percentage of
translated strings falls below a certain threshold. For example,
anaconda includes Iloko, which is 2.3% translated in f23-branch. This is
not useful.

I don't know whether we should bother looking for the opposite, when
someone suddenly translates something that was previously untranslated.
You would think someone would give us a heads up, but I can only think
of that ever happening once in the past.
Chris Lumens
2015-11-19 20:06:34 UTC
Permalink
Post by David Shea
One more for this list: give the boot to languages whose percentage of
translated strings falls below a certain threshold. For example, anaconda
includes Iloko, which is 2.3% translated in f23-branch. This is not useful.
How do we kick something out? Just remove it from po/LINGUAS?
Post by David Shea
I don't know whether we should bother looking for the opposite, when someone
suddenly translates something that was previously untranslated. You would
think someone would give us a heads up, but I can only think of that ever
happening once in the past.
We're never going to hear if something passes a threshold or not. At
the most, we might find out if a whole new language needs to be added,
but that's pretty rare.

- Chris
David Shea
2015-11-19 20:10:35 UTC
Permalink
Post by Chris Lumens
Post by David Shea
One more for this list: give the boot to languages whose percentage of
translated strings falls below a certain threshold. For example, anaconda
includes Iloko, which is 2.3% translated in f23-branch. This is not useful.
How do we kick something out? Just remove it from po/LINGUAS?
Yes. I think the last language we did that to was Hebrew, since the guy
translating Hebrew let us know he was quitting.
Post by Chris Lumens
Post by David Shea
I don't know whether we should bother looking for the opposite, when someone
suddenly translates something that was previously untranslated. You would
think someone would give us a heads up, but I can only think of that ever
happening once in the past.
We're never going to hear if something passes a threshold or not. At
the most, we might find out if a whole new language needs to be added,
but that's pretty rare.
I was thinking of a case of, for example, a translator going through
anaconda strings in Zanata and seeing a bunch of untranslated Hebrew
strings and saying "hey I know Hebrew I can fix that" but then not
telling us. We can test for that, I think. I can't remember whether
transifex did or not, but zanata pulls down everything regardless of
LINGUAS, so we could look for languages not in LINGUAS that are above a
high threshold of percentage translated.

But yes, I agree that this would be rare.
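
The threshold logic itself would be tiny. A sketch, assuming we have
already collected (translated, total) counts per language from msgfmt
--statistics; the cutoff numbers here are placeholders, not a proposal:

```python
def lingua_report(stats, linguas, drop_below=0.05, add_above=0.90):
    """stats: {lang: (translated, total)} for everything Zanata pulls down;
    linguas: the set of languages currently listed in po/LINGUAS."""
    drop, consider = [], []
    for lang, (translated, total) in sorted(stats.items()):
        ratio = translated / total if total else 0.0
        if lang in linguas and ratio < drop_below:
            drop.append(lang)        # e.g. Iloko at 2.3% translated
        elif lang not in linguas and ratio > add_above:
            consider.append(lang)    # translated without telling us
    return drop, consider
```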
Matthew Miller
2015-11-19 21:12:43 UTC
Permalink
Post by Chris Lumens
We're never going to hear if something passes a threshold or not. At
the most, we might find out if a whole new language needs to be added,
but that's pretty rare.
Probably worth noting this:
https://fedoraproject.org/wiki/Fedora_Khmer_Translation_Sprint

I think it's completely fair to tell the G11N team to ask when they
want new languages pulled in Anaconda.
--
Matthew Miller
<***@fedoraproject.org>
Fedora Project Leader
David Shea
2015-11-19 21:29:45 UTC
Permalink
Post by Matthew Miller
Post by Chris Lumens
We're never going to hear if something passes a threshold or not. At
the most, we might find out if a whole new language needs to be added,
but that's pretty rare.
https://fedoraproject.org/wiki/Fedora_Khmer_Translation_Sprint
I think it's completely fair to tell the G11N team to ask when they
want new languages pulled in Anaconda.
Fair enough, I just brought up automatic additions in relation to
automatic ejections. Just glancing at the top of the msgfmt output,
Afrikaans, Amharic, Belarusian and Bosnian are all about 98%
untranslated. All of those languages are "in" anaconda and have projects
in Zanata and everything, so if we removed them and then someone came
and updated the translations for Afrikaans, would F4A G11N notify us?
For a less extreme case, Marathi is 58% untranslated. It probably hasn't
been updated in a couple of releases (it would be nice to base this
decision on that kind of a metric, but that sounds like a real pain to
figure out). I would consider it inactive and go ahead and remove it from
LINGUAS. A hypothetical Marathi translator may not agree, so it might be
good to just automatically check back on it since we're pulling it down
anyway.

I'm also not real sure how to handle thresholds against the translation
schedules. Presumably the percentage of untranslated strings will get
higher and higher as we approach a string freeze, though we can probably
find a threshold that kicks out the inactive ones without kicking out
everything a week or so before beta.

Chris Lumens
2015-11-19 20:03:17 UTC
Permalink
Post by David Shea
- click.py, which ensures "clickable" anaconda info bars have something to
click
- contexts.py, which ensures that strings with keyboard accelerators have
a translation context. Stuff from outside anaconda shouldn't be adding
keyboard accelerators
- gettext_potfiles.py, which ensures we keep POTFILES.in up to date.
anaconda is the only one using autotools so the only one with a POTFILES.in
- style_guide.py, which checks that we say "host name" instead of
"hostname" and stuff like that. I dunno, maybe every project should get that
treatment? Argue your case.
Some of these may apply to blivet-gui, though.
Post by David Shea
Discussion question: should this stuff go in a for-real package in Fedora?
The pros would be easier release-time testing (could be an automatic part of
rc-release or similar), integration with make ci, and Chris wouldn't need to
configure any new stuff in jenkins. Cons would be that there's another
package in Fedora and that testing translations in make ci would be adding
yet more things out of our control.
I don't think it needs to be a real package in Fedora. I made
pocketlint one, though in retrospect I kind of wish I hadn't. This is
just stuff used in the process of us testing and building other
software. I'm not concerned about having to set up more stuff in
jenkins. I'm pretty used to it by now.

If it's not a real package, we'd need some way of checking the tests out
from git either into an existing source repo or alongside it and then
running the tests. Checking it out into the repo of the thing being
tested would probably be much easier to do in jenkins.
As far as how all of this looks, I was thinking just a directory of checkers
(maybe convert gettext_warnings.sh to python for consistency) and a script
that runs them all. I suppose how the script finds the checkers depends on
how real of a program we want it to be, whether it would need an argument to
the directory or could just search via python imports.
Sounds good to me.

- Chris
David Shea
2015-11-19 20:12:07 UTC
Permalink
Post by Chris Lumens
If it's not a real package, we'd need some way of checking the tests out
from git either into an existing source repo or alongside it and then
running the tests. Checking it out into the repo of the thing being
tested would probably be much easier to do in jenkins.
From the point of view of testing before release, could this be
a place where a git submodule is useful? If I understand them right,
we could have the translation-tests (or a better name) project as a
submodule in anaconda and blivet and pykickstart and whatever else, then
the release scripts could add a step to run the tests to accept or
reject the source tarball. Brian, please explain why this is a bad idea.