x/text: localization support #12750

Open
mpvl opened this issue Sep 25, 2015 · 55 comments

@mpvl
Contributor

mpvl commented Sep 25, 2015

This issue is intended as an umbrella tracking issue for localization support.

Localization support includes:

  • formatting values, such as numbers, currencies, and dates in a language- and region-specific way.
  • marking text to be translated in fmt-ed text and templates.
  • integration into translation pipeline

Details to be covered in design docs.

@mpvl mpvl self-assigned this Sep 25, 2015
@mpvl
Contributor Author

mpvl commented Sep 28, 2015

@maximilien

Here is one solution: https://github.com/maximilien/i18n4go

@mpvl
Contributor Author

mpvl commented Sep 29, 2015

@maximilien: i18n4go does not address localized formatting of values like numbers and I think it will be difficult to retrofit it properly. In case of selecting translation variants based on the linguistic features of the arguments, you'll end up with the same struggle one witnesses with localization frameworks for other languages.
Also, i18n4go extracts all strings and then uses an exclusion file. This may work well for command line tools or applications where most strings need localization, but this is not the norm. It breaks down when a large number of the strings in code do not need to be localized. For example, internal error messages are often not localized and may actually be the bulk of the text.
Addressing both issues will likely result in a different API, for example like the one proposed. The implementation of the proposed API is more complex, but it eliminates the need to generate a parallel version of the code and T wrappers.

This proposal is fairly agnostic about translation pipelines, though. So it may be possible to fit this proposal on top of the i18n4go translation pipeline. Seems like a convenient first target.

@infogulch
Contributor

Using the Printf of message.Printer has the following consequences:

  • ...
  • the format string is now a key used for looking up translations

Is the format string by itself sufficient for determining the context? I can imagine a very simple Printf used like m.Printf("%s: %d", m, i) where the format string %s: %d could appear a dozen times throughout a codebase with very different contexts. (You could argue that this is a very poor format string to begin with, but it still demonstrates my concern.)

I must admit I'm not very familiar with localization problems and this may not be an issue in practice.

@mpvl
Contributor Author

mpvl commented Sep 30, 2015

@infogulch It is indeed not enough. In my provisionally worked out API I do define a Key function that can be used for things like adding meaning and alternatives. I left it out of the design doc to not go into details too much. (I also stripped about 1/3rd of my original draft; maybe I went a bit overboard.)

Note that as the string has no meaning in itself, you could always write the format string as, for example, "Archive (verb)" and "Archive (noun)" and supply a "translation" for these in English ("Archive" for both). But this does not address all concerns. A more general solution:

Printf would have the following signature:

func (p *Printer) Printf(key Reference, args ...interface{}) (n int, err error) {

where Reference is either a string or a result from a func like

func Key(id string, fallback ...string) Reference {

This allows the familiar Printf() usage while addressing the concerns you raised. Many localization frameworks have a solution of a similar nature.

But the example string you provide does raise another good point: there may be format strings one does not want to translate at all while still using the message package to substitute localized values. This is possible as is (e.g. fmt.Printf("%s: %d", m.Print(m), m.Print(i))), but may be a bit clunky. A bit better may be something like m.Printf(message.Raw("%s: %d"), m, i), where the use of Raw makes extraction skip the string. I don't think there are too many cases where this is used, though. Even "%s: %d" will vary per language. But single-value substitutions like "%2.3d" should probably be excluded from translation.
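The Key idea above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the Reference struct, the single-fallback Key, the map-backed Printer, and the Dutch strings are all hypothetical stand-ins, not the message package's actual API.

```go
package main

import "fmt"

// Reference is a hypothetical message key: an ID used for lookup plus a
// fallback format string used when no translation is found.
type Reference struct {
	ID       string
	Fallback string
}

// Key builds a Reference. The name mirrors the func mentioned above, but
// the fields and behavior here are assumptions made for this sketch.
func Key(id, fallback string) Reference {
	return Reference{ID: id, Fallback: fallback}
}

// Printer is a toy stand-in for message.Printer, backed by a plain map.
type Printer struct {
	catalog map[string]string // message ID -> translated format string
}

// Sprintf looks up the translation by ID and uses the key's fallback
// string when the catalog has no entry for it.
func (p *Printer) Sprintf(key Reference, args ...interface{}) string {
	format, ok := p.catalog[key.ID]
	if !ok {
		format = key.Fallback
	}
	return fmt.Sprintf(format, args...)
}

func main() {
	// Illustrative Dutch catalog: the two IDs share one English fallback
	// but carry different translations.
	nl := &Printer{catalog: map[string]string{
		"Archive (verb)": "Archiveren",
		"Archive (noun)": "Archief",
	}}
	en := &Printer{catalog: map[string]string{}}

	fmt.Println(nl.Sprintf(Key("Archive (verb)", "Archive")))
	fmt.Println(nl.Sprintf(Key("Archive (noun)", "Archive")))
	fmt.Println(en.Sprintf(Key("Archive (verb)", "Archive")))
}
```

Separating the ID from the fallback lets two call sites with identical English text carry different meanings, which is the concern raised about using the bare format string as the lookup key.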

@akavel
Contributor

akavel commented Oct 1, 2015

As far as plurals are concerned, I've seen some elaborate examples, but from skimming the doc, it seems they can use only the "<", ">", and "=" operators; I didn't read it 100% thoroughly, however, so I may be wrong. I'll thus ask here for clarification: are the proposed mechanisms enough to cater for the rule for, e.g., the Polish language? In a version I found on the Weblate site, it's described as [1] [2]:

n==1 ? 0 :                                              // "single"
 n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 :    // "few"
 2                                                      // "many"

which seems to me quite fine, correctly giving e.g.:

1 orangutan
2-4 orangutany
5-9 orangutanów
10, 11, 12, ..., 21 orangutanów
22 orangutany
101 orangutanów
102 orangutany
etc.
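
The quoted rule translates directly into Go. A minimal sketch, assuming non-negative integers only; the function name polishPluralIndex is made up for illustration and is not part of any x/text package:

```go
package main

import "fmt"

// polishPluralIndex mirrors the quoted rule:
// 0 = "single", 1 = "few", 2 = "many".
// It assumes n is a non-negative integer.
func polishPluralIndex(n int) int {
	switch {
	case n == 1:
		return 0
	case n%10 >= 2 && n%10 <= 4 && (n%100 < 10 || n%100 >= 20):
		return 1
	default:
		return 2
	}
}

func main() {
	// Word forms indexed by the value polishPluralIndex returns.
	forms := []string{"orangutan", "orangutany", "orangutanów"}
	for _, n := range []int{1, 2, 5, 12, 21, 22, 101, 102} {
		fmt.Printf("%d %s\n", n, forms[polishPluralIndex(n)])
	}
}
```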

@mpvl
Contributor Author

mpvl commented Oct 1, 2015

@akavel: one should distinguish selectors from the rules you mention. The rules you refer to (which are defined in CLDR) would be used by the plural package to map numbers to a small set of plural categories (in the case of your example: single, few and many). The selectors subsequently pick alternatives based on these simplified categories. The maximum number of such categories, IIRC, is 6 (e.g. for Arabic).
Most localization frameworks that support plurals allow selecting on these categories only. ICU adds selecting on the number value (using "="). The matching algorithm defined in this proposal is a bit different from ICU's, also allowing for Vars and selecting on "<" and ">". The selectors will often be generated or written by translators (with the help of a GUI), so they should remain simple.

In my foreseen implementation, it is really up to the feature implementation to interpret selectors. This means that there is a lot of flexibility in supporting wild feature value matching. However, if one looks at linguistic grammars like LFG and HPSG, which use many more features, the set of possible feature values is usually small.

The doc is indeed a bit sparse here (as well as all other topics, really).
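
The two stages described above (rules map a number to a category, then selectors pick an alternative) could look roughly like this. It is a sketch under assumptions: category implements only the English one/other rule rather than CLDR data, the selectCase type and the "{n}" placeholder are invented here, and "first matching selector wins" is a simplification, not the proposal's actual matching algorithm.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// category stands in for the plural package: it maps a number to a plural
// category using the English rule only ("one" or "other").
func category(n int) string {
	if n == 1 {
		return "one"
	}
	return "other"
}

// selectCase is one alternative: a selector and the message it picks.
// A selector is "=N", "<N", ">N", or a plural category name.
type selectCase struct {
	selector string
	message  string
}

// matches reports whether selector sel applies to n with plural category cat.
func matches(n int, cat, sel string) bool {
	if len(sel) > 1 && strings.ContainsRune("=<>", rune(sel[0])) {
		v, err := strconv.Atoi(sel[1:])
		if err != nil {
			return false
		}
		switch sel[0] {
		case '=':
			return n == v
		case '<':
			return n < v
		default:
			return n > v
		}
	}
	return sel == cat
}

// render returns the message of the first matching case, with "{n}"
// replaced by the number. It returns "" when no case matches.
func render(n int, cases []selectCase) string {
	cat := category(n)
	for _, c := range cases {
		if matches(n, cat, c.selector) {
			return strings.ReplaceAll(c.message, "{n}", strconv.Itoa(n))
		}
	}
	return ""
}

func main() {
	cases := []selectCase{
		{"=0", "no files"},
		{"one", "one file"},
		{">1000", "a great many files"},
		{"other", "{n} files"},
	}
	for _, n := range []int{0, 1, 7, 5000} {
		fmt.Println(render(n, cases))
	}
}
```

Keeping the language-specific arithmetic inside category means the selectors a translator writes stay short, which is the point made above about keeping them simple.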

@maximilien

@mpvl, sounds good. Happy to try and integrate once you have something ready to try. Best.

@abourget

Have you guys seen this one? https://github.com/nicksnyder/go-i18n seems pretty solid at first sight.

@abourget

It uses JSON as its base format, has tooling to help with the translation workflows

@BenLubar

By the way, I submitted some formatting fixes for the proposal doc a few weeks ago.

https://go-review.googlesource.com/19753

Not sure what I was supposed to do to get it reviewed.

@morriswinkler-simple

morriswinkler-simple commented Nov 22, 2016

Are there any updates on how far the proposal is implemented in x/text/language? I find it a bit hard to figure out whether this is anywhere near production readiness.

@mpvl
Contributor Author

mpvl commented Nov 29, 2016

x/text/language is definitely production ready. But if you mean the specific functionality of this issue, it is still under development. Lately the focus has been more on other parts; my intention for the upcoming months is to focus specifically on segmentation and this.

That said, string substitution is available with limited functionality, so you could play around with it. I recently checked in a tool to extract strings.

@morriswinkler-simple

Thanks for your reply, I have so far only used x/text/language in production and coded something around it that translates and formats messages for different countries. Just wanted to check if the language API is still up for changes.

@mpvl
Contributor Author

mpvl commented Nov 29, 2016

No plans to change. It works well enough that breaking existing users would not be worthwhile.

@MickMonaghan

Hi,
Which package handles localized formatting/display of dates/times - or is this functionality not yet complete?

@MickMonaghan

Hi @mpvl, others,
I'm using x/text/collate to test the sorting of some random strings.
Below I use a Korean collator.

import (
  "fmt"
  "golang.org/x/text/collate"
  "golang.org/x/text/language"
)
func main() {
  strs := []string{"boef", "音声認識", "音声認識1", "aaland", "amsterdam", "월요일", "日付と時刻"}
  cl := collate.New(language.Korean) //Korean collator
  cl.SortStrings(strs)
  fmt.Println(strs)
}

Output: [aaland amsterdam boef 월요일 音声認識 音声認識1 日付と時刻]

If I use ICU to sort these strings (using level 3 strength), then I get the strings back like this:

[월요일 音声認識 音声認識1 日付と時刻 aaland amsterdam boef]

Am I setting up the collator incorrectly?
I'm using v1.8beta.

@morriswinkler-simple

Hello @MickMonaghan,

Looks like there is not much interest in this discussion, so I'll just add my findings so far.

I looked into the collate code and could not really figure out how the sorting works.
There are some byte blocks that are loaded by offsets; I have no idea how they work, and I did not have much time to figure it out. If someone would like to explain how that actually works, I would be grateful.

I asked a Japanese friend of mine how he would sort a list of German and Japanese cities.
This is what he came up with.

[Image: img_9351]

So he either converts the Japanese into Latin or the Latin into Japanese alphabet and sorts it then. I think that is also a good way to sort this list, first translate the syllables into the other alphabet and then sort it correspondingly.

@MickMonaghan

MickMonaghan commented Jan 16, 2017

Hey @morriswinkler-simplesurance - thanks for the response.
I'm not entirely concerned with how it works, more concerned with does it work.
In some situations the collator clearly does work:

strs := []string{"champion", "humble"}
cl := collate.New(language.Slovak)
cl.SortStrings(strs)
//this correctly sorts 'champion' *after* 'humble' - as expected in Slovak

With a Korean sort, the Latin characters should be sorted after the Korean characters. But that's not happening.

@mpvl
Contributor Author

mpvl commented Feb 7, 2017

@MickMonaghan: the implementation is based on the CLDR UCA tables. If I look at the collation elements of both the DUCET (Unicode's tables) and CLDR (the tailorings), they both show Hangul to have a higher primary collation value than Latin. So that explains why Korean is sorted later.

What is probably happening in ICU is that the script for the selected language is sorted before other scripts. The Go implementation currently does not support script reordering, though. This is a TODO, but it depends on changing the implementation to use fractional weights. This is a huge change and may take a while.
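
Until script reordering is supported, one possible stopgap is to partition by script before sorting. The sketch below is an assumption-laden illustration, not how ICU or x/text/collate works: sortHangulFirst is a made-up helper, it looks only at the first rune of each string, and within each group it compares strings byte-wise instead of using real collation.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
	"unicode/utf8"
)

// startsWithHangul reports whether the first rune of s is in the Hangul script.
func startsWithHangul(s string) bool {
	r, _ := utf8.DecodeRuneInString(s)
	return unicode.Is(unicode.Hangul, r)
}

// sortHangulFirst sorts strs in place so that strings starting with a Hangul
// rune come before all others. Within each group it falls back to plain
// byte-wise comparison, which is NOT locale-aware collation.
func sortHangulFirst(strs []string) {
	sort.SliceStable(strs, func(i, j int) bool {
		hi, hj := startsWithHangul(strs[i]), startsWithHangul(strs[j])
		if hi != hj {
			return hi // the Hangul group sorts first
		}
		return strs[i] < strs[j]
	})
}

func main() {
	strs := []string{"boef", "音声認識", "音声認識1", "aaland", "amsterdam", "월요일", "日付と時刻"}
	sortHangulFirst(strs)
	fmt.Println(strs)
}
```

In real code the byte-wise comparison inside each group would presumably be replaced by a collator's comparison; other scripts have their own range tables in the unicode package (for example unicode.Han).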

@mpvl
Contributor Author

mpvl commented Feb 8, 2017

@MickMonaghan: I suggest you file a separate issue for this so it can be tracked individually.

@mpvl
Contributor Author

mpvl commented Feb 8, 2017

@MickMonaghan: dates/times is on the list, but only after number etc. is completed.

@MickMonaghan

Thanks @mpvl , I'll log the collation bug

@Draqir

Draqir commented Jun 3, 2017

I started trying out golang seriously today to create a small application just for fun. However when I tried to localize my little application I didn't figure out any good solution. I just got a big headache. This is what I would do normally in TypeScript

export const Exceptions = {
    "AuthenticationError": {
        "Invalid": {
            "en-GB": "Invalid username or password",
            "sv-SE": "Fel användarnamn eller lösenord"
        },
        "Required": {
            "en-GB": "You must be authenticated to see this resource",
            "sv-SE": "Du måste vara inloggad för att se denna resurs"
        }
    }
}

export class AuthenticationError extends Error {
    constructor(language:  "en-GB" | "sv-SE", message: "Invalid" | "Required") {
        super(Exceptions.AuthenticationError[message][language]);
    }
}

I would get errors if I typed any string wrong, and it would simply just work. I tried to do something similar in Go, but the pain just got unbearable:

package localization

type labels struct {
	enGB string
	svSE string
}

type authenticationErrorMessages struct {
	Invalid  labels
	Required labels
}

type exceptionMessages struct {
	authErrors authenticationErrorMessages
}

// ExceptionMessage damnit, need to write a comment in an odd way.
func ExceptionMessage(language string, category string, exceptionType string, params []string) string {
	var exceptionMsg = exceptionMessages{
		authErrors: authenticationErrorMessages{
			Invalid: labels{
				enGB: "Invalid username or password",
				svSE: "Fel användarnamn eller lösenord",
			},
			Required: labels{
				enGB: "You must be authenticated to see this resource",
				svSE: "Du måste vara inloggad för att se denna resurs",
			},
		},
	}

	switch category {
	case "AuthenticationError":
		switch exceptionType { // was a second switch on category, which could never match
		case "Invalid":
			switch language {
			case "enGB":
				return exceptionMsg.authErrors.Invalid.enGB
			case "svSE":
				return exceptionMsg.authErrors.Invalid.svSE
			}
		case "Required":
			switch language {
			case "enGB":
				return exceptionMsg.authErrors.Required.enGB
			case "svSE":
				return exceptionMsg.authErrors.Required.svSE
			}
		}
	}

	return "Error message not found"
}

// AuthenticationError damnit, need to write a comment in an odd way.
func AuthenticationError(message string) string {
	// Pass all four arguments and use keys that the switch above actually handles.
	return ExceptionMessage("enGB", "AuthenticationError", message, nil)
}

TL;DR

  • Go solution contains Magical strings
  • Go solution has no auto completion
  • Go solution is three times larger

So far everything has been really smooth writing golang code but this is just painful. I've tried out some localization packages as well but that hasn't worked out well so far. I'm of course not an expert in go after less than a day, maybe I missed something obvious in the language specification when I went through it this morning but regardless I'd really like to see some progress on this issue.
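
For comparison, here is one way the same lookup might be written more compactly in Go with typed string constants and nested maps. It is only a sketch: the type and constant names (Lang, AuthError, EnGB, and so on) are invented for this example, and the fallback to en-GB is an assumption rather than anything the thread specifies.

```go
package main

import "fmt"

// Lang and AuthError are typed strings, so call sites can use named
// constants (which editors can autocomplete) instead of free-form strings.
type Lang string

const (
	EnGB Lang = "en-GB"
	SvSE Lang = "sv-SE"
)

type AuthError string

const (
	Invalid  AuthError = "Invalid"
	Required AuthError = "Required"
)

// authMessages holds every translation in one table, like the TypeScript object.
var authMessages = map[AuthError]map[Lang]string{
	Invalid: {
		EnGB: "Invalid username or password",
		SvSE: "Fel användarnamn eller lösenord",
	},
	Required: {
		EnGB: "You must be authenticated to see this resource",
		SvSE: "Du måste vara inloggad för att se denna resurs",
	},
}

// AuthenticationError returns the localized message, falling back to en-GB
// when the requested language has no entry.
func AuthenticationError(lang Lang, kind AuthError) string {
	if msg, ok := authMessages[kind][lang]; ok {
		return msg
	}
	return authMessages[kind][EnGB]
}

func main() {
	fmt.Println(AuthenticationError(SvSE, Invalid))
	fmt.Println(AuthenticationError(Lang("fr-FR"), Required))
}
```

This is weaker than the TypeScript union types: a misspelled constant name is a compile error, but Go still accepts an untyped string literal such as "sv-SV" where a Lang is expected, so the check is by convention rather than enforced.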

@KarthikJay

Just wanted to know the status of the repo, especially that of the gotext tool.
It seems a lot of changes were made that don't match up with the docs, such as instead of a textdata directory I now get a locales folder, etc...

It seems that the gotext tool is broken as well currently preventing me from trying localization.

@mpvl
Contributor Author

mpvl commented May 2, 2018

The gotext tool is under active development and one of the main focuses at the moment. Progress comes in bursts, but it is definitely active. A documentation overhaul is part of that.

@mvrhov

mvrhov commented Jul 19, 2018

@mpvl: I'm looking at this and deciding whether to use it or something else and format numbers/money/dates manually. My users are a bit peculiar. What they usually want is the same behavior as in the OS: the language is set to, e.g., English, but other formatting is based on the country, or, even better, overridden via a settings page, just like in the OS.

@mpvl
Contributor Author

mpvl commented Jul 19, 2018 via email

@ericox

ericox commented Feb 26, 2019

Is there a recommended way to localize [text|html]/templates per the proposal? I like the idea proposed, it doesn't seem to be implemented yet. Is that the case?

@mpvl
Contributor Author

mpvl commented Mar 3, 2019 via email

@jeffreydwalter

Not yet. There is a design for it, but it requires added functionality of the core template libraries.

On Tue, 26 Feb 2019 at 13:10 Eric Cox @.***> wrote: Is there a good way to mark text in go templates for translation?

Is there any ETA for this feature or a suggested work-around?

@nkall

nkall commented Oct 18, 2019

Hi, just found this issue and cross-posting my recent proposal: #34989

Are compact number formats something which could potentially fall under the responsibilities of the x/text package, and if so, what would be the process for creating a contribution to add this functionality?

@mpvl
Contributor Author

mpvl commented Oct 19, 2019

Anything part of Unicode, including CLDR fits in the x/text mandate. You could modify the existing package to include it. The same process as with Go applies. As that is CLDR 35, it would require an upgrade to CLDR 35 of x/text first, which may take some effort.

@nkall

nkall commented Oct 19, 2019

Great, thank you. I'll look into the difficulty of getting that upgraded. In the meantime I put together a library which serves my purpose well enough for now (for anyone who happens to stumble upon this): https://github.com/nkall/compactnumber

@josineto

@MickMonaghan: dates/times is on the list, but only after number etc. is completed.

Hi, since that message is from February 2017, I would like to know: is date/time localization getting closer to being implemented, or is it still far off on the Go roadmap?

Thank you!

@Xpert85

Xpert85 commented Apr 11, 2020

Hi, the documentation of x/text mentions the gender feature in several places.

Do I understand correctly, that this feature is currently not implemented?

Thank you.

@mpvl
Contributor Author

mpvl commented Apr 11, 2020

@Xpert85 That is correct.

@purpleidea

I was hacking on my https://github.com/purpleidea/mgmt/ and it occurred to me that I'd like proper gettext support! Sadly, you can't have an underscore function:

package main

import (
	"fmt"
)

// gettext!
func _(format string, a ...interface{}) string {
	return "just an example"
}

func main() {
	fmt.Println("Hello, ", _("world"))
}

./prog.go:13:26: cannot use _ as value

But you can use two underscores! Sadly, the usefulness of this is not great, because if you stick that in a gettext package and do a dot import:

import (
	. "github.com/purpleidea/gettext"
)

it doesn't work because the function is seen as private, not public.

My proposal:

I'd like golang to consider treating the single underscore as a valid, public function. If that's too hard to do in the compiler, then to treat two underscores as a public function. This would go a long way into improving the readability of gettext translations in code =D

Thanks!

@seankhliao
Member

@purpleidea please file a separate proposal for that

@mvrhov

mvrhov commented Feb 20, 2021

Just use T instead of _ as a function name.
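
A minimal sketch of that suggestion, assuming a toy in-memory catalog; the translations map and its single entry are invented for illustration, and a real gettext binding would load catalogs from files:

```go
package main

import "fmt"

// translations is a toy catalog: source format string -> translated format string.
var translations = map[string]string{
	"Hello, %s": "Hallo, %s",
}

// T plays the role that _ usually plays with gettext: it looks up the
// format string and formats the arguments, falling back to the source
// string when no translation exists.
func T(format string, a ...interface{}) string {
	if tr, ok := translations[format]; ok {
		format = tr
	}
	return fmt.Sprintf(format, a...)
}

func main() {
	fmt.Println(T("Hello, %s", "world"))
	fmt.Println(T("Goodbye, %s", "world"))
}
```

Because T starts with an uppercase letter it is exported, so it can live in its own package and still be dot-imported.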

@Junaid-Sakib

Do we have localisation support for dates now? If not, can anyone suggest an alternative that provides localisation support for dates?

@robfig
Contributor

robfig commented Mar 22, 2022

It would be nice to have date localization in x/text. In the meantime, between x/text and github.com/klauspost/lctime everything might be covered. I'm not sure why lctime is archived, though; perhaps that means it won't receive updates to the language rules. cc @klauspost

@klauspost
Contributor

@robfig It is a fork from a now deleted package by @variadico

I mainly added a non-stateful interface so it could be used without affecting global package state.

Since I don't really have intentions of maintaining/bugfixing this and there was a low user count, I thought it would be most fair to archive it. If someone wants to maintain a fork I'd be happy to link to it.

@wolfgangmeyers

golang.org/x/text/number offers a way to localize decimal values, but it seems that I can only initialize the Formatter using a float. I have a need to preserve the precision present in a decimal value that is represented as a string, but when converting to float we are getting rounding errors. It is important to prevent the rounding errors, because we're trying to represent the exact values in financial transactions in different locales. Would it be possible to add support to initialize number.Decimal using a string instead of a float?
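
As a stopgap, the digits can be regrouped on the string itself so that no float conversion ever happens. This is a rough sketch with loud assumptions: formatDecimalString is a made-up helper, not part of golang.org/x/text/number. The separators are passed in by hand rather than derived from a locale. The input is assumed to be plain ASCII digits with an optional leading "-" and at most one ".". Only groups of three are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// formatDecimalString inserts the group separator every three integer digits
// and swaps the decimal point, working on the string only so every digit of
// the input is preserved. It performs no validation of s.
func formatDecimalString(s, group, decimal string) string {
	sign := ""
	if strings.HasPrefix(s, "-") {
		sign, s = "-", s[1:]
	}
	intPart, fracPart, hasFrac := strings.Cut(s, ".")

	var b strings.Builder
	b.WriteString(sign)
	for i, d := range intPart {
		// Insert a separator when the number of remaining digits is a multiple of three.
		if i > 0 && (len(intPart)-i)%3 == 0 {
			b.WriteString(group)
		}
		b.WriteRune(d)
	}
	if hasFrac {
		b.WriteString(decimal)
		b.WriteString(fracPart)
	}
	return b.String()
}

func main() {
	// German-style separators, passed explicitly.
	fmt.Println(formatDecimalString("1234567.8900", ".", ","))
	// English-style separators.
	fmt.Println(formatDecimalString("-1000.05", ",", "."))
}
```

Trailing zeros and all fractional digits pass through untouched, which is the property at stake for financial amounts; locale-specific grouping that is not in threes (for example Indian-style grouping) is outside what this sketch covers.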

@bojanz

bojanz commented Aug 10, 2022

@wolfgangmeyers
You might find my package useful: https://github.com/bojanz/currency

It solves the problem of representing and formatting currency amounts for the time being. I'd love to see x/text maintained again though.
