x/vgo: go.mod format should not have a bespoke syntax #23966

Closed
robpike opened this issue Feb 20, 2018 · 30 comments

@robpike
Contributor

robpike commented Feb 20, 2018

It's a mistake to create a private syntax for a configuration file when there are existing, perfectly fine formats available that are well understood and have publicly available parsers.

@gopherbot gopherbot added this to the Unreleased milestone Feb 20, 2018
@robpike
Contributor Author

robpike commented Feb 21, 2018

P.S. This issue is made thornier by the peculiar assertion that the format is fixed before anyone has a chance to comment on it.

@bradfitz bradfitz modified the milestones: Unreleased, vgo Feb 21, 2018
@ericlagergren
Contributor

ericlagergren commented Feb 21, 2018

My vote is for XML.

(On a more serious note: it'd be nice to have something that's easy to write by humans. JSON is easy to read, annoying to write. YAML is nice both ways.)

@jmank88

jmank88 commented Feb 21, 2018

Perhaps some of this dep thread from last year is relevant: golang/dep#119

@as
Contributor

as commented Feb 21, 2018

https://github.com/hashicorp/hcl/blob/master/README.md

Mentions YAML being confusing and not well understood. I don't particularly understand it either, considering the standard disallows tabs as separators, which is unusual and awkward for a whitespace-agnostic language like Go.

http://www.yaml.org/faq.html

@jimmyfrasche
Member

jimmyfrasche commented Feb 21, 2018

I'm having a very hard time reconciling "yaml" and "perfectly fine format". It's not the description that springs to mind based on my experience.

A benefit of a custom format here is that only what's allowed is legal. Another is that everything can be given a nice expressive syntax. Error messages can be more easily tailored.

The similarity to Go syntax means it shouldn't be hard for anyone to learn it and syntax highlighters and the like should be easy to adapt from their Go counterparts.

The only major downside I see is that, as a new format, its implementation will require a certain amount of fuzzing and additional testing that would (hopefully) already be done otherwise. (And if the parser is put in the stdlib no one else will have to worry about that either).

@ericlagergren
Contributor

ericlagergren commented Feb 21, 2018

...why not just have it be written Go?

I mean, we all know how to write it. We have well-tested lexers and parsers. We have syntax highlighting. We have formatters and tools that can vet the code. We even have a framework for parsing and running sets of files in go test.

This go.mod file

// My hello, world.

module "rsc.io/hello"

require (
	"golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
	"rsc.io/quote" v1.5.2
)

could become something like

package foobar

import "vgo"

func ModuleHelloWorld(v *vgo.V) {
    v.Module("rsc.io/hello")
    v.Require("golang.org/x/text", "v0.0.0-20180208041248-4e4a3210bb54")
    v.Require("rsc.io/quote", "v1.5.2")
}

It'd end up being similar to how go test recognizes xxx_test.go files.

vgo could recognize a go.mod, module, xxx_module.go, or whatever file in the root of the project and run the top-level function similar to ModuleXXX kinda like TestXXX.
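
A minimal sketch of how the proposal above might run, for illustration only. No `vgo` package with a `V` type exists; the `V` and `Requirement` types here are hypothetical stand-ins defined locally so the example is self-contained.

```go
package main

import "fmt"

// Requirement is one module path pinned to a version.
type Requirement struct {
	Path, Version string
}

// V is a hypothetical collector standing in for the proposed vgo.V type.
type V struct {
	ModulePath string
	Requires   []Requirement
}

// Module records the module path.
func (v *V) Module(path string) { v.ModulePath = path }

// Require records one dependency.
func (v *V) Require(path, version string) {
	v.Requires = append(v.Requires, Requirement{Path: path, Version: version})
}

// ModuleHelloWorld mirrors the function proposed in the comment.
func ModuleHelloWorld(v *V) {
	v.Module("rsc.io/hello")
	v.Require("golang.org/x/text", "v0.0.0-20180208041248-4e4a3210bb54")
	v.Require("rsc.io/quote", "v1.5.2")
}

func main() {
	var v V
	ModuleHelloWorld(&v)
	fmt.Println("module", v.ModulePath)
	for _, r := range v.Requires {
		fmt.Println("require", r.Path, r.Version)
	}
}
```

Under this reading, a tool would have to compile and execute the file to learn the dependency list, which is the trade-off the comment acknowledges.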

Yeah, it doesn't feel like a set of directives as much as runnable code, but since when has Go done something just so it feels good as opposed to the practical option?

Theoretically, this could also take care of #23972

@andybons andybons changed the title from "x/vgo: go.mod format should be YAML or some other existing format" to "x/vgo: go.mod format should not have a bespoke syntax" Feb 21, 2018
@andybons
Member

Let’s avoid bikeshedding on which existing format is best and wait for a response on why a custom syntax was chosen in the first place. It may have been an arbitrary decision, or it may not have. If it wasn’t, then understanding the decision will help inform future choices.

@ghost

ghost commented Feb 21, 2018

Most folks have settled on TOML. We don't really need another custom format or a format embedded in JSON or YAML.

@davecheney
Contributor

davecheney commented Feb 21, 2018 via email

@flibustenet

Indeed, I immediately tripped up in the tour by trying mod.go instead of go.mod!

@ngrilly

ngrilly commented Feb 21, 2018

@davecheney I guess the go.mod file makes it easy to find the project root. If module and require become top-level declarations in a "normal" .go file, then it would be more difficult to find the project root (you would basically have to look for a .go file containing a module declaration, which requires parsing).

@rogpeppe
Contributor

rogpeppe commented Feb 21, 2018

The problem with TOML and YAML is that no-one has written code (AFAIK) that can read those formats (including comments) and write them back out again, gofmt style. See BurntSushi/toml#213 for example. Also, YAML is a terrible format. Please no YAML.

I think I quite like the choice of a custom format as long as there is some straightforward way to convert to/from a well known format, because it can be exactly as simple as necessary, and as clean as possible.

@robpike
Contributor Author

robpike commented Feb 21, 2018

Whatever the format is, it must be well defined and well documented, with a canonical formatting and non-internal libraries to read and write it. None of that exists at the moment.

@ecowden

ecowden commented Feb 21, 2018

I'm very reluctant to jump in here. I'm liking what I see from vgo so far, and I don't want to bikeshed on what might feel like a trivial topic.

However, I feel that part of the friction I'm feeling from my initial vgo experiments comes from the rest of the tools that I use to write Go and work with code in general. I think this is an opportunity to make adoption a little easier.

Here’s why I think we should consider adopting an existing common data format:

Motivations

  • Editor/Viewer Support. Editors already know how to perform syntax highlighting, formatting, etc. for standard data formats. Even take a look at the code snippets in this thread. GitHub can't parse and highlight go.mod snippets, but if we chose a common data format, it would just work out of the box. Let’s not make GitHub, editor authors, and the rest implement and maintain a special parser just for this one file type.
  • Toolchain support. Whether innovative tools like Greenkeeper, or standbys like Artifactory, there are lots of good reasons to analyze a project’s dependencies. Internally, we’ve written a scraper to check and cross-reference different teams’ dependencies on one another. It's easy to ingest files like npm's package.json -- it's just JSON! -- and we could do it from any language. There’s plenty of dependency management tasks to automate. A standard data format makes it that much easier to add vgo support, and could help it propagate through the ecosystem more quickly.
  • Familiarity. Love ‘em or hate ‘em, JSON and YAML are both familiar formats for developers these days. We’re not solving any issues with them by introducing a new format, we’re just making people learn yet another file format.
  • Extensibility. I’ve watched the list of vgo feature requests grow just in the first day, and I can only imagine where we go next. With YAML or JSON, that’s just another key-value set. With go.mod, we have to decide on a bespoke place to put it in the file — and then go update all the parsers, syntax highlighters, and other tools in the rest of the tool chain.

Requirements

I don’t have strong feelings about YAML vs JSON vs whatever else. I’ve used JSON fine with npm and YAML fine with Kubernetes, Helm, and Ansible. They both work, and I’m long past the point in my career where I care about arguments like that. (And for what it’s worth, I’ve never been bugged by the lack of inline comments — READMEs and Issues worked for the rare cases we needed to communicate about dependencies.) From where I’m sitting, the requirements are:

  • Parsers implemented in most popular languages.
  • Existing editor and tooling support from things like GitHub itself.
  • Hierarchical. For instance, .properties files are too restricting to future extension.

Apologies in advance if I'm off base. I'm fairly new to Go myself, and I confess that I don't yet understand some of the original motivations for a bespoke file format. There may be good reasons to go another direction that I'm overlooking!

@huguesb
Contributor

huguesb commented Feb 21, 2018

@ecowden

Hierarchical. For instance, .properties files are too restricting to future extension.

@rsc states in his blog post that vgo is meant to be a streamlining/simplification of the general-purpose dep tool. Given that dep made an explicit decision to go with TOML partly because it wasn't hierarchical, it seems unlikely that vgo would reverse that requirement.

From golang/dep#119 (comment)

The one thing that does stick out with TOML is, being not tree-structured, it's possible for us to append constraints to the manifest without rewriting it. That may turn out to be a very important factor in applying sane defaults that help guard us (that is, the entire public Go ecosystem) against nasty exponential growth in solver running time.

@ericlagergren While I like the simplicity of reusing go syntax, using the .go extension for the module file makes it likely that some projects will run into a conflict and have to rename some of their files to switch to vgo, which goes against the goal of making the migration as painless as possible.

@ericlagergren
Contributor

ericlagergren commented Feb 21, 2018 via email

@davidpope

Other concerns aside, JSON does not allow comments, which is sufficient to disqualify it IMO.

@josharian
Contributor

josharian commented Feb 21, 2018

Whatever format is in use, I certainly hope (with Rob) that there are good public manipulation libraries.

Here are some comments from years of working on goimports:

  • Dealing with comments is a nightmare. Part of this is the fault of go/ast and go/printer, but some of it is conceptual—it is often non-obvious what should happen to a comment when adding, deleting, or relocating an import.

  • The current format appears to accept single and factored forms. goimports has moved to factored forms only. This helps some with comments, and also keeps diffs clean. Also, factored-only will be easier to write regexps for, and sadly, lots of editors still use regexps for highlighting etc.

  • Grouping rules inevitably get complex (e.g. stdlib vs vendor vs other), particularly if you try to respect existing groupings.
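
To make the regexp point concrete, here is a sketch of a pattern for one entry inside a factored require ( ... ) block. The pattern and the helper name `parseEntry` are assumptions for illustration; they cover only the quoted-path form shown in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// requireLine matches one entry inside a factored require block:
// optional indentation, a quoted module path, a version, and an
// optional trailing // comment.
var requireLine = regexp.MustCompile(`^\s*"([^"]+)"\s+(v\S+)\s*(?://.*)?$`)

// parseEntry extracts the module path and version from one line.
func parseEntry(line string) (path, version string, ok bool) {
	m := requireLine.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	for _, line := range []string{
		"require (",
		"\t\"golang.org/x/text\" v0.0.0-20180208041248-4e4a3210bb54",
		"\t\"rsc.io/quote\" v1.5.2 // pinned",
		")",
	} {
		if path, version, ok := parseEntry(line); ok {
			fmt.Println(path, version)
		}
	}
}
```

With only the factored form, every dependency line has the same shape, so one pattern like this is enough; supporting the single-line form as well would need a second pattern or an optional `require` prefix.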

@rsc
Contributor

rsc commented Feb 21, 2018

This issue is made thornier by the peculiar assertion that the format is fixed before anyone has a chance to comment on it.

My point, which was arguably phrased too strongly, is that go.mods people write today will be understood by the eventual official tooling. I want to make clear that people will not have to throw them away and start over. Given that vgo already supports reading nine different legacy file formats (GLOCKFILE, Godeps/Godeps.json, Gopkg.lock, dependencies.tsv, glide.lock, vendor.conf, vendor.yml, vendor/manifest, vendor/vendor.json), I am confident it won't be a burden to read this one too, if we move to something new. And the tooling already rewrites go.mod in place when needed, so updating to a new format will be easy if that's what we decide. I was not attempting to lock this in place.

@rsc
Contributor

rsc commented Feb 21, 2018

It's a mistake to create a private syntax for a configuration file when there are existing, perfectly fine formats available that are well understood and have publicly available parsers.

I obviously agree with this in principle. In practice I spent a while looking at all the existing formats and found them not "perfectly fine" for this job. In particular, look at how much shorter and clearer a go.mod is compared to the equivalent Gopkg.toml. I'm happy to return to this question once we're happy with all the other higher-level details.

And to answer @josharian's concern, if we keep the custom format then yes there would be public tooling, probably along the lines of x/vgo/vendor/cmd/go/internal/modfile.

@andradei

andradei commented Feb 21, 2018

I like the suggestions of @ericlagergren and @davecheney. They leverage the entirety of the go compiler and its guarantees. But since go.mod is good for detecting the package root, I have a couple of suggestions to keep that advantage while moving towards modules in the source code:

Suggestion 1

Have inline module information in main.go for binaries and lib.go for libraries.

Rust uses main.rs vs lib.rs to differentiate binaries from libraries, and has a Cargo.toml at the project root. The difference is that, in this suggestion, the module info would use Go syntax inside a Go source file.

Suggestion 2

Have mod.go for both binaries and libraries, then add vgo.Product = "binary" // or "library" or some sort of const iota instead of strings.

Swift has a Package.swift, which is valid Swift code, at the project root, which specifies whether it is a binary or a library with the Package.products type, which can be .library or .executable

EDIT: Added comparison to suggestions above.

@ecowden

ecowden commented Feb 21, 2018

@huguesb

rsc states in his blog post that vgo is meant to be a streamlining/simplification of the general-purpose dep tool...

I’m curious: why does “hierarchical” imply “complex?”

Stepping back, I probably misphrased that last requirement. I was looking for an intersection of the familiar and the extensible, and doing so with YAML and JSON on my mind. “Hierarchical” isn’t really the goal here, and I’m happy to scratch it off the list.

I’m surprised to see the reaction about it being complex, though, and I'm wondering if I'm missing something. When I look at the example go.mod files like this one...

module "rsc.io/hello"

require (
	"golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
	"rsc.io/quote" v1.5.2
)

...personally, I see a “hierarchical” data structure. By that, I mean a list of key-value pairs, where values can be primitives, lists, or other lists of key-value pairs. Changing nothing but formatting and punctuation, it becomes:

module: rsc.io/hello
require: 
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2

…And it’s even 4 characters shorter! (That’s a joke, if it’s not obvious. 😁)

When I jumped in here, I was thinking about extending an existing git repo dependency analyzer written in Node.js to recognize vgo modules. (Well, that, and how I missed the pretty colors my editor makes highlighting files...) Then I realized how much I didn’t want to create and maintain a custom parser, and how much easier it would be with a “standard” data format.

By all means, put this question on the back burner. There are waaay more important things to figure out with vgo, and I like what I’m seeing so far! 👍

@nilebox

nilebox commented Feb 23, 2018

Even if YAML is considered to be too complex or confusing, I would still prefer it (or JSON, or TOML, or whatever other standard declarative format) over a bespoke format, supporting only the subset of it that we are happy with.

In other words, if go.mod is a valid YAML/TOML/JSON (not necessarily supporting all features of these formats), it would make it immediately familiar to both users and any platform that you want to use for parsing.

@ecowden's example above makes it immediately clear to me which format I would prefer.

Another concern with go.mod is that it doesn't even look declarative or standardized, it looks like imperative code. Is there any reason for that? Do we actually want to make it extensible and support imperative constructions there, e.g. functions?

@ngrilly

ngrilly commented Feb 23, 2018

@nilebox go.mod doesn't look more "imperative" or "declarative" than an nginx configuration file, for example.

@lunny

lunny commented Feb 24, 2018

Maybe put the go.mod contents in a comment in a Go file. For example:

/*
+require "golang.org/x/text" v0.0.0-20180208041248-4e4a3210bb54
+require "rsc.io/quote" v1.5.2
*/
package main // import "rsc.io/hello"

@lunny

lunny commented Feb 24, 2018

Or

package main // import "rsc.io/hello"

import (
    "golang.org/x/text" // require v0.0.0-20180208041248-4e4a3210bb54
    "rsc.io/quote" // require v1.5.2
)

@komuw
Contributor

komuw commented Feb 24, 2018

[Image: amiga-sound]

Too bad my OS (Ubuntu) thinks the go.mod file is an audio file. This means I can't just double-click and edit the file; I have to go through the hassle of letting my OS know that *.mod files should open in an editor.

@cznic
Contributor

cznic commented Feb 24, 2018

You can, the file associations are fully user modifiable. However, using any well-established extension for the vgo module file is a rather unfortunate choice.

@rsc
Contributor

rsc commented Apr 2, 2018

I think we should continue to use the very simple go.mod format, after the further simplification of making quotes optional (#24641). Once the dust settles, we should also publish a package like x/vgo/vendor/cmd/go/internal/modfile so that other tools can parse and edit mod files too.


As I wrote originally, I do understand the appeal of a standard file format, but I am still unable to find one that worked well for this task. My main concern is ease of editing, for both people and programs.

The files have to be easy for people to edit. For example, the hacked-up blog post system I built stores a JSON blob at the top of each file, above the post text, because it was very easy to implement that. But I am sick of needing to leave out the comma after the last key-value pair, because it makes adding a new key-value mean editing the previous one too. This is exactly why we allow trailing commas in Go literals. Those annoyances add up.

The files also have to be easy for programs to edit, without mangling them. Think about all the benefit we’ve gotten from gofmt and tools being able to collaborate with people to work on Go source files. People and programs working together on go.mod will be similarly beneficial. In fact this is a key part of the design. If you read through the Tour of Versioned Go you’ll see repeated alternation between the developer editing go.mod and vgo itself editing go.mod. That has to run very smoothly.
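
As a rough illustration of a program editing the file without mangling it, here is a line-oriented sketch that changes the version of one requirement and leaves every other byte, including comments, untouched. The function `setVersion` is hypothetical and is not how vgo does it; it assumes each requirement sits on one line with the path before the version.

```go
package main

import (
	"fmt"
	"strings"
)

// setVersion rewrites the version of one required module in go.mod text.
// Lines that do not name the module are returned unchanged.
func setVersion(gomod, path, version string) string {
	lines := strings.Split(gomod, "\n")
	for i, line := range lines {
		f := strings.Fields(line)
		// Accept both `"path" v1` inside a block and `require "path" v1`.
		if len(f) > 0 && f[0] == "require" {
			f = f[1:]
		}
		if len(f) < 2 || strings.Trim(f[0], `"`) != path {
			continue
		}
		// Replace the version only in the text that follows the path token.
		idx := strings.Index(line, f[0]) + len(f[0])
		lines[i] = line[:idx] + strings.Replace(line[idx:], f[1], version, 1)
	}
	return strings.Join(lines, "\n")
}

func main() {
	in := "module \"rsc.io/hello\"\n\nrequire (\n\t\"rsc.io/quote\" v1.5.2 // pinned for now\n)\n"
	fmt.Print(setVersion(in, "rsc.io/quote", "v1.5.3"))
}
```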

All the “generalized key-value pair” formats become awkward when there’s more than a single key-value pair to express. It’s true that we could use a YAML-like notation:

module: rsc.io/hello
require: 
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2

but that nice one-line-at-a-time property breaks when we get to replace "rsc.io/quote" v1.5.2 => "../quote". Perhaps the best encoding would be:

replace:
- rsc.io/quote: v1.5.2
  with: ../quote

But then what does replace "rsc.io/quote" v1.5.2 => "github.com/you/quote" v0.0.0-myfork encode as? Maybe this?

replace:
- rsc.io/quote: v1.5.2
  with: github.com/you/quote
  at: v0.0.0-myfork

The awkwardness here is not much, but it’s still quite annoying: three lines instead of one, with corresponding reduced readability and ability to use line-based tools like grep, sort, diff.
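
A small sketch of why the one-line form is convenient for programs as well: the whole directive can be split into whitespace-separated fields and checked positionally. The `Replace` type and `parseReplace` function are assumptions for this example; comments and paths containing spaces are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Replace is one replacement directive. NewVersion is empty when the
// replacement is a local directory such as ../quote.
type Replace struct {
	OldPath, OldVersion string
	NewPath, NewVersion string
}

// parseReplace parses a single-line directive of the form
//
//	replace "old" v1 => "new" [v2]
func parseReplace(line string) (Replace, error) {
	f := strings.Fields(line)
	if len(f) < 5 || len(f) > 6 || f[0] != "replace" || f[3] != "=>" {
		return Replace{}, fmt.Errorf("malformed replace directive: %q", line)
	}
	unquote := func(s string) string { return strings.Trim(s, `"`) }
	r := Replace{OldPath: unquote(f[1]), OldVersion: f[2], NewPath: unquote(f[4])}
	if len(f) == 6 {
		r.NewVersion = f[5]
	}
	return r, nil
}

func main() {
	for _, line := range []string{
		`replace "rsc.io/quote" v1.5.2 => "../quote"`,
		`replace "rsc.io/quote" v1.5.2 => "github.com/you/quote" v0.0.0-myfork`,
	} {
		r, err := parseReplace(line)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", r)
	}
}
```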

The fundamental problem is that not everything a developer needs to say is best expressed as key-value pairs. We don’t use shells that require us to write:

cmd:
- prog: echo
- arg1: hello
- arg2: world

Yet somehow many developers accept this in config files. Why? Because, as Rob said, existing formats “are well understood and have publicly available parsers.” At least, we think that’s true. The more I look at these formats the less convinced I become. And even assuming it's true, that benefit has to outweigh the disadvantages imposed by the format itself.

JSON is too picky (for example, about commas) and has no support for comments. It’s out.
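
Both complaints are easy to reproduce with Go's own encoding/json package, which rejects a trailing comma and a comment alike. The helper name `parses` is just for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parses reports whether encoding/json accepts the input as a JSON
// object with string values.
func parses(input string) bool {
	var v map[string]string
	return json.Unmarshal([]byte(input), &v) == nil
}

func main() {
	inputs := []string{
		// A plain object.
		`{"rsc.io/quote": "v1.5.2"}`,
		// The same object with a trailing comma.
		`{"rsc.io/quote": "v1.5.2",}`,
		// The same object with a comment.
		"{\n  // pinned\n  \"rsc.io/quote\": \"v1.5.2\"\n}",
	}
	for _, in := range inputs {
		fmt.Printf("%q parses: %v\n", in, parses(in))
	}
}
```

Only the first input is accepted; the other two fail with syntax errors.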

XML is equally picky about closing tags and is too noisy in general. It’s out.

TOML and YAML are at least easier for people to edit, but they both have the general key-value problem.

Additionally, TOML requires quotes around both module paths as keys (because they have slashes) and all values ("rsc.io/quote" = "v1.5.2"). Experience with go.mod suggests we want to move in the opposite direction, toward no quotes. (See #24641.)

Both TOML and YAML also turn out to be more complex than they first appear, a detail that’s very important if you need not just a parser but a mechanical editor that can parse, edit, and reprint the file. TOML’s complexity starts to show once you move away from key-value pairs: you have to learn the distinction between [x] and [[x]] and then start thinking about regular key-value pair lines versus inline tables. Of course, that’s nothing compared to YAML. Here’s an illuminating exercise: flip through http://yaml.org/spec/1.2/spec.pdf and try to find out what syntactic restrictions are placed on unquoted keys and values in key-value pairs. I’m still not completely sure. YAML embeds JSON as a subset but they didn’t stop there. As far as I can tell from the document, instead of writing:

module: rsc.io/hello
require: 
- golang.org/x/text: v0.0.0-20180208041248-4e4a3210bb54
- rsc.io/quote: v1.5.2

it appears to be equally valid to write:

%YAML 1.2
---
!!map {
 ? !!str "module" : !!str "rsc.io/hello"
 ? !!str "require" 
 : !!seq [
   !!map { ? !!str "golang.org/x/text" : !!str "v0.0.0-20180208041248-4e4a3210bb54" },
   !!map { ? !!str "rsc.io/quote" : !!str "v1.5.2" },
 ],
}

and it also appears the two forms can be blended arbitrarily. Something as simple as

module: !!str rsc.io/hello

appears to be valid YAML yet mean something different from what our “subset” parser would understand. There would be constant pressure to give up the insistence on using a subset of YAML, and yet it becomes more difficult to write a good mechanical editor (parse+edit+reprint) the more complexity is introduced.

If we had to pick some existing format, I’d pick TOML, but even that seems wrong:

module = "rsc.io/hello"

[require]
"golang.org/x/text" = "v0.0.0-20180208041248-4e4a3210bb54"
"rsc.io/quote" = "v1.5.2"

[[replace]]
"rsc.io/quote" = "v1.5.2"
with = "github.com/you/quote"
at = "v0.0.0-myfork"

The [[ ]] are necessary here because [require] is a single table (of key-value pairs each of which stands alone) while [[replace]] is an array of tables, in which each table is one replacement, with three keys: the path being replaced and the special keys “with” and “at”. If you wanted to reserve any possible future expansion you’d have to use [[require]] too, making it:

[[require]]
"golang.org/x/text" = "v0.0.0-20180208041248-4e4a3210bb54"

[[require]]
"rsc.io/quote" = "v1.5.2"

All in all, it doesn’t seem like these file formats are actually helping advance our goal of making the file easy for people and programs to edit. We’d probably have to write a custom parser+reprinter anyway, so the only real benefit would be syntax highlighting in editors. I think that benefit is easily outweighed by the awkwardness of shoehorning our semantics into these files in the first place. If your configuration is a few basic key-value pairs, they make a lot of sense. Ours is not just key-value pairs, so those files don’t make sense.

P.S. I wondered for a long time why it was that “dep ensure -add” did not modify existing constraints in Gopkg.toml. The answer is that Dep can’t reliably modify hand-written TOML, preserving comments and the like. Dep sometimes appends to Gopkg.toml but otherwise imposes the rule that Gopkg.toml is owned by people and Gopkg.lock is owned by programs. This seems to be an artifact of the available libraries as much as it is a design choice.

@rsc
Contributor

rsc commented Apr 5, 2018

Based on (1) discussion with Rob, (2) no one replying to my last comment, and (3) the emoji counters on that comment, I'm going to close this issue and keep the bespoke syntax in go.mod (subject to further refinement like dropping quotes).

@rsc rsc closed this as completed Apr 5, 2018
gcla added a commit to gcla/vgo that referenced this issue Apr 30, 2018
The syntax appears still to be in flux: golang/go#23966

For now, fork vgo and disable this check. Seems to cause no short-term harm...
@golang golang locked and limited conversation to collaborators Apr 5, 2019