
proposal: embed: add -embed=compress flag to go #61322

Closed
twmb opened this issue Jul 12, 2023 · 14 comments

@twmb
Contributor

twmb commented Jul 12, 2023

The design proposal for embed contains a section on compression to reduce binary size. This has come up for me a few times recently, and I wonder whether there is any appetite for the final sentence of that section:

"Future work might be to add -embed=compress as a go build option for use in limited environments."

Having written the same decompression logic a few times now in a few different ways (decompress one file, decompress a tarball, gunzip vs. zstd), it'd be super convenient to have compression of embedded files or dirs built directly into Go.
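
For concreteness, the single-file gunzip variant of that logic looks roughly like this (a minimal sketch; data.json.gz and the package name are just placeholders):

package assets

import (
	"bytes"
	"compress/gzip"
	_ "embed"
	"io"
)

//go:embed data.json.gz
var dataGz []byte

// Data gunzips the embedded file on each call. This is the kind of
// helper that currently has to be rewritten for every compressed asset.
func Data() ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(dataGz))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}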

Another benefit of allowing compression at the build stage (with on-the-fly decompression once at runtime) is for code repositories. Currently, I am checking in a tiny compressed tarball and have tests against the sha256 of the checked-in file. Tarballs aren't really great for GH repositories, but considering how infrequently this tarball changes, it's an alright tradeoff right now.

This isn't a super pressing need, but I couldn't find a previous open or closed proposal, and in the original proposal, the only comment that mentions compression is this one (the point of which is that TinyGo rarely targets RAM-heavy systems).

@twmb twmb added the Proposal label Jul 12, 2023
@gopherbot gopherbot added this to the Proposal milestone Jul 12, 2023
@ianlancetaylor
Contributor

When are the embedded files decompressed? If compression is signaled by an option like -embed=compress, then presumably the code must work correctly whether the files are compressed or not. When using //go:embed with a string or []byte, there is no step where we can decompress the file. So do we need to decompress all embedded files at program startup? That doesn't seem ideal for all use cases.

For an embedded io/fs.FS we can presumably store an embedded archive/zip file. The archive/zip package already implements io/fs.FS. Then compression should happen automatically?
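
For example, something along these lines already works today (a rough sketch; assets.zip is a hypothetical archive produced by a separate build step):

package main

import (
	"archive/zip"
	"bytes"
	_ "embed"
	"log"
	"net/http"
)

//go:embed assets.zip
var assetsZip []byte

func main() {
	// *zip.Reader satisfies io/fs.FS; files are decompressed
	// transparently as they are read.
	zr, err := zip.NewReader(bytes.NewReader(assetsZip), int64(len(assetsZip)))
	if err != nil {
		log.Fatal(err)
	}
	http.Handle("/", http.FileServer(http.FS(zr)))
	log.Fatal(http.ListenAndServe(":8080", nil))
}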

@twmb
Contributor Author

twmb commented Jul 12, 2023

Agreed, this seems hard, if not impossible, to do with []byte and string. Would it be feasible to add special embed.String or embed.Bytes types that can have optimizations such as this?

Opting into embed.FS also being compressed is another good option.

@bcmills
Contributor

bcmills commented Jul 13, 2023

Would it be feasible to add special embed.String or embed.Bytes types that can have optimizations such as this?

I don't think we should have embed.String or embed.Bytes types that look like ordinary string or []byte but decompress data at run-time.

But maybe we could support go:embed on func() []byte or func() string declarations instead? That is, this declaration:

//go:embed bigfile.bin
func content() string

could be a shorthand for something like:

//go:embed bigfile.bin
var contentFS embed.FS

var content = sync.OnceValue(func() string {
	f, err := contentFS.Open("bigfile.bin")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	b := new(strings.Builder)
	if _, err := io.Copy(b, f); err != nil {
		panic(err)
	}
	return b.String()
})

@rsc
Contributor

rsc commented Aug 9, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@hherman1

This sounds a bit premature to me, unless there is more demand than I realize.

@tux21b
Contributor

tux21b commented Aug 14, 2023

We use Brotli to compress all UI artifacts (HTML, JS, CSS, etc.) bundled with our app. A special HTTP handler either serves those files directly (if the client supports Brotli) or decompresses the bundled Brotli files and serves them uncompressed.

The proposal seems to completely ignore these use cases:

  • zstd is nice, but has no browser support - other algorithms like brotli or gzip might be more useful
  • compressing files individually might be preferred
  • accessing the compressed data might be useful in some scenarios
  • configuring the compression algorithm on a per file basis might be needed

I think it's ok not to support all of those features upfront, but the proposal in its current form cannot easily be extended later.
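
For reference, the handler described above has roughly this shape (a minimal sketch that substitutes gzip from the standard library for Brotli, which is not in the stdlib; the dist directory, URL handling, and Accept-Encoding check are simplified placeholders):

package main

import (
	"compress/gzip"
	"embed"
	"io"
	"net/http"
	"path"
	"strings"
)

// dist is assumed to contain only pre-compressed files (e.g. app.js.gz)
// produced by the frontend build.
//
//go:embed dist
var dist embed.FS

func serveAsset(w http.ResponseWriter, r *http.Request) {
	// Content-Type handling is omitted for brevity.
	name := path.Join("dist", path.Clean(r.URL.Path)) + ".gz"
	f, err := dist.Open(name)
	if err != nil {
		http.NotFound(w, r)
		return
	}
	defer f.Close()

	if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
		// The client understands gzip: stream the compressed bytes as-is.
		w.Header().Set("Content-Encoding", "gzip")
		io.Copy(w, f)
		return
	}

	// Otherwise decompress on the fly and serve the file uncompressed.
	zr, err := gzip.NewReader(f)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer zr.Close()
	io.Copy(w, zr)
}

func main() {
	http.HandleFunc("/", serveAsset)
	http.ListenAndServe(":8080", nil)
}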

@rsc
Contributor

rsc commented Aug 16, 2023

This seems too difficult and fraught with problems and there doesn't seem to be much demand for it.

@rsc
Contributor

rsc commented Aug 16, 2023

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@twmb
Contributor Author

twmb commented Aug 16, 2023

Ok, sounds good.

@twmb twmb closed this as completed Aug 16, 2023
@rsc
Contributor

rsc commented Aug 30, 2023

No change in consensus, so declined.
— rsc for the proposal review group

@silverwind

silverwind commented Sep 9, 2023

Sad to see go:embed compression rejected. It would be very useful: it could reduce binary size by tens of megabytes for a moderately sized web application and enable direct streaming of compressed assets to HTTP clients, which is ideal for performance as well.

In fact, the lack of compression support in go:embed is the main reason we have not migrated to it yet in Gitea: without it, the binary size increases by over 20MB, and HTTP clients that support gzip compression can no longer be served the compressed byte stream directly.

Any details on the rejection? Are web apps not a priority for golang?

@tux21b
Contributor

tux21b commented Sep 9, 2023

@silverwind I just looked at Gitea and discovered that you are using Webpack. You could therefore configure Webpack to output gzip-compressed files (we do something similar with Brotli and Vite) and leave the go:embed statements as they are. This has several advantages compared to this concrete proposal:

  • all files are compressed individually, so you can directly serve them using "Content-Encoding: gzip" if supported by the browser (we have written a small http handler to decompress brotli if it's not supported)
  • webpack / vite offer many configuration options, like only compressing certain file types and only emitting the compressed file if it's actually smaller than the original (in addition to embedding the file as is and not even generating extra files for certain file sizes)

Alternatively, if using Webpack isn't an option for you, I guess one more line in your 1026-line Makefile to preprocess the files wouldn't hurt either.

@twmb
Contributor Author

twmb commented Sep 9, 2023 via email

@tux21b
Contributor

tux21b commented Sep 9, 2023

I am also in favor of a simple compression option; I just do not like the proposal in its current state. There are a lot of open questions that need to be answered first.

  • compressing files globally (with a command-line flag) will make it impossible to use the feature in non-trivial apps:
    • third-party assets might not be compressible
    • third-party code might not work with compressed assets
    • even your own app / monorepo might not work if you compress everything, especially if everything is compressed into one large zip
    • if your code expects compressed assets, commands like "go run" and "go test" will not work anymore (since you forgot to set the "-embed=compress" flag)
    • an option like //go:embed(compress) ... or a type like embed.CompressFS, settable per embed directive, seems to be a better solution
  • we do not have to support every use case, but bundling web assets was the primary goal of "go-bindata" and most of the tools that came after it (and is also what was requested by silverwind just now), so I think we should support it
    • having one large zip might be good for compression, but it makes it impossible to serve those assets directly (without decompressing and re-compressing every asset)
    • if we want to keep it simple, I would argue each file should be compressed individually, even if that means a slightly larger binary size; if you want to compress things as a bundle, you can always create a zip or tar archive first
    • I am not sure how files that become larger during compression should be handled. Should compression be skipped in that case? That sounds like a source of rarely occurring bugs if the consumer isn't aware of it
  • I think it's ok to support only "gzip" at the beginning (the more advanced compression algorithms aren't part of the stdlib at the moment), but a flag like "-embed=compress" is too limited and makes it hard to add other options later. Something like //go:embed(gzip) would allow other algorithms to be added later, and would even allow the "one large archive" use case with a syntax like //go:embed(tar+gzip) or something similar if we later decide to implement it
  • to keep things simple, I think compressed assets should not be decompressed by default. The hard part is getting the assets into the Go binary (without Makefiles, Bazel, or custom build scripts); once the Go binary is running, it's easy to initialize a gzip.Reader on the fly. With this, the programmer decides when to decompress those assets (during init(), on the fly, or not at all) and how long to keep them around (no hidden start-up or memory cost)

This comment is not a proposal yet; I am sure there are lots of other points that need clarification as well. That's just what I meant when I said that the proposal isn't extensible in its current form.
