Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: image: add generic metadata support for jpeg, gif, png #33457

Closed
drswork opened this issue Aug 4, 2019 · 54 comments
Closed

proposal: image: add generic metadata support for jpeg, gif, png #33457

drswork opened this issue Aug 4, 2019 · 54 comments

Comments

@drswork
Copy link

drswork commented Aug 4, 2019

Proposal

Currently the standard image libraries provide no way to read or write most of the metadata associated with an image. I'd like to change that so that the standard image loading libraries surface the metadata in images they read and allow code to annotate images with metadata.

Each image format has its own metadata. Additionally there are two common image metadata formats (eXif and XMP) which are each used in multiple image formats. (XMP is used in GIF, JPEG, and PNG files, while eXif is used in PNG and JPEG files) The formats themselves also have some interesting quirks which make excessive cleverness unwise. (There are three(!) separate key/value stores attached to PNG files, for example, and the format doesn't disallow a key being used multiple times in the different, or even the same, kv store)

With this in mind, the three guiding principles behind this proposal are:

  1. Metadata should be minimally processed on read and write
  2. It should be possible to read an image and immediately write it out in the same format, with the output having semantically identical metadata to the original.
  3. It should be drop-in compatible with the existing image interfaces.

Point 1 means that PNG's three key/value stores will be exposed as three separate stores, and can't be a plain string/string map. It also means that code which wishes to pull information out of a PNG needs to know to look in both the XMP and eXif data.

Point 2 means we need to at least record any metadata that we don't currently know how to interpret so it can be written back out, and the images that are returned from decoding an image need to have its metadata silently attached.

Point 3 means we can't add methods to existing interfaces, since it's possible there are other packages that implement them. It also means we can't alter the calling conventions of RegisterFormat() in image.

This proposal should make it possible for third-party image libraries to participate in the new standard if they choose.

This proposal has some decisions which are either potentially controversial or potentially fragile. The ones I can think of are noted in the Potential Issues section

NOTE: "Metadata" means any data not directly related to the pixels in an image on disk. For example PPI and rotation, for example, are metadata, while alpha channel is not. (Rotation is arguably pixel data the way that alpha channel is, but since it massively affects the translation of disk pixels to image pixels we'll classify it as metadata for now)

Changes to current packages

Some of the standard packages will be changed. The changes are backwards-compatible.

image

  • New registration function RegisterFormatExtended
  type Format struct {
    Name           string
    MagicString    string
    Decode         func(io.Reader) (Image, error)
    DecodeConfig   func(io.Reader) (Image, error)
    DecodeMetadata func(io.Reader) (*metadata.Metadata, error)
    GetMetadata    func(Image) (applies bool, *metadata.Metadata, error)
  }

  RegisterFormatExtended(*Format)

The new registration format includes a function to decode the metatada, as well as a format-specific function to return the parsed metadata for an image.

  • RegisterFormat marked deprecated

It will still work, but internally will just construct a Format struct and call RegisterFormatExtended.

  • GetMetadata added as registered function

This function will be passed an Image. Because there's no way to tell what type the image is, the metadata getting code must iterate through each format's get function. The function sets applies to false if it doesn't apply. Iteration short-circuits at the first function that says it does apply.

image/gif

  • Metadata type, and supporting types, added
type Metadata struct {
  Comment     []string
  XMPMetadata *xmp.XMP
  AppBlocks   []AppBlock
}  

type AppBlock struct {
  AppID [8]byte
  AppAuthCode [3]byte
  BlockData []byte
}
  • Metadata added to Options
type Options struct {
    NumColors int
    Quantizer draw.Quantizer
    Drawer    draw.Drawer
    Metadata  *Metadata // Added
}

image/png

  • Metadata added. Note that for the initial proposal not all the chunks are proposed to be exported. Any chunk the library can't process is stored in the unexported field, which is only accessible to the library's reader.
type Metadata struct {
  KVText           []struct{string, string}
  CompressedKVText []struct{string, string}
  UnicodeKVText    []struct{string, string}
  ChangeTime       *time.Time
  XMPMetadata      *xmp.XMP
  ExifMetadata     *exif.Exif
  unexported       map[string][]byte // For currently uninterpreted chunks
}
  • Encode function gets optional Options parameter:
func Encode(w io.Writer, m image.Image, o *Options) error

type Options {
  Metadata *Metadata
}

image/jpeg

  • Metadata added
type Metadata struct {
  ExifMetadata *exif.Exif
  XMPMetadata  *xmp.XMP
  unexported []byte // For other uninterpreted data
}
  • Metadata added to Options struct
type Options struct {
    Quality  int
    Metadata *Metadata
}

New packages

We add a new general-purpose metadata package to hold the metadata get function. We also add packages for the two shared metadata formats, XMP and eXif.

images/metadata

  interface Metadata {
    // Make sure the Metadata struct is actually valid, as each format has
    // its own quirks and restrictions.
    Validate() error
    // Here mostly to disambiguate from all the other interfaces with just
    // Validate in them.
    IsImageLibraryMetadata()
  }

  // Returns the filetype-specific metadata. Type-switch to tell what kind.
  Get(image.Image) (*Metadata, error)

Get will internally iterate through the registered GetMetadata functions until it finds one that matches.

The Validate method allows code to make sure the metadata struct is valid for the format before writing it out. This is important since many of the fields have quirks and limitations -- for example the key in png's key/value store must be ISO 8859-1 and between 1 and 79 characters.

images/metadata/exif

The exif package decodes metadata in eXif format.

  type Exif {
    // Handwave on the internals until the proposal is generally deemed OK
  }

  // Decode a binary blob, without any leading or trailing markers, as exif
  func Decode([]byte) (*Exif, error)

images/metadata/xmp

The xmp package decodes metadata in XMP format. (Or, if you prefer, ISO 16684-1:2019 as it's an official ISO standard)

  type XMP {
    // Handwave on the internals until the proposal is generally deemed OK
  }

  // Decode a binary blob, without leading or trailing markers, as xmp
  func Decode([]byte) (*XMP, error)

Potential Issues

The proposal has some decisions which are either potentially controversial or potentially fragile. They are noted

Iterating through metadata get functions is fragile

I can't think of a better way, without having the libraries tag the returned Image with a type, which the API doesn't support that I can see.

Adding an optional *Options parameter to png.Encode

This has to be done by actually adding a ...*Options parameter to the signature and then complaining if multiple ones are passed. This requires runtime signature validation, which isn't great

Adding *Metadata to existing Options parameters may cause unexpected behaviour

jpeg.Options has an integer quality parameter, so it can't actually be left off. We'd have to interpret 0 == default. gif.Options has an integer color count parameter, and we'd have to interpret 0 == default.

No access to uninterpreted metadata fields and chunks

This is intentional. It means that writers can't add their own chunks to the output, and it means that readers can't implement their own interpretation code. This is definitely inconvenient, but it means that when the additional chunks are interpreted by the standard library there won't potentially be two ways to access the same data.

Once the implementation is deemed complete then read/write access to these additional chunks shoud be added.

Optional metadata entites are stored as pointers to things.

This allows us to interpret nil pointers as elided entities and not require a HasFoo functions for each element. Not everyone is fond of this style.

No translation between image format metadata

If you read a jpeg in, then write it as a gif, the metadata isn't translated. This is intentional, since there's currently no clear 1:1 translation between file format metadata information.

@gopherbot gopherbot added this to the Proposal milestone Aug 4, 2019
@odeke-em odeke-em changed the title Proposal: Adding metadata support to images/{jpeg, gif,png} Proposal: add metadata support to images/{jpeg, gif,png} Aug 4, 2019
@odeke-em
Copy link
Member

odeke-em commented Aug 4, 2019

Thank you for this proposal @drswork and welcome to the Go project!

We shall take a look at this perhaps during Go1.14, and I'll also kindly ping our image expert @nigeltao.

@odeke-em odeke-em changed the title Proposal: add metadata support to images/{jpeg, gif,png} Proposal: add metadata support to image/{jpeg, gif,png} Aug 4, 2019
@odeke-em odeke-em changed the title Proposal: add metadata support to image/{jpeg, gif,png} proposal: add metadata support to image/{jpeg, gif,png} Aug 4, 2019
@drswork
Copy link
Author

drswork commented Aug 5, 2019

Sure. Just to be clear, I'm not asking anyone else to implement this -- I'm perfectly happy to do the work. I just want to make sure that the API and semantics are mostly correct before starting anything.

@smasher164
Copy link
Member

smasher164 commented Aug 5, 2019

This may also enable a general solution to #11420 and #20613, since color-space information is also metadata.

@drswork
Copy link
Author

drswork commented Aug 6, 2019

I fully expect to surface the color space information as metadata. Actually applying it to the image is well beyond my image manipulation skills so I'll leave that part to someone else.

@rsc
Copy link
Contributor

rsc commented Aug 6, 2019

/cc @nigeltao

@rsc rsc changed the title proposal: add metadata support to image/{jpeg, gif,png} proposal: image: add generic metadata support for jpeg, gif, png Aug 6, 2019
@nigeltao
Copy link
Contributor

Thanks for the detailed write-up. A few quick observations:

  • ICCP metadata isn't mentioned?

  • I don't understand how Format.GetMetadata is supposed to work. Is it for decoding, encoding, or both? If it's decoding, Decode will return you e.g. an *image.RGBA. How are you going to squeeze some metadata out of an *image.RGBA?

Similarly, you say:

Get will internally iterate through the registered GetMetadata functions until it finds one that matches.

but Get takes an image.Image. How does a png.Format or a jpeg.Format know that it applies to an *image.Gray?

  • Is there no way to decode both the image and metadata in one pass, without having to rewind the io.Reader (which might require arbitrarily large buffering)? Yes, DecodeConfig doesn't let you do this either, but in hindsight, it might have been nice for DecodeConfig to return some sort of way to continue a partial decoding without having to rewind the io.Reader.

  • You also say:

Each image format has its own metadata.

And they also implement metadata.Metadata (right?), but if I have a value of type metadata.Metadata, is there a good way to ask it for its Exif (if it has one)?

  • Do you need a Metadata.Validate method? It's not like you're going to pass a jpeg.Metadata to png.Encode, right? It seems like you could have a (non-exported) validate(*Metadata) function in the png package, which only took a png.Metadata.

  • You also say:

Adding an optional *Options parameter to png.Encode

That'll break any existing code that looks like:

var encode func(io.Writer, image.Image) error = png.Encode


In general, there's a lot of promising ideas here, but I also think that landing on the right design will take longer than e.g. 3 months of iterative feedback, and 3 months is the "feature development" part of the 6-monthly Go release cycle (https://github.com/golang/go/wiki/Go-Release-Cycle). I'm hesitant to add something in e.g. the 1.14 release cycle that, 9 months later, we realize could work much better with a different API design, but by then we're then stuck by backwards compatibility constraints.

For example, if an encoded image contains megabytes of some flavor of metadata, but all we want is an EXIF orientation, will we still have to decode all of the metadata into an in-memory representation up front? It's not exactly the same, but there might be some food for API thought from #5465 "The Request struct... has all its fields publicly exposed, most of which require memory allocation".

I'd advise forking the stdlib's image/* packages for now, and adding these new features there. Once we get some good iterations and usage on that one, then we can consider adding stdlib API that we're going to have to support for many years.

@drswork
Copy link
Author

drswork commented Aug 14, 2019

ICCP metadata isn't mentioned?

Yeah, at the time I wrote the proposal I wasn't aware of any of the details of ICCP. Now that I am (I gave the spec a scan over the weekend after writing a bunch of png chunk decoders) I admit I'm inclined to punt and surface it as a binary blob, leaving it to someone else to decode.

While on the one hand it's not great since it isn't decoded, on the other at least the data's surfaced so it can be decoded, so it's a step up from the current state of things.

I don't understand how Format.GetMetadata is supposed to work. Is it for decoding, encoding, or both? If it's decoding, Decode will return you e.g. an *image.RGBA. How are you going to squeeze some metadata out of an *image.RGBA?

and

but Get takes an image.Image. How does a png.Format or a jpeg.Format know that it applies to an *image.Gray?

This bit of API awkwardness is because image.Image is an interface, so I can't add anything to it without breaking things. (I would've preferred to just hang a Metadata field off image.Image, but alas that's not an option)

My assumption here is that image/{jpeg,gif,png} would hang a real metadata field off whatever concrete type they returned, and when you called .GetMetadata it would query the thing it was handed to see if it had 's metadata hanging off it. That was something I figured was an implementation detail and was handwaving past to focus on the user-facing API, but I should've been a bit more concrete here.

Each image format has its own metadata.

And they also implement metadata.Metadata (right?), but if I have a value of type metadata.Metadata, is there a good way to ask it for its Exif (if it has one)?

Because the three image formats have such... interestingly divergent ideas of what metadata to store and how to store it, I hadn't planned on having the different metadata types be particularly interchangeable. In concrete terms it means that if you're handed a metadata.Metadata there's very little you can do with it without a type switch and casting it to a {png,gif,jpeg}.Metadata.

In the few rare cases where the different formats hold the same data then the fields that hold it should be named the same, but that's a matter of convention more than anything. (Also I'm not sure it's a good idea anyway, since the way the data is stored causes issues. XMP metadata is in both GIF and PNG files, but PNG stores it as a text blob with a particular key in its key/value store and a good case could be made that pulling it out and decoding it hides an essential, if very annoying, property of the data)

Do you need a Metadata.Validate method? It's not like you're going to pass a jpeg.Metadata to png.Encode, right? It seems like you could have a (non-exported) validate(*Metadata) function in the png package, which only took a png.Metadata.

Oh, yes, this is definitely required. It's not to validate that you're handing jpeg metadata to the png encoder, but more that the information in the metadata is valid in the first place. The specs have some firm(ish) ideas about data validation -- for example, in png files it's bogus to have both iCCP and sRGB data, the keys in its key/value store have to be latin-1, and they can't be more than 79 characters. It'd be lovely if each type in the metadata structure was strictly specified such that you can't possibly create bogus metadata, but that seems infeasible.

I am also 100% sure every single data validation requirement (well, maybe not the png key length, though I wouldn't be surprised if that's a good attack vector for some decoders) has been hilariously violated by software out in the wild, and as such I'd actually argue that encode should write out bogus metadata when handed it, since that metadata may well have come from an existing image. I don't think it's a good idea to not be able to write out data we just read (opinions differ here, of course), so encode needs to not complain

Is there no way to decode both the image and metadata in one pass, without having to rewind the io.Reader (which might require arbitrarily large buffering)? Yes, DecodeConfig doesn't let you do this either, but in hindsight, it might have been nice for DecodeConfig to return some sort of way to continue a partial decoding without having to rewind the io.Reader.

It's actually much, much easier to decode the metadata and image in a single pass. Honestly DecodeMetadata is in there mostly because DecodeConfig exists. I assumed there were cases where code only wanted the configuration info and not fully decode the image, and I'm 100% good with dropping it. (I can see a few cases where only wanting the metadata would be useful, but I'm good with requiring folks to decode the image for now and consider adding the API in later)

Adding an optional *Options parameter to png.Encode

That'll break any existing code that looks like:

var encode func(io.Writer, image.Image) error = png.Encode

Bah. That was the least bad option I could come up with. I'm open to alternatives -- adding another encode function, the way image/gif has done, was my Plan B, with some kind of metadata attachment function being the backup plan for that.

In general, there's a lot of promising ideas here, but I also think that landing on the right design will take longer than e.g. 3 months of iterative feedback, and 3 months is the "feature development" part of the 6-monthly Go release cycle (https://github.com/golang/go/wiki/Go-Release-Cycle). I'm hesitant to add something in e.g. the 1.14 release cycle that, 9 months later, we realize could work much better with a different API design, but by then we're then stuck by backwards compatibility constraints.

I fully expect the result will be sub-optimal in some way, sure. APIs always are. (Heck, the current API, with each format having different Encode function signatures, is a clear example there) That's part of the reason why I wrote this up for discussion before starting in on coding, to try and work out a reasonable API.

I have the free time to work on this, though I realize it may not be my time constraints that are the issue for the API design.

For example, if an encoded image contains megabytes of some flavor of metadata, but all we want is an EXIF orientation, will we still have to decode all of the metadata into an in-memory representation up front? It's not exactly the same, but there might be some food for API thought from #5465 "The Request struct... has all its fields publicly exposed, most of which require memory allocation".

I'm sympathetic to lazy decoding, sure. In regards to #5465 is the concern the amount of memory allocated or the number of objects created? Having function calls intermediate access to metadata elements isn't without cost either, though I can certainly see deferring, say, XMP decoding until something actually fetches the XMP data.

This does kinda argue for a DecodeMetadata() call of some sort, FWIW. :)

I'd advise forking the stdlib's image/* packages for now, and adding these new features there. Once we get some good iterations and usage on that one, then we can consider adding stdlib API that we're going to have to support for many years.

I'm unclear here -- are you suggesting making a branch and flddling with image/ in that branch, or actually releasing cloned version of the image packages under a different name and see how that goes?

@drswork
Copy link
Author

drswork commented Aug 14, 2019

Also, semi-separately, is there a better way to iterate on the API than github issue comments? At work I'd spin up a google doc for this but that doesn't seem viable here. (I suppose I still can, though that shuts most folks out)

@networkimprov
Copy link

Each major API draft could be a new issue. Or each major feature of the API could be a separate issue.

@nigeltao
Copy link
Contributor

My assumption here is that image/{jpeg,gif,png} would hang a real metadata field off whatever concrete type they returned,

So, png.Decode, when passed (PNG-encoded) RGBA data, wouldn't return an *image.RGBA? That might not be a breaking change, from the compiler's point of view, but it's probably a breaking change, in the http://www.hyrumslaw.com sense. For example, this test code assumes that decoding a known PNG-encoded image results in an *image.Gray:

https://github.com/golang/image/blob/cff245a6509b8c4de022d0d5b9037c503c5989d6/webp/decode_test.go#L80
https://github.com/golang/image/blob/cff245a6509b8c4de022d0d5b9037c503c5989d6/webp/decode_test.go#L101

There are undoubtedly similar tests out there in the wild.

Or, for example, you can currently pass the result of a png.Decode call to the image/draw package, whose implementation has optimized code paths for e.g. *image.RGBA sources and destinations. Interposing an intermediate image-with-metadata type will force skipping the optimized path, in favor of a more general but substantially slower path. Sure, you could patch image/draw at the same time that you introduce your metadata code, but it'll still affect all the third party Go code, outside of the stdlib, that does a similar type switch.

In the few rare cases where the different formats hold the same data then the fields that hold it should be named the same,

One possibility is that they all satisfy the same interface{ GetExif() etc }. I'm not saying that it's necessarily a good idea, but it's an idea.

It's not to validate that you're handing jpeg metadata to the png encoder, but more that the information in the metadata is valid in the first place.

I'm not questioning the need to validate, I'm questioning the need for validate to be (1) a upper-case-V (i.e. exported) method, (2) on the interface type (not just on concrete types).

If it's for decoding, not encoding, then png.Decode can call a lower-case-v validate method on its png.Metadata before it returns. Everything's within the package boundary. Or just validate it incrementally, as it's built incrementally.

Another way to put it: who would call the upper-case-V Validate method, on a value of the interface type (metadata.Metadata) instead of a concrete type (png.Metadata)?

adding another encode function, the way image/gif has done, was my Plan B

I think you're going to have to go with Plan B. I guess this reinforces my point that I'm hesitant to commit to API too early. There are some clear mistakes in the stdlib image API, but we can't change them now.

In regards to #5465 is the concern the amount of memory allocated or the number of objects created?

Both? For example, if you look for "sky_with_icc.gif" in the (private) google3 repository, it's a 3MB GIF file for which over 60% of that bulk is an ICC profile. Because the GIF format is... sub-optimal, that 1.8MB profile is cut into over 7000 255-byte chunks. If I want to decode that GIF file in Go, will I be obligated to allocate something (many things?) to hold all of that metadata, even if I don't actually care about it?

are you suggesting making a branch and flddling with image/ in that branch, or actually releasing cloned version of the image packages under a different name

The latter: make a github.com/google/drsimage repository (you can obviously change "drsimage" to whatever you please), and make e.g. import "github.com/google/drsimage/image/png" be a drop-in replacement for import "image/png". Type aliases will undoubtedly be useful here.

is there a better way to iterate on the API than github issue comments?

You could start a new CL to the main go repository (containing the stdlib), with no actual implementation, just API (new types, methods, etc) and, ideally, doc comments. You could add panic's at various points so that it all compiled, if you wished. Send it out for review, noting that it's not for submission, just for API discussion.

Or, if you prefer a google doc, you could make it world-editable.

@drswork
Copy link
Author

drswork commented Aug 15, 2019

My assumption here is that image/{jpeg,gif,png} would hang a real metadata field off whatever concrete type they returned,
So, png.Decode, when passed (PNG-encoded) RGBA data, wouldn't return an *image.RGBA?

It would, yes. image.RGBA would just get a metadata field attached to it.

In the few rare cases where the different formats hold the same data then the fields that hold it should be named the same,
One possibility is that they all satisfy the same interface{ GetExif() etc }. I'm not saying that it's necessarily a good idea, but it's an idea.

Possibly, though I'd very much prefer to use formal interfaces as little as possible with this since they're impossible to extend in core libraries and I fully expect that there will be things that are missed in the first version of this. I was more thinking just naming the struct fields the same, though having accessors for the expensive-to-decode fields (I'm specifically thinking of the XMP stuff because it's XML-encoded) and deferring decoding is reasonable.

It's not to validate that you're handing jpeg metadata to the png encoder, but more that the information in the metadata is valid in the first place.
I'm not questioning the need to validate, I'm questioning the need for validate to be (1) a upper-case-V (i.e. exported) method, (2) on the interface type (not just on concrete types).
[snip]
Another way to put it: who would call the upper-case-V Validate method, on a value of the interface type (metadata.Metadata) instead of a concrete type (png.Metadata)?

Fair point. My thinking here was that:

  1. The base image types would get a metadata field attached to them
  2. That field should be more strongly typed than interface{}
  3. Interfaces with a single dummy entry for type-enforcement are kinda silly

Since the concrete types would all have Validate methods it made a certain amount of sense to hoist it up into the interface to address point 3. I'm not particularly strongly wedded to that one, though.

adding another encode function, the way image/gif has done, was my Plan B
I think you're going to have to go with Plan B. I guess this reinforces my point that I'm hesitant to commit to API too early. There are some clear mistakes in the stdlib image API, but we can't change them now.

If I'm going to be touching all this stuff anyway then is it worth regularizing the existing library API? (which I realize means "adding new functions" rather than changing the existing ones) I can see that as reasonable, though perhaps premature since image/gif has some additional issues beyond just inconsistencies with Encode's parameter list.

In regards to #5465 is the concern the amount of memory allocated or the number of objects created?
Both? For example, if you look for "sky_with_icc.gif" in the (private) google3 repository, it's a 3MB GIF file for which over 60% of that bulk is an ICC profile. Because the GIF format is... sub-optimal, that 1.8MB profile is cut into over 7000 255-byte chunks. If I want to decode that GIF file in Go, will I be obligated to allocate something (many things?) to hold all of that metadata, even if I don't actually care about it?

Since you can't properly interpret the bits of the image without it, you're kind of obliged to?

Past that, I suppose the underlying question is "should the default image decoder fully decode the image?" The metadata is part of the image, and currently the decoder doesn't touch it. I think it should, but on reflection I think the image libraries should also take the image-affecting metadata into account (including, for example, the gamma and rotation data) when decoding and manipulating the images and that's definitely not currently happening, and making decode properly respect those would definitely be a semantically breaking change.

This does argue for separate metadata-aware encoding and decoding functions.

Re: API iteration options:

  1. Cloned image libraries
  2. Dummy CL with API/doc proposal
  3. World-readable google doc

Since I apparently can't get docs to share beyond google.com (I assume there's a switch, I just don't know what it is), and I'm not comfortable tossing up code on GitHub that'll potentially end up abandoned, I'll go with option 2 here.

@nigeltao
Copy link
Contributor

Interfaces with a single dummy entry for type-enforcement are kinda silly

Maybe. IIRC protobuf messages have those. There might be other precedent. I started a discussion at https://groups.google.com/forum/#!topic/golang-dev/aRvnYIcaIaA

If I'm going to be touching all this stuff anyway then is it worth regularizing the existing library API?

Quite possibly, yes.

  1. Cloned image libraries
  2. Dummy CL with API/doc proposal
  3. World-readable google doc

... I'm not comfortable tossing up code on GitHub that'll potentially end up abandoned, I'll go with option 2 here.

To be clear, I'm perfectly happy to go with option 2 in terms of discussing API. In terms of landing code, I would still strongly prefer to avoid working directly in the stdlib to start with, unless absolutely necessary. I think this will take more than one 6 month cycle to iterate to something great, and as I've said before, I don't want to prematurely commit to an API we'd have to support for a long time.

Thus, I strongly prefer that this work happens in either a fork (e.g. new GitHub repo) or a branch. I just think that a fork would be easier to work with (e.g. you can "go get" a GitHub repo), but I don't have much experience with working on a branch of the main Go repo, and it might be easier than I fear.

I wouldn't think of it as code that will be abandoned. I would think of it as an experimental branch. It's just that, as I said, I think telling someone "go get aSeparateRepo" to try this out would be easier than to have them build Go from source, but from a branch.

@nigeltao
Copy link
Contributor

Instead of github.com/google/drsimage/image/png, another option is to make an x/imagemetadata sub-repo, so the import path for the experimental code would be golang.org/x/imagemetadata/image/png. OTOH, that might be confused with the existing x/image sub-repo.

It's not necessarily a good idea, just an idea.

@nigeltao
Copy link
Contributor

If I'm going to be touching all this stuff anyway then is it worth regularizing the existing library API?

If we're making new API, another possibility (in e.g. package png) is
DecodeImage(io.Reader) (Image, Metadata, error)
instead of somehow augmenting (in a back-compat way)
Decode(io.Reader) (Image, error)

That way, we don't need to add anything to e.g. the image.RGBA types. We could also possibly add an options argument, or make it a method, not a function.

Again, not necessarily a good idea, just an idea.

@bcmills
Copy link
Contributor

bcmills commented Aug 16, 2019

If the Metadata interface doesn't actually have anything in common between implementations, the DecodeImage alternative above seems best: then it can return a concrete type — with well-documented methods and fields! — instead of an empty abstraction.

package png

type Metadata struct {
	[…]  // Fields specific to PNG metadata!
}

func DecodeImage(io.Reader) (image.Image, *Metadata, error)

@nigeltao
Copy link
Contributor

If the Metadata interface doesn't actually have anything in common between implementations

Well, it'd be nice if decoding a JPEG and re-encoding to PNG could preserve e.g. EXIF metadata.

@nigeltao
Copy link
Contributor

Another thing that'd be nice is if image.Decode, not just png.Decode, returned metadata that we could inspect if it contained EXIF information.

@drswork
Copy link
Author

drswork commented Aug 17, 2019

The one thing I'd prefer to avoid is having the decoded metadata be disconnected from the associated image. While returning it separately as well can be useful, most of the time the metadata you get from an image is specific to that image and so really ought to be attached to the image. Otherwise it makes keeping the image and its associated metadata together much more of a pain (and, indeed, couldn't be done transparently with existing code) if they're two independent bits of data that have to be passed as a pair.

@bcmills
Copy link
Contributor

bcmills commented Aug 19, 2019

@drswork, note that you could have the metadata refer to its corresponding image (rather than vice-versa). Then you'd still only have one item to pass around (the metadata), but you'd have proper type information for it (exactly which kind of metadata it is, and possibly for exactly which kind of image as well).

@nigeltao
Copy link
Contributor

Oh, if we're minting new image API... please bear in mind #27830 and its links to even older issues.

@drswork
Copy link
Author

drswork commented Aug 23, 2019

There are some interesting ideas in #27830. I was unaware of it, but now I know and it mirrors some of what I was contemplating so I'll keep that in mind while I'm sketching things out this weekend.

I am half tempted to revisit how image/gif handles animated gifs by default, but I'm not sure if that's something I want to get into.

@nigeltao
Copy link
Contributor

Yeah, I'm also not happy about animated GIFs in hindsight (e.g. requiring the whole animation to be in-memory; being probably too low-level in giving Disposal enum values but not actually implementing them for you). But mission creep (e.g. "define an animation/video API") is also a problem. As the scope grows, it will definitely take more than one 6 month cycle to iterate to something great.

FWIW, I have a long term goal for an "image 2.0" API for Go, but progress is very slow. It's not something I'm working on week to week, or even month to month.

In any case, for "image 2.0", (1) I'd probably make a new git repo instead of augmenting the stdlib, and (2) as I allluded to in #27830, I'd base it on https://github.com/google/wuffs and have Wuffs generate Go code, not just C code. For example, one benchmark measures Wuffs' GIF decoder at 3x faster than Go's image/gif, partly because I don't really use the io.Reader interface directly (look for Transformer in the #27830 discussion).

@glerchundi
Copy link

@drswork did you found a free slot to sketch something up? Would love to see the progress of this and #27830!

@drswork
Copy link
Author

drswork commented Dec 24, 2019

I've redone the API a bit and updated the patchset appropriately. The major changes are:

  1. Added contexts to everything, to support decoding timeouts.
  2. Redid the way the shared metadata (XMP, EXIF, and the ICC color stuff) is handled, and made those decoders optional
  3. Added in option structs to control basic decoding, error handling, image transformation, and resource limits.

The separate control option structs could be merged into a single struct, though I like the multiple struct way of handling this -- it means that if there's another class of control knobs that need adding then they can be just go into a new struct. It's a little odd, but since we're already likely to have multiple options passed in (one for the 'generic' image settings and a separate one for format-specific settings) it isn't that odd.

Using contexts is definitely unusual for the standard libraries, but cancelling I/O operations is kinda what they're for so it maps in nicely. They do seem to be slowly making their way into things, so this follows with that. (Plus it feels weird to write code where the first parameter of every function isn't "ctx context.Context" :)

The example programs demonstrate this stuff, though if they aren't clear I can definitely make them better.

FWIW I'm not sure that applying metadata-specified image transforms is best done this way, but that's mostly because it's not stuff I do. If there is One True Way to apply gamma and color correction then it make sense to have them in here. OTOH if gamma/color correction is complex tricky business then pulling that out is OK. (Or, I suppose, do both -- leave them in for folks writing run of the mill code and assume people writing Gotoshop will turn that stuff off and interpret the metadata themselves)

And finally, re point 2, the "shared" metadata (all of which is kind of expensive) APIs all are set up so that you have to explicitly import the package somewhere in your code if you want encoding/decoding of them to work. I'm not sure this is the best way to do it, so I'm good if there are better ways.

@nigeltao
Copy link
Contributor

Copy/pasting from the https://go-review.googlesource.com/c/go/+/208559 discussion, for those here following only the issue:

A couple of (possibly combinable) other ideas to explore are:

  • Have image.DecodeWithOptions also take an arg (perhaps a map) that lets the caller say "I want EXIF, but don't care about XMP".

  • Have format-specific packages (e.g. package jpeg, package png) just treat metadata as an opaque []byte or io.Reader / io.Writer, and it's up to the caller (e.g. package main) to forward them on to metadata-specific packages (e.g. package exif, package iccp).

Discussing my suggestions should go smoother with actual code examples. There's a sketch at https://go-review.googlesource.com/c/go/+/216799 which might need some tweaking in the details, but hopefully gives the broad idea.

@glerchundi
Copy link

Following the "do one thing and do it well" philosophy, I would opt for the format-specific packages and treat metadata as an opaque io.{Reader,Writer} (why even use []byte?)

@drswork
Copy link
Author

drswork commented Jan 30, 2020

And to follow up the follow up (wheee! :)

  • Have image.DecodeWithOptions also take an arg (perhaps a map) that lets the caller say "I want EXIF, but don't care about XMP".

With the API as-is, you can call decode with an option of DEFER for the metadata rather than DECODE. That puts off decoding the expensive metadata until first access, and if you don't access it (because, for example, your code has no idea it exists)

I opted not to have individual DECODE_XXX bits in the API since it seemed to add complexity but not much value, but if folks prefer the bits I can add it.

  • Have format-specific packages (e.g. package jpeg, package png) just treat metadata as an opaque []byte or io.Reader / io.Writer, and it's up to the caller (e.g. package main) to forward them on to metadata-specific packages (e.g. package exif, package iccp).

I considered this, but it has three main problems:

  1. I would hate to use an API like this :)
  2. What's considered "metadata" is kind of broad and hierarchical.
  3. PNG, at least (I haven't squinted too hard at the other formats) have a lot of what'd be considered top-level metadata chunks which doesn't map too well to a []byte or io.Reader, which are kinda inherently single-chunked.

Using PNG as an example, it encodes random bits of text in tEXt, zTXt, and iTXt chunks. There's one chunk per thing encoded -- if you have forty different text annotations you'll have forty text-ish chunks. This includes things like the image title, author, description, copyright, creation time, and other things. (cf. https://www.w3.org/TR/2003/REC-PNG-20031110/#11keywords) There's also stuff like the image gamma, the standard RGB color space, and the chromaticities and white point, each in its own chunk.

Now, I consider all those "metadata", and there are definitely more than one of them per image. Having them all in a single []byte blob would be kind of weird and I'm not sure how to even do that in a way that's not more expensive and internally ugly than just decoding the damn things. (Or requiring the decoding code to know there are a bunch of different named data blobs, which also has scaling issues since you'd need to know the grubby details about all the formats, since PNG, GIF, and JPEG are all different in their encoding-ness)

And, to point 2, there's the issue where there's recursive decoding. For reasons that I'm sure made sense at the time the XMP data is stored (sometimes) in an iTXt chunk. If metadata was a binary blob then code would need to first decode the top-level binary blob, extract out the XMP blob from the appropriate text entry, and decode it.

Anyway, to point 1, as a user of the decoded medatata, I just want the data. I don't want to have to know how it's encoded, sub-encoded, stored, tweaked, or otherwise fragmented. Having learned more than I wanted to know about how it's encoded and stored in the different formats while digging through the specs has also convinced me that keeping user code as far as possible away from anything low-level is best.

In my code I really just want to say "gimme the !@$ metadata" and get back a metadata struct. I'm OK with it being type-specific, that's sadly inevitable, but I don't really want anything more complex than that. I'm even OK getting back an object with accessor methods to allow deferred decoding for performance reasons. And when I want to update the metadata I want to just stuff properly typed data into fields (or call setters, if I must) and have the details of serializing it blissfully hidden from me because if I need to do more than hand a png.Metadata struct to png.EncodeWithOptions I (and everyone else) will get it wrong in weird ways.

I can understand the reluctance to commit, since committing means being stuck with the support, but being really wishy-washy in the API isn't too great either.

@nigeltao
Copy link
Contributor

treat metadata as an opaque io.{Reader,Writer} (why even use []byte?)

I chose []byte because it seemed simpler (my CL is really a quick sketch, I haven't tried implementing it end-to-end), and because the encoding.Binary{Marshler,Unmarshaler} interfaces already exist. I don't know of any equivalent interfaces for io.{Reader,Writer} in the stdlib.

io.ReaderFrom does exist and sort of matches syntactically, but doesn't really match "something that can initialize itself by parsing from an io.Reader" semantically.

In any case, if PNG iTXt metadata is structured so that two iTXTt chunks mean something different than one iTXT chunk (containing the concatenation), then neither []byte nor io.Reader would work. I could imagine something like interface MetadataLoader{ LoadChunk(chunk []byte) error }, but more experimentation would be needed.

@nigeltao
Copy link
Contributor

I opted not to have individual DECODE_XXX bits in the API since it seemed to add complexity but not much value, but if folks prefer the bits I can add it.

Suppose there's a new Foo image format with a Bar metadata field. It could be a little awkward (e.g. managing version skew) if the DECODE_BAR bit needed to be defined in the stdlib but "package foo" was outside of the stdlib.

Now, I consider all those "metadata", and there are definitely more than one of them per image. Having them all in a single []byte blob would be kind of weird and I'm not sure how to even do that in a way that's not more expensive and internally ugly than just decoding the damn things

...

In my code I really just want to say "gimme the !@$ metadata" and get back a metadata struct... And when I want to update the metadata I want to just stuff properly typed data into fields

Well, it's not a single []byte blob. There'd be a map entry for gamma, a map entry for creationTime, etc. Or, if you want everything, we could possibly let you write:

kitchenSink := png.Metadata{}
img, _, err := md.DecodeWithOptions(os.Stdin, &image.DecodeOptions{
	Metadata: map[string]encoding.BinaryUnmarshaler{
		"png/*": &kitchenSink,
	},
})

The user, the package main author, doesn't have to know about []byte at all, only the package png author, or if it's more modular, the package exif author too. See src/image/example/rotate.go in https://go-review.googlesource.com/c/go/+/216799 for a lack of explicit []byte anywhere, and instead the exif.Exif type indeed has properly typed fields (ex.Rotation).

there's the issue where there's recursive decoding

Interesting. More thought needed.

In any case, as I've said, I think this discussion would be more productive if we could see the equivalent of 216799's examples (my CL) in 208559's API (your CL).

@nigeltao
Copy link
Contributor

Metadata: map[string]encoding.BinaryUnmarshaler{
	"png/*": &kitchenSink,
},

To clarify, encoding.BinaryUnmarshaler would no longer be sufficient, but see also my previous comment suggesting some sort of interface MetadataLoader.

@drswork
Copy link
Author

drswork commented Jan 31, 2020

Semi-related, we should talk about contexts, I think.

Since I'm a googler I'm used to contexts being plumbed through basically everything everywhere, and chose to use them to implement timeouts and cancellation. The upside to them is that they do this nicely. The downside is that nearly no APIs in the standard library use them so they're a little unfamiliar to folks and also it means that some of the standard patterns (like the binary marshal/unmarshal interfaces) can't be implemented. It still feels like they're the right way to go, but if they aren't then I'm going to have to go rethink how to handle cancellation and timeouts.

@nigeltao
Copy link
Contributor

nigeltao commented Feb 2, 2020

some of the standard patterns (like the binary marshal/unmarshal interfaces) can't be implemented.

Well, encoding.BinaryMarshaler works with []byte, not io.Writer, so it wouldn't care about a context.Context either way.

As for Contextualizing the stdlib, I haven't really been following the discussion (other than skimming e.g. #20280). I'd ask the golang-dev@ mailing list for advice on whether and how new stdlib API involving I/O should incorporate Contexts. For example, which is less jarring?

  • New stdlib API to ignore Contexts, despite Context being in the stdlib now.
  • New stdlib API to look inconsistent to old stdlib API, despite being in the same package.

@drswork
Copy link
Author

drswork commented Feb 3, 2020

OK, so I pinged Sameer on this and, after being hit with a clue a half dozen times I have actually noticed its existence. :)

What I'm going to do, as recommended, is fork off the image libraries, start a thread on golang-nuts about it, and just get this done. At that point it can live or die depending on whether anyone else needs the functionality or not, or whether something like it gets baked into the standard libraries.

I'll close this thread out and withdraw the proposed CL once I get this done so there's not extra mess hanging around.

@shabbyrobe
Copy link

I'll close this thread out

Does this mean the entire concept is rejected, or just that this specific idea for how to handle it has reached the end? If the former, is it worth keeping some sort of tracking issue open?

The general concept seems very useful to me and something worth having in some form in the stdlib, though I admit I have not kept up with this particular attempt in as much detail as I should have in order to contribute more beyond "yeah please let's do this or something like it at some point".

@drswork
Copy link
Author

drswork commented Feb 3, 2020

It means that I'm going to go implement a version of the image package, along with the three standard image format packages, that read and write metadata with an API based on the discussion in this issue along with the discussion on the CL. (Plus some tweaks to the API; since I don't have to maintain such strict API compliance in a fork I think I'll probably rework the API to handle multi-image files better, or at least in a way that makes me happier)

If at some point there's interest and time on the part of the library maintainers to take another look at things that's cool, and there'll be either a good or bad example to look at depending on how my implementation goes. 😊

@shabbyrobe
Copy link

Great! I'll be watching with keen interest! Can you please publish the URL here once you feel it is at a state where others can offer feedback and possibly contribute?

I would still like to request that this issue not be closed as this will remain a gap in the stdlib that I believe should be filled, even if it isn't quite time yet and there are pending experiments to evaluate.

@rsc rsc moved this from Incoming to Active in Proposals (old) Feb 5, 2020
@drswork
Copy link
Author

drswork commented Feb 9, 2020

OK, so I created a new repository, drswork/image, patched up a few bugs, copied the proposal code over, and started twiddling with it. (I'm torn at the moment whether to reexamine the API for animated GIFs and PNGs, but I'll think about that later)

Closing this out.

@drswork drswork closed this as completed Feb 9, 2020
@shabbyrobe
Copy link

@drswork - I really don't think this should be closed. Even if this specific proposal may have been somewhat withdrawn into an experimental fork, this issue still represents a real need, and perhaps your fork can help reveal the right pathway. Can we please re-open?

@smasher164
Copy link
Member

Reopening to track proposal for generic metadata support, and because there is some good discussion here.

@smasher164 smasher164 reopened this Feb 10, 2020
@rsc
Copy link
Contributor

rsc commented Feb 12, 2020

@nigeltao, do you think there's anything we should do here? If so, what? Thanks.

@nigeltao
Copy link
Contributor

do you think there's anything we should do here?

@rsc, I don't think there's anything to do.

I agree with @shabbyrobe and @smasher164 to keep this issue open for the reasons they mentioned. The linked CLs (https://go-review.googlesource.com/c/go/+/208559 and https://go-review.googlesource.com/c/go/+/216799) are abandoned, but anyone from the future who wants to take a crack at fixing this issue can still read those CL discussions (and #27830 "proposal: image: decoding options").

@rsc
Copy link
Contributor

rsc commented Feb 26, 2020

Based on the discussion above, this seems like a likely decline (work can be done in other repos, no consensus now about the APIs).

FWIW, closing the issue (assuming the proposal does get declined) marks that the proposal is not active anymore, which would be true. It will still show up in search and the links will still work.

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Feb 26, 2020
@rsc
Copy link
Contributor

rsc commented Mar 4, 2020

No change in consensus, so declined.

@rsc rsc closed this as completed Mar 4, 2020
@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Mar 4, 2020
@golang golang locked and limited conversation to collaborators Mar 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests