encoding/xml: add generic representation of XML data #13504
Comments
Just found something similar here: Would be great to have something of that kind for XMLs as well. |
I agree: package xml could use something like package json's ability to unmarshal into interface{}. What that is I'm not sure. I think a reasonable starting point would be something minimal like:
I'm not sure it makes sense to add all kinds of navigation help on top of this. The more specialized it is, the more likely it is to not work for a significant number of users. Edit: fixed s/*Node/*Element/ in the comment on Child. |
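To make the idea concrete, here is a minimal sketch of a tree that reuses the existing token types. The names `Element`, `Child`, `parse`, and `readElement` are assumptions for illustration; none of them exist in encoding/xml, and this is not necessarily the exact shape proposed above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Element is a hypothetical tree node: a start tag plus its children.
// It is NOT part of encoding/xml.
type Element struct {
	xml.StartElement
	Child []Child
}

// Child holds one of *Element, xml.CharData, or xml.Comment (hypothetical).
type Child interface{}

// parse returns the first element found in s as a tree.
func parse(s string) (*Element, error) {
	d := xml.NewDecoder(strings.NewReader(s))
	for {
		tok, err := d.Token()
		if err != nil {
			return nil, err // io.EOF here means no element was found
		}
		if start, ok := tok.(xml.StartElement); ok {
			return readElement(d, start)
		}
	}
}

// readElement consumes tokens up to the end tag that matches start.
func readElement(d *xml.Decoder, start xml.StartElement) (*Element, error) {
	el := &Element{StartElement: start.Copy()}
	for {
		tok, err := d.Token()
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			child, err := readElement(d, t)
			if err != nil {
				return nil, err
			}
			el.Child = append(el.Child, child)
		case xml.EndElement:
			return el, nil
		case xml.CharData:
			// Token bytes are only valid until the next Token call, so copy.
			el.Child = append(el.Child, t.Copy())
		case xml.Comment:
			el.Child = append(el.Child, t.Copy())
		}
	}
}

func main() {
	root, err := parse(`<a x="1">hi<b/><!-- note --></a>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(root.Name.Local, len(root.Child))
}
```

Processing instructions and directives are dropped in this sketch; whether a real API should keep them is a separate question.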
I agree about simplification. Maybe node types can follow what the tokenizer currently returns. You can see my implementation here (that's what I use for my work). It offers the information that the tokenizer can handle plus children and parent, with only a single exported type (Node). That's the simplest interface I could come up with for representing the different types of data. Would be happy to hear your opinion. |
I saw your implementation, and I think it's too complex. There's no need for an interface. What I wrote above is significantly simpler and it does follow what the tokenizer currently returns. |
Thanks for your comment! I trust your solution so I won't argue too much, but can you explain why your approach is simpler? As far as I understand it exposes more types (involves users with token types while my representation encapsulates them), and requires type-checks in code that uses it. How is that making things easier for users? To demonstrate, here is what traversal looks like using my library:
And with your solution I guess it would look something like (correct me if I am wrong):
Can you specify use cases that would be simpler to solve using your approach? My library is always available so I won't argue :) but I am happy to learn from your insights. |
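For comparison, a rough sketch of what traversal with a type switch could look like over a token-based tree. `Element`, `Child`, `names`, `text`, and `sampleTree` are hypothetical names assumed for this example, not an existing or agreed-upon API.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Element and Child are hypothetical; they are not part of encoding/xml.
type Element struct {
	xml.StartElement
	Child []Child
}

// Child holds one of *Element, xml.CharData, or xml.Comment.
type Child interface{}

// names returns the local names of el and its descendants in document order.
func names(el *Element) []string {
	out := []string{el.Name.Local}
	for _, c := range el.Child {
		if child, ok := c.(*Element); ok {
			out = append(out, names(child)...)
		}
	}
	return out
}

// text concatenates all character data below el, ignoring comments.
func text(el *Element) string {
	var b strings.Builder
	for _, c := range el.Child {
		switch c := c.(type) {
		case *Element:
			b.WriteString(text(c))
		case xml.CharData:
			b.Write(c)
		}
	}
	return b.String()
}

// sampleTree builds <doc>Hello, <b>world</b><!-- note --></doc> by hand.
func sampleTree() *Element {
	bold := &Element{
		StartElement: xml.StartElement{Name: xml.Name{Local: "b"}},
		Child:        []Child{xml.CharData("world")},
	}
	return &Element{
		StartElement: xml.StartElement{Name: xml.Name{Local: "doc"}},
		Child:        []Child{xml.CharData("Hello, "), bold, xml.Comment(" note ")},
	}
}

func main() {
	tree := sampleTree()
	fmt.Println(names(tree))
	fmt.Println(text(tree))
}
```

The type switch is the cost of reusing the token types: callers name the kinds they care about and skip the rest.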
What's simpler is how much users have to read to get started and how much
they have to remember to keep using the API. Better to reuse the concepts
already present in the API (specifically in the definition of Token) than
to create a whole separate API with a rather large interface definition to
learn and understand.
|
Understood. Thanks for sharing your thoughts! |
If we truly want a way to losslessly represent XML data, don't we need a way to represent empty-element tags? I.E. we need to have a way to distinguish between and . Or perhaps that level of losslessness isn't a requirement for what you are proposing. |
Oops. Let's try that again. I.E. we need to have a way to distinguish |
I don't think we're talking about lossless. If you needed lossless you'd also have to keep track of how every literal character was written: A vs A and so on. |
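A small check of the empty-element point, using only the existing Decoder: an empty-element tag and an explicit start/end pair produce the same token sequence, so a tree built from tokens cannot tell them apart. The helper name `tokenTypes` and the element name `a` are assumptions chosen for this sketch.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tokenTypes (a hypothetical helper) returns the Go type name of every
// token the decoder produces for s.
func tokenTypes(s string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(s))
	var types []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return types, nil
		}
		if err != nil {
			return nil, err
		}
		types = append(types, fmt.Sprintf("%T", tok))
	}
}

func main() {
	for _, s := range []string{`<a/>`, `<a></a>`} {
		types, err := tokenTypes(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(s, types)
	}
}
```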
Makes sense. I just assumed that when @fluhus said "It would be helpful to have a more 'natural' data-object for XML data, so that all the information is preserved," he really meant all |
Is anyone interested in moving this forward? If so, please write a proposal doc. |
Ping? |
I will begin writing a design doc since I use the XML package heavily for XMPP, which I think could benefit from a more tree-like API.
EDIT: Removed the "if no one else is interested" disclaimer; I will start writing a design doc either way. If someone else already has something or wants to handle it themselves, feel free to ping me.
EDIT 2: Quickly threw together a draft: https://go-review.googlesource.com/c/30364/ I am happy to continue working on it or turn over work to one of the original issue participants. It currently does exactly what was proposed by rsc above. Current usage would look something like this:
// A simple XMPP message; don't worry about the syntax, what's important is that it's XML.
const msg = `<message type="chat" to="notviola@chat.shakespeare.lit" from="feste@shakespeare.lit">
<body>Foolery, sir, does walk about the orb, like the sun; it shines everywhere.</body>
<thread>0297358d-df91-4741-9435-c3783ec456ba</thread>
</message>`

// xml.Element and Decoder.Element come from the draft CL; they are not in the released encoding/xml.
d := xml.NewDecoder(strings.NewReader(msg))
tok, err := d.Token()
// Decode full element:
// el, err := d.Element(tok.(xml.StartElement))
// Decode partial element (we only care about the body):
el := xml.Element{StartElement: tok.(xml.StartElement)}
for ; err == nil; tok, err = d.Token() {
	if start, ok := tok.(xml.StartElement); ok && start.Name.Local == "body" {
		// Decode the whole <body> subtree and attach it as a child.
		child, err := d.Element(start)
		if err != nil {
			break
		}
		el.Child = append(el.Child, child)
		break
	}
}
The only real problem I see with this API (which may not be a problem at all) is that, when we only want to partially decode an element, it doesn't simplify things much over working with the raw token stream. This may be outside the scope of this proposal and not something we care to solve right now (if at all). |
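For reference, a sketch of the same partial decode using only what encoding/xml already provides: scan the raw token stream and hand the `body` start tag to `DecodeElement`. The helper name `bodyText` is an assumption for this example.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

const msg = `<message type="chat" to="notviola@chat.shakespeare.lit" from="feste@shakespeare.lit">
<body>Foolery, sir, does walk about the orb, like the sun; it shines everywhere.</body>
<thread>0297358d-df91-4741-9435-c3783ec456ba</thread>
</message>`

// bodyText (a hypothetical helper) decodes only the first <body> element
// and ignores every other token.
func bodyText(s string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(s))
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return "", errors.New("no <body> element found")
		}
		if err != nil {
			return "", err
		}
		if start, ok := tok.(xml.StartElement); ok && start.Name.Local == "body" {
			var body string
			// DecodeElement reads through the matching end tag and stores
			// the element's character data in body.
			if err := d.DecodeElement(&body, &start); err != nil {
				return "", err
			}
			return body, nil
		}
	}
}

func main() {
	body, err := bodyText(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The loop has the same shape as the draft usage above, which is the point being made: for partial decoding, a tree type does not remove the token loop.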
CL https://golang.org/cl/29861 mentions this issue. |
Not sure how, but I appear to have overwritten the proposal with a completely different one and didn't notice. Correct CL is now submitted. |
CL https://golang.org/cl/30364 mentions this issue. |
See golang/go#13504 Change-Id: Ie9877b10ae3eed8ad5e5763d35e48d94c6f8f584 Reviewed-on: https://go-review.googlesource.com/30364 Reviewed-by: Russ Cox <rsc@golang.org>
Submitted the proposal CL by @SamWhited, now at https://golang.org/design/13504-natural-xml. I think it seems like a reasonable start (unsurprisingly). I'd also like to make sure we can marshal those back; I assume that's straightforward. @SamWhited, if you're still interested and want to sketch an implementation, that seems like a reasonable next step. Now that the proposal is viewable online I'll wait to see if there are more comments here. My guess is that we're targeting Go 1.10 for this but it's fine to have CLs out for review during this cycle. If it's super-easy and uncontroversial we could think about Go 1.9. Thanks. |
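On marshaling back: a minimal sketch, assuming a hypothetical `Element`/`Child` tree like the one discussed in this thread, that writes the tree out through the existing `Encoder.EncodeToken`. The names `encode`, `marshal`, and `sampleTree` are illustrative only.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// Element and Child are hypothetical; they are not part of encoding/xml.
type Element struct {
	xml.StartElement
	Child []Child
}

// Child holds one of *Element, xml.CharData, or xml.Comment.
type Child interface{}

// encode writes el and its children to enc as a sequence of tokens.
func encode(enc *xml.Encoder, el *Element) error {
	if err := enc.EncodeToken(el.StartElement); err != nil {
		return err
	}
	for _, c := range el.Child {
		switch c := c.(type) {
		case *Element:
			if err := encode(enc, c); err != nil {
				return err
			}
		case xml.CharData, xml.Comment:
			if err := enc.EncodeToken(c); err != nil {
				return err
			}
		}
	}
	return enc.EncodeToken(el.End())
}

// marshal returns the XML text for el.
func marshal(el *Element) (string, error) {
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	if err := encode(enc, el); err != nil {
		return "", err
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// sampleTree builds an <a x="1"> element holding the text "hi" and an empty <b>.
func sampleTree() *Element {
	return &Element{
		StartElement: xml.StartElement{
			Name: xml.Name{Local: "a"},
			Attr: []xml.Attr{{Name: xml.Name{Local: "x"}, Value: "1"}},
		},
		Child: []Child{
			xml.CharData("hi"),
			&Element{StartElement: xml.StartElement{Name: xml.Name{Local: "b"}}},
		},
	}
}

func main() {
	out, err := marshal(sampleTree())
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because the encoder works on tokens, an empty child is written as a start/end pair rather than an empty-element tag, consistent with the earlier point that this representation is not lossless.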
Also, apologies for the long delay here. Catching up with some proposals that fell to the bottom of the "I need to read and think about this" stack. |
Oh wow, I'd forgotten about this one again; thanks Russ. I'll see if I can't dig up some old code or knock together a CL in the next few weeks. |
CL https://golang.org/cl/37945 mentions this issue. |
Done ⤴ I've also submitted a related, but ultimately orthogonal, proposal to further improve the experience of using the XML library. XML stream tracking issue: https://golang.org/issue/19480 (#19480) |
I'm sufficiently happy with where the CL is headed, and there's been no objection here, that I think we can accept this proposal. |
@rsc, can you review the open CL? https://go-review.googlesource.com/c/37945/ |
Ping. What are the next steps here? |
I also agree that there should be XML parsing in golang in the standard library, to be honest. |
Any updates on this? |
It would be helpful to have a more 'natural' data-object for XML data, so that all the information is preserved. Something like the DOM nodes in JavaScript.
Here is an implementation of what I would expect an XML parser to return:
https://github.com/fluhus/gostuff/tree/master/xmlnode
What do you think? Can we add such a feature to the standard XML parser?