Questions about XML processing?
No, it is XML metadata. I also believe there should be a better way using
[@...] expressions in the path.
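
For what it is worth, here is a minimal sketch of how a bracketed predicate could do the filtering with xml.etree.ElementTree alone. The sample XML is hypothetical: it is reconstructed from the description in the quoted message below (tagA/tagB/tagC with a string child and a note/title/string child), not taken from the real metadata. Note that in ElementTree's path syntax [@name='value'] tests an attribute, whereas [tag='value'] tests the text of a direct child element, so the latter is the form that seems to fit here. Since the predicate only looks at direct children, the sketch filters on title and then climbs back up to tagC with "..".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real XML was not preserved in the thread.
SAMPLE = """
<root>
  <tagA>
    <tagB>
      <tagC>
        <string>Something</string>
        <note><title><string>value</string></title></note>
      </tagC>
      <tagC>
        <string>Something else</string>
        <note><title><string>value</string></title></note>
      </tagC>
      <tagC>
        <string>Unrelated</string>
        <note><title><string>other</string></title></note>
      </tagC>
    </tagB>
  </tagA>
</root>
"""

root = ET.fromstring(SAMPLE)

# title[string='value'] keeps only <title> elements that have a direct
# <string> child with the text "value". Each ".." then steps up one level
# (title -> note -> tagC) before selecting the direct tagC/string children.
path = "./tagA/tagB/tagC/note/title[string='value']/../../string"
matches = [el.text for el in root.findall(path)]
result = ", ".join(matches)
print(result)
```

With real ISO metadata the tags would carry namespaces, so the path would need namespace-qualified names; that part is left out of this sketch.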
On Sat, 7 Nov 2020 at 13:14, Shaozhong SHI <shishaozhong at gmail.com> wrote:
> Hi, Hernan,
> Did you try to parse GML?
> Surely, there can be very concise and smart ways to do these things.
> On Fri, 6 Nov 2020 at 20:57, Hernán De Angelis <
> variablestarlight at gmail.com> wrote:
>> Thank you Terry, Dan and Dieter for encouraging me to post here. I have
>> already solved the problem albeit with a not so efficient solution.
>> Perhaps, it is useful to present it here anyway in case some light can
>> be added to this.
>> My job is to parse a complicated XML document (ISO metadata) and pick up
>> the values of certain fields under certain conditions. For the most part
>> this goes well. I am working with xml.etree.ElementTree, which has proved
>> sufficient for this and for the rest of the project. JSON is not an
>> option within this project.
>> The specific trouble was in this section, itself the child of a more
>> complicated parent: (for simplicity tags are renamed and namespaces
>> <string>Something else</string>
>> <code blah lots of strange things blah />
>> Basically, I have to get what is in tagC/string but only if the value of
>> tagC/note/title/string is "value". As you see, there are several tagC,
>> all children of tagB, but tagC can have different meanings(!). And no, I
>> have no control over how these XML fields are constructed.
>> In principle it is easy to make a "findall" and get strings for tagC,
>> and then get the content and append in case there is more than one
>> tagC/string like: "Something, Something else".
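
A minimal sketch of the plain "findall" step described above, assuming a hypothetical sample document (the real XML was not preserved in the thread, so the layout is reconstructed from the description). It also shows why this alone is not enough: every tagC/string is collected, whatever the title says.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag names follow the renamed tags used in the message.
SAMPLE = """
<root>
  <tagA>
    <tagB>
      <tagC>
        <string>Something</string>
        <note><title><string>value</string></title></note>
      </tagC>
      <tagC>
        <string>Something else</string>
        <note><title><string>value</string></title></note>
      </tagC>
      <tagC>
        <string>Unrelated</string>
        <note><title><string>other</string></title></note>
      </tagC>
    </tagB>
  </tagA>
</root>
"""

root = ET.fromstring(SAMPLE)

# Collect the text of every direct <string> child of every tagC, unfiltered.
all_strings = [el.text for el in root.findall("./tagA/tagB/tagC/string")]
joined = ", ".join(all_strings)
print(joined)
```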
>> However, the hard thing to do here is to get those only when
>> tagC/note/title/string='value'. I was expecting to find a way of
>> specifying a certain construction in square brackets, like
>> [@string='value'] or [@/tagC/note/title/string='value'], as is usual in
>> XML and possible in xml.etree. However this proved difficult (at least
>> for me). So this is the "brute" solution I implemented:
>> - find all children of tagA/tagB
>> - check if /tagA/tagB/tagC/note/title/string has "value"
>> - if yes find all tagA/tagB/tagC/string
>> In quasi-Python:
>> string = []
>> # all children of tagA/tagB, i.e. each tagC
>> element0 = elem.findall("./tagA/tagB/*")
>> for element1 in element0:
>>     # paths below are relative to the current tagC
>>     element2 = element1.find("./note/title/string")
>>     if element2 is not None and element2.text == 'value':
>>         element3 = element1.findall("./string")
>>         for element4 in element3:
>>             string.append(element4.text)
>> Crude, but it works. As I wrote above, I was hoping that a bracketed
>> clause of the type [@ ...] in the first "findall" would do a more
>> efficient job, but alas my knowledge of XML is too rudimentary.
>> Perhaps something to tinker with in the coming weeks.
>> Have a nice weekend!
>> On 2020-11-06 20:10, Terry Reedy wrote:
>> > On 11/6/2020 11:17 AM, Hernán De Angelis wrote:
>> >> I am confronting some XML parsing challenges and would like to ask
>> >> some questions to more knowledgeable Python users. There is a group
>> >> for such questions, but that list (xml-sig) has apparently not
>> >> received (or archived) posts since May 2018(!). I wonder if there
>> >> is another list or forum for Python XML questions, or if this list
>> >> would be fine for that.
>> > If you don't hear otherwise, try here. Or try stackoverflow.com and
>> > tag questions with python and xml.