|
msg377698 - (view) |
Author: (WoodyWoo) |
Date: 2020-09-30 14:20 |
#Just run it in Python v3.8.6 of win7 32bit
import xml.etree.ElementTree as ET
xmlstr='''<root>
<a no="1"/>
<b/>
<c/>
</root>'''
etroot = ET.fromstring(xmlstr)
for ELEMchild in etroot:
if ELEMchild.get("no") == "1" :
etroot.remove(ELEMchild) #so far so good
print (ELEMchild.tag)
#It should be : "b /n c" or "a /n b /n c",I can live with it both.
#But it is : "a /n c".
#The index of ELEMchild should not +1 when there was a remove method worked on one of the before ELEMs.
#I was forced to use while and if to control the index +1/+0.
#BTW,ELEM has no method returning index in siblings, which is buging me too.
|
|
msg377699 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-09-30 14:31 |
I assume that ElementTree doesn't support mutation while iterating.
However, the docs at https://docs.python.org/3/library/xml.etree.elementtree.html#modifying-an-xml-file show removing an item while iterating. It probably only works because the child being removed is the last one.
|
|
msg377700 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2020-09-30 14:43 |
You will get the same behavior for lists:
>>> a = [1, 2, 3]
>>> for x in a:
... if x == 1:
... a.remove(x)
... print(x)
...
1
3
Lists are iterated by index. First you get an item at index 0, then at index 1, etc, to the end of the list. Initially the list is [1, 2, 3]. After removing 1 at the first iteration it becomes [2, 3], and at the next iteration you get an item at index 1 which is 3.
|
|
msg377701 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-09-30 14:50 |
I think the only action here is to improve the documentation. That example is especially problematic.
|
|
msg377702 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2020-09-30 15:09 |
> That example is especially problematic.
No, it's not. It uses .findall(), which returns a list. It's like when you make a copy of a list to iterate over, when you want to modify the original list in the loop.
That could be made explicit in the text that introduces the example, but I think it's a very good example.
|
|
msg377703 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-09-30 15:13 |
Ah, good point. I agree the example should make that clear. And I think a note in .remove() about using it while iterating would be a good idea, too.
|
|
msg377704 - (view) |
Author: (WoodyWoo) |
Date: 2020-09-30 15:19 |
I'm green hand in Coding.
But I think the problem caused by "for each in list".
Anything changed in "list" should not act at once,but should act when "for end" or break.
|
|
msg377705 - (view) |
Author: (WoodyWoo) |
Date: 2020-09-30 15:22 |
Only that makes "for each in list" literally.
|
|
msg377706 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-09-30 15:24 |
The example is iterating over the list returned by root.findall(), but removing from a different data structure in root, so it won't have a problem.
|
|
msg377707 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2020-09-30 15:31 |
@WoodyWoo, you can (and should) do something like this:
for ELEMchild in list(etroot):
Modifying a container while iterating over it is usually not a good idea. The same applies to Python lists and dicts, for example.
|
|
msg377724 - (view) |
Author: (WoodyWoo) |
Date: 2020-09-30 23:27 |
I think "for each in list" should not work by index,but should work by pointer like thing,could be a list of pointers.
When the "for each in list" was acting,the pointers list would be fixed,and you need not to worry about the "list" changing.
|
|
msg377732 - (view) |
Author: (WoodyWoo) |
Date: 2020-10-01 01:59 |
@eric.smith @scoder @serhiy.storchaka Thank U all.
I get what to do,and still think the "for in" structure should rebuilding.
All three methods:
import xml.etree.ElementTree as ET
xmlstr=\
r'''<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama1">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
<country name="Panama2">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>'''
print(xmlstr)
#orginal code
root = ET.fromstring(xmlstr)
for country in root.findall('country'):
rank = int(country.find('rank').text)
if rank > 50:
root.remove(country)
print("___orginal___")
for country in root.findall('country'):
print (country.get("name"))
print("^^^orginal^^^^\n")
#wrong code in my mind
root = ET.fromstring(xmlstr)
for country in root:
rank = int(country.find('rank').text)
if rank > 50:
root.remove(country)
print("___bad___")
for country in root.findall('country'):
print (country.get("name"))
print("^^^bad^^^^\n")
#my code
root = ET.fromstring(xmlstr)
index=0
count=len(root.findall("./*"))
while index <count:
rank = int (root[index].find("rank").text)
if rank>50:
root.remove(root[index])
index=index+0
count=count-1 # avoid index err
continue
index=index+1
print("___new___")
for country in root.findall('country'):
print (country.get("name"))
print("^^^new^^^^\n")
|
|
msg377756 - (view) |
Author: (WoodyWoo) |
Date: 2020-10-01 14:56 |
Could I say the mutable sequence containing not the object but the pointer like C++.
So they can changed in def functions.
|
|
msg377764 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2020-10-01 15:34 |
I can't say how ElementTree works without more checking, but this solution cannot work in general. Given a pointer to an object that's in a list, how would you get to the next item? Say the parent list-like object has a C array of pointers to the objects it contains, and removing one of the objects re-shuffles that list. How would keeping a pointer to the current object help you?
In any event, I don't think this behavior is going to change.
|
|
msg377982 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2020-10-04 23:13 |
New changeset 40db798692ca783fc2163656f196ac77e8b9e792 by scoder in branch 'master':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464)
https://github.com/python/cpython/commit/40db798692ca783fc2163656f196ac77e8b9e792
|
|
msg377984 - (view) |
Author: miss-islington (miss-islington) |
Date: 2020-10-04 23:23 |
New changeset 6bd058e0ff5d4a63fb35f6d45161cdf51cb68c58 by Miss Skeleton (bot) in branch '3.8':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464)
https://github.com/python/cpython/commit/6bd058e0ff5d4a63fb35f6d45161cdf51cb68c58
|
|
msg377989 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2020-10-04 23:57 |
New changeset d5719247ba32b3ac700454bad9a9e2cc7edeac6a by Miss Skeleton (bot) in branch '3.9':
bpo-41892: Clarify that an example in the ElementTree docs explicitly avoids modifying an XML tree while iterating over it. (GH-22464) (GH-22554)
https://github.com/python/cpython/commit/d5719247ba32b3ac700454bad9a9e2cc7edeac6a
|
|
msg377992 - (view) |
Author: Stefan Behnel (scoder) *  |
Date: 2020-10-05 04:22 |
Closing since this works as designed.
Users are responsible for avoiding concurrent tree modifications during iteration.
|