How to parse big XML files in Python

by Gari Araolaza, last modified 05/03/2008 13:20
Parsing big XML files in Python (some 60 MB is already big for me) was a bit painful until now. I used to import minidom and sometimes sax. The problem with minidom is that the whole XML file loads into memory. Unless you have a 16GB machine, go get a coffee, as you won't be able to do anything else until the CPU finishes processing the file. If you try to do it with SAX, you have to detect every element start and end yourself. Quite crappy.
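To give an idea of the bookkeeping SAX asks for, here is a minimal sketch using the standard library's xml.sax. The tag names (coords, coord, lat, lon) and the sample data are made up for illustration; they are not from any real file.

```python
import xml.sax


class CoordHandler(xml.sax.ContentHandler):
    """Collect the children of each <coord> element into a dict."""

    def __init__(self):
        super().__init__()
        self.records = []
        self.current = None  # dict being filled, or None outside a record
        self.field = None    # tag of the child element being read
        self.text = []       # text chunks of the current child

    def startElement(self, name, attrs):
        if name == "coord":
            self.current = {}
        elif self.current is not None:
            self.field, self.text = name, []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "coord":
            self.records.append(self.current)
            self.current = None
        elif name == self.field:
            self.current[name] = "".join(self.text)
            self.field = None


# Hypothetical sample data
sample = b"<coords><coord><lat>43.3</lat><lon>-2.0</lon></coord></coords>"
handler = CoordHandler()
xml.sax.parseString(sample, handler)
records = handler.records
```

All that state tracking just to rebuild one flat record is the kind of work I mean.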


Today I learned a better solution from Erral: use the lxml library. Here is an example showing how we can convert an XML file into a list of dicts:
from lxml import etree

# Parse the file and get the root element
coords = etree.parse("/path/to/your/xml/file").getroot()
coords_list = []
for coord in coords:
    # Map each child's tag name to its text
    this = {}
    for child in coord:
        this[child.tag] = child.text
    coords_list.append(this)
Quite straightforward, isn't it? It's already in Kelpi: XML to list of dict parsing
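If even the parsed tree gets too big for memory, one possible variation is to stream the file with iterparse and clear each record once it has been converted. This is only a sketch: the function name iter_records, the record tag "coord" and the sample data are assumptions of mine, not part of the Kelpi snippet. It tries lxml first and falls back to the standard library's ElementTree, which offers the same iterparse call.

```python
import io

try:
    from lxml import etree
except ImportError:  # the standard library provides the same iterparse call
    import xml.etree.ElementTree as etree


def iter_records(source, record_tag):
    """Yield one dict per <record_tag> element as the file is read."""
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the children we have already converted


# Hypothetical sample data; in practice pass a file path instead
sample = io.BytesIO(
    b"<coords>"
    b"<coord><lat>43.3</lat><lon>-2.0</lon></coord>"
    b"<coord><lat>40.4</lat><lon>-3.7</lon></coord>"
    b"</coords>"
)
coords_list = list(iter_records(sample, "coord"))
```

Since iter_records is a generator, you can also loop over it directly instead of building the whole list, which is where the memory saving actually comes from.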