

How to parse big XML files in Python

by Gari Araolaza, last modified 05/03/2008 13:20
Parsing big XML files in Python (some 60 MB is already big for me) was a bit painful until now. I used to import minidom and sometimes SAX. The problem with minidom is that the whole XML file is loaded into memory. Unless you have a 16 GB machine, go get a coffee, as you won't be able to do anything else until the CPU finishes processing the file. If you try to do it with SAX, you have to write code that detects every element start and end yourself. Quite crappy.
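To see what tracking every element start and end looks like in practice, here is a minimal sketch using the standard library's xml.sax that turns a file of records into a list of dicts. It assumes a hypothetical layout where each direct child of the root is one record and each of the record's children is a simple text field; the class name RecordHandler is illustrative.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Collect each direct child of the root as a dict of field tag -> text."""

    def __init__(self):
        super().__init__()
        self.records = []    # finished dicts, one per record
        self.current = None  # dict for the record being read
        self.text = []       # text chunks of the field being read
        self.depth = 0       # 1 = root, 2 = record, 3 = field

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:
            self.current = {}
        elif self.depth == 3:
            self.text = []

    def characters(self, content):
        # SAX may deliver a single text node in several chunks
        if self.depth == 3:
            self.text.append(content)

    def endElement(self, name):
        if self.depth == 3:
            self.current[name] = "".join(self.text)
        elif self.depth == 2:
            self.records.append(self.current)
        self.depth -= 1


handler = RecordHandler()
xml.sax.parseString(b"<coords><coord><lat>1</lat><lon>2</lon></coord></coords>", handler)
print(handler.records)  # [{'lat': '1', 'lon': '2'}]
```

Even for this flat layout, the handler has to track nesting depth and stitch text chunks together by hand. For a real file, call xml.sax.parse("/path/to/your/xml/file", handler) instead of parseString.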


Today I learned a better solution from Erral: use the lxml library. Here is an example showing how to convert an XML file into a list of dicts:
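from lxml import etree
# Parse the file and get its root element
coords = etree.parse("/path/to/your/xml/file").getroot()
coords_list = []
for coord in coords:
    # One dict per record: child tag -> child text
    this = {}
    for child in coord:
        this[child.tag] = child.text
    coords_list.append(this)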
Quite straightforward, isn't it? It's already in Kelpi: XML to list of dict parsing
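Note that etree.parse() still builds the whole tree in memory; lxml's tree is just usually much lighter than minidom's. If a file is too big even for that, both lxml and the standard library's xml.etree.ElementTree provide iterparse(), which streams the document. A minimal sketch, assuming (as in the example above) that each direct child of the root is one record made of simple child elements; the function name iter_records is hypothetical.

```python
import io

try:
    from lxml import etree  # lxml, as used in this post
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; iterparse works the same here


def iter_records(source):
    """Yield one dict (child tag -> text) per direct child of the root.

    source: a file name or a binary file object.
    """
    depth = 0
    for event, elem in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # we just closed a record (a direct child of the root)
            yield {child.tag: child.text for child in elem}
            # Drop the record's children so they can be freed; the emptied
            # record element itself stays attached to the root.
            elem.clear()


sample = b"<coords><coord><lat>1</lat><lon>2</lon></coord></coords>"
for record in iter_records(io.BytesIO(sample)):
    print(record)  # {'lat': '1', 'lon': '2'}
```

For a real file, pass the path instead: for record in iter_records("/path/to/your/xml/file"). Because records are yielded one at a time, you can process them without ever building the full list.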


Paster is your friend

by Mikel Larreategi, last modified 26/02/2008 12:58

If you are starting a new Plone project, paster is definitely your friend: a necessary and helpful one. paster can be seen as a code generation tool, and perhaps as something to avoid, but it helps, and helps a lot, with all the boilerplate code you need to write each time you start a Plone project: the buildout.cfg file (now that zc.buildout seems to be the de facto standard for managing both development and deployment of Plone projects), skin, CSS and JavaScript registration in a so-called theme product, or new profile and content-type registration in a content-type or Archetypes product.

One of the things I like most about paster is the '--svn-repository' option. Before using paster, I often found myself importing incomplete projects into our svn repository, or deleting products and checking them out again later. Now, each time I create a new product or egg, I only have to add '--svn-repository=http://url-to-my-svn' and I'm done: paster creates the trunk, branches and tags structure, checks out the trunk, adds the files, and everything is set to start working.

paster would be nothing for Plone without ZopeSkel. ZopeSkel is a collection of paster templates you can use to create your products. For example, there are templates to create a theme product, a buildout file or an Archetypes-based product. To use one, you just invoke paster with the name of the template:

erral@lindari:/tmp$ paster create -t plone3_buildout myproject --svn-repository=http://myurl

Answer just a couple of questions and you'll have a ready-to-go buildout configuration file in your repository.

The archetype template, together with support for local ZopeSkel commands (as explained by Mustapha), helps you create a new Archetypes-based content type. And it does more than create the base boilerplate code: thanks to the local-command support, you can add new content types, new portlets or even new browser views at any time after creating the project. You can add a browser view today and a new content type tomorrow. You only have to worry about invoking the correct `paster` command; everything else (adding configure.zcml lines, new GenericSetup profile configuration files, etc.) is done by paster:

erral@lindari:/tmp$ paster create -t archetype my.content
erral@lindari:/tmp$ cd my.content/
erral@lindari:/tmp/my.content$ ls
erral@lindari:/tmp/my.content$ paster addcontent --list
Available templates:
  atschema:     A handy AT schema builder
  contenttype:  A content type skeleton
  portlet:      A Plone 3 portlet
  view:         A browser view skeleton
  zcmlmeta:     A ZCML meta directive skeleton
erral@lindari:/tmp/my.content$ ls my/content/
browser    configure.zcml    portlets  tests  content  profiles
erral@lindari:/tmp/my.content$ paster addcontent contenttype
erral@lindari:/tmp/my.content$ ls my/content/content

Awesome!!!

Many people don't like code generation tools. I don't know whether I should call paster a code generation tool or a helpful-boilerplate-writing-avoider tool. It's something you will want to keep using after trying it for the first time.