imposm.parser - OpenStreetMap XML/PBF parser for Python

imposm.parser is a Python library that parses OpenStreetMap data in XML and PBF format.

It has a simple API and is fast and easy to use. It can also spread the parsing across multiple CPUs/cores for extra speed.

It is developed and supported by Omniscale and released under the Apache Software License 2.0.

Example

Here is an example that parses an OSM file and counts all ways that are tagged as a highway.

from imposm.parser import OSMParser

# simple class that handles the parsed OSM data.
class HighwayCounter(object):
    highways = 0

    def ways(self, ways):
        # callback method for ways
        for osmid, tags, refs in ways:
            if 'highway' in tags:
                self.highways += 1

# instantiate counter and parser and start parsing
counter = HighwayCounter()
p = OSMParser(concurrency=4, ways_callback=counter.ways)
p.parse('germany.osm.pbf')

# done
print(counter.highways)
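
The same pattern extends to other statistics. The following sketch counts ways per highway value instead of a single total. The class name HighwayTypeCounter is hypothetical, and the hand-written batches at the end only simulate what the parser would pass to the callback, so the snippet runs without an OSM file.

```python
class HighwayTypeCounter(object):
    """Counts ways per value of the 'highway' tag."""

    def __init__(self):
        self.by_type = {}

    def ways(self, ways):
        # callback method for ways
        for osmid, tags, refs in ways:
            highway = tags.get('highway')
            if highway is not None:
                self.by_type[highway] = self.by_type.get(highway, 0) + 1

counter = HighwayTypeCounter()

# Simulated batches (made-up sample data, not parsed from a file).
counter.ways([
    (1, {'highway': 'path'}, [10, 11]),
    (2, {'building': 'yes'}, [12, 13, 14, 12]),
    (3, {'highway': 'path', 'name': 'my way'}, [11, 15]),
])
counter.ways([(4, {'highway': 'residential'}, [15, 16])])
```

With the real parser you would pass counter.ways as ways_callback, exactly as in the example above.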

Source and issue tracker

Source code and issue tracker are available at https://bitbucket.org/olt/imposm.parser/src.

Requirements

imposm.parser runs with Python 2.5, 2.6 and 2.7 and is tested on Linux and Mac OS X.

The PBF parser is written as a C extension, so you need a C/C++ compiler, the Python development files and Google Protobuf to build it.

On Ubuntu:

sudo aptitude install build-essential python-dev protobuf-compiler libprotobuf-dev

Installation

You can install imposm.parser with pip or easy_install.

pip install imposm.parser
easy_install imposm.parser

Concepts

To use imposm.parser you need to understand three basic concepts: types, callbacks and tag filters.

Types

Note

In this document, Node, Way and Relation with a capital letter refer to the OSM types, and node, way and relation in lowercase refer to the Imposm types.

OSM has three fundamental element types: Nodes, Ways and Relations. imposm.parser splits OSM Nodes into two types: coords and nodes.

coords only store coordinates, and there is a coord for every OSM Node. nodes also store tags, and there is a node only for each OSM Node that has tags.

coords

A tuple with the OSM ID, the longitude and the latitude of the OSM Node.

(4234432, 175.2, -32.1)

imposm.parser will return a coord for each OSM Node, even if this OSM Node is also a node (i.e. it has tags).

nodes

A tuple with the OSM ID, a tags dictionary and a nested tuple with the longitude and latitude of that node.

(982347, {'name': 'Somewhere', 'place': 'village'}, (-120.2, 23.21))

ways

A tuple with the OSM ID, a tags dictionary and a list of references.

(87644, {'name': 'my way', 'highway': 'path'}, [123, 345, 567])

relations

A tuple with the OSM ID, a tags dictionary and a list of member tuples. Each member tuple contains the reference, the type (one of 'node', 'way', 'relation') and the role.

(87644, {'type': 'multipolygon'}, [(123, 'way', 'outer'), (234, 'way', 'inner')])
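
The tuple layouts above can be unpacked directly. The following sketch uses the sample values from this section instead of parsed data. The variable names are only for illustration.

```python
# Sample elements copied from the examples above (not parsed from a file).
coord = (4234432, 175.2, -32.1)
node = (982347, {'name': 'Somewhere', 'place': 'village'}, (-120.2, 23.21))
way = (87644, {'name': 'my way', 'highway': 'path'}, [123, 345, 567])
relation = (87644, {'type': 'multipolygon'},
            [(123, 'way', 'outer'), (234, 'way', 'inner')])

# coords are flat: OSM ID, longitude, latitude.
coord_id, lon, lat = coord

# nodes nest the position in a (longitude, latitude) tuple.
node_id, node_tags, (node_lon, node_lat) = node

# ways carry a list of references.
way_id, way_tags, refs = way

# relations carry (reference, type, role) member tuples.
rel_id, rel_tags, members = relation
outer_refs = [ref for ref, member_type, role in members if role == 'outer']
```

Note that the position is flat in a coord but nested in a node, so the two need different unpacking patterns.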

Callbacks

The parser takes up to four callback functions, one for each data type (coords, nodes, ways and relations). The callbacks are optional, i.e. you don't need to pass a relations callback if you are not interested in relations.

The functions should expect a list with zero or more items of the corresponding type.

Here is an example callback that prints the coordinates of all Nodes.

def coords_callback(coords):
    # called with a list of (osm_id, lon, lat) tuples
    for osm_id, lon, lat in coords:
        print('%s %.4f %.4f' % (osm_id, lon, lat))
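
Because a callback receives a list and can be called many times, a bound method is a convenient way to keep state between calls. Here is a minimal sketch that tracks the bounding box of all coords. The class name BoundingBox is hypothetical, and the calls at the end simulate the parser with hand-written batches, including an empty one.

```python
class BoundingBox(object):
    """Tracks the extent of all coords seen so far."""

    def __init__(self):
        self.min_lon = self.min_lat = float('inf')
        self.max_lon = self.max_lat = float('-inf')
        self.count = 0

    def coords(self, coords):
        # callback for coords: a list of (osm_id, lon, lat) tuples
        for osm_id, lon, lat in coords:
            self.min_lon = min(self.min_lon, lon)
            self.min_lat = min(self.min_lat, lat)
            self.max_lon = max(self.max_lon, lon)
            self.max_lat = max(self.max_lat, lat)
            self.count += 1

bbox = BoundingBox()

# Simulated batches (made-up sample data, not parsed from a file).
bbox.coords([(1, 8.0, 53.0), (2, 8.5, 53.2)])
bbox.coords([])  # a callback must cope with an empty list
bbox.coords([(3, 7.9, 53.4)])
```

With the real parser you would pass bbox.coords as coords_callback.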

Tag filters

Tag filters are functions that manipulate tag dictionaries. The functions should modify the dictionary in-place; the return value is ignored.

Elements are handled differently if a filter removes all tags from the dictionary. nodes and relations with empty tags will not be returned, but ways will be, since they might be needed for building relations.

Here is an example filter that keeps only the tags on a whitelist.

whitelist = set(('name', 'place', 'amenity'))

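def tag_filter(tags):
    # iterate over a copy of the keys, since we delete while looping
    for key in list(tags.keys()):
        if key not in whitelist:
            del tags[key]
    if 'name' in tags and len(tags) == 1:
        # an element with only a name tag carries no information
        # about how to handle it, so remove the name as well
        del tags['name']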
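
A filter can also work the other way round and remove only unwanted keys. The following sketch assumes that created_by and source are tags you do not need. The names drop_keys and drop_filter are hypothetical. It also shows that the dictionary is changed in-place and that nothing is returned.

```python
drop_keys = set(('created_by', 'source'))

def drop_filter(tags):
    # iterate over a copy of the keys, since we delete while looping
    for key in list(tags.keys()):
        if key in drop_keys:
            del tags[key]

# Made-up sample dictionaries, not parsed from a file.
tags = {'highway': 'path', 'created_by': 'JOSM'}
result = drop_filter(tags)  # modifies tags in-place, returns None

only_meta = {'created_by': 'JOSM'}
drop_filter(only_meta)      # leaves an empty dictionary
```

Following the rules above, a node or relation whose tags end up empty like only_meta would not be returned, while a way would still be.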

Parsing API

imposm.parser comes with a single OSMParser class that implements a simple-to-use, callback-based parser for OSM files.

It supports XML and PBF files. It also supports BZip2 compressed XML files.

Concurrency

The parser uses multiprocessing to distribute the parsing across multiple CPUs. This works with PBF as well as XML files.

You can pass the concurrency as an argument to OSMParser; it defaults to the number of CPUs and cores of the host system. concurrency defines the number of parser processes. The main process, where the callbacks are handled, and the decompression (if you have a .bz2 file) run in addition to these parser processes. So you might get better results if you reduce this number on systems with more than two cores.

You can double the number on systems with hyper-threading CPUs.
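
The advice above can be turned into a small helper. This is only an illustrative heuristic and not something imposm.parser does itself: the function name suggest_concurrency and the choice to reserve exactly one core per extra process are assumptions, and the best value for your system is something to measure.

```python
import multiprocessing

def suggest_concurrency(filename, cpus=None):
    """Suggest a value for the concurrency argument (hypothetical helper)."""
    if cpus is None:
        cpus = multiprocessing.cpu_count()
    if cpus <= 2:
        # the text above only suggests reducing on larger systems
        return cpus
    reserved = 1  # assumption: one core for the main process (callbacks)
    if filename.endswith('.bz2'):
        reserved += 1  # assumption: one core for the decompression
    return max(1, cpus - reserved)
```

The result would then be passed on, e.g. OSMParser(concurrency=suggest_concurrency('germany.osm.pbf'), ...).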

API

class imposm.parser.OSMParser(concurrency=None, nodes_callback=None, ways_callback=None, relations_callback=None, coords_callback=None, nodes_tag_filter=None, ways_tag_filter=None, relations_tag_filter=None, marshal_elem_data=False)

High-level OSM parser.

Parameters:
  • concurrency – number of parser processes to start. Defaults to the number of CPUs.
  • xxx_callback – callback functions for coords, nodes, ways and relations. Each callback function gets called with a list of elements. See callback concepts.
  • xxx_tag_filter – functions that can manipulate the tag dictionary. Nodes and relations without tags will not be passed to the callback. See tag filter concepts.

parse(filename)

Parse the given file. Detects the filetype based on the file suffix. Supports .pbf, .osm and .osm.bz2.

parse_pbf_file(filename)

Parse a PBF file.

parse_xml_file(filename)

Parse an XML file. Supports BZip2 compressed files if the filename ends with .bz2.
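
The suffix rules described for parse() can be sketched as a small function. This is not the library's implementation: the function name pick_parse_method is hypothetical, and raising ValueError for other suffixes is an assumption made for this sketch.

```python
def pick_parse_method(filename):
    """Return the name of the OSMParser method that fits the file suffix."""
    if filename.endswith('.pbf'):
        return 'parse_pbf_file'
    if filename.endswith(('.osm', '.osm.bz2')):
        return 'parse_xml_file'
    # assumption: how unsupported suffixes are reported
    raise ValueError('unsupported file suffix: %r' % filename)
```

If you already know the format of a file, you can skip the detection and call parse_pbf_file or parse_xml_file directly.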