Archive for the 'Python' Category

Aug 10 2009

Nice examples of ESRIs geoprocessing python module (9.3)

Published by perrygeo under ESRI, Python

Just thought I’d point out a great presentation about the “new” 9.3 geoprocessing (gp) python module from ESRI.

Ghislain Prince and Elizabeth Flanary do a great job of introduction by examples. The latest gp module is much more pythonic and these examples show how to leverage that to its full advantage. If you try to do this with older gp versions, the code would make most pythonistas cringe. This latest version returns objects and lists, use real booleans, and uses true objects instead of funky string parameters. Basic OO stuff for most python libraries but a big improvement for gp.

Here’s the powerpoint presentation. Thanks to Jamey Rosen for the tip!

No responses yet

Jun 16 2009

IronPython (2.6) and ArcGIS - ready for prime time!!

Published by perrygeo under ESRI, Python

Not sure why this didn’t occur to me before I wrote that last post but I tried the “pythonic” version of the code under the IronPython 2.6 Beta 1 release and it works!

lyr = Carto.LayerFileClass()
lyr.Open('C:\\test.lyr')
print lyr.Filename

Works perfectly now. So IronPython 2.6 promises to be a viable option for extending ArcGIS. My enthusiasm has been renewed.

4 responses so far

Jun 16 2009

IronPython and ArcGIS - not quite ready for prime time

Published by perrygeo under ESRI, Python

Occasionally I find myself in the C#/.NET world in order to write code using ESRI ArcObjects. Today I was toying with the idea of automating the creation of ESRI Layer files (a file which defines the cartographic styling of a dataset). Of course they are in an undocumented binary file format, inaccessible to anything but ESRI software. So I pop open Visual Studio ….

I feel a nagging unease every time I type a set of curly braces. And VB just makes me insane. I prefer, of course, to use python. Luckily there is IronPython which runs on .NET - which means I could theoretically use it to interact with ArcGIS.

I only found a single working example of using ArcObjects through IronPython. But it looked promising enough to close Visual Studio and give it a go.

The first nagging problem is an IronPython-specific one. Relatively minor annoyance but you have to add the reference to a .NET assembly (library) before you can load it.

import clr
clr.AddReference('ESRI.ArcGIS.System')
clr.AddReference('ESRI.ArcGIS.Carto')
from ESRI.ArcGIS import esriSystem
from ESRI.ArcGIS import Carto

Now there is the issue of grabbing an ESRI license. A little verbose IMO but it could easily be encapsulated in a helper function to clean things up.

aoc = esriSystem.AoInitializeClass()
res = esriSystem.IAoInitialize.IsProductCodeAvailable(aoc,
         esriSystem.esriLicenseProductCode.esriLicenseProductCodeArcView)
if res == esriSystem.esriLicenseStatus.esriLicenseAvailable:
    esriSystem.IAoInitialize.Initialize(aoc,
      esriSystem.esriLicenseProductCode.esriLicenseProductCodeArcView)

Now that we’ve satisfied the demands of our proprietary license overlords, we can proceed with the real work .. in this case I just want to open an existing Layer file and see if the resulting object knows it’s own file path. Really simple, right?

lyr = Carto.LayerFileClass()
if "Open" in dir(lyr): print "The Layer object has an Open method but...."
lyr.Open('C:\\test.lyr')
print lyr.Filename

The Layer object has an Open method but....
Traceback (most recent call last):
 File "“, line 1, in 
AttributeError: ‘GenericComObject’ object has no attribute ‘Open’

Hrm. Looks like we’ve run across bug 1506 which doesn’t allow access to the properties and methods of a given instance - instead your have to work through the functions provided by the implementation. Grr…

Carto.ILayerFile.Open(lyr, 'C:\\test.lyr')
print Carto.ILayerFile.Filename.GetValue(lyr)

That is unwieldy, ugly and unpythonic. What’s the point of object oriented programming if you can’t access the methods and properties of an object directly? Since all ArcObjects applications are based on extending COM interfaces, this would be a major pain in any non-trivial application. Basically, until these .NET-accessible COM objects can be treated in a pythonic way, I don’t see any compelling reason to pursue IronPython and ArcGIS integration. Looks like its back to C# for the moment … (/me take a deep sigh and opens Visual Studio) … unless of course anyone has some brilliant solution to share!!

3 responses so far

Apr 19 2008

A quick Cython introduction

Published by perrygeo under Python, Uncategorized

I love python for its beautiful code and practicality. But it’s not going to win a pure speed race with most languages. Most people think of speed and ease-of-use as polar opposites - probably because they remember the pain of writing C. Cython tries to eliminate that duality and lets you have python syntax with C data types and functions - the best of both worlds. Keeping in mind that I’m by no means an expert at this, here are my notes based on my first real experiment with Cython:

EDIT: Based on some feedback I’ve received there seems to be some confusion - Cython is for generating C extensions to Python not standalone programs. The whole point is to speed up an existing python app one function at a time. No rewriting the whole application in C or Lisp. No writing C extensions by hand. Just an easy way to get C speed and C data types into your slow python functions.


So lets say we want to make this function faster. It is the “great circle” calculation, a quick spherical trig problem to calculate distance along the earth’s surface between two points:

p1.py

import math

def great_circle(lon1,lat1,lon2,lat2):
    radius = 3956 #miles
    x = math.pi/180.0

    a = (90.0-lat1)*(x)
    b = (90.0-lat2)*(x)
    theta = (lon2-lon1)*(x)
    c = math.acos((math.cos(a)*math.cos(b)) +
                  (math.sin(a)*math.sin(b)*math.cos(theta)))
    return radius*c

Lets try it out and time it over 1/2 million function calls:

import timeit  

lon1, lat1, lon2, lat2 = -72.345, 34.323, -61.823, 54.826
num = 500000

t = timeit.Timer("p1.great_circle(%f,%f,%f,%f)" % (lon1,lat1,lon2,lat2),
                       "import p1")
print "Pure python function", t.timeit(num), "sec"

About 2.2 seconds. Too slow!

Lets try a quick rewrite in Cython and see if that makes a difference:
c1.pyx

import math

def great_circle(float lon1,float lat1,float lon2,float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi/180.0
    cdef float a,b,theta,c

    a = (90.0-lat1)*(x)
    b = (90.0-lat2)*(x)
    theta = (lon2-lon1)*(x)
    c = math.acos((math.cos(a)*math.cos(b)) + (math.sin(a)*math.sin(b)*math.cos(theta)))
    return radius*c

Notice that we still import math - cython lets you mix and match python and C data types to some extent. The conversion is handled automatically though not without cost. In this example all we’ve done is define a python function, declare its input parameters to be floats, and declare a static C float data type for all the variables. It still uses the python math module to do the calcs.

Now we need to convert this to C code and compile the python extension. The best way to do this is through a setup.py distutils script. But we’ll do it the manual way for now to see what’s happening:

# this will create a c1.c file - the C source code to build a python extension
cython c1.pyx

# Compile the object file
gcc -c -fPIC -I/usr/include/python2.5/ c1.c

# Link it into a shared library
gcc -shared c1.o -o c1.so

Now you should have a c1.so (or .dll) file which can be imported in python. Lets give it a run:

    t = timeit.Timer("c1.great_circle(%f,%f,%f,%f)" % (lon1,lat1,lon2,lat2),
                     "import c1")
    print "Cython function (still using python math)", t.timeit(num), "sec"

About 1.8 seconds. Not the kind of speedup we were hoping for but its a start. The bottleneck must be in the usage of the python math module. Lets use the C standard library trig functions instead:

c2.pyx

cdef extern from "math.h":
    float cosf(float theta)
    float sinf(float theta)
    float acosf(float theta)

def great_circle(float lon1,float lat1,float lon2,float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi/180.0
    cdef float a,b,theta,c

    a = (90.0-lat1)*(x)
    b = (90.0-lat2)*(x)
    theta = (lon2-lon1)*(x)
    c = acosf((cosf(a)*cosf(b)) + (sinf(a)*sinf(b)*cosf(theta)))
    return radius*c

Instead of importing the math module, we use cdef extern which uses the C function declarations from the specified include header (in this case math.h from the C standard library). We’ve replaced the calls to some of the expensive python functions and are ready to build the new shared library and re-test:

    t = timeit.Timer("c2.great_circle(%f,%f,%f,%f)" % (lon1,lat1,lon2,lat2),
                     "import c2")
    print "Cython function (using trig function from math.h)", t.timeit(num), "sec"

Now that’s a bit more like it. 0.4 seconds - a 5x speed increase over the pure python function. What else can we do to speed things up? Well c2.great_circle() is still a python function which means that calling it incurs the overhead of the python API, constructing the argument tuple, etc. If we could write it as a pure C function, we might be able to speed things up a bit.

c3.pyx

cdef extern from "math.h":
    float cosf(float theta)
    float sinf(float theta)
    float acosf(float theta)

cdef float _great_circle(float lon1,float lat1,float lon2,float lat2):
    cdef float radius = 3956.0
    cdef float pi = 3.14159265
    cdef float x = pi/180.0
    cdef float a,b,theta,c

    a = (90.0-lat1)*(x)
    b = (90.0-lat2)*(x)
    theta = (lon2-lon1)*(x)
    c = acosf((cosf(a)*cosf(b)) + (sinf(a)*sinf(b)*cosf(theta)))
    return radius*c

def great_circle(float lon1,float lat1,float lon2,float lat2,int num):
    cdef int i
    cdef float x
    for i from 0 < = i < num:
        x = _great_circle(lon1,lat1,lon2,lat2)
    return x

Notice that we still have a python function wrapper (def) which takes an extra argument, num. The looping is done inside this function with for i from 0 < = i < num: instead of the more pythonic but slower for i in range(num):. The actual work is done in a C function (cdef) which returns float type. This runs in 0.2 seconds - a 10x speed boost over the original python function.

Just to confirm that we’re doing things optimally, lets write a little app in pure C and time it:

#include <math .h>
#include <stdio .h>
#define NUM 500000

float great_circle(float lon1, float lat1, float lon2, float lat2){
    float radius = 3956.0;
    float pi = 3.14159265;
    float x = pi/180.0;
    float a,b,theta,c;

    a = (90.0-lat1)*(x);
    b = (90.0-lat2)*(x);
    theta = (lon2-lon1)*(x);
    c = acos((cos(a)*cos(b)) + (sin(a)*sin(b)*cos(theta)));
    return radius*c;
}

int main() {
    int i;
    float x;
    for (i=0; i < = NUM; i++)
        x = great_circle(-72.345, 34.323, -61.823, 54.826);
    printf("%f", x);
}

Now compile it with gcc -lm -o ctest ctest.c and test it with time ./ctest… about 0.2 seconds as well. This gives me confidence that my Cython extension is at least as efficient as my C code (which probably isn’t saying much as my C skills are weak).


Some cases will be more or less optimal for cython depending on how much looping, number-crunching and python-function-calling are slowing you down. In some cases people have reported 100 to 1000x speed boosts. For other tasks it might not be so helpful. Before going crazy rewriting your python code in Cython, keep this in mind:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” — Donald Knuth

In other words, write your program in python first and see if it works alright. Most of the time it will… some times it will bog down. Use a profiler to find the slow functions and re-implement them in cython and you should see a quick return on investment.

Links:
WorldMill - a python module by Sean Gillies which uses Cython to provide a fast, clean python interface to the libgdal library for handling vector geospatial data.

Writing Fast Pyrex code (Pyrex is the predecessor of Cython with similar goals and syntax)

18 responses so far

Oct 20 2007

Turning Ubuntu into a GIS workstation

It just keeps getting easier and easier to get a fully functional open source GIS workstation up and running thanks to Ubuntu. The following instructions will take your vanilla installation of Ubuntu 7.10 and add the following top-notch desktop GIS applications:

  • Postgresql/PostGIS : a relational database with vector spatial data handling
  • GRASS : A full blown GIS analysis toolset
  • Quantum GIS: A user-friendly graphical GIS application
  • GDAL, Proj, Geos : Libraries and utilities for processing spatial data
  • Mapserver : web mapping program and utilites
  • Python bindings for QGIS, mapserver and GDAL
  • GPSBabel : for converting between various GPS formats
  • R : a high-end statistics package with spatial capabilities
  • GMT : the Generic Mapping Tools for automated high-quality map output

While this is not a comprehensive list of open source GIS software, these packages cover most of my needs. If you want to live on the bleeding edge and have to have the absolute latest versions, you’ll be better off installing these from source. But for those of us that want a stable and highly functional GIS workstation with minimal fuss, this is the way to go:

  1. Go to System > Administration > Software Sources and make sure the universe and multiverse repositories are turned on. Close the window and the list of available software packages will be refreshed.
  2. Open up a terminal (ie the command line) via Applications > Accessories > Terminal and type the following:

    sudo apt-get -y install qgis grass qgis-plugin-grass mapserver-bin gdal-bin cgi-mapserver \
    python-qt4 python-sip4 python-gdal python-mapscript gmt gmt-coastline-data \
    r-recommended gpsbabel shapelib libgdal1-1.4.0-grass

    The sudo part indicates that the command will be run as the administrator user, apt-get -y install is the command telling it to install the list of packages and answer yes to any questions that pop up.

  3. There is one package that is worth upgrading to the latest and greatest - Quantum GIS. The latest version (0.9) is due out very shortly and has the ability to write plugins using the python programming language. A big plus!

    Download the latest build from http://qgis.org/uploadfiles/testbuilds/qgis0.9.0.debs_ubuntu_gutsy.tar.gz and extract it ( right-click > Extract Here ). In the directory you’ll see 4 .deb files, only 3 of which you’ll need unless you plan on doing any development work.

    Double click libqgis1_0.9.0_i386.deb and you’ll get a message saying an older version is available from directly from ubuntu. We already know this so just close and ignore it. Click Install Package and wait for it to complete then close out.

    Repeat for qgis_0.9.0_i386.deb and qgis-plugin-grass_0.9.0_i386.deb (in that order).

And there we have it, about 15 minutes depending on your internet speed and you’ve installed a high-end GIS workstation built completely on free and open source software.

19 responses so far

Oct 19 2007

Update to QGIS Geocoding plugin

Published by perrygeo under Python, QGIS, Uncategorized

With the release of QGIS 0.9 imminent , I decided to install in on Windows XP and noticed that the geocoding plugin was failing… sure enough I had hardcoded linux temporary directories. So I reworked the python code to determine the temp dir in a more cross-platform way (using tempfile.gettempdir() ) and it works fine.

The update can be downloaded here.

Assuming you’ve installed qgis in the standard location, just unzip this into C:\Program Files\Quantum GIS\python\plugins (windows) or /usr/share/qgis/python/plugins (Linux) and you should be good to go. Note that you’ll have to create the “plugins” directory if it doesn’t exist.

2 responses so far

Sep 18 2007

Parallel python and GIS

Published by perrygeo under Python

Let’s face it - processing speeds aren’t going to be increasing according to Moore’s Law anymore; Instead of faster CPUs, we’ll be getting more of them. The future of programming, it seems to me, lies in the ability to leverage multiple processors. In other words, we have to write parallel code. Until I read Seans’ post, I was unware that there was a viable python solution. I had been growing quite dissillusioned by python’s dreaded Global Interpreter Lock which confines python to a single processing core. I’ve even started learning Erlang to leverage SMP processing (until I realized that Erlang and it’s standard libraries are virtually useless for anything that needs to handle geospatial data).

So I gave Parallel Python (pp) a shot. Since Sean also offered up a bounty for the first GIS application that used pp, I thought it might be a good time to try ;-)

A good candidate for parallel processing is any application that has to crunch away on lists/arrays of data and whose individual members be handled independently (see pmap in Erlang). I have been working on an application to smooth linework using bezier curves. It’s not quite polished yet but the image below shows the before and after

… but bezier curves aren’t quite the subject of this post. Let’s just say the algorithm takes some time to compute (if you’re using a high density of verticies) and can be handled one LineString feature at a time. This makes it a prime candidate for parallelization.

Given a list of input LineStrings, I could process them the sequential way:

smooth_lines = [] for line in lines: smooth_lines.append( calcBezierFromLine( line, num_bezpts, beztype, t) )

Or use pp to start up a “job server” which doles the tasks out to as many “workers”. A busy worker utilizes a single processing core so a good rule of thumb would be to start up as many workers as you have CPU cores:

numworkers = 2 # dual-core machine job_server = pp.Server(numworkers, ppservers=ppservers) smooth_lines = [] jobs = [(line, job_server.submit(calcBezierFromLine, (line, num_bezpts, beztype, t), \ (computeBezier, getPointOnCubicBezier), (”numpy”,) )) for line in lines] for input, job in jobs: smooth_lines.append( job() )

Theoretically the parellized version should run twice as fast as the sequential version on my core2 duo machine. And reality was pretty darn close to that:

$ time python bezier_smooth_pp.py 2 Shapefile contains 1114 lines Starting pp with 2 workers Completed 1114 new lines with 8 additional verticies for each line segment along a cubic bezier curve real 0m10.908s … $ time python bezier_smooth_pp.py 1 Shapefile contains 1114 lines Starting pp with 1 workers Completed 1114 new lines with 8 additional verticies for each line segment along a cubic bezier curve real 0m20.007s …

Just think of the possibilities. In the forseeable future, the average computer might have 8+ cores to work with. This could mean that your app will move 8x faster if you parallize the code (assuming there are no IO or bandwidth bottlenecks). I’d love to test it out on a system with more than 2 processing cores but, unfortunately, I don’t have access to any beowulf clusters, Sun UltraSparc servers, or 8-core Xeon Mac Pros. This is what I really need to complete my research ;-) So if anyone want to donate to the cause, send me an email!

And to answer Sean’s bounty, I don’t consider this an actual application (yet) but I hope it can spur some interest and move things in that direction. But if you feel the need to send me some New Belgium swag (or one of the machines listed above), feel free ;-)

3 responses so far

Jun 10 2007

OGR and matplotlib examples

Published by perrygeo under Python, postgis

Jose Gomez-Dans posted a great example of using OGR, Postgis and Matplotlib with Python - OGR, Python y Matplotlib (Spanish only).

One response so far

May 28 2007

QGIS Geocoding plugin

Published by perrygeo under Python, QGIS, Software

A few weeks back, I decided to take the plunge and learn the python bindings for QGIS 0.9. My first experiment was to implement a geocoder plugin. What started mostly as a learning experiment turned into something that might actually be useful!

The idea was to use web services to do all the actual geocoding work (the hard part!) and the delimited text provider to load the results into qgis. Right now it’s built on top of the Yahoo geocoder which is, IMO, the best out there.. very flexible about the input format. The geopy module is used to interact with the geocoding services so it could potentially support other engines such as geocoder.us, virtual earth, google, etc.

The user interface is very straightforward; enter list of addresses/placenames seperated by a line break, pick an output file and go. To be legitimate, you should also sign up for a yahoo api key, though the ‘YahooDemo’ key will work ok for testing purposes.

Here’s the install process (assuming you already have python, pyqt4, qgis 0.9, qgis bindings, etc. set up):

 svn checkout http://perrygeo.googlecode.com/svn/trunk/qgis/geocode
 cd geocode
 emacs Makefile # change install directory if needed
 sudo make install

This is just a rough cut and it’s my first attempt at using the qgis and qt apis so there are probably many things that could be improved upon. Ideally this plugin could:

  • Parse text files as input
  • Allow for a choice of geocoding engine
  • ???

Feedback (and patches) welcome ;-)

7 responses so far

May 27 2007

Python gpsd bindings

Published by perrygeo under Python

If you want to get a linux/unix machine talking to your GPS unit, most likely you’ll be using gpsd. There are many great apps that build off of gpsd such as kismet and gpsdrive.

Installing gpsd on debian/ubuntu systems is as simple as

sudo apt-get install gpsd gpsd-clients

You should be able to connect your gps via serial port and start a gpsd server

sudo gpsd /dev/ttyS0

The gpsd server reads NMEA sentences from the gps unit and is accessed on port 2947. You can test if everything is working by running a pre-built gpsd client such as xgps.

This is very useful for situations where you need lower-level access to the gps data; for logging your position to a postgres database for example. The debian packages (and most others I’m assuming) come with gps.py, a python interface to gpsd allowing you to pull your lat/long from the gps in real time. This opens the door for all sorts of neat real-time gps apps.

import gps, os, time

session = gps.gps()

while 1:
    os.system('clear')
    session.query('admosy')
    # a = altitude, d = date/time, m=mode,
    # o=postion/fix, s=status, y=satellites

    print
    print ' GPS reading'
    print '----------------------------------------'
    print 'latitude    ' , session.fix.latitude
    print 'longitude   ' , session.fix.longitude
    print 'time utc    ' , session.utc, session.fix.time
    print 'altitude    ' , session.fix.altitude
    print 'eph         ' , session.fix.eph
    print 'epv         ' , session.fix.epv
    print 'ept         ' , session.fix.ept
    print 'speed       ' , session.fix.speed
    print 'climb       ' , session.fix.climb

    print
    print ' Satellites (total of', len(session.satellites) , ' in view)'
    for i in session.satellites:
        print '\t', i

    time.sleep(3)

… which gives you a simple readout to the terminal every 3 seconds.

Obviously there are much more interesting applications for this ( logging data to postgis, displaying real-time tracking data in QGIS via a python plugin, etc). But this is a good start for any python based app.

13 responses so far

Next »