Archive for the 'Python' Category

May 19 2007

Sparklines in python

Published by perrygeo under Python

Edward Tufte, the outspoken guru of data visualization, has long been an advocate of clear and concise (almost minimalist) graphical representations of data. He’s got a lot of great ideas relevant to cartography (my cartography course at Humboldt State used his book “The Visual Display of Quantitative Information” as our text).

One of the coolest ideas are “sparklines” which he describes as “data-intense, design-simple, word-sized graphics”. Instead of standalone charts that are often placed on their own and separate from the text that discusses them, sparklines are meant to be placed in-line with the text and provide memorable, simple and contextually-relevant data to support the surrounding text. For example:

The US National Debt as a percentage of GDP increased during the Reagan and Bush presidencies but dropped off slightly during the Clinton administration .

Now of course I had to figure out how to produce these in python. Theres a great cgi application, written in python by Joe Gregorio, that does sparklines. I needed something that was abstracted away from the CGI framework, more of a proper python module. Replacing all the CGI-specific code was straightforward and I came up with a standalone sparkline python module ( View / Download the Source Code. ) The only dependencies are python and the python imaging library.

In the minimalist spirt of sparklines, the interface was kept simple. First you create a list of data values then simply pass the list to one of the sparkline generators:

import spark
a = [32.5,35.2,39.9,40.8,43.9,48.2,50.5,51.9,53.1,55.9,60.7,64.4]
spark.sparkline_smooth(a).show()

Or if you prefer a more discrete, bar-graph-style instead of a smooth line:

spark.sparkline_discrete(a).show()

There’s plenty of room for configuration. For example, in the national debt example above I wanted to keep the y axis at the same scale (instead of the default min-max scaling) and make each step 6 pixels wide:

spark.sparkline_smooth(a, dmin=30,dmax=70, step=6).show()

How does this relate to cartography? GIS typically takes a snapshot representation of earth, frozen in time. Since sparklines seem particularly good at representing change-over-time, it could be an interesting way to add a time dimension to a 2-D map. For example, instead of just displaying country polygons with labels, you could place a sparkline right under the label showing the population changes over the last century. It seems like it would be an ideal way to embed alot of useful information into a small map.

Anyone know of any good examples?

6 responses so far

May 18 2006

More on Mapnik WMS

Published by perrygeo under Python, WMS

One of my initial complaints about the Mapnik WMS server was that it would not accept any parameters that were not in the OGC WMS spec. Some WMS clients will tag on extra parameters for various reasons and the OGC supports this in relation to vendor-specific parameters. The fix was pretty simple;in mapnik/ogcserver/common.py you can simply comment out

#for paramname in params.keys():
# if paramname not in self.SERVICE_PARAMS[requestname].keys():
# raise OGCException(’Unknown request parameter “%s”.’ % paramname)

to get the desired effect.


There was also the question of speed and how it compared to other WMS servers such as Mapserver. Since I already had both a Mapnik and Mapserver WMS set up using the exact same data source, styled in the same fashion, it was pretty simple to write a quick python script that would smack each WMS server with a given number of back-to-back WMS GetMap requests:

#!/usr/bin/env python
import urllib

server = sys.argv[1]
hits = int(sys.argv[2])

if server == 'mapnik':
    url = "http://localhost/fcgi-bin/wms?VERSION=1.1.1&REQUEST=GetMap&SERVICE=WMS&LAYERS=world_borders&SRS=EPSG:4326&BBOX=-4.313249999999993,20.803500000000003,59.58675000000002,52.75350000000002&WIDTH=800&HEIGHT=400&FORMAT=image/png&STYLES=&TRANSPARENT=TRUE&UNIQUEID="
elif server == 'mapserver':
    url = "http://localhost/cgi-bin/mapserv?map=/home/perrygeo/mapfiles/world.map&VERSION=1.1.1&REQUEST=GetMap&SERVICE=WMS&LAYERS=worldborders&SRS=EPSG:4326&BBOX=-4.313249999999993,20.803500000000003,59.58675000000002,52.75350000000002&WIDTH=800&HEIGHT=400&FORMAT=image/png&STYLES=&TRANSPARENT=TRUE&UNIQUEID="

for i in range(0,hits):
    urllib.urlretrieve(url)

Then just run the script from the command line, specifying the server and number of hits, and wrap it in the time command. Here are the results:

Pretty close. Mapserver was just slightly faster in every case. Now this is just a preliminary test and it would be interested to see a comparison:

  • With larger datasets and more complex styling including classification and text labelling
  • With data from other sources such as postgis where the connection overhead might be significant
  • With Mapserver running as a fastcgi
  • With concurrent requests as opposed to back-to-back requests

Overall though, my opinion of Mapnik WMS remains high and I’d love to put it in production use in the near future. Stay tuned…

3 responses so far

May 17 2006

Mapnik WMS Server

Published by perrygeo under GIS Tutorials, Python, WMS

A few months ago, Mapnik came onto my radar and I was immediately impressed with the beautiful cartography. But, until recently, it was just a C++ libary with some python bindings that could be used to programmatically build nice map images from shapfiles, geotiffs or postgis layers. There were no common interfaces such as WMS to access mapnik… until last month. Jean Francois Doyon recently added a prototype WMS interface to Mapnik. It runs as a fastcgi script under apache. It is still a bit rough around the edges but the result is well worth a little extra setup effort.

I set up Mapnik as a WMS server recently and would like to share my process and results. This tutorial assumes you already have python, postgresql/postgis, proj4, python imaging library and apache2 already running. The examples are for Ubuntu Dapper Drake.. they may work well on other versions of Ubuntu and Debian but for other *nixes (and certainly windows) many things may need to be tweaked.

First off, we have to install the base mapnik libs. These depend on the boost python bindings and the whole compile process is very simple (if a bit slow) in Ubuntu:

sudo apt-get install \
 libboost-python1.33.1 libboost-python-dev \
 libboost-regex1.33.1 libboost-regex-dev \
 libboost-serialization-dev \
 libboost-signals1.33.1 libboost-signals-dev \
 libboost-thread1.33.1 libboost-thread-dev \
 libboost-program-options1.33.1 libboost-program-options-dev \
 libboost-filesystem1.33.1 libboost-filesystem-dev \
 libboost-iostreams1.33.1 libboost-iostreams-dev
cd ~/src
svn checkout svn://svn.berlios.de/mapnik/trunk mapnik
cd mapnik
python scons/scons.py PYTHON=/usr/bin/python PGSQL_INCLUDES=/usr/local/include/postgresql \
  PGSQL_LIBS=/usr/local/lib/postgresql BOOST_INCLUDES=/usr/include/boost BOOST_LIBS=/usr/lib
sudo python scons/scons.py install PYTHON=/usr/bin/python PGSQL_INCLUDES=/usr/local/include/postgresql \
  PGSQL_LIBS=/usr/local/lib/postgresql BOOST_INCLUDES=/usr/include/boost BOOST_LIBS=/usr/lib
sudo ldconfig

Now we have to set up some additional libs in order to run the WMS:

cd ~/src
wget http://easynews.dl.sourceforge.net/sourceforge/jonpy/jonpy-0.06.tar.gz
tar -xzvf jonpy-0.06.tar.gz
cd jonpy-0.06/
sudo python setup.py install
# copy the ogcserver stuff into its own dir
mkdir /opt/mapnik; cd /opt/mapnik
cp ~/src/mapnik/utils/ogcserver/* .

Now you’ll want to edit the ogcserver.conf file and change the following lines. The module is essentially the name of a python file (minus the .py extension) that we’ll create later. The height and width just cutoff the maximum possible image size that can be requested.

	module=worldMapFactory
	maxheight=2048
	maxwidth=2048

Create our “map factory” module defining data sources, styles, etc.( worldMapFactory.py ). Most of this configuration is explained in the mapnik docs and well-commented examples. One thing to note is that the shapefile must be specified without the .shp extension :

from mapnik.ogcserver.WMS import BaseWMSFactory
from mapnik import *

class WMSFactory(BaseWMSFactory):

        def __init__(self):
                BaseWMSFactory.__init__(self)
                sty = Style()

		rl = Rule()
		rl.symbols.append(PolygonSymbolizer(Color(248,216,136)))
		rl.symbols.append(LineSymbolizer(Color(0,0,0),1))
                sty.rules.append( rl )

		self.register_style('style1', sty)

                lyr = Layer(name='world_borders', type='shape', \
                            file='/opt/data/world_borders/world_borders')

                lyr.styles.append('style1')
                self.register_layer(lyr)
                self.finalize()

Now we need to set up apache2 to handle fastcgi:

sudo apt-get install libapache2-mod-fcgid
sudo a2enmod fcgid

… and add some config lines to the apache config files, usually /etc/apache/httpd.conf but, in the case of this Ubuntu install, /etc/apache2/sites-enabled/default :


        ScriptAlias /fcgi-bin/ /usr/lib/fcgi-bin/
        < Directory "/usr/lib/fcgi-bin" >
                AllowOverride All
                Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
                Order allow,deny
                Allow from all
                SetHandler fastcgi-script
        < Directory>

Create the fast-cgi directory refered to by apache

sudo mkdir /usr/lib/fcgi-bin

Now create the actual server script as /usr/lib/fcgi-bin/wms

#!/usr/bin/env python

# Your mapnik dir containing the map factory
# must be in the python path!

import sys
sys.path.append('/opt/mapnik')

from mapnik.ogcserver.cgiserver import Handler
import jon.fcgi as fcgi

class WMSHandler(Handler):
    configpath = '/opt/mapnik/ogcserver.conf'

fcgi.Server({fcgi.FCGI_RESPONDER: WMSHandler}).run()

Finally restart the apache server

sudo /etc/init.d/apache2 force-reload

Now you can access it with a WMS request like so:

http://localhost/fcgi-bin/wms?VERSION=1.1.1&REQUEST=GetMap&LAYERS=world_borders&
FORMAT=image/png&SRS=EPSG:4326&STYLES=&BBOX=-81.54375,-58.3125,-59.04375,-47.0625&
EXCEPTIONS=application/vnd.ogc.se_inimage&width=600&height=300

Compare the linework with a comparable WMS service with UMN Mapserver on the backend. I’ll let the results speak for themselves…

Even if it’s map rendering is smooth, Mapnik’s WMS server is still a bit rough around the edges:

  • It does not support GetFeatureInfo requests
  • The server has trouble with extra parameters. For instance some WMS clients like mapbuilder like to
    tag on an extra ‘UNIQUEID’ parameter to the URL and this causes an unnecessary error with mapnik’s WMS server.
  • Mapnik intself does not support reprojection
  • It only supports shapefiles, geotiffs and postgis layers.

The readme.txt file in docs/ogcserver/ directory of the recent mapnik SVN checkout has a full list of known features and caveats so refer to them for the complete story.

But, all in all, I am very impressed with the quality of the Mapnik WMS server. I figured that, since Mapnik’s goal has been high-quality cartographic output, speed would be sacrificed but I didn’t notice any significant lag; on the contrary I think it was actually about on-par with Mapserver running as a CGI. If it was any slower, I didn’ t notice it immediately. But then again it was only working with a relatively small shapefile and I was the only user. I’d like to do more rigourous stress tests on the Mapnik WMS to see how it compares to Mapserver and Geoserver under varying loads with greater volumes of data.

7 responses so far

Apr 01 2006

LIDAR data processing with open source tools

Published by perrygeo under GIS Tutorials, GRASS, Python

LIDAR data is certainly a hot technology these days. LIght Detection And Ranging data can be used to create extremely detailed terrain models but there are lots of barriers to using LIDAR data effectively. USGS Center for LIDAR Information Coordination and Knowledge was put in place to “facilitate data access, user coordination and education of lidar remote sensing for scientific needs“.

Beyond the sheer size of the datasets and the knowledge and hardware required to process them, software is a big issue. In the realm of open-source GIS tools, there are many applications (GRASS being the most prominent) for dealing with elevation point data and processing it into more meaningful products such as elevation DEMs and contours.

Usually the data comes as simple ASCII text files and the x,y and z values are easily extracted from such a file. But take a look at the USGS data distribution site and you’ll notice some of the datasets are distributed as LAS binary files. It makes sense to store such massive datasets in binary so I started looking for some LAS conversion tools.So after some searching, I found a bunch of proprietary products for working with LAS but no open source tools. Luckily, the format is well documented thanks to the efforts by the ASPRS to make it an open specification.

So dusting off my notes about parsing binary files in python, I set out to create a python module for extracting LIDAR data from LAS files. The LAS format contains a header which needs to be parsed first in order to read the point cloud. Once you have the header info, you can scan your way through the dataset to pick out the x,y,z values.

Here’s an example of the python interface that will read the first 10,000 points into a 2D shapefile with the elevation as a attribute in the dbf:

import pylas
infile = 'sanand000001.las'
outfile = 'lidar.shp'
header = pylas.parseHeader(infile)
pylas.createShp(outfile, header, numpts=10000, rand=False)

The issue I struggled with is the sheer size of these datasets. A USGS quarter quad can contain 10 million points which is an excessive number of points to create, say, a 10 meter DEM over such a small area. Clearly there was a need to extract a subset of this dataset but just taking the points sequentially gives you a subset of the total area. So, by default, pylas randomly scans the data to pull the number of specified points so that the point cloud could cover the entire area (at a much lower point density). Without numpts specified, it will randomly select 1/2000th of the total number.

So the simplified interface to make a more manageable lidar shapefile would be:

header = pylas.parseHeader(infile)
pylas.createShp(outfile, header)

Once the shapefile is created, you can bring it into GRASS to do the processing to generate DEMs, contours and other derived elevation products:

v.in.ogr dsn=lidar.shp layer=lidar output=lidar
g.region vect=lidar
g.region res=10
v.surf.rst input=lidar elev=lidar_dem zcolumn=elev

# Launch the interactive 3D viewer
nviz lidar_dem

Of course the method I just described is very simplistic and does not even come close to utilizing the full potential of the LIDAR point cloud, but it’s a start.

The pylas.py module can be downloaded here. The code has worked for me on the few datasets I’ve tested it with but it should certainly be considered a rough-cut, alpha product. There is much room for improvement and, of course, if you have any suggestions or contributions, please get in touch.

19 responses so far

Jan 20 2006

Geocoding an address list to shapefile

Published by perrygeo under GIS Tutorials, Python

Most commercial software comes with fairly elaborate geocoding engines and there are nice geocoding services on the web that can do one-at-a-time geocoding but the recent post at Spatially Adjusted pointed out a great free resource for batch geocoding named, conveniently enough, Batch Geocode. Just give it a list of tab or pipe delimited addresses and it outputs a table with your original data plus a lat/long for every row.

I have been working on a python script to convert text files into point shapefiles and thought this would be a great chance to put it to work. The only dependency is a recent version of python with the ogr module (see FWTools for an easy to install package for windows or linux).

First, I take a list of cities and feed it to batchgeocode.com (a very nice feature is that the yahoo geocoder, on which batchgeocode is based, does not require street level addresses):

City|State
Santa Barbara|CA
Arcata|CA
New Milford|CT
Blacksburg|VA

After running the geocoder, I get back a table with lat/longs:

City|State|lat|long|precision
Santa Barbara|CA|34.419769|-119.696747|city
Arcata|CA|40.866261|-124.081673|city
New Milford|CT|41.576599|-73.408821|city
Blacksburg|VA|37.229359|-80.413963|city

Copy and paste that into a text file and add a second header row that defines the data type for each column. It would be possible to autodetect the column types but there are cases where a string of numeric digits should be kept as a string (for instance the zipcode 06776 would become 6776 if it was read as an integer).The possible column types are string, integer,real, x and y with x and y representing the coordinates.

City|State|lat|long|precision
string|string|y|x|string
Santa Barbara|CA|34.419769|-119.696747|city
Arcata|CA|40.866261|-124.081673|city
New Milford|CT|41.576599|-73.408821|city
Blacksburg|VA|37.229359|-80.413963|city

Now run the txt2shp.py utility. The input and output parameters are self-explanatory and the d parameter defines the string used as a delimiter. Notice that the syntax follows the GRASS standard of parameter=value:

txt2shp.py input=cities.txt output=cities.shp d='|'

And now you’ve got a shapefile of the geocoded cities!

Cities Shapefile

The txt2shp.py script can be downloaded here. Try it out and let me know how it’s working for you.

Update: In order to generate a .prj file for your output shapefile, you can use the epsg_tr.py utility if you know the EPSG code. Batch Geocoder returns everything in lat/long (presumably with a WGS84 datum?) so you can use EPSG code 4326:

epsg_tr.py -wkt 4326 > cities.prj

4 responses so far

« Prev