Archive for the 'GIS Tutorials' Category

Jan 26 2008

Impervious surface deliniation with GRASS

Published by perrygeo under GIS Tutorials, GRASS

Watersheds with lots of roads, buildings, parking lots, rock surfaces, compacted dirt, etc tend to prevent inflitration and cause rapid runoff in response to rainfall. This poses a number of challenges for managing stormwater and water quality. Not surprisingly, the percentage of hydrologically impervious surface in a given watershed is an important factor in many hydrologic models. Using standard aerial photography and GRASS, it’s a relatively simple process to create an impervious surface map using supervised classification.

First find an aerial photo. I grabbed a NAIP image from CASIL but you might want to try using OpenAerialMap. The red, green and blue visible bands are usually sufficient for differentiating between impervious and pervious land use types… For distinguishing different types of vegetation you might want to use a multispectral imagery source with non-visible bands (ie near infrared) but this is usually lower resolution (eg. 30 meter pixels of Landsat) or much more expensive.

Next we jump into GRASS and import our image into a new location:

r.in.gdal -e input=naip.img output=naip location=impervious

Exit and log back into your new location. If you look at the imported rasters, you’ll see three rasters, not one. Each band (R, G and B) gets imported separately.

GRASS 6.3.cvs (impervious):~/>  g.list rast
raster files available in mapset permanent:
naip.1 naip.2 naip.3 

We need to indicate that these rasters form a logical group

i.group group=naip2 subgroup=naip2 input=naip.3@PERMANENT,naip.2@PERMANENT,naip.1@PERMANENT
i.target -c group=naip2

At any time you can list the rasters in a given group/subgroup to confirm.

i.group -l -g group=naip2 subgroup=naip2

Now the real heart of the process. We need to define “training areas” which are polygons around representative land use types. I used QGIS to load the aerial photo and create a new polygon layer with an integer attribute field called vegnum. I digitized a few rocks, paved areas, rooftops and dirt roads to represent the impervious areas to which I assigned vegnum=1. Then I selected some grasslands, forests, lakes and chaparral and assigned 2 as the vegnum. The next step is to load the polygon data into GRASS and rasterize it (in retrospect it would have just been easier to create the grass vector layer from scratch in QGIS to avoid the import step). Note that the vegnum field is specified as the raster value column.

v.in.ogr -o dsn=./training/train1_utm/train1_utm.shp output=train1 layer=train1_utm min_area=0.0001 type=boundary snap=-1
v.to.rast input=train1 output=train1 use=attr column=vegnum type=point,line,area layer=1 value=1 rows=4096

Next we use i.gensig to generate a spectral signature (the statistical profile; mean and covariance matrix of the input pixels) for the training areas.

i.gensig trainingmap=train1 group=naip2 subgroup=naip2 signaturefile=naip2_train1.sig

Now that we have a signature of impervious vs. non-impervious surfaces, we can use the maximum likelihood method to classify each pixel into the highest probability category.

i.maxlik group=naip2 subgroup=naip2 sigfile=naip2_train1.sig class=imperv

You might notice a slight speckled, noisy appearance due to things like shadows, reflections or imperfect training areas. Usually these small 1-pixel deviations are not interesting enough to keep so we can smooth out the image taking the mode (most comon) cell in a 3×3 window.

r.neighbors input=imperv output=imperv_mode method=mode size=3 

And here are the results… calculating imperviousness will most likely be an iterative process so be prepared to evaluate the output, tweak the training areas and rerun the process a few times. Once you’re happy with the results, you can use zonal statistics with a tool like starspan to find the percent imperviousness of your watersheds or other regions.

6 responses so far

Oct 20 2007

Turning Ubuntu into a GIS workstation

It just keeps getting easier and easier to get a fully functional open source GIS workstation up and running thanks to Ubuntu. The following instructions will take your vanilla installation of Ubuntu 7.10 and add the following top-notch desktop GIS applications:

  • Postgresql/PostGIS : a relational database with vector spatial data handling
  • GRASS : A full blown GIS analysis toolset
  • Quantum GIS: A user-friendly graphical GIS application
  • GDAL, Proj, Geos : Libraries and utilities for processing spatial data
  • Mapserver : web mapping program and utilites
  • Python bindings for QGIS, mapserver and GDAL
  • GPSBabel : for converting between various GPS formats
  • R : a high-end statistics package with spatial capabilities
  • GMT : the Generic Mapping Tools for automated high-quality map output

While this is not a comprehensive list of open source GIS software, these packages cover most of my needs. If you want to live on the bleeding edge and have to have the absolute latest versions, you’ll be better off installing these from source. But for those of us that want a stable and highly functional GIS workstation with minimal fuss, this is the way to go:

  1. Go to System > Administration > Software Sources and make sure the universe and multiverse repositories are turned on. Close the window and the list of available software packages will be refreshed.
  2. Open up a terminal (ie the command line) via Applications > Accessories > Terminal and type the following:

    sudo apt-get -y install qgis grass qgis-plugin-grass mapserver-bin gdal-bin cgi-mapserver \
    python-qt4 python-sip4 python-gdal python-mapscript gmt gmt-coastline-data \
    r-recommended gpsbabel shapelib libgdal1-1.4.0-grass

    The sudo part indicates that the command will be run as the administrator user, apt-get -y install is the command telling it to install the list of packages and answer yes to any questions that pop up.

  3. There is one package that is worth upgrading to the latest and greatest - Quantum GIS. The latest version (0.9) is due out very shortly and has the ability to write plugins using the python programming language. A big plus!

    Download the latest build from http://qgis.org/uploadfiles/testbuilds/qgis0.9.0.debs_ubuntu_gutsy.tar.gz and extract it ( right-click > Extract Here ). In the directory you’ll see 4 .deb files, only 3 of which you’ll need unless you plan on doing any development work.

    Double click libqgis1_0.9.0_i386.deb and you’ll get a message saying an older version is available from directly from ubuntu. We already know this so just close and ignore it. Click Install Package and wait for it to complete then close out.

    Repeat for qgis_0.9.0_i386.deb and qgis-plugin-grass_0.9.0_i386.deb (in that order).

And there we have it, about 15 minutes depending on your internet speed and you’ve installed a high-end GIS workstation built completely on free and open source software.

20 responses so far

Jun 25 2006

Mapserver Include

Published by perrygeo under GIS Tutorials

If you mange even a small number of Mapserver sites, eventually you notice that you use a number of identical layers in multiple mapfiles. The way this is typically done is to copy and paste the LAYER definition into each mapfile. But inevitably you’ll need to change the styling or the data source and you have to manually go through each mapfile to sync the changes. Wouldn’t it be nice to define the layer in a single file and use it in many mapfiles?

While Mapserver has no concept of an “include”, the C preprocessor (cpp) does. This is mentioned on the Mapserver list every time the subject of includes comes up. Still I have yet to find an actual example so I thought I’d share my notes on how I accomplish a mapserver include:

  1. Create your mapfile as usual but leave out any LAYER definitions that you wish to share amongst mapfiles. Instead use something like :

    #include “landsat.layer”

  2. The C Preprocessor doesn’t deal well with “#” which is the mapfile’s chosen comment charachter. Instead replace with “##” to indicate a comment
  3. Save this pseudo-mapfile as mymap.template
  4. Create a file in the same directory called landsat.layer with the LAYER block.
  5. Run the template through the preprocessor to generate the real mapfile :
    cpp -P -C -o mymap.map mymap.template 

The next step would be to script the preprocessing of all your mapfiles so that changing a layer definition in multiple mapfiles was as simple as changing the *.layer file and running the script.

No responses yet

May 17 2006

Mapnik WMS Server

Published by perrygeo under GIS Tutorials, Python, WMS

A few months ago, Mapnik came onto my radar and I was immediately impressed with the beautiful cartography. But, until recently, it was just a C++ libary with some python bindings that could be used to programmatically build nice map images from shapfiles, geotiffs or postgis layers. There were no common interfaces such as WMS to access mapnik… until last month. Jean Francois Doyon recently added a prototype WMS interface to Mapnik. It runs as a fastcgi script under apache. It is still a bit rough around the edges but the result is well worth a little extra setup effort.

I set up Mapnik as a WMS server recently and would like to share my process and results. This tutorial assumes you already have python, postgresql/postgis, proj4, python imaging library and apache2 already running. The examples are for Ubuntu Dapper Drake.. they may work well on other versions of Ubuntu and Debian but for other *nixes (and certainly windows) many things may need to be tweaked.

First off, we have to install the base mapnik libs. These depend on the boost python bindings and the whole compile process is very simple (if a bit slow) in Ubuntu:

sudo apt-get install \
 libboost-python1.33.1 libboost-python-dev \
 libboost-regex1.33.1 libboost-regex-dev \
 libboost-serialization-dev \
 libboost-signals1.33.1 libboost-signals-dev \
 libboost-thread1.33.1 libboost-thread-dev \
 libboost-program-options1.33.1 libboost-program-options-dev \
 libboost-filesystem1.33.1 libboost-filesystem-dev \
 libboost-iostreams1.33.1 libboost-iostreams-dev
cd ~/src
svn checkout svn://svn.berlios.de/mapnik/trunk mapnik
cd mapnik
python scons/scons.py PYTHON=/usr/bin/python PGSQL_INCLUDES=/usr/local/include/postgresql \
  PGSQL_LIBS=/usr/local/lib/postgresql BOOST_INCLUDES=/usr/include/boost BOOST_LIBS=/usr/lib
sudo python scons/scons.py install PYTHON=/usr/bin/python PGSQL_INCLUDES=/usr/local/include/postgresql \
  PGSQL_LIBS=/usr/local/lib/postgresql BOOST_INCLUDES=/usr/include/boost BOOST_LIBS=/usr/lib
sudo ldconfig

Now we have to set up some additional libs in order to run the WMS:

cd ~/src
wget http://easynews.dl.sourceforge.net/sourceforge/jonpy/jonpy-0.06.tar.gz
tar -xzvf jonpy-0.06.tar.gz
cd jonpy-0.06/
sudo python setup.py install
# copy the ogcserver stuff into its own dir
mkdir /opt/mapnik; cd /opt/mapnik
cp ~/src/mapnik/utils/ogcserver/* .

Now you’ll want to edit the ogcserver.conf file and change the following lines. The module is essentially the name of a python file (minus the .py extension) that we’ll create later. The height and width just cutoff the maximum possible image size that can be requested.

	module=worldMapFactory
	maxheight=2048
	maxwidth=2048

Create our “map factory” module defining data sources, styles, etc.( worldMapFactory.py ). Most of this configuration is explained in the mapnik docs and well-commented examples. One thing to note is that the shapefile must be specified without the .shp extension :

from mapnik.ogcserver.WMS import BaseWMSFactory
from mapnik import *

class WMSFactory(BaseWMSFactory):

        def __init__(self):
                BaseWMSFactory.__init__(self)
                sty = Style()

		rl = Rule()
		rl.symbols.append(PolygonSymbolizer(Color(248,216,136)))
		rl.symbols.append(LineSymbolizer(Color(0,0,0),1))
                sty.rules.append( rl )

		self.register_style('style1', sty)

                lyr = Layer(name='world_borders', type='shape', \
                            file='/opt/data/world_borders/world_borders')

                lyr.styles.append('style1')
                self.register_layer(lyr)
                self.finalize()

Now we need to set up apache2 to handle fastcgi:

sudo apt-get install libapache2-mod-fcgid
sudo a2enmod fcgid

… and add some config lines to the apache config files, usually /etc/apache/httpd.conf but, in the case of this Ubuntu install, /etc/apache2/sites-enabled/default :


        ScriptAlias /fcgi-bin/ /usr/lib/fcgi-bin/
        < Directory "/usr/lib/fcgi-bin" >
                AllowOverride All
                Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
                Order allow,deny
                Allow from all
                SetHandler fastcgi-script
        < Directory>

Create the fast-cgi directory refered to by apache

sudo mkdir /usr/lib/fcgi-bin

Now create the actual server script as /usr/lib/fcgi-bin/wms

#!/usr/bin/env python

# Your mapnik dir containing the map factory
# must be in the python path!

import sys
sys.path.append('/opt/mapnik')

from mapnik.ogcserver.cgiserver import Handler
import jon.fcgi as fcgi

class WMSHandler(Handler):
    configpath = '/opt/mapnik/ogcserver.conf'

fcgi.Server({fcgi.FCGI_RESPONDER: WMSHandler}).run()

Finally restart the apache server

sudo /etc/init.d/apache2 force-reload

Now you can access it with a WMS request like so:

http://localhost/fcgi-bin/wms?VERSION=1.1.1&REQUEST=GetMap&LAYERS=world_borders&
FORMAT=image/png&SRS=EPSG:4326&STYLES=&BBOX=-81.54375,-58.3125,-59.04375,-47.0625&
EXCEPTIONS=application/vnd.ogc.se_inimage&width=600&height=300

Compare the linework with a comparable WMS service with UMN Mapserver on the backend. I’ll let the results speak for themselves…

Even if it’s map rendering is smooth, Mapnik’s WMS server is still a bit rough around the edges:

  • It does not support GetFeatureInfo requests
  • The server has trouble with extra parameters. For instance some WMS clients like mapbuilder like to
    tag on an extra ‘UNIQUEID’ parameter to the URL and this causes an unnecessary error with mapnik’s WMS server.
  • Mapnik intself does not support reprojection
  • It only supports shapefiles, geotiffs and postgis layers.

The readme.txt file in docs/ogcserver/ directory of the recent mapnik SVN checkout has a full list of known features and caveats so refer to them for the complete story.

But, all in all, I am very impressed with the quality of the Mapnik WMS server. I figured that, since Mapnik’s goal has been high-quality cartographic output, speed would be sacrificed but I didn’t notice any significant lag; on the contrary I think it was actually about on-par with Mapserver running as a CGI. If it was any slower, I didn’ t notice it immediately. But then again it was only working with a relatively small shapefile and I was the only user. I’d like to do more rigourous stress tests on the Mapnik WMS to see how it compares to Mapserver and Geoserver under varying loads with greater volumes of data.

7 responses so far

Apr 01 2006

LIDAR data processing with open source tools

Published by perrygeo under GIS Tutorials, GRASS, Python

LIDAR data is certainly a hot technology these days. LIght Detection And Ranging data can be used to create extremely detailed terrain models but there are lots of barriers to using LIDAR data effectively. USGS Center for LIDAR Information Coordination and Knowledge was put in place to “facilitate data access, user coordination and education of lidar remote sensing for scientific needs“.

Beyond the sheer size of the datasets and the knowledge and hardware required to process them, software is a big issue. In the realm of open-source GIS tools, there are many applications (GRASS being the most prominent) for dealing with elevation point data and processing it into more meaningful products such as elevation DEMs and contours.

Usually the data comes as simple ASCII text files and the x,y and z values are easily extracted from such a file. But take a look at the USGS data distribution site and you’ll notice some of the datasets are distributed as LAS binary files. It makes sense to store such massive datasets in binary so I started looking for some LAS conversion tools.So after some searching, I found a bunch of proprietary products for working with LAS but no open source tools. Luckily, the format is well documented thanks to the efforts by the ASPRS to make it an open specification.

So dusting off my notes about parsing binary files in python, I set out to create a python module for extracting LIDAR data from LAS files. The LAS format contains a header which needs to be parsed first in order to read the point cloud. Once you have the header info, you can scan your way through the dataset to pick out the x,y,z values.

Here’s an example of the python interface that will read the first 10,000 points into a 2D shapefile with the elevation as a attribute in the dbf:

import pylas
infile = 'sanand000001.las'
outfile = 'lidar.shp'
header = pylas.parseHeader(infile)
pylas.createShp(outfile, header, numpts=10000, rand=False)

The issue I struggled with is the sheer size of these datasets. A USGS quarter quad can contain 10 million points which is an excessive number of points to create, say, a 10 meter DEM over such a small area. Clearly there was a need to extract a subset of this dataset but just taking the points sequentially gives you a subset of the total area. So, by default, pylas randomly scans the data to pull the number of specified points so that the point cloud could cover the entire area (at a much lower point density). Without numpts specified, it will randomly select 1/2000th of the total number.

So the simplified interface to make a more manageable lidar shapefile would be:

header = pylas.parseHeader(infile)
pylas.createShp(outfile, header)

Once the shapefile is created, you can bring it into GRASS to do the processing to generate DEMs, contours and other derived elevation products:

v.in.ogr dsn=lidar.shp layer=lidar output=lidar
g.region vect=lidar
g.region res=10
v.surf.rst input=lidar elev=lidar_dem zcolumn=elev

# Launch the interactive 3D viewer
nviz lidar_dem

Of course the method I just described is very simplistic and does not even come close to utilizing the full potential of the LIDAR point cloud, but it’s a start.

The pylas.py module can be downloaded here. The code has worked for me on the few datasets I’ve tested it with but it should certainly be considered a rough-cut, alpha product. There is much room for improvement and, of course, if you have any suggestions or contributions, please get in touch.

19 responses so far

Mar 26 2006

WMS layers: My Top Ten

Published by perrygeo under GIS Tutorials, WMS

Web Mapping Services (WMS) are not always my prefered option for accessing data; relying on a remote server to generate a pretty picture of the data is hardly a substitute for having the raw data in hand. But for many cases, I just need a decent looking basemap image and don’t want to download gigabytes of data, especially if that data is updated frequently.

Software like GeoServer and Mapserver are making it easier to publish data via WMS and the number of WMS servers is surely growing… but how do you find them? There is no central registry for WMS servers but efforts like the refractions research ogc survey, mapdex and a few google tricks are making it easier to find data distributed via WMS. After many hours digging through WMS services to find the ones that suite my mapping needs, I’ve come across a number of gems that I use time and time again. Hopefully this will inspire some others to share their secret stash of WMS servers!

(Update: Anything Geospatial has a great link to a well-organized WMS server list for public use. Nice. )

You should be able to provide the online resource URL to your favorite WMS client software (my personal choice is openjump) and the client should display the list of layers available from that service.

If you’re contructing WMS URLs “by hand” or in a browser, you can do a capabilities request (the online resource URL with service=WMS?request=GetCapabilities appended to it) which will return an XML document describing the available layers, image formats, projections,etc. Take a look at the image src for any of the thumbnails below to see how the map request is constructed.

  1. TerraServer Digital Raster Graphic (DRG): USGS Topo Quads
    Online Resource URL : http://terraservice.net/ogcmap.ashx?
    Layer Name : DRG

  2. TerraServer Digital Ortho Photo Quads (DRG): Black and white aerial photos for the US
    Online Resource URL : http://terraservice.net/ogcmap.ashx?
    Layer Name : DOQ

  3. NASA Landsat Imagery
    The Landsat mosaic is available in fase color (default) or in natural color (style=visual) as shown below.
    Online Resource URL : http://onearth.jpl.nasa.gov/wms.cgi?
    Layer Name : global_mosaic

  4. 45-minute Weather Radar Images (NEXRAD Base Reflectivity).
    Since this is a dynamic data source, the image below may look really boring (ie blank) if there’s no storms over the Continental US.
    Online Resource URL : http://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r.cgi?
    Layer Name : nexrad-n0r-m45m

  5. USGS National Landcover
    The 30-meter natial landcover dataset. USGS is nice enough to provide a legend, of course.
    Online Resource URL : http://gisdata.usgs.net/servlet/com.esri.wms.Esrimap?ServiceName=USGS_WMS_NLCD&
    Layer Name : US_NLCD

  6. USGS National Elevation - Shaded Relief
    Online Resource URL : http://gisdata.usgs.net:80/servlet/com.esri.wms.Esrimap?servicename=USGS_WMS_NED&
    Layer Name : US_NED_Shaded_Relief

  7. USGS Reference Maps
    Online Resource URL : http://gisdata.usgs.net:80/servlet/com.esri.wms.Esrimap?servicename=USGS_WMS_REF&
    Layer Names : States,County,Roads,Route_Numbers,Streams,Federal_Lands

  8. Life Mapper
    Besides the standard WMS paramters, some services can take extra parameters in order to render a map. In this excellent service, LifeMapper requires that you provide the species name and it will render maps of known species locations and modelled distributions. Here’s an example of the distribution of Black Bear (ie. Ursus americanus) over central california
    Online Resource URL : http://www.lifemapper.org/Services/WMS/?ScientificName=Ursus%20americanus&
    Layer Names : Species Distribution Models,Political Boundaries,Species Data Points

  9. MODIS Daily Satellite Imagery
    Online Resource URL : http://wms.jpl.nasa.gov/wms.cgi?
    Layer Names : daily_terra, daily_aqua

    Terra Aqua
  10. SRTMPlus 90 Meter DEM
    The image below doesn’t make for a very good basemap OR a very good DEM for analytical purposes since all the values are scaled to an 8-bit color depth. However, JPL also offers this layer as an integer (16bit) GeoTIFF (Use format=image/geotiff and styles=short_int), so this can be a valuable way to quickly grab a DEM for a given region.
    Online Resource URL : http://wms.jpl.nasa.gov/wms.cgi?
    Layer Names : srtmplus

If you’d like to view these layers interactively, here’s a mapserver application which “cascades” the above WMS layers through a single interface. If you’re interested in setting up these layers in a mapserver application, check out the WMS Mapfile for some examples.

15 responses so far

Feb 17 2006

StarSpan for vector-on-raster analysis

Published by perrygeo under GIS Tutorials

It’s amazing how many excellent open source GIS applications are out there just waiting to be discovered. I’ve been working with open source GIS for over 3 years now and I still find new and interesting software on a regular basis. The latest “Why haven’t I heard of this before?” discovery came from the GRASS mailing list discussion on StarSpan, a tool developed at University of California at Davis “designed to bridge the raster and vector worlds of spatial analysis using fast algorithms for pixel level extraction from geometry features“.

Our research project for the Ecosystem Based Management group at UCSB is in need of this exact tool in order to extract raster statistics based on a vector watersheds layer. ArcGIS and GRASS both have some of the capabilities we need through the Zonal_Statistics and v.rast.stats functions respectively. However they have their limitations and neither really handles categorical raster summaries by polygon. StarSpan looks like a more efficient option in terms of speed, scriptability and capabilities.

Installation is very smooth. It requires a recent version of GDAL (>= 1.2.6) and GEOS (>= 2.1.2). Once the dependencies are met, compilation on a *nix system is as easy as configure, make, make install (There are also Windows binaries available). There is a single command line interface for all the functionality and StarSpan is able to handle all GDAL rasters and OGR vectors.

For classified rasters such as a land cover raster, we’d like to get the number of pixels for each landcover class by watershed. StarSpan creates a nice, normalized csv with three columns; The vector feature id, the raster value, and the number of pixels. There will be up to (number of features X number of classes) rows.

starspan --vector watershed.shp --raster landcover.tif --count-by-class landcover_by_watershed.csv

In order to find the percentage of a given raster class for each watershed, you can bring the csv into a relational database and do a quick SQL query. Here’s an example of finding the percentage of cropland (class value is 12) for each watershed:

SELECT t.fid AS fid, (t.count::numeric / s.total::numeric) * 100 AS percentage_cropland
FROM landcover_by_watershed t,
               (SELECT fid, sum(count) AS total
                FROM lancover_by_watershed
                GROUP BY fid) as s
WHERE t.fid = s.fid
AND t.class = 12;

Which gives us…

 fid |      percentage_cropland
-----+------------------------------------------------
   1 | 28.571428571428571429
   2 | 71.428571428571428571
   3 | 36.363636363636363636
   4 | 63.636363636363636364

For continuous surfaces such as elevations and slopes, we’ll need to get quantitative statistics of those rasters by watershed. StarSpan can easily generate averages, mode, standard deviation, min and max:

starspan --vector watershed.shp --raster slope.tif --stats slope_stats.csv avg mode stdev min max

Which outputs a csv with one row per feature identified by feature id and each stat as a column:

FID,numPixels,avg_Band1,mode_Band1,stdev_Band1,min_Band1,max_Band1
1,25921,34.694822,38.917000,14.491952,0.347465,66.241035
2,21755,7.965552,0.000000,5.484245,0.000000,42.017155
...

While I can confirm that these small test cases work very quickly and give us pretty much the exact outputs we need, it will be interesting to see how well it stacks up to ArcGIS and GRASS when it comes to cranking out the big datasets. We’ll likely try all three methods and I’ll make sure to post the results.

Oh and the comparison between StarSpan and GRASS may become at moot point in the future since there is talk about integrating it with the GRASS project. While a GRASS module would be nice, not everyone has GRASS installed so I would hope the stand-alone version is still maintained since it can deal with pretty much any vector or raster data source.

3 responses so far

Jan 20 2006

Geocoding an address list to shapefile

Published by perrygeo under GIS Tutorials, Python

Most commercial software comes with fairly elaborate geocoding engines and there are nice geocoding services on the web that can do one-at-a-time geocoding but the recent post at Spatially Adjusted pointed out a great free resource for batch geocoding named, conveniently enough, Batch Geocode. Just give it a list of tab or pipe delimited addresses and it outputs a table with your original data plus a lat/long for every row.

I have been working on a python script to convert text files into point shapefiles and thought this would be a great chance to put it to work. The only dependency is a recent version of python with the ogr module (see FWTools for an easy to install package for windows or linux).

First, I take a list of cities and feed it to batchgeocode.com (a very nice feature is that the yahoo geocoder, on which batchgeocode is based, does not require street level addresses):

City|State
Santa Barbara|CA
Arcata|CA
New Milford|CT
Blacksburg|VA

After running the geocoder, I get back a table with lat/longs:

City|State|lat|long|precision
Santa Barbara|CA|34.419769|-119.696747|city
Arcata|CA|40.866261|-124.081673|city
New Milford|CT|41.576599|-73.408821|city
Blacksburg|VA|37.229359|-80.413963|city

Copy and paste that into a text file and add a second header row that defines the data type for each column. It would be possible to autodetect the column types but there are cases where a string of numeric digits should be kept as a string (for instance the zipcode 06776 would become 6776 if it was read as an integer).The possible column types are string, integer,real, x and y with x and y representing the coordinates.

City|State|lat|long|precision
string|string|y|x|string
Santa Barbara|CA|34.419769|-119.696747|city
Arcata|CA|40.866261|-124.081673|city
New Milford|CT|41.576599|-73.408821|city
Blacksburg|VA|37.229359|-80.413963|city

Now run the txt2shp.py utility. The input and output parameters are self-explanatory and the d parameter defines the string used as a delimiter. Notice that the syntax follows the GRASS standard of parameter=value:

txt2shp.py input=cities.txt output=cities.shp d='|'

And now you’ve got a shapefile of the geocoded cities!

Cities Shapefile

The txt2shp.py script can be downloaded here. Try it out and let me know how it’s working for you.

Update: In order to generate a .prj file for your output shapefile, you can use the epsg_tr.py utility if you know the EPSG code. Batch Geocoder returns everything in lat/long (presumably with a WGS84 datum?) so you can use EPSG code 4326:

epsg_tr.py -wkt 4326 > cities.prj

4 responses so far

Dec 11 2005

Tissot Indicatrix - Examining the distortion of map projections

Published by perrygeo under GIS Tutorials

The Tissot Indicatrix is a valuable tool for showing the distortions caused by map projections. It is essentially a series of imaginary polygons that represent perfect circles of equal area on a 3D globe. When projected onto a 2D map, their shape, size and/or angles will be distorted accordingly allowing you to quickly assess the projection’s accuracy for a given part of the globe.

I’ve seen great Tissot diagrams in text books but I wanted to create the indicatrix as a polygon dataset so that I could project and overlay it with other data in a GIS. To do this I wrote a python script using the OGR libraries, which I will revist in a minute. But first the visually interesting part:

Here is a world countries shapefile overlaid with the Tissot circles in geographic (unprojected lat-long) coordinates:

Latlong tissot

Next I reprojected the datasets to the Mercator projection using ogr2ogr:

ogr2ogr -t_srs “+proj=merc” countries_merc.shp countries_simpl.shp countries_simpl
ogr2ogr -t_srs “+proj=merc” tissot_merc.shp tissot.shp tissot

Note that the angles are perfectly preserved (the trademark feature of the Mercator projection) but the size is badly distorted.

Mercator tissot

Now lets try Lambert Azimuthal Equal Area (in this case the US National Atlas standard projection - EPSG code 2163).

ogr2ogr -t_srs “epsg:2163″ countries_lambert.shp countries_simpl.shp countries_simpl
ogr2ogr -t_srs “epsg:2163″ tissot_lambert.shp tissot.shp tissot

This is a great projection for preserving area but get outside the center and shapes become badly distorted:

LAEA tissot

The best way to experiment with this is to bring the tissot.shp file into ArcMap (or another program that supports on-the-fly projection) and play with it in real time. The distortions of every projection just leap off the screen…

OK, now for the geeky part. Here’s the python/OGR script used to create the tissot shapefile. The basic process is to lay out a grid of points across the globe in latlong, loop through the points and reproject each one to an orthographic projection centered directly on the point, buffer it, then reproject to latlong. The end result is a latlong shapefile representing circles of equal area on a globe.

#!/usr/bin/env python
# Tissot Circles
# Represent perfect circles of equal area on a globe
# but will appear distorted in ANY 2d projection.
# Used to show the size, shape and directional distortion
# by Matthew T. Perry
# 12/10/2005

import ogr
import os
import osr

output = 'tissot.shp'
debug = False

# Create the Shapefile
driver = ogr.GetDriverByName('ESRI Shapefile')
if os.path.exists(output):
        driver.DeleteDataSource(output)
ds = driver.CreateDataSource(output)
layer = ds.CreateLayer(output, geom_type=ogr.wkbPolygon)

# Set up spatial reference systems
latlong = osr.SpatialReference()
ortho = osr.SpatialReference()
latlong.ImportFromProj4('+proj=latlong')

# For each grid point, reproject to ortho centered on itself,
# buffer by 640,000 meters, reproject back to latlong,
# and output the latlong ellipse to shapefile
for x in range(-165,180,30):
    for y in range (-60,90,30):
        f= ogr.Feature(feature_def=layer.GetLayerDefn())
        wkt = 'POINT(%f %f)' % (x, y)
        p = ogr.CreateGeometryFromWkt(wkt)
        p.AssignSpatialReference(latlong)
        proj = '+proj=ortho +lon_0=%f +lat_0=%f' % (x,y)
        ortho.ImportFromProj4(proj)
        p.TransformTo(ortho)
        b = p.Buffer(640000)
        b.AssignSpatialReference(ortho)
        b.TransformTo(latlong)
        f.SetGeometryDirectly(b)
        layer.CreateFeature(f)
        f.Destroy()

ds.Destroy()

15 responses so far

Dec 11 2005

KML to Shapefile Scripting

Published by perrygeo under GIS Tutorials

Christian Spanring has been doing some great work with Google Earth’s KML data format. The latest offering is a fairly robust XSLT stylesheet for transforming KML into GML.

In the article, he mentions ogr2ogr as a method to convert GML to shapefiles so I immediately had to try it out! I came up with a simple bash script, kml2shp.sh, that provides a quick command-line interface:

kml2shp.sh input.kml output.shp

Here’s the step-by-step:

  1. Make sure you have xsltproc (the command-line xslt processor) and OGR installed.
  2. Copy the xslt stylesheet to /usr/local/share/kml2gml/
  3. Create the kml2shp.sh script below (make sure to change the paths to reflect your system, chmod +x it, etc)

#!/bin/bash

if [ $# -ne 2 ]; then
echo “usage: kml2shp.sh input.kml output.shp”
exit
fi

echo “Processing KML file”
sed ’s/ xmlns=\”http\:\/\/earth.google.com\/kml\/2.0\”//’ $1 > /tmp/temp.kml
xsltproc -o /tmp/temp.gml /usr/local/share/kml2gml/kml2gml.xsl /tmp/temp.kml

echo “Creating new Shapefile”
ogr2ogr $2 /tmp/temp.gml myFeature

echo “Cleaning up temp files”
rm /tmp/temp.gml
rm /tmp/temp.kml

echo “New shapefile has been created:”
echo $2

Now as far as I can tell, the XSLT is fairly robust although I’ve only tested it on a few datasets. The wrapper script, however, could use alot of work. Type and error checking would be nice for starters and a better method to remove the xml namespace might be necessary. This is really meant as a starting point.

One potential problem with this technique is that you will most likely get a 3D shapefile (x, y AND z coordinates). Many applications can handle 3D shapefiles but some (QGIS, others?) cannot at the present time. Once the geometry type is known, one could always specify the ogr2ogr “-nlt” parameter to force 2D output. But that’s all for now… let me know if anyone has any suggestions on improving this technique.

2 responses so far

Next »