Feb 18 2010

Lazy raster processing with GDAL VRTs

Published by perrygeo under Uncategorized

No, not lazy as in REST :-) … Lazy as in “Lazy evaluation”:

In computer programming, lazy evaluation is the technique of delaying a computation until the result is required.

Take an example raster processing workflow to go from a bunch of tiled, latlong, GeoTiff digital elevation models to a single shaded relief GeoTiff in projected space:

  1. Merge the tiles together
  2. Reproject the merged DEM (using bilinear or cubic interpolation)
  3. Generate the hillshade from the reprojected DEM

Simple enough to do with GDAL tools on the command line. Here’s the typical, process-as-you-go implementation:

  1. gdal_merge.py -of GTiff -o srtm_merged.tif srtm_12_*.tif
  2. gdalwarp -t_srs epsg:3310 -r bilinear -of GTiff srtm_merged.tif srtm_merged_3310.tif
  3. gdaldem hillshade srtm_merged_3310.tif srtm_merged_3310_shade.tif -of GTiff

Alternately, we can simulate lazy evaluation by using GDAL Virtual Rasters (VRT) to perform the intermediate steps, only outputting the GeoTiff as the final step.

  1. gdalbuildvrt srtm_merged.vrt srtm_12_0*.tif
  2. gdalwarp -t_srs epsg:3310 -r bilinear -of VRT srtm_merged.vrt srtm_merged_3310.vrt
  3. gdaldem hillshade srtm_merged_3310.vrt srtm_merged_3310_shade2.tif -of GTiff
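The three lazy steps above can also be scripted. What follows is only a rough sketch: the function names and the example tile names are made up for illustration, and the command-line arguments are simply the ones shown in the list above. Nothing is executed until `run_all` is called, so the command lists can be inspected first.

```python
import subprocess


def lazy_hillshade_commands(tiles, t_srs="epsg:3310", prefix="srtm_merged"):
    """Return the three commands of the VRT workflow as argument lists.

    `tiles` is a list of input GeoTiff paths (hypothetical names in the
    usage note below). Only the last command writes real raster data.
    """
    code = t_srs.split(":")[-1]              # "3310" from "epsg:3310"
    merged = prefix + ".vrt"                 # step 1 output (XML only)
    warped = "%s_%s.vrt" % (prefix, code)    # step 2 output (XML only)
    shade = "%s_%s_shade2.tif" % (prefix, code)  # step 3 output (GeoTiff)
    return [
        ["gdalbuildvrt", merged] + list(tiles),
        ["gdalwarp", "-t_srs", t_srs, "-r", "bilinear", "-of", "VRT",
         merged, warped],
        ["gdaldem", "hillshade", warped, shade, "-of", "GTiff"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.check_call(cmd)
```

For example, `run_all(lazy_hillshade_commands(["srtm_12_04.tif", "srtm_12_05.tif"]))` would run the same three commands, assuming the GDAL tools are on the PATH and those two tiles exist.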

So what’s the advantage to doing it the VRT way? They both produce exactly the same output raster. Let’s compare:

                          Process-As-You-Go   “Lazy” VRTs
Merge (#1) time           3.1 sec             0.05 sec
Warp (#2) time            7.3 sec             0.10 sec
Hillshade (#3) time       10.5 sec            19.75 sec
Total processing time     20.9 sec            19.9 sec
Intermediate files        2 tifs              2 vrts
Intermediate file size    261 MB              0.005 MB

The Lazy VRT method delays all the computationally intensive processing until it is actually required. The intermediate files, instead of containing the raw raster output of the actual computation, are XML files which contain the instructions to get the desired output. This allows GDAL to do all the processing in one step (the final step #3). The total processing time is not significantly different between the two methods (19.9 sec versus 20.9 sec), but in terms of the productivity of the GIS analyst, the VRT method is superior. Imagine working with datasets 1000x this size with many more steps - having to type the command, wait 2 hours, type the next, etc. would be a waste of human resources versus assembling the instructions into VRTs and then kicking off the final processing step as you leave the office for a long weekend.

Additionally, the VRT method produces only small intermediate XML files instead of a potentially huge data management nightmare of shuffling around GB (or TB) of intermediate outputs! Plus those XML files serve as an excellent piece of metadata describing the exact processing steps, which you can refer to later or adapt to different datasets.
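To illustrate the VRT-as-metadata idea, here is a small sketch that lists the inputs a VRT refers to. The sample XML is a hand-written, heavily trimmed stand-in for what gdalbuildvrt writes (real files carry more elements, such as georeferencing and source/destination rectangles), and the tile names are hypothetical. It looks for `SourceFilename` elements (mosaic VRTs) and `SourceDataset` elements (warped VRTs).

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed example of a mosaic VRT; tile names are hypothetical.
SAMPLE_VRT = """<VRTDataset rasterXSize="12000" rasterYSize="6000">
  <VRTRasterBand dataType="Int16" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="1">srtm_12_04.tif</SourceFilename>
      <SourceBand>1</SourceBand>
    </SimpleSource>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">srtm_12_05.tif</SourceFilename>
      <SourceBand>1</SourceBand>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>"""


def source_files(vrt_xml):
    """Return the distinct input paths named in a VRT, in document order."""
    root = ET.fromstring(vrt_xml)
    names = []
    for elem in root.iter():
        if elem.tag in ("SourceFilename", "SourceDataset"):
            if elem.text not in names:
                names.append(elem.text)
    return names


print(source_files(SAMPLE_VRT))
```

Pointing the same function at the text of srtm_merged_3310.vrt should report srtm_merged.vrt as its input, which is how the chain of instructions can be traced back to the original tiles.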

So next time you have a multi-step raster workflow, use the GDAL VRTs to your full advantage - you’ll save yourself time and disk space by being lazy.

2 responses so far

Dec 16 2009

Peaksware licensing revisited …

Published by perrygeo under Uncategorized

I had previously bitched and moaned about the licensing restrictions on the TrainingPeaks WKO+ software. Truth be told, the reason I was so put off by their crappy licensing scheme was that my cycling training relied so heavily on their software. It was not perfect but it was the best tool available. I’ve since discovered Golden Cheetah which is a viable open-source alternative but it still lags behind WKO+ in many critical features.

Now, fresh in time for the 2010 training season, Peaksware has released a new version 3.0 of WKO+ which, amongst many UI and functionality improvements, has made considerable progress on the licensing front.

We know, our licensing has been a challenge to deal with for our customers in the past, but we’ve always tried to be as helpful as possible getting you back up and running after a hard drive crash or new computer. To remedy this, we’re pleased to announce an all new flexible licensing system. First, with every purchase we now allow you to install WKO+ 3.0 on up to two computers; second, we’ve built an online activation/deactivation system so you are free to move your active licenses from machine to machine. Are you leaving on a 2 week trip? Just de-activate your home computer, activate your laptop, and you’re on your way. When you get home, de-activate your laptop, re-activate your desktop and you’re all set.

It ain’t open source (there is still a place in this world for proprietary software if they can push the boundaries and innovate) but the sensitivity to the licensing issue just may have restored my faith in their company.

One response so far

Aug 10 2009

Nice examples of ESRI’s geoprocessing python module (9.3)

Published by perrygeo under ESRI, Python

Just thought I’d point out a great presentation about the “new” 9.3 geoprocessing (gp) python module from ESRI.

Ghislain Prince and Elizabeth Flanary do a great job of introduction by examples. The latest gp module is much more pythonic and these examples show how to leverage that to its full advantage. If you tried to do this with older gp versions, the code would make most pythonistas cringe. This latest version returns objects and lists, uses real booleans, and uses true objects instead of funky string parameters. Basic OO stuff for most python libraries but a big improvement for gp.

Here’s the powerpoint presentation. Thanks to Jamey Rosen for the tip!

No responses yet

Jun 23 2009

Peaksware licensing hell

Published by perrygeo under Uncategorized

I’ve been using Peaksware’s WKO+, a cycling and running training tool to manage data from heart rate monitors, GPS units, power meters, etc. It’s a powerful tool with a clunky UI but I’ve gotten used to it.

You pay $100 for a “personal” license. Not a big deal to me since they basically have a monopoly on this software niche. I first installed it on my work computer to test the data from my daily bike commute. Cool it works. Then I went to install it at home since that’s where I’ll be using it. Works ok. I proceed to gather all my fitness data into their proprietary binary format.

Fast forward a few months. I’m reformatting the hard drive on the laptop and want to move all my data and software to my desktop. But installing WKO+ is giving me a headache (“Error: Too many installations”). The registration process takes a hardware fingerprint and you must activate it via the web to get a registration code. However, hidden within their EULA is a term which disallows the transfer of the license to any computer other than the one on which it was originally installed. The second installation was just an allowance they make for “hard drive crashes” and such.

Since neither of those machines would be available to me, certainly there would be a way to transfer it? After several progressively more desperate communications with Matt Allen at peaksware support, he informed me that there was no way they would transfer the license (the non-transfer clause IS in the EULA after all). I would need to purchase another license simply because I switched computers!

Here is my response:

Basically what you are telling me is that I can no longer use WKO+
without paying again. I get to use the software for a few months and
you revoke my right to use it because I buy a new computer! I am a
paying customer, trying to be totally legit here, willing to support
your business in exchange for a license to use your software and you
insist on screwing me over. Brilliant.

This is one of the most unprofessional and idiotic stances I have ever
seen from a software company. Your intention appears to be to screw
over your paying customers and milk as much cash from them as possible
- you might want to rethink that business model unless you want to
lose customers! I will never endorse, recommend or purchase another
product or service from peaksware nor will any of my family, friends,
teammates or readers once the word gets out about your disrespectful
policies.

There are numerous typical situations where a new copy of the software
would need to be installed including:

* Hard drive failure
* Operating system upgrades
* New computer purchases
* Extended traveling and touring (installing onto a laptop or netbook)

Now I fully understand why your policy is one license per computer. It
makes perfect sense. I have seen plenty of other software with a
similar licensing model. But they also allow you to uninstall the software
and re-register it on another computer due to these circumstances.
There is simply no technological reason why you could not implement a
licensing structure that allowed the user more freedom to transfer
licenses while still preventing piracy. As it stands, your licensing
model treats paying customers like criminals if they happen to run
across any one of the above situations.

So, to sum it up - your foolish license policy has lost you one
customer and many future ones.

Good riddance.

So if you want to support a company that treats its paying customers like criminals because they get a new computer, go right ahead and support Peaksware. But anyone who expects to use software that they pay for even if they happen to buy a new computer should steer clear.

The real kicker is that all that work is locked away in their proprietary file format simply because of their draconian licensing. This is the real take home lesson to all software users (not just fitness geeks): If you lock your data away in a proprietary format and are beholden to a single company in order to access it, they can and will screw you. Always insist on open data formats, even if using proprietary software. Oh and always read the EULA carefully before clicking OK!

5 responses so far

Jun 21 2009

Reading XFS partition from Windows

Published by perrygeo under Linux

When I was setting up my linux system a few years ago, I did some research into filesystems and determined that the XFS file system, being particularly proficient in dealing with large files, would be ideal for my home directory. And it was. But the one factor I didn’t consider was portability. Turns out that there is basically no support for XFS in windows.

So how do you access your files from Windows if they are on an XFS partition? I had just shy of 1 TB of data to transfer so using my other linux box and transferring across the network would have taken forever. The solution I came up with is a bit convoluted but it has some real advantages:

1) Install Sun’s VirtualBox.
2) Download an iso for your favorite linux distribution (mine being Ubuntu 9.04)
3) Create a virtual machine from the linux iso
4) Install the VBoxGuestAdditions in the linux virtual machine.
5) Create a shared folder on the windows host and register it with the virtual machine. This will allow you to transfer files from the guest (linux) to the host (windows). You may have to manually mount the shared folder in the linux guest:

mount -t vboxsf share_name /mnt/share_name

6) Using the windows host cmd line, create a vmdk from the physical drive that your XFS partition resides on. In this case, PhysicalDrive1 corresponds to the second SATA connector. This will allow your guest OS to talk directly with the drive:

cd C:\Program Files\Sun\xVM VirtualBox
VBoxManage.exe internalcommands createrawvmdk
  -filename "C:\Documents and Settings\perry\.VirtualBox\HardDisks\Physical1.vmdk"
  -rawdisk \\.\PhysicalDrive1 -register

Once completed, you should see:

RAW host disk access VMDK file
C:\Documents and Settings\perry\.VirtualBox\HardDisks\Physical1.vmdk created successfully.

7) Make sure to add the physical drive to your list of hard drives in the linux guest options. Restart the linux guest virtual machine and your XFS partition should already be mounted. Now you can begin transferring files between your XFS partition and the shared folder on the windows host.
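With close to 1 TB to move, it can be worth checking each file as it lands in the shared folder. Below is a minimal sketch of one way to do that from inside the linux guest; it is not what was used for the original transfer. The function names are made up, and it copies file contents only (no permissions or timestamps), on the assumption that the shared folder may not preserve them anyway.

```python
import hashlib
import os
import shutil


def sha1_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_tree_verified(src_root, dst_root):
    """Copy every file under src_root to dst_root, verifying each checksum.

    Returns the list of destination paths; raises IOError on a mismatch.
    """
    copied = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            shutil.copyfile(src, dst)  # contents only, no metadata
            if sha1_of(src) != sha1_of(dst):
                raise IOError("checksum mismatch for %s" % src)
            copied.append(dst)
    return copied
```

Usage would be something like `copy_tree_verified("/mnt/xfs_home", "/mnt/share_name")`, where /mnt/xfs_home is a hypothetical mount point for the XFS partition and /mnt/share_name is the shared folder from step 5.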

Whew. Lots of hassle for a simple file transfer, right? But the side benefit is that now you have a fully functional linux virtual machine with a shared folder set up to the windows host. Very useful - even when you must run windows, it helps to have a linux VM standing by!

11 responses so far

Jun 16 2009

IronPython (2.6) and ArcGIS - ready for prime time!!

Published by perrygeo under ESRI, Python

Not sure why this didn’t occur to me before I wrote that last post but I tried the “pythonic” version of the code under the IronPython 2.6 Beta 1 release and it works!

lyr = Carto.LayerFileClass()
lyr.Open('C:\\test.lyr')
print lyr.Filename

Works perfectly now. So IronPython 2.6 promises to be a viable option for extending ArcGIS. My enthusiasm has been renewed.

4 responses so far

Jun 16 2009

IronPython and ArcGIS - not quite ready for prime time

Published by perrygeo under ESRI, Python

Occasionally I find myself in the C#/.NET world in order to write code using ESRI ArcObjects. Today I was toying with the idea of automating the creation of ESRI Layer files (a file which defines the cartographic styling of a dataset). Of course they are in an undocumented binary file format, inaccessible to anything but ESRI software. So I pop open Visual Studio ….

I feel a nagging unease every time I type a set of curly braces. And VB just makes me insane. I prefer, of course, to use python. Luckily there is IronPython which runs on .NET - which means I could theoretically use it to interact with ArcGIS.

I only found a single working example of using ArcObjects through IronPython. But it looked promising enough to close Visual Studio and give it a go.

The first nagging problem is an IronPython-specific one. Relatively minor annoyance but you have to add the reference to a .NET assembly (library) before you can load it.

import clr
clr.AddReference('ESRI.ArcGIS.System')
clr.AddReference('ESRI.ArcGIS.Carto')
from ESRI.ArcGIS import esriSystem
from ESRI.ArcGIS import Carto

Now there is the issue of grabbing an ESRI license. A little verbose IMO but it could easily be encapsulated in a helper function to clean things up.

aoc = esriSystem.AoInitializeClass()
res = esriSystem.IAoInitialize.IsProductCodeAvailable(aoc,
         esriSystem.esriLicenseProductCode.esriLicenseProductCodeArcView)
if res == esriSystem.esriLicenseStatus.esriLicenseAvailable:
    esriSystem.IAoInitialize.Initialize(aoc,
      esriSystem.esriLicenseProductCode.esriLicenseProductCodeArcView)
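The helper function mentioned above might look something like this. It is only a sketch: the function name and the `product_name` parameter are invented here, it uses nothing beyond the calls already shown, and it takes the `esriSystem` module as an argument so the logic can be exercised without ArcGIS installed.

```python
def init_arcgis_license(esri_system,
                        product_name="esriLicenseProductCodeArcView"):
    """Check out an ArcGIS license and return the initializer object.

    `esri_system` is the imported ESRI.ArcGIS.esriSystem module.
    Raises RuntimeError if the requested product is not available.
    """
    product = getattr(esri_system.esriLicenseProductCode, product_name)
    aoc = esri_system.AoInitializeClass()
    iface = esri_system.IAoInitialize  # calls go through the interface
    status = iface.IsProductCodeAvailable(aoc, product)
    if status != esri_system.esriLicenseStatus.esriLicenseAvailable:
        raise RuntimeError("ArcGIS license not available: %s" % product_name)
    iface.Initialize(aoc, product)
    return aoc
```

After the `clr.AddReference` calls and imports above, the whole license dance would then shrink to `aoc = init_arcgis_license(esriSystem)`.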

Now that we’ve satisfied the demands of our proprietary license overlords, we can proceed with the real work .. in this case I just want to open an existing Layer file and see if the resulting object knows its own file path. Really simple, right?

lyr = Carto.LayerFileClass()
if "Open" in dir(lyr): print "The Layer object has an Open method but...."
lyr.Open('C:\\test.lyr')
print lyr.Filename

The Layer object has an Open method but....
Traceback (most recent call last):
 File "", line 1, in 
AttributeError: 'GenericComObject' object has no attribute 'Open'

Hrm. Looks like we’ve run across bug 1506 which doesn’t allow access to the properties and methods of a given instance - instead you have to work through the functions provided by the interface. Grr…

Carto.ILayerFile.Open(lyr, 'C:\\test.lyr')
print Carto.ILayerFile.Filename.GetValue(lyr)

That is unwieldy, ugly and unpythonic. What’s the point of object oriented programming if you can’t access the methods and properties of an object directly? Since all ArcObjects applications are based on extending COM interfaces, this would be a major pain in any non-trivial application. Basically, until these .NET-accessible COM objects can be treated in a pythonic way, I don’t see any compelling reason to pursue IronPython and ArcGIS integration. Looks like it’s back to C# for the moment … (/me takes a deep sigh and opens Visual Studio) … unless of course anyone has some brilliant solution to share!!
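One possible stopgap is a thin wrapper that routes attribute access through the interface. This is an untested sketch, not something verified against ArcObjects: the class name is made up, and it assumes that every interface property exposes `GetValue` the way `Filename` did above and that every interface method accepts the instance as its first argument the way `Open` did.

```python
class InterfaceProxy(object):
    """Wrap a COM object so its interface members read like attributes."""

    def __init__(self, obj, interface):
        self._obj = obj
        self._interface = interface

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        member = getattr(self._interface, name)
        if hasattr(member, "GetValue"):
            # Property: fetch the value for the wrapped instance.
            return member.GetValue(self._obj)
        # Method: bind the wrapped instance as the first argument.
        return lambda *args: member(self._obj, *args)
```

With that in place the earlier snippet could hypothetically be written as `lyr = InterfaceProxy(Carto.LayerFileClass(), Carto.ILayerFile)` followed by `lyr.Open('C:\\test.lyr')` and `print lyr.Filename`. It only papers over one interface at a time, so it is a workaround rather than a fix.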

3 responses so far

Jun 12 2009

The GPS told me to do it

Published by perrygeo under Uncategorized

Another disastrous consequence of inaccurate spatial information… Not only can you accidentally tag your neighbor as a criminal, now it appears that sloppy spatial data has led to the wrong house getting demolished.

I’ve asked it before but it’s worth repeating … with all the recent advances in spatial data publishing, where are the advances in metadata and data quality assurance? How do you know where the data comes from, what’s been done to it and by whom? What is the intended use of the data? For the vast majority of the data being shoved out onto the web, these bits of metadata are sorely lacking.

Of course this case is more a matter of one person’s sheer stupidity; I’m not sure any caveats in the metadata would have stopped the wrecking ball!

One response so far

Mar 25 2009

The magic bullet

Published by perrygeo under Uncategorized

Dealing with corrupted shapefiles can be a painful experience: programs crash for seemingly no reason, attribute tables get screwy, features get lost, query results don’t look right and ArcGIS processing tools fail with mysterious error codes:

Dissolve error

Never fear, OGR is here. The magic bullet for fixing corrupted shapefiles is, 90% of the time, accomplished by using ogr2ogr to convert the shapefile to another shapefile.


ogr2ogr -f "ESRI Shapefile" shiny_new_clean_dataset.shp corrupted_dataset.shp corrupted_dataset

OGR’s internal data model cleans it up and the output is a fresh shiny new shapefile that works without hassle.
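If there is a whole folder of suspect shapefiles, the same command can be generated per file. A small sketch (the function name is invented; the arguments are exactly those of the command above, with the layer name taken from the input file name):

```python
import os


def shapefile_roundtrip_command(src_shp, dst_shp):
    """Build the ogr2ogr call that rewrites src_shp as a fresh shapefile."""
    # A shapefile's layer name is its file name without the extension.
    layer = os.path.splitext(os.path.basename(src_shp))[0]
    return ["ogr2ogr", "-f", "ESRI Shapefile", dst_shp, src_shp, layer]


print(" ".join(shapefile_roundtrip_command(
    "corrupted_dataset.shp", "shiny_new_clean_dataset.shp")))
```

The returned list can be passed straight to `subprocess.check_call`, assuming ogr2ogr is on the PATH.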

6 responses so far

Feb 19 2009

TV cycling coverage is dead

Published by perrygeo under Uncategorized

Real-time spatial application developers take note…

I’ve been following the Tour of California this week (looking forward to the Solvang Time Trial this Friday) and have been disappointed with the TV coverage on Versus. It’s not that the coverage is bad, it’s just that long-distance endurance sports don’t lend themselves to the traditional 2 announcers and 1 camera format. There are multiple groups of riders and so much spatial information to keep track of if one really wants to understand the dynamics of a cycling event.

Maybe I’ve just been spoiled by the Amgen Tour Tracker. It is a crowning example of a spatially-aware real-time web application.

It provides two cameras of live coverage, live commentary with interviews, chat, summary updates, GPS tracking of riders shown on both an elevation profile and a Yahoo-based aerial map, “gps+” location prediction, race standings, time checks, etc. Far more information than any TV coverage, without tipping into information overload.

3 responses so far
