May 18 2011

Optimizing KML for hierarchical polygon data

Published by perrygeo under KML, MarineMap

For all the benefits of KML, it is decidedly a step backwards for handling large vector datasets. Most KML clients, including the canonical Google Earth application, experience debilitating slow-down when viewing a couple dozen MB of vector data - datasets that I could easily open on a Pentium 4 in ArcView 3.2 10 years ago!

The unfortunate reality is that optimizing the performance of KML datasets is conflated with the structure of the data and is thus the responsibility of the data publisher. The wisdom of combining styling, performance-related structure, organizational structure, geometry and attributes into a single file format may be questionable, but KML has become the de facto geographic markup language due to its other benefits.

Anyways, back to performance enhancements on big vector datasets… The concept of “regionation” is used by several KML tools to improve performance. From the Google LatLong Blog:

You can think of Regionation as a hierarchical subdivision of points or tiles, which shows less detail from afar, and more detail as you zoom in to the globe. This dynamic loading creates clearer visualizations by minimizing clutter, while simultaneously speeding up the rendering process.

In most implementations, there is a generic strategy for determining this hierarchy based on attributes or geometry size (in the case of vectors) or on a tile system. Neither is ideal when you want to preserve the vector nature of the data, split it into small, easily-loadable files and determine its visibility based on the natural hierarchy that is built into the data structure.

Specifically, I am thinking about watersheds here - the US Hydrologic Units. Hydrologic units are watershed boundaries that are organized in a nested hierarchy; higher levels contain smaller watersheds, each contained within a single watershed of the “parent” level. The unique identifiers (hydrologic unit codes or HUCs) are rather ingenious as well: each level is represented by 2 digits, and the digits are concatenated to form a single identifier that can be used to determine its “parent”. For example:

Level 4 HUCs    e.g. 17090011
Level 5 HUCs    e.g. 1709001104
Level 6 HUCs    e.g. 170900110403
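
Because each level appends two digits, finding a parent is just a matter of dropping the last two. Here is a minimal Python sketch of that idea; the function names are mine, not from the script behind this post, and the code 1709001105 is made up purely to give the example a second sibling:

```python
from collections import defaultdict


def huc_parent(code):
    """Return the parent HUC code, or None for a top-level (2-digit) code."""
    if not code.isdigit() or len(code) % 2 != 0:
        raise ValueError("expected an even number of digits: %r" % code)
    return code[:-2] if len(code) > 2 else None


def group_by_parent(codes):
    """Map each parent code to the sorted list of its direct children."""
    children = defaultdict(list)
    for code in codes:
        parent = huc_parent(code)
        if parent is not None:
            children[parent].append(code)
    return {parent: sorted(kids) for parent, kids in children.items()}


# 1709001105 is a made-up sibling, for illustration only
codes = ["17090011", "1709001104", "1709001105", "170900110403"]
tree = group_by_parent(codes)
print(huc_parent("170900110403"))  # 1709001104
print(tree["17090011"])            # ['1709001104', '1709001105']
```

Grouping by parent like this gives you, for every watershed, exactly the list of features that would go into its "children" file.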

Instead of fabricating a hierarchy of features, why not just use this natural hierarchy to structure the KML documents?

[Figure: hucs-1.png]

Or as KML markup:

    <Placemark>
        <name>17090009</name>
        <styleUrl>#HUC_8-default</styleUrl>
        <Polygon><outerBoundaryIs><LinearRing><coordinates>...
        </coordinates></LinearRing></outerBoundaryIs></Polygon>
    </Placemark>

    <NetworkLink>
      <name>17090009_children</name>
      <Region>
        <LatLonAltBox>
          <west>-123.001645628</west>
          <south>44.8300083641</south>
          <east>-122.203351254</east>
          <north>45.298653051</north>
        </LatLonAltBox>
        <Lod>
          <minLodPixels>256</minLodPixels>
          <maxLodPixels>1600</maxLodPixels>
        </Lod>
      </Region>
      <Link>
        <href>./17090009_children.kml</href>
        <viewRefreshMode>onRegion</viewRefreshMode>
      </Link>
    </NetworkLink>

The advantages of this design are that you don’t have to break the geometries up to fit into a square tiling pattern, data loads and renders in a logical pattern, and there will always be 100 or fewer (usually far fewer) placemarks per file due to the design of the HUC data structure. File sizes stay low, network links load quickly, and requests and rendering occur only when features come into view. For this example dataset totaling 300 MB of shapefiles, there are several hundred resulting KMZ files without any repeated features, each smaller than ~150 KB. In essence, it achieves optimal performance by its very design.
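
To give a feel for how the network links could be generated, here is a rough Python sketch using only the standard library. It is not the actual script; the function name and its defaults are my own choices for illustration, with the LOD values and file naming copied from the markup above. Element names use KML's mixed-case spelling, since KML is case-sensitive.

```python
import xml.etree.ElementTree as ET


def children_network_link(huc, west, south, east, north,
                          min_lod=256, max_lod=1600):
    """Build a NetworkLink element that loads <huc>_children.kml once the
    parent's bounding box covers enough pixels on screen."""
    link = ET.Element("NetworkLink")
    ET.SubElement(link, "name").text = "%s_children" % huc

    # The Region ties loading to the parent watershed's bounding box
    region = ET.SubElement(link, "Region")
    box = ET.SubElement(region, "LatLonAltBox")
    for tag, value in (("west", west), ("south", south),
                       ("east", east), ("north", north)):
        ET.SubElement(box, tag).text = repr(value)
    lod = ET.SubElement(region, "Lod")
    ET.SubElement(lod, "minLodPixels").text = str(min_lod)
    ET.SubElement(lod, "maxLodPixels").text = str(max_lod)

    # onRegion: fetch the child file only when the Region becomes active
    target = ET.SubElement(link, "Link")
    ET.SubElement(target, "href").text = "./%s_children.kml" % huc
    ET.SubElement(target, "viewRefreshMode").text = "onRegion"
    return link


link = children_network_link("17090009", -123.001645628, 44.8300083641,
                             -122.203351254, 45.298653051)
print(ET.tostring(link, encoding="unicode"))
```

In a real run the bounding box would come from the parent polygon's extent, and one such link would be written next to each parent placemark.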

Here’s a video of it in action:

This was all done with a fairly “hackish” python script. I’ll continue to refine it as needed for this particular application but, at this time, it’s not intended to be a reusable tool - if you want to use it, be prepared to dig through the source code and get your hands dirty. The same concept could theoretically be applied to any spatially-hierarchical vector data (think geographic boundaries … country > state > county > city).


Dec 20 2010

Um - nice “review” of QGIS

Published by perrygeo under QGIS

RJ Zimmer at American Surveyor magazine did what he described as a comparison of several free GIS applications, entitled “Something for Nothing”.

First of all, the title bugs me. The idea that the sole benefit of free software is simply cost savings is pretty naive. It disregards openness, community support, ability to transfer knowledge, freedom from restrictive licensing, etc. But I can live with the title.

I can also live with his decision to include only a single open-source GIS application alongside 3 closed-but-gratis applications. He doesn’t claim that it’s a comprehensive review despite the fact that the ecosystem of Free GIS is far more diverse.

But I can’t accept his treatment of Quantum GIS:

I did not fully test Quantum GIS. I did download and install it but the software was too complicated to use “right out of the box”, and I did not have time to learn to use it.

The feature comparison chart includes mainly “?” in the QGIS column.

OK we get it - your deadline hit before you could bother to learn one of the applications you were supposedly reviewing. One even wonders why he included QGIS in the review at all. This is nothing short of irresponsible reporting. When people post stuff like this, it really rubs me the wrong way - now a whole audience of users has an inaccurate view of QGIS and the entire free GIS ecosystem thanks to his slacker journalism.


Jun 09 2010

kmltree

Published by perrygeo under MarineMap

When the MarineMap team started delving into the Google Earth plugin, it was apparent that it supported the display and rendering of KML files almost as well as the Google Earth desktop application. The missing piece of functionality was the nice tree-style legend that is provided with the desktop app. The plugin lets you add KML for display but gives you no HTML interface to work with it. For simple apps, you can just roll your own html/js form. But that quickly becomes unmanageable if you’re adding KML dynamically and need to create a tree-style legend for any arbitrary KML document.

Enter kmltree.

kmltree is a javascript tree widget that can be used in conjunction with the Google Earth API. It replicates the functionality of the Google Earth desktop client, and is fast, extensible, and stable for use in advanced web applications. It’s built utilizing the earth-api-utility-library and jQuery.

[Screenshot: kmltree]

Any arbitrary KML can be parsed and represented in a tree-style legend right in the web browser. Try it out.

Kmltree is the brainchild of Chad Burt, who developed it as part of the MarineMap codebase but had the foresight to realize that it would be useful to a much wider audience and abstracted it into its own JavaScript library. If you’re building a web mapping application with the Google Earth API, give it a shot!


May 27 2010

MarineMap wins award for Environmental Conflict Resolution

Published by perrygeo under MarineMap

For the last year or so, I’ve had the pleasure of working with the MarineMap Consortium. We just learned yesterday that the U.S. Institute for Environmental Conflict Resolution awarded MarineMap the “Innovation in Technology and Environmental Conflict Resolution” award.

I joined the team after the launch of the South Coast of California site which was already widely recognized as a successful decision-support tool for marine spatial planning. We’ve since been working on version 2 of the MarineMap tool which is deployed currently for the North Coast of California in support of their Marine Life Protection Act (MLPA) process.

It’s been a tremendous challenge to bring a new version of the software to life and have it meet and exceed the standards set by its predecessor. It has also been tremendously rewarding and having our work recognized at this level is a great honor. It’s nice to know that the tools we’ve developed have been so helpful and instrumental in the marine planning process along the coast of California. Looking forward, I see MarineMap growing beyond a tool for a specific purpose (supporting the MLPA Initiative) to a robust framework for developing web-based spatial planning tools for all sorts of environmental applications, both marine and terrestrial. And this award confirms that we are already heading in the right direction. Very exciting news!


May 06 2010

Exploring Geometry

Published by perrygeo under Java

I don’t know how I let this gem slip past my radar for so long. It was only via a post by Dr. JTS himself (aka Martin Davis) that I saw a screenshot of JTS TestBuilder and decided to check it out.

I was actually just talking with someone about a tool that could provide simple visualization of WKT geometries; JTS TestBuilder does that and much more.

You can input geometries (graphically or by well-known text) and compare two geometries based on spatial predicates:

[Screenshot: spatial predicates]

Do overlay analyses with the two geometries. Note that you can see the result as WKT below.

[Screenshot: overlay]

And there are a host of other spatial operations to generate geometries using buffers…
[Screenshot: buffers]

… convex hulls …
[Screenshot: convex hull]

This app provides a very nice and user-friendly way to quickly and simply explore and test geometric operations. To try it out, download JTS and unzip the contents somewhere. If you’re on Windows, a .bat file is provided. If you’re running anything else, you have to cook up a shell script that will set up the environment and run JTS TestBuilder:

#!/bin/sh
# Build a classpath from every jar shipped with JTS, then launch TestBuilder
JTS_HOME=/usr/share/java/jts-1.11
CP=$CLASSPATH
for i in "$JTS_HOME"/lib/*.jar; do CP=$i:$CP; done
java -Xmx256m -cp "$CP" com.vividsolutions.jtstest.testbuilder.JTSTestBuilder "$@"


Mar 31 2010

Distributed

Published by perrygeo under Uncategorized

I’ve been playing around with some distributed version control systems (DVCS) to replace svn.

First, the why: I’ll leave the details up to Joel in his excellent HgInit tutorial. It’s Mercurial-specific but the general concepts apply to any DVCS. The takeaway message for any project with > 1 developer is this:

Mercurial [ed: DVCS] separates the act of committing new code from the act of inflicting it on everybody else.

Next, the implementation: I’m using git to work on another project (Golden Cheetah) and it’s been a tough learning curve. Git is no doubt the most powerful DVCS out there. You can do magical things with it like combine commits and mess with history trees. And you can also screw things up pretty badly if you misinterpret the esoteric docs for some non-intuitive piece of the workflow.

I just tried mercurial this morning - hg seems to fit my mind well. There is less power but the workflow is very clear and intuitive. And there are docs written for people who don’t want to do an in-depth study of their version control software. It stays out of the way.

Long story short, I’m going to use mercurial/hg for my new projects. Ah, what the heck, my old/ongoing projects as well. My googlecode repository has been converted over to Mercurial. Svn will stick around but won’t be updated.


Feb 18 2010

Lazy raster processing with GDAL VRTs

Published by perrygeo under Uncategorized

No, not lazy as in REST :-) … Lazy as in “Lazy evaluation”:

In computer programming, lazy evaluation is the technique of delaying a computation until the result is required.

Take an example raster processing workflow to go from a bunch of tiled, latlong, GeoTiff digital elevation models to a single shaded relief GeoTiff in projected space:

  1. Merge the tiles together
  2. Reproject the merged DEM (using bilinear or cubic interpolation)
  3. Generate the hillshade from the reprojected DEM

Simple enough to do with GDAL tools on the command line. Here’s the typical, process-as-you-go implementation:

  1. gdal_merge.py -of GTiff -o srtm_merged.tif srtm_12_*.tif
  2. gdalwarp -t_srs epsg:3310 -r bilinear -of GTiff srtm_merged.tif srtm_merged_3310.tif
  3. gdaldem hillshade srtm_merged_3310.tif srtm_merged_3310_shade.tif -of GTiff

Alternately, we can simulate lazy evaluation by using GDAL Virtual Rasters (VRT) to perform the intermediate steps, only outputting the GeoTiff as the final step.

  1. gdalbuildvrt srtm_merged.vrt srtm_12_0*.tif
  2. gdalwarp -t_srs epsg:3310 -r bilinear -of VRT srtm_merged.vrt srtm_merged_3310.vrt
  3. gdaldem hillshade srtm_merged_3310.vrt srtm_merged_3310_shade2.tif -of GTiff
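
If you would rather script the three VRT steps than type them, something like the following Python sketch would do. It only wraps the same command lines shown above in argument lists; the function names and the two tile filenames are placeholders I made up, and `run()` is defined but not called here so the example works without GDAL installed.

```python
import subprocess


def lazy_hillshade_commands(tiles, out_tif, t_srs="epsg:3310",
                            merged_vrt="srtm_merged.vrt",
                            warped_vrt="srtm_merged_3310.vrt"):
    """Return the three commands of the VRT workflow as argument lists."""
    return [
        # 1. Virtual mosaic of the tiles (no pixels are copied)
        ["gdalbuildvrt", merged_vrt] + list(tiles),
        # 2. Virtual reprojection, written as another small VRT
        ["gdalwarp", "-t_srs", t_srs, "-r", "bilinear", "-of", "VRT",
         merged_vrt, warped_vrt],
        # 3. The only step that actually writes raster data
        ["gdaldem", "hillshade", warped_vrt, out_tif, "-of", "GTiff"],
    ]


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Placeholder tile names; in practice these would come from a glob
cmds = lazy_hillshade_commands(["srtm_12_04.tif", "srtm_12_05.tif"],
                               "srtm_merged_3310_shade2.tif")
for cmd in cmds:
    print(" ".join(cmd))
```

Calling `run(cmds)` would execute the whole chain, with only the last command doing any heavy lifting.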

So what’s the advantage to doing it the VRT way? They both produce exactly the same output raster. Let’s compare:

                          Process-As-You-Go   “Lazy” VRTs
Merge (#1) time           3.1 sec             0.05 sec
Warp (#2) time            7.3 sec             0.10 sec
Hillshade (#3) time       10.5 sec            19.75 sec
Total processing time     20.9 sec            19.9 sec
Intermediate files        2 tifs              2 vrts
Intermediate file size    261 MB              0.005 MB

The Lazy VRT method delays all the computationally-intensive processing until it is actually required. The intermediate files, instead of containing the raw raster output of the actual computation, are XML files which contain the instructions to get the desired output. This allows GDAL to do all the processing in one step (the final step #3). The total processing time is not significantly different between the two methods but, in terms of the productivity of the GIS analyst, the VRT method is superior. Imagine working with datasets 1000x this size with many more steps - having to type the command, wait 2 hours, type the next, etc. would be a waste of human resources versus assembling the instructions into VRTs then kicking off the final processing step when you leave the office for a long weekend.

Additionally, the VRT method produces only small intermediate XML files instead of a potentially huge data management nightmare of shuffling around GB (or TB) of intermediate outputs! Plus, those XML files serve as an excellent piece of metadata describing the exact processing steps, which you can refer to later or adapt to different datasets.

So next time you have a multi-step raster workflow, use the GDAL VRTs to your full advantage - you’ll save yourself time and disk space by being lazy.


Dec 16 2009

Peaksware licensing revisited …

Published by perrygeo under Uncategorized

I had previously bitched and moaned about the licensing restrictions on the TrainingPeaks WKO+ software. Truth be told, the reason I was so put off by their crappy licensing scheme was that my cycling training relied so heavily on their software. It was not perfect but it was the best tool available. I’ve since discovered Golden Cheetah which is a viable open-source alternative but it still lags behind WKO+ in many critical features.

Now, fresh in time for the 2010 training season, Peaksware has released a new version 3.0 of WKO+ which, amongst many UI and functionality improvements, has made considerable progress on the licensing front.

We know, our licensing has been a challenge to deal with for our customers in the past, but we’ve always tried to be as helpful as possible getting you back up and running after a hard drive crash or new computer. To remedy this, we’re pleased to announce an all new flexible licensing system. First, with every purchase we now allow you to install WKO+ 3.0 on up to two computers; second, we’ve built an online activation/deactivation system so you are free to move your active licenses from machine to machine. Are you leaving on a 2 week trip? Just de-activate your home computer, activate your laptop, and you’re on your way. When you get home, de-activate your laptop, re-activate your desktop and you’re all set.

It ain’t open source (there is still a place in this world for proprietary software if they can push the boundaries and innovate) but the sensitivity to the licensing issue just may have restored my faith in their company.


Aug 10 2009

Nice examples of ESRI’s geoprocessing python module (9.3)

Published by perrygeo under ESRI, Python

Just thought I’d point out a great presentation about the “new” 9.3 geoprocessing (gp) python module from ESRI.

Ghislain Prince and Elizabeth Flanary do a great job of introduction by examples. The latest gp module is much more pythonic and these examples show how to leverage that to its full advantage. If you tried to do this with older gp versions, the code would make most pythonistas cringe. This latest version returns objects and lists, uses real booleans, and uses true objects instead of funky string parameters. Basic OO stuff for most python libraries but a big improvement for gp.

Here’s the powerpoint presentation. Thanks to Jamey Rosen for the tip!


Jun 23 2009

Peaksware licensing hell

Published by perrygeo under Uncategorized

I’ve been using Peaksware’s WKO+, a cycling and running training tool to manage data from heart rate monitors, GPS units, power meters, etc. It’s a powerful tool with a clunky UI but I’ve gotten used to it.

You pay $100 for a “personal” license. Not a big deal to me since they basically have a monopoly on this software niche. I first installed it on my work computer to test the data from my daily bike commute. Cool it works. Then I went to install it at home since that’s where I’ll be using it. Works ok. I proceed to gather all my fitness data into their proprietary binary format.

Fast forward a few months. I’m reformatting the hard drive on the laptop and want to move all my data and software to my desktop. But installing WKO+ is giving me a headache (“Error: Too many installations”). The registration process takes a hardware fingerprint and you must activate it via the web to get a registration code. However, hidden within their EULA is a term which disallows the transfer of the license to any computer other than the one on which it was originally installed. The second installation was just an allowance they make for “hard drive crashes” and such.

Since neither of those machines would be available to me, certainly there would be a way to transfer it? After several progressively more desperate communications with Matt Allen at peaksware support, he informed me that there was no way they would transfer the license (the non-transfer clause IS in the EULA after all). I would need to purchase another license simply because I switched computers!

Here is my response:

Basically what you are telling me is that I can no longer use WKO+
without paying again. I get to use the software for a few months and
you revoke my right to use it because I buy a new computer! I am a
paying customer, trying to be totally legit here, willing to support
your business in exchange for a license to use your software and you
insist on screwing me over. Brilliant.

This is one of the most unprofessional and idiotic stances I have ever
seen from a software company. Your intention appears to be to screw
over your paying customers and milk as much cash from them as possible
- you might want to rethink that business model unless you want to
lose customers! I will never endorse, recommend or purchase another
product or service from peaksware nor will any of my family, friends,
teammates or readers once the word gets out about your disrespectful
policies.

There are numerous typical situations where a new copy of the software
would need to be installed including:

* Hard drive failure
* Operating system upgrades
* New computer purchases
* Extended traveling and touring (installing onto a laptop or netbook)

Now I fully understand why your policy is one license per computer. It
makes perfect sense. I have seen plenty of other software with a
similar licensing model. But they also allow you to uninstall the software
and re-register it on another computer due to these circumstances.
There is simply no technological reason why you could not implement a
licensing structure that allowed the user more freedom to transfer
licenses while still preventing piracy. As it stands, your licensing
model treats paying customers like criminals if they happen to run
across any one of the above situations.

So, to sum it up - your foolish license policy has lost you one
customer and many future ones.

Good riddance.

So if you want to support a company that treats its paying customers like criminals because they get a new computer, go right ahead and support Peaksware. But anyone who expects to use software that they pay for even if they happen to buy a new computer should steer clear.

The real kicker is that all that work is locked away in their proprietary file format simply because of their draconian licensing. This is the real take home lesson to all software users (not just fitness geeks): If you lock your data away in a proprietary format and are beholden to a single company in order to access it, they can and will screw you. Always insist on open data formats, even if using proprietary software. Oh and always read the EULA carefully before clicking OK!
