Archive for September, 2007

Sep 25 2007

Autodesk open sources coordinate system software

Published by perrygeo under Software

Not very often do I see open source mentioned on the front page of my Google Finance page (let alone Geospatial Open Source). But here it is.. the announcement was made at FOSS4G2007 that autodesk will be open sourcing part of it’s coordinate system and map projection technology.

So what motivation does Autodesk (or any other company) have to open source it’s technology? An important line from Lisa Campbell, vice president, Autodesk Geospatial:

“Our intent to contribute again to the open source community is a reflection of our customers’ desire for faster innovation, more frequent product releases, and lower total cost of ownership.”

3 responses so far

Sep 18 2007

Parallel python and GIS

Published by perrygeo under Python

Let’s face it - processing speeds aren’t going to be increasing according to Moore’s Law anymore; Instead of faster CPUs, we’ll be getting more of them. The future of programming, it seems to me, lies in the ability to leverage multiple processors. In other words, we have to write parallel code. Until I read Seans’ post, I was unware that there was a viable python solution. I had been growing quite dissillusioned by python’s dreaded Global Interpreter Lock which confines python to a single processing core. I’ve even started learning Erlang to leverage SMP processing (until I realized that Erlang and it’s standard libraries are virtually useless for anything that needs to handle geospatial data).

So I gave Parallel Python (pp) a shot. Since Sean also offered up a bounty for the first GIS application that used pp, I thought it might be a good time to try ;-)

A good candidate for parallel processing is any application that has to crunch away on lists/arrays of data and whose individual members be handled independently (see pmap in Erlang). I have been working on an application to smooth linework using bezier curves. It’s not quite polished yet but the image below shows the before and after

… but bezier curves aren’t quite the subject of this post. Let’s just say the algorithm takes some time to compute (if you’re using a high density of verticies) and can be handled one LineString feature at a time. This makes it a prime candidate for parallelization.

Given a list of input LineStrings, I could process them the sequential way:

smooth_lines = [] for line in lines: smooth_lines.append( calcBezierFromLine( line, num_bezpts, beztype, t) )

Or use pp to start up a “job server” which doles the tasks out to as many “workers”. A busy worker utilizes a single processing core so a good rule of thumb would be to start up as many workers as you have CPU cores:

numworkers = 2 # dual-core machine job_server = pp.Server(numworkers, ppservers=ppservers) smooth_lines = [] jobs = [(line, job_server.submit(calcBezierFromLine, (line, num_bezpts, beztype, t), \ (computeBezier, getPointOnCubicBezier), (”numpy”,) )) for line in lines] for input, job in jobs: smooth_lines.append( job() )

Theoretically the parellized version should run twice as fast as the sequential version on my core2 duo machine. And reality was pretty darn close to that:

$ time python bezier_smooth_pp.py 2 Shapefile contains 1114 lines Starting pp with 2 workers Completed 1114 new lines with 8 additional verticies for each line segment along a cubic bezier curve real 0m10.908s … $ time python bezier_smooth_pp.py 1 Shapefile contains 1114 lines Starting pp with 1 workers Completed 1114 new lines with 8 additional verticies for each line segment along a cubic bezier curve real 0m20.007s …

Just think of the possibilities. In the forseeable future, the average computer might have 8+ cores to work with. This could mean that your app will move 8x faster if you parallize the code (assuming there are no IO or bandwidth bottlenecks). I’d love to test it out on a system with more than 2 processing cores but, unfortunately, I don’t have access to any beowulf clusters, Sun UltraSparc servers, or 8-core Xeon Mac Pros. This is what I really need to complete my research ;-) So if anyone want to donate to the cause, send me an email!

And to answer Sean’s bounty, I don’t consider this an actual application (yet) but I hope it can spur some interest and move things in that direction. But if you feel the need to send me some New Belgium swag (or one of the machines listed above), feel free ;-)

3 responses so far

Sep 05 2007

The world turned right-side up

Published by perrygeo under Uncategorized

I’ve been working alot in Surfer these days; an excellent geostats and surface mapping package. I was very happy to find that GDAL read it’s .grd binary format until I noticed the output from gdalinfo:

C:\Workspace\Temp\interpolation>gdalinfo svpce_5.grd
Driver: GS7BG/Golden Software 7 Binary Grid (.grd)
Files: svpce_5.grd
Size is 555, 339
Coordinate System is `’
Origin = (383371.000000000000000,3764907.000000000000000)
Pixel Size = (0.500000000000000,0.500000000000000)
Corner Coordinates:
Upper Left ( 383371.000, 3764907.000)
Lower Left ( 383371.000, 3765076.500)
Upper Right ( 383648.500, 3764907.000)
Lower Right ( 383648.500, 3765076.500)
Center ( 383509.750, 3764991.750)
Band 1 Block=555×1 Type=Float64, ColorInterp=Undefined
NoData Value=1.70141e+038

Notice that upper Y value is south of the lower Y value! Basically the raster lines order is reversed (bottom-to-top instead of the normal raster orientation of top-to-bottom). I’ve also experienced the same issue with some NetCDF files so I thought it would be good to have a generic solution to the problem.

So I hacked up the gdal_merge.py script (distributed with gdal, fwtools, etc) and created a raster flip script that will invert the image along the y axis and retain the georeferencing and metadata. The resulting flip_raster.py script seems to work pretty well though it is far from tested.

Here’s an example:

The standard gdal_translate method (which doesn’t account for the inverted coordinate space):

gdal_translate -of GTiff krig1.grd krig1_translate.tif

And the flipped raster method:

flip_raster.py -o krig1_flip.tif -of GTiff krig1.grd

And we’re good. gdalinfo confirms that we have the same extents, pixel sizes, metadata, etc as the original dataset.

2 responses so far

Sep 04 2007

Mapserver vs Mapnik revisited

Published by perrygeo under WMS

A while ago, I was enamored with mapnik’s image quality despite it’s limitations compared to the vast configurability of the mapserver mapfile. Now that mapserver uses the AGG rendering library, it might not be necessary to compromise configurability in order to get beautiful linework. I just installed the recent beta of mapserver 5.0 and the image quality is very crisp… but this comes at the expense of rendering speed.

All the times below are the average of ten runs using a full global view of a simplified shapefile of country borders.

mapserver (gd) : 0.082 sec , 18kb

OUTPUTFORMAT
NAME “GD_JPEG”
DRIVER “GD/JPEG”
MIMETYPE “image/jpeg”
IMAGEMODE RGB
EXTENSION “jpg”
END

shp2img -m test.map -o mapserver_gd_test.jpg

mapserver (agg) : 0.188 sec , 16kb

IMAGEQUALITY 80
OUTPUTFORMAT
NAME ‘AGG_JPEG’
DRIVER AGG/JPEG
IMAGEMODE RGB
END

* Note that if we bump up imagequality to 90% to (roughly) match the mapnik image, the rendering time and size increase a bit (.201 sec, 25kb)

shp2img -m test.map -o mapserver_agg_test.jpg

mapnik (agg) : 0.282 sec, 23kb
python test_mapnik.py

* Running this through the python interpreter is likely to interfere with the speed of the results so these times may not be very comparable to shp2img.

Using these preliminary results, it looks like mapserver 5.0 with AGG rendering is roughly equal to mapnik based on a balance of quality/speed/image size. But since I’d prefer to use mapfiles over the undocumented mapnik xml format any day, I think I’ll stick with my beloved mapserver. Kudos to the mapserver developers for raising the bar once again.

10 responses so far

Sep 04 2007

Performance testing rasters with mapserver

Published by perrygeo under Uncategorized

There’s been some good talk on the mapserver list (thanks to Gregor’s diligent testing) about performance related to serving up raster imagery.

First off, comparisons of image formats. Then a look at some TIFF optimization techniques like overviews (similar to “pyramids” in ESRI land) and internal tiling to boost rendering speed.

Most of the conclusions are not all that staggering:

  • TIFF is fastest but takes up more space compared to ECW and JPEG2000.
  • Overviews speed up TIFFs tremendously when zoomed out (ie when mapserver would otherwise have to perform some heavy downsampling)
  • Internal tiles in GeoTIFF format give a boost when zoomed in (only the necessary tiles are read from disk)
  • The TIFF comparison was run on two setups; a monsterous 8-core, RAID-5 equipped beast and a low-memory virtual machine on low-end PC hardware. The TIFF optimizations are very noticeable on the lesser machine but almost completely negligible on the high-end machine.
    Both tiling and overviews are useful, but only on machines with resource
    shortages, such as slow disks or a lack of spare RAM for caching.

Nothing earth-shattering (these techniques are often mentioned as best practices) but is very nice to see some hard numbers to back it up. Plus the verbose test logs provide a good example for a newbie trying to implement them. Good stuff Gregor!

No responses yet