Apr 01 2006
LIDAR data processing with open source tools
LIDAR data is certainly a hot technology these days. LIght Detection And Ranging data can be used to create extremely detailed terrain models but there are lots of barriers to using LIDAR data effectively. USGS Center for LIDAR Information Coordination and Knowledge was put in place to “facilitate data access, user coordination and education of lidar remote sensing for scientific needs“.
Beyond the sheer size of the datasets and the knowledge and hardware required to process them, software is a big issue. In the realm of open-source GIS tools, there are many applications (GRASS being the most prominent) for dealing with elevation point data and processing it into more meaningful products such as elevation DEMs and contours.
Usually the data comes as simple ASCII text files and the x,y and z values are easily extracted from such a file. But take a look at the USGS data distribution site and you’ll notice some of the datasets are distributed as LAS binary files. It makes sense to store such massive datasets in binary so I started looking for some LAS conversion tools.So after some searching, I found a bunch of proprietary products for working with LAS but no open source tools. Luckily, the format is well documented thanks to the efforts by the ASPRS to make it an open specification.
So dusting off my notes about parsing binary files in python, I set out to create a python module for extracting LIDAR data from LAS files. The LAS format contains a header which needs to be parsed first in order to read the point cloud. Once you have the header info, you can scan your way through the dataset to pick out the x,y,z values.
Here’s an example of the python interface that will read the first 10,000 points into a 2D shapefile with the elevation as a attribute in the dbf:
import pylas infile = 'sanand000001.las' outfile = 'lidar.shp' header = pylas.parseHeader(infile) pylas.createShp(outfile, header, numpts=10000, rand=False)
The issue I struggled with is the sheer size of these datasets. A USGS quarter quad can contain 10 million points which is an excessive number of points to create, say, a 10 meter DEM over such a small area. Clearly there was a need to extract a subset of this dataset but just taking the points sequentially gives you a subset of the total area. So, by default, pylas randomly scans the data to pull the number of specified points so that the point cloud could cover the entire area (at a much lower point density). Without numpts specified, it will randomly select 1/2000th of the total number.
So the simplified interface to make a more manageable lidar shapefile would be:
header = pylas.parseHeader(infile) pylas.createShp(outfile, header)
Once the shapefile is created, you can bring it into GRASS to do the processing to generate DEMs, contours and other derived elevation products:
v.in.ogr dsn=lidar.shp layer=lidar output=lidar g.region vect=lidar g.region res=10 v.surf.rst input=lidar elev=lidar_dem zcolumn=elev # Launch the interactive 3D viewer nviz lidar_dem

Of course the method I just described is very simplistic and does not even come close to utilizing the full potential of the LIDAR point cloud, but it’s a start.
The pylas.py module can be downloaded here. The code has worked for me on the few datasets I’ve tested it with but it should certainly be considered a rough-cut, alpha product. There is much room for improvement and, of course, if you have any suggestions or contributions, please get in touch.
Hi,
GRASS 5 has a module called s.cellstats which is yet to be ported to GRASS 6. It fills a raster grid with min/max/mean/median of the points which fall in that grid. GRASS 6.1-cvs can now handle ~3million input points before it runs out of memory. Don’t create a table (v.in.ascii -t) and don’t build topology for pretty much unlimitted points. v.surf.rst et al are currently (slowly) being adapted to work with maps with no topography. Obviously a port or pre-processing tool like s.cellstats would be really great to have. Search the GRASS mailing list archives for posts by Helena on the subject.
5% and 95% of points would also be nice..
Hamish
Nice work Hamish,
What approach will you suggest to read 17,000 .las files and write shp files with, let’s say, 1% points, .1% points and 0.01 % points of all the set, being the total over 2 billion points and the distribution not even across .las files?
Quite a challenge ah ?
Great idea to program a pylas module!
Exactly what I was looking for.
The fastest way to rasterize huge LIDAR point datasets would be to extract subsets with pylas, save it as (x,y,z) ASCII files and import them with the new GRASS module r.in.xyz. There are several modules for interpolate the NODATA cells (e.g. use the triangulation of nnbathy - r.surf.nnbathy).
Bernhard
Anybody has a sample data in .las 1.1 or 2.0 format?
Please, let me know.
Thanks!
Gennady
I need to include the projection information in the binary las file and save out the file in the las format again.
I guess I need to change the settings in the point data records of the header file so as to populate the projection/geo-referencing info, project the data into the required system and save it back as a .las file directly or through an intermediate .shp file.
Can you help me with that please.
i have a put few links where to download sample LIDAR data in LAS format on this webpage http://www.cs.unc.edu/~isenburg/laszip/ and there is also c++ source code with examples for reading and writing the LAS format.
i have put some code here (http://www.cs.unc.edu/~isenburg/googleearth/) for converting large amounts of LIDAR points in either LAS format or ASCII format (possibly stored across multiple files) into correctly georeferenced *.KML files that contain a tiling of on-the-fly extracted elevation contours for easy visualization in google earth.
Does anybody know some more effective algorithms for linear objects’s edage detecting and crossing points extracting. I just want to obtain the edages of linear objects in vector format.
itslbs@126.com
Thanks for your reply.
Thanks for writing and releasing this–it’s been a great convenience.
But there is a small bug in pylas.py. Raw X, Y, and Z are read as unsigned integers (L format) and they are signed (l format). Not a problem until you have negative values of a coordinate, such as beach elevation at low tide, and then …
Hi all,
I want to know how to import lidar data to grass and create voxels in them. I am a newbie to lidar and to GRASS..so apologies if i am asking a dumb question. Is there are sort of manual for GRASS to work with lidar data.
Thanks.
i have the cloud point data (laser data converted from .las to ascii xyz data format). i wanna create some surface model using GRASS 6. however, i’m yet not familiar with this software. anybody can list down the processing step to create surface model from ascii xyz data? besides, is anybody know any opensources to create surface model (3D)? thanks.
_newbie GIS_
@ maya,
I am also at this stage, trying to get Lidar ascii data into GRASS. The wiki they have is very useful (links off the website), but the ascii I have is the point number, x, y, z, intensity, (then in the same row the last return) x, y, z, intensity.
When I’ve tried importing ascii to vector into GRASS it only gives option of point number, x, y, z. So i am wondering how to get the last return data in there as well?
I wonder if i am importing in the right place. Anyway good luck…
@deadjo,
A raster grid can only have 1 value for each cell, so you can either have elevation, using the z value from the 3rd column, or you can use the intensity value, using the 4th column for the z value, etc.
do anyone have c++ open source code for reading las data i tried the one written by martin isenburg but it doesn’t work