Last month we brought you ‘The Heights of England’, a map of building heights across the country. It proved to be one of our most popular blog posts with over 50,000 people visiting the interactive map on the day of its launch. Many of you also asked for information about how we created the map. Happy to oblige, we’re going to take you on a deep-dive into the technology behind the map and explain how we visualized 13,000,000 buildings without melting our servers or your browser.
The Data
In our previous blog post, Jon and Alice from our GIS team explained how they took the raw LiDAR data, kindly supplied by the Environment Agency, and merged this with building outlines from Ordnance Survey data. The result was handed over to me as 3.5GB of data in the form of an ESRI shapefile.
The first step was to import this shapefile into a geo-database. PostgreSQL (with the PostGIS spatial database extensions) is our geoDB of choice for production systems. Its spatial capabilities are second to none, surpassing even the commercial offerings from leading suppliers, and it has a vibrant open-source following within the GIS community. Using the shp2pgsql tool (next to the bottle-opener on the PostGIS ‘Swiss Army knife’) we imported the building data into a PostgreSQL table, with each table row describing a single building.
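As a sketch of that import step, here is the kind of shp2pgsql pipeline involved, wrapped in a small Node.js helper. The shapefile name, table name, database name and SRID below are our assumptions for illustration, not details from the project (EPSG:27700, British National Grid, is a typical choice for OS-derived data):

```javascript
// Sketch of the shapefile import step. Filenames, table name, database name
// and SRID are hypothetical, chosen for illustration only.
const shapefile = 'building_heights.shp'; // hypothetical filename
const table = 'buildings';                // hypothetical table name
const srid = 27700;                       // assumed SRID (British National Grid)

// -s sets the SRID, -I builds a spatial (GiST) index on import, and -D uses
// the faster dump format for bulk loading.
const importCmd =
  `shp2pgsql -s ${srid} -I -D ${shapefile} ${table} | psql -d heights`;

console.log(importCmd);
```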
With such a large amount of data it is important to really understand PostgreSQL spatial indexes and how to optimize queries. For the following steps we created spatial indexes on the building footprint (stored as a PostGIS multipolygon) and also the building centroid (stored as a PostGIS point). To maximize query performance we also used the PostgreSQL CLUSTER command to physically reorder the table rows on disk so that buildings that are close geographically can be loaded in a single read operation.
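The indexing and clustering steps above can be sketched as DDL. The table and column names (buildings, geom, centroid) are assumptions for illustration, not the project's actual schema:

```javascript
// Illustrative DDL for the indexing strategy described above. Table and
// column names are hypothetical.
const ddl = [
  // GiST indexes back PostGIS spatial operators such as && and ST_Intersects
  'CREATE INDEX buildings_geom_idx ON buildings USING GIST (geom);',
  'CREATE INDEX buildings_centroid_idx ON buildings USING GIST (centroid);',
  // Physically reorder rows on disk to follow the spatial index, so
  // geographically close buildings land in the same disk pages
  'CLUSTER buildings USING buildings_geom_idx;',
  // Refresh planner statistics after the reorder
  'ANALYZE buildings;'
].join('\n');

console.log(ddl);
```

Note that CLUSTER is a one-off reorder: rows inserted afterwards are not kept in clustered order, which is fine for a dataset that is loaded once and then only read.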
Vector Tile Mapping
One of the first things you will notice when using the Building Heights map is the great zooming and panning performance. This is in large part down to our adoption of vector tile mapping (VTM) rather than the more traditional raster-based image tile mapping (ITM) used on the web.
To understand the difference we need to think about how map data is delivered to your browser to display a web map. With ITM, individual raster images (usually 256x256 pixel PNGs) are requested by your browser and reassembled to give the illusion of a continuous, zoomable and pannable map. With VTM, the individual tiles are actually a set of drawing instructions that are interpreted by a rendering engine running in your browser. If you are familiar with GeoJSON, you can think of each vector tile as a set of GeoJSON features describing geographical objects within a square.
Another important difference between VTM and ITM is that image tiles are just a bunch of pixels, whereas vector tiles contain rich geometric data and metadata associated with each feature. This opens up lots of possibilities for interactivity and is how we are able to respond to clicks on individual buildings on the map and display building height, area, and the what3words address (based on the building centroid).
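To make that concrete, here is a single building as it might look if a vector tile were expanded back into GeoJSON. The coordinates and property values are made up for illustration, and real vector tiles actually store geometry in compact integer tile coordinates rather than longitude/latitude:

```javascript
// One building feature, sketched in GeoJSON form. All values are
// hypothetical; the shape of the data is the point being illustrated.
const building = {
  type: 'Feature',
  geometry: {
    type: 'Polygon',
    coordinates: [[
      [-0.1276, 51.5072], [-0.1274, 51.5072],
      [-0.1274, 51.5074], [-0.1276, 51.5074],
      [-0.1276, 51.5072] // closed ring: first point repeated
    ]]
  },
  properties: {
    height: 21.5,                  // metres, hypothetical
    area: 480.0,                   // square metres, hypothetical
    centroid: [-0.1275, 51.5073]   // drives the what3words lookup
  }
};

console.log(building.properties.height);
```

It is this `properties` metadata, delivered alongside the geometry, that lets the browser respond to a click without a round trip to the server.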
The VTM implementation that we use is MapboxGL, an open-source mapping library from Mapbox. This is essentially a JavaScript rendering engine that runs in your browser and converts the vector tile drawing instructions into the map that you see on your screen. Data is delivered to your browser in a very efficient binary format, but conceptually you can think of it as ultra-compressed GeoJSON. For further performance gains, MapboxGL uses WebGL to take advantage of the hardware acceleration provided by your graphics card.
The Tile Server
To create the building layer on the map we needed a way to take the data from the PostgreSQL database and deliver vector tiles to your browser. We needed a tile server: a backend service that fulfils requests from connected browsers for vector tiles at specified coordinates and zoom levels. This is where we spent a lot of time optimizing the technology stack to deliver the performance we wanted.
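Those "specified coordinates and zoom levels" follow the standard web-mercator tile scheme, in which each tile is addressed by a zoom level and an x/y index. The conversion below is the general scheme, not code lifted from our map:

```javascript
// Standard web-mercator (slippy map) tile addressing: convert a
// longitude/latitude and zoom level to the x/y indices of the tile
// that contains it.
function lonLatToTile(lon, lat, zoom) {
  const n = Math.pow(2, zoom); // number of tiles per axis at this zoom
  const x = Math.floor(((lon + 180) / 360) * n);
  const latRad = (lat * Math.PI) / 180;
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { x, y };
}

// Central London at zoom 10
console.log(lonLatToTile(-0.1278, 51.5074, 10)); // { x: 511, y: 340 }
```

At each zoom level the tile count quadruples, which is why panning around at high zoom only ever touches a handful of the millions of possible tiles.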
While it is possible to serve vector tiles straight from a PostgreSQL/PostGIS database, we found that the best performance was achieved by pre-rendering the vector tiles using the MBTiles storage format. An MBTiles file (actually a SQLite database) stores vector tiles in a format ready to be delivered straight to your browser. Because no further processing is necessary, the tile server can efficiently service many simultaneous requests from connected browsers.
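One wrinkle worth knowing if you go down this route: the MBTiles spec stores tiles in a `tiles` table keyed by zoom, column and row, but uses the TMS convention where row 0 is at the bottom, while web maps request tiles in the XYZ convention with row 0 at the top. The tile server therefore flips the y index on every lookup. The SQL string below follows the MBTiles spec's column names; the surrounding code is just an illustration:

```javascript
// MBTiles uses TMS row numbering (row 0 at the bottom); browsers request
// XYZ tiles (row 0 at the top). Servers flip y within the 2^zoom rows.
function xyzToTms(y, zoom) {
  return Math.pow(2, zoom) - 1 - y;
}

// The lookup a tile server performs for each incoming request
// (column names per the MBTiles specification):
const tileQuery =
  'SELECT tile_data FROM tiles ' +
  'WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?;';

console.log(xyzToTms(340, 10)); // 683
```

Because this is a single indexed SQLite lookup with no geometry processing, serving a tile costs almost nothing, which is what makes the pre-rendered approach scale so well.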
The conversion process from PostgreSQL to MBTiles was pieced together using the TileLive framework, a box of open-source components that lets you build a processing chain to take spatial data from a variety of sources, produce vector tiles, and convert them to other formats. The whole process takes about 12 hours, but it is this offline processing that lets us provide real-time performance in the finished map. The resulting MBTiles file is served using the open-source Tessera tile server, itself based on the TileLive framework.
Rendering the Map
So at this point in the story we have vector tiles containing our building data being supplied to your browser. MapboxGL uses a style definition in JSON format to render the vector data, turning the drawing instructions into pixels on your screen. Each vector tile contains a set of building polygons with associated metadata such as building centroid and height. Using MapboxGL's great new data-driven styling features, we use the height metadata to determine the building fill colour. MapboxGL interpolates the height value to provide a continuous colour scale to match the legend on the map.
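A data-driven fill layer of this kind looks roughly like the sketch below, using the property-function syntax MapboxGL offered at the time. The layer id, source names, colours and stop values are our assumptions for illustration; only the idea of interpolating fill colour from each feature's height property is taken from the map itself:

```javascript
// Sketch of a data-driven fill layer in Mapbox GL's property-function style
// syntax. All ids, source names, colours and stops are hypothetical.
const buildingLayer = {
  id: 'building-heights',       // hypothetical layer id
  type: 'fill',
  source: 'buildings',          // hypothetical vector tile source
  'source-layer': 'buildings',  // hypothetical layer name inside the tiles
  paint: {
    'fill-color': {
      property: 'height',       // read from each feature's metadata
      stops: [                  // GL interpolates between these stops
        [0, '#2c7fb8'],         // short buildings: blue (assumed colour)
        [100, '#f03b20']        // tall buildings: red (assumed colour)
      ]
    }
  }
};

console.log(buildingLayer.paint['fill-color'].property);
```

Because the styling happens client-side, changing the colour ramp means shipping a few lines of JSON, not re-rendering millions of tiles.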
Our basemap vector tiles (derived from an Open Street Map extract) are rendered in a similar way using a very minimal style definition to allow the building polygons to take centre stage on the map. This highlights another great advantage of VTM; the ability to completely change the style of a map just by changing how it is rendered in the browser and without having to change the underlying data.
Urban Centre Statistics
The right-hand information panel in the interactive map displays some interesting metrics for each urban centre that you select. These are calculated in real-time by going back to the building data in the PostgreSQL database and using PostgreSQL's powerful spatial querying and aggregation functions. The building height histogram is generated with a single query using PostgreSQL's width_bucket function, which aggregates and counts records after placing them into buckets. The remaining metrics are calculated using a single, combined aggregate query.
To improve performance we hand-optimized the queries and the results are also cached in the database. In a future version of the map you will be able to draw your own area of interest and view the same metrics.
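The histogram query can be sketched as below. The table and column names in the SQL string are assumptions, as are the bucket range and count; the `widthBucket` function re-implements PostgreSQL's width_bucket semantics in JavaScript to show exactly how heights land in buckets:

```javascript
// Illustrative version of the histogram query. Table/column names, the
// 0-100 m range and the 20-bucket count are all assumptions.
const histogramQuery = `
  SELECT width_bucket(height, 0, 100, 20) AS bucket, count(*) AS buildings
  FROM buildings
  WHERE ST_Within(centroid, $1)  -- $1: the selected urban centre polygon
  GROUP BY bucket
  ORDER BY bucket;`;

// width_bucket(operand, low, high, count): `count` equal-width buckets
// numbered from 1; bucket 0 catches values below `low`, and bucket
// count + 1 catches values at or above `high`.
function widthBucket(operand, low, high, count) {
  if (operand < low) return 0;
  if (operand >= high) return count + 1;
  return Math.floor(((operand - low) / (high - low)) * count) + 1;
}

console.log(widthBucket(5, 0, 10, 5)); // 3
```

Grouping on the bucket number means the database returns twenty-odd rows rather than thousands of raw heights, which is what makes a real-time histogram practical.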
Putting It All Together
The front-end web app was developed using AngularJS with UI theming from Angular Material. We are big fans of AngularJS at Emu Analytics and use it for most of our production-grade client work. We love how it brings good software engineering practices such as component-based development, modularity and reuse to front-end development. We are excited about the upcoming release of AngularJS 2.0 and are starting to move over to TypeScript development.
Of course, no data visualisation article would be complete without mention of D3. We channeled our D3-fu in this visualisation to implement the height histogram and the animated metrics displays. We also used it to implement the subtle height bar overlay on the map legend. Remember, D3 is not just for charts!
To complete the picture we also have a back-end metrics service developed in Scala. This provides urban centre data to the front-end via a REST API.
Wrap all that up in a scalable, fault-tolerant deployment over a cluster of physical machines and you have a service that can deliver 13,000,000 buildings to 300 concurrent users without breaking a sweat.
Where next?
Of course, the finished product is more than the sum of its parts (or a checklist of fashionable technologies). If you have enjoyed this article and want to learn more about how Emu Analytics can help you make sense of your data, produce real-time insights and answer the ‘so what?’ then please get in touch.
Robin Summerhill is Head of Technology at Emu Analytics. He also sings in an a cappella group, has recorded an album and, rather randomly, has a PhD in Pharmacology from Oxford University.