Toponymy - Mapping UK Placename Origins

Jake, our Junior Web Application Developer and the newest member of Emu, has been busy lately taking a leading role in the development of our new product, Aurora. As we do with every new joiner, we challenged him to write a blog post that showcases his interests, draws on location insights and uses our technology. As Jake is a bit of a language buff, he chose to create a toponymy map of the UK, and because he needed to display a lot of location data in a performant environment, he built it on the visualisation components of Aurora. So over to Jake…

What is Toponymy?

Toponymy is the study of the origins and meanings of placenames. Placenames are important and interesting to study because they allow us to uncover details about the historical immigration patterns, local history and culture of an area. Toponymy by Emu Analytics provides an interactive toponymic map of the UK, where a settlement is colour-coded by the linguistic origin of some generic form within its name. A generic form is a commonly recurring element of a place name, normally appearing as a prefix or suffix. For example, the thorpe in Scunthorpe is a generic form deriving from the Old Norse word for “homestead”.

Toponymy App

The map can be viewed in two modes. The “language view” shows 30% of UK settlements, colour-coded by the linguistic origin of their generic forms. The “filter view” shows all UK settlements and lets the user search for substrings of settlement names using regex.
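
As a rough illustration of the kind of search the filter view performs, here is a minimal Python sketch. The sample list, the function name `filter_settlements` and the choice to ignore case are assumptions made for this example, not details of the app itself.

```python
import re

# Hypothetical sample; the real app searches the full list of UK settlements.
SETTLEMENTS = ["Scunthorpe", "Edinburgh", "Dumbarton", "Mablethorpe", "London"]


def filter_settlements(pattern, names):
    """Return the names that contain a match for the regex (case-insensitive)."""
    try:
        compiled = re.compile(pattern, re.IGNORECASE)
    except re.error:
        # An invalid (for example half-typed) pattern matches nothing.
        return []
    return [name for name in names if compiled.search(name)]


# Settlements ending in "thorpe"
print(filter_settlements(r"thorpe$", SETTLEMENTS))
```

Anchors such as `^` and `$` make it possible to restrict a search to prefixes or suffixes, which is handy when looking for a particular generic form.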

Once a settlement is clicked, an info panel appears on the right which gives details about the language of origin of the settlement name’s generic form, as well as a “nearest neighbours” table which shows the classifications of nearby settlements. A “notes” section containing extra information is also provided where available.
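
One way a “nearest neighbours” table could be computed is sketched below. This is an assumption about the approach rather than the app's actual code: it uses great-circle (haversine) distance, a fixed `k`, and a handful of settlements with approximate coordinates and illustrative language labels.

```python
import math
from collections import Counter

# Illustrative records: coordinates are approximate and the labels are
# examples only, not values taken from the app's dataset.
SAMPLE = [
    {"name": "Scunthorpe", "lat": 53.58, "lon": -0.65, "language": "Old Norse"},
    {"name": "Barton-upon-Humber", "lat": 53.68, "lon": -0.44, "language": "Old English"},
    {"name": "Grimsby", "lat": 53.57, "lon": -0.08, "language": "Old Norse"},
    {"name": "Cleethorpes", "lat": 53.56, "lon": -0.03, "language": "Old Norse"},
    {"name": "Edinburgh", "lat": 55.95, "lon": -3.19, "language": "Scots"},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_neighbours(target, settlements, k=3):
    """Return the k settlements closest to the target, excluding the target."""
    others = [s for s in settlements if s["name"] != target["name"]]
    others.sort(key=lambda s: haversine_km(target["lat"], target["lon"], s["lat"], s["lon"]))
    return others[:k]


def neighbour_table(target, settlements, k=3):
    """Count the language classifications among the k nearest neighbours."""
    return Counter(s["language"] for s in nearest_neighbours(target, settlements, k))


print(neighbour_table(SAMPLE[0], SAMPLE))
```

Sorting every settlement by distance is fine for a small sample; a full dataset would more likely use a spatial index.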

The config panel lets you adjust how some of the application’s functions behave. You can also turn on a ‘Lines to nearest neighbours’ feature, which draws lines to the neighbours used in the info panel calculation.

Difficulties in Defining the ‘Linguistic Origin’ of a Placename

Although each settlement name is categorised as belonging to one language, it’s important to keep in mind that things aren’t really that simple. A placename can have multiple generic forms from different linguistic origins. For example, the settlement name Dumbarton has a prefixed generic form (Dum) deriving from Scottish Gaelic, and a suffix (ton) deriving from Old English. In these cases, settlements are not added to the map’s language view. The exception to this rule is Latin: the vast majority of settlements with a Latin generic form were found to have some other generic form of a different origin as well. Since following the rule strictly would remove Latin from the list of available languages altogether, the decision was made to exempt Latin.
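
The rule can be sketched as a small Python function. The function name is hypothetical, and the sketch assumes that an exempted settlement is tagged as Latin, which is one reading of the exemption rather than something confirmed by the app's code.

```python
def resolve_language(origins, exempt="Latin"):
    """Pick one language for a settlement from the origins of its generic forms.

    Returns None when the settlement should be left out of the language view.
    """
    distinct = set(origins)
    if not distinct:
        return None  # no generic forms matched
    if len(distinct) == 1:
        return next(iter(distinct))  # a single origin, possibly repeated
    if exempt in distinct:
        return exempt  # assumption: mixed names containing Latin are kept as Latin
    return None  # mixed origins are excluded


print(resolve_language(["Scottish Gaelic", "Old English"]))  # the Dumbarton case
print(resolve_language(["Latin", "Old English"]))
```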

Another potential complication is that a settlement name may be composed of a non-generic word or phrase from one language and a generic form from another. For example, the component Edin in Edinburgh is not a commonly found generic form, and probably has Cumbric origins. However, its suffix -burgh is a commonly found Scots generic form. In this case, the solution is simply to tag the settlement with the language of its generic form. This is not problematic, as the project aims to map the distribution of generic forms rather than analyse and map the origins of each unpredictable component in a settlement name.

Cluster of ‘Burgh’ on the East Coast of England

The final hurdle is the case where a settlement name has no generic forms at all. These settlements are largely ignored in the language view, although a few notable places (such as London) are manually tagged on the map, with notes justifying the classification.

Language View Method

An incomplete but extensive list of generic forms in the UK was taken from Wikipedia, which is itself compiled from information provided by Nottingham University and Ordnance Survey. A separate list of Scots generic forms was taken from Glasgow University. A complete list of UK settlements came from UKTownsList, and a regex search was run over it to associate each settlement with zero or more generic forms. Only settlements with a single generic form, or with multiple generic forms of the same origin, were tagged with a language.
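
A minimal sketch of this tagging step in Python might look like the following. The table holds only the four generic forms mentioned in this post, and the function names and data layout are assumptions for illustration. The real lists are far longer and the actual implementation may differ.

```python
import re

# (generic form, position, linguistic origin); a tiny illustrative subset.
GENERIC_FORMS = [
    ("thorpe", "suffix", "Old Norse"),
    ("ton", "suffix", "Old English"),
    ("burgh", "suffix", "Scots"),
    ("dum", "prefix", "Scottish Gaelic"),
]


def compile_form(form, position):
    """Build a regex anchored to the start (prefix) or end (suffix) of a name."""
    escaped = re.escape(form)
    pattern = f"^{escaped}" if position == "prefix" else f"{escaped}$"
    return re.compile(pattern, re.IGNORECASE)


COMPILED = [(compile_form(form, pos), origin) for form, pos, origin in GENERIC_FORMS]


def matched_origins(name):
    """Return the set of origins of every generic form found in the name."""
    return {origin for regex, origin in COMPILED if regex.search(name)}


def tag_language(name):
    """Tag a name only if all of its generic forms share a single origin."""
    origins = matched_origins(name)
    return next(iter(origins)) if len(origins) == 1 else None


for place in ["Scunthorpe", "Edinburgh", "Dumbarton", "London"]:
    print(place, tag_language(place))
```

Collecting origins into a set means a name with two generic forms of the same origin is still tagged, while a mixed name such as Dumbarton is left untagged.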

After the initial tagging, more intelligent reclassification was required where a generic form had multiple possible linguistic origins. For example, the prefix auch- can be either Irish or Scottish Gaelic. In this case, settlements with the auch- prefix lying in Scotland were tagged as “Scottish Gaelic”, and the same process was applied with the tag “Irish” for settlements in Ireland.
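
This location-based reclassification could be expressed as a simple lookup, sketched below. The table structure, the function name and the decision to leave settlements elsewhere untagged are assumptions for the example.

```python
# Generic forms with more than one possible origin, resolved by location.
AMBIGUOUS_FORMS = {
    "auch": {"Scotland": "Scottish Gaelic", "Ireland": "Irish"},
}


def reclassify(form, country):
    """Resolve an ambiguous generic form using the settlement's country.

    Returns None if the form is not ambiguous or the country is not covered
    (assumption: such settlements are left untagged).
    """
    return AMBIGUOUS_FORMS.get(form, {}).get(country)


print(reclassify("auch", "Scotland"))
print(reclassify("auch", "Ireland"))
```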

Results and Correctness

This method produced the expected distribution of linguistic origins: Old Norse classifications are densest across the old Danelaw, Old Brythonic and Cornish classifications are largely in the south-west, and Pictish is found exclusively in Scotland, to name a few. However, the decision to display settlements as individual points, rather than as a less detailed distribution map, will inevitably result in some erroneous data being displayed. Some previous research avoids this danger by deliberately displaying aggregate data at a low resolution.

Example Distributions across Northern Ireland, Wales, Scotland and Cornwall

The design decision to display datapoints individually was made as the project is ongoing, and continued efforts will be made to correct the data where errors are found. A “report mistake” utility is provided in the application when a settlement name is clicked, and users are encouraged to use it wherever they see something that doesn’t look right.

Problems with Scots

Scots generic forms are more challenging to identify, as many are similar or identical to Old English generic forms, particularly those from Northern dialects of Old English. For example, the suffix -haugh is a common Scots generic form, but it also appears in placenames from areas where the Northumbrian dialect of Old English was spoken. As a compromise, only generic forms that appear exclusively in the Scots language, such as -hame, -brae and auld-, were included.
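
In code, this compromise amounts to a set difference. The sketch below uses only the forms mentioned in this post, and treating -haugh as a member of the Old English list is an assumption made purely to illustrate the overlap.

```python
# Candidate Scots generic forms (a hyphen marks where the rest of the name goes).
SCOTS_CANDIDATES = {"-haugh", "-hame", "-brae", "auld-"}

# Illustrative Old English forms; -haugh is listed here to model the overlap.
OLD_ENGLISH_FORMS = {"-haugh", "-ton"}

# Keep only the forms that do not also appear in the Old English list.
EXCLUSIVE_SCOTS = SCOTS_CANDIDATES - OLD_ENGLISH_FORMS

print(sorted(EXCLUSIVE_SCOTS))
```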


Aurora is Emu Analytics’ geospatial event processing and visualisation platform. It was built to meet a growing need to consume, analyse and visualise large location-based datasets, both historically and in real time, in a highly engaging and interactive manner. We use it to support the majority of our products, such as Duty of Care and the Smart Energy Portal, and it can also be deployed as a framework into our clients’ infrastructure to support their location intelligence goals. If you would like to discuss this or any of our other offerings, please get in touch.