The Quest for the "Earth Tile"

I have returned to the project and have spent the last 7 weeks rewriting the publication and simplification code, which is half of what this is. What I chose to do was fire the entire OSM road map, down to and including secondary roads, plus the entire rail network, and all the ferries for good measure, at my program and get it to work. Even if I don’t manage to handle the entire map, I hoped to get the simpler job of merging and presenting the combined bus and rail routes, as defined by their GTFS, under control.

The results are mixed, as discussed below. I have good renders of the global road+rail system, as detailed in The Map itself, up to level 5, and I certainly have a much better approach than before, having twigged the plainly obvious: it’s all just a massive GeoJSON simplification problem. The complete road tiles are, however, far too heavy to be used as vectors; they will have to be rendered. I am trying to get a “have your vector tile and eat it” approach where we render PNGs for each tile and also generate rarefied vector tiles, by chucking out small stuff, presented transparently, so we can still interact with most of the map. I go to some effort to make the longest possible LineStrings out of multi-line features to aid this. Much of what I am doing can be achieved simply by styling back the roads. But that too will fail in dense areas once you get to the top 5 or so tiers. You can’t show all the roads and railways in Germany, Switzerland and a 300km radius around Munich at level 5, you’ll just get a messy blob. We need to thin these routes down.

The result at medium range, levels 9 to 5, gives me at least something to say for the vast areas of the map I have no GTFS for, and the ability to make them interactive. It has already produced genuine discoveries for me, the author (interpreter?), in particular the locations of ferries. The Irrawaddy delta turns out to be the densest ferry network on Earth. It is virtually invisible on the current render even at close range, as is any ferry that winds its way between two narrow, winding river banks. You can get a postal ferry all up the side of Norway to directly north of Finland. There are ferries in much of Russia, also right the way up the Amazon, but virtually none left in China, which is a great shame if true.

I have placed these global road map tiles here http://buz-map.com:8080/ . They are preposterously large at the higher levels, in excess of 150k zipped, so over 10 megabytes unzipped, per tile, and will probably crash your browser. I haven’t got a GeoJSON to PNG conversion working yet using Mapnik plug-ins, but I haven’t really tried. Indeed right now I can’t even bake level 3 and above without bumping up my RAM. They are nowhere near fit for purpose till we have this bake and shake PNG file and reduced vector tile approach. I obviously need to tone the styling right down so that it can be used as an underlay for the bus map.

My plan now is to get back to the original job, that of showing people where we know there are bus, rail and ferry services. I was aware of routes and services being encoded into the map. Personally I don’t think this is feasible, and indeed when I attempted to use it to filter out the numerous industrial and agricultural railways in Russia I just got loads of routes and services of industrial railways. In the US, all those coal railways in Virginia have absolutely no obvious way of being differentiated. My issue with coding the services into the map is essentially that the source of this data is the GTFS, not the map. We are just going to end up having to sync the map up, mechanically, with the GTFS world, with potentially undesirable effects. I think we should have a Global GTFS repo, or GGTFS, and a method of assigning OSM features/ways to it, not the other way round: not writing into OSM, but writing into the G(lobal)GTFS and pointing it to something somewhere in OSM. Well, that’s essentially what I am building anyway.

On my return to the BuzMap proper I will look to replace Graphhopper with the Open Route Service ecosystem. I don’t really fit into the Graphhopper group and hope to get a little more interaction from ORS. I have a number of things I will need help with, three easy, one “research”. Specifically:

  1. hacking out Douglas-Peucker simplification of the results (trivial)
  2. fixing the road “encoder” to use bus-able ways (easy)
  3. providing a ferry encoder (trivial)
  4. determining preferred platforms/ways through station networks (very hard)

The railways turn out to be quite tricky. We can’t just pick any old railway line, or we will end up shunting in and out of junction stations. Worse still, GTFS gives no indication of the gauge any service uses, so we don’t even know which network to use. I hacked something up to do this, but it failed miserably once I threw Victorian railway networks at it. Glasgow in particular, but think of Gare de Lyon and Gare de l’Est: you can’t just do a radius search. It may be that the only short term practical solution is to knock up an editor’s app to allow subsequent manual assignment of these platform nodes. I suspect that inevitably someone is going to have to go round and tweak massive Bahnhofs.

I haven’t done anything on the bus map proper, this post is just a way of getting me back onto the target. So apologies to Norway, Austria and Spain who have all directed me to their national data sets. I will be loading those countries along with the missing Switzerland and the half broken Finland before leaving Europe to tackle the US. I will produce an interim release and blog post with Europe better completed before moving to the US.

More news as I get it, all feedback gratefully received.

Recap of latest attempt

I am building a traveller’s map, and so wanted to fill out those places, most of the map that is, without GTFS. The hope was that I might be able to at least display a world map with something on it anywhere there might be a bus, even if I don’t have the timetable. It would balance the map out. It has moved from being a feasibility study to temper the original route map code, to a quest for a single usable tile at 0,0,0 to represent the entire planet’s transport infrastructure, ensuring that we have remote roads in Siberia and across the Arctic, a miniature version of Australia, a complete view of Africa etc. I want a render where I can see stuff above level 9, which is when we normally have to start styling stuff out or end up with an unreadable map. By level 5 on most renders we have lost just about everything. You need level 5 just to be able to get a handle on supersize countries or regions. Even on renders that still have major roads preserved at level 5 they don’t really make much impact. They have to be styled very thinly. On Google, to cope with the Ruhr and the US East Coast and still have detail in sparser regions, individual countries have to have different styling; it’s quite ugly and still doesn’t give the high level continental view I want. Only motorways and trunk roads can be rendered, and we end up with a very messy map in Europe if we go any deeper. This can be misleading. France has a far more developed road network than Spain but these renders say the opposite. This issue of how individual countries choose to classify their roads is of course a general problem across the map, not just with roads.

These remote and sparse areas are relatively easy. The fun starts when you tackle a scaled down version of the super dense areas. Maharashtra is also problematic, and of course China. My map considers the Chicago area to have the greatest major road density. In these areas, applying my solution of clustering vertices will rapidly and inevitably result in a honeycomb effect. This isn’t just major cities: the whole of the Ruhr and the Low Countries becomes a triangulated lattice all too easily, and the map becomes distinctly untrustworthy.

There are a number of simple things I have tried to mitigate this. The first is not to use the cluster root as the point to shift to, but the nearest point on the best line, ranked by grade then length. Indeed we should not merge any of the points on this “best” line through the cluster, thus keeping its detail. This solves most of the issue, but it is all too easy to lose information in the quest to reduce the stupendous amounts of data in the map. Line simplification needs to be performed piece-wise on each inter-nodal section, i.e. don’t simplify out the junctions. If we lose junctions we will naturally end up having to merge onto more distant vertices, as Douglas and Peucker will have zapped junction nodes on straightish lines. This can re-introduce “triangulism”, and we can also end up getting gaps in the map. It’s remarkable how much information your eyes can pick up. A tiny blank pixel on a line can make a lot of difference.
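As a rough illustration of the piece-wise idea, here is a minimal sketch in Python. It is not the BuzMap code: the function names are made up for this example, coordinates are treated as planar, and junctions are assumed to be given as a set of vertices. Each section between junctions is simplified separately with plain Douglas-Peucker, so a junction vertex can never be removed.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Classic recursive Douglas-Peucker; the two end points always survive."""
    if len(points) < 3:
        return list(points)
    worst_index, worst_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > worst_distance:
            worst_index, worst_distance = i, d
    if worst_distance <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:worst_index + 1], tolerance)
    right = douglas_peucker(points[worst_index:], tolerance)
    return left[:-1] + right


def simplify_keeping_junctions(points, junctions, tolerance):
    """Simplify each inter-junction section on its own (hypothetical helper)."""
    junctions = set(junctions)
    result = []
    start = 0
    for i in range(1, len(points)):
        if points[i] in junctions or i == len(points) - 1:
            section = douglas_peucker(points[start:i + 1], tolerance)
            # Drop the shared vertex so sections join without duplicates.
            result.extend(section[1:] if result else section)
            start = i
    return result if result else list(points)


line = [(0, 0), (1, 0.01), (2, 0), (3, 0.01), (4, 0)]
whole = douglas_peucker(line, 0.1)
piecewise = simplify_keeping_junctions(line, {(2, 0)}, 0.1)
```

On this nearly straight line, simplifying the whole thing throws away the junction at (2, 0), while the piece-wise version keeps it and still drops the two wobbles either side.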

Another scourge I had to fight was “creaseism”, a result of the hierarchical nature of my tiling architecture, which inherits GeoJSON from child “tiers” of groups of layers and simplifies these tiles. Without a significant overlap you will end up clustering away from the edges, and end up with very visible major tile boundary creases. To reduce based on density I obviously need the context of the current region in order to determine if a feature should be reduced. This is my excuse for not using tippecanoe to date, as I don’t fancy hacking that to make it happen.

And then there is “squarism”, which needs to be preserved. This is the business of trying to present the vast tract up the centre of the US and Canada that has a grid road system religiously aligned to the geographic co-ordinate system. At this point I leaned back and decided I was disappearing down a rabbit hole. We may be able to tackle these issues by applying some stuff called computer science and determining the dominant paths through a network, but for now I need to cut my losses and return to the bus map proper.

State of The BuzMap, 2019 Report

I have created a single map that lets you easily browse all bus, tram, metro, ferry  and train routes. I’ve loaded some data from Europe, Israel and India. It’s called BuzMap: Here is an update. 

I have done significant work over the last 3 months to get BuzMap further towards a usable product, to at least prove the scalability of the architecture and, I hope, give a glimpse of where we can go with this. I have what I think is a viable user interaction model. Upon touching or hovering over a section of the map you (should, see bugs below) get the longest service that runs through that part of the map highlighted on the map itself and an instance of its timetable in the margin. To move the mouse over to the margin or manipulate the map without crossing other lines and thus selecting them, click on the map to “hold” that route. Below the timetable should be all the other services that use that line. You can click on any of those and get that route highlighted and a timetable instance. Click back on the map to enable browsing of other routes. There is currently only one timetable instance. Multi modal route planning is a well covered subject. BuzMap is something else. It’s a map. It is intended as a discovery tool that can direct you to detailed timetables and booking platforms elsewhere. It could easily be integrated as the background to a route finder or perhaps a hotel booking application, and be as reactive as the application requires.

The publication model works very well. I make extensive use of the amazing works of Graphhopper, Overpass and not least the MapBox box of tricks especially the geojson to vector tile technology which is preposterously fast. I can manufacture 2,500 tiles a second on an entire reprint thanks to the above software, and the simplification method works great. The heaviest tiles are in large metropolitan areas at level 10 and are under 60k. I believe this is the upper limit, no matter how dense your map is, the worst tiles will still be under 60k and most will normally be under 15k. We only need to reprint the entire map if we are making system wide changes, which right now is a common occurrence of course. For a single operator the data can be live in minutes.  

I have loaded most of the European data I can find, plus the whole of Israel, and also Indian Railways. We are missing, due to the bugs mentioned below, several (but not all) famous narrow gauge lines in India, and a lot of lines are broken on railways across the map from Tralee to Ledo. There is a particular railway in Sweden that links those highland lakes together which is completely missing despite the data being available. It is precisely the kind of service BuzMap is here to encourage you to investigate. The US and EU are where the vast bulk of the available data is, and perhaps where any viable business model might lie. I have at least been able to address many performance bottlenecks by tackling such a significant data set as 85,000 routes from over 500 different operators with saturation coverage in five countries: Ireland, Holland, Sweden, Estonia and Israel.

I have yet to load much of the available GTFS data outside Europe. I haven’t even touched North America as I just don’t have the disk space left. I am going to need at least another 10TB to handle all currently existing GTFS without having to repeatedly delete then download OSM data, the GTFS and various stages of transformation, and still have room to run test suites. There are currently 8.3 million main map tiles in total, plus a further 316 million highlighting tiles. Every route has its own tile set. I imagine these numbers will easily double if we load all of North America and also Australia, which has a lot of data available, plus all the bits and bobs dotted about the south. Switzerland, who predictably have supplied at country level, failed to load and I haven’t had the chance to debug it, nor the disk space either now. As a railway enthusiast, Switzerland is hallowed ground to me. I’ll be making sure we’ve got every yard of metre, narrow gauge and rack railway they’ve given us. I think we’ve even got cable cars in it.

Even if all the data currently held by the Google route-finder were in public view, most of the world’s transit networks across South America, Africa and Asia would still be blank. There is a lot of work to do to encourage both state and privately run bus companies in these regions to get their bus stops geocoded and present their timetables publicly. Some operators such as Indian Railways actually forbid the caching of their timetable data without a license, so I may be in breach of this. The India map is both incomplete and uses data that is at least 4 years old. I will gladly redact the times, perhaps replacing them with the number of hours and minutes it takes to travel, if that keeps BuzMap on the right side of IR rules.

Bugs, there are two serious ones:

  1. Route Merging. The consolidation of routes is working well topographically, and we are getting a good map, at least on roads/buses. However we are not merging all the Route IDs properly. I have a good idea why this is so but at the time of writing it still isn’t working quite properly and I need a lot more disk space to re-run. I’ve also had to colour trams in as buses due to another merge inheritance failure. I don’t store the list of route IDs in the tiles, only a pointer/id to a group of them, i.e. a single value. If a new service adds no new geography it will make zero impact on the tile sizes. The list of routes will end up in the 100s for some sections, and while I claim this will have no impact on the tile sizes if we’ve added no more geography, the client UI will need refining.

  2. Train lines are more seriously broken. There are several reasons for this that I am aware of, relating to the assignment of possible platform lines/ways through stations. In a city like Glasgow, for instance, this is certainly a non trivial task to do automatically. A radius search certainly won’t do the job. There is another problem in there though which I need to debug. There are railway lines broken, or missing entirely, for no apparent reason. I think I am going to have to build an interactive editing tool where candidate platform nodes can be selected by the cartographer. Add to that a relative speed check on the service to flag suspect shunting around cities, and a comparison of calculated polyline length to estimated point to point length of the route to highlight broken train lines, and it shouldn’t be too onerous a task to identify most of the problem areas. It’s feasible to do this level of work on railways; buses are too heavy and need to work entirely automatically.
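The “pointer to a group of route IDs” idea from the Route Merging item can be sketched roughly as below. This is only an illustration under my own assumptions (the class and method names are invented here, and the real store is not an in-memory dictionary): each distinct set of route IDs is interned once, and a map feature carries just the small integer that identifies the set.

```python
class RouteGroupRegistry:
    """Interns sets of route IDs so a tile feature only stores one integer."""

    def __init__(self):
        self._ids = {}      # frozenset of route IDs -> group id
        self._groups = []   # group id -> frozenset of route IDs

    def group_id(self, route_ids):
        """Return the id for this set of routes, creating it if unseen."""
        key = frozenset(route_ids)
        if key not in self._ids:
            self._ids[key] = len(self._groups)
            self._groups.append(key)
        return self._ids[key]

    def routes(self, group_id):
        """Resolve a group id back to its sorted list of route IDs."""
        return sorted(self._groups[group_id])


registry = RouteGroupRegistry()
shared = registry.group_id(["bus-12", "bus-7"])
same_again = registry.group_id(["bus-7", "bus-12"])
bigger = registry.group_id(["bus-12", "bus-7", "tram-3"])
```

Adding a service to a section swaps one integer for another in the feature, so the tile does not grow however long the list behind the id becomes.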

There is some bizarre usage of ferries in Sweden by some bus routes, due to stations being reused as bus stops, something my importer needs to resolve. A comparison of speeds between sections will help detect these anomalies, e.g. a bus that actually does need to use the ferry but doesn’t, due to a map encoding error, and skirts the entire estuary. We’ve also got bus only lanes to be considered in the Graphhopper encoder. I am sure these are behind some of the craziness in St Petersburg, and Paris too.

Some of the highlight tiles didn’t get made before we hit 99% of the disk, so sometimes you get a timetable but no highlight. There are also “bugs” in the map itself where road junctions are not passable in one or both directions, e.g. some roundabouts and at least one check point on the Israeli border which nevertheless has a bus service straight through it. These will require individual consideration and careful editing of the map itself, but can be detected simply by relative speed checking. 
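The speed and length checks mentioned above could look something like the following sketch. It is an assumption-laden illustration, not the checker I will actually build: the thresholds (`min_ratio`, `max_ratio`, `max_kmh`) are arbitrary placeholders, and the function names are invented for the example. It compares the routed polyline with the straight hops between stops and with the timetabled duration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def polyline_km(points):
    """Total length of a polyline given as (lat, lon) points."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


def flag_suspect(routed, stops, minutes,
                 min_ratio=0.9, max_ratio=1.5, max_kmh=200.0):
    """Return a list of reasons why a routed section looks wrong."""
    reasons = []
    routed_km = polyline_km(routed)
    direct_km = polyline_km(stops)
    if direct_km > 0:
        ratio = routed_km / direct_km
        if ratio < min_ratio:
            reasons.append("gap")      # routed line shorter than stop-to-stop
        elif ratio > max_ratio:
            reasons.append("detour")   # e.g. skirting an estuary or shunting
    if minutes > 0 and routed_km / (minutes / 60.0) > max_kmh:
        reasons.append("too fast")     # implied speed is implausible
    return reasons


stops = [(0.0, 0.0), (0.0, 1.0)]
straight = flag_suspect(stops, stops, minutes=60)
detour = flag_suspect([(0.0, 0.0), (1.0, 0.5), (0.0, 1.0)], stops, minutes=120)
broken = flag_suspect([(0.0, 0.0), (0.0, 0.5)], stops, minutes=60)
```

A section that comes back with any reason attached would be a candidate for the manual editing tool rather than something to fix automatically.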

The interaction doesn’t work very well yet on touch-screen/mobile devices, I’m only rarely getting an event from touching a line. I have tried several approaches and cannot regularly get an event. I am hoping to get help on this now we have something tangible. It is of course critical that it works well on smartphones.  I’ve also just noticed I am losing events even with a mouse, I have to wave the cursor several times over lines sometimes but not always. The client is the least developed part so far as for the most part it’s all Leaflet and vector tiles doing the work. It’s going to get re-written from the top before any significant attempt to launch this.

What Next 

My main concerns are nailing the railways which have a lot of failures including merge bugs common in buses. I have to get an encoder working for Graphhopper that will consume all roads including bus only ways, rail of all kinds as long as not “unused”, cable cars, cable trams, and ferries. Due to the nature of how the merging  works the railways of any kind can and need to be treated as all the same network type. 

I need to enable ferries ASAP. They are just the kind of thing I envisage the map helping people see. I did implement a hack previously for them which I avoided re-hacking into the latest rewrite. The Baltic coast in particular comes alive, and the Aegean would also look superb if the data were available anywhere.

We are going to need to do things like overlay place names as I have scribbled all over them with my bus routes. Another cool trick would be to get the different networks, ferry, tram, metro, railway or road, to flip to the foreground as you touch them. Both of these thanks to these amazing vector tiles. Once you get saturation coverage of a metropolis this becomes not just a gimmick but quite necessary, allowing us to pack far more information into just the one map and it still works. 

The refinement of the UI is going to be a challenge. This isn’t a multi modal route finder, it’s a map. Sure, as soon as the user has homed in on somewhere to investigate, they are obviously going to need that level of detail about exactly when and how often. I could attempt to run Open Trip Planner for everywhere, I have the data, but it is a distraction from the job of building a working map. If BuzMap can establish itself as a default first point of investigation for any terrestrial travel, directing users to relevant local route finding platforms and booking engines, it will be serving us well.

Once I have obtained some more disk I will process all of the US and Canada data. New York, especially with its ferries, is going to be spectacular. Amtrak, which I have tackled previously, is a most important addition as the US rail network is very heavily freight only. I think less than 20% of it actually takes passengers. I would also love to map Greyhound or any other national bus operators in the US if I could just locate their GTFS.

Most European countries have little bus data available, the UK in particular has virtually none currently visible. I have Transport for Greater Manchester but it didn’t get past the public domain GTFS loader. I have seen there is a project run by the UK government aimed at providing “local bus data in England .. by early 2020”.  This would be a great step forward and I would hope it bears results and grows to be a comprehensive listing of operators of any size across the UK. This data can effectively be held in a version control repository, as the Belgian rail group have done. I will use the Belgian and any other new data available in the next phase.

My keen interest is in South & Central America, Africa and Asia. For Africa I am aware of GTFS for Cairo, Tunisian and Algerian Railways, Accra, Nairobi and I think Cape Town but that’s about it. 

In Asia, a continent festooned with state run bus operators and home to a score of mega cities, the situation is arguably worse. I have next to nothing for Japan and less still for South Korea. We are in a position to build a bus map of the entire PRC if we can get hold of the data. Elsewhere, there are bus booking engines for long distance travel in south east and southern Asia that have the key data, so at that level we could build a map quite quickly. Manila has data, and Delhi is highly regulated and we should be able to do this there. What I’d really like is for the likes of Himachal Pradesh or Nagaland to compile and then give us their State Bus Company’s GTFS.

Some of the major cities of Argentina, Brazil and Chile have data. There is an awful lot more that is possible everywhere. 

Special thanks to OpenCage for lending me a 32G machine so that I can run a continental sized Graphhopper instance and for their continued moral support and encouragement, Digital Ocean for giving me a free 8G machine for 6 months, and also to the Graphhopper group themselves and MapBox, without which none of this is possible. And of course everyone who has ever contributed to The Open Street Map in any way.

Thank you for your interest.

Mark Lester

What is BuzMap ?

What Is This ?

What Is BuzMap ? Original Concepts

BuzMap is a map of transport services. It maps the services that run on railways and roads, not the railways and roads themselves. Obviously a map of the roads doesn’t equate to a map of bus services, but the same is true of railways. For example, in the USA only a fraction of the network actually has passenger services running on it, even though the tracks are well used.

Usage

Place the cursor over a line and the longest service will light up on the map. Click on the map to hold that route so you can move the mouse over to the margin where an instance of a timetable is displayed and below that a list of other routes that use that section of line.  Hover over a place name and it will be shown on the map. Select a different route by clicking on that entry. To select other parts of the map re-click back on the map. 

What’s New ?

It’s a complete map of the entire network. 

BuzMap produces a map overlay of an individual transit provider’s entire route network. I have witnessed numerous projects to provide maps of particular cities. BuzMap will draw one for any and all transport providers who can supply a GTFS of their network, all on one map.

Why is so much missing ?

  1. I have still to load most non EU GTFS and am now out of disk. I need about 30TB to do all the currently available GTFS worldwide.
  2. Many service providers (at least 95%) do not have a publicly available and visible GTFS.
  3. Some EU data didn’t work, e.g. Norway buses and also Switzerland. This is not the fault of the data, I am sure.
  4. Railways are actually more complicated than doing the buses. Sometimes the lines are broken. This is quite evident in India where the Shivalik, Nilgiri and Matheran narrow gauge lines are missing, but we have the Darjeeling and Pathankot lines. BuzMap is intended as a discovery tool. Narrow gauge railways are important.
  5. Ferries are simple, and also very important, especially in Europe. There is a lot of data available but I haven’t done them yet.

It’s (intended to be) comprehensive. 

It’s a complete map of ALL transport networks, on a single map. 

BuzMap merges multiple transit providers that use the same roads or railways. For a comprehensive view of a country’s transport network we need to amalgamate hundreds of different GTFS.

It’s interactive. 

If you are lucky enough to have a route map of a metro, it is invariably in a PDF where you can’t just cut and paste the place names, let alone just click on it and get the timetable. BuzMap allows you to see principal place names from a very high level. It is more than just a traveller’s map.

It’s Scalable, In Multiple Dimensions!

Scalability concerns with respect to building a transport map of the entire planet are numerous. Not only is the input data vast, there are issues with respect to generating a tile set of an arbitrary size and linking it to a timetable reference. There are a number of minor challenges and one in particular which available tiling software does not address.

  1. Front End

BuzMap uses vector tiles. This project is totally dependent on this technology. We still need to take care that the tiles themselves don’t balloon in size. The raw performance and elasticity of expansion make the front end performance excellent and highly scalable.

  2. Processing of input file

GTFS have ballooned in size. Transit providers have been compelled to deliver them, but not to rationalise them in any way. BuzMap can handle arbitrary sized modern GTFS of entire countries if required.

  3. Zoom Out Usability

A key feature of BuzMap is its ability to simplify and reduce network maps to allow them to be readable at higher zoom levels. If you don’t do this then as you pull out all you will be left with is a blob of scribble that is not much use and becomes increasingly heavy as an increasingly large area is displayed. Eventually the entire country will be a mass of route lines. BuzMap solves this. Not only can we see, say, the Trans-Siberian railroad at global level (should I ever get my hands on the Russian Railways GTFS!), we can view a metropolis such as Paris and even a whole country or continent. You just can’t do this without reducing the lines.

  4. Publication

The API to the vector tile technology is essentially to present a GeoJSON, a file describing all the lines you want to be drawn, at full vertex level detail. The software will then allow you to generate any tile at any level. It is highly performant. BuzMap further accelerates the job by calculating precisely which tiles need to be generated. Even in a dense network we would typically use less than ½% of the tiles within a bounding box of our network. The problem comes when this GeoJSON of your entire world gets big. We need to put EVERY bus and railway company, trams, ferries, cable cars, the lot, on one map. This input file is going to get very big very quickly.
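To give a feel for why so few tiles are needed, here is a small Python sketch using the standard slippy-map tile formula. It is an approximation written for this post, not BuzMap’s actual calculation: it samples along each segment at roughly half-tile steps (interpolating in lon/lat), so it can miss a tile that a segment only clips at a corner. The function names are my own.

```python
from math import asinh, floor, pi, radians, tan


def lonlat_to_tile(lon, lat, zoom):
    """Standard Web Mercator (slippy map) tile containing a lon/lat point."""
    n = 2 ** zoom
    x = floor((lon + 180.0) / 360.0 * n)
    y = floor((1.0 - asinh(tan(radians(lat))) / pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_line(coords, zoom):
    """Approximate set of tiles touched by a line of (lon, lat) points."""
    tiles = set()
    for (lon1, lat1), (lon2, lat2) in zip(coords, coords[1:]):
        x1, y1 = lonlat_to_tile(lon1, lat1, zoom)
        x2, y2 = lonlat_to_tile(lon2, lat2, zoom)
        # Sample at about half-tile spacing along the segment.
        steps = 2 * max(abs(x2 - x1), abs(y2 - y1)) + 1
        for i in range(steps + 1):
            t = i / steps
            tiles.add(lonlat_to_tile(lon1 + t * (lon2 - lon1),
                                     lat1 + t * (lat2 - lat1), zoom))
    return tiles


def bbox_tile_count(coords, zoom):
    """Number of tiles in the bounding box of the same points."""
    xs, ys = zip(*(lonlat_to_tile(lon, lat, zoom) for lon, lat in coords))
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


route = [(-10.0, 50.0), (10.0, 60.0)]
needed = tiles_for_line(route, 8)
in_bbox = bbox_tile_count(route, 8)
```

For this single diagonal line at level 8 the bounding box holds 208 tiles, while the line itself touches only a few dozen of them at most. The gap widens quickly at deeper levels.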

BuzMap incorporates a novel publication strategy which relies on the above line reduction technology. Only low level tiles require a geospatial dump of all data in that area. Higher level tiles take a promoted set of route IDs. The strategy is tiered: we fetch an area the size of a level 8 tile to generate all of the level 18 to 8 tiles beneath it. Level 8 then saves the simplified GeoJSON. This cascades up to the four tiles of level 1. It’s cute, but once you have twigged how to fix the line reduction and you are determined to fix this severe production end blocking problem, it’s pretty obvious that you need to extend the tiles DB and evolve a hierarchical publication scheme.
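The tile arithmetic behind such a tiered scheme is simple, and a sketch may help. This only shows the generic slippy-map parent/child relationship, with level 8 used as the tier root as in the text; how BuzMap actually stores and promotes the GeoJSON between tiers is not shown, and the helper names are invented for the example.

```python
from collections import defaultdict


def ancestor(z, x, y, target_zoom):
    """Tile at target_zoom that contains tile (z, x, y)."""
    if target_zoom > z:
        raise ValueError("target_zoom must not be deeper than the tile")
    shift = z - target_zoom
    return target_zoom, x >> shift, y >> shift


def children(z, x, y):
    """The four tiles one level below (z, x, y)."""
    return [(z + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


def group_by_tier(tiles, tier_zoom):
    """Bucket deep tiles under the tier-root tile that would generate them."""
    groups = defaultdict(list)
    for z, x, y in tiles:
        groups[ancestor(z, x, y, tier_zoom)].append((z, x, y))
    return dict(groups)


deep_tiles = [(10, 511, 340), (10, 510, 341), (10, 600, 340)]
tiers = group_by_tier(deep_tiles, 8)
```

Here the first two level 10 tiles fall under the same level 8 root, so one fetch of that area would serve both, and the third belongs to a different root.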

  5. Incremental Publication

BuzMap incorporates a hierarchical publication strategy. We only print what we need to. As described below the line clustering technology can be adapted for rendering upper zoom levels. We can produce a profoundly more useful map at even higher zoom levels by removing only those roads that we need to. And we can do it incrementally, we can have virtually live updates to the fixed tile set.

Mark Lester mc_lester@yahoo.co.uk

General use of the reduction technology

BuzMap employs a strategy of always printing a line if it can be seen and doesn’t need to be merged with another. It doesn’t just style out classes of route based on the zoom level; there just isn’t the data within the timetable to do this consistently. This same strategy can be used to print road maps such that you don’t just lose everything once you get to about level 6. If it’s a desert with a few roads in it, we will get those roads. This could produce a profoundly more useful map even at intermediate levels. Some countries just don’t have motorways. Some others are so big you just can’t see the whole network in one go, as the roads will be styled away once you zoom out. You can still paint the roads in the relevant class colour, blue for motorways, red for A roads etc. I am very keen to try this out.

Examples

UK at zoom level 6 on BuzMap http://buz-map.com/#6/54.470/-2.538

UK at zoom level 6 on OSM “transport” render https://www.openstreetmap.org/#map=6/54.387/-7.075&layers=T

US “rust belt” on OSM. Apart from the line just at the bottom, all those wiggly ones are for coal, not people. This is not a usable travel map imo. It’s blank for almost all of Africa apart from the north coast and a bit of SA. Likewise south of the Rio Grande, there just aren’t any railways apart from isolated bits.
