The latest installment in Christopher Groskopf's attempt to open up the data of the small town in Texas to which he's moving. Read about the genesis of the project here.
Why bus schedules? In my first post I named them at the top of my list of datasets I would like to build on. I also mentioned that I intended to avoid buying a car once I moved, a statement that provoked significant eye-rolling. I've been told that no one rides the bus in Tyler, or that only poor people do. A fellow hacker who grew up in Tyler told me he didn't even know they had a bus system. This isn't really a surprise. Tyler has a low population density (1,982 people per square mile, according to Wolfram Alpha) and a food desert in its urban core. I was stunned to discover that a transit system even existed. So why do I think it's a good idea to digitize the bus schedule? Five reasons:
- I need it. It's not just that I don't want to drive. It's that I suck at driving. Having access to public transit is an immediately useful thing for me.
- Tyler has several colleges, but none of them even mention the bus system on their websites. If building this app means one student takes the bus instead of driving, then it will be a success.
- It's easy. (Mostly, more on this below.)
- It's an excellent pilot project. The data is available (albeit in a terrible format) and the shape of the application I will build is relatively straightforward.
- Financial freedom, green living, world peace, etc.
The first thing I needed in order to build this app was data for routes, schedules and stop locations. The Tyler Transit agency publishes a route map as a PDF, though it only includes a very small number of stops. They publish schedule data for weekdays and Saturdays as PDFs. These PDFs only include estimated arrival times for five stops per route, less than ten percent of the total number of stops. Stop location data isn't available anywhere online, so I emailed Tyler Transit and asked for a complete list. I requested an Excel document; they sent me a PDF of a scan of a printout of a web application.
I don't raise these data quality issues as an affront to Tyler Transit. Through my own experiences and those of my many friends in the open government community, I've learned that this is the state of public data in much of the US. I want to help change that, but right now I'm not trying to open governments; I'm just trying to build a transit app, so I did what a pragmatic geek has to do sometimes:
I keyed them.
Lacking an obvious way to extract the data I needed from a scanned PDF, I took two hours and re-keyed the spreadsheet. This also gave me the opportunity to correct numerous typos in street names that would have foiled any geocoder.
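I fixed the street-name typos by hand while keying, but the same cleanup could be partly automated. The sketch below is a swapped-in technique, not what I actually did: it fuzzy-matches each keyed name against a list of known street names (assumed to come from the county centerline data, already upper-cased) using Python's standard `difflib`. The function name `correct_street_name`, the `0.8` cutoff and the example street names are all hypothetical.

```python
import difflib
from typing import List, Optional


def correct_street_name(
    raw: str, known_streets: List[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the closest known street name to `raw`, or None if none is close.

    Hypothetical helper: `known_streets` is assumed to be an upper-cased list
    of names taken from street centerline data.
    """
    # Normalize case and collapse runs of whitespace before comparing.
    normalized = " ".join(raw.upper().split())
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(normalized, known_streets, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Illustrative street names only.
known = ["BROADWAY AVE", "FRONT ST", "ERWIN ST", "GENTRY PKWY"]
print(correct_street_name("braodway ave", known))
```

Anything that returns `None` still needs a human eye, which is roughly where the manual effort would go.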
Using the route map, the street centerline GIS data available from Smith County, QGIS, and a lot of patience, I was able to construct what is possibly the only digital map of Tyler's bus routes. I then geocoded the bus stop list described above and placed the stops over the routes.
Fun fact: A simple buffer computation on the stops will tell you that over 70% of all streets in the city of Tyler are within a half mile of a bus stop. (That's less than the distance I walk to and from the L every day.)
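In QGIS this is a buffer-and-intersect operation. The stdlib-only sketch below shows the same idea under stated assumptions: stop and street coordinates are in a projected coordinate system measured in feet (for example a State Plane zone), each street is a straight segment, and "streets" are counted by segment rather than by length. A segment touches a circular buffer around a stop exactly when its minimum distance to that stop is at most the radius. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

HALF_MILE_FT = 2640.0  # half a mile in feet (assumes a feet-based projection)


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest planar distance from point p to the line segment seg."""
    (x1, y1), (x2, y2) = seg
    px, py = p
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - x1, py - y1)
    # Project p onto the segment, clamping the parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
    cx, cy = x1 + t * dx, y1 + t * dy
    return math.hypot(px - cx, py - cy)


def share_of_streets_near_stops(
    streets: List[Segment], stops: List[Point], radius: float = HALF_MILE_FT
) -> float:
    """Fraction of street segments that intersect any stop's buffer."""
    if not streets:
        return 0.0
    near = sum(
        1
        for seg in streets
        if any(point_segment_distance(stop, seg) <= radius for stop in stops)
    )
    return near / len(streets)


# Toy data: one stop, three street segments (coordinates in feet).
streets = [((0, 0), (1000, 0)), ((5000, 5000), (6000, 5000)), ((0, 3000), (0, 4000))]
stops = [(500, 100)]
print(share_of_streets_near_stops(streets, stops))
```

Weighting by segment length instead of count would be a small change and might give a different percentage, so the choice of measure matters when quoting a figure like 70%.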
This is good progress; however, it's far from perfect. The geocodes for the bus stops are not their actual locations, but rather those of the next intersection following each stop. Worse, many of them didn't geocode at all, forcing me into an arduous process of trying to locate them manually using Google Maps and Google Street View. Even then, I wasn't able to determine an approximate location for some of the stops.
I have long-term plans for dealing with this and the other data quality issues. Better stop locations can be crowd-sourced by users. The arrival times present a more audacious challenge, as I have to compute estimated times for all the stops that don't have times in the official timetable. Fortunately, the street centerline data provides me with both distance and speed limit, so I should be able to make sound estimates and fine-tune those with user feedback.
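One plausible way to turn distance and speed limit into estimates, offered as an assumption rather than a settled method: compute each leg's free-flow travel time at the posted speed limit, then scale those times so the run between two consecutive timed stops fits the official schedule. Time lost to boarding, signals and traffic is thereby spread in proportion to free-flow time. `estimate_stop_times` and its parameters are hypothetical, and times are minutes from any fixed reference (for example, minutes after midnight).

```python
from typing import List


def estimate_stop_times(
    leg_miles: List[float],
    leg_speed_mph: List[float],
    start_min: float,
    end_min: float,
) -> List[float]:
    """Estimate arrival times (minutes) at every stop between two timed stops.

    leg_miles[i] and leg_speed_mph[i] describe the street leg from stop i to
    stop i + 1, so n legs give n + 1 stops; the first and last are the timed
    stops from the official schedule.
    """
    if not leg_miles or len(leg_miles) != len(leg_speed_mph):
        raise ValueError("need at least one leg and one speed limit per leg")
    if any(s <= 0 for s in leg_speed_mph) or sum(leg_miles) <= 0:
        raise ValueError("speed limits and total distance must be positive")
    # Free-flow travel time for each leg at the posted limit, in minutes.
    free_flow = [60.0 * d / s for d, s in zip(leg_miles, leg_speed_mph)]
    total = sum(free_flow)
    scheduled = end_min - start_min
    times = [start_min]
    elapsed = 0.0
    for leg_time in free_flow:
        elapsed += leg_time
        # Scale cumulative free-flow time to fit the scheduled window.
        times.append(start_min + scheduled * elapsed / total)
    return times


# Two one-mile legs at 30 and 60 mph, scheduled to take 9 minutes in total.
print(estimate_stop_times([1.0, 1.0], [30.0, 60.0], 0.0, 9.0))
```

Per-leg corrections learned from user feedback could later be folded in by adjusting the `free_flow` weights.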
Though much of it was painfully manual, most of the required data preparation is done at this point and I can move on to prototyping the application. Interested coders can follow my progress at the hacktyler-transit repository on GitHub. Everyone else: speak your mind.