Wednesday, November 19, 2008

Plans & electronics

I've always wanted to know more about lower level systems. I can run a TSR app in MS-DOS and steal data from it using QuickBASIC (I did this once to make an oscilliscope type app using a sound recorder with a 2kb buffer and output set to nul:) but I'd love to know more about electronics, PICs, etc.

So I have a plan.
I'm going to start reading bits & pieces as & when I need to, and try to make small circuits I can fit in a tic-tac box. I do have some tripad board and some veroboard, but I'm somewhat lacking in thin single-core wire and actual components. And for such small designs, I might abandon boards altogether.

I'll post about my first project when I get to it :)

Status update

OK, I've now got SQL Server 2008 Express installed, which does support spatial data. Which is nice. I've yet to see what I can do with it, though one thing I look forward to is being able to store lines as geometric data instead of having to perform string manipulation within a stored procedure. We're going to stick with Express because we need a database on every customer's site and our business is such that we can't fit a full SQL Server license into the deal. I'm developing on Express so I always know with absolute certainty that what I do will work in the field.

FME is quietly chugging away pulling data in, but it's a little on the slow side. My personal opinion is that it eats far too much memory for what I'm asking of it, and that there could be some quite considerable optimisations made. All the same, it does what we need it to. I'm going to ask if we can buy a hardware-locked FME license in preference to a node-locked license, so we can run jobs (which should ultimately occur quarterly) on any machine which isn't in use. I'd like one now, so I can use this machine for developing on at the same time. I can foresee a time when we have a node-locked license and the machine we need to run the translation on becomes outmoded or unavailable through malfunction. And I don't want to have to deal with that.

FME can generate Voronoi cells, but we're now debating whether we need to do that or just go back to the Nearest Neighbour approach with 50m accurate points from Densified lines. Which would blow away the point about being able to store lines - the roads are stored as To and From Node IDs for our Dijkstra implementation to walk down, and the positions of those Nodes are stored in another table. For each To / From combination we store whether the path is Forwards or Backwards, and a Road ID which points to another table containing all the points in that Road except for the Nodes (which we already have elsewhere).

Monday, November 10, 2008

Dagnabbit, it *doesn't* work after all!

Because of the distortion between Latitude & Longitude, and the sparseness (sparsity??) of the data we're using, we found that we had difficulties where we have a sparse road with a dense junction at one end. The "nearest point" in terms of Longitude & Latitude is not necessarily the same as that in terms of Metres.

We now have further options: One is to generate Voronoi Cells around every road & junction, and use those to determine which road section (up to 50m in length) or junction someone is at by checking if their location is within that cell. Another is to oversample the road (required for the generation of the Voronoi Cells in any case) and hope that the existing algorithm works properly in that case - which would probably be much quicker.

I've spent an inordinate amount of time tinkering with a tool called FME, which allows me to throw geometric data about. It's with FME that I created the node map we used to do the routing, although I have also written an exporter in C# using our own Shapefile reader (started by Ben, enriched by myself). FME also allows us to generate the Voronoi Cells, but if we don't need them, we might not even need FME in the long run.

Watch this space!

Friday, October 24, 2008

Pythagoras and the OSGB coordinate system

We've decided that, since sinϑ is very close to ϑ for very small values of ϑ, and we're only ever dealing with very small values, we'll simply use Pythagoras with the Latitude & Longitude. Which avoids any problems with operating internationally. That is, in areas within other countries where the areas are small enough for the accuracy lost to be immeasurable and/or irrelevant. We're not routeplanning or performing highly critical analasys, after all.

Google Maps WTF Moment

What the hell is this thing? There appears to be a sign beside it. Searching for ["Guys head" crater] did not yield the results I expected...

Thursday, October 23, 2008

Nearest Neighbour in SQL Server 2005

Actually this would work for any database with a little adjustment to the SQL.

It's all very well having a lot of points and the distances to them, but what about finding which of those points we're actually close to? There's a bunch of algorithms which mostly involve running through a massive list of points until you've worked out which is the shortest, by applying Pythagoras' theorem to the difference between the points. This is the most common, and can be reached by simply reprojecting the map into a grid and selecting the result of the still squared phythagorean equation, then ordering by that column:

SELECT TOP 1 NodeID, POWER(Northing - <actualNorthing>, 2) + POWER(Easting - <actualEasting>, 2) AS result FROM NodeTable ORDER BY result;

We're in the UK so we've used OSGB as our grid for the time being. There's plenty of other grid systems available such as UTM (Universal Transverse Mercator) and various other grids for other countries or continents. As it is, we'd have to import new data into our database if we did ship out to another country so the issue probably won't arise for us until we have someone who wants to work across a country border. Watch this space - we have Ireland to tackle yet :)

There is immediately an obvious optimisation here. England is very densely packed, and it's unlikely that you'll ever be on a road and be further than 2.5km from the next junction. (I actually ran a quick query on our data and the longest road we've got is just under 5km, but that may well be a bendy road.) So, with an index on Northing and Easting, we can put a bounding box around our actual point:

SELECT TOP 1 NodeID, POWER(Northing - <actualNorthing>, 2) + POWER(Easting - ,2) AS result FROM NodeTable WHERE Northing > <actualnorthing> - 1250 AND Northing < <actualnorthing> + 1250 AND Easting > <actualEasting> - 1250 AND Easting < <actualEasting> + 1250 ORDER BY result;

(As an aside, to make this more generic we could easily select the max roadlength if we have that data available - we do, so it's something we may look at in the future.)

Now we have the squared distance to the nearest node and that node's ID. We can, of course, then make this select into a subselect and in the outer, calculate the square root of the distance. Then we know roughly how far the actual point is from the node we've selected.

There are other optimisations for this algorithm. One involves starting with the bounding box, counting the points, taking the standard deviation and creating a new bounding box, then iterating through this until one remains. A short paper can be read here (PDF format). Another takes a small box and enlarges it until candidate nodes are found, then uses Pythagoras' theorem again. There is an MSDN blog I found after writing my SQL Server 2005 solution, which was funnily enough only published today: Linkage here. This one is optimised for SQL Server 2008 Spatial, which I'd love to get my teeth into.

Change of plan...

Well, our really good idea was to use agents to roam the network and build a massive grid of Point to Point distances. After some thought, we abandoned that and learnt about Dijkstra's algorithm. Then we wrote our own implementation, which in C# on a decent box takes about 2500ms to find the distance (or time) from one point in Southwest England to every other point in the same area. This is absolutely perfect - we can cache the data against each static location and determine the approximate time to reach it from anywhere. If we make the assumption that we don't ever want to consider drivers further than 20 miles away, we're probably looking at a 10ms hit when we come to use it in reality.