For the last day and a half or so I've been writing statistical analysis tools that compare my list of known locations (the ones I have geolocation data for) against all of the free-text locations that have been entered. They decide on the correct spelling based on what's “valid” and what's most commonly used, and they also try to statistically extract the right keywords from longer entries to choose the most likely intended match.
It's still far from perfect, though it's already generating a useful alias table. I think I'm going to have to take a break from it or this day is going to end Scanners-style… Just a little more fine tuning and it's done, and then I can write similar tools for studio and artist names.
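
For anyone curious what the spelling part looks like, here is a minimal sketch in Python. It is not the actual tool: the function names, the `cutoff` value, and the use of `difflib` for fuzzy matching are all assumptions made for illustration, and it skips the keyword extraction step entirely. The idea is just that a known location wins when one is close enough, and otherwise the most frequently entered similar spelling is used.

```python
from collections import Counter
from difflib import get_close_matches


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def build_alias_table(known_locations, raw_entries, cutoff=0.8):
    """Map each distinct (normalized) raw entry to a canonical spelling.

    Hypothetical helper: known ("valid") locations take priority; failing
    that, the most commonly entered similar spelling is chosen.
    """
    known = {normalize(name): name for name in known_locations}
    counts = Counter(normalize(entry) for entry in raw_entries)
    aliases = {}
    for entry in counts:
        # Exact hit on a known location after normalization.
        if entry in known:
            aliases[entry] = known[entry]
            continue
        # Fuzzy match against the known locations (cutoff is a guess).
        match = get_close_matches(entry, list(known), n=1, cutoff=cutoff)
        if match:
            aliases[entry] = known[match[0]]
            continue
        # No valid match: fall back to the most common similar spelling
        # among the entries. The entry always matches itself, so this
        # list is never empty. The result stays in normalized form.
        similar = get_close_matches(entry, list(counts), n=5, cutoff=cutoff)
        aliases[entry] = max(similar, key=lambda s: counts[s])
    return aliases
```

Calling `build_alias_table(["Toronto"], ["Tornto", "toronto"])` maps both entries to `"Toronto"`. Entries with no close known location simply collapse onto whichever similar spelling shows up most often.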