
Towards An Open Database of Places: Location Autodiscovery

Aug. 20th 2010 in software No Comments       

A short while back I read a challenging article titled It’s Time For An Open Database Of Places. There, Erick Schonfeld notes:

A long list of companies including Twitter, Google, Foursquare, Gowalla, SimpleGeo, Loopt, and Citysearch are far along in creating separate databases of places mapped to their geo-coordinates. These efforts at creating an underlying database of places are duplicative, and any competitive advantage any single company gets from being more comprehensive than the rest will be short-lived at best. It is time for an open database of places which all companies and developers can both contribute to and borrow from.

I agree that there is duplication of effort, but this is what happens with many competitive technologies (look at how many online maps are available today). Each company tries to add a competitive advantage to its offering while providing the same core functionality as the competition.
Update: I started this post back in April, and many recent developments only reinforce this point. (Check Facebook Places and Google Places for more info.)

I like the idea of an open database of places. Any company could build value-added services on top of it and sell them without worrying about the issues that come with building and maintaining such a database, like geo-location/address accuracy and duplicate place resolution, to name just a few. TechCrunch’s Schonfeld adds another issue: who can update a place and who should be in control of it, suggesting that anybody can update the database and “the best data should prevail”. This is hard and suggests a wiki-like approach, for better or worse.
I’m not a fan of centralizing such a database. Since there are such great market forces at play, it may become a playground for fights (my data is better than yours), or a committee will attempt to regulate it, only to push it into oblivion while everybody takes their toys and goes off to build their own database.

I have a different idea (and it’s not new either).

Businesses have a great deal of interest in such a database. It puts them on the map. They don’t particularly care who is using their place data as long as the data about their business is correct and their customers can easily reach their venue. Using mobile routing software to get to a place in the real world is the equivalent of not waiting more than four seconds for a web page to load: it just has to route the customer precisely to the location.

Why not let businesses own their own geo data? All it takes is for them to have a website and add a bit of information to it to allow for auto-discovery; it’s called geotagging. It’s the same idea Matt Griffith had back in 2002 for RSS feed autodiscovery, applied to geo. The real win is for small businesses that adopt geotagging. All they need to do is add a small bit of metadata to their homepage and let web indexers do the job of collecting this data. Oh, and it’s free.
This brings a double win: companies in the mapping business get access to accurate geo information about businesses, and the businesses themselves are happy that their customers can precisely find their physical location by means of address and/or geo-coordinates. Moreover, the accuracy of the data is maintained by the businesses, since they want their customers to find them even when they move. A Places database that aggregates this type of data can mark these places as “verified” since they come directly from merchants. It even provides a more accurate basis for building forward and reverse geocoding tools.
Going forward with this model, competitors will shift their efforts from building a database of places to adding value on top of a (more or less) common Places database, like local promotions, and to building great mapping products that allow us, the customers, to find these places.
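
To make the auto-discovery step a bit more concrete, here is a minimal sketch of what an indexer could do with a geotagged homepage. It assumes the business uses the commonly seen `geo.position` meta tag with a "latitude;longitude" value; the class name `GeoTagReader` and the sample page are made up for illustration, and the regex only handles the attribute order shown (a real indexer would use a proper HTML parser).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads a geo.position meta tag from a page's HTML. */
final class GeoTagReader {
    // Assumed tag shape: <meta name="geo.position" content="lat;lon">
    private static final String NUMBER = "(-?\\d+(?:\\.\\d+)?)";
    private static final Pattern GEO_POSITION = Pattern.compile(
        "<meta\\s+name=[\"']geo\\.position[\"']\\s+content=[\"']\\s*"
            + NUMBER + "\\s*;\\s*" + NUMBER + "\\s*[\"']",
        Pattern.CASE_INSENSITIVE);

    /** Returns {latitude, longitude}, or empty if the tag is missing or out of range. */
    static Optional<double[]> readPosition(String html) {
        Matcher m = GEO_POSITION.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        double lat = Double.parseDouble(m.group(1));
        double lon = Double.parseDouble(m.group(2));
        if (Math.abs(lat) > 90 || Math.abs(lon) > 180) {
            return Optional.empty(); // not a valid coordinate pair
        }
        return Optional.of(new double[] {lat, lon});
    }

    public static void main(String[] args) {
        // Sample homepage fragment (illustrative values only).
        String page = "<head><meta name=\"geo.position\" content=\"52.5200;13.4050\"></head>";
        readPosition(page).ifPresent(p ->
            System.out.println("lat=" + p[0] + ", lon=" + p[1]));
    }
}
```

The business controls the tag, so updating the page when the shop moves is all it takes for the next crawl to pick up the new coordinates.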

The hard part is promoting this model. If, say, half of the brick-and-mortar small businesses with a web presence embed geo metadata on their websites, then the big players will take notice. How to get there is the real challenge.

Image via Flickr/bryankennedy

RESTful error handling with Tomcat and SpringMVC 3.x

Aug. 4th 2010 in java, software, spring No Comments       

Handling errors in a RESTful way seems simple enough: when a request for a resource fails, the service should return a proper status code and a body that contains a parseable message, using the content type of the request.
The default error pages in Tomcat are ugly. Not only do they expose too much of the server internals, they are also HTML-only, which makes them a poor choice if a RESTful web service is deployed in that Tomcat container. Substituting simple static pages for them is still not enough, since I want a dynamic response containing error information.
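
As a rough illustration of the kind of dynamic response meant here (not the steps from the full post), the sketch below renders an error body in the format the client asked for. The class `ErrorBodyRenderer`, the field names `status` and `message`, and the choice to read the format from the Accept header are all assumptions made for this example; escaping is deliberately simplified.

```java
import java.util.Locale;

/** Hypothetical sketch: builds a parseable error body as JSON or XML. */
final class ErrorBodyRenderer {

    /** Picks XML only when the client asks for XML and not JSON; JSON is the assumed default. */
    static String contentTypeFor(String acceptHeader) {
        String accept = acceptHeader == null ? "" : acceptHeader.toLowerCase(Locale.ROOT);
        boolean wantsXml = accept.contains("xml") && !accept.contains("json");
        return wantsXml ? "application/xml" : "application/json";
    }

    /** Renders the status code and message in the negotiated format. */
    static String render(int status, String message, String acceptHeader) {
        if ("application/xml".equals(contentTypeFor(acceptHeader))) {
            return "<error><status>" + status + "</status><message>"
                + escapeXml(message) + "</message></error>";
        }
        return "{\"status\":" + status + ",\"message\":\"" + escapeJson(message) + "\"}";
    }

    // Simplified escaping: covers the characters used in this sketch only.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(render(404, "No such user", "application/json"));
        System.out.println(render(404, "No such user", "application/xml"));
    }
}
```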

Here’s how to do it in 3 simple steps:

Read on…

Building a content aggregation service with node.js

Jun. 27th 2010 in software 2 Comments       

Fetching, aggregating and transforming data for delivery is a seemingly complex task. Imagine a service that serves aggregated search results from Twitter, Google and Bing where the response has to be tailored for mobile and web. One has to fetch data from different sources, parse and compose the results then transform them into the right markup for delivery to a specific client platform.
To cook this I’ll need:
- a web server
- a nice way to aggregate web service responses (pipelining would be nice)
- a component to transform the raw aggregated representation into a tailored client response.

I could take a stab at it and use Apache/Tomcat, Java (using Apache HttpClient 4.0), a servlet dispatcher (Spring WebMVC) and Velocity templating but it sounds too complex.

Enter Node.js. It’s an event-based web server built on Google’s V8 engine. It’s fast, it’s scalable, and you develop on it using the familiar JavaScript.
While Node.js is still new, the community has built a rich ecosystem of extensions (modules) that greatly ease the pain of using it. If you’re unfamiliar with the technology, check out the Hello World example; it should get you started.
Back to the task at hand, here are the modules I’ll need:
- Restler to get me data.
- async to allow parallelizing requests for effective data fetching.
- Haml-js for view generation

Read on…

Using Spring 3.0 MVC for RESTful web services (rebuttal)

Feb. 23rd 2010 in java, software, spring 4 Comments       

Update Mar. 04: Thanks to @ewolff, some of the points described below are now official feature requests. One (SPR-6928) is actually scheduled for Spring 3.1 (cool!). I’ve updated the post and added all open tickets. Please vote!

This post is somewhat a response to InfoQ’s Comparison of Spring MVC and JAX-RS.
Recently I have completed a migration from a JAX-RS implementation of a web service to Spring 3.0 MVC annotation-based @Controllers. The aforementioned post on InfoQ was published a few days after my migration so I’m dumping below the list of problems I had, along with solutions.

Full list of issues:


Same relative paths in multiple @Controllers not supported
Consider two Controllers where I use a versioned URL and a web.xml file that uses two URL mappings:

@Controller
public class AdminController {
   // Same relative path as in UserController below
   @RequestMapping("/v1/{userId}")
   public SomeResponse showUserDetails(@PathVariable("userId") String userId) {
      ...
   }
}

@Controller
public class UserController {
   @RequestMapping("/v1/{userId}")
   public SomeOtherResponse showUserStream(@PathVariable("userId") String userId) {
      ...
   }
}
In web.xml:
	<servlet-mapping>
	  <servlet-name>public-api</servlet-name>
	  <url-pattern>/public</url-pattern>
	</servlet-mapping>
	<servlet-mapping>
	  <servlet-name>admin-api</servlet-name>
	  <url-pattern>/admin</url-pattern>
	</servlet-mapping>

Read on…

Unit testing with Commons HttpClient library

Feb. 9th 2010 in java, software 2 Comments       

I want to write testable code and occasionally I bump into frameworks that make it challenging to unit test. Ideally I want to inject a service stub into my code then control the stub’s behavior based on my testing needs.
Commons HttpClient from Jakarta facilitates integration with HTTP services, but how do you easily unit test code that depends on the HttpClient library? Turns out it’s not that hard.
I’ll cover both 3.1 and the newer 4.x versions of the library since the older v3.1 is still widely used.
Here’s a typical service (HttpClient v3.1) we want to test. It returns the remote HTML page title:

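import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;

public class RemoteHttpService {
   private HttpClient client;

   // Returns the text of the page's <title> element, or null if there is none.
   public String getPageTitle(String uri) throws IOException {
     String contentHtml = fetchContent(uri);
     Pattern p = Pattern.compile("<title>(.*)</title>");
     Matcher m = p.matcher(contentHtml);
     if (m.find()) {
        return m.group(1);
     }
     return null;
   }

   // Fetches the page body with a GET; any status other than 200 is an error.
   private String fetchContent(String uri) throws IOException {
      HttpMethod method = new GetMethod("http://blog.newsplore.com/" + uri);
      try {
         int responseStatus = client.executeMethod(method);
         if (responseStatus != 200) {
            throw new IllegalStateException("Expected HTTP response status 200 " +
               "but instead got [" + responseStatus + "]");
         }
         byte[] responseBody = method.getResponseBody();
         return new String(responseBody, "UTF-8");
      } finally {
         // Always hand the connection back to the connection manager.
         method.releaseConnection();
      }
   }

   public void setHttpClient(HttpClient client) {
      this.client = client;
   }
}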

The HttpClient is injected at runtime (via some IoC container or explicitly).
To be able to unit test this code we have to come up with a stubbed version of the HttpClient and emulate the GET method.
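
Before getting to HttpClient itself, the general idea of injecting a stub and steering its behavior can be sketched without any library. This is not the HttpClient stub from the full post: `ContentFetcher`, `PageTitleService` and `StubFetcher` are hypothetical names, and the interface is only a stand-in for the HTTP call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical seam standing in for the HTTP call; not part of HttpClient. */
interface ContentFetcher {
    String fetch(String uri);
}

/** Title extraction like the service above, with the fetch step injected. */
final class PageTitleService {
    private static final Pattern TITLE = Pattern.compile("<title>(.*)</title>");
    private final ContentFetcher fetcher;

    PageTitleService(ContentFetcher fetcher) {
        this.fetcher = fetcher;
    }

    String getPageTitle(String uri) {
        Matcher m = TITLE.matcher(fetcher.fetch(uri));
        return m.find() ? m.group(1) : null;
    }
}

/** Stub controlled by the test: returns a canned body or simulates a bad status. */
final class StubFetcher implements ContentFetcher {
    private final String body;
    private final int status;
    String lastUri; // recorded so a test can check what was requested

    StubFetcher(String body, int status) {
        this.body = body;
        this.status = status;
    }

    @Override
    public String fetch(String uri) {
        lastUri = uri;
        if (status != 200) {
            throw new IllegalStateException("Expected HTTP response status 200 "
                + "but instead got [" + status + "]");
        }
        return body;
    }
}

final class PageTitleServiceDemo {
    public static void main(String[] args) {
        StubFetcher stub = new StubFetcher("<html><title>Hello</title></html>", 200);
        System.out.println(new PageTitleService(stub).getPageTitle("index.html"));
    }
}
```

The test decides what the "remote" side returns, so both the happy path and the error path can be exercised without a network.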

Read on…

Spincloud, now with worldwide forecast

Dec. 13th 2009 in projects, software No Comments       

In my constant search for free weather data for Spincloud, a short while ago I found a gem: free forecast data offered by the progressive Norwegian Meteorological Institute. The long-range forecast coverage is fairly thorough and covers more than 2700 locations worldwide. I am happy to announce that I have extended spincloud.com to include it.

The data is refreshed every hour and the forecast range is available for the next seven days. You can bookmark any location or subscribe to weather reports via RSS.
On a different but related note, I can only be thankful for the free data offered by various meteorological organizations that allows Spincloud to exist and I believe in freeing public data (weather related and otherwise) as it belongs to the public that finances government and inter-government agencies in the first place. The Norwegian Met Institute is a great example for freeing its data and I am saluting them for making the right decision.
Now if only all such progressive Meteorological institutes around the world would agree on a common format for disseminating their data, it would make developers like me a tad happier…

Continuous everything?

Oct. 15th 2009 in Agile, software No Comments       

I admit that I regard automation as a dull but vital part of the success of a project. Automation has evolved into Continuous Integration, a powerful toolset allowing frequent and regular building and testing of the code. I won’t get into what CI is (check the internets). Instead, I am going to explore a couple of aspects of CI that can be added to the artifacts of the development process and note some others that cannot.

Continuous performance
You wrote performance tests. You can run them by firing a battery of tests from a client machine at an arbitrary environment where the application lives. The test results are collected on the client and you can publish them on a web server. Why not automate this completely, then? To accomplish this, automate the execution, gathering and publishing of the performance results. Daily performance indicators not only increase visibility into the progress of the application, they also make it much easier to fix a performance degradation: tracking it down in a daily changeset is far simpler than between two releases. There are a couple of factors that may add complexity to establishing performance tests:
Dealing with dependencies
The obvious rule of thumb is to minimize dependencies. However, if there still are dependencies on other (perhaps external) systems, use mocks to isolate the system you’re testing for performance. We’re talking about nightly performance tests, so don’t put unnecessary stress where you shouldn’t.
Finally, the main artifact of the final integration (done once per iteration) is a running environment where all components run together, in a production-like setup. Use this environment to run your system performance test where you measure the current performance against the baseline.
Measuring relative performance
The environment you’re using for the nightly PT cycle most likely will not be a perfect mirror of production (especially true when dealing with geographically distributed systems). Use common sense to establish the capacity ratio between the two environments, then use it to derive rough production performance numbers (assuming a linear CPU/throughput relationship).
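
The arithmetic behind this is simple enough to sketch. The helper below is only an illustration: the class name, the 10% tolerance and all figures in `main` are made-up values, and the scaling relies on the same linear assumption stated above.

```java
/** Hypothetical helpers for nightly performance numbers. */
final class RelativePerformance {

    /**
     * Scales a throughput measured on the test environment to a rough
     * production estimate, assuming throughput grows linearly with capacity.
     * capacityRatio = production capacity / test environment capacity.
     */
    static double estimateProductionThroughput(double measuredThroughput, double capacityRatio) {
        return measuredThroughput * capacityRatio;
    }

    /** True when the current figure fell more than the tolerated fraction below the baseline. */
    static boolean isRegression(double baseline, double current, double tolerance) {
        return current < baseline * (1.0 - tolerance);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 120 req/s measured, production assumed 4x larger.
        double estimate = estimateProductionThroughput(120.0, 4.0);
        System.out.println("Estimated production throughput: " + estimate + " req/s");
        System.out.println("Regression vs. baseline of 140 req/s: "
            + isRegression(140.0, 120.0, 0.10));
    }
}
```

Comparing each night’s figure against the baseline on the same environment sidesteps the ratio entirely; the ratio is only needed when translating to production numbers.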

Continuous deployment
This is as simple as it sounds: automate the install. Make it dead easy to deploy the application in any environment by providing installation scripts. Simplify the configuration down to a single file that is self-documented and easily understood by non-programmers (read Operations Teams). The goal here is to unlock a powerful tool: making the application installable and upgradable with the click of a button. If all the other pieces of the continuum are in place, then you could confidently deploy your application to production on a much tighter release cycle, even on a daily basis. Deployment and integration become tasks in the background rather than first-class events.

More continuous
I am fresh off the Agile Testing Days conference, where I learned a few more Cs from the distinguished speakers. I call them Soft Cs since they involve constant human engagement:
- Continuous learning (Declan Whelan)
- Continuous process improvement (Stuart Reid)
- Continuous acceptance testing, i.e. stakeholder signoff at the end of every sprint (Anko Tijman)
- Continuous customer involvement (Anko Tijman)

blog.newsplore.com is one year old!

Oct. 3rd 2009 in software No Comments       

Sep. 15 came and went and I didn’t realize that this blog is one year old. I didn’t post in the last few months but this is because I have been busy moving across the world from Toronto back to Europe (I’m in Berlin now), becoming a father (“the best job in the world”) and taking a new job (LBS, yes!).
Things are happening, wheels are in motion and there’s still a lot to write about.

Stay tuned!

New Spincloud feature: heat map overlay

Jul. 7th 2009 in java, projects No Comments       

It has been a while since the previous feature update to Spincloud. I have done a number of upgrades to the underlying tech and some intensive code refactoring, but nothing visible. The time has come for more eye candy: heat maps. It is a map overlay that shows a color-translated temperature layer based on interpolated values of the current weather conditions. It gives a quick indication of the average temperature across all land masses where data is available.

Naturally, in areas where data is more dense, the visual representation is better. The toughest job was to figure out the interpolation math. I still have to refine the algorithm, which is currently based on inverse distance weighting, a method that interpolates scattered points on a surface. Apparently a better one to use is kriging, but I have yet to implement it.
The temperature map is generated from current temperatures and is updated once every hour. You can toggle the overlay by clicking the “Temp” button found on the top of the active map area.
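
For readers unfamiliar with inverse distance weighting, here is a minimal sketch of the basic method: each known sample contributes to the estimate with a weight of 1 / distance^power. This is not Spincloud’s implementation. The class name is made up, the coordinates are treated as planar (a real weather map would likely need great-circle distances), and the sample values in `main` are invented.

```java
/** Minimal inverse distance weighting (IDW) sketch over planar points. */
final class InverseDistanceWeighting {

    /**
     * Estimates the value at (x, y) from scattered samples.
     * xs/ys hold the sample coordinates, values the measurements;
     * power controls how quickly a sample's influence fades with distance.
     */
    static double interpolate(double[] xs, double[] ys, double[] values,
                              double x, double y, double power) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < values.length; i++) {
            double distance = Math.hypot(x - xs[i], y - ys[i]);
            if (distance == 0.0) {
                return values[i]; // query point sits exactly on a sample
            }
            double weight = 1.0 / Math.pow(distance, power);
            weightedSum += weight * values[i];
            weightTotal += weight;
        }
        return weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        // Two invented temperature readings; estimate a point between them.
        double[] xs = {0.0, 2.0};
        double[] ys = {0.0, 0.0};
        double[] temps = {10.0, 20.0};
        System.out.println(interpolate(xs, ys, temps, 0.5, 0.0, 2.0));
    }
}
```

Because every sample influences every estimate, sparse regions end up dominated by a few distant stations, which is consistent with the overlay looking better where data is dense.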

A project triumvirate

Jul. 5th 2009 in Agile, opinions, projects No Comments       

There are three forces that shape a project: domain, process and technology. Add the “-driven” suffix to any of them and you’ll perhaps recognize some of the methods used in projects you’ve been involved in and yet, as soon as one takes too much of a lead against the other two, failure will follow almost inevitably. At the intersection of these three forces we find familiar terms and concepts but first a word or two about each of them:

Technology
Back in the days of the tech bubble, tech was almighty. Buzzwords like Java, EJBs and PHP defined entire projects. It was the time when software became accessible to a much larger audience. The new wave of enthusiastic geeks embraced everything from new languages to professional certifications to the then-nascent open source. I admit having my share of technology-driven projects back in the day…

Process
Process brings structure and pace to a project. The two complementary components of a project process are methodology and integration. We are all too familiar with methodology: waterfall, RUP or agile methods of software development are vastly documented and practiced; integration largely is defect management, testing, build, deployment and documentation. Today they all come together in what is called continuous integration where all these concepts become interrelated in a repetitive process that produces accountability and visibility into the progress of a project. Process is also the one force that tends to disappear first after a project is finished.

Domain
The domain captures entities and business logic, is driven by the business requirements and is in no way influenced by the two other forces. The most successful way to employ the domain is through Domain Driven Design where the main focus of software development is neither technology nor process but the business requirements.

Read on…
