Unit testing with Commons HttpClient library

I want to write testable code and occasionally I bump into frameworks that make it challenging to unit test. Ideally I want to inject a service stub into my code then control the stub’s behavior based on my testing needs.
Commons Http Client from Jakarta facilitates integration with HTTP services but how to easily unit test code that depends on the HttpClient library? Turns out it’s not that hard.
I’ll cover both 1.3 and the newer 1.4 versions of the library since the older v1.3 is still widely used.
Here’s some typical service (HttpClient v1.3) we want to test. It returns the remote HTML page title:

public class RemoteHttpService {
   private HttpClient client;
   
   public String getPageTitle(String uri)  throws IOException {
     String contentHtml = fetchContent(uri);
     Pattern p = Pattern.compile("<title>(.*)</title>");
     Matcher m = p.matcher(contentHtml);
     if(m.find()) {
        return m.group(1);
     }
     return null;
   }

   private String fetchContent(String uri)  throws IOException {
      HttpMethod method = new GetMethod("http://blog.newsplore.com/" + uri);
      int responseStatus = client.executeMethod(method);
      if(responseStatus != 200) {
        throw new IllegalStateException("Expected HTTP response status 200 " +
"but instead got [" + responseStatus + "]");
      }
      byte[] responseBody = method.getResponseBody();
      return new String(responseBody, "UTF-8");
   }

   public void setHttpClient(HttpClient client) {
      this.client = client;
   }
}

with the HttpClient is injected at runtime (via some IoC container or explicitly).
To be able to unit-test this code we have to come-up with a stubbed version of the HttpClient and emulate the GET method.

Continue reading “Unit testing with Commons HttpClient library”

Spincloud, now with worldwide forecast

In my constant search for free weather data for Spincloud, a short while ago I have found a gem: free forecast data offered by the progressive Norwegian Meteorologic Institute. The long range forecast coverage is fairly thorough and covers most more than 2700 locations worldwide. I am happy to announce that I have extended spincloud.com to include it.

worldwide_fcast    

The data is refreshed every hour and the forecast range is available for the next seven days. You can bookmark any location or subscribe to weather reports via RSS.
On a different but related note, I can only be thankful for the free data offered by various meteorological organizations that allows Spincloud to exist and I believe in freeing public data (weather related and otherwise) as it belongs to the public that finances government and inter-government agencies in the first place. The Norwegian Met Institute is a great example for freeing its data and I am saluting them for making the right decision.
Now if only all such progressive Meteorological institutes around the world would agree on a common format for disseminating their data, it would make developers like me a tad happier…

Continuous everything?

I admit that I regard automation as a dull but vital part in the success of a project. Automation had evolved into Continuous Integration, a powerful toolset allowing frequent and regular building and testing of the code. I won’t get into what CI is (check the internets). Instead, I am going to explore a couple of aspects of CI that can be added to the artifacts of the development process and note some others that cannot.

Continuous performance
You wrote performance tests. You can run performance tests by firing a battery of tests from a client machine and targeted on an arbitrary environment where the application lives. The test results are collected on the client and you can publish them on a web server. Why not automating this completely then? To accomplish this, automate the execution, gathering and publishing of the performance results. Daily performance indicators not only increase visibility of the progress of the application but it becomes much easier to fix a performance degradation on a daily changeset than between two releases. There are a couple of factors that may add complexity to establishing performance tests:
Dealing with dependencies
The obvious rule of thumb is to minimize dependencies. However, if there still are dependencies on other (perhaps external) systems, use mocks and to isolate the system you’re testing for performance. We’re talking about nightly performance tests so don’t put unnecessary stress where you shouldn’t.
Finally, the main artifact of the final integration (done once per iteration) is a running environment where all components run together, in a production-like setup. Use this environment to run your system performance test where you measure the current performance against the baseline.
Measuring relative performance
The environment you’re using for the nightly PT cycle most likely will not be a perfect mirror of production (especially true when dealing with geographically distributed systems). Use common-sense to establish the ratio between the two environments then derive rough production performance numbers using it (assuming a linear CPU/Throughput relationship).

Continuous deployment
This is as simple as it sounds: automate the install. Make it dead easy to deploy the application in any environment by providing installation scripts. Simplify the configuration down to a single file that is self-documented and easily understood by non-programmers (read Operations Teams). The goal here is to unlock a powerful tool: making the application installable and upgradable with a click of a button. If all the other pieces of the continuum are in place then you could confidently deploy your application in production it on a much tighter release cycle, even on a daily basis. Deployment and integration become tasks in the background rather than first-class events.

More continuous
Since I am just fresh off the Agile Testing Days conference and I have learned a few more Cs from the distinguished speakers which I term as Soft Cs since they involve constant human engagement:
– Continuous learning (Declan Whelan)
– Continuous process improvement (Stuart Reid)
– Continuous acceptance testing (i.e. stakeholder signoff at the end of every sprint) – Anko Tijman
– Continuous customer involvement (Anko Tijman)

blog.newsplore.com is one year old!

Sep. 15 came and went and I didn’t realize that this blog is one year old. I didn’t post in the last few months but this is because I have been busy moving across the world from Toronto back to Europe (I’m in Berlin now), becoming a father (“the best job in the world”) and taking a new job (LBS, yes!).
Things are happening, wheels are in motion and there’s still a lot to write about.

Stay tuned!

New Spincloud feature: heat map overlay

heatmap-overlayIt took a while since the previous feature update to Spincloud. I have done a number of upgrades to the underlying tech and some intensive code refactoring but nothing visible. The time has come for another eye candy: heat maps. It is a map overlay that shows a color-translated temperature layer based on interpolated values of the current weather conditions. It gives a quick indication of the average temperature across all land masses where data is available.
Naturally in areas where data is more dense, the visual representation is better. The toughest job was to figure-out the interpolation math. I still have to refine the algorithm currently based on the inverse distance weighting algorithm which interpolates scattered points on a surface. Apparently a better one to use is kriging but I have yet to implement it.
The temperature map is generated from current temperatures and is updated once every hour. You can toggle the overlay by clicking the “Temp” button found on the top of the active map area.

A project triumvirate

There are three forces that shape a project: domain, process and technology. Add the “-driven” suffix to any of them and you’ll perhaps recognize some of the methods used in projects you’ve been involved in and yet, as soon as one takes too much of a lead against the other two, failure will follow almost inevitably. At the intersection of these three forces we find familiar terms and concepts but first a word or two about each of them:

Technology
Back in the days of the tech bubble, tech was allmighty. Buzzwords like Java, EJBs, PHP defined entire projects. It was the time where software became accessible to a much larger audience. The new wave of enthusiastic geeks embraced everything from new languages to professional certifications to the then-nascent open source. I admit having my share of technology-driven projects back in the day…

Process
Process brings structure and pace to a project. The two complementary components of a project process are methodology and integration. We are all too familiar with methodology: waterfall, RUP or agile methods of software development are vastly documented and practiced; integration largely is defect management, testing, build, deployment and documentation. Today they all come together in what is called continuous integration where all these concepts become interrelated in a repetitive process that produces accountability and visibility into the progress of a project. Process is also the one force that tends to disappear first after a project is finished.

Domain
The domain captures entities and business logic, is driven by the business requirements and is in no way influenced by the two other forces. The most successful way to employ the domain is through Domain Driven Design where the main focus of software development is neither technology nor process but the business requirements.

Continue reading “A project triumvirate”

Process Journalism really is Agile Journalism

MediaThere’s an interesting story rippling the news stream these days: The New York Times is questioning the ways Techcrunch reports on news. The crux of the issue is product journalism v. process journalism, the act of producing news only after all facts have been verified versus just writing the story based on what the author best knows about an event at that moment then let the story evolve and provide rigorous updates and corrections once it does.
As I was reading, I got an epiphany. The “old media” is following the old methods, much like the waterfall process where the story (the requirements) has to be fully known before proceeding with actually breaking the story (the implementation). The new media follows a radical approach: it starts with a lead (or rumor) that is initially published in a somewhat crude form, then the story evolves based on field feedback and facts drawn from the actual reality, until it becomes accurate. Even the language of the aforementioned references use words like “iterative process”, “transparent” and “standards” the same way I’d use them when talking about developing software.

This method of early release followed by incremental improvements resembles so much the agile methods of software development, I’m calling it Agile Journalism.

A revolution is happening in the media. Old methods are swiftly pushed aside; the emergence of the Internet as the news dissemination medium is fundamentally changing the ways the reality is reported upon. Who wants to read a story that is already obsolete or wrong altogether and impossible to be corrected/evolved until a new paper edition is off the press? The new media not only breaks news faster (release early) but gets the stories right eventually, simply because the new rules (standards) are clearly stating that until confirmed, the product -otherwise useful- is still a beta representation of the reality and the consumers already know how to consume such product.

From blogs to real time news streams, the new media’s aspiration is to bring the most accurate information yet to the consumers. It can and will do it by revolutionizing the way the stories are iteratively discovered based on constant feedback, the same way agile methods are revolutionizing the software industry.

Image courtesy hirefrank

Reviewing Google AppEngine for Java (Part 2)

In the first part I’ve left-off with some good news: successful deployment in the local GAE container. In this second part I’ll talk about the following:


Loading data and browsing
After finishing-off the first successful deployment, next on the agenda was testing out the persistence tier but for this I needed some data to work with in the GAE datastore.
Along with the local container Google makes available a local datastore but so far the only way to populate the datastore is described here and it uses python. Furthermore, currently there is one documented way to browse the local datastore using the local development console but this again requires the python environment. There’s a hint from Google though that there will be a data viewer for the Java SDK. In the mean time I voted up the feature request.

Back to bulk loading, after voting on the feature request to have a bulk uploader, I decided not to use the python solution but to handcraft a loader that will fill-in two tables: COUNTRY and COUNTRY_STATE (for federated countries like US and CAN). Since there’s no way to schedule worker threads (but is on it’s way) the only mechanism to trigger the data loading is via URLs. I’m using Spring 3.0 and so it wasn’t hard to create a new @Controller, some methods to handle data loading then map them to URLs:

@Controller
public class BulkLoadController {
	@RequestMapping("/importCountries")
	public ModelAndView importCountries() {
           ...
        }
	@RequestMapping("/importProvinces")
	public ModelAndView importProvinces() {
           ...
        }
}

Continue reading “Reviewing Google AppEngine for Java (Part 2)”

Upgrading to Spring 3.0.0.M3 and Spring Security 3.0.0.M1

A short two months back I posted an article describing how to upgrade to Spring 3.0 M2. Spring folks are releasing at breakneck speed and so I got busy again upgrading spincloud.com to Spring 3.0 M3 released at the beginning of May. Just yesterday (June 3rd) the team released Spring Security 3.0 M1 and I decided to roll this in Spincloud as well.

Upgrading Spring Security from 2.0.4 to 3.0.0 M1
For Spring Security 3.0.0 M1 I’m doing a “soft” upgrade since I had done my homework when I migrated from Acegi. I won’t use any of the new 3.0 features, just getting ready to use them.

To digress a bit, Spring Security a technology that is harder to swallow due to its breadth. To simplify the picture, Spring Security (former Acegi) provides three major security concerns to enterprise applications:
– Authorization of method execution (either through standard JSR-250 JEE security annotations or via specific annotation-based method security).
– HTTP Request authorization (mapping URLs to accessible roles using Ant style or regexp’ed paths, dubbed channel security).
– Website authentication (integrating Single Sign On (SSO) services by supporting major SSO providers, HTTP BASIC authentication, OpenId authentication and other types).
For Spincloud I’m using OpenId for authentication and Channel Security to secure website pages.
Continue reading “Upgrading to Spring 3.0.0.M3 and Spring Security 3.0.0.M1”

Proposal to standardize the Level 2 query cache configuration in JPA 2.0

Level 2 cache is one of the most powerful features of the JPA spec. It’s a transparent layer that manages out-of-session data access and cranks-up the performance of the data access tier. To my knowledge it has been first seen in Hibernate and was later adopted by the then-emerging JPA spec (driven mostly by the Hibernate guys back in the day).
As annotations gained strength and adoption, L2 caching that was initially configured through XML or properties files, was brought closer to the source code, alas in different forms and shapes. This becomes apparent if you ever have to deploy your webapp across a multitude of containers as you have to painfully recode the cache configuration (or worse, hand-coded cache access). Why not standardizing the cache control in JPA? This seems to be simple enough to achieve and yet it isn’t there. Now that JPA 2.0 is standardizing on Level 2 cache access (See JSR 317 section 6.10) it is the natural thing to do.
Every JPA provider has its own way of specifying cache access (both Entity and query cache).
To grasp the extent of the various ways cache is configured, here are some examples:

Hibernate:
Cache control for entities

@Entity
@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
public class Employee {...

Cache control for named queries:

@javax.persistence.NamedQuery(name="findEmployeesInDept",
query="select emp from Employee emp where emp.department = ?1",
hints={@QueryHint(name="org.hibernate.cacheable",value="true")})

or

@org.hibernate.annotations.NamedQuery(cacheable=true, cacheRegion="employeeRegion")

OpenJPA
Cache control for entities

@Entity
@org.apache.openjpa.persistence.DataCache(timeout=10000)
public class Employee {...

Query cache requires hand coded cache access:

OpenJPAEntityManagerFactory oemf = OpenJPAPersistence.cast(emf);
QueryResultCache qcache = oemf.getQueryResultCache();

Continue reading “Proposal to standardize the Level 2 query cache configuration in JPA 2.0”