Heading into the final day of the second Test between South Africa and England at Newlands, Cape Town, it now looks as though the match will end in a stalemate. The Test match has so far seen a remarkable number of records broken by the South African and English batsmen. To name a few:
- Temba Bavuma, 25, became the first black African to score a Test hundred for South Africa as he hit an unbeaten 102 against England in Cape Town.
- Day 2 saw Stokes' innings alone topple seven records, one of which dated back to a score posted by Jack Hobbs in 1910.
The eventual Test match result provides an incomplete picture: it is just one of thousands of Test matches. Taken alone, the result ignores the significance that each of these records holds. Fortunately, thanks to the match figures collected from scorecards and from publications such as 'Wisden Cricketers' Almanack' (colloquially referred to as "the Bible of Cricket"), which dates back to 1864, we can begin to comprehend the magnitude of each record and where it sits historically. We can therefore appreciate the complete picture of individual and team performances, as well as contextualise a match or series against its place in history.
When former England cricketer John Wisden first published his annual cricketing reference book in 1864 (making it the longest-running sporting annual in history), he perhaps would not have foreseen the long-term significance of collecting such data. However, the fact that these statistics are available enriches the value of each record broken, because it can be compared with a vast number of historical data points; again, this lets us contextualise each record's place in history.
Amid the hype surrounding 'Big Data', we see organisations attempt to build a universe of data without heeding the lesson of Wisden's publications. Wisden's collection of cricketing statistics allows its readership to appreciate the full picture of cricketing records. The publications present a connected set of data points that may not seem relevant to one another at first, but over time build a complete picture that can be referenced. This contrasts starkly with the big data trend in both the relevance of the data points and the completeness of the datasets. Forfeiting these factors and choosing to build an endless repository that includes irrelevant data endangers one's ability to view the whole picture, as it rapidly becomes clouded and muddied by unrelated data. Organisations should therefore focus their efforts on building complete, relevant suites of data, rather than a universe of unrelated data points.
That means internal and external data, structured and unstructured, not forgetting the outliers. Often ignored, these may well expose the fallacy of the theories you have built.
Returning to the current England v South Africa Test match: how relevant is Adelaide 2006 as a data set? England choked then, as Shane Warne gleefully points out today. And what good are Test match results without the weather data? It looks as if the weather, and not South Africa's bowlers, will now be the decider.
You can describe the past and present, and predict the future, only with complete data rather than big data.
On the face of it, 629-6 declared plays 627-7 declared isn't the greatest of Test matches, nor a particularly remarkable one. It may well fizzle out into a dull stalemate. Then again, there have already been some memorable moments...