Daniel Garton is a partner and Primrose Tay an associate at White & Case
The global volume of data created has grown at an incredible pace in recent years and is forecast to continue its rapid ascent. This has largely been fuelled by new technologies, including in the construction industry, which has introduced digital solutions such as building information modelling (BIM), artificial intelligence, robotics and other data-collection and monitoring technology.
When it comes to disputes, big data presents both a challenge and an opportunity for litigants seeking to make use of the data produced. While the cost of collecting, processing and reviewing large amounts of data is often at odds with a litigant’s desire to deal with cases proportionately, parties who manage to use data efficiently obtain a significant strategic advantage in the disputes process.
Against this background, there has been a growing acceptance among lawyers, arbitral tribunals and the courts of the use of data sampling where reviewing the full data set would be disproportionately costly.
Sampling is a means of finding out about the characteristics of a large population by looking at a subset of that population. The sample’s results are then extrapolated to the whole population.
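As a rough illustration of extrapolation, the short Python sketch below assumes a hypothetical population of 12,000 payment records, of which 300 are reviewed and 24 are found to be defective. The figures and the function name are invented for illustration and do not come from any real case.

```python
def extrapolate(population_size: int, sample_size: int, sample_hits: int) -> float:
    """Scale the count observed in the sample up to the whole population."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    if not 0 <= sample_hits <= sample_size:
        raise ValueError("sample_hits must be between 0 and sample_size")
    # The sample rate (hits / sample size) is applied to the full population.
    return sample_hits * population_size / sample_size


# Hypothetical figures: 24 defective records in a sample of 300 out of 12,000.
estimated_defects = extrapolate(12_000, 300, 24)
print(round(estimated_defects))  # 960
```

The estimate is only as good as the sample: if the 300 records are not representative of the 12,000, the extrapolated figure will be misleading.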
A key requirement of effective sampling is that the sample investigated must be representative of the relevant population as a whole. The main issue that prevents this is sampling bias. Put simply, bias is the difference between the sample results and the true characteristics of the population the sample is meant to represent. It can occur either on purpose – motivated by a desire to achieve a particular result – or inadvertently, due to a poorly thought-out methodology. Examples of common sources of bias include:
Non-random methods of drawing the sample – Broadly speaking, sampling can be carried out either by selecting samples at random or by specifically selecting samples based on certain characteristics. The courts have approved of the use of either method but actual, or perceived, bias is more likely when non-random methods are used, as subjective judgment is needed to select the sample.
Insufficient sample size – If your sample is too small, it can be hard to obtain reliable results. The gap between the sample results and the true position tends to narrow as the sample size increases, because the sample becomes more like the actual population.
Over/under inclusion of potential samples from the sampling frame – The sampling frame (the group that you draw your sample from) should be as close to the population as possible.
Random sampling is, therefore, the preferred method. In addition, the results of a well-constructed random sampling process can be supported with statistical concepts such as confidence intervals and margins of error, which demonstrate how accurate your results are likely to be by quantifying the uncertainty that is present in any sampling exercise.
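To make the idea of a margin of error more concrete, the Python sketch below draws a simple random sample from a hypothetical sampling frame of 12,000 record IDs and then computes an approximate 95 per cent confidence interval for an observed defect rate. The figures, the function names and the choice of the normal approximation are all assumptions made for illustration; a statistical expert may well prefer a different method.

```python
import math
import random


def draw_random_sample(frame, sample_size, seed):
    """Simple random sample without replacement; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)


def proportion_with_margin(hits, sample_size, z=1.96):
    """Return (estimate, margin of error) using the normal approximation.

    z=1.96 corresponds to an approximate 95 per cent confidence level.
    """
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


# Hypothetical sampling frame: record IDs 1 to 12,000.
frame = list(range(1, 12_001))
sample_ids = draw_random_sample(frame, 300, seed=2024)

# Invented review outcome: 24 of the 300 sampled records are defective.
p, margin = proportion_with_margin(24, 300)
print(f"Estimated rate: {p:.1%}, margin of error: +/- {margin:.1%}")
print(f"Approximate 95% interval: {p - margin:.1%} to {p + margin:.1%}")
```

Recording the seed and the sampling frame means the draw can be repeated and checked by the other side, which may help to answer any suggestion that the sample was hand-picked.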
If a sample is found to be unrepresentative of the population, the claim is likely to fail – and case law shows that the failure rate for claims based on sampling is high.
However, construction disputes are notoriously data-heavy, and sometimes sampling is the best way to bring the claim. Where it is, a statistical expert should be instructed from the outset to design or advise on the process.
It is important that the legal team works closely with the statistical expert to determine exactly what elements of the claim (such as breach, causation and/or quantum) may need to be supported by sampling evidence and what data is available in order to develop the most appropriate sampling methodology.