In the previous post, we used term profiles to judge the effects of participation inequality in simulated collections of 100 photographs. In this post, we will do the same for Geograph, a real collection of photographs.

Geograph

Geograph is a collection of nearly six million photographs of Great Britain and Ireland. Each photograph is associated with a 1 km² grid square on either the British National Grid or the Irish Grid Reference System, a description, and several tags.

Term profiles

Following an article by Ross Purves, Alistair Edwardes, and Jo Wood, I constructed three term profiles, one each for the terms engine, hill, and road. Each term profile shows the percentage of photographs in each group that were tagged with the term.
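A term profile can be sketched in a few lines of Python. This is not the notebook's code, and the grouping scheme is an assumption: users are sorted by the number of photographs they contributed (most prolific first) and split into groups that hold roughly equal numbers of photographs. The names `group_users` and `term_profile`, and the toy data, are hypothetical.

```python
from collections import Counter

def group_users(photos, n_groups):
    """Map each user to a group index, most prolific users first.

    Assumption: users are sorted by photograph count (descending, ties
    broken by user id) and split into n_groups groups that hold roughly
    equal numbers of photographs. Very prolific users can leave some
    group indices empty.
    """
    counts = Counter(user for user, _ in photos)
    ranked = sorted(counts, key=lambda user: (-counts[user], user))
    target = sum(counts.values()) / n_groups
    groups = {}
    cumulative = 0
    for user in ranked:
        # A user's group is set by how many photographs precede them.
        groups[user] = min(int(cumulative // target), n_groups - 1)
        cumulative += counts[user]
    return groups

def term_profile(photos, term, n_groups):
    """Return {group index: percentage of the group's photographs tagged with term}."""
    groups = group_users(photos, n_groups)
    totals, tagged = Counter(), Counter()
    for user, tags in photos:
        group = groups[user]
        totals[group] += 1
        if term in tags:
            tagged[group] += 1
    # Only non-empty groups appear, so there is no division by zero.
    return {group: 100 * tagged[group] / totals[group] for group in sorted(totals)}

# Hypothetical toy data: (user id, set of tags) for each photograph.
photos = [
    ("a", {"road"}), ("a", {"road"}), ("a", {"hill"}), ("a", {"road"}),
    ("b", {"hill"}), ("b", {"road"}),
    ("c", {"engine"}), ("d", {"road"}),
]
print(term_profile(photos, "road", n_groups=2))  # {0: 75.0, 1: 50.0}
```

Here user a alone fills the first group, because a contributed half of the photographs; the three less prolific users share the second group.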

I wasn’t able to reconstruct the term profiles from the article, so we shouldn’t compare these term profiles with those in the article. Nevertheless, for engine and hill, there doesn’t seem to be a relationship between the percentage of photographs and ranked prolificness. For road, however, there does: the percentage of photographs is larger where ranked prolificness is smaller.
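The judgements above are made by eye. One way to put a number on them, swapped in here and not used in the post or the article, is Spearman's rank correlation between each group's rank and its percentage: values near zero suggest no monotonic relationship, and values near -1 suggest the percentage falls as the rank rises. The percentages below are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return covariance / (spread_x * spread_y)

# Hypothetical term profile: group rank (0 = most prolific) and percentage.
group_ranks = [0, 1, 2, 3, 4]
percentages = [12.0, 10.5, 9.0, 9.0, 7.5]
print(round(spearman(group_ranks, percentages), 3))  # -0.975
```

A rank correlation only detects monotonic trends, so it complements, rather than replaces, looking at the term profile itself.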

We can see the relationship for road more clearly when we compare each group’s z-score for road with the number of users in that group.

For roughly the first 75% of groups, the z-scores span a wider range and each group contains fewer users. For roughly the last 25% of groups, the z-scores span a narrower range and each group contains more users. This pattern suggests that the use of road is biased by a small number of prolific users.
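The comparison can be sketched as follows. The post doesn't say how the z-scores were computed, so this assumes each group's percentage is standardised against the mean and population standard deviation across all groups. The function names and the values for eight groups are hypothetical, with groups ordered from most to least prolific.

```python
from statistics import mean, pstdev

def z_scores(percentages):
    """Standardise each group's percentage against all groups.

    Assumption: z = (p - mean) / population standard deviation, taken
    across groups. Raises ZeroDivisionError if every percentage is equal.
    """
    mu = mean(percentages)
    sigma = pstdev(percentages)
    return [(p - mu) / sigma for p in percentages]

def compare_head_and_tail(percentages, users_per_group, head_fraction=0.75):
    """Compare z-score range and group size in the first and last groups."""
    z = z_scores(percentages)
    cut = round(len(z) * head_fraction)  # index of the first "tail" group
    head, tail = z[:cut], z[cut:]
    return {
        "head_z_range": max(head) - min(head),
        "tail_z_range": max(tail) - min(tail),
        "head_mean_users": mean(users_per_group[:cut]),
        "tail_mean_users": mean(users_per_group[cut:]),
    }

# Hypothetical values for eight groups, ordered from most to least prolific.
percentages = [14.0, 6.0, 12.0, 4.0, 13.0, 5.0, 9.0, 9.5]
users_per_group = [1, 1, 2, 2, 3, 4, 40, 400]
print(compare_head_and_tail(percentages, users_per_group))
```

With these made-up values, the first six groups (few users each) have a z-score range twenty times wider than the last two (many users each), which is the shape of the pattern described above: a group with few users reflects the tagging habits of those few users.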

Conclusion

I began the previous post by admitting that I couldn’t judge the effects of participation inequality using term profiles. However, by taking the technique apart and putting it back together again, I feel better able to detect the bias that results from a small number of users taking a large number of photographs.

I think it’s worth repeating that term profiles can be applied to any collection of documents, not just a collection of photographs; that is, they are a tool in the exploratory data analysis toolbox.

Code

For more information, see:

https://github.com/iaindillingham/geograph-sandbox/blob/v0.0.3/notebooks/Term%20Profiles.ipynb