Large-scale and high-resolution analysis of food purchases and health outcomes

Bibliographic Details
Published in:	EPJ data science 2019-12, Vol.8 (1), p.1-22, Article 14
Main Authors:	Aiello, Luca Maria; Schifanella, Rossano; Quercia, Daniele; Del Prete, Lucia
Format:	Article
Language:	eng
Subjects:	Adults; Calories; Carbohydrates; Cholesterol; Complexity; Computer Appl. in Social and Behavioral Sciences; Computer Science; Data-driven Science; Diabetes; Diabetes mellitus; Diet; Digital data; Digital purchase records; Food; Food analysis; Food composition; Food consumption; Health; Health surveillance; Hypertension; Loyalty cards; Metabolic disorders; Metabolic syndrome; Modeling and Theory Building; Nutrients; Nutrition; Regression analysis; Regression models; Regular Article; Signs and symptoms; Sugar

Description
Summary:	To complement traditional dietary surveys, which are costly and of limited scale, researchers have resorted to digital data to infer the impact of eating habits on people’s health. However, online studies are limited in resolution: they are carried out at country or regional level and do not capture precisely the composition of the food consumed. We study the association between food consumption (derived from the loyalty cards of the main grocery retailer in London) and health outcomes (derived from publicly available medical prescription records of all general practitioners in the city). The scale and granularity of our analysis are unprecedented: we analyze 1.6B food item purchases and 1.1B medical prescriptions for the entire city of London over the course of one year. By studying food consumption down to the level of nutrients, we show that nutrient diversity and amount of calories are the two strongest predictors of the prevalence of three diseases related to what is called the “metabolic syndrome”: hypertension, high cholesterol, and diabetes. This syndrome is a cluster of symptoms generally associated with obesity, is common across the rich world, and affects one in four adults in the UK. Our linear regression models achieve an R² of 0.6 when estimating the prevalence of diabetes in nearly 1000 census areas in London, and a classifier can identify (un)healthy areas with up to 91% accuracy. Interestingly, healthy areas are not necessarily well-off (income matters less than one would expect) and have distinctive features: they tend to systematically eat fewer carbohydrates and less sugar, diversify nutrients, and avoid large quantities. More generally, our study shows that analytics of digital records of grocery purchases can be used as a cheap and scalable tool for health surveillance and that, building on these records, different stakeholders, from governments to insurance companies to food companies, could implement effective prevention strategies.
ISSN:	2193-1127
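
Illustrative sketch (not part of the catalog record or the article): the summary names nutrient diversity as a predictor and reports an R² for a linear regression over census areas. The Python below shows one way such quantities could be computed. It assumes diversity is measured as normalized Shannon entropy over nutrient shares, which the summary does not state, and it fits a single-predictor least-squares line rather than the authors' actual models. All function names are hypothetical.

```python
import math


def nutrient_diversity(grams_by_nutrient):
    """Normalized Shannon entropy of nutrient shares, in [0, 1].

    Assumption: entropy is used as the diversity measure; the summary
    only says "nutrient diversity" without defining it.
    """
    total = sum(grams_by_nutrient.values())
    if total <= 0:
        return 0.0
    shares = [g / total for g in grams_by_nutrient.values() if g > 0]
    if len(shares) < 2:
        return 0.0  # a single nutrient carries no diversity
    entropy = -sum(p * math.log(p) for p in shares)
    # Divide by the maximum entropy for this number of nutrient categories.
    return entropy / math.log(len(grams_by_nutrient))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

In use, each census area would contribute one diversity value (computed from its aggregated purchases) and one disease-prevalence value, and `r_squared` would summarize how much of the variation in prevalence the fitted line explains. Any input values would have to come from the data sets described in the summary; none are reproduced here.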