Data Curious 01.11.2018 - Visual Perception in data viz, exploring SFMOMA, and dropping bad data in Python

Ben Dexter Cooley

01 Nov 2018 — 3 min read

Hello. It’s another edition of Data Curious.

Last week I found some fascinating academic research into visual perception and data visualisation. It’s a field that is often ignored, which can be puzzling: how people see things should be the foundation of data visualisation design.

Some purists swear that a bar chart is always the best choice. Data artists play with less precise forms of representation, inventing new chart types that may “bend” the rules. Whatever your preference, I think it’s important to see what the research says about how people will see and understand your data. Just because you like it doesn’t mean they will get it.

And now to the good stuff.

Read_

Visual perception, sociological data, and do I really need to know stats to do data science?

How do people perceive and understand visualisations with their eyes?
I discovered this research paper by Steve Haroz through a recent Data Stories episode (a fantastic podcast about data viz). It explores how people understand visualisations, and which methods are more memorable than others. Disclaimer: Edward Tufte loyalists, prepare to be outraged.

Pictograms incoming →

How can I use data visualisation and art to explore sociological issues of the day?
Jason Forrest has been releasing a series of blog posts on Medium covering the work of W.E.B. Du Bois. He does so through the lens of UX, data visualisation and sociology. The most recent article makes for a fascinating read and features some inspirational hand-drawn graphics from Du Bois' collection.

Sounds cool →

Do I really need to know statistics to do data science?
Short answer: yes. But it doesn't have to be complicated. I found this article on "The 5 Basic Statistics Concepts Data Scientists Need to Know" to be a helpful start.

Ok fine, let's get nerdy →

Explore_

Overnight stays in national parks, student debt, popular subreddits, and what people need

How long do people stay in US National Parks?
This exploration of overnight visits to parks is absolutely beautiful. I love the colour choice in this data visualisation, as well as the radial line graphs that capture multiple variables in the data.

Let's go camping →

How bad is student debt across America?
The Washington Center for Equitable Growth created a new interactive map which shows the $1.5 trillion of student debt spread across America.

Explore →

How have subreddits grown and changed over the past 12 years? Or: what is the internet talking about?
Nathan Yau of Flowing Data has come out with a stellar new analysis on the growth in comments on subreddits.

/subredditdataviz→

You can receive free images from the SFMOMA archive: what are people asking for?
Data viz creative Shirley Wu recently published a collaboration with the SFMOMA. The visualisation uses generative art to show how people interacted with the Send me SFMOMA tool. I especially enjoyed reading her write-up on why she took this creative approach for the project.

Send me love →

Analyse_

Food deserts and nutrient food groups

Where are all the food deserts in the United States?
Dive into this dataset on data.world to explore.

Dry food, dry prospects →

What are the most nutrient-rich foods?
This comprehensive database from the USDA contains nutrient info on raw, processed and prepared foods. Could be interesting for some analysis on related food groups and nutrients.

Healthy by data →

Learn_

Data cleaning essentials and geospatial analysis

How can I quickly clean a dataset before analysing in Python?
"Drop that bad data like Obama drops mics." George Seif has got your back with a nice intro tutorial on cleaning data with pandas.

Drop the bad data →
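
If you want a feel for this kind of cleaning before clicking through, here is a minimal sketch. The toy data and the `clean` helper are my own invention for illustration, not taken from the tutorial; the sketch assumes "bad data" means empty columns, rows with missing values, and exact duplicates.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset (made up for this example).
raw = pd.DataFrame({
    "city": ["Leeds", "Leeds", "York", None, "Hull"],
    "visits": [120, 120, np.nan, 45, 80],
    "notes": [np.nan] * 5,  # a column with no data at all
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop empty columns, incomplete rows, and exact duplicate rows."""
    out = df.dropna(axis="columns", how="all")  # remove columns that are entirely missing
    out = out.dropna(axis="index", how="any")   # remove rows with any missing value
    out = out.drop_duplicates()                 # remove repeated rows
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Dropping every row with a missing value is the bluntest option. On a real dataset you may prefer to fill gaps or drop only when key columns are missing.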

How do I get started visualising geospatial data in Python?
I know I included a few mapping tutorials in the previous edition, but this one takes a step even further back. It's a really nice intro to working with Point data and plotting coordinates on top of a basemap using Geopandas and Shapely.

Map it out →
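
As a rough idea of the starting point, here is a minimal sketch of turning longitude/latitude columns into Point geometries. The place names and coordinates are hypothetical, and I am assuming the coordinates are plain WGS 84 (EPSG:4326); the tutorial itself may set things up differently.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical places with illustrative coordinates.
places = pd.DataFrame({
    "name": ["Place A", "Place B"],
    "lon": [-122.4009, -0.1276],
    "lat": [37.7857, 51.5072],
})

# Shapely Points take (x, y), which means (longitude, latitude).
geometry = [Point(lon, lat) for lon, lat in zip(places["lon"], places["lat"])]

# Assumption: the coordinates are WGS 84 longitude/latitude.
gdf = gpd.GeoDataFrame(places, geometry=geometry, crs="EPSG:4326")
print(gdf)
```

From here, points like these can be drawn over a basemap layer with `gdf.plot(ax=...)`, as long as both layers share the same coordinate reference system.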

Spread the love.
Share this edition of Data Curious.