Data Curious 25.10.2018 - Information is Beautiful, NYT Bestsellers and getting started with NLP

25.10.2018
Welcome to Data Curious. This week includes loads of visual inspiration.
The Information is Beautiful Awards voting came to a close last week, and I’m feeling simultaneously inspired and intimidated by the level of work. Each year new innovations emerge in the field of data visualisation and 2018 was no exception.
Time to get back to the grind.
See what you find interesting in this week’s roundup, and if you want me to focus on something new/different in this newsletter let me know here.
Read_
Machine learning explained and the visual interrogation of data
WTF even is Machine Learning?
It's ok to ask—I still ask most weeks. The trouble with new technologies is that they are plagued with jargon. Consider this article an intro-friendly, jargon-busting explainer.
"The purpose of a machine learning algorithm is to pick the most sensible place to put a fence in your data."
Enlighten me, no-jargon style →
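To make the "fence" idea concrete, here is a minimal sketch in plain Python. It is not taken from the linked article: the function name `best_fence` and the toy data are made up for illustration. It tries a fence at each midpoint between neighbouring values in one-dimensional, two-class data and keeps the one that misclassifies the fewest points.

```python
def best_fence(points):
    """Find a threshold ("fence") that best separates two classes.

    points: list of (value, label) pairs, where label is 0 or 1.
    Returns (threshold, errors), or None if there are fewer than two
    distinct values to put a fence between.
    """
    values = sorted({value for value, _ in points})
    # Candidate fences: midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    best = None
    for threshold in candidates:
        # Predict label 1 for anything above the fence, 0 otherwise,
        # and count how many points end up on the wrong side.
        errors = sum((value > threshold) != bool(label) for value, label in points)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical toy data: small values are class 0, large values are class 1.
toy_data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
print(best_fence(toy_data))
```

Real machine learning algorithms do the same kind of search over many dimensions and far more flexible fence shapes, but the goal of minimising mistakes is the same.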
How can I ask questions of my data using charts?
Nathan Yau wrote a blog post, and the title alone captured my fullest attention: "Ask the Question, Visualize the Answer". The article takes a single question and then visualises it 16 different ways to find the best fit. Dreamy.
Explore_
Info is Beautiful Shortlist, gender disparity in orchestras and trade flows around the world
Who made the best data visualisations of 2018?
The 2018 Information is Beautiful Awards voting has officially closed. Explore the Shortlist pieces below for some mind-melting data viz inspiration. I'd also encourage starting here, with an intro blog post on how the judges decide the winners of each category. It's an enlightening read on the question: what makes a good data visualisation?
Show me the best of the best →
What is the gender disparity like among top orchestra musicians?
Quartz published a data analysis looking at the gender split among full-time musicians. The resulting interactive beeswarm plot is illuminating: instruments like bassoon and trumpet are played almost entirely by men. Violinists, in contrast, are split close to 50/50, and even that makes the violin one of the most "female" instruments in the analysis.
How do countries around the world trade with each other?
Data visualisation expert Moritz Stefaner shared his latest work on Twitter last week. "Flows & Tolls" paints a fascinating picture of exports/imports across the world through an interactive flow map. I particularly like the step-through editorial at the bottom with story highlights.
Analyse_
What makes a good book and where people buy property
What genre of book stays on the NYT Bestsellers list the longest?
Avid reader? Dive into this dataset of bestsellers from 2011 to 2018 to find the recurring trends in top-selling books.
Well-read discoveries →
Where are people buying properties in the UK and for how much?
Here's one from the archives: the UK Price Paid Database. It includes information on all property sales in England and Wales going all the way back to 1995.
Learn_
Getting started with NLP and choropleths in Altair
I'd like to learn NLP...but how do I start?
This is one of the best starting points I have found for diving into some tutorials on NLP methods. It includes links to resources on sentiment analysis, tokenization, parts-of-speech tagging and more. Plus it's written by a dude from IBM's Natural Language Understanding team, so seems legit.
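For a taste of what tokenization means before opening those tutorials, here is a minimal sketch using only the Python standard library. It is an illustration, not code from the linked resource: the helper names `tokenize` and `top_tokens` are hypothetical, and the regex is a deliberately naive stand-in for what a proper NLP library would do.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    A token here is any run of letters, digits, or apostrophes.
    Dedicated NLP libraries handle many more edge cases.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def top_tokens(text, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)


sample = "Data is beautiful. Data is curious!"
print(tokenize(sample))
print(top_tokens(sample, 2))
```

Counting tokens like this is the starting point for many of the methods mentioned above, such as sentiment analysis and parts-of-speech tagging, which all begin by breaking text into pieces.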
How can I make a custom choropleth map using Altair in Python?
I've been working more with geographic data lately, so my bookmarks are full of map-related tutorials these days. Here's a nice one on how to create a choropleth in Altair from a custom shapefile. Not heard of Altair yet? It's a "declarative statistical visualization library for Python, based on Vega and Vega-Lite" and it is great.

Thanks for reading. More to come next week.
Ben_ say hi: twitter | medium | github