One of the important abilities of a data analyst is asking questions. Questions are important because they can make a difference. Dan Rothstein of the Right Question Institute put it this way: questions are like flashlights throwing light on the path that one can take. Rothstein points out that questions not only stimulate thinking but can also steer and focus it. In his teaching, he asks students to brainstorm using only questions. According to Rothstein, this method opens the “lock gates” to the power of imagination: thoughts begin to flow in the form of questions, and students show more interest in a topic.

From my point of view, questions are the motor that sets things in motion. Smart questions are very powerful and can change a lot. I strongly believe that questions take us from not knowing to knowing.

Questions are essential to my job as a data analyst. Every analysis begins with questions. Questions have an impact on the story you want to tell with the data. The answers are secondary. The questions are more valuable than the answers!

In this blog article, I have collected the most common questions that I ask myself at every step of analyzing data. They come from my own experience, but also from the books I have read.

Exploratory Analysis

  • What is the topic of the data?
  • How much data do I have?
  • What is being measured, and by which dimensions is it broken down?
  • What time period does the data cover?
  • What are the most important metrics?
  • Do the metrics have a proper context?
  • What is “good” and what is “bad” in the context?
  • Is data available for each year, or are there time gaps? How many gaps are there, and how will they influence the data analysis?
  • What do all abbreviations mean?
  • Is it survey data? How many people participated?
  • Does it come from sensor measurements?
  • Is it a set of summary statistics or detailed transaction-level records?
  • Are there any geographical dimensions? Which countries are listed?
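
Several of these questions (how much data, which time period, which time gaps, which countries) can be answered with a few lines of code. The following is a minimal sketch in plain Python. It assumes the data is a list of records with hypothetical `year` and `country` fields; the sample values and the helper name `describe_coverage` are invented for illustration.

```python
# Hypothetical sample records; field names and values are invented.
records = [
    {"year": 2015, "country": "DE", "sales": 120},
    {"year": 2016, "country": "DE", "sales": 135},
    {"year": 2018, "country": "FR", "sales": 90},
    {"year": 2019, "country": "FR", "sales": 110},
]


def describe_coverage(rows, time_field="year"):
    """Return the row count, the covered period, and the missing years."""
    years = sorted({row[time_field] for row in rows})
    first, last = years[0], years[-1]
    # Every year between the first and last one without a record is a gap.
    gaps = [y for y in range(first, last + 1) if y not in years]
    return {"rows": len(rows), "period": (first, last), "gaps": gaps}


summary = describe_coverage(records)
countries = sorted({row["country"] for row in records})
print(summary)
print(countries)
```

In this sample, the year 2017 would be reported as a gap, which is exactly the kind of finding that should be noted before any trend is interpreted.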

It is also important to plot the data in order to find patterns and anomalies:

  • What does the correlation between the relevant dimensions look like?
  • What is the average of the relevant values? What do the median, quantiles, and outliers look like?
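
The two questions above can be checked numerically before plotting. Below is a minimal sketch using only Python's `statistics` module. The two series are invented, and flagging outliers with the 1.5 × IQR rule is one common convention chosen here as an assumption, not the only valid definition of an outlier.

```python
import statistics

# Hypothetical measurements; the values are invented for illustration.
ad_spend = [10, 12, 14, 16, 18, 20, 22, 60]
revenue = [20, 25, 27, 33, 35, 41, 44, 50]


def iqr_outliers(values):
    """Return values further than 1.5 * IQR away from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


print("mean:", statistics.fmean(ad_spend))
print("median:", statistics.median(ad_spend))
print("quartiles:", statistics.quantiles(ad_spend, n=4))
print("outliers:", iqr_outliers(ad_spend))
print("correlation:", round(pearson(ad_spend, revenue), 2))
```

In this sample the single large value pulls the mean (21.5) above the median (17), which is why it helps to look at both.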

Explanatory Analysis

  • Do I understand relevant measures and their context correctly?
  • Can I explain the positive/negative trend?
  • Can I explain the gaps in the data?
  • Which facts are available to explain the trends?

Second Step: Audience

Because you want to communicate your message to an audience, it is important to think about who they are and what they need. In the book “Storytelling with Data”, Cole Nussbaumer Knaflic writes: The more specific you can be about who your audience is, the better position you will be in for successful communication.

The following questions can help you understand your audience:

  • Who is your audience and which interests do they represent?
  • Are you creating analysis and visualizations for an unknown audience or is your audience inside your organization?
  • Ask yourself, if you were them, what would you want to know?
  • Which dimensions could be relevant for them?
  • Why is it worth providing a view of your data from “this” point of view and not “another” one?
  • Why is this angle of analysis likely to offer the most relevant and compelling window into the subject for your intended audience?
  • Does your audience have knowledge about the topic already?
  • How numerically literate are they?
  • Is it still relevant in light of the original curiosity? That is, have definitions evolved since you familiarised yourself with the data, learned about its potential qualities, and researched the subject at large?
  • How can I give more than the audience asked for? Are there additional insights in the data of which they may not be aware?

Third Step: Data Wrangling

Once you are familiar with the data and understand the context, it is time to clean the data. The data you use for visualization must be correct.

  • What KPIs are there?
  • Is there any guide for the KPIs? Do I understand their calculation?
  • Should I create any additional calculated fields?
  • How much variety do the fields have?
  • Is there a data hierarchy or should I build one?
  • What aggregation level is there in the data?
  • Are there any relationships between metrics?
  • What happens when I compare data across fields? Does one field affect another?
  • What data types are in the data set?
  • Are the names in the fields understandable? Do I understand them?
  • What range of values do the fields contain?
  • Is the data complete? How many null values do I have?
  • Is it possible to provide an accurate analysis given all the missing values in the data?
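
Questions about data types, value ranges, variety, and null values can be answered with a small profiling step. This is a minimal sketch in plain Python, assuming rows are dictionaries and a missing value is represented as `None`; the field names, the sample rows, and the helper `profile_fields` are invented for illustration.

```python
# Hypothetical raw rows; None marks a missing value. All names are invented.
rows = [
    {"region": "North", "units": 12, "price": 9.5},
    {"region": "South", "units": None, "price": 11.0},
    {"region": None, "units": 7, "price": None},
    {"region": "North", "units": 15, "price": 10.0},
]


def profile_fields(data):
    """Summarise nulls, data types, distinct values, and range per field."""
    fields = sorted({name for row in data for name in row})
    report = {}
    for name in fields:
        present = [row[name] for row in data if row.get(name) is not None]
        numeric = [v for v in present if isinstance(v, (int, float))]
        all_numeric = bool(numeric) and len(numeric) == len(present)
        report[name] = {
            "nulls": len(data) - len(present),
            "types": sorted({type(v).__name__ for v in present}),
            "distinct": len(set(present)),
            # A min/max range is only reported for purely numeric fields.
            "range": (min(numeric), max(numeric)) if all_numeric else None,
        }
    return report


report = profile_fields(rows)
for field, stats in report.items():
    print(field, stats)
```

A field that reports more than one type, or a range far outside what the KPI definition allows, is usually worth investigating before any chart is built.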

Fourth Step: Find a Story

Finally, you have clean data. Now it is time to play with different charts and find a story (or several stories). Think about how the data is presented and represented. Try out different dimensions and numbers and represent them in different shapes. Maybe you will find an interesting pattern in the data. To help you find a story in your data, consider the following:

  • Have you found any interesting trends in the data? Which ones did you notice?
  • Are there obvious outliers? (Be careful with outliers. Sometimes they are mistakes in the data.)
  • Are there interesting correlations between two measures?
  • Are there repetitive patterns in the data, such as seasonal spikes?
  • What do you need your audience to know?
  • Do I want to represent trends over time?
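
A question such as the one about seasonal spikes can be pre-checked in code before choosing a chart. The sketch below, in plain Python, averages values per calendar month and reports the months that clearly exceed the overall average. The observations are invented, and the threshold `factor=1.25` is an arbitrary assumption that should be tuned to the data.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (year, month, value) observations, invented for illustration.
observations = [
    (2021, 1, 100), (2021, 6, 105), (2021, 12, 180),
    (2022, 1, 110), (2022, 6, 95), (2022, 12, 200),
]


def seasonal_spikes(data, factor=1.25):
    """Return months whose average exceeds the overall average by `factor`."""
    by_month = defaultdict(list)
    for _year, month, value in data:
        by_month[month].append(value)
    overall = fmean([value for _, _, value in data])
    return {
        month: fmean(values)
        for month, values in sorted(by_month.items())
        if fmean(values) > factor * overall
    }


spikes = seasonal_spikes(observations)
print(spikes)
```

With this sample, only December is reported, because its average is well above the overall average in both years.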

After you have found your story, leave out the data you don’t need.

Fifth Step: Communicate Your Result

In this step, you communicate the data by visualizing it with different chart types. Keep in mind that your choice of graph becomes a basis for decision making. The graph should be understandable, truthful, and trustworthy.

These questions could be helpful if you want to communicate your results visually:

  • Is the title clear enough? Is my key message clear enough?
  • Is any context or annotation needed?
  • Did I specify the data source?
  • Did I name the author who inspired me to use this visualization type?
  • What could be clutter in my visualization?
  • Does every element on my display add value?
  • Which layout should I use (horizontal/vertical)?
  • On which device will my work be viewed?
  • Are there any specific guidelines when it comes to style?
  • How do I communicate with my audience? Do I need to confirm any additional information before presenting my work?
  • Did I use too much green or red? (Important for color-blind viewers.)
  • Did I use color sparingly, or is it too colorful?
  • Did I use the colors of the corporate design?
  • Is the important information in the foreground or has it been lost?

Some Notes on This Topic (Golden Rules)

If you’re wondering “What is the right graph for my situation?”, the answer is always the same: whatever will be easiest for your audience to read.

Cole Nussbaumer Knaflic

If everything in a visualization is shouting, nothing is heard; if everything is in the foreground, nothing stands out; if everything is large, nothing is dominant.

Andy Kirk

Sources: Andy Kirk: Data Visualisation; Cole Nussbaumer Knaflic: Storytelling with Data; Andy Kriebel, Eva Murray: #MakeoverMonday; Warren Berger: Die Kunst des klugen Fragens

Picture Source: https://bit.ly/2ZBigGY