Data visualization is an important factor in conveying your research to your team members, especially if you work in a corporate setting. Your visualizations should therefore convey truthful data and be understandable to people who do not know the dataset you are working on.

When you work for a business, most of the people you present your data to will not be interested in a complex model, even if it conveys a lot of important information. They will prefer a simpler, segmented visualization that is easy to understand and presents only the information that interests or concerns them.

Hence, it is very important to know the aspects of a good visualization and to apply your design choices in a consistent and appealing manner. Here are a few principles that you should follow in order to communicate your data to your audience efficiently and professionally. These principles were proposed by Alberto Cairo.

  1. Principle of Truthfulness
    The visualization should be truthful and should not mislead anyone. Take particular care when filtering or cleaning the data: deleting information that you think is unimportant can change the final result, making one feature look far superior to the others and misleading both your audience and yourself.
  2. Principle of Functionality
    This principle is extremely important for effective communication of data. It covers choices such as “Which data do you want to present?” and “Which type of chart suits your data?”. Your graphs should be functional so that they communicate useful data and do so efficiently.
    This also includes removing plot junk, that is, parts of the visualization that add no information or that look complex and distract the viewer from the data. You are better off removing such features.
  3. Principle of Beauty
    This principle depends largely on your audience and your data. Choose presentation factors such as fonts, font sizes, colors and backgrounds, and sometimes additional factors such as patterns and styles, to suit the audience you are presenting to. An important part of this principle is staying consistent in your design and fonts and keeping them as easy to read as possible.
  4. Principle of Insightfulness
    Your audience may not have enough time to go through your visualization completely, and may decide what to read or focus on based on the insight they get while glancing over the plots. Your visualization should give enough useful insight into the data it contains at a glance, with more for the viewer to discover if they decide to look closely.
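
The truthfulness principle can be checked with numbers before anything is plotted. The following is a minimal sketch, not a prescribed method: the two feature lists, the `drop_below` helper and the cutoff of 2.0 are all hypothetical and chosen only to show how a cleaning step can reverse a comparison.

```python
from statistics import mean

# Hypothetical measurements for two features (illustrative values only).
feature_a = [4.8, 5.1, 5.0, 4.9, 5.2, 1.0, 0.9]
feature_b = [4.7, 4.9, 5.0, 4.8, 5.1, 4.9, 5.0]

def drop_below(values, cutoff):
    """Hypothetical cleaning step: keep only values at or above the cutoff."""
    return [v for v in values if v >= cutoff]

# "Clean" feature A by discarding its two low readings.
cleaned_a = drop_below(feature_a, cutoff=2.0)

# Difference between the feature means, before and after cleaning.
raw_gap = mean(feature_a) - mean(feature_b)
cleaned_gap = mean(cleaned_a) - mean(feature_b)

print(f"A minus B on the raw data:     {raw_gap:+.2f}")
print(f"A minus B after cleaning A:    {cleaned_gap:+.2f}")
```

On the raw data feature A has the lower mean, but after the low readings are dropped it appears slightly ahead of feature B. A chart built from the cleaned data alone would tell the opposite story, so it is worth reporting what was removed and why.
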
These principles are very enlightening, and you should keep them in mind whenever you build plots from a dataset to communicate with an audience, whether that is your team or the stakeholders of the company you work for.
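
For the functionality and beauty principles, one possible way to remove plot junk and keep fonts consistent in matplotlib is sketched below. The `declutter` helper, the font settings and the plotted values are assumptions made for illustration; which decorations count as junk depends on your chart and your audience.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove to view the figure
import matplotlib.pyplot as plt

# One shared style for every figure (the values are illustrative assumptions).
plt.rcParams.update({"font.family": "sans-serif", "font.size": 11})

def declutter(ax):
    """Hypothetical helper: strip decoration that carries no data."""
    ax.spines["top"].set_visible(False)    # hide the top border line
    ax.spines["right"].set_visible(False)  # hide the right border line
    ax.grid(False)                         # no background grid
    return ax

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [5.0, 5.4, 4.7, 5.1], "-o")  # placeholder values
ax.set_xlabel("Days")
ax.set_ylabel("Stock Value")
declutter(ax)
```

Setting the style once through `plt.rcParams` rather than per plot is a simple way to keep fonts identical across all figures in a report.
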
Let’s see a basic example of a good and a bad visualization plot. The description below is followed by the code that was used to make the plot. The plot is made with matplotlib, a popular Python library for data visualization. You may run the code yourself, but it will not give exactly the same result, because the dataset is drawn at random from a normal distribution. It will still show why the first representation is bad and the second one is better.

The two plots represent the same data: a randomly generated dataset standing in for the stock value of a company over a span of 30 days, meant for stakeholders of the company or investors looking to invest in it. The first plot (the bad one) uses a bar chart to show the average stock value in groups of 5 days. The second is a better representation of this dataset since it shows the stock price on each day.

The first plot suggests that the stock price stayed fairly steady at about 4 to 6, but the second shows that the real values are far more varied. The price went below 4 a few times and above 6 several times, and the first plot hides all of this, so it is not truthful. The first plot also uses a different color for every bar and groups the data into 5-day blocks, which violate the principles of beauty and functionality respectively. Hence, the second plot is the better representation of how the value moves over the given span. It also reveals trends, for example that the stock value is falling at the end of the 30 days.

import numpy as np
import matplotlib.pyplot as plt

# Generate 30 days of random stock values centred on 5
data = np.random.randn(30) + 5

# Bad: bar chart of the average of each 5-day group
ax1 = plt.subplot(121)
ax1.bar(np.arange(6),
        [np.mean(data[i:i + 5]) for i in range(0, 30, 5)],
        width=0.8,
        color=['blue', 'orange', 'red', 'green', 'purple', 'pink'])
# Set the tick positions explicitly so each label lines up with its bar
ax1.set_xticks(np.arange(6))
ax1.set_xticklabels(['0-4', '5-9', '10-14', '15-19', '20-24', '25-29'])
ax1.set_yticks(np.arange(8))
ax1.set_ylabel('Stock Value (Average)')
ax1.set_xlabel('Days')
ax1.set_title('Stock trends of company (30 days): Bad')

# Good: line chart of every daily value, sharing the y-axis with the first plot
ax2 = plt.subplot(122, sharey=ax1)
ax2.plot(np.arange(30), data, '-o')
ax2.set_xticks(np.arange(0, 30, 5))
ax2.set_ylabel('Stock Value')
ax2.set_xlabel('Days')
ax2.set_title('Stock trends of company (30 days): Good')

plt.suptitle('Bad vs Good Visualization')
plt.show()
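
How much the grouped bar chart hides can also be put into numbers. The sketch below is an illustration under stated assumptions: it uses the standard library's `random.gauss(5, 1)` in place of NumPy's `randn` so that it runs without extra packages, and it fixes the seed at 0 for repeatability, which the plotting code above does not do.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed: an assumption for repeatability only

# 30 daily values drawn from a normal distribution centred on 5
daily = [random.gauss(5, 1) for _ in range(30)]

# Averages of consecutive 5-day groups, as plotted in the bar chart
grouped = [mean(daily[i:i + 5]) for i in range(0, 30, 5)]

daily_range = max(daily) - min(daily)
grouped_range = max(grouped) - min(grouped)

print(f"daily values span    {min(daily):.2f} to {max(daily):.2f}")
print(f"5-day averages span  {min(grouped):.2f} to {max(grouped):.2f}")
print(f"spread shown by the bar chart: {grouped_range / daily_range:.0%} of the real spread")
```

Every group average lies between the smallest and largest value in its group, so the averages can never show a wider spread than the daily data. The bar chart can only understate how much the price moves.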

If you want examples of how well data can be visualized, the subreddit r/dataisbeautiful (https://www.reddit.com/r/dataisbeautiful/) is a good source. It is a popular community where people share and discuss visual representations of data: graphs, charts, maps, etc.

CONTRIBUTED BY

SHUBH BHARDWAJ

CGC-COE(2nd YEAR)

The Competitive Coding Community which provides all details of events organized by Codechef CGC Chapter.