jcpetitto/CS573_Graph10Ways


Assignment 2 - Data Visualization, 10 Ways


MatLab

The code I used was based on the documentation for creating a scatterplot in MatLab and modified to fit the assignment criteria. For the csv file import, I used the built-in import function and then created an .m file to hold the code generated by the program. MatLab was easier to use for creating a visualization than I thought it would be, based on my experience tripping over syntax in my first lab rotation. Now it makes a lot of sense why this software has endured as a staple. It is also clear why some of my more computationally focused colleagues are able to use it to generate informative graphics as a secondary result of running their simulations. This tool would be useful for quickly creating data- and formula-based graphics. A feature I found very useful was the ability to tweak graphs by hand after creating them by running a script. After the tinkering, the program can generate a script that reproduces the post-tinkering graph. While the generated script isn't always the tidiest or most efficient, reading it can be an easier way to figure out which methods or attributes modify a minute detail than slogging through all of the documentation.

Technical Achievements

  • Target and Click for Car Name: When the code is run within MatLab, a set of cross-hairs that follows the mouse over the graph is turned on. When the user clicks on a specific point, the name of the corresponding car becomes a label for that point.

Design Achievements

  • Scaling by Weight: I wrote a function to scale the points based on a scale factor proportional to the quotient of the minimum and maximum weights. The idea was to map the minimum weight to 1% and the maximum weight to 100%, then multiply the result by a decimal < 1 to refine the graph.
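
The scaling idea above could be sketched roughly as follows. This is a Python illustration, not the MatLab function written for the assignment; the name `scale_sizes`, the `shrink` and `floor` parameters, and the sample weights are all assumptions made up for the example.

```python
def scale_sizes(weights, shrink=0.5, floor=0.01):
    """Map weights linearly onto [floor, 1.0], then multiply by shrink.

    floor=0.01 stands for the "1% -> min value" end of the range and
    1.0 for the "100% -> max value" end. shrink is the decimal < 1 used
    to refine the marker sizes. All three names are hypothetical.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # Every weight is identical, so give every point the same size.
        return [shrink for _ in weights]
    span = hi - lo
    return [shrink * (floor + (1.0 - floor) * (w - lo) / span) for w in weights]


# Made-up weights, not values taken from the assignment dataset.
sizes = scale_sizes([2000, 3000, 4000], shrink=0.5)
```

The lightest car ends up at 1% of the shrink factor and the heaviest at 100% of it, so the relative ordering of the weights is preserved in the marker sizes.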

R Studio + ggplot2

To get started, I used a tutorial I found (ggplot2 scatter plots : Quick start guide - R software and data visualization), which conveniently used what is likely the same dataset, though that wasn't one of my search parameters. As I modified the graph to fit the given parameters, I used the official documentation: Function Reference - ggplot2. The ggplot2 grammar system took a little while to get used to, but was easy to use once I realized that keeping variables outside of the aesthetics mapping kept an associated legend from being added to the graph. I could see this package being incredibly useful in the future for creating clean-looking visualizations that are reproducible and can be modified to keep styles consistent within a project. Once I become more accustomed to using R for data processing, I will be able to gain more benefit from this package as well as the rest of the tidyverse.

Technical Achievements

  • Grammar: Manipulating the different layers with the ggplot2 grammar became easier the more I used it. In trying to make the graph fit the example, I learned about assigning the plot to a variable and adding layers in a fashion similar to x+=y.

Design Achievements

  • Full Implementation of the Design: I was able to fully implement the example design provided for the assignment.

Stata

I did not start from any code I found for Stata, as I used this package when studying Public Health at Tufts. I did modify some of my own code (which is called syntax by Stata users, for reasons unknown) and utilized the code produced by the program when building graphs to refresh my memory with respect to scatter plot options. I did not find using Stata particularly difficult for this project, but I do remember it being a bit frustrating to learn, as there are a lot of menu options and the data is all stored in one table without the ability to break it down into smaller pieces. It is also somewhat procedural, so depending on the visual you are trying to produce, certain steps must be done in a specific order without interruption because the background variables are easily overwritten. The original design was mostly able to be implemented, with three exceptions: there is no additional legend, the minor gridlines are not included, and the axis numbering is a bit off. The maximum and minimum values on the axes were easily set to what I wanted, but having the labels match the original was a battle lost.

Technical Achievements

  • None

Design Achievements

  • Aesthetics (Somewhat) Maintained: I was able to choose colors similar to those of the original design. If I went back, now knowing the actual color palette, I could mimic the colors better. Time is a constraint, however.

D3

The code I used to import the data was based on this reference. I started to use code from Bostock's scatterplot example, but I found his code overly complex for what I was trying to do, so I looked for a simpler example that I could build on. I found a chapter in the book Creating Web Charts in D3 a good place to start, though the version of d3 used is not current. Lane Harrison helped me debug my code and introduced me to the wonder of the console function as the JavaScript version of print or trace. I find using D3 both interesting and incredibly frustrating. My graph is still missing such features as a full vertical axis and a legend. I have not yet tried to add a legend; the y-axis is added to the graph, just not in full. In theory, I have a basic understanding of how the language works. In reality, my understanding led me to give up because I ran out of time. I will be using this tool in the future, in part because it is required for this course, and in part because I stubbornly refuse to let the puzzle go unsolved.

Technical Achievements

  • Hidden Errors: I managed to create errors that stayed hidden... no errors were thrown and yet, the code just stopped.

Design Achievements

  • Circles: the circles, once they appeared, have the correct opacity and the colors match the ggplot2 palette. While this may not be an achievement for some, given my difficulties with D3, I am counting this as a small victory.

Tableau

Tableau was very easy to use. I had played with it a little bit before, without the use of tutorials, but had no formal introduction to the interface. Importing the data was simple, as was choosing to embed it in the Tableau file rather than have a dependency on the csv file in my file structure. Dragging and dropping the variables into the various slots was simple, though it required a bit of trial and error to get just right. The main frustration is that Tableau likes to automatically sum, average, etc. the variables dropped into the row and column slots, which took a minute to figure out. Otherwise, this is a tool I would use to quickly visualize data and see whether any relationships pop out between variables. I prefer the interface over any of the spreadsheet interfaces, which are addressed later. The only features I was not able to reproduce were the minor gridlines. I also did not notice until now that the export of the graph uses "2K" instead of "2000" for the y-axis tick labels.

Technical Achievements

  • Interactivity: The live graph has interactivity including tooltips that display information such as the car name.

Design Achievements

  • Weight Legend: Although the weight legend is not the same as that shown in the original graph, I was able to create a weight legend and manipulate the range it covers to match that of the original.
  • Caption: An automatically generated caption includes an explanation of the relationships depicted in the graph and addresses the missing data.

Plotly Chart Studio

To create this graph, I used the browser-based GUI Online Graph Maker - Plotly Chart Studio. The resulting graph offers several ways for the user to interact with it. It is also possible to download generated code that would produce the same graph in several programming languages. The export to R, MatLab, Python, etc. feature was not one that I took advantage of. However, it is one of the features I could see being useful in the future as a way to tinker with different visualization idioms that are feasible to develop in these packages without committing to writing and customizing code in a particular language. Using the interface was fairly easy. There were just a few limitations that may not be there if one were to use a different package. For example, only one legend can be displayed. Additionally, the minor axis labels were not removable, although the GUI indicated they should be.

Technical Achievements

  • Toggling Groups: Clicking on a group in the legend toggles members of that group on and off.
  • Point Information Displayed: On mouseover, the coordinate information for a point is displayed along with the name of the car.

Design Achievements

  • Matched the Color Scheme: I was easily able to match the color scheme.

Power BI

I tried Power BI because my search history leaves me inundated with BI tools. After having my WPI account denied the ability to register for a free trial, I figured out I could just close the "free trial versus buy" pop-up and use the software. I was expecting this to be very much like Excel, given it is a Microsoft package; I was unclear why this tool was created given that Excel exists and its chart-related GUI could use a significant overhaul. I did not look at any documentation when working on this graph other than to look up the syntax for a formula to create a new column by multiplying an existing column by a scalar. The program would not accept MPG as a numerical variable because the missing data imported as string values. Given the common use of NA as a placeholder for missing data, I think this is a weakness in a program that is supposed to pull datasets from multiple sources. I had to dig through layers to manually remove the rows containing the NA values, which makes them unavailable for analyzing anything not tied to MPG. My final graph has no distinction between major and minor gridlines. Grouping by Manufacturer was easy, but resizing by Weight was not. Despite setting the size variable to the column scaled by weight, all the values were adjusted together rather than taking on relative sizes. I tried a few options that seemed like they should work, then gave up. I have very little inclination to use this tool in the future.

Technical Achievements

  • Missing Data: I figured out how to deal with missing data for the sake of this assignment, but the solution was not elegant.

Design Achievements

  • Overall Appearance: The graph does bear resemblance to the original graph, and I managed to get to that point fairly quickly without documentation.

Excel

Despite [many finance pros asserting you can pull Excel out of their cold, dead hands](https://www.wsj.com/articles/finance-pros-say-youll-have-to-pry-excel-out-of-their-cold-dead-hands-1512060948), this is not a tool to use for data analysis and is one I begrudgingly use for data visualization. I was pleasantly surprised by how well I was able to match the original graph, and relatively quickly, though I will not claim it was easy. The grouping by Manufacturer had to be done by manually creating series while scrolling through the imported csv. I am sure there must be a formulaic way to do this, but I did not have the patience to dig through tutorials. I also had to manually remove the rows with missing values because they were registering as an MPG of 0. I could have left them out of the range definition, but manipulating the domain and range via the chart GUI in Excel can be difficult, as you cannot use the arrow keys without adding unintentional cells to your selection. I was able to place the major gridlines where I wanted them, though the graph insisted on retaining values through 0 unless I wanted the first major gridline labeled "8" at the origin. The x-axis was argumentative when it came to mimicking the labeling scheme on the original plot. Otherwise, I was able to create the gridlines and legend, and mimic the colors, with a few mouse clicks.
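
The manual series building described above amounts to a group-by on the Manufacturer column. A minimal sketch of that idea follows, in Python rather than as an Excel formula; the function name `group_into_series`, the column names, and the sample rows are assumptions made up for illustration and are not taken from the assignment data.

```python
from collections import defaultdict


def group_into_series(rows, key="Manufacturer"):
    """Collect (MPG, Weight) points into one list per manufacturer.

    Each resulting list plays the role of one chart series. The column
    names "Manufacturer", "MPG" and "Weight" are assumed, not confirmed.
    """
    series = defaultdict(list)
    for row in rows:
        series[row[key]].append((row["MPG"], row["Weight"]))
    return dict(series)


# Hypothetical rows standing in for the imported csv.
rows = [
    {"Manufacturer": "ford", "MPG": 17.0, "Weight": 3449},
    {"Manufacturer": "toyota", "MPG": 24.0, "Weight": 2372},
    {"Manufacturer": "ford", "MPG": 15.0, "Weight": 4341},
]
series = group_into_series(rows)
```

Each key in `series` would then be plotted as its own series, which is what gives every manufacturer its own color and legend entry.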

Technical Achievements

  • None.

Design Achievements

  • Marker Size: I was able to scale the bubbles according to the weight of the car. Of the tools I used, this is one of the easier ones for doing this.

Sheets

Much to my surprise, Sheets was the easiest tool to use for this particular graph type. I say surprised because you could not create a regression line in Sheets for years, so I did not expect to pop open the chart types and see a bubble plot readily available. The setup was simple. The bubble size is a bit excessive. The size scaled relative to the whole set, so changing the value did not change the size in the visual. Other than a legend for the bubble size, I was able to include the features included in the original graph. The lack of major axis lines was a choice I made because I did not look carefully enough at the original; they could easily be added back. The sheet for this graph can be found [here](https://docs.google.com/spreadsheets/d/1UafRR3-u2PKi2dsP6ZLQHxwwieuEPGQreKCQ7QnJn_M/edit?usp=sharing).

Technical Achievements

  • None.

Design Achievements

  • Marker Size: I was able to scale the bubbles according to the weight of the car. Of the tools I used, this is one of the easier ones for doing this. Unfortunately, the scaling was relative to that of the set.

Numbers

This may have been the worst program to try to use. I started off using the desktop version on my lab Mac. After a lot of tinkering, I was able to get the axis somewhat scaled, though the program would not accept steps greater than 100 and was auto-numbering the axis tick marks without accepting how I was manually setting them. Additionally, the legend I set to show was nowhere to be found (see above). After a lot of struggle, I realized I had to set up each manufacturer as a series. However, at that point I was out of time to try, so I uploaded it to the cloud with the hope that I could figure out how using the iCloud version of the program. When I tried the iCloud version, I found it had significantly less functionality when it came to graphing. I could not find a way to break the data down into series. Overall, creating this graph to fit specifications was a bust.

Technical Achievements

  • iCloud: I used my iCloud account for something other than checking an email address I no longer use.

Design Achievements

  • None. Using Numbers, I made this graph look hideous.

Missing Data Handling Summary

MatLab - automatically ignored missing values

ggplot2 - automatically ignored missing values

Stata - automatically ignores entries with missing data if the specific variable is called on

D3 - manually deleted the missing data

Tableau - automatically deletes the missing data, but makes note of it in the caption of an exported image

Plotly - automatically dealt with the missing data

Excel - had to manually delete the missing data to remove the points from the x-axis

Sheets - automatically excluded missing data

Numbers - probably automatically removed them, the experience is one I would rather not remember

Power BI - does NOT deal well with missing entries. They were pulled in as strings, which made the entire column a string, and the observations had to be removed to convert MPG to a number.
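
The Power BI behavior described above, where a single NA turns the whole MPG column into strings, can be avoided by converting the placeholder while reading the file. The sketch below is one way to do that in Python and is not what any of the ten tools does internally; the placeholder set `MISSING`, the helper names, the column names, and the sample data are all assumptions for illustration.

```python
import csv
import io

# Assumed placeholders for missing values; the real file may use others.
MISSING = {"", "NA"}


def to_float_or_none(text):
    """Return None for a missing-value placeholder, otherwise a float."""
    text = text.strip()
    return None if text in MISSING else float(text)


def read_rows(handle, numeric_columns):
    """Read csv rows, converting the named columns to float or None."""
    rows = []
    for row in csv.DictReader(handle):
        for col in numeric_columns:
            row[col] = to_float_or_none(row[col])
        rows.append(row)
    return rows


# Made-up sample standing in for the assignment csv.
sample = io.StringIO("Car,MPG,Weight\nA,18,3504\nB,NA,4732\n")
rows = read_rows(sample, ["MPG", "Weight"])

# Drop rows only for plots that need MPG; the full list stays intact.
plottable = [r for r in rows if r["MPG"] is not None]
```

Keeping the missing entries as `None` instead of deleting the rows leaves those observations available for anything not tied to MPG.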

About

Assignment graphing the same dataset with 10 different tools for CS573 Biovisualization Spring 2019
