Data Visualization

Astro 497, Week 11, Day 1

TableOfContents()

General Presentation Tips

What is the purpose of your presentation/slide/figure?

  • What is essential to accomplishing that goal?

  • What content is only peripheral?

Who is your audience?

What do they...

  • Already know well?

  • Have already heard, but need a reminder?

  • Not know, but are well-prepared to understand?

  • Not know and need scaffolding to appreciate?

Match the complexity of the figure (or entire presentation) to your audience.

Foldable("Example: Color-luminosity diagram of stars in local neighborhood",
md"""
$(RobustLocalResource("https://vlas.dev/post/gaia-dr2-hrd/gaia-hrd-dr2.png", "../_assets/week11/gaia-hrd-dr2.png", :height=>"80%", :alt=>"Color-Luminosity Diagram from Gaia DR2"))
--- Credit: [Gaia Collaboration (2018)](https://doi.org/10.1051/0004-6361/201832843) (DR2) & [Vlas Sokolov](https://vlas.dev/post/gaia-dr2-hrd/)
""")
Example: Color-luminosity diagram of stars in local neighborhood

Figures

Choose what variables to plot

Can you choose a set of variables that:

  • Shows what is most directly observable

  • Makes the relationship simpler or more obvious

  • Makes the figure more general ("switch to theorist units")

  • Makes the figure easier to compare to observations ("switch to observer units")

Would a transformation make the relationship (nearly) linear?

  • Log

  • Power-law

  • Divide by a baseline model
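As a small illustration of the log transformation, the sketch below shows that a power law becomes a straight line in log-log space, so its exponent can be read off as a slope. The data, `c_true`, and `k_true` are made-up values for illustration only.

```julia
# Hypothetical data: a noiseless power law y = c * x^k.
c_true, k_true = 3.0, 1.5
x = 10.0 .^ range(0, 3, length=20)
y = c_true .* x .^ k_true

# In log-log space, log10(y) = log10(c) + k*log10(x), which is a straight line.
logx, logy = log10.(x), log10.(y)

# Least-squares fit of a straight line using the design matrix [1 logx].
design = [ones(length(logx)) logx]
intercept, slope = design \ logy

c_fit = 10^intercept
println("slope ≈ ", round(slope, digits=3), ", prefactor ≈ ", round(c_fit, digits=3))
```

Plotting `logy` against `logx` (or using log-scaled axes) would show the same straight line that the fit recovers.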

Foldable("Example: Transit timing variations",md"""
- Most directly observable:  Transit times vs Time 
- Makes the relationship more obvious:  Residual of Transit time minus linear transit timing model 
- Observer units
  - X-axis: Transit times ($t_n$'s) in days or years
  - Y-axis: Residuals ($\Delta~t_n$'s) in minutes
- Theorist units
  - X-axis: Transit Number ($n$'s)
  - Y-axis: Residuals ($\Delta~t_n$'s) in fraction of orbital period
""")
Example: Transit timing variations

  • Most directly observable: Transit times vs Time

  • Makes the relationship more obvious: Residual of Transit time minus linear transit timing model

  • Observer units

    • X-axis: Transit times ($t_n$'s) in days or years

    • Y-axis: Residuals ($\Delta~t_n$'s) in minutes

  • Theorist units

    • X-axis: Transit Number ($n$'s)

    • Y-axis: Residuals ($\Delta~t_n$'s) in fraction of orbital period
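The change of variables in this example can be sketched as follows. All numbers (`period_true`, `t0_true`, `ttv_amp_min`, and the sinusoidal signal) are assumptions chosen for illustration, not values from a real system.

```julia
# Hypothetical transit times (days): a linear ephemeris plus a small sinusoidal TTV.
n = collect(0:49)                      # transit number
period_true, t0_true = 10.0, 1000.0    # assumed values for illustration
ttv_amp_min = 10.0                     # assumed TTV amplitude in minutes
t_obs = t0_true .+ period_true .* n .+ (ttv_amp_min / (24 * 60)) .* sin.(2π .* n ./ 25)

# Fit the linear transit timing model t_n = t0 + P*n by least squares.
design = [ones(length(n)) n]
t0_fit, period_fit = design \ t_obs
resid_days = t_obs .- (t0_fit .+ period_fit .* n)

# Observer units: residuals in minutes (to be plotted against time in days).
resid_min = resid_days .* (24 * 60)
# Theorist units: residuals as a fraction of the orbital period (against transit number).
resid_frac = resid_days ./ period_fit

println("largest residual: ", round(maximum(abs.(resid_min)), digits=2), " min")
```

Plotting `t_obs` against `n` directly would hide the signal, since the residuals are minutes on top of a baseline that spans hundreds of days.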

Figure type

Start with a figure type that is well suited to the data and the purpose of the plot.

  • Points – measurements

  • Lines – predictions of models

  • Contours or Heatmap – 2-d histogram or probability density

  • Bar chart – comparing categorical data

  • Vector field or quiver – velocities in fluid flow

  • Box plot or violin plot – comparing distributions

let
    x = range(0,step=π/16,stop=π);
    X = repeat(x,1,length(x))
    Y = repeat(x',length(x))
    U = cos.(X.*Y)
    V = sin.(X.*Y)
    quiver_scale = 0.2
    plt = quiver(X,Y,quiver=(quiver_scale.*U,quiver_scale.*V), color=:black)
    Foldable("Example: Quiver Plot", plt)
end
Example: Quiver Plot

let
    plt = violin(repeat(["Type I", "Type II", "Type III"],outer=100),randn(300), legend=:none)
    Foldable("Example: Violin Plot",plt)
end
Example: Violin Plot

Fonts

  • Use a font large enough to read

  • Sans serif is easier to read (especially if small)

  • Check that the font includes the accents and symbols you need

  • Choose colors wisely

  • Proportional spacing vs Mono-spaced

begin	
    df = DataFrame(udec=Int64[], uhex=String[], Glyph=String[], LaTeX=String[], Description=String[])
    
    for i in (
        [0x00278, "ɸ", "\\ltphi", "Latin Small Letter Phi"] ,
        [0x003C6, "φ", "\\varphi", "Greek Small Letter Phi"] ,
        [0x003D5, "ϕ", "\\phi", "Greek Phi Symbol / Greek Small Letter Script Phi"] ,
        [0x01D60, "ᵠ", "\\^phi", "Modifier Letter Small Greek Phi"] ,
        [0x01D69, "ᵩ", "\\_phi", "Greek Subscript Small Letter Phi"] ,
        [0x1D6D7, "𝛗", "\\bfvarphi", "Mathematical Bold Small Phi"] ,
        [0x1D6DF, "𝛟", "\\bfphi", "Mathematical Bold Phi Symbol"] ,
        [0x1D711, "𝜑", "\\itphi", "Mathematical Italic Small Phi"] ,
        [0x1D719, "𝜙", "\\itvarphi", "Mathematical Italic Phi Symbol"] ,
        [0x1D74B, "𝝋", "\\biphi", "Mathematical Bold Italic Small Phi"] ,
        [0x1D753, "𝝓", "\\bivarphi", "Mathematical Bold Italic Phi Symbol"] ,
        [0x1D785, "𝞅", "\\bsansphi", "Mathematical Sans-Serif Bold Small Phi"] ,
        [0x1D78D, "𝞍", "\\bsansvarphi", "Mathematical Sans-Serif Bold Phi Symbol"] ,
        [0x1D7BF, "𝞿", "\\bisansphi", "Mathematical Sans-Serif Bold Italic Small Phi"] ,
        [0x1D7C7, "𝟇", "\\bisansvarphi", "Mathematical Sans-Serif Bold Italic Phi Symbol"])
        push!(df, (i[1], string(i[1], base=16), string(Char(i[1])), i[3], i[4]))
        
    end
    Foldable("Example of mathematical symbols",
md"""
$(df[!,3:5])
#### [JuliaMono](https://juliamono.netlify.app/) has great coverage. 
"""
    )
end
Example of mathematical symbols

15×3 DataFrame
Row  Glyph   LaTeX            Description
     String  String           String
1    ɸ       \\ltphi          Latin Small Letter Phi
2    φ       \\varphi         Greek Small Letter Phi
3    ϕ       \\phi            Greek Phi Symbol / Greek Small Letter Script Phi
4    ᵠ       \\^phi           Modifier Letter Small Greek Phi
5    ᵩ       \\_phi           Greek Subscript Small Letter Phi
6    𝛗       \\bfvarphi       Mathematical Bold Small Phi
7    𝛟       \\bfphi          Mathematical Bold Phi Symbol
8    𝜑       \\itphi          Mathematical Italic Small Phi
9    𝜙       \\itvarphi       Mathematical Italic Phi Symbol
10   𝝋       \\biphi          Mathematical Bold Italic Small Phi
11   𝝓       \\bivarphi       Mathematical Bold Italic Phi Symbol
12   𝞅       \\bsansphi       Mathematical Sans-Serif Bold Small Phi
13   𝞍       \\bsansvarphi    Mathematical Sans-Serif Bold Phi Symbol
14   𝞿       \\bisansphi      Mathematical Sans-Serif Bold Italic Small Phi
15   𝟇       \\bisansvarphi   Mathematical Sans-Serif Bold Italic Phi Symbol

JuliaMono has great coverage.

Axes

  • Label axes

  • Specify units (unless dimensionless)

  • ≥ 3 tick marks per axis

  • Plenty of space between axis labels

Axis Range

Don't just accept default axis scales!

  • Scale that includes all points may hide important variations.

  • Zooming in all the way makes any variation look large, whether or not it is significant.

Questions to ask:

  • Does zero (or one) have significance?

  • Linear vs Log?

  • Would showing residuals to a baseline model be more helpful (or just more confusing)?
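One way to make the axis range an explicit choice is to compute the limits yourself. The helper below, `padded_limits`, is a hypothetical function written for this sketch (it is not part of Plots.jl), and the flux values are made up.

```julia
"""
    padded_limits(values; pad=0.05, include=nothing)

Return `(lo, hi)` axis limits that span `values`, widened by a fraction `pad`
of the data range on each side. If `include` is a number (e.g. 0 or 1), the
limits are extended so that value is visible.
"""
function padded_limits(values; pad=0.05, include=nothing)
    lo, hi = extrema(values)
    if include !== nothing
        lo, hi = min(lo, include), max(hi, include)
    end
    span = hi - lo
    span == 0 && (span = one(span))   # avoid a zero-width axis for constant data
    return (lo - pad * span, hi + pad * span)
end

flux = [0.998, 1.001, 0.999, 1.002, 0.997]   # hypothetical normalized fluxes
zoomed = padded_limits(flux)                 # emphasizes the small scatter
anchored = padded_limits(flux; include=0)    # shows the scatter relative to zero
println("zoomed: ", zoomed, "  anchored: ", anchored)
```

With Plots, a tuple like `zoomed` can be passed as the `ylims` keyword. Which of the two ranges is appropriate depends on whether zero is meaningful for the quantity being shown.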

let
    plt1 = plot(xlabel="Transit Number", ylabel="Time (d)", 
            markersize=1, legend=:none)
    n = 100
    tr_num = collect(1:n)
    period = 10.0
    t0 = 1000.0
    t_linear = t0 .+ tr_num .* period
    p_ttv = 365
    t = t_linear .+ 10/(24*60) .* sin.(2π.*t_linear/p_ttv)
    t .+= 5/(24*60) *randn(n)
    resid = (t.-t_linear).*(24*60)
    σt = ones(n).*15 ./(24*60)
    mask = rand(n) .< 0.75
    scatter!(plt1, tr_num[mask], t[mask], )
    plt2 = plot(xlabel="Time (days)", ylabel="Δt (min)", 
            markersize=2, legend=:none)
    σt .*= (24*60)
    if plt_errorbars=="All"
        scatter!(plt2, t[mask], resid[mask], yerr=σt[mask], )
    else
        scatter!(plt2, t[mask], resid[mask], )
        if plt_errorbars == "One"
            scatter!(plt2, [t[mask][end]], [resid[mask][end]], yerr=[σt[mask][end]], markercolor=1)
        end
    end
    

    plot(plt1, plt2, layout=(2,1) )
end

Show error bars?

Lines

  • Increase line width (or weight) when used for print or projection

  • Hard to tell apart more than 4 line styles

  • Distinguish lines with both color and line style

let
    plt = plot( xlabel="X", ylabel="Y", color_palette=cs_ex_categories_unordered, legend=:none)
    x = 0:0.1:2π
    y = sin.(x)
    lw = 4
    plot!(plt, x, y, linewidth=lw, linestyle= :solid)
    plot!(plt, x, y.+0.1, linewidth=lw, linestyle= :dash)
    plot!(plt, x, y.+0.2, linewidth=lw, linestyle= :dot)
    plot!(plt, x, y.+0.3, linewidth=lw, linestyle= :dashdot)
    plot!(plt, x, y.+0.4, linewidth=lw, linestyle= :dashdotdot)
end

Points

  • Increase point size, especially for print or projection

  • Hard to tell apart more than ~5 point (marker) shapes

  • Too many points overlapping can be misleading

  • Convey measurement uncertainties
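A quick way to check for overplotting is to count how many points fall in each cell of a grid: if many cells hold more than a few points, a heatmap or contours may represent the data better than individual markers. The sketch below uses only Base Julia; `bin_counts_2d` is a hypothetical helper, and the grid and sample size are assumptions chosen to resemble the example that follows.

```julia
using Random

"""
    bin_counts_2d(x, y; edges)

Count points falling in each cell of a square grid defined by `edges`
(the same edges are used on both axes). Points outside the grid are ignored.
"""
function bin_counts_2d(x, y; edges)
    nb = length(edges) - 1
    counts = zeros(Int, nb, nb)
    for (xi, yi) in zip(x, y)
        i = searchsortedlast(edges, xi)   # index of the bin containing xi
        j = searchsortedlast(edges, yi)   # index of the bin containing yi
        if 1 <= i <= nb && 1 <= j <= nb
            counts[i, j] += 1
        end
    end
    return counts
end

rng = MersenneTwister(42)               # fixed seed so the sketch is reproducible
x, y = randn(rng, 4_000), randn(rng, 4_000)
edges = range(-3.5, 3.5, length=41)     # 40 bins per axis
counts = bin_counts_2d(x, y; edges=edges)
println("max points in a single cell: ", maximum(counts))
```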

let
    plt = plot( xlabel="X", ylabel="Y", color_palette=ColorScheme(cvd_dict[cvd].(ColorSchemes.Paired_12)), legend=:none)
    x = 0:0.1:2π
    y = sin.(x)
    # Filled shapes are drawn without an outline; line-only shapes need a stroke to be visible.
    shape_wo_stroke = [:circle, :rect, :star5, :diamond, :hexagon, :utriangle, :dtriangle, :rtriangle, :ltriangle, :pentagon, :heptagon, :octagon, :star4, :star6, :star7, :star8]
    shape_w_stroke = [:cross, :xcross, :vline, :hline]
    # Offset each series vertically by 0.1 so the marker shapes can be compared.
    for (i, shape) in enumerate(shape_wo_stroke)
        scatter!(plt, x, y.+i.*0.1, markershape=shape, markersize=3.5, markerstrokewidth=0)
    end
    for (i, shape) in enumerate(shape_w_stroke)
        j = i + length(shape_wo_stroke)
        scatter!(plt, x, y.+j.*0.1, markershape=shape, markersize=3.5, markerstrokewidth=3)
    end
    plt
end
a, b = randn(4_000), randn(4_000);
let
    if plt_heatmap
        plt = histogram2d(a,b, bins=40, xlabel="X", ylabel="Y", color_palette=:lajolla, legend=:none)
    else
        plt = plot(xlabel="X", ylabel="Y", color_palette=:lajolla, legend=:none)
    end
    xlims!(-3.5, 3.5)
    ylims!(-3.5, 3.5)
    dens = kde((a,b))
    dens_interp = InterpKDE(dens)
    
    mask = map(i->pdf(dens_interp, a[i], b[i]) .<= plt_point_threshold, 1:length(a) )
    #mask = trues(length(a))
    
    scatter!(plt, a[mask],b[mask], markercolor=:blue, markersize=pointsize, markerstrokewidth=0)
    if plt_contours
        contour_color = plt_heatmap ? :yellow : :blue
        plot!(plt, dens)
    end
    plt
end

Interactive controls: Contours, Heatmap, Point size, Plot points inside contours

Colors

Why are you using color?

1. To convey additional information

begin
    local t = range(0,stop=5,length=500)
    y = 10 .+ (2.0.+t./20) .* sin.(2π.*(t.+0.2.*sin.(2π.*t./20)))
    tmod1 = mod.(t,1)
    plt_dim1 = scatter(t, y, markerz=t, markersize=2.5, markerstrokewidth=0, xlabel = "Time", ylabel="Flux", legend=:none)
    plt_dim2 = scatter(tmod1, y, markerz=t, markersize=2.5, markerstrokewidth=0, xlabel = "Time modulo 1", ylabel="Flux",legend=:none)
    plt_dim = plot(plt_dim1, plt_dim2, layout=(2,1) )
end
#$plt_dim

md"""
#### 2. To draw attention to one element
"""

2. To draw attention to one element

begin  # Microlensing magnification for single lens
    A(u) = (2+u^2)/(u*sqrt(4+u^2))
    u(t; u0::Real, t0::Real=zero(t), tE::Real=one(t) ) = sqrt(u0^2+(t-t0)^2/tE^2)
end;
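Written out, the two functions in the cell above correspond to the standard point-source, single-lens magnification, where $u$ is the lens-source separation in units of the Einstein radius. The symbols below are assumed to match the code's argument names (`u0`, `t0`, `tE`).

```latex
A(u) = \frac{u^2 + 2}{u\,\sqrt{u^2 + 4}},
\qquad
u(t) = \sqrt{u_0^2 + \left(\frac{t - t_0}{t_E}\right)^2}
```

The magnification peaks at $t = t_0$, where $u = u_0$, and approaches 1 as $u$ becomes large.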
begin
    plt_accent = plot(xlabel = "Time", ylabel="Magnification", legend=:none)
    local t = -5:0.02:5
    local flux = 0.5 .*( A.(u.(t,u0=0.75)) .+ A.(u.(t,u0=2,t0=2,tE=0.05)) )  
    flux .+= 0.005.*randn(length(flux))
    scatter!(plt_accent, t, flux, markersize=2, markerstrokewidth=0 )
    local idx_accent = findall(t->abs(t-2)<0.15, t)
    scatter!(plt_accent, t[idx_accent], flux[idx_accent], markersize=2, markerstrokewidth=0 )
end
#$plt_accent

md"""
#### 3. To fit in with a color palette
$cs_ex_psu_penn
"""

3. To fit in with a color palette

begin
    plt_theme = plot(xlabel = "Time", ylabel="Magnification", legend=:none, 
        fg_color_subplot=cs_pennsylvania[2], markersize=2.5, framestyle=:box,
        fontfamily="Bookman Demi", color_palette=cs_ex_psu_accent
        )
    local λ = 5428:0.05:5432
    line(λ; λ0=zero(λ), depth=1, width=7000/3e8 ) = 1-depth*exp(-0.5*((λ-λ0)^2/(λ0*width)^2))
    local flux = line.(λ, λ0=5430).+0.05.*randn(length(λ))
    scatter!(plt_theme, λ, flux,  markercolor=2, markerstrokewidth=0, )
    flux = 0.3.+line.(λ, λ0=5430.05).+0.05.*randn(length(λ))  
    scatter!(plt_theme, λ, flux, markercolor=3, markerstrokewidth=0, )
    flux = 0.6.+line.(λ, λ0=5430.10).+0.05.*randn(length(λ))
    scatter!(plt_theme, λ, flux, markercolor=4, markerstrokewidth=0, )
    flux = 0.9.+line.(λ, λ0=5430.15).+0.05.*randn(length(λ))
    scatter!(plt_theme, λ, flux, markercolor=5, markerstrokewidth=0, )
    flux = 1.2.+line.(λ, λ0=5430.20).+0.05.*randn(length(λ))
    scatter!(plt_theme, λ, flux, markercolor=6, markerstrokewidth=0, )
end

Choosing a Color Palette

Continuous Values

  • Should it be perceptually uniform?

    • Otherwise can give an inaccurate impression

  • What if you print it in black & white?

  • Is there sufficient contrast?

    • Especially important when using projector

    • Can truncate palette to avoid using too light a color

  • How is it perceived by people with a color vision deficiency?
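The contrast question can be made quantitative. The sketch below uses the WCAG relative luminance and contrast ratio definitions as one common measure; that choice is an assumption of this example, not something the notebook prescribes. The function names are hypothetical, and the hex values are taken from the PSU palettes defined in the helper code.

```julia
# Convert one sRGB channel (0-255) to linear light, following the WCAG definition.
function linearize(channel::Integer)
    c = channel / 255
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055)^2.4
end

# Relative luminance of a color given as a 24-bit hex value such as 0xFFD100.
function relative_luminance(hex::Integer)
    r = (hex >> 16) & 0xff
    g = (hex >> 8) & 0xff
    b = hex & 0xff
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
end

# WCAG contrast ratio, from 1 (identical colors) to 21 (black on white).
function contrast_ratio(hex1::Integer, hex2::Integer)
    l1, l2 = relative_luminance(hex1), relative_luminance(hex2)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
end

white  = 0xFFFFFF
yellow = 0xFFD100   # from the vibrant accent palette in this notebook
navy   = 0x001E44   # from the Pennsylvania palette in this notebook

println("yellow on white: ", round(contrast_ratio(yellow, white), digits=2))
println("navy on white:   ", round(contrast_ratio(navy, white), digits=2))
```

A low ratio for a light color against a white background is a warning sign for projected slides, where washed-out displays reduce contrast further.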

Examples:

  • Linear (e.g., lajolla) vs reversed hot vs jet1

  • Diverging (e.g., vik) vs coolwarm

  • Cyclic (e.g., cyclic_protanopic_deuteranopic_bwyk_16_96_c31_n256) vs phase

Color Vision Deficiency:

Categories

  • Unordered (e.g., tol_bright)

  • Ordered (e.g., YlGnBu_6)

  • Paired (e.g., Paired_6)

Other common mistakes

  • Cramming too much information into a figure

    • The goal of visualization is to make it easy for the audience to understand, and not to show off your skill at making complicated plots.

    • When giving a presentation, you can build up a figure step by step to turn a complex figure into a story.

  • Making plot 3d when 2d would be easier to interpret

  • Using yellows in a presentation to be displayed with a projector

Keep your eyes open for...

  • Particularly effective plots

  • Poorly executed plots

  • Plots that are complex, but still readable thanks to good design

Think about why they were good or bad and what you can learn from them.

Setup & Helper Code

begin
    using PlutoUI, PlutoTeachingTools
    using Plots, Plots.PlotMeasures, LaTeXStrings
    using StatsPlots, KernelDensity
    using Colors, ColorSchemes, FixedPointNumbers
    using DataFrames
end
ChooseDisplayMode()
     

Figures

PSU Palettes

cs_pennsylvania = ColorScheme(reinterpret.(RGB24,[0x001E44, 0x1E407C, 0x009CDE, 0x314D64, 0x3EA39E, 0xA2AAAD, 0xffffff]))
cs_classic_accent = ColorScheme(reinterpret.(RGB24,[0x6A3028,0xB88965,0xBF8226,0x4A7729,0x96BEE6,0xAC8DCE,0x444444,0xBC204B]))
cs_vibrant_accent = ColorScheme(reinterpret.(RGB24,[0xF2665E,0xE98300,0xFFD100,0x99CC00,0x008755,0x491D70,0x000321]))

Color vision deficiency code

begin
    cvd_names = ["None", "Protanopic","Deuteranopic","Tritanopic","Greyscale"]
    cvd_funcs = [identity, protanopic, deuteranopic, tritanopic,Gray]
    cvd_dict = Dict(zip(cvd_names,cvd_funcs))
end;
begin
    cs_ex_linear = ColorScheme(cvd_dict[cvd].(ColorSchemes.lajolla))
    cs_ex_diverge = ColorScheme(cvd_dict[cvd].(ColorSchemes.vik))
    cs_ex_cyclic = ColorScheme(cvd_dict[cvd].(ColorSchemes.cyclic_protanopic_deuteranopic_bwyk_16_96_c31_n256))
    
    cs_ex_categories_unordered = ColorScheme(cvd_dict[cvd].(ColorSchemes.tol_bright))
    cs_ex_categories_ordered = ColorScheme(cvd_dict[cvd].(ColorSchemes.YlGnBu_6))
    cs_ex_paired = ColorScheme(cvd_dict[cvd].(ColorSchemes.Paired_6))

    cs_ex_psu_penn = ColorScheme(cvd_dict[cvd].(cs_pennsylvania))
    cs_ex_psu_accent = ColorScheme(cvd_dict[cvd].(cs_classic_accent))
    cs_ex_psu_vibrant = ColorScheme(cvd_dict[cvd].(cs_vibrant_accent))
end;
begin
    cs_ex_not_linear1 = ColorScheme(cvd_dict[cvd].(reverse(ColorSchemes.hot)))
    cs_ex_not_linear2 = ColorScheme(cvd_dict[cvd].(ColorSchemes.jet1))
    cs_ex_not_diverge = ColorScheme(cvd_dict[cvd].(ColorSchemes.coolwarm))
    cs_ex_not_cyclic = ColorScheme(cvd_dict[cvd].(ColorSchemes.phase))
end;

Built with Julia 1.8.2 and

ColorSchemes 3.19.0
Colors 0.12.8
DataFrames 1.4.2
FixedPointNumbers 0.8.4
KernelDensity 0.6.5
LaTeXStrings 1.3.0
Plots 1.35.5
PlutoTeachingTools 0.2.3
PlutoUI 0.7.48
StatsPlots 0.15.4

To run this tutorial locally, download this file and open it with Pluto.jl.
