Data Visualization

Astro 497, Week 11, Day 1

TableOfContents()

General Presentation Tips

What is the purpose of your presentation/slide/figure?

What is essential to accomplishing that goal?
What content is only peripheral?

Who is your audience?

What do they...

Already know well?
Have already heard, but need a reminder?
Not know, but are well-prepared to understand?
Not know and need scaffolding to appreciate?

Match the complexity of your figure (or entire presentation) to your audience.

Foldable("Example: Color-luminosity diagram of stars in local neighborhood",
md"""
$(RobustLocalResource("https://vlas.dev/post/gaia-dr2-hrd/gaia-hrd-dr2.png", "../_assets/week11/gaia-hrd-dr2.png", :height=>"80%", :alt=>"Color-Luminosity Diagram from Gaia DR2"))
--- Credit: [Gaia Collaboration (2018)](https://doi.org/10.1051/0004-6361/201832843) (DR2) & [Vlas Sokolov](https://vlas.dev/post/gaia-dr2-hrd/)
""")

Figures

Choose what variables to plot

Can you choose a set of variables that:

Shows what is most directly observable
Makes the relationship simpler or more obvious
Makes the figure more general ("switch to theorist units")
Makes the figure easier to compare to observations ("switch to observer units")

Would a transformation make the relationship (nearly) linear?

Log
Power-law
Divide by a baseline model
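A minimal sketch of why a log transformation helps, using made-up data that follow Kepler's third law for a solar-mass star (the values and the helper `linear_fit` are illustrative assumptions, not from the lecture). In log-log space the power law becomes a straight line whose slope is the exponent.

```julia
# Hypothetical data: P ∝ a^(3/2) with P in years and a in au
a = [0.39, 0.72, 1.0, 1.52, 5.2, 9.58]   # semi-major axis (au)
P = a .^ 1.5                              # orbital period (yr)

# Ordinary least-squares slope and intercept of y versus x
function linear_fit(x, y)
    xm, ym = sum(x) / length(x), sum(y) / length(y)
    slope = sum((x .- xm) .* (y .- ym)) / sum((x .- xm) .^ 2)
    return slope, ym - slope * xm
end

# Fit a straight line to log10(P) versus log10(a)
slope, intercept = linear_fit(log10.(a), log10.(P))

# With Plots loaded, the same idea applies to the axes themselves, e.g.
#   scatter(a, P; xscale=:log10, yscale=:log10, xlabel="a (au)", ylabel="P (yr)")
```

The fitted slope recovers the power-law index, which is much easier to read off a straight line than off a curve.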

Foldable("Example: Transit timing variations",md"""
- Most directly observable:  Transit times vs Time 
- Makes the relationship more obvious:  Residual of Transit time minus linear transit timing model 
- Observer units
  - X-axis: Transit times ($t_n$'s) in days or years
  - Y-axis: Residuals ($\Delta~t_n$'s) in minutes
- Theorist units
  - X-axis: Transit Number ($n$'s)
  - Y-axis: Residuals ($\Delta~t_n$'s) in fraction of orbital period
""")
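The choices in this example can be sketched in a few lines, assuming hypothetical transit times (a linear ephemeris plus a small sinusoidal perturbation) and a hypothetical helper `fit_linear_ephemeris`. The sketch subtracts the best-fit linear model and expresses the residuals in both sets of units.

```julia
# Hypothetical transit times: linear ephemeris plus a 10-minute sinusoidal perturbation
n      = collect(0:49)                    # transit number
period = 10.0                             # days (assumed)
t0     = 1000.0                           # days (assumed)
t_obs  = t0 .+ period .* n .+ (10 / (24 * 60)) .* sin.(2π .* n ./ 25)

# Least-squares linear ephemeris: t_n ≈ t0_fit + period_fit * n
function fit_linear_ephemeris(n, t)
    nm, tm = sum(n) / length(n), sum(t) / length(t)
    period_fit = sum((n .- nm) .* (t .- tm)) / sum((n .- nm) .^ 2)
    return tm - period_fit * nm, period_fit
end

t0_fit, period_fit = fit_linear_ephemeris(n, t_obs)
resid_days = t_obs .- (t0_fit .+ period_fit .* n)

resid_minutes  = resid_days .* (24 * 60)   # observer units: minutes
resid_fraction = resid_days ./ period_fit  # theorist units: fraction of orbital period
```

Plotting `resid_minutes` against `t_obs` gives the observer's view; plotting `resid_fraction` against `n` gives the theorist's view of the same data.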

Figure type

Start with a figure type that's well-suited to the data and the purpose of the plot.

Points – measurements
Lines – predictions of models
Contours or Heatmap – 2-d histogram or probability density
Bar chart – comparing categorical data
Vector field or quiver – velocities in fluid flow
Box plot or violin plot – comparing distributions

let
    x = range(0,step=π/16,stop=π);
    X = repeat(x,1,length(x))
    Y = repeat(x',length(x))
    U = cos.(X.*Y)
    V = sin.(X.*Y)
    quiver_scale = 0.2
    plt = quiver(X,Y,quiver=(quiver_scale.*U,quiver_scale.*V), color=:black)
    Foldable("Example: Quiver Plot", plt)
end

let
    plt = violin(repeat(["Type I", "Type II", "Type III"],outer=100),randn(300), legend=:none)
    Foldable("Example: Violin Plot",plt)
end

Fonts

Use font large enough to read
Sans serif is easier to read (especially if small)
Check that the font includes the accents and symbols you need
Choose colors wisely
Proportional spacing vs Mono-spaced
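One way to keep fonts consistent is to collect the settings in one place and reuse them for every figure. The sketch below assumes the Plots.jl attribute names `fontfamily`, `titlefontsize`, `guidefontsize`, `tickfontsize`, and `legendfontsize`. The sizes, the font family, and the helper `scale_fonts` are illustrative assumptions rather than course recommendations.

```julia
# Font settings gathered in one NamedTuple so every figure uses the same ones.
# The specific values are illustrative assumptions.
slide_fonts = (
    fontfamily     = "Helvetica",   # sans serif; availability depends on the backend
    titlefontsize  = 18,
    guidefontsize  = 16,            # axis labels
    tickfontsize   = 12,
    legendfontsize = 12,
)

# Scale every font size for a different medium without changing the family
scale_fonts(style, factor) =
    map(v -> v isa Number ? round(Int, v * factor) : v, style)

poster_fonts = scale_fonts(slide_fonts, 1.5)

# With Plots loaded, splat the settings into any plotting call:
#   plot(x, y; xlabel="Time (d)", ylabel="Flux", slide_fonts...)
```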

begin	
    df = DataFrame(udec=Int64[], uhex=String[], Glyph=String[], LaTeX=String[], Description=String[])
    
    for i in (
        [0x00278, "ɸ", "\\ltphi", "Latin Small Letter Phi"] ,
        [0x003C6, "φ", "\\varphi", "Greek Small Letter Phi"] ,
        [0x003D5, "ϕ", "\\phi", "Greek Phi Symbol / Greek Small Letter Script Phi"] ,
        [0x01D60, "ᵠ", "\\^phi", "Modifier Letter Small Greek Phi"] ,
        [0x01D69, "ᵩ", "\\_phi", "Greek Subscript Small Letter Phi"] ,
        [0x1D6D7, "𝛗", "\\bfvarphi", "Mathematical Bold Small Phi"] ,
        [0x1D6DF, "𝛟", "\\bfphi", "Mathematical Bold Phi Symbol"] ,
        [0x1D711, "𝜑", "\\itphi", "Mathematical Italic Small Phi"] ,
        [0x1D719, "𝜙", "\\itvarphi", "Mathematical Italic Phi Symbol"] ,
        [0x1D74B, "𝝋", "\\biphi", "Mathematical Bold Italic Small Phi"] ,
        [0x1D753, "𝝓", "\\bivarphi", "Mathematical Bold Italic Phi Symbol"] ,
        [0x1D785, "𝞅", "\\bsansphi", "Mathematical Sans-Serif Bold Small Phi"] ,
        [0x1D78D, "𝞍", "\\bsansvarphi", "Mathematical Sans-Serif Bold Phi Symbol"] ,
        [0x1D7BF, "𝞿", "\\bisansphi", "Mathematical Sans-Serif Bold Italic Small Phi"] ,
        [0x1D7C7, "𝟇", "\\bisansvarphi", "Mathematical Sans-Serif Bold Italic Phi Symbol"])
        push!(df, (i[1], string(i[1], base=16), string(Char(i[1])), i[3], i[4]))
        
    end
    Foldable("Example of mathematical symbols",
md"""
$(df[!,3:5])
#### [JuliaMono](https://juliamono.netlify.app/) has great coverage. 
"""
    )
end

Example of mathematical symbols

15×3 DataFrame

Row	Glyph	LaTeX	Description
	String	String	String
1	ɸ	\\ltphi	Latin Small Letter Phi
2	φ	\\varphi	Greek Small Letter Phi
3	ϕ	\\phi	Greek Phi Symbol / Greek Small Letter Script Phi
4	ᵠ	\\^phi	Modifier Letter Small Greek Phi
5	ᵩ	\\_phi	Greek Subscript Small Letter Phi
6	𝛗	\\bfvarphi	Mathematical Bold Small Phi
7	𝛟	\\bfphi	Mathematical Bold Phi Symbol
8	𝜑	\\itphi	Mathematical Italic Small Phi
9	𝜙	\\itvarphi	Mathematical Italic Phi Symbol
10	𝝋	\\biphi	Mathematical Bold Italic Small Phi
11	𝝓	\\bivarphi	Mathematical Bold Italic Phi Symbol
12	𝞅	\\bsansphi	Mathematical Sans-Serif Bold Small Phi
13	𝞍	\\bsansvarphi	Mathematical Sans-Serif Bold Phi Symbol
14	𝞿	\\bisansphi	Mathematical Sans-Serif Bold Italic Small Phi
15	𝟇	\\bisansvarphi	Mathematical Sans-Serif Bold Italic Phi Symbol

JuliaMono has great coverage.

Axes

Label axes
Specify units (unless dimensionless)
≥ 3 tick marks per axis
Plenty of space between axis labels

Axis Range

Don't just accept default axis scales!

Scale that includes all points may hide important variations.
Zooming in all the way can make variations look large even when they are small.

Questions to ask:

Does zero (or one) have significance?
Linear vs Log?
Would showing residuals to a baseline model be more helpful (or just more confusing)?
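As a sketch of choosing limits deliberately rather than accepting defaults, the hypothetical helper `axis_limits` below pads the data range and can force a meaningful reference value (such as zero) into view. The flux values are made up for illustration.

```julia
# Return (lo, hi) axis limits that span `values` with fractional padding.
# `anchor` optionally forces a reference value (e.g., 0 or 1) into the range.
function axis_limits(values; pad=0.05, anchor=nothing)
    lo, hi = extrema(values)
    if anchor !== nothing
        lo, hi = min(lo, anchor), max(hi, anchor)
    end
    span = hi - lo
    span == 0 && (span = one(span))   # avoid a zero-width axis for constant data
    return (lo - pad * span, hi + pad * span)
end

flux = [0.998, 1.001, 0.999, 0.991, 1.000]   # hypothetical normalized fluxes

zoomed  = axis_limits(flux)                  # emphasizes the ~1% dip
context = axis_limits(flux; anchor=0.0)      # shows the dip relative to zero flux

# With Plots loaded: scatter(flux; ylims=zoomed) versus scatter(flux; ylims=context)
```

Neither range is correct in general; which one to use depends on whether zero flux is meaningful for the point the figure is making.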

let
    plt1 = plot(xlabel="Transit Number", ylabel="Time (d)", 
            markersize=1, legend=:none)
    n = 100
    tr_num = collect(1:n)
    period = 10.0
    t0 = 1000.0
    t_linear = t0 .+ tr_num .* period
    p_ttv = 365
    t = t_linear .+ 10/(24*60) .* sin.(2π.*t_linear/p_ttv)
    t .+= 5/(24*60) *randn(n)
    resid = (t.-t_linear).*(24*60)
    σt = ones(n).*15 ./(24*60)
    mask = rand(n) .< 0.75
    scatter!(plt1, tr_num[mask], t[mask], )
    plt2 = plot(xlabel="Time (days)", ylabel="Δt (min)", 
            markersize=2, legend=:none)
    σt .*= (24*60)
    if plt_errorbars=="All"
        scatter!(plt2, t[mask], resid[mask], yerr=σt[mask], )
    else
        scatter!(plt2, t[mask], resid[mask], )
        if plt_errorbars == "One"
            scatter!(plt2, [t[mask][end]], [resid[mask][end]], yerr=[σt[mask][end]], markercolor=1)
        end
    end
    

    plot(plt1, plt2, layout=(2,1) )
end

Show error bars?

Lines

Increase line width (or weight) when used for print or projection
Hard to tell apart more than 4 line styles
Distinguish lines with both color and line style

let
    plt = plot( xlabel="X", ylabel="Y", color_palette=cs_ex_categories_unordered, legend=:none)
    x = 0:0.1:2π
    y = sin.(x)
    lw = 4
    plot!(plt, x, y, linewidth=lw, linestyle= :solid)
    plot!(plt, x, y.+0.1, linewidth=lw, linestyle= :dash)
    plot!(plt, x, y.+0.2, linewidth=lw, linestyle= :dot)
    plot!(plt, x, y.+0.3, linewidth=lw, linestyle= :dashdot)
    plot!(plt, x, y.+0.4, linewidth=lw, linestyle= :dashdotdot)
end

Points

Increase point size, especially for print or projection
Hard to tell apart more than ~5 point (marker) shapes
Too many points overlapping can be misleading
Convey measurement uncertainties

let
    plt = plot( xlabel="X", ylabel="Y", color_palette=ColorScheme(cvd_dict[cvd].(ColorSchemes.Paired_12)), legend=:none)
    x = 0:0.1:2π
    y = sin.(x)
    lw = 4
    shape_wo_stroke = [:circle, :rect, :star5, :diamond, :hexagon, :utriangle, :dtriangle, :rtriangle, :ltriangle, :pentagon, :heptagon, :octagon, :star4, :star6, :star7, :star8]
    shape_w_stroke = [:cross, :xcross, :vline, :hline]
    for i in 1:length(shape_wo_stroke)
        scatter!(plt, x, y.+i.*0.1, markershape=shape_wo_stroke[i], markersize=3.5, markerstrokewidth=0)
    end
    for i in 1:length(shape_w_stroke)
        j = i + length(shape_wo_stroke)
        scatter!(plt, x, y.+j.*0.1, markershape=shape_w_stroke[i], markersize=3.5, markerstrokewidth=3)
    end
    plt
end

a, b = randn(4_000), randn(4_000);

let
    if plt_heatmap
        plt = histogram2d(a,b, bins=40, xlabel="X", ylabel="Y", color_palette=:lajolla, legend=:none)
    else
        plt = plot(xlabel="X", ylabel="Y", color_palette=:lajolla, legend=:none)
    end
    xlims!(-3.5, 3.5)
    ylims!(-3.5, 3.5)
    dens = kde((a,b))
    dens_interp = InterpKDE(dens)
    
    mask = map(i->pdf(dens_interp, a[i], b[i]) .<= plt_point_threshold, 1:length(a) )
    #mask = trues(length(a))
    
    scatter!(plt, a[mask],b[mask], markercolor=:blue, markersize=pointsize, markerstrokewidth=0)
    if plt_contours
        contour_color = plt_heatmap ? :yellow : :blue
        plot!(plt, dens)
    end
    plt
end

Contours: Heatmap:

Point size: Plot points inside contours:

Colors

Why are you using color?

1. To convey additional information

begin
    local t = range(0,stop=5,length=500)
    y = 10 .+ (2.0.+t./20) .* sin.(2π.*(t.+0.2.*sin.(2π.*t./20)))
    tmod1 = mod.(t,1)
    plt_dim1 = scatter(t, y, markerz=t, markersize=2.5, markerstrokewidth=0, xlabel = "Time", ylabel="Flux", legend=:none)
    plt_dim2 = scatter(tmod1, y, markerz=t, markersize=2.5, markerstrokewidth=0, xlabel = "Time modulo 1", ylabel="Flux",legend=:none)
    plt_dim = plot(plt_dim1, plt_dim2, layout=(2,1) )
end

#$plt_dim

md"""
#### 2. Draw attention to one element
"""

2. Draw attention to one element

begin  # Microlensing magnification for single lens
    A(u) = (2+u^2)/(u*sqrt(4+u^2))
    u(t; u0::Real, t0::Real=zero(t), tE::Real=one(t) ) = sqrt(u0^2+(t-t0)^2/tE^2)
end;
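For reference, the two functions defined in the cell above correspond to the following expressions. The readings of $u_0$ as the minimum lens-source separation, $t_0$ as the time of closest approach, and $t_E$ as the crossing timescale are the usual interpretation of these names, not something stated in the code.

```latex
A(u) = \frac{u^2 + 2}{u \sqrt{u^2 + 4}},
\qquad
u(t) = \sqrt{u_0^2 + \left( \frac{t - t_0}{t_E} \right)^2}
```

For $u \ll 1$ this gives $A \approx 1/u$, and $A \to 1$ as $u \to \infty$, so the magnification peaks at $t = t_0$.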

begin
    plt_accent = plot(xlabel = "Time", ylabel="Magnification", legend=:none)
    local t = -5:0.02:5
    local flux = 0.5 .*( A.(u.(t,u0=0.75)) .+ A.(u.(t,u0=2,t0=2,tE=0.05)) )  
    flux .+= 0.005.*randn(length(flux))
    scatter!(plt_accent, t, flux, markersize=2, markerstrokewidth=0 )
    local idx_accent = findall(t->abs(t-2)<0.15, t)
    scatter!(plt_accent, t[idx_accent], flux[idx_accent], markersize=2, markerstrokewidth=0 )
end

#$plt_accent

md"""
#### 3. To fit in with color palette
$cs_ex_psu_penn
"""

3. To fit in with color palette

begin
    plt_theme = plot(xlabel = "Time", ylabel="Magnification", legend=:none, 
        fg_color_subplot=cs_pennsylvania[2], markersize=2.5, framestyle=:box,
        fontfamily="Bookman Demi", color_palette=cs_ex_psu_accent
        )
    local λ = 5428:0.05:5432
    line(λ; λ0=zero(λ), depth=1, width=7000/3e8 ) = 1-depth*exp(-0.5*((λ-λ0)^2/(λ0*width)^2))
    local flux = line.(λ, λ0=5430).+0.05.*randn(length(λ))
    scatter!(plt_theme, λ, flux,  markercolor=2, markerstrokewidth=0, )
    flux = 0.3.+line.(λ, λ0=5430.05).+0.05.*randn(length(λ))  
    scatter!(plt_theme, λ, flux, markercolor=3, markerstrokewidth=0, )
    flux = 0.6.+line.(λ, λ0=5430.10).+0.05.*randn(length(λ))
    scatter!(plt_theme, λ, flux, markercolor=4, markerstrokewidth=0, )
    flux = 0.9.+line.(λ, λ0=5430.15).+0.05.*randn(length(λ))
    scatter!(plt_theme, λ, flux, markercolor=5, markerstrokewidth=0, )
    flux = 1.2.+line.(λ, λ0=5430.20).+0.05.*randn(length(λ))
    scatter!(plt_theme, λ, flux, markercolor=6, markerstrokewidth=0, )
end

Choosing a Color Palette

Continuous Values

Should it be perceptually uniform?
- Otherwise can give an inaccurate impression
What if you print it in black & white?
Is there sufficient contrast?
- Especially important when using projector
- Can truncate palette to avoid using too light a color
How is it perceived by people with a color vision deficiency?
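Two of these questions can be checked numerically. The sketch below uses only base Julia and assumes the standard sRGB relative-luminance formula and the WCAG-style contrast ratio. The palette mixes three blues and one yellow from the PSU palettes defined in the helper code of this notebook. The contrast threshold of 2 and the helper names are illustrative assumptions.

```julia
# Convert an sRGB channel in [0, 1] to linear light (sRGB transfer function)
srgb_to_linear(c) = c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055)^2.4

# Relative luminance (0 = black, 1 = white) of a 0xRRGGBB color
function relative_luminance(hex::Integer)
    r = ((hex >> 16) & 0xff) / 255
    g = ((hex >> 8) & 0xff) / 255
    b = (hex & 0xff) / 255
    return 0.2126 * srgb_to_linear(r) + 0.7152 * srgb_to_linear(g) +
           0.0722 * srgb_to_linear(b)
end

# WCAG-style contrast ratio between two colors (ranges from 1 to 21)
function contrast_ratio(hex1, hex2)
    l1, l2 = relative_luminance(hex1), relative_luminance(hex2)
    return (max(l1, l2) + 0.05) / (min(l1, l2) + 0.05)
end

palette = [0x001E44, 0x1E407C, 0x009CDE, 0xFFD100]   # three blues and a yellow
lum = relative_luminance.(palette)

# A sequential palette prints well in greyscale if luminance changes monotonically
is_monotonic = issorted(lum) || issorted(lum; rev=true)

# Truncate the palette: drop colors too light to read on a white background
readable = filter(c -> contrast_ratio(c, 0xFFFFFF) >= 2, palette)
```

Monotonic luminance is a necessary check for greyscale printing, not a test of perceptual uniformity; that requires comparing steps in a perceptual color space.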

Examples:

Linear (e.g., lajolla)

Diverging (e.g., vik)

Cyclic (e.g., cyclic_protanopic_deuteranopic_bwyk_16_96_c31_n256)

Color Vision Deficiency:

Other common mistakes

Cramming too much information into a figure
- The goal of visualization is to make it easy for the audience to understand, and not to show off your skill at making complicated plots.
- When giving a presentation, you can build up a figure step by step to turn a complex figure into a story.
Making plot 3d when 2d would be easier to interpret
Using yellows in a presentation to be displayed with a projector

Keep your eyes open for...

Particularly effective plots
Poorly executed plots
Plots that are complex, but still readable thanks to good design

Think about why they were good or bad and what you can learn from them.

Setup & Helper Code

begin
    using PlutoUI, PlutoTeachingTools
    using Plots, Plots.PlotMeasures, LaTeXStrings
    using StatsPlots, KernelDensity
    using Colors, ColorSchemes, FixedPointNumbers
    using DataFrames
end

ChooseDisplayMode()

Figures

PSU Palettes

cs_pennsylvania = ColorScheme(reinterpret.(RGB24,[0x001E44, 0x1E407C, 0x009CDE, 0x314D64, 0x3EA39E, 0xA2AAAD, 0xffffff]))

cs_classic_accent = ColorScheme(reinterpret.(RGB24,[0x6A3028,0xB88965,0xBF8226,0x4A7729,0x96BEE6,0xAC8DCE,0x444444,0xBC204B]))

cs_vibrant_accent = ColorScheme(reinterpret.(RGB24,[0xF2665E,0xE98300,0xFFD100,0x99CC00,0x008755,0x491D70,0x000321]))

Color vision deficiency code

begin
    cvd_names = ["None", "Protanopic","Deuteranopic","Tritanopic","Greyscale"]
    cvd_funcs = [identity, protanopic, deuteranopic, tritanopic,Gray]
    cvd_dict = Dict(zip(cvd_names,cvd_funcs))
end;

begin
    cs_ex_linear = ColorScheme(cvd_dict[cvd].(ColorSchemes.lajolla))
    cs_ex_diverge = ColorScheme(cvd_dict[cvd].(ColorSchemes.vik))
    cs_ex_cyclic = ColorScheme(cvd_dict[cvd].(ColorSchemes.cyclic_protanopic_deuteranopic_bwyk_16_96_c31_n256))
    
    cs_ex_categories_unordered = ColorScheme(cvd_dict[cvd].(ColorSchemes.tol_bright))
    cs_ex_categories_ordered = ColorScheme(cvd_dict[cvd].(ColorSchemes.YlGnBu_6))
    cs_ex_paired = ColorScheme(cvd_dict[cvd].(ColorSchemes.Paired_6))

    cs_ex_psu_penn = ColorScheme(cvd_dict[cvd].(cs_pennsylvania))
    cs_ex_psu_accent = ColorScheme(cvd_dict[cvd].(cs_classic_accent))
    cs_ex_psu_vibrant = ColorScheme(cvd_dict[cvd].(cs_vibrant_accent))
end;

begin
    cs_ex_not_linear1 = ColorScheme(cvd_dict[cvd].(reverse(ColorSchemes.hot)))
    cs_ex_not_linear2 = ColorScheme(cvd_dict[cvd].(ColorSchemes.jet1))
    cs_ex_not_diverge = ColorScheme(cvd_dict[cvd].(ColorSchemes.coolwarm))
    cs_ex_not_cyclic = ColorScheme(cvd_dict[cvd].(ColorSchemes.phase))
end;

Built with Julia 1.8.2 and

ColorSchemes 3.19.0
Colors 0.12.8
DataFrames 1.4.2
FixedPointNumbers 0.8.4
KernelDensity 0.6.5
LaTeXStrings 1.3.0
Plots 1.35.5
PlutoTeachingTools 0.2.3
PlutoUI 0.7.48
StatsPlots 0.15.4

To run this tutorial locally, download this file and open it with Pluto.jl.
