Journal graphics

A bit biased here… but I think P-value functions for main effects may be more useful than graphing 95% interval estimates

Relatively simple to graph with the concurve R package

6 Likes

i would just note that sas graphs have improved a lot with annotate and sgplot etc, and there’s a free ebook here which includes medical data (chapter 4): http://support.sas.com/publishing/vds.pdf, the author’s sas code is available here: https://support.sas.com/en/books/authors/sanjay-matange.html

2 Likes

I know that SAS graphics have improved a lot, but my own personal opinion is that graphics in other packages has improved faster. And with R you can easily convert many of the graphics to be interactive.

4 Likes

is “interactive” good with all the reproducibility talk? one good thing about sas is it reproduces the same figure exactly - any adjustments are within the code and not made after the image is created. I want to get back into matlab, i think it feels like R + great graphics, although i don’t know R well…

I like this short <5min read on violin plots from https://twitter.com/kathmwood: https://inattentionalcoffee.wordpress.com/2017/02/14/data-in-the-raw-violin-plots/

1 Like

Definitely, since I’m really referring to partial interactivity, e.g. drill down to see more information. And one of our hotshot R developers showed me how to create an RMarkdown html report that has a click box in it that allows the user to view printable static graphs vs. semi-interactive graphs when the statistician has dual code chunks to handle the two tasks (e.g., one chunk calling ggplot and the other calling plotly).

1 Like

wow great discussion.
where can I find the code for the spike histogram in the eg
thanks

Click on the link just above the plot, and look for spike histograms in that document. Then click on Code there to see the code. That particular graphic used plotly to produce an interactive graphic, which uses javascript and html after calling from R.

1 Like

A problem with violin plots is their resemblance with a certain body part:

(Someone else made a point about it before XKCD, but I couldn’t find it right now.)

Half-eye plots or joy plots don’t have this issue.

1 Like

I dream to replace the infamous “Table 1.” with better visualizations…
Toying with the ggridges functions:

```R
library(survival)
library(tidyverse)
library(ggridges)
mgus2 %>% 
  pivot_longer(names_to="cont_var",values_to="value",-c(id,sex,dxyr,ptime,pstat,futime,death)) %>% 
  mutate(y=1,
         cont_var=(recode(cont_var,
         age="Age at diagnosis",
         creat="Serum creatinine",
         hgb="Hemoglobin concentration",
         mspike="M-spike (g/dL)"))) %>% 
  ggplot(aes(x = value,y=y)) +
  geom_density_ridges(
    jittered_points = TRUE,
    position = position_points_jitter(width = 0.05, height = 0),
    point_shape = '|', point_size = 3, point_alpha = 1, alpha = 0.7,
  )+
  stat_density_ridges(
      geom = "density_ridges_gradient", calc_ecdf = TRUE,
      quantiles = 4, quantile_lines = TRUE
    )+
  theme_ridges()+
  theme(axis.title=element_blank(),axis.title.y=element_blank())+
  facet_wrap(~cont_var,scales="free",ncol=1)


Using dots instead of the jittered lines:

mgus2 %>% 
  pivot_longer(names_to="cont_var",values_to="value",-c(id,sex,dxyr,ptime,pstat,futime,death)) %>% 
  mutate(y=1,
         cont_var=(recode(cont_var,
                          age="Age at diagnosis (years)",
                          creat="Serum creatinine (mg/dL)",
                          hgb="Hemoglobin (concentration (g/dL)",
                          mspike="M-spike (g/dL)"))) %>% 
  ggplot(aes(x = value,y=y)) +
  geom_density_ridges(
    jittered_points = TRUE,
    position = "raincloud",
    point_alpha = 0.2,
    alpha = 0.2,
    scale=0.9
  )+
  stat_density_ridges(
    geom = "density_ridges_gradient", calc_ecdf = TRUE,
    quantiles = 4, quantile_lines = TRUE
  )+
  theme_ridges()+
  theme(axis.title=element_blank(),axis.title.y=element_blank())+
  facet_wrap(~cont_var,scales="free",ncol=2,nrow=2)


Adding histograms:

mgus2 %>% 
  #  summarize_at(vars(var),median,na.rm=TRUE)
  ggplot(aes(x = hgb,y=y)) +
  geom_density_ridges(
#    jittered_points = TRUE,
#    position = "raincloud",
    point_alpha = 0.2,
    alpha = 0.2,
    scale=0.9
  )+
  stat_density_ridges(
    geom = "density_ridges_gradient", calc_ecdf = TRUE,
    quantiles = 4, quantile_lines = TRUE
  )+
  theme_ridges()+
  scale_x_continuous(limits = c(0,22))+
  theme(axis.title=element_blank(),
        axis.title.y=element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.line.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.margin=margin(margin(b=0))) -> p1

mgus2 %>% 
  #  summarize_at(vars(var),median,na.rm=TRUE)
  filter(!is.na(hgb)) %>% 
  ggplot(aes(x = hgb,y=y)) +
  geom_col()+
  theme_ridges()+
  theme(axis.title=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y = element_blank(),
        axis.line.x = element_blank(),
        axis.ticks.x = element_blank())+
  scale_x_continuous(limits = c(0,22)) -> p2
ggarrange(p1,p2,ncol=1,nrow=2,heights = c(4, 1),align="v")


Comments and suggestions are appreciated :slight_smile:

1 Like

That’s really excellent. But we can show more: the relationship between each baseline variable and outcome. See this .

I don’t think jittering is the way to go as points get hidden in the dense region. I’d go with spike histograms as with the Hmisc histSpikeg function that works with ggplot2.

1 Like

Thanks for the comments Frank. I’m having troubles using your histSpikeg function. Ideally I would like to keep the dplyr pipe – almost – intact. See above some edits to my previous post.
Agree that showing association with outcomes very interesting!
I’m just keeping it simpler for the time being…

i guess people saw this paper recently re meta-analysis plots: Charting the landscape of graphical displays for meta-analysis and systematic reviews
although they don’t provide the code for the best plots, only figures 3-7, R code

2 Likes

I am not sure if it’s appropriate to ask this basic question here…
With the spike histogram on the curve in the example above (first graph), using histspikeg, what does the spike on the curve actually represent? I gather it should be representing the distribution of creatinine on the curve within some x range, but I can’t figure out what that means (e.g. what does the width of the vertical spike mean? what does the space between the spike mean… etc?).
Apologise if this is a very basic question, I am just not familiar with spike histograms and can’t seem to find an answer when googling it.
Thank you.

Spikes are used so that you don’t need to talk about the width. The width conveys no information. The height is proportional to the bin frequency just as with histograms in general. Just think about this as a regular histogram with very thin bars and lots of bins, to show the marginal distribution of the predictor.

Oh yes! I can see it on the graph now and how it is related to the rug plot at the top! Thank you!

I like the approach of plotting the confidence interval for the difference in survival curves as suggested by @f2harrell from Boers et al ( Journal of Clinical Epidemiology 57 (2004) 712–715). However, I wonder if this could be seen as performing multiple statistical tests.
There are also proposed tests to compare statistical differences in survival at fixed time points (Klein et al. Statist. Med. 2007; 26:4505–4519). What is your opinion/recommendations about providing the results of any of these tests at some relevant time points in addition to the 1/2 CI of the difference? Those could be useful for example when survival curves cross and avoid multiple comparisons…

1 Like

Any time you show > 1 confidence interval there is a multiple test risk. As an aside, multiplicity adjustments are controversial when you really have marginal questions, e.g., questions targeted to one time point at a time. The best approach is to figure out how to compute simultaneous confidence bands that “protect” against an infinity of hypotheses. These will be a little wider than the intervals we see at present, but not as huge as you’d think. Something like using \alpha=0.01 for a 0.05 simultaneous band. Could base this on Kolmogorov-Smirnov type of ideas.

1 Like

Thank you @f2harrell. What are your thoughts about survival comparisons at fixed time points?

In any case, I understand our default should be to use adjusted curves (Cox if PH assumption holds or parametric models if not). Sometimes, if the variable of interest (i.e: treatment) does not satisfy the PH assumption, stratifying by that variable may be the way to go, but no statistical tests for computing survival differences can be used, right? (as in Califf et al., 1989). Thus, it may be useful some provide an overall log-rank test and compare survivals at fixed time points.

I am dealing with a similar situation (early high risk in one treatment arm, late high risk in the other, PH assumption violated) and I have doubts about the right approach.