How many HTTP 404 errors are in the RStudio Blog?

Recently I was reading a post from 2020 on the RStudio Blog, when I followed a link in the post only to find…

Of course, it was easy to find the intended page with Google, but it made me curious:

  • How many HTTP 404 (Page Not Found) errors like this exist in the RStudio Blog?

  • Which links are broken?

  • Could these broken links be easily fixed?

It turns out we can get at these questions pretty quickly with R, especially if we break the overall mini-project into even smaller tasks such as:

  • Get all of the blog posts on the RStudio Blog

  • Get all of the links out of each blog post

  • Test out all of the links

Getting All Blog Posts

Clicking around in the blog reveals 38 pages of blog posts, ranging from early 2011 to December 2021 (Happy belated 10th birthday, RStudio blog!).

Let’s see if we can harvest or “rvest” all of the links.

First, let’s build the 38 urls we need to retrieve links from.

n_pages <- 38
url_bloghome <- 'https://www.rstudio.com/blog' # This is page 1
url_blogpages <- c()
for (i in seq_len(n_pages)){
  current_url <- url_bloghome
  if (i > 1){
    current_url <- file.path(url_bloghome, 'page', i)
  }
  url_blogpages <- c(url_blogpages, current_url)
}
str(url_blogpages)
##  chr [1:38] "https://www.rstudio.com/blog" ...
head(url_blogpages)
## [1] "https://www.rstudio.com/blog"        "https://www.rstudio.com/blog/page/2"
## [3] "https://www.rstudio.com/blog/page/3" "https://www.rstudio.com/blog/page/4"
## [5] "https://www.rstudio.com/blog/page/5" "https://www.rstudio.com/blog/page/6"
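The loop above grows the vector one element at a time. As a side note, the same URLs can be built in one vectorised step. This is only a minimal sketch of an alternative, and the name url_blogpages_vec is made up here so it doesn't overwrite anything above.

```r
n_pages <- 38
url_bloghome <- 'https://www.rstudio.com/blog' # This is page 1

# seq_len(n_pages)[-1] gives 2, 3, ..., n_pages (and nothing at all if n_pages is 1)
later_pages <- file.path(url_bloghome, 'page', seq_len(n_pages)[-1])

# Hypothetical name: page 1 is the blog home, followed by pages 2 to n_pages
url_blogpages_vec <- c(url_bloghome, later_pages)
length(url_blogpages_vec)
```

Using seq_len() instead of 2:n_pages avoids the surprise of 2:1 counting backwards if the blog ever had a single page.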

Now, let’s pull all of the blog post links out of these 38 urls.

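library(rvest)
# Return the href of every link found inside elements with class 'pt-3'
get_blogposts <- function(url){
  read_html(url) %>%
    html_nodes('.pt-3') %>%
    html_nodes('a') %>%
    html_attr('href')
}
# Apply to all listing pages and flatten the result into one unnamed character vector
blogposts <- unname(unlist(sapply(url_blogpages, FUN=get_blogposts)))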
str(blogposts)
##  chr [1:562] "https://www.rstudio.com/blog/three-ways-to-program-in-python-with-rstudio/" ...
head(blogposts, 25)
##  [1] "https://www.rstudio.com/blog/three-ways-to-program-in-python-with-rstudio/"                            
##  [2] "https://www.rstudio.com/blog/rstudio-community-monthly-events-december-2021/"                          
##  [3] "https://www.rstudio.com/blog/r-markdown-tips-tricks-1-rstudio-ide/"                                    
##  [4] "https://www.rstudio.com/blog/announcing-the-rstudio-blog-s-new-vision-and-design/"                     
##  [5] "https://www.rstudio.com/blog/augment-tableau-with-r-python/"                                           
##  [6] "https://www.rstudio.com/blog/building-code-movies-with-flipbookr/"                                     
##  [7] "https://www.rstudio.com/blog/rstudio-community-monthly-events-november-2021/"                          
##  [8] "https://www.rstudio.com/blog/announcing-rstudio-on-amazon-sagemaker/"                                  
##  [9] "https://www.rstudio.com/blog/how-the-clusterbuster-shiny-app-helps-battle-covid-19-in-the-netherlands/"
## [10] "https://www.rstudio.com/blog/announcing-the-2021-rstudio-communications-survey/"                       
## [11] "https://www.rstudio.com/blog/rstudio-at-r-pharma-2021/"                                                
## [12] "https://www.rstudio.com/blog/how-data-scientists-and-security-teams-can-work-together/"                
## [13] "https://www.rstudio.com/blog/pro-drivers-2021-10-0-release/"                                           
## [14] "https://www.rstudio.com/blog/the-inspire-u2-program-student-reflections/"                              
## [15] "https://www.rstudio.com/blog/embedding-shiny-apps-in-tableau-dashboards-using-shinytableau/"           
## [16] "https://www.rstudio.com/blog/the-inspire-u2-program/"                                                  
## [17] "https://www.rstudio.com/blog/why-your-ds-team-might-need-a-shiny-deployment-engineer/"                 
## [18] "https://www.rstudio.com/blog/rstudio-connect-2021-09-0-tableau-analytics-extensions/"                  
## [19] "https://www.rstudio.com/blog/teaching-data-science-with-rstudio-cloud/"                                
## [20] "https://www.rstudio.com/blog/pins-1-0-0/"                                                              
## [21] "https://www.rstudio.com/blog/rstudio-table-contest-2021/"                                              
## [22] "https://www.rstudio.com/blog/how-to-use-shinymatrix-and-plotly-graphs/"                                
## [23] "https://www.rstudio.com/blog/rstudio-2021.09.0-update-whats-new/"                                      
## [24] "https://www.rstudio.com/blog/what-s-new-on-rstudio-cloud-september-2021/"                              
## [25] "https://www.rstudio.com/blog/curating-for-wearerladies-on-twitter/"

Check that out: we now have 562 blog post URLs!

Sanity Check

Let’s do a quick manual check here on at least one of these broken links. How about the first one? (https://www.rstudio.com/2014/06/18/r-markdown-v2/).

To truly sanity check this, we would want to find the original blog post containing the link and try clicking the link in the post.

So which blog post contained this link?

result <- sapply(links, function(x) '/2014/06/18/r-markdown-v2/' %in% x)
result[result]
## https://www.rstudio.com/blog/introducing-ggvis/ 
##                                            TRUE
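To make the lookup above a little more concrete, here is a small self-contained sketch on toy data. It assumes that links is a named list with one character vector of hrefs per blog post, named by the post's URL. The list links_toy and its example.com URLs are invented for illustration.

```r
# Toy stand-in for `links` (hypothetical data): names are posts, values are hrefs
links_toy <- list(
  "https://example.com/blog/post-a/" = c("/about/", "/2014/06/18/r-markdown-v2/"),
  "https://example.com/blog/post-b/" = c("/contact/"),
  "https://example.com/blog/post-c/" = c("/2014/06/18/r-markdown-v2/", "/pricing/")
)

target <- "/2014/06/18/r-markdown-v2/"

# One TRUE/FALSE per post: does this post's vector of hrefs contain the target?
has_target <- vapply(links_toy, function(x) target %in% x, logical(1))

# The names of the TRUE entries are the posts that contain the link
posts_with_target <- names(has_target)[has_target]
posts_with_target
```

Here vapply() is used instead of sapply() only because it guarantees a logical vector back, even if the list happens to be empty.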

And indeed if we navigate to https://www.rstudio.com/blog/introducing-ggvis/ and try clicking on the R Markdown v2 link in the blog post, we are taken to the Page Not Found error page.

Concluding Remarks

To conclude, let’s circle back to our original questions.

  • How many HTTP 404 (Page Not Found) errors like this exist in the RStudio Blog?

We ended up limiting the scope to only rstudio.com links and found that 117 of over 1500 unique links are currently returning a 404 error. Furthermore, there are two links to doc.rstudio.com that return a Can't Find the Server error.
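As a rough illustration of how counts like these can be produced, here is a minimal sketch. It is not necessarily the code used for this post. It assumes the httr package is installed, and get_status and classify_links are hypothetical helper names. The stub at the end stands in for real HTTP requests so the example runs without a network connection.

```r
# Hypothetical helper: HTTP status code for one URL, or NA if the request
# fails outright (for example, when the server cannot be found).
# Assumes the httr package is installed.
get_status <- function(url) {
  tryCatch(
    as.integer(httr::status_code(httr::HEAD(url))),
    error = function(e) NA_integer_
  )
}

# Hypothetical helper: label each URL as a 404, unreachable, or neither
classify_links <- function(urls, status_fun = get_status) {
  status <- vapply(urls, status_fun, integer(1), USE.NAMES = FALSE)
  data.frame(
    url = urls,
    status = status,
    broken = !is.na(status) & status == 404L,
    unreachable = is.na(status),
    stringsAsFactors = FALSE
  )
}

# Offline stand-in for get_status, with invented URLs
stub_status <- function(u) {
  if (grepl("gone", u)) 404L else if (grepl("nohost", u)) NA_integer_ else 200L
}
res <- classify_links(
  c("https://example.com/ok", "https://example.com/gone", "https://nohost.invalid/"),
  status_fun = stub_status
)
sum(res$broken)      # number of 404s
sum(res$unreachable) # number of links whose server could not be reached
```

Some servers answer HEAD requests differently from GET requests, so a real run might need httr::GET instead.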

  • Which links are broken?

A full list of the broken links is printed in the appendix (sorted alphabetically).

  • Could these broken links be easily fixed?

The RStudio blog does not appear to be open source, so we cannot create a PR to fix the links. However, many of the broken links seem to follow certain patterns, notably the absence or presence of dates in the URL, which could hopefully be fixed easily. Consider these two examples.

EXAMPLE 1

This blog post, after redirects, is linked to https://www.rstudio.com/blog/driving-real-lasting-value-with-serious-data-science/ but the correct link should have a yyyy-mm-dd in the slug: https://www.rstudio.com/blog/2020-05-19-driving-real-lasting-value-with-serious-data-science/

EXAMPLE 2

This other blog post is linked to https://www.rstudio.com/2014/06/18/r-markdown-v2/ and in this case the /yyyy/mm/dd/ folders need to be replaced with simply /blog/. The correct link should be https://www.rstudio.com/blog/r-markdown-v2/.
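Both patterns can be expressed as simple regular-expression rewrites. The sketch below uses only base R. The function names fix_dated_path and fix_blog_subdomain are hypothetical, and the assumption that every link of the same shape should be rewritten the same way is untested. Any URL it produces is only a candidate that would need to be checked again.

```r
# Example 2 pattern: replace the /yyyy/mm/dd/ folders on www.rstudio.com with /blog/
fix_dated_path <- function(url) {
  sub("^(https://www\\.rstudio\\.com)/[0-9]{4}/[0-9]{2}/[0-9]{2}/", "\\1/blog/", url)
}

# Example 1 pattern (assumption): move blog.rstudio.com/yyyy/mm/dd/slug/
# to www.rstudio.com/blog/yyyy-mm-dd-slug/
fix_blog_subdomain <- function(url) {
  sub("^https://blog\\.rstudio\\.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/",
      "https://www.rstudio.com/blog/\\1-\\2-\\3-", url)
}

candidate2 <- fix_dated_path("https://www.rstudio.com/2014/06/18/r-markdown-v2/")
candidate1 <- fix_blog_subdomain(
  "https://blog.rstudio.com/2020/05/19/driving-real-lasting-value-with-serious-data-science/"
)
candidate2
candidate1
```

URLs that do not match a pattern are returned unchanged, so both functions can be applied safely to a whole vector of links.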

In other words, afaik these links cannot be fixed by the community (i.e. me via a PR), but someone with access could presumably fix this up with a little effort. Also, at this point, we’d likely want someone with domain knowledge of the RStudio Blog who could combine these results with their knowledge to determine next steps.

Thank you for reading. I hope you have enjoyed this analysis!

Appendix

Here is the full vector of 404 error links.

links404
##   [1] "https://www.rstudio.com/2014/06/18/r-markdown-v2/"                                                                             
##   [2] "https://www.rstudio.com/2014/06/19/interactive-documents-an-incredibly-easy-way-to-use-shiny/"                                 
##   [3] "https://www.rstudio.com/2015/06/24/dt-an-r-interface-to-the-datatables-library/"                                               
##   [4] "https://www.rstudio.com/2016/12/02/announcing-bookdown/"                                                                       
##   [5] "https://www.rstudio.com/2017/06/26/bigrquery-0-4-0/"                                                                           
##   [6] "https://www.rstudio.com/2017/09/11/announcing-blogdown/"                                                                       
##   [7] "https://www.rstudio.com/2017/09/13/rstudio-v1.1-the-little-things/"                                                            
##   [8] "https://www.rstudio.com/2018/09/19/radix-for-r-markdown/"                                                                      
##   [9] "https://www.rstudio.com/2018/11/19/rstudio-1-2-preview-the-little-things/"                                                     
##  [10] "https://www.rstudio.com/2019/01/17/announcing-rstudio-connect-1-7-0/"                                                          
##  [11] "https://www.rstudio.com/2020/03/17/rstudio-1-3-the-little-things/"                                                             
##  [12] "https://www.rstudio.com/2020/07/17/rstudio-global-2021"                                                                        
##  [13] "https://www.rstudio.com/2020/07/17/rstudio-global-call-for-talks"                                                              
##  [14] "https://www.rstudio.com/2020/09/30/rstudio-v1-4-preview-visual-markdown-editing/"                                              
##  [15] "https://www.rstudio.com/2020/11/09/rstudio-1-4-preview-citations/"                                                             
##  [16] "https://www.rstudio.com/2020/12/07/distill/"                                                                                   
##  [17] "https://www.rstudio.com/2021/01/18/blogdown-v1.0/"                                                                             
##  [18] "https://www.rstudio.com/2021/02/04/rstudio-cloud1/"                                                                            
##  [19] "https://www.rstudio.com/2021/06/02/announcing-rstudio-workbench/"                                                              
##  [20] "https://www.rstudio.com/2021/06/02/rstudio-workbench-vscode-sessions/"                                                         
##  [21] "https://www.rstudio.com/2021/06/24/winners-of-the-3rd-annual-shiny-contest/"                                                   
##  [22] "https://www.rstudio.com/s/photos/brian-mcgowan-tomorrowland?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText"
##  [23] "https://www.rstudio.com/s/photos/match?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText"                     
##  [24] "http://cran.rstudio.com/web/packages/dplyr/vignettes/databases.html"                                                           
##  [25] "http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html"                                                        
##  [26] "http://cran.rstudio.com/web/packages/shiny/NEWS"                                                                               
##  [27] "http://docs.rstudio.com/connect/1.4.5/admin/user-management.html#user-roles"                                                   
##  [28] "http://pins.rstudio.com/articles/advanced-versions.html"                                                                       
##  [29] "http://pins.rstudio.com/articles/boards-azure.html"                                                                            
##  [30] "http://pins.rstudio.com/articles/boards-dospace.html"                                                                          
##  [31] "http://pins.rstudio.com/articles/boards-gcloud.html"                                                                           
##  [32] "http://pins.rstudio.com/articles/boards-kaggle.html"                                                                           
##  [33] "http://pins.rstudio.com/articles/boards-rsconnect.html"                                                                        
##  [34] "http://pins.rstudio.com/articles/boards-s3.html"                                                                               
##  [35] "http://rstudio.com/training/curriculum/advanced-r-programming.html"                                                            
##  [36] "http://rstudio.com/training/curriculum/effective-data-visualization.html"                                                      
##  [37] "http://rstudio.com/training/curriculum/package-development.html"                                                               
##  [38] "http://rstudio.com/training/curriculum/reports-and-reproducible-research.html"                                                 
##  [39] "http://rstudio.com/training/on-site.html"                                                                                      
##  [40] "http://rstudio.com/training/philosophy.html"                                                                                   
##  [41] "http://rstudio.com/training/public-courses.html"                                                                               
##  [42] "http://rstudio.com/training/trainers.html"                                                                                     
##  [43] "https://blog.rstudio.com/2019/01/18/summer-internships-2019/"                                                                  
##  [44] "https://blog.rstudio.com/2019/02/28/rstudio-instructor-training/"                                                              
##  [45] "https://blog.rstudio.com/2019/05/21/rstudio-instructor-training-updates/"                                                      
##  [46] "https://blog.rstudio.com/2020/05/19/driving-real-lasting-value-with-serious-data-science/"                                     
##  [47] "https://blog.rstudio.com/2020/07/09/why-you-need-a-world-class-ide-to-do-serious-data-science/"                                
##  [48] "https://blog.rstudio.com/2020/09/15/announcing-the-2020-rstudio-table-contest/"                                                
##  [49] "https://blog.rstudio.com/2020/12/07/package-manager-1-2-0/"                                                                    
##  [50] "https://blog.rstudio.com/tags/bi-tools/"                                                                                       
##  [51] "https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html"                                                       
##  [52] "https://docs.rstudio.com/connect/1.6.11/admin/python.html"                                                                     
##  [53] "https://docs.rstudio.com/connect/admin/authentication.html#authentication-oauth2"                                              
##  [54] "https://docs.rstudio.com/connect/admin/authentication.html#authentication-saml"                                                
##  [55] "https://docs.rstudio.com/connect/admin/authentication.html#change-auth-provider"                                               
##  [56] "https://docs.rstudio.com/connect/admin/cli.html#cli-usermanager"                                                               
##  [57] "https://docs.rstudio.com/connect/admin/getting-started.html#need-help"                                                         
##  [58] "https://pins.rstudio.com/articles/boards-azure.html"                                                                           
##  [59] "https://pins.rstudio.com/articles/boards-gcloud.html"                                                                          
##  [60] "https://pins.rstudio.com/articles/boards-kaggle.html"                                                                          
##  [61] "https://pins.rstudio.com/articles/boards-rsconnect.html"                                                                       
##  [62] "https://pins.rstudio.com/articles/boards-s3.html"                                                                              
##  [63] "https://pins.rstudio.com/articles/boards-websites.html"                                                                        
##  [64] "https://pins.rstudio.com/articles/pins-rstudio.html"                                                                           
##  [65] "https://resources.rstudio.com/rstudio-conf-2020/value-in-data-science-beyond-models-in-production-eduardo-arino-de-la-rubia"   
##  [66] "https://rmarkdown.rstudio.com/flexdashboard/examples.html"                                                                     
##  [67] "https://rmarkdown.rstudio.com/flexdashboard/layouts.html"                                                                      
##  [68] "https://rmarkdown.rstudio.com/flexdashboard/shiny.html"                                                                        
##  [69] "https://rmarkdown.rstudio.com/flexdashboard/using.html"                                                                        
##  [70] "https://rmarkdown.rstudio.com/flexdashboard/using.html#storyboards"                                                            
##  [71] "https://rstudio.com/resources/rstudioconf-2020/making-the-shiny-contest/"                                                      
##  [72] "https://rstudio.com/resources/rstudioconf-2020/value-in-data-science-beyond-models-in-production/"                             
##  [73] "https://rstudio.com/resources/rstudioglobal-2021/,"                                                                            
##  [74] "https://shiny.rstudio.com/articles/single-file.html"                                                                           
##  [75] "https://shiny.rstudio.com/articles/upgrade-0.14.html#full-changelog"                                                           
##  [76] "https://shiny.rstudio.com/gallery/widgets-gallery.html"                                                                        
##  [77] "https://shiny.rstudio.com/reference/shiny/latest/removeUI.html"                                                                
##  [78] "https://shiny.rstudio.com/reference/shiny/latest/showReactLog.html"                                                            
##  [79] "https://solutions.rstudio.com/2019/12/30/rstudio-connect-custom-emails-with-blastula/"                                         
##  [80] "https://solutions.rstudio.com/data-science-admin/deploy/apis/"                                                                 
##  [81] "https://solutions.rstudio.com/deploy/overview/"                                                                                
##  [82] "https://solutions.rstudio.com/deploy/promote/"                                                                                 
##  [83] "https://solutions.rstudio.com/examples/jobs-overview/"                                                                         
##  [84] "https://solutions.rstudio.com/examples/rest-apis-overview/#log-details-about-api-requests-and-responses"                       
##  [85] "https://solutions.rstudio.com/examples/rsc-apis/acl-audit-report"                                                              
##  [86] "https://solutions.rstudio.com/examples/rsc-apis/basic-audit-report"                                                            
##  [87] "https://solutions.rstudio.com/examples/rsc-apis/tag-audit-report"                                                              
##  [88] "https://solutions.rstudio.com/examples/rsc-apis/vanity-audit-report"                                                           
##  [89] "https://solutions.rstudio.com/examples/rsc-server-api-overview/"                                                               
##  [90] "https://solutions.rstudio.com/launcher/kubernetes/"                                                                            
##  [91] "https://solutions.rstudio.com/launcher/kubernetes/#want-to-learn-more-about-rstudio-server-pro-and-kubernetes"                 
##  [92] "https://solutions.rstudio.com/production/integrations/"                                                                        
##  [93] "https://spark.rstudio.com/articles/guides-distributed-r.html"                                                                  
##  [94] "https://spark.rstudio.com/deployment_examples.html"                                                                            
##  [95] "https://spark.rstudio.com/h2o.html"                                                                                            
##  [96] "https://spark.rstudio.com/images/sparklyr-cheatsheet.pdf"                                                                      
##  [97] "https://spark.rstudio.com/mllib.html"                                                                                          
##  [98] "https://tensorflow.rstudio.com/gallery/"                                                                                       
##  [99] "https://tensorflow.rstudio.com/learn/examples.html"                                                                            
## [100] "https://www.rstudio.com/about/news-events/"                                                                                    
## [101] "https://www.rstudio.com/conference/rstudioconf-tickets/"                                                                       
## [102] "https://www.rstudio.com/ide/docs/authoring/using_markdown.html"                                                                
## [103] "https://www.rstudio.com/ide/docs/release_notes_v0.97.html"                                                                     
## [104] "https://www.rstudio.com/ide/docs/release_notes_v0.98.html"                                                                     
## [105] "https://www.rstudio.com/ide/download/server-pro-evaluation.html"                                                               
## [106] "https://www.rstudio.com/products/rstudio-server-pro2/"                                                                         
## [107] "https://www.rstudio.com/resources/videos/debugging-techniques/"                                                                
## [108] "https://www.rstudio.com/resources/videos/plumbing-apis-with-plumber/"                                                          
## [109] "https://www.rstudio.com/resources/videos/scaling-shiny-apps-with-async-programming-june-2018/"                                 
## [110] "https://www.rstudio.com/resources/videos/scaling-shiny/"                                                                       
## [111] "https://www.rstudio.com/resources/webinars/introducing-an-r-interface-for-apache-spark/"                                       
## [112] "https://www.rstudio.com/resources/webinars/introducing-notebooks-with-r-markdown/"                                             
## [113] "https://www.rstudio.com/resources/webinars/shiny-developer-conference/"                                                        
## [114] "https://www.rstudio.com/rstudio/download/preview/"                                                                             
## [115] "https://www.rstudio.com/workshops/applied-machine-learning/"                                                                   
## [116] "https://www.rstudio.com/workshops/extending-the-tidyverse/"                                                                    
## [117] "https://www.rstudio.com/workshops/what-they-forgot-to-teach-you-about-r/"