SQL Tutorial and Exploration

Overview

This vignette aims to be a brief introduction and overview of using SQL. The tutorial will illustrate common functions by answering questions around Pixar films and their performance.

Setup

Load packages.

library(pixarfilms)
library(dplyr)
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

Load data into temporary database to interact with.

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, pixar_films)

Top movies

Develop query.

query <- "
select * from pixar_films
"

Let’s execute it now.

DBI::dbGetQuery(conn = con, statement = query)
#>    number                film release_date run_time film_rating
#> 1       1           Toy Story         9456       81           G
#> 2       2        A Bug's Life        10555       95           G
#> 3       3         Toy Story 2        10919       92           G
#> 4       4      Monsters, Inc.        11628       92           G
#> 5       5        Finding Nemo        12202      100           G
#> 6       6     The Incredibles        12727      115          PG
#> 7       7                Cars        13308      117           G
#> 8       8         Ratatouille        13693      111           G
#> 9       9              WALL-E        14057       98           G
#> 10     10                  Up        14393       96          PG
#> 11     11         Toy Story 3        14778      103           G
#> 12     12              Cars 2        15149      106           G
#> 13     13               Brave        15513       93          PG
#> 14     14 Monsters University        15877      104           G
#> 15     15          Inside Out        16605       95          PG
#> 16     16   The Good Dinosaur        16764       93          PG
#> 17     17        Finding Dory        16969       97          PG
#> 18     18              Cars 3        17333      102           G
#> 19     19                Coco        17492      105          PG
#> 20     20       Incredibles 2        17697      118          PG
#> 21     21         Toy Story 4        18068      100           G
#> 22     22              Onward        18327      102          PG
#> 23     23                Soul        18621      100          PG
#> 24     24                Luca        18796      151         N/A
#> 25     25         Turning Red        19062       NA         N/A
#> 26     26           Lightyear        19160       NA         N/A
#> 27     27                <NA>        19524      155   Not Rated

Summary

Here, let’s review the functions of SQL covered above.

  • SELECT = text that follows this chooses the columns you want in your query, and doing SELECT * will select all columns

Session information

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] dbplyr_2.5.0     irr_0.84.1       lpSolve_5.6.20   forcats_1.0.0   
#>  [5] tidyr_1.3.1      patchwork_1.2.0  lubridate_1.9.3  ggplot2_3.5.1   
#>  [9] dplyr_1.1.4      pixarfilms_0.2.1 readr_2.1.5      rmarkdown_2.28  
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.5       beeswarm_0.4.0     xfun_0.47          bslib_0.8.0       
#>  [5] ggrepel_0.9.5      tzdb_0.4.0         vctrs_0.6.5        tools_4.4.1       
#>  [9] generics_0.1.3     curl_5.2.1         parallel_4.4.1     tibble_3.2.1      
#> [13] fansi_1.0.6        RSQLite_2.3.7      highr_0.11         blob_1.2.4        
#> [17] pkgconfig_2.0.3    RColorBrewer_1.1-3 lifecycle_1.0.4    compiler_4.4.1    
#> [21] farver_2.1.2       munsell_0.5.1      vipor_0.4.7        htmltools_0.5.8.1 
#> [25] sys_3.4.2          buildtools_1.0.0   sass_0.4.9         yaml_2.3.10       
#> [29] pillar_1.9.0       crayon_1.5.3       jquerylib_0.1.4    cachem_1.1.0      
#> [33] tidyselect_1.2.1   digest_0.6.37      purrr_1.0.2        maketools_1.3.0   
#> [37] labeling_0.4.3     fastmap_1.2.0      grid_4.4.1         colorspace_2.1-1  
#> [41] cli_3.6.3          magrittr_2.0.3     utf8_1.2.4         withr_3.0.1       
#> [45] scales_1.3.0       bit64_4.0.5        ggbeeswarm_0.7.2   timechange_0.3.0  
#> [49] bit_4.0.5          hms_1.1.3          memoise_2.0.1      evaluate_0.24.0   
#> [53] knitr_1.48         rlang_1.1.4        Rcpp_1.0.13        glue_1.7.0        
#> [57] DBI_1.2.3          vroom_1.6.5        jsonlite_1.8.8     R6_2.5.1