In-class Exercise 1: Preparing the Data Flow

Author

Muhamad Ameer Noor

Published

November 18, 2023

Modified

December 14, 2023

Preparing Data Flow Illustration

1 Getting Started

The code chunk below loads the following packages:

- tmap: for thematic mapping
- sf: for geospatial data handling
- tidyverse: for non-spatial data handling

pacman::p_load(tmap, sf, tidyverse)
# p_load() attaches the listed packages and installs any that are missing;
# calling it as pacman::p_load() uses pacman without attaching pacman itself
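As a rough illustration of what the call above does, the sketch below writes a similar install-then-attach loop in base R. The helper name load_or_install is hypothetical and is not part of pacman; it only approximates the behaviour and is demonstrated here on two packages that ship with R, so nothing gets installed.

```r
# Hypothetical helper (not part of pacman): install a package if it is
# missing, then attach it. A rough approximation of pacman::p_load().
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Demonstrated on packages bundled with R; for this exercise the call
# would be load_or_install(c("tmap", "sf", "tidyverse")).
loaded <- load_or_install(c("stats", "utils"))
```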

2 Preparing the Data Flow

2.1 Importing the Aspatial data

Import the Passenger Volume by Origin Destination Bus Stops data set, downloaded from LTA DataMall, by using read_csv() of the readr package.

odbus <- read_csv("../data/aspatial/origin_destination_bus_202308.csv.gz")
head(odbus)
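Bus stop codes can start with a zero, so they need to stay as text rather than be parsed as numbers. The sketch below shows one way to make that explicit with the col_types argument of read_csv(). The rows are made up and only a subset of columns is shown; I() tells readr (version 2.0 or later) to treat the string as literal data instead of a file path.

```r
library(readr)

# Made-up rows standing in for the LTA file.
toy_csv <- I(paste(
  "DAY_TYPE,TIME_PER_HOUR,ORIGIN_PT_CODE,DESTINATION_PT_CODE,TOTAL_TRIPS",
  "WEEKDAY,7,01012,01013,10",
  "WEEKDAY,8,01013,01012,4",
  sep = "\n"
))

# Force both code columns to character; the other columns are guessed.
toy_od <- read_csv(
  toy_csv,
  col_types = cols(
    ORIGIN_PT_CODE = col_character(),
    DESTINATION_PT_CODE = col_character()
  )
)
```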

Change Character Data Type to Factor

odbus is a tibble data frame. However, ORIGIN_PT_CODE and DESTINATION_PT_CODE are stored as character columns. They are converted to factors (the categorical data type) for further analysis.

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE)
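To see what the conversion changes, here is a minimal sketch on a made-up data frame (the codes are illustrative, not taken from the real file). After as.factor(), each column carries a fixed set of levels, one per distinct code.

```r
# Made-up bus stop codes, stored as character strings.
toy_od <- data.frame(
  ORIGIN_PT_CODE = c("01012", "01013", "01012"),
  DESTINATION_PT_CODE = c("01013", "01012", "01019"),
  stringsAsFactors = FALSE
)

toy_od$ORIGIN_PT_CODE <- as.factor(toy_od$ORIGIN_PT_CODE)
toy_od$DESTINATION_PT_CODE <- as.factor(toy_od$DESTINATION_PT_CODE)

# Distinct origin codes become the factor levels.
origin_levels <- levels(toy_od$ORIGIN_PT_CODE)

# Number of distinct destination codes.
n_dest <- nlevels(toy_od$DESTINATION_PT_CODE)
```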

Extracting the data for analysis

Extract the commuting flows by origin: keep the weekday records whose TIME_PER_HOUR is from 7 to 9 (inclusive), group them by origin bus stop code, and sum the trips into a new data frame:

origtrip_7_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 7 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
glimpse(origtrip_7_9)
# the %>% pipe passes the result of each step as the first argument of the next
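The same filter, group and sum steps can be checked by hand on a small table. The sketch below uses made-up rows (including the made-up weekend label) and only the four columns the pipeline touches. Note that the condition TIME_PER_HOUR <= 9 keeps the row with hour 9 and drops the row with hour 10.

```r
library(dplyr)

# Made-up records; only the columns used by the pipeline are included.
toy <- tibble(
  DAY_TYPE = c("WEEKDAY", "WEEKDAY", "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKDAY"),
  TIME_PER_HOUR = c(7, 9, 10, 8, 8),
  ORIGIN_PT_CODE = factor(c("01012", "01012", "01012", "01012", "01013")),
  TOTAL_TRIPS = c(10, 5, 100, 50, 3)
)

toy_trips <- toy %>%
  filter(DAY_TYPE == "WEEKDAY") %>%                  # drops the weekend row
  filter(TIME_PER_HOUR >= 7 & TIME_PER_HOUR <= 9) %>% # drops the hour-10 row
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))                # one row per origin code

toy_trips
```

For origin 01012 only the hour-7 and hour-9 weekday rows survive the filters, so its total is 10 + 5 = 15; origin 01013 keeps its single row of 3 trips.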

2.2 Importing the geospatial data

Two geospatial data sets are used in this exercise: the BusStop layer and the MPSZ-2019 layer. Both store their coordinates in a geometry column:

busstop <- st_read(dsn = "../data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414) # reproject to EPSG:3414 (SVY21 / Singapore TM)
glimpse(busstop)
mpsz <- st_read(dsn = "../data/geospatial",
                layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
glimpse(mpsz)
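To illustrate what st_transform(crs = 3414) does without needing the shapefiles, the sketch below reprojects a single made-up point. It assumes the point is given in WGS84 longitude and latitude (EPSG:4326); the coordinates are illustrative and not taken from either layer.

```r
library(sf)

# A made-up point in Singapore, given as longitude/latitude (EPSG:4326).
pt_wgs84 <- st_sfc(st_point(c(103.8198, 1.3521)), crs = 4326)

# Reproject to EPSG:3414 (SVY21 / Singapore TM), a projected CRS in metres.
pt_svy21 <- st_transform(pt_wgs84, crs = 3414)

st_crs(pt_svy21)$epsg      # EPSG code of the reprojected geometry
st_coordinates(pt_svy21)   # easting and northing
```

Calling st_crs() on busstop and mpsz in the same way is a quick check that both layers ended up in the same projected coordinate system before any spatial join.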