Parse an HTML page and return all <a href="...">...</a> links as a data frame.

html_getLinks(url = NULL, relative = TRUE)

html_getLinkNames(url = NULL)

html_getLinkUrls(url = NULL, relative = TRUE)

Arguments

url

URL or local file path of an HTML page.

relative

Logical specifying whether to return link URLs exactly as found in each href attribute (possibly relative). If FALSE, relative URLs are converted to absolute URLs using url as the base.

Value

A tibble with linkName and linkUrl columns.

html_getLinkNames() returns a character vector of link names.

html_getLinkUrls() returns a character vector of link URLs.

Details

The returned data frame contains the human-readable link text in linkName and the href value in linkUrl. This is useful for extracting links from index pages, including web-accessible directories that list downloadable files.

Wrapper functions html_getLinkNames() and html_getLinkUrls() return the corresponding columns as character vectors.
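The relationship between the main function and its wrappers can be sketched as follows (assuming the package providing these functions is attached and url points at a reachable HTML page; this is illustrative, not run here):

```r
# `links` is a tibble with linkName and linkUrl columns.
links <- html_getLinks(url, relative = FALSE)

# The wrappers return the same information as individual columns:
linkNames <- html_getLinkNames(url)                    # same as links$linkName
linkUrls  <- html_getLinkUrls(url, relative = FALSE)   # same as links$linkUrl
```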

Examples

if (FALSE) { # \dontrun{

# If you want to download lots of US Census shapefiles
url <- "https://www2.census.gov/geo/tiger/GENZ2019/shp/"

browseURL(url)

dataLinks <- html_getLinks(url)

dataLinks <-
  dataLinks %>%
  dplyr::filter(stringr::str_detect(linkName, "us_county"))

head(dataLinks, 10)

html_getLinkNames(url)
html_getLinkUrls(url, relative = FALSE)
} # }
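Continuing the example above, the absolute URLs returned with relative = FALSE can be passed directly to download.file(). This is a hedged sketch, not part of the documented API; the destination directory and filtering pattern are assumptions:

```r
if (FALSE) { # \dontrun{
# Sketch: download every matched county shapefile archive.
# Assumes `url` from the example above and a writable tempdir().
absoluteUrls <- html_getLinkUrls(url, relative = FALSE)
countyUrls <- absoluteUrls[stringr::str_detect(absoluteUrls, "us_county")]

for (u in countyUrls) {
  destfile <- file.path(tempdir(), basename(u))
  download.file(u, destfile, mode = "wb")  # binary mode for .zip files
}
} # }
```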