The volume you’re holding (or the e-book you’re viewing) consists of five books that cover a lot of the length and breadth of R.
As I said earlier in this introduction, R is a language that deals with statistics. Accordingly, Book 1 introduces you to the fundamental concepts of statistics that you just have to know in order to progress with R.
You then learn about R and RStudio, a widely used development environment for working with R. I begin by describing the rudiments of R code, and I discuss R functions and structures.
R truly comes alive when you use its specialized packages, which you learn about early on.
Part of working with statistics is to summarize data in meaningful ways. In Book 2, you find out how to do just that.
Most people know about averages and how to compute them. But that’s not the whole story. In Book 2, I tell you about additional descriptive statistics that fill in the gaps, and I show you how to use R to calculate and work with those statistics. You also learn to create graphics that visualize the data descriptions and analyses you encounter in Books 2 and 3.
Book 3 addresses the fundamental aim of statistical analysis: to go beyond the data and help you make decisions. Usually, the data are measurements of a sample taken from a large population. The goal is to use these data to figure out what’s going on in the population.
This opens a wide range of questions: What does an average mean? What does the difference between two averages mean? Are two things associated? These are only a few of the questions I address in Book 3, and you learn to use the R tools that help you answer them.
Effective machine learning model creation comes with experience. Accordingly, in Book 4 you gain experience by completing machine learning projects. In addition to the projects you complete along with me, I suggest additional projects for you to try on your own.
I begin by telling you about the University of California-Irvine Machine Learning Repository, which provides the data sets for most of the projects you encounter in Book 4.
To give you a gentle on-ramp into the field, I show you the Rattle package for creating machine learning applications. It’s a friendly interface to R’s machine learning functionality. I like Rattle a lot, and I think you will, too. You use it to learn about and work with decision trees, random forests, support vector machines, k-means clustering, and neural networks.
You also work with fairly large data sets — not the terabytes and petabytes data scientists work with, but large enough to get you started. In one project, you analyze a data set of more than 500,000 airline flights. In another, you complete a customer segmentation analysis of over 300,000 customers of an online retailer.
As its title suggests, Book 5 is also organized around projects.
In these projects, you create applications that respond to users. I show you the shiny package for working with web browsers and the shinydashboard package for creating dashboards.
All this is a little far afield from R’s original mission in life, but you get an idea of R’s potential to expand in new directions.
After you’ve worked with R for a while, maybe you can discover some of those new directions!