## 4.8 Explicit Coercion

Objects can be explicitly coerced from one class to another using the `as.*` functions, if available.

```
> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
```
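Coercion also works in the other directions. This is a small sketch with made-up input values. Note that `as.integer()` truncates toward zero rather than rounding, and `as.logical()` only recognizes a fixed set of spellings.

```r
# Strings that look like numbers convert cleanly
nums <- as.numeric(c("3.14", "42"))
# as.integer() drops the fractional part (truncation toward zero)
ints <- as.integer(c(3.9, -3.9))
# TRUE/FALSE become 1/0 when coerced to numeric
bits <- as.numeric(c(TRUE, FALSE, TRUE))
# Unrecognized spellings such as "yes" become NA
flags <- as.logical(c("TRUE", "false", "T", "yes"))
print(nums)   # 3.14 and 42
print(ints)   # 3 and -3
print(bits)   # 1 0 1
print(flags)  # TRUE FALSE TRUE NA
```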

Sometimes, R can’t figure out how to coerce an object, and this can result in `NA`s being produced.

```
> x <- c("a", "b", "c")
> as.numeric(x)
Warning: NAs introduced by coercion
[1] NA NA NA
> as.logical(x)
[1] NA NA NA
> as.complex(x)
Warning: NAs introduced by coercion
[1] NA NA NA
```

When nonsensical coercion takes place, you will usually get a warning from R.
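If you want to know *which* elements failed to coerce, you can compare the `NA`s before and after conversion. This is a minimal sketch; the variable names `nums` and `failed` are just for illustration. It assumes you want to handle failures yourself, which is why the warning is silenced with `suppressWarnings()`.

```r
x <- c("1", "2.5", "abc", "4")
# Silence the "NAs introduced by coercion" warning; we check for failures below
nums <- suppressWarnings(as.numeric(x))
# An element failed if it was not NA before coercion but is NA afterwards
failed <- x[is.na(nums) & !is.na(x)]
print(failed)  # "abc"
```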

## 4.9 Matrices

Matrices are vectors with a *dimension* attribute. The dimension
attribute is itself an integer vector of length 2 (number of rows,
number of columns).

```
> m <- matrix(nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3
```
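Because the dimension is just an attribute, the usual helpers read it directly. This sketch shows that `nrow()` and `ncol()` agree with `dim()`. It also shows that the matrix still holds rows × columns elements underneath, and that an empty `matrix()` is filled with logical `NA`.

```r
m <- matrix(nrow = 2, ncol = 3)
d <- dim(m)                 # integer vector: rows, then columns
print(typeof(d))            # "integer"
print(c(nrow(m), ncol(m)))  # 2 3, read from the same dim attribute
print(length(m))            # 6: still a vector of 2 * 3 elements
print(typeof(m))            # "logical": the default fill value is NA
```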

Matrices are constructed *column-wise*, so entries can be thought of as
starting in the “upper left” corner and running down the columns.

```
> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
```
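If you would rather fill a matrix across the rows, `matrix()` has a `byrow` argument. This sketch contrasts it with the default column-wise fill.

```r
# Default: 1:6 fills down each column in turn
by_col <- matrix(1:6, nrow = 2, ncol = 3)
# byrow = TRUE fills across each row instead
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```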

Matrices can also be created directly from vectors by adding a dimension attribute.

```
> m <- 1:10
> m
 [1]  1  2  3  4  5  6  7  8  9 10
> dim(m) <- c(2, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
```
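The column-wise layout shows up when you index the matrix with a single number. Removing the `dim` attribute turns the matrix back into an ordinary vector. A short sketch:

```r
m <- 1:10
dim(m) <- c(2, 5)
# Linear indexing follows column-major order: element 3 is row 1, column 2
linear <- m[3]
by_pos <- m[1, 2]
print(c(linear, by_pos))  # 3 3
# Setting dim to NULL strips the attribute, leaving a plain vector
dim(m) <- NULL
print(is.matrix(m))       # FALSE
```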

Matrices can be created by *column-binding* or *row-binding* with the
`cbind()` and `rbind()` functions.

```
> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
```
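Note how the variable names become column names with `cbind()` and row names with `rbind()`. The two results hold the same values, just transposed. A quick check:

```r
x <- 1:3
y <- 10:12
cb <- cbind(x, y)   # 3 x 2, columns named "x" and "y"
rb <- rbind(x, y)   # 2 x 3, rows named "x" and "y"
print(dim(cb))      # 3 2
print(dim(rb))      # 2 3
# Transposing the column-bound matrix gives the row-bound values
print(all(t(cb) == rb))  # TRUE
```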

## 4.10 Lists

Lists are a special type of vector that can contain elements of different classes. Lists are a very important data type in R and you should get to know them well. Together with the various “apply” functions discussed later, lists make for a powerful combination.

Lists can be explicitly created using the `list()` function, which
takes an arbitrary number of arguments.

```
> x <- list(1, "a", TRUE, 1 + 4i)
> x
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i
```
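To confirm that each element keeps its own class, you can apply `class()` to every element. This sketch uses `sapply()`, one of the apply functions mentioned above. It also shows the difference between single brackets, which return a sub-list, and double brackets, which return the element itself.

```r
x <- list(1, "a", TRUE, 1 + 4i)
# class() applied to each element, simplified to a character vector
print(sapply(x, class))  # "numeric" "character" "logical" "complex"
print(class(x[1]))       # "list": single brackets keep the list wrapper
print(class(x[[1]]))     # "numeric": double brackets extract the element
```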

We can also create an empty list of a prespecified length with the
`vector()` function.

```
> x <- vector("list", length = 5)
> x
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL
```
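A preallocated list is handy when you fill it in a loop. This is a minimal sketch; the squares are just placeholder values.

```r
x <- vector("list", length = 5)
# Fill each preallocated slot; [[<- assigns a single element
for (i in seq_along(x)) {
  x[[i]] <- i^2
}
# unlist() flattens the list into an atomic vector
squares <- unlist(x)
print(squares)  # 1 4 9 16 25
```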

## 4.11 Factors

Factors are used to represent categorical data and can be unordered or
ordered. One can think of a factor as an integer vector where each
integer has a *label*. Factors are important in statistical modeling
and are treated specially by modelling functions like `lm()` and
`glm()`.
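For an ordered factor, pass `ordered = TRUE` along with the levels in their intended order. This sketch uses made-up size labels. With an ordered factor, comparisons and `max()` follow the level order rather than alphabetical order.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
print(sizes)             # Levels: small < medium < large
# Comparisons use the level order
print(sizes < "large")   # TRUE FALSE TRUE TRUE
print(max(sizes))        # large
```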

Using factors with labels is *better* than using integers because
factors are self-describing. Having a variable that has values “Male”
and “Female” is better than a variable that has values 1 and 2.

Factor objects can be created with the `factor()` function.

```
> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x
[1] yes yes no  yes no
Levels: no yes
> table(x)
x
 no yes
  2   3
> ## See the underlying representation of factor
> unclass(x)
[1] 2 2 1 2 1
attr(,"levels")
[1] "no"  "yes"
```
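Because a factor is stored as integer codes, converting it with `as.numeric()` returns the codes, not the labels. This bites when the labels look like numbers. The sketch below uses a made-up factor `f`. Going through `as.character()` first recovers the values.

```r
x <- factor(c("yes", "yes", "no", "yes", "no"))
print(levels(x))        # "no" "yes"
print(as.integer(x))    # 2 2 1 2 1: the underlying codes
# Pitfall: labels that look like numbers
f <- factor(c("10", "20", "10"))
print(as.numeric(f))                # 1 2 1: the codes
print(as.numeric(as.character(f)))  # 10 20 10: the labels as numbers
```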

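Factors are sometimes created automatically when you read a dataset in
using a function like `read.table()`. In versions of R before 4.0.0,
those functions defaulted to creating factors when they encountered
data that look like characters or strings. Since R 4.0.0 the default
is `stringsAsFactors = FALSE`, so such columns are read as character
vectors unless you ask for factors.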
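The `stringsAsFactors` argument controls this behaviour explicitly. This sketch passes a small made-up CSV string through the `text` argument of `read.csv()` so it doesn't need a file. The comments assume R 4.0.0 or later, where character columns stay character by default.

```r
csv <- "answer,count\nyes,3\nno,2\nyes,5"
# Default in R >= 4.0.0: strings are kept as character
df_chr <- read.csv(text = csv)
print(class(df_chr$answer))   # "character"
# Ask for factors explicitly
df_fac <- read.csv(text = csv, stringsAsFactors = TRUE)
print(class(df_fac$answer))   # "factor"
print(levels(df_fac$answer))  # "no" "yes"
```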

The order of the levels of a factor can be set using the `levels`
argument to `factor()`. This can be important in linear modelling
because the first level is used as the baseline level.

```
> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x  ## Levels are put in alphabetical order
[1] yes yes no  yes no
Levels: no yes
> x <- factor(c("yes", "yes", "no", "yes", "no"),
+             levels = c("yes", "no"))
> x
[1] yes yes no  yes no
Levels: yes no
```
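The level order also determines the order in which `table()` reports counts. If you just want to change the baseline of an existing factor, base R's `relevel()` moves one chosen level to the front. A short sketch:

```r
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
# Counts are reported in level order: yes first, then no
print(table(x))
# Make "no" the first (baseline) level without changing the data
y <- relevel(x, ref = "no")
print(levels(y))  # "no" "yes"
```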

## 4.12 Missing Values

Missing values are denoted by `NA`, or by `NaN` for undefined
mathematical operations.

- `is.na()` is used to test whether objects are `NA`
- `is.nan()` is used to test for `NaN`
- `NA` values have a class also, so there are integer `NA`, character `NA`, etc.
- A `NaN` value is also `NA`, but the converse is not true
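R provides typed constants for missing values, and the `NA`/`NaN` relationship is easy to check directly. This sketch uses `typeof()`, which reports the underlying storage type. For a double `NA`, that type is `"double"`, even though `class()` reports `"numeric"`.

```r
print(typeof(NA))             # "logical"
print(typeof(NA_integer_))    # "integer"
print(typeof(NA_real_))       # "double"
print(typeof(NA_character_))  # "character"
# NaN counts as missing, but NA is not "not a number"
print(is.na(NaN))   # TRUE
print(is.nan(NA))   # FALSE
# An undefined operation produces NaN
print(0 / 0)        # NaN
```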

```
> ## Create a vector with NAs in it
> x <- c(1, 2, NA, 10, 3)
> ## Return a logical vector indicating which elements are NA
> is.na(x)
[1] FALSE FALSE  TRUE FALSE FALSE
> ## Return a logical vector indicating which elements are NaN
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE
```

```
> ## Now create a vector with both NA and NaN values
> x <- c(1, 2, NaN, NA, 4)
> is.na(x)
[1] FALSE FALSE  TRUE  TRUE FALSE
> is.nan(x)
[1] FALSE FALSE  TRUE FALSE FALSE
```
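The logical vector from `is.na()` is typically used to drop missing values or to skip them in a summary. A minimal sketch: missing values propagate through `mean()` unless you pass `na.rm = TRUE`.

```r
x <- c(1, 2, NA, 10, 3)
print(mean(x))                # NA: the missing value propagates
print(mean(x, na.rm = TRUE))  # 4
print(x[!is.na(x)])           # 1 2 10 3
```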

## 4.13 Data Frames

Data frames are used to store tabular data in R. They are an important type of object in R and are used in a variety of statistical modeling applications. Hadley Wickham’s package dplyr has an optimized set of functions designed to work efficiently with data frames.

Data frames are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be thought of as a column and the length of each element of the list is the number of rows.

Unlike matrices, data frames can store different classes of objects in each column. In a matrix, every element must be of the same class (e.g. all integers or all numeric).
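Both points can be checked from the console. This sketch shows that a data frame really is a list with one element per column, each with its own class. It then shows that forcing mixed types into a matrix with `cbind()` coerces everything to a single type (character here).

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
print(is.list(df))        # TRUE: a data frame is a list of columns
print(length(df))         # 2: one list element per column
print(sapply(df, class))  # foo "integer", bar "logical"
# A matrix holds a single type, so numbers become strings here
m <- cbind(1:2, c("a", "b"))
print(typeof(m))          # "character"
```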

In addition to column names, indicating the names of the variables or
predictors, data frames have a special attribute called `row.names`,
which indicates information about each row of the data frame.

Data frames are usually created by reading in a dataset using the
`read.table()` or `read.csv()` functions. However, data frames can also be
created explicitly with the `data.frame()` function, or they can be
coerced from other types of objects like lists.
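Here is a small sketch of coercing a list to a data frame with `as.data.frame()`; the column names `id` and `grp` are made up. Because every column must have the same length, `data.frame()` refuses columns whose lengths can't be reconciled. The `tryCatch()` call just captures that error as `FALSE`.

```r
lst <- list(id = 1:3, grp = c("a", "b", "a"))
df <- as.data.frame(lst)
print(dim(df))    # 3 2
print(names(df))  # "id" "grp"
# Columns of length 3 and 2 cannot form a data frame
ok <- tryCatch({
  data.frame(a = 1:3, b = 1:2)
  TRUE
}, error = function(e) FALSE)
print(ok)         # FALSE
```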

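Data frames can be converted to a matrix by calling
`data.matrix()`. While it might seem that the `as.matrix()` function
should be used to coerce a data frame to a matrix, almost always what
you want is the result of `data.matrix()`, which produces a numeric
matrix. `as.matrix()` returns a character matrix as soon as any column
is, for example, character or factor.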
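The difference shows up as soon as a data frame has a non-numeric column. In this sketch (made-up columns), `as.matrix()` falls back to a character matrix. `data.matrix()` instead turns the character column into factor codes.

```r
df <- data.frame(id = 1:3, grp = c("a", "b", "a"))
# as.matrix() needs one common type; with a character column that is character
print(typeof(as.matrix(df)))  # "character"
# data.matrix() converts character columns to factors, then to integer codes
dm <- data.matrix(df)
print(dm[, "grp"])            # 1 2 1
```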

```
> x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
```

## 4.14 Names

R objects can have names, which is very useful for writing readable code and self-describing objects. Here is an example of assigning names to an integer vector.

```
> x <- 1:3
> names(x)
NULL
> names(x) <- c("New York", "Seattle", "Los Angeles")
> x
   New York     Seattle Los Angeles
          1           2           3
> names(x)
[1] "New York"    "Seattle"     "Los Angeles"
```
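Once a vector has names, you can index it by name instead of by position. This sketch also uses `unname()` to strip the names again.

```r
x <- 1:3
names(x) <- c("New York", "Seattle", "Los Angeles")
print(x["Seattle"])                     # named element with value 2
print(x[c("Los Angeles", "New York")])  # results follow the requested order
print(unname(x))                        # 1 2 3, names removed
```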

Lists can also have names, which is often very useful.

```
> x <- list("Los Angeles" = 1, Boston = 2, London = 3)
> x
$`Los Angeles`
[1] 1

$Boston
[1] 2

$London
[1] 3

> names(x)
[1] "Los Angeles" "Boston"      "London"
```
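Named list elements can be pulled out with `$` or `[[ ]]`. A name containing a space needs backticks after `$`. The `[[ ]]` form also accepts a name stored in a variable (`key` below is just for illustration).

```r
x <- list("Los Angeles" = 1, Boston = 2, London = 3)
print(x$Boston)         # 2
print(x$`Los Angeles`)  # 1: backticks because the name has a space
key <- "London"
print(x[[key]])         # 3: [[ ]] works with a computed name
```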

Matrices can have both column and row names.

```
> m <- matrix(1:4, nrow = 2, ncol = 2)
> dimnames(m) <- list(c("a", "b"), c("c", "d"))
> m
  c d
a 1 3
b 2 4
```
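With `dimnames` set, rows and columns can be selected by name. When you extract a single column, the result keeps the row names as element names.

```r
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
print(m["a", "d"])  # 3: row "a", column "d"
print(m[, "c"])     # a = 1, b = 2
```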

Column names and row names can be set separately using the
`colnames()` and `rownames()` functions.

```
> colnames(m) <- c("h", "f")
> rownames(m) <- c("x", "z")
> m
  h f
x 1 3
z 2 4
```
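The same functions read the names back. Both are views onto the single `dimnames` attribute, as this sketch checks.

```r
m <- matrix(1:4, nrow = 2, ncol = 2)
colnames(m) <- c("h", "f")
rownames(m) <- c("x", "z")
print(colnames(m))  # "h" "f"
print(rownames(m))  # "x" "z"
# Both setters wrote into the same dimnames list
print(identical(dimnames(m), list(c("x", "z"), c("h", "f"))))  # TRUE
```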

Note that for data frames, there is a separate function for setting
the row names, the `row.names()` function. Also, data frames do not
have column names, they just have names (like lists). So to set the
column names of a data frame, just use the `names()` function. Yes, I
know it's confusing. Here’s a quick summary:

| Object | Set column names | Set row names |
|---|---|---|
| data frame | `names()` | `row.names()` |
| matrix | `colnames()` | `rownames()` |
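Here is the data frame row of that table in action. It is a sketch with made-up column values: `names()` renames the columns, `row.names()` labels the rows, and both can then be used for indexing.

```r
df <- data.frame(foo = 1:3, bar = c("p", "q", "r"))
names(df) <- c("id", "label")              # set the column names
row.names(df) <- c("first", "second", "third")
print(df["second", "label"])               # "q"
```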
