  hour_of_day cash_type money         coffee_name Time_of_Day Weekday
1          10      card  38.7               Latte     Morning     Fri
2          12      card  38.7       Hot Chocolate   Afternoon     Fri
3          12      card  38.7       Hot Chocolate   Afternoon     Fri
4          13      card  28.9           Americano   Afternoon     Fri
5          13      card  38.7               Latte   Afternoon     Fri
6          15      card  33.8 Americano with Milk   Afternoon     Fri
  Month_name Weekdaysort Monthsort       Date            Time
1        Mar           5         3 2024-03-01 10:15:50.520000
2        Mar           5         3 2024-03-01 12:19:22.539000
3        Mar           5         3 2024-03-01 12:20:18.089000
4        Mar           5         3 2024-03-01 13:46:33.006000
5        Mar           5         3 2024-03-01 13:48:14.626000
6        Mar           5         3 2024-03-01 15:39:47.726000
Data Describe
Tools for descriptive data analysis
1 Problem description
This dataset contains coffee shop transaction records, including details on sales, payment type, time of purchase, and customer preferences.
With attributes covering the hour of the day, day of the week, month, coffee type, and revenue, it provides a solid basis for analysing customer behaviour, sales patterns, and business performance trends.
Dataset structure:
- hour_of_day: Hour of purchase (0–23)
- cash_type: Payment method (cash/card)
- money: Transaction amount (in local currency)
- coffee_name: Type of coffee purchased (e.g., Latte, Americano, Hot Chocolate)
- Time_of_Day: Part of the day of the purchase (Morning, Afternoon, Night)
- Weekday: Day of the week (e.g., Mon, Tue)
- Month_name: Month of purchase (e.g., Jan, Feb, Mar)
- Weekdaysort: Numeric code for sorting by day of the week (1 = Mon, 7 = Sun)
- Monthsort: Numeric code for sorting by month (1 = Jan, 12 = Dec)
- Date: Transaction date (YYYY-MM-DD)
- Time: Exact transaction time (HH:MM:SS, with fractional seconds)
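Because hour_of_day is described as the hour of purchase while Date and Time hold the full timestamp, one quick consistency check is to rebuild the timestamp and compare its hour with the stored column. The following is only a minimal sketch on three rows copied from the data preview; the names `muestra`, `marca` and `hora_calculada` are illustrative and not part of the dataset.

```r
# Three rows copied from the data preview (illustrative subset)
muestra <- data.frame(
  hour_of_day = c(10L, 12L, 15L),
  Date = c("2024-03-01", "2024-03-01", "2024-03-01"),
  Time = c("10:15:50.520000", "12:19:22.539000", "15:39:47.726000"),
  stringsAsFactors = FALSE
)

# Rebuild the timestamp; %OS reads seconds including the fractional part
marca <- as.POSIXct(paste(muestra$Date, muestra$Time),
                    format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Extract the hour and compare it with the stored column
muestra$hora_calculada <- as.integer(format(marca, "%H"))
all(muestra$hora_calculada == muestra$hour_of_day)
```

The same comparison could be run on the full data frame to flag rows where the two sources of the hour disagree.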
You can use the following dataset for the descriptive analysis.
Next, we detect the class of each variable.
Code
clases <- sapply(datos, class)
varNum <- names(clases)[which(clases %in% c("numeric", "integer"))]
varCat <- names(clases)[which(clases %in% c("character", "factor"))]
To produce a correct descriptive analysis, we discard the Time and Date variables.
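The `class()`-based detection works when every column has a single class. Columns such as date-times can carry several classes (for example `POSIXct` and `POSIXt`), in which case `sapply(datos, class)` returns a list instead of a character vector. As a hedged alternative, the sketch below drops Date and Time first and then classifies the remaining columns with type predicates. The data frame `datos_demo` and the `_demo` names are made up for illustration.

```r
# Small made-up data frame with the same kinds of columns
datos_demo <- data.frame(
  hour_of_day = c(10L, 12L, 13L),
  money = c(38.7, 38.7, 28.9),
  coffee_name = c("Latte", "Hot Chocolate", "Americano"),
  Date = as.Date(c("2024-03-01", "2024-03-01", "2024-03-01")),
  Time = c("10:15:50", "12:19:22", "13:46:33"),
  stringsAsFactors = FALSE
)

# Drop the variables excluded from the descriptive analysis
datos_demo <- datos_demo[, !(names(datos_demo) %in% c("Date", "Time"))]

# One logical per column, regardless of how many classes a column has
es_num <- vapply(datos_demo, is.numeric, logical(1))
es_cat <- vapply(datos_demo,
                 function(x) is.character(x) || is.factor(x),
                 logical(1))

varNum_demo <- names(datos_demo)[es_num]
varCat_demo <- names(datos_demo)[es_cat]
varNum_demo
varCat_demo
```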
2 Exploratory analysis

2.1 Univariate exploratory analysis
2.1.1 Numerical
2.1.1.1 Description
Code
library(psych)
psych::describe(datos[, varNum])
            vars    n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
hour_of_day    1 3547 14.19 4.23  14.00   14.11 5.93  6.00 22.0 16.00  0.12    -1.13 0.07
money          2 3547 31.65 4.88  32.82   31.98 4.36 18.12 38.7 20.58 -0.54    -0.67 0.08
Weekdaysort    3 3547  3.85 1.97   4.00    3.81 2.97  1.00  7.0  6.00  0.08    -1.23 0.03
Monthsort      4 3547  6.45 3.50   7.00    6.42 4.45  1.00 12.0 11.00  0.00    -1.38 0.06
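Most columns of this table can be reproduced with base R, which helps when reading them. The sketch below uses the six `money` values from the data preview and assumes that `trimmed` is a 10% trimmed mean, that `mad` is the scaled median absolute deviation returned by `mad()`, and that `se` is `sd / sqrt(n)`. The last relation agrees with the table, since 4.88 / sqrt(3547) is approximately 0.08. The function `resumen_basico` is a hypothetical helper, not part of psych.

```r
# Hypothetical helper: a few of the statistics shown above, in base R
resumen_basico <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  c(n = n,
    mean = mean(x),
    sd = sd(x),
    median = median(x),
    trimmed = mean(x, trim = 0.1),  # assumed trimming fraction
    mad = mad(x),                   # scaled median absolute deviation
    min = min(x),
    max = max(x),
    range = max(x) - min(x),
    se = sd(x) / sqrt(n))           # standard error of the mean
}

# The six money values shown in the data preview
money_demo <- c(38.7, 38.7, 38.7, 28.9, 38.7, 33.8)
res <- resumen_basico(money_demo)
round(res, 2)
```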
2.1.1.2 Graphic
Code
par(mfrow = c(2, 4))
for (var in varNum) {
  hist(datos[, var], main = paste0("Histogram of ", var))
  boxplot(datos[, var], main = paste0("Boxplot of ", var))
}
par(mfrow = c(1, 1))
Code
library(ggplot2)
library(patchwork)
plots <- list()
for (var in varNum) {
  # Density-scaled histogram with a kernel density overlay and the mean as a dashed line
  histo <- ggplot(datos, aes(x = .data[[var]])) +
    geom_histogram(aes(y = after_stat(density)), bins = 30,
                   colour = "black", fill = "white") +
    geom_density(alpha = .2, fill = "#FF6666") +
    geom_vline(aes(xintercept = mean(.data[[var]], na.rm = TRUE)),
               color = "blue", linetype = "dashed", linewidth = 1) +
    ggtitle(paste("Histogram of", var))
  # Horizontal boxplot with outliers highlighted in red
  boxp <- ggplot(datos, aes(x = .data[[var]])) +
    geom_boxplot(outlier.colour = "red", outlier.shape = 8,
                 outlier.size = 4) +
    ggtitle(paste("Boxplot of", var))
  plots <- append(plots, list(histo, boxp))
}
# Arrange all plots in a two-column grid
final_plot <- wrap_plots(plots, ncol = 2)
final_plot
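The points drawn separately in a boxplot are the observations that fall beyond the whiskers. With default settings, the whiskers reach at most 1.5 times the width of the box from its edges (base R measures the box with hinges, ggplot2 with quartiles, so the two can differ slightly). A minimal sketch of that rule with base R, on a made-up vector `x_demo`:

```r
# Made-up values with one clearly extreme observation
x_demo <- c(1, 2, 3, 4, 5, 100)

# boxplot.stats() returns the five numbers drawn by boxplot() and the outliers
st <- boxplot.stats(x_demo, coef = 1.5)

st$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
st$out    # observations beyond the whiskers
```

Listing `st$out` for each numeric variable is a simple way to see which values the boxplots flag, without reading them off the figure.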
2.1.2 Categorical
2.1.2.1 Description
Code
for (var in varCat) {
  tablaAbs <- data.frame(table(datos[, var]))
  tablaFreq <- data.frame(table(datos[, var]) / sum(table(datos[, var])))
  m <- match(tablaAbs$Var1, tablaFreq$Var1)
  tablaAbs[, "FreqRel"] <- tablaFreq[m, "Freq"]
  colnames(tablaAbs) <- c("Categoria", "FreqAbs", "FreqRel")
  cat("===============", var, "===================================\n")
  print(tablaAbs)
  cat("==================================================\n")
}
=============== cash_type ===================================
Categoria FreqAbs FreqRel
1 card 3547 1
==================================================
=============== coffee_name ===================================
Categoria FreqAbs FreqRel
1 Americano 564 0.15900761
2 Americano with Milk 809 0.22808007
3 Cappuccino 486 0.13701720
4 Cocoa 239 0.06738089
5 Cortado 287 0.08091345
6 Espresso 129 0.03636876
7 Hot Chocolate 276 0.07781224
8 Latte 757 0.21341979
==================================================
=============== Time_of_Day ===================================
Categoria FreqAbs FreqRel
1 Afternoon 1205 0.3397237
2 Morning 1181 0.3329574
3 Night 1161 0.3273189
==================================================
=============== Weekday ===================================
Categoria FreqAbs FreqRel
1 Fri 532 0.1499859
2 Mon 544 0.1533690
3 Sat 470 0.1325063
4 Sun 419 0.1181280
5 Thu 510 0.1437835
6 Tue 572 0.1612630
7 Wed 500 0.1409642
==================================================
=============== Month_name ===================================
Categoria FreqAbs FreqRel
1 Apr 168 0.04736397
2 Aug 272 0.07668452
3 Dec 259 0.07301945
4 Feb 423 0.11925571
5 Jan 201 0.05666761
6 Jul 237 0.06681703
7 Jun 223 0.06287003
8 Mar 494 0.13927262
9 May 241 0.06794474
10 Nov 259 0.07301945
11 Oct 426 0.12010149
12 Sep 344 0.09698337
==================================================
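The Weekday and Month_name tables above are listed alphabetically (Fri, Mon, Sat, ...), because `table()` sorts character values. The Weekdaysort and Monthsort columns exist precisely to restore the calendar order. One possible way to use them is to turn the labels into factors whose levels follow the sort column. This is a sketch on a made-up data frame `datos_demo`; `orden` is an illustrative name.

```r
# Made-up rows pairing each weekday label with its sort code (1 = Mon, 7 = Sun)
datos_demo <- data.frame(
  Weekday = c("Fri", "Mon", "Sat", "Sun", "Thu", "Tue", "Wed", "Mon"),
  Weekdaysort = c(5L, 1L, 6L, 7L, 4L, 2L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Labels in the order given by the sort column, without repeats
orden <- unique(datos_demo$Weekday[order(datos_demo$Weekdaysort)])

# A factor keeps this order in tables and plots
datos_demo$Weekday <- factor(datos_demo$Weekday, levels = orden)
table(datos_demo$Weekday)
```

The same pattern would apply to Month_name with Monthsort.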
2.1.2.2 Graphic
Code
par(mfrow = c(2, 3))
for (var in varCat) {
  barplot(table(datos[, var]))
}
par(mfrow = c(1, 1))
Code
library(ggplot2)
library(gridExtra)
plots <- list() # empty list
i <- 1 # index
for (var in varCat) {
  # Relative frequency of each category
  tabla <- data.frame(table(datos[, var]) / sum(table(datos[, var])))
  p <- ggplot(data = tabla, aes(x = Var1, y = Freq)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    geom_text(aes(label = paste0(round(Freq * 100, 2), "%")),
              vjust = 1.6, color = "white", size = 3.5) +
    theme_minimal() +
    labs(title = paste("Distribution of", var), x = var, y = "Proportion")
  plots[[i]] <- p
  i <- i + 1
}
# Show all plots in a grid (here with 2 columns)
grid.arrange(grobs = plots, ncol = 2)
2.2 Bivariate analysis
2.2.1 Numerical vs. numerical
2.2.1.1 Description
Code
cor(datos[, varNum])
             hour_of_day       money  Weekdaysort    Monthsort
hour_of_day  1.000000000  0.20274794 -0.002613959  0.008292999
money        0.202747935  1.00000000 -0.017264091 -0.050043191
Weekdaysort -0.002613959 -0.01726409  1.000000000  0.044140930
Monthsort    0.008292999 -0.05004319  0.044140930  1.000000000
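`cor()` computes Pearson's coefficient by default, that is, the covariance divided by the product of the two standard deviations. Since Weekdaysort and Monthsort are ordinal codes rather than measurements, a rank-based coefficient such as Spearman's may be a reasonable complement; this is a suggestion, not something the analysis above does. The sketch uses the hour and amount of the six rows in the data preview; all object names are illustrative.

```r
# hour_of_day and money from the six preview rows
x_demo <- c(10, 12, 12, 13, 13, 15)
y_demo <- c(38.7, 38.7, 38.7, 28.9, 38.7, 33.8)

# Pearson: covariance scaled by both standard deviations
r_manual <- cov(x_demo, y_demo) / (sd(x_demo) * sd(y_demo))
r_pearson <- cor(x_demo, y_demo)

# Spearman: Pearson correlation computed on the ranks
r_spearman <- cor(x_demo, y_demo, method = "spearman")
r_rangos <- cor(rank(x_demo), rank(y_demo))

c(pearson = r_pearson, manual = r_manual,
  spearman = r_spearman, ranks = r_rangos)
```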
2.2.1.2 Graphic
Code
library(PerformanceAnalytics)
chart.Correlation(as.matrix(datos[, varNum]), histogram = TRUE, pch = 12)
Code
library(ggcorrplot)
corr <- round(cor(datos[, varNum]), 1)
ggcorrplot(corr, lab = TRUE)
2.2.2 Numerical vs. categorical
2.2.2.1 Description
Code
for (varN in varNum) {
  for (varC in varCat) {
    print(psych::describeBy(datos[, varN], group = datos[, varC]))
  }
}
Descriptive statistics by group
group: card
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 3547 14.19 4.23 14 14.11 5.93 6 22 16 0.12 -1.13 0.07
Descriptive statistics by group
group: Americano
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 564 13.19 3.7 13 13 4.45 6 22 16 0.42 -0.53 0.16
------------------------------------------------------------
group: Americano with Milk
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 809 13.62 4.31 13 13.39 4.45 6 22 16 0.37 -1.09 0.15
------------------------------------------------------------
group: Cappuccino
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 486 14.92 4.23 15 15.05 5.93 6 22 16 -0.22 -1.05 0.19
------------------------------------------------------------
group: Cocoa
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 239 15.26 4.26 16 15.4 5.93 7 22 15 -0.28 -1.14 0.28
------------------------------------------------------------
group: Cortado
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 287 12.65 3.99 12 12.26 4.45 7 22 15 0.69 -0.6 0.24
------------------------------------------------------------
group: Espresso
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 129 13.6 3.57 14 13.49 4.45 7 22 15 0.24 -0.78 0.31
------------------------------------------------------------
group: Hot Chocolate
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 276 16.32 3.89 17 16.51 4.45 8 22 14 -0.41 -0.89 0.23
------------------------------------------------------------
group: Latte
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 757 14.63 4.31 15 14.67 5.93 7 22 15 -0.08 -1.11 0.16
Descriptive statistics by group
group: Afternoon
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1205 14.07 1.45 14 14.09 1.48 12 16 4 -0.06 -1.36 0.04
------------------------------------------------------------
group: Morning
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1181 9.4 1.27 10 9.47 1.48 6 11 5 -0.34 -0.93 0.04
------------------------------------------------------------
group: Night
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1161 19.18 1.63 19 19.1 1.48 17 22 5 0.2 -1.17 0.05
Descriptive statistics by group
group: Fri
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 532 13.83 4.46 14 13.64 5.93 6 22 16 0.23 -1.09 0.19
------------------------------------------------------------
group: Mon
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 544 14.01 4.29 14 13.94 5.93 6 22 16 0.06 -1.17 0.18
------------------------------------------------------------
group: Sat
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 470 13.93 3.89 13.5 13.71 3.71 7 22 15 0.4 -0.79 0.18
------------------------------------------------------------
group: Sun
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 419 14.33 3.96 14 14.21 4.45 7 22 15 0.23 -1.02 0.19
------------------------------------------------------------
group: Thu
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 510 14.74 4.27 15 14.76 5.93 7 22 15 -0.05 -1.2 0.19
------------------------------------------------------------
group: Tue
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 572 14.27 4.39 14 14.24 5.93 7 22 15 0.04 -1.31 0.18
------------------------------------------------------------
group: Wed
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 500 14.23 4.2 14 14.2 5.93 7 22 15 0.08 -1.16 0.19
Descriptive statistics by group
group: Apr
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 168 14.4 3.13 14 14.38 4.45 10 20 10 0.1 -1.37 0.24
------------------------------------------------------------
group: Aug
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 272 13.51 4.51 12 13.26 4.45 7 22 15 0.4 -1.18 0.27
------------------------------------------------------------
group: Dec
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 259 14.33 4.39 15 14.25 5.93 7 22 15 0.1 -1.23 0.27
------------------------------------------------------------
group: Feb
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 423 14.06 3.66 15 14.17 4.45 6 21 15 -0.23 -0.93 0.18
------------------------------------------------------------
group: Jan
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 201 14.34 4.36 14 14.25 5.93 7 22 15 0.09 -1.1 0.31
------------------------------------------------------------
group: Jul
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 237 14.18 4.8 13 14.02 5.93 7 22 15 0.3 -1.35 0.31
------------------------------------------------------------
group: Jun
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 223 15.23 4.59 16 15.38 5.93 7 22 15 -0.15 -1.47 0.31
------------------------------------------------------------
group: Mar
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 494 13.51 3.37 14 13.53 4.45 6 21 15 -0.03 -1.02 0.15
------------------------------------------------------------
group: May
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 241 15.2 4.1 15 15.3 5.93 7 22 15 -0.17 -1.12 0.26
------------------------------------------------------------
group: Nov
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 259 14.1 4.27 14 14.06 4.45 7 22 15 0.08 -1.03 0.27
------------------------------------------------------------
group: Oct
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 426 14.04 4.59 14 13.91 5.93 7 22 15 0.2 -1.21 0.22
------------------------------------------------------------
group: Sep
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 344 14.4 4.73 13.5 14.34 6.67 7 22 15 0.14 -1.43 0.26
Descriptive statistics by group
group: card
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 3547 31.65 4.88 32.82 31.98 4.36 18.12 38.7 20.58 -0.54 -0.67 0.08
Descriptive statistics by group
group: Americano
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 564 25.98 1.68 25.96 25.99 0 23.02 28.9 5.88 -0.25 -0.22 0.07
------------------------------------------------------------
group: Americano with Milk
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 809 30.59 1.88 30.86 30.57 2.91 27.92 33.8 5.88 -0.17 -1 0.07
------------------------------------------------------------
group: Cappuccino
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 486 35.88 1.82 35.76 35.94 2.91 32.82 38.7 5.88 -0.4 -0.7 0.08
------------------------------------------------------------
group: Cocoa
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 239 35.65 1.23 35.76 35.7 0 32.82 38.7 5.88 -0.53 2 0.08
------------------------------------------------------------
group: Cortado
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 287 25.73 2.09 25.96 25.68 2.91 23.02 28.9 5.88 -0.03 -1.23 0.12
------------------------------------------------------------
group: Espresso
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 129 20.85 1.97 21.06 20.81 2.91 18.12 24 5.88 -0.12 -1.09 0.17
------------------------------------------------------------
group: Hot Chocolate
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 276 35.99 1.44 35.76 36.03 0 32.82 38.7 5.88 -0.13 0.77 0.09
------------------------------------------------------------
group: Latte
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 757 35.5 1.82 35.76 35.48 0 32.82 38.7 5.88 -0.18 -0.82 0.07
Descriptive statistics by group
group: Afternoon
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1205 31.64 4.92 32.82 31.95 4.36 18.12 38.7 20.58 -0.53 -0.75 0.14
------------------------------------------------------------
group: Morning
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1181 30.42 4.94 30.86 30.6 7.26 18.12 38.7 20.58 -0.25 -0.8 0.14
------------------------------------------------------------
group: Night
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1161 32.89 4.43 33.8 33.38 2.91 18.12 38.7 20.58 -0.91 -0.02 0.13
Descriptive statistics by group
group: Fri
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 532 31.58 4.91 32.82 31.87 4.36 18.12 38.7 20.58 -0.51 -0.77 0.21
------------------------------------------------------------
group: Mon
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 544 31.92 4.51 32.82 32.17 4.36 18.12 38.7 20.58 -0.53 -0.72 0.19
------------------------------------------------------------
group: Sat
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 470 31.35 5 32.82 31.66 4.36 18.12 38.7 20.58 -0.52 -0.58 0.23
------------------------------------------------------------
group: Sun
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 419 31.83 4.88 32.82 32.22 4.36 18.12 38.7 20.58 -0.59 -0.64 0.24
------------------------------------------------------------
group: Thu
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 510 31.55 5.19 32.82 31.95 4.36 18.12 38.7 20.58 -0.57 -0.76 0.23
------------------------------------------------------------
group: Tue
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 572 31.76 4.74 32.82 32.06 4.36 18.12 38.7 20.58 -0.53 -0.62 0.2
------------------------------------------------------------
group: Wed
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 500 31.5 4.94 32.82 31.85 4.36 18.12 38.7 20.58 -0.51 -0.8 0.22
Descriptive statistics by group
group: Apr
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 168 34.05 4.49 33.8 34.33 7.26 24 38.7 14.7 -0.38 -1.31 0.35
------------------------------------------------------------
group: Aug
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 272 27.99 4.63 27.92 28.32 7.26 18.12 32.82 14.7 -0.4 -1.09 0.28
------------------------------------------------------------
group: Dec
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 259 31.81 4.61 35.76 32.31 0 21.06 35.76 14.7 -0.72 -0.78 0.29
------------------------------------------------------------
group: Feb
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 423 31.24 4.69 30.86 31.58 7.26 21.06 35.76 14.7 -0.43 -1.23 0.23
------------------------------------------------------------
group: Jan
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 201 31.84 4.33 30.86 32.23 7.26 21.06 35.76 14.7 -0.61 -0.92 0.31
------------------------------------------------------------
group: Jul
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 237 29.18 4.77 27.92 29.46 7.26 18.12 37.72 19.6 -0.5 -0.58 0.31
------------------------------------------------------------
group: Jun
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 223 34.16 4.29 37.72 34.76 0 23.02 37.72 14.7 -0.96 -0.06 0.29
------------------------------------------------------------
group: Mar
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 494 32.17 4.91 33.8 32.3 7.26 21.06 38.7 17.64 -0.3 -1.19 0.22
------------------------------------------------------------
group: May
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 241 33.88 4.44 37.72 34.32 0 23.02 37.72 14.7 -0.67 -0.9 0.29
------------------------------------------------------------
group: Nov
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 259 33.17 3.84 35.76 33.79 0 21.06 35.76 14.7 -1.18 0.12 0.24
------------------------------------------------------------
group: Oct
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 426 32.61 4.29 35.76 33.21 0 21.06 35.76 14.7 -1.01 -0.29 0.21
------------------------------------------------------------
group: Sep
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 344 29.04 4.46 27.92 29.33 7.26 18.12 35.76 17.64 -0.57 -0.62 0.24
Descriptive statistics by group
group: card
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 3547 3.85 1.97 4 3.81 2.97 1 7 6 0.08 -1.23 0.03
Descriptive statistics by group
group: Americano
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 564 3.74 1.89 4 3.69 2.97 1 7 6 0.06 -1.14 0.08
------------------------------------------------------------
group: Americano with Milk
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 809 3.82 2.02 4 3.78 2.97 1 7 6 0.12 -1.3 0.07
------------------------------------------------------------
group: Cappuccino
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 486 3.99 2 4 3.99 2.97 1 7 6 0.01 -1.23 0.09
------------------------------------------------------------
group: Cocoa
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 239 3.7 1.94 4 3.63 2.97 1 7 6 0.17 -1.26 0.13
------------------------------------------------------------
group: Cortado
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 287 4.18 2.01 4 4.22 2.97 1 7 6 -0.14 -1.3 0.12
------------------------------------------------------------
group: Espresso
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 129 4.08 1.77 4 4.08 1.48 1 7 6 0.07 -0.93 0.16
------------------------------------------------------------
group: Hot Chocolate
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 276 3.91 1.95 4 3.89 2.97 1 7 6 0.11 -1.15 0.12
------------------------------------------------------------
group: Latte
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 757 3.71 1.99 4 3.64 2.97 1 7 6 0.15 -1.23 0.07
Descriptive statistics by group
group: Afternoon
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1205 4.04 2.01 4 4.05 2.97 1 7 6 -0.06 -1.27 0.06
------------------------------------------------------------
group: Morning
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1181 3.75 1.97 4 3.69 2.97 1 7 6 0.12 -1.25 0.06
------------------------------------------------------------
group: Night
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1161 3.74 1.92 4 3.67 2.97 1 7 6 0.19 -1.11 0.06
Descriptive statistics by group
group: Fri
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 532 5 0 5 5 0 5 5 0 NaN NaN 0
------------------------------------------------------------
group: Mon
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 544 1 0 1 1 0 1 1 0 NaN NaN 0
------------------------------------------------------------
group: Sat
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 470 6 0 6 6 0 6 6 0 NaN NaN 0
------------------------------------------------------------
group: Sun
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 419 7 0 7 7 0 7 7 0 NaN NaN 0
------------------------------------------------------------
group: Thu
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 510 4 0 4 4 0 4 4 0 NaN NaN 0
------------------------------------------------------------
group: Tue
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 572 2 0 2 2 0 2 2 0 NaN NaN 0
------------------------------------------------------------
group: Wed
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 500 3 0 3 3 0 3 3 0 NaN NaN 0
Descriptive statistics by group
group: Apr
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 168 3.89 2.06 4 3.87 2.97 1 7 6 0.03 -1.33 0.16
------------------------------------------------------------
group: Aug
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 272 4.04 2.01 4 4.06 2.97 1 7 6 -0.07 -1.27 0.12
------------------------------------------------------------
group: Dec
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 259 3.97 2.12 4 3.96 2.97 1 7 6 0.05 -1.42 0.13
------------------------------------------------------------
group: Feb
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 423 3.47 1.87 3 3.37 2.97 1 7 6 0.25 -1 0.09
------------------------------------------------------------
group: Jan
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 201 3.83 1.8 4 3.85 1.48 1 7 6 -0.11 -1.13 0.13
------------------------------------------------------------
group: Jul
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 237 3.78 1.85 3 3.73 1.48 1 7 6 0.23 -1.12 0.12
------------------------------------------------------------
group: Jun
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 223 4.17 2.02 4 4.21 2.97 1 7 6 -0.06 -1.3 0.14
------------------------------------------------------------
group: Mar
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 494 3.82 1.87 4 3.8 2.97 1 7 6 0.03 -1.17 0.08
------------------------------------------------------------
group: May
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 241 3.84 1.97 4 3.8 2.97 1 7 6 0.15 -1.12 0.13
------------------------------------------------------------
group: Nov
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 259 4.08 1.99 4 4.11 2.97 1 7 6 -0.16 -1.33 0.12
------------------------------------------------------------
group: Oct
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 426 3.7 1.94 4 3.63 2.97 1 7 6 0.23 -1.11 0.09
------------------------------------------------------------
group: Sep
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 344 3.91 2.15 4 3.88 2.97 1 7 6 0.07 -1.44 0.12
Descriptive statistics by group
group: card
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 3547 6.45 3.5 7 6.42 4.45 1 12 11 0 -1.38 0.06
Descriptive statistics by group
group: Americano
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 564 5.23 3.35 4 4.92 2.97 1 12 11 0.62 -1.03 0.14
------------------------------------------------------------
group: Americano with Milk
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 809 6.59 3.39 7 6.6 4.45 1 12 11 -0.11 -1.24 0.12
------------------------------------------------------------
group: Cappuccino
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 486 6.24 3.35 6 6.15 4.45 1 12 11 0.17 -1.19 0.15
------------------------------------------------------------
group: Cocoa
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 239 6.29 4.02 6 6.2 5.93 1 12 11 0.1 -1.71 0.26
------------------------------------------------------------
group: Cortado
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 287 7.1 3.37 8 7.22 2.97 1 12 11 -0.32 -1.07 0.2
------------------------------------------------------------
group: Espresso
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 129 6.37 3.37 7 6.28 4.45 1 12 11 0.06 -1.24 0.3
------------------------------------------------------------
group: Hot Chocolate
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 276 6.96 3.81 7.5 7.02 5.19 1 12 11 -0.17 -1.6 0.23
------------------------------------------------------------
group: Latte
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 757 7 3.37 8 7.11 4.45 1 12 11 -0.29 -1.24 0.12
Descriptive statistics by group
group: Afternoon
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1205 6.02 3.61 5 5.9 4.45 1 12 11 0.23 -1.45 0.1
------------------------------------------------------------
group: Morning
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1181 6.74 3.44 7 6.77 4.45 1 12 11 -0.15 -1.32 0.1
------------------------------------------------------------
group: Night
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 1161 6.61 3.41 7 6.61 4.45 1 12 11 -0.06 -1.27 0.1
Descriptive statistics by group
group: Fri
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 532 6.08 3.57 6 6 4.45 1 12 11 0.14 -1.44 0.15
------------------------------------------------------------
group: Mon
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 544 6.3 3.58 6 6.22 4.45 1 12 11 0.06 -1.46 0.15
------------------------------------------------------------
group: Sat
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 470 6.75 3.53 7 6.8 4.45 1 12 11 -0.11 -1.35 0.16
------------------------------------------------------------
group: Sun
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 419 7.18 3.25 8 7.25 4.45 1 12 11 -0.2 -1.17 0.16
------------------------------------------------------------
group: Thu
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 510 6.21 3.47 6 6.15 4.45 1 12 11 0.07 -1.37 0.15
------------------------------------------------------------
group: Tue
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 572 6.79 3.47 7 6.82 4.45 1 12 11 -0.14 -1.36 0.15
------------------------------------------------------------
group: Wed
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 500 6.01 3.44 6 5.86 4.45 1 12 11 0.2 -1.33 0.15
Descriptive statistics by group
group: Apr
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 168 4 0 4 4 0 4 4 0 NaN NaN 0
------------------------------------------------------------
group: Aug
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 272 8 0 8 8 0 8 8 0 NaN NaN 0
------------------------------------------------------------
group: Dec
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 259 12 0 12 12 0 12 12 0 NaN NaN 0
------------------------------------------------------------
group: Feb
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 423 2 0 2 2 0 2 2 0 NaN NaN 0
------------------------------------------------------------
group: Jan
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 201 1 0 1 1 0 1 1 0 NaN NaN 0
------------------------------------------------------------
group: Jul
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 237 7 0 7 7 0 7 7 0 NaN NaN 0
------------------------------------------------------------
group: Jun
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 223 6 0 6 6 0 6 6 0 NaN NaN 0
------------------------------------------------------------
group: Mar
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 494 3 0 3 3 0 3 3 0 NaN NaN 0
------------------------------------------------------------
group: May
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 241 5 0 5 5 0 5 5 0 NaN NaN 0
------------------------------------------------------------
group: Nov
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 259 11 0 11 11 0 11 11 0 NaN NaN 0
------------------------------------------------------------
group: Oct
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 426 10 0 10 10 0 10 10 0 NaN NaN 0
------------------------------------------------------------
group: Sep
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 344 9 0 9 9 0 9 9 0 NaN NaN 0
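The full `describeBy()` output above is long, and each block is labelled only with the group, not with the numeric variable being summarised. When only a few statistics per group are needed, a compact table may be easier to scan. A sketch with base R `aggregate()` follows; the data frame `datos_demo` and its values are made up for illustration and are not taken from the results above.

```r
# Made-up transactions
datos_demo <- data.frame(
  coffee_name = c("Latte", "Latte", "Americano", "Americano", "Espresso"),
  money = c(38.7, 35.76, 28.9, 25.96, 21.06)
)

# Count and mean of money per coffee type
resumen <- aggregate(money ~ coffee_name, data = datos_demo,
                     FUN = function(x) c(n = length(x), mean = mean(x)))

# aggregate() stores the statistics as a matrix column; flatten it
resumen <- do.call(data.frame, resumen)
resumen
```

Each row of `resumen` corresponds to one group, so the variable and the grouping column stay visible in the column names.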
2.2.2.2 Graphic
Code
library(ggplot2)
library(gridExtra)
plots <- list()
i <- 1
for (varC in varCat) {
  for (varN in varNum) {
    # Overlaid, semi-transparent histograms of varN, one per level of varC
    grafico <- ggplot(datos, aes(x = .data[[varN]], fill = .data[[varC]])) +
      geom_histogram(colour = "black",
                     linewidth = 0.75,
                     linetype = 1,
                     position = "identity",
                     alpha = 0.5) +
      labs(title = paste("Histogram of", varN, "by", varC),
           x = varN, y = "Frequency", fill = varC) +
      theme_minimal()
    plots[[i]] <- grafico
    i <- i + 1
  }
}
# Show all plots in a grid (2 columns)
grid.arrange(grobs = plots, ncol = 2)
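Some of the numerical-by-categorical pairs crossed in this section carry no information. cash_type has a single level (card), and Weekdaysort within Weekday, like Monthsort within Month_name, is constant, which is why those summaries show zero spread and NaN skewness. One possible way to skip such pairs before summarising or plotting is sketched below; `par_informativo` is a hypothetical helper and the data are made up.

```r
# Hypothetical helper: TRUE when comparing x across the groups of g is meaningful
par_informativo <- function(x, g) {
  g <- droplevels(as.factor(g))
  if (nlevels(g) < 2) return(FALSE)  # a single group: nothing to compare
  sds <- tapply(x, g, sd)            # within-group standard deviations
  any(sds > 0, na.rm = TRUE)         # at least one group shows variation
}

# Made-up rows mimicking the structure of the dataset
datos_demo <- data.frame(
  money = c(38.7, 28.9, 33.8, 38.7),
  Weekdaysort = c(5, 5, 1, 1),
  Weekday = c("Fri", "Fri", "Mon", "Mon"),
  cash_type = "card"
)

par_informativo(datos_demo$money, datos_demo$Weekday)
par_informativo(datos_demo$Weekdaysort, datos_demo$Weekday)
par_informativo(datos_demo$money, datos_demo$cash_type)
```

Inside the double loop, a call such as `if (!par_informativo(datos[, varN], datos[, varC])) next` would skip the degenerate combinations.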
2.2.3 Categorical vs. categorical
2.2.3.1 Description
Code
for (varc1 in varCat) {
  for (varc2 in varCat) {
    if (varc1 != varc2) {
      prop_table <- prop.table(table(datos[, varc1], datos[, varc2]))
      cat("=============", varc1, " vs. ", varc2, "=========================\n")
      print(prop_table)
    }
  }
}
============= cash_type vs. coffee_name =========================
Americano Americano with Milk Cappuccino Cocoa Cortado
card 0.15900761 0.22808007 0.13701720 0.06738089 0.08091345
Espresso Hot Chocolate Latte
card 0.03636876 0.07781224 0.21341979
============= cash_type vs. Time_of_Day =========================
Afternoon Morning Night
card 0.3397237 0.3329574 0.3273189
============= cash_type vs. Weekday =========================
Fri Mon Sat Sun Thu Tue Wed
card 0.1499859 0.1533690 0.1325063 0.1181280 0.1437835 0.1612630 0.1409642
============= cash_type vs. Month_name =========================
Apr Aug Dec Feb Jan Jul
card 0.04736397 0.07668452 0.07301945 0.11925571 0.05666761 0.06681703
Jun Mar May Nov Oct Sep
card 0.06287003 0.13927262 0.06794474 0.07301945 0.12010149 0.09698337
============= coffee_name vs. cash_type =========================
card
Americano 0.15900761
Americano with Milk 0.22808007
Cappuccino 0.13701720
Cocoa 0.06738089
Cortado 0.08091345
Espresso 0.03636876
Hot Chocolate 0.07781224
Latte 0.21341979
============= coffee_name vs. Time_of_Day =========================
Afternoon Morning Night
Americano 0.065689315 0.061742317 0.031575980
Americano with Milk 0.067380885 0.093318297 0.067380885
Cappuccino 0.046236256 0.034395264 0.056385678
Cocoa 0.021144629 0.016351847 0.029884409
Cortado 0.024809698 0.040315760 0.015787990
Espresso 0.015787990 0.012404849 0.008175923
Hot Chocolate 0.022554271 0.013814491 0.041443473
Latte 0.076120665 0.060614604 0.076684522
============= coffee_name vs. Weekday =========================
Fri Mon Sat Sun
Americano 0.029602481 0.026219340 0.019453059 0.012968706
Americano with Milk 0.029038624 0.036086834 0.033831407 0.027910911
Cappuccino 0.017479560 0.020016916 0.019453059 0.019734987
Cocoa 0.014660276 0.009585565 0.006484353 0.006766281
Cortado 0.010431350 0.009867494 0.015787990 0.011840992
Espresso 0.005920496 0.002819284 0.003946997 0.004510854
Hot Chocolate 0.012686778 0.009867494 0.006766281 0.011277136
Latte 0.030166338 0.038906118 0.026783197 0.023118128
Thu Tue Wed
Americano 0.023118128 0.022836200 0.024809698
Americano with Milk 0.029038624 0.040315760 0.031857908
Cappuccino 0.021708486 0.017761489 0.020862701
Cocoa 0.006484353 0.016069918 0.007330138
Cortado 0.011840992 0.012404849 0.008739780
Espresso 0.007612067 0.004510854 0.007048210
Hot Chocolate 0.013532563 0.013814491 0.009867494
Latte 0.030448266 0.033549478 0.030448266
============= coffee_name vs. Month_name =========================
Apr Aug Dec Feb
Americano 0.0093036369 0.0104313504 0.0076120665 0.0329856217
Americano with Milk 0.0107132788 0.0202988441 0.0160699182 0.0239639132
Cappuccino 0.0101494220 0.0095855653 0.0107132788 0.0146602763
Cocoa 0.0011277136 0.0031012123 0.0059204962 0.0157879899
Cortado 0.0045108542 0.0112771356 0.0087397801 0.0028192839
Espresso 0.0011277136 0.0039469975 0.0033831407 0.0047927826
Hot Chocolate 0.0028192839 0.0016915703 0.0073301381 0.0090217085
Latte 0.0076120665 0.0163518466 0.0132506343 0.0152241331
Jan Jul Jun Mar
Americano 0.0070482098 0.0101494220 0.0039469975 0.0377784043
Americano with Milk 0.0146602763 0.0183253454 0.0186072738 0.0231181280
Cappuccino 0.0076120665 0.0090217085 0.0129687059 0.0163518466
Cocoa 0.0039469975 0.0025373555 0.0011277136 0.0101494220
Cortado 0.0062024246 0.0039469975 0.0053566394 0.0084578517
Espresso 0.0014096420 0.0039469975 0.0028192839 0.0053566394
Hot Chocolate 0.0042289259 0.0031012123 0.0039469975 0.0121229208
Latte 0.0115590640 0.0157879899 0.0140964195 0.0259374119
May Nov Oct Sep
Americano 0.0112771356 0.0070482098 0.0124048492 0.0090217085
Americano with Milk 0.0152241331 0.0146602763 0.0231181280 0.0293205526
Cappuccino 0.0146602763 0.0073301381 0.0124048492 0.0115590640
Cocoa 0.0022554271 0.0098674937 0.0090217085 0.0025373555
Cortado 0.0047927826 0.0036650691 0.0095855653 0.0115590640
Espresso 0.0019734987 0.0008457852 0.0033831407 0.0033831407
Hot Chocolate 0.0036650691 0.0104313504 0.0163518466 0.0031012123
Latte 0.0140964195 0.0191711305 0.0338314068 0.0265012687
============= Time_of_Day vs. cash_type =========================
card
Afternoon 0.3397237
Morning 0.3329574
Night 0.3273189
============= Time_of_Day vs. coffee_name =========================
Americano Americano with Milk Cappuccino Cocoa Cortado
Afternoon 0.065689315 0.067380885 0.046236256 0.021144629 0.024809698
Morning 0.061742317 0.093318297 0.034395264 0.016351847 0.040315760
Night 0.031575980 0.067380885 0.056385678 0.029884409 0.015787990
Espresso Hot Chocolate Latte
Afternoon 0.015787990 0.022554271 0.076120665
Morning 0.012404849 0.013814491 0.060614604
Night 0.008175923 0.041443473 0.076684522
============= Time_of_Day vs. Weekday =========================
Fri Mon Sat Sun Thu Tue
Afternoon 0.04849168 0.04990133 0.05469411 0.04736397 0.04764590 0.04510854
Morning 0.05441218 0.05441218 0.04426276 0.03383141 0.04116154 0.05835918
Night 0.04708204 0.04905554 0.03354948 0.03693262 0.05497604 0.05779532
Wed
Afternoon 0.04651818
Morning 0.04651818
Night 0.04792783
============= Time_of_Day vs. Month_name =========================
Apr Aug Dec Feb Jan Jul
Afternoon 0.02058077 0.02001692 0.02199041 0.04961940 0.02114463 0.01606992
Morning 0.01127714 0.03411334 0.02537356 0.03326755 0.01719763 0.02762898
Night 0.01550606 0.02255427 0.02565548 0.03636876 0.01832535 0.02311813
Jun Mar May Nov Oct Sep
Afternoon 0.01522413 0.06146039 0.02199041 0.02847477 0.03834226 0.02480970
Morning 0.01888920 0.04623626 0.01691570 0.02368198 0.04341697 0.03495912
Night 0.02875670 0.03157598 0.02903862 0.02086270 0.03834226 0.03721455
============= Weekday vs. cash_type =========================
card
Fri 0.1499859
Mon 0.1533690
Sat 0.1325063
Sun 0.1181280
Thu 0.1437835
Tue 0.1612630
Wed 0.1409642
============= Weekday vs. coffee_name =========================
Americano Americano with Milk Cappuccino Cocoa Cortado
Fri 0.029602481 0.029038624 0.017479560 0.014660276 0.010431350
Mon 0.026219340 0.036086834 0.020016916 0.009585565 0.009867494
Sat 0.019453059 0.033831407 0.019453059 0.006484353 0.015787990
Sun 0.012968706 0.027910911 0.019734987 0.006766281 0.011840992
Thu 0.023118128 0.029038624 0.021708486 0.006484353 0.011840992
Tue 0.022836200 0.040315760 0.017761489 0.016069918 0.012404849
Wed 0.024809698 0.031857908 0.020862701 0.007330138 0.008739780
Espresso Hot Chocolate Latte
Fri 0.005920496 0.012686778 0.030166338
Mon 0.002819284 0.009867494 0.038906118
Sat 0.003946997 0.006766281 0.026783197
Sun 0.004510854 0.011277136 0.023118128
Thu 0.007612067 0.013532563 0.030448266
Tue 0.004510854 0.013814491 0.033549478
Wed 0.007048210 0.009867494 0.030448266
============= Weekday vs. Time_of_Day =========================
Afternoon Morning Night
Fri 0.04849168 0.05441218 0.04708204
Mon 0.04990133 0.05441218 0.04905554
Sat 0.05469411 0.04426276 0.03354948
Sun 0.04736397 0.03383141 0.03693262
Thu 0.04764590 0.04116154 0.05497604
Tue 0.04510854 0.05835918 0.05779532
Wed 0.04651818 0.04651818 0.04792783
============= Weekday vs. Month_name =========================
Apr Aug Dec Feb Jan Jul
Fri 0.007048210 0.009585565 0.008175923 0.020016916 0.011277136 0.010995207
Mon 0.008175923 0.011559064 0.011559064 0.024809698 0.008175923 0.006766281
Sat 0.006484353 0.013250634 0.011277136 0.009303637 0.009021708 0.007612067
Sun 0.006484353 0.010431350 0.012122921 0.009021708 0.002819284 0.006766281
Thu 0.006484353 0.012404849 0.007893995 0.017761489 0.010431350 0.007612067
Tue 0.007330138 0.009867494 0.012686778 0.015787990 0.007893995 0.013814491
Wed 0.005356639 0.009585565 0.009303637 0.022554271 0.007048210 0.013250634
Jun Mar May Nov Oct Sep
Fri 0.007330138 0.025091627 0.009021708 0.012122921 0.018325345 0.010995207
Mon 0.007612067 0.019453059 0.010431350 0.009585565 0.018325345 0.016915703
Sat 0.009867494 0.020298844 0.005638568 0.014942205 0.010713279 0.014096420
Sun 0.010995207 0.011277136 0.009867494 0.008175923 0.014096420 0.016069918
Thu 0.009303637 0.018889202 0.013532563 0.009867494 0.018043417 0.011559064
Tue 0.008739780 0.021426558 0.010149422 0.013250634 0.021990414 0.018325345
Wed 0.009021708 0.022836200 0.009303637 0.005074711 0.018607274 0.009021708
============= Month_name vs. cash_type =========================
card
Apr 0.04736397
Aug 0.07668452
Dec 0.07301945
Feb 0.11925571
Jan 0.05666761
Jul 0.06681703
Jun 0.06287003
Mar 0.13927262
May 0.06794474
Nov 0.07301945
Oct 0.12010149
Sep 0.09698337
============= Month_name vs. coffee_name =========================
Americano Americano with Milk Cappuccino Cocoa Cortado
Apr 0.0093036369 0.0107132788 0.0101494220 0.0011277136 0.0045108542
Aug 0.0104313504 0.0202988441 0.0095855653 0.0031012123 0.0112771356
Dec 0.0076120665 0.0160699182 0.0107132788 0.0059204962 0.0087397801
Feb 0.0329856217 0.0239639132 0.0146602763 0.0157879899 0.0028192839
Jan 0.0070482098 0.0146602763 0.0076120665 0.0039469975 0.0062024246
Jul 0.0101494220 0.0183253454 0.0090217085 0.0025373555 0.0039469975
Jun 0.0039469975 0.0186072738 0.0129687059 0.0011277136 0.0053566394
Mar 0.0377784043 0.0231181280 0.0163518466 0.0101494220 0.0084578517
May 0.0112771356 0.0152241331 0.0146602763 0.0022554271 0.0047927826
Nov 0.0070482098 0.0146602763 0.0073301381 0.0098674937 0.0036650691
Oct 0.0124048492 0.0231181280 0.0124048492 0.0090217085 0.0095855653
Sep 0.0090217085 0.0293205526 0.0115590640 0.0025373555 0.0115590640
Espresso Hot Chocolate Latte
Apr 0.0011277136 0.0028192839 0.0076120665
Aug 0.0039469975 0.0016915703 0.0163518466
Dec 0.0033831407 0.0073301381 0.0132506343
Feb 0.0047927826 0.0090217085 0.0152241331
Jan 0.0014096420 0.0042289259 0.0115590640
Jul 0.0039469975 0.0031012123 0.0157879899
Jun 0.0028192839 0.0039469975 0.0140964195
Mar 0.0053566394 0.0121229208 0.0259374119
May 0.0019734987 0.0036650691 0.0140964195
Nov 0.0008457852 0.0104313504 0.0191711305
Oct 0.0033831407 0.0163518466 0.0338314068
Sep 0.0033831407 0.0031012123 0.0265012687
============= Month_name vs. Time_of_Day =========================
Afternoon Morning Night
Apr 0.02058077 0.01127714 0.01550606
Aug 0.02001692 0.03411334 0.02255427
Dec 0.02199041 0.02537356 0.02565548
Feb 0.04961940 0.03326755 0.03636876
Jan 0.02114463 0.01719763 0.01832535
Jul 0.01606992 0.02762898 0.02311813
Jun 0.01522413 0.01888920 0.02875670
Mar 0.06146039 0.04623626 0.03157598
May 0.02199041 0.01691570 0.02903862
Nov 0.02847477 0.02368198 0.02086270
Oct 0.03834226 0.04341697 0.03834226
Sep 0.02480970 0.03495912 0.03721455
============= Month_name vs. Weekday =========================
Fri Mon Sat Sun Thu Tue
Apr 0.007048210 0.008175923 0.006484353 0.006484353 0.006484353 0.007330138
Aug 0.009585565 0.011559064 0.013250634 0.010431350 0.012404849 0.009867494
Dec 0.008175923 0.011559064 0.011277136 0.012122921 0.007893995 0.012686778
Feb 0.020016916 0.024809698 0.009303637 0.009021708 0.017761489 0.015787990
Jan 0.011277136 0.008175923 0.009021708 0.002819284 0.010431350 0.007893995
Jul 0.010995207 0.006766281 0.007612067 0.006766281 0.007612067 0.013814491
Jun 0.007330138 0.007612067 0.009867494 0.010995207 0.009303637 0.008739780
Mar 0.025091627 0.019453059 0.020298844 0.011277136 0.018889202 0.021426558
May 0.009021708 0.010431350 0.005638568 0.009867494 0.013532563 0.010149422
Nov 0.012122921 0.009585565 0.014942205 0.008175923 0.009867494 0.013250634
Oct 0.018325345 0.018325345 0.010713279 0.014096420 0.018043417 0.021990414
Sep 0.010995207 0.016915703 0.014096420 0.016069918 0.011559064 0.018325345
Wed
Apr 0.005356639
Aug 0.009585565
Dec 0.009303637
Feb 0.022554271
Jan 0.007048210
Jul 0.013250634
Jun 0.009021708
Mar 0.022836200
May 0.009303637
Nov 0.005074711
Oct 0.018607274
Sep 0.009021708
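The loop above prints every pair twice (once transposed), and each cell is a share of all transactions. As a minimal sketch, assuming a small toy data frame `datos_demo` in place of `datos` (its values mirror the first rows of the data set), the pairs can be visited once with `combn()` and the proportions conditioned on rows with `margin = 1`. The names `datos_demo`, `pares` and `tablas` are hypothetical.

```r
# Toy stand-in for `datos` (hypothetical; mirrors the first rows of the data set)
datos_demo <- data.frame(
  coffee_name = c("Latte", "Hot Chocolate", "Hot Chocolate",
                  "Americano", "Latte", "Americano with Milk"),
  Time_of_Day = c("Morning", "Afternoon", "Afternoon",
                  "Afternoon", "Afternoon", "Afternoon"),
  Weekday     = c("Fri", "Fri", "Fri", "Fri", "Fri", "Fri"),
  stringsAsFactors = FALSE
)
vars_demo <- names(datos_demo)

# combn() returns each unordered pair exactly once, one pair per column
pares <- combn(vars_demo, 2)

tablas <- lapply(seq_len(ncol(pares)), function(j) {
  v1 <- pares[1, j]
  v2 <- pares[2, j]
  # margin = 1: every row sums to 1, so cells read as P(v2 | v1)
  prop.table(table(datos_demo[[v1]], datos_demo[[v2]]), margin = 1)
})
names(tablas) <- apply(pares, 2, paste, collapse = " vs. ")

for (nm in names(tablas)) {
  cat("=============", nm, "=============\n")
  print(round(tablas[[nm]], 3))
}
```

Row-conditional proportions are often easier to compare than shares of the grand total, because they are not dominated by the most frequent category.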
2.2.3.2 Graphic
Code
par(mfrow = c(3, 3))
for (varc1 in varCat) {
  for (varc2 in varCat) {
    if (varc1 != varc2) {
      prop_table <- prop.table(table(datos[, varc1], datos[, varc2]))
      barplot(prop_table, beside = TRUE)
    }
  }
}

Code
par(mfrow = c(1, 1)) 
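The bar plots above carry no title or legend, so it is hard to tell which pair each panel shows. The following is a minimal sketch of one labelled panel, assuming a toy data frame `datos_demo` (hypothetical; its values mirror the first rows of the data set) and using only arguments of base R `barplot()`.

```r
# Toy stand-in for `datos` (hypothetical)
datos_demo <- data.frame(
  coffee_name = c("Latte", "Hot Chocolate", "Hot Chocolate",
                  "Americano", "Latte", "Americano with Milk"),
  Time_of_Day = c("Morning", "Afternoon", "Afternoon",
                  "Afternoon", "Afternoon", "Afternoon")
)

tab <- prop.table(table(datos_demo$Time_of_Day, datos_demo$coffee_name))

# barplot() returns the bar midpoints; with beside = TRUE this is a matrix
# with one row per level of the first variable
mids <- barplot(
  tab,
  beside      = TRUE,
  main        = "Time_of_Day vs. coffee_name",
  ylab        = "Proportion of all transactions",
  legend.text = TRUE,                          # legend taken from the row names
  args.legend = list(x = "topright", bty = "n"),
  las         = 2,                             # rotate the axis labels
  cex.names   = 0.7
)
```

Inside the double loop, `main = paste(varc1, "vs.", varc2)` would give each panel its own title.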
3 Automatic Descriptive Analysis (EDA)
Many tools produce a descriptive analysis automatically, with little or no programming. This section collects some of them for convenience.
3.1 Skim
Code
library(skimr)
library(tidyverse)
## Display a descriptive summary of the data
skim(datos)
| Name | datos |
| Number of rows | 3547 |
| Number of columns | 9 |
| _______________________ | |
| Column type frequency: | |
| character | 5 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| cash_type | 0 | 1 | 4 | 4 | 0 | 1 | 0 |
| coffee_name | 0 | 1 | 5 | 19 | 0 | 8 | 0 |
| Time_of_Day | 0 | 1 | 5 | 9 | 0 | 3 | 0 |
| Weekday | 0 | 1 | 3 | 3 | 0 | 7 | 0 |
| Month_name | 0 | 1 | 3 | 3 | 0 | 12 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| hour_of_day | 0 | 1 | 14.19 | 4.23 | 6.00 | 10.00 | 14.00 | 18.00 | 22.0 | ▆▇▆▇▆ |
| money | 0 | 1 | 31.65 | 4.88 | 18.12 | 27.92 | 32.82 | 35.76 | 38.7 | ▁▃▂▅▇ |
| Weekdaysort | 0 | 1 | 3.85 | 1.97 | 1.00 | 2.00 | 4.00 | 6.00 | 7.0 | ▇▃▃▃▆ |
| Monthsort | 0 | 1 | 6.45 | 3.50 | 1.00 | 3.00 | 7.00 | 10.00 | 12.0 | ▇▃▃▅▇ |
Code
# Show only the numeric variables
skim(datos) %>% yank("numeric")
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| hour_of_day | 0 | 1 | 14.19 | 4.23 | 6.00 | 10.00 | 14.00 | 18.00 | 22.0 | ▆▇▆▇▆ |
| money | 0 | 1 | 31.65 | 4.88 | 18.12 | 27.92 | 32.82 | 35.76 | 38.7 | ▁▃▂▅▇ |
| Weekdaysort | 0 | 1 | 3.85 | 1.97 | 1.00 | 2.00 | 4.00 | 6.00 | 7.0 | ▇▃▃▃▆ |
| Monthsort | 0 | 1 | 6.45 | 3.50 | 1.00 | 3.00 | 7.00 | 10.00 | 12.0 | ▇▃▃▅▇ |
Code
skim(datos) %>% yank("character")
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| cash_type | 0 | 1 | 4 | 4 | 0 | 1 | 0 |
| coffee_name | 0 | 1 | 5 | 19 | 0 | 8 | 0 |
| Time_of_Day | 0 | 1 | 5 | 9 | 0 | 3 | 0 |
| Weekday | 0 | 1 | 3 | 3 | 0 | 7 | 0 |
| Month_name | 0 | 1 | 3 | 3 | 0 | 12 | 0 |
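To read the numeric skim table, it helps to recompute its columns by hand: `p0` to `p100` are the quantiles (minimum, quartiles, maximum) and `complete_rate` is the share of non-missing values. The sketch below is a base R illustration, not skimr's implementation; `resumen_numerico` is a hypothetical helper and the `NA` in the toy vector is added on purpose.

```r
# Toy values in the spirit of the `money` column, plus one artificial NA
money_demo <- c(38.7, 38.7, 38.7, 28.9, 38.7, 33.8, NA)

# Hypothetical helper that mimics the columns of the numeric skim table
resumen_numerico <- function(x) {
  cuantiles <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1),
                        na.rm = TRUE, names = FALSE)
  c(n_missing     = sum(is.na(x)),
    complete_rate = mean(!is.na(x)),
    mean          = mean(x, na.rm = TRUE),
    sd            = sd(x, na.rm = TRUE),
    setNames(cuantiles, c("p0", "p25", "p50", "p75", "p100")))
}

round(resumen_numerico(money_demo), 2)
```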
3.2 Vis
Code
library(visdat)
## Look for NAs across the numeric and categorical variables
vis_dat(datos)
Code
## Show the percentage of NAs in each variable
vis_miss(datos)
Code
## Build the correlation matrix
datos %>%
  select(where(is.numeric)) %>%
  vis_cor()
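The matrix behind such a plot can also be inspected numerically. This is a minimal base R sketch, assuming a toy data frame `datos_demo` (hypothetical; values mirror the first rows of the data set). Since `Weekdaysort` and `Monthsort` are ordering codes rather than measurements, a rank-based coefficient such as Spearman's may be easier to interpret for them than Pearson's.

```r
# Toy stand-in for `datos` (hypothetical)
datos_demo <- data.frame(
  hour_of_day = c(10, 12, 12, 13, 13, 15),
  money       = c(38.7, 38.7, 38.7, 28.9, 38.7, 33.8),
  coffee_name = c("Latte", "Hot Chocolate", "Hot Chocolate",
                  "Americano", "Latte", "Americano with Milk")
)

# Keep the numeric columns only, as select(where(is.numeric)) does
num <- datos_demo[sapply(datos_demo, is.numeric)]

cor_pearson  <- cor(num, method = "pearson")   # linear association
cor_spearman <- cor(num, method = "spearman")  # rank-based association

round(cor_pearson, 2)
round(cor_spearman, 2)
```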
Code
## We can visualise conditions on the data. Here we check, cell by cell,
## which values are greater than 2
vis_expect(datos, ~ .x > 2)
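A condition such as `.x > 2` is evaluated cell by cell, and on character columns R compares text rather than numbers, so the result is mostly meaningful for numeric columns. As a rough base R sketch, assuming a toy data frame `datos_demo` with hypothetical values, the share of cells meeting the condition can be computed per numeric column:

```r
# Toy stand-in for `datos` (hypothetical values)
datos_demo <- data.frame(
  hour_of_day = c(10, 12, 12, 13, 13, 15),
  Weekdaysort = c(1, 2, 3, 5, 6, 7),
  cash_type   = rep("card", 6)
)

# Restrict the check to numeric columns
num <- datos_demo[sapply(datos_demo, is.numeric)]

# num > 2 is a logical matrix; colMeans() gives the share of TRUE per column
cumple <- colMeans(num > 2)
round(cumple, 2)
```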
3.3 Inspectdf
Code
library(inspectdf)
## Data types
inspect_types(datos) %>% show_plot()
Code
## Memory usage
inspect_mem(datos) %>% show_plot()
Code
## Check NAs, comparing high-priced and low-priced transactions
data_price_dummy <- datos %>%
  mutate(price_dummy = if_else(money > 35, "High", "Low"))
inspect_na(data_price_dummy %>% filter(price_dummy == "High"),
           data_price_dummy %>% filter(price_dummy == "Low")) %>%
  show_plot()
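The same two-group comparison of missingness can be sketched in base R. The toy data frame `datos_demo` below is hypothetical and contains one artificial `NA` so that the groups differ; in the real data the skim summary reports no missing values, so every share would be zero.

```r
# Toy stand-in for `datos` with one artificial NA (hypothetical)
datos_demo <- data.frame(
  money       = c(38.7, 38.7, 38.7, 28.9, 38.7, 33.8),
  coffee_name = c("Latte", NA, "Hot Chocolate",
                  "Americano", "Latte", "Americano with Milk")
)

# Same split as price_dummy: "High" when money > 35, otherwise "Low"
grupo <- ifelse(datos_demo$money > 35, "High", "Low")

# Share of NAs per column within each group (rows = columns of the data)
na_por_grupo <- sapply(split(datos_demo, grupo),
                       function(d) colMeans(is.na(d)))
round(100 * na_por_grupo, 1)
```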
Code
## Check the distribution of the numeric variables
inspect_num(datos) %>% show_plot()
Code
## check categorical variable distribution
inspect_imb(datos) %>% show_plot()
Code
## check two categorical
inspect_imb(data_price_dummy %>% filter(price_dummy == "High"),
data_price_dummy %>% filter(price_dummy == "Low")) %>%
show_plot() + theme(legend.position = "none")
Code
## similar to inspect_imb, but for all levels
inspect_cat(datos) %>% show_plot()
Code
inspect_cor(datos) %>% show_plot()
3.4 dataReporter (formerly dataMaid)
Code
library("dataReporter")
dataReporter::makeDataReport(datos, output = "html", file = "/Users/ramitjans/Downloads/Preprocessing/report.Rmd")
dataReporter::makeCodebook(data = datos, file = "/Users/ramitjans/Downloads/Preprocessing/codebook.Rmd")
3.5 DataExplorer
Code
library(DataExplorer)
plot_str(datos)
introduce(datos)
  rows columns discrete_columns continuous_columns all_missing_columns
1 3547       9                5                  4                   0
  total_missing_values complete_rows total_observations memory_usage
1                    0          3547              31923       217536
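Most of these counts can be checked by hand; for instance, `total_observations` is rows times columns (3547 × 9 = 31923). The base R sketch below recomputes them on a toy data frame `datos_demo` (hypothetical). Treating every non-numeric column as discrete is an assumption made for illustration and may not match DataExplorer's own classification in every case.

```r
# Toy stand-in for `datos` (hypothetical)
datos_demo <- data.frame(
  hour_of_day = c(10, 12, 12, 13, 13, 15),
  money       = c(38.7, 38.7, 38.7, 28.9, 38.7, 33.8),
  coffee_name = c("Latte", "Hot Chocolate", "Hot Chocolate",
                  "Americano", "Latte", "Americano with Milk")
)

es_numerica <- sapply(datos_demo, is.numeric)

resumen <- data.frame(
  rows                 = nrow(datos_demo),
  columns              = ncol(datos_demo),
  discrete_columns     = sum(!es_numerica),   # assumption: non-numeric = discrete
  continuous_columns   = sum(es_numerica),
  all_missing_columns  = sum(colSums(!is.na(datos_demo)) == 0),
  total_missing_values = sum(is.na(datos_demo)),
  complete_rows        = sum(complete.cases(datos_demo)),
  total_observations   = nrow(datos_demo) * ncol(datos_demo)
)
print(resumen)
```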
Code
plot_intro(datos)
Code
plot_missing(datos)
Code
plot_bar(datos)
Code
plot_bar(datos, with = "money")
Code
plot_bar(datos, by = "cash_type")
Code
plot_histogram(datos)
Code
plot_correlation(na.omit(datos), maxcat = 5L)
3.6 SmartEDA
Code
library("SmartEDA")
## Overview of the data
ExpData(data = datos, type = 1)
## structure of the data
ExpData(data = datos, type = 2)
3.6.1 Frequency or custom tables for categorical variables
Code
SmartEDA::ExpCTable(datos, Target = NULL, margin = 1, clim = 10, nlim = 5, round = 2, bin = NULL, per = TRUE)
   Variable Valid Frequency Percent CumPercent
1 coffee_name Americano 564 15.90 15.90
2 coffee_name Americano with Milk 809 22.81 38.71
3 coffee_name Cappuccino 486 13.70 52.41
4 coffee_name Cocoa 239 6.74 59.15
5 coffee_name Cortado 287 8.09 67.24
6 coffee_name Espresso 129 3.64 70.88
7 coffee_name Hot Chocolate 276 7.78 78.66
8 coffee_name Latte 757 21.34 100.00
9 coffee_name TOTAL 3547 NA NA
10 Time_of_Day Afternoon 1205 33.97 33.97
11 Time_of_Day Morning 1181 33.30 67.27
12 Time_of_Day Night 1161 32.73 100.00
13 Time_of_Day TOTAL 3547 NA NA
14 Weekday Fri 532 15.00 15.00
15 Weekday Mon 544 15.34 30.34
16 Weekday Sat 470 13.25 43.59
17 Weekday Sun 419 11.81 55.40
18 Weekday Thu 510 14.38 69.78
19 Weekday Tue 572 16.13 85.91
20 Weekday Wed 500 14.10 100.01
21 Weekday TOTAL 3547 NA NA
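The last `CumPercent` for `Weekday` is 100.01 rather than 100. That value is what one obtains by adding up percentages that were already rounded to two decimals. The base R sketch below uses the `Weekday` frequencies from the table above and compares both orders of operations; the column names `CumPercent_rounded_first` and `CumPercent_rounded_last` are hypothetical, and nothing here is a statement about SmartEDA's internals.

```r
# Weekday frequencies taken from the table above
frecuencias <- c(Fri = 532, Mon = 544, Sat = 470, Sun = 419,
                 Thu = 510, Tue = 572, Wed = 500)

pct <- 100 * frecuencias / sum(frecuencias)

tabla <- data.frame(
  Valid                    = names(frecuencias),
  Frequency                = as.vector(frecuencias),
  Percent                  = round(pct, 2),
  CumPercent_rounded_first = cumsum(round(pct, 2)),  # round, then accumulate
  CumPercent_rounded_last  = round(cumsum(pct), 2),  # accumulate, then round
  row.names = NULL
)
print(tabla)
```

Accumulating the exact percentages and rounding only at the end keeps the final cumulative value at 100.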
3.7 Esquisse
This package opens a Shiny app whose menu-driven controls let you build plots interactively.
Code
esquisse::esquisser(datos)
4 Bibliography
- https://www.analyticsvidhya.com/blog/2022/10/three-r-libraries-for-automated-eda/
- https://cran.r-project.org/web/packages/dlookr/vignettes/EDA.html
- https://cran.r-project.org/web/packages/DataExplorer/vignettes/dataexplorer-intro.html
- https://daya6489.github.io/SmartEDA/
This website was created by Dante Conti and Sergi Ramírez, (c) 2025
