Homework

Including my original code here for reference.

library(ggplot2)

# setting a seed for reproducible data
set.seed(190)

# creating my treatment groups; here I use the standard deviation (sd) to set the spread of each group instead of the approach we used in class
control <- rnorm(100, mean = 25000, sd = 1000)
treated <- rnorm(100, mean = 30000, sd = 1000)

# Create sample IDs
sample_ids <- paste(1:200)

# creating a data frame named z, in the spirit of continuing the class's methods; I am using a ...
z <- data.frame(
  Sample_ID = sample_ids,
  Group = rep(c("Control", "Treated"), each = 100),
  Cell_Count = c(control, treated)
)

# the histogram method from last class is a bit confusing, so I am using ggplot2's geom_histogram() instead
OG_plot <- ggplot(z, aes(x = Cell_Count, fill = Group)) +
  geom_histogram(binwidth = 500, position = "dodge", color = "black") +
  labs(title = "Histogram of Cell Count by Treatment Group",
       x = "Cell Count", y = "Frequency") +
  scale_fill_manual(values = c("Control" = "blue", "Treated" = "red")) +
  theme_minimal()

print(OG_plot)
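Because each group is drawn with rnorm() and a standard deviation rather than a fixed minimum and maximum, its spread is not a hard range: any value is possible, but about 95% of draws fall within mean ± 1.96 sd. The sketch below recreates the two groups with the same seed and parameters and compares the theoretical interval with the sample summaries; only base R functions are used.

```r
# Recreate the two simulated groups with the same seed and parameters
set.seed(190)
control <- rnorm(100, mean = 25000, sd = 1000)
treated <- rnorm(100, mean = 30000, sd = 1000)

# Central 95% interval implied by mean = 25000, sd = 1000
expected_range <- qnorm(c(0.025, 0.975), mean = 25000, sd = 1000)
expected_range  # about 23040 and 26960

# Sample mean and sd of each group; these vary around the chosen
# means (25000 / 30000) and sd (1000) because the data are random draws
sapply(list(Control = control, Treated = treated),
       function(x) c(mean = mean(x), sd = sd(x)))

# Share of control draws that land inside the theoretical 95% interval
mean(control >= expected_range[1] & control <= expected_range[2])
```

With 100 draws per group, the sample share will be close to, but not exactly, 0.95.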

# adding a density curve. geom_density() is drawn on the density scale (total area = 1),
# so on a count histogram it sits flat near zero instead of following the bars.
# Scaling it by the points per group and the binwidth (count * 500) puts it on the
# histogram's count scale; `linewidth` replaces `size`, which is deprecated for lines in ggplot2 3.4.0.

OG_plot <- OG_plot +
  geom_density(aes(y = after_stat(count * 500), group = Group),
               fill = NA, linetype = "dotted", linewidth = 0.75)

print(OG_plot)
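A quick worked check of why an unscaled density curve looks flat on this histogram, using the group size, binwidth and sd from the code above. The variable names here are illustrative only.

```r
n        <- 100   # observations per group (from rnorm() above)
binwidth <- 500   # histogram bin width (from geom_histogram())
sd_cells <- 1000  # standard deviation used in rnorm()

# A normal density peaks at its mean with height 1 / (sd * sqrt(2 * pi))
peak_density <- dnorm(0, mean = 0, sd = sd_cells)

# Expected count in the bin around the mean is roughly n * binwidth * density
peak_count <- n * binwidth * peak_density

peak_density  # about 0.0004 on the density scale
peak_count    # about 20 on the count scale
```

A curve peaking near 0.0004 is indistinguishable from zero on an axis whose bars reach roughly 20, which is why the curve only lines up with the bars once it is multiplied by n × binwidth.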

Use the code that you worked on in Homework #7 (creating fake data sets), and re-organize it following the principles of structured programming. Do all the work in a single chunk in your R markdown file, just as if you were writing a single R script. Start with all of your annotated functions, preliminary calls, and global variables. The program body should be only a few lines of code that call the appropriate functions and run them in the correct order. Make sure that the output from one function serves as the input to the next. You can either daisy-chain the functions or write separate lines of code to hold elements in temporary variables and pass them along.
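The program body sources its functions from Homework_9.R, which is not reproduced in this document. The following is a hypothetical reconstruction of what that file would need to define for the calls to run: the function names and arguments come from the program body, while the function bodies are assumptions assembled from the Homework #7 code above. Having run_anova() print the summary and then return the model is consistent with the two blocks of ANOVA output shown, but is likewise a guess. Since the assignment asks for a single chunk, definitions like these could also sit directly at the top of that chunk instead of being sourced.

```r
# Hypothetical sketch of Homework_9.R (the real file is not shown)
library(ggplot2)

# generate_data: draw n normally distributed cell counts
#   n    = number of observations
#   mean = mean of the normal distribution
#   std  = standard deviation of the normal distribution
generate_data <- function(n, mean, std) {
  rnorm(n, mean = mean, sd = std)
}

# create_IDs: sample IDs "1" ... "n" as character strings
# (same result as paste(1:n) in the original code)
create_IDs <- function(n) {
  as.character(seq_len(n))
}

# create_dataframe: stack both groups into one long data frame
create_dataframe <- function(ids, control, treated) {
  data.frame(
    Sample_ID  = ids,
    Group      = rep(c("Control", "Treated"),
                     times = c(length(control), length(treated))),
    Cell_Count = c(control, treated)
  )
}

# plot_hist: dodged histogram of cell counts, filled by group
plot_hist <- function(dataset) {
  ggplot(dataset, aes(x = Cell_Count, fill = Group)) +
    geom_histogram(binwidth = 500, position = "dodge", color = "black") +
    labs(title = "Histogram of Cell Count by Treatment Group",
         x = "Cell Count", y = "Frequency") +
    scale_fill_manual(values = c("Control" = "blue", "Treated" = "red")) +
    theme_minimal()
}

# run_anova: one-way ANOVA of cell count on group; prints the ANOVA
# table and returns the fitted model (auto-printed at top level)
run_anova <- function(dataset) {
  model <- aov(Cell_Count ~ Group, data = dataset)
  print(summary(model))
  model
}

# plot_boxplot: boxplot of cell counts per group
plot_boxplot <- function(dataset) {
  ggplot(dataset, aes(x = Group, y = Cell_Count, fill = Group)) +
    geom_boxplot() +
    labs(title = "Cell Count by Treatment Group",
         x = "Group", y = "Cell Count") +
    scale_fill_manual(values = c("Control" = "blue", "Treated" = "red")) +
    theme_minimal()
}
```

Each function takes the previous step's output as its input, so the program body can pass generate_data() results to create_dataframe(), and that data frame to the plotting and ANOVA functions.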

library(ggplot2)

source("Homework_9.R")

# setting seed for reproducibility. I didn't turn this into a function. 
set.seed(190)

# generating fake data
control <- generate_data(100, mean = 25000, std = 1000)
treated <- generate_data(100, mean = 30000, std = 1000)

# creating sample ids
sample_ids <- create_IDs(200)

# creating data frame
z <- create_dataframe(sample_ids, control, treated)

# plotting the histogram
histogram_plot <- plot_hist(z)

# printing the histogram
print(histogram_plot)

# running a one-way ANOVA of Cell_Count by Group
run_anova(z)

##              Df    Sum Sq   Mean Sq F value Pr(>F)    
## Group         1 1.194e+09 1.194e+09    1393 <2e-16 ***
## Residuals   198 1.698e+08 8.574e+05                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Call:
##    aov(formula = Cell_Count ~ Group, data = dataset)
## 
## Terms:
##                      Group  Residuals
## Sum of Squares  1194132066  169765592
## Deg. of Freedom          1        198
## 
## Residual standard error: 925.96
## Estimated effects may be unbalanced
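With only two groups, this one-way ANOVA is equivalent to a two-sample t-test that assumes equal variances, and the F value equals t². The sketch below pulls the F value and p-value out of a fitted aov object and cross-checks them against t.test(). It rebuilds z inline with the same seed and parameters so it runs on its own, and it assumes the model is aov(Cell_Count ~ Group), as the printed Call indicates.

```r
# Rebuild the data with the same seed and parameters as above
set.seed(190)
z <- data.frame(
  Group      = rep(c("Control", "Treated"), each = 100),
  Cell_Count = c(rnorm(100, mean = 25000, sd = 1000),
                 rnorm(100, mean = 30000, sd = 1000))
)

# One-way ANOVA; summary() returns a list whose first element is the table
model       <- aov(Cell_Count ~ Group, data = z)
anova_table <- summary(model)[[1]]
f_value     <- anova_table[["F value"]][1]
p_anova     <- anova_table[["Pr(>F)"]][1]

# Pooled-variance t-test on the same data
t_res  <- t.test(Cell_Count ~ Group, data = z, var.equal = TRUE)
t_stat <- unname(t_res$statistic)

all.equal(f_value, t_stat^2)   # TRUE
c(anova = p_anova, t_test = t_res$p.value)
```

Extracting the p-value this way gives the actual number behind the "<2e-16" shown in the printed table.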

# plotting and printing the boxplot
boxplot_plot <- plot_boxplot(z)
print(boxplot_plot)

Homework_9

Alexander Kissonergis

2024-04-01