Running Head: STATISTICS ASSIGNMENT

Statistics Assignment

Name of Student

Name of Course

Course Instructor

Date

**Statistics Assignment**

**Experiment 1 and 2 (Part A)**

**Introduction:** In experiments 1 and 2 we examine the genetics and phenotype of the mouse mutants, first by discussing their Mendelian inheritance pattern and then by testing whether the observed distribution of the genotypes is consistent with, or significantly different from, the expected distribution.

a).

The laws of Mendelian inheritance are highly important for understanding the patterns of disease transmission. Most genes have more than one version, arising from mutation or polymorphism; these versions are called alleles. A gene can be inherited in several different patterns. However, the most adequate inheritance pattern for describing the distribution of the genotypes would be the autosomal dominant pattern. The numbers of mice in experiments 1 and 2 show that each affected mouse has an affected parent, and that this occurs in both males and females and generally in every generation.

b).

Experiment 1 was conducted at embryonic stage E19 and experiment 2 at postnatal stage P16. For each experiment we compare the observed genotype counts (O) with the expected counts (E), which in the tables below follow a 1:2:1 ratio of the total, and determine whether the difference between the two distributions is significant. We applied the chi-square goodness-of-fit test at a significance level of 0.05. The results are shown in the tables below:

Experiment 1

| | KIAA+/+ | KIAA+/- | KIAA-/- |
|---|---|---|---|
| O | 31 | 46 | 15 |
| E | 23 | 46 | 23 |
| O-E | 8 | 0 | -8 |
| (O-E)^2 | 64 | 0 | 64 |
| (O-E)^2/E | 2.782608696 | 0 | 2.782608696 |
| Total | 5.565217391 | | |

Result: not significant

Experiment 2

| | KIAA+/+ | KIAA+/- | KIAA-/- |
|---|---|---|---|
| O | 62 | 120 | 36 |
| E | 54.5 | 109 | 54.5 |
| O-E | 7.5 | 11 | -18.5 |
| (O-E)^2 | 56.25 | 121 | 342.25 |
| (O-E)^2/E | 1.032110092 | 1.110091743 | 6.279816514 |
| Total | 8.422018349 | | |

Result: significant

For experiment 2 the chi-square statistic exceeds the critical value of 5.991 (2 degrees of freedom), so the p value is less than the 0.05 level of significance. The difference is therefore significant, and we conclude that the distribution of the genotypes is not consistent with the expected distribution. For experiment 1 the opposite holds: the statistic is below the critical value, the p value is greater than 0.05, and the distribution of the genotypes is consistent with the expected distribution.
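The chi-square calculation can be reproduced with a short script. This is a minimal sketch, assuming the 1:2:1 expected ratio used in the tables; the function name `chi_square_gof` is illustrative and not part of the original analysis. It relies on the fact that, with 2 degrees of freedom, the upper-tail chi-square probability equals exp(-x/2), so no statistics library is needed.

```python
import math

def chi_square_gof(observed, ratio=(1, 2, 1)):
    """Return (statistic, p_value) for a goodness-of-fit test against `ratio`.

    The p-value formula below is only valid for 3 categories (df = 2).
    """
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    # Every term is (O - E)^2 / E, summed over the three genotypes.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 2 the chi-square upper-tail probability is exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value

stat1, p1 = chi_square_gof([31, 46, 15])    # Experiment 1 (E19)
stat2, p2 = chi_square_gof([62, 120, 36])   # Experiment 2 (P16)
print(f"Experiment 1: chi2 = {stat1:.3f}, p = {p1:.4f}")
print(f"Experiment 2: chi2 = {stat2:.3f}, p = {p2:.4f}")
```

With every term computed as (O-E)^2/E, the statistic for experiment 2 comes to about 8.42, and the conclusions at the 0.05 level are the ones stated above.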

c).

The distribution of the phenotypes in experiment 2 suggests that the observed distribution is statistically different from the expected distribution, as the counts follow a different pattern; in experiment 1 the opposite is true.

d).

The conclusion that we draw from the given data is that the mutation affects the mice, and that the inheritance pattern takes effect as soon as breeding starts. The mutation shows its effects across all generations because of the distribution pattern that it follows for all the genotypes. However, the phenotype distribution suggests that it does not follow the same distribution, and thus the number of mice of each genotype increases with each experiment.

e).

As we want to analyze the pattern of the phenotype over time, a one-way ANOVA test could be used to check the hypothesis.

**Conclusion:** Based on the results of the tests that we applied, we conclude that the distribution of the phenotypes in experiment 2 is statistically different from the expected distribution, as the counts follow a different pattern, and that the opposite is true in experiment 1.

**Experiment 3 (Part A)**

**Introduction:** In this experiment we examine the bodyweights of the various genotypes by testing whether the bodyweight of the females of the different genotypes differs at various ages.

a).

To determine whether the bodyweight of the females of the different genotypes differs between time points, the most relevant statistical test would be Student's t-test. The samples at the different time points do not contain the same number of observations, so we assumed 0 grams for the missing values on day 20 and day 40 to give every sample 13 observations. The results of the t-tests between days 10 and 20 and between days 20 and 40, at a significance level of 0.05, are shown in the tables below:

t-Test: Two-Sample Assuming Equal Variances (Day 10 vs. Day 20)

| | Day 10 | Day 20 |
|---|---|---|
| Mean | 6.11162135 | 14.97692308 |
| Variance | 0.00564799 | 0.424376781 |
| Observations | 13 | 13 |
| Pooled Variance | 0.21501239 | |
| Hypothesized Mean Difference | 0 | |
| df | 24 | |
| t Stat | -48.7436965 | |
| P(T<=t) one-tail | 8.0986E-26 | |
| t Critical one-tail | 1.71088208 | |
| P(T<=t) two-tail | 1.6197E-25 | |
| t Critical two-tail | 2.06389856 | |

t-Test: Two-Sample Assuming Equal Variances (Day 20 vs. Day 40)

| | Day 20 | Day 40 |
|---|---|---|
| Mean | 14.9769 | 15.7885 |
| Variance | 0.42438 | 22.9487 |
| Observations | 13 | 13 |
| Pooled Variance | 11.6865 | |
| Hypothesized Mean Difference | 0 | |
| df | 24 | |
| t Stat | -0.6052 | |
| P(T<=t) one-tail | 0.27535 | |
| t Critical one-tail | 1.71088 | |
| P(T<=t) two-tail | 0.5507 | |
| t Critical two-tail | 2.0639 | |

Since the p value for the first test is less than 0.001, there is a significant difference in the bodyweight of the females of the different genotypes between ages 10 and 20 days. However, the p value for the second test is 0.551, so the difference in the bodyweight of the females of the different genotypes between ages 20 and 40 days is not significant. All three data sets were reasonable to compare, as they represent body weights at different ages. Based on the results, we conclude that the lack of the KIAA protein causes an increase in the body weight of the females as they age over the first 20 days.
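The t statistics reported for the two comparisons can be recomputed from the summary statistics alone. The sketch below assumes the equal-variance (pooled) form of the test and takes the means, variances, sample sizes and the two-tail critical value directly from the reported results; the function name `pooled_t` is illustrative.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Two-tail critical value for df = 24, alpha = 0.05, as reported in the results.
T_CRIT_TWO_TAIL = 2.06389856

t_10_20, df = pooled_t(6.11162135, 0.00564799, 13, 14.97692308, 0.424376781, 13)
t_20_40, _ = pooled_t(14.9769, 0.42438, 13, 15.7885, 22.9487, 13)

for label, t in (("Day 10 vs. Day 20", t_10_20), ("Day 20 vs. Day 40", t_20_40)):
    verdict = "significant" if abs(t) > T_CRIT_TWO_TAIL else "not significant"
    print(f"{label}: t = {t:.4f} (df = {df}), {verdict}")
```

Comparing |t| with the critical value leads to the same decision as comparing the two-tail p value with 0.05.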

b).

An alternative method that could be used to analyze the data is to plot the data and its distribution at the different stages and then compare the distribution patterns of the weights. However, the limitation of this method is that it might not adequately indicate the significance of the differences. An additional test that could be performed is the one-way ANOVA, which could first be applied across all the 10-day intervals. However, its limitation is that the samples come from different ages, so it would not yield relevant results. The one-way ANOVA results are shown in the Excel spreadsheet.

c).

The bar charts are shown below comparing the three different data sets:

d).

The final conclusion that we can draw from the results is that the mutation has effects across all generations because of the distribution pattern that it follows for all the genotypes; however, the phenotype distribution suggests that it does not follow the same distribution. Secondly, when mice lack KIAA they start to gain body weight over time, and this provides evidence that the lack of the KIAA protein affects the various organs of the mice as their age increases.

**Conclusion:** Based on the results of the tests, we conclude that the mutation has effects across all generations because of the distribution pattern that it follows, and that there is a significant difference in the bodyweight of the females of the different genotypes between ages 10 and 20 days. However, the p value for the second test is 0.551, so the difference in the bodyweight of the females of the different genotypes between ages 20 and 40 days is not significant.

**Experiment 4 (Part A)**

**Introduction:** In this experiment we compare the bodyweight of females of any genotype at day 16 with the expected normal bodyweight.

a).

To test whether the bodyweight of the females of any genotype at day 16 differs significantly from the known data, the one-sample t-test would be the most appropriate test. The study mean is 9.8 g, and we compare it with the sample mean for day 16. The results of this test are shown below:

| ONE SAMPLE T-TEST (Alpha = 0.05) | |
|---|---|
| Count | 15 |
| Mean | 9.68 |
| Standard Deviation | 1.01 |
| Standard Error | 0.26 |
| Study Mean | 9.8 |
| Alpha | 0.05 |
| Tails | 1 |
| Df | 14 |
| t stat | -0.4602 |
| P value | 0.3262 |
| T crit | 2.1448 |
| Significance | NO |

Since the p value is 0.3262, which is greater than the 0.05 level of significance, there is no significant difference between the mean body weight of the females of any genotype and the previously known data.
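The standard error and t statistic of the one-sample test follow directly from the reported count, mean and standard deviation. The following is a minimal sketch using only those reported values; the variable names are illustrative, and the critical value is the one reported in the results rather than one computed here.

```python
import math

# Summary statistics as reported for the day-16 sample.
n = 15
sample_mean = 9.68
sample_sd = 1.01
study_mean = 9.8      # previously known mean body weight (g)
t_crit = 2.1448       # critical value as reported in the results

std_err = sample_sd / math.sqrt(n)              # standard error of the mean
t_stat = (sample_mean - study_mean) / std_err   # one-sample t statistic
df = n - 1

significant = abs(t_stat) > t_crit
print(f"SE = {std_err:.2f}, t = {t_stat:.4f}, df = {df}, significant: {significant}")
```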

b).

We can conclude from the results that the findings of the past studies are accurate: the body weight of the females of any genotype at day 16 should be around 9.8 grams and should not deviate significantly from this value.

**Conclusion:** Based on the results, we conclude that the body weight of the females of any genotype at day 16 should be around 9.8 grams and should not deviate significantly from this value.

**Experiment 5 (Part A)**

**Introduction:** In this experiment we compute the probabilities of selecting the different genotypes, using the normal probability distribution for the analysis.

a).

P (x<=4): z = (4 - 19)/56 = -0.267

Probability at this z value from the z table is: **39.74%**

b).

P (x>=3): z = (3 - 19)/56 = -0.285

The probability that none of the heterozygotes are selected would be: 1 - P(x>=3) = 1 - 0.3897 = **61.03%**

c).

1.645 = X/Stdev - Mean/Stdev

X/56 = 1.645 + 1/56

X/56 = 1.66

X = **93.12**, so approximately 93 mice need to be selected for a probability in excess of 95%.

**Conclusion:** The results show that the probabilities for parts a) and b) are 39.74% and 61.03% respectively, and that for part c) approximately 93 mice are needed.
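The z-table lookups in parts a) and b) can be checked numerically. This sketch assumes the same mean (19) and standard deviation (56) that appear in those calculations and evaluates the standard normal cumulative probability with the error function from Python's standard library; `normal_cdf` is an illustrative helper name.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MEAN, STDEV = 19, 56  # values used in parts a) and b)

z_a = (4 - MEAN) / STDEV          # z score for x = 4
z_b = (3 - MEAN) / STDEV          # z score for x = 3
p_a = normal_cdf(z_a)             # cumulative probability at z_a
p_b = 1.0 - normal_cdf(z_b)       # complement of the cumulative probability at z_b

print(f"a) z = {z_a:.4f}, probability = {p_a:.2%}")
print(f"b) z = {z_b:.4f}, probability = {p_b:.2%}")
```

Because the script uses unrounded z values, it gives about 39.4% and 61.2%, slightly different from the figures read off a z table at two decimal places.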

**Experiment 6 (Part B)**

**Introduction:** In this experiment we analyze sporadic diseases in a mouse colony. We used a probability distribution to compute the probabilities of selecting affected and unaffected cages.

a).

The distribution that would be adequate to describe the probability of the events is the Gaussian, or standard normal, distribution. Under this distribution the values are spread symmetrically, with most of the results located around the mean and the remaining values falling below or above it.

b).

We first computed the average number of infected cages and its standard deviation for the five-month period, as shown below:

| Statistic | Value |
|---|---|
| Mean Infected Cages | 0.226 |
| Standard Deviation | 0.663 |
| Sample Size | 15 |

We now need to find the probability that no affected cages are included when a randomly selected rack is moved to the clean room.

P (x>=1): z = (1 - 0.226)/0.663 = 1.1671

Probability would be: 1 - P (x>=1) = 1 - 0.8770 = **12.3%**

c).

P (x<=2): z = (2 - 0.226)/0.663 = 2.6747

Probability would be: P (x<=2) = **99.62%**

**Conclusion:** The results for parts b) and c) show probabilities of 12.30% and 99.62% respectively.

**Experiment 7 (Part C)**

**Introduction:** In this experiment we analyze the effect of mutations in a gene linked with primary cilia on the regeneration of muscle, using an independent-samples t-test to test for significant differences in the means.

a).

The variability and the distribution of both data sets are reflected in the frequency distributions below:

Wildtype (Min 4.059, Max 50.839)

| Bin Range (upper limit) | Frequency |
|---|---|
| 5 | 14 |
| 10 | 54 |
| 15 | 56 |
| 20 | 105 |
| 25 | 215 |
| 30 | 220 |
| 35 | 155 |
| 40 | 70 |
| 45 | 35 |
| 50 | 5 |
| 55 | 1 |
| More | 0 |

Talpid Muscle (Min 3.474, Max 48.935)

| Bin Range (upper limit) | Frequency |
|---|---|
| 5 | 28 |
| 10 | 166 |
| 15 | 172 |
| 20 | 211 |
| 25 | 136 |
| 30 | 55 |
| 35 | 22 |
| 40 | 20 |
| 45 | 7 |
| 50 | 5 |
| 55 | 0 |
| More | 0 |

b).

The statistical test that we selected to test for a difference in the means of the two samples is the independent-samples t-test assuming unequal variances. The two samples contain muscle fiber diameters, and this test helps us to determine whether there is a significant difference in the Feret's diameter of the muscle fibers between the wild type and talpid measurements. The results of the independent-samples t-test are shown in the table below:

t-Test: Two-Sample Assuming Unequal Variances

| | Wildtype (µm) | Talpid muscle (-/-) (µm) |
|---|---|---|
| Mean | 25.020 | 16.694 |
| Variance | 77.569 | 68.080 |
| Observations | 930 | 822 |
| Hypothesized Mean Difference | 0.000 | |
| df | 1744 | |
| t Stat | 20.420 | |
| P(T<=t) one-tail | 0.000 | |
| t Critical one-tail | 1.646 | |
| P(T<=t) two-tail | 0.000 | |
| t Critical two-tail | 1.961 | |

c).

Since the one-tail and two-tail p values are both less than the 0.05 level of significance, there is a significant difference between the individual fiber diameters. On the basis of these results we conclude that the effect of the tissue-specific deficiency of the protein is significant and that it has a significant impact on the mice with the muscle-specific deficiency. The injection of a myotoxin induces local muscle defects in the injected muscle, which is followed by the regeneration of muscle fibers within a period of 10-14 days. The difference in the individual muscle fibers at day 10 after the injury is significant.
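The unequal-variance (Welch) t statistic and its degrees of freedom can be recomputed from the reported means, variances and sample sizes. The sketch below assumes the Welch-Satterthwaite approximation for the degrees of freedom, which is the standard choice for this form of the test; `welch_t` is an illustrative name.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = var1 / n1, var2 / n2          # squared standard errors per group
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Wildtype vs. talpid muscle fiber diameters (summary values as reported).
t_stat, df = welch_t(25.020, 77.569, 930, 16.694, 68.080, 822)
print(f"t = {t_stat:.3f}, df = {df:.0f}")
```

A t statistic this far above the two-tail critical value of 1.961 is consistent with the p values being reported as 0.000.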

**Conclusion:** The results of the tests show that the effect of the tissue-specific deficiency of the protein is significant and that it has a significant impact on the mice with the muscle-specific deficiency.