Discussion Questions

Chapter 1

1. The author talks about how there are a lot of issues with using statistics…that it is an imperfect measure. If this is the case, why do we use statistics and why do so many people rely on them to make informed decisions/conclusions?

It is true that statistics is an imperfect measure with a lot of issues, but people rely heavily on it because it is a quick way to get information.  Charles Wheelan used the example of judging a quarterback’s game (page 1).  Different people may judge a quarterback on rushing yards, touchdown passes, or yardage.  There might be pages and pages of data for just one quarterback in one game.  Having one equation that encompasses all of the separate aspects makes comparing things easy.  People like things nice and easy; that’s why I believe people rely on statistics to make informed decisions.

 

There are so many things that use statistics.  I will use colleges as an example since they are near and dear to me at this point in my life.  I can judge my chances of getting into a school by typing in my ACT score and GPA.  Not only are the odds a form of statistics; ACT scores and GPA are both scoring systems that rely heavily on statistical measures.  Let’s say my odds of getting into Stanford are 1:1,000,000, but I end up getting in because my essay was excellent and my extracurriculars were impressive.  Statistics can’t factor in how good my essay is going to be, how challenging my classes were, or whether or not I have a relationship with the admissions officer; they can only tell me my odds based on quantitative data like my test scores.  Although the prediction did not match the outcome in this case, the statistics were still useful because they prepared me for the odds I was facing.  I believe that people use them because some knowledge is better than nothing.  People also use statistics to reach informed conclusions because the numbers aren’t swayed by emotion and help people decide without getting feelings involved.  Although they aren’t perfect, they sure are useful.

2. I once had a friend who said, ‘Statistics is more like an art than a math.’ Use examples from chapter 1 of Naked Statistics to support my friend’s statement. Then, write your own reflection—do you agree or disagree with the statement and why?

Wheelan talked about sampling in order to estimate the number of homeless people in Chicago (page 6).  Statisticians sample because it is more convenient.  I would compare this to an artist painting the sunset.  Instead of sitting outside where the wind could blow their canvas away, the artist can take a picture of the sunset and finish painting it in their studio.  Statisticians take a sample and then finish their report on the computer by running statistical tests.

 

On page 14 Wheelan said, “We conduct statistical analysis using the best data and methodologies and resources available.”  This reminds me of children in poor countries who make beautiful beads for necklaces by rolling up newspaper shreds.  They are able to make art with the simplest things.  That is what statisticians do.

 

Charles Wheelan also says that there is no “right” answer in statistics like addition or long division (page 14).  This is very similar to when people are trying to interpret artwork.  There is no right way to interpret a piece; it is whatever emotion it conjures up inside you. 

 

“Data is merely the raw material of knowledge.” (page 4) A canvas is just fabric until it’s made into art.

 

It is a very accurate statement because you are making something ugly into something beautiful.  An analogy I thought of: if data is not normally distributed, you might have to use different tools, like nonparametric tests, but you can still end up with a graph that depicts what you want.  If an artist ran out of watercolor paint while painting the sunset, they could water down some acrylic paint and use it to finish the painting.  Both used different tools than they had expected, but ended up with the same outcome.

3. Give one example of categorical data and one example of quantitative data. In each of these examples, explain what type of graph you would use.

  • Categorical data: Blood types of people

    • Pie chart, dot plot, bar graph

  • Quantitative data: Number of pets people have

    • Boxplot, histogram, dot plot

Chapter 2

1. Most data isn’t exactly normally distributed, and therefore, the Normal curve isn’t ‘perfect.’ Do you think this is an issue? What are pros and cons to using the Normal curve if it isn’t perfect? You may use specific examples if you’d like to explain your thoughts.

 

Yes, this can be an issue when you are trying to compare data points.  There are tests for these situations, like nonparametric tests, that work by 'rank order': they rank the data points and rely on the median rather than the mean.  Nonparametric tests are a great tool, but there are a few cons to using them.  The first con is that you have to use statistical software to run these tests; the second is that you are unable to see each data point.  Since everything is done by ranking, the data points become ranks.  You can fit a Normal curve to the points, but if the data isn't normally distributed, you may get back misleading information.  In his book, Charles Wheelan used the example of Bill Gates walking into a bar with 5 guys in it and then computing the mean salary.  Bill Gates would really skew things if you are looking at the mean, because it no longer gives an accurate picture of the true salaries of the men in the bar.  The pro of using real data points, even when they aren't normally distributed, is that people are more comfortable with real data.  The way to report data that isn't normally distributed is to give both the median and the mean and to explain what could be making the data non-normal, for example by identifying outliers and speculating about why they occurred.
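The Bill Gates example can be checked with a few lines of Python. This is only a sketch: the five salaries and the billion-dollar figure are made-up numbers for illustration, not values from the book.

```python
import statistics

# Hypothetical annual salaries for five people in the bar (made-up numbers).
salaries = [35_000, 40_000, 45_000, 50_000, 55_000]

# Hypothetical income standing in for a billionaire walking in.
with_gates = salaries + [1_000_000_000]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_gates)
median_after = statistics.median(with_gates)

# The mean jumps by orders of magnitude; the median barely moves.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The median only shifts to the midpoint of the two middle salaries, which is why it is the safer summary when one outlier dominates.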

2. Suppose you were doing some research for a professor, and the professor asked you to find the heights of 100 people. Using random sampling, you found 100 people’s height in inches. After doing this, your professor asked you to find the mean and standard deviation of the data. You did so, and showed her the results.

 

Professor: Oh no! I wanted the data in centimeters, not inches. Now you are going to have to convert everyone’s height to centimeters to find the mean and standard deviation!

You: It’s okay. Since I know the mean and standard deviation in inches, I just have to transform the data! No need to rewrite everyone’s height in centimeters.

Explain what you would do to find the mean and standard deviation in centimeters–you will need to find out how to convert from inches to centimeters. Be specific. If you would like, you can use actual numbers to help you explain your answer.

 

 1 inch = 2.54 centimeters

 

Since multiplying (or dividing) by a constant also multiplies (divides) measures of center and location (mean, median, quartiles, percentiles) and measures of spread (range, IQR, standard deviation) by that constant, you would just multiply the mean and the standard deviation by 2.54 to get them in centimeters.
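As a quick check of the shortcut, here is a small Python sketch. The five heights are hypothetical values chosen for illustration; the point is only that scaling the summary statistics gives the same result as converting every height first.

```python
import statistics

CM_PER_INCH = 2.54

# Hypothetical heights in inches (illustrative values, not real data).
heights_in = [62.0, 65.5, 67.0, 70.0, 72.5]

mean_in = statistics.mean(heights_in)
sd_in = statistics.stdev(heights_in)

# Shortcut: scale the summary statistics directly.
mean_cm_shortcut = mean_in * CM_PER_INCH
sd_cm_shortcut = sd_in * CM_PER_INCH

# Long way: convert every height, then recompute.
heights_cm = [h * CM_PER_INCH for h in heights_in]
mean_cm_long = statistics.mean(heights_cm)
sd_cm_long = statistics.stdev(heights_cm)

print(mean_cm_shortcut, mean_cm_long)
print(sd_cm_shortcut, sd_cm_long)
```

Both routes agree up to floating-point rounding, so there is no need to touch the 100 individual measurements.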

Chapter 3

1. Find one headline in the news (in the last 6 months) that implies causation, for example, ‘Eating Kale Reduces Colon Cancer.’ Summarize the article in 1-2 sentences, then talk about the methods of the article. Did the researchers survey the participants or do a controlled experiment? Do you think their statement is valid?

Huffington Post: One Hour Of Extra Screen Time Drags Down Teenagers' Grades

 

Actually Published: Revising on the run or studying on the sofa: prospective associations between physical activity, sedentary behaviour, and exam results in British adolescents

 

Summary: Researchers at Cambridge University found that an extra hour in front of TV or online was linked with 9.3 fewer exam points for kids.  Then the researchers found pupils who were doing an extra hour of daily homework and reading scored 23.1 points higher than their peers.

 

Methods: The researchers followed 845 students ages 14 to 15 for 10 years to see how their behaviors affected performance in school.  The study was done through self-reported surveys of the teenagers.  A multilevel mixed-effects linear regression was adjusted for mood, BMI z-score, deprivation, sex, season and school so that those factors would not distort the association.

 

Valid?:  I believe that the researchers’ study was valid.  In the title of their study they even say “prospective associations”; what they don’t say is “prospective causations.”  Huffington Post made their valid study seem invalid by implying causation.  Had I not looked for and read the original publication, I would have thought the researchers were at fault and their study was poorly done.

2. One thing that I hadn’t considered was the top grossing films in the US that were adjusted for inflation. The list is very different! Out of the several topics Wheelan talked about in Chapter 3 (minimum wage, top-grossing movies, defense spending, school quality, etc.), which do you 1) find most interesting (and why), and 2) think has the biggest implications for our society?

The topic that I find both the most interesting AND the most horrifying is the “scorecard” that pressures cardiologists to take only the patients they think will improve their scores.  I can’t wrap my head around this!  I understand that people want to know which doctor they should choose, but judging doctors by survival rate alone just seems stupid.  Why not measure a doctor’s performance by family evaluations?  If a cardiologist takes on a patient who is unlikely to live, they can still get good ratings if they treat that patient and his or her family well.  This would combine both EQ (emotional intelligence) and IQ, which in my opinion are what make a doctor successful.  These kinds of ratings or “scorecards” also have huge implications for our society.  Ratings distort almost everything, from college acceptances to doctor performance, school performance, and job satisfaction rates.  There are numerous college rankings that are supposed to help students decide where they want to go to college.  The equations used to determine these rankings include things like acceptance rate and price.  Colleges, wanting a higher ranking, accept fewer people, so it is getting tougher for kids to get into their top schools.  As for doctors, cardiologists sadly aren’t the only ones who manipulate the system.  I could imagine that many other types of doctors turn away patients they would otherwise take on, for fear of them messing up their “scorecard” or rating.  High schools are guilty of this too, especially in Texas.  Not wanting someone to mess up their graduation rates, they will hold back a certain student.  Heck, even Wellington does this; Wellington has a 100% college acceptance rate.  Even if a kid doesn’t want to go to college, they have to apply to at least a community college to maintain the perfect score.  Lots of institutions do this kind of thing.

3. Chapter 4 in ‘Naked Statistics’ is titled, ‘Correlation,’ and Wheelan talks about a contest that gave a cash prize to the team or individual that created the best algorithm that predicted actual customer reviews. Think of something this class could explore that would predict something about the people who respond. For example, predicting how a student does in a certain class or with a certain teacher. For what you want to predict, come up with a list of 3 – 5 predictors. While our book does not go into multiple regression, we may have some time to actually explore our predictions. (For my example, my predictors would be 1) how many times the student comes in outside of class for help 2) how often a student raises his/her hand in class 3) homework completion rate)

 

1) Hours spent reading a week

2) A rating of 1-10 of how much someone enjoys English class (1- hate, 10-love)

3) Grade in English class (i.e. 96% not A)

4) Score of ACT Reading section

Chapter 4

1. I mentioned in class that many real-world experiments and surveys are not done as a true simple random sample. Describe the impact this has on obtaining the results. Knowing the cost and time it takes to obtain a true SRS, is there a better way to collect data than to give college students extra credit?

The impact of not conducting a truly simple random sample is that there can be bias in who gets surveyed.  In Ms. Lin's example of her psychology professor offering extra credit for kids to fill out a survey, the professor most likely got data from kids who were concerned about their grade or who were good students.  This volunteer survey would exclude the kids who don't care, which would leave out an important piece.

 

I understand why the professor chose to do the survey this way.  SRS's are tough to do because of cost and time.  A method the professor could have used is a cluster sample of one floor of a dorm.  This dorm, if chosen correctly (not an honors dorm or another specialized dorm type), would be able to represent the whole population and would be a whole lot easier to survey.

2. Find one example of an experiment in a scholarly journal. Upload a link to this study on your website. Describe experimental design of this experiment. What is the independent variable, dependent variable, treatment groups? Who were the participants? Did the researchers use a randomized block design, a completely randomized design, or something else? Name at least one lurking/confounding variable.

 

Influence of supplementary vitamins, minerals and essential fatty acids on the antisocial behaviour of young adult prisoners 2002

 

Experimental Design:  Stratified Randomisation of different wings of prison

 

Independent Variable:  Length of supplementation, receiving dietary supplement or placebo

 

Dependent Variable:  Governor (violence) and minor (failure to comply with requirements) reports

 

Treatment groups:  The active group received 1 capsule of Forceval and 4 capsules of Efamol Marine daily.  The placebo group received capsules with vegetabl…  The length of treatment depended on how long their jail stay was.

 

Participants:  n=112; 57 active, 55 placebo.  Males Aged 18 and up (mean: 19)

 

Type of Design:  Experimental, double-blind, placebo-controlled, randomised trial

 

Lurking/confounding variables:  Missed days of taking supplement, amount of exercise received, age, food intake.

3. Read Chapter 7 in Naked Statistics. Pick one of the biases he talks about (selection bias, publication bias, recall bias, survivorship bias, healthy user bias). Briefly describe the bias you chose and come up with another example not mentioned in the book.

 

 

A healthy user bias is when the people who care about their health are the ones participating in a study.  In my study at Nationwide Children's Hospital, we ran into this same bias.  We found it odd that there were only two people with a BMI between 35 and 39.9.  Considering that asthma is highly correlated with obesity, we couldn't figure out why our data was not representative of kids with asthma as a whole.  My mentor explained the healthy user bias to me and said that almost all research done on human subjects has this kind of bias.  So journals are publishing studies that have biases in them all of the time.

Chapter 5

1. Create a probability lesson for grades Pre-K to 5th that revolves around probability. If you think your kids are too young, create a lesson about something we have covered this year. The lesson should be about 10 minutes long, and should include EVERYTHING you plan to say and do during these ten minutes. The lesson should include some activity, and this should be uploaded.

 

2. Here are the rules to the game Deal or No Deal: http://www.dealornodeal.co.uk/show/game-rules/. If there are only 2 cases that remain, the contestant can switch the final case. In the UK version, there are 23 cases. Should the contestant switch the final case?! Explain.

 

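No, it does not matter.  The Monty Hall problem is different: with 3 choices and two left, switching wins 2/3 of the time, but only because the host knows where the prize is and deliberately opens a losing door.  In Deal or No Deal, the contestant opens the boxes at random, and nobody is steering the top prize away from being opened.  In the UK version with 23 cases, the contestant’s own box has a 1/23 chance of holding any particular amount, and so does the last unopened box.  Once it is down to two boxes, the two remaining amounts are equally likely to be in either one, so switching wins 50% of the time, the same as staying.  A 22/23 (95.65%) chance of winning by switching would only apply if someone who knew the contents had deliberately left the top prize unopened.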
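Whether switching helps depends on how the other boxes were opened: the Monty Hall 2/3 figure relies on a host who knows where the prize is, while in Deal or No Deal the contestant opens boxes at random. The simulation below is a minimal sketch of both setups. The function names are made up for illustration, and the Deal or No Deal version only counts games in which the top prize happens to survive to the final two boxes.

```python
import random


def random_opening_switch_win_rate(n_boxes=23, trials=200_000, seed=0):
    """Deal or No Deal style: boxes are opened at random with no inside knowledge.

    Returns the fraction of games won by switching, among games where the
    top prize is still unopened when two boxes remain.
    """
    rng = random.Random(seed)
    games_kept = 0
    switch_wins = 0
    for _ in range(trials):
        prize = rng.randrange(n_boxes)   # box holding the top prize
        own = rng.randrange(n_boxes)     # contestant's own box
        others = [b for b in range(n_boxes) if b != own]
        rng.shuffle(others)
        opened = others[:-1]             # every other box except one is opened
        if prize in opened:
            continue                     # top prize already gone; skip this game
        games_kept += 1
        if prize != own:
            switch_wins += 1             # the prize is in the last other box
    return switch_wins / games_kept


def informed_host_switch_win_rate(n_boxes=3, trials=200_000, seed=0):
    """Monty Hall style: a host who knows the prize location opens only losers."""
    rng = random.Random(seed)
    switch_wins = 0
    for _ in range(trials):
        prize = rng.randrange(n_boxes)
        own = rng.randrange(n_boxes)
        # The host never opens the prize, so switching wins whenever the
        # first pick was wrong.
        if own != prize:
            switch_wins += 1
    return switch_wins / trials


print("random opening, 23 boxes:", random_opening_switch_win_rate())
print("informed host, 3 doors:  ", informed_host_switch_win_rate())
```

Under these assumptions the random-opening rate should come out near one half and the informed-host rate near two thirds.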

Chapters 6 & 7

1. Why might we use the Normal approximation to the binomial distribution instead of just using the binomial distribution. Do some research. Include the link to the website you used in your answer. Answers should be at least a paragraph with examples.

 

It is possible to just use the binomial distribution, but it can be grueling work.  If I wanted the probability of getting exactly 77 heads out of 100 flips of a coin, it wouldn’t be a big deal.  I would just plug it into my calculator (Trials: 100, P: .5, X value: 77) and I would have my answer.  But let’s say that instead of exactly 77 heads, I wanted the probability of getting at most 77 heads.  In that case, I would have to repeat the same process for 0, 1, 2, 3, 4, 5 … 75, 76, 77 and add up all 78 results, which would take too much time.  A different way to solve this problem is the normal approximation to the binomial distribution.  A binomial distribution is approximately normal when there are enough trials that both np and n(1 − p) are at least about 10, and it is most symmetric when the probability is .5, like in this example.  More trials, like 100 flips, make the curve more normal.  The approximating normal curve has mean np = 50 and standard deviation √(np(1 − p)) = 5, so instead of adding 78 separate probabilities I can find the area under that curve to the left of 77.5 (the extra .5 is a continuity correction) in a single step.  Here that is a z-score of 5.5, so the probability is essentially 1.  Mathematicians can be lazy and I don’t blame them, because I wouldn’t want to enter something into my calculator 78 times.  Using the normal approximation to the binomial distribution is just a way to save time.  The website http://www.regentsprep.org/regents/math/algtrig/ats7/blesson3.htm helped me tremendously in trying to figure out the answer.
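To see how close the shortcut gets, the sketch below compares the exact binomial sum with the normal approximation for 100 fair coin flips. The helper names are invented for this example, and the continuity correction (adding 0.5 to the cutoff) is a common convention that is assumed here.

```python
import math


def binomial_cdf(k, n, p):
    """Exact P(X <= k): add up the probability of 0, 1, ..., k successes."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x, mean, sd):
    """Area under a normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def normal_approx_binomial_cdf(k, n, p):
    """Approximate P(X <= k) with a normal curve (with continuity correction)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mean, sd)


n, p = 100, 0.5
for k in (45, 55, 77):
    exact = binomial_cdf(k, n, p)
    approx = normal_approx_binomial_cdf(k, n, p)
    print(k, exact, approx)
```

A computer can do the 78-term sum instantly, so the approximation matters most when working by hand or with a basic calculator.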

2. Read this article: http://www.theatlantic.com/business/archive/2016/01/powerball-ticket-all-combinations/423930/?utm_source=SFFB and write a reflection (should be at least 2 paragraphs).

 

Wow, I loved this article because it laid out the likelihood of winning in an easy-to-see manner.  It’s fascinating to see why everyone was freaking out about the Powerball tickets.  I think of lottery tickets as annoying.  Scratching one off gets stuff in your nails, and if you do it with a coin, it just gets your coin all dirty.  Not to mention being disappointed after you don’t win anything.  The most I have ever won with a lottery ticket is $4.00; I hit the “jackpot” alright.  I just don’t get it: people all over Facebook were talking about picking up Powerball tickets, and I look at the odds and just shake my head.  I did love how this article laid everything out, though.  I just can’t imagine what it would be like for a company to buy all of those tickets and then risk having the big prize split among several winners.  I can only see them putting in their financial statements that they spent $584,402,676 on Powerball tickets.  That would be pathetic, and I bet the company would go out of business, because not only would they lose money if the pot was split up, but customers would no longer trust them to make rational decisions.

 

One thing I do not quite understand is how the Powerball lottery company makes money.  It looked to me like they were losing $439,000,000.  Does the company use this as a strategy to get people buying other lottery tickets after the Powerball because they enjoyed the Powerball so much?  Is it a strategy to keep people coming back?  I just can’t quite figure out why a company would do something that made them lose money.  And how did they get the word out about the Powerball?  Word spread like wildfire; did the company predict that would happen?  I want to learn more about Powerball, and I wish the Atlantic article had given a little bit of background on the game.  Although I have a few questions that I would like answered, all in all, the article did a great job of explaining things.
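The $584,402,676 figure for buying every combination can be reproduced with a short calculation. This sketch assumes the Powerball format in use in January 2016 (five white balls chosen from 69, one red Powerball chosen from 26, and a $2 ticket); those rule values are assumptions supplied here, not details quoted from the article.

```python
import math

# Assumed January 2016 Powerball format (not taken from the article text).
WHITE_BALLS = 69
WHITE_PICKS = 5
RED_BALLS = 26
TICKET_PRICE_DOLLARS = 2

# Order does not matter for the white balls, so count combinations.
white_combinations = math.comb(WHITE_BALLS, WHITE_PICKS)
total_combinations = white_combinations * RED_BALLS
cost_to_buy_all = total_combinations * TICKET_PRICE_DOLLARS

print(f"{total_combinations:,} possible tickets")
print(f"${cost_to_buy_all:,} to buy one of each")
```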

3. When you subtract two random variables that are independent, you add the variances instead of subtracting them. Why don’t you subtract them?

 

The reason you add the variances is that the variance is the standard deviation squared.

$\mathrm{Var}(x) = \sigma_x^2$

Finding the variance of x − y is the same as finding the variance of x + (−y).  Because the standard deviation is squared, the negative sign does not matter.

$\sigma_{x-y}^2 = \sigma_x^2 + (-\sigma_y)^2 = \sigma_x^2 + \sigma_y^2$

Flipping the sign of y flips the data around zero but does not change how spread out it is, and once the negative is squared the answer is the same as if I had added.

Therefore:

$\mathrm{Var}(x \pm y) = \mathrm{Var}(x) + \mathrm{Var}(y)$

 

The website http://apcentral.collegeboard.com/apc/members/courses/teachers_corner/50250.html gave me a nice visual to figure the problem out.
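The rule can also be checked by simulation. The sketch below draws two independent sets of random numbers with hypothetical means and standard deviations (3 and 2, so the variances are 9 and 4) and compares the variance of the difference and of the sum with 9 + 4 = 13.

```python
import random
import statistics

rng = random.Random(0)
n = 50_000

# Two independent variables with made-up parameters:
# x has standard deviation 3 (variance 9), y has standard deviation 2 (variance 4).
x = [rng.gauss(10, 3) for _ in range(n)]
y = [rng.gauss(4, 2) for _ in range(n)]

differences = [a - b for a, b in zip(x, y)]
sums = [a + b for a, b in zip(x, y)]

var_diff = statistics.variance(differences)
var_sum = statistics.variance(sums)

# Both should land close to 9 + 4 = 13, not 9 - 4 = 5.
print("Var(x - y) =", var_diff)
print("Var(x + y) =", var_sum)
```

The match depends on x and y being independent; if they were correlated, an extra covariance term would appear.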

Chapter 8

1. Choose either the Democratic or Republican Primary Forecast. To the best of your ability, explain what is going on in forecast. Then, pick a particular pollster and describe the methodology of that poll (Example: UMass Lowell). Summarize what is happening.

 

Overview:

 

Bernie Sanders is beating Hillary Clinton in four out of five age categories.  The only category that Clinton is winning is the oldest (75+) group, where she leads 53% to 43%.  In the 18-34 group, Sanders is leading 69% to 28% against Clinton.  Overall, Sanders has a greater than 99% chance of beating Clinton in the New Hampshire primary.

 

 

Methodology:

 

Data collection – Used an Interactive Voice Response (IVR) system because they wanted to eliminate interviewer bias.  Every person interviewed heard the exact same message.

 

Data analysis – They used a bivariate analysis (two variables), which looks at differences between sub-groups within the overall population.  The statisticians used a chi-square test, independent t-test for the mean and a z-test for independent percentages.

 

Sample – Random sample of registered voters.  Random list of phone numbers. 

Weighting – surveyed enough people to ensure low margin of error.  

 

 

Summary:

 

This article took a poll in order to get a projection for how the New Hampshire primary will go.  Statisticians used a sample because it saves time and by getting the right number of people they can reduce the margin of error and be reasonably accurate to what the outcome will be.  Statisticians also used a number of things we learned in class like bivariate analysis, chi-squared test, t-test and z-test to predict what will happen in the primary.

 

Part of the reason why polls like this are important is because they give politicians a non-biased opinion of how they are doing in each state so they can choose which states to focus their attention on.  The polls are also important for anyone selling a product that has to do with candidates so they can supply things customers are interested in.

 

It’s pretty cool how statistics class is useful in polls and primaries and how they used what we just learned in class!

Chapters 9 & 10

2.  FiveThirtyEight has analyzed the NCAA March Madness tournament and gives the percentage each team has of winning the tournament. Pick either NCAA Men’s or Women’s Basketball and read the article. Then write a response. Response should be a few paragraphs (250 – 500 words). Some prompts: did anything surprise you? Are there other things you would calculate into the rating? Are there other issues you feel the authors did not address?

The first thing that surprised me was that this was FiveThirtyEight’s first year covering Women’s March Madness.  It’s sad that women do not get as much attention in sports.  I was also surprised that there was not enough data on the women’s teams for FiveThirtyEight to use two of the measures they usually rely on to determine the likelihood of winning.  Injury reports are so important.  What if UConn, which was ranked highly in the preseason and by the NCAA, had all of its starters injured?  That would definitely change the outcome of the game.

 

I also never knew that the NCAA had the top four seeds play the first two rounds on their own campuses.  That’s so unbelievably unfair.  The team brings in the money from having the game at their campus and then they can use the money to give athletes coming out of high school scholarships.  They have that advantage; they also have the advantage of getting to pay for better staff.  They can have more trainers, more coaches not to mention nicer things.  The team could buy more comfortable buses so traveling isn’t as taxing.  They could buy nicer equipment and more equipment to train the players.  That is so unfair, outrageously so.

 

Getting to the statistics, it was interesting that they decided to use fat-tailed distributions instead of normal distributions.  After looking up what fat-tailed distributions were, I found one major difference: in a normal distribution, extreme events are much rarer than in a fat-tailed one.  Other than that, I had trouble understanding the difference between the two and why fat-tailed distributions were better for this situation.  Apparently the difference matters a lot, because just changing the distribution took St. Francis from a 1 in 2.5 million chance of winning under a normal distribution to the 1 in 7,000 chance they got using a fat tail.
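A rough way to see why the choice of distribution changes long-shot odds so much is to compare tail probabilities directly. The sketch below uses Student's t with 3 degrees of freedom purely as an illustrative fat-tailed distribution; which distribution FiveThirtyEight actually used is not stated here, so this is an assumption for the sake of the example.

```python
import math


def normal_upper_tail(x):
    """P(Z > x) for a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def t3_upper_tail(x):
    """P(T > x) for Student's t with 3 degrees of freedom (closed-form CDF)."""
    u = x / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 1 - cdf


# Chance of landing beyond each cutoff under the two distributions.
for cutoff in (2, 4, 6):
    normal_p = normal_upper_tail(cutoff)
    fat_p = t3_upper_tail(cutoff)
    print(cutoff, normal_p, fat_p, "ratio:", fat_p / normal_p)
```

The t distribution here is not rescaled to match the normal's spread, so the numbers are only a rough picture, but the pattern holds: the further out the cutoff, the more the fat-tailed chance dwarfs the normal one, which is the same direction as the St. Francis change.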

 

It was so neat getting to see all that went behind the ratings and probability.  I never would have thought to factor in home field advantage and the number of miles the visiting team had to travel or to think to use several different rankings.  The amount of thought that went into this article was incredible.  I learned a lot though, so I’m glad they wrote it. :) 

2.  Come up with a potential experiment that will have REAL LIFE implications of making the world a better place. The experiment can be about anything as long as doing it will hopefully make the world a little bit better. Think: Education, sustainability, poverty, healthcare, etc. Write down 1) your justification for doing this 2) your null and alternative hypotheses 3) the implications your proposal could have if you found significant results.

Question: Does requiring doctors and nurses to work in Africa for a month every five years significantly decrease the incidence of deaths?

 

Justification: Because many African countries are low-income, people there often do not get enough health care.  Africa has a high incidence of infectious diseases, including AIDS.  If more healthcare were provided to Africans, the death rates would hopefully decrease.

 

Null:  Requiring doctors and nurses to work in Africa for a month every 5 years will not have any significant effect on death rates in Africa.

 

Alternative:  Requiring doctors and nurses to work in Africa for a month every 5 years will have a significant effect on death rates in Africa.

 

Implications: If I found significant results, Africa would hopefully be able to industrialize more because people wouldn’t be as concerned about health.  The first step is making Africans feel comfortable enough to have hope; as a result, they could start industrializing.


© 2015 by Madison Jo Hyzdu. Proudly created with Wix.com
 
