Setup

Part 1: Data

The objective of the Behavioral Risk Factor Surveillance System (BRFSS) is to collect uniform, state-specific data on preventive health practices and risk behaviors that are linked to chronic diseases, injuries, and preventable infectious diseases affecting the adult population. Factors assessed by the BRFSS in 2013 include tobacco use, HIV/AIDS knowledge and prevention, exercise, immunization, health status, healthy days (health-related quality of life), health care access, inadequate sleep, hypertension awareness, cholesterol awareness, chronic health conditions, alcohol consumption, fruit and vegetable consumption, arthritis burden, and seatbelt use.

Since 2011, BRFSS has conducted both landline telephone- and cellular telephone-based surveys. In the landline telephone survey, interviewers collect data from a randomly selected adult in a household. In the cellular telephone version of the questionnaire, interviewers collect data from an adult who participates by using a cellular telephone and resides in a private residence or college housing. Health characteristics estimated from the BRFSS pertain to the non-institutionalized adult population, aged 18 years or older, residing in the US. In 2013, additional question sets were included as optional modules to provide a measure for several childhood health and wellness indicators, including asthma prevalence for people aged 17 years or younger.


Part 2: Research questions

Research question 1:

Among non-institutionalized adults in the US, we investigate whether alcohol consumption differs between veterans and non-veterans. The results could indicate whether veterans are at a lower or higher risk of alcohol addiction. We note that respondents are likely to underreport their alcohol consumption, which may bias the data. The variables of interest are veteran3 (veteran status), alcday5 (days on which alcohol was consumed in the past 30 days), and avedrnk2 (average number of drinks on a drinking day).

Research question 2:

Among non-institutionalized adults in the US, we investigate whether general health condition differs with the income level of the individual. We also investigate whether being a smoker adversely affects general health irrespective of income level. The variables of interest are genhlth (general health), income2 (income level), and X_rfsmok3 (current smoker status).

Research question 3:

Among non-institutionalized adults in the US, does a respondent's Body Mass Index (BMI) affect their chances of being diagnosed with a depressive disorder? Is there any difference between genders? This is an interesting question because it links a respondent's report about their mental health to a somewhat more objective measure of overall health. The difference between genders is also interesting, as it may reflect different perceptions and pressures within society. The variables of interest are addepev2 (ever told you have a depressive disorder), X_bmi5cat (BMI category), and sex.


Part 3: Exploratory data analysis

Research question 1:

Let us first visualise the alcohol intake of the sample in general. Remember that in the variable alcday5, the first digit denotes the unit: days per week (1) or days per month (2). The remaining digits give the count of days.
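The coding described above can be unpacked with integer division and the modulo operator. The following is a minimal base R sketch, not the report's own code: the helper name decode_alcday5 is hypothetical, and scaling weekly counts to a 30-day window by a factor of 30/7 is an assumption.

```r
# Hypothetical helper: convert an alcday5 code into drinking days per 30 days.
decode_alcday5 <- function(code) {
  unit  <- code %/% 100   # first digit: 1 = days per week, 2 = days per month
  count <- code %% 100    # remaining digits: number of days in that unit
  ifelse(unit == 1, count * 30 / 7,          # assumption: rescale weekly counts
         ifelse(unit == 2, count,            # monthly counts are used as-is
                ifelse(code == 0, 0, NA)))   # 0 = no drinking days; anything else is missing
}

# Example codes: 1 day per week, 7 days per month, 30 days per month, none.
print(decode_alcday5(c(101, 207, 230, 0)))
```

Under this assumption a respondent reporting 3 days per week is counted as roughly 12.9 drinking days in 30 days.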

So we first simplify the data by adding a variable, alcdaysimple, that stores just the number of days, and then plot a histogram of it.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

From the data it is clear that the majority of people did not consume alcohol at all in the last 30 days, but a significant number of people drink alcohol every day.

To get the total number of drinks in the last 30 days, we multiply alcdaysimple by avedrnk2 and store the result in totaldrinks.

Let us select the variables of interest for our first research question and store them in q1.

After storing the data, let us summarise it, grouped by veteran3.

## # A tibble: 2 x 6
##   veteran3 Drinks_Mean Drinks_Median Drinks_SD Drinks_min Drinks_max
##   <fct>          <dbl>         <dbl>     <dbl>      <dbl>      <dbl>
## 1 Yes             29.1          12.9      54.8          1       2280
## 2 No              21.6           9        43.0          1       2280
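A summary of this shape could be produced along the following lines. This is only a base R sketch on a small invented data frame, so its numbers will not match the table above; the report's tibble output suggests the original used dplyr, and dropping rows with a missing totaldrinks is an assumption.

```r
# Toy data, invented for illustration only (not BRFSS records).
q1 <- data.frame(
  veteran3     = factor(c("Yes", "Yes", "Yes", "No", "No", "No"),
                        levels = c("Yes", "No")),
  alcdaysimple = c(30, 4, 10, 2, 8, NA),  # drinking days in the last 30 days
  avedrnk2     = c(2, 1, 3, 1, 2, NA)     # average drinks per drinking day
)

# Total drinks in the last 30 days = drinking days x drinks per drinking day.
q1$totaldrinks <- q1$alcdaysimple * q1$avedrnk2

# Assumption: respondents with a missing total are dropped before summarising.
complete <- q1[!is.na(q1$totaldrinks), ]

# One row of summary statistics per level of veteran3.
drink_summary <- do.call(rbind, lapply(
  split(complete$totaldrinks, complete$veteran3),
  function(x) data.frame(Drinks_Mean = mean(x), Drinks_Median = median(x),
                         Drinks_SD = sd(x), Drinks_min = min(x),
                         Drinks_max = max(x))
))
print(drink_summary)
```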

Thus, the mean amount of alcoholic beverages consumed by veterans is about 34.25% higher than the mean for non-veterans. Let us also compare boxplots of the data, separated by veteran status.
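The relative difference can be checked from the group means. The sketch below uses the rounded means printed in the summary table, so it gives roughly 34.7%; a figure computed from the unrounded means may differ slightly.

```r
# Rounded group means copied from the summary table above.
mean_veteran     <- 29.1
mean_non_veteran <- 21.6

# Relative difference, expressed as a percentage of the non-veteran mean.
pct_higher <- (mean_veteran - mean_non_veteran) / mean_non_veteran * 100
print(round(pct_higher, 1))
```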

Again we see that veterans tend to consume more alcohol. However, we cannot conclude that being a veteran causes one to drink more alcohol, because the data is randomly sampled, not randomly assigned.


Research question 2:

Let us select the variables of interest for our second research question and store them in q2. Our variables of interest are genhlth, income2, and X_rfsmok3.

Before doing anything else, let us look at the variables of interest.

## # A tibble: 5 x 2
##   genhlth   Number_in_each_category
##   <fct>                       <int>
## 1 Excellent                   72399
## 2 Very good                  135329
## 3 Good                       124587
## 4 Fair                        54212
## 5 Poor                        22553
## # A tibble: 8 x 2
##   income2           Number_in_each_category
##   <fct>                               <int>
## 1 Less than $10,000                   24486
## 2 Less than $15,000                   25886
## 3 Less than $20,000                   33669
## 4 Less than $25,000                   40440
## 5 Less than $35,000                   47463
## 6 Less than $50,000                   59958
## 7 Less than $75,000                   63820
## 8 $75,000 or more                    113358
## # A tibble: 2 x 2
##   X_rfsmok3 Number_in_each_category
##   <fct>                       <int>
## 1 No                         341603
## 2 Yes                         67477

Now let's plot the two variables, income level (income2) and general health (genhlth), in a bar plot.

The plot shows a clear difference in health condition across income levels: as income rises, reported health improves. However, we once again cannot conclude that income is a cause of good or poor health, as the data collected is observational.
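Because the income groups differ greatly in size, the comparison is easier to read as shares within each income level than as raw counts. The following base R sketch shows one way to compute such shares; the data frame is invented for illustration, with only two income levels and three health levels.

```r
# Toy data, invented for illustration only (not BRFSS records).
q2 <- data.frame(
  income2 = factor(c("Low", "Low", "Low", "Low",
                     "High", "High", "High", "High"),
                   levels = c("Low", "High")),
  genhlth = factor(c("Poor", "Poor", "Good", "Excellent",
                     "Good", "Excellent", "Excellent", "Excellent"),
                   levels = c("Excellent", "Good", "Poor"))
)

# Cross-tabulate income level (rows) against general health (columns).
counts <- table(q2$income2, q2$genhlth)

# margin = 1 divides each row by its total, so every income level sums to 1.
health_share <- prop.table(counts, margin = 1)
print(round(health_share, 2))
```

Passing t(health_share) to base R's barplot() would draw one stacked bar per income level.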

Let us now include the variable X_rfsmok3 in the plot and observe the changes.

We clearly see poorer health among smokers, irrespective of the income level the person belongs to. Thus the data shows a clear association between smoking and deterioration in health condition. However, we cannot conclude that smoking is the cause of deteriorating health, as the data is observational.


Research question 3:

Let us select the variables of interest for our third research question and store them in q. Our variables of interest are addepev2, X_bmi5cat, and sex.

Before doing anything else, let us look at the variables of interest.

## # A tibble: 2 x 2
##   addepev2 Number_in_each_category
##   <fct>                      <int>
## 1 Yes                        91249
## 2 No                        371851
## # A tibble: 4 x 2
##   X_bmi5cat     Number_in_each_category
##   <fct>                           <int>
## 1 Underweight                      8202
## 2 Normal weight                  154253
## 3 Overweight                     166425
## 4 Obese                          134220
## # A tibble: 2 x 2
##   sex    Number_in_each_category
##   <fct>                    <int>
## 1 Male                    196221
## 2 Female                  266879

Now let's plot the two variables, BMI category (X_bmi5cat) and depressive disorder diagnosis (addepev2), in a bar plot.

The share of people diagnosed with depression clearly varies with the BMI category the person falls in. Underweight and obese respondents have a higher probability of having been diagnosed with some form of depression, possibly because of prevailing social norms, although the data cannot confirm this explanation. Moreover, the association between BMI and depressive disorder cannot be concluded to be causal, as the data obtained is observational and not experimental.

Let us now include the variable sex in the plot and observe the changes.

We clearly see a higher probability of being diagnosed with depression for females, irrespective of the BMI category the person belongs to. This may be due to widespread, long-standing social norms that tend to judge females by their body type, although the data cannot confirm this. Thus the data shows a clear association between gender and the probability of being diagnosed with depression. However, we cannot conclude that gender is the cause of the higher probability of being diagnosed with depression, as the data is observational, not experimental.
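The comparison across BMI category and sex rests on the share of respondents in each group who report a depression diagnosis. A minimal base R sketch of that calculation follows; the data frame is invented for illustration and uses only two BMI categories.

```r
# Toy data, invented for illustration only (not BRFSS records).
q3 <- data.frame(
  X_bmi5cat = factor(rep(c("Normal weight", "Obese"), each = 4),
                     levels = c("Normal weight", "Obese")),
  sex       = factor(rep(c("Male", "Male", "Female", "Female"), times = 2),
                     levels = c("Male", "Female")),
  addepev2  = factor(c("No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes"),
                     levels = c("Yes", "No"))
)

# Indicator: 1 if the respondent reports a depression diagnosis, 0 otherwise.
q3$depressed <- as.integer(q3$addepev2 == "Yes")

# The mean of a 0/1 indicator is the share diagnosed in each BMI-by-sex cell.
dep_rate <- tapply(q3$depressed, list(q3$X_bmi5cat, q3$sex), mean)
print(dep_rate)
```

Reading across a row compares the sexes within one BMI category, and reading down a column compares BMI categories within one sex.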