**Introduction to Stata

**This file serves as a companion to the Stata Guide provided for the lecture. 
**Both can be accessed on EdShare along with the mini-data sets used in the lecture.

**The Stata Interface

*Setting a Working Directory
	*Drop-down menu or code

	cd "C:\Users\Nicco\Documents\Talks\Introduction_to_Stata"
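
	*A quick check, sketched here as an optional extra (not part of the lecture code):
	*pwd prints the current working directory and dir lists the files Stata can see in it
	pwd
	dir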

**Do-files.
	*How to Open
	*What to use them for: data cleaning and model code
	*What are annotations
	*Clean files

**Log Files
	*Why?
	*Drop-down menu
	*Use Stata to open it (.smcl logs are in Stata's own format)
	
	log using "C:\Users\Nicco\Documents\Talks\Introduction_to_Stata\log_file.smcl"
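
	*A minimal sketch of checking and closing the log (an addition, not part of the lecture code)
	*log query reports whether a log is open and where it is being written
	log query
	*at the end of the session, close the log with the line below
	*(left commented out here so that logging continues for the rest of this file)
	*log close
	*note: re-running log using on an existing file stops with an error unless you add ", replace" or ", append"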

**Open a data set in Stata
	*Drop down menu: open or import
	*Open directly from a file folder
	*File path, if known
	
	use "C:\Users\Nicco\Documents\Talks\Introduction_to_Stata\data_1.dta"
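
	*A sketch, assuming the working directory was set with cd above:
	*a relative file name is then enough, and the clear option drops any data already in memory
	use "data_1.dta", clear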

**Viewing Data set
	browse
	
	edit
	
	*Variable  details
		*you can observe the variable details in the open window
		*you can edit the details in the window, or in code (discussed later)
		
*Merging Data sets

	*merge: based on unique identifiers
	
	 merge 1:1 ID using "C:\Users\Nicco\Documents\Talks\Introduction_to_Stata\data_2.dta"

	*look at new data set
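	*one way to look at the result (a sketch): merge adds a variable called _merge
	*that records whether each row came from the master data, the using data, or both
	tab _merge
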
	*append: based on same variable codings across datasets
	
	 append using "C:\Users\Nicco\Documents\Talks\Introduction_to_Stata\data_3.dta"

	 *look at new dataset

*Create new variables

	gen age_Months=age*12
	
	*look at new variable
	
	sum age_Months
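
	*A further sketch: gen also accepts logical expressions and if conditions
	*adult is a hypothetical variable name and 18 an assumed cut-off, neither comes from the lecture data
	*the if condition keeps adult missing wherever age is missing
	gen byte adult = age >= 18 if !missing(age)
	tab adult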
	
*Adapt variable descriptors

	label variable age_Months "Age in months"

*Adapt or define/assign data value labels
	label define education 1 "low" 2 "medium" 3 "high"
	*notice this has not changed anything in the dataset yet
	
	label value education_level_simple education
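
	*To check the result (a sketch): label list shows the definition,
	*and tab now displays the labels instead of the numeric codes
	label list education
	tab education_level_simple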

*Command structure for univariate analysis
	*typically the command comes first, then the variable(s) you wish to apply it to
	
di 3+7

	*most basic command to use Stata like a calculator
	*order of operations is important 
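	*for example, multiplication is evaluated before addition unless brackets say otherwise

di 3+7*2
	*displays 17

di (3+7)*2
	*displays 20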
	
describe

sum age polattention

sum age, detail
	*detail is an option: it goes after the comma and is not needed for the basic command
	*the option tells Stata you want more detailed output

tab partyid
	*the right summary depends on the type of data: tab suits categorical variables, sum suits continuous ones

*Data visualisations
	*basic command set-up is the type of graph, then the y-axis variable, then the x-axis variable

*bar chart 

	hist leftright, frequency discrete width(.5)
	
	*why all the options?

	hist leftright, percent discrete width(.5)

*Installing packages
	*basic Stata does not include all the optional packages you could use
	*install them as you find you need them; each only needs installing once
	*ssc install catplot
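
	*A sketch of a re-runnable install check (an addition to the lecture code, needs an internet connection):
	*which looks for the command, capture suppresses the error if it is missing,
	*and _rc is non-zero only when catplot was not found
	capture which catplot
	if _rc ssc install catplot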

	catplot leftright, recast(bar)
	
*histogram
	hist age, frequency
	hist age, freq
		*abbreviations work when the shortened string of letters is still unambiguous
	
	hist age, bin(5)
	
	hist age, width(3)
	
*Graph commands
	*there is a standard set of graph commands that can be used,
	*for details see the help file for "graph"

graph box age

graph box age, over(education_level_simple)

*Editing Graphs in the editor

hist age, freq

*Basic Commands to edit in the code
	hist age, freq ytitle(Number of Respondents) xtitle(Age of Respondents) title(Distribution of Respondent Ages)
	
*Set scheme
	*why do this?
set scheme s1mono	
hist age, freq ytitle(Number of Respondents) xtitle(Age of Respondents) title(Distribution of Respondent Ages)

*Test commands

ttest age==32

ttest age, by(location) unequal
	*notice we now include two options here

*Changing the confidence level (optional)
	*default set at 95%
	
ttest age==50
ttest age==50, level(90)
ttest age==50, level(99)

*Bivariate commands

tab partyid leftright

*Options work for these as well

tab partyid leftright, chi2
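
*A sketch of combining options: row adds row percentages to each cell alongside the chi-squared test

tab partyid leftright, row chi2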

*graphical displays for bivariate relationships

graph twoway (scatter polattention age)

*graph command allows you to layer graphs

graph twoway (scatter polattention age) (lfit polattention age)

graph twoway (scatter polattention age) (lfit polattention age) (lowess polattention age)

*Correlation 
	*some statistical techniques have multiple ways to be run in Stata
	
corr age polattention gross_personal_income

pwcorr age polattention gross_personal_income, sig
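
	*corr drops any observation with a missing value on any listed variable (listwise deletion),
	*while pwcorr uses all observations available for each pair of variables
	*a sketch with two further pwcorr options: obs prints the number of observations per pair,
	*star(.05) stars the correlations significant at the 5% level

pwcorr age polattention gross_personal_income, sig obs star(.05)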

**Modelling Relationships

reg polattention education_level_simple leftright partyid age
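
*A sketch, assuming education_level_simple and partyid are categorical variables with integer codes:
*the i. prefix tells Stata to enter them as sets of indicator variables rather than as continuous predictors

reg polattention i.education_level_simple leftright i.partyid age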

mlogit partyid education_level_simple leftright age gross_personal_income

*help files

help mlogit

*Post-estimation commands

	*model fit
	estat ic
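
	*estat ic reports AIC and BIC for the most recently fitted model, here the mlogit above
	*a sketch of comparing fit across two models (m1 and m2 are hypothetical names):
	*estimates store saves the current results, estimates stats tabulates AIC and BIC side by side
	*the comparison is only meaningful if both models use the same observations,
	*which may not hold here if gross_personal_income has missing values
	estimates store m1
	quietly mlogit partyid education_level_simple leftright age
	estimates store m2
	estimates stats m1 m2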