Friday, March 29, 2013

Session # 10 - March 26, 2013 -- Plotting in R

IT BAL Assignment#10

Question 1: 

Create 3 vectors x, y and z of equal length, fill them with random values, and bind them together. Create 3-dimensional plots of the result.

Solution:

First, create a random data set of 50 values drawn from a normal distribution with mean = 30 and standard deviation = 10

> data <- rnorm(50,mean=30,sd=10)
> data

Draw three samples of length 10 from the created data set into the vectors x, y and z
> x <- sample(data,10)
> x

> y <- sample(data,10)
> y

> z <- sample(data,10)
> z

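Binding the three vectors x, y and z column-wise into a matrix T using cbind (note that the name T masks R's built-in shorthand for TRUE for the rest of the session)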
> T <- cbind(x,y,z)
> T
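
The steps so far can be collected into one reproducible sketch. The seed value and the matrix name xyz are assumptions added for illustration (they are not part of the assignment); a fixed seed makes the random draws repeatable, and a name other than T leaves R's TRUE shorthand untouched.

```r
# Reproducible version of the sampling and binding steps (illustrative sketch).
set.seed(42)                           # assumed seed, only for repeatability

data <- rnorm(50, mean = 30, sd = 10)  # 50 normal draws, mean 30, sd 10

x <- sample(data, 10)                  # three samples of length 10,
y <- sample(data, 10)                  # each drawn without replacement
z <- sample(data, 10)

xyz <- cbind(x, y, z)                  # hypothetical name; 10 x 3 numeric matrix

dim(xyz)                               # number of rows and columns
colnames(xyz)                          # column names come from the vector names
```

Because cbind takes its column names from the argument names, the columns can later be addressed as xyz[, "x"] as well as by position.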

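[Figure: Data Set]

Plotting the 3D graph (plot3d comes from the rgl package, which must be loaded first)

Command:

> library(rgl)
> plot3d(T[,1:3])

[Figure: 3D Plot]
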
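Plotting the graph with axis labels and color. There are only 10 points, so rainbow(10) supplies one distinct color per point; with a much longer palette only the first, nearly identical, colors would be used.

Command:

> plot3d(T[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(10))

[Figure: 3D Plot with color]
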
Plotting the graph with axis labels, color and type = spheres

Command:

> plot3d(T[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(10), type='s')

[Figure: 3D Plot with spheres]

Plotting the graph with axis labels, color and type = points

Command:

> plot3d(T[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(10), type='p')

[Figure: 3D Plot with points]

Plotting the graph with axis labels, color and type = lines

Command:

> plot3d(T[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(10), type='l')

[Figure: 3D Plot with lines]

Question 2:


Choose 2 random variables.
Create the following plots:
1. X-Y
2. X-Y|Z (introduce a variable z with 5 different categories and cbind it to x and y)
3. Color code and draw the graph
4. Smooth and best-fit line for the curve

Solution

Creating a data set of two random variables x and y, and then introducing a third, categorical variable z with 5 levels

Command:

> x <- rnorm(5000, mean= 20 , sd=10)
> y <- rnorm(5000, mean= 10, sd=10)
> z1 <- sample(letters, 5)
> z2 <- sample(z1, 5000, replace=TRUE)
> z <- as.factor(z2)
> z
[Figure: Data Set]
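
The question asks for z to be bound to x and y. A minimal sketch of one way to do that is shown below, under the assumption that a data frame is acceptable in place of cbind: a data frame keeps z as a factor, whereas a matrix built with cbind would not. The seed and the name d are illustrative only.

```r
# Bind x, y and the 5-level factor z into one data frame (illustrative sketch).
set.seed(1)                              # assumed seed, only for repeatability

x <- rnorm(5000, mean = 20, sd = 10)
y <- rnorm(5000, mean = 10, sd = 10)

z1 <- sample(letters, 5)                 # pick 5 distinct category labels
z  <- factor(sample(z1, 5000, replace = TRUE))

d <- data.frame(x, y, z)                 # hypothetical name; z stays a factor

str(d)                                   # column types at a glance
table(d$z)                               # number of observations per category
```

The table() call is a quick check that all five categories actually occur and are roughly equally frequent.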
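Creating Quick Plots (qplot comes from the ggplot2 package, which must be loaded first)

Command:

> library(ggplot2)
> qplot(x,y)

[Figure: x and y qplot]

> qplot(x,z)

[Figure: x and z qplot]
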
For a semi-transparent plot

> qplot(x,z, alpha=I(2/10))

[Figure: Semi-transparent Plot]

For a coloured plot

> qplot(x,y, color=z)

[Figure: Coloured plot]

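For a logarithmic coloured plot. Both x and y contain negative values here, so log() returns NaN for those entries (with a warning) and the corresponding points are dropped from the plot.

> qplot(log(x),log(y), color=z)

[Figure: Logarithmic Plot]
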
Best fit and smooth curve using "geom"

Command:

> qplot(x,y,geom=c("path","smooth"))

[Figure: geom='path']

> qplot(x,y,geom=c("point","smooth"))

[Figure: geom='point']

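A box plot needs a categorical variable on the x axis, so y is plotted against the categories in z:

> qplot(z,y,geom=c("boxplot","jitter"))

[Figure: geom='boxplot' and 'jitter']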
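
The X-Y|Z plot, the colour coding and the best-fit line can also be combined in a single figure. The sketch below uses ggplot() rather than qplot(), which has been deprecated in recent ggplot2 releases. The data frame d, the fixed category labels and the seed are assumptions made for illustration, and a straight-line fit (method = "lm") stands in for the best-fit line.

```r
# X-Y conditioned on Z: one panel per category, colour coded, with a fitted line.
library(ggplot2)

set.seed(1)                                   # assumed seed, only for repeatability
d <- data.frame(
  x = rnorm(5000, mean = 20, sd = 10),
  y = rnorm(5000, mean = 10, sd = 10),
  z = factor(sample(c("a", "b", "c", "d", "e"), 5000, replace = TRUE))
)

p <- ggplot(d, aes(x = x, y = y, colour = z)) +
  geom_point(alpha = 0.2) +                   # semi-transparent points
  geom_smooth(method = "lm", se = FALSE) +    # linear best-fit line per category
  facet_wrap(~ z)                             # one panel per level of z

print(p)
```

Since x and y are generated independently, the fitted lines should come out close to flat; with real data the panels make it easy to compare slopes across categories.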

Friday, March 15, 2013

IT BAL LAB Session#8

Session # 8 :


In this session we learnt about panel data estimation and its various models.

Panel data combines cross-sectional and time series data: the same individuals (here, states) are observed over several time periods.
The basic function used for panel data estimation is plm, from the plm package.

The data set we have used in this session is "Produc".

The description for the same is as under.

- state : the state
- year : the year
- pcap: public capital stock
- hwy : highway and streets
- pc: private capital stock
- gsp: gross state product
- emp: labor input measured by the employment in non-agricultural payrolls
- unemp: state unemployment rate

Use the data set "Produc", a panel data set included in the plm package, for panel estimations.
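
Before estimating anything, it can help to look at the panel structure. The following is a small sketch, assuming the plm package is installed; the selection of columns shown by head() is arbitrary.

```r
# Load the Produc panel and inspect its structure (illustrative sketch).
library(plm)

data("Produc", package = "plm")

# First few rows of a handful of the variables described above
head(Produc[, c("state", "year", "gsp", "emp", "unemp")])

# Panel dimensions: number of individuals, time periods and observations
pdim(Produc, index = c("state", "year"))
```

The index argument names the individual and the time variable, which is the same pair that plm needs when the models are estimated.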

Assignment :
Estimate all 3 models (pooling, fixed effects and random effects) and decide which model best fits the data set for panel estimation.

Solution :
Step1 : estimating the pooling model

Step2 : estimating the fixed model
Step3 : estimating the random model
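
The commands for the three steps are not reproduced in this post, so the following is only a sketch of how they might look. The model formula is an assumption (it is the production function commonly used with this data set in the plm documentation), not necessarily the one used in the session. The object names pooled, fixed1 and random1 are the ones used by the test commands below.

```r
# Estimate the pooling, fixed effects and random effects models (illustrative sketch).
library(plm)

data("Produc", package = "plm")

# Assumed specification: log output on log capital stocks, log labor and unemployment
form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp
idx  <- c("state", "year")                    # individual and time index

pooled  <- plm(form, data = Produc, index = idx, model = "pooling")  # Step 1
fixed1  <- plm(form, data = Produc, index = idx, model = "within")   # Step 2
random1 <- plm(form, data = Produc, index = idx, model = "random")   # Step 3

summary(pooled)
summary(fixed1)
summary(random1)
```

In plm the fixed effects estimator is requested with model = "within"; it has no overall intercept because the state-specific effects absorb it.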




To choose the best model that fits the data set "Produc", we need to run pairwise hypothesis tests among the 3 models and select the best fit in the end.

Test1 :
Between pooling and fixed model

Command :
pFtest (fixed1 , pooled)



Test details :
H0 (null hypothesis): the individual and time based effects are all zero : Pooling Model
Alternative hypothesis : at least one of the individual or time based effects is non-zero : Fixed Model

This is an F test that compares the fixed model with the pooling model.
The p-value is very low, so the null hypothesis is rejected: the effects are significant.

Hence the fixed model is better than the pooling model.


Test2:
Between pooling and random model

Command :
plmtest (pooled)


Test details :
H0 (null hypothesis): the individual and time based effects are all zero : Pooling Model
Alternative hypothesis : at least one of the individual or time based effects is non-zero : Random Model

plmtest is a Lagrange multiplier test run on the pooling model.
The p-value is very low, so the null hypothesis is rejected: the effects are significant.

Hence the random model is better than the pooling model.


Test3:
Between fixed and random model

Command :

We use the Hausman test:
phtest(random1 , fixed1)


Test details :
H0 (null hypothesis): the individual effects are not correlated with any regressor : Random Model
Alternative hypothesis : the individual effects are correlated with the regressors : Fixed Model

The Hausman test compares the coefficient estimates of the two models; a significant difference indicates that the random model is inconsistent.
The p-value is low, so the null hypothesis is rejected.

Hence the fixed model is better than the random model.


Conclusion :-
We can conclude that the fixed model best fits the "Produc" data set for panel data estimation, i.e. the individual effects are significant and are correlated with the regressor variables.
Hence, we would choose the "Fixed" model to estimate the panel data presented by the "Produc" data set.
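
The three pairwise tests can also be run together and their p-values compared side by side. This is a sketch under the same assumed model formula as before (the formula used in the session is not shown in this post), so the exact p-values may differ from those obtained in class.

```r
# Run the three model-selection tests and collect their p-values (illustrative sketch).
library(plm)

data("Produc", package = "plm")

form <- log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp   # assumed specification
idx  <- c("state", "year")

pooled  <- plm(form, data = Produc, index = idx, model = "pooling")
fixed1  <- plm(form, data = Produc, index = idx, model = "within")
random1 <- plm(form, data = Produc, index = idx, model = "random")

tests <- list(
  fixed_vs_pooling  = pFtest(fixed1, pooled),    # F test for individual effects
  random_vs_pooling = plmtest(pooled),           # Lagrange multiplier test
  fixed_vs_random   = phtest(random1, fixed1)    # Hausman test
)

# Each test returns an "htest" object, so the p-value is stored in $p.value
pvals <- sapply(tests, function(t) t$p.value)
print(pvals)
```

A p-value below the chosen significance level (0.05 is a common choice) rejects the null hypothesis of the corresponding test, which is the decision rule applied in Tests 1 to 3.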