Monday, 31 March 2014

Exploratory data analysis on P/E ratio of Indian Stocks

The Price-to-Earnings ratio (P/E) is one of the most popular ratios reported for all stocks. Put simply, it is Current Market Price / Earnings per Share, where an operational definition of Earnings per Share (EPS) is total profit divided by the number of shares outstanding. I will redirect interested readers elsewhere for further reading.
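As a quick illustration of the arithmetic, with entirely made-up numbers:

# Hypothetical figures, purely to illustrate the P/E arithmetic
price        <- 250    # current market price per share
total_profit <- 50e7   # total annual profit
n_shares     <- 1e7    # number of shares outstanding
eps      <- total_profit / n_shares  # earnings per share = 50
pe_ratio <- price / eps              # P/E = 5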
In this post I would just like to show how we can grab P/E data from the web and create some visualizations from it. My focus right now is Indian stocks, and I intend to use www.indiainfoline.com (the base URL appears in the code below).
So my first step is gearing up for the data extraction, and that is essentially the most non-trivial task. On the site there is a separate page for each sector, and we need to follow each sector's link to reach that page and get the P/E ratios.
Here is something I did outside R: creating a CSV file with the sector names, using delimiters while importing the text and Paste Special > Transpose in a spreadsheet. Here is how my CSV file would look. I would never discourage using multiple tools, as that is often what it takes to solve real-world problems.
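The screenshot is not reproduced here, but going by how the file is read in below, it is just a single column of sector names, roughly like this (the names are the ones from the cases discussed next):

Sector
Banks
Tobacco Products
IT-Software
Stock/ Commodity Brokers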


So now I can import this into a dataset, read one row at a time and go to the necessary URLs. But God had other plans :) and it's not that straightforward.
Case 1: Single-word sector names
We have a sector named 'Banks', and its link is simply the base URL followed by 'Banks-Sector'.
This one is a no-brainer: we pick up the base URL, append the sector name after a forward slash and then append the string '-Sector'. This works for most single-word sector names like 'FMCG', 'Tyres', 'Healthcare' etc.
Case 2: Multiple words without '-', '&' or '/'
We have a sector named 'Tobacco Products', whose link uses 'Tobacco-Products-Sector'.
This is also not difficult: apart from appending '-Sector', we need to replace the spaces with '-'.
Case 3: Multiple words with a '-'
We have sector names like 'IT-Software', where we also have to remove any spaces around the existing hyphen (otherwise the space substitution would leave runs like 'IT---Software'). There can be several other cases, but for discussion's sake I will limit myself to these.
Case 4: Multiple words with a '/'
We have the sector name 'Stock/ Commodity Brokers', so the '/' needs to be removed as well. A compact sketch covering all four cases follows.
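To make the four cases concrete, here is a small helper that performs the same substitutions as the loop further down ('slugify' is my own name for it, not anything from the site):

slugify <- function(s)
{
  # Case 1: single-word name, nothing to substitute
  if (length(grep(' ', s, fixed=TRUE)) != 1) return(paste(s, '-Sector', sep=""))
  s <- gsub(' ', '-', s, fixed=TRUE)    # Case 2: spaces become hyphens
  s <- gsub('---', '-', s, fixed=TRUE)  # Case 3: collapse runs around an existing hyphen
  s <- gsub('&', 'and', s, fixed=TRUE)
  s <- gsub('/', '', s, fixed=TRUE)     # Case 4: drop the '/'
  s <- gsub(',', '', s, fixed=TRUE)
  paste(s, '-Sector', sep="")
}
slugify('Banks')                     # "Banks-Sector"
slugify('Tobacco Products')          # "Tobacco-Products-Sector"
slugify('Stock/ Commodity Brokers')  # "Stock-Commodity-Brokers-Sector"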
# Reading in the dataset
sectorsv1 <- read.csv("C:/Users/user/Desktop/Datasets/sectorsv1.csv")
# Converting to a matrix, a practice I generally follow
sectorvm <- as.matrix(sectorsv1)
We can now access individual sectors with sectorvm[rowno, colno].
# RCurl provides url.exists; XML provides readHTMLTable
library(RCurl)
library(XML)

pe <- c()
cname <- c()
cnt <- 0
baseurl <- 'http://www.indiainfoline.com/MarketStatistics/PE-Ratios/'
for (i in 1:nrow(sectorvm))
{
  securl <- sectorvm[i, 1]
  # fixed=TRUE means the pattern is matched as-is, not as a regular expression.
  # Note the use of gsub rather than sub for the substitutions below;
  # sub would replace only the first occurrence.
  if (length(grep(' ', securl, fixed=TRUE)) != 1)
  {
    # Case 1: single-word name, just append '-Sector'
    securl <- paste(securl, '-Sector', sep="")
  }
  else
  {
    # Case 2: replace spaces with hyphens
    securl <- gsub(' ', '-', securl, fixed=TRUE)
    # Case 3: collapse the 'IT---Software' style runs left by spaces around a hyphen
    if (length(grep('---', securl, fixed=TRUE)) == 1)
    {
      securl <- gsub('---', '-', securl, fixed=TRUE)
    }
    if (length(grep('&', securl, fixed=TRUE)) == 1)
    {
      securl <- gsub('&', 'and', securl, fixed=TRUE)
    }
    # Case 4: drop any '/'
    if (length(grep('/', securl, fixed=TRUE)) == 1)
    {
      securl <- gsub('/', '', securl, fixed=TRUE)
    }
    if (length(grep(',', securl, fixed=TRUE)) == 1)
    {
      securl <- gsub(',', '', securl, fixed=TRUE)
    }
    securl <- paste(securl, '-Sector', sep="")
  }
  fullurl <- paste(baseurl, securl, sep="")
  print(fullurl)
  if (url.exists(fullurl))
  {
    petbls <- readHTMLTable(fullurl)
    # Exploring the tables showed the relevant information is in table 2.
    # The columns come in as factors, so a plain as.numeric will not suffice:
    # we need as.character first and then as.numeric.
    pe <- c(pe, as.numeric(as.character(petbls[[2]]$PE)))
    cname <- c(cname, as.character(petbls[[2]]$Company))
    cnt <- cnt + 1
  }
}
The different functions we used are explained below:
readHTMLTable -> given a URL, this function retrieves the contents of the <table> tags from the HTML page. We need to index the appropriate one; in this case the data we want is in table 2.
grep, paste and gsub are ordinary string functions: grep finds occurrences of one string in another, paste concatenates, and gsub does the replacing.
as.numeric(as.character()) left a lasting impression on my mind, as the innocuous and intuitive as.numeric alone would have left me with only the factor level codes, effectively the ranks (see the sketch after this list).
url.exists -> it is a good idea to check that a URL exists, given we are forming the URLs dynamically.
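Here is a minimal sketch of that factor trap, with made-up values:

f <- factor(c('10.5', '2.3', '7'))
as.numeric(f)                # 1 2 3  -- the internal level codes, not the values
as.numeric(as.character(f))  # 10.5 2.3 7.0 -- the actual numbers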
Now, playing with summary statistics: we use the describe function from the psych package.
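Assuming pe is the numeric vector built in the loop above, the call is simply:

library(psych)
describe(pe)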
   n  mean    sd median trimmed   mad min   max range skew kurtosis   se
1797 59.71 76.92  20.09   46.64 29.79   0 587.5 587.5 2.15     7.25 1.81

hist(pe,col='blue',main='P/E Distribution')

We get the histogram below for the P/E ratio, which shows it is nowhere near a normal distribution, with its peakedness and skew confirmed by the summary statistics as well.
We will nevertheless run a normality test.
shapiro.test(pe)
 
        Shapiro-Wilk normality test
 
data:  pe 
W = 0.7496, p-value < 2.2e-16
 
Basically, the null hypothesis is that the values come from a normal distribution. The p-value is vanishingly small (< 2.2e-16), so we can easily reject the null.
Drawing a box plot on the P/E ratios
boxplot(pe,col='blue')

Finding the outliers
boxplot.stats(pe)$out
 
 
484.33 327.91 587.50
 
cname[which(pe %in% boxplot.stats(pe)$out)]

[1] "Bajaj Electrical" "BF Utilities"     "Ruchi Infrastr." 

Of course, no prizes for guessing that we should stay away from these stocks.
So, to summarize this exploratory data analysis on the P/E ratios of Indian stocks:

·       We saw how we can get content out of URLs and HTML tables
·       We collected the company names and P/E values into vectors
·       We looked at summary statistics, drew a histogram and did a normality test
·       We plotted a box plot and found the outliers
