Girish_Gaurav_JMP_Poster2014

Predicting the Market Value of the Property Using JMP® Pro 11
Girish Shirodkar¹ and Gaurav Pathak¹
¹Management Information Systems, Oklahoma State University, Stillwater, OK 74078
Introduction
Many individuals and corporations end up paying more, or receiving less,
in a property deal because they have limited knowledge of the factors that
decide the market value of a property. It is therefore important for any
individual or corporation to understand how particular parameters and
characteristics of a property drive its ultimate market value.
Very few concrete studies have examined which factors eventually decide
the value of a property. This study, based on the New York City property
valuation data, attempts to identify the factors governing the actual
market value of a property. Using this data together with
socio-demographic data derived from zip codes, JMP® Pro 11 is used to
predict the market value of property in New York City.
Methods
Data Preparation
The New York City property valuation dataset consists of 11,329
observations and 39 variables. It is a multivariate dataset with
missing values. The target variable is continuous and represents the
market value of a property. The following steps were taken during data
preparation and exploration:
•Identification of multi-collinear data using multivariate methods
•Missing value imputation using appropriate methods
•Outlier analysis using Mahalanobis distance statistic
Fig. 1: Scatterplot matrix and Ellipsoid 3D Plot showing
correlation amongst variables
Fig. 2: K-means clustering
Clustering and Segment Profiling
To segment the properties of New York City into distinct groups, several
clustering methods were applied to the imputed and transformed data.
The newly formed segments were then profiled to understand the
characteristics of each segment. After the segments were created,
different predictive models such as linear regression and decision trees
were applied and compared to predict the market value of a property in
New York City.
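As a rough illustration of the k-means idea used for segmentation, the sketch below runs Lloyd's algorithm in plain Python. It is not the JMP® Pro 11 procedure used in the study: the function name `kmeans`, the choice of seeding centroids with the first k points, and the sample (floors, log land cost) pairs are all assumptions made for the example.

```python
import math

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (floors, log10 of land cost) pairs, not taken from the NYC data.
properties = [(3, 5.2), (20, 6.9), (5, 5.4), (22, 7.1), (4, 5.1), (18, 6.8)]
labels, centroids = kmeans(properties, k=2)
```

On real data the variables would normally be standardized first, since k-means relies on Euclidean distance and is sensitive to scale.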
The following steps were taken during this phase:
•Creation and evaluation of clusters using hierarchical and k-means
clustering methods (Ref. Fig. 2)
•ANOVA testing to compare means of variables across segments (Ref.
Fig. 3)
Segment  Description
1        Properties in NYC with between 3 and 7 floors.
2        Properties in NYC with an actual total value between $12 million and $22 million and with 10 to 25 floors.
3        Properties whose monthly maintenance is $20,000 or more and whose land cost is greater than $350,000.
4        Properties whose actual total value is greater than $360,000 and whose building front dimension is greater than 105.41 meters.
5        Properties whose dimensions are greater than 92 m × 118 m and whose land cost is greater than $2,800,000.
6        Properties with between 0 and 22 floors and a land cost greater than $850,000.
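The ANOVA comparison of variable means across segments can be illustrated with a small sketch of the one-way F statistic. This is a plain-Python approximation of the idea, not the JMP® output; the function name `one_way_anova_f` and the three sample groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of segment means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their segment mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical values of one variable in three segments (illustrative only).
segments = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(segments)
```

A large F statistic indicates that the segment means differ by more than the within-segment variation would explain, which is the property a useful segmentation variable should have.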
Street Map Functionality
To get a real sense of how property values are distributed across New
York City, we used the newly introduced street map functionality of
JMP® Pro 11.
•Fig. 4 shows that commercial and rented properties are spread evenly
across the length and breadth of Manhattan.
•Residential properties are concentrated near areas such as SoHo, the
Flatiron Building, Canal Street and Lower Manhattan.
•Fig. 4 also shows the distribution of property tax classes across all zip
codes; the width of each color gives the dominant property class by the
full value of properties.
•Fig. 5 suggests that the most expensive residential properties are
located in the Chelsea, Greenwich and SoHo localities of Manhattan. The
most expensive commercial and leased properties are located in Lower
Manhattan (Financial District), Times Square and the Lower East Side of
Manhattan.
Fig. 3: Segment profiling for FULLVAL and AVTOT2
Predictive Modeling
The data were cleaned and prepared by adding demographic variables,
computing new variables and transforming skewed variables. Forward,
backward and stepwise linear regression models, a decision tree and a
neural network were fitted, and the competing models were compared with
each other. Based on the R-squared criterion, the forward regression
model, with an R-squared value of 0.7508, outperformed the other models.
Along with property characteristics such as extended land cost, land
cost, and the front and depth measurements of the building in which the
property is located, demographic variables such as the major industry
prevailing in the area of the property emerged as important factors that
ultimately drive the market value of the property.
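The R-squared criterion used to compare the models can be sketched as follows. This is a minimal illustration with a single predictor and made-up numbers; the study itself used forward selection over many predictors in JMP® Pro 11, and the names `fit_line`, `r_squared`, `land_cost` and `market_value` are assumptions for the example.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: one predictor and a market-value target (arbitrary units).
land_cost = [1.0, 2.0, 3.0, 4.0, 5.0]
market_value = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(land_cost, market_value)
predicted = [intercept + slope * x for x in land_cost]
r2 = r_squared(market_value, predicted)
```

An R-squared of 0.7508 means the selected model explains roughly 75% of the variance in the target. When candidate models differ in the number of predictors, comparing them on held-out data or with adjusted R-squared is a common safeguard against overfitting.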
Conclusion and Discussion
• Transforming skewed variables improves the performance of clustering
algorithms compared with using the skewed variables directly.
• The linear regression model outperformed the other models, such as
decision trees and the neural network, and was therefore selected as the
final model.
• Along with property characteristics such as extended land cost, land
cost, and the front and depth measurements of the building in which the
property is located, demographic variables such as the major industry
prevailing in the area of the property emerged as important factors
that influence the property value.
• Properties whose actual total value is greater than $360,000 and whose
building front dimension is greater than 105.41 meters fetch the highest
market values.
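The first conclusion, on transforming skewed variables, can be illustrated by measuring skewness before and after a transformation. The poster does not state which transformation was applied, so the base-10 logarithm here is an assumption, as are the function name `skewness` and the sample values.

```python
import math

def skewness(values):
    """Population (moment-based) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed property values in dollars (illustrative only).
full_values = [250_000, 300_000, 360_000, 420_000, 900_000, 12_000_000]
# Assumed transformation: log10 compresses the long right tail.
log_values = [math.log10(v) for v in full_values]

raw_skew = skewness(full_values)
log_skew = skewness(log_values)
```

For this toy sample the log-transformed values are less skewed than the raw values, so a single very expensive property has less influence on distance-based methods such as k-means.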
References
•https://nycopendata.socrata.com/Housing-Development/PropertyValuation-and-Assessment-Data/rgy2-tti8
•http://www.tax.ny.gov/research/property/assess/manuals/vol6/ref/prclas.htm
Fig. 4: Linear Regression model results and parameter estimates
Acknowledgements
•Prof. Dr. Goutam Chakraborty, founder of the SAS and OSU Business
Analytics Program at Oklahoma State University, for his continued
support and guidance