Organization
Python Software Foundation
Proposal Title
Statsmodels: Discrete choice models
Proposal Abstract
The aim of this project is to add discrete choice models to
statsmodels and fill a gap in the set of discrete models that are
currently available. Statsmodels is a BSD-licensed Python package for
estimating many different statistical models.
Multinomial Logit and Nested Logit models have been the workhorses of
discrete choice modelling since the 1970s, despite their limitations.
They are still the best choice for simpler models. However, thanks to
the increased feasibility of computer-intensive simulation approaches,
it is now possible to estimate more complex models. Mixed Logit models
are gaining attention and use because they can accommodate random
taste variation across users or consumers and correlation across
alternatives. Furthermore, Mixed Logit models make it possible to
combine different types of data (revealed and stated preferences) or
data from different sources.
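To make the difference between the two model families concrete, the following is a minimal sketch, not the proposed statsmodels implementation. It assumes a single attribute per alternative and one normally distributed coefficient; the function names `mnl_probabilities` and `mixed_logit_probabilities` are hypothetical. The Multinomial Logit probability is a closed-form expression, while the Mixed Logit probability is approximated by averaging Multinomial Logit probabilities over random draws of the coefficient.

```python
import math
import random


def mnl_probabilities(utilities):
    """Multinomial Logit choice probabilities for one decision maker."""
    # Subtract the largest utility before exponentiating for stability.
    u_max = max(utilities)
    exp_u = [math.exp(u - u_max) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def mixed_logit_probabilities(x, beta_mean, beta_sd, n_draws=1000, seed=0):
    """Simulated Mixed Logit probabilities with one random coefficient.

    x holds one attribute value per alternative (an assumption made
    only to keep the sketch short). The coefficient is drawn from a
    normal distribution with the given mean and standard deviation.
    """
    rng = random.Random(seed)
    averaged = [0.0] * len(x)
    for _ in range(n_draws):
        beta = rng.gauss(beta_mean, beta_sd)  # one draw of the taste parameter
        probs = mnl_probabilities([beta * x_j for x_j in x])
        for j, p in enumerate(probs):
            averaged[j] += p / n_draws
    return averaged
```

With `beta_sd` set to zero every draw equals the mean, so the simulated probabilities reduce to the plain Multinomial Logit ones. A positive `beta_sd` introduces the random taste variation described above.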
This project proposes, first, to work on the currently implemented
Multinomial Logit and Nested Logit algorithms and, then, to implement
Mixed Logit algorithms. I also propose to implement flexible model
specification and several supporting functions: a summary of the
model, the result statistics, and statistical tests to check for
heteroscedasticity, the nesting structure and the random parameters
of the model. Working on the model specification will give users a
user-friendly way to define even complex discrete choice models.
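As an illustration of what a flexible specification could look like, the sketch below lets each alternative list the variables that enter its utility function. This is a hypothetical format invented for this example, not an existing statsmodels API; the names `spec`, `parameter_names` and `systematic_utilities`, and all numeric values, are assumptions.

```python
# Hypothetical specification format, not an existing statsmodels API:
# each alternative lists the variables that enter its utility function.
spec = {
    "car": ["const_car", "time", "cost"],
    "bus": ["const_bus", "time", "cost"],
    "walk": ["time"],
}


def parameter_names(spec):
    """Collect the distinct parameter names in order of first appearance."""
    names = []
    for variables in spec.values():
        for name in variables:
            if name not in names:
                names.append(name)
    return names


def systematic_utilities(spec, params, data):
    """Compute V_j = sum_k params[k] * data[j][k] for each alternative j.

    Variables missing from data[j] are treated as constants equal to 1.0.
    """
    return {
        alt: sum(params[name] * data[alt].get(name, 1.0) for name in variables)
        for alt, variables in spec.items()
    }


# Illustrative values only.
params = {"const_car": 0.5, "const_bus": 0.2, "time": -0.1, "cost": -0.05}
data = {
    "car": {"time": 10.0, "cost": 4.0},
    "bus": {"time": 20.0, "cost": 2.0},
    "walk": {"time": 40.0},
}
utilities = systematic_utilities(spec, params, data)
```

Here `time` and `cost` share one coefficient across alternatives while the constants are alternative-specific, which is the kind of distinction a specification layer would need to express.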
Proposal Detailed Description
As stated, several supporting functions will be improved or
implemented. First, a function that returns a summary of the model
and the principal result statistics. Second, three general
statistical tests (the Wald test, the Lagrange multiplier test and
the likelihood ratio test), which will be used to test for:
- Heteroscedasticity.
- The nesting structure.
- Random parameters.
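Of the three tests, the likelihood ratio test is the simplest to sketch: it compares the maximized log-likelihoods of a restricted model (for example a Multinomial Logit) and an unrestricted one (for example a Nested Logit). The code below is a minimal standard-library sketch under that assumption; the function names and the log-likelihood values are hypothetical, and a real implementation would more likely rely on SciPy for the chi-square distribution.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for the two base cases, df = 1 and df = 2.
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    # Recurrence: Q(x; nu + 2) = Q(x; nu)
    #             + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


def likelihood_ratio_test(llf_restricted, llf_unrestricted, df):
    """Return the LR statistic and its asymptotic chi-square p-value.

    df is the number of restrictions, i.e. the difference in the
    number of free parameters between the two models.
    """
    stat = 2.0 * (llf_unrestricted - llf_restricted)
    return stat, chi2_sf(stat, df)


# Illustrative log-likelihood values, not results from a fitted model.
stat, p_value = likelihood_ratio_test(-120.0, -115.0, df=2)
```

A small p-value would lead to rejecting the restricted model in favour of the unrestricted one.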
I plan to test the implemented algorithms against other
implementations or software with similar model estimation functions,
such as mlogit (an R package), Biogeme and Nlogit, and to implement
examples based on the main references on the topic. I am familiar
with these packages and have used them before.
From the beginning, I will implement unit tests that verify the
implemented algorithms against benchmark results to ensure that the
results are correct. These tests will run automatically and catch
any error introduced by future modifications of the code.
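A benchmark test of this kind might look like the sketch below. It assumes a hypothetical helper `mnl_loglike` (not part of statsmodels) and uses an analytical benchmark instead of output from another package: when all coefficients are zero, every alternative is equally likely, so the log-likelihood must equal -N log(J) for N observations and J alternatives.

```python
import math
import unittest


def mnl_loglike(beta, X, y):
    """Log-likelihood of a logit model with one shared coefficient vector.

    X[n][j] is the list of attributes of alternative j for observation n;
    y[n] is the index of the alternative chosen by observation n.
    """
    total = 0.0
    for x_n, chosen in zip(X, y):
        v = [sum(b * x_k for b, x_k in zip(beta, x_j)) for x_j in x_n]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        v_max = max(v)
        log_denom = v_max + math.log(sum(math.exp(v_j - v_max) for v_j in v))
        total += v[chosen] - log_denom
    return total


class TestMNLLogLike(unittest.TestCase):
    def test_null_model_benchmark(self):
        # Two observations, three alternatives, two attributes each.
        X = [
            [[1.0, 2.0], [0.5, 1.0], [0.0, 3.0]],
            [[2.0, 1.0], [1.5, 0.0], [1.0, 1.0]],
        ]
        y = [0, 2]
        expected = -2 * math.log(3)
        self.assertAlmostEqual(mnl_loglike([0.0, 0.0], X, y), expected, places=12)
```

The test can be run with `python -m unittest`. Comparisons against mlogit, Biogeme or Nlogit would follow the same pattern, with the expected values stored from those programs' output.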
The task in all parts of the project is to write the statistical
models and the related supporting functionality, such as plots and
statistical tests. Depending on the time it takes to implement the
primary goals, additional work from this list or proposed by the
community could be done.
I plan to communicate with my mentor by email on a weekly basis to
set weekly mini-goals and discuss regular code reviews. I will also
remain in constant touch with my mentor and the Statsmodels community
through IRC and the mailing lists.
I will write weekly update blog posts, plus at least two posts with
code snippets showing how to use the package, at
http://gsocstatsmodels.blogspot.com.es/. The code will also be pushed
regularly to the GitHub repository.
Timeline
- Community Bonding Period (May 28 – June 16). Familiarize myself
with the Statsmodels codebase and community, the version control
system, and the documentation and test systems used. Start to work on
the list of models and supporting functionality.
June 17. Begin coding
- Weeks 1-2 (June 17 – 28). Study the Statsmodels codebase to get
familiar with it and write unit tests for the current Multinomial
Logit and Nested Logit algorithms.
- Weeks 3-6 (July 1 – 19). Implement flexible model specification and
supporting functions for the summary of the model, the result
statistics and the three statistical tests.
- Week 7 (July 22 – 26). Clean up code, improve unit tests and
documentation for Multinomial Logit and Nested Logit.
- Week 8 (July 29 – August 2). Write blog posts with code snippets
showing how to use the package. Submit the mid-term evaluation.
August 2. Mid-term evaluations deadline
- Week 9 (August 5 – 9). Start work on Mixed Logit. Implement a
prototype of the required functions, methods or classes that will
form the base for implementing the algorithms.
- Week 10 (August 12 – 16). Implement a basic algorithm for Mixed
Logit and test it against other available implementations and
software.
- Week 11 (August 19 – 23). Optimize the implemented algorithm to
achieve the best performance and precision possible.
- Week 12 (August 26 – 30). Implement unit tests and documentation
for the Mixed Logit algorithms.
- Week 13 (September 2 – 13). Finish any pending code corrections,
tests and bug fixes.
- Week 14 (September 16 – 20). Clean up code, refine unit tests and
documentation for the whole project.
- Week 15 (September 23 – 27). Write new blog posts with code
snippets showing how to use the package, and a short white paper
covering the research, code and documentation. Submit final
evaluations to Google.
September 27. Final evaluation deadline
References
Ben-Akiva, M. and S.R. Lerman (1985) Discrete Choice Analysis: Theory and Application to Travel Demand. The MIT Press, Cambridge, Massachusetts.
Bierlaire, M. (2003) BIOGEME: A free package for the estimation of discrete choice models. Proceedings of the 3rd Swiss Transportation Research Conference, Ascona, Switzerland.
Croissant, Y. (2010) mlogit: Multinomial Logit Model. R package version 0.1-5.
Domencich, T. and D. McFadden (1972) A Disaggregated Behavioral Model of Urban Travel Demand. Report No. CRA-156-2. Charles River Associates, Inc., Cambridge, Massachusetts.
Hensher, D.A. and W.H. Greene (2003) The Mixed Logit model: The state of practice. Transportation 30, 133-176.
Hensher, D.A., W.H. Greene and J.M. Rose (2005) Applied Choice Analysis. Cambridge University Press.
Louviere, J.J., D.A. Hensher and J.D. Swait (2000) Stated Choice Methods: Analysis and Application. Cambridge University Press, Cambridge.
McFadden, D. (2000) Disaggregate behavioral travel demand's RUM side. A 30-year retrospective. International Association of Travel Behavior Analysts. Brisbane, Australia.
Orro, A. (2006) Modelos de elección discreta en transportes con coeficientes aleatorios. Doctoral thesis. University of A Coruña, A Coruña. Abertis chair, Barcelona.
Ortúzar, J. de D. and L.G. Willumsen (2001) Modelling Transport. Third edition. Wiley and Sons.
Train, K. (2003) Discrete Choice Methods with Simulation. Cambridge University Press.
Zeileis, A. and Y. Croissant (2010) Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34, 1-13.