Larch Documentation
Release 3.3.0
Jeffrey Newman
June 21, 2016
Contents

1 Contents
    1.1 Getting Started
        1.1.1 Installation
    1.2 Data Fundamentals
        1.2.1 idco Format
        1.2.2 idca Format
    1.3 Data in Models
        1.3.1 Data Storage and Access Using SQLite
            Creating DB Objects
            Importing Data
            Exporting Data
            Reviewing Data
            Loading Data into Arrays
            Convenience Methods
            Using Data in Models
        1.3.2 Data Storage and Access Using HDF5
            Creating DT Objects
            Importing Data
            Required HDF5 Structure
        1.3.3 Abstract Data Interface
    1.4 Logit Models
        1.4.1 Creating Model Objects
        1.4.2 Adding Parameters
        1.4.3 Using Model Objects
        1.4.4 GEV Network
        1.4.5 Reporting Tools
        1.4.6 Related Classes
            Model Parameters
            Linear Parts
            MetaModels
    1.5 Roles
        1.5.1 Parameter References
            Math using Parameter References
        1.5.2 Data References
    1.6 Reporting
        1.6.1 Reporting in Text Format
        1.6.2 Reporting in HTML Format
        1.6.3 Reporting Multiple Models in a Single Consolidated Report
    1.7 Miscellaneous Tools
        1.7.1 File Management
        1.7.2 System Info
        1.7.3 XHTML
    1.8 Mathematics of Logit Choice Modeling
    1.9 Examples
        1.9.1 MTC Examples
            Importing idca Data
            1: MTC MNL Mode Choice
            1b: Using Loops
            1l: MTC MNL Mode Choice (Legacy Style)
            17: Better MTC MNL Mode Choice
            17t: MTC MNL Mode Choice Using DT
        1.9.2 Swissmetro Examples
            101: Swissmetro MNL Mode Choice
            101b: Swissmetro MNL, Biogeme Style
            101s: Swissmetro MNL, Stacked Variables
            102: Swissmetro Weighted MNL Mode Choice
            104: Swissmetro MNL with Modified Data
            105: Swissmetro Normal Mixed MNL Mode Choice
            109: Swissmetro Nested Logit Mode Choice
            111: Swissmetro Cross-Nested Logit Mode Choice
        1.9.3 Itinerary Choice Examples
            201: Network GEV Itinerary Choice
            202: OGEV Itinerary Choice
            203: OGEV-NL Itinerary Choice
            220: Partially Segmented Itinerary Choice Using a MetaModel
    1.10 Frequently Asked Questions
This documentation is for the Python interface for Larch. If this is your first go with Larch, or the first go on a new
computer, you might want to start with Installation.
This project is very much under development. There are plenty of undocumented functions and features; use them at
your own risk. Undocumented features may be non-functional, not rigorously tested, deprecated or removed without
notice in a future version. If a function or method is documented here, it is intended to be stable in future updates.
You can download and install larch from PyPI by running pip install larch. Or you can get the source code from
GitHub.
You may also find these links useful:
• Python 3.5: http://docs.python.org/3.5/index.html
• NumPy: http://docs.scipy.org/doc/numpy/
• SciPy: http://docs.scipy.org/doc/scipy/reference/
• SQLite: http://www.sqlite.org/docs.html
• APSW: http://rogerbinns.github.io/apsw/
For learning Python itself:
• If you are new to Python but have some experience with some other programming language, I recommend Dive
Into Python.
• If you are new to programming in general, there are better (slower) guides at Learning Python.
Chapter 1: Contents
1.1 Getting Started
1.1.1 Installation
To run Larch, you’ll need the 64-bit version of Python 3.5, plus a handful of other useful statistical packages
for Python. The easiest way to get everything you need is to download and install the Anaconda version of Python
3.5 (64 bit). This comes with everything you’ll need to get started, and the Anaconda folks have helpfully curated a
selection of useful tools for you, so you don’t have to sort through the huge pile of junk that is available for Python.
Once you’ve installed Anaconda, to get Larch you can simply run:
pip install larch
from your command prompt (Windows) or the Terminal (Mac OS X). You may get a permission error when running
this command; if so, try it again as an administrator (on Windows, right-click the command line program and choose
“Run as Administrator”).
Some of the graphical tools used to draw nested and network logit graphs may also not be installed by default by
Anaconda. You don’t need these tools to run any model in Larch, just to draw pretty figures depicting the nests. If you
want to install these, you can go to your command line to get the necessary tools:
conda install graphviz
pip install pygraphviz
Once you’ve got Larch installed, you might want to jump directly to some Examples to see how you might use it.
1.2 Data Fundamentals
Larch requires data to be structured in one of two formats: the case-only (“idco”) format or the case-alternative
(“idca”) format. These are commonly referred to as IDCase (each record contains all the information for mode choice
over alternatives for a single trip) or IDCase-IDAlt (each record contains all the information for a single alternative
available to each decision maker so there is one record for each alternative for each choice).
1.2.1 idco Format
In the idco case-only format, each record provides all the relevant information about an individual choice,
including the variables related to the decision maker or the choice itself, as well as alternative related
variables for all available alternatives and a variable indicating which alternative was chosen.
Table 1.1: Example of data in idco format

caseid  Income  Alt 1 Time  Alt 1 Cost  Alt 2 Time  Alt 2 Cost  Alt 3 Time  Alt 3 Cost  Chosen Alt
1       30,000  30          150         40          100         20          200         1
2       30,000  25          125         35          100         0           0           2
3       40,000  40          125         50          75          30          175         3
4       50,000  15          225         20          150         10          250         3
1.2.2 idca Format
In the idca case-alternative format, each record can include information on the variables related to the
decision maker or the choice itself, the attributes of that particular alternative, and a choice variable that
indicates whether the alternative was or was not chosen.
Table 1.2: Example of data in idca format

caseid  Alt Number  Number Of Alts  Income  Time  Cost  Chosen
1       1           3               30,000  30    150   1
1       2           3               30,000  40    100   0
1       3           3               30,000  20    200   0
2       1           2               30,000  25    125   0
2       2           2               30,000  35    100   1
3       1           3               40,000  40    125   0
3       2           3               40,000  50    75    0
3       3           3               40,000  30    175   1
4       1           3               50,000  15    225   0
4       2           3               50,000  20    150   0
4       3           3               50,000  10    250   1
Unlike most other tools for discrete choice analysis, Larch does not demand you employ one or the other of these data
formats. You can use either, or both simultaneously.
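Since the two formats carry the same information, converting between them is mechanical. Here is a minimal sketch in plain Python (no Larch required; the field names follow the example tables above) that expands one idco record into the equivalent idca records:

```python
# Expand one idco-format record (one dict per case) into idca-format
# records (one dict per case-alternative pair). Field names mirror the
# example tables above and are purely illustrative.
def idco_to_idca(row, n_alts=3):
    """Expand a single idco-format dict into a list of idca-format dicts."""
    out = []
    for alt in range(1, n_alts + 1):
        out.append({
            'caseid': row['caseid'],
            'altid': alt,
            'Income': row['Income'],
            'Time': row['Time%d' % alt],
            'Cost': row['Cost%d' % alt],
            # In idca format, "Chosen" is a 0/1 flag on each alternative.
            'Chosen': 1 if row['ChosenAlt'] == alt else 0,
        })
    return out

# Case 1 from Table 1.1:
case1 = {'caseid': 1, 'Income': 30000,
         'Time1': 30, 'Cost1': 150,
         'Time2': 40, 'Cost2': 100,
         'Time3': 20, 'Cost3': 200,
         'ChosenAlt': 1}
records = idco_to_idca(case1)
```

The three dicts in `records` match the first three rows of Table 1.2.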
1.3 Data in Models
Larch offers two basic data file storage formats: SQLite and HDF5.
If you have experience with an earlier version of Larch (or its predecessor, ELM), then you have been using the
SQLite database interface.
1.3.1 Data Storage and Access Using SQLite
The default storage of data within Larch is handled using SQLite. This portable and open source database system
provides a common file format that is flexible and practical for storing data.
The interactions with data in Python take place through a DB object, which is derived from the apsw.Connection
class in APSW, the Python interface wrapper for SQLite.
Creating DB Objects
class larch.DB(filename=None, readonly=False)
An SQLite database connection used to get data for models.
This object wraps a apsw.Connection, adding a number of methods designed specifically for working with
choice-based data used in Larch.
Parameters
• filename (str or None) – The filename or URI of the database to open. It must
be encoded as a UTF-8 string. (If your string contains only usual English characters you
probably don’t need to worry about it.) The default is an in-memory database opened with a
URI of file:larchdb?mode=memory, which is very fast as long as you’ve got enough
memory to store the whole thing.
• readonly (bool) – If true, the database connection is opened with a read-only flag set.
If the file does not already exist, an exception is raised.
Warning: The normal constructor creates a DB object linked to an existing SQLite database file. Editing
the object edits the file as well. There is currently no “undo” so be careful when manipulating the database.
In addition to opening an existing SQLite database directly, there are a number of methods available to create a DB
object without having it linked to an original database file.
static DB.Copy(source, destination=’file:larchdb?mode=memory’)
Create a copy of a database and link it to a DB object.
It is often desirable to work on a copy of your data, instead of working with the original file. If your data file is
not very large and you are working with multiple models, there can be significant speed advantages to copying
the entire database into memory first, and then working on it there, instead of reading from disk every time you
want data.
Parameters
• source (str) – The source SQLite database from which the contents will be copied. Can
be given as a plain filename or a URI.
• destination (str) – The destination SQLite database to which the contents will be
copied. Can be given as a plain filename or a URI. If it does not exist it will be created. If the destination is not given, an in-memory database will be opened with a URI
of file:larchdb?mode=memory.
Returns An open connection to destination database.
Return type DB
static DB.Example(dataset=’MTC’, shared=False)
Generate an example data object in memory.
Larch comes with a few example data sets, which are used in documentation and testing. It is important that you
do not edit the original data, so this function copies the data into an in-memory database, which you can freely
edit without damaging the original data.
Parameters
• dataset ({’MTC’, ’SWISSMETRO’, ’MINI’, ’ITINERARY’}) – Which example dataset should be used.
• shared (bool) – If True, the new copy of the database is opened with a shared cache, so
additional database connections can share the same in-memory data. Defaults to False.
Returns An open connection to the in-memory copy of the example database.
Return type DB
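Taken together, an interactive session using these constructors might look like the following sketch (the mydata.sqlite filename is hypothetical; only DB.Example runs without your own data file):

```python
>>> import larch
>>> db = larch.DB.Example('MTC')           # in-memory copy of example data
>>> db2 = larch.DB.Copy('mydata.sqlite')   # hypothetical file, copied into memory
>>> db3 = larch.DB('mydata.sqlite', readonly=True)  # open the original, read-only
```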
static DB.CSV_idco(filename, caseid=’_rowid_’, choice=None, weight=None, tablename=’data’, savename=None, alts={}, safety=True)
Creates a new larch DB based on an idco Format CSV data file.
The input data file should be an idco Format data file, with the first line containing the column headings. The
reader will attempt to determine the format (csv, tab-delimited, etc) automatically.
Parameters
• filename (str) – File name (absolute or relative) for CSV (or other text-based delimited)
source data.
• caseid (str) – Column name that contains the unique case ids. Because the data is in idco
format, case ids can be generated automatically based on line numbers, by using the reserved
keyword ‘_rowid_’.
• choice (str or None) – Column name that contains the id of the alternative that is
selected (if applicable). If not given, no sql_choice table will be autogenerated, and it will
need to be set manually later.
• weight (str or None) – Column name of the weight for each case. If None, defaults
to equal weights.
• tablename (str) – The name of the sql table into which the data is to be imported. Do
not give a reserved name (i.e. any name beginning with sqlite or larch).
• savename (str or None) – If not None, the name of the location to save the SQLite
database file that is created.
• alts (dict) – A dictionary with keys of alt codes, and values of (alt name, avail column,
choice column) tuples. If choice is given, the third item in the tuple is ignored and can be
omitted.
• safety (bool) – If true, all alternatives that are chosen, even if not given in alts, will be
automatically added to the alternatives table.
Returns An open connection to the database.
Return type DB
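As a sketch (the file, table, and column names here are all hypothetical), importing an idco file describing a two-alternative choice might look like:

```python
>>> import larch
>>> db = larch.DB.CSV_idco('commute.csv',
...                        choice='chosen_mode',
...                        alts={1: ('Car', 'car_avail'),
...                              2: ('Bus', 'bus_avail')},
...                        savename='commute.db')
```

Because choice is given, each alts value needs only the (alt name, avail column) pair; the third tuple item is ignored.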
static DB.CSV_idca(filename, caseid=None, altid=None, choice=None, weight=None, avail=None, tablename=’data’, tablename_co=’_co’, savename=None, alts={}, safety=True, index=False)
Creates a new larch DB based on an idca Format CSV data file.
The input data file should be an idca Format data file, with the first line containing the column headings. The
reader will attempt to determine the format (csv, tab-delimited, etc) automatically.
Parameters
• filename (str) – File name (absolute or relative) for CSV (or other text-based delimited)
source data.
• caseid (str or None) – Column name that contains the caseids. Because multiple
rows will share the same caseid, caseids cannot be generated automatically based on line
numbers using the reserved keyword ‘_rowid_’. If None, the column titled ‘caseid’ will be
used if it exists; if not, the first column of data in the file will be used.
• altid (str or None) – Column name that contains the altids. If None, the second
column of data in the file will be used.
• choice (str or None) – Column name that contains the id of the alternative that is
selected (if applicable). If None, the third column of data in the file will be used.
• weight (str or None) – Column name of the weight for each case. If None, defaults
to equal weights. Note that the weight needs to be identical for all altids sharing the same
caseid.
• avail (str or None) – Column name of the availability indicator. If None, it is assumed that unavailable alternatives have the entire row of data missing from the table.
• tablename (str) – The name of the sql table into which the data is to be imported. Do
not give a reserved name (i.e. any name beginning with sqlite or larch).
• tablename_co (str or None) – The name of the sql table into which idco format
data is to be imported. Do not give a reserved name (i.e. any name beginning with sqlite or
larch). If None, then no automatic cracking will be attempted and all data will be imported
into the idca table. If the given name begins with an underscore, it will be used as a suffix
added onto tablename.
• savename (str or None) – If not None, the name of the location to save the SQLite
database file that is created.
• alts (dict) – A dictionary with integer keys of alt codes, and string values of alt names.
• safety (bool) – If true, all alternatives that appear in the altid column, even if not given
in alts, will be automatically added to the alternatives table.
• index (bool) – If true, automatically create indexes for caseids and altids on the idca
Format table, and (if it is created) caseids on the idco Format table.
Returns An open connection to the database.
Return type DB
Importing Data
There are a variety of methods available to import data from external sources into a SQLite table for use with the larch
DB facility.
DB.import_csv(rawdata, table=’data’, drop_old=False, progress_callback=None, temp=False, column_names=None)
Import raw csv or tab-delimited data into SQLite.
Parameters
• rawdata (str) – The filename (relative or absolute) of the raw csv or tab delimited data
file. If the filename has a .gz extension, it is assumed to be in gzip format instead of plain
text.
• table (str) – The name of the table into which the data is to be imported
• drop_old (bool) – If true and the table already exists in the SQLite database, then the
pre-existing table is deleted.
• progress_callback (callback function) – If given, this callback function takes
a single integer as an argument and is called periodically while loading with the current
percentage complete.
• temp (bool) – If true, the data is imported into a temporary table in the database, which
will be deleted automatically when the connection is closed.
• column_names (list, optional) – If given, use these column names and assume
the first line of the data file is data, not headers.
Returns A list of column headers from the imported csv file
Return type list
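A sketch of a typical call (the survey.csv.gz filename and table name are hypothetical):

```python
>>> import larch
>>> db = larch.DB()                           # new default (in-memory) database
>>> headers = db.import_csv('survey.csv.gz',  # gzipped source detected by extension
...                         table='survey', drop_old=True)
```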
DB.import_dataframe(rawdataframe, table=’data’, if_exists=’fail’)
Imports data from a pandas dataframe into an existing larch DB.
Parameters
• rawdataframe (pandas.DataFrame) – The DataFrame containing the raw data to
be imported.
• table (str) – The name of the table into which the data is to be imported
• if_exists ({’fail’, ’replace’, ’append’}) – If the table does not exist this
parameter is ignored, otherwise, fail: If table exists, raise a ValueError exception. replace:
If table exists, drop it, recreate it, and insert data. append: If table exists, insert data.
Returns A list of column headers from the imported DataFrame
Return type list
DB.import_xlsx(io, sheetname=0, table=’data’, if_exists=’fail’, **kwargs)
Imports data from an Excel spreadsheet into an existing larch DB.
Parameters
• io (string, file-like object, or xlrd workbook.) – The string could be
a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected.
For instance, a local file could be file://localhost/path/to/workbook.xlsx
• sheetname (string or int, default 0) – Name of Excel sheet or the page number of the sheet
• table (str) – The name of the table into which the data is to be imported
• if_exists ({’fail’, ’replace’, ’append’}) – If the table does not exist this
parameter is ignored, otherwise, fail: If table exists, raise a ValueError exception. replace:
If table exists, drop it, recreate it, and insert data. append: If table exists, insert data.
Returns A list of column headers from the imported spreadsheet
Return type list
Notes
This method uses a pandas.DataFrame as an intermediate step, first calling
pandas.io.excel.read_excel() and then calling import_dataframe(). All keyword arguments
other than those listed here are simply passed to pandas.io.excel.read_excel().
DB.import_dbf(rawdata, table=’data’, drop_old=False)
Imports data from a DBF file into an existing larch DB.
Parameters
• rawdata (str) – The filename (relative or absolute) of the raw DBF data file.
• table (str) – The name of the table into which the data is to be imported
• drop_old (bool) – If true and the table already exists in the SQLite database, then the
pre-existing table is deleted.
Returns A list of column headers from the imported DBF file
Return type list
Note: This method requires the dbfpy module (available using pip).
Exporting Data
Sometimes it will be necessary to get your data out of the database, for use in other programs or for other sundry
purposes. There will eventually be some documented methods to conveniently allow you to export data in a few
standard formats. Of course, since the DB object links to a standard SQLite database, it is possible to access your
data directly from SQLite in other programs, or through apsw (included as part of Larch) or sqlite3 (included
in standard Python distributions).
DB.export_idca(file, include_idco=’intersect’, exclude=[], **formats)
Export the idca Format data to a csv file.
Parameters
• file (str or file-like) – If a string, this is the file name to give to the open command. Otherwise, this object is passed to csv.writer directly.
• include_idco ({’intersect’, ’all’, ’none’}) – Unless this is ‘none’, the
idca and idco tables are joined on caseids before exporting. For ‘intersect’, a natural join is
used, so that all columns with the same name are used for the join. This may cause problems
if columns in the idca and idco tables have the same name but different data. For ‘all’, the
join is made on caseids only, and every column in both tables is included in the output.
When ‘none’, only the idca table is exported and the idco table is ignored.
• exclude (set or list) – A list of variable names to exclude from the output. This
could be useful in shrinking the file size if you don’t need all the output columns, or in
suppressing duplicate copies of caseid and altid columns.
Notes
This method uses a csv.writer object to write the output file. Any keyword arguments not listed here are
passed through to the writer.
DB.export_idco(file, exclude=[], **formats)
Export the idco Format data to a csv file.
Only the idco Format table is exported, the idca Format table is ignored. Future versions of Larch may provide
a facility to export idco and idca data together in a single idco output file.
Parameters
• file (str or file-like) – If a string, this is the file name to give to the open command. Otherwise, this object is passed to csv.writer directly.
• exclude (set or list) – A list of variable names to exclude from the output. This
could be useful in shrinking the file size if you don’t need all the output columns, or in
suppressing duplicate copies of caseid and altid columns.
Notes
This method uses a csv.writer object to write the output file. Any keyword arguments not listed here are
passed through to the writer.
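A brief sketch of both export methods against the example data (the output filenames and the excluded column are hypothetical):

```python
>>> import larch
>>> db = larch.DB.Example('MTC')
>>> db.export_idca('mtc_idca.csv', include_idco='all')
>>> db.export_idco('mtc_idco.csv', exclude=['casenum'])  # 'casenum' is illustrative
```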
Reviewing Data
DB.seer(file=None, counts=False, **kwargs)
Display a variety of information about the DB connection in an HTML report.
Parameters
• file (str, optional) – A name for the HTML file that will be created. If not given,
a temporary file will automatically be created.
• counts (bool, optional) – If true, the number of rows in each table is calculated.
This may take a long time if the database is large.
Notes
The report will pop up in Chrome or a default browser after it is generated.
Loading Data into Arrays
DB.array_caseids(*, table=None, caseid=None, sort=True, n_cases=None)
Extract the caseids from the DB based on preset queries.
Generally you won’t need to specify any parameters to this method, as most values are determined automatically
from the preset queries. However, if you need to override things for this array without changing the queries more
permanently, you can use the input parameters to do so. Note that all parameters must be called by keyword,
not as positional arguments.
Parameters
• table (str) – The caseids will be found in this table.
• caseid (str) – This sets the column name where the caseids can be found.
• sort (bool) – If true (the default) the resulting array will be sorted in ascending order.
• n_cases (int) – If you know the number of cases, you can specify it here to speed up
the return of the results, particularly if the caseids query is complex. You can safely ignore
this and the number of cases will be calculated for you. If you give the wrong number, an
exception will be raised.
Returns An int64 array of shape (n_cases,1).
Return type ndarray
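For instance, with the MTC example data (which has 5029 cases, as shown in the array_idca example below):

```python
>>> import larch
>>> db = larch.DB.Example('MTC')
>>> caseids = db.array_caseids()
>>> caseids.shape
(5029, 1)
```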
DB.array_idca(*vars, table=None, caseid=None, altid=None, altcodes=None, dtype=’float64’, sort=True, n_cases=None)
Extract a set of idca values from the DB based on preset queries.
Generally you won’t need to specify any parameters to this method beyond the variables to include in the array,
as most values are determined automatically from the preset queries. However, if you need to override things
for this array without changing the queries more permanently, you can use the input parameters to do so. Note
that all override parameters must be called by keyword, not as positional arguments.
Parameters vars (tuple of str) – A tuple giving the expressions (often column names, but
any valid SQLite expression works) to extract as idca format variables.
Other Parameters
• table (str) – The idca data will be found in this table, view, or self contained query (if the
latter, it should be surrounded by parentheses).
• caseid (str) – This sets the column name where the caseids can be found.
• altid (str) – This sets the column name where the altids can be found.
• altcodes (tuple of int) – This is the set of alternative codes used in the data. The second
(middle) dimension of the result array will match these codes in length and order.
• dtype (str or dtype) – Describe the data type you would like the output array to adopt,
probably ‘int64’, ‘float64’, or ‘bool’.
• sort (bool) – If true (the default) the resulting arrays (both of them) will be sorted in
ascending order by caseid.
• n_cases (int) – If you know the number of cases, you can specify it here to speed up the
return of the results, particularly if the caseids query is complex. You can safely ignore
this and the number of cases will be calculated for you. If you give the wrong number, an
exception will be raised.
Returns
• data (ndarray) – An array with specified dtype, of shape (n_cases,len(altcodes),len(vars)).
• caseids (ndarray) – An int64 array of shape (n_cases,1).
Examples
Extract a cost and time array from the MTC example data:
>>> import larch
>>> db = larch.DB.Example()
>>> x, c = db.array_idca('totcost','tottime')
>>> x.shape
(5029, 6, 2)
>>> x[0]
Array([[  70.63,   15.38],
       [  35.32,   20.38],
       [  20.18,   22.38],
       [ 115.64,   41.1 ],
       [   0.  ,   42.5 ],
       [   0.  ,    0.  ]])
DB.array_idco(*vars, table=None, caseid=None, dtype=’float64’, sort=True, n_cases=None)
Extract a set of idco values from the DB based on preset queries.
Generally you won’t need to specify any parameters to this method beyond the variables to include in the array,
as most values are determined automatically from the preset queries. However, if you need to override things
for this array without changing the queries more permanently, you can use the input parameters to do so. Note
that all override parameters must be called by keyword, not as positional arguments.
Parameters vars (tuple of str) – A tuple (or other iterable) giving the expressions (often
column names, but any valid SQLite expression works) to extract as idco format variables.
Other Parameters
• table (str) – The idco data will be found in this table, view, or self-contained query (if
the latter, it should be surrounded by parentheses).
• caseid (str) – This sets the column name where the caseids can be found.
• dtype (str or dtype) – Describe the data type you would like the output array to adopt,
probably ‘int64’, ‘float64’, or ‘bool’.
• sort (bool) – If true (the default) the resulting arrays (both of them) will be sorted in
ascending order by caseid.
• n_cases (int) – If you know the number of cases, you can specify it here to speed up the
return of the results, particularly if the caseids query is complex. You can safely ignore
this and the number of cases will be calculated for you. If you give the wrong number, an
exception will be raised.
Returns
• data (ndarray) – An array with specified dtype, of shape (n_cases,len(vars)).
• caseids (ndarray) – An int64 array of shape (n_cases,1).
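A sketch parallel to the array_idca example above, assuming the MTC example table includes an ‘hhinc’ (household income) column:

```python
>>> import larch
>>> db = larch.DB.Example('MTC')
>>> data, caseids = db.array_idco('hhinc')  # column name assumed for illustration
>>> data.shape
(5029, 1)
```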
Convenience Methods
DB.attach(sqlname, filename)
Attach another SQLite database.
Parameters
• sqlname (str) – The name SQLite will use to reference the other database.
• filename (str) – The filename or URI to attach.
Notes
If the other database is already attached, or if the name is already taken by another attached database, the
command will be ignored. Otherwise, this command is the equivalent of executing:
ATTACH filename AS sqlname;
See also:
DB.detach()
DB.detach(sqlname)
Detach another SQLite database.
Parameters sqlname (str) – The name SQLite uses to reference the other database.
Notes
If the name is not an attached database, the command will be ignored. Otherwise, this command is the equivalent
of executing:
DETACH sqlname;
See also:
DB.attach()
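Under the hood, these methods just issue the ATTACH and DETACH statements shown above. A minimal stand-alone sketch with the standard sqlite3 module (the attached file here is a throwaway temporary file):

```python
import os
import sqlite3
import tempfile

# Create a second database file to attach.
fd, path = tempfile.mkstemp(suffix=".sqlite")
os.close(fd)
other = sqlite3.connect(path)
other.execute("CREATE TABLE zones (id INTEGER, area REAL)")
other.commit()
other.close()

# ATTACH it under the name 'extra', as DB.attach('extra', path) would.
main = sqlite3.connect(":memory:")
main.execute("ATTACH ? AS extra", (path,))
names = [row[1] for row in main.execute("PRAGMA database_list")]
# names now contains both 'main' and 'extra'

# DETACH it again, as DB.detach('extra') would.
main.execute("DETACH extra")
os.remove(path)
```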
DB.crack_idca(tablename, caseid=None, ca_tablename=None, co_tablename=None)
Crack an existing idca table into idca and idco component tables.
This method will automatically analyze an existing idca table and identify columns of data that are invariant
within individual cases. Those variables will be segregated into a new idco table, and the remaining variables
will be put into a new idca table.
Parameters
• tablename (str) – The name of the existing idca table
• caseid (str or None) – The name of the column representing the caseids in the existing table. If not given, it is assumed these are in the first column.
• ca_tablename (str or None) – The name of the table that will be created to hold the
new (with fewer columns) idca table.
• co_tablename (str or None) – The name of the table that will be created to hold the
new idco table.
Raises apsw.SQLError – If the name of one of the tables to be created already exists in the
database.
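The heart of this analysis is a simple test: a column can move to the idco table only if it never varies within a case. A hedged sketch of that test in plain SQL, using hypothetical table and column names:

```python
import sqlite3

# An idca-style table: income repeats within each case, travtime varies.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (caseid, altid, income, travtime)")
conn.executemany("INSERT INTO trips VALUES (?,?,?,?)", [
    (1, 1, 50, 10.0), (1, 2, 50, 15.0),
    (2, 1, 75, 12.0), (2, 2, 75, 9.0),
])

def case_invariant(col):
    # True if no case has more than one distinct value in this column.
    q = ("SELECT MAX(n) FROM (SELECT COUNT(DISTINCT {0}) AS n "
         "FROM trips GROUP BY caseid)").format(col)
    return conn.execute(q).fetchone()[0] == 1

co_columns = [c for c in ("income", "travtime") if case_invariant(c)]
# co_columns -> ['income']; travtime stays in the new idca table
```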
Using Data in Models
The DB class primarily presents an interface between python and SQLite. The interface between a DB and a Model is
governed by a special attribute of the DB class:
DB.queries
This attribute defines the automatic queries used to provision a Model with data. It should be an object that is
a specialized subtype of the core.QuerySet abstract base class.
Queries To Provision a Model with Data
class larch.core.QuerySet
To provide the ability to extract the correct data from the database, the DB object has an attribute DB.queries,
which is an instance of a subclass of this abstract base class.
tbl_idca()
This method returns a SQL fragment that evaluates to a larch_idca table. The table should have the
following features:
•Column 1: caseid (integer) a key for every case observed in the sample
•Column 2: altid (integer) a key for each alternative available in this case
•Column 3+ can contain any explanatory data, typically numeric data, although non-numeric data is
allowable.
If no columns are named caseid and altid, larch will use the first two columns, respectively. A query
with fewer than two columns should raise an exception.
For example, this method might return:
(SELECT casenum AS caseid, altnum AS altid, * FROM data) AS larch_idca
It would be perfectly valid for there to be an actual table in the database named “larch_idca”, and for this
function to return simply “larch_idca”, although this would prohibit using the same underlying database
to build different datasets.
tbl_idco()
This method returns a SQL fragment that evaluates to a larch_idco table. The table should have the
following features:
•Column 1: caseid (integer) a key for every case observed in the sample
•Column 2+ can contain any explanatory data, typically numeric data, although non-numeric data is
allowable.
If no column has the name 'caseid', larch will use the first column. A query with fewer than two columns
should raise an exception.
For example, this method might return:
(SELECT _rowid_ AS caseid, * FROM data) AS larch_idco
tbl_alts()
This method returns a SQL fragment that evaluates to a larch_alternatives table. The table should have
the following features:
•Column 1: id (integer) a key for every alternative observed in the sample
•Column 2: name (text) a name for each alternative
tbl_caseids()
This method returns a SQL fragment that evaluates to a larch_caseids table. The table should have the
following features:
•Column 1: caseid (integer) a key for every case observed in the sample
tbl_choice()
This method returns a SQL fragment that evaluates to a larch_choice table. The table should have the
following features:
•Column 1: caseid (integer) a key for every case observed in the sample
•Column 2: altid (integer) a key for each alternative available in this case
•Column 3: choice (numeric, typically 1.0 but could be other values)
If an alternative is not chosen for a given case, it can have a zero choice value or it can simply be omitted
from the result.
tbl_weight()
This method returns a SQL fragment that evaluates to a larch_weight table. The table should have the
following features:
•Column 1: caseid (integer) a key for every case observed in the sample
•Column 2: weight (numeric) a weight associated with each case
Alternatively, this method can return an empty string, in which case it is assumed that all cases are weighted
equally.
tbl_avail()
This method returns a SQL fragment that evaluates to a larch_avail table. The table should have the
following features:
•Column 1: caseid (integer) a key for every case observed in the sample
•Column 2: altid (integer) a key for each alternative available in this case
•Column 3: avail (boolean) evaluates as 1 or true when the alternative is available, 0 otherwise
If an alternative is not available for a given case, it can have a zero avail value or it can simply be omitted
from the result.
Alternatively, this method can return an empty string, in which case it is assumed that all alternatives are
available in all cases.
Queries for a Single idco Format Table
class larch.core.QuerySetSimpleCO
This subclass of core.QuerySet is used when the data consists exclusively of a single idco format
table.
Note: This is similar to the data format required by Biogeme. Unlike Biogeme, the Larch DB allows non-numeric values in the source data.
idco_query
This attribute defines a SQL query that evaluates to a larch_idco table. The table should have the following features:
•Column 1: caseid (integer) a key for every case observed in the sample
•Column 2+ can contain any explanatory data, typically numeric data, although non-numeric data is
allowable.
If the main table is named "data", typically this query will be:
SELECT _rowid_ AS caseid, * FROM data
alts_query
This attribute defines a SQL query that evaluates to a larch_alternatives table. The table should have the
following features:
•Column 1: id (integer) a key for every alternative observed in the sample
•Column 2: name (text) a name for each alternative
alts_values
This attribute defines a set of alternative codes and names, as a dictionary that contains {integer:string}
key/value pairs, where each key is an integer value corresponding to an alternative code, and each value is
a string giving a descriptive name for the alternative. When assigning to this attribute, a query is defined
that can be used with no table.
Warning: Using this method will overwrite alts_query
choice
This attribute defines the choices. It has two styles:
•When set to a string, the string names the column of the main table that identifies the choice for each
case. The indicated column should contain integer values corresponding to the alternative codes.
•When set to a dict, the dict should contain {integer:string} key/value pairs, where each key is an
integer value corresponding to an alternative code, and each value is a string identifying a column
in the main table; that column should contain a value indicating whether the alternative was chosen.
Usually this will be a binary dummy variable, although it need not be. For certain specialized models,
values other than 0 or 1 may be appropriate.
The choice of style is a matter of convenience; the same data can be expressed with either style as long as
the choices are binary.
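To see why the styles are interchangeable for binary choices, consider this sketch: a column of chosen alternative codes (string style) carries the same information as one dummy column per alternative (dict style). All table and column names here are hypothetical.

```python
import sqlite3

# One row per case: 'chosen' holds the chosen alternative's code
# (string style); 'chose_1'/'chose_2' are per-alternative dummies
# (dict style) encoding the same choices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (caseid, chosen, chose_1, chose_2)")
conn.executemany("INSERT INTO data VALUES (?,?,?,?)",
                 [(1, 2, 0, 1), (2, 1, 1, 0)])

# The styles agree when the dummy for code k is 1 exactly when chosen==k.
mismatch = conn.execute(
    "SELECT COUNT(*) FROM data"
    " WHERE (chosen=1) != (chose_1=1) OR (chosen=2) != (chose_2=1)"
).fetchone()[0]
# mismatch -> 0: both styles encode the same choices
```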
avail
This attribute defines the availability of alternatives for each case. If set to True, then all alternatives are
available to all cases. Otherwise, this attribute should be a dict that contains {integer:string} key/value
pairs, where each key is an integer value corresponding to an alternative code, and each value is a string
identifying a column in the main table; that column should contain a value indicating whether the alternative is available. This must be a binary dummy variable.
weight
This attribute names the column in the main table that defines the weight for each case. Set it to an empty
string, or 1.0, to assign all cases equal weight.
Queries for a Pair of Tables
class larch.core.QuerySetTwoTable
This subclass of core.QuerySet is used when the data consists of one idco format table and one
idca format table.
idco_query
This attribute defines a SQL query that evaluates to a larch_idco table. The table should have the following features:
•Column 1: caseid (integer) a key for every case observed in the sample
•Column 2+ can contain any explanatory data, typically numeric data, although non-numeric data is
allowable.
idca_query
This attribute defines a SQL query that evaluates to a larch_idca table. The table should have the following
features:
•Column 1: caseid (integer) a key for every case observed in the sample
•Column 2: altid (integer) a key for each alternative available in this case
•Column 3+ can contain any explanatory data, typically numeric data, although non-numeric data is
allowable.
alts_query
This attribute defines a SQL query that evaluates to a larch_alternatives table. The table should have the
following features:
•Column 1: id (integer) a key for every alternative observed in the sample
•Column 2: name (text) a name for each alternative
choice
This attribute defines the choices. It has two styles:
•When set to a string, the string gives an expression evaluated on one of the two main tables that
identifies the choice for each case. If the expression is evaluated on the idco table, it should result
in integer values corresponding to the alternative codes. If the expression is evaluated on the idca
table, it should evaluate to 1 if the alternative for the particular row was chosen, and 0 otherwise. (For
certain specialized models, values other than 0 or 1 may be appropriate.) If the expression is part of
a valid query on both main tables, this attribute cannot be set directly, and must instead be set using
either set_choice_ca() or set_choice_co().
•When set to a dict, the dict should contain {integer:string} key/value pairs, where each key is an
integer value corresponding to an alternative code, and each value is a string identifying an expression
evaluated on the idco table; the result should contain a value indicating whether the alternative was
chosen. Usually this will be a binary dummy variable, although it need not be. For certain specialized
models, values other than 0 or 1 may be appropriate.
The choice of style is a matter of convenience; the same data can be expressed with either style as long as
the choices are binary, or if the first style names an expression in the idca table.
set_choice_ca(expr)
Set the choice expression that will evaluate on the idca table.
Parameters expr (str) – The expression to be evaluated. It should evaluate to 1 if the alternative for the particular row was chosen, and 0 otherwise. (For certain specialized models,
values other than 0 or 1 may be appropriate.)
set_choice_co(expr)
Set the choice expression that will evaluate on the idco table.
Parameters expr (str) – The expression to be evaluated. It should result in integer values
corresponding to the alternative codes.
avail
This attribute defines the availability of alternatives for each case. If set to True, then all alternatives are
available to all cases. Otherwise, this attribute should be either:
•a dict that contains {integer:string} key/value pairs, where each key is an integer value corresponding
to an alternative code, and each value is a string identifying an expression evaluated on the idco table;
that expression should evaluate to a value indicating whether the keyed alternative is available. This
must be a binary dummy variable.
•a string identifying an expression evaluated on the idca table; that expression should evaluate to a value
indicating whether the alternative for that row is available. This must be a binary dummy variable.
•If set to 1 or None, then the string “1” is used.
weight
This attribute gives an expression that is evaluated on the idco table to define the weight for each case.
Set it to an empty string, or 1.0, to assign all cases equal weight. The weight cannot be set based on the
idca table.
1.3.2 Data Storage and Access Using HDF5
An alternative data storage system is available for Larch, relying on the HDF5 format and the pytables package. This
system is made available through a DT object, which wraps a tables.File object.
Creating DT Objects
class larch.DT(filename, mode='a')
A wrapper for a pytables File used to get data for models.
This object wraps a tables.File, adding a number of methods designed specifically for working with
choice-based data used in Larch.
Parameters
• filename (str or None) – The filename of the HDF5/pytables to open. If None (the
default) a named temporary file is created to serve as the backing for an in-memory HDF5
file, which is very fast as long as you’ve got enough memory to store the whole thing.
• mode (str) – The mode used to open the H5F file. Common values are ‘a’ for append and
‘r’ for read only. See pytables for more detail.
• complevel (int) – The compression level to use for new objects created. By default no
compression is used, but substantial disk savings may be available by using it.
• inmemory (bool) – If True (defaults False), the H5FD_CORE driver is used and data will
not in general be written to disk until the file is closed, when all accumulated changes will
be written in a single batch. This can be fast if you have sufficient memory, but if an error
occurs all your intermediate changes can be lost.
• temp (bool) – If True (defaults False), the inmemory switch is activated and no changes
will be written to disk when the file is closed. This is automatically set to true if the filename
is None.
Warning: The normal constructor creates a DT object linked to an existing HDF5 file. Editing the object
edits the file as well.
Similar to the DB class, the DT class can be used with example data files.
static DT.Example(dataset='MTC', filename='{}.h5', temp=True)
Generate an example data object in memory.
Larch comes with a few example data sets, which are used in documentation and testing. This function copies
the data into a HDF5 file, which you can freely edit without damaging the original data.
Parameters
• dataset ({'MTC', 'SWISSMETRO', 'MINI', 'ITINERARY'}) – Which example dataset should be used.
• filename (str) – A filename to open the HDF5 file (even in-memory files need a name).
• temp (bool) – If true, the example database will be created in-memory; if temp is false, the
file will be dumped to disk when closed.
Returns An open connection to the HDF5 example data.
Return type DT
Importing Data
There are methods available to import data from external sources into the correct format for use with the larch DT
facility.
DT.import_idco(filepath_or_buffer, caseid_column=None, *args, **kwargs)
Import an existing CSV or similar file in idco format into this HDF5 file.
This function relies on pandas.read_csv() to read and parse the input data. All arguments other than those
described below are passed through to that function.
Parameters
• filepath_or_buffer (str or buffer) – This argument will be fed directly to the
pandas.read_csv() function.
• caseid_column (None or str) – If given, this is the column of the input data file to
use as caseids. It must be given if the caseids do not already exist in the HDF5 file. If it is
given and the caseids do already exist, a LarchError is raised.
Raises LarchError – If caseids exist and are also given; or if caseids do not exist and are not
given; or if the caseids are not integer values.
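For reference, an idco input is just a flat file with one row per case. A minimal sketch with the standard csv module (column names hypothetical) showing the shape import_idco expects:

```python
import csv
import io

# One row per case; the caseid_column argument names the 'caseid' column.
text = "caseid,age,income\n1,30,50000\n2,45,62000\n"
rows = list(csv.DictReader(io.StringIO(text)))
caseids = [int(row["caseid"]) for row in rows]
# caseids -> [1, 2]; every other column becomes an idco variable
```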
DT.import_idca(filepath_or_buffer, caseid_col, altid_col, choice_col=None, force_int_as_float=True,
chunksize=inf)
Import an existing CSV or similar file in idca format into this HDF5 file.
This function relies on pandas.read_csv() to read and parse the input data. All arguments other than those
described below are passed through to that function.
Parameters
• filepath_or_buffer (str or buffer) – This argument will be fed directly to the
pandas.read_csv() function.
• caseid_col (None or str) – If given, this is the column of the input data file to
use as caseids. It must be given if the caseids do not already exist in the HDF5 file. If it is
given and the caseids do already exist, a LarchError is raised.
• altid_col (None or str) – If given, this is the column of the input data file to use as
altids. It must be given if the altids do not already exist in the HDF5 file. If it is given and
the altids do already exist, a LarchError is raised.
• choice_col (None or str) – If given, use this column as the choice indicator.
• force_int_as_float (bool) – If True, data columns that appear to be integer values
will still be stored as double precision floats (defaults to True).
• chunksize (int) – The number of rows of the source file to read as a chunk. Reading
a giant file in moderate sized chunks can be much faster and less memory intensive than
reading the entire file.
Raises LarchError – Various errors.
Notes
Chunking may not work on Mac OS X due to a known bug in the pandas.read_csv function.
Required HDF5 Structure
To be used with Larch, the HDF5 file must have a particular structure. The group node structure is created automatically when you open a new DT object with a file that does not already have the necessary structure.
larch  (group node)
 |- caseids             dtype=Int64    shape=(N)
 |- screen              dtype=Bool     shape=(N)      [optional]
 |- alts  (group node)
 |   |- altids          dtype=Int64    shape=(A)
 |   |- names           dtype=Unicode  shape=(A)
 |- idco  (group node)
 |   |- ...various...                  shape=(N)
 |   |     (or a group node holding _index_ shape=(N) and _values_ shape=(?))
 |   |- _weight_        dtype=Float64  shape=(N)      [optional]
 |- idca  (group node)
     |- ...various...                  shape=(N,A)
     |     (or a group node holding _index_ shape=(N) and _values_ shape=(?,A))
     |- _choice_        dtype=Float64  shape=(N,A)
     |- _avail_         dtype=Bool     shape=(N,A)    [optional]

Data nodes show their dtype and shape; nodes marked [optional] may be omitted.
The details are as follows:
================================================================================
larch.DT Validation for MTC.h5 (with mode 'w')
-----+--------------------------------------------------------------------------
 >>> | There should be a designated `larch` group node under which all other
| nodes reside.
-----+--------------------------------------------------------------------------
     | CASES
>>> | Under the top node, there must be an array node named `caseids`.
>>> | The `caseids` array dtype should be Int64.
>>> | The `caseids` array should be 1 dimensional.
 + Case Filtering ----------------------------------------------------------
 >>> | If there may be some data cases that are not to be included in the
| processing of the discrete choice model, there should be a node named
| `screen` under the top node.
>>> | If it exists, `screen` must be a Bool array.
>>> | And `screen` must have the same shape as `caseids`.
-----+--------------------------------------------------------------------------
     | ALTERNATIVES
>>> | Under the top node, there should be a group named `alts` to hold
| alternative data.
>>> | Within the `alts` node, there should be an array node named `altids` to
| hold the identifying code numbers of the alternatives.
>>> | The `altids` array dtype should be Int64.
>>> | The `altids` array should be one dimensional.
>>> | Within the `alts` node, there should also be a VLArray node named `names`
| to hold the names of the alternatives.
>>> | The `names` node should hold unicode values.
>>> | The `altids` and `names` arrays should be the same length, and this will
| be the number of elemental alternatives represented in the data.
-----+--------------------------------------------------------------------------
     | IDCO FORMAT DATA
>>> | Under the top node, there should be a group named `idco` to hold that
| data.
>>> | Every child node name in `idco` must be a valid Python identifier (i.e.
| starts with a letter or underscore, and only contains letters, numbers,
| and underscores) and not a Python reserved keyword.
>>> | Every child node in `idco` must be (1) an array node with shape the same
| as `caseids`, or (2) a group node with child nodes `_index_` as an array
| with the correct shape and an integer dtype, and `_values_` such that
| _values_[_index_] reconstructs the desired data array.
 + Case Weights ------------------------------------------------------------
 >>> | If the cases are to have non-uniform weights, then there should be a
| `_weight_` node (or a name link to a node) within the `idco` group.
>>> | If weights are given, they should be of Float64 dtype.
-----+--------------------------------------------------------------------------
     | IDCA FORMAT DATA
>>> | Under the top node, there should be a group named `idca` to hold that
| data.
>>> | Every child node name in `idca` must be a valid Python identifier (i.e.
| starts with a letter or underscore, and only contains letters, numbers,
| and underscores) and not a Python reserved keyword.
>>> | Every child node in `idca` must be (1) an array node with the first
| dimension the same as the length of `caseids`, and the second dimension
| the same as the length of `altids`, or (2) a group node with child nodes
| `_index_` as a 1-dimensional array with the same length as the length of
| `caseids` and an integer dtype, and a 2-dimensional `_values_` with the
| second dimension the same as the length of `altids`, such that
| _values_[_index_] reconstructs the desired data array.
 + Alternative Availability ------------------------------------------------
 >>> | If there may be some alternatives that are unavailable in some cases,
| there should be a node named `_avail_` under `idca`.
>>> | If given as an array, it should contain an appropriately sized Bool array
| indicating the availability status for each alternative.
>>> | If given as a group, it should have an attribute named `stack` that is a
| tuple of `idco` expressions indicating the availability status for each
| alternative. The length and order of `stack` should match that of the
| altid array.
 + Chosen Alternatives -----------------------------------------------------
 >>> | There should be a node named `_choice_` under `idca`.
 >>> | If given as an array, it should be a Float64 array indicating the
     | chosenness for each alternative. Typically, this will take a value of 1.0 for
| the alternative that is chosen and 0.0 otherwise, although it is possible
| to have other values, including non-integer values, in some applications.
>>> | If given as a group, it should have an attribute named `stack` that is a
| tuple of `idco` expressions indicating the choice status for each
| alternative. The length and order of `stack` should match that of the
| altid array.
-----+--------------------------------------------------------------------------
     | OTHER TECHNICAL DETAILS
>>> | The set of child node names within `idca` and `idco` should not overlap
| (i.e. there should be no node names that appear in both).
================================================================================
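The `_index_`/`_values_` group-node scheme mentioned in the report stores only the distinct rows plus an index that maps each case back to a stored row. A minimal sketch of the reconstruction rule:

```python
# _values_ holds the stored rows; _index_ maps each case to a stored row.
_values_ = [10.5, 99.0, 3.25]
_index_ = [0, 0, 1, 2, 2, 1]   # one entry per case (length N)

# The desired full data array is _values_[_index_]:
reconstructed = [_values_[i] for i in _index_]
# reconstructed -> [10.5, 10.5, 99.0, 3.25, 3.25, 99.0]
```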
Note that the _choice_ and _avail_ nodes are special: they can be expressed as a stack of idco expressions instead of
as a single idca array. To do so, replace the array node with a group node, and attach a stack attribute that gives the
list of idco expressions. The list should match the list of alternatives. One way to do this automatically is to use the
avail_idco and choice_idco attributes of the DT.
To check if your file has the correct structure, you can use the validate function:
DT.validate(log=<built-in function print>, errlog=None)
Generate a validation report for this DT.
The generated report is fairly detailed and describes each requirement for a valid DT file and whether or not it
is met.
Parameters
• log (callable) – Typically “print”, but can be replaced with a different callable to accept
a series of unicode strings for each line in the report.
• errlog (callable or None) – By default, None. If not none, the report will print as
with log but only if there are errors.
DT.choice_idco
The stack manager for choice data in idco format.
To set a stack of idco expressions to represent choice data, assign a dictionary to this attribute with keys as
alternative codes and values as idco expressions.
You can also get and assign individual alternative values using the usual dictionary operations:
DT.choice_idco[key]          # get expression
DT.choice_idco[key] = value  # set expression
DT.avail_idco
The stack manager for avail data in idco format.
To set a stack of idco expressions to represent availability data, assign a dictionary to this attribute with keys as
alternative codes and values as idco expressions.
You can also get and assign individual alternative values using the usual dictionary operations:
DT.avail_idco[key]          # get expression
DT.avail_idco[key] = value  # set expression
1.3.3 Abstract Data Interface
Both DT and DB classes are derived from a common abstract base class, which defines a handful of important interface
functions.
class larch.core.Fountain
This object represents a source of data. It is an abstract base class from which both the DT and DB classes are
derived.
alternative_names() → ‘std::vector< std::string,std::allocator< std::string > >’
A vector of the alternative names used by this Fountain.
alternative_codes() → ‘std::vector< long long,std::allocator< long long > >’
A vector of the alternative codes (64 bit integers) used by this Fountain.
alternative_name(arg0: ‘long long’) → ‘std::string’
Given an alternative code, return the name.
alternative_code(arg0: ‘std::string’) → ‘long long’
Given an alternative name, return the code.
array_idco(*vars, dtype='float64')
Extract an array of idco Format data from the underlying data source. The vars arguments define what
data columns to extract, although the exact format and implementation is left to the implementing class.
array_idca(*vars, dtype='float64')
Extract an array of idca Format data from the underlying data source. The vars arguments define what
data columns to extract, although the exact format and implementation is left to the implementing class.
check_co(column)
Validate whether column is a legitimate input for Fountain.array_idco().
check_ca(column)
Validate whether column is a legitimate input for Fountain.array_idca().
variables_co()
Return a list of the natural columns of idco Format data available.
variables_ca()
Return a list of the natural columns of idca Format data available.
export_all(*arg, **kwarg)
Export all data (idca and idco) to one big idco format csv file.
This method takes the dataframe from dataframe_all() and writes it out to a csv file. All arguments
are passed through to pandas.DataFrame.to_csv(). No effort is made to prevent duplication of
data in this export (e.g. if there are idco variables stacked to make a single idca variable, these will appear
in the output twice).
dataframe_all()
Load all data (idca and idco) to one big idco format pandas.DataFrame.
No effort is made to prevent duplication of data in this DataFrame. (e.g. if there are idco variables stacked
to make a single idca variable, these will appear in the output twice). If there is a lot of data, this DataFrame
could be very large.
dataframe_idco(*vars, **kwargs)
Load a selection of idco Format data into a pandas.DataFrame.
This function passes all parameters through to Fountain.array_idco().
dataframe_idca(*vars, wide=False, **kwargs)
Load a selection of idca Format data into a pandas.DataFrame.
Parameters wide (bool) – If True (defaults False), the resulting data array will be pivoted to
be idco Format, with one row per case and a hierarchical columns definition.
This function passes all other parameters through to Fountain.array_idca().
1.4 Logit Models
The basic tool for analysis in Larch is a discrete choice model. A model is a structure that interacts data with a set of
ModelParameters.
1.4.1 Creating Model Objects
class larch.Model([d ])
Parameters d (Fountain) – The source data used to automatically populate model arrays. This
can be either a DB or DT object (or another data provider that inherits from the abstract
Fountain class). This parameter can be omitted, in which case data will not be loaded automatically and validation checks will not be performed when specifying data elements of the
model.
This object represents a discrete choice model. In addition to the methods described below, a Model also acts
a bit like a list of ModelParameter objects.
Model.Example(number=1)
Generate an example model object.
Parameters number (int) – The code number of the example model to load. Valid numbers
include {1,17,22,101,102,104,109,111,114}.
Larch comes with a few example models, which are used in documentation and testing. Models with numbers
greater than 100 are designed to align with the example models given for Biogeme.
1.4.2 Adding Parameters
Model.parameter(name[, value, null_value, initial_value, max, min, holdfast ])
Add a parameter to the model, or access an existing parameter.
Parameters
• name (str or larch.roles.ParameterRef) – The name for the parameter to add or
access
• value (float (optional)) – The value to set for the parameter. If initial_value is
not given, this value is also used as the initial value. If not given, 0 is assumed.
• null_value (float (optional)) – This is the assumed value for a “null” or no
information model. For utility parameters, this is typically 0 (the default). For logsum
parameters, the null value should usually be set to 1.
• initial_value (float (optional)) – It is possible to set initial_value separately
from the current value. This can be useful to reconstruct an already-estimated model.
• max (float (optional)) – If given, set a max bound for the parameter during the
estimation process.
• min (float (optional)) – If given, set a min bound for the parameter during the
estimation process.
• holdfast (int (optional)) – If nonzero, the parameter will be held fast (constrained) at the current value during estimation.
Returns ModelParameter
1.4.3 Using Model Objects
Model.maximize_loglike()
Find the likelihood maximizing parameters of the model, using the scipy.optimize module. Depending on the
model type and structure, various different optimization algorithms may be used.
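The objective being maximized is the sum over cases of the log of the chosen alternative's probability. For a simple multinomial logit model, that quantity can be sketched directly; the utility numbers below are made up for illustration only.

```python
import math

def mnl_loglike(utilities, chosen):
    """Sum of log MNL probabilities of the chosen alternatives."""
    ll = 0.0
    for u, c in zip(utilities, chosen):
        # log P(c) = U_c - log(sum_j exp(U_j))
        log_denom = math.log(sum(math.exp(v) for v in u))
        ll += u[c] - log_denom
    return ll

# Two cases, three alternatives each; case 0 chooses alt 1, case 1 alt 0.
ll = mnl_loglike([[0.0, 1.0, -0.5], [0.2, 0.2, 0.2]], [1, 0])
# ll is about -1.563
```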
Model.roll(filename=None, loglevel=20, cats='-', use_ce=False, sourcecode=True, **format)
Estimate a model and generate a report.
This method rolls together model estimation, reporting, and saving results into a single handy function.
Parameters
• filename (str, optional) – The filename into which the output report will be saved.
If not given, a temporary file will be created. If the given file already exists, a new file will
be created with a number appended to the base filename.
• loglevel (int, optional) – The log level that will be used while estimating the
model. Smaller numbers result in a more verbose log, the contents of which appear at the
end of the HTML report. See the standard Python logging module for more details.
• cats (list of str, or '-' or '*') – A list of report sections to include in the
report. The default is '-', which includes a minimal list of report sections. Giving '*' will
dump every available report section, which could be a lot and might take a lot of time (and
computer memory) to compute.
• sourcecode (bool) – If true (the default), this method will attempt to access the source
code of the file where this function was called, and insert the contents of that file into a
section of the resulting report. This is done because the source code may be more instructive
as to how the model was created, and how different (but related) future models might be
created.
Model.estimate()
Find the likelihood maximizing parameters of the model using the deprecated Larch optimization engine.
This engine has fewer algorithms available than scipy.optimize and may perform poorly for some
model types, particularly cross-nested and network GEV models. Users should almost always prefer the
Model.maximize_loglike() function instead.
Model.loglike([values ])
Find the log likelihood of the model.
Parameters values (array-like, optional) – If given, an array-like vector of values
should be provided that will replace the current parameter values. The vector must be exactly
as long as the number of parameters in the model (including holdfast parameters). If any holdfast parameter values differ in the provided values, the new values are ignored and a warning is
emitted to the model logger.
Model.d_loglike([values ])
Find the first derivative of the log likelihood of the model, with respect to the parameters.
Parameters values (array-like, optional) – If given, an array-like vector of values
should be provided that will replace the current parameter values. The vector must be exactly
as long as the number of parameters in the model (including holdfast parameters). If any holdfast parameter values differ in the provided values, the new values are ignored and a warning is
emitted to the model logger.
Returns An array of partial first derivatives with respect to the parameters, thus matching the size
of the parameter array.
Return type array
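Since both loglike and d_loglike accept a full parameter vector, a common sanity check is to compare the analytic derivatives against finite differences. The check itself is generic; the sketch below uses a stand-in quadratic in place of a real Model, so only the checking logic (not any Larch API) is shown:

```python
# Generic central-difference gradient check.  With a real Model one would
# compare m.d_loglike(x) against this numerical estimate at the same
# parameter vector; here a stand-in quadratic plays the role of loglike.

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation to the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x); up[i] += h
        dn = list(x); dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

def toy_loglike(x):
    # Stand-in for a log likelihood: concave, maximized at (1.0, -0.5).
    return -(x[0] - 1.0) ** 2 - 2.0 * (x[1] + 0.5) ** 2

g = numerical_gradient(toy_loglike, [0.0, 0.0])
# Analytic gradient at (0, 0) is (2.0, -2.0).
```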
1.4.4 GEV Network
Nested logit and Network GEV models have an underlying network structure.
Model.nest(id, name=None, parameter=None)
A function-like object mapping node codes to names and parameters.
This can be called as if it was a normal method of Model. It also is an object that acts like a dict with integer
keys representing the node code numbers and larch.core.LinearComponent values.
Parameters
• id (int) – The code number of the nest. Must be unique to this nest among the set of all
nests and all elemental alternatives.
• name (str or None) – The name of the nest. This name is used in various reports. It
can be any string but generally something short and descriptive is useful. If None, the name
is set to “nest_{id}”.
• parameter (str or None) – The name of the parameter to associate with this nest. If
None, the name is used.
Other Parameters
• parent (int, optional) – The code number of the parent node of the nest, for which a link
will automatically be created.
• parents (list of ints, optional) – A list of code numbers for the parent nodes of the nest, for
which links will automatically be created.
• children (list of ints, optional) – A list of code numbers for the child nodes of the nest, for
which links will automatically be created.
Returns The component object for the designated node
Return type larch.core.LinearComponent
Notes
Earlier versions of this software required node code numbers to be non-negative. They can now be any 64 bit
signed integer.
Because the id and name are distinct data types, Larch can detect (and silently allow) when they are transposed
(i.e. with name given before id).
Model.node
an alias for nest
Model.new_nest(nest_name=None, param_name='', branch=None, **kwargs)
Generate a new nest with a new unique code.
If you don’t want to bother managing the code numbers for nests and instead just work with them more abstractly,
this handy function allows you to create a new nest node without worrying about the code number; an otherwise
unused number will be selected for you (and returned by this method, so you can use it elsewhere).
Parameters
• nest_name (str or None) – The name of the nest. This name is used in various
reports. It can be any string but generally something short and descriptive is useful. If
None, the name is set to “nest_{id}”, although since you’re not picking your own id, this
might not be the best way to go.
• param_name (str) – The name of the parameter to associate with this nest. If not given,
or given as an empty string, the nest_name is used.
• branch (str or other immutable) – An optional label for the branch of the
network that this nest is in. The new code will be populated into the set at
model.branches[branch].
Other Parameters
• parent (int, optional) – The code number of the parent node of the nest, for which a link
will automatically be created.
• parents (list of ints, optional) – A list of code numbers for the parent nodes of the nest, for
which links will automatically be created.
• children (list of ints, optional) – A list of code numbers for the child nodes of the nest, for
which links will automatically be created.
Returns The code for the newly created nest.
Return type int
Notes
It may be convenient to give all of the parent and child linkages when calling this function, but it is not necessary,
as linkages can be created separately later.
Model.new_node()
an alias for new_nest()
Model.link(up_id, down_id)
A function-like object defining links between network nodes.
Parameters
• up_id (int) – The code number of the upstream (i.e. closer to the root node) node on the
link. This should never be an elemental alternative.
• down_id (int) – The code number of the downstream node on the link. This can be an
elemental alternative.
Model.edge
an alias for link
Model.root_id
The root_id is the code number for the root node in a nested logit or network GEV model. The default value
for the root_id is 0. It is important that the root_id be different from the code for every elemental alternative
and intermediate nesting node. If it is convenient for one of the elemental alternatives or one of the intermediate
nesting nodes to have a code number of 0 (e.g., for a binary logit model where the choices are yes and no), then
this value can be changed to some other integer.
Model.graph
A networkx.DiGraph() representing the nesting structure.
You can use this DiGraph to explore the network structure, and use standard networkx tools to describe and
iterate over the graph. Note that this is a read-only attribute; changes to network (nesting) structure must be
made using Model.link and Model.nest.
Raises ImportError – If the networkx module is not installed.
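What the DiGraph encodes is simply the directed links from the root node, through any nests, down to the elemental alternatives. As a schematic stand-in (plain dicts and hypothetical node codes, not the Larch API), the structure and a typical traversal look like this:

```python
# Schematic stand-in for the nesting structure that Model.graph exposes as a
# networkx.DiGraph.  Node codes here are hypothetical: root 0 links to two
# nests (10, 20), which link to elemental alternatives (1, 2, 5, 6).
edges = {
    0: [10, 20],   # root -> nests
    10: [1, 2],    # motorized nest -> elemental alternatives
    20: [5, 6],    # nonmotorized nest -> elemental alternatives
}

def descendants(node, edges):
    """All nodes reachable downstream of `node`, depth-first."""
    found = []
    for child in edges.get(node, []):
        found.append(child)
        found.extend(descendants(child, edges))
    return found

# Elemental alternatives are exactly the nodes with no outgoing links.
leaves = [n for n in descendants(0, edges) if n not in edges]
```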
1.4.5 Reporting Tools
Model.title
This is a descriptive title to attach to this model. It is used in certain reports, and can be set to any string. It has
no bearing on the numerical representation of the model.
Model.reorder_parameters(*ordering)
Reorder the parameters in the model.
This method reorders the model parameters as they appear in the model. It should have no material impact on
the model results, although it may be convenient for presentation.
The ordering is defined by a series of regular expressions (regex). For each regex, all of the parameter names
matching that regex are grouped together and moved to the front of the list, retaining their original ordering
within the group. Any subsequent matches for the same parameter are ignored. All unmatched parameters
retain their original ordering and move to the end of the list as a group.
Parameters ordering (list or tuple of str) – A list of regex expressions.
Examples
>>> import larch
>>> m = larch.Model()
>>> m.parameter("A1", value=1)
ModelParameter('A1', value=1.0)
>>> m.parameter("B1", value=2)
ModelParameter('B1', value=2.0)
>>> m.parameter("C1", value=3)
ModelParameter('C1', value=3.0)
>>> m.parameter("A2", value=4)
ModelParameter('A2', value=4.0)
>>> m.parameter("B2", value=5)
ModelParameter('B2', value=5.0)
>>> m.parameter("C2", value=6)
ModelParameter('C2', value=6.0)
>>> m.parameter_names()
['A1', 'B1', 'C1', 'A2', 'B2', 'C2']
>>> m.parameter_values()
(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
>>> m.reorder_parameters('A','C')
>>> m.parameter_names()
['A1', 'A2', 'C1', 'C2', 'B1', 'B2']
>>> m.parameter_values()
(1.0, 4.0, 3.0, 6.0, 2.0, 5.0)
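The grouping rule illustrated above (each regex in turn pulls its matches to the front, preserving their original relative order, with unmatched parameters left at the end) can be sketched in plain Python with the re module. This is an illustration of the documented behavior, not Larch's internal implementation:

```python
import re

def reorder(names, *ordering):
    """Each regex pulls its matches to the front in turn, preserving the
    original order within each group; unmatched names stay at the end."""
    taken = set()
    result = []
    for pattern in ordering:
        for name in names:
            if name not in taken and re.match(pattern, name):
                result.append(name)
                taken.add(name)
    result.extend(n for n in names if n not in taken)
    return result

order = reorder(['A1', 'B1', 'C1', 'A2', 'B2', 'C2'], 'A', 'C')
# -> ['A1', 'A2', 'C1', 'C2', 'B1', 'B2'], as in the doctest above
```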
1.4.6 Related Classes
Model Parameters
class larch.ModelParameter(model, index)
A ModelParameter is a reference object, referring to a Model and a parameter index. Unlike a
roles.ParameterRef, a ModelParameter is explicitly bound to a specific Model, and edits to attributes of a ModelParameter automatically pass through to the underlying Model. These attributes support
both reading and writing:
value
the current value for the parameter
null_value
the null value for the parameter (used for null models and t-stats)
initial_value
the initial value of the parameter
min_value
the min bound for the parameter during estimation
max_value
the max bound for the parameter during estimation
holdfast
a flag indicating if the parameter value should be held fast (constrained to keep its value) during estimation
These attributes are read-only:
name
the parameter name (read-only)
index
the parameter index within the model (read-only)
t_stat
the t-statistic for the estimator (read-only)
std_err
the standard error of the estimator (read-only)
robust_std_err
the robust standard error of the estimator via the BHHH sandwich estimator (read-only)
covariance
the covariance of the estimator (read-only)
robust_covariance
the robust covariance of the estimator via the BHHH sandwich estimator (read-only)
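The read-only statistics are derived from the estimation results. The t_stat, for example, measures how far the estimate sits from its null_value in standard-error units; the formula below is the standard one, and the numbers are purely illustrative:

```python
# Standard t-statistic arithmetic implied by the attributes above; the
# numbers are purely illustrative, not estimates from any actual model.

def t_stat(value, null_value, std_err):
    """Distance of the estimate from its null value, in standard errors."""
    return (value - null_value) / std_err

t = t_stat(value=-0.0513, null_value=0.0, std_err=0.0031)
```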
class larch.ParameterManager
The ParameterManager class provides the interface to interact with various model parameters. You can call a
ParameterManager like a method, to add a new parameter to the model or to access an existing parameter. You
can also use it with square brackets, to get and set ModelParameter items.
When called as a method, in addition to the required parameter name, you can specify other ModelParameter
attributes as keyword arguments.
When getting or setting items (with square brackets) you can give the parameter name or integer index.
See the Model section for examples.
Linear Parts
Unlike other discrete choice tools (notably Biogeme), which allow for the creation of a variety of arbitrary non-linear
functions, Larch relies heavily on linear functions.
class larch.core.LinearComponent(data="", param="", multiplier=1.0, category=None)
A combination of a parameter and data.
This class represents a single term of a linear function, i.e. a parameter multiplied by some data. The data may
be a single column of raw data from a data Fountain, or it may be some prescribed function of raw data (e.g.
logarithm of cost, or cost divided by income); the principal requirement is that the data function contains only
data and no parameters to be estimated, other than the single linear coefficient.
Parameters
• param (str or ParameterRef) – The name of, or reference to, a parameter.
• data (str or DataRef) – The name of, or reference to, some data. This may be a raw
column in a data Fountain, or an expression that can be evaluated, including a number
expressed as a string. To express a constant (i.e. a parameter with no data) give 1.0.
• multiplier (float) – A convenient method to multiply the data by a constant, which
can be given as a float instead of a string.
• category (None or int or string or tuple) – Some LinearComponents apply only in certain contexts; this optional label identifies which.
data
The data associated with this LinearComponent, expressed as a DataRef. You can assign a numerical
value or a plain str to this attribute as well.
param
The parameter associated with this LinearComponent, expressed as a ParameterRef. You can
assign a plain str that names a parameter to this attribute as well. Parameter names are case-sensitive.
In addition to creating a LinearComponent using the regular constructor, you can also create these objects by
multiplying a ParameterRef and a DataRef. For example:
>>> from larch.roles import P,X
>>> P.TotalCost * X.totcost
LinearComponent(data='totcost', param='TotalCost')
class larch.core.LinearFunction
This class is a specialized list of LinearComponent objects, which are summed together during evaluation.
Instead of creating a LinearFunction through a constructor, it is better to create one simply by adding multiple
LinearComponent objects:
>>> from larch.roles import P,X
>>> u1 = P.TotalCost * X.totcost
>>> u2 = P.InVehTime * X.ivt
>>> u1 + u2
<LinearFunction with length 2>
= LinearComponent(data='totcost', param='TotalCost')
+ LinearComponent(data='ivt', param='InVehTime')
>>> lf = u1 + u2
>>> lf += P.OutOfVehTime * X.ovt
>>> lf
<LinearFunction with length 3>
= LinearComponent(data='totcost', param='TotalCost')
+ LinearComponent(data='ivt', param='InVehTime')
+ LinearComponent(data='ovt', param='OutOfVehTime')
You can also add a ParameterRef by itself to a LinearFunction,
>>> lf += P.SomeConstant
>>> lf
<LinearFunction with length 4>
= LinearComponent(data='totcost', param='TotalCost')
+ LinearComponent(data='ivt', param='InVehTime')
+ LinearComponent(data='ovt', param='OutOfVehTime')
+ LinearComponent(data='1', param='SomeConstant')
You cannot, however, add a DataRef by itself:
>>> lf += X.PlainData
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: Wrong number or type of arguments...
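The P.name * X.name idiom used above relies on operator overloading: multiplying a parameter reference by a data reference yields a term, and adding terms builds up the list. A stripped-down sketch of those mechanics (toy classes with hypothetical names, not larch.roles) might look like:

```python
# Toy sketch of the operator-overloading mechanics behind P.name * X.name.
# The real larch.roles classes are richer; names here are hypothetical.

class LinearComponent:
    def __init__(self, param, data):
        self.param, self.data = param, data

    def __add__(self, other):
        # component + component -> a two-term linear function
        return LinearFunction([self, other])

class LinearFunction:
    def __init__(self, parts):
        self.parts = list(parts)

    def __add__(self, other):
        # append either a single component or another function's parts
        extra = [other] if isinstance(other, LinearComponent) else other.parts
        return LinearFunction(self.parts + extra)

    def __len__(self):
        return len(self.parts)

class ParameterRef(str):
    def __mul__(self, data):
        # parameter reference times data reference yields one linear term
        return LinearComponent(param=str(self), data=str(data))

class DataRef(str):
    pass

class _RefMaker:
    """Attribute access mints references, mimicking larch.roles P and X."""
    def __init__(self, cls):
        self._cls = cls

    def __getattr__(self, name):
        return self._cls(name)

P, X = _RefMaker(ParameterRef), _RefMaker(DataRef)
u = P.TotalCost * X.totcost + P.InVehTime * X.ivt   # a two-term linear function
```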
class larch.core.LinearBundle
A LinearBundle represents a bundle of linear terms that, when combined, form a complete linear relationship
from data to a modeled factor.
ca
The ca attribute is a single LinearFunction that can be applied for all alternatives. Depending on the
data source used, the data used in this function might need to be exclusively from the idca Format data
(e.g. for DB), or it could be a combination of idca Format and idco Format data (e.g. for DT)
co
The co attribute is a mapping (like a dict), where the keys are alternative codes and the values are
LinearFunction of idco Format data for each alternative. If an alternative is omitted, the implied
value of the LinearFunction is zero.
As a convenience, the LinearBundle object also provides __getitem__ and __setitem__ functions that pass
through to the co attribute.
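The division of labor between ca and co can be sketched with plain Python: one shared linear function applied to each alternative's own data, plus an optional per-alternative function of the case's data, with omitted alternatives contributing zero. All names and coefficients below are illustrative, not the Larch API:

```python
# Toy evaluation of a LinearBundle-style utility: a shared `ca` part applied
# to each alternative's own data, plus a per-alternative `co` part of the
# case's data, contributing zero for omitted alternatives.

def utility(alt, idca_row, idco_row, ca_coefs, co_funcs):
    u = sum(coef * idca_row[var] for var, coef in ca_coefs.items())
    if alt in co_funcs:            # omitted alternatives imply zero
        u += co_funcs[alt](idco_row)
    return u

ca_coefs = {'totcost': -0.005, 'tottime': -0.05}          # shared across alternatives
co_funcs = {2: lambda co: -2.18 - 0.002 * co['hhinc']}    # only alternative 2 has a co part

u1 = utility(1, {'totcost': 100.0, 'tottime': 20.0}, {'hhinc': 60.0}, ca_coefs, co_funcs)
u2 = utility(2, {'totcost': 80.0, 'tottime': 30.0}, {'hhinc': 60.0}, ca_coefs, co_funcs)
```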
MetaModels
A MetaModel is an object that encapsulates a collection of Model objects, for which the parameters are to be
estimated simultaneously.
The basic tool for analysis in Larch is a discrete choice model. A model is a structure that links data with a set of
ModelParameters.
Creating MetaModel Objects
class larch.MetaModel(segment_descriptors=None, submodel_factory=None, args=())
The MetaModel is actually a subclass of Model, so much of that class's functionality (notably, interacting with
parameters) is inherited here. The MetaModel overloads the loglike, d_loglike, d2_loglike, and bhhh methods of
Model in the manner you might expect: the overloaded functions return the composite values, totaled across all
submodels.
1.5 Roles
In some situations (particularly when dealing with multiple related models) it can be advantageous to access model
parameters and data via roles. These are special references that are able to pull values from models, or assume default
values, without actually changing the models.
1.5.1 Parameter References
class larch.roles.ParameterRef(name, default=None, fmt=None)
An abstract reference to a parameter, which may or may not be included in any given model.
Parameters
• name (str) – The name of the parameter to reference.
• default (numeric or None) – When a targeted model does not include the referenced
parameter, use this value for value() and str(). If given as None, those methods will
raise an exception.
• fmt (str or None) – The format string to use for str().
For convenience, ParameterRef is aliased as P in the roles module. After a ParameterRef object is created, its
attributes can be modified using these methods:
ParameterRef.name(n)
Set or re-set the name of the parameter referenced.
Parameters n (str) – The new name of the parameter.
Returns This method returns the self object, to facilitate method chaining.
Return type ParameterRef
ParameterRef.default_value(val)
Set or re-set the default value.
Parameters val (numeric or None) – The new default value. This will be used by value()
when the parameter is not included in the targeted model. Set to None to raise an exception
instead.
Returns This method returns the self object, to facilitate method chaining.
Return type ParameterRef
ParameterRef.fmt(format)
Set or re-set the format string for the parameter referenced.
Parameters format (str) – The new format string. This will be used to format the parameter
value when calling str().
Returns This method returns the self object, to facilitate method chaining.
Return type ParameterRef
You can access a referenced parameter from a model using these methods:
ParameterRef.value(m)
The value of the parameter in a given model.
Parameters m (Model) – The model from which to extract a parameter value.
Raises LarchError – When the model does not contain a parameter with the same name as this
ParameterRef, and the default_value for this ParameterRef is None.
ParameterRef.valid(m)
Check if this ParameterRef would give a value for a given model.
Parameters m (Model) – The model from which to extract a parameter value.
Returns False if the value method would raise an exception, and True otherwise.
Return type bool
ParameterRef.str(m, fmt=None)
Gives the value() of the parameter in a given model as a string.
The string is formatted using the fmt() string if given. If not given, Python's default string formatting is used.
Parameters
• m (Model) – The model from which to extract a parameter value.
• fmt (str or None) – If not None, a format string which will be used, overriding any
previous setting by a fmt() command. The override is valid for this method call only, and
does not change the format for future calls.
Raises LarchError – When the model does not contain a parameter with the same name as this
ParameterRef, and the default_value for this ParameterRef is None.
Math using Parameter References
You can do some simple math with ParameterRef objects. Currently supported are addition, subtraction, multiplication, and division. This allows, for example, the automatic calculation of values of time for a model:
>>> m = larch.Model.Example(1, pre=True)
>>> from larch.roles import P
>>> VoT_cents_per_minute = P.tottime / P.totcost
>>> VoT_cents_per_minute.value(m)
10.434...
>>> print("The implied value of time is", VoT_cents_per_minute.str(m, fmt="{:.1f}¢/minute"))
The implied value of time is 10.4¢/minute
>>> VoT_dollars_per_hour = (P.tottime * 60) / (P.totcost * 100)
>>> print("The implied value of time is", VoT_dollars_per_hour.str(m, fmt="${:.2f}/hr"))
The implied value of time is $6.26/hr
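The two figures above are consistent under unit conversion: cents per minute times 60 minutes per hour, divided by 100 cents per dollar, gives dollars per hour, which is exactly what the second expression computes. As plain arithmetic:

```python
# ¢/minute × (60 min/hr) ÷ (100 ¢/$) = $/hr
vot_cents_per_minute = 10.434                      # value shown in the doctest above
vot_dollars_per_hour = vot_cents_per_minute * 60 / 100
# about 6.26, matching the "$6.26/hr" formatted output above
```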
1.5.2 Data References
class larch.roles.DataRef
1.6 Reporting
Model.report(style, *args, filename=None, tempfile=False, **kwargs)
Generate a model report.
Larch is capable of generating reports in four basic formats: text, xhtml, docx, and latex. This function serves as
a pass-through, to call the report generator of the given style, and optionally to save the results to a file.
Parameters style (['txt', 'docx', 'latex', 'html', 'xml']) – The style of output. Both 'html' and 'xml' will call the xhtml generator; the only difference is the output
type, with html generating a finished [x]html document and xml generating an etree for further
processing by Python directly.
Other Parameters
• filename (str or None) – If given, then a new stack file is created by
util.filemanager.open_stack(), the output is generated in this file, and
the file-like object is returned.
• tempfile (bool) – If True then a new util.TemporaryFile is created, the output is
generated in this file, and the file-like object is returned.
• throw_exceptions (bool) – If True, exceptions are thrown if raised while generating the
report. If False (the default) tracebacks are printed directly into the report for each section
where an exception is raised. Setting this to True can be useful for testing.
Raises LarchError – If both filename and tempfile evaluate as True.
class larch.report.Category(name, *members)
Defines categories of parameters to be used in report generation.
Parameters
• name (str) – A descriptive category name that will be used to label the category in a report.
• members (tuple) – The various members of this category, which can be parameter names
(as str), or other (sub-)Category objects.
class larch.report.Rename(name, *members)
Defines an alternate display name for parameters to be used in report generation.
Often the names of parameters actually used in the estimation process are abbreviated or arcane, especially when
the source data or legacy models are older and not compatible with longer and more descriptive names. But
modern report generation can be enhanced by using those longer and more descriptive names.
This function allows a descriptive name to be attached to one or more parameters; attaching the same name to
different parameters allows those parameters to be linked together and appear on the same line of multi-model
reports.
Parameters
• name (str) – A descriptive name that will be used to label the parameter in a report.
• members (tuple) – The various members of this category, which should be parameter
names.
1.6.1 Reporting in Text Format
Model.txt_report(cats=['title', 'params', 'LL', 'latest'], throw_exceptions=False, **format)
Generate a model report in text format.
Parameters
• cats (list of str, or '*') – A list of the report components to include. Use '*'
to include every possible component for the selected output format.
• throw_exceptions (bool) – If True, exceptions are thrown if raised while generating
the report. If False (the default) tracebacks are printed directly into the report for each
section where an exception is raised. Setting this to True can be useful for testing.
Returns The report content. You need to save it to a file on your own, if desired.
Return type str
1.6.2 Reporting in HTML Format
You can either use the pre-made Model.xhtml_report() to generate a report on a model, or you can roll your
own using a combination of xhtml_* components and custom elements. For example, you might make a custom table
to hold some facts about your model or data:
>>> from larch.util.xhtml import XML_Builder
>>> def report_valueoftime(m):
...     from larch.roles import P
...     VoT_cents_per_minute = P.tottime / P.totcost
...     VoT_dollars_per_hour = (P.tottime * 60) / (P.totcost * 100)
...     x = XML_Builder("div", {'class':"value_of_time"})
...     x.h2("Implied Value of Time", anchor=1)
...     with x.block("table"):
...         with x.block("tr"):
...             x.th("Units")
...             x.th("Value")
...         with x.block("tr"):
...             x.td("cents per minute")
...             x.td(VoT_cents_per_minute.str(m, fmt="{:.1f}\xa2/minute"))
...         with x.block("tr"):
...             x.td("dollars per hour")
...             x.td(VoT_dollars_per_hour.str(m, fmt="${:.2f}/hr"))
...     return x.close()
...
Then you could incorporate that table into a model report like this:
>>> from larch.util.xhtml import XHTML
>>> m = larch.Model.Example(1, pre=True)
>>> with XHTML(quickhead=m) as f:
...     f.append( m.xhtml_title() )
...     f.append( report_valueoftime(m) )
...     f.append( m.xhtml_params() )
...     print(f.dump())
...
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" ...'
Instead of printing to the console, you can send the output to a file and open it in your favorite browser.
There are a variety of xhtml components that can be used in rolling your own report writing:
Model.xhtml_report(cats=['title', 'params', 'LL', 'latest'], raw_xml=False, throw_exceptions=False, **format)
Generate a model report in xhtml format.
Parameters
• cats (list of str, or '*') – A list of the report components to include. Use '*'
to include every possible component for the selected output format.
• raw_xml (bool) – If True, the resulting output is returned as a div Elem containing a
subtree of the entire report. Otherwise, the results are compiled into a single bytes object
representing a complete html document.
• throw_exceptions (bool) – If True, exceptions are thrown if raised while generating
the report. If False (the default) tracebacks are printed directly into the report for each
section where an exception is raised. Setting this to True can be useful for testing.
Returns The report content. You need to save it to a file on your own, if desired.
Return type bytes or larch.util.xhtml.Elem
Example
>>> m = larch.Model.Example(1, pre=True)
>>> from larch.util.temporaryfile import TemporaryHtml
>>> html = m.xhtml_report()
>>> html
b'<!DOCTYPE html ...>'
Model.xhtml_title(**format)
Generate a div element containing the model title in a H1 tag.
The title used is taken from the title of the model. There are no format keywords that are relevant for this
method.
Returns A div containing the model title.
Return type larch.util.xhtml.Elem
Model.xhtml_params(groups=None, display_inital=False, **format)
Generate a div element containing the model parameters in a table.
Parameters
• groups (None or list) – An ordered list of parameters names and/or categories. If
given, this list will be used to order the resulting table.
• display_inital (bool) – Should the initial values of the parameters (the starting point
for estimation) be included in the report. Defaults to False.
Returns A div containing the model parameters.
Return type larch.util.xhtml.Elem
Example
>>> from larch.util.pmath import category, rename
>>> from larch.util.xhtml import XHTML
>>> m = larch.Model.Example(1, pre=True)
>>> param_groups = [
...     category('Level of Service',
...              rename('Total Time', 'tottime'),
...              rename('Total Cost', 'totcost') ),
...     category('Alternative Specific Constants',
...              'ASC_SR2',
...              'ASC_SR3P',
...              'ASC_TRAN',
...              'ASC_BIKE',
...              'ASC_WALK' ),
...     category('Income',
...              'hhinc#2',
...              'hhinc#3',
...              'hhinc#4',
...              'hhinc#5',
...              'hhinc#6' ),
... ]
>>> with XHTML(quickhead=m) as f:
...     f.append( m.xhtml_title() )
...     f.append( m.xhtml_params(param_groups) )
...     html = f.dump()
>>> html
b'<!DOCTYPE html ...>'
Model.xhtml_ll(**format)
Generate a div element containing the model estimation statistics.
Returns A div containing the model parameters.
Return type larch.util.xhtml.Elem
Example
>>> from larch.util.xhtml import XHTML
>>> m = larch.Model.Example(1, pre=True)
>>> with XHTML(quickhead=m) as f:
...     f.append(m.xhtml_title())
...     f.append(m.xhtml_ll())
...     html = f.dump()
>>> html
b'<!DOCTYPE html ...>'
Model.xhtml_data(**format)
Generate a div element containing the summary statistics for choice and availability.
Note that the choice and availability must be provisioned (loaded into the model) to generate these summary
statistics.
Returns A div containing the summary statistics for choice and availability.
Return type larch.util.xhtml.Elem
Example
>>> from larch.util.xhtml import XHTML
>>> m = larch.Model.Example(1, pre=True)
>>> m.df = larch.DT.Example('MTC')
>>> m.provision()
>>> with XHTML(quickhead=m) as f:
...     f.append(m.xhtml_title())
...     f.append(m.xhtml_data())
...     html = f.dump()
>>> html
b'<!DOCTYPE html ...>'
Model.xhtml_utilitydata(**format)
Summary statistics for the data used in the utility function.
Note that the utility data must be provisioned (loaded into the model) to generate these summary statistics.
Returns A div containing the summary statistics for the utility data.
Return type larch.util.xhtml.Elem
Example
>>> from larch.util.xhtml import XHTML
>>> m = larch.Model.Example(1, pre=True)
>>> m.df = larch.DT.Example('MTC')
>>> m.provision()
>>> with XHTML(quickhead=m) as f:
...     f.append(m.xhtml_title())
...     f.append(m.xhtml_utilitydata())
...     html = f.dump()
>>> html
b'<!DOCTYPE html ...>'
1.6.3 Reporting Multiple Models in a Single Consolidated Report
Larch includes a facility to report multiple models side-by-side in a single consolidated report.
larch.report.multireport(models_or_filenames, params=(), ratios=[], *, filename=None, overwrite=False, spool=True, title=None)
Generate a combined report on a number of (probably related) models.
Parameters
• models_or_filenames (iterable) – A list of models, given either as str containing
a path to a file that can be loaded as a Model, or pre-loaded Model objects.
• params (iterable) – An ordered list of parameters names and/or categories. If given,
this list will be used to order the resulting table.
• ratios (iterable) – An ordered list of factors to evaluate.
Other Parameters
• filename (str) – The file into which to save the multireport
• overwrite (bool) – If filename exists, should it be overwritten (default False).
• spool (bool) – If filename exists, should the report file be spooled into a similar filename.
• title (str) – An optional title for the report.
To make the consolidated report reasonably legible, we will organize the parameters into Category groups, and
use the Rename facility to make sure that the parameter names (which might sometimes vary from model to model,
even when they apply in the same manner) line up correctly.
from larch.report import multireport, Category, Rename

cat_ASC = Category("Alternative Specific Constants",
    Rename('Shared Ride 2',  'ASC_SR2'),
    Rename('Shared Ride 3+', 'ASC_SR3+', 'ASC_SR3P'),
    Rename('Transit',        'ASC_TRAN', 'ASC_Tran'),
    Rename('Bike',           'ASC_BIKE', 'ASC_Bike'),
    Rename('Walk',           'ASC_WALK', 'ASC_Walk'),
)

cat_LOS = Category("Level of Service",
    Rename('Total Cost',               'totcost'),
    Rename('Total Time',               'tottime'),
    Rename('Cost / Income',            'costbyincome'),
    Rename('Motorized IVT',            'motorized_time'),
    Rename('Motorized OVT / Distance', 'motorized_ovtbydist'),
    Rename('Non-Motorized Time',       'nonmotorized_time'),
)

cat_HHIncome = Category("Household Income",
    Rename('Drive Alone',    'hhinc#1'),
    Rename('Shared Ride 2',  'hhinc#2'),
    Rename('Shared Ride 3+', 'hhinc#3'),
    Rename('Transit',        'hhinc#4'),
    Rename('Bike',           'hhinc#5'),
    Rename('Walk',           'hhinc#6'),
)

cat_VehPerWork = Category("Vehicles per Worker",
    Rename('Drive Alone',    'vehbywrk_DA'),
    Rename('Shared Ride',    'vehbywrk_SR'),
    Rename('Shared Ride 2',  'vehbywrk_SR2'),
    Rename('Shared Ride 3+', 'vehbywrk_SR3+'),
    Rename('Transit',        'vehbywrk_TRAN', 'vehbywrk_Tran'),
    Rename('Bike',           'vehbywrk_BIKE', 'vehbywrk_Bike'),
    Rename('Walk',           'vehbywrk_WALK', 'vehbywrk_Walk'),
)

cat_EmpDen = Category("Work Zone Employment Density",
    Rename('Drive Alone',    'wkempden_DA'),
    Rename('Shared Ride 2',  'wkempden_SR2'),
    Rename('Shared Ride 3+', 'wkempden_SR3+'),
    Rename('Transit',        'wkempden_TRAN', 'wkempden_Tran'),
    Rename('Bike',           'wkempden_BIKE', 'wkempden_Bike'),
    Rename('Walk',           'wkempden_WALK', 'wkempden_Walk'),
)

cat_CBD = Category("Work Zone in CBD",
    Rename('Drive Alone',    'wkcbd_DA'),
    Rename('Shared Ride 2',  'wkcbd_SR2'),
    Rename('Shared Ride 3+', 'wkcbd_SR3+', 'wkcbd_SR3P'),
    Rename('Transit',        'wkcbd_TRAN', 'wkcbd_Tran'),
    Rename('Bike',           'wkcbd_BIKE', 'wkcbd_Bike'),
    Rename('Walk',           'wkcbd_WALK', 'wkcbd_Walk'),
)
Having defined all our categories, we can organize them into a single list:
cat = [
cat_ASC,
cat_LOS,
cat_HHIncome,
cat_VehPerWork,
cat_EmpDen,
cat_CBD,
]
Then it's just a matter of loading the models we will report on, and generating the multireport.
m1 = larch.Model.Example(1)
m1.maximize_loglike()
m2 = larch.Model.Example(17)
m2.maximize_loglike()
m1.title = "Model 1"
m2.title = "Model 17"
htmlreport = multireport([m1,m2], params=cat, ratios=(), title="Model Compare")
Save htmlreport to a file and you can open it in your favorite browser:
Tip: If you want access to the report in this example without worrying about assembling all the code blocks together
on your own, you can load the HTML report as a string like this:
htmlreport = larch.examples.reproduce(801,'htmlreport')
1.7 Miscellaneous Tools
1.7.1 File Management
larch.util.filemanager.open_stack(filename=None, *arg, format='{basename:s}.{number:03d}{extension:s}', suffix=None, **kwarg)
Opens the next file in this stack for writing.
Parameters filename (str, optional) – The base file name to use for this stack. New files
will have a number appended after the basename but before the dot extension. For example, if
the filename is “/tmp/boo.txt”, the first file created will be named “/tmp/boo.001.txt”. If None,
which is the default, then a temporary file is created instead.
Other Parameters
• suffix (str, optional) – If given, use this file extension instead of any extension given in the
filename argument. The unusual use case for this parameter is when filename is None, and a
temporary file of a particular kind is desired.
• format (str, optional) – If given, use this format string to generate new stack file names in a
different format.
Notes
Other positional and keyword arguments are passed through to the normal Python open() function.
The returned file-like object also has an extra view method, which will open the file in the default web browser.
larch.util.filemanager.next_stack(filename, format='{basename:s}.{number:03d}{extension:s}', suffix=None, plus=0, allow_natural=False)
Finds the next file name in this stack that does not yet exist.
Parameters filename (str or None) – The base file name to use for this stack. New files
would have a number appended after the basename but before the dot extension. For example, if
the filename is “/tmp/boo.txt”, the first file created will be named “/tmp/boo.001.txt”. If None,
then a temporary file is created instead.
Other Parameters
• suffix (str, optional) – If given, use this file extension instead of any extension given in the
filename argument. The unusual use case for this parameter is when filename is None, and a
temporary file of a particular kind is desired.
• format (str, optional) – If given, use this format string to generate new stack file names in a
different format.
• plus (int, optional) – If given, increase the returned file number by this amount more than
what is needed to generate a new file. This can be useful with pytables.
• allow_natural (bool) – If true, this function will return the unedited filename parameter if
that file does not already exist. Otherwise the returned name will always have a number appended.
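The naming convention these two functions share can be sketched with a minimal re-implementation (stack_name is an illustrative helper written for this example, not part of larch):

```python
import os

def stack_name(filename, number,
               format='{basename:s}.{number:03d}{extension:s}'):
    """Build a numbered stack file name following the documented convention.

    Illustrative only -- not the larch implementation.
    """
    basename, extension = os.path.splitext(filename)
    return format.format(basename=basename, number=number, extension=extension)

print(stack_name("/tmp/boo.txt", 1))   # /tmp/boo.001.txt
```

The real functions layer existence checks (and, for open_stack, an actual open() call) on top of this naming scheme.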
1.7.2 System Info
larch.util.sysinfo.get_processor_name()
Get a descriptive name of the CPU on this computer
1.7.3 XHTML
class larch.util.xhtml.Elem(tag, attrib={}, text=None, **extra)
Extends xml.etree.ElementTree.Element
class larch.util.xhtml.XML_Builder(tag=None, attrib={}, **extra)
Extends xml.etree.ElementTree.TreeBuilder
class larch.util.xhtml.XHTML(filename=None, *, overwrite=False, spool=True, quickhead=None, css=None, view_on_exit=True)
A class used to conveniently build xhtml documents.
1.8 Mathematics of Logit Choice Modeling
This documentation will eventually provide some instruction on the underlying mathematics of logit models. For
example:
    P(i) = exp(V_i) / Σ_j exp(V_j)

with V_i = β X_i.
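Numerically, the formula is easy to check with a few lines of plain Python (an illustrative computation, not part of larch):

```python
import math

# Systematic utilities V_i for three hypothetical alternatives
V = [1.0, 0.5, -0.2]

# Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)
denom = sum(math.exp(v) for v in V)
P = [math.exp(v) / denom for v in V]

# The probabilities are positive, sum to one, and preserve the utility ordering
assert abs(sum(P) - 1.0) < 1e-12
```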
1.9 Examples
1.9.1 MTC Examples
Example Data
The MTC sample dataset is the same data used in the Self Instructing Manual for discrete choice modeling: “The San
Francisco Bay Area work mode choice data set comprises 5029 home-to-work commute trips in the San Francisco Bay
Area. The data is drawn from the San Francisco Bay Area Household Travel Survey conducted by the Metropolitan
Transportation Commission (MTC) in the spring and fall of 1990. This survey included a one day travel diary for each
household member older than five years and detailed individual and household socio-demographic information.”
You can use this data to reproduce the example models in that document, and several examples are provided here to
do so.
Example Models
Importing idca Data
In this example we will import the MTC example dataset, starting from a csv text file in idca format. Suppose that data
file is gzipped, named “MTCwork.csv.gz” and is located in the current directory (use os.getcwd() to see what is
the current directory).
Tip: If you want to practice with this example, you can put this file into the current directory by using the command:
larch.DB.Example().export_idca("MTCwork.csv.gz", exclude={'caseid','altid'})
We can take a peek at the contents of the file, examining the first 10 lines:
>>> import gzip
>>> with gzip.open("MTCwork.csv.gz", 'rt') as previewfile:
...     print(*(next(previewfile) for x in range(10)))
casenum,altnum,chose,ivtt,ovtt,tottime,totcost,hhid,perid,numalts,dist,wkzone,hmzone,rspopden,rsempde
1,1,1,13.38,2,15.38,70.63,2,1,2,7.69,664,726,15.52,9.96,37.26,3.48,1,0,35,1,0,4,1,42.5,7,0,1,1,0,0,0,
1,2,0,18.38,2,20.38,35.32,2,1,2,7.69,664,726,15.52,9.96,37.26,3.48,1,0,35,1,0,4,1,42.5,7,0,1,1,0,0,0,
1,3,0,20.38,2,22.38,20.18,2,1,2,7.69,664,726,15.52,9.96,37.26,3.48,1,0,35,1,0,4,1,42.5,7,0,1,1,0,0,0,
1,4,0,25.9,15.2,41.1,115.64,2,1,2,7.69,664,726,15.52,9.96,37.26,3.48,1,0,35,1,0,4,1,42.5,7,0,1,1,0,0,
1,5,0,40.5,2,42.5,0,2,1,2,7.69,664,726,15.52,9.96,37.26,3.48,1,0,35,1,0,4,1,42.5,7,0,1,1,0,0,0,0,0,0,
2,1,0,29.92,10,39.92,390.81,3,1,2,11.62,738,9,35.81,53.33,32.91,764.19,1,0,40,1,0,1,1,17.5,7,0,1,1,0,
2,2,0,34.92,10,44.92,195.4,3,1,2,11.62,738,9,35.81,53.33,32.91,764.19,1,0,40,1,0,1,1,17.5,7,0,1,1,0,0
2,3,0,21.92,10,31.92,97.97,3,1,2,11.62,738,9,35.81,53.33,32.91,764.19,1,0,40,1,0,1,1,17.5,7,0,1,1,0,0
2,4,1,22.96,14.2,37.16,185,3,1,2,11.62,738,9,35.81,53.33,32.91,764.19,1,0,40,1,0,1,1,17.5,7,0,1,1,0,0
The first line of the file contains column headers. After that, each line represents an alternative available to a decision
maker. In our sample data, we see the first 5 lines of data share a casenum of 1, indicating that they are 5 different
alternatives available to the first decision maker. The identity of the alternatives is given by the number in the column
altnum. The observed choice of the decision maker is indicated in the column chose with a 1 in the appropriate row.
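This case/alternative structure can be demonstrated with plain Python on a trimmed excerpt of the preview above (only the first three columns are kept; this is illustrative, not part of the import workflow):

```python
import csv, io
from collections import defaultdict

# Trimmed excerpt of the idca file: case id, alternative id, chosen flag
raw = """casenum,altnum,chose
1,1,1
1,2,0
1,3,0
1,4,0
1,5,0
2,1,0
2,2,0
2,3,0
2,4,1
"""

rows = list(csv.DictReader(io.StringIO(raw)))
alts_per_case = defaultdict(list)   # alternatives available in each case
chosen = {}                         # the alternative with chose == 1
for r in rows:
    alts_per_case[r['casenum']].append(r['altnum'])
    if r['chose'] == '1':
        chosen[r['casenum']] = r['altnum']

print(len(alts_per_case['1']))  # 5 alternatives available to case 1
print(chosen)                   # {'1': '1', '2': '4'}
```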
We can import this data easily:
>>> d = larch.DB.CSV_idca("MTCwork.csv.gz", caseid="casenum", altid="altnum", choice="chose")
We can then look at some of the attributes of the imported data:
>>> d.variables_ca()
('caseid', 'altid', 'casenum', 'altnum', 'chose', 'ivtt', 'ovtt', 'tottime', 'totcost')
>>> d.variables_co()
('caseid', 'casenum', 'hhid', 'perid', 'numalts', 'dist', 'wkzone', 'hmzone', 'rspopden', 'rsempden',
>>> d.alternative_codes()
(1, 2, 3, 4, 5, 6)
>>> d.alternative_names()
('1', '2', '3', '4', '5', '6')
Larch automatically analyzed the data file to find variables that do not vary within cases, and transformed those into
idco format variables. If you would prefer that Larch not do this (there are a variety of reasons why you might not
want this) you can set the keyword argument tablename_co to None:
>>> d1 = larch.DB.CSV_idca("MTCwork.csv.gz", tablename_co=None, caseid="casenum", altid="altnum", cho
>>> d1.variables_ca()
('caseid', 'altid', 'casenum', 'altnum', 'chose', 'ivtt', 'ovtt', 'tottime', 'totcost', 'hhid', 'peri
>>> d1.variables_co()
('caseid',)
>>> d1.alternative_codes()
(1, 2, 3, 4, 5, 6)
>>> d1.alternative_names()
('1', '2', '3', '4', '5', '6')
In this case the set of variables in the idco table isn’t actually empty, because that table is now expressed as a
special view of the single idca table:
>>> d1.queries.qry_idco()
'SELECT DISTINCT caseid AS caseid FROM (SELECT casenum AS caseid, altnum AS altid, * FROM data)'
In either case, the set of all possible alternatives is deduced automatically from all the values in the altid column.
However, the alternative names are not very descriptive when they are set automatically, as the csv data file does not
have enough information to tell what each alternative code number means.
1: MTC MNL Mode Choice
This example is a mode choice model built using the MTC example dataset. First we create the DB and Model objects:
d = larch.DB.Example('MTC')
m = larch.Model(d)
Then we can build up the utility function. We’ll use some idco Format data first, using the Model.utility.co attribute.
This attribute is a dict-like object, to which we can assign LinearFunction objects for each alternative code.
from larch.roles import P, X, PX
m.utility.co[2] = P("ASC_SR2")  + P("hhinc#2") * X("hhinc")
m.utility.co[3] = P("ASC_SR3P") + P("hhinc#3") * X("hhinc")
m.utility.co[4] = P("ASC_TRAN") + P("hhinc#4") * X("hhinc")
m.utility.co[5] = P("ASC_BIKE") + P("hhinc#5") * X("hhinc")
m.utility.co[6] = P("ASC_WALK") + P("hhinc#6") * X("hhinc")
Next we’ll use some idca data, with the utility.ca attribute. This attribute is only a single LinearFunction that is
applied across all alternatives using idca Format data. Because the data is structured to vary across alternatives, the
parameters (and thus the structure of the LinearFunction) do not need to vary across alternatives.
m.utility.ca = PX("tottime") + PX("totcost")
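To see why a single generic function suffices, here is a sketch (plain Python, not larch internals) of how one set of coefficients applies to each alternative's own idca values:

```python
# Hypothetical coefficient values, roughly like the estimates later in this example
beta = {'tottime': -0.05, 'totcost': -0.005}

# idca data: each alternative carries its own time and cost
data_ca = {
    'DA':  {'tottime': 15.4, 'totcost': 70.6},
    'SR2': {'tottime': 20.4, 'totcost': 35.3},
}

# The same coefficients multiply each alternative's own values
V = {alt: sum(beta[k] * x[k] for k in beta) for alt, x in data_ca.items()}
print(V)
```

Differentiation between alternatives comes entirely from the data, not from the parameters.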
We can specify some model options too. And let’s give our model a descriptive title.
m.option.calc_std_errors = True
m.title = "MTC Example 1 (Simple MNL)"
Having created this model, we can then estimate it:
>>> result = m.maximize_loglike()
>>> result.message
'Optimization terminated successfully per computed tolerance. [bhhh]'
>>> m.loglike()
-3626.18...
>>> print(m.report('txt', sigfigs=3))
============================================================================================
MTC Example 1 (Simple MNL)
============================================================================================
Model Parameter Estimates
--------------------------------------------------------------------------------------------
Parameter     InitValue    FinalValue    StdError    t-Stat    NullValue
tottime       0.0          -0.0513       0.0031      -16.6     0.0
totcost       0.0          -0.00492      0.000239    -20.6     0.0
ASC_SR2       0.0          -2.18         0.105       -20.8     0.0
hhinc#2       0.0          -0.00217      0.00155     -1.4      0.0
ASC_SR3P      0.0          -3.73         0.178       -21.0     0.0
hhinc#3       0.0          0.000358      0.00254     0.141     0.0
ASC_TRAN      0.0          -0.671        0.133       -5.06     0.0
hhinc#4       0.0          -0.00529      0.00183     -2.89     0.0
ASC_BIKE      0.0          -2.38         0.305       -7.8      0.0
hhinc#5       0.0          -0.0128       0.00532     -2.41     0.0
ASC_WALK      0.0          -0.207        0.194       -1.07     0.0
hhinc#6       0.0          -0.00969      0.00303     -3.19     0.0
============================================================================================
Model Estimation Statistics
--------------------------------------------------------------------------------------------
Log Likelihood at Convergence                 -3626.19
Log Likelihood at Null Parameters             -7309.60
--------------------------------------------------------------------------------------------
Rho Squared w.r.t. Null Parameters                0.504
============================================================================================
...
It is a little tough to read this report because the ASC and income parameters are interleaved. We can use the
reorder method to fix this:
m.reorder_parameters("ASC", "hhinc")
Then the report will look more reasonable (although ultimately the content is the same):
>>> print(m.report('txt', sigfigs=3))
============================================================================================
MTC Example 1 (Simple MNL)
============================================================================================
Model Parameter Estimates
--------------------------------------------------------------------------------------------
Parameter     InitValue    FinalValue    StdError    t-Stat    NullValue
ASC_SR2       0.0          -2.18         0.105       -20.8     0.0
ASC_SR3P      0.0          -3.73         0.178       -21.0     0.0
ASC_TRAN      0.0          -0.671        0.133       -5.06     0.0
ASC_BIKE      0.0          -2.38         0.305       -7.8      0.0
ASC_WALK      0.0          -0.207        0.194       -1.07     0.0
hhinc#2       0.0          -0.00217      0.00155     -1.4      0.0
hhinc#3       0.0          0.000358      0.00254     0.141     0.0
hhinc#4       0.0          -0.00529      0.00183     -2.89     0.0
hhinc#5       0.0          -0.0128       0.00532     -2.41     0.0
hhinc#6       0.0          -0.00969      0.00303     -3.19     0.0
tottime       0.0          -0.0513       0.0031      -16.6     0.0
totcost       0.0          -0.00492      0.000239    -20.6     0.0
============================================================================================
Model Estimation Statistics
--------------------------------------------------------------------------------------------
Log Likelihood at Convergence                 -3626.19
Log Likelihood at Null Parameters             -7309.60
--------------------------------------------------------------------------------------------
Rho Squared w.r.t. Null Parameters                0.504
============================================================================================
...
You can then access individual parameters from the model either by name or number (using zero-based indexing).
>>> m[0]
ModelParameter('ASC_SR2', value=-2.17...)
>>> m['totcost']
ModelParameter('totcost', value=-0.00492...)
The len() function retrieves the number of parameters.
>>> len(m)
12
You can get a list of the parameter names in order.
>>> m.parameter_names()
['ASC_SR2', 'ASC_SR3P', 'ASC_TRAN', 'ASC_BIKE', 'ASC_WALK', 'hhinc#2',
'hhinc#3', 'hhinc#4', 'hhinc#5', 'hhinc#6', 'tottime', 'totcost']
1b: Using Loops
This example is a mode choice model built using the MTC example dataset. In fact, we are going to build the exact
same model as Example 1, just a bit more automagically. First we create the DB and Model objects:
d = larch.DB.Example('MTC')
m = larch.Model(d)
Then, we will extract the set of alternatives from the data:
alts = d.alternatives()
print(alts)
Which gives us
[(1, 'DA'), (2, 'SR2'), (3, 'SR3+'), (4, 'Tran'), (5, 'Bike'), (6, 'Walk')]
You’ll see that we have a list of 2-tuples, each containing a code number and a name. We’ll use the code
number (the first element of the tuple, indexed as zero) of the first alternative as our reference alternative.
ref_id = alts[0][0]
To create the alternative specific constants, we loop over alternatives.
for code, name in alts:
if code != ref_id:
m.utility.co("1",code,"ASC_"+name)
To create the alternative specific parameters on income, we loop over alternatives again. (We could also do this inside
the same loop with the ASCs but then the parameters would appear interleaved in the output, which we don’t want.)
for code, name in alts:
if code != ref_id:
m.utility.co("hhinc",code)
The other two parameters are generic, so we don’t need to do anything different from the original example.
m.utility.ca("tottime")
m.utility.ca("totcost")
Let’s see what we get:
>>> m.option.calc_std_errors = True
>>> m.estimate()
<larch.core.runstats, success ...
>>> m.loglike()
-3626.18...
>>> print(m)
============================================================================================
Model Parameter Estimates
--------------------------------------------------------------------------------------------
Parameter     InitValue    FinalValue     StdError       t-Stat     NullValue
ASC_SR2       0            -2.17804       0.104638       -20.815    0
ASC_SR3+      0            -3.72513       0.177692       -20.964    0
ASC_Tran      0            -0.670973      0.132591       -5.06047   0
ASC_Bike      0            -2.37634       0.304502       -7.80403   0
ASC_Walk      0            -0.206814      0.1941         -1.0655    0
hhinc#2       0            -0.00217       0.00155329     -1.39704   0
hhinc#3       0            0.000357656    0.00253773     0.140935   0
hhinc#4       0            -0.00528648    0.00182882     -2.89064   0
hhinc#5       0            -0.0128081     0.00532408     -2.40568   0
hhinc#6       0            -0.00968626    0.00303305     -3.19358   0
tottime       0            -0.0513404     0.00309941     -16.5646   0
totcost       0            -0.00492036    0.000238894    -20.5964   0
============================================================================================
Model Estimation Statistics
--------------------------------------------------------------------------------------------
Log Likelihood at Convergence                 -3626.19
Log Likelihood at Null Parameters             -7309.60
--------------------------------------------------------------------------------------------
Rho Squared w.r.t. Null Parameters                0.504
============================================================================================
...
Exactly the same as before. Awesome!
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example('1b')
1l: MTC MNL Mode Choice (Legacy Style)
This example is a mode choice model built using the MTC example dataset. We will use a “legacy” style of coding the
utility functions, which is deprecated as of larch version 3.3. It still works, but it is not as easy to read and understand
as the newer style.
First we create the DB and Model objects:
d = larch.DB.Example('MTC')
m = larch.Model(d)
Then we can build up the utility function. We’ll use some idco data first, using the utility.co command. This command
takes two or three arguments: a data column, an alternative code, and an (optional) parameter name. If no parameter
name is given, one will be created automatically. The data column is a string and can be any idco data column, a
pre-calculated value derived from one or more idco data columns, or no data columns at all (e.g., by just giving an
integer, as a string, as the name of the column).
m.utility.co("1",2,"ASC_SR2")
m.utility.co("1",3,"ASC_SR3P")
m.utility.co("1",4,"ASC_TRAN")
m.utility.co("1",5,"ASC_BIKE")
m.utility.co("1",6,"ASC_WALK")
m.utility.co("hhinc",2)
m.utility.co("hhinc",3)
m.utility.co("hhinc",4)
m.utility.co("hhinc",5)
m.utility.co("hhinc",6)
Next we’ll use some idca data, with the utility.ca command. This command takes one or two arguments: a data
column and an (optional) parameter name. If no parameter name is given, one will be created automatically. You can
give an integer as the data column here as well, but you probably won’t want to, as it will create problems in
parameter estimation if, for an idca variable, there is no variance across alternatives.
m.utility.ca("tottime")
m.utility.ca("totcost")
We can specify some model options too. And let’s give our model a descriptive title.
m.option.calc_std_errors = True
m.title = "MTC Example 1 (Simple MNL)"
Having created this model, we can then estimate it:
>>> m.estimate()
<larch.core.runstats, success ...
>>> m.loglike()
-3626.18...
>>> print(m)
============================================================================================
MTC Example 1 (Simple MNL)
============================================================================================
Model Parameter Estimates
--------------------------------------------------------------------------------------------
Parameter     InitValue    FinalValue     StdError       t-Stat     NullValue
ASC_SR2       0            -2.17804       0.104638       -20.815    0
ASC_SR3P      0            -3.72513       0.177692       -20.964    0
ASC_TRAN      0            -0.670973      0.132591       -5.06047   0
ASC_BIKE      0            -2.37634       0.304502       -7.80403   0
ASC_WALK      0            -0.206814      0.1941         -1.0655    0
hhinc#2       0            -0.00217       0.00155329     -1.39704   0
hhinc#3       0            0.000357656    0.00253773     0.140935   0
hhinc#4       0            -0.00528648    0.00182882     -2.89064   0
hhinc#5       0            -0.0128081     0.00532408     -2.40568   0
hhinc#6       0            -0.00968626    0.00303305     -3.19358   0
tottime       0            -0.0513404     0.00309941     -16.5646   0
totcost       0            -0.00492036    0.000238894    -20.5964   0
============================================================================================
Model Estimation Statistics
--------------------------------------------------------------------------------------------
Log Likelihood at Convergence                 -3626.19
Log Likelihood at Null Parameters             -7309.60
--------------------------------------------------------------------------------------------
Rho Squared w.r.t. Null Parameters                0.504
============================================================================================
...
You can then access individual parameters from the model either by name or number (using zero-based indexing).
>>> m[0]
ModelParameter('ASC_SR2', value=-2.17...)
>>> m['totcost']
ModelParameter('totcost', value=-0.00492...)
The len() function retrieves the number of parameters.
>>> len(m)
12
You can get a list of the parameter names in order.
>>> m.parameter_names()
['ASC_SR2', 'ASC_SR3P', 'ASC_TRAN', 'ASC_BIKE', 'ASC_WALK', 'hhinc#2',
'hhinc#3', 'hhinc#4', 'hhinc#5', 'hhinc#6', 'tottime', 'totcost']
17: Better MTC MNL Mode Choice
For this example, we’re going to create a richer and more sophisticated mode choice model, using the same MTC data.
We’ll jump straight to the preferred model 17 from the Self Instructing Manual.
To build that model, we are going to have to create some variables that we don’t already have: cost divided by income,
and out of vehicle travel time divided by distance. The tricky part is that cost and time are idca Format variables, and
income and distance are idco Format variables, in a different table. Fortunately, we can use SQL to pull the data from
one table to the other, but first we’ll set ourselves up to do so efficiently.
d = larch.DB.Example('MTC')
d.execute("CREATE INDEX IF NOT EXISTS data_co_casenum ON data_co (casenum);")
The index we create here on the idco Format table will allow SQLite to grab the correct row from the data_co table
almost instantly (more or less) each time, instead of having to search through the whole table for the matching caseid.
Once we have this index, we can write a couple UPDATE queries to build our two new idca Format variables:
d.add_column("data_ca", "costbyincome FLOAT")
qry1="UPDATE data_ca SET costbyincome = 1.0*totcost/(SELECT hhinc FROM data_co WHERE data_co.casenum=data_ca.casenum)"
d.execute(qry1)
d.add_column("data_ca", "ovtbydist FLOAT")
qry2="UPDATE data_ca SET ovtbydist = 1.0*ovtt/(SELECT dist FROM data_co WHERE data_co.casenum=data_ca.casenum)"
d.execute(qry2)
In each block, we first add a new column to the data_ca table, and then populate that column with the calculated values.
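The same add-a-column-and-fill-it pattern can be tried out on a toy pair of tables using Python's built-in sqlite3 module (the table names echo the MTC setup, but the values here are made up for illustration):

```python
import sqlite3

# Toy idca-style and idco-style tables
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE data_ca (casenum INT, altnum INT, totcost FLOAT)")
con.execute("CREATE TABLE data_co (casenum INT, hhinc FLOAT)")
con.executemany("INSERT INTO data_ca VALUES (?,?,?)",
                [(1, 1, 70.0), (1, 2, 35.0), (2, 1, 390.0)])
con.executemany("INSERT INTO data_co VALUES (?,?)", [(1, 35.0), (2, 50.0)])

# Index so the correlated subquery can find each case's idco row quickly
con.execute("CREATE INDEX data_co_casenum ON data_co (casenum)")

# Add the new idca column, then fill it from the idco table per case
con.execute("ALTER TABLE data_ca ADD COLUMN costbyincome FLOAT")
con.execute("""UPDATE data_ca SET costbyincome =
               1.0*totcost/(SELECT hhinc FROM data_co
                            WHERE data_co.casenum = data_ca.casenum)""")

rows = con.execute("SELECT costbyincome FROM data_ca "
                   "ORDER BY casenum, altnum").fetchall()
print(rows)  # [(2.0,), (1.0,), (7.8,)]
```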
Now we are ready to build our model.
m = larch.Model(d)
m.utility.ca("costbyincome")
m.utility.ca("tottime * (altnum IN (1,2,3,4))", "motorized_time")
m.utility.ca("tottime * (altnum IN (5,6))", "nonmotorized_time")
m.utility.ca("ovtbydist * (altnum IN (1,2,3,4))", "motorized_ovtbydist")
The costbyincome data is already computed above so we can add it to the model very simply. In our preferred
specification, we want to differentiate the total travel time by motorized modes (1 to 4) and non-motorized modes (5
and 6), which we can do by specifying some math inside the data string. Often the data string is just the name of a
column as we have seen before, but it can also be any valid SQLite expression that can be evaluated on the relevant
master query (either larch_idca or larch_idco).
m.utility.co("hhinc",4)
m.utility.co("hhinc",5)
m.utility.co("hhinc",6)
Since the model we want to create groups together DA, SR2 and SR3+ jointly as reference alternatives with respect to
income, we can simply omit all of these alternatives from the block that applies to hhinc.
For vehicles per worker, the preferred model includes a joint parameter on SR2 and SR3+ (excluding DA, and
not fixed at zero). Here we might use an alias, which allows us to specify one or more parameters that are simply
a fixed proportion of another parameter. For example, we can say that vehbywrk_SR2 will be equal to 1.0 times
vehbywrk_SR.
m.parameter("vehbywrk_SR")
m.alias("vehbywrk_SR2","vehbywrk_SR",1.0)
m.alias("vehbywrk_SR3+","vehbywrk_SR",1.0)
Having defined these parameter aliases, we can then loop over all alternatives (skipping DA in the index-zero position)
to add vehicles per worker to the utility function:
for a,name in m.df.alternatives()[1:]:
m.utility.co("vehbywrk",a,"vehbywrk_"+name)
We can also run similar loops over workplace in CBD, etc:
for a,name in m.df.alternatives()[1:]:
m.utility.co("wkccbd+wknccbd",a,"wkcbd_"+name)
for a,name in m.df.alternatives()[1:]:
m.utility.co("wkempden",a,"wkempden_"+name)
for a,name in m.df.alternatives()[1:]:
m.utility.co("1",a,"ASC_"+name)
m.option.calc_std_errors = True
Having created this model, we can then estimate it:
>>> result = m.maximize_loglike()
>>> result.message
'Optimization terminated successfully...
>>> m.loglike()
-3444.1...
>>> print(m)
====================================================================================================
Model Parameter Estimates
----------------------------------------------------------------------------------------------------
Parameter              InitValue    FinalValue     StdError       t-Stat     NullValue
costbyincome           0            -0.0524213     0.0104042      -5.03849   0
motorized_time         0            -0.0201867     0.00381463     -5.2919    0
nonmotorized_time      0            -0.045446      0.00576857     -7.87821   0
motorized_ovtbydist    0            -0.132869      0.0196429      -6.76423   0
hhinc#4                0            -0.00532375    0.00197713     -2.69266   0
hhinc#5                0            -0.00864285    0.00515439     -1.67679   0
hhinc#6                0            -0.00599738    0.00314859     -1.90478   0
vehbywrk_SR            0            -0.316638      0.0666331      -4.75196   0
vehbywrk_Tran          0            -0.946257      0.118293       -7.99925   0
vehbywrk_Bike          0            -0.702149      0.258287       -2.71849   0
vehbywrk_Walk          0            -0.72183       0.169392       -4.26131   0
wkcbd_SR2              0            0.259828       0.123353       2.10638    0
wkcbd_SR3+             0            1.06926        0.191275       5.59018    0
wkcbd_Tran             0            1.30883        0.165697       7.89889    0
wkcbd_Bike             0            0.489274       0.361098       1.35496    0
wkcbd_Walk             0            0.101732       0.252107       0.403529   0
wkempden_SR2           0            0.00157763     0.000390357    4.04152    0
wkempden_SR3+          0            0.00225683     0.000451972    4.9933     0
wkempden_Tran          0            0.00313243     0.00036073     8.68358    0
wkempden_Bike          0            0.00192791     0.00121547     1.58614    0
wkempden_Walk          0            0.00289023     0.000742102    3.89465    0
ASC_SR2                0            -1.80782       0.106123       -17.035    0
ASC_SR3+               0            -3.43374       0.151864       -22.6106   0
ASC_Tran               0            -0.684817      0.247816       -2.7634    0
ASC_Bike               0            -1.62885       0.427399       -3.81108   0
ASC_Walk               0            0.0682096      0.348001       0.196004   0
====================================================================================================
Model Estimation Statistics
----------------------------------------------------------------------------------------------------
Log Likelihood at Convergence                 -3444.19
Log Likelihood at Null Parameters             -7309.60
----------------------------------------------------------------------------------------------------
Rho Squared w.r.t. Null Parameters                0.529
====================================================================================================
...
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(17)
17t: MTC MNL Mode Choice Using DT
For this example, we’re going to re-create model 17 from the Self Instructing Manual, but using the DT data format.
Unlike for the DB based model, we won’t need to manipulate the data in advance of creating our model, because
combinations of idca Format and idco Format variables can be done on-the-fly using broadcasting techniques for
numpy arrays.
To build that model, we are going to need some variables that we don’t already have: cost divided by income,
and out of vehicle travel time divided by distance. Cost and time are idca Format variables, while income and
distance are idco Format variables, but with DT we can compute these combinations on the fly in the utility
expressions.
d = larch.DT.Example('MTC')
We don’t need to do anything more than open the example DT file, and we are ready to build our model.
m = larch.Model(d)
m.utility.ca("totcost/hhinc",
"costbyincome")
m.utility.ca("tottime * (altnum <= 4)", "motorized_time")
m.utility.ca("tottime * (altnum >= 5)", "nonmotorized_time")
m.utility.ca("ovtt/dist * (altnum <= 4)", "motorized_ovtbydist")
The totcost/hhinc data is computed once, as a new variable, when loading the model data. The same goes for tottime
filtered by motorized modes (we harness the convenient fact that all the motorized modes have identifying numbers of
4 or less), and for ovtt/dist.
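The broadcasting idea can be sketched directly with numpy (illustrative arrays, not DT internals): an idca array has shape (cases, alts), an idco array has shape (cases,), and adding a trailing axis lets the per-case value divide every alternative.

```python
import numpy as np

# idca-style array: 2 cases x 3 alternatives (made-up cost values)
totcost = np.array([[70.6, 35.3, 20.2],
                    [390.8, 195.4, 98.0]])
# idco-style array: one income value per case
hhinc = np.array([35.0, 50.0])

# hhinc[:, None] has shape (2, 1), so it broadcasts across the alts axis
costbyincome = totcost / hhinc[:, None]
print(costbyincome.shape)  # (2, 3)
```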
m.utility.co("hhinc",4)
m.utility.co("hhinc",5)
m.utility.co("hhinc",6)
Since the model we want to create groups together DA, SR2 and SR3+ jointly as reference alternatives with respect to
income, we can simply omit all of these alternatives from the block that applies to hhinc.
For vehicles per worker, the preferred model includes a joint parameter on SR2 and SR3+ (excluding DA, and
not fixed at zero). Here we might use an alias, which allows us to specify one or more parameters that are simply
a fixed proportion of another parameter. For example, we can say that vehbywrk_SR2 will be equal to 1.0 times
vehbywrk_SR.
m.parameter("vehbywrk_SR")
m.alias("vehbywrk_SR2","vehbywrk_SR",1.0)
m.alias("vehbywrk_SR3+","vehbywrk_SR",1.0)
Having defined these parameter aliases, we can then loop over all alternatives (skipping DA in the index-zero position)
to add vehicles per worker to the utility function:
for a,name in m.df.alternatives()[1:]:
m.utility.co("vehbywrk",a,"vehbywrk_"+name)
We can also run similar loops over workplace in CBD, etc:
for a,name in m.df.alternatives()[1:]:
m.utility.co("wkccbd+wknccbd",a,"wkcbd_"+name)
for a,name in m.df.alternatives()[1:]:
m.utility.co("wkempden",a,"wkempden_"+name)
for a,name in m.df.alternatives()[1:]:
m.utility.co("1",a,"ASC_"+name)
m.option.calc_std_errors = True
Having created this model, we can then estimate it:
>>> result = m.maximize_loglike()
>>> result.message
'Optimization terminated successfully...
>>> m.loglike()
-3444.1...
>>> print(m)
====================================================================================================
Model Parameter Estimates
----------------------------------------------------------------------------------------------------
Parameter              InitValue    FinalValue     StdError       t-Stat     NullValue
costbyincome           0            -0.0524213     0.0104042      -5.03849   0
motorized_time         0            -0.0201867     0.00381463     -5.2919    0
nonmotorized_time      0            -0.045446      0.00576857     -7.87821   0
motorized_ovtbydist    0            -0.132869      0.0196429      -6.76423   0
hhinc#4                0            -0.00532375    0.00197713     -2.69266   0
hhinc#5                0            -0.00864285    0.00515439     -1.67679   0
hhinc#6                0            -0.00599738    0.00314859     -1.90478   0
vehbywrk_SR            0            -0.316638      0.0666331      -4.75196   0
vehbywrk_Tran          0            -0.946257      0.118293       -7.99925   0
vehbywrk_Bike          0            -0.702149      0.258287       -2.71849   0
vehbywrk_Walk          0            -0.72183       0.169392       -4.26131   0
wkcbd_SR2              0            0.259828       0.123353       2.10638    0
wkcbd_SR3+             0            1.06926        0.191275       5.59018    0
wkcbd_Tran             0            1.30883        0.165697       7.89889    0
wkcbd_Bike             0            0.489274       0.361098       1.35496    0
wkcbd_Walk             0            0.101732       0.252107       0.403529   0
wkempden_SR2           0            0.00157763     0.000390357    4.04152    0
wkempden_SR3+          0            0.00225683     0.000451972    4.9933     0
wkempden_Tran          0            0.00313243     0.00036073     8.68358    0
wkempden_Bike          0            0.00192791     0.00121547     1.58614    0
wkempden_Walk          0            0.00289023     0.000742102    3.89465    0
ASC_SR2                0            -1.80782       0.106123       -17.035    0
ASC_SR3+               0            -3.43374       0.151864       -22.6106   0
ASC_Tran               0            -0.684817      0.247816       -2.7634    0
ASC_Bike               0            -1.62885       0.427399       -3.81108   0
ASC_Walk               0            0.0682096      0.348001       0.196004   0
====================================================================================================
Model Estimation Statistics
----------------------------------------------------------------------------------------------------
Log Likelihood at Convergence                 -3444.19
Log Likelihood at Null Parameters             -7309.60
----------------------------------------------------------------------------------------------------
Rho Squared w.r.t. Null Parameters                0.529
====================================================================================================
...
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example('17t')
1.9.2 Swissmetro Examples
Example Data
The Swissmetro sample dataset is the same data used in the examples for Biogeme. Note that Larch and Biogeme
have different capabilities, and it is not possible to reproduce every example model shown for Biogeme.
Example Models
101: Swissmetro MNL Mode Choice
This example is a mode choice model built using the Swissmetro example dataset. First we create the DB and Model
objects:
d = larch.DB.Example('SWISSMETRO')
m = larch.Model(d)
We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output
report styles.
m.title = "swissmetro example 01 (simple logit)"
The swissmetro dataset, as with all Biogeme data, is only in co format.
m.utility.co("1",1,"ASC_TRAIN")
m.utility.co("1",3,"ASC_CAR")
m.utility.co("TRAIN_TT",1,"B_TIME")
m.utility.co("SM_TT",2,"B_TIME")
m.utility.co("CAR_TT",3,"B_TIME")
m.utility.co("TRAIN_CO*(GA==0)",1,"B_COST")
m.utility.co("SM_CO*(GA==0)",2,"B_COST")
m.utility.co("CAR_CO",3,"B_COST")
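Behind these utility specifications sits the standard multinomial logit formula: each alternative's probability is a softmax over the systematic utilities. A minimal sketch of that calculation, using made-up utility values rather than anything computed by Larch:

```python
import math

def mnl_probabilities(utilities):
    # multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical systematic utilities for Train, SM, Car
probs = mnl_probabilities([-0.7, 0.0, -0.2])
```

The probabilities always sum to one, and raising any one utility raises that alternative's share at the expense of the others.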
We can estimate the models and check the results match up with those given by Biogeme:
>>> m.estimate()
<larch.core.runstats, success ...
>>> m.loglike()
-5331.252...
>>> m['B_TIME'].value
-0.01277...
>>> m['B_COST'].value
-0.01083...
>>> m['ASC_TRAIN'].value
-0.7012...
>>> m['ASC_CAR'].value
-0.1546...
>>> print(m.report('txt', sigfigs=3))
=========================================================================================...
swissmetro example 01 (simple logit)
=========================================================================================...
Model Parameter Estimates
-----------------------------------------------------------------------------------------...
Parameter       InitValue  FinalValue   StdError    t-Stat  NullValue
ASC_TRAIN             0.0      -0.701     0.0549     -12.8        0.0
ASC_CAR               0.0      -0.155     0.0432     -3.58        0.0
B_TIME                0.0     -0.0128   0.000569     -22.5        0.0
B_COST                0.0     -0.0108   0.000518     -20.9        0.0
=========================================================================================...
Model Estimation Statistics
-----------------------------------------------------------------------------------------...
Log Likelihood at Convergence                                                    -5331.25
Log Likelihood at Null Parameters                                                -6964.66
-----------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters                                                  0.235
=========================================================================================...
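The rho-squared value on the last line of the report can be reproduced from the two log likelihoods above it, assuming the usual likelihood ratio index definition:

```python
def rho_squared(ll_convergence, ll_null):
    # likelihood ratio index: 1 - LL(beta-hat) / LL(0)
    return 1.0 - ll_convergence / ll_null

rho = rho_squared(-5331.25, -6964.66)  # rounds to the 0.235 shown in the report
```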
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(101)
101b: Swissmetro MNL, Biogeme Style
This example is a mode choice model built using the Swissmetro example dataset. We will use a style for writing the
utility functions that is similar to the style used in Biogeme. First we create the DB and Model objects, as usual:
d = larch.DB.Example('SWISSMETRO')
m = larch.Model(d)
We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output
report styles.
m.title = "swissmetro example 01b (simple logit)"
Unlike Biogeme, the usual way of using Larch does not fill the main namespace with all the parameters and data
column references as distinct objects. Instead, we can use two master classes to fill those roles.
from larch.roles import P,X
# Parameters, Data
All of our parameter references can be written as instances of the P (roles.ParameterRef) class, and all of our
data column references can be written as instances of the X (roles.DataRef) class.
The swissmetro dataset, as with all Biogeme data, is only in co format, which is convenient here because it lets us
ignore the ca format and just write out the utility functions directly.
m.utility[1] = ( P.ASC_TRAIN
               + P.Time * X.TRAIN_TT
               + P.Cost * X("TRAIN_CO*(GA==0)") )
m.utility[2] = ( P.Time * X.SM_TT
               + P.Cost * X("SM_CO*(GA==0)") )
m.utility[3] = ( P.ASC_CAR
               + P.Time * X.CAR_TT
               + P.Cost * X("CAR_CO") )
Note that when the data field is too complex to be expressed as a single python identifier (variable name), we can write
it as a quoted string instead.
We can estimate the models and check the results match up with those given by Biogeme:
>>> m.estimate()
<larch.core.runstats, success ...
>>> m.loglike()
-5331.252...
>>> m['Time'].value
-0.01277...
>>> m['Cost'].value
-0.01083...
>>> m['ASC_TRAIN'].value
-0.7012...
>>> m['ASC_CAR'].value
-0.1546...
>>> print(m.report('txt', sigfigs=3))
=========================================================================================...
swissmetro example 01b (simple logit)
=========================================================================================...
Model Parameter Estimates
-----------------------------------------------------------------------------------------...
Parameter       InitValue  FinalValue   StdError    t-Stat  NullValue
ASC_TRAIN             0.0      -0.701     0.0549     -12.8        0.0
Time                  0.0     -0.0128   0.000569     -22.5        0.0
Cost                  0.0     -0.0108   0.000518     -20.9        0.0
ASC_CAR               0.0      -0.155     0.0432     -3.58        0.0
=========================================================================================...
Model Estimation Statistics
-----------------------------------------------------------------------------------------...
Log Likelihood at Convergence                                                    -5331.25
Log Likelihood at Null Parameters                                                -6964.66
-----------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters                                                  0.235
=========================================================================================...
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example('101b')
101s: Swissmetro MNL, Stacked Variables
This example is a mode choice model built using the Swissmetro example dataset. We will use a style for writing
the utility functions that uses stacked variables, which is a feature of the DT data format. First we create the DT and
Model objects, as usual:
d = larch.DT.Example('SWISSMETRO')
m = larch.Model(d)
We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output
report styles.
m.title = "swissmetro example 01s (simple logit)"
Unlike Biogeme, the usual way of using Larch does not fill the main namespace with all the parameters and data
column references as distinct objects. Instead, we can use two master classes to fill those roles.
from larch.roles import P,X
# Parameters, Data
All of our parameter references can be written as instances of the P (roles.ParameterRef) class, and all of our
data column references can be written as instances of the X (roles.DataRef) class.
The swissmetro dataset, as with all Biogeme data, is only in co format. But, in our model most of the attributes are
“generic”, i.e. stuff like travel time, which varies across alternatives, but for which we’ll want to assign the same
parameter to for each alternative (so that a minute of travel time has the same value no matter which alternative it is
on). So, here we will create the generic ca format variables by stacking the relevant co variables.
d.stack_idco('traveltime', {1: X.TRAIN_TT, 2: X.SM_TT, 3: X.CAR_TT})
d.stack_idco('cost', {1: X("TRAIN_CO*(GA==0)"), 2: X("SM_CO*(GA==0)"), 3: X("CAR_CO")})
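Conceptually, stack_idco assembles an idca-shaped (cases × alternatives) array from one idco column per alternative. A rough numpy equivalent, using invented travel time values rather than the actual dataset:

```python
import numpy as np

# one idco column per alternative (hypothetical travel times)
train_tt = np.array([100.0, 80.0])
sm_tt = np.array([60.0, 55.0])
car_tt = np.array([90.0, 70.0])

# stacked into idca layout: one row per case, one column per alternative
traveltime = np.column_stack([train_tt, sm_tt, car_tt])
```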
Then we can use these stacked variables in the utility.ca function:
m.utility.ca = X.traveltime * P.Time + X.cost * P.Cost
Not all our variables are stackable; the alternative specific constants are not, because we want to assign different
parameters for each alternative. So those we'll leave in idco format (implicitly):
m.utility[1] = P.ASC_TRAIN
m.utility[2] = 0
m.utility[3] = P.ASC_CAR
We can estimate the models and check the results match up with those given by Biogeme:
>>> result = m.maximize_loglike()
>>> print(result.message)
Optimization terminated successfully...
>>> m.loglike()
-5331.252...
>>> m['Time'].value
-0.01277...
>>> m['Cost'].value
-0.01083...
>>> m['ASC_TRAIN'].value
-0.7012...
>>> m['ASC_CAR'].value
-0.1546...
>>> print(m.report('txt', sigfigs=3))
=========================================================================================...
swissmetro example 01s (simple logit)
=========================================================================================...
Model Parameter Estimates
-----------------------------------------------------------------------------------------...
Parameter       InitValue  FinalValue   StdError    t-Stat  NullValue
Time                  0.0     -0.0128   0.000569     -22.5        0.0
Cost                  0.0     -0.0108   0.000518     -20.9        0.0
ASC_TRAIN             0.0      -0.701     0.0549     -12.8        0.0
ASC_CAR               0.0      -0.155     0.0432     -3.58        0.0
=========================================================================================...
Model Estimation Statistics
-----------------------------------------------------------------------------------------...
Log Likelihood at Convergence                                                    -5331.25
Log Likelihood at Null Parameters                                                -6964.66
-----------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters                                                  0.235
=========================================================================================...
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example('101s')
102: Swissmetro Weighted MNL Mode Choice
This example is a mode choice model built using the Swissmetro example dataset. First we create the DB and Model
objects. When we create the DB object, we will redefine the weight value:
d = larch.DB.Example('SWISSMETRO')
d.queries.weight = "(1.0*(GROUPid==2)+1.2*(GROUPid==3))*0.8890991"
m = larch.Model(d)
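The weight expression is ordinary arithmetic on the GROUPid column; evaluated outside of SQL, it is simply:

```python
def case_weight(groupid):
    # mirrors the SQL expression "(1.0*(GROUPid==2)+1.2*(GROUPid==3))*0.8890991"
    return (1.0 * (groupid == 2) + 1.2 * (groupid == 3)) * 0.8890991
```

Cases in group 2 get a weight of about 0.889, cases in group 3 about 1.067, and any other group gets zero weight.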
We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output
report styles.
m.title = "swissmetro example 02 (simple logit weighted)"
The swissmetro dataset, as with all Biogeme data, is only in co format.
m.utility.co("1",1,"ASC_TRAIN")
m.utility.co("1",3,"ASC_CAR")
m.utility.co("TRAIN_TT",1,"B_TIME")
m.utility.co("SM_TT",2,"B_TIME")
m.utility.co("CAR_TT",3,"B_TIME")
m.utility.co("TRAIN_CO*(GA==0)",1,"B_COST")
m.utility.co("SM_CO*(GA==0)",2,"B_COST")
m.utility.co("CAR_CO",3,"B_COST")
We can estimate the models and check the results match up with those given by Biogeme:
>>> result = m.maximize_loglike()
>>> print(result.message)
Optimization terminated successfully...
>>> print(m.report('txt', sigfigs=4))
=========================================================================================...
swissmetro example 02 (simple logit weighted)
=========================================================================================...
Model Parameter Estimates
-----------------------------------------------------------------------------------------...
Parameter       InitValue  FinalValue   StdError    t-Stat  NullValue
ASC_TRAIN             0.0     -0.7566    0.05604     -13.5        0.0
ASC_CAR               0.0     -0.1143    0.04315     -2.65        0.0
B_TIME                0.0    -0.01321  0.0005693    -23.21        0.0
B_COST                0.0     -0.0112  0.0005201    -21.53        0.0
=========================================================================================...
Model Estimation Statistics
-----------------------------------------------------------------------------------------...
Log Likelihood at Convergence                                                    -5273.74
Log Likelihood at Null Parameters                                                -7016.87
-----------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters                                                  0.248
=========================================================================================...
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(102)
104: Swissmetro MNL with Modified Data
This example is a mode choice model built using the Swissmetro example dataset. First we create the DB and Model
objects:
d = larch.DB.Example('SWISSMETRO')
m = larch.Model(d)
We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output
report styles.
m.title = "swissmetro example 04 (modified data)"
The swissmetro dataset, as with all Biogeme data, is only in co format. To be consistent with the Biogeme example,
we divide travel time by 100.0.
Note: We use 100.0 instead of 100 to ensure that the division is done with real (floating point) numbers and not
integers. Larch internally uses integers only for nominal and ordinal values (i.e., identifying names and positions in
arrays) and doesn't use integer values to represent cardinal data values, although SQLite sometimes does. Math
completed inside the SQLite kernel, such as math contained within a string used as a data item, can be affected by
SQLite's integer division.
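The integer-versus-real distinction is easy to demonstrate with Python's standard sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
int_div, = con.execute("SELECT 150/100").fetchone()     # both operands integer: integer division
real_div, = con.execute("SELECT 150/100.0").fetchone()  # one real operand: real division
con.close()
```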
m.utility.co("1",1,"ASC_TRAIN")
m.utility.co("1",3,"ASC_CAR")
m.utility.co("TRAIN_TT/100.0",1,"B_TIME")
m.utility.co("SM_TT/100.0",2,"B_TIME")
m.utility.co("CAR_TT/100.0",3,"B_TIME")
For this model, we will use the natural log of (cost/100.0) instead of cost. But when cost is zero, this would give an
error, so we use a "CASE ... WHEN ... THEN ... ELSE ... END" construct from SQL to substitute a non-error value
(here, we set it to 0) when cost is zero.
m.utility.co("CASE TRAIN_CO*(GA==0) WHEN 0 THEN 0 ELSE LOG((TRAIN_CO/100.0)*(GA==0)) END",1,"B_LOGCOST")
m.utility.co("CASE SM_CO*(GA==0) WHEN 0 THEN 0 ELSE LOG((SM_CO/100.0)*(GA==0)) END",2,"B_LOGCOST")
m.utility.co("CASE CAR_CO WHEN 0 THEN 0 ELSE LOG(CAR_CO/100.0) END",3,"B_LOGCOST")
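In plain Python, the same zero-guard that the CASE expression provides might be sketched as:

```python
import math

def log_cost(cost, ga=0):
    # a GA travelcard (ga == 1) or a zero cost maps to 0 instead of log(0) = -inf
    c = cost * (ga == 0)
    return math.log(c / 100.0) if c != 0 else 0.0
```

Note that log_cost(100) is also zero, since log(100/100.0) = log(1); only the truly zero costs need the guard.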
We can estimate the models and check the results match up with those given by Biogeme:
>>> result = m.maximize_loglike()
>>> print(result.message)
Optimization terminated successfully...
>>> print(m.report('txt', sigfigs=4))
=========================================================================================...
swissmetro example 04 (modified data)
=========================================================================================...
Model Parameter Estimates
-----------------------------------------------------------------------------------------...
Parameter       InitValue  FinalValue   StdError    t-Stat  NullValue
ASC_TRAIN             0.0     -0.8513    0.05597    -15.21        0.0
ASC_CAR               0.0     -0.2745    0.04571    -6.005        0.0
B_TIME                0.0      -1.072    0.05471    -19.58        0.0
B_LOGCOST             0.0      -1.036    0.05946    -17.43        0.0
=========================================================================================...
Model Estimation Statistics
-----------------------------------------------------------------------------------------...
Log Likelihood at Convergence                                                    -5423.30
Log Likelihood at Null Parameters                                                -6964.66
-----------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters                                                  0.221
=========================================================================================...
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(104)
105: Swissmetro Normal Mixed MNL Mode Choice
Tip: Mixed logit models are under development. The interface should not be considered stable and may change with
future versions of Larch. Use at your own risk.
This example is a mode choice model built using the Swissmetro example dataset. First we create the DB and Model
objects:
d = larch.DB.Example('SWISSMETRO')
kernel = larch.Model(d)
We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output
report styles.
kernel.title = "swissmetro example 05 (normal mixture logit)"
The swissmetro dataset, as with all Biogeme data, is only in co format. To add a mixed (random) coefficient on time,
we enter the time data twice (once with *1 appended to give it a distinctive name): one copy carries the plain
parameter (the mean of the normal distribution), and the other carries the distributional parameter (the standard
deviation of the normal).
kernel.utility.co("1",1,"ASC_TRAIN")
kernel.utility.co("1",3,"ASC_CAR")
kernel.utility.co("TRAIN_TT",1,"B_TIME")
kernel.utility.co("SM_TT",2,"B_TIME")
kernel.utility.co("CAR_TT",3,"B_TIME")
kernel.utility.co("TRAIN_TT *1",1,"B_TIME_S")
kernel.utility.co("SM_TT *1",2,"B_TIME_S")
kernel.utility.co("CAR_TT *1",3,"B_TIME_S")
kernel.utility.co("TRAIN_CO*(GA==0)",1,"B_COST")
kernel.utility.co("SM_CO*(GA==0)",2,"B_COST")
kernel.utility.co("CAR_CO",3,"B_COST")
From the kernel MNL model we create a normal mixed model. We set the starting value for the std dev to be nonzero
to improve numerical stability:
m = larch.mixed.NormalMixedModel(kernel, ['B_TIME_S'], ndraws=100, seed=0)
v = m.parameter_values()
v[-1] = 0.01
m.parameter_values(v)
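Conceptually, what the NormalMixedModel evaluates is a simulated likelihood: the MNL probability is averaged over random draws of the mixed coefficient. A bare-bones sketch of that averaging, purely illustrative and not Larch's implementation:

```python
import math
import random

def mixed_logit_prob(chosen, data, beta_mean, beta_sd, ndraws=100, seed=0):
    # average the logit probability of the chosen alternative over
    # normal draws of the single randomized coefficient
    rng = random.Random(seed)
    total = 0.0
    for _ in range(ndraws):
        b = rng.gauss(beta_mean, beta_sd)
        exps = [math.exp(b * x) for x in data]
        total += exps[chosen] / sum(exps)
    return total / ndraws
```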
We can estimate the models and check the results match up with those given by Biogeme:
>>> result = m.maximize_loglike()
>>> m.loglike()
-5213.34...
The reporting features of mixed logit models have not been developed yet. A simple placeholder report of the parameters is available for now:
>>> print(m)
<larch.mixed.NormalMixedModel> Temporary Report
============================================================
ASC_TRAIN           -0.396863
ASC_CAR              0.140273
B_TIME             -0.0235885
B_COST             -0.0128322
Choleski_0          0.0160842
============================================================
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(105)
109: Swissmetro Nested Logit Mode Choice
This example is a mode choice model built using the Swissmetro example dataset. First we create the DB and Model
objects:
d = larch.DB.Example('SWISSMETRO')
m = larch.Model(d)
We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output
report styles.
m.title = "swissmetro example 09 (nested logit)"
The swissmetro dataset, as with all Biogeme data, is only in co format.
m.utility.co("1",1,"ASC_TRAIN")
m.utility.co("1",3,"ASC_CAR")
m.utility.co("TRAIN_TT",1,"B_TIME")
m.utility.co("SM_TT",2,"B_TIME")
m.utility.co("CAR_TT",3,"B_TIME")
m.utility.co("TRAIN_CO*(GA==0)",1,"B_COST")
m.utility.co("SM_CO*(GA==0)",2,"B_COST")
m.utility.co("CAR_CO",3,"B_COST")
To create a new nest, we can use the new_nest command, although we’ll need to know what the alternative codes are
for the alternatives in our dataset. To find out, we can do:
>>> m.df.alternatives()
[(1, 'Train'), (2, 'SM'), (3, 'Car')]
For this example, we want to nest together the Train and Car modes into an "existing" modes nest. It looks like those
are modes 1 and 3, so we can use the new_nest command like this:
m.new_nest("existing", parent=m.root_id, children=[1,3])
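The new nest influences the model through its inclusive value: the logsum of its members' utilities, scaled by the nest's logsum parameter (here named "existing"). That quantity can be sketched as:

```python
import math

def nest_logsum(utilities, mu):
    # inclusive value of a nest with logsum parameter mu:
    # mu * log(sum_i exp(V_i / mu))
    return mu * math.log(sum(math.exp(u / mu) for u in utilities))
```

With mu = 1 this collapses to the plain MNL logsum; an estimated value well below 1 indicates correlation between the nested alternatives.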
We can estimate the models and check the results match up with those given by Biogeme:
>>> result = m.maximize_loglike()
>>> print(result.message)
Optimization terminated successfully...
>>> print(m.report('txt', sigfigs=4))
=========================================================================================...
swissmetro example 09 (nested logit)
=========================================================================================...
Model Parameter Estimates
-----------------------------------------------------------------------------------------...
Parameter       InitValue  FinalValue   StdError    t-Stat  NullValue
ASC_TRAIN             0.0      -0.512    0.04518    -11.33        0.0
ASC_CAR               0.0     -0.1672    0.03714    -4.502        0.0
B_TIME                0.0   -0.008987  0.0005699    -15.77        0.0
B_COST                0.0   -0.008567  0.0004627    -18.51        0.0
existing              1.0      0.4868     0.0279    -18.39        1.0
=========================================================================================...
Model Estimation Statistics
-----------------------------------------------------------------------------------------...
Log Likelihood at Convergence                                                    -5236.90
Log Likelihood at Null Parameters                                                -6964.66
-----------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters                                                  0.248
=========================================================================================...
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(109)
111: Swissmetro Cross-Nested Logit Mode Choice
This example is a mode choice model built using the Swissmetro example dataset. First we create the DB and Model
objects:
d = larch.DB.Example('SWISSMETRO')
m = larch.Model(d)
We can attach a title to the model. The title does not affect the calculations at all; it is merely used in various output
report styles.
m.title = "swissmetro example 11 (cross nested logit)"
The swissmetro dataset, as with all Biogeme data, is only in co format.
m.utility.co("1",1,"ASC_TRAIN")
m.utility.co("1",3,"ASC_CAR")
m.utility.co("TRAIN_TT",1,"B_TIME")
m.utility.co("SM_TT",2,"B_TIME")
m.utility.co("CAR_TT",3,"B_TIME")
m.utility.co("TRAIN_CO*(GA==0)",1,"B_COST")
m.utility.co("SM_CO*(GA==0)",2,"B_COST")
m.utility.co("CAR_CO",3,"B_COST")
To create a new nest, we can use the new_nest command, although we’ll need to know what the alternative codes are
for the alternatives in our dataset. To find out, we can do:
>>> m.df.alternatives()
[(1, 'Train'), (2, 'SM'), (3, 'Car')]
For this example, we want to nest together the Train and Car modes into an "existing" modes nest, and we want to nest
Train and SM together into a “public” modes nest. This creates a structure different from a traditional nested logit
model, because the Train mode is “cross-nested”: it appears in more than one nest. The desired nesting structure then
looks like this:
Root
  Public
    Train
    SM
  Existing
    Train
    Car
To do this we can use the new_nest command like this:
existing_id = m.new_nest("existing", parent=m.root_id, children=[1,3])
public_id = m.new_nest("public", parent=m.root_id, children=[1,2])
The new_nest() method returns an identifying code number for the newly created nest. We’ll use that code number
below.
For a cross-nested model, we need to assign an allocation to each entering link of any node that has more than one
predecessor. In this case, that applies only to the "Train" node.
Larch employs a logit-type function to manage this allocation, instead of specifying the allocation directly as a parameter. So, the allocation on the link Public->Train (PT) is given by
𝛼_PT = exp(𝜑_PT 𝑍) / ( exp(𝜑_PT 𝑍) + exp(𝜑_ET 𝑍) ),
where 𝜑_PT is a vector of zero or more parameters associated with the link PT, 𝜑_ET is a similar vector of parameters
for the link Existing->Train (ET), and 𝑍 is a vector of idco Format variables.
If we give our model no other commands, the length of these parameter and data vectors will be zero, and the allocation
parameters for links PT and ET will default to 0.5 each. But, we can relax this constrained default value by putting
some defined parameters into the formula:
m.link[existing_id, 1](param="phi_et", data="1")
Note that, as for alternative specific constants in the utility function, we leave out one of the possibilities (here, we
skip PT) to avoid overspecifying the model.
Constructing the allocation parameters like this has a few benefits. It automatically ensures that the total allocation for
each node to its incoming links is always 1. It also allows the parameter phi_et to be estimated as unconstrained, as
it can take on any value without bound and still produce a properly normalized model. Lastly, it allows the allocation
to be a function of the idco Format data, and not merely a fixed value. One drawback is that there is no statistical test
to use on the value of the estimated phi parameters to determine whether the true value of the allocation should be 1 or 0,
because those allocations are associated with infinite values for phi.
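To make the allocation concrete: with the single constant data item "1" and the PT-side parameter omitted (fixed at zero), the split of the Train node between its two parents is a binary logit in phi_et. A small sketch (this helper function is hypothetical, not a Larch API):

```python
import math

def train_allocations(phi_pt=0.0, phi_et=0.0, z=1.0):
    # logit-type split of the Train node across its two entering links
    e_pt = math.exp(phi_pt * z)
    e_et = math.exp(phi_et * z)
    return e_pt / (e_pt + e_et), e_et / (e_pt + e_et)
```

With both phi values at zero, the allocations default to 0.5 each, matching the constrained default described above; the two allocations always sum to one by construction.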
We can estimate the models and check the results match up with those given by Biogeme:
>>> result = m.maximize_loglike()
>>> print(result.message)
Optimization terminated successfully...
>>> print(m.report('txt', sigfigs=4))
=========================================================================================...
swissmetro example 11 (cross nested logit)
=========================================================================================...
Model Parameter Estimates
-----------------------------------------------------------------------------------------...
Parameter       InitValue  FinalValue   StdError    t-Stat  NullValue
ASC_TRAIN             0.0     0.09828    0.05634     1.744        0.0
ASC_CAR               0.0     -0.2404    0.03844    -6.255        0.0
B_TIME                0.0   -0.007769  0.0005576    -13.93        0.0
B_COST                0.0   -0.008189   0.000446    -18.36        0.0
existing              1.0      0.3976    0.02761    -21.82        1.0
public                1.0      0.2431    0.03361    -22.52        1.0
phi_et                0.0    -0.01971     0.1157   -0.1703        0.0
=========================================================================================...
Model Estimation Statistics
-----------------------------------------------------------------------------------------...
Log Likelihood at Convergence                                                    -5214.05
Log Likelihood at Null Parameters                                                -6964.66
-----------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters                                                  0.251
=========================================================================================...
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(111)
1.9.3 Itinerary Choice Examples
Example Data
This sample data file represents something like what you might use to estimate an aviation itinerary choice model.
The actual content of the file contains completely fabricated data that does not correspond to any real itinerary choice
observations, so the model results you get from this dataset will likely be poor; the data is given only as an example
for how to structure and code these models, and should not be used to understand how to interpret the results.
Example Models
201: Network GEV Itinerary Choice
This example is an itinerary choice model built using the example itinerary choice dataset included with Larch. As
usual, we first create the DB objects:
d = larch.DB.Example('ITINERARY')
Our itinerary choice data has a lot of alternatives, but they are not ordered or numbered in a regular way; each elemental
alternative has an arbitrary code number assigned to it, and the code numbers for one case are not comparable to
another case. But we can renumber the alternatives in a manner that is more suited for our application, such that based
on the code number we can programmatically extract a few relevant features of the alternative that we will want to use in
building our Network GEV model. Suppose for example that we want to test a model which has level of service nested
inside carriers on one side, and carriers nested inside level of service on the other side. To assign each alternative into
this structure, obviously we’ll need to be able to easily identify the level of service and carrier for each alternative.
To renumber, first we will define the relevant categories and values, and establish a numbering system using a special
object:
from enum import Enum

class levels_of_service(Enum):
    nonstop = 1
    withstop = 0

class carriers(Enum):
    Robin = 1
    Cardinal = 2
    Bluejay = 3
    Heron = 4
    Other = 5

from larch.util.numbering import numbering_system
ns = numbering_system(levels_of_service, carriers)
Then we can use a special command on the DB object to assign new alternative numbers.
d.recode_alts(ns, 'data_ca', 'casenum', 'itinerarycode',
              'nonstop', 'MIN(carrier,5)',
              newaltstable='itinerarycodes')
As arguments to this command, we provide the numbering system object, the name of the table that contains the idca
data to be numbered (here data_ca), the name of the column in that table that names the caseids, and the name of a
new column to be created (or overwritten) with the new code numbers. We also need to give a set of SQL expressions
that can be evaluated on the rows of the table to get the categorical values that we defined in the Enums above. Lastly,
we can pass the name of a new table that will be created to identify every observed alternative code.
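The exact encoding is chosen internally by numbering_system, but the idea of code numbers from which attributes can be extracted programmatically can be illustrated with a toy scheme (this packing is purely hypothetical, not the layout Larch actually uses):

```python
from enum import Enum

class levels_of_service(Enum):
    nonstop = 1
    withstop = 0

class carriers(Enum):
    Robin = 1
    Cardinal = 2
    Bluejay = 3
    Heron = 4
    Other = 5

# toy packing: a sequence number, one digit for level of service, one for carrier
def make_code(seq, los, carrier):
    return seq * 100 + los.value * 10 + carrier.value

def decode(code):
    return levels_of_service((code // 10) % 10), carriers(code % 10)
```

Given such a scheme, any code number can be mapped back to its level of service and carrier without a lookup table, which is what makes building the nesting structure below straightforward.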
Now let’s make our model. We’ll use a few variables to define our linear-in-parameters utility function.
m = larch.Model(d)
vars = [
    "carrier=2",
    "carrier=3",
    "carrier=4",
    "carrier>=5",
    "aver_fare_hy",
    "aver_fare_ly",
    "itin_num_cnxs",
    "itin_num_directs",
]
for var in vars:
    m.utility.ca(var)
Now it’s time to build the network that will define our nesting structure. First we’ll need to have the list of all the
alternative codes we created above.
alts = d.alternative_codes()
Then we can build the network by looping over various categories to define the nodes.
for los in levels_of_service:
    los_nest = m.new_nest(los.name, param_name="mu_los1", parent=m.root_id)
    for carrier in carriers:
        carrier_nest = m.new_nest(los.name+carrier.name, param_name="mu_carrier1", parent=los_nest)
        for a in alts:
            if ns.code_matches_attributes(a, los, carrier):
                m.link(carrier_nest, a)
In this first block, the outermost loop is over levels of service, and within that we loop over carriers to create lower
level nests, and then over the alternatives, linking in those alternatives that match the level of service and carrier. Note
we reuse the same ns object from when we did the alternative renumbering, as it knows how to quickly extract the
relevant attributes from the code number.
for carrier in carriers:
    carrier_nest = m.new_nest(carrier.name, param_name="mu_carrier2", parent=m.root_id)
    for los in levels_of_service:
        los_nest = m.new_nest(carrier.name+los.name, param_name="mu_los2", parent=carrier_nest)
        for a in alts:
            if ns.code_matches_attributes(a, los, carrier):
                m.link(los_nest, a)
                m.link[los_nest, a](data='1', param='PHI')
The second block mirrors the first block, except reversing the order of the categories, so that the carrier nests are on
top and the level of service nests are underneath them. Also, we add one extra line after the link command, which
associates a PHI parameter with the link, to manage the allocation of the elemental alternatives to the nodes above
them. The PHI takes the place of alpha allocation parameters, and it is omitted in the earlier block because one side
needs to be implicitly normalized to zero. For more details on this, check out Newman (2008).
For this example, since we want it to run quickly, we'll limit the input data to only a few cases, and turn off the
automatic calculation of standard errors. We'll also tell the optimization engine to enforce logsum parameter ordering
constraints.
filter = 'casenum < 20000'
d.queries.idca_build(filter=filter)
d.queries.idco_build(filter=filter)
m.option.calc_std_errors = False
m.option.enforce_constraints = True
By default, logsum parameters created automatically by the new_nest method have min/max bounds set at 0.0/1.0.
But the network GEV can become numerically unstable if these parameters get too close to zero, especially when the
dataset is small (as it is here in this example). So we can set a minimum value a little bit away from zero like this:
m['mu_los1'].minimum = 0.01
m['mu_carrier1'].minimum = 0.01
m['mu_los2'].minimum = 0.01
m['mu_carrier2'].minimum = 0.01
Now it’s time to run it and see what we get. We’ll use the SLSQP algorithm because it can use the automatic parameter
constraints (most of the available algorithms cannot):
>>> result = m.maximize_loglike('SLSQP', metaoptions={'ftol': 1e-08})
>>> result.message
'Optimization terminated successfully. [SLSQP]'
>>> print(m.report('txt', sigfigs=3))
=================================================================================================...
Model Parameter Estimates
-------------------------------------------------------------------------------------------------...
Parameter             InitValue   FinalValue   StdError   t-Stat   NullValue
carrier=2                   0.0       -0.203        nan      nan         0.0
carrier=3                   0.0       0.0054        nan      nan         0.0
carrier=4                   0.0        -0.04        nan      nan         0.0
carrier>=5                  0.0       0.0294        nan      nan         0.0
aver_fare_hy                0.0     -0.00216        nan      nan         0.0
aver_fare_ly                0.0    -0.000162        nan      nan         0.0
itin_num_cnxs               0.0       -0.477        nan      nan         0.0
itin_num_directs            0.0       -0.263        nan      nan         0.0
mu_los1                     1.0       0.0505        nan      nan         1.0
mu_carrier1                 1.0        0.045        nan      nan         1.0
mu_carrier2                 1.0          1.0        nan      nan         1.0
mu_los2                     1.0          1.0        nan      nan         1.0
PHI                         0.0         1.21        nan      nan         0.0
=================================================================================================...
Model Estimation Statistics
-------------------------------------------------------------------------------------------------...
Log Likelihood at Convergence                                                          -6086.89
Log Likelihood at Null Parameters                                                      -6115.43
-------------------------------------------------------------------------------------------------...
Rho Squared w.r.t. Null Parameters                                                        0.005
=================================================================================================...
Don’t worry that these results don’t look great; we used a very tiny dataset here with a complex model. This example
was just to walk you through the mechanics of specifying and estimating this model, not showing you how to get a
good model with the data, which is an exercise left for the reader.
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(201)
202: OGEV Itinerary Choice
This example is an OGEV itinerary choice model built using the example itinerary choice dataset included with Larch.
As usual, we first create the DB objects:
d = larch.DB.Example('ITINERARY')
Our itinerary choice data has a lot of alternatives, but they are not ordered or numbered in a regular way; each elemental
alternative has an arbitrary code number assigned to it, and the code numbers for one case are not comparable to
another case. But we can renumber the alternatives in a manner that is more suited for our application, such that
based on the code number we can programmatically extract a few relevant features of the alternative that we will want
to use in building our OGEV model. Suppose for example that we want to test a model which has a departure time of
day OGEV structure. To assign each alternative into this structure, obviously we’ll need to be able to easily identify
the departure time (here we will group by hours) for each alternative. To renumber, first we will define the relevant
categories and values, and establish a numbering system using a special object:
from enum import Enum

class departure_hours(Enum):
    h0 = 0
    h1 = 1
    h2 = 2
    h3 = 3
    h4 = 4
    h5 = 5
    h6 = 6
    h7 = 7
    h8 = 8
    h9 = 9
    h10 = 10
    h11 = 11
    h12 = 12
    h13 = 13
    h14 = 14
    h15 = 15
    h16 = 16
    h17 = 17
    h18 = 18
    h19 = 19
    h20 = 20
    h21 = 21
    h22 = 22
    h23 = 23

from larch.util.numbering import numbering_system

ns = numbering_system(departure_hours)
Then we can use a special command on the DB object to assign new alternative numbers.
d.recode_alts(ns, 'data_ca', 'casenum', 'ogevitinerarycode',
              'depart_time/60',
              newaltstable='ogevitinerarycodes',
              )
As arguments to this command, we provide the numbering system object, the name of the table that contains the idca
data to be numbered (here data_ca), the name of the column in that table that names the caseids, and the name of a
new column to be created (or overwritten) with the new code numbers. We also need to give a set of SQL expressions
that can be evaluated on the rows of the table to get the categorical values that we defined in the Enums above. Lastly,
we can pass the name of a new table that will be created to identify every observed alternative code.
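The internals of the numbering system are not shown in this example, but the basic idea — packing each categorical attribute into its own bit field of the code, so the attribute can be recovered from the code alone — can be sketched in plain Python. This is an illustration of the concept, not Larch's actual implementation; all the names here are hypothetical:

```python
from enum import Enum

class DepartureHour(Enum):
    h8 = 8
    h9 = 9
    h10 = 10

class ToyNumbering:
    """Sketch of a numbering system: each category occupies
    its own bit field of the integer alternative code."""

    SEQ_BITS = 10  # low bits distinguish alternatives with equal attributes

    def __init__(self, *categories):
        self.categories = categories
        # each category gets a field wide enough for its largest value
        self.widths = [max(m.value for m in cat).bit_length() + 1
                       for cat in categories]

    def code(self, seq, *attrs):
        # start from the sequence number, then shift each attribute
        # value into its reserved field
        out, shift = seq, self.SEQ_BITS
        for width, attr in zip(self.widths, attrs):
            out |= attr.value << shift
            shift += width
        return out

    def code_matches_attributes(self, code, *attrs):
        # unpack each field and compare against any attribute of that category
        shift = self.SEQ_BITS
        for cat, width in zip(self.categories, self.widths):
            field = (code >> shift) & ((1 << width) - 1)
            if any(isinstance(a, cat) and a.value != field for a in attrs):
                return False
            shift += width
        return True

ns_toy = ToyNumbering(DepartureHour)
alt = ns_toy.code(3, DepartureHour.h9)
assert ns_toy.code_matches_attributes(alt, DepartureHour.h9)
assert not ns_toy.code_matches_attributes(alt, DepartureHour.h8)
```

Because the attributes live in fixed bit fields, checking whether an alternative belongs in a given nest is a couple of shifts and masks rather than a database lookup, which is why the renumbering pays off when building large nesting structures.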
Now let’s make our model. We’ll use a few variables to define our linear-in-parameters utility function.
m = larch.Model(d)
vars = [
    "carrier=2",
    "carrier=3",
    "carrier=4",
    "carrier>=5",
    "aver_fare_hy",
    "aver_fare_ly",
    "itin_num_cnxs",
    "itin_num_directs",
]
for var in vars:
    m.utility.ca(var)
Now it’s time to build the network that will define our nesting structure. First we’ll need to have the list of all the
alternative codes we created above.
alts = d.alternative_codes()
Then we can build the network by looping over various categories to define the nodes.
for hour in departure_hours:
    prev_hour = departure_hours((hour.value-1) % 24)
    next_hour = departure_hours((hour.value+1) % 24)
    time_nest = m.new_nest(hour.name, param_name="mu_ogev", parent=m.root_id)
    for a in alts:
        if ns.code_matches_attributes(a, hour):
            m.link(time_nest, a)
        if ns.code_matches_attributes(a, next_hour):
            m.link(time_nest, a)
            m.link[time_nest, a](data='1', param='PHI_next')
        if ns.code_matches_attributes(a, prev_hour):
            m.link(time_nest, a)
            m.link[time_nest, a](data='1', param='PHI_prev')
In this block, the outermost loop is over departure hour. Within that, we define the previous and next hours (this is
easily expandable to include multiple hours in each OGEV nest) and a nest to group together the three hours. Then we
loop over alternatives and link each one if it matches the target hour, or the next or previous hour. We also add a PHI
parameter to the next and previous hour alternatives, to control the allocation of the alternatives to the OGEV level
nests. The target hour link has no PHI parameter because it is the reference point.
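One detail worth noting: because the hour arithmetic is taken mod 24, the previous hour of h0 is h23 and the next hour of h23 is h0, so the chain of overlapping nests is cyclic rather than ending at midnight. A tiny standalone sketch of just that wrap-around logic (plain Python, no Larch required):

```python
from enum import Enum

# A compact rebuild of the 24-member Enum from the text,
# using the functional Enum API.
departure_hours = Enum('departure_hours', {f'h{i}': i for i in range(24)})

def neighbors(hour):
    """Previous and next hours, wrapping around midnight (mod 24)."""
    prev_hour = departure_hours((hour.value - 1) % 24)
    next_hour = departure_hours((hour.value + 1) % 24)
    return prev_hour, next_hour

# h0 wraps back to h23; h23 wraps forward to h0.
assert neighbors(departure_hours.h0) == (departure_hours.h23, departure_hours.h1)
assert neighbors(departure_hours.h23) == (departure_hours.h22, departure_hours.h0)
```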
The PHI parameters replace the ALPHA allocation parameters commonly seen in OGEV models in the literature, and
are used to recreate the allocation parameters according to:
$$\alpha_i = \frac{\exp(\varphi_i \cdot Z_i)}{\sum_j \exp(\varphi_j \cdot Z_j)}$$
where Z is potentially some idco Format data, although in the example shown here it is simply a constant (1).
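For intuition, this allocation formula can be evaluated directly in a few lines of plain Python. The reference link's φ is fixed at zero (matching the target-hour link above, which gets no PHI parameter), Z is the constant 1, and the two nonzero φ values are purely illustrative:

```python
from math import exp

def alphas(phi, Z=None):
    """alpha_i = exp(phi_i * Z_i) / sum_j exp(phi_j * Z_j)"""
    if Z is None:
        Z = [1.0] * len(phi)
    weights = [exp(p * z) for p, z in zip(phi, Z)]
    total = sum(weights)
    return [w / total for w in weights]

# One reference link (phi fixed at 0) and two estimated PHI values
# (illustrative numbers, not real estimates).
a = alphas([0.0, 0.09, 1.84])
assert abs(sum(a) - 1.0) < 1e-12  # allocations always sum to one
```

Since the formula is a softmax over the φ's, the recovered allocations are automatically positive and sum to one, which is exactly the constraint the ALPHA parameters would otherwise need to satisfy.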
For this example, since we want it to run quickly, we'll limit the input data to only a few cases, and turn off the
automatic calculation of standard errors. We'll also tell the optimization engine not to enforce parameter constraints
or bounds.
filter = 'casenum < 20000'
d.queries.idca_build(filter=filter)
d.queries.idco_build(filter=filter)
m.option.calc_std_errors = False
m.option.enforce_constraints = False
m.option.enforce_bounds = False
Now it’s time to run it and see what we get. We’ll use the SLSQP algorithm because it can use the automatic parameter
constraints (most of the available algorithms cannot):
>>> result = m.maximize_loglike('SLSQP', metaoptions={'ftol': 1e-08})
>>> result.message
'Optimization terminated successfully...
>>> print(m)
===================================================================================================..
Model Parameter Estimates
---------------------------------------------------------------------------------------------------..
Parameter             InitValue     FinalValue   StdError   t-Stat   NullValue
carrier=2                     0      -0.208...        nan      nan           0
carrier=3                     0      0.0150...        nan      nan           0
carrier=4                     0     -0.0588...        nan      nan           0
carrier>=5                    0      0.0561...        nan      nan           0
aver_fare_hy                  0    -0.00228...        nan      nan           0
aver_fare_ly                  0   -0.000211...        nan      nan           0
itin_num_cnxs                 0      -0.441...        nan      nan           0
itin_num_directs              0      -0.104...        nan      nan           0
mu_ogev                       1       0.887...        nan      nan           1
PHI_prev                      0      0.0891...        nan      nan           0
PHI_next                      0       1.84...         nan      nan           0
===================================================================================================..
Model Estimation Statistics
---------------------------------------------------------------------------------------------------..
Log Likelihood at Convergence                                                          -6085.30
Log Likelihood at Null Parameters                                                      -6115.43
---------------------------------------------------------------------------------------------------..
Rho Squared w.r.t. Null Parameters                                                        0.005
===================================================================================================..
...
Don’t worry that these results don’t look great; we used a very tiny dataset here with a complex model. This example
was just to walk you through the mechanics of specifying and estimating this model, not showing you how to get a
good model with the data, which is an exercise left for the reader.
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(202)
203: OGEV-NL Itinerary Choice
This example is a multi-level OGEV-NL itinerary choice model built using the example itinerary choice dataset
included with Larch. As usual, we first create the DB objects:
db = larch.DB.Example('ITINERARY')
Our itinerary choice data has a lot of alternatives, but they are not ordered or numbered in a regular way; each elemental
alternative has an arbitrary code number assigned to it, and the code numbers for one case are not comparable to
another case. But we can renumber the alternatives in a manner that is more suited for our application, such that based
on the code number we can programmatically extract a few relevant features of the alternative that we will want to use in
building our OGEV-NL model. Suppose for example that we want to test a model which has a departure time of day
OGEV structure with carriers nested inside time of day. To assign each alternative into this structure, obviously we’ll
need to be able to easily identify the departure time (here we will group by hours) and carrier for each alternative.
To renumber, first we will define the relevant categories and values, and establish a numbering system using a special
object:
from enum import Enum

class departure_hours(Enum):
    h0 = 0
    h1 = 1
    h2 = 2
    h3 = 3
    h4 = 4
    h5 = 5
    h6 = 6
    h7 = 7
    h8 = 8
    h9 = 9
    h10 = 10
    h11 = 11
    h12 = 12
    h13 = 13
    h14 = 14
    h15 = 15
    h16 = 16
    h17 = 17
    h18 = 18
    h19 = 19
    h20 = 20
    h21 = 21
    h22 = 22
    h23 = 23

class carriers(Enum):
    Robin = 1
    Cardinal = 2
    Bluejay = 3
    Heron = 4
    Other = 5

from larch.util.numbering import numbering_system

ns = numbering_system(departure_hours, carriers)
Then we can use a special command on the DB object to assign new alternative numbers.
db.recode_alts(ns, 'data_ca', 'casenum', 'ogevitinerarycode',
               'depart_time/60', 'MIN(carrier,5)',
               newaltstable='ogevitinerarycodes',
               )
As arguments to this command, we provide the numbering system object, the name of the table that contains the idca
data to be numbered (here data_ca), the name of the column in that table that names the caseids, and the name of a
new column to be created (or overwritten) with the new code numbers. We also need to give a set of SQL expressions
that can be evaluated on the rows of the table to get the categorical values that we defined in the Enums above. Lastly,
we can pass the name of a new table that will be created to identify every observed alternative code.
Now let’s make our model. We’ll use a few variables to define our linear-in-parameters utility function.
m = larch.Model(db)
vars = [
    "carrier=2",
    "carrier=3",
    "carrier=4",
    "carrier>=5",
    "aver_fare_hy",
    "aver_fare_ly",
    "itin_num_cnxs",
    "itin_num_directs",
]
for var in vars:
    m.utility.ca(var)
Now it’s time to build the network that will define our nesting structure. First we’ll need to have the list of all the
alternative codes we created above.
alts = db.alternative_codes()
Then we can build the network by looping over various categories to define the nodes.
hour_carrier_nests = {}
for hour in departure_hours:
    for carrier in carriers:
        n = m.new_nest(hour.name+carrier.name, param_name="mu_carrier")
        hour_carrier_nests[hour,carrier] = n
        for a in alts:
            if ns.code_matches_attributes(a, hour, carrier):
                m.link(n, a)
In this block, we define a set of nests by hour and carrier, and allocate the elemental alternatives into those nests. Note
we reuse the same ns object from when we did the alternative renumbering, as it knows how to quickly extract the
relevant attributes from the code number.
for hour in departure_hours:
    prev_hour = departure_hours((hour.value-1) % 24)
    next_hour = departure_hours((hour.value+1) % 24)
    time_nest = m.new_nest(hour.name, param_name="mu_ogev", parent=m.root_id)
    for carrier in carriers:
        m.link(time_nest, hour_carrier_nests[hour,carrier])
        m.link(time_nest, hour_carrier_nests[next_hour,carrier])
        m.link[time_nest, hour_carrier_nests[next_hour,carrier]](data='1', param='PHI_next')
        m.link(time_nest, hour_carrier_nests[prev_hour,carrier])
        m.link[time_nest, hour_carrier_nests[prev_hour,carrier]](data='1', param='PHI_prev')
In this second block, the outermost loop is over departure hour. Within that, we define the previous and next hours (this
is easily expandable to include multiple hours in each OGEV nest) and a nest to group together the three hours. Then
we loop over carriers and link the nests we created in the previous block, with one link to each of the current, next,
and previous hour carrier nests. We also add PHI parameters to the next and previous hour links to control the
allocation of the hour-carrier nests to the OGEV level nests.
For this example, since we want it to run quickly, we'll limit the input data to only a few cases, and turn off the
automatic calculation of standard errors. We'll also tell the optimization engine not to enforce parameter constraints
or bounds.
filter = 'casenum < 20000'
db.queries.idca_build(filter=filter)
db.queries.idco_build(filter=filter)
m.option.calc_std_errors = False
m.option.enforce_constraints = False
m.option.enforce_bounds = False
Now it’s time to run it and see what we get. We’ll use the SLSQP algorithm because it can use the automatic parameter
constraints (most of the available algorithms cannot):
>>> result = m.maximize_loglike('SLSQP', metaoptions={'ftol': 1e-08})
>>> result.message
'Optimization terminated successfully...
>>> print(m.report('txt', cats=['params'], param='< 12.3g'))
====================================================================================================
Model Parameter Estimates
----------------------------------------------------------------------------------------------------
Parameter             InitValue   FinalValue   StdError   t-Stat   NullValue
carrier=2                     0       -0.209        nan      nan           0
carrier=3                     0      0.00105        nan      nan           0
carrier=4                     0      -0.0577        nan      nan           0
carrier>=5                    0        0.046        nan      nan           0
aver_fare_hy                  0     -0.00231        nan      nan           0
aver_fare_ly                  0    -0.000213        nan      nan           0
itin_num_cnxs                 0       -0.442        nan      nan           0
itin_num_directs              0       -0.108        nan      nan           0
mu_carrier                    1        0.844        nan      nan           1
mu_ogev                       1        0.922        nan      nan           1
PHI_next                      0        0.737        nan      nan           0
PHI_prev                      0      0.00259        nan      nan           0
====================================================================================================
>>> print(m.report('txt', cats=['LL']))
================================================
Model Estimation Statistics
------------------------------------------------
Log Likelihood at Convergence          -6084.54
Log Likelihood at Null Parameters      -6115.43
------------------------------------------------
Rho Squared w.r.t. Null Parameters        0.005
================================================
Don’t worry that these results don’t look great; we used a very tiny dataset here with a complex model. This example
was just to walk you through the mechanics of specifying and estimating this model, not showing you how to get a
good model with the data, which is an exercise left for the reader.
Tip: If you want access to the model in this example without worrying about assembling all the code blocks together
on your own, you can load a ready-to-estimate copy like this:
m = larch.Model.Example(203)
220: Partially Segmented Itinerary Choice Using a MetaModel
MetaModels are a way to estimate a group of models simultaneously.
This example is an itinerary choice meta-model built using the example itinerary choice dataset included with Larch.
import larch
import itertools
from larch.metamodel import MetaModel
db = larch.DB.Example('ITINERARY', shared=True)
db.queries.idco_query = "SELECT distinct(casenum) AS caseid, 1 as weight FROM data_ca "
db.queries.idca_query = "SELECT casenum AS caseid, itinerarycode AS altid, * FROM data_ca "
dow_set = [0,1]
type_set = ['OW','OB','IB']
common_vars = [
    "carrier=2",
    "carrier=3",
    "carrier=4",
    "carrier>=5",
    "aver_fare_hy",
    "aver_fare_ly",
    "itin_num_cnxs",
    "itin_num_directs",
]
segmented_vars = [
    "sin2pi",
    "sin4pi",
    "sin6pi",
    "cos2pi",
    "cos4pi",
    "cos6pi",
]
We can construct a MetaModel in two general ways: manually (explicitly giving every submodel), or by using a
submodel factory function. Such a function must take a segment descriptor, plus some set of other arguments, and
return a submodel for the given segment.
Here we show an example of a submodel factory function, which takes the (shared cache) db object, plus lists of
common and segmented variables, and returns a submodel. The db object needs to have a shared cache because
we will use the NewConnection method of the DB class to spawn a new DB object that shares the same underlying
database. This lets us have several different database connections, each with its own QuerySet, so that each submodel
can use a unique data set pulled from the master data.
The advantage of using the submodel factory is that the rest of the set up of the MetaModel can be handled automatically.
def submodel_factory(segment_descriptor, db, common_vars, segmented_vars):
    # Create a new larch.DB based on the shared DB
    dx = larch.DB.NewConnection(db)
    # introduce a case filter to apply to the data table, to get the segment
    casefilter = " WHERE dow=={0} AND direction=='{1}'".format(*segment_descriptor)
    dx.queries = qry = larch.core.QuerySetTwoTable(dx)
    qry.idco_query = "SELECT distinct casenum AS caseid, dow, direction FROM data_ca "+casefilter
    qry.idca_query = "SELECT casenum AS caseid, itinerarycode AS altid, * FROM data_ca "+casefilter
    # The rest of the QuerySet is defined as usual and is the same for all segments
    qry.alts_query = "SELECT * FROM itinerarycodes "
    qry.choice = 'pax_count'
    qry.avail = '1'
    qry.weight = '1'
    dx.refresh_queries()
    # Create a new submodel using the filtered data DB
    submodel = larch.Model(dx)
    # If the submodel has no cases, skip the rest of setting it up
    if submodel.df.nCases()==0:
        return submodel
    # Populate the submodel with the common parameters
    for var in common_vars:
        submodel.utility.ca(var)
    # Populate the submodel with the segmented parameters
    for var in segmented_vars:
        built_par = var+"_{}_{}".format(*segment_descriptor)
        submodel.utility.ca(var, built_par)
    return submodel
That should be all we need to create our metamodel. Then to actually create the MetaModel object, we'll need to give
an iterator over all the segments, the submodel factory, and a tuple with the arguments for the factory:
m = MetaModel(itertools.product(dow_set, type_set), submodel_factory, args=(db, common_vars, segmented_vars))
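The shape of this pattern — a factory applied over the cross-product of segment descriptors — can be sketched without Larch itself. Here `submodel_stub` is a hypothetical stand-in for the real factory; it only records which utility variables each segment's submodel would receive, following the `var_{dow}_{direction}` naming convention used above:

```python
import itertools

dow_set = [0, 1]
type_set = ['OW', 'OB', 'IB']

def submodel_stub(segment_descriptor, common_vars, segmented_vars):
    """Stand-in for the real factory: records the utility variables
    each segment's submodel would receive."""
    built = [v + "_{}_{}".format(*segment_descriptor) for v in segmented_vars]
    return {"segment": segment_descriptor,
            "vars": list(common_vars) + built}

# One stub submodel per (day-of-week, direction) segment.
submodels = {seg: submodel_stub(seg, ["aver_fare_hy"], ["sin2pi"])
             for seg in itertools.product(dow_set, type_set)}

assert len(submodels) == 6  # 2 days-of-week x 3 direction types
assert submodels[(1, 'IB')]["vars"] == ["aver_fare_hy", "sin2pi_1_IB"]
```

The common variables share one parameter name across all submodels, so the MetaModel estimates them jointly, while each segment-suffixed name becomes a parameter unique to that segment.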
Let’s confirm that we got a model that has the parameters we want.
>>> print(m)
====================================================================================================
Model Parameter Estimates
----------------------------------------------------------------------------------------------------
Parameter             InitValue   FinalValue   StdError   t-Stat   NullValue
carrier=2                     0            0        nan      nan           0
carrier=3                     0            0        nan      nan           0
carrier=4                     0            0        nan      nan           0
carrier>=5                    0            0        nan      nan           0
aver_fare_hy                  0            0        nan      nan           0
aver_fare_ly                  0            0        nan      nan           0
itin_num_cnxs                 0            0        nan      nan           0
itin_num_directs              0            0        nan      nan           0
sin2pi_0_OW                   0            0        nan      nan           0
sin4pi_0_OW                   0            0        nan      nan           0
sin6pi_0_OW                   0            0        nan      nan           0
cos2pi_0_OW                   0            0        nan      nan           0
cos4pi_0_OW                   0            0        nan      nan           0
cos6pi_0_OW                   0            0        nan      nan           0
sin2pi_0_OB                   0            0        nan      nan           0
sin4pi_0_OB                   0            0        nan      nan           0
sin6pi_0_OB                   0            0        nan      nan           0
cos2pi_0_OB                   0            0        nan      nan           0
cos4pi_0_OB                   0            0        nan      nan           0
cos6pi_0_OB                   0            0        nan      nan           0
sin2pi_0_IB                   0            0        nan      nan           0
sin4pi_0_IB                   0            0        nan      nan           0
sin6pi_0_IB                   0            0        nan      nan           0
cos2pi_0_IB                   0            0        nan      nan           0
cos4pi_0_IB                   0            0        nan      nan           0
cos6pi_0_IB                   0            0        nan      nan           0
sin2pi_1_OW                   0            0        nan      nan           0
sin4pi_1_OW                   0            0        nan      nan           0
sin6pi_1_OW                   0            0        nan      nan           0
cos2pi_1_OW                   0            0        nan      nan           0
cos4pi_1_OW                   0            0        nan      nan           0
cos6pi_1_OW                   0            0        nan      nan           0
sin2pi_1_OB                   0            0        nan      nan           0
sin4pi_1_OB                   0            0        nan      nan           0
sin6pi_1_OB                   0            0        nan      nan           0
cos2pi_1_OB                   0            0        nan      nan           0
cos4pi_1_OB                   0            0        nan      nan           0
cos6pi_1_OB                   0            0        nan      nan           0
sin2pi_1_IB                   0            0        nan      nan           0
sin4pi_1_IB                   0            0        nan      nan           0
sin6pi_1_IB                   0            0        nan      nan           0
cos2pi_1_IB                   0            0        nan      nan           0
cos4pi_1_IB                   0            0        nan      nan           0
cos6pi_1_IB                   0            0        nan      nan           0
====================================================================================================
...
Yup, looks good. We have a single copy of each of the common parameters, plus a set of the segmented parameters for each
segment. Now let's estimate our model. We'll turn off the calculation of standard errors, because that takes a bit of
time and we're not interested in those results yet.
>>> m.option.calc_std_errors = False
>>> r = m.maximize_loglike("SLSQP")
>>> print(r)
ctol: 1.351...e-09
loglike: -27540.2525...
loglike_null: -27722.6710...
message: 'Optimization terminated successfully per computed tolerance. [SLSQP]'
...
>>> print(m)
====================================================================================================
Model Parameter Estimates
----------------------------------------------------------------------------------------------------
Parameter             InitValue    FinalValue   StdError   t-Stat   NullValue
carrier=2                     0     -0.143742        nan      nan           0
carrier=3                     0    0.00833003        nan      nan           0
carrier=4                     0     0.0325161        nan      nan           0
carrier>=5                    0     0.0464821        nan      nan           0
aver_fare_hy                  0   -0.00151544        nan      nan           0
aver_fare_ly                  0   -0.00111023        nan      nan           0
itin_num_cnxs                 0     -0.678798        nan      nan           0
itin_num_directs              0     -0.255871        nan      nan           0
sin2pi_0_OW                   0    -0.0908166        nan      nan           0
sin4pi_0_OW                   0      0.030409        nan      nan           0
sin6pi_0_OW                   0    0.00678247        nan      nan           0
cos2pi_0_OW                   0     -0.154815        nan      nan           0
cos4pi_0_OW                   0     -0.130647        nan      nan           0
cos6pi_0_OW                   0    -0.0811669        nan      nan           0
sin2pi_0_OB                   0    0.00748833        nan      nan           0
sin4pi_0_OB                   0     0.0386351        nan      nan           0
sin6pi_0_OB                   0     0.0245834        nan      nan           0
cos2pi_0_OB                   0     -0.108559        nan      nan           0
cos4pi_0_OB                   0    -0.0573955        nan      nan           0
cos6pi_0_OB                   0    -0.0168128        nan      nan           0
sin2pi_0_IB                   0    -0.0779617        nan      nan           0
sin4pi_0_IB                   0     0.0490604        nan      nan           0
sin6pi_0_IB                   0    -0.0037936        nan      nan           0
cos2pi_0_IB                   0    0.00643352        nan      nan           0
cos4pi_0_IB                   0    -0.0437634        nan      nan           0
cos6pi_0_IB                   0    -0.0458394        nan      nan           0
sin2pi_1_OW                   0    -0.0896607        nan      nan           0
sin4pi_1_OW                   0    -0.0741084        nan      nan           0
sin6pi_1_OW                   0       -0.0221        nan      nan           0
cos2pi_1_OW                   0    0.00213618        nan      nan           0
cos4pi_1_OW                   0    -0.0183732        nan      nan           0
cos6pi_1_OW                   0     0.0319627        nan      nan           0
sin2pi_1_OB                   0    -0.0505187        nan      nan           0
sin4pi_1_OB                   0     -0.046858        nan      nan           0
sin6pi_1_OB                   0    -0.0286364        nan      nan           0
cos2pi_1_OB                   0      -0.18624        nan      nan           0
cos4pi_1_OB                   0     -0.113927        nan      nan           0
cos6pi_1_OB                   0    0.00489613        nan      nan           0
sin2pi_1_IB                   0    -0.0591861        nan      nan           0
sin4pi_1_IB                   0    -0.0305713        nan      nan           0
sin6pi_1_IB                   0     0.0151741        nan      nan           0
cos2pi_1_IB                   0    -0.0184999        nan      nan           0
cos4pi_1_IB                   0    -0.0291761        nan      nan           0
cos6pi_1_IB                   0    -0.0543221        nan      nan           0
====================================================================================================
Model Estimation Statistics
----------------------------------------------------------------------------------------------------
Log Likelihood at Convergence                                                         -27540.25
Log Likelihood at Null Parameters                                                     -27722.67
----------------------------------------------------------------------------------------------------
Rho Squared w.r.t. Null Parameters                                                        0.007
====================================================================================================
...
Tip: If you want access to the metamodel in this example without worrying about assembling all the code blocks
together on your own, you can load a read-to-estimate copy like this:
m = larch.Model.Example(220)
80
Chapter 1. Contents
Larch Documentation, Release 3.3.0
1.10 Frequently Asked Questions
Why should I use Larch instead of Biogeme to estimate my logit models? Biogeme is a different free software
package that also estimates parameters for logit models. If you know it, and how to use it, awesome! If you
don't, it's really not that hard to learn, and in many ways it is quite similar to Larch. Depending on your particular
application and needs (especially if you want to explore more complex non-linear models), Biogeme could
be right for you.
The principal systematic differences between Biogeme and Larch are in data and numerical processing: Larch
uses NumPy to manage arrays, and invokes highly optimized linear algebra subroutines to compute certain
matrix transformations. There is a loss in model flexibility (non-linear-in-parameters models are not supported
in Larch) but a potentially significant gain in speed.
Did this software used to be called ELM? Why the name change? Yes, earlier versions of this software were
called ELM. But it turns out ELM is a pretty common name for software. Larch is not unique, but much
less commonly used. Also, it is the tradition of Python software to use names from Monty Python, particularly
when you want to be able to identify your software from quite a long way away.
Why is the Windows wheel download so much larger than the Mac one? The Windows wheel includes the OpenBLAS
library for linear algebra computations. The Mac version does not need an extra library because Mac OS X
includes vector math libraries by default in the Accelerate framework.
It is not working. Can you troubleshoot for me? Are you using the 64 bit (amd64) version of Python? Larch is
only compiled for 64 bit at present.
For some unknown reason, certain mathematical tools are not available on PyPI as wheels for Windows. One
way to get them is to download numpy, scipy, and pandas and install them manually. But a better option if you
can spare a few extra MB of installation disk space is to install the 64 bit Anaconda version of Python 3.5. This
comes with a nice stack of other helpful statistical tools as well, and you’ll probably not need any other libraries.
Once you’ve installed Anaconda, you can install Larch by typing “pip install larch” on the command line.
Why is Larch not recognizing my input file? If you are using windows, you might have a filename that looks like
this:
C:\path\to\my\file.csv
Unfortunately, in Python the single backslash is an “escape character” and is interpreted differently depending
on the next character. For example, you might get:
>>> filename = "C:\path\to\my\file.csv"
>>> print(filename)
C:\path o\my♀ile.csv
Obviously, that’s not what you want. Instead, you could use either of these for filenames:
>>> filename = "C:\\path\\to\\my\\file.csv"
>>> filename = "C:/path/to/my/file.csv"
If you are using Mac OS X or Linux, your file pathnames use forward slashes and there should be no problem.
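Two other ways to sidestep the escaping problem are raw string literals, which disable backslash escape processing entirely, and pathlib, which treats both slash styles as the same Windows path:

```python
from pathlib import PureWindowsPath

# A raw string leaves every backslash alone -- no stray tab or form feed.
filename = r"C:\path\to\my\file.csv"
assert "\t" not in filename and "\f" not in filename

# pathlib normalizes forward and backward slashes on Windows paths.
assert PureWindowsPath("C:/path/to/my/file.csv") == PureWindowsPath(filename)
```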