Module conventions


Utilities for working with IPUMS conventions and metadata structure.

This module provides structs and methods for loading metadata and for storing information about an IPUMS data collection, based on IPUMS conventions and minimal configuration. Every collection has a set of data record types and a hierarchy to which those record types belong. For instance, person records belong to household records: each household record owns zero or more person records.
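To make the record hierarchy concrete, here is a minimal sketch. It assumes each record type names at most one parent type and that the hierarchy is a tree with no cycles. The names `RecordHierarchy`, `Household`, `Person`, and the record type codes "H" and "P" are hypothetical illustrations, not this module's API.

```rust
use std::collections::HashMap;

/// Hypothetical record-type hierarchy: each record type maps to its parent
/// type, or `None` for the top-level type (e.g. household).
/// Assumes the hierarchy is a tree, so walking parents always terminates.
struct RecordHierarchy {
    parent_of: HashMap<String, Option<String>>,
}

impl RecordHierarchy {
    fn new() -> Self {
        RecordHierarchy { parent_of: HashMap::new() }
    }

    /// Registers `record_type` with an optional parent record type.
    fn add(&mut self, record_type: &str, parent: Option<&str>) {
        self.parent_of
            .insert(record_type.to_string(), parent.map(str::to_string));
    }

    /// Returns the chain from `record_type` up to the root, e.g. ["P", "H"].
    fn ancestry(&self, record_type: &str) -> Vec<String> {
        let mut chain = Vec::new();
        let mut current = Some(record_type.to_string());
        while let Some(rt) = current {
            // Look up the parent before moving `rt` into the chain.
            current = self.parent_of.get(&rt).cloned().flatten();
            chain.push(rt);
        }
        chain
    }
}

/// Hypothetical ownership model: a household owns zero or more persons.
struct Household {
    serial: u64,
    persons: Vec<Person>,
}

/// Hypothetical person record identified by its number within the household.
struct Person {
    pernum: u32,
}
```

Registering "H" with no parent and "P" with parent "H" makes `ancestry("P")` return `["P", "H"]`. Storing persons in a `Vec` inside the household expresses "zero or more" directly.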

Initializing a MicroDataCollection relies heavily on IPUMS directory and naming conventions, including when loading the collection's IPUMS metadata.

The Context struct is the entry point for setting up a MicroDataCollection object. It either determines a “data root” on its own or uses one provided by the caller, then locates the available data and metadata and loads the metadata if requested.
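The "use the provided data root, otherwise work one out" behavior can be sketched as follows. How the real Context derives a default root is not described here, so this sketch simply assumes a hypothetical base directory joined with the product name. `SketchContext` and its `new` signature are illustrative only and are not the crate's actual Context API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for Context that models only the data root.
struct SketchContext {
    product: String,
    data_root: PathBuf,
}

impl SketchContext {
    /// Uses the caller's data root when given; otherwise falls back to a
    /// hypothetical default of `default_base/<product>`.
    fn new(product: &str, explicit_root: Option<&Path>, default_base: &Path) -> Self {
        let data_root = match explicit_root {
            Some(root) => root.to_path_buf(),
            None => default_base.join(product),
        };
        SketchContext { product: product.to_string(), data_root }
    }
}
```

Making the explicit root an `Option` keeps the common case (conventional location) free of configuration while still letting tests point at a fixture directory.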

Other operations in this library require a Context object to find data and use metadata.

Metadata for IPUMS data follows naming and organizational conventions, and following them lets us skip a lot of repetitive configuration. Under each “data root” directory, a “current” directory holds the compressed fixed-width data, and a “parquet” directory inside “current” holds the Parquet version of the same data. A “layouts” directory, also under “current”, contains two layout files per dataset: one describing the input layout and the labels for those inputs, and one describing the IPUMS version of the data, with variable names, record types, data types, and the width in printable characters designated for each variable. This layout information can also serve as basic metadata for uses other than parsing the fixed-width data. The Parquet data does not currently carry variable-level metadata on its columns, so we rely on the layout metadata instead. Eventually we plan to store variable metadata such as formatting directives, codes, and labels in the Parquet files themselves.
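The directory convention above maps onto path construction roughly like this. This is a minimal sketch: `CollectionDirs`, `collection_dirs`, and `is_layout_file` are hypothetical names, and recognizing layout files by a `.layout.txt` suffix is an assumption based on the test fixtures mentioned below, not a documented rule.

```rust
use std::path::{Path, PathBuf};

/// Directories implied by the convention:
/// <data root>/current, <data root>/current/parquet, <data root>/current/layouts
struct CollectionDirs {
    current: PathBuf,
    parquet: PathBuf,
    layouts: PathBuf,
}

/// Derives all conventional directories from a single data root.
fn collection_dirs(data_root: &Path) -> CollectionDirs {
    let current = data_root.join("current");
    CollectionDirs {
        parquet: current.join("parquet"),
        layouts: current.join("layouts"),
        // `current` is moved last, after the joins above have borrowed it.
        current,
    }
}

/// Assumption: layout files are recognized by a `.layout.txt` suffix.
fn is_layout_file(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.ends_with(".layout.txt"))
}
```

Because every path is derived from the data root, the only configuration a caller needs to supply is that root (or nothing, if it can be worked out).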

See the .layout.txt files in the tests directory.

Structs

Context
Holds loaded metadata and information for finding data and additional metadata.
DatasetsForVariable
MetadataEntities
MicroDataCollection
Key characteristics of data collections
VariablesForDataset
Points into a master Vec of Variables indexed by IpumsVariableId.
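The "index into a master Vec" pattern described for VariablesForDataset can be sketched as below. The field names, the `usize` alias for IpumsVariableId, and the `variables_in` helper are assumptions for illustration; the crate's actual definitions may differ.

```rust
use std::collections::HashMap;

/// Assumed here to be a plain index into the master Vec.
type IpumsVariableId = usize;

/// Minimal hypothetical variable record.
struct Variable {
    name: String,
}

/// Hypothetical per-dataset lists of ids pointing into the master Vec.
struct SketchVariablesForDataset {
    by_dataset: HashMap<String, Vec<IpumsVariableId>>,
}

/// Owns every Variable exactly once; other structures refer to them by id.
struct SketchMetadata {
    all_variables: Vec<Variable>,
    variables_for_dataset: SketchVariablesForDataset,
}

impl SketchMetadata {
    /// Resolves a dataset's ids to references into the master Vec.
    /// Unknown datasets yield an empty list.
    fn variables_in(&self, dataset: &str) -> Vec<&Variable> {
        self.variables_for_dataset
            .by_dataset
            .get(dataset)
            .map(|ids| ids.iter().map(|&id| &self.all_variables[id]).collect())
            .unwrap_or_default()
    }
}
```

Storing ids rather than copies keeps a single owner for each Variable and avoids lifetime-laden references between structs; the trade-off is that a stale id would panic on lookup.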