Parquet Format

Introduction

Parquet is a column-oriented file format that is often used in data pipelines. Part of the Apache Hadoop ecosystem, it is designed to be compact and efficient for large-scale data analysis.

Regrid Parquet Files

  • Our Parquet files are generated with PyArrow using zstd compression.
  • In addition to our schema columns, our Parquet files include one additional column: custom_column_json. This column contains any extra data columns the county provides to us, packaged in a JSON object with the custom column names as keys. Because each county sends different extra columns, the JSON object has no fixed schema and varies from county to county.
  • Geometries are delivered in Well-known Text (WKT) format in a column called wkt.
  • Nationwide Premium tier clients will find the 'parquet' directory inside their downloads directory.
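
The points above can be sketched in Python. This is a minimal illustration, not an official loader: it assumes custom_column_json is stored as a JSON string (or null when a county sends no extra columns), and the helper names, the sample row, and its keys (zoning_note, lot_id) are hypothetical. Only the wkt and custom_column_json column names come from the description above.

```python
import json


def expand_custom_columns(row, json_key="custom_column_json"):
    """Return a copy of `row` with the JSON string in `json_key` parsed into a dict.

    Assumption: the column holds a JSON object serialized as a string.
    Missing, null, or empty values become an empty dict, since the extra
    columns vary by county and may be absent.
    """
    out = dict(row)
    raw = out.get(json_key)
    out[json_key] = json.loads(raw) if raw else {}
    return out


def read_parcels(path, columns=None):
    """Read a Parquet file into a list of row dicts (requires pyarrow)."""
    # Imported lazily so the helper above can be used without pyarrow installed.
    import pyarrow.parquet as pq

    # zstd-compressed files are decompressed transparently on read.
    table = pq.read_table(path, columns=columns)
    return [expand_custom_columns(r) for r in table.to_pylist()]


# Hypothetical row, shaped like one entry of Table.to_pylist().
sample = {
    "wkt": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))",
    "custom_column_json": '{"zoning_note": "R1", "lot_id": 42}',
}
parsed = expand_custom_columns(sample)
print(parsed["custom_column_json"]["zoning_note"])
```

Because the JSON object has no fixed schema, it is safer to look up custom keys with dict.get than to assume a key exists for every county. The wkt string can be handed to any WKT parser, for example shapely.wkt.loads if Shapely is available.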

Additional Information

The following links are also recommended for more introductory information on the parquet file format:

If you have any questions about our parquet files, please contact tech@regrid.com.
