Parcel Data Onboarding and FAQ
Documentation and Frequently Asked Questions
Where does your county data come from?
We source our data directly from each county, or from whoever the county designates as the official source for its parcel data.
How do you standardize county data generally?
The main way we make county data easier to work with is by standardizing the column names of the raw data provided by each county. We do not standardize the values in most columns; mostly we keep those exactly as provided by the county. We do, however, convert every county in our system to a standard table schema, with consistent column names across the nationwide dataset. Please see the “What is the Regrid Parcel Schema?” question below.
In addition, we further standardize the parcel address fields using the US Postal Service database of addresses. For details on address specific standardization please see the "How do you standardize and normalize addresses?" question below.
How do you clean parcel geometries?
We rely on authoritative data from county and county-designated sources. We seek to minimize unwarranted changes that would inappropriately modify this source data. That said, we take the accuracy of the data seriously, including the accuracy of its spatial alignment. To that end, we use the processes below to ensure the highest level of spatial accuracy in our data.
- We perform a visual inspection to assess the completeness of the county.
- Recognizing that imagery can itself contain spatial inaccuracies, we do not manually align parcels to imagery.
- If large-scale shifts, skews, or rotations are discovered, we work with the source to correct them, or we seek an alternative source.
- We use standard geospatial functions to correct common errors in the individual parcel polygons and remove any polygons that cannot be made valid in that process.
- We also remove some slivers and overly large or detailed polygons, like road sets, water, or wetlands, that have proven to cause issues for many users.
How do you deliver bulk data?
All bulk data is provided via SFTP as zip files of each county in the format of your choice (GeoJSON, NDGeoJSON, SQL, Shapefile, FileGDB, GeoPackage, KML, CSV), using a pull model. We organize things on a county by county basis using the county’s FIPS code (geoid in our Regrid Parcel Schema column).
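As a sketch of the pull model, a county file can be addressed by its 5-digit FIPS code (2-digit state + 3-digit county). The host name and file layout below are placeholders for illustration, not the real server paths; use the paths from your own SFTP account.

```shell
#!/usr/bin/env bash
# Hypothetical example: build a county geoid and the matching zip file name.
state_fips="26"    # Michigan
county_fips="163"  # Wayne County
geoid="${state_fips}${county_fips}"
zipname="${geoid}.geojson.zip"
echo "$zipname"
# A pull might then look like (placeholder host and path):
#   sftp user@sftp.example.com:path/to/${zipname} .
```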
What is the Regrid Parcel Schema?
We standardize column names for easy access across counties in our nationwide dataset into our Regrid Parcel Schema for tables.
We are currently at version 6 of our Regrid Parcel Schema and it standardizes the column names for approximately 80 county provided data columns, and 25 Regrid provided data columns. This schema is applied to 100% of our dataset. A data dictionary is available: https://docs.google.com/spreadsheets/d/14RcBKyiEGa7q-SR0rFnDHVcovb9uegPJ3sfb3WlNPc0/edit#gid=1010834424
Why do your parcel value fields ('parval') not look the way I expect?
Our parcel value related fields all come directly from the county assessor's data. We populate our 'improvements value' (improvval), 'land value' (landval), 'parcel value' (parval), 'ag value' (agval) and 'parcel valuation method' (parvaltype) attributes as directly from the assessor attributes as possible. However, while those are the most common value related attributes, every county has its own definition for those attributes and its own methods for how it calculates, records, and displays amounts for tax purposes. We cannot answer questions about why the county records the values in those attributes. We suggest visiting the county's website or calling the assessor's office directly to better understand those values. If, after contacting the county, it appears we have an error in those attribute fields, please send an email to firstname.lastname@example.org and we will review our data asap.
Why do your parcel numbers (apn, pin, etc) not look the way I expect or are null?
County assessor parcel identification numbers ('parcelnumb') are well known for being complicated and often have variable punctuation or zeros (0) that can affect searching or matching by parcel id number. Counties do occasionally change their method for generating or assigning parcel numbers and that can lead to "new" and "old" parcel number situations. We always retain any identification number attributes as 'County Custom Columns' so it should be possible to match up our data with county data directly, even if our parcel number field is not the only identification number used by the county.
Also, sometimes a State GIS source will add their own unique id to a local parcel id number.
We suggest visiting the county's website or calling the assessor's office directly to better understand their parcel numbering system. If, after contacting the county it appears we have an error in what values we have in our parcel id attribute field, please send an email to email@example.com and we will review our data asap.
Null and duplicated parcel numbers are common across the US. Counties allow them for a variety of reasons, but the most common is that they do not bother with parcel numbers for non-taxable parcels like city parks, rights of way, etc. However, condos, timeshares, mineral or air rights, unassigned rights, etc., all might use duplicated or null parcel numbers, with the county using some other attribute, combination of attributes, or system for tracking them.
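When matching parcel numbers across sources, it can help to strip punctuation before comparing. This is a hypothetical matching helper, not how Regrid stores the data (the raw county values are always retained):

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove spaces, dots, and dashes so two renderings
# of the same APN compare equal. Leading zeros are kept, since stripping
# them can collide distinct parcel numbers.
norm_apn() { printf '%s' "$1" | tr -d ' .-'; }

a="$(norm_apn '12-34.567 890')"
b="$(norm_apn '1234567890')"
[ "$a" = "$b" ] && echo "match"
```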
Do you have a specific attribute for a specific county?
A current, detailed list of every county in our data set and what data fields we have for each is always available at the following URL. This spreadsheet can be downloaded as a CSV for closer analysis: https://docs.google.com/spreadsheets/d/1q0PZB72nO8935EMGmsh3864VjEAMUE-pdHcPkoAiS5c/
Why do my files have columns not in the Regrid Parcel Schema?
Every county has our schema columns, so any code or process can rely on those columns (also called attributes in GIS software) being present. However, most counties also provide attributes that do not map into one of our Regrid Parcel Schema columns, so we keep those attributes and include them on the end of our schema columns using whatever name is provided by the county. In many tools you can control what attributes are retained during import or merging, so we suggest folks working across multiple counties just keep the schema columns or a sub set of them and leave what we call "custom columns" off entirely to get a uniform set of columns. Going back and reviewing those custom columns can reveal interesting information provided by the counties, but not provided very often in other counties.
Nationwide dataset clients also have access to a 'schema-columns-only' directory by default, which will not have any of the county custom columns and will always contain just the columns listed in our current parcel schema. If you do not receive the full nationwide dataset, please just let us know and we can provide you a 'schema-columns-only' version of the counties you receive.
How can I explore the custom columns for each county?
We work with all of our counties in a PostgreSQL database, each county in its own table. That makes managing the custom columns from each county much easier. Most database servers provide a way to search the column names of the tables in a database. For example, in Postgres you would do it this way:
SELECT table_name, column_name FROM information_schema.columns WHERE table_schema = 'public' AND column_name ~ 'juri' ORDER BY table_name, column_name;
Also, directly browsing the data, place by place, for the areas or regions you are interested in can be very useful. DBeaver is a cross-platform, multi-vendor database client that can render geographic data.
Why do Shapefile attribute names not match the Regrid Parcel Schema column names?
Some of our Regrid Parcel Schema column names are longer than the Esri Shapefile format allows and the column names in your attribute table will be truncated to the first 10 characters of the Regrid Parcel Schema column names.
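For example, truncating a long schema column name to 10 characters shows what you would see in the attribute table. (Real Shapefile drivers may also add numeric suffixes to keep truncated names unique; this sketch shows only the truncation.)

```shell
#!/usr/bin/env bash
# Show the first 10 characters of a Regrid schema column name, as a
# Shapefile (DBF) attribute table would store it.
col="ll_bldg_footprint_sqft"
short="${col:0:10}"
echo "$short"
```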
When was your data last updated?
On average 94% of our parcels have been refreshed in the last 12 months, with most of those in the last 6 months. We work on a rolling update schedule, refreshing county data directly for 100 - 300 counties per month, usually grouped by state.
Monthly we share, both in machine readable format and via a monthly update email, what counties have been updated, and what states are in the pipeline for the upcoming quarter. An example of our Monthly Data Update Email is available.
Our USPS related attributes are updated monthly for our entire data set.
All data is tracked with the date of “last refresh” from the county.
A detailed listing of every county in our data set and the date we last refreshed directly from the county is always available. This spreadsheet can be downloaded as a CSV for closer analysis: https://docs.google.com/spreadsheets/d/1q0PZB72nO8935EMGmsh3864VjEAMUE-pdHcPkoAiS5c/
How do you provide data updates?
Twice a month we make available to every client in their download directories the counties we have refreshed with data directly from our county sources, usually 100 - 200 counties every two weeks.
Once a month, premium client download directories are updated with every county in the dataset, because every county's USPS related attributes have been updated for the month.
Quarterly, every data tier will have every county exported. We make improvements to the nationwide data set on a regular basis, often based on feedback from clients. Quarterly, full data set exports ensure every data tier has all of the improvements across the data set, even if the data was not refreshed directly from the county that quarter. We are working to provide an easily shareable changelog for data set wide fixes, but currently give a notice to significant updates in our monthly data update emails.
Each time we refresh a county from the county source, we replace the existing county zip file in a client's download directory with the refreshed county files. We do not provide 'diff' files or deltas for the parcel data at this time.
All bulk data is provided via SFTP as zip files of each county in the format of your choice (GeoJSON, NDGeoJSON, SQL, Shapefile, FileGDB, GeoPackage, KML, CSV, Parquet), using a pull model. We organize things on a county-by-county basis using the county's FIPS code (the geoid column in our Regrid Parcel Schema). We add or refresh 100 - 300 counties every month with data from the county source.
We send out an email each month listing the counties that were refreshed, as well as an updated listing of all the counties and their last_refresh date, which is the last date we updated the data directly from the source.
We also provide a CSV file of our verse table that lists the last_refresh date for each county.
On each parcel we provide an ll_uuid that permanently and uniquely identifies the parcel across data refreshes and updates. The ll_uuid can be used to match any locally stored parcels with updated parcels in the county file. ll_uuid is tied to the county-provided parcel numbers. Parcel revisions like splits and other modifications that result in new parcel numbers will get new ll_uuids. When the county retires a parcel number, we also retire our ll_uuid for that parcel.
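One way to act on this is to diff the ll_uuid sets between your local copy and a refreshed county file. The uuid values below are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compare locally stored ll_uuid values against a
# refreshed county export to find added and retired parcels.
printf '%s\n' uuid-a uuid-b uuid-c | sort > local_uuids.txt
printf '%s\n' uuid-b uuid-c uuid-d | sort > refreshed_uuids.txt

added="$(comm -13 local_uuids.txt refreshed_uuids.txt)"   # only in refreshed file
retired="$(comm -23 local_uuids.txt refreshed_uuids.txt)" # only in local copy
echo "added: $added"
echo "retired: $retired"
```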
We also make improvements to the data that do not come directly from the county, like standardizing addresses, cleaning out non-standard characters, adding additional data attributes, etc. We generally re-export our full dataset quarterly to reflect those changes. Improvements Regrid makes to the data do NOT affect the last_refresh of a county; that is always the date we last pulled data directly from the county source. Notices of full dataset exports are also included in the monthly email update sent to all clients.
How do I keep my data up-to-date?
We provide our verse tracking table as a CSV, updated monthly. For every county in the US it has the state, county name, state+county FIPS (which we call geoid), the date we last pulled directly from the source (the last_refresh column), and a filename_stem column that indicates the file name with no format extension or .zip, just the basename of the file. Note: a null in our verse table_name column means the county is not in our dataset.
Everyone's software stack and internal systems' environment are different, but generally we think the outline of steps is as follows:
- Pull a copy of our verse table from our SFTP server
- Use the last_refresh date in our verse table to determine which county or counties you need to update
- Use the filename_stem field in our verse table to pull the needed county or counties via SFTP from our server
- Once you have the file(s) locally, import them into a database (we use PostgreSQL/PostGIS, GDAL and ogr2ogr for this last step internally)
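The steps above can be sketched in shell. The sample verse rows and the last_pull date here are invented for illustration; the real verse CSV would be pulled from your SFTP directory first:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the update check. Column names (geoid, last_refresh,
# filename_stem) follow the verse table described above; the rows are made up.
cat > verse_sample.csv <<'EOF'
geoid,last_refresh,filename_stem
26163,2024-03-01,mi_wayne
01001,2024-01-15,al_autauga
EOF

last_pull="2024-02-01"  # date of your previous sync (assumption)

# Emit the filename_stem of each county refreshed since last_pull;
# ISO dates compare correctly as plain strings.
stems="$(awk -F, -v since="$last_pull" 'NR > 1 && $2 > since { print $3 }' verse_sample.csv)"
echo "$stems"
# Each stem would then be fetched as <filename_stem>.<format>.zip via SFTP.
```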
We do not track changes at any level below the county level, but with our ll_uuid it should be possible to determine whether you need to update a row in your database if you do not want to replace the whole county. ll_uuid should be used if you need a permanent, nationwide unique id for a parcel, and to match up with data you might attach to our parcel data in your usage.
If your workflow is easier with a set date every month to pull updates, we suggest the 7th of the month. Due to variability in some of our monthly processes, the exact date processing completes varies somewhat, but we always expect a month's processing to be fully completed and available in your download directories by the 7th day of the following month.
How do you standardize and normalize addresses?
The county-provided parcel address data is transformed to populate the typical address fields in use: address line 1, address line 2, city, state, and zip. Basic text cleaning is done at the same time to remove special or non-printing characters.
Where counties have not provided full addresses for parcels, we use US Census data to fill in missing zip codes and cities.
Parcel addresses are then passed through a USPS address verification system which further normalizes and standardizes the address to USPS guidelines where a match was made against the USPS database of addresses. Non matching addresses are left as they were sent by the county.
The original county-provided raw address data is always retained unaltered and included on the parcel record.
What software can I use to work with your data?
Editing or working with most of our data requires software for working with geographic and geospatial data. The OSGeo project provides free and open source desktop software for this kind of data, called QGIS.
There are other free and paid software options for working with geospatial data, but all of it has a learning curve. We are not able to provide support for those 3rd party software applications. For QGIS at least, the best options for learning are via the many community developed tutorial videos and texts.
We suggest starting with the GeoPackage (geoPKG, .gpkg) formatted files to use in QGIS or any geospatial software.
What about Google Earth?
We provide KML/KMZ options for Google Earth and Google Earth Pro, but neither of those applications support editing our data, only viewing the data. If you need to make changes to the data you get from us, you will need a desktop application like QGIS discussed above.
How large is the nationwide dataset?
The nationwide dataset is approximately 400-800 GB uncompressed, varying by file format, storage method, attribute tier, and other factors.
How do you load all of these files into a database?
We generally work in a GNU/Linux environment using the command line. Our internal workflow makes use of the OSGeo Foundation libraries and tools, including GDAL/OGR and PostGIS for PostgreSQL. The OSGeo project also provides an installer for MS Windows, named OSGeo4W, for using these tools on a Windows machine.
The ogr2ogr command line tool is the best way to import data into a PostgreSQL/PostGIS or MS SQL Server database.
Below is a typical command line to cycle through a set of FileGDB files and append them to a table in PostgreSQL:
for gdb in /path_to_files/*.gdb ; do echo "$gdb" ; ogr2ogr -f "PostgreSQL" PG:"dbname=dbname" "$gdb" -progress --config PG_USE_COPY YES -nln public_schema.target_table_name -append ; done
Dealing with custom columns at scale is best accomplished by ignoring them during import. One approach is to use the -sql option with ogr2ogr to restrict the columns you load to all of, or a subset of, the columns in our Regrid Parcel Schema. We use the filename_stem value from our verse table to assign layer names in formats that support layer names. In the example below, <filename_stem> is a placeholder for that value:
ogr2ogr -f 'PostgreSQL' PG:'dbname=dbname' st_county.gpkg -sql 'select wkb_geometry,geoid,etc from <filename_stem>' -nln st_county_imported
Using ogr2ogr to load parcel data into MS SQL Server works the same way. Parcel data should use the geometry data type in MS SQL Server. A good example of how to do that is this blog post by Alastair Aitchison, which also covers installing the OSGeo4W environment.
An example osgeo4w shell command to load a folder full of geopackages into MS SQL Server looks like this:
for /R %G in (*.gpkg) do ogr2ogr -progress -f "MSSQLSpatial" "MSSQL:server=localhost;database=alabama;trusted_connection=yes" %G -nln "%~nG"
The main items among the command line options are the database connection options. You will have to make sure the user name and password are available, and that the client can actually connect to the database and has all the needed permissions. For PostgreSQL on GNU/Linux, the standard PG_* environment variables and the .pgpass file for storing credentials work with the ogr2ogr commands, so credentials do not have to be included on the command line.
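As one example of keeping credentials off the command line, the standard PostgreSQL environment variables can be set before running ogr2ogr. The values below are placeholders; adjust them to your environment:

```shell
# Illustrative connection setup; hostname, database, and user are placeholders.
export PGHOST=localhost
export PGPORT=5432
export PGDATABASE=parcels
export PGUSER=loader
# The password comes from ~/.pgpass (one line per entry, in the format
# host:port:database:user:password), which must be chmod 600 to be honored
# by libpq-based tools such as psql and ogr2ogr.
```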
Why do some Shapefiles say '2GB_WARN' in the file name?
The shapefile format itself has a 'soft limit' of 2 gigabytes (GB) of data. 'Soft limit' means it is just a rule of the format: "no data larger than 2GB". There is no technical limit preventing more data being encoded as a shapefile. When we export large counties, our tools inform us the resulting shapefile is over that soft limit.
We can confirm that some software handles 2GB and larger shapefiles just fine (OSGeo tools like QGIS), but some software will just silently ignore attribute data above the 2GB limit (ArcGIS). A sincere thank you to the Regrid client who did a lot of in depth research and testing on this and shared their results with all of us.
Starting in July 2020, we made the following changes to help flag the counties where the data exceeds the 2GB soft limit. Please double check how you are handling these files.
- The filenames themselves indicate that the county generated the 2GB warning on export. A _2GB_WARN suffix is added to the file names so you can tell just by checking the name.
- We added a shapefile_size_flag column so you can check whether a place needs a different format than Shapefile, or generate a list of places you need to pull in the alternate format.
- Shapefiles that are larger than 2GB are not available through our data store.
We recommend the GeoPackage (GeoPKG) exports for situations where you would normally use a Shapefile. These can be opened by both Open Source and ArcGIS tools. ArcGIS uses ESRI’s "Data Interoperability" extension to work with GeoPackages.
Why do my files not have the ogc_fid column listed in your parcel record schema?
ogc_fid (Open Geospatial Consortium Feature ID) is a limited, table-by-table unique identifier in our data and is either not used or not supported by several geospatial file formats, so it is not included in those files. Specifically, the following file formats will not include an 'ogc_fid' column: GeoJSON, KML, SHP, CSV.
Typically when importing data into a database, a unique record/row/feature id will be automatically created if needed.
What about multi unit parcels or parcels with secondary addresses?
Our parcel data packages do not contain the list of every secondary unit on a parcel. If you need the full secondary address list per parcel, please contact us at firstname.lastname@example.org as we do have a separate nationwide, complete Matched Secondary Addresses product available.
Our parcel data attribute packages have the main, primary physical address for a parcel and the mailing address of record for the parcel. These do vary somewhat based on what the county provides, so please review our "Attribute Completeness Report".
What about condominium or cooperative buildings or parcels?
Our data comes directly from county GIS and county assessor's offices around the US, and as such there can be some disparity from county to county in how condominium and cooperative buildings/parcels are handled. In many local jurisdictions condominiums and cooperative buildings are assessed and taxed individually by each owner, and this is reflected in the related GIS data; however, this is not always the case. In some instances, local authorities will assess and tax condominiums and co-ops collectively, linked to a single GIS parcel record. It is also possible that while these condominiums and co-ops are assessed and taxed independently on the assessor's tax roll, these unique assessment records may not be reflected as a GIS parcel, or they may be reflected as a non-unique parcel geometry (duplicate/stack). Regrid does make an effort to add condo/co-op records from tax and assessor rolls where they are missing from the GIS, when possible.
Why are some parcels duplicated or stacked parcels?
Our data is directly from county GIS and county assessor's offices around the US. They are primarily focused on collecting taxes so recording ownership and mailing addresses is their main goal. They often use their software in creative ways to get things recorded so they can track the taxes. For parcels with multiple, individual owners the counties often 'stack' parcels so they can enter different owner and contact information for each owner. The 3 most common ways are:
- Identical polygons stacked: parcels exactly the same size, just 4 or 10 or 100 of them stacked exactly on top of each other, with different attributes for the different owners. By far this is the most common way and is widespread around the US.
- Puzzle pieces: ground parcels with exact cutouts for the footprint of the building. This is common for downtown buildings. They have no intentional overlap.
- Condo parcels laid on top of a ground parcel: a polygon the exact size of the building, but instead of being 'cut out' of the ground parcel, it just lays on top of it. Like the stacked solution, but only 2 layers: one big ground parcel, with smaller parcels stacked on the ground parcel, spread out.
In these cases, our dataset usually does contain all of the addresses associated with the parcel, as each owner's 'parcel' usually has what is considered a primary address.
The vast majority of the counties create unique parcel numbers for each stacked parcel. The unique parcel number is a benefit of the stacked parcel approach. However, some counties will duplicate the parcel number and use a secondary id field for the sub-parcels. We think this is much more rare, and we would retain any secondary parcel numbers as a custom column in our data.
Why do some counties have partial data?
Some counties in our dataset exclude parcels for a variety of reasons. The most common reason is non-taxable state, federal and/or tribal lands. In other places, parcel shapes may not be digitized, or the place may not distribute them. When we know a county excludes some parcels intentionally, we indicate that in the verse table in the partial column with the value "partial". No value (null) indicates the county should have all the parcels in the county.
What are your premium schema building footprints attributes?
Our premium schema building footprints attributes include ll_bldg_footprint_sqft (total building footprint in square feet) and ll_bldg_count (total number of buildings on the parcel), which are calculated by Regrid using the nationwide building footprints data provided by EarthDefine.
Please note that these fields do not provide the actual building footprint geometries. Our Premium Parcel Data + Matched Building Footprints dataset provides the full spatial dataset of nationwide building footprints joined with our parcel data. See below to learn more about that dataset.
How are these attributes different from the ones included in the matched building footprints schema?
Our Premium Parcel Data + Matched Building Footprints dataset combines 156+ million nationwide building footprints geometries with our 152+ million nationwide parcel data as one solution. The matched buildings schema comes with the building geometries along with relevant buildings data attributes such as building uuid, structure uuid, and much more. See our Matched Buildings Schema. We provide this data with a join table - every building and structure in the buildings dataset is matched with our parcel uuid. If you are interested in learning more about this combined parcel data solution, please contact us at email@example.com.
How do you determine the Regrid Building Count value (ll_bldg_count)?
We work with EarthDefine, an industry-leading Machine Learning and AI firm that specializes in feature extraction from high-resolution imagery and point clouds, to generate a comprehensive, nationwide, seamless building footprint data set across the US.
We then process that footprint data set with our parcel shapes data set to determine how many buildings are on the parcel and how many square feet of parcel are covered by buildings.
How do you calculate the parcel acres, parcel square feet, and building footprint square feet values?
We project each parcel and building footprint into its UTM zone SRS, calculate the area in meters, and convert that to acres and square feet. This should provide a relatively uniform and consistent value across the US. The ll_bldg_footprint_sqft value for a building that covers more than one parcel includes only the portion of the building that is on that parcel. For example, a 10,000 square foot building footprint might have only 500 square feet on a given parcel, so that parcel's ll_bldg_footprint_sqft value would be 500.
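The unit conversions involved, once the UTM-projected area is computed in square meters, are fixed constants (1 m² = 10.7639104 sq ft; 1 acre = 4,046.8564224 m²). This sketch just applies them; it is not Regrid's internal pipeline:

```shell
#!/usr/bin/env bash
# Convert a projected area in square meters to square feet and acres.
# 4046.8564224 m² is exactly one acre, which is exactly 43,560 sq ft.
area_m2="4046.8564224"
sqft="$(awk -v m2="$area_m2" 'BEGIN { printf "%.2f", m2 * 10.7639104 }')"
acres="$(awk -v m2="$area_m2" 'BEGIN { printf "%.4f", m2 / 4046.8564224 }')"
echo "$sqft sq ft, $acres acres"
```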
What is an API and why might I use one?
API stands for Application Programming Interface. APIs are for use by software developers to interact directly with our national dataset, usually to automate some process that works with our national parcel dataset. All APIs require programming work to make them useful for any specific application.
What APIs does Regrid make available?
We provide two different APIs for working with our data. Please remember, all APIs are intended for use by software developers and are not for end users’ direct use.
Tile Map Service (TMS) Layer - This is an interactive, vector and raster rendering of our dataset, for use in web or desktop applications where using the parcel shapes overlaid on other layers is useful. Regrid’s TMS layer is in Mapbox Vector Tile (.mvt) format. https://docs.mapbox.com/vector-tiles/reference/
RESTful Parcel API - This is a typical stateless, client/server API that supports retrieving parcel shapes and metadata using lat/lon coordinates, among other options. Please see Using The API for a complete reference to the API.
How do I download Regrid Parcel data?
We use the "Secure File Transfer Protocol", also called SFTP. This is supported in most traditional FTP clients and SSH client software.
Both of the clients listed below are multi protocol and also support connecting to services like S3.
MacOS users can use CyberDuck
Regrid Standardized Land Use Codes - Classification
Please visit our Regrid Standardized Land Use Codes specific documentation.
In this section
- Documentation and Frequently Asked Questions
- Where does your county data come from?
- How do you standardize county data generally?
- How do you clean parcel geometries?
- How do you deliver bulk data?
- What is the Regrid Parcel Schema?
- Why do your parcel value fields ('parval') not look the way I expect?
- Why do your parcel numbers (apn, pin, etc) not look the way I expect or are null?
- Do you have a specific attribute for a specific county?
- Why do my files have columns not in the Regrid Parcel Schema?
- How can I explore the custom columns for each county?
- Why do Shapefile attribute names not match the Regrid Parcel Schema column names?
- When was your data last updated?
- How do you provide data updates?
- How do I keep my data up-to-date?
- How do you standardize and normalize addresses?
- What software can I use to work with your data?
- What about Google Earth?
- How large is the nationwide dataset?
- How do you load all of these files into a database?
- Why do some Shapefiles say '2GB_WARN' in the file name?
- Why do my files not have the ogc_fid column listed in your parcel record schema?
- What about multi unit parcels or parcels with secondary addresses?
- What about condominium or cooperative buildings or parcels?
- Why are some parcels duplicated or stacked parcels?
- Why do some counties have partial data?
- Building Data
- What are your premium schema building footprints attributes?
- How are these attributes different from the ones included in the matched building footprints schema?
- How do you determine the Regrid Building Count value (ll_bldg_count)?
- How do you calculate the parcel acres, parcel square feet, and building footprint square feet values?
- API Access
- Regrid Standardized Land Use Codes - Classification