Using FME for SFTP
You can use the FME suite to automate downloading and ingesting Regrid data from the Regrid SFTP into the database of your choice.
Downloading a single file
In order to download data from Regrid’s SFTP source, open a new Workspace in FME and add an FTPCaller transformer:
Now that you have the FTPCaller transformer in your workspace, click the gear icon on the transformer to set your preferences.
- For the URL input enter the path to the file you want to download, for instance if we wanted to download Mono County, California’s parcels you would enter:
sftp://sftp.regrid.com/parcels/shapefile/ca_mono.shp.zip
- For Transfer Type the main options we are interested in are Download to Attribute and Download to a File, since you should not be able to upload or delete files
- Download to An Attribute allows you to use the downloaded data directly in your work flow
- Download to a File will allow you to download the file to specific path and then get that download path in the
_downloaded_file_path
attribute
- In the Authentication section change the Authentication Type to Password then fill in the Username and Password fields with the credentials supplied by Regrid
In the above screenshot we’ve chosen to Download to a file saved on disk. Once that is in place, we can read the data in the file using the FeatureReader transformer.
Note: You can read some data directly from a zip file in FME without first decompressing the file if you know the name of the file inside of the archive.
Downloading multiple files
It’s also possible to download multiple files in your Workspace. You can accomplish this by creating a CSV or other tabular file type and reading it.
Like in the above image we simply added the File Names and the URLs of the files we want to retrieve under simple headings and then saved the file as a CSV.
In FME, we add a CSV reader and point it at the CSV file.
Next, connect the CSV Reader to the FTPCaller Transformer and in the FTPCaller Transformer set the URL property to the name of the column that contains the files you would like to download. In the Target File input, open a Text Editor, write your path and add @Value(file_name)
.
Then, in the FeatureReader transformer, change the Dataset use the Text Editor to create a the dataset location and read the zipped file. In the Text Editor we used @Value(_downloaded_file_path)\@ReplaceString(@Value(file_name),.zip,,caseSensitive=TRUE)
to read directly from the downloaded zipped files.
Finally, if you use the Run To on the FTPCaller before you populate the information in your FeatureReader you can click on Output - Output Ports - Specified and select your downloaded datasets. Select all of the zipped files in that location and another small window will appear. In that window, place a check next to all of the ports you want to create.
Then, from each port you can transform and load your data that best suits your needs.
Using the Verse file to automatically import updated files (recommended)
Regrid offers a verse file that contains up-to-date metadata about the files offered. The verse file can be found on the sftp site in multiple formats in the parcels directory. We can use this file to get only the latest version of the data that we need.
In this example we’re grabbing the verse.csv.zip file from the sftp site using the FTPCaller Transformer, and downloading it to a file. Then, we read the verse.csv in the FeatureReader Transformer.
If we only want data that has been updated since the last time we ran the workbench, we can use a CSV or text file to store that information and read it on subsequent runs. At the end of the process, we can update the CSV file with the current date. On the workbenches first run, we can create the CSV by opening a text editor like Notepad adding:
last_run
20241101
And then saving it with File - SaveAs…, change the Save as type field to All files (*.*
) and setting the File name to whatever you will remember with the .csv extension.
This is accomplished using the pattern in the above image. We setup the Sampler with a Sample Rate (N) of 1, Sampling Type of Last N Features, and setting Randomize Sampling to No. Then, we use a DateTimeStamper to generate a new timestamp by setting the Type to Date and the Time Zone to Local. Finally, we write the output to the CSV we read at the beginning of the workbench the next time we run it.
Back at the beginning of the workbench, we read the CSV with the date of the last time we ran the workbench and use an AttributeFilter transformer to make sure a date exists. Next, we use a FeatureMerger transformer to append the last_run
date from our CSV to each row of the verse.csv file for comparison in our next step.
Next, we use a TestFilter Transformer to compare dates and further filter the data in verse.csv to find any records where last_refresh from verse.csv is newer (>
) than last_run
from our CSV file. In the TestFilter transformer double-click on the first If row to create a new Test Condition. A new window will pop up and in the Left Value column set it to last_refresh
. Set the Operator column to >
. Set the Right Value to last_run
. If you want to further limit your results you can use the Logic column to add an AND/OR to your statement. In the above example I further limited the output to only Alaska. Finally, name your Output Port.
Next, we add a second FTPCaller. In the URL field we open the Text Editor and add:
sftp://sftp.regrid.com/parcels/shapefile/@Value(filename_stem).shp.zip
Using the filename_stem
field from the verse.csv to create the file download name. This will loop through all of the filtered rows downloading each file to be used in the rest of your transformation.