Using FME for SFTP

You can use the FME suite to automate downloading and ingesting Regrid data from the Regrid SFTP into the database of your choice.

Downloading a single file

In order to download data from Regrid’s SFTP source, open a new Workspace in FME and add an FTPCaller transformer:
FME.png

Now that you have the FTPCaller transformer in your workspace, click the gear icon on the transformer to set your preferences.

  • For the URL input enter the path to the file you want to download, for instance if we wanted to download Mono County, California’s parcels you would enter:
sftp://sftp.regrid.com/parcels/shapefile/ca_mono.shp.zip
  • For Transfer Type the main options we are interested in are Download to Attribute and Download to a File, since you should not be able to upload or delete files
    • Download to An Attribute allows you to use the downloaded data directly in your work flow
    • Download to a File will allow you to download the file to specific path and then get that download path in the _downloaded_file_path attribute
  • In the Authentication section change the Authentication Type to Password then fill in the Username and Password fields with the credentials supplied by Regrid

FME.png

In the above screenshot we’ve chosen to Download to a file saved on disk. Once that is in place, we can read the data in the file using the FeatureReader transformer.

FME.png Note: You can read some data directly from a zip file in FME without first decompressing the file if you know the name of the file inside of the archive.

FME.png

Downloading multiple files

It’s also possible to download multiple files in your Workspace. You can accomplish this by creating a CSV or other tabular file type and reading it.

FME.png Like in the above image we simply added the File Names and the URLs of the files we want to retrieve under simple headings and then saved the file as a CSV.

In FME, we add a CSV reader and point it at the CSV file.
FME.png
Next, connect the CSV Reader to the FTPCaller Transformer and in the FTPCaller Transformer set the URL property to the name of the column that contains the files you would like to download. In the Target File input, open a Text Editor, write your path and add @Value(file_name).
FME.png
Then, in the FeatureReader transformer, change the Dataset use the Text Editor to create a the dataset location and read the zipped file. In the Text Editor we used @Value(_downloaded_file_path)\@ReplaceString(@Value(file_name),.zip,,caseSensitive=TRUE) to read directly from the downloaded zipped files.
FME.png

Finally, if you use the Run To on the FTPCaller before you populate the information in your FeatureReader you can click on Output - Output Ports - Specified and select your downloaded datasets. Select all of the zipped files in that location and another small window will appear. In that window, place a check next to all of the ports you want to create.

FME.png

Then, from each port you can transform and load your data that best suits your needs.

Regrid offers a verse file that contains up-to-date metadata about the files offered. The verse file can be found on the sftp site in multiple formats in the parcels directory. We can use this file to get only the latest version of the data that we need.

FME.png

In this example we’re grabbing the verse.csv.zip file from the sftp site using the FTPCaller Transformer, and downloading it to a file. Then, we read the verse.csv in the FeatureReader Transformer.

If we only want data that has been updated since the last time we ran the workbench, we can use a CSV or text file to store that information and read it on subsequent runs. At the end of the process, we can update the CSV file with the current date. On the workbenches first run, we can create the CSV by opening a text editor like Notepad adding:

last_run  
20241101

And then saving it with File - SaveAs…, change the Save as type field to All files (*.*) and setting the File name to whatever you will remember with the .csv extension.

FME.png

This is accomplished using the pattern in the above image. We setup the Sampler with a Sample Rate (N) of 1, Sampling Type of Last N Features, and setting Randomize Sampling to No. Then, we use a DateTimeStamper to generate a new timestamp by setting the Type to Date and the Time Zone to Local. Finally, we write the output to the CSV we read at the beginning of the workbench the next time we run it.

FME.png

Back at the beginning of the workbench, we read the CSV with the date of the last time we ran the workbench and use an AttributeFilter transformer to make sure a date exists. Next, we use a FeatureMerger transformer to append the last_run date from our CSV to each row of the verse.csv file for comparison in our next step.

FME.png
Next, we use a TestFilter Transformer to compare dates and further filter the data in verse.csv to find any records where last_refresh from verse.csv is newer (>) than last_run from our CSV file. In the TestFilter transformer double-click on the first If row to create a new Test Condition. A new window will pop up and in the Left Value column set it to last_refresh. Set the Operator column to >. Set the Right Value to last_run. If you want to further limit your results you can use the Logic column to add an AND/OR to your statement. In the above example I further limited the output to only Alaska. Finally, name your Output Port.

Next, we add a second FTPCaller. In the URL field we open the Text Editor and add:

sftp://sftp.regrid.com/parcels/shapefile/@Value(filename_stem).shp.zip

FME.png

Using the filename_stem field from the verse.csv to create the file download name. This will loop through all of the filtered rows downloading each file to be used in the rest of your transformation.

FME.png

In this section