08 January, 2013

Copy Files From an FTP Server Pt. 1

Recently I was faced with a dilemma. We are using a 3rd party application for taking data from a file, scrubbing it, quality checking it and finally pushing it into a database for use by our main product. This data comes from different sources and is dropped onto an FTP server. The 3rd party application could technically handle that, as it can function “on file drop”. There are, however, two limitations to the 3rd party app’s functionality.

  • It does not recognize that a file has changed. Only that a new file has arrived.
  • The 3rd party app runs on a server that is on a different network than the FTP server is on.

With these issues in mind, my task was to make it work, and make it work on file drop.

The solution I arrived at was to create a console application that polls the FTP server and, if a file drop has occurred, copies that file to a location where the 3rd party service can see it. I should note that there are some 500 clients, each with their own file drop. Some clients have been migrated to the new 3rd party application, while others are still on the old and rather inefficient way of importing this data.

First, I need some static fields. Note the RE_GEX constant: it lets me parse the string returned by the FTP server's directory listing call. I found this expression through a Bing search and no longer remember the original source, but it does work.

static string RootFtpUri;
static string FtpDestinationRoot;
const string RE_GEX = @"^([d-])([rwxt-]{3}){3}\s+\d{1,}\s+.*?(\d{1,})\s+(\w+\s+\d{1,2}\s+(?:\d{4})?)(\d{1,2}:\d{2})?\s+(.+?)\s?$";
static Regex _regex;
static NetworkCredential Credentials;
static string LogFilePath;



I set up the fields:




FtpDestinationRoot = ConfigurationManager.AppSettings["FtpDestinationRoot"];
var di = new DirectoryInfo(FtpDestinationRoot);
if (!di.Exists) di.Create();
RootFtpUri = ConfigurationManager.AppSettings["RootFtpUri"];
Credentials = new NetworkCredential(ConfigurationManager.AppSettings["FtpUsername"], ConfigurationManager.AppSettings["FtpPass"]);
_regex = new Regex(RE_GEX, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);



Of course, I use the AppSettings from the app.config file to hold all the pertinent information in case something changes, like the FTP server name or the login credentials required. This allows me to make those changes without having to re-build and re-deploy the application.

The data for each dealer (their FTP username, drop directory and the name of the dropped file) is stored in an MSSQL database. My application then polls the database to see which files I need to copy over to the new location. I have an FTPRecord class which holds the required data from the SQL database, and I put those records into a list.

var records = (from d in dt.AsEnumerable()
               select new FTPRecord
               {
                   FTPServerAddress = d.Field<string>("FTPServerAddress"),
                   FTPServerUsername = d.Field<string>("FTPServerUsername"),
                   FTPServerPassword = d.Field<string>("FTPServerPassword"),
                   FTPServerFilePath = ParseFTPServerFilePath(d.Field<string>("FTPServerFilePath")),
                   DMSTypeID = d.Field<Int16>("DMSTypeID"),
                   Status = d.Field<string>("Status"),
                   RecordType = d.Field<string>("RecordType"),
                   NextScheduledAttempt = d.Field<DateTime?>("NextScheduledAttempt")
               }).ToList();



The call to the method “ParseFTPServerFilePath” merely strips away any leading slashes that sometimes get put in when the data is created.
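The body of that method is not shown in this post, so the following is only a minimal sketch of what such a helper might look like. The class name FtpPathHelper and the choice to strip both forward slashes and backslashes are my assumptions.

```csharp
using System;

static class FtpPathHelper
{
    // Hypothetical stand-in for the post's ParseFTPServerFilePath helper.
    // Removes any leading '/' or '\' characters and leaves the rest untouched.
    public static string ParseFTPServerFilePath(string rawPath)
    {
        if (string.IsNullOrEmpty(rawPath)) return string.Empty;
        return rawPath.TrimStart('/', '\\');
    }

    static void Main()
    {
        Console.WriteLine(ParseFTPServerFilePath("//inbound/INV.CSV"));
    }
}
```

Stripping the leading slashes matters because the path is later joined onto the root URI and the destination root with a separator already in place.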



Then it goes to the FTP server and looks for the file. If the file exists, I copy it over. But wait, remember how the 3rd party app only sees new files and not changed files? Simply overwriting the file would not trigger the import, so I first delete the existing file in the new location before copying the new file from the FTP server.
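The copy step itself is not shown in this part, so here is a rough sketch of the delete-then-copy idea under my own assumptions. FileReplacer and ReplaceFile are hypothetical names, and the source stream is a MemoryStream standing in for whatever stream the FTP download would provide.

```csharp
using System;
using System.IO;

static class FileReplacer
{
    // Deletes any existing file at destinationPath, then writes the source
    // stream to a brand new file, so a watcher that only reacts to new
    // files sees the drop.
    public static void ReplaceFile(Stream source, string destinationPath)
    {
        var directory = Path.GetDirectoryName(destinationPath);
        if (!string.IsNullOrEmpty(directory)) Directory.CreateDirectory(directory);

        if (File.Exists(destinationPath)) File.Delete(destinationPath);

        // CreateNew throws if the file still exists, which guards against overwriting.
        using (var target = new FileStream(destinationPath, FileMode.CreateNew, FileAccess.Write))
        {
            source.CopyTo(target);
        }
    }

    static void Main()
    {
        // Assumed demo location under the temp directory.
        var path = Path.Combine(Path.GetTempPath(), "ftp-copy-demo", "INV.CSV");
        using (var data = new MemoryStream(new byte[] { 1, 2, 3 }))
        {
            ReplaceFile(data, path);
        }
        Console.WriteLine(new FileInfo(path).Length);
    }
}
```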



We have to set up the required stuff for accessing the FTP server, as well as where we are going to put the file. If the destination directory does not exist, we of course have to create it or the whole thing will blow up (and not in a spectacular way), so it’s really just a waste of time and effort not to do it.




var destinationFilePath = string.Format("{0}\\{1}\\{2}", FtpDestinationRoot, r.FTPServerUsername, r.FTPServerFilePath.Replace('/', '\\'));
var fi = new FileInfo(destinationFilePath);
var di = new DirectoryInfo(fi.DirectoryName);
if (!di.Exists) di.Create();
var ftpFilePath = string.Format("{0}/{1}/{2}", RootFtpUri, r.FTPServerUsername, r.FTPServerFilePath);
var request = (FtpWebRequest)WebRequest.Create(ftpFilePath);
request.Credentials = Credentials;
request.Method = WebRequestMethods.Ftp.ListDirectoryDetails;
var line = string.Empty;



Then connect and grab the directory listing from the FTP server. This comes back in the standard UNIX format which looks something like:



-rw-r--r-- 1 ftp ftp        2107325 Jan 08 00:24 INV.CSV



The RegEx takes each line and chunks it into groups. I check to see if the first group is a “d”, which signifies a directory; if so, I bypass it and go to the next line. I also check to see if the 6th group is a dot or double dot (“.” or “..”). These are not real files, so I skip those lines as well. Now, if the first group is not “d” and the 6th group is not “.” or “..”, that means I have a file! W00t!



The RegEx spits out groups in the following format:



comp.Groups[0] {-rw-r--r-- 1 ftp ftp        2107325 Jan 08 00:24 INV.CSV}
comp.Groups[1] {-}
comp.Groups[2] {r--}
comp.Groups[3] {2107325}
comp.Groups[4] {Jan 08 }
comp.Groups[5] {00:24}
comp.Groups[6] {INV.CSV}



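Group 0 is the actual directory entry string, group 1 is the entry type (“d” for a directory, “-” for a file), group 2 is the last of the three permission triplets (a repeated group only keeps its final capture), group 3 is the size, 4 & 5 are the date and time, and 6 is the file name.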
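To make the grouping concrete, here is a small standalone sketch that runs the same pattern against the sample line. The ListingParser class and its GetFileName method are names I made up for illustration, and the directory line in Main is an invented example.

```csharp
using System;
using System.Text.RegularExpressions;

static class ListingParser
{
    // Same pattern as the RE_GEX constant above.
    const string Pattern = @"^([d-])([rwxt-]{3}){3}\s+\d{1,}\s+.*?(\d{1,})\s+(\w+\s+\d{1,2}\s+(?:\d{4})?)(\d{1,2}:\d{2})?\s+(.+?)\s?$";
    static readonly Regex ListingRegex = new Regex(Pattern, RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Returns the file name from a listing line, or null when the line is a
    // directory, a "." or ".." entry, or does not match the pattern at all.
    public static string GetFileName(string line)
    {
        var match = ListingRegex.Match(line);
        if (!match.Success) return null;
        if (match.Groups[1].Value == "d") return null;
        var name = match.Groups[6].Value;
        if (name == "." || name == "..") return null;
        return name;
    }

    static void Main()
    {
        Console.WriteLine(GetFileName("-rw-r--r-- 1 ftp ftp        2107325 Jan 08 00:24 INV.CSV"));
        Console.WriteLine(GetFileName("drwxr-xr-x 2 ftp ftp 4096 Jan 08 00:24 archive") == null);
    }
}
```

Checking Success first keeps a non-matching line (a Windows-style listing, for example) from being treated as a file with empty groups.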




using (var response = (FtpWebResponse)request.GetResponse())
{
    using (var responseStream = response.GetResponseStream())
    {
        using (var sr = new StreamReader(responseStream))
        {
            while ((line = sr.ReadLine()) != null)
            {
                var comp = _regex.Match(line);
                if (!comp.Success) continue; // Not a UNIX-style listing line, nothing to compare.
                if (comp.Groups[1].ToString() == "d") continue; // Skip directories.
                if (comp.Groups[6].ToString() == "." || comp.Groups[6].ToString() == "..") continue; // Skip dot entries.
                if (fi.Exists)
                {
                    var ftpSize = NullableParser.ParseInt64(comp.Groups[3].ToString());
                    var ftpLastDate = NullableParser.ParseDateTime(comp.Groups[4].ToString());
                    if (ftpLastDate > DateTime.Now) return; // File is actually from last year (or older) but FTP only returns month/day and .NET assumes current year.
                    if (r.NextScheduledAttempt == null || ftpSize != fi.Length || ftpLastDate > fi.LastWriteTime)
                    {
                        // Remove the stale copy so the new one arrives as a new file.
                        fi.Delete();
                        Thread.Sleep(30000); // Pause for 30 seconds.
                        fi = new FileInfo(destinationFilePath);
                    }
                }
            }
        }
    }
}


Then comes another problem: what if the file on the FTP server hasn’t changed? I don’t want to unnecessarily copy files that haven’t changed and cause the import process to kick off if it isn’t needed. So I first check the size of the file. If the size is different, I delete the existing file and copy the new one. If the size is the same, I then check the date on the file. If the date of the file on the FTP server is newer than the one on the network location, I delete and copy.
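Pulled out on its own, the size-then-date check could be sketched roughly like this. ChangeDetector and HasChanged are hypothetical names, and the sketch leaves out the NextScheduledAttempt check that the real code also performs.

```csharp
using System;

static class ChangeDetector
{
    // True when the FTP copy differs in size, or has the same size
    // but a newer date than the local copy.
    public static bool HasChanged(long ftpSize, DateTime ftpLastModified,
                                  long localSize, DateTime localLastWrite)
    {
        if (ftpSize != localSize) return true;
        return ftpLastModified > localLastWrite;
    }

    static void Main()
    {
        var local = new DateTime(2013, 1, 7);
        var remote = new DateTime(2013, 1, 8);
        Console.WriteLine(HasChanged(2107325, remote, 2107325, local));
    }
}
```

Comparing size first is cheap and catches most changes; the date comparison only matters when a changed file happens to have the same length.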



If there is no file at the destination location, I copy it from FTP without further ado. This also gives us a way to re-import the data if need be, for whatever reason. We simply go to the network location and delete the file, and it will be re-copied and kick off the import process.



Next time, I’ll go through creating the Windows Service that runs this thing.