A few weeks ago I was struggling with fixed width text files. Remember 1995? Me neither, but I’m living it every day. I happened upon a stackoverflow.com post that dealt with using ADO for this purpose. I don’t remember the post, so no link. It did, however, lead me to this MSDN article, which was very helpful.
If you haven’t been following along, I wrote a post about creating a sample fixed-width file and one about importing said file. This post is about importing that file with ADO. You may remember ADO from such database objects as Connection, Command, and Recordset. But, like me, you may never have considered using to import text files; or even new that it could.
Before I get into the specifics, there are two things that endeared me to this method. First, it allows me to only import the data I want and easily ignores headers, totals, and other non-record data. Second, it’s super fast. I had a huge text file that took several minutes to read using the Input$ function. I got it down to 90 seconds using ADO. Ninety seconds still stinks, but it beats having to get a coffee every time the code runs. The bonus third reason I love ADO is that I can replicate it for different text files easily. Usually, fixed width text files are not arranged as normalized data, so there are some challenges. But I went from setting this up for one specific report to a dozen reports very quickly. Let’s see how it’s done.
Go read the MSDN article if you want the specifics, but basically you need a file called Schema.ini that tells ADO what’s what in your file. I don’t take a crap without a class module, so we’ll be using class modules here. Didn’t this use to be a family friendly blog?
If you couldn’t tell, the text file in question contains transactions from a general ledger. That means I need a Transaction class to hold each of them. Using the column headers from the file, I create a CTransaction class module.
Public TransactionID As Long
Public Entry As String
Public Period As Long
Public PostDate As Date
Public GLAccount As String
Public Description As String
Public Srce As String
Public Cflow As Boolean
Public Ref As String
Public Post As Boolean
Public Debit As Double
Public Credit As Double
Public Alloc As Boolean
My VBHelpers add-in quickly converts those to properties and creates a CTransactions parent class. Next, I create an MEntryPoints standard module and insert the following code.
Public Sub ImportGLTransactions()
Dim clsTransactions As CTransactions
Dim sh As Worksheet
Dim sFile As String
sFile = Application.GetOpenFilename("*.txt, *.txt")
If sFile <> "False" Then
Set clsTransactions = New CTransactions
Set sh = Workbooks.Add.Sheets(1)
I don’t have a FillFromFile method or a WriteToRange method, but I like to write my main procedure as if I already had those. You’ll need a reference to Microsoft ActiveX Data Objects 2.8 Library (although any version close to that will do). The FillFromFile method is pretty simple. It creates and ADO Connection and an ADO Recordset, then loops through the recordset adding CTransaction instances as it reads them in. It’s treating our text file as if it’s a database with field names and everything.
Public Sub FillFromFile(ByVal sFile As String)
Dim adCn As ADODB.Connection, adRs As ADODB.Recordset
Dim vaConn As Variant, aSql(1 To 4) As String
Dim sPATH As String
Dim clsTransaction As CTransaction
sPATH = Replace$(sFile, Dir$(sFile), vbNullString)
'We'll talk about this line later
MakeSchema sFile, sPATH, Me.Schema
'Create a connection string and SQL statement
vaConn = GetConnectionString(sPATH)
aSql(1) = "SELECT"
aSql(2) = Join(Me.Columns, ",")
aSql(3) = "FROM [" & Dir$(sFile) & "]"
aSql(4) = "WHERE PostDate Like ""[0-1][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]"""
'Open the connection and the recordset
Set adCn = New ADODB.Connection
adCn.Open Join(vaConn, ";")
Set adRs = New ADODB.Recordset
adRs.Open Join(aSql, Space(1)), adCn, adOpenStatic, adLockReadOnly, adCmdText
'Loop through the rs and create CTransaction instances
If Not adRs.BOF And Not adRs.EOF Then
Do While Not adRs.EOF
Set clsTransaction = New CTransaction
We’ll hold off on how to create the Schema file for now. The Connection string is created with this little utility. You pass in the path and returns an array of strings ready to be joined.
Function GetConnectionString(ByVal sPATH As String) As Variant
Dim aConn(1 To 3) As String
aConn(1) = "Provider=Microsoft.Jet.OLEDB.4.0"
aConn(2) = "Data Source=" & sPATH
aConn(3) = "Extended Properties=""text;HDR=No;FMT=FixedLength"""
GetConnectionString = aConn
I’ve recently starting using arrays and Joins to concatenate strings of any length. I find it makes the code much more readable and manageable once you get used to it. Let’s talk about that SQL statement. In Schema.ini, I’ve defined column names and column widths. We’ll look at it in a moment. The SQL statement selects all the columns from the text file based on some criteria. The first section of the SQL statement is the SELECT keyword. For the second section, I have a property that returns an array of columns. You could just as easily use “SELECT * FROM”, but I was recently shown the benefit of following the never-select-astrisk rule, so I’m trying to be good.
Public Property Get Columns() As Variant
Columns = Array("Entry", "Period", "PostDate", "GLAccount", "Description", "Srce", "Cflow", "Ref", "Post", "Debit", "Credit", "Alloc")
Just an array of column names used in the SQL statment and in Schema.ini. The third section of the SQL statement is the FROM keyword followed by the file name in brackets. The Dir$ function strips the path out of the fullname and returns only the file name. You don’t need the path in the SQL statement because Schema.ini is in the same directory as the text file. It has to be, so it’s not looking anywhere else.
The final section of the SQL statement is the WHERE clause. This is where you have to get a little creative. As I scan down my text file, I need to find some characteristic of “good” rows that is not present in “bad” rows. For this example, it was pretty easy. Every row that I want has a real date in PostDate and every row that I don’t want doesn’t. They aren’t all that easy. Would you like to see some examples of WHERE clauses I’ve used? Well, would you?
aSql(4) = "WHERE (Tax Like ""__/__/____"" And Not(IsNull(Vendor))) Or Account Like ""[1-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"""
aSql(4) = "WHERE PaymentDate Like ""__/__/____"" And Not DiscountAmount Like ""%[ ][ ].00"""
aSql(4) = "WHERE PostDate Like ""__/__/____"" Or GLAccount Like ""[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]"""
aSql(4) = "WHERE InvoiceDate Like ""__/__/____"" AND ItemID <> ""Invoice Totals"""
aSql(4) = "WHERE Kitted = ""YES"" Or Kitted = ""NO"""
aSql(4) = "WHERE TranDate Like ""__/__/____"" Or (LotNo Like ""[A-Z][A-Z]%"" And TranDate Like "" [A-Z]%"" And Not LotNo Like ""Item ID%"")"
Some of those are pretty gnarly. Our example file has lots of options, not just PostDate. We have three Yes/No fields and we could use any of those. It’s pretty unlikely that header information or totals rows are going to have a Yes or No in that same position. The idea is only get the rows you want. For some of the dates I used “__/__/____” and I think it’s pretty safe. But for this example, I used “[0-1][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]” which is a little more specific.
With a good connection string and SQL statement, I open the connection, open the recordset, and start looping. There’s not much to filling the CTransaction class via the FillFromRecordset method. It uses an Nz function that I wrote to avoid Null problems and return a default. For numbers and dates, I specify that I want to return a zero in place of a Null. For strings, it automatically returns vbNullString. The FillFromRecordset procedure is a method in CTransaction. The Nz function is in a standard module.
Public Sub FillFromRecordset(ByRef adRs As ADODB.Recordset)
Me.Entry = Nz(adRs.Fields("Entry"))
Me.Period = Nz(adRs.Fields("Period"), 0)
Me.PostDate = Nz(adRs.Fields("PostDate"), 0)
Me.GLAccount = Nz(adRs.Fields("GLAccount"))
Me.Description = Nz(adRs.Fields("Description"))
Me.Srce = Nz(adRs.Fields("Srce"))
Me.Cflow = Nz(adRs.Fields("Cflow")) = "Yes"
Me.Ref = Nz(adRs.Fields("Ref"))
Me.Post = Nz(adRs.Fields("Post")) = "Yes"
Me.Debit = Nz(adRs.Fields("Debit"), 0)
Me.Credit = Nz(adRs.Fields("Credit"), 0)
Me.Alloc = Nz(adRs.Fields("Alloc")) = "Yes"
Function Nz(fldTest As ADODB.Field, _
Optional vDefault As Variant) As Variant
If IsNull(fldTest.Value) Then
If IsMissing(vDefault) Then
Select Case fldTest.Type
Case adBSTR, adGUID, adChar, adWChar, adVarChar, adVarWChar
Nz = vbNullString
Nz = 0
Nz = vDefault
Nz = fldTest.Value
Now on to Schema.ini, at long last. I have a MakeSchema procedure in a standard module that simply creates the file where it’s supposed to. One of the arguments to MakeSchema is a string for the contents of the file. That comes from the Schema property of the CTranscations class (shown as Me.Schema in the FillFromFile method above). The Schema property takes the columns from the Columns property and puts them together with column widths to create the string.
Public Property Get Schema() As String
Dim aReturn() As String
Dim vaNames As Variant
Dim vaWidths As Variant
Dim i As Long
vaNames = Me.Columns
vaWidths = Array(8, 4, 12, 13, 27, 5, 4, 10, 4, 19, 22, 4)
ReDim aReturn(LBound(vaNames) To UBound(vaNames))
For i = LBound(vaNames) To UBound(vaNames)
aReturn(i) = "Col" & i + 1 & "=" & vaNames(i) & Space(1) & "Text Width" & Space(1) & vaWidths(i)
Schema = Join(aReturn, vbNewLine)
The widths array is simply how many characters wide each column is. The lines in my file are 132 characters long. Counting them is a pain. Usually, I grab a couple of representative lines from the text file and put them in a spreadsheet. Here’s how I came up with the column widths for this file.
Lines 2-5 are Courier New 9pt and the first two lines are typed – no fancy formula to get those numbers. I like to get a couple of representative lines so I don’t miss anything. Then I go put pipes where I want the column breaks to be and put this formula in A7
and fill down until I get an error. Column B is just the difference. Finally, the MakeSchema utility takes that string and puts into a file.
Public Sub MakeSchema(ByVal sFile As String, ByVal sPATH As String, ByVal sCols As String)
Dim lFile As Long
Dim aWrite(1 To 4) As String
Const sSCHEMA As String = "Schema.ini"
aWrite(1) = "[" & Dir$(sFile) & "]"
aWrite(2) = "Format=FixedLength"
aWrite(3) = vbNullString
aWrite(4) = sCols
lFile = FreeFile
Open sPATH & sSCHEMA For Output As lFile
Print #lFile, Join(aWrite, vbNewLine)
And the Schema.ini file looks like this:
The final piece is writing all of the CTransaction objects to a range. The CTransactions collection class has a WriteToRange method that calls an OutputRange property. It’s pretty straightforward.
Public Sub WriteToRange(rStart As Range)
Dim vaWrite As Variant
vaWrite = Me.OutputRange
rStart.Resize(UBound(vaWrite, 1), UBound(vaWrite, 2)).Value = vaWrite
Public Property Get OutputRange() As Variant
Dim aReturn() As Variant
Dim clsTransaction As CTransaction
Dim lCnt As Long
Dim vaHead As Variant
Dim i As Long
ReDim aReturn(1 To Me.Count + 1, 1 To 12)
vaHead = Me.Columns
lCnt = lCnt + 1
For i = LBound(vaHead) To UBound(vaHead)
aReturn(lCnt, i + 1) = vaHead(i)
For Each clsTransaction In Me
lCnt = lCnt + 1
aReturn(lCnt, 1) = "'" & .Entry
aReturn(lCnt, 2) = .Period
aReturn(lCnt, 3) = .PostDate
aReturn(lCnt, 4) = "'" & .GLAccount
aReturn(lCnt, 5) = "'" & .Description
aReturn(lCnt, 6) = "'" & .Srce
aReturn(lCnt, 7) = .Cflow
aReturn(lCnt, 8) = "'" & .Ref
aReturn(lCnt, 9) = .Post
aReturn(lCnt, 10) = .Debit
aReturn(lCnt, 11) = .Credit
aReturn(lCnt, 12) = .Alloc
OutputRange = aReturn
I put apostrophes in front of the strings so they don’t get converted just because they look like a number or a date. And when I’m done, I sum up the Debit column and compare it to the text file.
It’s a match! I must have done something right. No comments for you accountants about how my debits don’t equal my credits. You try to make fake data this internally consistent.
You can download FixedWidthADO.zip