Handling Delimeters in CSV Files

When rolling your own csv files, you have to account for commas in the text. Extra commas will create extra columns and you don’t want that. Let’s start with this data:

Two of the names have commas and two don’t. By ignoring those commas, extra columns are created because a comma delimits a column. In this code, no account is made for commas in the text.

Sub MakeCSV1()
   
    Dim rCell As Range
    Dim rRow As Range
    Dim lFnum As Long
    Dim sFname As String
    Dim sText As String
   
    lFnum = FreeFile
    sFname = “C:MyCSV1.csv”
   
    Open sFname For Output As lFnum
   
    For Each rRow In Sheet1.UsedRange.Rows
        For Each rCell In rRow.Cells
            sText = sText & rCell.Text & “,”
        Next rCell
        sText = Left$(sText, Len(sText) – 1) ‘remove comma
       sText = sText & vbNewLine ‘add line break
   Next rRow
   
    Print #lFnum, sText
   
    Close lFnum
   
End Sub

When you open MyCSV1.csv in Excel, you get this:

To keep that name field as one, test for the existence of a comma and enclose the field text in double quotes (Chr$(34) is used here).

Sub MakeCSV2()
   
    Dim rCell As Range
    Dim rRow As Range
    Dim lFnum As Long
    Dim sFname As String
    Dim sText As String
   
    lFnum = FreeFile ‘get availabe file number
   sFname = “C:MyCSV2.csv”
   
    Open sFname For Output As lFnum
   
    For Each rRow In Sheet1.UsedRange.Rows
        For Each rCell In rRow.Cells
            If InStr(1, sText, “,”) > 0 Then
                sText = sText & Chr$(34) & rCell.Text & Chr$(34) & “,”
            Else
                sText = sText & rCell.Text & “,”
            End If
        Next rCell
        sText = Left$(sText, Len(sText) – 1) ‘remove comma
       sText = sText & vbNewLine ‘add line break
   Next rRow
   
    Print #lFnum, sText ‘write to csv file
   
    Close lFnum
   
End Sub

That keeps everything nice and neat.

Posted in Uncategorized

27 thoughts on “Handling Delimeters in CSV Files

  1. Dick, isn’t it easier just to format the names column as text? Then when you save the sheet as a.CSV file, Excel puts the quotes round the names for you, and it loads correctly. No VBA required.

  2. Hi,

    What about the other way around?

    Often data is exported from an application as CSV.

    When I import it into excel, commas in text fields cause the same problem. However, I have no control over the export functionality (it is built into the source application).

    So, is there any way to get around that using VBA (say)? I can’t think of one, since there is no way to programmatically tell a delimiter from puntuation – or is there?

    Has anyone addressed this using VBA?

    Thanks,

    Alan.

  3. More a question then a comment.

    Has anyone ever moved to tab delimited? As we all have issues with commas in .CVS files at some time and I have never seen a spreadsheet with a tab in a string. Admittedly I have never seen a tab delimited file used, just wondering if anyone else had any thoughts.

  4. Hi Dick,

    I haven’t had much cause to work with CSV files so I don’t know why one would roll one’s own. But given that you are, I’d suggest the test Application.WorksheetFunction.IsText(rCell.Value) rather than testing for a comma. The IsText() check would also ‘protect’ fields that contain text that might be misinterpreted as a number (zip codes with a leading zero or a 16 character credit card, etc.)

    I’ve been burnt by code that copied a VBA array into a worksheet for sorting with all sorts of unintended consequences!

  5. Regular expressions are a well-established way to deal with CSV parsing. Even then, you need to take care that you understand by what conventions the file was created, and whether it’s at all likely to change. I frequently get “CSV” file saved from German Excel – since they use commas where we have decimal points, the values are delimited by semicolons. Fun.

    I don’t know why, but I really don’t like that common “concatenate commas then strip off the last one” code style, it just looks clumsy. So I tend to clamber through hoops to get my data into a form where I can call Join().

  6. @Mike Woodhouse
    You wrote:
    I frequently get “CSV” file saved from German Excel – since they use commas where we have decimal points, the values are delimited by semicolons. Fun.

    As I am a german, I can tell you lot of funny stories about commas and semicolons. E.g. our DSC-Software exports the data with . as decimal separator and ; as colum separator. For this reasons I am frequently changing my language settings to avoid problems.

    Here is another approach to export data (tab as delimiter):

    Sub ExportMitPrintVar()
        Dim var As Variant
        Dim row As Long, col As Long

        ‘kein Set verwenden!!!
       var = ActiveSheet.UsedRange
        Open “C:Export.txt” For Output As #1
        For row = 1 To UBound(rng, 1)
            For col = 1 To UBound(rng, 2) – 1
                ‘Daten schreiben und nächste Spalte
               Print #1, Format(var(row, col), “0.00”); Tab;
            Next
            Print #1, Format(var(row, col), “0.00”)
        Next
        Close #1
    End Sub

  7. This is useful.

    The main reason for “rolling your own” CSV files is for when you have to transfer information to another package over which you have no control (IE: you have to match the required input specification because you cannot modify the code).

    I have one app. which requires some number fields as numbers, others always as text (to allow $NA). All text fields have to be surrounded by double quotes. [it’s not on a PC]

  8. An alternative approach I’ve used is a pipe: | (the character above
    the backslash) instead of the comma as a delimiter. The pipe is a
    relatively rare character in most text, so the comma-within-text
    problem is avoided and the file can still be saved as a “.csv”.

    If the receiving application can have its delimiter re-set to the
    pipe, all’s well: the receiving application will parse on the pipe
    character. If not, then a second process can be used to search-and-
    replace the pipe with a quote-comma-quote (to take care of any
    comma-within-text instances), followed by a single quote at the
    beginning or each line and another single quote at the end of each
    line (replace the
    or any other special character that indicates
    the end of a line and/or carriage return).

    So, best case is a single, simple pass through the fields to add the
    pipe without any need for regular expressions. Worst case is three
    passes, but regular expressions only needed for passes two (changing pipes to quoted commas) and three (adding beginning- and ending-of-
    line quotes).

  9. The real problems occur when you get CSV files. They are so bad it sours you on the file format. In addition to the comma problem, don’t forget quotation marks in text fields, e.g. “Bank of .. The “Honest” Bankers” or “7’6? ceilings”.

  10. I too have had to deal with the “I have no control over this incoming CSV file, but we need to use it in Excel… can you please import it for me?”. I have to remind everyone (and myself too sometimes) that the CSV file is nothing more than a text file. Open it up with a text editor, do a find & replace and begone with the offending stuff.

  11. I regularly use tab separated values files as that’s the format currently exported by the ERP system I use. I assume that the discussed challenges with CSV are main the reason behind that ERP system’s shift from csv to tsv.

    Of course, for anyone finding themselves fighting with data translations in general, I highly recommend having a look at Monarch.

  12. When we say CSV it is not just a file with comma delimited(comedy limited :-) ) fields in the record.
    It is a special file format. Let Excel to do the job.
    We can get desireable result using the following:

    Public Sub XL2CSV()
    ThisWorkbook.SaveAs Filename:=ThisWorkbook.Path & “TestCSV.csv”, FileFormat:=xlCSV
    End Sub

    In the TestCSV.csv we will have:

    1234,”Kusleika, Dick”,82
    5678,”Smith, John”,95
    9012,Cher,66

  13. Hi,
    I am new to VBA. How easy is it to save 4 columns (random) from the fist sheet of an excel file having around 45,000 rows and 6 columns. Would appreciate if some one can point to some sample code.

    thanks in advance.

    Venu

  14. Mike: “I don’t know why, but I really don’t like that common “concatenate commas then strip off the last one” code style, it just looks clumsy.”

    A more elegant approach is to add the comma at the start of every field (except the first, of course), and the return characters at the start of every record (again, except the first).

    Additionally, string building/concatenation in VBA is very slow. If your CSV file is going to be very large, it’s worth creating a huge empty string, filling it using mid() and then trimming the excess off the end.

  15. Sorry “huge empty string” doesn’t really make sense, does it? What I meant was create a long string using either of the string() or space() functions.

  16. @Rob Bruce
    I’ve compared the elapsed time when exporting data using the “SaveAs csv” and the “print”-function. I don’t think there is a big difference in time necessary. More important is how you get the data out of the excel sheet.

    Here are the three approaches:
    1. Using “SaveAs csv”
    2. Reading the data in a variant -> use print
    3. Reading the data in a range -> use print

    And that’s the result:
    1. 5.2 seconds
    2. 4.7 seconds
    3. 1.2 seconds
    (for the setup see here: http://www.vba-blog.de/?q=content/datenexport-textdatei-mit-print – Sorry! It’s only a german website, but I’m still working on my english. )

    If there are a lot of formulas in the workbook, the “SaveAs csv” will take even more time. So I would never recommend using the “SaveAs csv”-macro.

    I will try to implement your “huge empty string” and compare it with the others.

    Nice weekend!
    Tobi

  17. Damn! Error! I confused the numbers! Here’s the correct version:

    Here are the three approaches:
    1. Using “SaveAs csv”
    2. Reading the data in a range -> use print
    3. Reading the data in a variant -> use print

    And that’s the result:
    1. 5.2 seconds
    2. 4.7 seconds
    3. 1.2 seconds

  18. I stick to tab delimited files whenever possible – they just work better.

    Leonid,

    The example you gave shows the downfall of the SaveAs CSV method – Excel’s inconsistent use of quote marks. There are many programs that cannot read this style of CSV properly, including (ridiculously) Microsoft Access. That’s why we often need to roll our own CSVs.

  19. sir

    i have excel file which was converted to .csv file in that one row have 14 characters number field but once i save and open it is giving 1.33333+4 error. Actually i want to upload a .csv file to our software because of this error i cannot do

  20. In you code comma Chr(44) can be replaced by CHR(130).
    Chr (130) looks just like normal comma, but doesn’t split the field in CSV

  21. I have been trying to use MakeCSV2 and the following error occurs when I run it

    Run-time error ’75’:
    Path/File access error

    When I click on the Debug button the following line is highlited

        Open sFname For Output As lFnum

    Any help would be appreciated

  22. Thanks for the reply Jon.

    As I am just trying to get my head around VBA can you please explain how to ‘define sFname and lFnum’

  23. David – if you have copied and pasted the code without change the most likely problem is you don’t have write access to C:. If you change C: in the code to the full path to your Documents folder (or any other folder you know you have write access to), it should work.


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.