External Data – Mixed Data Types

This is a newsgroup post from OneDayWhen. He’s an expert (in my opinion) on External Data. The issue he discusses is that of mixed data types. If you have, for instance, both strings and numerics in one external data column, you can get unexpected results – like some of your data doesn’t show up.

This is a common problem in the newsgroups and this is the best post I’ve ever seen on the subject. As OneDayWhen says himself, it needs looking after. Therefore, I’m posting it here in its entirety for all time.

Here my notes on the subject: (look after them; as my granddad used to say, ‘I won’t be around forever, there’ll be one day when you’ll have to fend for yourself, sonny’):

The relevant registry keys (for Jet 4.0) are in:

Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/

The ImportMixedTypes registry key is always read (whether it is honored is discussed later). You can test this by changing the key to ImportMixedTypes=OneDayWhen and trying to use the ISAM: you get the error, ‘Invalid setting in Excel key of the Engines section of the Windows Registry.’ The only valid values are:

ImportMixedTypes=Text
ImportMixedTypes=Majority Type

Data type is determined column by column. ‘Majority Type’ means a certain number of rows (more on this later) in each column are scanned and the data types are counted. Both a cell’s value and format are used to determine data type. The majority data type (i.e. the one with the most rows) decides the overall data type for the entire column. There’s a bias in favor os numeric in the event of a tie. Rows from any minority data types found that can’t be cast as the majority data type will be returned with a null value.

For ImportMixedTypes=Text, the data type for the whole column will be:

Jet (MS Access UI): ‘Text’ data type
DDL: VARCHAR(255)
ADO: adWChar (‘a null-terminated Unicode character string’)

Note that this is distinct from:

Jet (MS Access UI): ‘Memo’ data type
DDL: N/A
ADO: adLongVarWChar (‘a long null-terminated Unicode string value’)

ImportMixedTypes=Text will curtail text at 255 characters as ‘Memo’ is cast as ‘Text’. For a column to be recognized as ‘Memo’, majority type must be detected, meaning the majority of rows detected must contain 256 or more characters.

But how many rows are scanned for each column before is decided that mixed and/or what the majority type is? There is a second registry Key, TypeGuessRows. This can be a value from 0-16 (decimal). A value from 1 to 16 inclusive is the number of rows to scan. A value of zero means all rows will be scanned.

There is one final twist. A setting of IMEX=1 in the connection string’s extended property determines whether the ImportMixedTypes value is honored. IMEX refers to IMport EXport mode. There are three possible values. IMEX=0 and IMEX=2 result in ImportMixedTypes being ignored and the default value of ‘Majority Types’ is used. IMEX=1 is the only way to ensure ImportMixedTypes=Text is honored. The resulting connection string might look like this:

Provider=Microsoft.Jet.OLEDB.4.0;
Data Source=C:\ db.xls;
Extended Properties=’Excel 8.0;HDR=Yes;IMEX=1′

Finally, although it is mentioned in MSDN articles that MAXSCANROWS can be used in the extended properties of the connection string to override the TypeGuessRows registry keys, this seems to be a fallacy. Using MAXSCANROWS=0 in this way never does anything under any circumstances. Put another way, is has just the same effect as putting ONEDAYWHEN=0 in the extended properties, being none (not even an error!). The same applied to ImportMixedTypes i.e. can’t be used in the connection string to override the registry setting.

In summary, use TypeGuessRows to get Jet to detect whether a ‘mixed types’ situation exists or use it to ‘trick’ Jet into detecting a certaint data type as being the majority type. In the event of a ‘mixed types’ situation being detected, use ImportMixedTypes to tell Jet to either use the majority type or coerce all values as ‘Text’ (max 255 characters).

12 thoughts on “External Data – Mixed Data Types

  1. This is a very, very late comment: but DailyDoseOfExcel came up on Google when I searched for IMEX and text imports, so here goes…

    Time has moved on a bit since then, and you have a couple more registry keys to check:

    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel

    And people comeing here to look at IMEX settings might be interested in text files, rather than Excel. If that’s you, your keys are:

    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Text
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text
    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Text

    The Registry value you’re looking for is ‘ImportMixedTypes’ and you’ll probably find that it’s been set to ‘Majority Type’ – which means that the data provider reads your column of (say) SEDOL identifiers, decides that the majority type is long integer, and silently discards all other values, replacing them with Null.

    The correct value is ImportMixedTypes=Text and I would like to know why anyone thought that ‘Majority Type’ was a good idea for the default installation settings.

    If it isn’t ‘Text’, you’re hosed. The connection string clause ‘IMEX=1’ allows you to use ‘ImportMixedTypes=Text’ if it’s present in the registry key, which is just great if it’s the key you’ve got, and a complete waste of electrons if it isn’t.

    So you probably need to edit the registry. Ask me nicely, and I’ll post a safe library of Registry commands based on WMI script: ‘safe’, that is, in Registry terms. This means “safer than the API calls, but it’s still the Registry and Here Be Dragons”.

    What if you don’t want to edit the registry, because you’d rather not risk breaking applications which rely on the default setting?

    Here’s an interesting trick: you can tell the driver which registry keys to read…

    Note the position of that property change: it must come *after* the connection string is read, and *before* the connection is opened.

    The registry key must be in HKEY_LOCAL_MACHINE – note that this is omitted from the “Jet OLEDB:Registry Path” property, which will always go to HKLM, no matter what you tell it to do.

    So you can write a custom data specification of your own for Excel (or text).

    If you can’t update registry values in HKLM, and you can’t write new keys into your own ‘MyApp’ tree, you have a problem.

    You might be lucky, and you might discover one of those alternative key paths actually has the settings you wanted.

    If you’re not so lucky, you’re in the unfortunate position of working with text files; either you started from there, or you’ve resorted to exporting ranges from Excel and performing SQL on the txt files, which suggests that you googled for ‘Horrible Hack’ and ended up pasting code out of Excellerando.Blogspot.com, and I am very, very sorry.

    The hack for misbehaving text file IMEX is to insert a Schema.ini file into the data directory. This time, you really do override the registry settings:

    [Limits.csv]
    Format=CSVDelimited
    ImportMixedTypes=Text

    And that’s it: you don’t have to specify each column by name and by data type, and you’ve coerced IMEX=1.

    It’s *that* simple.

  2. This is a very, very late comment: but DailyDoseOfExcel came up on Google when I searched for IMEX and text imports, so here goes…

    Time has moved on a bit since then, and you have a couple more registry keys to check:

    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel

    And people comeing here to look at IMEX settings might be interested in text files, rather than Excel. If that’s you, your keys are:

    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Text
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text
    HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Text

    The Registry value you’re looking for is ‘ImportMixedTypes’ and you’ll probably find that it’s been set to ‘Majority Type’ – which means that the data provider reads your column of (say) SEDOL identifiers, decides that the majority type is long integer, and silently discards all other values, replacing them with Null.

    The correct value is ImportMixedTypes=Text and I would like to know why anyone thought that ‘Majority Type’ was a good idea for the default installation settings.

    If it isn’t ‘Text’, you’re hosed. The connection string clause ‘IMEX=1’ allows you to use ‘ImportMixedTypes=Text’ if it’s present in the registry key, which is just great if it’s the key you’ve got, and a complete waste of electrons if it isn’t.

    So you probably need to edit the registry. Ask me nicely, and I’ll post a safe library of Registry commands based on WMI script: ‘safe’, that is, in Registry terms. This means “safer than the API calls, but it’s still the Registry and Here Be Dragons”.

    What if you don’t want to edit the registry, because you’d rather not risk breaking applications which rely on the default setting?

    Here’s an interesting trick: you can tell the driver which registry keys to read…

    Note the position of that property change: it must come *after* the connection string is read, and *before* the connection is opened.

    The registry key must be in HKEY_LOCAL_MACHINE – note that this is omitted from the “Jet OLEDB:Registry Path” property, which will always go to HKLM, no matter what you tell it to do.

    So you can write a custom data specification of your own for Excel (or text).

    If you can’t update registry values in HKLM, and you can’t write new keys into your own ‘MyApp’ tree, you have a problem.

    You might be lucky, and you might discover one of those alternative key paths actually has the settings you wanted.

    If you’re not so lucky, you’re in the unfortunate position of working with text files; either you started from there, or you’ve resorted to exporting ranges from Excel and performing SQL on the txt files, which suggests that you googled for ‘Horrible Hack’ and ended up pasting code out of Excellerando.Blogspot.com, and I am very, very sorry.

    The hack for misbehaving text file IMEX is to insert a Schema.ini file into the data directory. This time, you really do override the registry settings:

    [Limits.csv]
    Format=CSVDelimited
    ImportMixedTypes=Text

    And that’s it: you don’t have to specify each column by name and by data type, and you’ve coerced IMEX=1.

    It’s *that* simple.

  3. “… I would like to know why anyone thought that ‘Majority Type’ was a good idea…”

    Testify!

    “Ask me nicely, and I’ll post a safe library of Registry commands…”

    Consider yourself asked – please. I can host them here if you want to email to dick@kusleika.com

    I use Schema.ini for .txt files all the time, but never thought of using it for .csv. I already know three pesky files that will get that treatment.

    Thanks for posting this.

  4. Okay, lets try that again, see which bit of formatting fails *this* time…

    I just got ’round to re-reading and replying: apologies for the delay.

    Firstly, @Doug:

    http://excellerando.blogspot.com/2014/09/from-time-to-time-it-necessary-to.html

    That’s the code for exporting ranges and running SQL. It’s *horrible*. I would know: I wrote it.

    If you’re looking to set a schema that imposes ImportMixedTypes=Text there’s less to do than you think. Here’s my implementation:

    You’ll need to supply your own location for the ‘database’ folder: for demonstration purposes, I’ve given you C:\temp.

    Next, @Dick:

    The safe library of registry functions is listed below:

  5. Thanks Nigel. I pasted your code into the VBE, fixed the HTML escapes and put it back in your comment. I don’t know ampersands don’t work for you, but do for me.

  6. Maybe I should post not logged in

  7. And here I am posting logged in


Posting code? Use <pre> tags for VBA and <code> tags for inline.

Leave a Reply

Your email address will not be published.