Generate Sample Data

I’ve long been a fan of Jimmy Pena’s Random Sample Data Generator Add-in and I use it quite a bit. I wanted to write my own, however, and introduce a little more flexibility. In particular, I was creating some sample inventory data and I needed to create inventory part numbers. My part numbers are of the form AIM-0000 or PP-0000. Easy enough to do with a RANDBETWEEN and some concatenation, but I wanted to build it in to the generator. So I did.

I’ve attempted to break down the samples into types. I have

  • Number. A number between a minimum and a maximum with a specified number of decimal places. It can be random or sequential. If it’s sequential, it can skip by a specified amount.
  • Date. A date between a minimum and a maximum. It can be random or sequential. It can skip by days, months, quarters, or years.
  • Time. A time between a minimum and a maximum. It can be random or sequential. It can skip by hours, minutes, or seconds.
  • Constant. This is a string that simply outputs itself over and over. This would mainly be used in compounds.
  • List. A predefined list of text. It can be random or sequential. You can import lists from a text file rather than typing in all the entries.
  • Text. A string of a specified length. It can be upper, lower, or proper case.
  • Compound. This consists of one or more of the above types.

Here’s the main screen.

You select any number of samples and put them in the output box. An example is shown to make sure you’re getting what you think you’re getting. You can specify the number of records (defaults to 50). You can output the data to a new workbook, a new worksheet, or the active cell. To add a new sample, click the Add link above the Samples box. You are presented with a dialog to specify the properties of the sample. Here’s what the dialog for a new Number look like.

You can also edit existing samples. Here’s what editing a list looks like.

The compound type is the most interesting. The below example shows how the social security number was constructed using other sample types.

You can also add or edit existing samples from within the Add/Edit Compound dialog. You can’t delete, though, because that causes too many problems.

All of the samples can be put into categories. There is a textbox on each of the dialogs that accepts a comma separated list of category names.

I’ve already noted some bugs as I was writing this post, and I’m sure there are a few more. If you want to give it a try, let me know your questions and comments. The code is wide open, so feel free to have a look around.

You can download dkRandomData.zip

Stick the add-in somewhere and make sure the two xml files are in the same directory as the add-in. Load the add-in and look for a Quick Sample Data entry under the Tools menu (it’s under Add-in in 2007 – the Ribbon part is on my todo list).

Key to the usefulness of this, I think, is have pre-built samples in there already. If you note something that is obviously missing, let me know.

31 thoughts on “Generate Sample Data

  1. Alex Kerin

    Nice job, the only thing I would really like is being able to add numbers that fluctuate up an down. For example, you could specify a degree of randomness around a sine curve, and then specify the periodicity of the sine curve (e.g. 3/4 of a cycle over the length of the data, starting at a certain number). When I create dashboards in Excel or other tools, I like it to tell a story – that’s hard to do with data that is either random or just goes up or down.

    Great tool, thanks.

  2. Dick Kusleika

    Good one Alex. I’ll have to learn some probability math, but I’ve added it to my to do list.

    JP. Yeah, that’s on my list. I don’t know why I didn’t just get it done.

  3. Shawn Wheatley

    Nice work, Dick! I had thought about writing something similar (in Python, I’m just learning Excel and VBA) and wasn’t aware of Jimmy Pena’s solution or yours. The one feature I wanted to add in mine (and I’m not sure that Excel can do this or not) is to be able to seed the random number generator so that you get the same “random” data set multiple times. This could be useful for unit testing, so that everybody is working from the same data set.

    The simple solution is, of course, just save the resulting data and share that. The geek in me cannot accept this answer :)

  4. JoshG

    Shawn,
    You can generate the exact same set of “random” numbers. From the help files for Randomize:
    To repeat sequences of random numbers, call Rnd with a negative argument immediately before using Randomize with a numeric argument.

    So it could look something like this:

    Sub TestRandom()
        Dim a As Long
       
        Rnd (-1)
        Randomize (2)
        For a = 1 To 50
            Cells(a, 1) = Rnd
        Next a
    End Sub

    You should get the same result every time.

    Josh

  5. SFdude

    Thanks Dick –

    Works *great* in Excel 2002,
    under XP Pro/SP3
    …Superb!

    Q1) Suggest adding ZIP codes to the “LOCATION” category,
    i.e.: 94108-123 (with and w/o -### extension),

    Q2) Tried adding “ZIP code” myself,
    using the “Add” link (above the middle column),
    but couldn’t add a new category to generate ZIP codes
    with a -### extension:
    i.e.: 94108-123

    Q3) What are the Categories with an _underscore?
    i.e.: _Compound, _Constant, etc…

    SFdude

  6. Dick Kusleika

    A zip code would be a compound sample. Make a five digit number, a three digit number, and a hyphen (if they don’t already exist). Then create a compound sample which combines them in the right order.

    The underscore categories are sample types. So if you make the above zip code, it would appear in the _compound category because it’s a compound sample.

  7. Khushnood Viccaji

    Would it be too much to ask for Web addresses matching with the Organization names in all records ?
    I mean, if I use both fields while generating a sample, an organization name ends up with multiple web-addresses.

    Unless, the RANDOM bit generates each field’s values individually, and hence this would not be possible.

    Khushnood

  8. Dick Kusleika

    Khushnood: I considered that feature, but it was too much work. When I need relatable data, I use a formula. For domain names, for instance, I would not include the domain in the output, but I would include the company name. Then I’d use a formula in another column like

    =”http://www.”&SUBSTITUTE(A1,” “,”-“)&”.com”

    and fill down.

  9. Dave

    Hi Dick

    This is a really useful add-in. Thanks

    Obviously I have opened this up in the VBA editor to take a look at how it works (struggling my way through as we speak) but have a question:

    Why XML for the data? Would it not have been as easy to include the data in the spreadsheets? Is this a speed or processing overhead issue?

    I presume it is possible (but don’t know how) to add the sex to the FirstName items so that ‘realistic’ data is available – i.e. not having girls called Nathan.

    Anyway – thanks again.

  10. Dick Kusleika

    Why XML: A couple of reasons. First I wanted the data to be independent of the code. That way I could release version 2 and it wouldn’t erase your custom sample data structures – it would just use (or convert) the existing XML file. I could have used CSV or some other text format, but I wanted to do a project with XML for the experience, so I picked it.

    The other reason was I wanted to be able to share XML files. You could set up an XML file that had a lot of custom samples that were particular to your company and distribute the XML file to colleagues. Beyond that I wanted to build an import procedure so that I could import your XML file and it would merge the two. Then an export function so that if you created a really great list of vehicle make/model combinations, for instance, you could post that xml for other people to import.

    Name and Gender: The way I did this for the US Cities and States sample was to create a sample of US Cities to a worksheet. Then I manually added the state to the adjacent cell. Then I concatenated them in another cell and used that to create new list which insured that the cities and states matched up. It’s a manual process to link them together, but might be worth the effort depending on how many you need.

  11. Pingback: Excel 2003 Ecrire dans un XML dont la structure est connue

  12. Jola

    Hi, It seems like a good piece of work, but I experience some trouble running it… When I load the add in excel (2007) says “This workbook has lost its VBA project, ActiveX controls and any other programmability-related features.”. I’m new with the subject, so perhaps I’m doing something wrong. I would appreciate your help. Thank you in advance.

  13. Jeff Weir

    Hey, this is sweet! What would be cool would be the ability to build compound output using freetext fields, like “Police Academy” Three Digits “was the best ever Police Academy movie”. Oh, and the ability to specify a set number of duplicates.

  14. Dick Kusleika

    I considered the compound field, but decided that it was better done in Excel. That is, put the components on the sheet and concatenate them there.

    I like the ‘max duplicates’ thing. I just don’t want to code it. :)

    The next thing I want to do is make it relational. So if you have a random list of customers and you want a random list of invoices, the invoices would use the customer id’s as a field.

  15. Dick Kusleika

    Wait, I did make a compound one. I just created a constant called PA1 that is “Police Academy” and one called PA2 that is “is the greatest”. Then I made a new compound that took PA1+ThreeDigit+PA2.

  16. Jeff Weir

    Cool. Don’t really need a random number in the case of this particular string, mind: Police Academy 1 still stands as the paragon of the franchise.

  17. Pingback: dataprose.org » Sur La [Excel] Table

  18. Glenn Lloyd

    This looks like a very useful utility. However, after I have installed it, it will not compile and errors on declarations referring to msxml2 ( Dim xmlDoc As MSXML2.DOMDocument
    Dim xmlNodes As MSXML2.IXMLDOMNodeList Dim xmlNode As MSXML2.IXMLDOMNode ). I tried to add the reference but have been unable to locate the required DLL on my system. I am running Excel 2010 32 bit on Windows 8 64 bit

    Glenn

  19. Glenn Lloyd

    Hi Dick,

    Thanks,

    I have just modified the declarations Dim xmlDoc As MSXML2.DOMDocument to Dim xmlDoc As MSXML2.DOMDocument60 (I found two instances) the add-in now compiles without error. I will let you know what else I learn.

    Glenn

  20. Glenn Lloyd

    Dick,

    Just to confirm, after modifying the DOMDocument Declaration to .DOMDocument60, the add-in now compiles and functions “as advertised” for me.

    Thanks very much,

    Glenn

  21. Dick Kusleika Post author

    Interesting. I think I used DOMDocument because it was first on the list. Or maybe I thought I just didn’t need whatever the heck new features DOMDocument60 had. But in the brave new world of Win8, I guess I need to update that add-in. That’s good, because I have some other feature changes I’ve been thinking about.

  22. Glenn Lloyd

    Dick,

    This is a great add-in. I work primarily in Access and my sample data solution until now was a database that I built. With your add-in I can create sample data on an ad hoc basis and import it into my access projects. The real advantage is the variety of data type you have included in the add-in.

    Once again, many thanks for sharing this.

    Glenn

  23. Pingback: dataprose.org » It’s a Cartesian Product, “Dummy”

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax