One problem that can really drive you crazy when working with data from text files is the many different formats that dates and numbers can be stored in around the world. For example, take a look at the contents of a tab-delimited file shown in the following screenshot:
There’s a single row of data and two columns, Date and Sales. What number and date are shown though? For me, living in the UK, the date shown is the 2nd of March 2015 (in the UK, like most of the world, we use the DD/MM/YYYY date format) and the number is one hundred thousand and two (in the UK we use the . sign as a decimal separator and the , as a thousands separator). However, if I was from the US and used the MM/DD/YYYY format I’d read the date as the 3rd of February 2015; and if I was from France and used a comma as a decimal separator, I’d read the number as a value just a tiny amount over one hundred. Of course, if you’re working in one country and getting data sent from another, which uses a different date or number format, you need to take all of these variations into account.
The good news is that Power Query has a really easy way of doing this for you, even if it’s not immediately obvious where this functionality is. You don’t need to change your Windows locale or split dates into their constituent parts and rearrange them in the correct order, or anything like that.
When you first load a csv file like the one shown above into Power Query, it will try to guess the data types of each column. Here’s the code generated by Power Query for this file:
let Source = Csv.Document( File.Contents( "C:\Users\Chris\Documents\Power Query demos\PQDateNumberFormats.csv"), null,"#(tab)",null,1252), #"First Row as Header" = Table.PromoteHeaders(Source), #"Changed Type" = Table.TransformColumnTypes( #"First Row as Header", {{"Date", type date}, {"Sales", Int64.Type}}) in #"Changed Type"
As you can see, it has created three steps in the query automatically:
- Source loads the data from the CSV file given and sees that it’s tab delimited
- First Row as Header uses the first row of data as the column headers
- Changed Type sets the Date column to be type date, and Sales to be an integer
To change how the column data types are interpreted though, you first need to delete the Changed Type step by clicking on the cross icon next to it, shown above. Then, right-click on the Date column, choose Change Type then Using Locale…
When you do this, the Change Type With Locale dialog appears:
A locale is simply a name for all of the rules for date and number formats and more associated with a particular language and region. So, for example, setting the Date column so that it is interpreted using the English (United States) locale, means that when the data is loaded into Power Query on my machine I see the date 3/2/2015.
What’s happened is that the csv file contains the date “2/3/2015”, I have told Power Query that the data source uses a US English format date, Power Query has then assumed this date is therefore in MM/DD/YYYY format, loaded the data for me and shown the date in my own locale (which is English (United Kingdom)) in DD/MM/YYYY format as 3/2/2015. The date itself hasn’t changed, just the way it is displayed.
Using the same technique to import the Sales column as a decimal value using the English United States locale like so:
…gives the following final result, where the comma is assumed to be a thousands separator:
Here’s the new code for the query:
let Source = Csv.Document( File.Contents( "C:\Users\Chris\Documents\Power Query demos\PQDateNumberFormats.csv"), null,"#(tab)",null,1252), #"First Row as Header" = Table.PromoteHeaders(Source), #"Changed Type with Locale" = Table.TransformColumnTypes( #"First Row as Header", {{"Date", type date}}, "en-US"), #"Changed Type with Locale1" = Table.TransformColumnTypes( #"Changed Type with Locale", {{"Sales", type number}}, "en-US") in #"Changed Type with Locale1"
However, if I change the locale used to set the types of these two columns to French (France), I see the following:
You will see that I have told Power Query to interpret the value “100,002” as a decimal number using a French format, where a comma is used as a decimal separator, and it has therefore imported and displayed it as 100.002 in the English (United Kingdom) format with a decimal point used as the decimal separator.