Removing Diacritics From Text In Power Query

I faced an interesting challenge recently: I had to use Power Query to remove all non-ASCII characters from some text but maintain the readability of that text. Let’s take the French phrase “un garçon très âgé à Noël” (which I know doesn’t make much sense but is great for testing) as an example. If you remove everything apart from the basic latin alphabet characters and spaces using Text.Select as I blogged here:

Text.Select(
"un garçon très âgé à Noël", 
List.Combine(
{{"A".."Z"},{"a".."z"},{" "}}
)
)

…then the output is “un garon trs g Nol”. As you can see, removing all the characters leads to unreadable text. Instead, what you have to do is find all the letters with diacritics (accents and other glyphs that can be added to characters) and remove the diacritics. Doing this may be ungrammatical and make it harder to understand the meaning of the text but in most cases the text will still be readable.

The bad news is that there is no straightforward way to do this in Power Query, and indeed there is no straightforward way to do this at all because there are no hard-and-fast rules about what to replace a letter with a diacritic with: should “ö” become “o” or “oe” for example? My first thought was to create a big lookup table with all the rules of what to replace each character with in, similar to the approach taken here for solving this problem in Google Sheets. Building a comprehensive lookup table would be gigantic task though.

Luckily there is another solution (thanks to Curt Hagenlocher for finding this one for me) that is described in this thread on StackExchange: converting the text to binary using the Greek (ISO) code page and converting it back to text again using the Text.FromBinary and Text.ToBinary M functions. I’m not sure I understand why it works but it seems to work well enough for my purpose. Here’s the M code to do this:

Text.FromBinary(
Text.ToBinary(
"un garçon très âgé à Noël", 
28597
)
)

The output of this is “un garcon tres age a Noel”, which is exactly what we want. Of course it’s not perfect and there are cases where characters can’t be converted. If you take the text “Malus × zumi” (it’s a type of crabapple apparently), then:

Text.FromBinary(
Text.ToBinary(
"Malus × zumi", 
28597
)
)

returns the text “Malus ? zumi” – the “×” has been replaced by “?”. As a result you’ll probably also want to replace any question marks with a space and then remove any excess spaces; I’m going to ignore the fact that doing this might also remove any question marks that were in the original text.

Here’s an M custom function that does all this and also solves my original problem of only returning ASCII characters:

(inputText as text) as text =>
  let
    ReplaceDiacritics = Text.FromBinary(Text.ToBinary(inputText, 28597), TextEncoding.Ascii), 
    RemoveQuestionMarks = Text.Replace(ReplaceDiacritics, "?", " "), 
    RemoveExcessSpaces = Text.Combine(
      List.RemoveItems(Text.Split(Text.Trim(RemoveQuestionMarks), " "), {""}), 
      " "
    )
  in
    RemoveExcessSpaces

Removing Diacritics From Text In Power Query

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112