Recently I had a request for help from someone who wanted to do the following in Power Query: take a piece of text and, using a lookup table, find every occurrence in that text of the words in one column of the table and replace them with the corresponding words in the other column. So, for example, given these two tables in Excel:
You want to take the table on the left (called Text in the code below) and, for each piece of text, replace the words in the ‘Word To Replace’ column of the right-hand table (called Replacements) with those in the ‘Replace With’ column of the same table. The output would therefore be:
An interesting challenge in itself, and one I solved first of all using a recursive function. Here’s some code showing how I did it:
let
    //Get table of word replacements
    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],
    //Get table containing text to change
    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],
    //Get a list of all words to replace
    WordsToReplace = Table.Column(Replacements, "Word To Replace"),
    //Get a list of all words to replace with
    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),
    //Recursive function to do the replacement
    ReplacementFunction = (InputText, Position)=>
        let
            //Use Text.Replace to do each replace
            ReplaceText = Text.Replace(
                InputText,
                WordsToReplace{Position},
                WordsToReplaceWith{Position})
        in
            //If we have reached the end of the list of replacements
            if Position=List.Count(WordsToReplace)-1
            then
                //return the output of the query
                ReplaceText
            else
                //call the function again
                @ReplacementFunction(ReplaceText, Position+1),
    //Add a calculated column to call the function on every row in the table
    //containing text to change
    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text], 0))
in
    Output
It does the job, but… after thinking about this some more, I wondered if there was a better way. A lot of my recent Power Query blog posts have used recursive functions, but are they a Good Thing? So I asked on the forum, and as usual the nice people on the Power Query dev team answered very promptly (that’s one of the things I like about the Power Query dev team – they engage with their users). Recursive functions are indeed something that should be avoided if there is an alternative, and in this case List.Generate() can be used instead. Here’s how:
let
    //Get table of word replacements
    Replacements = Excel.CurrentWorkbook(){[Name="Replacements"]}[Content],
    //Get table containing text to change
    TextToChange = Excel.CurrentWorkbook(){[Name="Text"]}[Content],
    //Get list of words to replace
    WordsToReplace = Table.Column(Replacements, "Word To Replace"),
    //Get list of words to replace them with
    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),
    //A non-recursive function to do the replacements
    ReplacementFunction = (InputText)=>
        let
            //Use List.Generate() to do the replacements
            DoReplacement = List.Generate(
                ()=> [Counter=0, MyText=InputText],
                each [Counter]<=List.Count(WordsToReplaceWith),
                each [Counter=[Counter]+1,
                    MyText=Text.Replace(
                        [MyText],
                        WordsToReplace{[Counter]},
                        WordsToReplaceWith{[Counter]})],
                each [MyText]),
            //Return the last item in the list that
            //List.Generate() returns
            GetLastValue = List.Last(DoReplacement)
        in
            GetLastValue,
    //Add a calculated column to call the function on every row in the table
    //containing the text to change
    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text]))
in
    Output
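If you want to try the query without the sample workbook, one option is to swap the two Excel.CurrentWorkbook() steps for inline tables built with #table. The sketch below does that. The table contents are invented for illustration and are not the data from the workbook; the rest is the same logic as the query above.

```powerquery
let
    //Hypothetical stand-in for the Replacements table (made-up rows)
    Replacements = #table(
        {"Word To Replace", "Replace With"},
        {{"cat", "fish"}, {"dog", "snake"}, {"mat", "ground"}}),
    //Hypothetical stand-in for the Text table (made-up rows)
    TextToChange = #table(
        {"Text"},
        {{"the cat sat on the mat"}, {"the dog chased the cat"}}),
    //Get the two lists of words, exactly as before
    WordsToReplace = Table.Column(Replacements, "Word To Replace"),
    WordsToReplaceWith = Table.Column(Replacements, "Replace With"),
    //The same non-recursive function as above
    ReplacementFunction = (InputText)=>
        let
            DoReplacement = List.Generate(
                ()=> [Counter=0, MyText=InputText],
                each [Counter]<=List.Count(WordsToReplaceWith),
                each [Counter=[Counter]+1,
                    MyText=Text.Replace(
                        [MyText],
                        WordsToReplace{[Counter]},
                        WordsToReplaceWith{[Counter]})],
                each [MyText])
        in
            //The last item has had every replacement applied
            List.Last(DoReplacement),
    //Add the column of changed text
    Output = Table.AddColumn(TextToChange, "Changed Text", each ReplacementFunction([Text]))
in
    Output
```

With these made-up rows the Changed Text column should contain "the fish sat on the ground" and "the snake chased the fish". One thing to bear in mind is that Text.Replace() matches substrings rather than whole words, so a word such as "category" would be changed too.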
List.Generate() is a very powerful function indeed, albeit one that took me a while to understand properly. It’s a bit like a FOR loop, even though it’s a function that returns a list. Here’s what each of the parameters I’m passing to the function in the example above does:
- ()=> [Counter=0, MyText=InputText] is a function with no parameters that returns a record (a record is a bit like a table with just one row in it). The record contains two fields: Counter, which has the value 0, and MyText, which is given the value of the text where the values are to be replaced. This record is the initial value that List.Generate() will modify at each iteration.
- each [Counter]<=List.Count(WordsToReplaceWith) is a function too. An each expression is a quick way of declaring a function that takes one, unnamed parameter, and in this case the value that will be passed to this parameter is a record of the same structure as the one declared in the previous bullet. The expression [Counter] gets the value of the Counter field from that record. The function returns a boolean value, true when the value in the [Counter] field of the record is less than or equal to the number of items in the list of replacement words. List.Generate() returns a list, and while this function returns true it will keep on iterating and adding new items to the list it returns. Because Counter starts at 0 and the comparison is <=, the list ends up with one more item than there are replacements: the original text, followed by the text after each replacement.
- each [Counter=[Counter]+1, MyText=Text.Replace([MyText], WordsToReplace{[Counter]}, WordsToReplaceWith{[Counter]})] is yet another function, once again declared using an each expression. The function here takes the record from the current iteration and returns the record to be used at the next iteration: a record where the value of the Counter field is increased by one, and where the value of the MyText field has one word replaced. The word that gets replaced in MyText is the word in the (zero-based) row number given by Counter in the ‘Word To Replace’ column; this word is replaced by the word in the row number given by Counter in the ‘Replace With’ column.
- each [MyText] is a very simple function, one that returns the value from the MyText field of the record from the current iteration. It’s the value that this function returns that is added to the list returned by List.Generate() at every iteration.
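Stripped of the text replacement, the same four arguments can be seen in a much smaller loop. This is only a sketch and not part of the sample workbook: the step name Squares and the upper limit of 5 are made up for illustration.

```powerquery
let
    //Hypothetical example, not from the sample workbook:
    //generate the squares of the numbers 1 to 5
    Squares = List.Generate(
        //initial record: the loop counter starts at 1
        ()=> [Counter=1],
        //keep going while the counter is 5 or less
        each [Counter]<=5,
        //build the record for the next iteration
        each [Counter=[Counter]+1],
        //the value added to the output list at each iteration
        each [Counter]*[Counter])
in
    Squares
```

This should return the list {1, 4, 9, 16, 25}, which is what a FOR loop running from 1 to 5 and collecting Counter*Counter would give.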
To illustrate this, here’s a simplified example showing how List.Generate() works in this case:
let
    WordsToReplace = {"cat", "dog", "mat"},
    WordsToReplaceWith = {"fish", "snake", "ground"},
    Demo = List.Generate(
        ()=> [Counter=0, MyText="the cat and the dog sat on the mat"],
        each [Counter]<=List.Count(WordsToReplaceWith),
        each [Counter=[Counter]+1,
            MyText=Text.Replace(
                [MyText],
                WordsToReplace{[Counter]},
                WordsToReplaceWith{[Counter]})],
        each [MyText])
in
    Demo
The output of this query is the following list, in which one more word is replaced at each iteration:
{“the cat and the dog sat on the mat”, “the fish and the dog sat on the mat”, “the fish and the snake sat on the mat”, “the fish and the snake sat on the ground”}
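Note that the list has four items even though there are only three replacements, because the first item is the unchanged text. A small check along the following lines should confirm this and pick out the final text in the same way that List.Last() does in the full query. It reuses the demo query, and the Result step and its field names are my own additions for illustration.

```powerquery
let
    WordsToReplace = {"cat", "dog", "mat"},
    WordsToReplaceWith = {"fish", "snake", "ground"},
    Demo = List.Generate(
        ()=> [Counter=0, MyText="the cat and the dog sat on the mat"],
        each [Counter]<=List.Count(WordsToReplaceWith),
        each [Counter=[Counter]+1,
            MyText=Text.Replace(
                [MyText],
                WordsToReplace{[Counter]},
                WordsToReplaceWith{[Counter]})],
        each [MyText]),
    //Hypothetical extra step: count the items and take the last one
    Result = [
        ItemCount = List.Count(Demo),
        FinalText = List.Last(Demo)]
in
    Result
```

ItemCount should be 4 and FinalText should be "the fish and the snake sat on the ground". As far as I understand it, the <= comparison does not cause an out-of-range error because the fields of a record are only evaluated when they are needed: the record with Counter=4 contains a MyText field that would look up a fourth word that does not exist, but the condition fails on Counter first and so that field is never evaluated.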
So, another useful function to know about. I’m slowly getting to grips with all this functional programming!
You can download the sample workbook here.