Azure Data Factory - ForEach Activity#
The ForeEach
activity in Azure Data Factory has some important limitations. One of them is when working with the batch
mode, it would be nice to embed only pipeline activities inside.
Problem#
When you run the ForEach
activity in batch
mode, and you loop over a list of items, and inside the ForEach
, you run some activities (not pipeline activity), you might find the same item is processed multiple times. The doc already says that the SetVariable
should not be used in the ForEach activity, as it will set the variable at pipeline level (the pipeline where hosts the ForEach activity), and it will be shared by all the iterations.
Solution#
There are 2 solutions to this problem:
- If you want to keep the
batch
mode, use the ForEach activity with pipeline activity inside only, and send the item to the pipeline as a parameter. Pipeline is more a less like a function in programming, each batch (and the item sent by the ForEach activity) is run in an isolated pipeline, so the variables defined in the pipeline is not shared with other batch (pipeline). This solution takes more time in authoring your ADF workflow, but it's faster in execution. - Or set the ForEach activity to
sequential
mode, then everything will work as expected, but it will be much slower.