When building business intelligence solutions one problem is that data usually contains errors, e.g. attributes are written in different ways so that data cannot be grouped correctly. The attribute could be written in upper case or not; it could be abbreviated or not; and sometimes several synonyms exist for the same thing.
For instance, ‘United Kingdom’ could be referred to as ‘UNITED KINGDOM’, ‘United Kingdom’, ‘Great Britain’, or just ‘UK’.
As a consequence, what the users really think of as the same instance will appear on several rows in a list box, or be displayed in several bars in a bar chart. This will cause problems in the data analysis, since selections and numbers displaying totals often will be incomplete.
But there are ways to solve this. The best way, is of course to correct it in the source data. But this is not always possible, so it may be that the correction must be made elsewhere.
In QlikView and Qlik Sense there are several ways to do this. The most obvious (but not the best), is to use a hard-coded, conditional expression in the script:
Similar constructions can be made using Replace() or Pick(). These all work and will do the job.
But they are not manageable.
Should you want to add more cases or change some previous ones, you will soon realize that this isn’t a good method. The expressions will become too long and they will be error-prone. So I strongly recommend not doing this.
There is however a solution which is both manageable and simple: Mapping Load. The first step is to create a mapping table with all changes you want to make:
Then you load this table using the Mapping prefix:
Mapping Load ChangeFrom, ChangeTo
From MapTable.xlsx (…) ;
Now you can use this table in the script to correct all field values. The simplest way to use the Map statement: Declare the mapping early in the script before any of the relevant fields are loaded, and the corrections will be made automatically:
Map Country, Department, Person Using MapTable ;
Alternatively, you can use either ApplyMap() or MapSubstring() when you load the field, which both will make a lookup in the mapping table and if necessary make the appropriate replacement, e.g.:
ApplyMap( ‘MapTable’, Country ) as Country ,
The mapping table will be discarded at the end of the script run and not use any memory in the final application.
Using a mapping table is by far the best way to manage this type of data cleansing in QlikView and Qlik Sense:
- It is easy to add new corrections and to change the old ones
- The mapping table can be stored separately from the script; in an Excel sheet or even in a database