Merging modelled data sources: Additional options

When you merge two modelled data sources, you do so by choosing a column from each that contains the same data in order to align the two. This is your join key.

Are there discrepancies between the data in the data sources you want to merge? If there are multiple matches or no matches for specific values in your join key, you can align your data using our additional merge options.

What are the additional options?

When using the additional merge options, you first select what to do when multiple matches are found for a value in your join key, then you specify which action to take when no matches are found for a value in your join key.

Your selections dictate how the data will be incorporated into your merged modelled data source.

The following variations are available based on your selections:

If you choose to include only the first match and...

  • Exclude rows that do not match
  • Include rows for unmatched values from the primary data source
  • Add rows for unmatched values from the secondary data source

If you choose to include all matches and...

  • Exclude rows that do not match
  • Include rows for unmatched values from the primary data source
  • Add rows for unmatched values from the secondary data source

How do I access the additional options?

After choosing which modelled data sources to merge, aligning your data by selecting the join key, and adding columns, you can choose to continue to the additional merge options. To access the additional options from the merge wizard, finish making your column selections, then click Next from the Add columns to merged data step.

For general information on merging modelled data sources, go here.

The examples given in this article refer to the merging of these two data sources:

Include only the first match 

In this scenario no new rows are created.

The following options describe what to do when the join key contains multiple instances of the same data and you want to keep only the first match.

For instance, Support Rep Id 1001 in our primary data source has two matches in our secondary data source, the entries associated with Kaushi and the entries associated with Seema. In the examples below, Support Rep ID 1001 is only associated with the name Kaushi and entries associated with the name Seema are omitted from the merged data.

Selecting Include only the first match does not add any new rows to your primary modelled data source.

Option A - Include only the first match and Exclude rows that do not match

For multiple matches, only the first match is kept. If either data source contains entries that do not match the other data source, those rows will be omitted. For example, in the table below, the data from line three of the primary modelled data source and the data from lines two and four of the secondary modelled data source have been omitted. This is because those lines contain data with no matches in the other data source.

 

Option B - Include only the first match and Include rows for unmatched values from the primary data source

For multiple matches, only the first match is kept. If there is an entry in the primary modelled data source for which there are no matches in the secondary data source, the merged data will keep the rows from the primary modelled data source with blank entries for the unmatched data. For example, in the table below the fourth line of data contains blank entries for Name and Favourite Color.

If the secondary data source contains entries with no matches in the primary data source, those rows are omitted. In the example below, the data from line two and line four of the secondary modelled data source has been omitted.

 

Option C - Include only the first match and Add rows for unmatched values from the secondary data source

For multiple matches, only the first match is kept. As in the previous scenario, if the primary modelled data source does not have a matching entry in the secondary data source, the merged data will keep the rows from the primary modelled data source with blank entries for the unmatched data. See the fourth row of data in the example below.

If the secondary data source does not have a matching entry in the primary source modelled data source, the merged data will keep the rows from the second data source with a blank entries. See the eighth and ninth rows in the example below.

 

Include all matches

In this scenario new rows are created for each match.

The following options describe what to do when the join key contains multiple instances of the same data and you want to keep all matches. Selecting Include all matches creates a new row for each match. For example, the primary modeled data source contains two entries for Sales Rep ID 1001 (See above). The first instance is associated with the name Kaushi and the second is associated with the name Seema. Selecting Include all matches results in both displaying in the merged data.

Option A - Include all matches and Exclude rows that do not match

For multiple matches, each match is included. If either data source contains entries that do not match the other data source, those rows will be omitted. For example, in the table below, the data from line four of the primary modelled data source and the data from line two and line four of the secondary modelled data source has been omitted because those lines contain data with no matches in the other data source.

 

Option B - Include all matches and Include rows for unmatched values from the primary data source

For multiple matches, each match is included. If the primary modelled data source contains entries that don’t have matches in the secondary data source, the merged data will keep the unmatched rows from the primary modelled data source with blank entries. If the secondary data source contains entries with no matches in the primary data source, those rows are omitted. For example, in the table below the fifth line of data contains blank entries for Name and Favourite Color. The data from line two and line four of the secondary modelled data source has been omitted.

 

Option C - Include all matches and Add rows for unmatched values from the secondary data source

For multiple matches, each match is included. As in the previous example, if the primary modelled data source contains entries that don’t have matches in the secondary data source, the merged data will keep the unmatched rows from the base modelled data source with blank entries. You can see this in the fifth line of data in the example below.

If the second data source contains entries that don’t have matches in the primary modelled data source, the merged data will keep the unmatched rows from the secondary data source with blank entries. You can see this in the eleventh and twelfth lines of data in the example below.

Learn more:

Have more questions? Submit a request