Combining Two Data Sources Using Data Blending in Analytics


This instruction contains information about how to combine two data sources using data blending in P2P Analytics web edit.

How Data Blending Works

Data blending is a process that allows to use many data sources in your visualizations and calculations at the same time. It leads to greater flexibility when creating reports. For example, using data blending allows you to use an invoice header level data and invoice coding row level data in the same visualization or dashboard.

When blending data, you can build the view either way around, but the outcome will be affected on which level the data is aggregated. The secondary (blended) data source will be able to contribute extra information. The secondary data source's values can be aggregated and applied to the existing view after you have established a relationship by assigning a variable (dimension) that both the primary and secondary data sources have in common.

For example, in the previous screenshot you can find:

All other potential fields are shown with an unlinked sign which can be selectively chosen.

Data blending is similar to left join, but the only difference is that it does not duplicate rows in the primary table if there are many values, row is not multiplied, and it still contains a single row.

From version 17.5 onwards, Basware Analytics provides you an enriched choice of greater flexibility while using all custom fields in coding row only.

To enable this flexibility, two new data sources InvoiceCodingRowCustom and InvoiceCodingRowMain have been added (as shown in the following figure) to replace the older one InvoiceCodingRow. You can fetch coding row level information of all kinds of fields including Text, Date, and Num from InvoiceCodingRowCustom data source. Whereas InvoiceCodingRowMain data source contains main fields like Text1 through Text5 and Date1 through Date5.  

InvoiceCodingRow data source is available in limited format versions 17.5 of Basware Analytics.
It will be phased out in future releases, because it is included to allow migration of existing reports to use new data sources.

Selecting Primary Data Sources

The view uses all rows from the primary data source, and the aggregated rows from the secondary data source, based on the dimension of the linking fields. Dimension values are aggregated and return a single value for all rows in the secondary data source. If there are many values for the rows, an asterisk (*) is shown. To understand the significance of selecting data source, in the following two scenarios the user wishes to see in which account invoices are booked and what are the values of custom field Text_1.

First Scenario

InvoiceCodingRowMain data source is considered as primary.

 

Output would be as in the previous screenshot where you can get information based on Account Name even if there would be many entries in the secondary data source with respective to Account Name. Primary data source determines level of detail on canvas, secondary data source provides only additional information. For example, if there is no entry in primary data source for certain fields, then it would not show data in view.

Second Scenario

InvoiceHeader data source is considered as primary.

 

Output is as shown in the previous screenshot where you will be able to get details based on Invoice Number. Here the asterisk (*) means that there is one-to-many association for Invoice Number and Account Name. To get the desired results, selection of the primary data source is vital which can be noticed by changing the primary data source to be InvoiceCodingRowMain.

For example, using data sources InvoiceHeader and InvoiceCodingRowMain you want to see details (company code, account code, Text_1) of Invoice Number "7".

InvoiceHeader is considered as the primary data source where Account Name is null, as shown in the following screenshot. There is no match against Invoice Number in the secondary data source.

On the other hand, when InvoiceCodingRowMain is considered as the primary data source, then Invoice Number "7" is not shown in the view because Invoice Number is not available in the primary data source.

 

 

How to Determine Linking/Blending Field           

Once the primary and secondary data sources are designated, you must define the common dimension or dimensions between the two data sources. This common dimension is called the linking field. Selection of linking field determines how data is linked and aggregated.

For example, two available data sources are InvoiceHeader and InvoiceCodingRowMain. Both data sources have common dimensions as Creation date, Invoice ID, Tax Code, and Tax percent.

 

Consider Invoice Id as a linking field. Then the output would be like in the following screenshot. Invoice Id is unique in primary data source and the corresponding Coding Row Net Total is aggregated (seen as SUM in the following screenshot).

You must use Invoice ID between Header data source to Coding data source and Coding Row ID between Coding data sources as linking field.

It is always the best practice to limit the data by using filters before bringing any dimension/measure for not loading too many rows on the canvas.

For more information, see http://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#multiple_connections.html.