We are SQL Server Experts. Call Us TODAY! +17738095456 or email us at sales@itimpact.com
  • The Team
  • Our Services
    • Dashboards Developers
    • Microsoft SQL Server
    • Microsoft SQL Server hosting for Access
    • SQL Server Integration Services (SSIS)
    • SQL Server Reporting Services (SSRS)
    • SQL Server Data Quality Services (DQS)
    • Microsoft Access
    • Microsoft VB.Net
    • Cross Functional / Business Unit Integration
    • Financial Statement Consolidation
    • Website Design
  • Case Studies
    • Automatic Reporting for Security Services Firm
    • Amazon API Integration with Access
    • Custom Quoting and Proposal Solution
    • Data Quality Analysis for Government Program
    • New Claim System Reduces Processing Time
    • Pipe Tracking Software
    • Product Search Analysis
    • Safety Training, Incident and Accident Tracking Tool
    • Union Dues and Membership Management System
  • Blog
  • Contact
  • The Team
  • Our Services
    • Dashboards Developers
    • Microsoft SQL Server
    • Microsoft SQL Server hosting for Access
    • SQL Server Integration Services (SSIS)
    • SQL Server Reporting Services (SSRS)
    • SQL Server Data Quality Services (DQS)
    • Microsoft Access
    • Microsoft VB.Net
    • Cross Functional / Business Unit Integration
    • Financial Statement Consolidation
    • Website Design
  • Case Studies
    • Automatic Reporting for Security Services Firm
    • Amazon API Integration with Access
    • Custom Quoting and Proposal Solution
    • Data Quality Analysis for Government Program
    • New Claim System Reduces Processing Time
    • Pipe Tracking Software
    • Product Search Analysis
    • Safety Training, Incident and Accident Tracking Tool
    • Union Dues and Membership Management System
  • Blog
  • Contact
Toggle panel Social icons
SQL Server

How to Work Around the Record Status Bug in the DQS Component of SSIS

A few months back while working on a data cleansing project using Data Quality Services (DQS), I encountered the same problem that I later saw described by Greg Low on this blog post and confirmed as a bug here.  It has to do with the Record Status column that is part of the output of the DQS Cleansing component in SSIS; its value is not being set correctly by the component.  In this post I will share a script to work around this bug.  In fact, you may use it to implement your own logic for processing status values.

The DQS Cleansing component takes as input a set of data to cleanse against an existing knowledge base.  As output, it adds a status column for each input column (each called ColumnName_Status) and one column called Record Status that is supposed to summarize all the individual status columns.  As indicated by Mr. Low, the value of this column is supposed to be as follows (evaluated in order):

•    “Invalid”: If any of the individual status columns has “Invalid” as its value
•    “Corrected”: If any of the individual status columns has “Corrected” as its value
•    “New”: If any of the individual status columns has “New” as its value
•    “Correct”: If all of the individual status columns have “Correct” as their value

In my particular case, I was “losing” rows with columns that had New values when all of the other status values in the row were Correct, because the Record Status was – incorrectly –being set to Correct by the DQS component.  This is the example data flow I created to illustrate the problem, using the DQS Data knowledge base that ships with SQL Server:

data flow

Notice the data viewer between the DQS component and the Conditional Split component.  With the sample data I set up, I get this:

data viewer

As you can see, the last row has individual column status values of New and Correct, but the resulting Record Status has a value of Correct instead of New.  My solution was to insert a Script component before the Conditional Split component to examine, for each row, the individual column status values and overwrite the value of Record Status.

The work in the Script component will be done in the PreExecute and Input0_ProcessInputRow methods.  In PreExecute, I used a one-dimension array of the same length as the number of columns that the DQS component outputs; on it, I mark the index of each of the individual status columns.  In Input0_ProcessInputRow, which is called once per row, I examined the status value of each of the individual columns and overwrite the Record Status value of the row if necessary.

First, the class-wide variable declarations:

string[] strInputStatusColumns;
int intNumberOfInputColumns;

In PreExecute, I mark the status columns, identifying them by their suffix.  All of the other cells in the array will keep their default value of null.

IDTSInput100 input = ComponentMetaData.InputCollection[0];
intNumberOfInputColumns = input.InputColumnCollection.Count;
strInputStatusColumns = new string[intNumberOfInputColumns];

for (int x = 0; x < intNumberOfInputColumns; x++)
{
    if (input.InputColumnCollection[x].Name.EndsWith("_Status"))
    {
        strInputStatusColumns[x] = input.InputColumnCollection[x].Name;
    }
}

 Finally, in Input0_ProcessInputRow I implement the desired logic for my situation:

for (int i = 0; i < intNumberOfInputColumns; i++)
{
    if (strInputStatusColumns[i] != null)
    {
        if (inputBuffer[i].ToString() == "New")
        {
            // this is the Record Status column
            inputBuffer[intNumberOfInputColumns - 1] = "New";
        }
    }
}

I am not exactly correcting the DQS bug, for I had a particular interest in New values, but you may implement your own logic armed now with this sample code.

Recent Posts

  • 23 Jan. 2019 7 essential SQL skills that can help you command a higher paycheck

  • 18 Jan. 2018 Quick Guide on Importing Data from SQL Server (or SQL Azure) to Cosmos DB

  • 10 Jan. 2018 Top 7 Questions about Basic Availability Groups

  • 04 Jan. 2018 An introduction to mssql-cli, a command-line client for SQL Server

  • Share and work together on the intranet with SharePoint Team Sites

ITImpact

About IT Impact, Inc.

IT Impact, Inc.

IT Impact, Inc. is a minority-owned software company specializing in custom software solutions. We work with SQL Server, Microsoft Access, web applications, and much more.

Our staff is top of the line and includes two Microsoft MVPs!

Twitter

Tweets by @ITImpactInc

Facebook

© ITImpact. Webdesign by GreatCustomWebsites.com