SSIS – Connecting to Access 97 or Excel 2010

You would like to include data from Access 97 (because lets face it, they are still around) and/or Excel 2010 in your SSIS package.  Both methods are supported by installing the Microsoft Access Database Engine redistributable.   This driver needs to be installed on your SQL Server and facilitates the transfer of data between existing Microsoft Office files such as Microsoft Office Access 2010 down to 97 (*.mdb and *.accdb) files and Microsoft Office Excel 2010 (*.xls, *.xlsx, and *.xlsb) files to other data sources such as Microsoft SQL Server. 

To install, follow the download link and select either the 64 or 32 bit depending on your server operating system.

http://www.microsoft.com/en-us/download/details.aspx?id=13255

Note, before installing, make sure you don’t install both versions on the same server otherwise you may run into this issue.

http://support.microsoft.com/kb/2269468

Once installed, add a linked server to your Access database then attempt a SELECT from this database using OPENQUERY as follows.

EXEC sp_addlinkedserver
 @server='access97',
 @provider='Microsoft.ACE.OLEDB.12.0',
 @srvproduct='OLE DB Provider for ACE',
 @datasrc='\\Server\fileshare\Access97.mdb'
SELECT * FROM OPENQUERY(access97, 'select * from table')

The following config may be required if you get an error message after attempting to SELECT above.

EXEC master.dbo.sp_msset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1
EXEC master.dbo.sp_msset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1

Use a similar method to connect to Excel 2010.  Add the linked server then you can faciliate data transfer using OPENQUERY.

EXEC sp_addlinkedserver
    @server = 'excel2010',
    @srvproduct = 'Excel',
    @provider = 'Microsoft.ACE.OLEDB.12.0',
    @datasrc = 'C:\Temp\ExcelWorkbook.xlsx',
    @provstr = 'Excel 12.0;IMEX=1;HDR=YES;'
   
SELECT * FROM OPENQUERY(excel2010, 'SELECT * FROM [Sheet1$]')

Happy extracting!

Fuzzy Match your Data without Enterprise Edition

You have a need to match unsanitised data whether it be person names or transaction descriptions, to a list of sanitised reference data for easier analysis. Your transaction data column will only partially match on your reference data column thus the word “Fuzzy”. You only have standard edition of SQL Server so Lets talk Excel goodness.

First lets go back 12 years or so when Excel was a dirty word…proliferation of Excel spreadsheets, multiple versions of the truth throughout the organisation and a business rules tier that was hidden on someone’s hard drive with no backup or business continuity process. Now Excel is the comeback kid in the red corner of the ring for managed Business Intelligence.

To tackle the issue of fuzzy data matching, we might normally utilise the SQL Server Integration Services SSIS Fuzzy Lookup task to achieve this. It will work nicely, except when you don’t have Enterprise or Developer editions (Fuzzy components don’t even come with the new Business intelligence edition).

We turn our hand at Excel (the future of business friendly data integration and analysis) and utilise the fuzzy lookup add-in. Its free
http://www.microsoft.com/en-us/download/details.aspx?id=15011 and easier to use than the SSIS alternative.

Simply download the plugin, run the setup, then next time you open Excel 2010 you will get a prompt to install which add a Fuzzy Lookup tab to play with.

Your worksheet should have at least two worksheets to begin with, a list of dirty unsanitised data, and a list of clean reference data to match against. In fuzzy lookup terminology these two tables will be referred to from now on as the Left (dirty) and the Right (reference) table respectively.

Select a new (third) worksheet, then select the Fuzzy Lookup tab and click the Fuzzy Lookup button at top left of screen. This will launch a Fuzzy Lookup workspace where you can create your transformation.

You now need to tell the Fuzzy lookup engine what a partial match will look like and how it will behave. Clear the existing Match columns, then select the column in the Left Columns list you want to match or cleanse against to the column in the Right Columns list (your reference table). For name matching this could be more than 1 column however this example assumes only a one column match. Select the column from your clean reference worksheet in the Right Columns box.

Move the Similarity Threshold Slider to the desired match score which is how strict you the match to be (1 is the strictest i.e. exact match), less than 1 is a partial match. Select the number of matches you want to return (default is 1).

Adjust the Output columns as required which are the combination of columns from your two tables, that will be returned in the new third worksheet.

Once the fuzzy transformation completes you can use this data for further analysis in your Powerpivot database. Convert your new Fuzzy output data to an Excel table (Ctrl+L) then you will be able to create a Linked table into Powerpivot where the data can be further analysed.

The Fuzzy lookup add in for Excel provides a free means of performing fuzzy lookup transformations on data. This will work well for manual executions, however should you require a seamless automated mechanism, you may want to look at SSIS components (either the Fuzzy components or third party SSIS add-ons). The Fuzzy lookup add-in will also support several match criteria.

Enough of the Hand Rolling – Easy Stored Procedures for your SELECTs

The problem with the I.T. industry in general, is everyone still wants to hand roll.

Here is a handy helper block supplied by Microsoft to help you create stored procedures for your tables.   The only requirement is you need to create the helper functions and procedures in the same database as your tables and your tables must have a primary key.  The following article works through how to use this helper code to create SELECT stored procedures for your tables.

For a start you will need to download and install the helper functions and stored procedures from MSDN Hand Roll Quit Kit.  Be sure to create the functions first, followed by the stored procedures.

Now, to generate a SELECT stored procedure for your table as output but not execute (my sample table is called “FactError”) …

EXEC pr__SYS_MakeSelectRecordProc 'FactError', 0

…and to output the script code but create the stored procedure in the same go …

EXEC pr__SYS_MakeSelectRecordProc 'FactError', 1

The output of the script for my FactError demo table produced the following code.  You can then go and modify the procedure as required to pull back any additional columns and joins, like your dimension tables.

IF EXISTS(SELECT * FROM sysobjects WHERE name = 'prApp_FactError_Select')

       DROP PROC prApp_FactError_Select

GO

----------------------------------------------------------------------------

-- Select a single record from FactError

----------------------------------------------------------------------------

CREATE PROC prApp_FactError_Select

       @FactErrorId int,

       @Error_Id int,

       @Date_Id int,

       @Time_Id int,

       @Process_Id int,

       @Server_Id int

AS

SELECT FactErrorId,

       Error_Id,

       Date_Id,

       Time_Id,

       Process_Id,

       Server_Id,

       ErrorCount

FROM   FactError

WHERE FactErrorId = @FactErrorId

 AND   Error_Id = @Error_Id

 AND   Date_Id = @Date_Id

 AND   Time_Id = @Time_Id

 AND   Process_Id = @Process_Id

 AND   Server_Id = @Server_Id

GO

As an added bonus the helper procedures and functions can also create UPDATE, INSERT and DELETE procedures however just the SELECT is good enough for the purposes of business intelligence.

Part 1 : Xray Vision for SQL Server Error Logs

You could be forgiven for not looking at your error logs….but only once….after all, look what happened to Krypton!

In this multi part series, we walk through how to transform the volumes of mundane SQL Error logs for your entire SQL Server fleet, into insightful, integrated and simplified 30,000 foot view dashboards.  Consolidating your error logs and creating Powerful Centralised Dashboards on a common Sharepoint 2010 platform then creates other flow on benefits, such as delegating those monitoring tasks to someone else – like the helpdesk !

In this first part of a multi part series we consolidate the error logs for your entire SQL Server fleet into one table.   I decided not to reinvent the wheel here….after all, Rodney Landrum has already done an outstanding job of creating a SSIS package for you already - download the source code here.

Rodney also has published many other useful methods of monitoring your SQL Server databases and an excellent Ebook to download called SQL Server Tacklebox full of work smarter not harder solutions, but enough of the Rodney plug…

Skills required for this walkthrough are Database and SSIS.   If you are new to business intelligence then this will be a perfect end to end learning tool, allowing you to apply these techniques to data you are very familiar with….SQL Server Error Logs !  Lets move on with staging and consolidating the data we are interested in.

1. To get started, download the original package here and extract the solution files to a common folder where you will be able to add additional solutions from this series later.   If you are having trouble with downloading Rodney’s package just Download All My Part 1 Files Here containing the SSIS solution, the specific database objects required and the entire database backup.  Whilst SQL 2012 now released I have used SQL 2008 R2 versions of files for better compatibility.

2. Create yourself a database that will contain your consolidated error logs, then create the database objects required.  You can either create the database objects manually or just restore the starter database both of which are available in my Part 1 Download Link.

4. Populate the SSIS_ServerList configuration table to include one row for each server you want to manage. 

As an example you could build a series of insert statements

INSERT dbo.ServerList_SSIS

(Server, Connect, DMZ)

VALUES

('YourSQLInstanceName',1,0)

5. Open the SSIS solution and update the Connection Manager named “DBA_Rep” to reflect the correct connection string of your new database.

6. Set the Package Protection Level property accordingly (e.g. don’t save sensitive) then save the package.

7. Create a SQL Server Agent Job that runs this package.  This will require adding a job step that runs the SSIS package either from file location or MSDB database (whatever deployment method you choose for the package).  If unsure just go for the file location option.  A good general reference for running SSIS packages through SQL Server Agent jobs is outlined here.

8. Schedule the package to run once daily (say, 7am).  This will capture the overnight logs which will set us up well for Part 2 of this series.

9. Run the package and make sure the package runs successfully.  Note –  before we move onto Part 2 of the series, you need to ensure there are rows in the consolidated error log table for each of the servers you configured in the ServerList_SSIS table. 

 SELECT Server, COUNT(1) AS CountErrorLogRows

FROM SQL_ErrorLog

GROUP BY Server

ORDER BY Server

In Part 2 of this Xray Vision for SQL Server Error Logs series - we use a second SSIS package to funnel our consolidated error logs into a predesigned Kimball Star Schema data model. Stay tuned !

If you have any quesitons please Get in Touch on Linked In.

Scripting Data From Excel

You have a few columns in Excel you would like to import into a SQL Server table as a one off job. You don’t have Powerpivot installed, which would be ideal to import custom reference data into a repository where is can be joined to other data.

By creating a calculated column, then entering the following formula (using starter cells A4, B4, C4, D4, E4, F4 as an example), you will have the DML to perform an insert into a SQL table called TableName. Cells, A,C,D are text, B is a date format but will need to be formatted as text and therefore bound by single quotes whilst E-F are currency hence do not have the single quotes.


=CONCATENATE("INSERT dbo.TableName VALUES ('",A4,"',","'",TEXT(B4,"dd/mm/yyyy"),"',","'",C4,"',",E4,",",F4,",'",D4,"'",")")

Note the expression used to format the date column i.e.

TEXT(B4,"dd/mm/yyyy")

Now to copy the formula to remaining cells, copy the cell where the new formula was just created, right click on remaining cells, paste special and formulas only.   You can then copy the values in all calculated cells to management studio where  the code can be executed as SQL Server DML statements (looking something like the following) ..

INSERT dbo.TableName VALUES ('String1','String2','String3',0.2)
INSERT dbo.TableName VALUES ('String1','String2','String3',0.25)
INSERT dbo.TableName VALUES ('String1','String2','String3',0.4)
INSERT dbo.TableName VALUES ('String1','String2','String3',0.3)

Convert string with $ and , 000’s separator to a number

You have imported some data from a file or Sharepoint list and the numbers have come across as strings, in their display format.  Unfortunately for you this means if you want to perform some arithmetic or sorting on this column, you will need to convert to a number and remove the formatting characters like $ , -.

If you do try to perform some arithmetic on this column directly, you will most likely get the error … “Error converting data type varchar….”.

Now there is nothing like a clean datamart to store this data, however the following example demonstrates how to perform this function in an adhoc query.   The code sample searches the varchar column aptly named “Varchar Actual ($)” and replaces any symbols containing the characters $ , – and the space character with empty string…allowing the string to be successfully cast to a number…


CAST(REPLACE(REPLACE(REPLACE(RTRIM(LTRIM

([Varchar Actual ($)])),'$',''),',',''),'-','') as float)

AS [Float - Actual ($)]

If some of your values contain valid negative you will of course need to preserve the leading “-” character.

Reference …

Social MSDN Thread

Cast and Convert

Cast and Convert Basics

Some REPLACE basics with Bru ! Have a Cuppa with Bru

Sigh…You get “A duplicate attribute key has been found when processing ..” error message when processing a dimension …

Sigh…is the sound you make, when you get “A duplicate attribute key has been found when processing ..” error message when processing a dimension.  Most of the time you can quickly get to the bottom of these errors.  In this case you have checked the attributes participating in a hierarchy so that there is a 1 to many between each level running top to bottom.   You have also read the following reference kb2002757 and are back to sighing again.

To make matters worse the error is in relation to an attribute that is not part of the hierarchy you created.  You have checked and you don’t have any null values.

One of the conditions that can contribute to this error is when you have trailing special characters in the offending column such as char(13) or char(10).   The trimming property of the key column for the failing attribute is set to RIGHT by default.   So you will need to either trim the special characters out of your database, ETL or through the DSV.  Setting the trimming property to NONE may get you over the line in some circumstances but this is not foolproof and not a good long term strategy.

Bottom line – check your data for char(13) and char(10) characters, perhaps test the length of values in the offending column to make the problem easier to visualise.


SELECT col1, LEN(col1) AS Col1Length

FROM TableName WHERE col1 LIKE 'Value%' ORDER BY col1 

Additional reading …

Official BI Twibe Blog

Alter a column’s datatype in your database

Ideally we should set the datatype for the columns in our database correctly from the start of the project. However over time, you may find yourself facing a requirement to change the data type for one of the columns in your database.   If your negotiation skills fail to convince the stakeholder party otherwise, well I guess you need to change it.  To change the datatype for a column in your database use the following as a guide.  For example to change the column called ColumnName from varchar(7) null to varchar(10) not null …


ALTER TABLE dbo.TableName ALTER COLUMN ColumnName varchar(10) not null

But wait, there’s more !  To determine any other varchar(7) columns that could potentially be impacted you could issue a query against your database like the following, then apply similar code above to change those columns …


SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, IS_NULLABLE, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE DATA_TYPE = 'varchar' and CHARACTER_MAXIMUM_LENGTH = 7

But wait, there’s still more !  If this particular column is involved in any SSIS packages you will need to refactor those packages.   If your SSIS package is stored in the MSDB database you may have some luck performing a SSIS Package Object Search, otherwise there are various 3rd party tools you can use to scan through your SSIS package and allow object searching.  I touched on this in the blog post PopBI DTS SSIS Object Search - there are many tools to facilitate this.

For any cube impacts you can install and then utilise the BIDS helper (Data Type Discrepancy Check).

For subtle data type changes such as the above example, your Reporting Services reports may still run without  error, however you may need to adjust the width of a column for example.

Ok now you can have the steak knives …