Algorithm for data quality in a data warehouse?


I have seen some authors suggest adding a special dimension, called a data quality dimension, to describe each fact table record further. Typical values in a data quality dimension could then be “Normal value,” “Out-of-bounds value,” “Unlikely value,” “Verified value,” “Unverified value,” and “Uncertain value.”
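To make the idea concrete, here is a minimal sketch of how an ETL step might assign such a quality dimension key to each fact row. The table contents, bounds, and column names are my own illustrative assumptions, not something from a specific book or schema:

```python
# Hedged sketch: tagging fact rows with a data quality dimension key at ETL time.
# The dimension rows, bounds, and FactRow shape are all hypothetical.

from dataclasses import dataclass

# Rows of a hypothetical data quality dimension table (surrogate key -> label).
QUALITY_DIM = {
    1: "Normal value",
    2: "Out-of-bounds value",
    3: "Unlikely value",
    4: "Verified value",
    5: "Unverified value",
    6: "Uncertain value",
}

@dataclass
class FactRow:
    amount: float
    quality_key: int = 5  # default: "Unverified value"

def assess_quality(amount: float, lower: float, upper: float,
                   likely_lower: float, likely_upper: float) -> int:
    """Return the surrogate key of the matching quality dimension row."""
    if amount < lower or amount > upper:
        return 2  # "Out-of-bounds value": violates hard limits
    if amount < likely_lower or amount > likely_upper:
        return 3  # "Unlikely value": legal, but outside the expected band
    return 1      # "Normal value"

# Example: hard bounds [0, 1_000_000], "likely" band [1, 50_000].
row = FactRow(amount=250_000.0)
row.quality_key = assess_quality(row.amount, 0, 1_000_000, 1, 50_000)
print(QUALITY_DIM[row.quality_key])  # -> "Unlikely value"
```

The point of keeping this as a dimension (rather than discarding suspect rows) is that analysts can still query all facts, then filter or group by quality when it matters.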

I would recommend using a dedicated data quality tool, like DataCleaner (datacleaner.eobjects.org), which I have done quite a lot of work on. You need a tool that not only checks strict rules like constraints, but also gives you a profile of your data and makes it easy to explore and identify inconsistencies on your own. Try, for example, the "Pattern finder," which will tell you the patterns of your string values - something that will often reveal outliers and erroneous values.
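To show what a pattern finder does in principle, here is a rough sketch (this is my own simplification, not DataCleaner's actual implementation): collapse each string to a character-class pattern and count occurrences, so rare patterns stand out as likely bad data.

```python
# Hedged sketch of the "pattern finder" idea: map letters to 'a', digits to '9',
# keep punctuation/whitespace literally, then count the resulting patterns.

from collections import Counter

def pattern(value: str) -> str:
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("a")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)

phone_numbers = ["555-1234", "555-9876", "5551234", "call me", "555-4321"]
counts = Counter(pattern(v) for v in phone_numbers)
for pat, n in counts.most_common():
    print(f"{pat!r}: {n}")
# '999-9999': 3   <- the dominant pattern
# '9999999': 1    <- outlier formats worth inspecting
# 'aaaa aa': 1
```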

You can also use the tool to actually cleanse the data: transforming values, extracting information from them, or enriching them using third-party services. Good luck improving your data quality!
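As a tiny illustration of the "transforming values" part, here is a hedged sketch that normalizes the outlier phone formats from the previous example into the dominant '999-9999' pattern. The repair rule is an illustrative assumption of mine, not a specific DataCleaner feature:

```python
# Hedged sketch of a cleansing transform: repair what is repairable,
# and surface the rest for manual review.

import re

def normalize_phone(value: str) -> str | None:
    digits = re.sub(r"\D", "", value)  # strip everything but digits
    if len(digits) == 7:
        return f"{digits[:3]}-{digits[3:]}"
    return None  # can't repair: route to manual review / "Uncertain value"

print(normalize_phone("5551234"))   # -> '555-1234'
print(normalize_phone("call me"))   # -> None
```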

Please be sure to disclose your affiliation with the site you're linking. Your answers are relevant and the links seem useful, but make sure you indicate your affiliation. – Tim Post 14 hours ago

@TimPost There is a proposed edit that adds that.

Presumably by the same person. – Brad Gilbert 9 hours ago.

