Four feature requests, actually, listed in order of complexity to implement.
The first - rename "Profile column value distributio" to "Profile column value distribution"
The second - ability to profile a column on demand. What I mean is, let's say I have a 100-column table of various data types. I run the tool with Profile column value distribution turned off. I see that a column I expected to be 100% distinct is actually 95% distinct. I would like to click on it, click on "profile", and get a column value distribution result for just that column.
The third - advanced options on the "Profile column value distribution" feature, mostly to allow limiting the value count to a customizable number. The default is "Top-50". If there are fewer than 50 distinct values it works well, but as soon as you have more than a certain number of distinct values in the list (not sure on that number... more than 62, but less than 267), the Column Value Distribution window comes up empty. For example, in a table with 565 rows where a particular column has 267 distinct values, I get no results in the Column Value Distribution window.
The last one - multi-column profile selection for the column value distribution. What I mean is if I had a table like:

    CREATE TABLE persons (
        Gender CHAR(1),
        FirstName VARCHAR(255),
        LastName VARCHAR(255)
    )

And I am curious to see how many boys are named Sue, I would like to be able to select both Gender and FirstName and have the column value distribution get calculated.
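To illustrate what I mean by a multi-column distribution, it would amount to something like the query below. This is only a sketch against the persons table above, not how the tool actually computes anything; the ValueCount and PercentOfRows aliases are names I made up.

```sql
-- Sketch: value distribution over the combination of Gender and FirstName.
-- ValueCount and PercentOfRows are illustrative aliases, not tool output names.
SELECT Gender,
       FirstName,
       COUNT(*) AS ValueCount,
       -- share of all rows in the table that fall into this combination
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS PercentOfRows
FROM persons
GROUP BY Gender, FirstName
ORDER BY ValueCount DESC;
```

Filtering that result to Gender = 'M' and FirstName = 'Sue' would answer the "boys named Sue" question directly.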
Otherwise, I really like the tool. It is informative and is a quick and easy way for me to see which tables have incorrect datatypes. I found a VARCHAR(255) column whose % IsNumeric is 100.00. It is also 100% distinct. I also see some columns created as "int" when they could be bit or tinyint. The tool is very useful and I enjoy it, I just think the above features would make the tool even better. Well, the first is just a typo which doesn't really affect the performance or usefulness of the tool. The other 3 are features that I think would improve the overall usefulness of the tool.
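For anyone who wants to double-check numbers like the % IsNumeric and % distinct figures by hand, a query along these lines should give comparable values. I don't know how the tool calculates them internally, so treat this as an assumption; SomeTable and SomeColumn are placeholders for your own table and column.

```sql
-- Sketch: hand-check of "% numeric" and "% distinct" for one column.
-- SomeTable / SomeColumn are placeholder names.
SELECT 100.0 * SUM(CASE WHEN ISNUMERIC(SomeColumn) = 1 THEN 1 ELSE 0 END)
             / NULLIF(COUNT(*), 0)             AS PctIsNumeric,
       -- COUNT(DISTINCT ...) ignores NULLs, so NULL values do not count as a distinct value
       100.0 * COUNT(DISTINCT SomeColumn)
             / NULLIF(COUNT(*), 0)             AS PctDistinct
FROM SomeTable;
```

Keep in mind that ISNUMERIC also returns 1 for some strings that will not convert to an integer type (currency symbols, for example), so a 100% result is a hint to look closer rather than proof the column should be numeric.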
I would really like to see the Column Value Distribution not come up empty if my data has more than a set number of distinct values. I would like it to create the distribution and then allow me to pick how many to show (i.e. what is my top-n). To show no results at all is a real hindrance when I'm looking for outliers in the data.
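In query terms, the behavior I am asking for is roughly the following: always build the distribution, then cut it off at whatever top-n I choose. This is just a sketch using the persons table from my earlier example; @TopN is a made-up variable standing in for the setting I would like to have, not an existing option in the tool.

```sql
-- Sketch: top-n value distribution with a user-chosen n.
-- @TopN is a hypothetical setting, defaulting to the tool's current "Top-50".
DECLARE @TopN INT = 50;

SELECT TOP (@TopN)
       FirstName,
       COUNT(*) AS ValueCount
FROM persons
GROUP BY FirstName
ORDER BY ValueCount DESC, FirstName;  -- FirstName breaks ties so the cut-off is stable
```

Flipping the sort to ascending on ValueCount would list the rarest values first, which is what I usually want when hunting for outliers.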
If you click the gear, the Profiling Thresholds property window lets you adjust the "Number of distinct values for reference table" setting, which controls when items are populated in the Column Value Distribution table. A design decision was made to throttle this threshold for very large tables to keep processing time down, since the query that populates this table is very disk and compute intensive.
I see that gear icon and the properties I can configure... this leads to a different problem though. A lot of the properties are being cut off. "Number of records of threshold for anom" for example. And the % signs seem to be cutting off the second O.
What is your DPI setting? That may be interfering with UI. This is what it should look like under normal settings.
That is what I would expect it to look like. My screen resolution is 1920x1080 (the recommended setting for my monitor) and the DPI is set to the default of 100%, or 96 pixels per inch. This is on a Windows 7 machine, if that makes a difference, but the tool looks like this:
Thanks for the details. I have opened a development ticket with the information you have provided.