T-SQL Tuesday #114: A Puzzle

One of the main things a new cribbage player needs to learn is how to easily spot the combos that make ‘a 15’ (the ways to combine cards to add up to a numeric value of 15). Let’s do that with SQL!

It’s that time again! The 2nd Tuesday of the month, T-SQL Tuesday. This month’s invitation is on the lighter side, which is nice, and it comes from Matthew McGiffen (b | t). The theme is “Puzzle Party!” And I’m going to cheat, since it’s getting horribly late already and I’m lacking in inspiration.

So, I propose a puzzle! Which you must solve using SQL. Then I’ll post my own solution in a day or two. Bwahahaha.

I actually really wanted to do a Sudoku solver, but @SQLRnnr beat me to it. By a few years. =P   I might still work on that when I’m bored, just to have a standby for another blog post. Maybe we’ll compare notes.

But for now…

Do You Even Cribbage, Bro?

If you’ve never heard of the card game cribbage, it might sound weird. When you read the rules, it sounds even weirder. Legend has it that it was invented by drunk Englishmen in a pub. Reality is actually not that far off. It’s also heavily played by Navy submariners, and that’s how it was passed down in my family.

There are already many great mobile & web versions of the game, and it will quickly become obvious to anyone who’s tried to program a card game before that a query language like T-SQL is NOT suited (omg see what I did there?) to the task. However, we can probably come up with a small sub-task of the game that’s acceptable for our purposes.

Enter: the hand scorer. There’s a nice example of a finished product here. The input would be a set of 5 ‘cards’ — the ‘hand’ has 4, and the ‘cut’ adds 1 more, used as part of each player’s hand in scoring (like community property). A ‘card’ is simply an alphanumeric value — 1-10 plus JQK (which are ‘worth’ 10 for arithmetic, but can be used like normal for ‘straights’ aka ‘runs’) — and a ‘suit’ (heart, spade, diamond, club). Think for a moment on how you’d store that as a data structure.

The output, then, is a single numeric value, the ‘score’. But how do you score? You look for the following: combinations of any numeric values that add up to 15; pairs, 3-of-a-kinds, or 4-of-a-kinds; straights (suit does not matter); a flush, if all 4 ‘hand’ cards are the same suit (and a bonus point if the ‘cut’ card matches as well). And then there’s a funky thing where you get an extra point if you have a Jack that matches the suit of the ‘cut’ card. o_@

Dude… What?

Wow, that sounds complicated, no? Let’s make it simpler. One of the main things a new cribbage player needs to learn is how to easily spot the combos that make ‘a 15’ (the ways to combine cards to add up to a numeric value of 15). For each ’15’ you make, you score 2 points. That sounds pretty feasible in SQL, right?

For starters, we don’t really care about suit anymore. But we do need some way to distinguish the cards from each other. This is a single-deck game, so you’re never going to have more than 4 of the same number; never more than one of the same card (like the Ace of Spades). And when you’re counting combinations (or is it permutations?), you can’t use the same card twice. So let’s still use the suits for card distinction; I’ll just suffix the number with an ‘h’, ‘s’, ‘d’, or ‘c’.

We also don’t care about differentiating a 10 or J/Q/K, since they’re all just worth 10, numerically. So your ‘input’ can just consist of five numbers between 1 and 10. Cool? Just find the ’15’s!

Example:

  • Your hand is 3h, 6s, 6d, 9c, and the ‘cut’ is 3c.
  • Combos for ’15’: 6s+9c, 6d+9c, 3h+3c+9c, 3h+6s+6d, 3c+6s+6d.

That’s five unique combos, for a total of 10 points! Good job, that’s a bit better than an average hand. In cribbage lingo, you’d say it like so: “fifteen two, fifteen four, fifteen six, fifteen eight, and fifteen ten.” Or if you’re playing with more experience, you’d abbreviate to simply “two four six eight ten”.

In “normal” programming land, we’d probably use a loop and some branching logic. What will we do in SQL? A loop, a cursor, or something more (or less!) elegant? You decide!
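If you want a head start without me spoiling the whole thing, here’s a minimal sketch of one possible setup, using the suit-suffixed card IDs from above. Everything here (table and column names included) is just my illustration, not the official solution:

CREATE TABLE #Hand (CardId varchar(3) PRIMARY KEY, CardValue tinyint);

INSERT #Hand (CardId, CardValue) VALUES
    ('3h', 3), ('6s', 6), ('6d', 6), ('9c', 9), ('3c', 3);

-- Two-card 15s; the '>' join condition counts each combo only once.
SELECT a.CardId, b.CardId
FROM #Hand a
JOIN #Hand b ON b.CardId > a.CardId
WHERE a.CardValue + b.CardValue = 15;

The three-, four-, and five-card combos (and whether there’s a more elegant single-query approach) are, of course, part of the puzzle.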

I’ll come up with something solution-y soon. Update: Solution posted! Enjoy! ❤

cribbage board close-up of winning peg and partial hand
Red won by 2 points! Close game.

Quickie: SSRS Sort Descending

Let’s make it real simple for the end-user to get the ‘best’ behavior by default. Let’s sort by our ‘Revenue’ column in DEscending order first.

You ever have a paginated report, i.e. a standard old-fashioned report with a big table (tablix, matrix), where you want the users to be able to sort by any column they want? And let’s say one such column is ‘Revenue’ — some money amount where the “biggest is the most important”. Like your top customer, for example.

Now, you know about ‘Interactive Sorting’ right? No?!? Oh it’s awesome! Let me remind you. It’s super easy to configure: just get to the Text Box Properties of the header box of whatever column you want to use, & go to the ‘Interactive Sorting’ tab. Like so:

ssrs interactive sorting property
Simple example — we just have ‘Details’ rows, no other row-groups or hierarchies.

Make sure you pick the appropriate Field in the Dataset to “Sort by”. Hint: it’s usually the one you’re displaying in that column!

But you know what’s kinda annoying? You can’t dictate a “first sort direction” — it just assumes that the first time you click the sort-arrows, you want ‘Ascending’ (lowest first). Then you can switch to ‘DEscending’ (highest first). This makes perfect sense for alpha values (strings), but not always for numeric values — at least not when you’re dealing with money, when generally the highest dollar amount is the most important!

So let’s make it real simple for the end-user to get the ‘best’ behavior by default. Let’s try to make it sort by our ‘Revenue’ column in DEscending order first. Ready?

ssrs interactive sort expression negative revenue
Set the ‘Sort by’ expression to the NEGATIVE of the field value! Simple as that.

It’s so small in that picture you might miss it. I’m doing this, but I’ll add whitespace below for easier reading:

= - Fields!Revenue.Value
..^look, there's the minus sign!
(assuming your viewing device hasn't mangled the fixed-width text)

You could also get more pedantic/readable and spell-it-out like so:

=(-1)*(Fields!Revenue.Value)

Right? Beautiful. So what does it look like on the report?

ssrs interactive sort by revenue descending
I clicked it! And it sorted highest-first just like I wanted!

Notice my silly hyphenated column names, like ‘Custom-er No’ and ‘Rev-enue’. That’s because those sort-icons take up space within the textbox, and you can’t control it, so the text wraps in a non-obvious manner. Thus, I purposefully added a hyphen and a line-break to the text, so I could control how they looked, instead of leaving it to the wiles of the SSRS render-er.
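One caveat before we wrap up: the negation trick only works on numeric fields. If you ever want the same first-click-DEscending behavior on, say, a date column, one hedged workaround (with a hypothetical ‘OrderDate’ field, leaning on the fact that SSRS expressions are VB.NET under the hood) is to negate the date’s numeric serial:

=-1 * CDbl(Fields!OrderDate.Value.ToOADate())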

That’s all there is to it! Happy reporting. =)

 

The EAVil Cycle, Part 2

In which we discuss more about the EAV model and some of its merits and pitfalls.

continued from last week…

The Ugly (aka the “Wow really?!?”)

You’ll see this ‘creep’ even in product-catalog systems as mature as Amazon’s. If you search for (geeky as we are) graphics cards, and use the product attribute filters in the left pane to narrow it down, you’ll find that some correctly have their memory type (GDDR5, etc.) listed, while others may not. If you’re really unfortunate, there will be two semi-redundant attribute-sets that you’ll have to “juggle” between to really get at what you want. TVs, for example, may see both an “HDR support” (yes/no) and an “HDR type” (standard, ultra, etc.) — I’m kinda pulling those out of my arse for example’s sake, but you get the point.

Why does this happen? Because at some level, humans are still responsible for ‘tagging’ those products (for lack of a better word). And as much encouragement and guidance as the ‘admin software’ may give them, they can (and do) still decide at times to side-step that guidance and say “Nope, I know better; make me this new thing!”

But isn’t that a problem with nearly all data-driven systems? Sure, of course it is. Yet with a model as flexible as EAV, the problem is intensely magnified by the fact that it’s made so easy to do — to ‘extend’.

so preoccupied with whether or not you could, you didn't stop to think if you should
It’s probably not the exact quote, for you pedants, but it’s close enough for government blog work.

And unfortunately, the biggest contributor to this problem is the lack of development-time and forethought given to the administration, or management, of the data. You see, this problem could be largely assuaged if the admin-toolset were the FIRST thought and priority in the roadmap. But so often, that thought comes LAST, if at all. So sure, your product feature tagging system looks great, it’s flexible and your customers love it. But you’re throwing tickets over the wall to your data team every time a requirement or use-case changes, or when you need to fix a data-quality problem caused by the users not knowing there was already a “Widget Type” before creating their new “Widget Kind” tag, or misspelling “Is Wierd” because English is weird and has more exceptions to the “I before E” rule than not.

Does this problem go away with a dedicated search-index or NoSQL technology like Elasticsearch or (shudder) MongoDB? Of course not! If anything, it may be worse. Maybe. But wait, those systems make it easier to de-dupe and manage redundancy & data quality, don’t they? Probably. I can’t speak from experience myself, but I’ve heard good things. Once again, it all comes down to the effort you’re willing to invest in the system. If you make data quality a priority, you’ll be happier with the experience. If you don’t, well you’re just another amateur data scientist complaining about dirty non-standardized/non-validated address fields, aren’t ya?  =P

I joke with the data scientists, of course. What they do is awesome. They just tend to often re-invent the wheel of data-cleansing/data-wrangling that we DBAs have been doing for a few decades, because they didn’t know the right questions to ask or the right place to look. We need to get better at working together WITH them, not ‘for’ or ‘against’ them.

ninja cat riding a unicorn with laser-eyes
How the data scientists see themselves…

The Why or When (aka “Is it a decent model for this?”)

The long-story-short version is, consider your business and your data. Try to plan for the future, and anticipate potential changes and growth. It’s not easy, and we never “get it right the first time”. But we can try.

When your attributes are fairly static, and you know that you can tightly control them, you might consider a more rigid model. Something with a handful of lookup tables referenced by the main product entity. This is advantageous for performance and management, at the expense of scalability and extensibility.
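For illustration’s sake, that ‘rigid’ shape might look like this (a hypothetical sketch, all names made up):

CREATE TABLE dbo.MemoryType (
    MemoryTypeId tinyint IDENTITY PRIMARY KEY,
    [Name]       varchar(20) NOT NULL UNIQUE   -- 'GDDR5', 'GDDR6', etc.
);

CREATE TABLE dbo.Product (
    ProductId    int IDENTITY PRIMARY KEY,
    [Name]       nvarchar(200) NOT NULL,
    MemoryTypeId tinyint NULL
        REFERENCES dbo.MemoryType (MemoryTypeId)
    -- ...one FK column per tightly-controlled attribute...
);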

When you literally need to support on-the-fly extension, and you’re absolutely married to SQL (i.e. not ready to venture out into NoSQL land just yet), the EAV model may fit the bill. Aaron’s article, and the comments therein, present some fairly valid and reasonable implementation suggestions to make it a little more palatable. Just beware the date — that was written back in 2009. Before we had such things as Elasticsearch and its ilk. I’d heavily encourage the consideration of purpose-built data-stores for this sort of thing, if you have any hope of scaling-out.

Other tools in your toolbox can help with this, too. For example, consider an in-memory data-grid for super-fast reads. The vast majority of data-access to these attributes & values is going to be reading, using it to filter & slice & dice a data-set. You can pay the small performance cost (e.g. write to the underlying SQL database) on the rare occasion when a write/change needs to occur.

In Conclusion

Proving the age-old rule of “Just because you CAN, doesn’t mean you SHOULD”, the EAV model is sometimes okay and sometimes not. You need to understand your business and your data to make that call. And you need to consider the magnitude of effort that may be involved in pivoting from one model to another. Unfortunately, in many cases, that part overshadows the rest, and the show (business) must go on.

queen the show must go on
You’re welcome again, ears.

Still, I encourage you to at least think about it, and be ready with that knowledge of pros/cons when the time is right to discuss it with stakeholders.

The EAVil Cycle

In which we discuss the EAV model and some of its merits and pitfalls.

EAV, or Entity-Attribute-Value, is a data model that’s been around the block. It’s typically injected into a relational database at some point during the overall application/architecture life-cycle, somewhere between when the team realizes that they’ve got way too many lookup tables for a “main business entity” thing, and when they finally make the shift into polyglot data stores.

Wake me up when that actually happens, successfully.

I’m not going to rehash aging internet arguments here, nor bore you with replicated diagrams that you could just as easily look up on Google Images. No, instead, I’m going to tell you why this model is not great, why it’s not bad, and how you should go about deciding if it’s not wrong for you.

Yes, I did negate all those adjectives on purpose. Mostly because nobody really enjoys working with these structures, regardless of how they’re implemented; customers and business stakeholders are ALWAYS CHANGING the requirements and the attributes in question. But, we press on.

PS: Postgres is looking really freakin’ cool these days.

the good the bad and the ugly
Another Clint Eastwood pic… apologies if you’re not a fan. =P

The Good (aka “Not Bad”)

Proponents tell us that this model is easily searchable and easy to administer. The “searchable” bit is true; especially when you have the attribute-values pre-defined and don’t rely on end-user text-entry. But that’s true of basically any data model. The key here is that all attribute-values are effectively in one “search index”. But wait, don’t we have purpose-built search indexes nowadays? (Hint: see Elasticsearch.) This will come up again later.
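To ground the discussion, here’s a minimal sketch of the classic EAV shape; all names are illustrative, and real implementations get fancier with typing and constraints:

CREATE TABLE dbo.Entity (
    EntityId   int IDENTITY PRIMARY KEY,
    EntityName nvarchar(200) NOT NULL        -- e.g. a product
);

CREATE TABLE dbo.Attribute (
    AttributeId   int IDENTITY PRIMARY KEY,
    AttributeName nvarchar(100) NOT NULL     -- e.g. 'Memory Type'
);

CREATE TABLE dbo.EntityAttributeValue (
    EntityId    int NOT NULL REFERENCES dbo.Entity (EntityId),
    AttributeId int NOT NULL REFERENCES dbo.Attribute (AttributeId),
    [Value]     nvarchar(255) NOT NULL,      -- everything's a string; therein lies the rub
    PRIMARY KEY (EntityId, AttributeId)
);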

Administerable? Administrable? Administratable? Damn you English! Anyway. Yes, again, if you’re fairly confident in your business users’ ability to effectively track and de-dupe (de-duplicate) incoming requirements/requests using their own brains/eyeballs and the admin tool-set that you build for them.

Oh, right, did I mention that? You have to build the admin app. Because you do NOT want to be writing ad-hoc SQL queries every time a new attribute requirement comes in. (Still, it’s better than making schema changes for new req’s, as I’ll discuss in a bit.)

Mainly, though, the biggest ‘pro’ of this model is that your business requirements, i.e. your attributes and the values they’re allowed to contain, can be flexible. The model allows a theoretically infinite amount of customization to suit your needs; though in practice, as Allen writes in the CodingBlocks article, you do run up against some pretty hard scalability hurdles right-quick. So in practice, you might want to consider more horizontally-scalable data stores, or (God help you) try scaling-out your SQL databases. (Spoiler-alert: big money big money!)

shut up and take my money
Millions of query-bucks…

The Bad (aka the “Not Great”)

Which brings me to the first ‘con’. Performance. If you’re modeling this in a relational DB, and you expect it to scale well, you’re probably overly optimistic. Or very small. (If the latter, great! But you don’t always want to be small, right? Right!)

Don’t get me wrong; you can make it work half-decent with good indexing and sufficient layers of abstraction (i.e. don’t write a “kitchen-sink view” that’s responsible for pivoting/piecing-together all the attributes for a product straight outta SQL). But again, is it really the right tool for the job?
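For the record, the ‘kitchen-sink view’ I’m warning against would look roughly like this against the EAV shape sketched earlier (again, names are illustrative):

SELECT e.EntityName,
    MAX(CASE WHEN a.AttributeName = 'Memory Type' THEN eav.[Value] END) AS MemoryType,
    MAX(CASE WHEN a.AttributeName = 'HDR Support' THEN eav.[Value] END) AS HdrSupport
    -- ...and a few hundred more of these, one per attribute...
FROM dbo.Entity e
JOIN dbo.EntityAttributeValue eav ON eav.EntityId = e.EntityId
JOIN dbo.Attribute a ON a.AttributeId = eav.AttributeId
GROUP BY e.EntityName;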

Momentary digression. SQL Server, or more generally, the relational database, has long been touted as the “Swiss army knife” of IT; we’ve thrown it at so many problems of different size and shape, that we’ve almost lost track of what it’s actually very GOOD at. Hint: it’s about relationships and normalization.

Another argument against it seems to be data integrity and enforcement. I can understand that, but again, with some clever software overlay and user-guidance, this can become almost a non-issue. But remember, your developers are the ones building said software. So that effort needs to be considered.

The Ugly (to be continued…)

The biggest problem, and quite a legit one, is ‘creep’ — both scope and feature. See, the inherent flexibility in the model will almost encourage data managers to be less careful and considerate when deciding when to add an attribute or value-set, and how to govern the data-set as a whole.

creep wish i was special
No, not THAT creep. But, you’re welcome ears.

Stay tuned for more…

T-SQL Tuesday #113: Personal-Use Databases

So when I dived down the rabbit-hole of the Nested Set Model, of course I created a sample database to write & test the code against.

It’s that time again! This month, Todd Kleinhans (b/t) asks us how we use databases in our day to day life, i.e. personal use or “outside of our work / day-job”. Actually, the question is kinda vague — because if you think about it, we all use TONS of databases in our daily lives. Your phone’s contact list, your calendar, online shopping, banking.. the list goes on. As I’ve said before, Data is everything.

But what I think he meant, and the way most of the community has interpreted it, is “How do you manage/administrate/build/work-with/develop databases in your day-to-day life outside of work?”. So we’ll go with that.

Now this may out me as “not a real DBA” or some such nonsense, but honestly.. I don’t spend much of my time creating silly playground databases. Not that anybody else’s are ‘silly’ — just look at some of the fantastic posts for this month! Such neat ideas brought to life.

Special shout-out to Kenneth Fisher, who, if you look closely at his screenshot (and it’s not even related to this post), committed the abhorrent sin of creating a database name of pure emojis — FOR SHAME sir! But also you’re awesome. ❤

Me, I’m more of a quick-n-dirty spreadsheet guy. If I need, say, an inventory of my computer parts & gadgets so I know what I can & can’t repair, what materials I have to work with as I tinker, etc.. well, I create a Google Sheet. And it serves my needs well enough. (Spoiler alert: yes, you can view that one; I shared it. But it’s fairly outdated since I moved in March and haven’t had time to re-do inventory since.. last autumn.)

But for blogging in the tech field, you gotta get your hands dirty. So when I dived down the rabbit-hole of the Nested Set Model, of course I created a sample database to write & test the code against. And there have been some additional bits & pieces for blog demos and GitHub samples.

Most of the time, I’m creating databases / entities on SQL 2016 Developer Edition. Of course by now, that’s 2 major versions ‘behind’, but since I don’t run Linux personally (yet?), and I’m not a conference speaker (yet??), I don’t feel a burning need to upgrade. It’s FAR superior to Express Edition, though, so please for the love of all that is holy, if you find yourself using Express for personal/playground use, save yourself the headache and go grab Developer.

Containers/Docker? Meh. If you want to start playing with those, definitely look at 2017 or higher. It sounds appealing in theory — “just spin it up when you need it, spin it down when you don’t!” — and I’m sure that’s great if you’re starved for resources on whatever laptop you’re working with, but if you’ve done your due diligence and set your local SQL instance to appropriate resource limitations (hello, ‘max server memory’ and file-growths!), I’ve found that its impact is quite tolerable.

But come now. Surely this isn’t just a “shameless self-promotion” post or a grumpy-old-DBA “get off my lawn” post. Right?? Right!

To you folks out there creating your own nifty little databases for personal projects, learning/development, or even hopes & dreams of building a killer app on top of it one day — you’re amazing! Keep doing what you do, and write about it, because I love reading about it. Heck, go try to create the same model in PostgreSQL or MariaDB and see how it goes. We could all use a little cross-stack exposure once in a while.

That’s all I have for this month; short & sweet. I need to finalize plans for virtualizing our main SQL instances (which is really just a migration off bare-metal & onto VMs) within the coming weeks. Yes, we’re that far behind the curve. Now get off my lawn!

=P

clint eastwood frowning angrily
I’m old and racist! But I’m still adorable for some reason!

5 Things I Learned at SQLSaturday

Go find a SQL Saturday near you, at sqlsaturday.com!

The weekend before last, I attended my 4th #SQLSaturday event; my 2nd in San Diego CA (the others were Orange County CA, which is equally fantastic, and a little closer to work, though about the same distance from home). If you haven’t heard of or been to one, check out the home page and find one coming to a city near you! They’re fabulous FREE training events for the MS data platform, including but certainly not limited to SQL Server. For example, you’ll almost always find Kevin Feasel talking about aRrr or Kafka or Hadoop.

Did I mention free?

So I thought I’d share a few things that I learned this time!

The LinkedIn app’s Killer Feature

Did you know? The LinkedIn app has a “find nearby” feature that uses magic (read: your phone’s various radios) to instantly connect you with a fellow user who has the app open near you. It’s awesome! Now you don’t even have to look up from your convention coffee and security-blanket (phone) to network — just fire up the app, go to the People tab, hit “Find Nearby”, and commence trolling. =P

No, that’s horrible; be a normal human and talk to people. The tech is just there to help make the post-conversation connection.

linked-in find-nearby button

Storage Myths Busted

This was an interesting and even slightly entertaining session presented by Max @ SQLHA. One analogy that really stood out to me was this:

SANs have become a bit like the printer industry — You don’t pay a lot for the enclosure, the device itself, i.e. the SAN box & software; but you pay through the nose for ‘refills’, i.e. the drives that your SAN vendor gods deem worthy of their enclosure.

It’s frighteningly accurate. Ask your storage admin what it costs to add a single drive (or pair of drives, if you’re using something with built-in redundancy) to your SAN. Then compare that cost with the same exact drive off the retail market. It’s highway robbery. And we’re letting them get away with it because we can’t evolve fast enough to take advantage of storage virtualization tech (S2D, SOFS, RDMA) that effectively makes servers with locally attached SSDs a superior architecture. (As long as they’re not using a horribly outdated interface like SAS!)

Data Protection and Privacy is Hard

But completely necessary. We all need to become security practitioners to some extent. Even if it just means raising and documenting our concerns to our bosses. The great state of California has even jumped on the bandwagon with its very own privacy act. Still in the early stages, yet on the heels of GDPR, it can only mean that there will be more to come.

A few concrete action items from this, which should be “fairly simple” (with a big ol’ asterisk that says “depending on your organization and your change-management process”).

what if i told you you don't need 'sa'
For anything. Ever.

  1. At least encrypt your database backups. (And make a plan to implement some kind of “full” encryption across the data estate, be it TDE or AE or app-driven encryption via your developer teams.)
  2. Stop using sa! Reset the password, and disable it. Yes, your Agent Jobs will still run just fine.
  3. Disable Named Pipes & Shared Memory protocols; just use TCP/IP. Disable the SQL Browser service.
  4. Cut off your SQL servers from the public Internet (yes, you should still patch them; just download the patches to a fileshare and install them using PowerShell!). And stop letting people run SSMS on the server; that’s what client machines are for!
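Since items 1 and 2 are genuinely quick wins, here’s a hedged T-SQL sketch. The database, path, certificate, and password are all placeholders, and the encrypted backup assumes you’ve already created a Database Master Key and a server certificate:

-- 1. Encrypted backup:
BACKUP DATABASE MyDB
    TO DISK = N'D:\Backups\MyDB.bak'
    WITH ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert),
         COMPRESSION;

-- 2. Retire 'sa': scramble the password, then disable the login.
ALTER LOGIN sa WITH PASSWORD = 'SomeLongRandomThrowawayPhrase!42';
ALTER LOGIN sa DISABLE;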

Columnstore All The Things!

Seriously. If you’re not using them yet, read about them, play with them, and start using them. They’re magic.

Okay, that’s a bit dramatic. As with any technology and feature, you need to know the WHY. Understand what the best use-cases are and how that translates to your own environment.

columnstore all the tables
Easy there sparky…

Here are just a few of the tips I gleaned from the session on this:

  • They were designed for data warehouses, but…
  • They’re also great for “operational analytics” — where you want to do aggregate reporting on your ‘live’ data, but that performance usually kinda sucks (and you don’t want to lock up those tables anyway).
  • Best with SQL 2016 or higher; 2012’s “v1” implementation was horrible, and 2014’s “v2” was semi-usable but still had some major drawbacks
  • Best candidate tables are “very large” (millions of rows or more), and best candidate columns have “low cardinality”, meaning they’re not full of unique values — they should be “compressible”. A simple example would be a Customer’s “State of residence” — you probably have millions of customers, but only 50-ish “State”s, and your typical report is asking “Hey, how many Customers per State ‘do X’” — that’s a great candidate for a columnstore index (see the sketch below).
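Here’s a hedged sketch of that last example; the table and column names are made up, and as always, test against your own workload:

-- Nonclustered columnstore index for "operational analytics" on a live table:
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Customers_Analytics
    ON dbo.Customers (StateOfResidence, CustomerType, CreatedDate);

-- The kind of aggregate query it shines at:
SELECT c.StateOfResidence, COUNT(*) AS CustomerCount
FROM dbo.Customers c
GROUP BY c.StateOfResidence;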

Users Don’t Like Date-Pickers

I mean they’re still better than text-entry boxes, but we can do better. Talking about SSRS here — reporting services, i.e. “the poor-man’s Tableau”.

Picture a typical business user, middle-manager type, going to your SSRS report that you just built for him/her. The first thing it asks them to do is “pick a start-date and an end-date” to define the “reporting period”. But you should know by now that they almost always want to see the current Fiscal Quarter. So you default them to those dates that define the current quarter. Great!

Now they want to quickly compare to the previous quarter, or the same quarter of the previous Fiscal Year. Ruh-roh. Nobody wants to go messing with those lame date-pickers with the pop-up calendar thing.

Give them a clickable label instead, which says “Compare to last Quarter” or “Previous Fiscal Year”.

The click action should be to “drill through” to the same report, while changing the (now internal/hidden) date parameters to the appropriate dates. Presto! The user only had to click once to get exactly what they wanted. Much nicer experience.
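Under the hood, those drill-through actions just pass tweaked parameter values back into the same report. A hedged sketch of the expressions, with hypothetical parameter names:

' "Compare to last Quarter" action parameters:
=DateAdd(DateInterval.Quarter, -1, Parameters!StartDate.Value)
=DateAdd(DateInterval.Quarter, -1, Parameters!EndDate.Value)

' "Previous Fiscal Year" action parameters:
=DateAdd(DateInterval.Year, -1, Parameters!StartDate.Value)
=DateAdd(DateInterval.Year, -1, Parameters!EndDate.Value)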

I’ll try to have a future post going into detail on this. I’ve got tons of ideas swimming around in my head after FishHeadTed‘s excellent SSRS classes, and not nearly enough time in the day to flesh them out.

i see what you did there?
Get it? Swimming, fish?!?

Stay tuned, and go find a SQLSaturday near you!

Quickie: SSRS Multi-Column List/Grid

While there’s no native control like it, with a few simple tricks and visual sleight-of-hand, we can fake it.

Today I present to you, a hopefully less confusing and more intuitive explanation of how to make a “multi-column list” control on your SSRS report.  And if the term is confusing by itself, I’ll illustrate in a second.

Three links that helped me get going in the right direction:

They all roughly boil down to the same concept: placing redundant or repetitive controls (like a set of 2 or 3 tablixes), and using a RowNumber() expression with some modulo arithmetic to set “visibility” properties of the dataset rows shown in each.

We don’t quite get to “dynamic # of columns and rows based on the total # of records” nirvana (e.g. make it a 3×5 grid if you have somewhere around 15 records, but only make it 2×4 if you have 8 or fewer records, etc.), but I’d estimate that this approach would suffice in 90% of use-cases.  And whoever at Microsoft decided to not build a control like this into the toolset, curse you!!

Yes, we’re not even past the intro section and I’ve already cursed someone at MSFT.  Doesn’t change my passion for the platform in general; it’s just a minor curse.  =)

Concepts

SSRS uses a dataset to fill one or more controls with data; controls such as the tablix. In my regular implementations, a dataset is usually a stored-proc. The tablix shows the results of the proc in tabular form. The problem with the tablix control is that you can’t turn it into a multi-column layout, as you might see with some more advanced UI controls. You might think that the matrix control would be suitable for this, but no; that is just a table-like structure that can support hierarchical column-groups and row-groups.

For example, let’s say you have a table of Users, with columns ID, Name, and Role. Like so:

ID | Name    | Role   
---|---------|--------
1  | Bob     | Admin
2  | Alice   | User
3  | Jack    | User
4  | Jill    | User
5  | Joe     | Admin

But with many more records. We want to display this on a report, which is typically viewed on a PC screen, i.e. landscape orientation (wider than tall). So wouldn’t it be nice if we could display it like so?

ID | Name  | Role  || ID | Name  | Role  || ID | Name | Role
---|-------|-------||----|-------|-------||----|------|------
1  | Bob   | Admin || 2  | Alice | User  || 3  | Jack | User
4  | Jill  | User  || 5  | Joe   | Admin ||

etc. Right? Right. So essentially what we want is 3 “sections” of data — 3 “columns” each with its own “part” of the table.

Bonus: Yes, this works even when the # of rows you have is not evenly divisible by the # of sections you want to break into!

Implementation

The actual work is fairly simple. We add 3 tablixes to the report and place them side-by-side. We connect them all to the same dataset, i.e. stored-proc — which will only be executed once, conveniently! On the data row (as opposed to the header row) of each tablix, right-click and go to Row Visibility. Then use an expression like so:

=IIf(RowNumber(Nothing) Mod N = X, False, True) — where N is the # of sections (tablixes) you want, and X is going to vary from 1 to N - 1, and finally to 0 for the last (right-most) tablix.

Remember, the expression defines the row’s Hidden property, not “Visible”. I know, it seems backwards, but that’s how it is.

In plain English, and if we just want 3 sections (tablixes), we could say the following. For Section 1, the left-most: “If the row is the 1st row, show it, else hide it”. Likewise, for Section 2, the middle: “if the row is the 2nd row, show it.” And finally, for Section 3, the right-most: “If the row is the 3rd row, show it.” And so on.

So those Hidden expressions would look like this:

=IIf(RowNumber(Nothing) Mod 3 = 1, False, True) //section 1
=IIf(RowNumber(Nothing) Mod 3 = 2, False, True) //section 2
=IIf(RowNumber(Nothing) Mod 3 = 0, False, True) //section 3

Not bad, right?

Final Thoughts

It’s certainly odd that there’s no native way of building such a report. But with a few simple tricks and visual sleight-of-hand, we can fake it.

ssrs-multiple-tablix
I highlighted the middle tablix so you can see that there are 3 separate ones, but we want to make sure they’re precisely adjacent to each other so the end-user can’t tell them apart.

T-SQL Tuesday 106: 5 Things Not to Do With Triggers

Here are a handful of anti-patterns that I’ve seen with triggers in my time…

This month’s invite comes once again from our benevolent overlord* community pillar Steve Jones, head of SQLServerCentral.com! If you haven’t been there, stop reading immediately and go check out the helpful forums and ‘Stairways’ articles. Some truly excellent content to be had.

No, don’t stop reading immediately… save it to your favorites/reading-list and look at it in 5 minutes when you’re done here.  =)

*Though I’ve not had the pleasure of meeting him in person, I’ve heard Steve is a phenomenally humble and down-to-earth guy, so my silly comment about him is even sillier in that light. ❤

Triggers – Love ’em or Hate ’em

Borrowing a bit of a mini-game from the CodingBlocks guys, the first “Google-feud” (auto-complete result) for “sql server triggers are ” is “sql server triggers are bad”.  Well, they can be, if used improperly / to excess.  Like most bits of tech, they have their place and their use-cases.

I’ve blogged about a few such use-cases here (a “who did what” audit-trail type of trigger) and here (as part of the Nested Set Model implementation), which you should definitely read. I didn’t condone click-baity titles back then. Alas…

click this right now omg
All we need now is a “blink” tag…

Here are a handful of anti-patterns that I’ve seen with triggers in my time.

Thing 1: Using them as Queues

Repeat after me:

A trigger is not a queue.

Triggers are executed within the same transaction as the query that fires them. Meaning, they’re subject to all that ACID-y goodness of a transactional system. This is a double-edged sword. If all is successful, it guarantees that the trigger will do its job when the calling query runs, which is great for audit-ability. On the other hand, if the trigger has a problem, anything and everything that triggers it will also fail.

The fundamental distinction between this and a queue, is that the success of the queued action is not immediately critical to the continued operation of the thing that called (queued) it.

So if your requirement matches the latter behavior, and not the former, do us all a favor and use a real queue. Heck, find one of the few people who know Service Broker. (Hint: one’s a Warewolf, another’s a Poolboy.)

Thing 2: Making them WAY TOO BIG

Mostly because of the transactional thing, the rule of thumb with triggers is K.I.S.S. “Keep it Small and Simple.” Even the audit-trail example is a bit much if the table being audited is under significant write load. Typically, if the business requirements are both high-throughput and high audit-ability, you’ll be implementing a much more complicated tech-stack than just plain ol’ SQL Server with triggers.

Some of the offenders I’ve seen include: A trigger that wanted to write to several other tables with IF conditions that branched based on what column was being updated. And a trigger that required near-SA level permissions to do some back-end maintenance-y stuff. Those are both recipes for problem pie.

pie that looks like a kraken attacking a boat
We’re gonna need a bigger… fork?

Thing 3: Doing Nothing Useful

Somewhat opposite of above, there’s no point in introducing the management and performance overhead of triggers if they’re not performing a transaction-critical operation. For instance, something better left to a queue.

Thing 4: Housing Business Logic

There’s always been a holy war about whether “business logic” belongs in the database or in the application code. And, since geeks love sub-classifying things to the Nth degree, there’s a sub-holy-war about what “business logic” actually means. But I won’t go into that here.

If you fall on the 1st side of the fence, and you feel the desperate desire to embed some logic in the data layer, it belongs in stored procedures, not in triggers. Reasons include maintainability, discoverability, performance, and documentability. Not to mention source-control on procs is a helluva lot easier than on triggers.

Thing 5: Too Many of Them

there's too many of them
Stay on target..

While multiple triggers can be defined for the same action on the same table, that’s not an invitation. You can enforce trigger execution order to an extent (first and last), but any more than that and you’re asking for confusion. Falling back on the KISS principle, if you need more than one trigger on a table & action (insert, update, or delete), you probably need to rethink the underlying design.
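That first/last control, for the record, comes via sp_settriggerorder; a quick sketch, with a hypothetical trigger name:

EXEC sp_settriggerorder
    @triggername = N'dbo.trg_MyTable_Audit',
    @order = N'First',        -- or 'Last', or 'None'
    @stmttype = N'UPDATE';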

Bonus

Using INSTEAD OF vs. AFTER: it’s fairly self-explanatory, but just be aware of what you’re doing, especially with the former. You’re literally replacing the desired action of, say, an update query with the contents of your INSTEAD OF UPDATE trigger. If this is not obvious to all users of this table, you’re in for some really surprised faces and angry messages.

And that’s all I have for today folks! Enjoy triggering stuff. In moderation, as all things. =)

Keeping Track of SQL Scripts with SVN

I’m talking about a locally stored, locally managed, only-ever-have-one-user, repository of SQL scripts.  All it really needs to do is track changes on a daily (ish) basis.

Inspired by a Slack conversation with a friend & former colleague, and building off last week’s T-SQL Tuesday post:  How do I keep track of my SQL scripts?

The answer shouldn’t surprise anybody at this point: version control!

Side-note:  I really should try using SSMS Solutions/Projects like Kenneth Fisher advocates, but I just haven’t gotten a chance yet.  It would probably help with organization.

Well, perhaps you’re still in for a surprise.  Because, you see, as much as I love using GitHub and Gist for my blog code samples, much of that love is lost in the daily grind of fix-it-tickets, BI building blocks, report development, and performance troubleshooting that comprises my business as usual.  So forgive me if I haven’t made the leap to Git like all the cool kids are doing.  I’m talking about a locally stored (i.e. on my very own hard-drive), locally managed, only-ever-have-one-user, repository of SQL scripts.  All it really needs to do is track changes on a daily (ish) basis.

That’s what good ol’ Apache SVN is for!  Yes, boys & girls, it’s still around.  Consider it one of the many enduring foundational technologies that we tend to take for granted, or forget about, in the ever-present onslaught of the “new and shiny”.

But fortunately for us luddites, there are a couple GUI tools out there that make it feel less like the ’90s green-screen.  Enter VisualSVN Server (free edition) and TortoiseSVN (also free).

Eeww, yuck!  A non-distributed VCS??  What are you, some sort of wild animal!?  The impudence!

the audacity -the grinch
The unmitigated GALL!!

Okay, calm down there sparky.  Remember my requirements?  They’re only about 5 sentences up.  Go back and read them again.  Breathe.  We good?  Good.

Another reason it doesn’t need to be distributed or ‘cloudy’ or web-based is that these scripts often contain private company I.P., so data governance demands that they stay within company IT space.  And sure, there are private repos and ways to keep the stuff locked-down within a GitHub or similar, but again I ask, why?  The first and primary requirement of my VCS is that it stays the heck out of my way when I’m working.  I’m sure someone will counterpoint me, and I’d love to hear it.  But for now, we’re keepin’ it local.

Getting Set Up

The first step, if it’s not obvious, is to install VisualSVN Server — it already contains the binaries for Apache SVN, so you don’t need to worry about that.  It’ll ask you where to stick your repos and your backups; you can see an example of mine here:

visualSVN server config screen
Normally you’d point Repos at a network location, but since we’re setting this up as a local-only server, just pick a drive/directory that’s easy to remember.

Once it’s done, let it start up the manager GUI.  Next step is to create a new repo:

VisualSVN server repo create new
You can also use the “Create new repository…” link located a bit down the intro page.

I called mine “SQL_Scripts”.  Because, as my blog’s tagline says, “why yes, I do like underscores.”  When you go thru the steps, it’s easiest to leave everything at the default settings — that way you don’t need to futz with permissions or anything.

One ‘gotcha’ to note: in a corporate domain setting, you should set the server name to your machine’s FQDN, e.g. MYWORKSTATION.company.com.  This may be done for you when you install, but I’m not 100% sure, so it’s worth checking — you can right-click on the VisualSVN Server (local) node in the left pane and go to Properties, to the Network tab, and verify it.

VisualSVN server properties Network tab
Just to be sure!

Next, install Tortoise SVN, or your favorite SVN client.  I chose Tortoise because it’s easy to use, includes some very useful File Explorer options (right-click menu goodies), and supports standard command-line interaction just like vanilla SVN.  We’re going to use said CLI interaction in just a bit!

Import (or is it Export?)

I almost always have trouble remembering which option is for use with a non-empty folder of “here’s a bunch of files that I want to dump into the repo to start with”, vs. “here’s an empty folder where I want to pull down the contents of an existing repo”.  Fortunately, Tortoise yells at you if you try to do the latter — which is Export — into a non-empty folder.  So we want to Import.  Assuming you have a folder where all your SQL scripts live already, right-clicky and say “Tortoise SVN .. Import.”

TortoiseSVN import dialog
You can use the file:/// notation or the https:// address, either should work.

You can verify that it worked by switching back to your VisualSVN Server for a moment, refreshing it, and seeing that the repo’s contents are now.. your happy files!

But wait… the folder I imported into the repo doesn’t have the pretty little icon overlays showing me that it’s in SVN… Why?

Don’t panic.  We have one more step to go.

Right-clicky again!  On the same folder you imported into SVN.  You now want to “SVN Checkout…”, which will essentially mark all your files as “versioned”, because it will see that they all match exactly what’s already in the repo (because you just imported them a few moments ago).

There’s an important ‘gotcha’ here, if you named your repo something other than the folder name that’s serving as your repository root.  By default, it will try to pull the repo’s contents into a sub-folder of the same name as the repo.  In the example below, that’d be “Workspace\PersonalScripts”, instead of just “Workspace”, which is where I want it to go.  This has happened to me more than once.  Check it out:

SVN checkout with folder names
Notice that the repo name is different than my root (source/destination) folder name — that’s OK, as long as I set it correctly here in this dialog.

Pull the trigger, and you’ll see all your files getting ‘Versioned’, like so.

svn checkout finished
It’s really just comparing the repo contents with the folder contents and saying “OK, got it, we’re now tracking this file for version-control.”

Yay?  Yay.  Now for the cool part, where you can stop thinking about all of this and just let the machines do their work.

Automatic Nightly Check-In

Windows Task Scheduler, specifically.  Let’s be honest, if you’re reading this far, you’re not really a CLI jockey.  And you’re certainly not a Linux geek — you could have had all this set up and done with about 5 lines of bash, I’m sure.  Party on, Wayne.  Us Garths still need a bit of help.

I’m not going to do a walk-thru screen-shot montage of Task Scheduler; it’s pretty idiot-proof if you read & understand the dialogs.  Here are the key bits, the commands that you’ll actually want to enter as the actions.  We have two actions: first “add all new files to the repo”, followed by “commit everything new & changed”.  Schedule it however often you’d like; mine happens nightly at 8pm, when I’m about 99.9% sure I won’t be touching work.  Oh, make sure to set it to “Run whether user is logged on or not”.

The actions are both “Start a program”, and said program will be the SVN executable, which should be (by default) located at C:\Program Files\TortoiseSVN\bin\svn.exe.  Here are the arguments, respectively (1st action, then 2nd), subject to personalization of course!

add --depth infinity --quiet "C:\Users\Documents\Your\Scripts\Folder" --force --username YourUsername

commit --depth infinity --message "daily" "C:\Users\Documents\Your\Scripts\Folder" --force --username YourUsername

Commence Laziness!

And that about does it!  Now, without any need to remember any command-line syntax, or to touch a right-click menu again, your SQL scripts are being silently versioned and committed every night to your very own local repository.  If you ever forget when or what you did last week or last month, or need to “blame” yourself (which is short-hand for “show me all the changes from this part of this file so I can see what led to its current state”)… now you can.
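In case you’re wondering, that ‘blame’ isn’t just DBA snark; it’s real SVN syntax.  A couple of hedged examples, run from inside the working-copy folder (the file name is illustrative):

svn log -l 10 "MyFavoriteScript.sql"
(shows the last 10 commits that touched the file)

svn blame "MyFavoriteScript.sql"
(shows who last changed each line, and in which revision.. spoiler: it was you)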

PS: If you’re using a fantastic 3rd party diff/compare tool like I am, be sure to plug it into the TortoiseSVN options for diff-viewing/comparison.  While their built-in interface ain’t too bad, BeyondCompare and other similar tools are even better.

Thanks for reading!

Quickie: Read & Search the SQL Error Log in AWS RDS

Go check out your AWS RDS SQL Error Logs!  See what tickles your curiosity.

Today’s post is brought to you by an unplanned AWS RDS outage, the desire to search its SQL Error Logs for events during the target time-frame, and the horrendously inefficient interface which AWS gives you in the GUI console for reading said logs.

it's log from blammo (ren & stimpy)
A coworker of mine used to sing this a lot. Now I know what it’s actually from.

Short Outline

  1. Even the dedicated ‘admin’ user that you create for your instance doesn’t have permission to read the error logs via the SSMS GUI or with xp_readerrorlog.  You can, however, use sp_readerrorlog.  That’s with an ‘s’, not an ‘x’.
  2. The parameters here are quite arcane, namely @p1, @p2, @p3, and @p4.  Unfortunately, none of them allow you to filter on a time-span (those are the 5th and 6th parameters of the underlying xp, which we don’t have access to, as per #1).
  3. My solution involves a #temptable, a loop of insert / exec commands, and then whatever queries you desire to search/filter/analyze the results.  I also throw in a conversion to local time (from UTC, which is what the AWS servers use).

Details and The Why

You can check out the script; it’s short enough to embed, IMHO, so I’ll just leave it right here.

USE master;

DECLARE @CONST_P2 int = 1;      -- Error log, not Agent log.
DECLARE @NumLogFiles int = 30;  -- SET THIS APPROPRIATELY!
-- ^ (There's no way to get it programmatically outside of registry-reads/other-xp's,
--    none of which is doable in RDS, AFAIK.)

-- Example: show events from 7/19/2018 between 2am and 2:30am PDT
DECLARE @StartDate datetime = '2018-07-19T02:00:00'
    , @EndDate datetime = '2018-07-19T02:30:00';

CREATE TABLE #ErrorLogs ([LogDate] datetime, [ProcessInfo] nvarchar(50), [Text] nvarchar(max));

-- Loop over the log files, dumping each into the temp table:
DECLARE @LogNum int = 0;
WHILE (@LogNum <= @NumLogFiles)
BEGIN
    INSERT #ErrorLogs (LogDate, ProcessInfo, [Text])
    EXEC sys.sp_readerrorlog @p1 = @LogNum, @p2 = @CONST_P2;
    SET @LogNum += 1;
END;

ALTER TABLE #ErrorLogs
    ADD LocalDate datetime;

-- Convert from UTC (which is what the AWS servers use) to local time:
UPDATE #ErrorLogs
SET LocalDate = CONVERT(datetime, LogDate AT TIME ZONE 'UTC'
                                          AT TIME ZONE 'Pacific Standard Time');

SELECT el.LocalDate, el.ProcessInfo, el.[Text]
FROM #ErrorLogs el
WHERE el.LocalDate >= @StartDate
    AND el.LocalDate < @EndDate
ORDER BY el.LocalDate;

The chained AT TIME ZONE conversion near the end is particularly interesting to me, and only works with SQL 2016 and up.  I got the idea from this StackOverflow answer.  You can chain two AT TIME ZONE commands together to convert a given datetime value from one zone to another.  The reason you still need the CONVERT is because the output of the AT TIME ZONE command is always a datetimeoffset type, which, while quite useful in its own right, has its quirks, and doesn’t serve our purposes for ease-of-readability.

If you’re not running 2016, at least in RDS, you’ve got nearly no excuse.  The upgrade process is vastly simpler with RDS than with traditional on-prem servers.  Although, I did run into my share of snags with it recently, which I’ll blog about later.

You should plug in whatever values suit your need & environment — the @NumLogFiles and @StartDate & @EndDate.  I used 2-2:30am, because… well, that’s always when those damn outages seem to happen, ain’t it?

2am friday alert that load on aws rds server is critical
I’ve never actually had a pager, but I hear awful things. =P

As I mentioned, “the Why” is basically because AWS RDS limits your permissions (even as an admin) in some key ways, and one of those limitations prevents you from reading the error logs in the more “normal” ways — SSMS GUI, xp_readerrorlog, etc.  And the interface given to read the logs in the AWS console GUI is quite a sad-panda.  They offer a wrapper proc rdsadmin.dbo.rds_read_error_log @index = 0, @type = 1 , which really only serves the purpose of naming the parameters more nicely; under the hood it’s doing the exact same thing.

Of course we can’t prove that, because it’s encrypted, but the results are identical.

So there you have it.  Go check out your AWS RDS SQL Error Logs!  See what tickles your curiosity.  =)