Views & Mismatched Datatypes

If you’ve ever wondered “who would win in a fight” amongst SQL datatypes…

Allow me a short revisit to the previous topic, because this example came from “real life” in my actual job in a production environment, and I wanted to share.  And remember, take this with a grain of salt — there are very few absolutes, a DBA’s favorite answer is “it depends”, and even when we spout rules or decrees, there are usually some exceptions.

Now, let’s start at the top.  T-SQL is a strongly typed language.  SQL Server, like most RDBMSs, counts datatypes as a foundational element in its ecosystem.  Things like bit, int, datetime, varchar (aka string), etc.  We also have this concept of NULL.  Any element of any datatype can be NULL, which, coincidentally, means a bit field can actually be 3-valued instead of just binary; BUT, that’s not the topic of this post.  However, it does involve a bit field.  And, as the title says, a VIEW.
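Just to illustrate that aside, a throwaway sketch of the three-valued bit in action:

```sql
-- A nullable bit can hold three states: 0, 1, or NULL ("unknown").
DECLARE @b bit;   -- declared but never assigned, so it's NULL

SELECT CASE WHEN @b = 1 THEN 'true'
            WHEN @b = 0 THEN 'false'
            ELSE 'unknown (NULL)'
       END AS three_valued_bit;
```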

First, the larger problem, or “what these little oversights / mis-steps can lead to”, as blogged by someone much smarter than me: Jonathan Kehayias – Implicit Conversions.

Now the diagram:

[Diagram: category, categorygroup, view-category-by-group]

In English, the view’s DisplayInMenu field is defined as “If Category’s IsRoot bit is false, always use False; otherwise, use the Group’s DisplayInMenu bit.”  In even plainer English, it means we only want to “Display this Category in the Menu if it’s a ‘Root’ Category and it’s a member of a Group that gets displayed in the Menu.”

Do you see the problem?  It’s not so obvious if you’re not thinking about datatypes, but look closely.  Yes, our view’s DisplayInMenu field is a different datatype than its base field (origin).  That’s because the result of the CASE expression that defines it is “promoted” to the int type, instead of remaining a bit.  The same would be true of a COALESCE or ISNULL expression.  This is an example of datatype precedence.  If you’ve ever wondered “who would win in a fight?” amongst the SQL datatypes, Microsoft has the answer right there in black & white.
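To make that concrete, here's a rough re-creation of the pattern (table, column, and view names are my stand-ins based on the diagram, not the real schema), plus the simple fix of wrapping the CASE in an explicit CAST so the column stays a bit:

```sql
-- Stand-in schema: the CASE output is typed by datatype precedence.
-- int outranks bit, so the literal 0 promotes the whole expression to int.
CREATE VIEW dbo.vwCategoryByGroup
AS
SELECT  c.CategoryID,
        CASE WHEN c.IsRoot = 0 THEN 0      -- 0 is an int literal
             ELSE g.DisplayInMenu          -- bit column, up-converted to int
        END AS DisplayInMenu               -- ...so the view column comes out as int
FROM    dbo.Category      AS c
JOIN    dbo.CategoryGroup AS g
  ON    g.CategoryGroupID = c.CategoryGroupID;
GO

-- Keeping the original type just takes an explicit CAST around the expression:
ALTER VIEW dbo.vwCategoryByGroup
AS
SELECT  c.CategoryID,
        CAST(CASE WHEN c.IsRoot = 0 THEN 0
                  ELSE g.DisplayInMenu
             END AS bit) AS DisplayInMenu
FROM    dbo.Category      AS c
JOIN    dbo.CategoryGroup AS g
  ON    g.CategoryGroupID = c.CategoryGroupID;
GO

-- Sanity check of the resulting column type:
SELECT name, system_type_name
FROM   sys.dm_exec_describe_first_result_set(
           N'SELECT DisplayInMenu FROM dbo.vwCategoryByGroup;', NULL, 0);
```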

This isn’t necessarily a problem, all by its lonesome.  So why did it bite us in the proverbial behind?  Two reasons.  First, query & usage patterns:  vwCategoryByGroup.DisplayInMenu happens to be an important field in our queries, and usually it’s compared against a bit parameter or variable that’s passed down from the app’s lower tier.  Sometimes it’s even JOINed to another column, say, GroupMenuProperty.DisplayInMenu — which is, of course, a bit.  But because it’s an int in the view, SQL is doing extra work every time to implicitly convert those bits to ints so that each side of the comparison operation is of the same type.  And again, not always, but sometimes, this causes performance problems.
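In query terms, it looks something like this (again, illustrative names; the point is the type mismatch, not the exact schema):

```sql
-- The view column is int; the parameter and the joined column are bit.
-- Per datatype precedence, the lower-precedence bit side gets implicitly
-- converted to int before the comparison can happen.
DECLARE @displayInMenu bit = 1;

SELECT v.CategoryID
FROM   dbo.vwCategoryByGroup AS v
WHERE  v.DisplayInMenu = @displayInMenu;    -- bit variable implicitly converted to int

SELECT v.CategoryID
FROM   dbo.vwCategoryByGroup AS v
JOIN   dbo.GroupMenuProperty AS p
  ON   p.DisplayInMenu = v.DisplayInMenu;   -- bit column converted on the fly

-- Look for CONVERT_IMPLICIT(...) in the execution plan's predicates.
```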

[Image: ms-sql-stop-making-me-work-so-hard]
I’m tired! Can’t you see I’ve been implicitly converting these datatypes all day?

The second reason is, admittedly, a bit beyond my understanding, so I’ll just explain the symptoms and leave the technical details to the more inquisitive minds.  Basically, during a performance crisis, one of the measures we took was to “fix” the view by turning DisplayInMenu back into a bit.  However, we found that it literally broke the dependent .NET application sitting on top, which started throwing exceptions of the “invalid cast” flavor.  I believe it’s using Entity Framework.  Thanks to a helpful tip from the Brent Ozar Office Hours webcast, I found out that it’s because the EF entity mapped to this view had that field (property?) defined as an int, and in order to make it “see” the datatype change, the code itself would need to be changed, i.e. that property would need to be defined as a bit.  Yay learning!

[Image: the-more-you-know]
Also, did you know that EF really isn’t all that bad? 😉

So, dear reader, be conscious of your decisions with datatypes, especially when it comes to views with superficial computed columns.  But more to the point, beware of implicit conversions and mis-matched datatypes.  They can be, at best, a source of technical debt; at worst, a silent killer.

Til next time!

Views – the Good, the Bad, & the Lazy

If your VIEW has a dependency tree more than 2 levels deep, you’re doing it wrong.

Let’s talk for a minute about views.  No, not like scenery.  Not like my family (they’re good peeps).  I mean SQL views – i.e. CREATE VIEW vwFooBar which combines commonly used fields (columns) from tables Foo and Bar into a single entity, which datavelopers (term borrowed from Richie Rump, who has an awesome name btw) can then use & abuse without having to know the details of and relationships between those two tables.

So far, this sounds like a great idea, right?  Encapsulation & reusability are great principles in IT/Dev land.  And I agree!  Views have their place, for sure.  But for some reason, I find myself constantly deconstructing them — refactoring complex queries & stored-procedures that use them — into their base tables (in the above example, that’s Foo and Bar).  And I find myself asking the proverbial WHY?  Why is this such a common misstep of devs & query writers, that the DBA has to spend so much time “fixing” the issues this can cause?  Naturally, that train of thought spawned a blog post.

Let’s dive in!

[Diagram: Organization, Customer, Order: vwCustomerOrder]
A very simple example of a 3-table view.

So, somebody created a nice high-level view for us that summarizes Customer, Organization, and Order data into a single entity that should prove pretty handy for things like reports, ad-hoc analysis, and maybe even a display-grid on a page somewhere.  And the happy developers don’t need to care about the foreign keys between the tables or how to JOIN them properly.

But!  Do we see a problem yet?  Yes, our vwCustomerOrder doesn’t include our Customer’s Email.  Well, what if we need that in our code?  Do we go through change-management procedures and get the column added to the view?  Do we JOIN the view to the Customer table again just to get the Email?  Neither of those options is ideal.  The latter means that now we’re referencing the Customer table twice; the former involves refactoring, and is even more difficult if the view is an indexed view.

Okay, well let’s say we’ve added Email to the view, and it’s working fine.  Now let’s add another layer of complexity.  Say Customer is actually a VIEW, which consists of base-tables CustomerHeader and CustomerContact, where the latter stores a collection of contact entries for each Customer.  Now, vwCustomerOrder is thus a 2-level nested view.  These aren’t really super fun, but they’re not the most offensive thing in the database.  But again, what happens when we need something from CustomerContact that’s not already in our “master” top-level view vwCustomerOrder?  Maybe the primary Phone1, for example.  So to build that query, we now have to combine our 2-level nested view with a redundant table (i.e. a table that’s already in the view)!  But because it’s so terribly important to some reporting modules or some code bits, it becomes a permanent fixture of the schema.  And on it goes.
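In query form, the anti-pattern looks something like this (all names lifted from the example above, so treat them as hypothetical):

```sql
-- vwCustomerOrder already nests vwCustomer, which already reads CustomerContact...
-- and yet here we are, joining CustomerContact a second time just to grab Phone1.
SELECT co.OrderID,
       co.CustomerName,
       cc.Phone1
FROM   dbo.vwCustomerOrder AS co   -- 2-level nested view
JOIN   dbo.CustomerContact AS cc   -- redundant second reference to a base table
  ON   cc.CustomerID = co.CustomerID;
```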

Expanded example from earlier:

[Diagram: organization-customer-contact-order-views]
Because vwCustomer doesn’t include Phone2, if we built vwCustomerOrder on top of it, we’d have to JOIN again to CustomerContact, when we know it’s already included in vwCustomer. Thus, it’s smarter to build vwCustomerOrder from the base-tables and avoid that double-join (duplicate reference).
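And a sketch of the flattened alternative, built straight from the base tables (hypothetical column names, of course):

```sql
-- One reference per table, every needed column exposed once.
CREATE VIEW dbo.vwCustomerOrder
AS
SELECT o.OrderID,
       o.OrderDate,
       ch.CustomerID,
       ch.CustomerName,
       ch.Email,
       cc.Phone1,
       cc.Phone2,
       org.OrganizationName
FROM   dbo.[Order]         AS o
JOIN   dbo.CustomerHeader  AS ch  ON ch.CustomerID      = o.CustomerID
JOIN   dbo.CustomerContact AS cc  ON cc.CustomerID      = ch.CustomerID
JOIN   dbo.Organization    AS org ON org.OrganizationID = ch.OrganizationID;
GO
```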

This is the problem with VIEWs that are grown organically, re-actively, without careful attention to the underlying schema and dependencies.  And look, I get it.  Mature software systems DO evolve organically; iteration is the name of the game, and that goes for the database too, not just the app code.  But let’s agree that, like the boy-scout principle espoused earlier in my ode to Clean Code, we can try to leave things a little better than how we found them.  DB refactoring is … not necessarily harder, but definitely different (than app code refactoring).  You probably have to jump thru more hoops to get the changes pushed to production.  Regardless, it’s worthwhile.  Why?  Glad you asked!

Nested views are icky for a couple reasons.  First, they’re more difficult to troubleshoot – not because they annoy the snot out of your DBA, but because they legitimately make index tuning more tedious.  Second, laziness begets more laziness – when you’ve got these views, you’re automatically tempted to simply use, re-use, & abuse them, instead of taking the time to analyze their effectiveness and refactor, replace, or reevaluate the problem.  Rampant ill-constructed views can lead to some serious technical debt.  And…
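If you want to hunt down the offenders in your own database, a quick query against sys.sql_expression_dependencies lists views that reference other views, one nesting level at a time (chase it recursively if you’re feeling brave):

```sql
-- Views whose definitions reference other views, i.e. at least one level of nesting.
SELECT v.name                    AS view_name,
       SCHEMA_NAME(rv.schema_id) AS referenced_schema,
       rv.name                   AS referenced_view
FROM   sys.views AS v
JOIN   sys.sql_expression_dependencies AS d
  ON   d.referencing_id = v.object_id
JOIN   sys.views AS rv
  ON   rv.object_id = d.referenced_id
ORDER BY v.name, rv.name;
```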

[Image: technical-debt-is-bad-mmkay]

There’s another nugget of wisdom in software development that’s appropriate here: Less is More.  Your views should address a specific need or use-case, do it well, and call it a day.  Feature-creep is not a goal; views that try to be many things to many people ultimately fail at it, because you end up committing sins against set-theory & the relational model in favor of “oh just one more thing!”  While the purpose of views is ultimately abstraction, we should not forget the underlying parts — tables, indexes, statistics, schema relationships.  In fact, a good view can help us construct the right indexes and visualize the relationships.  Whereas a BAD view will obfuscate indexing needs and confuse (or even outright lie about) the base relationships.

Venn-diagram time.  We’ll use the circle areas to represent that “scope” of each view, meaning the base-tables that comprise it.  In a healthy schema, there will be mostly independent views with a few overlaps in small portions.  These views “do one thing well”, with little redundancy and hardly any nesting.  Conversely, in a haphazard ill-managed schema, there is lots of overlap and redundancy, multi-level nesting, and ugly colors (who the heck likes brown, anyway?).

[Diagram: views-family-good]
Views in happy harmony & balanced dependence vs. independence.
[Diagram: views-family-bad]
Views in sad land with ugly brown and black overlapping areas. (I was going to try for puce or burnt-umber but Paint is sadly not up to Crayola 64-pack standards.)

So, dear reader, what’s the point?  (I feel like I rhetorically ask that fairly often.)  Build your SQL views with targeted use-cases and clear purpose.  Avoid the laziness trap and the temptation to tack-on “one more thing”.  A view should be a concrete, concise, abstracted representation of the underlying tables & relationships.  Evaluate existing views for how well they meet the developer needs, and don’t be afraid to deprecate, drop, or rewrite bad code.  And above all, stop the insane nesting.

If your view has a dependency tree more than 2 levels deep, you’re doing it wrong.

Or, more visually…

[Image: ridiculous-swiss-army-knife]
This is not a useful tool, despite the sheer volume of functionality.

That’s all for this week!  Thanks for reading, and apologies for the slightly longer delay between posts.

SQL Server for the Developer

And now for a brief interlude while I work on my next “real” post… continuing with the same theme… SQL Server for the developer/programmer!  Warning: potentially excessive snark ahead.

[Image: caution-sarcasm-ahead]

SQL Server is a mystical fairy-tale land where you can store all your application data, and never have to worry about it again!  Sure, it “should” be relational, but you also “should” drive the speed limit.. PSHH!  Let’s throw blobs in there too — HTML, XML, hell, even JSON now that 2016’s got nice happy JSON-centric system functions.  And files?  Heck yes, what else is FILESTREAM for?  Sure, performance may be abysmal, but that’s somebody else’s problem, so why should we care?

It’s the only database platform your company invested in, so it must be good for everything.  Document storage?  Sure, not like there’s anything else better suited for the job, especially for ‘Free’!  Ooh, how about key-value data, and volatile temporal data like .NET Session-State?  Obviously!  Nothing else exists, and Microsoft makes it so simple to set it up in SQL.

[Image: sarcasm-sign-everybody-needs-one]
Here’s your sign…

It’s that wonderful database system where we send all our Entity Framework queries and data just magically appears!  We don’t need to worry about how nasty and convoluted those queries turn out when they finally get parsed down to actual TSQL code.  It was only 3 nice simple lines of LINQ, why should it be any more complex than that?  Oh, and we can write all sorts of abhorrent ad-hoc queries against it too, nobody will even notice.  Let that code build all sorts of IN SELECTs & sub-queries, chains of unreadable JOINs, and Inception-level nested VIEWs, it’s perfectly fine!  Performance, shmerformance!  That’s what we have DBAs for.  Not that they’ll be able to help you when you refuse to refactor or rewrite your code that’s responsible for these abominable queries.

[Image: what has been seen cannot be unseen]
Your terrible queries are scaring the dog…

SQL Server is so easy!  Microsoft gives me a free Developer Edition, I can code for all these awesome features that I know we’ll have in production, right?  Oh, we only run Standard Edition?  Woops.  Ooh, I know!  Let’s put everything in Azure.  It can do everything and make us breakfast!  Then we won’t need to worry about feature differences, backups, RTO/RPOs, index maintenance, performance tuning, scaling, or even performance in general!  Wait, no?  Oh well, that’s what we have DBAs for!  Besides, it’s so easy to move everything over, we just take our SQL backups and upload them to blob storage and then restore them on Azure SQL Database, right?  No?

[Image: this is why we can’t have nice things]
Exactly.

Hmm.  Maybe we should hire a DBA.  Or four.

That’s all, see ya next time!  XD

Mongos, anybody?

You’ve heard of this thing called MongoDB…

Unless you’ve been living under/in the proverbial rock/hole-in-the-ground, you’ve heard of this thing called MongoDB. It’s one of the biggest flavors of NoSQL database systems out there – specifically, it’s a “document DB”, which means it stores semi-structured data in the form of JSON documents, grouped into collections, grouped into (of course) DBs.

Now, I’m a Windows guy. SQL Server & Windows Server are my comfort zones. I like GUIs, but I’m ok with a command line too. PowerShell has become my friend… okay, good acquaintance. But every once in a while, we have to step outside our comfort zones, no? So when I was told “hey, you’ve got to get a handle on managing these MongoDB instances that we’ve got running on servers so-and-so, as part of this ‘Imaging system’ application”… I thought “Hahaha…”, but what came out was “Sure thing!”  Something a bit like this:

[Image: hahahayeah]

(courtesy of DBA Reactions)

So the first order of business was to decide on a MongoDB “GUI” – a Windows app that could at least connect-to and give me a visual overview of the running MongoDB instances (properly referred to as a mongod, stemming from the Linux term “daemon”, which on the Windows side is basically like a process or service). I tried both the “official” desktop app from the big org, MongoDB Compass, and a neat open-source tool called Robomongo.

And I actually like them both, for different reasons; most of which can probably be attributed to my lack of depth with the technology, but hey. Anyway, Compass is really nice in that it gives you this kind of statistical overview of the collections in your DBs, performing some basic aggregates on a 10% or 1k sample of the collection documents to give you a graphical 40-thousand-foot view of the data. But where it breaks down for me is that little “query” bar, which is just an empty pair of curly-braces. It’s only for “selecting” (finding, querying); no other operations to see here. So we can’t manipulate our data with it, but it’s definitely great for viewing!

 

Whereas with Robomongo, I can right-click on the DBs/collections and do very simple things like “show documents”, “show statistics”, etc. And it actually writes the equivalent mongo shell command for me to poke at; say, to inject more logic to the find to get something specific or to write a new command or two as I read thru the docs and learn things like aggregates, indexes, and whatnot. Being a shell, it allows us to write or update data as well as read it.

But alas, GUIs will only take you so far. This is a tech born & bred on the command-line, so to really dig into it, I needed to let go of the mouse. And the mongo shell was actually pretty straightforward! A couple visits to their docs pages & some visual verification by using the GUI tools to check my work, and things were starting to come together. And even for a complete Javascript noob like me, the command syntax was straightforward enough. Soon I was configuring a replset and mongodump/mongorestore-ing and re-syncing a replica after purging some old stale collections.

Even though it’s a CLI/Linux flavored technology, it works perfectly fine in Windows… Except for one thing.  So, you install it as a service, and you typically start & stop services using net start & net stop, and as long as your service’s CLI arguments are all correct, you should be good — in theory! Trouble is, the service tends not to stop gracefully. So I found that, instead, the following command was more useful: mongo admin --eval "db.shutdownServer()". This uses the actual mongo shell to send the native shutdown command to the mongod, instead of relying on the Windows services gymnastics to do it correctly.

It just goes to show, dear reader, that you’ve got to get your hands dirty and try out new technology before dismissing it as “not my job” or “somebody else’s problem”.

PS: Nope, that’s not Compass’s logo or anybody else’s; I made it meself, with good old Paint.NET!

Drafted with StackEdit, finished with WordPress.

DEV, the SQL

See what I did there?

I have a confession.  I geek out over building & shopping for computers.  There’s barely any money in it, otherwise I’d do it for a living.  But those jobs belong to robots and overseas underpaid factory minions, and that’s probably for the best (at least, the first part).

But yeah, I’m a geek.  I’ll pore over articles and reviews, pick out the components, fill and empty my Amazon or Newegg cart, go back and read more, pick some other components… I love it.  It’s ridiculous.

This happens to be in direct conflict with one of my other passions, frugality.  Yes, I’m cheap.  Thus, usually I end up geeking-out helping other people with computer builds or replacements or whatnot.

So when my boss decided one day, “Hey, let’s go down to Micro Center and see if we can put together a replacement for the SQL Server Development box”, aka ‘SQLDEV‘, I was more than happy to oblige.  Because, you see, he loves computer shopping almost as much as me.  And he also happens to have a fancy corporate credit card.  *cha-ching!*

The current server “feels slow” — it’s a Dell Poweredge R710 with 24GB RAM, 2×2 (cores x hyperthreading) 3.6 GHz CPU, and an attached storage array of 7.2k SAS SATA II (3 Gbps) drives.  Shipped mid-2011 — it’s not that bad in the grand scheme of things, but it’s definitely outlived its life as a 3-instance SQL server with terabytes of data.

Here’s what we picked up at Micro Center:

Base system:

  • PowerSpec G403 Desktop – $950
    • Core i7-6700, 4×2 (cores x HT) @ 4.0GHz
    • 16GB DDR4-3200 RAM
    • 512GB SATA 6Gb/s SSD (Sandisk SD8SB8U512G1122)
    • GB LAN, DVD drive, integrated graphics
  • the one big problem I have with this PC is that it only has 4 SATA ports; more on that in a minute..

Add-ons:

  • 32GB DDR4-3200 RAM (for a total 48GB) – $180
  • 3x 2TB Samsung EVO SATA 6Gb/s SSD – $630 x3 = $1890
  • 500GB Samsung EVO M.2 SSD – $180
    • (for TempDB – supposedly even faster than SATA)
  • 5TB Toshiba 7.2k SATA III HDD – $145
    • (for pulling down backups & shuffling files between Production & Dev boxes)

Sub-total: $3345
Total with tax: $3612.60


For those of you keeping score at home, that’s 6.5TB of dedicated SQL space, plus another half TB for boot/OS, plus another 5TB for “slow” storage.

Now, if we start poking around Amazon for used servers in the same class as the R710, we do find some pretty compelling — and much cheaper! — systems.  So did we just spend that money for no good reason?  Well, cosmetic arguments aside, I still say, emphatically, NO.  Here’s why.

  1. This is dev. We don’t need redundancy, HA or DR.  We need one thing: speed.  Well, really two things: space and speed.  And sure, 2TB SSDs aren’t cheap.  But have you ever looked at 3 or 4 TB SSDs?  Holy bejeezus.  What about “more is less” — why not six 1TB SSDs?  Okay; can you find a desktop class motherboard with enough SATA ports?  Sure most of the good ones have 6, but that’s not all we need — we still need space for the OS/boot drive and the backup mechanical drive.  Sure, we can drop in a PCIe-SATA card and get 2-4 more free ports that way.  In fact, I already have to do that because this mobo skimped on ports!  But either way, as it turns out, most of our DB filegroups are near or over 1TB.  And again, without RAID, I’m looking at possibly sharding out data files across 2 drives per SQL instance, which A) doesn’t mimic production (although that’s a pretty weak argument at this point), and B) sounds like more of a headache than I care to deal with over saving a couple hundred bucks.
  2. Peace & quiet.  Servers are loud, power-hogging, heat-generating beasts.  A desktop PC is none of those things.  It’s sitting under my desk right now and I don’t even notice it.  Plus, it’s really easy to set up, tear down, move somewhere else, and set up again.
  3. Did I mention speed?  This thing is blazing fast.  CrystalDiskMark pics to follow.  But again, there’s no redundancy here, no warranties or service agreements to cover these parts or this system in the event of a bad component or data-loss incident.  That’s why you don’t run production (or even QA/UAT) on these types of machines — because parts do break, and when they do, you need redundancy and HA/DR built-in to your important systems.  On Dev, though, we can rebuild it in a day or less.  No problem!

[CrystalDiskMark benchmark screenshots: Boot, Temp, Data]

So that’s where my geek flag has been flying this week.  My sales pitch to the boss was based on a post from Glenn Berry (http://www.sqlskills.com/blogs/glenn/building-a-desktop-workstation-for-sql-server-development-and-testing/), so if anybody’s to blame for feeding into the crazy, it’s probably him.  I’m sure he won’t mind!

Like any other business, our IT infrastructure has been slowly getting consolidated, converged, virtualized, and even moved to the cloud in some cases.  So granted, this is a short-term solution.  And because of said consolidations, we definitely do have decommissioned servers that may make a good home for “real” Dev/Test/QA environments very soon.  They’ll be expertly planned, well-managed, viable branches of the infrastructure… when the IT team has time to dedicate to those efforts.  But sometimes, there’s nothing wrong with a little good-old-fashioned geekdom, and having a little side-project that makes you gush like a schoolgirl & keeps you honest.

PS: Yes, the elephant in the room: the cloud option. I get it. Once again, the boss wanted to do it a certain way, and in this case, the cloud was not on the short-list. Next time, buddies!

Header image: our husky Keira at 2 months. No, that’s not a turd next to her mouth, it’s a treat.

Origins

And the DBA waxed wroth…

Genesis of a DBA Universe

In the beginning was the disk array, and all was empty and raw, and Windows Server moved over the face of the platters. And the DBA said: Let there be SQL Server. And there was SQL Server.

And the environment variables were set, and the disks were striped and mirrored, and the config was established, and behold spindle was rent asunder from spindle. And the DBA saw that all was in spec.

And it was day, and it was the evening of the first day.

And the DBA said: Let there be objects. And setup.exe brought forth myriad crawling things upon the face of the array. And instcat.sql brought forth all manner of tables and views that swim unseen beneath the waters. And procsyst.sql brought forth all the built-in procedures and all the operators of the air, that the users might be given wings and take flight over the data.

And it was day, and it was the evening of the second day.

And the DBA said: Let there be databases. And there were databases. And the system administrator looked upon the disk array and did see what the databases had wrought upon the disk arrays, and he did gnash his teeth and seek a new work upon the Internet with an engine of search.

And it was day, and it was the evening of the third day.

And the DBA created users. Male and female he created them. And he said unto the users: Thou mayest create tables and views as thou wilt. Yea, thou mayest create even indexes upon the data. Only meddle not with the system database, for it is a holy place, and on the day wherein thou treadest upon it, on that day thy roles shall surely be revoked.

And the serpent crept among the users and whispered to them, saying: Thine roles shall not be revoked. Taste ye all of the system database, for ye shall know of b-trees and hints and ye shall be as DBAs. And the users heeded the serpent and did fill the system database with crap. And the instance did crash and the client did wax wroth at the DBA. And the DBA did gnash his teeth and partake of the fruit of the vine, for behold the users were permanent employees, and the DBA was but a contractor and could not revoke their roles.

And it was day, and it was the evening of the fourth day.

And the DBA did set default databases and default schemata, and did lock down all that was upon the face of the array with roles and encryptions and all manner of quotas, yea even from the transaction logs even unto the archived backup files.

And it was day, and it was the evening of the fifth day.

And the DBA created synonyms and links and did tune the server and apply patches upon the face of the server. And the DBA saw that it was good.

And it was day, and it was the evening of the sixth day.

And on the seventh day the DBA did rest from all the labors of the creation. But lo, his pager did ring and he ceased from resting, and did spend his sabbath on the phone with Microsoft support. And by the time the DBA got through to someone who knew whereof they spake, behold it was day, and it was morning of the eighth day.

And the DBA waxed wroth.