How I Work from Home

Instead of relying on my home internet bandwidth, I simply RDP into my workstation that’s still physically sitting in the office.

This is like a month late, but I didn’t have anything for the latest T-SQL Tuesday on Unit Testing. Kudos to those of you who are doing it… it’s REALLY HARD. Harder than it should be, anyway.

So as we all started working from home last month, it dawned on me that, apparently, the way I work from home is drastically different from that of most everybody around me. And it got me really curious whether this is sort of generally applicable to IT workers, or whether I’m really the odd duck!

a basic VPN diagram
Courtesy of Wikipedia.

The Setup

Probably like many of you, my primary workstation at the office is a laptop with a docking station hooked up to dual monitors and a full size keyboard and mouse. It makes going to meetings easier, and in theory I’m able to take it to the occasional conference. (I say “in theory” because, as I’ll explain in a moment, it rarely happens.)

Now, apparently the ‘norm’, at least for non-IT staff, is to take their laptop home, maybe with a dock & an extra monitor or two, fire it up, connect to the VPN, and off you go. Some people need printers and/or scanners, which makes it more complicated. But for now let’s focus on the typical knowledge worker that just needs a PC. And of course, if their workstation at the office is not a laptop, IT is tasked to provide them with one. Which, during this Coronavirus outbreak, has caused its own subset of issues with supply & demand at the big-box tech retailers.

Contrast that to me, and hopefully maybe some of you. I have the good fortune (or not, depending on whether you’d asked my wife early on) of owning a plethora of computing devices, most of which are at least suitable for running VPN & RDP (Remote Desktop). All of the resources that I need access to, and that I manage and manipulate on a regular basis, are in a data center, a big building with tons of racks of servers and gear and super-high-speed connections. And the office, of course, has a much higher-speed/bandwidth connection TO that data center than anybody does from their home. Some of you may see where I’m going with this.

The Why

So instead of relying on my home internet bandwidth to keep up with all the bandwidth-intensive tasks that constitute my workday — including, as my fellow DBAs will attest, dozens upon dozens of queries against the SQL Servers in SSMS — I simply RDP into my workstation at the office. Yes, my work laptop, which is still sitting in its dock on my desk in an empty (mostly) building. It’s the closest thing to actually being there.

Here’s what that allows me to do. All my network resources are just as quick and responsive as they are at the office. My code repositories and my Visual Studio windows are instantly available. My queries return results in milliseconds because they’re only traveling from the data center to the office. The only thing traveling the longer and slower distance to my house is the image of the screen from the work laptop. Make sense?

This is the same principle that drives some of these new “cloud gaming” services like Google Stadia or NVidia GeForce Now. The only thing you as the player really need is the continuous moving picture of the game, not all the hard work behind-the-scenes that it takes the GPU to render it. (Of course, the thing they haven’t really solved for is the input latency and the turnaround time of the player’s action being replicated on the screen and leading to the next action in the chain.)

a picture of an old desktop computer on a dining table with several other computers nearby.
Disclaimer: not my actual dining room. But at one point in history, it was pretty close.

Okay, I get it, it’s not for everyone

I understand some workers need more than that to do their jobs. And some folks don’t have the handful of spare PCs that I do. So I get it. Not everybody can or should work this way. But I was quite shocked that, out of all the folks I work with, I was the only one who thought this was THE way to work remotely. Or at least the best way.

Surely, most people with.. what we might call “gray collar” jobs — we work at a desk all day but we aren’t Wall Street and we don’t wear suits, so.. your typical middle-class occupations, yeah? — anyway. We all have a workstation at the office. And surely we have at least one computer at home, even if it’s another laptop. Right? But no, apparently, the “tablet revolution” was a thing, and people don’t have big ol’ desktops or even traditional laptops at home much anymore.

Not only that, even if you could afford to invest in one FOR this season of mass-remote-work, you’d have had a bitch of a time getting ahold of one, because everybody and their mother decided now was the time to buy. And not without good reason. Most of that purchasing was actually done BY the small-medium enterprise business market, because their IT departments had to buy a bunch of new laptops for their non-IT colleagues to take home for work!

Anyway.

I guess there’s not much of a big important point or lesson here.

Is there ever?

audience

If anything, I would encourage people to TRY working from home this way, if they have the resources to do so. It significantly lessens the burden on your IT folks, as long as your home PC that you’re using to VPN is secure. That’s really the biggest variable in this equation, and in some cases, it’s why they actually prefer to issue you a company-imaged/company-secured device instead. But, thankfully, with a few network and firewall tweaks, they can lock down the VPN traffic so that it doesn’t allow your home device to talk to anything BUT your own workstation — or at least, any computer in the “workstations” group. So yes, there are some trade-offs. But again, especially if your employer has had trouble keeping up with the surge in demand for “loaner laptops”, maybe suggest you try this way and see if it can work for you.

Stay safe, stay healthy, and be kind.

Dirty Deeds Done in SQL

Yes, I wrote a loop to find a gap in an identity-value sequence. Sue me. (Or don’t, that’d be cool too.)

I’ve done some things I’m not proud of. We all do, in IT, typically when we’re under-the-gun for a deadline or when the systems and frameworks in which we work have some sort of nuance or limitation that we just cannot get around, past, or over. And so we hack. We write code we’re not happy with. We even write code that we despise with every fiber of our well-intentioned being. But it has to be done. Because there’s no other choice.

With that somewhat depressing introduction, I give you my latest “dirty deed, done dirt cheap” (or for a reasonable salary, but still very reprehensible for anybody who values clean code and clean architecture). Yes, this is an actual stored-procedure signature that I wrote. Thank gods it’s not integrated into any software; it just happened to serve a niche purpose at the time. But still! I felt ICKY.

cursors are bad mkay
Did you recognize the album art from the cover pic? Just wondering… 😉
/*
Description: Search for an available ID value in Category starting 
at the low end (i.e. find a gap), because it's aesthetically pleasing 
to have Root level Categories use low ID values. Yes, really.
But mostly b/c they need to be the same on DEV & PROD environments.
*/
CREATE OR ALTER PROCEDURE dbo.[GetNextAvailableRootCategoryID]
  @StartAtID int = 10  /*
      '1' is not used but we don't feel like starting THAT low
      (that'd be like "starting over" or "starting from scratch")
      You never get a checkbook with the first check# being '1', right?
      Wait, what's a checkbook?
      Ask your parents. Or your grandparents. */
  , @MaxID int = 1000  /*
      Throw an error if we can't find any gaps before this point */
AS
BEGIN
--and you can imagine how that went...
--something about a loop, a couple IF/ELSE's, a RAISERROR or two,
--and a RETURN @val in a B-tree(eeee).
END;

I know, I know. You’re saying “wow that’s really not that bad!”. Or “wow, you REALLY couldn’t do something better about the constraints to make it less terrible?” Again, time, priorities, and energy. What’s 10 minutes in the big scheme of things, even if the code is silly and perpetuates the bad decisions that came before it? Should the overlaying systems actually depend on matching ID values? NO!!! Obviously not. But was that even on the software development roadmap for the next quarter? Also no.
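For the curious, here’s roughly what a set-based version of that gap-hunt could look like. This is just a sketch, not the actual proc body, and the column name (CategoryID) is my guess, since the real one isn’t shown above:

DECLARE @StartAtID int = 10, @MaxID int = 1000;

--Build a number line from @StartAtID to @MaxID, then grab the first value
--that isn't already taken. (Table & column names assumed; the RAISERROR
--for "no gap found" is left out of the sketch.)
WITH Numbers AS (
    SELECT TOP (@MaxID - @StartAtID + 1)
           n = @StartAtID - 1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects  --just a convenient row source
)
SELECT TOP (1) n AS NextAvailableID
FROM Numbers
WHERE NOT EXISTS (SELECT 1 FROM dbo.Category c WHERE c.CategoryID = Numbers.n)
ORDER BY n;

Same niche purpose, no loop. Whether that’s any less icky is debatable.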

So, what’ve you got? Show me your dirty deeds done in SQL. It’s okay, nobody’s judging! Except yourself, if you’ve any skin in the game. We’re all our own worst critics. But if we can’t laugh at ourselves sometimes, what the hell are we doing here? =)

T-SQL Tuesday #122: Impostors!

Be humble, but be confident in your own expertise. Realize that you will sometimes be the teacher, and sometimes the student.

Howdy folks! It’s been a while. My last post was in October of last decade (har har)! I’ve been busy with other projects, work, and life in general. 🙂

Now, this month’s invitation is hosted by John Shaulis. I’m always excited to see someone in the #SQLcommunity that I’ve never heard of before. Looks like he’s familiar with the same WordPress templates as I am. Heh. I like it.

Side-note, I’ve been planning to buy my own domain and remove the ads here for a better reading experience (and more professional feel), so hang in there… it’s happening soon!

And now without further ado.

Who, Me?

Everybody reading this knows what impostor syndrome is. I won’t bore you with that. Fact is, if you’re in tech, you either have it or you know someone who does. If neither, well you’re probably a narcissist surrounded by other narcissists and you might wanna look for another job. Heh heh.

oh, get a job? just get a job? why don't I strap on my job helmet, get into a job cannon, and fire off into job land, where jobs grow on jobbies!
I think this is from “It’s Always Sunny in Philadelphia” but I’ve never seen the show.

Yes, You!

I’ll share a personal and ongoing example of where I’m constantly feeling “not good enough at my job”. It’s about server migrations. You know, where you take a running production SQL Server instance, and you have to move it to new hardware/infrastructure. Maybe it’s a physical (bare metal) cluster to a virtualized environment. Maybe it’s just one VM to another VM on a newer platform. It typically involves taking a maintenance window and convincing business stakeholders that the “downtime” is worthwhile.

Now surely we’ve all heard of DBATools by now, right? (If not, go there right now and check ’em out!) They were BUILT to do migrations. Yet I’ve not been able to successfully use them as the whole and ONLY toolset for that purpose. Let me clarify: I’ve used pieces of the framework (such as the copy logins / users command, and others) to HELP me in the migration project, but never been able to simply fire off Start-DbaMigration and go get coffee and watch as things magically successfully fall into place.

dog in a room on fire, casually drinking coffee, saying "this is fine."
It’s working EXACTLY AS IT SHOULD!

Couldn’t Be!

Even when I’ve built a beautiful check-list, lined up all my scripts, scheduled things via Agent Jobs, double-checked names and permissions and networks and drive letters and shares. Still, something inevitably goes wrong. Maybe it’s just that a database refuses to go into READ_ONLY mode when I ask it to. Maybe it’s that a whole series of jobs get lost because they were in the wrong category. Maybe user permissions don’t come over on this database because I didn’t sacrifice a chicken on the night of the blood-moon while the wind blew north-east.
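(Side-note on the READ_ONLY one: the usual culprit is an open session holding a lock on the database, and the usual, if blunt, remedy is something like the following. Blunt, because it rolls back whatever those sessions were doing.)

--Hypothetical database name; ROLLBACK IMMEDIATE boots open sessions
--and rolls back their work so the state change can proceed.
ALTER DATABASE [MyMigratingDB] SET READ_ONLY WITH ROLLBACK IMMEDIATE;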

Surely, by this point in my 10+ years as a database professional, 4 years at this particular company and role, I would be able to do this with my eyes closed. With no hiccups or misfires. Right? RIGHT??

murphy's law: if anything can go wrong, it will go wrong.
Full plaque samples, which are quite entertaining, can be seen here & here, to name a few.

Then Who?

Wrong. Murphy’s Law is a real thing. More than that, things CHANGE. The environment is always evolving to fit the business needs. The technology is always just a little ahead of my own knowledge-base. And every SQL instance in this environment is, unfortunately, still, a ‘pet’. Not a ‘cattle’ (cow?). They’re each carefully and fearfully constructed to suit a specific set of application and business needs, each with their own nuances and barely-documented dependencies.

Plus, it’s not like we’re doing these migrations very often. Anything you don’t do on a constant basis, at least a few times per week, tends to sink back toward the bottom of the mental stack, and you forget the details and gotchas that were involved because your brain needs to keep more important and relevant things near the top, for what you’re working on “right now”. This is normal, at least in my mind.

DevOps stole the cookie from the cookie jar!

Just hire a DevOps Engineer to solve all your problems. Then everything will be all sunshine and rainbows and unicorns. All your servers will be cattle, replaceable little containers that are allowed to fail because there’s always another one waiting to spin-up and replace it. Not like your DATA is important or anything. It’s only the lifeblood of the entire company.

Not really sure where I was going with this. I love DevOps, don’t get me wrong, and it does have a part to play in making the traditional RDBMS platform more agile, but the devilish details and difficulties therein are ALWAYS overlooked/brushed-off in popular discourse, and it’s got me a little jaded.

I mostly wanted to finish the song I’d started with the title-blocks. =P

N.

But srsly. Instead of feeling guilty about your impostor syndrome, just own it. Be humble, but be confident in your own expertise. Realize that you will sometimes be the teacher, and sometimes the student. And hopefully more often the latter, because there’s exponentially more to learn every year we evolve on this crazy spinning rock we call Earth.

live as if you were to die tomorrow. learn as if you were to live forever. -Mahatma Gandhi
Good advice in general.

Dealing with Time Zones in SQL

In my sample script, I have a “transaction table” where I insert a few events for each office, with the ‘original date’ in PST, and I show how to easily convert that to each office’s local time, or vice-versa.

I’ve ranted about times, datetimes, and the like before. As most programmers & IT pros do, I loathe time zones with a mild passion. As one of my favorite #SQLCommunity members quips:

daylight savings time was actually created by a government works project to ensure that programmers could forever write tedious conditional logic in their date-based queries [and code].

Bert Wagner

Thankfully, in SQL 2016, MS heard our outcries and gave us an easier way to convert things between time zones in a somewhat sensible manner.

Resources:

  • The at time zone keyphrase
  • The system metadata table that drives it
  • A good blog post that explains essentially what I’m about to (because it’s great to have different perspectives and unique voices!)
  • Two StackOverflow answers (one, two) on the topic
  • Bert’s succinct and fun post about the same topic, with video!
give me utc or give me death
This is Patrick Henry. He famously said “Give me liberty or give me death!”. Now you know.

A Use-Case

In fact, I’m not even going to repeat what 2 (and many more, I’m sure) people have already said. Go check out their posts above! But since you’re here, this is how I use it for my reporting environment.

To start with, the transactional data is in PST/PDT — i.e. Pacific Time with DST fluctuation. Yes, it’s horrible. No, I don’t know what happens to events or jobs at 2am on the “Fall Back” date, or between 2am and 3am on the “Spring Forward” date. No, I can’t change it right now. Stop whining.

Now, I have offices in Paris France, Hong Kong, and Beijing China. These are 3 different “time zones”, but only 2 different offsets — China and Hong Kong are in the same bucket, namely, UTC +08:00. More on that later.

So I have my OfficeLocation lookup table:

Office  | TimeZone
--------|--------------------------------
Paris   | 'Central Europe Standard Time'
Beijing | 'China Standard Time'
HK      | 'China Standard Time'

(Again, see below for why we can’t call HK’s zone “Hong Kong Time” like most websites/APIs would assume.)

Now, the cool thing about this is, we can pull those strings into a variable, or use them straight from the table, to convert our PST/PDT times to the appropriate zone.

Here’s a variable example:

DECLARE @TimeZoneStr sysname; --"sysname" is just nvarchar(128)
SELECT @TimeZoneStr = TimeZone
FROM OfficeLocation
WHERE Office = 'Paris'

DECLARE @MyTimeNow datetime = GETDATE();
DECLARE @TimeInParis datetime;
SELECT @TimeInParis = @MyTimeNow
AT TIME ZONE 'Pacific Standard Time' --Converts to datetimeoffset
AT TIME ZONE @TimeZoneStr --Shifts it to Paris time

PRINT ('The time now in Paris is ' + CONVERT(varchar(30), @TimeInParis, 121));

And here’s an example using the field straight from the table.

SELECT Office
    , [Time in my location] = GETDATE()
    , [Time in remote office location] = CONVERT(datetime, GETDATE()
        AT TIME ZONE 'Pacific Standard Time' 
        AT TIME ZONE ol.TimeZone)
FROM OfficeLocation ol

See the Gist for a full-fledged sample script. In it, I have a “transaction table” where I insert a few events for each office, with the ‘original date’ in PST, and I show how to easily convert that to each office’s local time, or vice-versa.
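I won’t paste the whole Gist here, but a stripped-down sketch of the idea looks something like this (table and column names below are made up for illustration):

CREATE TABLE #OfficeTransaction (
      Office       varchar(20)
    , EventName    varchar(50)
    , EventDatePST datetime   --stored in Pacific time, warts and all
);

INSERT INTO #OfficeTransaction (Office, EventName, EventDatePST)
VALUES ('Paris',   'Quarterly close', '2020-03-31 17:00')
     , ('Beijing', 'Quarterly close', '2020-03-31 17:00')
     , ('HK',      'Quarterly close', '2020-03-31 17:00');

SELECT t.Office
    , t.EventName
    , [Original (PST)]    = t.EventDatePST
    , [Office local time] = CONVERT(datetime, t.EventDatePST
          AT TIME ZONE 'Pacific Standard Time'
          AT TIME ZONE ol.TimeZone)
FROM #OfficeTransaction t
JOIN OfficeLocation ol ON ol.Office = t.Office;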

working with time is easy -jon skeet
Well YEAH, if you’re frickin’ JON SKEET! =P
But srsly, go watch his presentation and read his stuff.

Gotchas

Here’s the major catch. The information available to YOUR instance of SQL Server is pulled from that server’s Windows Registry hive. No, I’m not making this up. So if that box doesn’t know about, say, ‘Hong Kong Standard Time’, and you try to use that in your SQL statement.. you’re hosed.

And yes, that is a real example from my own experience.
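The saving grace is that you can ask your server what it does know, via sys.time_zone_info (the “system metadata table” from the resources above), before you bake a zone name into a lookup table:

--What time zones does THIS server actually know about?
SELECT name, current_utc_offset, is_currently_dst
FROM sys.time_zone_info
WHERE name LIKE '%China%' OR name LIKE '%Hong Kong%';

If ‘Hong Kong Standard Time’ doesn’t come back in that result set, don’t store it.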

This article shows the “Windows standard format” time zone list. As you can see, they merged some zones with others because.. they felt like it? But apparently Central Europe Standard Time, Central European Standard Time, and Romance Standard Time (all UTC +01:00) were completely necessary to keep separate. Go figure.

In my use-case above, then, I couldn’t actually store the string Hong Kong Time, because my SQL instances (hence my Windows Servers) don’t know what that is. Thankfully, at least for this decade, it doesn’t look like Hong Kong and China will diverge in terms of their geopolitical directions, and we’re safe to assume that HKT = CST (China Standard Time, not to be confused with US Central Time!).

In another example, the typical go-to site for timezone questions says Japan observes “Japan Standard Time”. Obviously enough. But Microsoft, in their infinite wisdom, decided to call that “Tokyo Standard Time”. Go figure again.

It also kinda makes you wonder.. how does this work on SQL on Linux? No such thing as “the registry” there. I’m sure there’s an internal OS data-store that houses time-zone info, of course. Heck, they might even be better at it than Windows. But it makes you think.

world time zone map
Disclaimer: this is probably already out-of-date. And it’s from Wikipedia.

So What?

If you’re not already running SQL 2016 or upward, this should give you yet another compelling reason to upgrade. Seriously.

And don’t do what I did and attempt to store a “business locations with time-zone offsets” table, that you have to remind yourself every 6 months to go update (manually), and will inevitably fail to do so, and will not support any sensible manner of long-term historical reporting.
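One reason that hand-maintained offsets table is doomed: DST. AT TIME ZONE picks the correct offset for the specific date you hand it, which a static offset column never will. A quick sanity check you can run yourself:

--Same wall-clock time in Pacific, two different months.
--January comes back as -08:00 (PST), July as -07:00 (PDT), automatically.
SELECT
      [Winter] = CONVERT(datetime2(0), '2020-01-15 09:00') AT TIME ZONE 'Pacific Standard Time'
    , [Summer] = CONVERT(datetime2(0), '2020-07-15 09:00') AT TIME ZONE 'Pacific Standard Time';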

More to the point, don’t try to implement dynamic time-zone logic and calendaring yourself, in general. Because trust me, you’re not gonna get it right. Use the built-in tools, use the community resources, and be smart.

That’s all for today! ❤

Facepalms Per Hour

My current velocity is sometimes measured in FPH – facepalms per hour.

This is a rant. Fair warning.

I guess the new ‘Millennial’ colloquialism for “grumpy” or “sarcastic” is “salty”. So I’m feeling extra salty this week. For several reasons. One, it’s audit season. Two, I had to churn out about a dozen new reports in the span of 4 days because the manager who was supposed to be tracking that project dropped the ball and forgot they were due by the end of this month until… yeah, last Friday. Wheeeeee!

Thus, I decided, my current ‘velocity’ (a SCRUM/DevOps term for “how much work are you getting done”) shall be measured in FPH – Facepalms Per Hour. Currently I’m at 3. Earlier this week I was approaching the double-digits, when the lovely report consumers kept thinking of “just one more little thing” they forgot about until after I’d delivered the ‘final’ product.

‘Final’ actually being a meaningful adjective in this context approximately NEVER.

facepalm original picard
The original gangsta.

Change Logs

How best to describe this scenario while still maintaining separation of “real job” from “blog land”… Hrm. So let’s say we have a CRM, like most companies. This stores customers, among other things, in a database. And since it also stores sales transactions and financials, it’s heavily audited — it has a lot of change-tracking mechanisms.

Now, auditors come along and want a report of some specific type of change over time. I happily oblige. Then… PANIC! And not at the disco. “What are all these changes to these customers by these users who don’t have permission to make said changes?!?”

K, calm down sparky. Try not to sound the alarm; auditors are a sensitive bunch.

Turns out, those changes are, in a word, “fake”. You see, there’s this background “customer sync” process that keeps them up to date with another part of the CRM where the actual changes were made. But, because it’s written poorly, it thinks that ANY field change, even just the Name or Address (which a lot more CSR’s, customer service reps, have the permission to change, because, you know, that’s their job), constitutes a change to the ENTIRE customer record on the other end. So the change tracker logs a change to every single field on the receiving end of that sync process, even though nothing really changed on the source side except maybe one or two fields.

With me so far? Great. So now the question is, “Well, can we get a report that doesn’t show those ‘fake’ changes?” But wait, it has to be “system generated” and you’re not allowed to “filter” or “add special exceptions” to it, because it still needs to be audit-able.

So what you’re saying is, give me a report that shows me what I care about, but you’re not allowed to change the logic behind said report.

Riiiiiiight.

So I give them a new report. I don’t explain how the sausage is made, I just make it and serve it up. “But why is this different from the original report?”

double facepalm

Well, do you want the audit-able answer, or the real answer? The audit-able answer is, “We made a system change that allowed us to prevent the ‘fake’ changes from being logged incorrectly.”

The real answer is, “B*tch, I AM the system!” — meaning yes, I excluded those with some hacky logic, and you need to stop asking questions about it.
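If you’re curious what that “hacky logic” roughly amounts to (names invented, details sanitized beyond recognition): toss any logged field change where the old and new values are actually identical.

--Keep only the 'real' changes; the sync process logs a row per field
--even when that field's value didn't budge.
SELECT cl.CustomerID, cl.FieldName, cl.OldValue, cl.NewValue, cl.ChangedBy, cl.ChangedAt
FROM dbo.CustomerChangeLog cl   --hypothetical audit table
WHERE ISNULL(cl.OldValue, '') <> ISNULL(cl.NewValue, '');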

Anyway. Change Logs are super fun.

Reports

Speaking of reporting. I could really go on for pages about how terrible and broken this whole system of “request-based report development” is. But it’s frankly all we have right now. Until there’s sufficient business buy-in to the concept of agile data warehousing and collaborative cross-functional data modeling, shit just comes in one funnel and goes out another with a little sparkle spackled to it. And we call it a report.

Example, you say? Sure! Let’s say we run a special sale on certain types of widgets every quarter. We want to track how these ‘specials’ perform — do they increase our sales of those widgets? By what factor, compared to the other not-on-sale widgets? Can we trend this over several quarters?

Oh but wait. The data structures that govern widget pricing and time-span-based sale pricing, and the logic that relates customer orders to what pricing structure they used at the time of ordering, is awful, terrible, and changes every time there’s a new quarterly promotional sale.

So you’re saying you want a report that trends sales of widgets based on arbitrarily changing promotional pricing as compared to other widgets that may or may not be subject to ‘normal’ pricing during that same time period, all without a simple definitive data-point that says “This is a Quarterly Promo sale, and That is Not.”

elrond's facepalm
Mister Anderson… I mean, wait.

Let’s try to get at the root of the problem, shall we? The business doesn’t seem to understand that the way they implement promo-sales is detrimental to long-term/comparative reporting. The data model makes this harder, not easier. Can we perhaps put some heads together and come up with a compromise that both A) makes more business sense, and B) improves the data model to be a bit more intuitive?

In Closing…

What’s your FPH? What causes you to facepalm on a regular basis? Let me know in the comments!  :o)

Quickie: SSRS Sort Descending

Let’s make it real simple for the end-user to get the ‘best’ behavior by default. Let’s sort by our ‘Revenue’ column in DEscending order first.

You ever have a paginated report, i.e. a standard old-fashioned report with a big table (tablix, matrix), where you want the users to be able to sort by any column they want? And let’s say one such column is ‘Revenue’ — some money amount where the “biggest is the most important”. Like your top customer, for example.

Now, you know about ‘Interactive Sorting’ right? No?!? Oh it’s awesome! Let me remind you. It’s super easy to configure: just get to the Text Box Properties of the header box of whatever column you want to use, & go to the ‘Interactive Sorting’ tab. Like so:

ssrs interactive sorting property
Simple example — we just have ‘Details’ rows, no other row-groups or hierarchies.

Make sure you pick the appropriate Field in the Dataset to “Sort by”. Hint: it’s usually the one you’re displaying in that column!

But you know what’s kinda annoying? You can’t dictate a “first sort direction” — it just assumes that the first time you click the sort-arrows, you want ‘Ascending’ (lowest first). Then you can switch to ‘DEscending’ (highest first). This makes perfect sense for alpha values (strings), but not always for numeric values — at least not when you’re dealing with money, when generally the highest dollar amount is the most important!

So let’s make it real simple for the end-user to get the ‘best’ behavior by default. Let’s try to make it sort by our ‘Revenue’ column in DEscending order first. Ready?

ssrs interactive sort expression negative revenue
Set the ‘Sort by’ expression to the NEGATIVE of the field value! Simple as that.

It’s so small in that picture you might miss it. I’m doing this, but I’ll add whitespace below for easier reading:

= - Fields!Revenue.Value
..^look, there's the minus sign!
(assuming your viewing device hasn't mangled the fixed-width text)

You could also get more pedantic/readable and spell-it-out like so:

=(-1)*(Fields!Revenue.Value)

Right? Beautiful. So what does it look like on the report?

ssrs interactive sort by revenue descending
I clicked it! And it sorted highest-first just like I wanted!

Notice my silly hyphenated column names, like ‘Custom-er No’ and ‘Rev-enue’. That’s because those sort-icons take up space within the textbox, and you can’t control it, so the text wraps in a non-obvious manner. Thus, I purposefully added a hyphen and a line-break to the text, so I could control how they looked, instead of leaving it to the wiles of the SSRS render-er.

That’s all there is to it! Happy reporting. =)

 

The EAVil Cycle, Part 2

In which we discuss more about the EAV model and some of its merits and pitfalls.

continued from last week…

The Ugly (aka the “Wow really?!?”)

You’ll see this ‘creep’ even in product-catalog systems as mature as Amazon’s. If you search for (geeky as we are) graphics cards, and use the product attribute filters in the left pane to narrow it down, you’ll find that some correctly have their memory type (GDDR5, etc.) listed, while others may not. If you’re really unfortunate, there will be two semi-redundant attribute-sets that you’ll have to “juggle” between to really get at what you want. TVs, for example, may see both an “HDR support” (yes/no) and an “HDR type” (standard, ultra, etc.) — I’m kinda pulling those out of my arse for example’s sake, but you get the point.

Why does this happen? Because at some level, humans are still responsible for ‘tagging’ those products (for lack of better word). And as much encouragement and guidance as the ‘admin software’ may give them, they can (and do) still decide at times to side-step that guidance and say “Nope, I know better; make me this new thing!”

But isn’t that a problem with nearly all data-driven systems? Sure, of course it is. Yet with a model as flexible as EAV, the problem is intensely magnified by the fact that it’s made so easy to do — to ‘extend’.

so preoccupied with whether or not you could, you didn't stop to think if you should
It’s probably not the exact quote, for you pedants, but it’s close enough for government blog work.

And unfortunately, the biggest contributor to this problem is the lack of development-time and forethought given to the administration, or management, of the data. You see, this problem could be largely assuaged if the admin-toolset were the FIRST thought and priority in the roadmap. But so often, that thought comes LAST, if at all. So sure, your product feature tagging system looks great, it’s flexible and your customers love it. But you’re throwing tickets over the wall to your data team every time a requirement or use-case changes, or when you need to fix a data-quality problem caused by the users not knowing there was already a “Widget Type” before creating their new “Widget Kind” tag, or misspelling “Is Wierd” because English is weird and has more exceptions to the “I before E” rule than not.

Does this problem go away with a dedicated search-index or NoSQL technology like Elasticsearch or (shudder) MongoDB? Of course not! If anything, it may be worse. Maybe. But wait, those systems make it easier to de-dupe and manage redundancy & data quality, don’t they? Probably. I can’t speak from experience myself, but I’ve heard good things. Once again, it all comes down to the effort you’re willing to invest in the system. If you make data quality a priority, you’ll be happier with the experience. If you don’t, well you’re just another amateur data scientist complaining about dirty non-standardized/non-validated address fields, aren’t ya? =P

I joke with the data scientists, of course. What they do is awesome. They just tend to often re-invent the wheel of data-cleansing/data-wrangling that we DBAs have been doing for a few decades, because they didn’t know the right questions to ask or the right place to look. We need to get better at working together WITH them, not ‘for’ or ‘against’ them.

ninja cat riding a unicorn with laser-eyes
How the data scientists see themselves…

The Why or When (aka “Is it a decent model for this?”)

The long-story-short version is, consider your business and your data. Try to plan for the future, and anticipate potential changes and growth. It’s not easy, and we never “get it right the first time”. But we can try.

When your attributes are fairly static, and you know that you can tightly control them, you might consider a more rigid model. Something with a handful of lookup tables referenced by the main product entity. This is advantageous for performance and management, at the expense of scalability and extensibility.
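For concreteness, the “rigid” flavor is just plain old normalization, something like this sketch (names invented):

CREATE TABLE dbo.MemoryType (
      MemoryTypeID int IDENTITY PRIMARY KEY
    , Name         varchar(20) NOT NULL UNIQUE   --'GDDR5', 'GDDR6', ...
);

CREATE TABLE dbo.Product (
      ProductID    int IDENTITY PRIMARY KEY
    , Name         varchar(100) NOT NULL
    , MemoryTypeID int NULL REFERENCES dbo.MemoryType (MemoryTypeID)
    --every new kind of attribute = a new column + lookup table = a schema change
);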

When you literally need to support on-the-fly extension, and you’re absolutely married to SQL (i.e. not ready to venture out into NoSQL land just yet), the EAV model may fit the bill. Aaron’s article, and the comments therein, present some fairly valid and reasonable implementation suggestions to make it a little more palatable. Just beware the date — that was written back in 2009. Before we had such things as Elasticsearch and its ilk. I’d heavily encourage the consideration of purpose-built data-stores for this sort of thing, if you have any hope of scaling-out.

Other tools in your toolbox can help with this, too. For example, consider an in-memory data-grid for super-fast reads. The vast majority of data-access to these attributes & values is going to be reading, using it to filter & slice & dice a data-set. You can pay the small performance cost (e.g. write to the underlying SQL database) on the rare occasion when a write/change needs to occur.

In Conclusion

Proving the age-old rule of “Just because you CAN, doesn’t mean you SHOULD”, the EAV model is sometimes okay and sometimes not. You need to understand your business and your data to make that call. And you need to consider the magnitude of effort that may be involved in pivoting from one model to another. Unfortunately, in many cases, that part overshadows the rest, and the show business must go on.

queen the show must go on
You’re welcome again, ears.

Still, I encourage you to at least think about it, and be ready with that knowledge of pros/cons when the time is right to discuss it with stakeholders.

The EAVil Cycle

In which we discuss the EAV model and some of its merits and pitfalls.

EAV, or Entity-Attribute-Value, is a data model that’s been around the block. It’s typically injected into a relational database at some point during the overall application/architecture life-cycle, somewhere between when the team realizes that they’ve got way too many lookup tables for a “main business entity” thing, and when they finally make the shift into polyglot data stores.

Wake me up when that actually happens, successfully.

I’m not going to rehash aging internet arguments here, nor bore you with replicated diagrams that you could just as easily look up on Google Images. No, instead, I’m going to tell you why this model is not great, why it’s not bad, and how you should go about deciding if it’s not wrong for you.

Yes, I did negate all those adjectives on purpose. Mostly because nobody really enjoys working with these structures, regardless of how they’re implemented; customers and business stakeholders are ALWAYS CHANGING the requirements and the attributes in question. But, we press on.

Refs:

PS: Postgres is looking really freakin’ cool these days.

the good the bad and the ugly
Another Clint Eastwood pic… apologies if you’re not a fan. =P

The Good (aka “Not Bad”)

Proponents tell us that this model is easily searchable and easy to administer. The “searchable” bit is true; especially when you have the attribute-values pre-defined and don’t rely on end-user text-entry. But that’s true of basically any data model. The key here is that all attribute-values are effectively in one “search index”. But wait, don’t we have purpose-built search indexes nowadays? (Hint: see Elasticsearch.) This will come up again later.

Administerable? Administrable? Administratable? Damn you English! Anyway. Yes, again, if you’re fairly confident in your business users’ ability to effectively track and de-dupe (de-duplicate) incoming requirements/requests using their own brains/eyeballs and the admin tool-set that you build for them.

Oh, right, did I mention that? You have to build the admin app. Because you do NOT want to be writing ad-hoc SQL queries every time a new attribute requirement comes in. (Still, it’s better than making schema changes for new req’s, as I’ll discuss in a bit.)

Mainly, though, the biggest ‘pro’ of this model is that your business requirements, i.e. your attributes and the values they’re allowed to contain, can be flexible. The model allows a theoretically infinite amount of customization to suit your needs; though in practice, as Allen writes in the CodingBlocks article, you do run up against some pretty hard scalability hurdles right-quick. So in practice, you might want to consider more horizontally-scalable data stores, or (God help you) try scaling-out your SQL databases. (Spoiler-alert: big money big money!)

shut up and take my money
Millions of query-bucks…

The Bad (aka the “Not Great”)

Which brings me to the first ‘con’. Performance. If you’re modeling this in a relational DB, and you expect it to scale well, you’re probably overly optimistic. Or very small. (If the latter, great! But you don’t always want to be small, right? Right!)

Don’t get me wrong; you can make it work half-decent with good indexing and sufficient layers of abstraction (i.e. don’t write a “kitchen-sink view” that’s responsible for pivoting/piecing-together all the attributes for a product straight outta SQL). But again, is it really the right tool for the job?
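In case “kitchen-sink view” sounds too abstract, it’s the thing sketched below: a conditional-aggregation pivot that grows another line for every attribute anybody ever invents. The table and attribute names here are mine, but the shape is the point.

--Assumes a bare-bones EAV trio: Product (entity), Attribute (attribute names),
--and ProductAttributeValue (the values), which is roughly the textbook layout.
SELECT p.ProductID
    , p.Name
    , [Memory Type] = MAX(CASE WHEN a.Name = 'Memory Type' THEN pav.Value END)
    , [Memory (GB)] = MAX(CASE WHEN a.Name = 'Memory (GB)' THEN pav.Value END)
    , [HDR Support] = MAX(CASE WHEN a.Name = 'HDR Support' THEN pav.Value END)
    --...and another line here, forever, as the attribute list grows
FROM dbo.Product p
LEFT JOIN dbo.ProductAttributeValue pav ON pav.ProductID = p.ProductID
LEFT JOIN dbo.Attribute a ON a.AttributeID = pav.AttributeID
GROUP BY p.ProductID, p.Name;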

Momentary digression. SQL Server, or more generally, the relational database, has long been touted as the “Swiss army knife” of IT; we’ve thrown it at so many problems of different size and shape, that we’ve almost lost track of what it’s actually very GOOD at. Hint: it’s about relationships and normalization.

Another argument against it seems to be data integrity and enforcement. I can understand that, but again, with some clever software overlay and user-guidance, this can become almost a non-issue. But remember, your developers are the ones building said software. So that effort needs to be considered.

The Ugly (to be continued…)

The biggest problem, and quite a legit one, is ‘creep’ — both scope and feature. See, the inherent flexibility in the model will almost encourage data managers to be less careful and considerate when deciding when to add an attribute or value-set, and how to govern the data-set as a whole.

creep wish i was special
No, not THAT creep. But, you’re welcome ears.

Stay tuned for more…

Career Day: SQL DBA

Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!

Our benevolent overlord prompted us (months ago; I know, I’m always late to the party) to write about what we do in our day-to-day jobs as SQL Server professionals, aka #SQLCareer. The actual idea is to do 4 posts in a row, for 4 working days (either adjacently or the same day each week for 4 weeks).

Sticklers, us? Nope, never.

Without any kind of future guarantees, here is my first.

Morning Mash-up

Emails, tickets, and alerts, oh my! Today was actually pretty quiet on the monitoring front — no major performance issues, no interruptions, and no job failures. That one that occasionally takes 8+ hours did okay last night. More often than not, I have to kill it if I find it’s been running into the work-day. That should be automated. /backlogged

Re-wrote some analyst’s data change script to be more “friendly” and set-based, instead of relying on identity values — which is important when they can differ between Dev & Production environments. Deployed a few change requests, including that one.

virtualize all the things
Even SQL? Yes! Especially SQL. Just know the caveats.

On the side of less tedium, I fleshed-out more of the plan to virtualize our remaining bare-metal SQL servers. The usual ‘gotchas’ have to be discussed with the Server Admins — reserving compute resources, preventing the ‘noisy neighbor’ problem, and having enough storage to do the migration.

Yes, that means about 2x the storage as they’re currently using. Got a problem with that? No? Good. 😉

Finally, I worked on some code for querying the SSRS ReportServer database to obtain report metadata (including stored-proc usage, parameters, and user activity). The core concepts came from Ted Stathakis’s SQLSaturday session on SSRS 2016. Hopefully, some of that code will be coming to GitHub soon.
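To give a flavor of it (not the actual code, and with the usual caveat that the ReportServer schema is Microsoft’s to change whenever they like), the catalog and execution-log objects are query-able like anything else:

--Which reports exist, and how often have they actually run lately?
--In dbo.Catalog, Type = 2 means 'report'; ExecutionLog3 is the friendlier log view.
SELECT c.Name
    , c.[Path]
    , LastRun     = MAX(el.TimeStart)
    , RunsLast30d = COUNT(el.TimeStart)
FROM dbo.[Catalog] c
LEFT JOIN dbo.ExecutionLog3 el
       ON el.ItemPath = c.[Path]
      AND el.TimeStart >= DATEADD(DAY, -30, GETDATE())
WHERE c.[Type] = 2
GROUP BY c.Name, c.[Path]
ORDER BY RunsLast30d DESC;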

Afternoon Delight

“Delight” being sarcastic. No really, I love my job, but sometimes there are just some things that make you go ‘grr arg’.

First up, developing and finalizing some “data cleanup effort” reports. These are things that show the business users “what’s wrong with the data” and give them some guidance on how to “fix it”. Now, because there is no easy “management GUI” for this data, and because actual changes need to go through change-control, it’s not going to be a cake-walk. But given the right inputs and some domain knowledge, they can make the decisions as to what those changes should be, and then hand them off to the BA’s (business analysts). Usually in Excel form.

Excel spreadsheet with woman screaming in front of it
What? It’s under 1,048,576 rows! Barely…

Next we have the ‘grr’ part. This is where I spent about 40 minutes trying to explain to the 3rd-party software vendor exactly where and how to integrate with our core data systems. Most of the info should not be ‘new’ to them, but since I never know which support tech will be assigned to the issue, it’s best to err on the side of verbosity. To make things more confusing, the folks on my end, who have been interfacing with the vendor thus far, aren’t intimately familiar with the underlying technology bits, so the conversation gets a little dicey before we all arrive on the same page.

Fortunately, since I had their attention, I was able to sneak in a suggestion to upgrade the back-end database for the software (which we self-host, but is essentially managed by them & their software) to something from this decade. MySQL 5.0 is old, guys, mmkay?

Wrapping Up

A developer needs a new table & some initial data-load to production; it’s passed QA. Great! They wrote the query a little weird, but nothing ol’ SQLPrompt can’t fix.

Commit outstanding stored-proc code & RDL files (SSRS reports) to source control. Even if I’m the only one working on them, I’d rather they be safe & track-able.

Ruh-roh. A developer set off some proc in a Dev db, it’s been running for over 30 minutes… and then they cancelled it! But wait, it’s stuck in rollback. Yes, kids, rolling-back a transaction can take even longer than the transaction itself, due to rollbacks being single-threaded.

Now, since the user’s hitting of the ‘Cancel’ button (aka “Stop” in SSMS) does not wholly and entirely cause a kill SPID command, I do that for them (since only sysadmin or processadmin can kill sessions). Then I run ye olde kill 64 with statusonly (64 happened to be the SPID) to check on its rollback status. Good news, everyone! It’s at… 0%.

bad news nobody
You heard me.

And it stays at 0%. For fifteen minutes.

I move on to other tasks of course, since there’s nothing more I can do (short of pulling the plug on the server, which is never a good idea (unless you’re testing your Disaster Recovery Hope plan!)). We also create a ticket to fix this stored-proc so that it doesn’t “take forever”, even if it has to churn through millions of records.

Finally, the statusonly check starts moving. It’s about 45% rolled back, with about “300 seconds remaining”. This, as anybody who’s been a DBA will tell you, is in “Microsoft time”. You know those progress-bars you typically get in Windows while something is happening? They’ll start off fast, make it to around 90-some percent, tell you there’s about 1 minute remaining, then take at least five minutes to “finish that last little bit”? Yeah, that’s a “Microsoft minute”. Patent-pending.

But fortunately for us, it does finish in just about the time it promised, maybe a bit longer. Just in time for the Dev to test what they wanted to test before going home for the day. And just in time for me to have my evening not ruined by a runaway transaction.

PS: Anybody who’s ever worked with SQL, including yours truly, has done this (set off a long transaction and/or tried to rollback a long transaction) at least once in their career. If you ask a seasoned DBA or DB-Dev, and they say “No, I’ve never done that!”, they’re a dirty rotten liar. =D

Are You Not Entertained?

If the life of a DBA sounds like fun to you, get in touch with your local IT recruiter and find out what the job market is like! You’ll hear all sorts of doom & gloom about how “the Cloud is taking over everything” and “the DBA is dead; long live DevSecDataOpsUnicorns!”.

No, I joke. Yes, some of the fundamental responsibilities of the DBA are shifting to the cloud vendors, to IaaS/PaaS services, but at the end of the day, being in IT is about learning tech and working with people. Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!

tee-shirt that says "tough enough to be a dba, crazy enough to love it"
We end on a happy note.

Just Another Brick (in the) Wall

Over-engineering may be foolish, but under-engineering is just as perilous.

This month’s T-SQL Tuesday topic, thanks to Wayne Sheffield, is a doozy. It’s difficult to admit failures, even if we eventually overcame them. Mostly, it’s tough to fess up to those hours upon hours of what feels like wasted time preceding the hopefully inevitable breakthrough.

Let me tell you a tale of square pegs and round holes.

Get your mind out of the gutter.

And strap in. This puppy went over 2k words. I didn’t intend it that way, but, c’est la vie.

A Short History Lesson

I used to work for a software company that made K-12 educational assessment & reporting products. A large part of K12ED is dealing with learning standards, these historically fraught named concepts that students are supposed to have “mastered” and learned during their time in the US public school system. You ever hear a reference in pop-culture or media referring to “teaching to the test” or “state standardized testing” or similar? That’s what they’re talking about.

In the late 90’s, a bunch of states, including California in 1997, finalized and centralized a “list” of sorts, almost a database (but not formally, because let’s face it, this was still the early days of what we know as modern IT), of said Standards, which came to be unofficially known as the “CA ’97 Standards”. For over a decade, these learning goals dictated what was taught when in our schools. Government funding incentives, based on state standardized test scores, drove instruction to the point that teachers had little if any time to dynamically mold their students’ minds into critically thinking, multi-faceted, creative individuals.

But this article isn’t about my bias. This is about the broad, sweeping shift in the K12ED landscape called “Common Core”. As the technology sector grew more mature and started permeating more “traditional” industries, a vocal minority of thought-leaders had what they deemed an epiphany.

Hey, what if we took the old-guard educational bureaucracy, and all those disparate state standards, and turned it into a unified, technology-driven learning ecosystem?

Sounds great in theory, right? I mean, surely their intentions were good? Well, you know what they say about the road to Hell and how it’s paved…

Gosh, there goes my bias again.  Sorry, let me just tuck that back in its pocket.

Anyway. These new Core Standards introduced some new ways of thinking. For example, that some learning concepts are inter-related and “cross-cutting” (a fancy way of saying that sometimes a Math concept requires a fundamental Reading-Literacy knowledge-point to fully grasp). This had some very interesting impacts on the underlying technology systems which relied on, and housed, said Standards. Systems which, I might add, had existed for over a decade at this point, in many cases.

Bringing it Back to Data

Our company’s system was one such. Our key partner’s system, from which we quite literally inherited the traditional, relational database structure to house the CA ’97 Standards, was another. You see, back before RESTful APIs ran the world, software systems relied heavily on what we call “local data stores”. In order to show the teachers and administrators, who primarily used our system, the Standards in which their students were performing well (or poorly), we had to relate those Standards to the test questions that said students were tested on month after month, year after year. And, like so many other small businesses of the late 90’s / early 00’s, we had a trusty ol’ SQL Server lying around, just waiting to be loaded with all our precious data.

This was fine, for the most part. The legacy Standards conformed reasonably well to a relational data model, even though we had to throw in a bit of hierarchy (using the good ol’ adjacency list scheme). There wasn’t a complicated set of relationships from Standards in different subjects (Math, Science, History) to each other, and people didn’t care to do much in-depth analysis beyond “are my students getting well-prepared for the state tests?”.
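For anyone who hasn’t bumped into it, the adjacency list is the simplest possible hierarchy: each row points at its parent, and a recursive CTE walks the tree. A tiny sketch, with invented column names, roughly in the spirit of what we had:

CREATE TABLE dbo.Standard (
      StandardID       int IDENTITY PRIMARY KEY
    , ParentStandardID int NULL REFERENCES dbo.Standard (StandardID)  --NULL = top of the tree
    , Code             varchar(50)   NOT NULL
    , [Description]    nvarchar(400) NOT NULL
);

--Roll-up / drill-down is a recursive CTE away:
WITH Tree AS (
    SELECT StandardID, ParentStandardID, Code, Lvl = 0
    FROM dbo.Standard
    WHERE ParentStandardID IS NULL
    UNION ALL
    SELECT s.StandardID, s.ParentStandardID, s.Code, t.Lvl + 1
    FROM dbo.Standard s
    JOIN Tree t ON t.StandardID = s.ParentStandardID
)
SELECT * FROM Tree;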

Enter Common Core

You parents thought these things were complicated and convoluted — just Google “common core math problem” and you’ll find no shortage of critical satire. Well, the underlying data structures required to store these new Standards were going to be significantly more complex as well.

One of my main jobs, for about a year or two, was to oversee the importation of said Core Standards into our SQL database system. On the surface, it seemed reasonable — we had a hierarchy concept already, and we had a roll-up & drill-down mechanism for the handful of different “levels” of said hierarchy. But it was all very static. What that means, for us tech-heads, is that it was not easy to change or extend; not adaptable to new and expanded requirements. The older Standards adhered to a fairly strict hierarchy, and each level of roll-up had a distinct name. With Common Core, they broke out of that old mold, while simultaneously keeping some of the names only to change their meaning depending on context.

Think of it this way.  A group of cattle is called a herd.  A group of sheep is also called a herd.  And a group of seagulls is called a flock.

And I ra-a-an.. I ran so far a-wa-a-ay…

Sorry, where was I? Right, group names. So what if the government suddenly decided for us that a group of sheep will from now on be called a ‘gaggle’. But only if they’re all female. If the group contains any male sheep, it’s called a ‘herd’ still. And groups of cattle will still be called herds, unless it’s purely a group of bulls being raised for beef, in which case we call it a ‘meatsock’.

Have I lost you yet?  Of course I have!  This is pure nonsense, right?  Language does not work this way.  Moreover, hierarchies of groups of things do not work this way.

But I digress.  There was, despite my jest, a method to the madness of the new Common Core hierarchy groupings.  And I did learn it and understand it, for the most part.  The problem was that it threw our existing legacy data model to the wind.

Enter Academic Benchmarks

As usual with a legacy software system in a small company, the resources and buy-in for a data-layer overhaul were nil.  So it fell to us intrepid DB-Devs to shove that snowflake-shaped peg into the very square hole of the old relational model.  We sketched out plenty of ERDs, brainstormed how to tack-on to our existing structures, but nothing short of a miracle would make this new data conform to these old static norms.

Thankfully, the “geniuses” (and yes, that is used sarcastically) over at Academic Benchmarks, or AB for short (at least for the purposes of this post), had already done this. And we paid them, thousands of dollars per year, for the convenience of using their GUIDs to identify Standards across different systems and vendors. Never mind that they were just perpetuating the bad model of yesteryear; never mind that they provided zero support for data quality feedback loops. We could happily take their Excel files or CSVs and load them nearly straight into our database.

Enter, and Exit, ASN

While I was searching for the words to express how insufficient our data model was, I came across this little gem from the Gates Foundation: Achievement Standards Network, or ASN. (ASN stands for many other things, so as with all acronyms, it’s all about context; just fair warning for the Google-happy.) The architects here had understood that learning standards needed a better and more flexible model, not only in terms of storage, but also in terms of data interchange format. This new kid on the block called JSON had been making waves for a while, and was subsequently widely adopted by the tech industry in general, so it stood to reason that this would be the preferred format for publishing and serving the Standards from ASN.

Bonus: it was FREE.  Yes, really.  What a wonderful thought, I said to my team, to imagine never having to pay those crooks at AB another red cent!  Eventually.  After years of transition.  But alas, it was not to be.  See, AB had been around the industry for a long time.  They had their hooks in almost every learning content publisher and assessment vendor.  So as tempting as this shiny new source of academic truth may have been, sadly, it was not practical.

Enter the Contractor

Somewhere around the same time, we took on a promising new developer who not only had a very strong background in Math and CS fundamentals, but who had also proven his worth with real-world applications with actual “in the wild” deployments and users. He was a bit arrogant, actually. One could argue that he’d earned it, perhaps, but we didn’t appreciate always being told everything we were doing wrong, constantly, to the point that it was hard to hear the more important bits of wisdom and insight into how we could improve.

Despite that ego, one of his biggest contributions to the team was a fresh impetus to learn new things and branch out to new technologies.  To start looking at new Javascript frameworks.  To revisit OO fundamentals like Dependency Injection, and stop writing such procedural code.  To consider NoSQL data stores.  In particular, graph data stores.

Sadly, that last part came in too little, too late.

Side-note, on the crest of the micro-services wave, he diagrammed and proposed an entire system re-write for our core software product, using micro-services architecture and the concept of small purpose-dedicated data stores. It looked real purty, but was ultimately and completely impractical for a company of our size. If we were a “true startup” with millions in VC funding coming out the ears, and could afford to throw a brand new, young & hungry “2 pizza team” at the product, then sure. But that was not reality.

The Brick Wall

No two ways about it: we had to support these new Standards. So we soldiered on with our relational database tables. And we tacked-on additional entities and relationships, logic and code, to make it “almost” do what the Common Core Standards wanted to do. Effectively, what we were trying to do, was to shove that pretty round sparkly peg of graph data, into the dingy old square hole of a 15-year-old SQL table structure.

Somewhere along the way, AB caught up with the times and started offering other, less “flat” formats for their Standards data. So even though ASN was off the table, not all of our work toward JSON ingestion/conversion went to waste. Consuming the data exports from the vendors wasn’t a problem — we’d already been doing this. That was not the issue.

The issue, the brick wall against which we continually banged our heads, was the fact that we just plain couldn’t support the advanced & complex relationships and groupings (categorizations) of the new Standards. Which turned out, in retrospect, not to be the end of the world, because honestly it would take years, if not decades, for the educational system’s old-guard mentality to even comprehend such relationships and categorizations, let alone how they could help shape classroom instruction toward better outcomes for their students.

Good lord, that sounded like a bunch of jargon.

What I mean, in plainer English, is that we spent a lot of time worrying and arguing about something that did not matter as much as we thought it did. The consumers of our system, the teachers and principals and such, honestly didn’t care about all that. They just wanted to know if their kids were on-track to be “Proficient” on the state tests so their funding would remain steady. (I don’t give them enough credit, I know; teachers themselves also needed to know, on a regular basis, which kids needed special attention in what areas of learning, but that’s beyond the scope of most generalized reporting tools.)

Hindsight is Always 20/20

So what was the lesson here? I don’t want to overlook the fact that we were still using the wrong data storage technology for the problem, fundamentally. Or at least, the wrong data model. But we live in a real world where software needs to be delivered on-time, in-budget, with less-than-perfect teams who have less experience and expertise than they feel they should. So instead of butting our heads against that brick wall, let’s try to remember to be pragmatic. To be realistic about what’s feasible for whom & when; and to adapt the parts and segments of our application infrastructure, appropriately and efficiently, to the business problems and use-cases at-hand. Over-engineering may be foolish, but under-engineering is just as perilous.