In light of https://natej.dev/2021/08/13/new-beginnings/ , which you should also read if you care about personal celebrations of life and love. I also have a bit of good news to celebrate at work. I’m finally not the LONE DBA!!
Look ma, I haz underlings!
Also, if you’re at all into the collectibles market, our company is hiring. And our tech org is growing FAST. Like, blazingly fast. Come check us out and apply. We’re super nice folks. (Except this Nate the DBA character, he gets ornery.) =P
I have almost zero down time anymore so don’t expect an impressive amount of content out of me, but I am getting back into writing and teaching, so I hope to do at least a few TSQL-Tuesdays over the rest of this year.
Instead of relying on my home internet bandwidth, I simply RDP into my workstation that’s still physically sitting in the office.
This is like a month late, but I didn’t have anything for the latest TSQL Tuesday on Unit Testing. Kudos to those of you who are doing it… it’s REALLY HARD. Harder than it should be anyway.
So as we all started working from home last month, it dawned on me that, apparently, the way I work from home is drastically different than most everybody around me. And it got me really curious, whether this is sort of generally applicable IT workers, or whether I’m really the odd-duck!
Probably like many of you, my primary workstation at the office is a laptop with a docking station hooked up to dual monitors and a full size keyboard and mouse. It makes going to meetings easier, and in theory I’m able to take it to the occasional conference. (I say “in theory” because, as I’ll explain in a moment, it rarely happens.)
Now, apparently the ‘norm’, at least for non-IT staff, is to take their laptop home, maybe with a dock & an extra monitor or two, fire it up, connect to the VPN, and off you go. Some people need printers and/or scanners, which makes it more complicated. But for now let’s focus on the typical knowledge worker that just needs a PC. And of course, if their workstation at the office is not a laptop, IT is tasked to provide them with one. Which, during this Coronavirus outbreak, has caused its own subset of issues with supply & demand at the big-box tech retailers.
Contrast that to me, and hopefully maybe some of you. I have the good fortune of owning a plethora of computing devices, most of which are at least suitable for running VPN & RDP (Remote Desktop). All of the, say, resources which I need access to and that I manage and manipulate on a regular basis, are in a data center, a big building with tons of racks of servers and gear and super-high-speed connections. And the office, of course, has a much higher-speed/bandwidth connection TO that data center than anybody does from their home. Some of you may see where I’m going with this.
So instead of relying on my home internet bandwidth to keep up with all the bandwidth-intensive tasks that constitute my workday — including, as my fellow DBAs will attest, dozens upon dozens of queries against the SQL Servers in SSMS — I simply RDP into my workstation at the office. Yes, my work laptop, which is still sitting in its dock on my desk in an empty (mostly) building. It’s the closest thing to actually being there.
Here’s what that allows me to do. All my network resources are just as quick and responsive as they are at the office. My code repositories and my Visual Studio windows are instantly available. My queries return results in milliseconds because they’re only traveling from the data center to the office. The only thing traveling the longer and slower distance to my house is the image of the screen from the work laptop. Make sense?
This is the same principle that drives some of these new “cloud gaming” services like Google Stadia or NVidia GeForce Now. The only thing you as the player really need is the continuous moving picture of the game, not all the hard work behind-the-scenes that it takes the GPU to render it. (Of course, the thing they haven’t really solved for is the input latency and the turnaround time of the player’s action being replicated on the screen and leading to the next action in the chain.)
Okay, I get it, it’s not for everyone
I understand some workers need more than that to do their jobs. And some folks don’t have the handful of spare PCs that I do. So I get it. Not everybody can or should work this way. But I was quite shocked that, out of all the folks I work with, I was the only who thought this was THE way to work remotely. Or at least the best way.
Surely, most people with a.. what we might call “gray collar” jobs — we work at a desk all day but we aren’t Wall Street and we don’t wear suits, so.. your typical middle class occupations, yeah? — anyway. We all have a workstation at the office. And surely we have at least one computer at home, even if it’s another laptop. Right? But no, apparently, the “tablet revolution” was a thing, and people don’t have big ol’ desktops or even traditional laptops at home much anymore.
Not only that, even if you could afford to invest in one FOR this season of mass-remote-work, you’d have had a bitch of time getting ahold of one, because everybody and their mother decided now was the time to buy. And not without good reason. Most of that purchasing was actually done BY the small-medium enterprise business market, because their IT departments had to buy a bunch of new laptops for their non-IT colleagues to take home for work!
I guess there’s not much of a big important point or lesson here.
Is there ever?
If anything, I would encourage people to TRY working from home this way, if they have the resources to do so. It significantly lessens the burden on your IT folks, as long as your home PC that you’re using to VPN is secure. That’s really the biggest variable in this equation, and in some cases, it’s why they actually prefer to issue you a company-imaged/company-secured device instead. But, thankfully, with a few network and firewall tweaks, they can lock down the VPN traffic so that it doesn’t allow your home device to talk to anything BUT your own workstation — or at least, any computer in the “workstations” group. So yes, there are some trade-offs. But again, especially if your employer has had trouble keeping up with the surge in demand for “loaner laptops”, maybe suggest you try this way and see if it can work for you.
Yes, I wrote a loop to find a gap in an identity-value sequence. Sue me. (Or don’t, that’d be cool too.)
I’ve done some things I’m not proud of. We all do, in IT, typically when we’re under-the-gun for a deadline or when the systems and frameworks in which we work have some sort of nuance or limitation that we just cannot get around, past, or over. And so we hack. We write code we’re not happy with. We even write code that we despise with every fiber of our well-intentioned being. But it has to be done. Because there’s no other choice.
With that somewhat depressing introduction, I give you my latest “dirty deed, done dirt cheap” (or for a reasonable salary, but still very reprehensible for anybody who values clean code and clean architecture). Yes, this is an actual stored-procedure signature that I wrote. Thank gods it’s not integrated into any software; it just happened to serve a niche purpose at the time. But still! I felt ICKY.
Description: Search for an available ID value in Category starting
at the low end (i.e. find a gap), because it's aesthetically pleasing
to have Root level Categories use low ID values. Yes, really.
But mostly b/c they need to be the same on DEV & PROD environments.
CREATE OR ALTER PROCEDURE dbo.[GetNextAvailableRootCategoryID]
@StartAtID int = 10 /*
'1' is not used but we don't feel like starting THAT low
(that'd be like "starting over" or "starting from scratch")
You never get a checkbook with the first check# being '1', right?
Wait, what's a checkbook?
Ask your parents. Or your grandparents. */
, @MaxID int = 1000 /*
Throw an error if we can't find any gaps before this point */
--and you can imagine how that went...
--something about a loop, a couple IF/ELSE's, a RAISERROR or two,
--and a RETURN @val in a B-tree(eeee).
I know, I know. You’re saying “wow that’s really not that bad!”. Or “wow, you REALLY couldn’t do something better about the constraints to make it less terrible?” Again, time, priorities, and energy. What’s 10 minutes in the big scheme of things, even if the code is silly and perpetuates the bad decisions that came before it? Should the overlaying systems actually depend on matching ID values? NO!!! Obviously not. But was that even on the software development roadmap for the next quarter? Also no.
So, what’ve you got? Show me your dirty deeds done in SQL. It’s okay, nobody’s judging! Except yourself, if you’ve any skin in the game. We’re all our own worst critics. But if we can’t laugh at ourselves sometimes, what the hell are we doing here? =)
Be humble, but be confident in your own expertise. Realize that you will sometimes be the teacher, and sometimes the student.
Howdy folks! It’s been a while. My last post was in October of last decade (har har)! I’ve been busy with otherprojects, work, and life in general. 🙂
Now, this month’s invitation is hosted by John Shaulis. I’m always excited to see someone in the #SQLcommunity that I’ve never heard of before. Looks like he’s familiar with the same WordPress templates as I am. Heh. I like it.
Side-note, I’ve been planning to buy my own domain and remove the ads here for a better reading experience (and more professional feel), so hang in there… it’s happening soon!
And now without further ado.
Everybody reading this knows what impostor syndrome is. I won’t bore you with that. Fact is, if you’re in tech, you either have it or you know someone who does. If neither, well you’re probably a narcissist surrounded by other narcissists and you might wanna look for another job. Heh heh.
I’ll share a personal and ongoing example of where I’m constantly feeling “not good enough at my job”. It’s about server migrations. You know, where you take a running production SQL Server instance, and you have to move it to new hardware/infrastructure. Maybe it’s a physical (bare metal) cluster to a virtualized environment. Maybe it’s just one VM to another VM on a newer platform. It typically involves taking a maintenance window and convincing business stakeholders that the “downtime” is worthwhile.
Now surely we’ve all heard of DBATools by now, right? (If not, go there right now and check ’em out!) They were BUILT to do migrations. Yet I’ve not been able to successfully use them as the whole and ONLY toolset for that purpose. Let me clarify: I’ve used pieces of the framework (such as the copy logins / users command, and others) to HELP me in the migration project, but never been able to simply fire off Start-DbaMigration and go get coffee and watch as things magically successfully fall into place.
Even when I’ve built a beautiful check-list, lined up all my scripts, scheduled things via Agent Jobs, double-checked names and permissions and networks and drive letters and shares. Still, something inevitably goes wrong. Maybe it’s just that a database refuses to go into READ_ONLY mode when I ask it to. Maybe it’s that a whole series of jobs get lost because they were in the wrong category. Maybe user permissions don’t come over on this database because I didn’t sacrifice a chicken on the night of the blood-moon while the wind blew north-east.
Surely, by this point in my 10+ years as a database professional, 4 years at this particular company and role, I would be able to do this with my eyes closed. With no hiccups or misfires. Right? RIGHT??
Wrong. Murphy’s Law is a real thing. More than that, things CHANGE. The environment is always evolving to fit the business needs. The technology is always just a little ahead of my own knowledge-base. And every SQL instance in this environment is, unfortunately, still, a ‘pet‘. Not a ‘cattle‘ (cow?). They’re each carefully and fearfully constructed to suit a specific set of application and business needs, each with their own nuances and barely-documented dependencies.
Plus, it’s not like we’re doing these migrations very often. Anything you don’t do on a constant basis, at least a few times per week, tends to sink back toward the bottom of the mental stack, and you forget the details and gotchas that were involved because your brain needs to keep more important and relevant things near the top, for what you’re working on “right now”. This is normal, at least in my mind.
DevOps stole the cookie from the cookie jar!
Just hire a DevOps Engineer to solve all your problems. Then everything will be all sunshine and rainbows and unicorns. All your servers will be cattle, replaceable little containers that are allowed to fail because there’s always another one waiting to spin-up and replace it. Not like your DATA is important or anything. It’s only the lifeblood of the entire company.
Not really sure where I was going with this. I love DevOps, don’t get me wrong, and it does have a part to play in making the traditional RDBMS platform more agile, but the devilish details and difficulties therein are ALWAYS overlooked/brushed-off in popular discourse, and it’s got me a little jaded.
I mostly wanted to finish the song I’d started with the title-blocks. =P
But srsly. Instead of feeling guilty about your impostor syndrome, just own it. Be humble, but be confident in your own expertise. Realize that you will sometimes be the teacher, and sometimes the student. And hopefully more often the latter, because there’s exponentially more to learn every year we evolve on this crazy spinning rock we call Earth.
In my sample script, I have a “transaction table” where I insert a few events for each office, with the ‘original date’ in PST, and I show how to easily convert that to each office’s local time, or visa-versa.
I’ve ranted about times, datetimes, and the like before. As most programmers & IT pros do, I loathe time zones with a mild passion. As one of my favorite #SQLCommunity members quips:
daylight savings time was actually created by a government works project to ensure that programmers could forever write tedious conditional logic in their date-based queries [and code].
In fact, I’m not even going to repeat what 2 (and many more, I’m sure) people have already said. Go check out their posts above! But since you’re here, this is how I use it for my reporting environment.
To start with, the transactional data is in PST/PDT — i.e. Pacific Time with DST fluctuation. Yes, it’s horrible. No, I don’t know what happens to events or jobs at 2am on the “Fall Back” date, or between 2am and 3am on the “Spring Forward” date. No, I can’t change it right now. Stop whining.
Now, I have offices in Paris France, Hong Kong, and Beijing China. These are 3 different “time zones”, but only 2 different offsets — China and Hong Kong are in the same bucket, namely, UTC +08:00. More on that later.
So I have my OfficeLocation lookup table:
Office | TimeZone --------|------------- Paris | 'Central Europe Standard Time' Beijing | 'China Standard Time' HK | 'China Standard Time'
(Again, see below for why we can’t call HK’s zone “Hong Kong Time” like most websites/APIs would assume.)
Now, the cool thing about this is, we can pull those strings into a variable, or use them straight from the table, to convert our PST/PDT times to the appropriate zone.
Here’s a variable example:
DECLARE @TimeZoneStr sysname; --"sysname" is just nvarchar(128) SELECT @TimeZoneStr = TimeZone FROM OfficeLocation WHERE Office = 'Paris'
DECLARE @MyTimeNow datetime = GETDATE(); DECLARE @TimeInParis datetime; SELECT @TimeInParis = @MyTimeNow AT TIME ZONE 'Pacific Standard Time' --Converts to datetimeoffset AT TIME ZONE @TimeZoneStr --Shifts it to Paris time
PRINT ('The time now in Paris is ' + CONVERT(varchar(30), @TimeInParis, 121);
And here’s an example using the field straight from the table.
, [Time in my location] = GETDATE()
, [Time in remote office location] = CONVERT(datetime, GETDATE()
AT TIME ZONE 'Pacific Standard Time'
AT TIME ZONE ol.TimeZone)
FROM OfficeLocation ol
See the Gist for a full-fledged sample script. In it, I have a “transaction table” where I insert a few events for each office, with the ‘original date’ in PST, and I show how to easily convert that to each office’s local time, or visa-versa.
Here’s the major catch. The information available to YOUR instance of SQL Server is pulled from that server’s Windows Registry hive. No, I’m not making this up. So if that box doesn’t know about, say, ‘Hong Kong Standard Time’, and you try to use that in your SQL statement.. you’re hosed.
And yes, that is a real example from my own experience.
This article shows the “Windows standard format” time zone list. As you can see, they merged some zones with others because.. they felt like it? But apparently Central Europe Standard Time, Central European Standard Time, and Romance Standard Time (all UTC +01:00) were completely necessary to keep separate. Go figure.
In my use-case above, then, I couldn’t actually store the string Hong Kong Time, because my SQL instances (hence my Windows Servers) don’t know what that is. Thankfully, at least for this decade, it doesn’t look like Hong Kong and China will diverge in terms of their geopolitical directions, and we’re safe to assume that HKT = CST (China Standard Time, not to be confused with US Central Time!).
In another example, the typical go-to site for timezone questions says Japan observes “Japan Standard Time”. Obviously enough. But Microsoft, in their infinite wisdom, decided to call that “Tokyo Standard Time”. Go figure again.
It also kinda makes you wonder.. how does this work on SQL on Linux? No such thing as “the registry” there. I’m sure there’s an internal OS data-store that houses time-zone info, of course. Heck, they might even be better at it than Windows. But it makes you think.
If you’re not already running SQL 2016 or upward, this should give you yet another compelling reason to upgrade. Seriously.
And don’t do what I did and attempt to store a “business locations with time-zone offsets” table, that you have to remind yourself every 6 months to go update (manually), and will inevitably fail to do so, and will not support any sensible manner of long-term historical reporting.
More to the point, don’t try to implement dynamic time-zone logic and calendaring yourself, in general. Because trust me, you’re not gonna get it right. Use the built-in tools, use the community resources, and be smart.
My current velocity is sometimes measured in FPH – facepalms per hour.
This is a rant. Fair warning.
I guess the new ‘Millenial’ colloquialism for “grumpy” or “sarcastic” is “salty“. So I’m feeling extra salty this week. For several reasons. One, it’s audit season. Two, I had to churn out about a dozen new reports in the span of 4 days because the manager who was supposed to be tracking that project dropped the ball and forgot they were due by the end of this month until… yeah, last Friday. Wheeeeee!
Thus, I decided, my current ‘velocity’ (a SCRUM/DevOps term for “how much work are you getting done”) shall be measured in FPH – Facepalms Per Hour. Currently I’m at 3. Earlier this week I was approaching the double-digits, when the lovely report consumers kept thinking of “just one more little thing” they forgot about until after I’d delivered the ‘final’ product.
‘Final’ actually being a meaningful adjective in this context approximately NEVER.
How best to describe this scenario while still maintaining separation of “real job” from “blog land”… Hrm. So let’s say we have a CRM, like most companies. This stores customers, among other things, in a database. And since it also stores sales transactions and financials, it’s heavily audited — it has a lot of change-tracking mechanisms.
Now, auditors come along and want a report of some specific type of change over time. I happily oblige. Then… PANIC! And not at the disco. “What are all these changes to these customers by these users who don’t have permission to make said changes?!?”
K, calm down sparky. Try not to sound the alarm; auditors are a sensitive bunch.
Turns out, those changes are, in a word, “fake”. You see, there’s this background “customer sync” process that keeps them up to date with another part of the CRM where the actual changes were made. But, because it’s written poorly, it thinks that ANY field change, even just the Name or Address (which a lot more CSR’s, customer service reps, have the permission to change, because, you know, that’s their job), constitutes a change to the ENTIRE customer record on the other end. So the change tracker logs a change to every single field on the receiving end of that sync process, even though nothing really changed on the source side except maybe one or two fields.
With me so far? Great. So now the question is, “Well, can we get a report that doesn’t show those ‘fake’ changes?” But wait, it has to be “system generated” and you’re not allowed to “filter” or “add special exceptions” to it, because it still needs to be audit-able.
So what you’re saying is, give me a report that shows me what I care about, but you’re not allowed to change the logic behind said report.
So I give them a new report. I don’t explain how the sausage is made, I just make it and serve it up. “But why is this different from the original report?”
Well, do you want the audit-able answer, or the real answer? The audit-able answer is, “We made a system change that allowed us to prevent the ‘fake’ changes from being logged incorrectly.”
The real answer is, “B*tch, I AM the system!” — meaning yes, I excluded those with some hacky logic, and you need to stop asking questions about it.
Anyway. Change Logs are super fun.
Speaking of reporting. I could really go on for pages about how terrible and broken this whole system of “request-based report development” is. But it’s frankly all we have right now. Until there’s sufficient business buy-in to the concept of agile data warehousing and collaborative cross-functional data modeling, shit just comes in one funnel and goes out another with a little sparkle spackled to it. And we call it a report.
Example, you say? Sure! Let’s say we run a special sale on certain types of widgets every quarter. We want to track how these ‘specials’ perform — do they increase our sales of those widgets? By what factor, compared to the other not-on-sale widgets? Can we trend this over several quarters?
Oh but wait. The data structures that govern widget pricing and time-span-based sale pricing, and the logic that relates customer orders to what pricing structure they used at the time of ordering, is awful, terrible, and changes every time there’s a new quarterly promotional sale.
So you’re saying you want a report that trends sales of widgets based on arbitrarily changing promotional pricing as compared to other widgets that may or may not be subject to ‘normal’ pricing during that same time period, all without a simple definitive data-point that says “This is a Quarterly Promo sale, and That is Not.”
Let’s try to get at the root of the problem, shall we? The business doesn’t seem to understand that the way they implement promo-sales is detrimental to long-term/comparative reporting. The data model makes this harder, not easier. Can we perhaps put some heads together and come up with a compromise that both A) makes more business sense, and B) improves the data model to be a bit more intuitive?
What’s your FPH? What causes you to facepalm on a regular basis? Let me know in the comments! :o)
Let’s make it real simple for the end-user to get the ‘best’ behavior by default. Let’s sort by our ‘Revenue’ column in DEscending order first.
You ever have a paginated report, i.e. a standard old-fashioned report with a big table (tablix, matrix), where you want the users to be able to sort by any column they want? And let’s say one such column is ‘Revenue’ — some money amount where the “biggest is the most important”. Like your top customer, for example.
Now, you know about ‘Interactive Sorting’ right? No?!? Oh it’s awesome! Let me remind you. It’s super easy to configure: just get to the Text Box Properties of the header box of whatever column you want to use, & go to the ‘Interactive Sorting’ tab. Like so:
Make sure you pick the appropriate Field in the Dataset to “Sort by”. Hint: it’s usually the one you’re displaying in that column!
But you know what’s kinda annoying? You can’t dictate a “first sort direction” — it just assumes that the first time you click the sort-arrows, you want ‘Ascending’ (lowest first). Then you can switch to ‘DEscending’ (highest first). This makes perfect sense for alpha values (strings), but not always for numeric values — at least not when you’re dealing with money, when generally the highest dollar amount is the most important!
So let’s make it real simple for the end-user to get the ‘best’ behavior by default. Let’s try to make it sort by our ‘Revenue’ column in DEscending order first. Ready?
It’s so small in that picture you might miss it. I’m doing this, but I’ll add whitespace below for easier reading:
= - Fields!Revenue.Value
..^look, there's the minus sign!
(assuming your viewing device hasn't mangled the fixed-width text)
You could also get more pedantic/readable and spell-it-out like so:
Right? Beautiful. So what does it look like on the report?
Notice my silly hyphenated column names, like ‘Custom-er No’ and ‘Rev-enue’. That’s because those sort-icons take up space within the textbox, and you can’t control it, so the text wraps in a non-obvious manner. Thus, I purposefully added a hyphen and a line-break to the text, so I could control how they looked, instead of leaving it to the wiles of the SSRS render-er.
In which we discuss more about the EAV model and some of its merits and pitfalls.
continued from last week…
The Ugly (aka the “Wow really?!?”)
You’ll see this ‘creep’ even in product-catalog systems as mature as Amazon’s. If you search for (geeky as we are) graphics cards, and use the product attribute filters in the left pane to narrow it down, you’ll find that some correctly have their memory type (GDDR5, etc.) listed, while others may not. If you’re really unfortunate, there will be two semi-redundant attribute-sets that you’ll have to “juggle” between to really get at what you want. TVs, for example, may see both an “HDR support” (yes/no) and an “HDR type” (standard, ultra, etc.) — I’m kinda pulling those out of my arse for example’s sake, but you get the point.
Why does this happen? Because at some level, humans are still responsible for ‘tagging’ those products (for lack of better word). And as much encouragement and guidance as the ‘admin software’ may give them, they can (and do) still decide at times to side-step that guidance and say “Nope, I know better; make me this new thing!”
But isn’t that a problem with nearly all data-driven systems? Sure, of course it is. Yet with a model as flexible as EAV, the problem is intensely magnified by the fact that it’s made so easy to do — to ‘extend’.
And unfortunately, the biggest contributor to this problem is the lack of development-time and forethought given to the administration, or management, of the data. You see, this problem could be largely assuaged if the admin-toolset were the FIRST thought and priority in the roadmap. But so often, that thought comes LAST, if at all. So sure, your product feature tagging system looks great, it’s flexible and your customers love it. But you’re throwing tickets over the wall to your data team every time a requirement or use-case changes, or when you need to fix a data-quality problem caused by the users not knowing there was already a “Widget Type” before creating their new “Widget Kind” tag, or misspelling “Is Wierd” because English is weird and has more exceptions to the “I before E” rule than not.
Does this problem go away with a dedicated search-index or NoSQL technology like Elasticsearch or (shudder) MongoDB? Of course not! If anything, it may be worse. Maybe. But wait, those systems make it easier to de-dupe and manage redundancy & data quality, don’t they? Probably. I can’t speak from experience myself, but I’ve heard good things. Once again, it all comes down to the effort you’re willing to invest in the system. If you make data quality a priority, you’ll be happier with the experience. If you don’t, well you’re just another amateur data scientist complaining about dirty non-standardized/non-validated address fields, aren’t ya? =P
I joke with the data scientists, of course. What they do is awesome. They just tend to often re-invent the wheel of data-cleansing/data-wrangling that we DBAs have been doing for a few decades, because they didn’t know the right questions to ask or the right place to look. We need to get better at working together WITH them, not ‘for’ or ‘against’ them.
The Why or When (aka “Is it a decent model for this?”)
The long-story-short version is, consider your business and your data. Try to plan for the future, and anticipate potential changes and growth. It’s not easy, and we never “get it right the first time”. But we can try.
When your attributes are fairly static, and you know that you can tightly control them, you might consider a more rigid model. Something with a handful of lookup tables referenced by the main product entity. This is advantageous for performance and management, at the expense of scalability and extensibility.
When you literally need to support on-the-fly extension, and you’re absolutely married to SQL (i.e. not ready to venture out into NoSQL land just yet), the EAV model may fit the bill. Aaron’s article, and the comments therein, present some fairly valid and reasonable implementation suggestions to make it a little more palatable. Just beware the date — that was written back in 2009. Before we had such things as Elasticsearch and its ilk. I’d heavily encourage the consideration of purpose-built data-stores for this sort of thing, if you have any hope of scaling-out.
Other tools in your toolbox can help with this, too. For example, consider an in-memory data-grid for super-fast reads. The vast majority of data-access to these attributes & values is going to be reading, using it to filter & slice & dice a data-set. You can pay the small performance cost (e.g. write to the underlying SQL database) on the rare occasion when a write/change needs to occur.
Proving the age-old rule of “Just because you CAN, doesn’t mean you SHOULD”, the EAV model is sometimes okay and sometimes not. You need to understand your business and your data to make that call. And you need to consider the magnitude of effort that may be involved in pivoting from one model to another. Unfortunately, in many cases, that part overshadows the rest, and the show business must go on.
Still, I encourage you to at least think about it, and be ready with that knowledge of pros/cons when the time is right to discuss it with stakeholders.
In which we discuss the EAV model and some of its merits and pitfalls.
EAV, or Entity-Attribute-Value, is an data model that’s been around the block. It’s typically injected into a relational database at some point during the overall application/architecture life-cycle, somewhere between when the team realizes that they’ve got way too many lookup tables for a “main business entity” thing, and when they finally make the shift into polyglot data stores.
Wake me up when that actually happens, successfully.
I’m not going to rehash aging internet arguments here, nor bore you with replicated diagrams that you could just as easily look up on Google Images. No, instead, I’m going to tell you why this model is not great, why it’s not bad, and how you should go about deciding if it’s not wrong for you.
Yes, I did negate all those adjectives on purpose. Mostly because nobody really enjoys working with these structures, regardless of how they’re implemented; customers and business stakeholders are ALWAYS CHANGING the requirements and the attributes in question. But, we press on.
Proponents tell us that this model is easily searchable and easy to administer. The “searchable” bit is true; especially when you have the attribute-values pre-defined and don’t rely on end-user text-entry. But that’s true of basically any data model. The key here is that all attribute-values are effectively in one “search index”. But wait, don’t we have purpose-built search indexes nowadays? (Hint: see Elasticsearch.) This will come up again later.
Administerable? Administrable? Administratable? Damn you English! Anyway. Yes, again, if you’re fairly confident in your business users’ ability to effectively track and de-dupe (de-duplicate) incoming requirements/requests using their own brains/eyeballs and the admin tool-set that you build for them.
Oh, right, did I mention that? You have to build the admin app. Because you do NOT want to be writing ad-hoc SQL queries every time a new attribute requirement comes in. (Still, it’s better than making schema changes for new req’s, as I’ll discuss in a bit.)
Mainly, though, the biggest ‘pro’ of this model is that your business requirements, i.e. your attributes and the values they’re allowed to contain, can be flexible. The model allows a theoretically infinite amount of customization to suit your needs; though in practice, as Allen writes in the CodingBlocks article, you do run up against some pretty hard scalability hurdles right-quick. So in practice, you might want to consider more horizontally-scalable data stores, or (God help you) try scaling-out your SQL databases. (Spoiler-alert: big money big money!)
The Bad (aka the “Not Great”)
Which brings me to the first ‘con’. Performance. If you’re modeling this in a relational DB, and you expect it to scale well, you’re probably overly optimistic. Or very small. (If the latter, great! But you don’t always want to be small, right? Right!)
Don’t get me wrong; you can make it work half-decent with good indexing and sufficient layers of abstraction (i.e. don’t write a “kitchen-sink view” that’s responsible for pivoting/piecing-together all the attributes for a product straight outta SQL). But again, is it really the right tool for the job?
Momentary digression. SQL Server, or more generally, the relational database, has long been touted as the “Swiss army knife” of IT; we’ve thrown it at so many problems of different size and shape, that we’ve almost lost track of what it’s actually very GOOD at. Hint: it’s about relationships and normalization.
Another argument against it seems to be data integrity and enforcement. I can understand that, but again, with some clever software overlay and user-guidance, this can become almost a non-issue. But remember, your developers are the ones building said software. So that effort needs to be considered.
The Ugly (to be continued…)
The biggest problem, and quite a legit one, is ‘creep’ — both scope and feature. See, the inherent flexibility in the model will almost encourage data managers to be less careful and considerate when deciding when to add an attribute or value-set, and how to govern the data-set as a whole.
Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!
Our benevolentoverlord prompted us (months ago; I know, I’m always late to the party) to write about what we do in our day-to-day jobs as SQL Server professionals, aka #SQLCareer. The actual idea is to do 4 posts in a row, for 4 working days (either adjacently or the same day each week for 4 weeks).
Sticklers, us? Nope, never.
Without any kind of future guarantees, here is my first.
Emails, tickets, and alerts, oh my! Today was actually pretty quiet on the monitoring front — no major performance issues, no interruptions, and no job failures. That one that occasionally takes 8+ hours did okay last night. More often than not, I have to kill it if I find it’s been running into the work-day. That should be automated. /backlogged
Re-wrote some analyst’s data change script to be more “friendly” and set-based, instead relying on identity values — which is important when they can differ between Dev & Production environments. Deployed a few change requests, including that one.
On the side of less tedium, I fleshed-out more of the plan to virtualize our remaining bare-metal SQL servers. The usual ‘gotchas’ have to be discussed with the Server Admins — reserving compute resources, preventing the ‘noisy neighbor’ problem, and having enough storage to do the migration.
Yes, that means about 2x the storage as they’re currently using. Got a problem with that? No? Good. 😉
Finally, I worked on some code for querying the SSRS ReportServer database to obtain report metadata (including stored-proc usage, parameters, and user activity). The core concepts came from Ted Stathakis‘s SQLSaturdaysession on SSRS 2016. Hopefully, some of that code will be coming to GitHub soon.
“Delight” being sarcastic. No really, I love my job, but sometimes there are just some things that make you go ‘grr arg’.
First up, developing and finalizing some “data cleanup effort” reports. These are things that show the business users “what’s wrong with the data” and give them some guidance on how to “fix it”. Now, because there is no easy “management GUI” for this data, and because actual changes need to go through change-control, it’s not going to be a cake-walk. But given the right inputs and some domain knowledge, they can make the decisions as to what those changes should be, and then hand them off to the BA’s (business analysts). Usually in Excel form.
Next we have the ‘grr’ part. This is where I spent about 40 minutes trying to explain to the 3rd-party software vendor exactly where and how to integrate with our core data systems. Most of the info should not be ‘new’ to them, but since I never know which support tech will be assigned to the issue, it’s best to err on the side of verbosity. To make things more confusing, the folks on my end, who have been interfacing with the vendor thus far, aren’t intimately familiar with the underlying technology bits, so the conversation gets a little dicey before we all arrive on the same page.
Fortunately, since I had their attention, I was able to sneak in a suggestion to upgrade the back-end database for the software (which we self-host, but is essentially managed by them & their software) to something from this decade. MySQL 5.0 is old, guys, mmkay?
A developer needs a new table & some initial data-load to production; it’s passed QA. Great! They wrote the query a little weird, but nothing ol’ SQLPrompt can’t fix.
Commit outstanding stored-proc code & RDL files (SSRS reports) to source control. Even if I’m the only one working on them, I’d rather they be safe & track-able.
Ruh-roh. A developer set off some proc in a Dev db, it’s been running for over 30 minutes… and then they cancelled it! But wait, it’s stuck in rollback. Yes, kids, rolling-back a transaction can take even longer than the transaction itself, due to rollbacks being single-threaded.
Now, since the user’s hitting of the ‘Cancel’ button (aka “Stop” in SSMS) does not wholly and entirely cause a kill SPID command, I do that for them (since only sysadmin or processadmin can kill sessions). Then I run ye oldekill 64 with statusonly (64 happened to be the SPID) to check on its rollback status. Good news, everyone! It’s at… 0%.
And it stays at 0%. For fifteen minutes.
I move on to other tasks of course, since there’s nothing more I can do (short of pulling the plug on the server, which is never a good idea (unless you’re testing your Disaster Recovery Hope plan!). We also create a ticket to fix this stored-proc so that it doesn’t “take forever”, even if it has to churn through millions of records.
Finally, the statusonly check starts moving. It’s about 45% rolled back, with about “300 seconds remaining”. This, as anybody who’s been a DBA will tell you, is in “Microsoft time”. You know those progress-bars you typically get in Windows while something is happening? They’ll start off fast, make it to around 90-some percent, tell you there’s about 1 minute remaining, then take at least five minutes to “finish that last little bit”? Yeah, that’s a “Microsoft minute”. Patent-pending.
But fortunately for us, it does finish in just about the time it promised, maybe a bit longer. Just in time for the Dev to test what they wanted to test before going home for the day. And just in time for me to have my evening not ruined by a runaway transaction.
PS: Anybody who’s ever worked with SQL, including yours truly, has done this (set off a long transaction and/or tried to rollback a long transaction) at least once in their career. If you ask a seasoned DBA or DB-Dev, and they say “No, I’ve never done that!”, they’re a dirty rotten liar. =D
Are You Not Entertained?
If the life of a DBA sounds like fun to you, get in touch with your local IT recruiter and find out what the job market is like! You’ll hear all sorts of doom & gloom about how “the Cloud is taking over everything” and “the DBA is dead; long live DevSecDataOpsUnicorns!”.
No, I joke. Yes, some of the fundamental responsibilities of the DBA are shifting to the cloud vendors, to IaaS/PaaS services, but at the end of the day, being in IT is about learningtech and working with people. Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!