Dirty Deeds Done in SQL

Yes, I wrote a loop to find a gap in an identity-value sequence. Sue me. (Or don’t, that’d be cool too.)

I’ve done some things I’m not proud of. We all do, in IT, typically when we’re under-the-gun for a deadline or when the systems and frameworks in which we work have some sort of nuance or limitation that we just cannot get around, past, or over. And so we hack. We write code we’re not happy with. We even write code that we despise with every fiber of our well-intentioned being. But it has to be done. Because there’s no other choice.

With that somewhat depressing introduction, I give you my latest “dirty deed, done dirt cheap” (or for a reasonable salary, but still very reprehensible for anybody who values clean code and clean architecture). Yes, this is an actual stored-procedure signature that I wrote. Thank gods it’s not integrated into any software; it just happened to serve a niche purpose at the time. But still! I felt ICKY.

cursors are bad mkay
Did you recognize the album art from the cover pic? Just wondering… ūüėČ
/*
Description: Search for an available ID value in Category starting 
at the low end (i.e. find a gap), because it's aesthetically pleasing 
to have Root level Categories use low ID values. Yes, really.
But mostly b/c they need to be the same on DEV & PROD environments.
*/
CREATE OR ALTER PROCEDURE dbo.[GetNextAvailableRootCategoryID]
  @StartAtID int = 10  /*
      '1' is not used but we don't feel like starting THAT low
      (that'd be like "starting over" or "starting from scratch")
      You never get a checkbook with the first check# being '1', right?
      Wait, what's a checkbook?
      Ask your parents. Or your grandparents. */
  , @MaxID int = 1000  /*
      Throw an error if we can't find any gaps before this point */
AS
BEGIN
--and you can imagine how that went...
--something about a loop, a couple IF/ELSE's, a RAISERROR or two,
--and a RETURN @val in a B-tree(eeee).
END;

I know, I know. You’re saying “wow that’s really not that bad!”. Or “wow, you REALLY couldn’t do something better about the constraints to make it less terrible?” Again, time, priorities, and energy. What’s 10 minutes in the big scheme of things, even if the code is silly and perpetuates the bad decisions that came before it? Should the overlaying systems actually depend on matching ID values? NO!!! Obviously not. But was that even on the software development roadmap for the next quarter? Also no.

So, what’ve you got? Show me your dirty deeds done in SQL. It’s okay, nobody’s judging! Except yourself, if you’ve any skin in the game. We’re all our own worst critics. But if we can’t laugh at ourselves sometimes, what the hell are we doing here? =)

Dealing with Time Zones in SQL

In my sample script, I have a “transaction table” where I insert a few events for each office, with the ‘original date’ in PST, and I show how to easily convert that to each office’s local time, or visa-versa.

I’ve ranted about times, datetimes, and the like before. As most programmers & IT pros do, I loathe time zones with a mild passion. As one of my favorite #SQLCommunity members quips:

daylight savings time was actually created by a government works project to ensure that programmers could forever write tedious conditional logic in their date-based queries [and code].

Bert Wagner

Thankfully, in SQL 2016, MS has somewhat heard our outcries and given us an easier way to convert things between time-zones in a somewhat sensible manner.

Resources:

  • The at time zone keyphrase
  • The system metadata table that drives it
  • A good blog post that explains essentially what I’m about to (because it’s great to have different perspectives and unique voices!)
  • Two StackOverflow answers (one, two) on the topic
  • Bert’s succinct and fun post about the same topic, with video!
give me utc or give me death
This is Patrick Henry. He famously said “Give me liberty or give me death!”. Now you know.

A Use-Case

In fact, I’m not even going to repeat what 2 (and many more, I’m sure) people have already said. Go check out their posts above! But since you’re here, this is how I use it for my reporting environment.

To start with, the transactional data is in PST/PDT — i.e. Pacific Time with DST fluctuation. Yes, it’s horrible. No, I don’t know what happens to events or jobs at 2am on the “Fall Back” date, or between 2am and 3am on the “Spring Forward” date. No, I can’t change it right now. Stop whining.

Now, I have offices in Paris France, Hong Kong, and Beijing China. These are 3 different “time zones”, but only 2 different offsets — China and Hong Kong are in the same bucket, namely, UTC +08:00. More on that later.

So I have my OfficeLocation lookup table:

Office  | TimeZone
--------|-------------
Paris | 'Central Europe Standard Time'
Beijing | 'China Standard Time'
HK | 'China Standard Time'

(Again, see below for why we can’t call HK’s zone “Hong Kong Time” like most websites/APIs would assume.)

Now, the cool thing about this is, we can pull those strings into a variable, or use them straight from the table, to convert our PST/PDT times to the appropriate zone.

Here’s a variable example:

DECLARE @TimeZoneStr sysname; --"sysname" is just nvarchar(128)
SELECT @TimeZoneStr = TimeZone
FROM OfficeLocation
WHERE Office = 'Paris'

DECLARE @MyTimeNow datetime = GETDATE();
DECLARE @TimeInParis datetime;
SELECT @TimeInParis = @MyTimeNow
AT TIME ZONE 'Pacific Standard Time' --Converts to datetimeoffset
AT TIME ZONE @TimeZoneStr --Shifts it to Paris time

PRINT ('The time now in Paris is ' + CONVERT(varchar(30), @TimeInParis, 121);

And here’s an example using the field straight from the table.

SELECT Office
    , [Time in my location] = GETDATE()
    , [Time in remote office location] = CONVERT(datetime, GETDATE()
        AT TIME ZONE 'Pacific Standard Time' 
        AT TIME ZONE ol.TimeZone)
FROM OfficeLocation ol

See the Gist for a full-fledged sample script. In it, I have a “transaction table” where I insert a few events for each office, with the ‘original date’ in PST, and I show how to easily convert that to each office’s local time, or visa-versa.

working with time is easy -jon skeet
Well YEAH, if you’re frickin’ JON SKEET! =P
But srsly, go watch his presentation and read his stuff.

Gotchas

Here’s the major catch. The information available to YOUR instance of SQL Server is pulled from that server’s Windows Registry hive. No, I’m not making this up. So if that box doesn’t know about, say, ‘Hong Kong Standard Time’, and you try to use that in your SQL statement.. you’re hosed.

And yes, that is a real example from my own experience.

This article shows the “Windows standard format” time zone list. As you can see, they merged some zones with others because.. they felt like it? But apparently Central Europe Standard Time, Central European Standard Time, and Romance Standard Time (all UTC +01:00) were completely necessary to keep separate. Go figure.

In my use-case above, then, I couldn’t actually store the string Hong Kong Time, because my SQL instances (hence my Windows Servers) don’t know what that is. Thankfully, at least for this decade, it doesn’t look like Hong Kong and China will diverge in terms of their geopolitical directions, and we’re safe to assume that HKT = CST (China Standard Time, not to be confused with US Central Time!).

In another example, the typical go-to site for timezone questions says Japan observes “Japan Standard Time”. Obviously enough. But Microsoft, in their infinite wisdom, decided to call that “Tokyo Standard Time”. Go figure again.

It also kinda makes you wonder.. how does this work on SQL on Linux? No such thing as “the registry” there. I’m sure there’s an internal OS data-store that houses time-zone info, of course. Heck, they might even be better at it than Windows. But it makes you think.

world time zone map
Disclaimer: this is probably already out-of-date. And it’s from Wikipedia.

So What?

If you’re not already running SQL 2016 or upward, this should give you yet another compelling reason to upgrade. Seriously.

And don’t do what I did and attempt to store a “business locations with time-zone offsets” table, that you have to remind yourself every 6 months to go update (manually), and will inevitably fail to do so, and will not support any sensible manner of long-term historical reporting.

More to the point, don’t try to implement dynamic time-zone logic and calendaring yourself, in general. Because trust me, you’re not gonna get it right. Use the built-in tools, use the community resources, and be smart.

That’s all for today! ‚̧

Replication “Just Trust Me”

For what seems like years, I’ve bemoaned the fact that SQL Transactional Replication doesn’t come with a “Just Trust Me” option. I’ll explain more about what I mean in a moment. The other thing I’ve complained about is that there’s no “Pause” button — which not entirely accurate, since obviously you could just stop the distribution and subscription agents. But specifically what I mean is, it’s not easy to ‘put it on hold so you can make some schema changes to one of the tables that’s being replicated’, and then easily “Resume” it after you’re done with said changes.

Well, I’m happy to say that now I have both of these tools/methodologies in my arsenal!

Quick level-set: If you’ve been living under a virtual rock, SQL replication is an old-hat “tried-and-true” method of producing readable copies of your data on other SQL servers, whether for reporting or DR. It’s not an HA technology per-se, although I suppose you could use it for that if you were feeling adventurous. It’s more for “I need a reasonably up-to-date copy of my data ‘over there’ so I can run reports / crappy user-formed / EF-generated queries against it without slowing down my production OLTP system.”

Yes, I did just take a pot-shot at Entity Framework. #DealWithIt

i don't always break replication but when i do it drives me to drink
But not Dos Equis. That stuff is terrible. =P

Why?

The word that comes to most DBA’s minds when they think of replication is ‘brittle’. And for good reason — when it breaks, it breaks hard, and you’re often left trying to pick up the pieces while wondering how much worse it could be if you just started over from scratch (i.e. dropped all the replications and re-created them). Which, honestly, sometimes is easier. But not if you have a large volume of data, and certainly not if that data is indexed and you don’t want your apps to experience a performance-crisis!

Now, because this post has been sitting in my ‘Drafts’ area for far too long, I’m going to break this up into 2 parts, so I can get something out the door. In part 1, I’ll briefly explain each of the key components of the process. In part 2, I’ll dive into a little more step-by-step detail.

Primary resources that went into this: docs, article1, article2, article3. And my very own dba.SEanswer where I apparently went through a similar process back in 2016 and subsequently forgot about it (mostly).

Key 1: Sync-Type

TL;DR: the “Just Trust Me” option is, when you create the subscription, sys.sp_addsubscription, specifying the @sync_type = 'none' parameter value. Huge thanks to @garethn in the SQL Community Slack.

Sidebar: if you haven’t yet joined the SQL Community Slack, WHAT ARE YOU WAITING FOR?!?!? DO IT, DO IT NOW!!!

Ahnold ‘teh Governator’

@sync_type = 'replication support only' may be applicable in some scenarios as well, but I’m not 100% clear on the difference / use-cases at the moment. More to come later, hopefully.

Key 2: Script Publication Procs

Protip: sys.sp_scriptpublicationcustomprocs @publication = 'PublicationName' generates the internal repl-procs that control the table creations/updates on the subscriber. You run this ‘script’ command on the publisher, then get the results (the script it generates), copy-paste to a new SQL file, and run on the subscriber.

This has come in handy on several recent occasions, wherein I had to either swap tables behind-the-scenes due to a PK change, or make a column & index change that involved truncation. Using the “stop, shuffle, start” method, which I’ll get into in part 2, I’m able to tell the subscriber “Hey, the definition of this table has changed, you need to grab these new repl-procs so you can handle it correctly!”

Key 3: Publication Properties

In order to tell our publication that “We’re gonna be making some changes, don’t panic!”, we want to turn OFF 2 properties (assuming they’re true, which they likely are by default) using sys.sp_changepublication @publication='MyPub'. The properties are 'allow_anonymous' and 'immediate_sync', and you simply append the arguments to the proc call like so: @property='allow_anonymous', @value='false' / @property='immediate_sync', @value='false'.

Later, after we’re all done with our under-the-hood changes, we’ll want to turn the back on, in reverse order: first enable 'immediate_sync', then 'allow_anonymous'. Cool? Don’t ask me why; DBAs much smarter than I have decreed it so.

OMG, remember Xena Warrior Princess? Holy wow that’s some nostalgia for ya.

Honorable Mention: Pull Subscriptions

In one instance, I was using a PULL subscription (as opposed to PUSH). I had to re-start the Distribution agent (on the subscriber) twice for it to work (to start actually synchronizing). It STILL shows as ‘Uninitialized Subscription’ in the repl-monitor, though. Kinda annoying.

Pull subscriptions can be nice because they shift the burden to the subscriber DB, so that your publisher (master, primary, whatever you wanna call it) doesn’t get too bogged-down. But as always, there are trade-offs. Check out this handy little comparison guide on the topic from a fellow DBA blogger.

That’s all for now; stay tuned for more as I go into detail about how I used these in what scenarios. Thanks for reading! ‚̧

Follow-up: Cribbage “15’s Counter”

The actual method involves joining 5 copies of the table together, by each right-side table only including cards with higher ID values than the table to its left.

To be honest, my T-SQL Tuesday puzzle was a bit of a last-minute idea, which is why I didn’t have a solution ready-made. But, dear reader, you’re in luck! I have one now.

The code is over here in Gist. You can read thru it, but since the final query — the actual “answer” — is kinda ugly, let me explain my thought process.

Modeling is Important

Even when I’m putting together a silly little demo script like this, I feel that good habits and fundamentals are important. You never know what future developer might read it, copy-paste it, and say to themselves “Cool, I’m gonna follow this example when I do this other thing over here!” So you’ll see my formatting preferences, naming convention (though I must admit, I argued with myself over whether to pluralize the table names or not!), and correctly allocated Primary Keys. And since we’re modeling a card deck, even though I didn’t need to store the ‘NumValue’ (which is what you’d use for a straight/run, where the Jack is 11, Queen is 12, etc.), I did anyway.

Now, when we set up our “Hands”, we’re going to use two ‘PlayerNum’s, just so we can test two different hands at the same time. Cribbage can be played with 3 or 4 players, but we’re keeping this simple. Also, I could have built the hands more aesthetically, i.e. by selecting from Cards using PtValue and Suit, but again, I was trying to script quickly, so I just used the IDs that I knew from the previous query (the “full deck”). And again, there’s a “little extra” tidbit, the ‘IsCut’ indicator — we won’t be using that right now. If you’re still not sure what that means, go read the rules.

The Method

At the end of the original post, I mentioned loops and cursors as possible routes to a solution. That may still be true, but I decided to challenge myself to avoid them. Not because they’re “always bad”, as popular media would have you believe; they’re just often an indicator that a developer isn’t thinking in set-theory¬†when they probably should be.

Let’s start with some basic principles. You have 5 cards in your hand. It takes a minimum of two cards to make 15 (examples include Jack+5, 6+9, etc.), and up to a maximum of.. you guessed it, five cards. So we need to check all combinations of any two, three, four, or five cards. We cannot re-use a card within the same combination; and putting the same three cards in a different order, for example, does NOT count as a separate combo (another ’15’).

So as you start to think about these rules, and if you’ve been around data for a while, especially data with identity¬†values, you might have a little light-bulb. “Aha! I know how to do that. We can simply order the combos by the ID value, and that way we won’t allow duplicates!” And that’s kinda what I did, by enforcing the JOIN predicates that every subsequent derived-table have a ‘CardID’ greater than the prior one. But I’m getting ahead of myself.

The actual method here involves JOINing 5 copies of the table together, mainly just on PlayerNum, but also, as I said, by each right-side table only including cards with higher ID values than the left-side. In this way, we ensure that we’re not allowing the same cards to be “joined” to each other, i.e. we’re removing them from the right-side tables.

And finally, we have four OR‘d conditions: simply “do any of those combinations add up to 15, by the Card’s PtValue?” These are echo’d in the CASE-expression in the SELECT line, where we want to essentially “show the combo”, i.e. tell you what cards make up the ’15’. (Again, for style’s sake, we have an ELSE, but we don’t really need it because it’ll never actually happen.)

Now, it does look kinda ugly. It’s not very extensible — meaning, if you wanted to scale it up to find the ’15’s in a 6- or 7-card hand, or you wanted to look for other kinds of combos (like ’18’s or ’27’s), you’d end up re-writing a good portion of it, or at least copy-pasting a lot. Fortunately for us, Cribbage is fairly simple in this regard — your hand is always the same size, and you only ever care about ’15’s.

(Well, and pairs, 3- and 4-of-a-kinds, straights, flushes, knobs, etc., but again, read the rules if you’re curious. We kept this very simple by limiting ourselves to just one small fraction of the game mechanics.)

The cool thing about this sample, though, at least to me, is that you’re already set up to build on it if you want to try out other Cribbage mechanics. Or even other card games, if you just use the base Suits & Cards.

What Did We Learn?

What’s the point of a puzzle like this? Well, besides introducing you to a fantastic card game, if you didn’t already know about it. The point is to make your brain think in a different way than usual. Are any of us programming card games using a SQL back-end? Probably not. (Although an in-memory equivalent like SQLite or something might be viable!) But the next time you have a “combinations problem” with some real-world data, you might wonder if a method like this could come in handy. Or at least, if it could work out better than a double-nested-loop. =)

PS: I believe, instead of the LEFT JOIN‚Äčs, we could have used OUTER APPLYs. We’d move the conditions from the JOINs into the inner WHERE¬†clause of each derived table, i.e. “this ID > previous ID” and “PlayerNums are equal”. If you’re curious, try it out!

Quickie: Use of APPLY Operator

There are many great use-cases for the T-SQL APPLY¬†operator, and if you haven’t tried it yet, I encourage you to check out Kevin Feasel’s excellent presentation on it here.¬† Today, I want to demonstrate a particularly interesting use-case (at least, to me!) that I found for it while tackling a real business problem — for reporting, specifically, which is what I spend a lot of my time dealing with lately.

The Setup

We have a “history of estimated market values” table, which stores, for some arbitrary dates based largely on user input, values (prices) of our proverbial Widgets.¬† In other words, we can’t guarantee we have every date in there for a given time period.¬† Over on side B, we have a “sales” table, which tells us when a given Widget was sold and for how much.¬† The business question, simply enough, is “When a given Widget was sold, how accurate was its ‘estimated market value’ at the time?”¬† Or, put another way, “How closely did the sale price of that Widget match our estimated market value?”

we don't make widgets
Yeah, yeah, we’re all tired of the Widget business. If only we could come up with a better term for a generic product to use in discussions of business problem samples.

The Tools

I used two interesting bits of TSQL to accomplish this, the main one being our lovely APPLY¬†operator.¬† OUTER APPLY, in this case, because I still wanted the rows from “sales” even if we couldn’t find a closely-matching “market value” record.

If you haven’t used this operator yet, think of it conceptually like a JOIN¬†to a sub-query, but with the added value that you can reference the¬†outer tables’¬†columns within it.¬† CROSS APPLY¬†is like INNER JOIN, and OUTER APPLY¬†is like LEFT JOIN.

Sorry, RIGHT JOIN¬†fans, no love for you here.¬† You’re just too weird.

My other little trick involved using TOP(1)¬†in the apply’d sub-query, with a non-standard ORDER BY¬†clause.¬† As I mentioned, within the APPLY‘s body, you can reference the outer table’s columns.¬† So I’m easily able to compare the SaleDate¬†(outer table) with the EstimateDate¬†(inner query).¬† I want “the closest EstimateDate¬†to the SaleDate‚Äč”, which means I want the row where the difference between those two dates is the smallest.¬† Which means making use of our friend DATEDIFF.¬† So let’s try:

ORDER BY DATEDIFF(day, Sales.SaleDate, MktValueHist.EstimateDate)

Do you see the problem yet?¬† If I get a negative value, i.e. my SaleDate is way before my EstimateDate, say -100 days, that’s the top 1 row.¬† I don’t want that!¬† Let’s try our old friend from many many math classes ago, Mr. Absolute Value.

ORDER BY ABS(DATEDIFF(day, Sales.SaleDate, MktValueHist.EstimateDate)

That’s better!¬† Now I have the top 1 “closest to my sale date” row from MktValueHist.

All Together Now

I’ll put up a Gist with repro/demo code soon.¬† Here’s the meat of it, as a sample select, for convenience:

SELECT Sales.WidgetID, Sales.WidgetName, Sales.Price, Sales.SaleDate
, mvh.MarketValue, mvh.EstimateDate
, Accuracy = some_made_up_mathy_thing_here
FROM Sales
OUTER APPLY (
SELECT TOP(1) mValHist.MarketValue, mValHist.EstimateDate
FROM MarketValueHistory mValHist
WHERE mValHist.WidgetID = Sales.WidgetID
ORDER BY ABS(DATEDIFF(day, Sales.SaleDate, mValHist.EstimateDate)
) mvh
WHERE Sales.SaleDate >= '20180101'

There, my completely fabricated yet totally-based-in-reality example of how to get 2018’s Widget Sale Prices with corresponding “closest to the sale-date” Market Value Estimate from our history table.¬† You could even throw in some fancy math expression for “accuracy”, if you felt like it.¬† Like maybe “relative difference“.¬† =)

Note: For simplicity, we’re dealing with “whole dates”, i.e. just date.¬† Not datetime¬†or datetime2.¬† If your date values do include times, you’d want to change the datediff¬†interval to something more appropriate, like second¬†or millisecond.

cross apply explained vs inner join and correlated-subquery
Forgive the graininess, it was from a Youtube screen-cap.¬† In an Oracle presentation, of all things. And I feel like they meant to say “Data exposition“, not “explosion”, given the latter can have negative connotation.¬† But.. hey, it’s better than a poorly drawn stick-figure or an inappropriate meme, right?¬†¬†RIGHT?

And that’s how APPLY¬†and ABS()¬†saved my bacon today.¬† Hooray!

Have an interesting use-case for APPLY?¬† Or perhaps an alternative approach to a similar problem?¬† I’d love to hear from you!¬† Drop me a comment, a tweet, or a LinkedIn msg.¬† Happy summer!


PS:¬†I’d also recommend checking out Steve Stedman’s useful “Join types poster“, if for no other reason than to have something concrete to use when explaining those concepts to your fellow cube-dwellers.¬† It has cross/outer apply on the 2nd page in their more commonly used pattern, the TVF (table valued function) relationship.

PPS: Fine, have your meme…

apply yourself, from breaking bad
I’ve never seen the show, so I don’t get it, but it was one of the first Google Image results for “apply yourself meme”. Enjoy that.