Some Common Mistakes When Querying SQL Database
Introduction
When I started learning SQL, I found out several issues which I thought to be interesting to share. For our needs, let’s use AdventureWorks2012
database, which can be obtained here.
1. Don’t Forget about NULL
Let us execute the following query:
select Count(*)
from Sales.SalesOrderDetail
Filtering by CarrierTrackingNumber
:
select Count(*)
from Sales.SalesOrderDetail
where CarrierTrackingNumber = '4911-403C-98'
It is natural to assume that the result of the following query:
select Count(*)
from Sales.SalesOrderDetail
where CarrierTrackingNumber <> '4911-403C-98'
121317-12=121205
. However, in fact it is
So where are all the other rows? Those rows have NULL
in CarrierTrackingNumber
column.
The fact is that comparison between NULL
and the value (‘4911-403C-98’ in our case) will return UNKNOWN
, while WHERE
clause returns only statements that are true
. Comparison between value and NULL
will also give UNKNOWN
, which is shown on a screenshot below.
So if you need to compare your value and NULL
, you should use operator IS NULL
. For example:
select Count(*)
from Sales.SalesOrderDetail
where CarrierTrackingNumber <> '4911-403C-98' or CarrierTrackingNumber IS NUll
2. INNER JOIN is Not a Golden Hammer
Let us select all job candidates and id of their addresses (it is more natural to select addresses but let us not complicate the query for our learning needs). As this information is situated in 2 different tables linked by the column BusinessEntityId
, common practice for the beginners is to use INNER JOIN
on this column.
select C.JobCandidateID, A.AddressID
from HumanResources.JobCandidate C inner join Person.BusinessEntityAddress A
on c.BusinessEntityID = A.BusinessEntityID
Again, the problem is NULL
. If we examine JobCandidate
table closely, we’ll see that the column BusinessEntityId
can have NULL
as its values.
Therefore, if we want to select all job candidates, we should use LEFT JOIN instead of INNER JOIN, which will just select all rows from left table.
select C.JobCandidateID, A.AddressID
from HumanResources.JobCandidate C left join Person.BusinessEntityAddress A
on c.BusinessEntityID = A.BusinessEntityID
3. If Syntax Construct Exists, Why Don’t You Use It?
Let us take a look at the two queries, which do exactly the same: select items, which cost more than 100 and were bought more than 1000 times.
Select t.ProductID, count(*) AS TransactionCount From Sales.SalesOrderDetail t
WHERE (Select count(*) From Sales.SalesOrderDetail t1
Where t1.ProductId = t.ProductId AND t1.UnitPrice > 100) > 1000
Group by t.ProductId
Select ProductID, count(*)
From Sales.SalesOrderDetail
Where UnitPrice > 100
Group by ProductID
Having count(*) > 1000
As you can see, the first one is imperformant. Therefore, if subquery or join can be replaced by more simple approach, you should better do so.