Tempdb Issues
The basic concept of
tempdb is simple: It is a system database. This means it is always present as
long as SQL Server is working. It stores temporary data. This means if SQL
Server is restarted, you lose all data stored in tempdb.
What kind of data can you
store in tempdb? Just about anything. To the user, it is just like a regular
database. You can use transactions and all the other SQL Server features, so
long as you do not care about saving the data after stopping the SQL Server
service.
If you store a small
amount of data in tempdb, the data might not actually need to be written to
disk. The SQL Server buffer manager tries to store the data primarily in memory
when possible. The data gets written to disk only when there is not enough
memory to cache it. SQL Server automatically decides which memory pages to
write to disk based on how “hot” each page is—that is, how often the page is
accessed. The more frequently a page is accessed, the hotter the page gets.
Because you can use
transactions in tempdb, there is also a log file for tempdb. SQL Server
transaction logs must always be written to disk. For example, if you do the
following, about 1MB is written to tempdb:
Declare @i int
Set @i = 0
Create table #myTempTable (myBigCharColumn char (1000))
While (@i <1000) begin
Insert into #myTempTable values ('x')
Set @i = @i +1
end
If your system has plenty of memory, there
might not be any physical IO in the tempdb data file. However, you will see 1MB
of physical IO in the tempdb log file.
What if you have an initial set of tables or
stored procedures that you want to have in tempdb right after SQL Server
restarts? You can create those in the model database. Each time SQL Server
restarts, it copies tempdb from the model database. If you have set any
database options in the model database, tempdb also inherits those.
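For example, if you wanted a utility table to exist in tempdb after every restart, you could create it in the model database ahead of time (the table name here is just an illustration):

```sql
Use model
Create table dbo.myStartupTable (k int)
-- After the next restart of the SQL Server service, tempdb
-- (and any database created from now on) will contain dbo.myStartupTable.
```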
For simplicity’s sake, tempdb allows only one
file group for data. By default, tempdb grows automatically based on the space
required to complete database operations that involve it. This means that the
tempdb file group grows until its host disk is full. Why is this the default
behavior? The reason is to avoid out-of-space errors. SQL Server internally
uses tempdb to store temporary data for query processing and other system
tasks. Applications use tempdb to store temporary results. If you run out of
space in tempdb, the server or end-user application gets an out-of-space error.
By automatically using all available disk space, tempdb meets the space needs
of the server and end-user applications as long as practically possible.
When SQL Server restarts, the tempdb file size
is reset to the configured value. (The default is 8MB.) The reason for this
design is so that extra space consumed by tempdb is automatically returned to
the operating system for other uses when SQL Server is cycled.
The best practice regarding file size is to
configure it to 80 percent of what
you believe your usage will be, and let auto-grow take care of the remaining 20
percent. If you’re concerned about too much space being used, set a maximum
size for tempdb files. Consider this scenario, for example:
• Disk volume
D: has 5,000GB free space at the moment. You plan to put tempdb there. This
disk volume is also used for other purposes.
•
Estimated space usage is 200GB most of the time for tempdb. But you do not want
tempdb to ever consume more than 1,000GB of space. If tempdb ever grows to be
more than 1,000GB, you would rather fail the application than consume more
space.
• You
should set the tempdb size to 200GB with a maximum size of 1,000GB and
auto-grow enabled.
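This scenario can be configured with ALTER DATABASE, assuming the default logical data file name tempdev:

```sql
Alter database tempdb modify file
    (name = 'tempdev', size = 200GB, maxsize = 1000GB, filegrowth = 10%)
```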
Tempdb does not allow
auto-shrink, and its database and file-shrink capabilities are limited. In a
production system, file shrink on tempdb rarely succeeds because many hidden
objects stored in tempdb by SQL Server cannot be moved by the file-shrink
command.
So what should you do? My
advice is to rarely shrink tempdb. You should not shrink it just because its
size has grown. You should ask yourself, Is the larger size I am seeing now a
normal occurrence because of my workload? If so, it does not make a lot of
sense to shrink tempdb, because your normal workload will cause it to
automatically grow back to its former size (or, worse, run out of space if you
have disabled auto-grow).
If you do have to shrink
the size of tempdb, I recommend that you do it offline, using the -f SQL Server startup flag:
1. Stop the SQL Server
service.
2. Find out where
sqlservr.exe is on your hard drive. The default location is something like
c:\Program Files\Microsoft SQL Server\MSSQL$instance name\instance folder\Binn
for SQL Server 2000 and SQL Server 2005.
3. CD to that directory. Type
the command sqlservr -c -f. This starts SQL Server
with a minimum tempdb size.
4. Use the TSQL command ALTER DATABASE TEMPDB MODIFY
FILE () to change the size of the tempdb files.
5. Restart the SQL Server
service.
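For step 4, the commands might look like the following. The logical file names tempdev and templog are the defaults for the tempdb data and log files; the sizes here are just examples:

```sql
Alter database tempdb modify file (name = 'tempdev', size = 8MB)
Alter database tempdb modify file (name = 'templog', size = 1MB)
```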
WHAT HAS IMPROVED IN SQL SERVER 2005?
SQL Server 2005 did not make many external
changes to tempdb. However, there are several important internal improvements:
1. Less IO activity in the tempdb log file
For example:
Create table #myTempTable (myBigCharColumn char (1000))
Insert into #myTempTable values ('ABC')
Update #myTempTable set myBigCharColumn = 'XYZ'
Previously, 2,000 bytes of log data (both 'ABC' and 'XYZ') were written to the log. Now, only 1,000 bytes ('ABC') are written to the tempdb log file.
This is great, but does it work for all data stored in tempdb?
Unfortunately, it works only for data stored in heaps (tables without a
clustered index) or LOBs. You will not see this improvement for data stored in
a clustered index. Does it benefit all DML operations in tempdb? Only INSERT and UPDATE benefit. DELETE does not benefit.
2. Much faster auto-grow of tempdb data files
In SQL Server 2000, it
works like this: You run out of space in tempdb’s data file. SQL Server decides
to grow the file by 10 percent. If your file is 10GB in size, SQL Server
formats the new region of the file by writing 1GB of data (most of it is zeros)
to disk. This can take a while.
SQL Server 2005 takes
advantage of the Windows API that expands the size of a file without doing IO
on the new sections of the file. Because of this, growing a file occurs almost
instantaneously.
This helps tempdb
auto-grow significantly. There is, however, a bit of a security concern with
this if the SQL Server service is not running under machine administrator
credentials. It is possible for the SQL Server service to see data left by a
previous process on disk. For example:
• One instance of SQL
Server (we’ll call it Instance A) is used to handle public information. It runs
as User A on a machine. User A has normal user privileges on the machine.
• The machine has another
instance of SQL Server (Instance B) that handles sensitive data. It runs as
User B.
• Instance B shrinks its
database file. Some sensitive data is left on disk.
• Instance A grows a file and picks up this data. Now User A can
look at this data via a page-dump command (DBCC PAGE).
That is why this
optimization is not always on. It requires Windows XP or 2003, and it requires
special volume permissions for the SQL Server service account. If the service
is running as the machine administrator, it already has these permissions.
You should take advantage
of this important optimization, especially when the whole machine is used
exclusively for a single SQL Server instance.
3. Improved scalability for concurrent access in tempdb
SQL Server does a good
job of handling concurrent accesses to data. SQL Server also sports a
full-featured, rich data store. Some features, if not used carefully, can cause
performance issues when many users are trying to perform the same operation.
Consider this scenario, for example:
Create Proc myProc As Begin
Create table #myTempTable (myBigCharColumn char (1000))
Insert into #myTempTable values ('ABC')
Drop table #myTempTable
End
If a large number of users are simultaneously
trying to execute this procedure, they may encounter a bottleneck, because SQL
Server needs to do several things to execute the procedure:
• Create metadata for table #myTempTable by
inserting rows into the database system tables. These rows are visible through system
views such as sys.objects in SQL
Server 2005.
• Allocate two pages for
the table even though the table needs only 1,000 bytes. These two pages are
called the data page and the IAM page. PFS, GAM, and SGAM pages need to be
scanned and updated to find and allocate these pages.
• Free the two pages as
part of dropping the table.
• Remove the rows from the system metadata for #myTempTable.
The tempdb metadata tables do not have many
pages. Inserting and deleting a lot of rows in tempdb can therefore cause read/write
contention on the B-tree pages of these tables. This is known as page latch contention.
Each PFS page covers a 64MB range of data.
Each SGAM and GAM page covers a 4GB range of data. A lot of page allocation and
free activity causes page-level latch contentions on these pages, too.
In SQL Server 2005, we did two things to
reduce contention and improve the scalability in this scenario:
• We changed the page
allocation algorithm so that read/write contention is not as severe as before.
• We cache metadata
entries for temporary tables in tempdb so that we do not have to do many inserts
and deletes in the B-tree. We track these activities in an in-memory cache
instead.
4. Improved scalability for multiple files in tempdb
In SQL Server 2000, we
recommended that people have the same number of tempdb data files as CPUs in their
system. The primary reason for this is to avoid page-level latch contention on PFS,
GAM, and SGAM pages.
In SQL Server 2005, it is
less important to do so because of improved scalability in our page-accessing
algorithms. However, a highly concurrent workload might still encounter page
latch contention. In these cases, we still recommend using the multiple-file
approach.
There is another reason for using multiple
files in tempdb: to achieve better IO bandwidth by file-level striping.
Consider this scenario, for example:
• You have 25 physical
disks in your server machine.
• You use operating
system or hardware striping to group them into five volumes. Each volume has
five disks.
• If you use only one
tempdb file, you get only five disks working for you.
• If you use five tempdb
files, one on each volume, you get all 25 disks working for you.
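Adding the files for such a layout is a one-time ALTER DATABASE operation per file; the logical name and path below are hypothetical:

```sql
Alter database tempdb add file
    (name = 'tempdev2', filename = 'F:\tempdb2.ndf', size = 40GB)
-- Repeat for a file on each remaining volume.
```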
However, having multiple files also has its
downside. SQL Server uses a proportional fill algorithm to choose the file on
which to allocate the next page. The algorithm attempts to ensure that SQL
Server fills each file in proportion to the free space left in that file so
that they reach their maximum capacity at about the same time. In SQL Server
2000, this algorithm did not scale well when there were a huge number of page
allocations. The improved algorithm in SQL Server 2005 removed these
scalability limitations.
HOW IS TEMPDB SPACE CONSUMED?
Three types of objects occupy space in
tempdb—user objects, internal objects, and version store objects.
What Is a User Object?
The best way to find out what user objects are
in tempdb is to do the following:
Select * from sys.all_objects
If you want to find out how much space they
take as a whole, you can do this:
Exec Sp_spaceused
Perhaps a better name for user object is
visible object. Some objects in tempdb that are created and managed by the
server are not visible to the end user.
You might find it interesting that
user-defined local temp tables also appear in the view sys.all_objects. These tables have names such as the
following:
#myTempTable___________NNN
The reason for this naming convention is that
different connections can have the same name for local temp tables. So, the NNN portion (a number)
represents the connection that owns the table.
Even table variables such as @t also appear here. You will notice names that consist of a
string of hex digits:
F0ABCDE
This is the way table variables are named. We
use an internal ID for each of them that is actually just a number.
What Is an Internal Object?
Contrary to common belief, sp_spaceused does not show all the space occupied in tempdb. A vast amount
of space is occupied by “invisible” objects. These include internal objects and
version store objects.
Most internal objects are created by queries.
For example:
Declare @i int
Set @i = 0
Create table #myTempTable (keyColumn int, myBigCharColumn char (1000))
While (@i <1000000) begin
Insert into #myTempTable values (@i, 'x')
Set @i = @i +1
End
Select * from #myTempTable
order by keyColumn
The last select query
has an order by clause.
Because there is no index on keyColumn, the query can be serviced only by sorting the result set. This
particular query requires up to 1GB of internal tempdb space for sorting.
The space required for sorting in the
preceding query is managed internally by a SQL Server sort work table object.
This object is not visible in the catalog view sys.all_objects.
Internal objects are used for lots of other
purposes:
• Hash joins and hash
aggregates
• Query spool to store
data as part of a multistep query processing
• Keyset cursors to store
the keys
• Static cursors to store
the query result
• Service Broker to store
messages in transit
• XML variables
• LOB variables, including TEXT, IMAGE, NTEXT, VARCHAR(MAX), and VARBINARY(MAX)
• DBCC CHECK
• Query and event
notification
So how do you find out about the space
occupied by these “invisible” internal objects?
SQL Server 2005 provides these three DMVs that
help you determine this:
sys.dm_db_file_space_usage
sys.dm_db_task_space_usage
sys.dm_db_session_space_usage
These DMVs give you a snapshot of current
system activity. They can tell you how much invisible space internal objects
occupy.
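For example, the following query, run in tempdb, gives a rough breakdown of space by object type. The reserved page counts are 8KB pages, so multiplying by 8 converts them to kilobytes:

```sql
Use tempdb
Select sum(user_object_reserved_page_count) * 8 as user_object_kb,
       sum(internal_object_reserved_page_count) * 8 as internal_object_kb,
       sum(version_store_reserved_page_count) * 8 as version_store_kb,
       sum(unallocated_extent_page_count) * 8 as free_kb
From sys.dm_db_file_space_usage
```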
You may use these tools as you study each
feature previously mentioned. This will give you some idea of how much space
each feature requires.
What Is a Version Store Object?
Another type of invisible object is the
version store object. The system creates version store objects on demand. The
system can create up to one version store object every minute.
Any SQL Server 2005 feature that generates row
version information may cause a version store to be created. These features
include the following:
• Snapshot isolation
• Triggers
• MARS (multiple active
result sets)
• Online index build
Consider this query, for example:
Declare @i int
Set @i = 0
Use master
Create table myBigTable (keyColumn int, myBigCharColumn char (1000))
While (@i <1000000) begin
Insert into myBigTable values (@i, 'x')
Set @i = @i +1
End
Update myBigTable set myBigCharColumn = 'y'
The master database always generates row
versions for any update performed there. In this example, 1GB of row version
information is generated by the last update statement.
The version store will take 1GB of space in tempdb.
So how do you find out how much space is being
occupied by the version store in tempdb?
The best way is to use the performance counter
Transactions: Version store size. You can view this in Perfmon/Sysmon or using
the SQL Server 2005 SQLDiag facility. It collects this counter by default.
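If you prefer T-SQL to Perfmon, the same counter is also exposed through the sys.dm_os_performance_counters DMV:

```sql
Select cntr_value as version_store_kb
From sys.dm_os_performance_counters
Where object_name like '%Transactions%'
  and counter_name = 'Version Store Size (KB)'
```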
Most internal objects are automatically freed
at the end of the query in which they were created. In contrast, version store
objects do not go away after the update statement
that created them.
Instead, a version store is kept until the
versions are no longer needed by the SQL Server instance as a whole. This is
because most versions are generated for other users who might need to access
them. Version stores are not needed only by the user who originally generated
them.
Consider this scenario, for example:
• Assume we only have two
users, A and B.
• User A starts a
transaction that requires a version store. This is a snapshot isolation
transaction.
• User B generates a 1GB
version store. User B logs off.
• User A’s transaction is
open for five hours because it is doing a long and complex query. Then User A
logs off.
• The 1GB row version
store is removed from tempdb when User A finally logs off.
PRACTICAL TROUBLESHOOTING
Many SQL Server users never need to worry
about tempdb. As mentioned previously, SQL Server uses it as a scratch space
for internal data processing, but this is mostly invisible to the user. By
default, the size of tempdb auto-grows based on the system’s needs. As long as
you have enough disk space for the work you want to perform, you can usually
just let SQL Server take care of tempdb for you.
There are mainly two areas of troubleshooting
for tempdb. One relates to the sizing of tempdb. The other is tempdb
performance.
What to Do If You Run Out of Space in Tempdb
Many people want to know
just how much tempdb space they should have. This question is difficult to
answer. Space-usage estimation in tempdb is so complex that an accurate
estimation is often not worth the effort. I am not even sure if it is practical
to have an accurate estimate in an environment of any complexity.
The practical solution is
to do some rough planning and run the system through some trial and error. You
must balance several factors: hardware budget, performance, development, and
testing time.
How much tempdb space can
you afford to have? Most people have a finite hardware budget. You should give
a reasonable amount of budget to tempdb. How much you should set aside for
tempdb storage depends on your application. If you find out that you cannot
afford the amount of space the application needs, you might be forced to change
the application.
For example, if the
application has a query that uses a hash join and a lot of tempdb space as a
result, maybe you should consider changing that to a nested loop join by adding
appropriate indexes or using query hints. Even if it takes longer to run, you
will not need as much tempdb space.
Another example of a performance trade-off is
the use of the snapshot isolation level. Without snapshot isolation, some
applications experience reader-writer blocking, and throughput is low. With
snapshot isolation, the application throughput is much higher, but tempdb use
is bigger, too. As with many things related to performance, there are always
trade-offs. What you can afford to trade for performance and how much you can
afford to trade depends on your specific circumstances.
It is worth mentioning that tweaking an
application has other costs besides the raw application performance cost or
savings it yields. Modifying an application takes time and resources, as does
testing your changes and redeploying them.
Here are some questions I recommend you ask yourself to develop
a rough estimate of tempdb space usage:
• What are the main
features in your application that use tempdb? You can refer to the earlier list
in this chapter for the SQL Server features that use tempdb.
• For each feature, how
much space will it roughly take if queries are performed in a serial way? This
estimate does not have to be very accurate. You need to look at both the maximum
space requirement and the most common requirement.
• Now consider which
features will be used concurrently by your application. Sum the maximum space
requirements of the largest set of features that will be used concurrently.
This is your rough maximum tempdb space estimate.
After you have a rough estimate, you need to
set up a test environment and create a workload that simulates your worst-case
scenario. The closer your simulation workload mimics the real workload, the
better.
If your real-world data set is too large for
the test environment, you might consider a scaled-down test and then multiply the
results by the scaling factor. You also need to watch out for cases where space usage is not
linear to the amount of data you have.
To configure tempdb size properly, you should
observe the free space in tempdb and tempdb size over a period of time.
The key point here is “over a period of time.”
Why is this important? Let’s look at an example. Suppose SQL Server uses 10GB
of tempdb space during weekdays, but on weekends it needs 30GB of tempdb to run
a batch reporting job. In this case, it makes no sense to constantly shrink the
tempdb file to 10GB during the week just to have it grow back to 30GB on the
weekends. You should let the system keep 30GB of tempdb space. This advice
applies to both the tempdb data file and tempdb log file.
No matter how accurate the planning or
estimate is, you might still run out of space in production. Now what do you
do?
The best practice here is to set up the
SQLDiag service to constantly monitor the production system and take action
before you run out of space.
A sampling method is recommended for this type
of monitoring. For example, you could have SQLDiag run a query like the
following at five-minute intervals:
Select * from sys.dm_db_session_space_usage
Select * from sys.dm_db_file_space_usage
Dbcc inputbuffer (51)
Dbcc inputbuffer (52)
Dbcc inputbuffer (53)
This script tells you the following:
• How free is tempdb at
that moment? How is the occupied space divided among the user objects, internal
objects, and version store objects?
• What is the tempdb
space consumed by each session at that moment?
• What is the current
batch executed by session 51 to 53?
Trapping these query outputs in a table allows
you to monitor the system while it runs.
I offer this script only to illustrate a
technique for monitoring tempdb. This method works well for large queries that
consume a lot of space. I have seen queries that take upward of 10GB of tempdb
space. These kinds of queries are not likely to finish in five minutes, so you
should be able to catch them with a query like this one.
Polling the system state at five-minute
intervals should not cause performance issues itself. Sometimes you will want a
more-frequent collection interval. You must decide what the maximum impact your
monitoring should have on the system. You can configure SQLDiag to collect data
at any interval you want.
After your monitoring is set up, if you run
out of space, you need to examine logged diagnostic data and determine what
query is responsible. You might find that the version store is responsible for
most of the space consumption. In that case, you must decide whether the
versioning features should be disabled, or whether you should just add some
disk space.
If a single query is consuming most of the
space, and the version store is not the problem, you might not have many other
options than simply increasing the disk space available to tempdb. Modifying
your application to use less tempdb space might turn out to be too time-consuming
to be worth the effort.
Tweaking your tempdb space usage is really
just another variety of performance tuning. You find the place where you can
achieve the maximum result with the minimum effort.
What Is Tempdb Page Latch Contention?
Another common issue in tempdb performance
relates to latch contention. I have seen customer cases where the CPU is only
20 percent used, and the disk is only 10 percent used, but the throughput
cannot improve regardless of the number of requests sent to the server.
This is a typical blocking scenario. In
tempdb, a common source of blocking is the latch contention on metadata pages
and on PFS, GAM, and SGAM pages.
Recall my earlier mention of SQL Server 2005’s
improvements in these two areas. You should see much less latch contention on
SQL Server 2005; however, you might still see some contention if your
application is not designed properly.
How do you identify such issues? My
recommendation is still to use a sampling technique. Querying DMVs on a time
interval obviously has higher overhead than querying them just one time, but
latch contention is also a pretty rare issue. For example, you can execute the
following pseudo-script every 100 milliseconds:
Select * from sys.sysprocesses
Loop through all the processes
If blocked !=0 and lastwaittype like 'PAGE%LATCH_%' and waitresource like '2%'
DBCC PAGE (...)
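As a concrete sketch of the filtering step in this pseudo-script, the query might look like the following. A waitresource value that starts with 2 refers to database ID 2, which is tempdb:

```sql
Select spid, blocked, lastwaittype, waitresource
From sys.sysprocesses
Where blocked != 0
  and lastwaittype like 'PAGE%LATCH_%'
  and waitresource like '2:%'
```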
This script will tell you the following:
• What the waiting
resource is at the moment of the snapshot
• If the process is
blocked by a page latch, what type of page that is
If you collect enough data points during the
time when you suspect the contention is occurring, you should be able to
determine whether it is related to tempdb page latches.
Here is a scenario I have seen in the past. It
was a heavily blocked system:
1. I have about 100 processes in the system.
2. I collected 1,000 data points.
3. I collected 100,000 rows from sys.sysprocesses.
4. Twenty percent of the rows have blocked != 0.
5. Among the 20 percent rows, 40 percent have a lastwaittype of PAGELATCH_UP, and page is in tempdb.
6. The output of DBCC PAGE reveals
contention on data pages. The object ID of the data page shows that it is a
system table.
In this case, it turns out that the stored
procedure used by the application had an additional create index on a
temp table, like this:
CREATE PROC myProc AS BEGIN
create table #myTemp (k int)
insert into #myTemp values (1)
create clustered index idxTemp on #myTemp (k) -- index name is illustrative
END
The create index statement
disabled the temp table caching functionality of SQL Server 2005.
The fix in this case is to create the
clustered index in the same statement as the create table:
CREATE PROC myProc AS BEGIN
create table #myTemp (k int primary key)
insert into #myTemp values (1)
END
In another case, I found out that most of the
waiting a connection was doing was on PFS pages. The application was doing
something like this:
Create table ##myGlobal (k int, myBigCharColumn varchar (4000))
Create nonclustered index idx_k on ##myGlobal (k)
Insert ##myGlobal values (1, replicate ('x', 4000))
GO
CREATE PROC myProc AS
Update ##myGlobal set myBigCharColumn = null
The change of row size in this heap caused
lots of free-space state updates in the corresponding PFS pages.
The fix is to turn that nonclustered index
into a clustered index. We do not need to update free-space information on PFS
pages for B-tree pages.