Tuesday, 24 September 2019

Tempdb Issues
The basic concept of tempdb is simple: It is a system database, so it is always present while SQL Server is running. It stores temporary data, so if SQL Server is restarted, you lose all data stored in tempdb.
What kind of data can you store in tempdb? Just about anything. To the user, it is just like a regular database. You can use transactions and all the other SQL Server features, so long as you do not care about saving the data after stopping the SQL Server service.
If you store a small amount of data in tempdb, the data might not actually need to be written to disk. The SQL Server buffer manager tries to store the data primarily in memory when possible. The data gets written to disk only when there is not enough memory to cache it. SQL Server automatically decides which memory pages to write to disk based on how “hot” each page is—that is, how often the page is accessed. The more frequently a page is accessed, the hotter the page gets.
Because you can use transactions in tempdb, there is also a log file for tempdb. SQL Server transaction logs must always be written to disk. For example, if you do the following, about 1MB is written to tempdb:
Declare @i int
Set @i = 0
Create table #myTempTable (myBigCharColumn char(1000))
While (@i < 1000) begin
    Insert into #myTempTable values ('x')
    Set @i = @i + 1
end
If your system has plenty of memory, there might not be any physical IO in the tempdb data file. However, you will see 1MB of physical IO in the tempdb log file.
What if you have an initial set of tables or stored procedures that you want to have in tempdb right after SQL Server restarts? You can create those in the model database. Each time SQL Server restarts, it copies tempdb from the model database. If you have set any database options in the model database, tempdb also inherits those.
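For example, here is a minimal sketch of pre-creating a startup object in model (the table name and columns are hypothetical):
Use model
go
-- Hypothetical lookup table; tempdb will contain a copy after the next restart.
Create table dbo.myStartupLookup (id int, description varchar(100))
go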
For simplicity’s sake, tempdb allows only one file group for data. By default, tempdb grows automatically based on the space required to complete database operations that involve it. This means that the tempdb file group grows until its host disk is full. Why is this the default behavior? The reason is to avoid out-of-space errors. SQL Server internally uses tempdb to store temporary data for query processing and other system tasks. Applications use tempdb to store temporary results. If you run out of space in tempdb, the server or end-user application gets an out-of-space error. By automatically using all available disk space, tempdb meets the space needs of the server and end-user applications as long as practically possible.
When SQL Server restarts, the tempdb file size is reset to the configured value. (The default is 8MB.) The reason for this design is that any extra space consumed by tempdb is automatically returned to the operating system for other uses when SQL Server is cycled.
The best practice regarding file size is to configure it to 80 percent of what you believe your usage will be, and let auto-grow take care of the remaining 20 percent. If you’re concerned about too much space being used, set a maximum size for tempdb files. Consider this scenario, for example:
• Disk volume D: has 5,000GB free space at the moment. You plan to put tempdb there. This disk volume is also used for other purposes.
• Estimated space usage is 200GB most of the time for tempdb. But you do not want tempdb to ever consume more than 1,000GB of space. If tempdb ever grows to be more than 1,000GB, you would rather fail the application than consume more space.
• You should set the tempdb size to 200GB with a maximum size of 1,000GB and auto-grow enabled, as in the sketch after this list.
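A minimal sketch of this configuration, assuming the default logical data file name tempdev (verify yours with sys.master_files):
Alter database tempdb
Modify file (name = 'tempdev', size = 200GB, maxsize = 1000GB, filegrowth = 10%)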
Tempdb does not allow auto-shrink, and its database and file-shrink capabilities are limited. In a production system, file shrink on tempdb rarely succeeds because many hidden objects stored in tempdb by SQL Server cannot be moved by the file-shrink command.
So what should you do? My advice is to rarely shrink tempdb. You should not shrink it just because its size has grown. You should ask yourself, "Is the larger size I am seeing now a normal occurrence because of my workload?" If so, it does not make a lot of sense to shrink tempdb, because your normal workload will cause it to automatically grow back to its former size (or, worse, run out of space if you have disabled auto-grow).
If you do have to shrink the size of tempdb, I recommend that you do it offline with the -c and -f SQL Server startup flags:
1. Stop the SQL Server service.
2. Find out where sqlservr.exe is on your hard drive. The default location is something like C:\Program Files\Microsoft SQL Server\<instance folder>\Binn for SQL Server 2000 and SQL Server 2005.
3. CD to that directory. Type the command sqlservr -c -f. This starts SQL Server with a minimum tempdb size.
4. Use the TSQL command ALTER DATABASE tempdb MODIFY FILE (...) to change the size of the tempdb files (a sketch follows this list).
5. Restart the SQL Server service.
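A minimal sketch of step 4, assuming the default logical file names tempdev and templog (verify yours with sys.master_files); the 1GB and 100MB targets are hypothetical:
-- Run these from the sqlservr -c -f session, then restart the service.
Alter database tempdb Modify file (name = 'tempdev', size = 1GB)
Alter database tempdb Modify file (name = 'templog', size = 100MB)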
WHAT HAS IMPROVED IN SQL SERVER 2005?
SQL Server 2005 did not make many external changes to tempdb. However, there are several important internal improvements:
1. Less IO activity in the tempdb log file
For example:
Create table #myTempTable (myBigCharColumn char (1000))
Insert into #myTempTable values ('ABC')
Update #myTempTable set myBigCharColumn = 'XYZ'
Previously, 2,000 bytes of log data (both 'ABC' and 'XYZ') were written to the log. Now, only 1,000 bytes ('ABC') are written to the tempdb log file.
This is great, but does it work for all data stored in tempdb? Unfortunately, it works only for data stored in heaps (tables without a clustered index) or LOBs. You will not see this improvement for data stored in a clustered index. Does it benefit all DML operations in tempdb? Only INSERT and UPDATE benefit. DELETE does not benefit.
2. Much faster auto-grow of tempdb data files
In SQL Server 2000, it works like this: You run out of space in tempdb’s data file. SQL Server decides to grow the file by 10 percent. If your file is 10GB in size, SQL Server formats the new region of the file by writing 1GB of data (most of it zeros) to disk. This can take a while.
SQL Server 2005 takes advantage of the Windows API that expands the size of a file without doing IO on the new sections of the file. Because of this, growing a file occurs almost instantaneously.
This helps tempdb auto-grow significantly. There is, however, a bit of a security concern with this if the SQL Server service is not running under machine administrator credentials. It is possible for the SQL Server service to see data left by a previous process on disk. For example:
• One instance of SQL Server (we’ll call it Instance A) is used to handle public information. It runs as User A on a machine. User A has normal user privileges on the machine.
• The machine has another instance of SQL Server (Instance B) that handles sensitive data. It runs as User B.
• Instance B shrinks its database file. Some sensitive data is left on disk.
• Instance A grows a file and picks up this data. Now User A can look at this data via a page-dump command (DBCC PAGE).
That is why this optimization is not always on. It requires Windows XP or Windows Server 2003, and it requires a special volume permission (the "Perform volume maintenance tasks" privilege) for the SQL Server service account. If the service is running as the machine administrator, it already has this permission.
You should take advantage of this important optimization, especially when the whole machine is used exclusively for a single SQL Server instance.
3. Improved scalability for concurrent access in tempdb
SQL Server does a good job of handling concurrent accesses to data. SQL Server also sports a full-featured, rich data store. Some features, if not used carefully, can cause performance issues when many users are trying to perform the same operation. Consider this scenario, for example:
Create Proc myProc As Begin
Create table #myTempTable (myBigCharColumn char (1000))
Insert into #myTempTable values ('ABC')
Drop table #myTempTable
End
If a large number of users are simultaneously trying to execute this procedure, they may encounter a bottleneck, because SQL Server needs to do several things to execute the procedure:
• Create metadata for table #myTempTable by inserting rows into the database system tables. These rows are visible through system views such as sys.objects in SQL Server 2005.
• Allocate two pages for the table even though the table needs only 1,000 bytes. These two pages are called the data page and the IAM page. PFS, GAM, and SGAM pages need to be scanned and updated to find and allocate these pages.
• Free the two pages as part of dropping the table.
• Remove the rows from the system metadata for #myTempTable.
The tempdb metadata tables do not have many rows, so inserting and deleting a lot of rows in them can cause B-tree page-level read/write contention. This is known as page latch contention.
Each PFS page covers a 64MB range of data. Each SGAM and GAM page covers a 4GB range of data. A lot of page allocation and free activity causes page-level latch contentions on these pages, too.
In SQL Server 2005, we did two things to reduce contention and improve the scalability in this scenario:
• We changed the page allocation algorithm so that read/write contention is not as severe as before.
• We cache metadata entries for temporary tables in tempdb so that we do not have to do many inserts and deletes in the B-tree. We track these activities in an in-memory cache instead.
4. Improved scalability for multiple files in tempdb
In SQL Server 2000, we recommended that people have the same number of files as CPUs in their system. The primary reason for this is to avoid page-level latch contention on PFS, GAM, and SGAM pages.
In SQL Server 2005, it is less important to do so because of improved scalability in our page-accessing algorithms. However, a highly concurrent workload might still encounter page latch contention. In these cases, we still recommend using the multiple-file approach.
There is another reason for using multiple files in tempdb: to achieve better IO bandwidth by file-level striping. Consider this scenario, for example:
• You have 25 physical disks in your server machine.
• You use operating system or hardware striping to group them into five volumes. Each volume has five disks.
• If you use only one tempdb file, you get only five disks working for you.
• If you use five tempdb files, one on each volume, you get all 25 disks working for you.
However, having multiple files also has its downside. SQL Server uses a proportional fill algorithm to choose the file on which to allocate the next page. The algorithm attempts to ensure that SQL Server fills each file in proportion to the free space left in that file so that they reach their maximum capacity at about the same time. In SQL Server 2000, this algorithm did not scale well when there were a huge number of page allocations. The improved algorithm in SQL Server 2005 removed these scalability limitations.
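A minimal sketch of adding data files for the five-volume scenario above, with hypothetical volume paths and logical file names:
-- One tempdb data file per volume, all equally sized.
Alter database tempdb Add file (name = 'tempdev2', filename = 'E:\tempdb2.ndf', size = 8MB)
Alter database tempdb Add file (name = 'tempdev3', filename = 'F:\tempdb3.ndf', size = 8MB)
-- ...repeat for the remaining volumes.
Keeping the files the same size helps the proportional fill algorithm spread allocations evenly across them.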
HOW IS TEMPDB SPACE CONSUMED?
Three types of objects occupy space in tempdb—user objects, internal objects, and version store objects.
What Is a User Object?
The best way to find out what user objects are in tempdb is to do the following:
Select * from sys.all_objects
If you want to find out how much space they take as a whole, you can do this:
Exec sp_spaceused
Perhaps a better name for user object is visible object. Some objects in tempdb that are created and managed by the server are not visible to the end user.
You might find it interesting that user-defined local temp tables also appear in the view sys.all_objects. These tables have names such as the following:
#myTempTable___________NNN
The reason for this naming convention is that different connections can have the same name for local temp tables. So, the NNN portion (a number) represents the connection that owns the table.
Table variables such as @t also appear here. You will notice names that consist of a string of hex digits:
F0ABCDE
This is the way table variables are named. We use an internal ID for each of them that is actually just a number.
What Is an Internal Object?
Contrary to common belief, sp_spaceused does not show all the space occupied in tempdb. A vast amount of space is occupied by “invisible” objects. These include internal objects and version store objects.
Most internal objects are created by queries. For example:
Declare @i int
Set @i = 0
Create table #myTempTable (keyColumn int, myBigCharColumn char(1000))
While (@i < 1000000) begin
    Insert into #myTempTable values (@i, 'x')
    Set @i = @i + 1
End

Select * from #myTempTable
order by keyColumn
The last select query has an order by. Because we have no index on 'keyColumn', it can be serviced only by sorting the result set. This particular query requires up to 1GB of internal tempdb space for sorting.
The space required for sorting in the preceding query is managed internally by a SQL Server sort work table object. This object is not visible in the catalog view sys.all_objects.
Internal objects are used for lots of other purposes:
• Hash joins and hash aggregates
• Query spool to store data as part of a multistep query processing
• Keyset cursors to store the keys
• Static cursors to store the query result
• Service Broker to store messages in transit
• XML variables
• LOB variables, including TEXT, IMAGE, NTEXT, VARCHAR(MAX), and VARBINARY(MAX)
• DBCC CHECK commands (such as DBCC CHECKDB)
• Query and event notification
So how do you find out about the space occupied by these “invisible” internal objects?
SQL Server 2005 provides these three DMVs that help you determine this:
• sys.dm_db_file_space_usage
• sys.dm_db_task_space_usage
• sys.dm_db_session_space_usage
These DMVs give you a snapshot of current system activity. They can tell you how much invisible space internal objects occupy.
You may use these tools as you study each feature previously mentioned. This will give you some idea of how much space each feature requires.
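For example, here is a minimal sketch that totals tempdb space by category using sys.dm_db_file_space_usage (the counts are in 8KB pages):
Use tempdb
go
Select sum(user_object_reserved_page_count) * 8 as user_objects_kb,
       sum(internal_object_reserved_page_count) * 8 as internal_objects_kb,
       sum(version_store_reserved_page_count) * 8 as version_store_kb,
       sum(unallocated_extent_page_count) * 8 as free_space_kb
From sys.dm_db_file_space_usage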
What Is a Version Store Object?
Another type of invisible object is the version store object. The system creates version store objects on demand. The system can create up to one version store object every minute.
Any SQL Server 2005 feature that generates row version information may cause a version store to be created. These features include the following:
• Snapshot isolation
• Triggers
• MARS (multiple active result sets)
• Online index build
Consider this query, for example:
Use master
Declare @i int
Set @i = 0
Create table myBigTable (keyColumn int, myBigCharColumn char(1000))
While (@i < 1000000) begin
    Insert into myBigTable values (@i, 'x')
    Set @i = @i + 1
End
Update myBigTable set myBigCharColumn = 'y'
The master database always generates row versions for any update performed there. In this example, 1GB of row version information is generated by the last update statement. The version store will take 1GB of space in tempdb.
So how do you find out how much space is being occupied by the version store in tempdb?
The best way is to use the performance counter SQLServer:Transactions\Version Store Size (KB). You can view this in Perfmon/Sysmon or by using the SQL Server 2005 SQLDiag facility, which collects this counter by default.
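You can also read the same counter from TSQL; a minimal sketch:
Select cntr_value as version_store_kb
From sys.dm_os_performance_counters
Where object_name like '%:Transactions%'
  and counter_name = 'Version Store Size (KB)'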
Most internal objects are automatically freed at the end of the query in which they were created. In contrast, version store objects do not go away after the update statement that created them.
Instead, a version store is kept until the versions are no longer needed by the SQL Server instance as a whole. This is because most versions are generated for other users who might need to access them; a version store is not needed only by the user who originally generated it.
Consider this scenario, for example:
• Assume we only have two users, A and B.
• User A starts a transaction that requires a version store. This is a snapshot isolation transaction.
• User B generates a 1GB version store. User B logs off.
• User A’s transaction is open for five hours because it is doing a long and complex query. Then User A logs off.
• The 1GB row version store is removed from tempdb when User A finally logs off.
PRACTICAL TROUBLESHOOTING
Many SQL Server users never need to worry about tempdb. As mentioned previously, SQL Server uses it as a scratch space for internal data processing, but this is mostly invisible to the user. By default, the size of tempdb auto-grows based on the system’s needs. As long as you have enough disk space for the work you want to perform, you can usually just let SQL Server take care of tempdb for you.
There are two main areas of troubleshooting for tempdb. One relates to the sizing of tempdb; the other is tempdb performance.
What to Do If You Run Out of Space in Tempdb
Many people want to know just how much tempdb space they should have. This question is difficult to answer. Space-usage estimation in tempdb is so complex that an accurate estimation is often not worth the effort. I am not even sure if it is practical to have an accurate estimate in an environment of any complexity.
The practical solution is to do some rough planning and run the system through some trial and error. You must balance several factors: hardware budget, performance, development, and testing time.
How much tempdb space can you afford to have? Most people have a finite hardware budget. You should give a reasonable amount of budget to tempdb. How much you should set aside for tempdb storage depends on your application. If you find out that you cannot afford the amount of space the application needs, you might be forced to change the application.
For example, if the application has a query that uses a hash join and a lot of tempdb space as a result, maybe you should consider changing that to a nested loop join by adding appropriate indexes or using query hints. Even if it takes longer to run, you will not need as much tempdb space.
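For example, a minimal sketch of forcing a nested loop join with a query hint, using hypothetical tables bigTable1 and bigTable2:
Select t1.k, t2.k
From bigTable1 t1
Inner join bigTable2 t2 on t1.k = t2.k
Option (loop join)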
Another example of a performance trade-off is the use of the snapshot isolation level. Without snapshot isolation, some applications experience reader-writer blocking, and throughput is low. With snapshot isolation, the application throughput is much higher, but tempdb use is bigger, too. As with many things related to performance, there are always trade-offs. What you can afford to trade for performance and how much you can afford to trade depends on your specific circumstances.
It is worth mentioning that tweaking an application has other costs besides the raw application performance cost or savings it yields. Modifying an application takes time and resources, as does testing your changes and redeploying them.
Here are some questions I recommend you ask yourself to develop a rough estimate of tempdb space usage:
• What are the main features in your application that use tempdb? You can refer to the earlier list in this chapter for the SQL Server features that use tempdb.
• For each feature, how much space will it roughly take if queries are performed in a serial way? This estimate does not have to be very accurate. You need to look at both the maximum space requirement and the most common requirement.
• Now consider which features will be used concurrently by your application. Sum the maximum space requirements of the largest set of features that will be used concurrently. This is your rough maximum tempdb space estimate.
After you have a rough estimate, you need to set up a test environment and create a workload that simulates your worst-case scenario. The closer your simulation workload mimics the real workload, the better.
If your real-world data set is too large for the test environment, you might consider a scaled-down test and multiply the scaling factor. You also need to watch out for cases where space usage is not linear to the amount of data you have.
To configure tempdb size properly, you should observe the free space in tempdb and tempdb size over a period of time.
The key point here is “over a period of time.” Why is this important? Let’s look at an example. Suppose SQL Server uses 10GB of tempdb space during weekdays, but on weekends it needs 30GB of tempdb to run a batch reporting job. In this case, it makes no sense to constantly shrink the tempdb file to 10GB during the week just to have it grow back to 30GB on the weekends. You should let the system keep 30GB of tempdb space. This advice applies to both the tempdb data file and tempdb log file.
No matter how accurate the planning or estimate is, you might still run out of space in production. Now what do you do?
The best practice here is to set up the SQLDiag service to constantly monitor the production system and take action before you run out of space.
A sampling method is recommended for this type of monitoring. For example, you could have SQLDiag run a query like the following at five-minute intervals:
Select * from sys.dm_db_session_space_usage
Select * from sys.dm_db_file_space_usage
Dbcc inputbuffer (51)
Dbcc inputbuffer (52)
Dbcc inputbuffer (53)
This script tells you the following:
• How free is tempdb at that moment? How is the occupied space divided among the user objects, internal objects, and version store objects?
• What is the tempdb space consumed by each session at that moment?
• What is the current batch executed by sessions 51 through 53?
Trapping these query outputs in a table allows you to monitor the system while it runs.
I offer this script only to illustrate a technique for monitoring tempdb. This method works well for large queries that consume a lot of space. I have seen queries that take upward of 10GB of tempdb space. These kinds of queries are not likely to finish in five minutes, so you should be able to catch them with a query like this one.
Polling the system state at five-minute intervals should not cause performance issues itself. Sometimes you will want a more-frequent collection interval. You must decide what the maximum impact your monitoring should have on the system. You can configure SQLDiag to collect data at any interval you want.
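To trap these outputs in a table, as suggested above, here is a minimal sketch using a hypothetical logging table tempdbSpaceLog:
-- One-time setup: clone the DMV's shape plus a timestamp column.
Select getdate() as capture_time, *
Into tempdbSpaceLog
From sys.dm_db_file_space_usage
Where 1 = 0
go
-- Run this on each polling interval.
Insert into tempdbSpaceLog
Select getdate(), * from sys.dm_db_file_space_usage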
After your monitoring is set up, if you run out of space, you need to examine logged diagnostic data and determine what query is responsible. You might find that the version store is responsible for most of the space consumption. In that case, you must decide whether the versioning features should be disabled, or whether you should just add some disk space.
If a single query is consuming most of the space, and the version store is not the problem, you might not have many other options than simply increasing the disk space available to tempdb. Modifying your application to use less tempdb space might turn out to be too time-consuming to be worth the effort.
Tweaking your tempdb space usage is really just another variety of performance tuning. You find the place where you can achieve the maximum result with the minimum effort.
What Is Tempdb Page Latch Contention?
Another common issue in tempdb performance relates to latch contention. I have seen customer cases where the CPU is only 20 percent used, and the disk is only 10 percent used, but the throughput cannot improve regardless of the number of requests sent to the server.
This is a typical blocking scenario. In tempdb, a common source of blocking is the latch contention on metadata pages and on PFS, GAM, and SGAM pages.
Recall my earlier mention of SQL Server 2005’s improvements in these two areas. You should see much less latch contention on SQL Server 2005; however, you might still see some contention if your application is not designed properly.
How do you identify such issues? My recommendation is still to use a sampling technique. Querying DMVs on a time interval obviously has higher overhead than querying them just one time, but latch contention is also a pretty rare issue. For example, you can execute the following script every 100 milliseconds:
Select spid, blocked, lastwaittype, waitresource
From sys.sysprocesses
Where blocked != 0
  and lastwaittype like 'PAGE%LATCH_%'
  and waitresource like '2:%'
-- For each row returned, dump the page named in waitresource:
-- DBCC PAGE (...)
This script will tell you the following:
• What the waiting resource is at the moment of the snapshot
• If the process is blocked by a page latch, what type of page that is
If you collect enough data points during the time when you suspect the contention is occurring, you should be able to determine whether it is related to tempdb page latches.
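A minimal sketch of that page dump for a hypothetical waitresource of 2:1:103 (database 2 is tempdb, file 1, page 103):
Dbcc traceon (3604)       -- route DBCC output to the client
Dbcc page (2, 1, 103, 0)  -- dbid, file id, page id, print option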
Here is a scenario I have seen in the past. It was a heavily blocked system:
1. There were about 100 processes in the system.
2. I collected 1,000 data points.
3. I collected 100,000 rows from sys.sysprocesses.
4. Twenty percent of the rows had blocked != 0.
5. Among those rows, 40 percent had a lastwaittype of PAGELATCH_UP, and the page was in tempdb.
6. The output of DBCC PAGE revealed contention on data pages. The object ID of the data page showed that it was a system table.
In this case, it turns out that the stored procedure used by the application had an additional create index on a temp table, like this:
CREATE PROC myProc AS
create table #myTemp (k int)
insert into #myTemp values (1)
create clustered index ci_myTemp on #myTemp (k) -- the index name and key are illustrative
END
The create index statement disabled the temp table caching functionality of SQL Server 2005.
The fix in this case is to create the clustered index in the same statement as the create table:
CREATE PROC myProc AS
create table #myTemp (k int primary key)
insert into #myTemp values (1)
END
In another case, I found out that most of the waiting a connection was doing was on PFS pages. The application was doing something like this:
Create table ##myGlobal (k int, myBigCharColumn varchar(4000))
Create nonclustered index idx_k on ##myGlobal (k)
Insert into ##myGlobal values (1, replicate('x', 4000))

CREATE PROC myProc AS
Update ##myGlobal set myBigCharColumn = null
The change of row size in this heap caused lots of free-space state updates in the corresponding PFS pages.
The fix is to turn that nonclustered index into a clustered index. We do not need to update free-space information on PFS pages for B-tree pages.
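A minimal sketch of that fix, reusing the setup above:
Create table ##myGlobal (k int, myBigCharColumn varchar(4000))
Create clustered index idx_k on ##myGlobal (k) -- clustered instead of nonclustered
Insert into ##myGlobal values (1, replicate('x', 4000))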

