Monday, March 03, 2008

Until recently, Xenomorph has been using a proprietary model to store data. However, reading in the latest Wilmott magazine (January 2008 - "Spreading Wings"),

(article should appear here at some point)

http://www.xenomorph.com/news/

CEO Brian Sentance talks about how Xenomorph are working with Microsoft to store the data on SQL Server. It will be interesting to see what performance can be achieved in the relational world (assuming that is how it is implemented).

On a related note, SQL Server is falling down the rankings on the TPC-C league

http://www.tpc.org/tpcc/results/tpcc_perf_results.asp?resulttype=noncluster

if SQL 2008 offered any tangible performance improvements I would have expected to see an entry by now

There are new records in other areas

http://blogs.msdn.com/sqlperf/archive/2008/02/27/sql-server-2008-launched-today-with-great-performance-amp-scalability.aspx

http://www.tpc.org/tpce/tpce_perf_results.asp

but could this be down to hardware improvements? Furthermore, the absence of competing databases implies that other vendors don't take this as seriously

Monday, March 03, 2008 9:08:07 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]  |  Trackback
Monday, November 20, 2006
Friday, December 16, 2005

After the last test looking at ways to improve slow data retrieval, I have decided to see what is the fastest method of pulling data from a SqlDataReader. This was driven from one of my colleagues claiming that the methods described in

 

http://www.sqljunkies.com/WebLog/amachanic/archive/2005/04/06/10462.aspx

 

were identical, and should not result in any performance difference.

 

The testing is similar to the last entry

 

http://www.noelwatson.com/blog/PermaLink,guid,366274e0-cbb0-4519-a039-6946e43d7bae.aspx

 

with the stored procedure from test 2 being used (read uncommitted and using clustered index seek). However, the loop will be executed 100 times rather than 20, and I will be testing against release rather than build mode. I didn’t build in release mode for the last round of testing as I knew that the greatest gains were to be found in the db layer.

I will be using reflector for .NET

 

http://www.aisto.com/roeder/dotnet/

 

to see what .NET is doing under the covers

1. This was the original test

 while (reader.Read())
   {
    logDetails = reader.GetString(0);
    logTime = reader.GetDateTime(1);
    
   }

As expected, the values for GetString are around 5 times the size (100 vs. 20 loops) of test 2in the previous entry

Looking at the Reflector code,

public string GetString(int i)
{
      SqlString text1 = this.GetSqlString(i);
      return text1.Value;
}

GetString() calls GetSqlString() internally.

2. Modify code to call SqlTypes directly

while (reader.Read())
   {
    logDetails = (string)reader.GetSqlString(0);
    logTime =(DateTime)reader.GetSqlDateTime(1);
   }

This is due to the op_explicit conversions

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemdatasqltypessqldatetimeclasssqldatetimetodatetimeconversiontopic.asp

Modifying the code to

 while (reader.Read())
   {
    logDetails = (SqlString)reader.GetSqlString(0);
    logTime =(SqlDateTime)reader.GetSqlDateTime(1);
   }

and declaring the variables as SqlTypes gives the following result, but you will still have to cast at some point unless you are happy passing SqlTypes around your system.

3. Method proposed in SqlJunkies link

while (reader.Read())
    {
     logDetails = (string)reader[0];
     logTime =(DateTime)reader[1];
    }

This is the quickest method if the second part of test 2 isn’t acceptable.

Friday, December 16, 2005 1:39:52 PM (GMT Standard Time, UTC+00:00)  #    Comments [1]  |  Trackback
Tuesday, December 13, 2005

http://download.microsoft.com/download/1/3/4/134644fd-05ad-4ee8-8b5a-0aed1c18a31e/TShootPerfProbs.doc

WIll read it when we move to 2005 - hopefully in the next 6 months

 

Tuesday, December 13, 2005 4:48:51 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]  |  Trackback

Recently one of the developers was profiling an application to see where improvements could be made to the code. He was using  JetBrains profiler,

http://www.jetbrains.com/profiler/

and this was indicating that retrieving a string using from a SqlDataReader using reader.GetString() was taking a long time.

To recreate the problem, I firstly populated a similar database – the script in the previous entry is sufficient.

http://www.noelwatson.com/blog/PermaLink,guid,b8d1d44e-0058-43d7-8635-25513e9bc86e.aspx

although you will have to add an extra zero to the loop – 1 million records wasn’t enough to recreate the problem.

DECLARE @DateSubtract DECIMAL(9,4)
DECLARE @Counter INT

SELECT @Counter = 0

WHILE @Counter < 10000000
BEGIN
 SELECT @DateSubtract = RAND() * 100000
 INSERT INTO LOG (LogDateTime, Details)
 VALUES (DATEADD(mi, - @DateSubtract, GETDATE()), 'TEST')
 SELECT @Counter = @Counter + 1
END

A simplified version of the stored procedure is shown below

ALTER        PROCEDURE p_LogSearch_s
(
@ip_StartDate DATETIME,
@ip_EndDate DATETIME
)

AS


SELECT  Details, LogDateTime FROM LOG
WHERE (@ip_StartDate IS NULL OR LogDateTime >= @ip_StartDate) AND (@ip_EndDate IS NULL OR LogDateTime < @ip_EndDate)

Test code is shown below. This is called 20 times

private void RetrieveData()
  {
   SqlConnection sqlConn = null;
   SqlCommand sqlCmd = null;
   SqlDataReader reader = null;
   SqlParameter param = null;
   string strSqlConn;
   string logDetails;
   DateTime logTime;
  
    strSqlConn = @"Data Source=(local); Initial Catalog=Log;Integrated Security=SSPI";
    sqlConn = new SqlConnection(strSqlConn);
    sqlCmd = new SqlCommand();


    sqlConn.Open();

    sqlCmd.CommandText = "p_LogSearch_s";
    sqlCmd.CommandType = CommandType.StoredProcedure;
    sqlCmd.Connection = sqlConn;

    param = new SqlParameter("@ip_StartDate",SqlDbType.SmallDateTime);
    param.Value  = new DateTime(2005,12,7,21,30,0,0);
    sqlCmd.Parameters.Add(param);
    param = new SqlParameter("@ip_EndDate",SqlDbType.SmallDateTime);
    param.Value = new DateTime(2005,12,8);
    sqlCmd.Parameters.Add(param);
   
    
    reader = sqlCmd.ExecuteReader();

    while (reader.Read())
    {
     logDetails = reader.GetString(0);
     logTime = reader.GetDateTime(1);
     
    }
  }

We are retrieving 2.5 hours worth of data - 15161 rows


Looking at the results, it is interesting to see that GetString() takes almost 20 times as long as GetDate(). However, this is because GetString() is the first call to retrieve the next row of the data, so will take longer and is a red herring. If we were to swap GetDateTime() above GetString(), the orders would reverse. 303220 is the number of hits we are expecting (20 iterations * 15161 rows), the Read method reads the extra time when the end of the recordset is reached.

Note that some of the calls (e.g FPushMessageLoop) can be ignored as they are related to form population

Solutions

1. Modify Isolation level

One way to improve performance is to set the isolation level to read uncommitted – this is acceptable when reading log data as we are not too fussed if the occasional record is uncommitted. This improves performance by around 20%, although gives an even more misleading result in JetBrains!

2. Modify the stored procedure

The stored procedure can be rewritten

ALTER         PROCEDURE p_LogSearch_s
(
@ip_StartDate DATETIME,
@ip_EndDate DATETIME
)

AS

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT  Details, LogDateTime FROM LOG
WHERE (LogDateTime >= @ip_StartDate) AND (LogDateTime < @ip_EndDate)

and the results are dramatic

However, this stored procedure as it stands may be unacceptable as it may be the case that null variables must be passed to the stored procedure.

3. Improve performance reading from the datareader

With reference to the following link

http://www.sqljunkies.com/WebLog/amachanic/archive/2005/04/06/10462.aspx

it is possible to improve performance when accessing the datareader

while (reader.Read())
    {
     logDetails = reader.GetString(0);
     logTime =reader.GetDateTime(1);
    }

becomes

while (reader.Read())
    {
     logDetails = (string)reader[0];
     logTime =(DateTime)reader[1];
    }

 

This improves performance marginally, get_item taking 59064ms is less than the combined total of GetString (58803ms) and GetDateTime (389ms) in test 1 (UNCOMMITTED).

4. Use a dataset

I did this just out of interest, as I didn't believe performance would be improved

while (reader.Read())
    {
     logDetails = reader.GetString(0);
     logTime =reader.GetDateTime(1);
    }

becomes

DataSet ds = new DataSet();
   SqlDataAdapter da;
   da = new SqlDataAdapter(sqlCmd);
   
   da.Fill(ds);
   foreach(DataRow dr in ds.Tables[0].Rows)
   {
    logDetails = (string)dr[0];
    logTime =(DateTime)dr[1];

   }

 

65312 milliseconds compares favourably with test 1, (note that in the real world, I wouldn't iterate through the dataset but instead would bind directly to a grid) which has

GetString           58803

Read                 3825

ExecuteReader   1398

GetDateTime      389

=64415ms

Dataset vs. DataReaders can be found here

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/bdadotnetarch031.asp

Conclusion

Modifying the stored procedure would be the obvious choice. Using code such as

(@ip_StartDate IS NULL OR LogDateTime >= @ip_StartDate)

Doesn’t allow the index to be used correctly (uses a scan rather than a seek)

An alternative is to use a parent stored procedure, and call individual child stored procedures dependant on the parameters passed to the parent sp. Maintaining these can be a headache, but are necessary if performance is critical.

 

 

 |  | 
Tuesday, December 13, 2005 11:12:24 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]  |  Trackback
Friday, December 09, 2005

Many times I have wanted to ignore the time portion of a datetime field in a database when selecting records for a given date. One rudimentary way would be to store date and time in separate fields. There is another alternative

  • Scenario

Logging database with 1 million records

  • Configuration

Parallelism: disabled (compare methods more acurately)

Recovery mode: simple (reduces time when adding indexes)

  • Scripts

CREATE TABLE [dbo].[Log] (
 [LogID] [int] IDENTITY (1, 1) NOT NULL ,
 [LogDateTime] [smalldatetime] NOT NULL ,
 [Details] [char] (10) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
GO

CREATE  CLUSTERED  INDEX [LogDateTime] ON [dbo].[Log]([LogDateTime]) ON [PRIMARY]
GO

ALTER TABLE [dbo].[Log] ADD
 CONSTRAINT [PK_Log] PRIMARY KEY  NONCLUSTERED
 (
  [LogID]
 )  ON [PRIMARY]
GO


SET NOCOUNT OFF

DECLARE @DateSubtract DECIMAL(9,4)
DECLARE @Counter INT

SELECT @Counter = 0

WHILE @Counter < 1000000
BEGIN
 SELECT @DateSubtract = RAND() * 100000
 INSERT INTO LOG (LogDateTime, Details)
 VALUES (DATEADD(mi, - @DateSubtract, GETDATE()), 'TEST')
 SELECT @Counter = @Counter + 1
END


We will be selecting all records for 7th December 2005 (14188 records)

The first four examples attempt to select records for a given date

1.

SELECT * FROM Log WHERE DATEPART(yy, LogDateTime) = 2005
AND DATEPART(mm, LogDateTime) = 12
AND DATEPART(dd, LogDateTime) = 7

2.

SELECT * FROM Log WHERE DATEADD(d,DATEDIFF(d,0,LogDateTime),0) = DATEADD(d,DATEDIFF(d,0,GETDATE()),-2)

3.

SELECT * FROM Log where DATEDIFF(dd,LogDateTime,GETDATE()-2) = 0   

4.

SELECT * FROM Log WHERE CONVERT(CHAR(8), LogDateTime, 112) = CONVERT(CHAR(8), GETDATE()-2, 112)

The first three have identical performance (4235 logical reads), whereas the 4th is a lot slower (45188 logical reads) - it is unable to use the clustered index on LogDateTime

The next three tests select records where the date is between two dates i.e before 8th December and after 6th December.

I could've used BETWEEN rather than "greater than or equal to" and less than"


5.

SELECT * FROM LOG WHERE LogDateTime >=  CAST(CONVERT(VARCHAR(12),GETDATE() - 2,112) + ' 00:00:00' as DATETIME) AND LogDateTime <  CAST(CONVERT(VARCHAR(12),GETDATE() - 1,112) + ' 00:00:00' as datetime)

 

6.


SELECT * FROM Log WHERE LogDateTime >= DATEADD(d,DATEDIFF(d,0,GETDATE()),-2)
AND LogDateTime < DATEADD(d,DATEDIFF(d,0,GETDATE()),-1)

 

7.

SELECT * FROM Log WHERE LogDateTime >= '20051207' AND LogDateTime < '20051208'

 

These last three methods all cost 63 reads and are by far the best solution. If you have added and index and the index is being used with a scan rather than a seek, or the index is not being used at all, there is a problem!

Execution plan info can be found here

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1_5pde.asp

Friday, December 09, 2005 6:39:28 PM (GMT Standard Time, UTC+00:00)  #    Comments [0]  |  Trackback
Wednesday, June 22, 2005
Quick test using Visual Studio Hosted experience
Wednesday, June 22, 2005 8:08:23 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]  |  Trackback
Saturday, March 12, 2005
Tracking down SQL Server performance issues
Saturday, March 12, 2005 7:56:59 AM (GMT Standard Time, UTC+00:00)  #    Comments [0]  |  Trackback

Theme design by Jelle Druyts

Pick a theme: