So I heard a rumor, that Linq did some pretty need things with pushed up against SQL Server with regards to doing smart and efficient server side pagination of your queries. I wanted to play around with things and see what I could discover…
The Setup
I created a simple data table, called.. TestData.
- CREATE TABLE [dbo].[TestData](
- [Id] [int] IDENTITY(1,1) NOT NULL,
- [TestStringIndexed] [varchar](50) NOT NULL,
- [TestDateIndexed] [datetime] NOT NULL,
- [TestNumberIndexed] [int] NOT NULL,
- [TestStringNonIndexed] [varchar](50) NOT NULL,
- [TestDateNonIndexed] [datetime] NOT NULL,
- [TestNumberNonINdexed] [int] NOT NULL,
- CONSTRAINT [PK_TestData] PRIMARY KEY CLUSTERED
- (
- [Id] ASC
- )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS= ON) ON [PRIMARY]
- ) ON [PRIMARY]
-
- GO
Then, I used my handy dandy RedGate Data Generator (No affiliation, just a happy customer) to insert a million records into my test table.
After that, I created a simple Entity Data Model for my one single table :)
With my model set up, I created a simple repository (following one of the billion of samples of the repository pattern found on the interwebs). Since I’m just testing the skip and take, I only needed a simple fetch…
using System.Collections.Generic;
using System.Linq;
public interface ITestDataRepository
{
TData Fetch(int id);
IEnumerable<TData> FetchAll();
}
public class TestDataRepository : ITestDataRepository
{
private BlogTestingEntities _entities = new BlogTestingEntities();
public TData Fetch(int id)
{
return (from o in _entities.TDataSet
where o.Id == id
select o).FirstOrDefault();
}
public IEnumerable<TData> FetchAll()
{
return _entities.TDataSet.ToList();
}
}
Next was to set up a simple service that my application could call to get the test data (again, following any one of the billion examples)..
using System.Collections.Generic;
using System.Linq;
public interface ITestDataService
{
TData Fetch(int id);
IEnumerable<TData> FetchAll();
}
public class TestDataService : ITestDataService
{
private readonly ITestDataRepository _repository;
public TestDataService() : this(new TestDataRepository()){}
public TestDataService(ITestDataRepository repository)
{
_repository = repository;
}
public TData Fetch(int id)
{
return _repository.Fetch(id);
}
public IEnumerable<TData> FetchAll()
{
return _repository.FetchAll();
}
}
Finally, I made my application consume the service…
using System;
using System.Diagnostics;
using System.Linq;
class Program
{
static void Main(string[] args)
{
Write("Welcome to the Linq Skip/Take Test Application");
var svc = new TestDataService();
var watch = new Stopwatch();
Write("Fetching All Records [FetchAll()]");
watch.Start();
var allFetched = svc.FetchAll();
watch.Stop();
Write(" Total Records Returned : " + allFetched.Count());
Write(" Record Start Index : " + allFetched.First().Id);
Write(" Record End Index : " + allFetched.Last().Id);
Write(" Total Time : " + watch.ElapsedMilliseconds);
Write();
Write("Press any key to exit...");
Console.ReadKey(true);
}
private static void Write()
{
Console.WriteLine();
}
private static void Write(string msg, params string[] args)
{
Console.WriteLine(msg, args);
}
}
Quick Click of run and keeping an eye on SQL Profiler let’s us know everything is happening right and just as we expected (I’m fetching every record from the table)
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[TestStringIndexed] AS [TestStringIndexed],
[Extent1].[TestDateIndexed] AS [TestDateIndexed],
[Extent1].[TestNumberIndexed] AS [TestNumberIndexed],
[Extent1].[TestStringNonIndexed] AS [TestStringNonIndexed],
[Extent1].[TestDateNonIndexed] AS [TestDateNonIndexed],
[Extent1].[TestNumberNonINdexed] AS [TestNumberNonINdexed]
FROM [dbo].[TestData] AS [Extent1]
Testing the Pagination
So, now it’s time to move forward with our testing of our handy skip and fetch methods to see what kind of damage we can do progress we can make!
First off, extend our program to make use of the overloaded FetchAll method to specify a page index and size:
watch.Reset();
Write("Fetching All Records [FetchAll(4, 100)]");
watch.Start();
var allFetchPaged = svc.FetchAll(4, 100);
watch.Stop();
Write(" Total Records Returned : " + allFetchPaged.Count());
Write(" Record Start Index : " + allFetchPaged.First().Id);
Write(" Record End Index : " + allFetchPaged.Last().Id);
Write(" Total Time : " + watch.ElapsedMilliseconds);
Write();
Click Run and watch our results..
So we see that we’ve only got 100 rows returned as expected, with the starting index at 401 and ending at 500… And it took considerably less time to retrieve the results! But wait! Profiler reports that the query passed to SQL server is EXACTLLY the same as before. So now I’m starting to speculate, that the Entity Framework is counting the records returned and stopping the query once it’s retrieved enough to satisfy the results. Let’s test by changing our page index to 9,999…
Ok.. so that still leaves us puzzled… it’s even faster this time getting the tail end of the results as it was before. But still showing the exact same query as the original FetchAll() method. Let’s swap them around so the page query is first, still on the last index page of the pagination..
AH HA! So the Entity Framework must be doing some wicked clever caching of our results some how to help protect us from ourselves! But why doesn’t the paging happen on the server like I was lead to believe?
The Magic of OrderBy
When reading the example How to: Return or Skip Elements in a Sequence (LINQ to SQL) There’s a quick little blurb at the bottom, that’s almost non-noteworthy:
The following example uses Take to select the first five Employees hired. Note that the collection is first sorted by HireDate.
So.. let’s go ahead and modify our service to include this handy little bit of information
public IEnumerable<TData> FetchAll(int pageIndex, int pageSize)
{
var startIndex = (pageIndex * pageSize);
return _repository.FetchAll().OrderBy(o => o.Id).Skip(startIndex).Take(pageSize);
}
And run our results:
And the query is the same as before as well…
Cut to the chase will ya?
The problem comes from how we’ve derived our repository…
The FetchAll() method in our repository is simply defined as:
public IEnumerable<TData> FetchAll()
{
return _entities.TDataSet.ToList();
}
So what we’re returning from the repository is an already run query on the _entities.TDataSet saying “Give me all of your records, and transform them to this handy dandy enumerable list”. Then the service is responsible for taking that list and carving it up for the desired results (.OrderBy().Skip().Take())
At this point, you have two choices…
1. Move the pagination into the repository
using System.Collections.Generic;
using System.Linq;
public interface ITestDataRepository
{
TData Fetch(int id);
IEnumerable<TData> FetchAll();
IEnumerable<TData> FetchAllPaged(int pageIndex, int pageSize);
}
public class TestDataRepository : ITestDataRepository
{
private BlogTestingEntities _entities = new BlogTestingEntities();
public TData Fetch(int id)
{
return (from o in _entities.TDataSet
where o.Id == id
select o).FirstOrDefault();
}
public IEnumerable<TData> FetchAll()
{
return _entities.TDataSet.ToList();
}
public IEnumerable<TData> FetchAllPaged(int pageIndex, int pageSize)
{
var startIndex = pageIndex*pageSize;
return _entities.TDataSet.OrderBy(o => o.Id).Skip(startIndex).Take(pageSize);
}
}
using System.Collections.Generic;
using System.Linq;
public interface ITestDataService
{
TData Fetch(int id);
IEnumerable<TData> FetchAll();
IEnumerable<TData> FetchAllPaged(int pageIndex, int pageSize);
IEnumerable<TData> FetchAll(int pageIndex, int pageSize);
}
public class TestDataService : ITestDataService
{
private readonly ITestDataRepository _repository;
public TestDataService() : this(new TestDataRepository()){}
public TestDataService(ITestDataRepository repository)
{
_repository = repository;
}
public TData Fetch(int id)
{
return _repository.Fetch(id);
}
public IEnumerable<TData> FetchAll()
{
return _repository.FetchAll();
}
public IEnumerable<TData> FetchAllPaged(int pageIndex, int pageSize)
{
return _repository.FetchAllPaged(pageIndex, pageSize);
}
public IEnumerable<TData> FetchAll(int pageIndex, int pageSize)
{
var startIndex = (pageIndex * pageSize);
return _repository.FetchAll().OrderBy(o => o.Id).Skip(startIndex).Take(pageSize);
}
}
Finally! We’ve got improvements in our results and an optimized query from Profiler!
SELECT TOP (100)
[Extent1].[Id] AS [Id],
[Extent1].[TestStringIndexed] AS [TestStringIndexed],
[Extent1].[TestDateIndexed] AS [TestDateIndexed],
[Extent1].[TestNumberIndexed] AS [TestNumberIndexed],
[Extent1].[TestStringNonIndexed] AS [TestStringNonIndexed],
[Extent1].[TestDateNonIndexed] AS [TestDateNonIndexed],
[Extent1].[TestNumberNonINdexed] AS [TestNumberNonINdexed]
FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[TestStringIndexed] AS [TestStringIndexed], [Extent1].[TestDateIndexed] AS [TestDateIndexed], [Extent1].[TestNumberIndexed] AS [TestNumberIndexed], [Extent1].[TestStringNonIndexed] AS [TestStringNonIndexed], [Extent1].[TestDateNonIndexed] AS [TestDateNonIndexed], [Extent1].[TestNumberNonINdexed] AS [TestNumberNonINdexed], row_number() OVER (ORDER BY [Extent1].[Id] ASC) AS [row_number]
FROM [dbo].[TestData] AS [Extent1]
) AS [Extent1]
WHERE [Extent1].[row_number] > 999900
ORDER BY [Extent1].[Id] ASC
The thing I *DON’T* like about this solution.. is now the repository has been infected with what I consider to be business logic. IMHO, the repository shouldn’t care what I want to order my results by, or how to determine what my start and stop is for my page index. I believe that should go into the business layer (service). So having said that…
2. Change the return type from FetchAll in the repository
Instead of returning IEnumerable<T> from the respository method, let’s go ahead and return IQueryable<T> so that we can let the business layer apply it’s logic straight to the query builder..
public IQueryable<TData> FetchAllQ()
{
return _entities.TDataSet;
}
Then, in the service layer, consume it..
public IEnumerable<TData> FetchAllQ(int pageIndex, int pageSize)
{
var startIndex = (pageIndex * pageSize);
return _repository.FetchAllQ().OrderBy(o => o.Id).Skip(startIndex).Take(pageSize);
}
The end result is the same… We’ve got an optimized query being build and passed to the server so we’re effectively using server side paging to get our data. Since we’ve exposed the IQueryable interface from the repository, we’re allowing our business layer to (oddly enough) make business decisions on how that data is requested from the repository. We’re still maintaining our separation of responsibilities, because the repository is the one that is actually acting as the DAL (ok, the Entity Framework is, but let’s not split hairs).