Demysityfy Entity framework queries

Linq to SQL - What is really generated and when is it fetched?

When?

Do you know what really triggers a call to the database when using entity framework (EF)? When working with EF you get in contact with c#DBSet<T> and c#IQueryable<T> both of these abstractions are used to hide the complexity of creating SQL Queries from C# Linq statements. There are a couple of Linq methods that will trigger a call to the database. The most common are ToList, First, FirstOrDefault, Single, SingleOrDefault, All, Any, Count, Min, Max, Find and the async versions of these methods.

What is generated?

If you have some basic knowledge of SQL you probably already have a good understanding of what SQL that will be generated from the basic Linq methods. If that is the case either you can wait or go to the later parts of this article series or you can look at these examples just as a reminder.

First or FirstOrDefault

First will return the first element in the c# DBSet<T> or the First element in the c# DBSet<T> matching the sent in labda expression. If no element are found the method will throw an exception. If you don't want the method to throw an exception if the element is not found you can instead use FirstOrDefault. This method will instead return NULL if no matching element is found. Giving you the opertunity to handle the NULL value however you want.

using (var db = new DemoContext())
{
    // Can be used without lambda
    var bookings = await db.Bookings.FirstAsync();

    // Or with lambda
    var bookings = await db.Bookings.FirstAsync(x => x.BookRef == "00000F");
}
SELECT b.book_ref, b.book_date, b.total_amount
      FROM bookings.bookings AS b
      LIMIT 1

SELECT b.book_ref, b.book_date, b.total_amount
      FROM bookings.bookings AS b
      WHERE b.book_ref = '00000F'
      LIMIT 1

Single or SingleOrDefault

Single will return the one element in the c# DBSet<T> that match the sent in lambda expression. The big difference between this and First is that this method will throw an exception if more than one element match the sent in expression. You can also see a slight difference in the generated SQL where this method in order to be able to say that only one entity match the expression needs too try fetch two elements from the database. This method are because of that slightly less performant than First.

Same as with First and FirstOrDefault, Single will throw an exception if no element are found and SingleOrDefault will return NULL if no element are found.

using (var db = new DemoContext())
{
    // Can be used without lambda
    var bookings = await db.Bookings.SingleAsync();

    // Or with lambda
    var bookings = await db.Bookings.SingleAsync(x => x.BookRef == "00000F");
}
SELECT b.book_ref, b.book_date, b.total_amount
      FROM bookings.bookings AS b
      WHERE b.book_ref = '00000F'
      LIMIT 2

Find

Find will fetch a single entity by sending in the primary key as a parameter to the method.

The big difference between Find and Single or First is that Find will first look if the entity is already a part of the ef change tracker and if that's the case ef will return that entity without doing another db call.

using (var db = new DemoContext())
{
    var bookings = await db.Bookings.FindAsync("00000F");
}
SELECT b.book_ref, b.book_date, b.total_amount
      FROM bookings.bookings AS b
      WHERE b.book_ref = @__p_0
      LIMIT 1

Any, All and Count

Any

Any will check if there is any record in the database matching the sent in lambda expression and return true or false depending on the result.

using (var db = new DemoContext())
{
    var bookings = await db.Bookings.AnyAsync(x => x.BookRef == "00000F");
}
SELECT EXISTS (
          SELECT 1
          FROM bookings.bookings AS b
          WHERE b.book_ref = '00000F')

All

All will check if all the records in the database match the sent in lambda expression and return true or false depending on the result. It do so by generating a SQL query that will return 1 if any record does not match the expression.

using (var db = new DemoContext())
{
    var bookings = await db.Bookings.AllAsync(x => x.BookRef == "00000F");
}
SELECT NOT EXISTS (
          SELECT 1
          FROM bookings.bookings AS b
          WHERE b.book_ref <> '00000F')

Count

Count will count the number of records in the database table or how many records that match the sent in lambda expression. This method returns a int representing the number of records.

There is also a LongCountAsync that can be used if you rather want the result back as a long datatype. Here you can see a slight difference in the generated SQL where CountAsync will cast the result to a integer and the LongCountAsync returns the raw value.

using (var db = new DemoContext())
{
    var bookings = await db.Bookings.CountAsync(x => x.BookRef == "00000F");

    var bookings2 = await db.Bookings.LongCountAsync(x => x.BookRef == "00000F");
}
SELECT count(*)::int
      FROM bookings.bookings AS b
      WHERE b.book_ref = '00000F'

SELECT count(*)
      FROM bookings.bookings AS b
      WHERE b.book_ref = '00000F'

Max and Min

Max

using (var db = new DemoContext())
{
    var maxTotal = await db.Bookings.MaxAsync(x => x.TotalAmount);
}
SELECT max(b.total_amount)
      FROM bookings.bookings AS b

Min

using (var db = new DemoContext())
{
    var minTotal = await db.Bookings.MinAsync(x => x.TotalAmount);
}
SELECT min(b.total_amount)
      FROM bookings.bookings AS b

Sum and Average

Sum

This method returns the summerized value of a given column represented by the decimal datatype. We can also see something new in the generated SQL. Since total amount is nullable EF wraps the sum statement in the COALESCE function returning 0.0 if a NULL value is present.

using (var db = new DemoContext())
{
    var totalAmount = await db.Bookings.SumAsync(x => x.TotalAmount);
}
SELECT COALESCE(sum(b.total_amount), 0.0)
      FROM bookings.bookings AS b

Average

Average returns the average of a given columns value represented by the decimal datatype.

using (var db = new DemoContext())
{
    var avgAmount = await db.Bookings.AverageAsync(x => x.TotalAmount);
}
SELECT avg(b.total_amount)
      FROM bookings.bookings AS b

ToList or ToArray

The most used functions in Linq to SQL (at least by me) is the ToList or ToArray methods. Probably you won't use it by it self, but instead you will use it at the end of a more complex query. In the most simplest form this query looks like this.

using (var db = new DemoContext())
{
    var bookings = await db.Bookings.ToListAsync();
}
SELECT b.book_ref, b.book_date, b.total_amount
    FROM bookings.bookings AS b

Building a complex query

Where

Let's say you want to only fetch bookings where the total cost is over 3000. Conviniently enough there is a method for this in Linq to SQL and it even have the same name.

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .Where(x => x.TotalAmount > 3000)
        .ToArrayAsync();
}
SELECT b.book_ref, b.book_date, b.total_amount
      FROM bookings.bookings AS b
      WHERE b.total_amount > 3000.0

Now lets say we also want to filter on the BookDate and only fetch Bookings that are before 2017-7-29. Then we simply just add that condition to the Where filter.

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .Where(x => x.TotalAmount > 3000 && 
            x.BookDate <= new DateTime(2017, 7, 29, 0, 0, 0, 0, DateTimeKind.Utc))
        .ToArrayAsync();
}
SELECT b.book_ref, b.book_date, b.total_amount
      FROM bookings.bookings AS b
      WHERE b.total_amount > 3000.0 AND b.book_date <= TIMESTAMPTZ '2017-07-29T00:00:00Z'

Bookings can also have a collection of Tickets connected too it. Now we want to bring those in as well.

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .Where(x => x.TotalAmount > 3000 && 
            x.BookDate <= new DateTime(2017, 7, 29, 0, 0, 0, 0, DateTimeKind.Utc))
        .ToArrayAsync();

    var tickets = bookings.SelectMany(x => x.Tickets).ToArray();
}

If we inspect the tickets we can see that the array is empty. Does this mean that there are not tickets for bookings that cost more than 3000 and is booked earlier than 2017-7-29? No not really. In later versions of EF Lazy loading is not included by default. This means that we need to alter the Linq statement a bit in order to get the tickets. Here we introduce the Include method. I learned SQL before C# and I feel that it's easier to understand the include by simply think of it as a "what tables I want to JOIN in" keyword. So if I want to JOIN in the tickets I need to Include them in Linq to SQL.

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .Include(x => x.Tickets)
        .Where(x => x.TotalAmount > 3000 && 
            x.BookDate <= new DateTime(2017, 7, 29, 0, 0, 0, 0, DateTimeKind.Utc))
        .ToArrayAsync();

    var tickets = bookings.SelectMany(x => x.Tickets).ToArray();
}
SELECT b.book_ref, b.book_date, b.total_amount, t.ticket_no, t.book_ref, t.contact_data, t.passenger_id, t.passenger_name
      FROM bookings.bookings AS b
      LEFT JOIN bookings.tickets AS t ON b.book_ref = t.book_ref
      WHERE b.total_amount > 3000.0 AND b.book_date <= TIMESTAMPTZ '2017-07-29T00:00:00Z'
      ORDER BY b.book_ref

When we are using the Include method we are doing an explicit include meaning that we tell EF that we want this table joined in. We can also to an implicit include (join). To show this we also need to introduce the topic of projection. Projection is simply a way for us to be more specific about what properties from the table we want returned from the database. Looking at the below example we can see that we have created a new DTO record that only have the BookDate and the TotalAmount since that is the only thing we care about in this example. EF in this case adapt the query to only fetch the columns from the bookings table we need in order to create the BookingSummaryDto. This can be a good way of both speeding up you queries and also minimize the memory footprint (more on this in a later post).

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .Select(x => new BookingSummaryDto(
                x.BookDate,
                x.TotalAmount
            ))
        .ToArrayAsync();
}
SELECT b.book_date, b.total_amount
      FROM bookings.bookings AS b

Now that we understand the concept of projection we can take a look at what an implicit include are. Now we can see that we are still doing a left join even if we did not actually use the Include statement. This happens because of the fact that we are accessing and projecting the tickets into a TicketDto inside the Select statement and therefore triggering an implicit join.

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .Select(x => new BookingDto(
                x.BookDate,
                x.TotalAmount,
                x.Tickets.Select(t => new TicketDto(t.TicketNo)).ToList()
            ))
        .ToArrayAsync();
}
SELECT b.book_date, b.total_amount, b.book_ref, t.ticket_no
      FROM bookings.bookings AS b
      LEFT JOIN bookings.tickets AS t ON b.book_ref = t.book_ref
      ORDER BY b.book_ref

The third way of triggering a join is to manually create a join statement in Linq.

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .Join(db.Tickets,
            booking => booking.BookRef,
            ticket => ticket.BookRef,
            (booking, ticket) =>
                new BookingDto(booking.BookDate, booking.TotalAmount, new() { new TicketDto(ticket.TicketNo) })
        {})
        .ToArrayAsync();
}
SELECT b.book_date, b.total_amount, t.ticket_no
      FROM bookings.bookings AS b
      INNER JOIN bookings.tickets AS t ON b.book_ref = t.book_ref

Doing a join this way in EF will give you back en array of all the rows from the join statement. Meaning that for example the query below will return two bookings object since there are two tickets related to that booking and the booking information will be dublicated in both objects.

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .Where(x => x.BookRef == "000181")
        .Join(db.Tickets,
            booking => booking.BookRef,
            ticket => ticket.BookRef,
            (booking, ticket) =>
                new BookingDto(booking.BookDate, booking.TotalAmount, new() { new TicketDto(ticket.TicketNo) })
        {})
        .ToArrayAsync();
}
SELECT b.book_date, b.total_amount, t.ticket_no
      FROM bookings.bookings AS b
      INNER JOIN bookings.tickets AS t ON b.book_ref = t.book_ref

If you instead want to get only one booking back and also a list of the related tickets you can use the GroupJoin statement. The difference in this query will also be that we are doing a left join since we want to get back an empty list if no tickets related to that booking are found.

using (var db = new DemoContext())
{
    var bookings = await db
        .Bookings
        .GroupJoin(db.Tickets, 
            booking => booking.BookRef, 
            ticket => ticket.BookRef, 
            (booking, tickets) => 
                new BookingDto(booking.BookDate , booking.TotalAmount, tickets.Select(x => new TicketDto(x.TicketNo)).ToList())
        {})
        .ToArrayAsync();
}
SELECT b.book_date, b.total_amount, b.book_ref, t0.ticket_no
      FROM bookings.bookings AS b
      LEFT JOIN LATERAL (
          SELECT t.ticket_no
          FROM bookings.tickets AS t
          WHERE b.book_ref = t.book_ref
      ) AS t0 ON TRUE
      ORDER BY b.book_ref

When fetching related entities and especially when these are a list of entities it could be a good idea to limit the amount of data fetched. This can be done by applying filtering methods to the navigation properties and can have a tremendous impact on the performance of your query. In this first example we have a simple filtering where we only fetch the related tickes that have CantactData connected to it. In the second example we instead want to look even one more level deep and that if the tickes have any connected TicketFlight with the amount of above 1000. With that said we have now seen that it workes perfectly to include other filtering, selections and aggregations in hte inner query and EF will perfectly handle the convertion between this and SQL.

var bookings = await db
    .Bookings
    .Select(x => new BookingDto(x.BookDate,
                                x.TotalAmount,
                                x.Tickets.Where(t => t.ContactData != null)
                                .Select(t => new TicketDto(t.TicketNo))
                                .ToList()))
    .ToArrayAsync();
SELECT b.book_date, b.total_amount, b.book_ref, t0.ticket_no
      FROM bookings.bookings AS b
      LEFT JOIN (
          SELECT t.ticket_no, t.book_ref
          FROM bookings.tickets AS t
          WHERE t.contact_data IS NOT NULL
      ) AS t0 ON b.book_ref = t0.book_ref
      ORDER BY b.book_ref
var bookings = await db
    .Bookings
    .Select(x => new BookingDto(x.BookDate,
                                x.TotalAmount,
                                x.Tickets.Where(t => t.TicketFlights.Any(tf => tf.Amount > 1000))
                                .Select(t => new TicketDto(t.TicketNo))
                                .ToList()))
    .ToArrayAsync();
SELECT b.book_date, b.total_amount, b.book_ref, t1.ticket_no
      FROM bookings.bookings AS b
      LEFT JOIN (
          SELECT t.ticket_no, t.book_ref
          FROM bookings.tickets AS t
          WHERE EXISTS (
              SELECT 1
              FROM bookings.ticket_flights AS t0
              WHERE t.ticket_no = t0.ticket_no AND t0.amount > 1000.0)
      ) AS t1 ON b.book_ref = t1.book_ref
      ORDER BY b.book_ref

In this article we have had a quick overview of some basic DB querying using Entity framework core and Postgresql. We have went through all the basic Linq expressions and also learned what SQL they are generating. Knowing this will be an important skill for understanding how to write performant applications, solve performance bottle necks and maybe the most important thing. Knowing the limitations of entity framework.