Recently I’ve developed a simple processing application that generates some data in database. For business logic it was important to understand what entity was created the last on some steps. Since there was no concern about scaling or running several instances simultaneously, it was decided to use the primary key itself, the auto-incremental integer value. I didn’t want to rely somehow on date-time, or invent something sophisticated.

The application was done, deployed. But after several iterations on big amount of data, we noticed that the data is broken. The business logic, that relies on order, takes data from the wrong entities.

For the application I’ve used:

  • .NET Core 3.0.
  • Entity Framework Core 2.2.6 framework.
  • PostgreSQL database.
  • Npgsql.EntityFrameworkCore.PostgreSQL 2.2.4 database driver.

Let me show the simplified version of the code.

[Table("documents")]
public class Document
{
    [Key]
    [Column("id")]
    public int Id { get; set; }
    
    [Column("name")]
    public string Name { get; set; }
}

public class MyDbContext : DbContext
{
    public DbSet<Document> Documents { get; set; }
}

There is a table documents with auto-increment integer primary key id, and one field to demonstrate the issue.

The logic is very straightforward: loop and create new documents with the value based on loop index.

using (var context = new MyDbContext())
{
    for (var i = 1; i <= 100; i++)
    {
        var document = new Document
                       {
                           Name = $"Document {i}"
                       };

        context.Documents.Add(document);
    }

    await context.SaveChangesAsync();
}

On every iteration the document in added to the DbSet. After the loop all added documents are sent together to the database for saving.

What would you expect to see in the database? The following?

idname
1Document 1
2Document 2
3Document 3
4Document 4
5Document 5
6Document 6

Unfortunately, the result is surprising:

The first assumption would be that the database generated what we didn’t expect, as it is responsible for assigning new auto-incremental IDs.

To check this, let’s enable debug logging in Entity Framework and take a look what it actually sent to DB:

Executing DbCommand [
    Parameters=[
        @p0='Document 1',
        @p1='Document 73',
        @p2='Document 72',
        @p3='Document 71',
        @p4='Document 70',
        @p5='Document 69',
        @p6='Document 68',
        @p7='Document 67',
        @p8='Document 66',
        @p9='Document 65',
        @p10='Document 64',
        @p11='Document 63',
        @p12='Document 62',
        @p13='Document 61',
        @p14='Document 60',
        @p15='Document 59',
        @p16='Document 58',
        @p17='Document 57',
        @p18='Document 56',
        @p19='Document 55',
        @p20='Document 54',
        @p21='Document 53',
        @p22='Document 74',
        @p23='Document 52',
        @p24='Document 75',
        ...],
    CommandType='Text',
    CommandTimeout='30']
INSERT INTO myschema.documents (name)
VALUES (@p0)
RETURNING id;
INSERT INTO myschema.documents (name)
VALUES (@p1)
RETURNING id;
INSERT INTO myschema.documents (name)
VALUES (@p2)
RETURNING id;
INSERT INTO myschema.documents (name)
VALUES (@p3)
RETURNING id;
...

It seems like the Entity Framework caused the mess, and the database just executed INSERT instructions in the provided order.

The next question, what part of Entity Framework works weird: the core or the Npgsql driver from the PostgreSQL team?

Each database driver implements some basic stuff expected by Entity Framework. In case of generating INSERT instructions, we should take a look at the implementation of the UpdateSqlGenerator, or to be more precise – the method AppendInsertOperation().

The code of PostgreSQL driver is open-source and we can easily check how it adds insert operations.

The AppendInsertOperation() is called by Entity Framework for each item in the ModificationCommandBatch.ModificationCommands list. If we debug and check the content of the list, we can clearly see that it already contains the broken order of Document 1, Document 73, Document 72, ...

So, the issue is on the Entity Framework side.

And the last, similar issues can be found on GitHub and StackOverflow. Here is the one with the nice answer regarding the Entity Framework behavior:

There is no guarantee that entities will be inserted in the order that they are tracked. There are actually many reasons for this, both around efficiency of internal data structures and the need to re-order where dependencies exist. Also, many stores will not preserve order once inserted, and hence trying to preserve order while inserting doesn’t help.

ajcvickers

Add comment