Hitul Mistry presented the talk "Walk-through of Django internals" at EuroPython 2022 in Dublin, Ireland.
Summary of talk
How does Django start?
Django can be started with a simple command:
python manage.py runserver
Django performs the following steps to start the server:
- Find management commands
  - django.core.management.ManagementUtility has an execute() method that gets called. ManagementUtility.execute() is the front door for executing any management command.
  - Load models in all the apps.
  - Start the HTTP server.
- Parse arguments
  - Django calls CommandParser.parse_args() to parse the command-line arguments.
  - CommandParser inherits from Python's built-in ArgumentParser class and overrides the parse_args method. The overridden method makes Django's error messages more relevant.
  - Django prepares the list of available management commands and compares the command we typed against it.
  - If we mistype the command name after python manage.py, Django also tries to recommend the closest match, such as runserver.
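The parsing and suggestion steps can be sketched with the standard library alone. This is a minimal illustration rather than Django's own code: FriendlyParser, KNOWN_COMMANDS and suggest are hypothetical names, and the command list is a made-up subset. The closest-match lookup uses difflib.get_close_matches, which is also what Django relies on for its "Did you mean ...?" hint.

```python
import argparse
from difflib import get_close_matches

# Hypothetical subset of management commands, for illustration only.
KNOWN_COMMANDS = ["runserver", "migrate", "makemigrations", "shell"]


class FriendlyParser(argparse.ArgumentParser):
    """Hypothetical parser that raises instead of exiting on bad input."""

    def error(self, message):
        # argparse calls error() for invalid arguments; the default
        # implementation prints the usage text and exits the process.
        raise ValueError(f"Error: {message}")


def suggest(subcommand, commands=KNOWN_COMMANDS):
    """Return the closest known command name, or None if nothing is close."""
    matches = get_close_matches(subcommand, commands)
    return matches[0] if matches else None


parser = FriendlyParser(prog="manage.py")
parser.add_argument("subcommand")
args = parser.parse_args(["runserve"])  # note the missing trailing "r"
suggestion = suggest(args.subcommand)
```

Overriding error() keeps the failure inside the program, so the caller can decide how to report it.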
- Load settings
  - Django finds the settings module path from the DJANGO_SETTINGS_MODULE environment variable, which is set in the manage.py module.
  - If Django cannot find the module at that path, it raises an exception.
  - Django doesn't load the settings until we use a variable inside them. Until we access an attribute on settings, Django does not load them.
  - When we first access an attribute on settings, Django dynamically imports the settings module, loops over all the attributes in that module, and stores the uppercase ones on the settings object as key-value pairs.
  - LazySettings has a _setup method which loads the settings.
  - Django implements the lazy behaviour through the Python special methods __getattr__, __repr__, __setattr__ and __delattr__.
  - Django has a settings.configure() method which can be called to configure the settings dynamically. If this method is called before the settings are loaded, Django will not import the module from DJANGO_SETTINGS_MODULE.
  - The configure() method cannot be called once Django has loaded the settings.
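The lazy loading described above can be sketched in a few lines. This is a toy, not Django's LazySettings: the class name LazySettingsSketch, the environment variable SKETCH_SETTINGS_MODULE and the fake module sketch_settings are all assumptions made so the example runs without a project.

```python
import importlib
import os
import sys
import types


class LazySettingsSketch:
    """Toy lazy settings object; not Django's LazySettings."""

    _wrapped = None  # stays None until the first attribute access

    def _setup(self):
        # Import the module named by the environment variable and keep
        # only its uppercase attributes as key-value pairs.
        module = importlib.import_module(os.environ["SKETCH_SETTINGS_MODULE"])
        self._wrapped = {
            name: getattr(module, name) for name in dir(module) if name.isupper()
        }

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so _wrapped
        # itself never lands here.
        if self._wrapped is None:
            self._setup()
        try:
            return self._wrapped[name]
        except KeyError:
            raise AttributeError(name) from None

    def configure(self, **options):
        # Manual configuration is only allowed before anything is loaded.
        if self._wrapped is not None:
            raise RuntimeError("Settings already configured.")
        self._wrapped = dict(options)


# Register a fake settings module so the sketch runs without a project.
fake_module = types.ModuleType("sketch_settings")
fake_module.DEBUG = True
fake_module.helper = "ignored: not uppercase"
sys.modules["sketch_settings"] = fake_module
os.environ["SKETCH_SETTINGS_MODULE"] = "sketch_settings"

settings = LazySettingsSketch()
loaded_before = settings._wrapped is not None  # nothing imported yet
debug_value = settings.DEBUG                   # first access triggers _setup()
```

Calling settings.configure(...) after the first attribute access raises RuntimeError in this sketch, mirroring the rule that configure cannot be used once the settings are loaded.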
- Load App Configuration and Logging
  - Django calls the django.setup() function, which loads all the Django apps, imports their modules, and marks them ready.
  - Django loads the logging settings from settings.py. Django uses Python's logging module.
  - Django loads the app configuration of each app, which is found in that app's apps.py. Django looks for it in its own built-in apps as well as in the project's apps.
  - Django looks in apps.py for classes that inherit from django.apps.AppConfig. If it finds more than one class, it looks for the one marked default. If more than one class is marked default, it raises an exception.
  - Load the models in all the apps.
  - Django aborts the startup by raising an exception if it finds a problem while loading models or apps.
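The rule for choosing a configuration class can be sketched as follows. This is a simplified approximation under stated assumptions: AppConfigSketch, pick_app_config and the fake sketch_apps module are hypothetical, and the real selection logic in Django handles more cases.

```python
import inspect
import types


class AppConfigSketch:
    """Hypothetical base class standing in for an app configuration."""


def pick_app_config(module):
    """Pick the configuration class to use from an apps.py-like module."""
    candidates = [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        if issubclass(cls, AppConfigSketch) and cls is not AppConfigSketch
    ]
    if len(candidates) == 1:
        return candidates[0]
    # Several candidates: keep only those explicitly marked as default.
    defaults = [cls for cls in candidates if getattr(cls, "default", False)]
    if len(defaults) > 1:
        raise RuntimeError("more than one configuration class is marked default")
    return defaults[0] if defaults else None


# Build a fake apps.py module with two configuration classes.
apps_module = types.ModuleType("sketch_apps")
apps_module.BlogConfig = type("BlogConfig", (AppConfigSketch,), {"default": True})
apps_module.LegacyConfig = type("LegacyConfig", (AppConfigSketch,), {})

chosen = pick_app_config(apps_module)
```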
- Start HTTP Server
  - runserver.py is the management command in Django's core application.
  - The run() function in Django's basehttp module (django.core.servers.basehttp) checks the threading option and runs the server. socketserver.ThreadingMixIn is used for threading and wsgiref.simple_server.WSGIServer for the HTTP server.
  - The implementation is multithreaded if threading is true.
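Combining the two standard-library classes named above might look like the sketch below. The function names server_class and run are hypothetical, and the code only approximates the idea; it is not a copy of Django's basehttp module.

```python
import socketserver
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer


def server_class(threading=True):
    """Return the server class to use, mixing in threading when requested."""
    if threading:
        # The mixin comes first so its process_request() takes precedence
        # and every request is handled in its own thread.
        return type("ThreadedWSGIServer", (socketserver.ThreadingMixIn, WSGIServer), {})
    return WSGIServer


def run(addr, port, wsgi_app, threading=True):
    """Bind the chosen server class and serve requests until interrupted."""
    httpd = server_class(threading)((addr, port), WSGIRequestHandler)
    httpd.set_app(wsgi_app)
    httpd.serve_forever()
```

A call such as run("127.0.0.1", 8000, application) would then serve any WSGI callable; it is left out of the block because it blocks until interrupted.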
- Auto reloader
  - Django's runserver restarts itself automatically when any file in the code changes.
  - Django has two approaches to implement this. StatReloader is the default whenever we start the server.
  - StatReloader
    - Django prepares a list of the files involved in running the Django code, along with their modification times.
    - Django spawns a background thread that checks the file modification times every second. If any file's modification time has changed, Django reloads the code.
  - Watchman
    - Watchman is a utility that is more efficient than StatReloader. Watchman leverages OS features such as:
      - inotify - Linux
      - FSEvents / kqueue - Mac
      - Windows - Beta
    - When a file changes, the OS notifies Django, and Django restarts in response.
    - Installing pywatchman enables Watchman support in Django.
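The polling approach used by StatReloader can be illustrated with a small helper that remembers modification times. This is a sketch under assumptions: changed_files is a hypothetical function, and the edit is simulated with os.utime on a temporary file instead of a real code change.

```python
import os
import tempfile
from pathlib import Path


def changed_files(mtimes, paths):
    """Update the mtimes dict in place and return paths modified since the last call."""
    changed = []
    for path in paths:
        try:
            mtime = path.stat().st_mtime
        except OSError:
            continue  # the file vanished between listing and stat
        previous = mtimes.get(path)
        mtimes[path] = mtime
        if previous is not None and mtime > previous:
            changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    watched = Path(tmp) / "views.py"
    watched.write_text("print('v1')\n")
    seen = {}
    first_pass = changed_files(seen, [watched])  # first sighting: recorded only
    # Simulate an edit by pushing the modification time 10 seconds forward.
    stamp = watched.stat().st_mtime + 10
    os.utime(watched, (stamp, stamp))
    second_pass = changed_files(seen, [watched])
    detected = [path.name for path in second_pass]
```

A reloader built on this idea would call changed_files in a loop with a one-second sleep and restart the process when the returned list is not empty.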
How does a request work?
Let's look at a simple HTTP request:
curl --location --request POST 'http://localhost:8000/test/' \
--header 'Content-Type: application/json' \
--data-raw '{
"key": "value"
}'
Request the web server receives
While travelling over the network, the above request is converted into a raw text request like the one below. We used Wireshark to capture the raw request data.
Frame 1: 372 bytes on wire (2976 bits), 372 bytes captured (2976 bits) on interface lo, id 0
Ethernet II, Src: 00:00:00_00:00:00 (00:00:00:00:00:00), Dst: 00:00:00_00:00:00 (00:00:00:00:00:00)
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 40806, Dst Port: 8000, Seq: 1, Ack: 1, Len: 306
Hypertext Transfer Protocol
POST /test/ HTTP/1.1\r\n
[Expert Info (Chat/Sequence): POST /test/ HTTP/1.1\r\n]
[POST /test/ HTTP/1.1\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Method: POST
Request URI: /test/
Request Version: HTTP/1.1
Content-Type: application/json\r\n
User-Agent: PostmanRuntime/7.26.5\r\n
Accept: */*\r\n
Cache-Control: no-cache\r\n
Postman-Token: f821ed79-842b-4681-816a-a06f593a4c98\r\n
Host: localhost:8000\r\n
Accept-Encoding: gzip, deflate, br\r\n
Connection: keep-alive\r\n
Content-Length: 22\r\n
[Content length: 22]
\r\n
[Full request URI: http://localhost:8000/test/]
[HTTP request 1/1]
[Response in frame: 50]
File Data: 22 bytes
JavaScript Object Notation: application/json
{"key": "value"}
Let's see how a simple HTTP request gets served by Django.
The HTTP client sends the request using the HTTP protocol. The request is received by a web server such as Gunicorn, uWSGI, Nginx, or Django's runserver. The web server and Django communicate over the WSGI protocol.
Inside the root project package there is a wsgi.py module. It calls the get_wsgi_application() function, which internally creates a WSGIHandler().
class WSGIHandler:
    def __init__(self, *args, **kwargs):
        # Initialization: runs once, when the HTTP server starts.
        pass

    def __call__(self, environ, start_response):
        # Gets called whenever a request comes in.
        pass
WSGIHandler has two methods, __init__ and __call__. __init__ gets called on HTTP server startup and __call__ gets called whenever an HTTP request comes in.
wsgiref parses the raw request shown above and converts it into a dictionary.
The dictionary contains all the request parameters.
{ 'HTTP_ACCEPT': '*/*', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate, br', 'HTTP_CACHE_CONTROL': 'no-cache', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_HOST': 'localhost:8000', ... 'wsgi.errors': <_io.TextIOWrapper name='
The __call__ method calls the get_response method with the request. Internally, it matches the route, runs the middleware, and executes the view. Django wraps every middleware and view call in exception handling.
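The request flow above can be sketched as a tiny WSGI application. Everything here is hypothetical and greatly simplified: HandlerSketch, logging_middleware, hello_view and the dictionary-based router are stand-ins, and views return a (status, body) pair instead of a response object. The environ is completed with wsgiref.util.setup_testing_defaults so the example runs without a server.

```python
from wsgiref.util import setup_testing_defaults


def logging_middleware(get_response):
    """Hypothetical middleware factory: receives the next callable in the chain."""
    def middleware(environ):
        environ["sketch.seen_by_middleware"] = True
        return get_response(environ)
    return middleware


def hello_view(environ):
    """Hypothetical view returning a status line and a body."""
    return "200 OK", b"hello from " + environ["PATH_INFO"].encode()


class HandlerSketch:
    def __init__(self, routes, middleware=()):
        # Runs once at startup: wrap the router in the middleware chain.
        self.routes = routes
        get_response = self.resolve_and_call
        for factory in reversed(middleware):
            get_response = factory(get_response)
        self.get_response = get_response

    def resolve_and_call(self, environ):
        view = self.routes.get(environ["PATH_INFO"])
        if view is None:
            return "404 Not Found", b"not found"
        try:
            return view(environ)
        except Exception:
            return "500 Internal Server Error", b"server error"

    def __call__(self, environ, start_response):
        # Runs for every request the server hands over.
        status, body = self.get_response(environ)
        headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
        start_response(status, headers)
        return [body]


application = HandlerSketch({"/test/": hello_view}, middleware=[logging_middleware])

environ = {"PATH_INFO": "/test/", "REQUEST_METHOD": "POST"}
setup_testing_defaults(environ)  # fills in the remaining WSGI keys
captured = []
body = application(environ, lambda status, headers: captured.append(status))
```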
How does the Django ORM work?
The ORM has the following components:
- Model
- Manager
- QuerySet
- Query
- SQLCompiler
- DatabaseWrapper
- Database Driver
Django Model
from django.db import models


class College(models.Model):
    name = models.CharField(max_length=100)
    address = models.CharField(max_length=400)


class Student(models.Model):
    name = models.CharField(max_length=200)
    enr_no = models.IntegerField()
    # on_delete is a required argument of ForeignKey since Django 2.0.
    college = models.ForeignKey(College, on_delete=models.CASCADE)
If we import the models and print the attributes of a model class, we get the result below.
models.College.__dict__
mappingproxy({'__module__': 'debug.models',
'__doc__': 'College(id, name, address)',
'_meta': <Options for College>,
'DoesNotExist': debug.models.College.DoesNotExist,
'MultipleObjectsReturned': debug.models.College.MultipleObjectsReturned,
'name': <django.db.models.query_utils.DeferredAttribute at 0x7fdad9b6ed30>,
'address': <django.db.models.query_utils.DeferredAttribute at 0x7fdad9b6ed68>,
'id': <django.db.models.query_utils.DeferredAttribute at 0x7fdad9b6ee80>,
'objects': <django.db.models.manager.ManagerDescriptor at 0x7fdad9b6eef0>,
'student_set': <django.db.models.fields.related_descriptors.ReverseManyToOneDescriptor at 0x7fdad9b78588>})
- User models inherit from django.db.models.base.Model.
- Model has the metaclass django.db.models.base.ModelBase, which prepares the model attributes.
- add_to_class is the method that adds new attributes to the model class. It checks whether the attribute has a contribute_to_class method.
- _meta is an instance of the django.db.models.options.Options class, which defines utilities.
- Model also has _state, which is an instance of ModelState.
class ModelState:
    db = None
    adding = True
    fields_cache = ModelStateFieldsCacheDescriptor()
- The ModelState class stores the information that tells Django whether a save() call on the model should run an insert or an update.
- Whenever we initialize a model instance (student_instance = Student(name="Hitul Mistry", enr_no="123456", college=college_instance)), adding is True. If we later call save() on student_instance, Django will run an INSERT query.
- Now, if we fetch an instance from the database and then modify it,
student_instance = models.Student.objects.filter().last()
student_instance.name = "Hitul Mistry"
student_instance.save()
- In the above example, student_instance was loaded from the database, so adding in _state is False, and on the save() call Django will run an UPDATE query.
- fields_cache stores the foreign key objects on calling prefetch_related().
- The from_db method on django.db.models.base.Model gets called to turn the database results into an instance. It initializes the model with the values received from the database.
- Model implements __eq__, __str__, __repr__, __hash__, __getstate__ and __setstate__ to allow certain operations on model objects.
- Django models also have fields (models.IntegerField, models.CharField etc.), which inherit from django.db.models.Field.
- Each field has a contribute_to_class method, which gets called by the metaclass at the time of model class creation. Some methods that are specific to the field are added to the model. For example, get_next_by_<field_name> is added by DateTimeField.
- get_internal_type returns the name that is looked up in the database's internal type to DB type mappings. The data_types mapping is found in the DatabaseWrapper. It is generally used in migrations for building the query. For field types that are not in the mapping, the field's db_type method provides the column type.
data_types = {
    'AutoField': 'serial',
    'CharField': 'varchar(%(max_length)s)',
    …
}
- from_db_value() converts a value from the database type to the Python type (for example, timezone conversion to the given timezone).
- get_db_prep_save() is called before saving into the database.
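Two ideas from this section, the metaclass calling contribute_to_class and the _state.adding flag choosing between INSERT and UPDATE, can be combined in one toy model base. All names ending in Sketch, the _meta_fields list and the get_<name>_upper helper are hypothetical. The save logic is deliberately reduced to the adding flag; Django's real decision involves more than that.

```python
class FieldSketch:
    """Hypothetical field: registers itself and adds a helper to the model class."""

    def contribute_to_class(self, cls, name):
        self.name = name
        cls._meta_fields.append(name)
        # Field-specific helper added to the model, similar in spirit to
        # the get_next_by_<field_name> methods mentioned above.
        setattr(cls, f"get_{name}_upper", lambda obj: str(getattr(obj, name)).upper())


class ModelBaseSketch(type):
    def __new__(mcs, cls_name, bases, attrs):
        contributable = {k: v for k, v in attrs.items() if hasattr(v, "contribute_to_class")}
        plain = {k: v for k, v in attrs.items() if k not in contributable}
        new_class = super().__new__(mcs, cls_name, bases, plain)
        new_class._meta_fields = []
        for name, value in contributable.items():
            new_class.add_to_class(name, value)
        return new_class

    def add_to_class(cls, name, value):
        # Objects that know how to attach themselves get the chance to do so;
        # everything else becomes a plain class attribute.
        if hasattr(value, "contribute_to_class"):
            value.contribute_to_class(cls, name)
        else:
            setattr(cls, name, value)


class ModelStateSketch:
    def __init__(self):
        self.db = None
        self.adding = True  # True until the row is known to exist


class ModelSketch(metaclass=ModelBaseSketch):
    def __init__(self, **values):
        self._state = ModelStateSketch()
        for name in self._meta_fields:
            setattr(self, name, values.get(name))

    @classmethod
    def from_db(cls, db, **values):
        instance = cls(**values)
        instance._state.adding = False  # the row already exists
        instance._state.db = db
        return instance

    def save(self, db="default"):
        operation = "INSERT" if self._state.adding else "UPDATE"
        self._state.adding = False
        self._state.db = db
        return operation


class Student(ModelSketch):
    name = FieldSketch()


new_student = Student(name="Hitul Mistry")
loaded_student = Student.from_db("default", name="Hitul Mistry")
```

In this sketch, saving new_student reports an insert the first time and an update afterwards, while loaded_student reports an update straight away.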
Django Manager
debug_models.College.objects.create(
name="ABC College",
address="Abc college campus, rolland street road."
)
- The from_queryset method dynamically builds a Manager subclass.
- At runtime, the Manager copies the QuerySet's public methods, and any method marked queryset_only = False, from the QuerySet class.
- The create method calls the model's save method internally.
- The create method returns the model object.
- A model can have multiple managers, but one of them acts as the default manager (_default_manager).
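How a manager class can be built dynamically from a queryset class is sketched below. QuerySetSketch and BaseManagerSketch are hypothetical, and the rows are plain dictionaries instead of model instances. The filtering rule (skip private methods and those marked queryset_only) follows the description above.

```python
import inspect


class QuerySetSketch:
    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        return QuerySetSketch(
            row for row in self.rows
            if all(row.get(key) == value for key, value in conditions.items())
        )

    def count(self):
        return len(self.rows)

    def delete(self):
        self.rows.clear()

    delete.queryset_only = True  # must not be exposed on the manager

    def _clone(self):
        return QuerySetSketch(self.rows)


class BaseManagerSketch:
    def __init__(self, rows):
        self._rows = rows

    def get_queryset(self):
        return self._queryset_class(self._rows)

    @classmethod
    def from_queryset(cls, queryset_class):
        def make_proxy(name):
            def proxy(self, *args, **kwargs):
                return getattr(self.get_queryset(), name)(*args, **kwargs)
            proxy.__name__ = name
            return proxy

        attrs = {"_queryset_class": queryset_class}
        for name, method in inspect.getmembers(queryset_class, inspect.isfunction):
            if hasattr(cls, name):
                continue  # never override what the manager already defines
            queryset_only = getattr(method, "queryset_only", None)
            if queryset_only or (queryset_only is None and name.startswith("_")):
                continue  # skip private and queryset-only methods
            attrs[name] = make_proxy(name)
        return type(f"{cls.__name__}From{queryset_class.__name__}", (cls,), attrs)


ManagerSketch = BaseManagerSketch.from_queryset(QuerySetSketch)
objects = ManagerSketch([{"name": "ABC College"}, {"name": "XYZ College"}])
matching = objects.filter(name="ABC College").count()
```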
Django Query
- Query holds the values for the compiler.
- django.db.models.sql.Query is subclassed in django.db.models.sql.subqueries, which has different query classes such as InsertQuery, AggregateQuery, UpdateQuery etc.
- Each query class has its own methods: insert_values, add_update_fields, add_related_update, add_filter etc.
- Each Query has its own compiler, named in the compiler attribute. The compilers live in django.db.models.sql.compiler:
  - SQLInsertCompiler
  - SQLAggregateCompiler
  - SQLUpdateCompiler
  - SQLDeleteCompiler
  - SQLCompiler
- The compiler's as_sql method prepares the query, which is later executed by the execute_sql method.
- DatabaseWrapper executes the query and returns the values.
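The split between a query object that only holds values and a compiler that turns them into SQL can be sketched with sqlite3 from the standard library. QuerySketch, SelectCompilerSketch and the COMPILERS registry are hypothetical, the sqlite3 connection plays the role of the database wrapper, and the table and column names are interpolated directly, which is only acceptable in a toy with trusted names.

```python
import sqlite3


class QuerySketch:
    compiler = "SelectCompilerSketch"  # name of the compiler class to use

    def __init__(self, table):
        self.table = table
        self.where = []

    def add_filter(self, column, value):
        # Only records the condition; no SQL is produced here.
        self.where.append((column, value))

    def get_compiler(self, connection):
        return COMPILERS[self.compiler](self, connection)


class SelectCompilerSketch:
    def __init__(self, query, connection):
        self.query = query
        self.connection = connection

    def as_sql(self):
        sql = f'SELECT * FROM "{self.query.table}"'
        params = [value for _, value in self.query.where]
        if self.query.where:
            conditions = " AND ".join(f'"{column}" = ?' for column, _ in self.query.where)
            sql += f" WHERE {conditions}"
        return sql, params

    def execute_sql(self):
        sql, params = self.as_sql()
        return self.connection.execute(sql, params).fetchall()


COMPILERS = {"SelectCompilerSketch": SelectCompilerSketch}

connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE "college" ("name" TEXT, "address" TEXT)')
connection.execute('INSERT INTO "college" VALUES (?, ?)', ("ABC College", "Rolland Street"))

query = QuerySketch("college")
query.add_filter("name", "ABC College")
compiler = query.get_compiler(connection)
sql, params = compiler.as_sql()
rows = compiler.execute_sql()
```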
Django Queryset
- filter() returns a QuerySet object.
- QuerySets are containers for objects.
- QuerySets are lazy.
- They implement __repr__, __iter__, __len__, __bool__, __getitem__ etc.
- QuerySets have a cache.
- queryset.iterator(chunk_size=100) should be used for large result sets where the cache is not needed. prefetch_related() was not supported with iterator() before Django 4.1.
- The django.db.models.sql.Query class has different methods, such as add_filter, add_q, add_select_related, add_annotation, add_extra, add_ordering etc., to hold the data.
- SQLCompiler forms the SELECT query for the database and DatabaseWrapper executes it.
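Laziness and caching can be shown with a small container that postpones the fetch until its contents are needed. LazyQuerySetSketch and fake_database are hypothetical, and the sketch is simpler than the real QuerySet: for example, indexing here fills the whole cache, which is an assumption made to keep the code short.

```python
class LazyQuerySetSketch:
    def __init__(self, fetch, filters=None):
        self._fetch = fetch                # callable that "hits the database"
        self._filters = dict(filters or {})
        self._result_cache = None          # filled on first evaluation

    def filter(self, **conditions):
        # Chaining only records arguments and returns a new, unevaluated object.
        return LazyQuerySetSketch(self._fetch, {**self._filters, **conditions})

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = list(self._fetch(self._filters))

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def __len__(self):
        self._fetch_all()
        return len(self._result_cache)

    def __bool__(self):
        self._fetch_all()
        return bool(self._result_cache)

    def __getitem__(self, index):
        self._fetch_all()
        return self._result_cache[index]


ROWS = [
    {"name": "ABC College", "address": "abc campus"},
    {"name": "XYZ College", "address": "xyz campus"},
]
fetch_calls = []


def fake_database(filters):
    """Stand-in for the database: records each call and filters in memory."""
    fetch_calls.append(dict(filters))
    return [row for row in ROWS if all(row[k] == v for k, v in filters.items())]


queryset = LazyQuerySetSketch(fake_database).filter(name="ABC College").filter(address="abc campus")
calls_before = len(fetch_calls)  # chaining alone has not touched the database
first = queryset[0]              # first evaluation fills the cache
size = len(queryset)             # answered from the cache
calls_after = len(fetch_calls)
```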
How does Django build the query in query chaining?
debug_models.College.objects.filter(
    name="ABC College"
).filter(address__contains="abc")
- filter() returns a QuerySet object that holds the query arguments.
- The as_sql method in SQLCompiler forms the SQL query based on the parameters passed.
debug_models.College.objects.filter(
    name="ABC College"
).update(name="BBC College")
- It simply runs an UPDATE query on the database and returns the number of records updated.
- It does not call the save method, so the pre_save and post_save signals are not sent.
- django.db.models.sql.subqueries.UpdateQuery has add_update_values, add_related_update etc., which store the data in the Query for the compiler.
- SQLUpdateCompiler forms the query and DatabaseWrapper executes it on the database.
debug_models.College.objects.filter(
name="ABC College"
).delete()
- Django runs a SQL DELETE query on the database and also sends the pre_delete and post_delete signals.
- It executes a raw query like
DELETE FROM "debug_college" WHERE "debug_college"."id" IN (36, 35, 34, 33, 32, 31, 30);
- The django.db.models.deletion.Collector class collects the objects to be deleted (collect()), deletes them (delete()), and sends the pre_delete and post_delete signals.
- SQLDeleteCompiler forms the SQL query for the actual delete.
- Related objects have on_delete, which defines the behaviour when the object they reference is deleted. Supported options include CASCADE, PROTECT and RESTRICT.
- Raw queries can be used for better performance.
Django Database Wrapper
django.db.backends.<database>.base.py
- It provides the methods for creating connections and cursors.
- It contains django model type to db type mappings.
- It also contains database-level mappings for operators (exact, iexact, regex etc.) and pattern lookups (contains, icontains etc.).
operators = {
    'exact': '= %s',
    'iexact': '= UPPER(%s)',
    'contains': 'LIKE %s',
    …
}
pattern_ops = {
    'contains': "LIKE '%%' || {} || '%%'",
    'icontains': "LIKE '%%' || UPPER({}) || '%%'",
    'startswith': "LIKE {} || '%%'",
    …
}
data_types = {
    'AutoField': 'serial',
    'BigAutoField': 'bigserial',
    'BinaryField': 'bytea',
    ...
}
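How such mappings might be consumed can be sketched as below. The dictionaries reuse a few entries shown in this article, while lookup_to_sql and column_type are hypothetical helpers. The real backend does more work, such as escaping wildcard characters in LIKE patterns.

```python
OPERATORS = {
    "exact": "= %s",
    "iexact": "= UPPER(%s)",
    "contains": "LIKE %s",
}
DATA_TYPES = {
    "AutoField": "serial",
    "CharField": "varchar(%(max_length)s)",
}


def lookup_to_sql(column, lookup, value):
    """Build a WHERE fragment and its parameters for one lookup."""
    lhs = f'"{column}"'
    if lookup == "iexact":
        lhs = f"UPPER({lhs})"   # compare both sides in upper case
    if lookup == "contains":
        value = f"%{value}%"    # the wildcards travel inside the parameter
    return f"{lhs} {OPERATORS[lookup]}", [value]


def column_type(internal_type, **field_attrs):
    """Resolve a field type name to a column type via the mapping."""
    return DATA_TYPES[internal_type] % field_attrs


fragment, params = lookup_to_sql("name", "contains", "abc")
char_column = column_type("CharField", max_length=100)
```

The %s placeholder stays in the SQL text and the value travels separately, so the database driver performs the quoting.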
DatabaseFeatures
django.db.backends.<database>.features.py
- It contains certain attributes which help Django form a query or raise an exception when a feature is used. Example features are,
class DatabaseFeatures(BaseDatabaseFeatures):
    allows_group_by_selected_pks = True
    can_return_columns_from_insert = True
    can_return_rows_from_bulk_insert = True
    has_real_datatype = True
    ….
DatabaseOperations
django.db.backends.<database>.operations.py
- It contains the common query operations which can be leveraged by Django models, fields and the compiler.
- Examples: set_time_zone_sql, datetime_cast_date_sql, distinct_sql etc.