Django is a full-stack framework capable of scaling to millions of users. To reach that scale, we have to follow Django best practices for effective performance and development. We have listed 70+ best practices which should be followed.
While scaling Django, we often lose unexpected amounts of time waiting on responses from third-party tools and databases. If we manage them better and also understand the internal workings of Django, we can scale Django.
1. Cache the views
- Django supports caching a view's response for a period of time. If a view is cached, the view logic will not be executed and the response will be served from the cache.
- It can be used when the view's response rarely changes.
- Django has the `cache_page` decorator, which takes the cache time in seconds as an argument.
- Example
from django.views.decorators.cache import cache_page

@cache_page(60 * 10)
def user_data(request):
    ...
- The above view will be cached for 10 minutes. After the cache expires, the view function `user_data` will be called again and its output will be stored back into the cache.
2. Template fragment caching
- If we cannot cache the full view, Django can cache individual fragments of a template.
- Example
{% load cache %}
{% cache 60 menu_items %}
<p>Menu Name: {{ menu_item.name }}</p>
{% endcache %}
- The above example will cache the menu name HTML snippet for 1 minute. That section will be served from the cache, and the underlying db queries can be avoided.
- Example
{% load cache %}
{% cache 60 user_details user_instance %}
<p>First name: {{ user_instance.first_name }}</p>
<p>Last name: {{ user_instance.last_name }}</p>
{% endcache %}
- In the above example, the user's first name and last name will be cached, but multiple cache copies will be created, one per distinct `user_instance` value.
- Better performance can be seen once caching is applied to multiple templates and to bigger chunks of templates.
3. Cache Django ORM queries
- While scaling Django applications, the ORM hits the database with lots of queries.
- Sometimes ORM queries are repeated, and those can be cached.
- Use a package such as django-cachalot for caching ORM queries; a minimal setup is sketched below.
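A minimal django-cachalot setup sketch, assuming a cache backend is already configured (per the package's documented install, it only needs to be added to `INSTALLED_APPS`; the locmem backend here is just a placeholder):

# settings.py
INSTALLED_APPS = [
    # ... your apps ...
    "cachalot",  # caches ORM results and invalidates them on writes
]

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}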
4. Set up a maximum age for database connections
- By default, Django closes the database connection after each HTTP request, so it establishes a new db connection on every new request.
- Creating a new connection is overhead for both the database and the application.
- Set the `CONN_MAX_AGE` parameter to a couple of minutes; a sample configuration is shown below. The value of `CONN_MAX_AGE` should be in sync with the database's connection timeout parameter.
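A minimal sketch of persistent connections in `settings.py` (the database name and backend are placeholders):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",  # placeholder
        # Keep each connection open for up to 120 seconds
        # instead of reconnecting on every request.
        "CONN_MAX_AGE": 120,
    }
}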
5. Use Django db connection options
- Django supports an `OPTIONS` key in the `DATABASES` configuration. `OPTIONS` contains db-specific settings and should be configured where required to gain better performance.
- Django has great documentation for `OPTIONS`; an example follows below.
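For example, with the PostgreSQL backend the `OPTIONS` dict is passed through to the database driver; a sketch (the specific values are illustrative):

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",  # placeholder
        "OPTIONS": {
            # Fail fast if the database is unreachable.
            "connect_timeout": 5,
            # Require an encrypted connection.
            "sslmode": "require",
        },
    }
}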
6. Deactivate unused apps and code
- Whenever Django starts, it loads all the `INSTALLED_APPS`, middlewares, database connections, system checks, template engines and much more at startup.
- If we have unused apps installed, they delay the startup time.
- If we have an unused template engine path, Django may scan those directories each time, which can lead to unnecessary Disk/SSD IOPS consumption.
- If some middlewares and apps are used in development only, they can be conditionally loaded like this:
INSTALLED_APPS = [
    "app1",
    "app2",
]

if DEBUG:
    INSTALLED_APPS.append("app_debug1")
- In the above code snippet, the app `app_debug1` is loaded only when `DEBUG` is `True`.
7. Use bulk query
- If we are inserting more than 10 rows in one view or function, it is always better to use bulk queries.
- Example
import json

# Getting the user_details from the request.
# Let's assume user_details is a JSON array with around 1000 records in it.
# Sample schema:
# [{"name": "hitul"}, {"name": "rex"}]
user_details = request.POST.get("user_details")
user_details_dict = json.loads(user_details)

# One INSERT query per record.
for detail in user_details_dict:
    models.UserDetails.objects.create(
        name=detail.get("name")
    )
- If `user_details_dict` has 1000 records, 1000 insert queries will be initiated.
user_details = request.POST.get("user_details")
user_details_dict = json.loads(user_details)

user_details_instances = []
for detail in user_details_dict:
    user_details_instances.append(
        models.UserDetails(
            name=detail.get("name")
        )
    )

# A single INSERT for all records.
models.UserDetails.objects.bulk_create(user_details_instances)
- The above code with `bulk_create` will issue just a single insert query to the database, which can easily be 10x faster.
- With `bulk_create`, the performance of the database insert will be predictable and consistent.
8. Use iterator
user_details = models.User.objects.filter()
for user_instance in user_details:
    print(user_instance.first_name)
- If `models.User.objects.filter()` matches 100 users, Django runs one query that loads and caches all 100 rows in memory as soon as the queryset is evaluated. For large result sets, this loading and caching becomes expensive.
user_details = models.User.objects.filter().iterator()
for user_instance in user_details:
    print(user_instance.first_name)
- In the above example, Django streams the rows instead of caching the whole result set, because we have used `iterator()` after `filter()`. No queryset cache is built, so memory usage stays low even for large querysets.
user_details = models.User.objects.filter().iterator(chunk_size=10)
for user_instance in user_details:
    print(user_instance.first_name)
- In the above example, Django will fetch the records in batches of 10, i.e. after processing 10 records, Django will fetch the next 10 from the database.
- If the database is expected to return a couple of thousand rows or more, `chunk_size` should be used. If Django loads all the records in one shot, it has to map all of them onto queryset model instances at once, which can add significant processing delay.
- In some cases the server might even run out of memory if a huge chunk of data is returned from the database.
9. Use select_related
user_details = models.User.objects.filter().iterator()

# Here, let's suppose profile is a foreign key to the Profile table.
for user_instance in user_details:
    print(user_instance.profile.profile_picture)
- In the above example, Django will make an extra SQL query inside the loop to fetch each user's profile, because `profile` is a foreign key and Django fetches foreign key data lazily, at access time.
- If there are 1000 records, Django will do 1000 extra queries to fetch the foreign key values.
user_details = models.User.objects.filter().select_related('profile')

# Here, let's suppose profile is a foreign key to the Profile table.
for user_instance in user_details:
    print(user_instance.profile.profile_picture)
- In the above example, Django will not hit the database again to fetch `profile_picture`, because `select_related('profile')` fetches the related rows in the same query using a SQL JOIN.
- Note that `prefetch_related` has historically been ignored when combined with `iterator()` (newer Django versions support the combination when an explicit `chunk_size` is passed), so verify the queries actually issued.
- Choose the combination of `iterator()` and related-object fetching that results in the fewest queries for your case.
10. Use prefetch_related
from django.db import models

class Student(models.Model):
    pass

class College(models.Model):
    student = models.ForeignKey(Student, on_delete=models.CASCADE)

for student_instance in Student.objects.all():
    print(student_instance.college_set.all())
- In the above example, Django will hit the database with one extra query per student to fetch `college_set` (a reverse foreign key lookup).
class Student(models.Model):
    pass

class College(models.Model):
    student = models.ForeignKey(Student, on_delete=models.CASCADE)

for student in Student.objects.prefetch_related('college_set').all():
    print(student.college_set.all())
- In the above example, Django will not do any extra query inside the loop to fetch the reverse lookup, because it was prefetched in bulk up front.
11. Use values and values list
models.User.objects.all()
- The above example will hit the database to fetch all the columns of the table.
models.User.objects.all().values('first_name', 'last_name')
- The above query will hit the database to fetch just the `first_name` and `last_name` columns. The only difference from a normal queryset is that `values` returns dicts.
- If a database table has many columns, `values` can fetch only the required column data. This avoids the unnecessary overhead of fetching the extra columns and mapping them onto the queryset, and as a result boosts performance.
- `values_list` does a similar job to `values`, with the difference that it returns tuples.
12. Use only and defer
- Just like `values` and `values_list`, `only` and `defer` are other ways to optimize Django performance by querying only the required columns.
models.User.objects.all().defer('first_name')
- The above ORM call will fetch all the column data except `first_name`. Whenever we access a deferred column on an instance, that column will be fetched from the database for that specific instance.
models.User.objects.all().only('first_name')
- `only` is the inverse of `defer`: it fetches `first_name` only, and whenever we access any other column, a database query is initiated to fetch that data.
13. Use update_fields in save
user_instance = models.User.objects.filter(id=10).last()
user_instance.first_name = "Hitul"
user_instance.save()
- The above example makes Django issue an update query that rewrites all the columns in the database:
UPDATE core_user SET core_user.first_name='hitul', core_user.last_name='mistry', password='sfu7Hdsfsdf76', username='hitul', email='hitul@insurnest.com' where id = 10;
- The above query will be initiated by Django on the `save` method. Notice that even though we changed no column other than `first_name`, all the other columns also got rewritten.
- This can lead to data corruption through lost updates: one thread rewrites all the columns while another thread is updating a different field at the same time.
- It is also bad for performance.
user_instance = models.User.objects.filter(id=10).last()
user_instance.first_name = "Hitul"
user_instance.save(update_fields=["first_name"])
- In the above example, we have passed the `update_fields` argument to update only specific columns.
UPDATE core_user SET core_user.first_name='hitul' where id = 10;
- The above query will be initiated when `update_fields` is passed. The `update_fields` argument makes sure that only the listed columns are updated in the database.
14. Use Django Cache Framework for effective caching
- Django has a built-in cache framework which can cache key-value pairs and supports Memcached, file-based and database backends out of the box; a usage sketch follows this list.
- A cache does not do any processing; it just fetches the value from its location and returns it. Used rightly, caches can significantly boost performance.
- Consider Redis for caching, as it is blazingly fast; it stores all its data in RAM.
- Redis supports lots of data structures which can be helpful for storing various kinds of data. Make sure to check the time complexity of each operation before choosing a data structure.
- Always try to choose a data structure whose operations are not proportional to the data size. As an example, fetching a value by key is always `O(1)` in Redis, so its performance stays consistent at any volume.
- If achieving `O(1)` is not possible, go with the data structures with the next-lowest complexities. In any case, we should have a fair idea of the worst-case complexities.
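A minimal sketch of the cache framework API (the key name, value and timeout are illustrative):

from django.core.cache import cache

# Store a value for 5 minutes.
cache.set("top_menu_items", ["home", "pricing"], timeout=300)

# Returns None on a cache miss.
items = cache.get("top_menu_items")

# Compute-and-store in one call on a miss.
items = cache.get_or_set("top_menu_items", lambda: ["home", "pricing"], 300)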
15. DB Router
- If using multiple databases, such as a primary and replicas or sharded databases, use proper data routing in Django.
- Django has built-in support for routing queries between databases; a sketch follows below.
- All the read queries can be routed to the replica cluster. If the database does not have synchronous replication enabled, it is possible that a row just written to the primary is not yet available on the replica.
- This delayed-replication case should be taken into consideration, and routing should be done accordingly.
- If some database is unhealthy or stuck, it should be monitored and excluded from querying. Redis can be used to store per-database health stats, and routing can be decided dynamically based on them.
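A minimal primary/replica router, following the pattern from Django's multi-database documentation (the alias names `default` and `replica` and the module path are assumptions):

class PrimaryReplicaRouter:
    def db_for_read(self, model, **hints):
        # Send reads to the replica.
        return "replica"

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases point at the same data set.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only migrate the primary; the replica follows via replication.
        return db == "default"

# settings.py
DATABASE_ROUTERS = ["myproject.routers.PrimaryReplicaRouter"]  # path is a placeholder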
16. Use InMemory database for session management
- Django's session framework supports all the backends supported by the Django cache framework.
- Sessions can be stored in Aerospike, Redis or Memcached to fetch results faster; see the sketch after this list.
- Session validation happens on every logged-in user's request, so it should be retrievable as fast as possible.
- In-memory databases are optimized to return data fast.
- In-memory databases can lose data on failure; however, this can generally be handled with database clustering and continuous backups, so in the worst-case scenario we may lose a couple of seconds of data.
- Even if we lose session data, it causes no major harm to the business: in the worst case, a couple of users will have to re-login.
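A sketch of cache-backed sessions using Redis (the Redis URL is a placeholder; the built-in `RedisCache` backend requires a recent Django, older projects typically use the django-redis package instead):

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",  # placeholder
    }
}

# Store sessions in the cache configured above.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"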
17. Create indexes
- If some column is going to be queried repeatedly, create a database index on that column.
- Django migrations have built-in support for creating indexes in the database.
- If multiple columns are going to be queried together, composite indexes can be created (see the sketch after this list).
- Different databases support different kinds of indexes with different characteristics. Research them and apply the most suitable index to your tables.
- Tools such as pgbadger can be used to analyse the query patterns in production.
- PostgreSQL supports several index types which can greatly improve performance under various conditions and workloads. Research the equivalents for your database and apply the best one for the application.
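A sketch of declaring indexes on a model (the model and field names are illustrative; migrations will create the indexes):

from django.db import models

class User(models.Model):
    # Single-column index.
    email = models.CharField(max_length=255, db_index=True)
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)

    class Meta:
        indexes = [
            # Composite index for queries filtering on both columns.
            models.Index(fields=["last_name", "first_name"]),
        ]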
18. Use Streaming response
- If we offer a CSV download option for data, it is quite possible that there are thousands of rows or more.
- The typical way is to fetch the data, create the CSV file in a temp directory, and later serve that file with `HttpResponse`.
- In this approach, if the CSV file is big, it may take several seconds just to generate it, and the disk can fill up with CSV files.
- Once the CSV file is generated, serving it also takes time.
- Django has support for streaming CSV responses.
- In the streaming approach, we generate the CSV in chunks and send each chunk as part of the response.
- With this approach we don't need to create any temp files or wait for the full CSV to be generated; each chunk is sent to the client as soon as it is produced. A sketch follows.
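A minimal streaming CSV view, following the pattern from Django's documentation (the row generator is a placeholder for real data):

import csv
from django.http import StreamingHttpResponse

class Echo:
    # A file-like object that just returns what is written to it.
    def write(self, value):
        return value

def export_csv(request):
    # Placeholder data source; in practice, stream rows from a queryset.
    rows = ([str(i), f"name-{i}"] for i in range(100000))
    writer = csv.writer(Echo())
    response = StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        content_type="text/csv",
    )
    response["Content-Disposition"] = 'attachment; filename="export.csv"'
    return response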
19. Include gzip compression
- On every web request, we receive an HTTP request and respond to it with a raw text response.
- If the response is even a little big, it should be gzipped to compress its size, which means less bandwidth and lower response time for the client.
- Django has built-in support for gzipping the response, as shown below.
- Compress responses greater than a couple of kb.
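Django's built-in `GZipMiddleware` compresses eligible responses; place it near the top of the middleware list so it wraps the others:

# settings.py
MIDDLEWARE = [
    # Compresses responses for clients that send Accept-Encoding: gzip.
    "django.middleware.gzip.GZipMiddleware",
    # ... the rest of the middleware ...
]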
20. Lazy variables
- Use `lazy` for variables that can take time to evaluate; a small example follows.
- The `lazy` function in Django only evaluates the value at the moment it is actually used.
- Follow Django's documentation on lazy evaluation to better understand the concept.
21. No Print statements
- Don't put `print` statements in the codebase. They give no sense of what is going on, where the message originated, the time, etc. in the logs.
- Use Django's `loggers`, which give lots of insight for each log line. A minimal example follows.
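A minimal module-level logger (the function and messages are placeholders; this works with the standard library and Django's `LOGGING` config):

import logging

# Name the logger after the module so log lines show their origin.
logger = logging.getLogger(__name__)

def charge_user(user_id):
    logger.info("Charging user %s", user_id)
    try:
        ...  # payment logic placeholder
    except Exception:
        # Records the full traceback along with the message.
        logger.exception("Charge failed for user %s", user_id)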
22. Serve static and media from the CDN
- A CDN (Content Delivery Network) is a service which copies files across different parts of the globe. Whenever a file is accessed, it is served from the CDN location nearest to the client.
- File serving needs a good amount of bandwidth from servers, and CDN networks have petabyte-scale bandwidth available.
- Multiple CDN services are available, for example Amazon CloudFront, Google Cloud CDN, Cloudflare CDN, Azure CDN, etc.
23. Profile the code
- A code profiler is a tool which records each function call along with the time it took to execute. Modern profilers also provide insightful visualizations.
- Profiling a Django app gives good insight into each function's call time and the execution tree.
- Profiling helps to find out which library or snippet of code is the roadblock, so the developer can fix it.
- Profiling should be scheduled at regular intervals; during development, django-silk can be used to profile and visualize every request.
- Python also has built-in profiling support, as sketched below.
- Python call graphs can be generated with tools like tuna and pycallgraph.
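A small sketch with the standard library's `cProfile`/`pstats` (the profiled function is a placeholder):

import cProfile
import pstats

def slow_path():
    # Placeholder for the code under investigation.
    sum(i * i for i in range(10**6))

profiler = cProfile.Profile()
profiler.enable()
slow_path()
profiler.disable()

# Print the 20 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)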
24. Django silk
- Use Django Silk in development.
- Django Silk stores the response time, route, call profile and SQL query details for each request.
- It can be helpful to find issues at the development stage itself.
- Django Silk should be enabled only when needed, because it stores request data in the database, which over time creates a lot of data bloat.
- Do not use Django Silk in the production environment.
25. Django-query-profiler
- django-query-profiler can be used if you want to inspect SQL query performance.
26. django-model-utils
- django-model-utils and similar libraries ship commonly used utilities. They can save time and make you more productive.
27. Use background workers
- If some request is going to take more than a couple of seconds, it is better to offload the work to background workers such as Celery; see the sketch after this list.
- Celery is a background worker framework and supports multiple message brokers for communication.
- Celery integrates with Django directly; its documentation describes the recommended project setup.
- The Celery community also provides beat scheduling to run cron jobs or periodic background jobs.
- Celery supports retries with various conditional parameters, alerting, and chains of task executions.
- A single Celery cluster can have multiple machines attached, so as load grows, more computing power can be added.
- Celery also has rich dashboard support (e.g. Flower) to visualize the workers.
- Celery should be used with a message broker that provides good persistence and resilience.
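A minimal Celery task sketch (module paths and the task body are placeholders):

# tasks.py
from celery import shared_task

@shared_task
def send_welcome_email(user_id):
    # Placeholder: fetch the user and send the email here.
    ...

# views.py
def signup(request):
    user = ...  # user created/validated above (placeholder)
    # Enqueue the task and return immediately;
    # a worker process executes it in the background.
    send_welcome_email.delay(user.id)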
28. Never serve HTTPS traffic from Django; use Nginx or a load balancer to do it
- To serve SSL traffic, put nginx in front of Django or use a cloud load balancer.
- Django application servers are not well optimized for SSL encryption and decryption; nginx and cloud load balancers have built-in, well-optimized support for it.
29. Use Django Debug Toolbar while development
- django-debug-toolbar is a good library to see the SQL queries, configuration, cache hits, settings, CPU usage, templates used, etc. for each request.
- It also generates a nice Gantt chart to visualize SQL query execution.
30. Use pre-commit hooks
- During local development, code quality and automated checks should run before every git commit.
- Pre-commit hooks check the code quality at the time of `git commit`. If a hook finds any problem, it does not let the developer commit and forces them to fix the issues first.
- Eventually, pre-commit hooks turn these checks into habits for the developers.
- pre-commit is a great library to install and manage pre-commit hooks.
- It also ships built-in hooks for Python development.
- Here is our sample pre-commit config to get started. It has auto-formatting and code quality checks in place.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: check-ast
      - id: check-json
      - id: check-merge-conflict
      - id: check-symlinks
      - id: debug-statements
      - id: mixed-line-ending
      - id: requirements-txt-fixer
      - id: check-added-large-files
      - id: detect-private-key
      - id: flake8
        args: [--max-line-length=170]
  - repo: https://github.com/asottile/pyupgrade
    rev: v2.7.3
    hooks:
      - id: pyupgrade
  - repo: https://github.com/psf/black
    rev: 19.3b0
    hooks:
      - id: black
- Custom pre-commit hooks can also be developed as needed to automate code reviews.
31. Django specific pre-commit hooks
- Django-specific pre-commit hooks are also available.
32. Do squashmigrations at regular intervals.
- squashmigrations is Django's built-in utility to merge multiple migrations into one.
- It should be run at regular intervals to fix migration bloat; a sample invocation follows.
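A sample invocation (the app label and migration name are placeholders); Django generates a single squashed migration replacing everything up to the named migration:

python manage.py squashmigrations myapp 0012_some_migration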
33. Use Blue/Green deployments
- Blue/Green deployment is an approach to achieve zero-downtime deployments without any spike of 5xx errors on the client side.
- In Blue/Green deployments there are two identical environments, Blue and Green.
- At any time, only one environment serves the traffic.
- At deployment time, the new environment is set up (if Blue is currently serving traffic, then Green, and vice versa).
- Once the new environment passes all the tests, it replaces the older environment.
- The older environment stays active for a couple of minutes, or longer, and is then turned down.
- If the new environment has any issues, rollback is simply swapping the older environment back in.
- Blue/Green deployments are more resilient and safe.
- Kubernetes and managed Docker services such as AWS ECS have built-in support for Blue/Green deployment.
- Alternatively, custom DevOps scripts can be written with deployment tools such as Ansible, Fabric, Chef, etc.
34. Use a connection pooler at large scale
- Django does not support connection pooling out of the box, so distributed deployments can generate a huge number of db connection requests.
- PostgreSQL has centralized connection poolers such as pgbouncer and pgbouncer-rr (a patched version from Amazon with more functionality).
- pgbouncer and pgbouncer-rr are bandwidth- and memory-intensive applications, so host them with ample bandwidth and memory.
- Some databases also come with their own connection pooling; leverage it if centralized connection pooling is not available.
35. Use background workers instead of signals
- If signal logic does lots of db modifications, it is better to move it to background workers such as Celery.
36. Use databases which support the Django ORM out of the box
- If developing scalable applications, use SQL databases that the Django ORM supports out of the box.
- Django has end-to-end support for SQL databases.
- Say we introduce a new database such as MongoDB for user management; we then cannot use Django's built-in functions for it.
- That eventually leads to extra work to replicate all of Django's functionality on the new database.
- We can still use other databases as supporting databases for application-specific data that does not affect Django's core functions.
37. Never use runserver in production
- Django's default `python manage.py runserver` is single-threaded and is purely for development purposes.
38. Maintain requirements.txt
- Regularly maintain `requirements.txt`.
- `pre-commit` hooks can be used to update `requirements.txt` before each commit.
39. Create local_settings.py
- Just like Node.js has a `.env` file, Django projects commonly use `local_settings.py`.
- `local_settings.py` can be added to the `.gitignore` file so git ignores changes to it.
- Never commit `local_settings.py`.
- `local_settings.py` should contain all the settings variables which are environment-specific (development, prod and qa); a common import pattern is shown below.
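A common pattern (the exact layout is an assumption about your project) is to import the local overrides at the very end of `settings.py`:

# settings.py (at the very bottom)
try:
    # Environment-specific overrides; this file is never committed.
    from local_settings import *  # noqa: F401,F403
except ImportError:
    # No local overrides on this machine.
    pass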
40. Use the latest version of Django and spend 10-20% of each month improving, optimizing or upgrading the code
- Django comes up with better features and improvements in every new version.
- On each new stable release, spend time migrating to it; the more we delay migration, the further behind we get, and eventually it becomes too tedious to migrate.
- To migrate, read Django's release notes starting from the version currently in use up to the target version. Check all the items (db, code, etc.) that need to be modified and prepare a plan for the changes.
- We never get dedicated time for such activities, so set aside 10-20% of time every month for them.
41. Use managed databases
- Always use managed databases, where we don't need to worry about DevOps.
- AWS, Azure, Google Cloud and DigitalOcean provide managed databases. Use their services to host the database.
- Host your own database only when business, compliance or major cost implications demand it, because managing a database yourself also has a time cost.
- Database DevOps has a substantial learning curve.
- If we do have to manage databases ourselves, use tools such as patroni, or build similar tooling, for the DevOps work.
42. Use Grafana, Prometheus etc for monitoring
- Once a Django app is deployed, the server can have multiple issues such as high CPU, running out of memory, high disk IOPS, high bandwidth consumption, high response times, etc.
- Monitoring should be in place to detect these kinds of issues before they create chaos.
- For critical issues, create phone-call-based alerting; for issues that may grow into bigger ones, create alerts on Telegram, Slack, SMS, etc.
- Long-term warning signs, such as a disk 60% full, CPU staying at 60% for a long time, memory leaks, or CPU spikes at random times without real traffic, can mean something is cooking in the infrastructure.
- Alerting can be helpful to find and validate such issues ahead of time.
- Grafana, InfluxDB, Telegraf, Prometheus, etc. are open-source solutions for building a monitoring system.
- All the modern cloud providers such as AWS, Google Cloud and Azure have built-in monitoring and alerting systems which don't require any DevOps; their services can be leveraged.
43. Use Virtual Environments
- Virtual environments create a separate environment for each Django project.
- There are multiple open-source solutions such as pyenv, virtualenvwrapper, pipenv, virtualenv, venv, etc. for maintaining virtual environments.
- Use the one which suits your needs best. Some of them do not support Windows; take such points into consideration before choosing one.
- It is recommended to use a single virtual environment tool from development through production deployments, and across the team.
44. Use one type of view in the whole project: either class-based or function-based views
- Django supports both function-based and class-based views.
- A project should stick to a single kind of view: either class-based views everywhere or function-based views everywhere.
- Class-based views are better at organizing code, so they are the recommended choice.
45. Use PEP8
- Python has the PEP 8 coding standard for writing code in a structured way. Such standards should be enforced in the codebase.
- During development, tools like pylint, black and prettier should be used. These tools have IDE plugins for auto-formatting and validation.
- All the developers working on the project should use the same library for code formatting and standards.
- It can be enforced with the help of `pre-commit` hooks or CI/CD health checks.
46. Do developer load testing at intervals
- A scalable application should perform well under load and deliver consistent performance.
- Developers should benchmark the code at regular intervals, run load tests, and compare the results with the intended performance targets.
- Apache Benchmark is a simple tool from the Apache Software Foundation to benchmark the application under various load scenarios; a sample run follows.
- The target benchmark numbers should be documented at the project level, and each load test result should be compared against them.
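Apache Benchmark is run from the command line; a sample run (the URL and numbers are placeholders) issuing 1000 requests with 50 concurrent clients:

ab -n 1000 -c 50 https://staging.example.com/api/users/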
47. Use SonarQube or such static code analyser
- A static analysis tool with a GUI, such as SonarQube, which attaches comments to the code, should be used.
- It makes code review more productive, as the best-practice checks are already done by the tool.
48. Be aware of database-related failure modes before migrating
- Each database has its own concurrency control.
- The behaviour of various SQL statements is documented per database; some statements may lock the whole table or create deadlocks.
- Learn about the database's concurrency control before applying migrations.
- During Blue/Green deployments, the older environment may still be writing to a column that used to be nullable while the new migration makes that field not-null.
- In that case, the older environment can start raising exceptions, because the older, still-running codebase assumes the field is still nullable.
- Review all the migrations and think through everything which may go wrong at migration time.
- Dry run the migrations on local and development environment servers, including the failure scenarios, before running them.
- If some migration gets stuck for long, debug and fix it before running it on production.
- Check how Django wraps migrations in transactions for your database, and assess what happens if the network between the migrating server and the db is lost, the server runs out of memory, the server freezes, etc.
- These scenarios can also happen in the production environment, so be prepared for such chaos.
49. Never delete a column from a Django model before knowing the database's behaviour
- It is always better to mark the column nullable first, and delete it only after a couple of days or months.
- Some databases might lock the whole table against writes until the operation completes.
- That situation can lead to chaos and unpredictable completion times for migrations.
- Such operations should be performed when there is no traffic, or only limited traffic, on the app.
- Also, if the field is still used by existing deployments, dropping it may raise errors in those deployments and put us in a situation where rollback is not possible.
50. Use Ansible, Fabric, Chef etc tools for the deployments.
- These tools can automate 100% of the deployment process.
- They are less error-prone compared to developer-assisted manual deployments.
- Write and handle all the failure and rollback cases.
51. Use Sentry or New Relic
- Sentry is an error-reporting tool.
- In a production environment, it gives great insight into errors.
- It can also alert the developer whenever an error occurs in production.
- It is really easy to install and get started with.
- New Relic is also really easy to set up.
- It is an application performance monitoring tool.
- It has multiple products which give lots of insight into production application performance.
- It can be helpful for debugging production issues.
52. Use Django Extensions
- Django Extensions includes multiple useful and commonly used utilities for Django.
- It includes management commands, database fields, admin extensions and much more.
- It will make you more productive.
53. Use Django Storages
- Whenever using S3, Azure Storage, Dropbox, FTP, Google Cloud Storage, SFTP, DigitalOcean Spaces, etc. for file storage, django-storages has great support for these services.
- It can boost development productivity.
54. We can use webserver caching or Varnish cache
- If a Django view returns fixed content, such as blog articles, then nginx caching, Varnish Cache or browser caching can be used.
- nginx and Varnish Cache are well optimized to serve cached content.
- Using these tools also avoids the overhead of calling the application server at all.
55. Use browser caching for REST APIs and fixed html content
- If a route returns fixed content for a given URL, browser-based caching can be used.
- The server returns an ETag, which is a version number for the content; the browser hits the backend again only when the ETag changes.
- This kind of caching avoids backend overhead.
- A caching time should be specified to make sure cache invalidation takes place.
- Django has multiple utilities to enable browser caching, as sketched below.
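A sketch using Django's built-in decorators (the max-age value and the ETag function are placeholders):

from django.http import HttpResponse
from django.views.decorators.cache import cache_control
from django.views.decorators.http import condition

def article_etag(request, article_id):
    # Placeholder: derive a version string from the row, e.g. updated_at.
    return f'"article-{article_id}-v1"'

@cache_control(max_age=300)  # let browsers reuse the response for 5 minutes
@condition(etag_func=article_etag)
def article_view(request, article_id):
    return HttpResponse("article body")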
56. Use S3 or Storage to store the media files and later use CDNs to serve them.
- Use services like AWS S3, Google Cloud Storage or Azure file storage to store the media files.
- These services copy the data across multiple data centers, have almost unlimited bandwidth to serve the files, and offer high uptime.
- If this decision is taken at the beginning of the project, then whenever the Django app moves to a distributed architecture, the storage side requires almost no extra effort.
57. Use security header scanners for scanning
- Regularly scan the website for security issues and add the required HTTP security headers.
- https://securityheaders.com/ can be useful.
58. Go into django’s internal codebase and understand the logics.
- Django's internal code structure is really simple and readable.
- If some concept is not clear from the documentation, dig into Django's internal codebase to understand it.
- It takes some effort to understand the code at the start, but eventually it becomes a habit.
- After learning the Django internals, you may even become a good contributor to the Django community.
- Take help from the Django community and Stack Overflow to understand the internals.
59. Use AsyncIO with care. If using it then make sure our packages are also supporting it.
- Django does not yet support AsyncIO fully out of the box, so it is better to read the Django documentation before using it everywhere.
- Django also warns that AsyncIO may decrease the performance of the app; benchmark after migrating to AsyncIO, and avoid using AsyncIO in views unless it clearly helps.
60. Tune Gunicorn and uWSGI
- If using Gunicorn or uWSGI, read their documentation.
- Worker processes are the processes spawned to handle requests. Ideally, the number of workers should not exceed the number of CPU cores on the server.
- Auto-reload on code change should be disabled in production.
- Logs should be stored in files.
- Gunicorn has support for syncing statistics with statsd. It can be used to ship the metrics, which can then be monitored and alerted on.
- A custom process name should be set so the process can be identified in the process manager.
- A timeout should be defined, and it should be in sync with the load balancer or API gateway timeouts.
- If nginx sits in front of Gunicorn or uWSGI, a file-based (unix) socket should be used for communication; file-based sockets can be up to 5x faster than HTTP sockets.
- Gunicorn and uWSGI support multiple worker classes such as `eventlet`, `gevent`, `gthread`, etc. Each one has its own pros and cons; to get the best performance, use the one best suited to the application.
- Each worker can run a number of threads.
- The thread count should be tuned to the best value: start from the number of cores per worker and try different combinations to get the best performance.
- The number of threads can be increased or decreased based on the type of operations the application does: file operations, network operations, db operations, etc.
- While doing file IO, a Python thread blocks, though the exact behaviour can differ per worker class.
- A sample Gunicorn invocation is shown below.
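A sample Gunicorn invocation reflecting the points above (the module path, socket path, log paths and numbers are placeholders; --name requires the setproctitle package):

gunicorn myproject.wsgi:application \
    --workers 4 \
    --threads 2 \
    --bind unix:/run/gunicorn.sock \
    --timeout 30 \
    --name myproject \
    --access-logfile /var/log/gunicorn/access.log \
    --error-logfile /var/log/gunicorn/error.log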
61. Setup open file limits
- Linux servers have a limit on the number of files which can be opened per process.
- Each HTTP connection to the server opens at least 2 file descriptors, and more depending on the configuration and setup of the server.
- By default, Linux systems limit a process to 1024 open files at a time.
- It should be increased to the number which best suits the workload.
62. Tune Nginx
- Nginx has multiple parameters, such as workers, queue sizes, timeouts, SSL, caching, keep-alive connections, proxy timeouts and proxy cache, which should be tuned to get the best performance.
63. Define exact metrics to be monitored
- Once the Django application is deployed, exact metrics such as CPU, disk IOPS, RAM usage, bandwidth, requests per second, failure responses per second and average response time should be defined, and whenever a threshold is crossed, the developer or DevOps team should be alerted.
64. Use distributed logging
- Whenever a Django application is deployed in a distributed fashion, logs should be stored in a centralized place.
- There are tools such as Logstash, Graylog and CloudWatch to store the logs in a centralized place where they can be queried.
65. Understand Django request serving architecture.
- Django requests are served via the WSGI protocol.
- Understand its internals and how requests flow through it, so issues can be gauged ahead of time.
66. If spinning up a new process from the Django main thread, use Celery
- Avoid spinning up new processes or threads with Python's `multithreading` or `multiprocessing` inside Django views or utils.
- Instead, use a Celery background worker for it.
67. Use better tailored loggers for your need on production
- Django has the settings variable `LOGGING`. Tailor it as per the production needs.
- Python has lots of logging parameters which can be included in the log output.
- Sample logger config:
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'require_debug_false': {
            '()': 'django.utils.log.RequireDebugFalse'
        },
        'require_debug_true': {
            '()': 'django.utils.log.RequireDebugTrue'
        }
    },
    'formatters': {
        'main_formatter': {
            'format': '%(levelname)s:%(name)s: %(message)s '
                      '(%(asctime)s; %(filename)s:%(lineno)d)',
            'datefmt': "%Y-%m-%d %H:%M:%S",
        },
    },
    'handlers': {
        'mail_admins': {
            'level': 'ERROR',
            'filters': ['require_debug_false'],
            'class': 'django.utils.log.AdminEmailHandler'
        },
        'console': {
            'level': 'DEBUG',
            'filters': ['require_debug_true'],
            'class': 'logging.StreamHandler',
            'formatter': 'main_formatter',
        }
    },
    'loggers': {
        'django.request': {
            'handlers': ['console'],
            'level': 'ERROR',
            'propagate': True,
        },
        'django': {
            'handlers': ['console'],
        },
        'py.warnings': {
            'handlers': ['console'],
        },
        '': {
            'handlers': ['console'],
            'level': "DEBUG",
        },
    }
}
68. Assign name to each url and re-use them everywhere
# urls.py
url(r'abc/', views.UserView.as_view(), name="user-details"),

# views.py
from django.urls import reverse
from django.http import HttpResponse

def view(request):
    route_url = reverse("user-details")
    return HttpResponse(route_url)

# template
<a href="{% url 'user-details' %}">User details</a>
- As described above, the url has a name.
- Later, it can be reused in templates and views.
- If the route is modified in `urls.py` in the future, there is no need to modify the code that references it.
69. Use .exists instead of calling count if you only want to check existence

users_count = models.User.objects.filter(age__gte=22).count()
if users_count:
    print("User exists")

- In the above example, Django does a full count query on the database.

users_exists = models.User.objects.filter(age__gte=22).exists()
if users_exists:
    print("User exists")

- In the above example, Django issues a much cheaper query (roughly `SELECT (1) ... LIMIT 1`) instead of counting all the matching rows.
70. Avoid using lambda for complex logics
- When an error occurs inside a lambda function, the stack trace only gives the line number of the lambda.
- We don't get a function name or other details to debug the issue in production or development.
- It is always better to use lambdas for simple logic only.
- A simple-logic example is a plain if/else expression.
- If a lambda is used inside a loop and an error occurs, it is tedious to identify on which exact element of the array the error happened.
71. Read Django Development Philosophies
- The Django framework has documented philosophies, or guiding principles, for its development.
- They are a must-read to understand the logic and reasoning behind Django's design and architecture.
- They can make a developer better and their decision-making more precise.
72. Never run Django with Debug = True in production
- If debug mode is `True`, Django emits debugging logs and runs utilities to capture debug insights.
- Debug mode degrades the performance of the application.
73. Use Docker for the development and deployments.
- Ideally, the development and production machines should be identical.
- Docker creates identical environments for development and production.