Boosting SEO with Django Ninja, Pydantic, and JSON-LD

Search engines rely on structured data to understand your pages, which makes it a useful tool for improving your website's search visibility. JSON-LD, which stands for JSON for Linking Data, is a format that allows you to embed machine-readable information into your web pages.

How JSON-LD is used for SEO

JSON-LD is used for search engine optimization (SEO) because it provides search engines with explicit information about the content on your web pages. Instead of relying on algorithms to infer meaning from raw text and HTML, you can use JSON-LD to provide search engines with clear information about your content's meaning, creator, offerings, and relationships with other online resources.

Schema.org provides the basis for this structured information by serving as a universal dictionary of types and their associated properties. By using Schema.org types, you ensure that search engines like Google can understand the information you provide.

Why structured data is important for SEO

Structured data makes your pages eligible for rich snippets, such as star ratings, event times, and recipe instructions, in search results. These elements capture users' attention and invite them to click on your site, improving your click-through rate.
It also informs Google and other search engines about the meaning of your page content. The search engine is better able to parse whether your link is to a product with price and reviews, or a blog post by an author.
Structured data can unlock special search features and improve your odds of appearing in knowledge panels, carousels, FAQ boxes, and voice search results.
All these features provide a better user experience: when people can preview key information directly in search results, they are more likely to click.
Your site stays ahead of the curve as search engines get more semantic and context-aware. Structured data helps future-proof your site by aligning it more closely with a format that is easier for search engines to parse.

Adding JSON-LD metadata to your site

JSON-LD metadata is typically added to the <head> section of your HTML document inside a script tag like this:

<script type="application/ld+json">
    <!-- Your structured data goes here -->
</script>

Having the JSON-LD data in the <head> section makes your HTML cleaner and easier to read for both humans and crawlers.

JSON-LD directly in the Django template

At Revsys, our first attempt at adding JSON-LD to our sites relied on embedding the data in the Django template. For the most part, this has worked fine and we've had good results from an SEO perspective. But in terms of maintainability, it has not been the most efficient approach. We have now started transitioning to generating the structured data using Django Ninja and Pydantic. As a result, we now have cleaner templates and better maintainability.

The code below illustrates how we used to embed our JSON-LD for our blog post page, with the data directly in the template:

{% block extra_head %}
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://www.revsys.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Blog",
      "item": "https://www.revsys.com/blog/"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "{{ self.title|escapejs }}",
      "item": "https://www.revsys.com{{ request.path }}"
    }
  ]
}
</script>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "{{ self.title|escapejs }}",
  "description": "{{ self.get_description|escapejs }}",
  "datePublished": "{{ self.first_published_at|date:'c' }}",
  "dateModified": "{{ self.latest_revision_created_at|date:'c' }}",

  "author": {
    "@type": "Person",
    "name": "{{ self.get_author_name|escapejs }}"
    {% if self.author.specific.url %},"url": "{{ self.author.specific.url }}"{% elif self.author.specific.slug %},"url": "https://www.revsys.com{% routablepageurl self.get_parent.specific 'posts_by_author' self.author.specific.slug %}"{% endif %}

  },
  "publisher": {
    "@type": "Organization",
    "name": "REVSYS",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.revsys.com{% static 'images/revsys_logo_white.png' %}"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.revsys.com{{ request.path }}"
  }
}
</script>
{% endblock %}

This works, but we don't love this approach because:

Mixing JSON-LD logic with presentation logic clutters templates, making them harder to read and maintain
There's no way to validate that your JSON-LD is properly formatted or follows schema.org standards
It's easy to make syntax errors in the JSON that break the structured data
Testing the JSON-LD is harder because the output requires rendering the entire template
The same schema logic gets duplicated across different templates
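
To make the testing pain point concrete, here is a minimal sketch of the kind of check you end up writing when JSON-LD lives in the template: render the page, pull out every application/ld+json script block, and confirm each one parses as JSON. It uses only the Python standard library. The names JsonLdExtractor and extract_jsonld are hypothetical helpers for illustration, not part of Django or our codebase, and the HTML string stands in for a rendered template.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Start buffering only for JSON-LD script tags.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Parse each JSON-LD block; raises json.JSONDecodeError on invalid JSON."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]


# Stand-in for the output of rendering a template.
rendered = """
<head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BlogPosting", "headline": "Hello"}
</script>
</head>
"""
schemas = extract_jsonld(rendered)
```

A check like this only catches syntax errors such as a stray comma. It says nothing about whether the fields make sense for the Schema.org type, which is part of why we moved the logic out of templates.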

Generating JSON-LD with Pydantic and Django Ninja

We refactored our schema generation using Django Ninja and Pydantic. Now, instead of embedding logic in templates, we generate structured data in Python and pass it to templates as context variables.

This has given us several benefits:

Our code is more modular because we keep all schema generation logic in one file per app, making the codebase more organized and easier to navigate.
The Pydantic models we create are reusable, which is handy since many JSON-LD types use the same subtypes.
Pydantic's type checking and validation ensure that our structured data matches the models we defined from the Schema.org types, reducing the chance that we accidentally share invalid data with search engines.
Our SEO is future-proofed: with our centralized approach, expanding our schema to new content types is simpler and more manageable.

By moving schema generation out of templates and into Django Ninja and Pydantic, we have created a system that is both maintainable and developer-friendly.

We created a file, schema.py, that holds the Pydantic models representing the Schema.org types we need for the data we are turning into JSON-LD:

# schema.py
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field

class PersonSchema(BaseModel):
    type: str = Field(default="Person", alias="@type")
    name: str
    url: Optional[str] = None


class OrganizationSchema(BaseModel):
    type: str = Field(default="Organization", alias="@type")
    name: str
    logo: "ImageObjectSchema"


class ImageObjectSchema(BaseModel):
    type: str = Field(default="ImageObject", alias="@type")
    url: str


class WebPageSchema(BaseModel):
    type: str = Field(default="WebPage", alias="@type")
    id: str = Field(alias="@id")

    model_config = ConfigDict(populate_by_name=True)


class ListItemSchema(BaseModel):
    type: str = Field(default="ListItem", alias="@type")
    position: int
    name: str
    item: str


class BreadcrumbListSchema(BaseModel):
    context: str = Field(default="https://schema.org", alias="@context")
    type: str = Field(default="BreadcrumbList", alias="@type")
    itemListElement: List[ListItemSchema]


class BlogPostingSchema(BaseModel):
    context: Optional[str] = Field(default="https://schema.org", alias="@context")
    type: str = Field(default="BlogPosting", alias="@type")
    headline: str
    description: Optional[str] = None
    datePublished: str
    dateModified: Optional[str] = None
    author: PersonSchema
    publisher: Optional[OrganizationSchema] = None
    mainEntityOfPage: Optional[WebPageSchema] = None
    url: Optional[str] = None
    blogPost: Optional[BaseModel] = None
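
To illustrate what this validation buys us, here is a small sketch assuming Pydantic v2. It repeats the ListItemSchema model so the snippet is self-contained; validation_errors is a hypothetical helper added only for this example. Note that Pydantic checks the data against the fields we declared, not against the Schema.org vocabulary itself.

```python
from pydantic import BaseModel, Field, ValidationError


class ListItemSchema(BaseModel):
    type: str = Field(default="ListItem", alias="@type")
    position: int
    name: str
    item: str


def validation_errors(**data) -> list:
    """Return (field, error type) pairs, or an empty list if the data is valid."""
    try:
        ListItemSchema(**data)
    except ValidationError as exc:
        return [(err["loc"][0], err["type"]) for err in exc.errors()]
    return []


# Valid data: all required fields present with the right types.
ok = validation_errors(position=1, name="Home", item="https://www.example.com/")

# Invalid data: position is not an integer and item is missing entirely.
bad = validation_errors(position="first", name="Home")
```

Here ok is an empty list, while bad reports problems with both position and item. The mistake surfaces in Python, at the point where the schema is built, rather than silently shipping malformed structured data.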

Using Django Ninja's ModelSchema for automatic schema generation

One of the features of Django Ninja is ModelSchema, which automatically generates Pydantic schemas from your Django models. This is useful when you want to include model data in your JSON-LD without manually defining every field.

In our blog post implementation, we can use ModelSchema to automatically include blog page data alongside our structured schema:

# schema.py
from ninja import ModelSchema
import blog.models

class BlogPageSchema(ModelSchema):
    # Django Ninja 1.x style; older releases used `class Config` with `model_fields`.
    class Meta:
        model = blog.models.BlogPage
        fields = [
            "title",
            "subtitle",
            "first_published_at",
            "category",
            "main_url",
            "main_url_text",
            "featured",
            "slug",
        ]

Then we integrate this ModelSchema into our blog posting schema generation:

def get_post_schema(post) -> str:
    author = PersonSchema(
        name=post.get_author_name() or "REVSYS"
    )

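    # ... author URL logic, plus the publisher and main_entity objects,
    # built as in the full helper shown later in this post ...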

    schema = BlogPostingSchema(
        headline=post.title,
        description=post.get_description(),
        datePublished=post.first_published_at.isoformat(),
        author=author,
        publisher=publisher,
        mainEntityOfPage=main_entity,
        # Include the model data as additional structured information
        blogPost=BlogPageSchema.from_orm(post)
    )

    return schema.model_dump_json(by_alias=True, indent=2)

This approach gives you the flexibility of hand-written JSON-LD schemas and the convenience of automatically generated model schemas.

We then create helper functions to generate JSON-LD from our Pydantic models and update the schema.py:

# schema.py
from typing import Any, Optional


def get_breadcrumb_schema(name: str, path: str, post_title: Optional[str] = None) -> str:
    """Generate JSON-LD breadcrumb schema for navigation structure."""
    items = [
        ListItemSchema(
            position=1,
            name="Home",
            item="https://www.revsys.com/"
        ),
        ListItemSchema(
            position=2,
            name=name,
            item=f"https://www.revsys.com{path if not post_title else '/blog/'}"
        )
    ]

    if post_title:
        items.append(ListItemSchema(
            position=3,
            name=post_title,
            item=f"https://www.revsys.com{path}"
        ))

    schema = BreadcrumbListSchema(itemListElement=items)
    return schema.model_dump_json(by_alias=True, indent=2)


def get_post_schema(post: Any) -> str:
    """Generate JSON-LD schema for a blog post using Schema.org BlogPosting type."""
    author = PersonSchema(
        name=post.get_author_name() or "REVSYS"
    )

    if post.author and hasattr(post.author, 'specific'):
        author_specific = post.author.specific
        if hasattr(author_specific, 'url') and author_specific.url:
            author.url = author_specific.url
        elif hasattr(author_specific, 'slug') and author_specific.slug:
            parent_page = post.get_parent()
            if parent_page:
                author.url = f"https://www.revsys.com{parent_page.url}author/{author_specific.slug}/"

    publisher = OrganizationSchema(
        name="REVSYS",
        logo=ImageObjectSchema(
            url="https://www.revsys.com/static/images/2017/revsys_logo_white.png"
        )
    )

    main_entity = WebPageSchema(
        id=f"https://www.revsys.com{post.url}"
    )

    schema = BlogPostingSchema(
        headline=post.title,
        description=post.get_description(),
        datePublished=post.first_published_at.isoformat(),
        author=author,
        publisher=publisher,
        mainEntityOfPage=main_entity
    )

    if hasattr(post, 'latest_revision_created_at') and post.latest_revision_created_at:
        schema.dateModified = post.latest_revision_created_at.isoformat()

    return schema.model_dump_json(by_alias=True, indent=2)

The model_dump_json() method is a Pydantic feature that converts your schema objects into JSON strings. We pass two arguments: by_alias=True ensures that field aliases (like @context, @type, @id) are used instead of the Python field names, and indent=2 formats the JSON with indentation, making it more readable and easier to debug.

Here's what the output looks like for a blog post. The values are illustrative, and optional fields that were never set are left out here; by default they serialize as null unless you also pass exclude_none=True:

{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Building Better Django Apps with Pydantic",
  "description": "Learn how to integrate Pydantic with Django for better validation and cleaner code.",
  "datePublished": "2024-01-15T10:30:00",
  "author": {
    "@type": "Person",
    "name": "Jane Developer",
    "url": "https://www.revsys.com/blog/author/jane-developer/"
  },
  "publisher": {
    "@type": "Organization",
    "name": "REVSYS",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.revsys.com/static/images/2017/revsys_logo_white.png"
    }
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://www.revsys.com/blog/building-better-django-apps-pydantic/"
  }
}

Without by_alias=True, you would get Python field names like type instead of @type. JSON-LD consumers only recognize the @-prefixed keywords, so the output would no longer be valid structured data.
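
The difference is easy to see side by side. The sketch below assumes Pydantic v2 and repeats PersonSchema so it runs on its own. It uses model_dump() rather than model_dump_json() so the keys can be inspected as a plain dictionary, and it passes exclude_none=True (not used in our helpers above) purely to keep the output short.

```python
from typing import Optional

from pydantic import BaseModel, Field


class PersonSchema(BaseModel):
    type: str = Field(default="Person", alias="@type")
    name: str
    url: Optional[str] = None


person = PersonSchema(name="Jane Developer")

# Keys use the aliases, which is what JSON-LD needs.
with_alias = person.model_dump(by_alias=True, exclude_none=True)

# Keys use the Python attribute names instead.
without_alias = person.model_dump(exclude_none=True)
```

With the alias the dictionary has an "@type" key; without it the key is plain "type", which a JSON-LD consumer would treat as an ordinary, unknown property.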

The next step is to update our models to include schema generation in their context. At Revsys, we use Wagtail for our blog, so this example shows overriding the Page model's get_context method to add the JSON-LD schema elements we need for each blog post. If you are using regular Django models, you might create a get_schemas() method on your model, which you could then call from your Django view to pass the JSON-LD schemas into your context.

from blog.schema import get_breadcrumb_schema, get_post_schema
from wagtail.models import Page

class BlogPage(Page):
    def get_context(self, request):
        """Add JSON-LD schema data to the page context."""
        context = super().get_context(request)
        context["breadcrumb_schema"] = get_breadcrumb_schema("Blog", request.path, post_title=self.title)
        context["post_schema"] = get_post_schema(self)
        return context
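
For a project without Wagtail, the get_schemas() idea mentioned above could be sketched roughly as follows. Everything here is hypothetical: Article is a dataclass standing in for a Django model so the snippet runs without a project, SITE_URL and article_context are made-up names, and a plain dictionary with json.dumps stands in for the Pydantic models to keep the example self-contained.

```python
import json
from dataclasses import dataclass

SITE_URL = "https://www.example.com"  # hypothetical site root


@dataclass
class Article:
    """Stand-in for a regular Django model with a title and a slug."""

    title: str
    slug: str

    def get_absolute_url(self) -> str:
        return f"/articles/{self.slug}/"

    def get_schemas(self) -> dict:
        """Return template-ready JSON-LD strings keyed by context variable name."""
        article = {
            "@context": "https://schema.org",
            "@type": "Article",
            "headline": self.title,
            "mainEntityOfPage": {
                "@type": "WebPage",
                "@id": SITE_URL + self.get_absolute_url(),
            },
        }
        return {"post_schema": json.dumps(article, indent=2)}


def article_context(article: Article) -> dict:
    """Build the context a view would hand to its template."""
    context = {"article": article}
    context.update(article.get_schemas())
    return context


ctx = article_context(Article(title="Hello", slug="hello"))
```

In a real view you would pass this context to render(), and get_schemas() would call helpers like the ones in schema.py instead of building dictionaries by hand.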

In our template, we removed the raw JSON-LD code and replaced it with the context variables. Our updated template is now much cleaner.

{% block extra_head %}
{% if breadcrumb_schema %}
  <script type="application/ld+json">
    {{ breadcrumb_schema|safe }}
  </script>
{% endif %}

{% if post_schema %}
  <script type="application/ld+json">
    {{ post_schema|safe }}
  </script>
{% endif %}
{% endblock %}
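
One caveat with the safe filter: it turns off autoescaping, so a title or description containing a literal </script> could close the script tag early. A possible safeguard, sketched below with only the standard library, is to escape the characters <, >, and & as JSON unicode escapes before handing the string to the template. These are the same substitutions Django's json_script filter applies. The helper name escape_for_script is hypothetical, and you would call it on the result of model_dump_json().

```python
import json

# Map the characters that matter inside a <script> element to JSON escapes.
_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def escape_for_script(json_text: str) -> str:
    """Make a JSON string safe to embed inside a <script> tag.

    In valid JSON these characters can only occur inside string values,
    so replacing them with unicode escapes does not change the parsed data.
    """
    return json_text.translate(_SCRIPT_ESCAPES)


payload = json.dumps({"headline": "</script><script>alert(1)</script>"})
escaped = escape_for_script(payload)
```

The escaped string no longer contains </script>, yet json.loads(escaped) yields the same data as the original payload, so search engines read identical values.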

Reuse one Pydantic model for multiple schemas

We can reuse many of the same components (like PersonSchema and OrganizationSchema) in structured data for other pages, such as the pages for our conference talks and presentations. These are great candidates for structured data using the Event schema type, which can help Google show them in event carousels and highlights. Because we are following the types defined by Schema.org, we can create a new EventSchema Pydantic model that builds on our existing schemas.

# schema.py
class PlaceSchema(BaseModel):
    type: str = Field(default="Place", alias="@type")
    name: str
    address: Optional[str] = None

class EventSchema(BaseModel):
    context: str = Field(default="https://schema.org", alias="@context")
    type: str = Field(default="EducationEvent", alias="@type")
    name: str
    startDate: str
    location: Optional[PlaceSchema] = None
    performer: PersonSchema # from our first example
    organizer: OrganizationSchema # from our first example
    url: Optional[str] = None

Then, we add a new helper function for our Talk model:

# schema.py
def get_talk_schema(talk: Any) -> str:
    """Generate JSON-LD schema for a conference talk using Schema.org Event type."""
    speaker = PersonSchema(
        name=talk.speaker_name,
        url=getattr(talk, "speaker_url", None)
    )

    organizer = OrganizationSchema(
        name="REVSYS",
        logo=ImageObjectSchema(
            url="https://www.revsys.com/static/images/2017/revsys_logo_white.png"
        )
    )

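    location = None
    # Only build a Place when the talk actually has a venue name;
    # PlaceSchema.name is required, so a None value would fail validation.
    venue_name = getattr(talk, "venue_name", None)
    if venue_name:
        location = PlaceSchema(
            name=venue_name,
            address=getattr(talk, "venue_address", None)
        )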

    schema = EventSchema(
        name=talk.title,
        startDate=talk.date.isoformat(),
        location=location,
        performer=speaker,
        organizer=organizer,
        url=f"https://www.revsys.com{talk.url}"
    )

    return schema.model_dump_json(by_alias=True, indent=2)

And update our Wagtail TalkPage model so we can add this new schema to the page context:

# models.py
from wagtail.models import Page

from app.schema import get_talk_schema

class TalkPage(Page):
    def get_context(self, request):
        """Add event schema data to the page context for conference talks."""
        context = super().get_context(request)
        context["event_schema"] = get_talk_schema(self)
        return context

Finally, update the template for the talk pages to render the new schema:

{% block extra_head %}
  {% if event_schema %}
    <script type="application/ld+json">
      {{ event_schema|safe }}
    </script>
  {% endif %}
{% endblock %}