A walk in GraphQL

Day 1: Queries and Resolvers

Query

Before jumping to the code let’s break down the “Query” concept into meaningful details.

What does query mean in GraphQL?

Generally speaking, a “query” is not “a thing” but a process that involves several building blocks in order to complete the operation:

Definition of the documents

1. The GraphQL document

One or more GraphQL Documents containing executable or representative definitions of a GraphQL type system must be provided.

2. The representative definition (schema)

On the representative definition there must be the “Root Operation definition” related to the operation (will see this later) we want to perform, and the definition of the data type (whenever not included on the built-in Scalar Types) the operation is meant to return; in this case an Object Type.

## an Object Type definition
type Person {
  id: ID ## a Scalar type definition for the `id` field
  name: String! ## a Scalar type definition for the `name` field
}

## A Root Operation definition
type Query {
  ## a Field describing a valid operation
  getPerson: Person!
}

3. The executable definition (request)

On the executable definition there must be a valid Operation Definition specifying the OperationType (query, mutation, subscription) and a SelectionSet describing the Fields describing data graph we want to receive.

query { ## OperationType
  getPerson { ## parent SelectionSet Field
    name, ## child SelectionSet Field
    id ## parent SelectionSet Field
  }
}

A detailed description of the query operation definition is described at The Anatomy of a GraphQL Query - by Sashko Stubailo

Execution of a query operation

Now what? It gets way more interesting!

Once you send the request to the server with the query operation definition (usually using the POST verb as GraphQL doesn’t quite follow the HTTP protocol), your query will go through 3 phases during execution:

1. Parsing the incoming request

Since the incoming request is just a string and GraphQL can’t understand it as is, it has to parse it into an AST (Abstract Syntax Tree) in order to perform any necessary validation against the document before moving forward. (Read this interesting article Understanding the GraphQL AST - by Adam Hannigan)

Here’s an example of our query operation definition as an AST (a part of it as it’s long):

query operation definition AST sample

Try it yourself here: AST Explorer

2. Validation

Now is time to validate the produced AST:

GraphQL does not just verify if a request is syntactically correct, but also ensures that it is unambiguous and mistake‐free in the context of a given GraphQL schema.

An invalid request is still technically executable, and will always produce a stable result as defined by the algorithms in the Execution section, however that result may be ambiguous, surprising, or unexpected relative to a request containing validation errors, so execution should only occur for valid requests.

Typically validation is performed in the context of a request immediately before execution, however a GraphQL service may execute a request without explicitly validating it if that exact same request is known to have been validated before.

Source: GraphQL spec (June 2018) - Validation

The brilliant thing of the validation phase is that, as a developer, you have to do nothing about it!! The runtime will do that for you and in case of error it’ll provide you a verbose error message so you can fix it.

What?!!! 🤔

– will it check if a field is defined on the Query Type?
– yes
– will it check if the field accepts a given argument?
– yes
– will it check if the type of the argument is defined on the Query Type?
– yes
– will it recursively do those verification down to the last leaf?
– yes
– will it …
– enough ❤️

3. Execution

Once validation is passed, the runtime will transverse the AST invoking the resolver for each node of the graph and produce a result (typically a JSON document reflecting the query operation hierarchical structure )

Let’s see how that might look like:

Query graph

A Query-driven schema approach

Of course you can design your schema mirroring your data storage structure, nothing stops you from doing that … but … what’s the advantage on that?

The real power of your schema design relies on abstraction! You can create your Object Types based on the operations you need to execute, AND you can populate those object from any number of sources! You may have one field coming from a specific table of a MySQL database and another field coming from MongoDB and another coming from an XML-RPC or a third-party API!

Since the query hierarchical structure is self-descriptive:

  1. Think about which data you’re gonna need and how the relationships should look like.
  2. Write a query
  3. Define your Type system
  4. Define your resolvers

See this Example of a query-driven schema.

That’s a game changer, now we have:

  1. An agnostic data storage layer that doesn’t need to know or consider the client needs
  2. An abstraction layer able to easily define a data shape contract independently from how and where the data is stored, and adaptable to the client’s needs
  3. A declarative client layer able to lead the way the data and operation’s shapes should look like

Note the scale now is tilted to the client’s need and not the other way around.

The client’s BIG BENEFITS

Breaking Changes

The schema “contract” is one of the most important, correction, we’d dare to say IS THE MOST IMPORTANT part of your GraphQL API, and keeping it consistent over time is critical.

You can change the way you solve a problem, or break down a solution into many pieces, or optimize things, granted you won’t introduce a breaking change on your schema.

If you do so, you’re gonna be in big trouble (people’s gonna hurt you) since everyone is expecting additive changes without mayor versioning changes of the API and ideally an ad-vitam backwards compatible API.

Here’s a list of things on the DO-NOT-EVER-EVER-BY-ANY-MEANS-DO list:

Resolvers

We talked briefly about resolvers on our Introduction chapter, so let start breaking it down.

We said “resolvers are functions containing arbitrary body code responsible for returning the related Value for a given Field in the Executable Definition of the Schema …. really (ಠ_ಠ)? Thanks for nothing (◔_◔).

Ok, that was kinda hermetic. Let’s break it down starting with our Type Definition in our representative definition:

## an Object Type definition
type Person {
  id: ID
  name: String!
  surname: String!
  fullName: String!
  age: Int
  friend: Person
}

Now we need our “entry point”, the Root Operation definition with a field describing our API also in our representative definition:

## A Root Operation definition
type Query {
  ## a Field describing a valid operation
  getPerson: Person!
}

And now, on the executable definition we make a request by sending a GraphQL query operation.

query { ## OperationType
  getPerson { ## parent SelectionSet Field
    name ## child SelectionSet Field
    age ## child SelectionSet Field, sibling to name
  }
}

We won’t go again through the “execution phase” we already saw on the Query section. Instead we’ll go deeper on the “how” it’s executed.

Resolver function signature

Let’s make a stop on the “function containing arbitrary body code” thing.
Why not just “containing arbitrary code”? What’s that “body” word telling us?

Depending on the language it might be explicit, or implicit, but every resolver function in GraphQL will accept 4 positional arguments (that’s not on the body and it’s not arbitrary)


/**
 * JavaScript example
 *
 * @param {Object} obj The result returned by the parent field resolver (also found as root, parent, ...)
 * @param {Object} args The object containing the arguments passed into the field in the query.
 * @param {Object} context An arbitrary object shared across the resolver chain in a particular query.
 * @param {Object} info An object containing the execution state of the query. The generated AST!
 *
 * @returns {Null|Undefined|Array|Promise|Object|Scalar}
 */
function myResolver (obj, args, context, info){
  // here your arbitrary code
}

In case you didn’t specify a resolver for a type, GraphQL will fallback to a Default Resolver which will:

  1. Returns a property from obj with the same field name, or
  2. Calls a function on obj with the same field name and passes the query arguments along to that function.

For more detailed descriptions:

Resolver Chain

Now another concept arises: Resolver Chain. To understand that let’s go back to the “It’s Graphs All the Way Down” and traverse our executable definition invoking resolvers (it’s an intentional shallow example, for a deeply nested one take a look here)

The hierarchical structure of the query will be replicated and the sibling resolvers will be invoked in parallel.

Query.getPerson()+---> Person.name()
                  \
                   +-> Person.age()

Here’s how our code might look like:

const resolverMap = {
  Query: {
    getPerson(obj, args, context, info) {
      return {
        id: '1',
        name: 'Darth',
        surname: 'Vader',
      };
    },
  }
};

Output:

{
  "data": {
    "getPerson": {
      "name": "Darth",
      "age": null
    }
  }
}

At this point you might have noticed some weird things.

  1. getPerson is returning a partial shape
  2. There’s no resolver for Person or for its fields.
  3. Some information (age) I asked for is not present on the resolver’s returned data, but is present on the output
  4. All above just happened without errors

What happened here?

Default Resolver

Most of GraphQl server implementations will provide a Default Resolver to fallback whenever an explicit resolver hasn’t been provided.

Here’s a sample code from graphql-js repository

/**
 * If a resolve function is not given, then a default resolve behavior is used
 * which takes the property of the source object of the same name as the field
 * and returns it as the result, or if it's a function, returns the result
 * of calling that function while passing along args and context value.
 */
export const defaultFieldResolver: GraphQLFieldResolver<
  mixed,
  mixed,
> = function(source: any, args, contextValue, info) {
  // ensure source is a value for which property access is acceptable.
  if (isObjectLike(source) || typeof source === 'function') {
    const property = source[info.fieldName];
    if (typeof property === 'function') {
      return source[info.fieldName](args, contextValue, info);
    }
    return property;
  }
};

and here a human readable explanation from Apollo team’s Default resolvers documentation

If you were really attentive there’s another interesting thing happening there:

age is defined as a nullable property in our schema, so defaulting to null when the property is missing makes total sense, on the other hand, fullName is defined as non-nullable! why it didn’t throw an error?

Because a resolver will ONLY be invoked when the information is required for the output, since it’s not directly or indirectly (relationship) involved on the requested query, it won’t be executed, therefore it won’t fail.

Let’s make it fail!

Query

query {
  getPerson {
    fullName
    age
  }
}

Response

{
  "errors": [
    {
      "message": "Cannot return null for non-nullable field Person.fullName.",
      "locations": [
        {
          "line": /* the line number */,
          "column": /*the column number */
        }
      ],
      "path": [
        "getPerson",
        "fullName"
      ],
      "extensions": {
        "code": "INTERNAL_SERVER_ERROR",
        "exception": {
          "stacktrace": [/* bunch of staff here */]
        }
      }
    }
  ],
  "data": null
}

💥 BOOM! How can we solve that?

Field-level resolver

We saw what a top-level resolver is, now let’s add a field-level resolver to our resolver map.

const resolverMap = {
    Query: {
      getPerson(obj, args, context, info) {
        return {
          id: '1',
          name: 'Darth',
          surname: 'Vader',
        };
      },
    },
    Person: {
      fullName(obj, args, context, info) {
        return `${obj.name} ${obj.surname}`;
      }
    }
  };
{
  "data": {
    "getPerson": {
      "fullName": "Darth Vader",
      "age": null
    }
  }
}

WOW! The top-level resolver getPerson passed along the Person object containing name and surname properties down to the next chain link and therefore is available on the first argument of the fullName field-resolver which returns a scalar value; and age field is still falling back to the default resolver. ヽ( •_)ᕗ

Asynchronous resolvers

In a real world application you’ll deal with one or more different sources for your data which will require asynchronous actions to return the data. One of the possible values a resolver can return is a Promise.

Here a trivial example in JavaScript

const resolverMap = {
    Query: {
      getPerson(obj, args, context, info) {
        return new Promise((resolve, reject) => {
          setTimeout(() => {
            resolve({
              id: '1',
              name: 'Darth',
              surname: 'Vader',
            });
          }, 5000)
        });
      },
    },
    Person: {
      fullName(obj, args, context, info) {
        return `${obj.name} ${obj.surname}`;
      }
    }
  };

The top-level resolver will return a Promise, after 5 seconds it’ll resolve to the Person value as the first argument for the fullName field-level to be executed.

Multiple Queries

Let’s put it this way: Is just like the “Fight Club” rules

  1. What’s the first rule of GraphQL? “It’s Graphs All the Way Down”
  2. What’s the second rule of GraphQL? “It’s Graphs All the Way Down”

Imagine the following query

query {
  getPerson {
    name
    age
  }
  persons {
    surname
  }
}

If we already said all siblings are executed in parallel like name and age of getPerson, what does it make the graph relationship different from getPerson and persons of query?

Nothing.

Nested queries and the n + 1 problem

We won’t dive deep into this topic but it is worthwhile to mention. Let’s put all together here

Schema

type Person {
  id: ID
  name: String!
  surname: String!
  fullName: String!
  age: Int
  friends: [Person]!
}

type Query {
  persons: [Person]!
}

Query

query {
  persons {
    name
    friends {
      name
    }
  }
}

If we have Person A, B, C, D and all persons (but A) are friends of A we’ll have the following response:

{
  "data": {
    "persons": [
      {
        "name": "A",
        "friends": []
      },
      {
        "name": "B",
        "friends": [{"name": "A"}]
      },
      {
        "name": "C",
        "friends": [{"name": "A"}]
      },
      {
        "name": "D",
        "friends": [{"name": "A"}]
      }
    ]
  }
}

It means that we didn’t over fetch data from the client to GraphQL server, but we did from GraphQL to the persistence system because we had to ask for the A Person’s data 4 times!! (n rountrips for the friends + 1 for the A Person)

This problem is usually tackled with batched requests or caching and even though there’s nothing out of the box that automatically do that for you there’s no need to re-invent the wheel, there are several options already available for you to include in your application, you might do it client-side, server-side, in-memory, HTTP caching and it’ll depend on the tools available for your architecture.

See the Learning Resources for more info


IMPORTANT NOTE:

The execution flow is non-deterministic because

So, defining your resolvers as atomic and pure functions is critical. Meaning DON’T mutate the context object on your resolvers, ever, or you’ll get badly hurt rather sooner than later.

…the resolution of fields other than top‐level mutation fields must always be side effect‐free and idempotent, the execution order must not affect the result

Source: GraphQL spec June 2018 6.3.1 Normal and Serial Execution.

Patience, we’ll go deeper into that on day 4 (Mutations).

Exercise

For a given datasource (abstracted as json here) containing n rows of skills and n rows of persons we provided a sample implementation of a GraphQL server for each technology containing:

The example contains the necessary code to run a simple query for a random skill. This code defines the necessary types, the relationships for skills, the available query randomSkill, the resolver, the skill entity model and the db abstraction.

Given the following query operation

query {
  randomPerson {
    id
    age
    eyeColor
    fullName
    email
    friends {
      name
    }
    skills {
      name
    }
    favSkill {
      name
      parent {
        name
      }
    }
  }
}

Exercise requirements

Technologies

Select the exercise on your preferred technology:

Learning Resources