Avoiding memory overuse

While disk space keeps getting cheaper, quick-access memory is still a relatively expensive resource in computing infrastructure. It also does not usually grow on demand, and its limits can be pretty hard to overcome (even though swap, which temporarily stores memory on disk, softens them a bit).

We should be conscious of how we use this scarce resource, without trying to optimize it preemptively (premature optimization is the root of all evil). The point is not so much to avoid every bit of extra usage, but to be aware that something with high memory usage might exhaust it completely on a different machine, under a different load, or with a different set of data.

As a rule of thumb for deciding whether it's worth looking into, a web page request should never go over 150 MB of peak memory usage and a background process should never go over 300 MB. In PHP, this value can be obtained by calling the function memory_get_peak_usage. Most debuggers/profilers also make it apparent.
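For instance, one could log the peak at the end of a script to keep an eye on it over time. A minimal sketch (the 150/300 MB thresholds above are a rule of thumb, not anything enforced by PHP):

    <?php
    // Report the script's peak memory usage so far, in megabytes.
    $peakBytes = memory_get_peak_usage();         // memory used by PHP's allocator
    $peakRealBytes = memory_get_peak_usage(true); // memory actually requested from the OS

    printf(
        "Peak memory: %.1f MB (real: %.1f MB)\n",
        $peakBytes / 1024 / 1024,
        $peakRealBytes / 1024 / 1024
    );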

Generally, unless we've been having performance or memory issues in the specific application or area of the application, it's not worth going looking for such memory drains; they usually make themselves noticed.

But naturally, if we bump into fatal errors caused by memory exhaustion, or if we can anticipate a strong candidate for them, such as when dealing with thousands of records at once, it's important to know a few techniques for avoiding memory overuse.

Streams

A programming concept used to avoid memory overuse is the stream.

A stream is an asynchronous programming pattern in which the data to be processed is received continuously, or at an unknown rate, on an input interface, temporarily stored in a FIFO (first-in-first-out) buffer, and, as it is processed, sent out on an output interface.

In this pattern, the maximum memory usage is defined by the buffer size, which makes it suitable for continuous, never-ending processing. What matters is not so much how big the data is, but how fast it can be processed: its throughput. The stream must handle the data fast enough that the buffer doesn't fill up (or the buffer must be big enough to accommodate the backlog).
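PHP's own stream layer offers a simpler, synchronous flavour of this idea. As a minimal sketch (assuming a large file named large.csv exists), reading it through a file handle keeps only the current line plus a small internal buffer in memory, whereas file_get_contents would load the entire file at once:

    <?php
    // Process a large CSV as a stream: peak memory stays roughly constant
    // regardless of the file's size.
    $handle = fopen('large.csv', 'r');
    if ($handle === false) {
        throw new RuntimeException('Unable to open large.csv');
    }

    while (($line = fgets($handle)) !== false) {
        $record = str_getcsv($line);
        // ... process a single record here ...
    }

    fclose($handle);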

Although true streams are continuous and data is processed as soon as possible, there are situations where that is hard to achieve due to the nature of the underlying architecture or programming language. In practice, a stream is then emulated through polling, where the processor of the stream periodically checks whether there is new data to process.

There can also be streams whose buffer is unbounded, limited only by the memory available on the underlying infrastructure. Even though that still risks memory exhaustion, the risk is orders of magnitude lower than not using a stream and handling all the data at once.

Pagination

Pagination can be seen as a variation/simplification of the streams pattern.

When it's the processor that controls the rate at which data is fetched from another source (for example, a database), instead of fetching all the data at once it can treat the data as a book that is read (and processed) page by page.

Most database-abstraction tools already have native support for pagination and/or examples of how to achieve it.
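Without any abstraction at all, the idea boils down to fetching and processing the data in fixed-size chunks. A minimal sketch using PDO (assuming a $pdo connection and an orders table; the table name and page size are just placeholders):

    <?php
    $pageSize = 500;
    $offset = 0;

    do {
        // Fetch one "page" of rows; only this page is in memory at any time.
        $stmt = $pdo->query(sprintf(
            'SELECT * FROM orders ORDER BY id LIMIT %d OFFSET %d',
            $pageSize,
            $offset
        ));
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            // ... process a single row ...
        }

        $offset += $pageSize;
    } while (count($rows) === $pageSize); // a short page means we reached the end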

This is the go-to solution when handling massive amounts of data with an ORM. Specifically, Doctrine2 provides an abstraction called Paginator that serves pages almost transparently to the consumer of the collection.
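A minimal sketch of using it (assuming an available $entityManager and an App\Entity\Order entity; only the current page of entities is hydrated at a time):

    <?php
    use Doctrine\ORM\Tools\Pagination\Paginator;

    $pageSize = 100;
    $query = $entityManager
        ->createQuery('SELECT o FROM App\Entity\Order o ORDER BY o.id')
        ->setMaxResults($pageSize);

    for ($page = 0; ; $page++) {
        $query->setFirstResult($page * $pageSize);
        $paginator = new Paginator($query);

        $seen = 0;
        foreach ($paginator as $order) {
            // ... process a single Order entity ...
            $seen++;
        }

        // Detach the processed entities so they can be garbage collected.
        $entityManager->clear();

        if ($seen < $pageSize) {
            break; // last page reached
        }
    }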