Contributing

Feel free to provide any kind of code base contribution, such as implementing algorithms from new publications, fixing errors, writing unit tests, improving existing algorithms, or extending the documentation.

You can also contribute through the issue tracker. Pull requests that address unassigned issues in the tracker are welcome. Many thanks to all existing and future contributors for their participation.

🛠️ Workflow

The typical workflow for pygrank contributions comprises the following steps:

  1. Fork the master branch from the GitHub repository.
  2. Clone the fork locally (recommended: also copy the pre-commit file to .git/hooks).
  3. Edit the library.
  4. Commit changes.
  5. Push changes to the fork.
  6. Create a pull request from the fork back to the original master branch.

You can use any (virtual) environment to edit the local clone, such as conda or venv (provided by PyCharm). The environment should have Python 3.9 or later installed. Make sure that the base library dependencies networkx, numpy, scipy, sklearn (scikit-learn) and wget, as well as tensorflow, torch, matvec and torch-sparse (the latter needed to support unit testing of the respective backends), are installed and upgraded to their latest versions.

🛠️ Architecture

pygrank adheres to a hierarchical architecture that manages inter-module dependencies, which new code should maintain for import statements to work. For example, do not design evaluation measures that depend on algorithms; such components should instead be delegated to other modules. For reference, the project's architecture is reiterated below. For more details, please refer to the documentation.

(Architecture diagram showing the module dependency hierarchy; see the documentation.)

We ask that, when contributing new code, you import methods and classes through the highest-level architectural component they belong to that does not conflict with your code. For example, to design a new filter you need to import utility methods from pygrank.algorithms.utils, since a higher-level import would create self-recursions by trying to import all sub-modules. On the other hand, in the same module you can safely import classes from pygrank.measures.
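As a purely illustrative sketch (the preprocessor helper name is an assumption about what pygrank.algorithms.utils provides; adapt the names to the actual contents of the modules), the imports of a new filter module could be organized like this:

```python
# Hypothetical new filter module, e.g. pygrank/algorithms/filters/my_family.py

# Low-level helpers come from their specific sub-module, because a higher-level import
# (e.g. of pygrank.algorithms) would self-recurse by pulling in all filter sub-modules.
from pygrank.algorithms.utils import preprocessor  # illustrative helper name

# Measures sit lower in the architecture, so their top-level package is safe to import.
from pygrank.measures import AUC
```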

✅ Pull Checklist

Before creating a pull request, make sure that your submission satisfies the following points:

  1. Class and method docstrings should adhere to Google's docstring conventions. Additionally, code examples in docstrings should be prefaced by a line that starts with the word Example and ends in :, and their code lines should start with >>> (see the sketch after this list).
  2. When implementing new or existing research (we are more than happy to accommodate this), you are also required to update the library's citations.
  3. New code should maintain CamelCase notation for classes and lower_case_with_underscores for methods and variables.
  4. New files should be placed in appropriate modules, and new methods and classes should be accessible from the top level.
  5. Module dependencies should comply with the above-described architecture.
  6. [optional] Algorithms should exhibit near-linear (e.g. polylog-linear) running times and memory allocation with respect to the number of edges, to scale well to large graphs.
  7. [pre-commit] Run python docgenerator.py to add new classes to the documentation. This can be automated for commits to the master branch by copying the pre-commit script file to the local folder .git/hooks.
  8. [github actions] Pass all unit tests with no errors, unless your purpose is to introduce new unit tests that reveal existing bugs in the code base. Refrain from remodularizing the code unless absolutely necessary (creating new packages is fine, but backwards compatibility of import statements is mandatory).
  9. [github actions] Unit tests should provide near-100% code coverage.
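For reference, a docstring that follows the conventions of point 1 above could look like the following sketch; the function and its behavior are made up purely for illustration:

```python
def example_rank(graph, alpha=0.85):
    """
    Demonstrates the expected docstring format on a made-up helper.
    Args:
        graph: The networkx graph on which the (hypothetical) computation runs.
        alpha: Optional. A hypothetical dampening parameter. Default value is 0.85.
    Example:
        >>> import networkx as nx
        >>> graph = nx.Graph([("A", "B")])
        >>> scores = example_rank(graph, alpha=0.9)
    """
    return {node: alpha for node in graph}  # placeholder body for illustration only
```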

✏️ Implementing New Node Ranking Algorithms

Which classes to subclass?

To create a new node ranking algorithm, you are required to subclass one of the classes found in pygrank.algorithms.filters.abstract_filters:

  • GraphFilter identifies generic graph filters (is subclassed by the next two)
  • RecursiveGraphFilter identifies graph filters that can be described with a recursive formula
  • ClosedFormGraphFilter identifies graph filters that can be described in closed form

Please extend this documentation if a new family of node ranking algorithms is implemented.

Where to write code?

  • New abstract graph filter classes (e.g. that define families of new algorithms) should be placed in the module pygrank.algorithms.filters.abstract_filters.
  • New graph filters should be placed in modules pygrank.algorithms.filters.[family], where family is either an existing submodule or a new one.
  • For new filter families, make sure to provide access to their classes through pygrank.algorithms.filters.__init__.py (this is important, as it helps docgenerator.py automatically create documentation for new algorithms).

Which method(s) to override?

Depending on which class you subclass, you need to override and implement a different method. More general GraphFilter subclasses need to implement at least a _step(M, personalization, ranks, *args, **kwargs) method, which performs iterative convolutions through the methods provided by the backend (these can be imported from pygrank.backend). You can add any arguments and keyword arguments corresponding to additional graph signals needed for your algorithm to run, but take care to put all algorithm-describing parameters in the constructor. Note that personalization and ranks in this method are graph signals, but the method is expected to return the output of backend calculations, which will be directly assigned to ranks.np. The other two abstract subclasses partially implement this method to simplify the definition of new filters.
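As a minimal sketch of this contract (the backend.conv call is an assumption about the available backend primitives; check pygrank.backend for what is actually exposed), a direct GraphFilter subclass could look like:

```python
from pygrank.algorithms.filters.abstract_filters import GraphFilter
from pygrank import backend  # import path as described above

class SketchStepFilter(GraphFilter):
    """Hypothetical filter that only illustrates the _step contract."""

    def __init__(self, alpha=0.85, **kwargs):
        super().__init__(**kwargs)
        self.alpha = alpha  # algorithm-describing parameters belong in the constructor

    def _step(self, M, personalization, ranks, *args, **kwargs):
        # personalization and ranks are graph signals; their .np fields hold backend primitives.
        # The returned backend primitive is assigned to ranks.np by the base class.
        return self.alpha * backend.conv(ranks.np, M) + (1 - self.alpha) * personalization.np
```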

For RecursiveGraphFilter subclasses, you only need to implement the method _formula(self, M, personalization, ranks, *args, **kwargs), which provides the iterative formula that describes the filter. Contrary to the above, for this method personalization and ranks are backend primitives. As before, it should also return a backend primitive.
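For example, a personalized PageRank-style recursion could be sketched as follows (backend.conv is again an assumption about the backend's primitives):

```python
from pygrank.algorithms.filters.abstract_filters import RecursiveGraphFilter
from pygrank import backend  # import path as described above

class SketchRecursiveFilter(RecursiveGraphFilter):
    """Hypothetical recursive filter that only illustrates the _formula contract."""

    def __init__(self, alpha=0.85, **kwargs):
        super().__init__(**kwargs)
        self.alpha = alpha

    def _formula(self, M, personalization, ranks, *args, **kwargs):
        # Here personalization and ranks are already backend primitives, not graph signals.
        return self.alpha * backend.conv(ranks, M) + (1 - self.alpha) * personalization
```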

For ClosedFormGraphFilter subclasses, you only need to implement the method _coefficient(self, previous_coefficient), which calculates the next coefficient a_n of the closed form graph filter a_0 + a_1 M + a_2 M^2 + ..., given that the previous coefficient is a_(n-1); the latter is passed as None when a_0 is being calculated. It may be easier to use n = self.convergence.iteration-1 to explicitly calculate the returned value.

How to structure constructors?

Constructors of graph filters should pass extra arguments to parent classes. This ensures that new algorithms share the same breadth of customization as parent classes. Only additional arguments not parsed by parent classes need to be documented (inherited arguments will be automatically added when docgenerator.py is used to construct documentation). Try to parameterize constructors as much as possible, to ensure that researchers can easily try different variations.

As an example, the following snippet introduces a new algorithm that performs a non-recursive implementation of personalized PageRank based on the recursion a_0 = 1-alpha, a_n = alpha*a_(n-1) for n > 0:

```python
class NewAlgorithm(ClosedFormGraphFilter):
    def __init__(self, alpha=0.85, **kwargs):
        """
        Instantiates the new algorithm.
        Args:
            alpha: Optional. The new algorithm's personalization parameter. Default value is 0.85.
        """
        super().__init__(**kwargs)
        self.alpha = alpha

    def _coefficient(self, previous_coefficient):
        if previous_coefficient is None:
            return 1-self.alpha
        return previous_coefficient*self.alpha
    ...
```
How to expose algorithm citations?

Exposing citations for your algorithm is achieved by updating the list of references generated by the inheriting graph filter. The list of references may already reflect properties of the algorithm's family (e.g. from ClosedFormGraphFilter in the previous example), and thus only the first entry, which holds the algorithm's name, should be edited. Additional references or parameter descriptions need to be placed just after the algorithm's name. Overall, the list of references is concatenated based on natural language heuristics to create a comprehensive description. For instance, describing the algorithm and parameter of the previous example can be done as follows:

```python
class NewAlgorithm(ClosedFormGraphFilter):
    ...
    def references(self):
        refs = super().references()
        refs[0] = "personalized PageRank \\cite{page1999pagerank}"
        refs.insert(1, f"restart probability {int((1-self.alpha)*1000)/1000.}")
        return refs
    ...
```

⚠️ Do not forget to follow the pull checklist.

✏️ Implementing New Postprocessors

Which class to subclass?

Postprocessors need to subclass the abstract pygrank.algorithms.postprocess.Postprocessor class.

Which method to override?

Postprocessors need to override the method _transform(self, ranks), where ranks is a graph signal, typically the outcome of some other node ranking algorithm (e.g. a graph filter). Implementations of this method can return any kind of data convertible to graph signals, such as dictionaries or backend primitives. When possible, use the latter (i.e. manipulate ranks.np with backend operations and return the result) to ensure faster computations through the graph filter pipeline, for example when iterative postprocessors are applied afterwards.
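A minimal sketch, assuming that backend.sum and element-wise division are available for backend primitives (check pygrank.backend for the exact operations), could look like:

```python
from pygrank.algorithms.postprocess import Postprocessor
from pygrank import backend  # import path as described above

class SketchNormalize(Postprocessor):
    """Hypothetical postprocessor that rescales scores to sum to one."""

    def _transform(self, ranks):
        # ranks is a graph signal; operating on its backend primitive and returning the
        # result keeps the whole pipeline on backend operations (a dict would also work).
        return ranks.np / backend.sum(ranks.np)
```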

✏️ Implementing New Measures

TODO

✏️ Implementing New Tuners

Which class to subclass?

Tuners need to subclass the abstract pygrank.algorithms.autotune.tuning.Tuner class.

Which method to override?

Tuners need to override the method _tune(self, graph, personalization, *args, **kwargs). This method's arguments and keyword arguments need to be passed to the node ranking algorithm candidates, and it should return a tuple tuned_algorithm, personalization, where the tuned algorithm is generated by the tuner (ideally through some scheme that is passed to the constructor) and the personalization signal is either the original signal or a new one obtained from the original through a seamless backend pipeline, so that it remains backpropagate-able. Keep in mind that tuned_algorithm(personalization) will be used to produce the tuner's outcome. At worst, return a Tautology() and the desired tuning outcome for the given personalization argument, but try to avoid this if possible. Do note that this could be a necessary evil if Arnoldi or Lanczos decompositions are supported, but prefer returning meaningful algorithms when possible.
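The following rough sketch focuses only on this return contract; the fixed PageRank candidate and the ranker_generator parameter are illustrative placeholders rather than a recommended design:

```python
from pygrank.algorithms.autotune.tuning import Tuner
import pygrank as pg  # assumed top-level access to to_signal and PageRank

class SketchTuner(Tuner):
    """Hypothetical tuner that only illustrates what _tune must return."""

    def __init__(self, ranker_generator=None):
        # A scheme for constructing candidate algorithms is ideally passed to the constructor.
        if ranker_generator is None:
            ranker_generator = lambda params: pg.PageRank(alpha=params[0])
        self.ranker_generator = ranker_generator

    def _tune(self, graph, personalization, *args, **kwargs):
        personalization = pg.to_signal(graph, personalization)
        # A real tuner would search the parameter space here (e.g. on a validation split);
        # this sketch simply instantiates one fixed candidate to show the return value.
        tuned_algorithm = self.ranker_generator([0.85])
        # Return the tuned algorithm and either the original personalization signal or one
        # derived from it through backend operations, so that it stays backpropagate-able;
        # the caller will invoke tuned_algorithm(personalization) to obtain the outcome.
        return tuned_algorithm, personalization
```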

What to place in tuner constructors?

Ideally, new tuners should be able to accept a preferred backend in which to perform tuning. Refer to the implementation of existing tuners as a guideline for how to switch between backends. Switching to a different backend should ideally preserve backpropagation on the original signal; thus, tuners should construct optimal node ranking algorithms that are backpropagate-able.

Tuners should support as many types of algorithms as possible; thus, parameterizable generic methods for constructing such algorithms could be added to tuner constructors.