Lazy loading (#87)
Implement compact in-memory representation and lazy loading

  * [BREAKING] serialization module has been removed, instead, each class now
    provides a `serialize` method as well as a static method `deserialize`.
  * [BREAKING] FiltersEngine now exposes different methods for update:
    `update` which expects a diff of filters, `updateList` and
    `updateResources`. This API should be clearer and allows using the
    adblocker without managing filters lists.
  * [BREAKING] ReverseIndex's API dropped the use of a callback to specify
    filters and instead expects a list of filters.
  * [BREAKING] parsing and matching filters can now be done using methods of
    the filters classes directly instead of free functions. For example
    NetworkFilter has a `parse` and `match` method (with the same expected
    arguments).
  * ReverseIndex is now implemented using a very compact
    representation (stored in a typed array).
  * `toString` method of filters should now be more accurate.
  * Addition of numerous unit tests (coverage is now >90%)
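
The "instance `serialize` plus static `deserialize`" pattern named in the first bullet can be pictured with a minimal sketch. `TinyFilter` and its byte layout are hypothetical and exist only for illustration; they are not the classes or the binary format used by this commit.

```javascript
// Hypothetical class, NOT part of the adblocker library: it only illustrates
// pairing an instance `serialize` method with a static `deserialize` method.
class TinyFilter {
  constructor(mask, hostname) {
    this.mask = mask >>> 0; // 32-bit option bitmask
    this.hostname = hostname;
  }

  // Assumed layout: [mask: 4 bytes LE][hostname length: 2 bytes LE][hostname: UTF-8]
  serialize() {
    const encoded = new TextEncoder().encode(this.hostname);
    const buffer = new Uint8Array(4 + 2 + encoded.length);
    const view = new DataView(buffer.buffer);
    view.setUint32(0, this.mask, true);
    view.setUint16(4, encoded.length, true);
    buffer.set(encoded, 6);
    return buffer;
  }

  // Rebuilds an instance from the bytes produced by `serialize`
  static deserialize(buffer) {
    const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
    const mask = view.getUint32(0, true);
    const length = view.getUint16(4, true);
    const hostname = new TextDecoder().decode(buffer.subarray(6, 6 + length));
    return new TinyFilter(mask, hostname);
  }
}

const original = new TinyFilter(0b101, 'domain.com');
const restored = TinyFilter.deserialize(original.serialize());
console.log(restored.mask === original.mask, restored.hostname === original.hostname);
```

Keeping both directions on the class itself means the reader and the writer of a layout live next to each other, which is presumably one motivation for dropping a separate serialization module.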
remusao authored Jan 28, 2019
1 parent 679c800 commit 2b90a75
Showing 39 changed files with 5,877 additions and 3,718 deletions.
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,23 @@

*not released yet*

* Implement lazy loading and compact internal representation [#87](https://github.com/cliqz-oss/adblocker/pull/87)
* [BREAKING] serialization module has been removed, instead, each class now
provides a `serialize` method as well as a static method `deserialize`.
* [BREAKING] FiltersEngine now exposes different methods for update:
`update` which expects a diff of filters, `updateList` and
`updateResources`. This API should be clearer and allows using the
adblocker without managing filters lists.
* [BREAKING] ReverseIndex's API dropped the use of a callback to specify
filters and instead expects a list of filters.
* [BREAKING] parsing and matching filters can now be done using methods of
the filters classes directly instead of free functions. For example
NetworkFilter has a `parse` and `match` method (with the same expected
arguments).
* ReverseIndex is now implemented using a very compact
representation (stored in a typed array).
* `toString` method of filters should now be more accurate.
* Addition of numerous unit tests (coverage is now >90%)
* Implement support for :style cosmetic filters [#86](https://github.com/cliqz-oss/adblocker/pull/86)
* [BREAKING] `getCosmeticsFilters` will now return CSS as a single string
(stylesheet) instead of a list of selectors. This simplifies the usage and
145 changes: 125 additions & 20 deletions README.md
@@ -1,47 +1,152 @@
# Adblocker

A fast, pure-JavaScript content-blocking library made by Cliqz.
A fast and memory-efficient, pure-JavaScript content-blocking library made by Cliqz.

This library is the building block technology used to power Cliqz and Ghostery's Adblocking. Being a pure JavaScript library, it can be used for various purposes such as:
This library is the building block technology used to power Ghostery and
Cliqz' Adblocking. Being a pure JavaScript library it is trivial to include in
any new project and can also be used as a building block for tooling. For
example this library can be used for:

* Building a content-blocking extension (see [this example](./example) for a minimal content-blocking webextension)
* Building tooling around popular block-lists such as [EasyList](https://github.com/easylist/easylist)
* Converting between various formats of filters (EasyList, Safari Block Lists, etc.)
* Detecting duplicates in lists
- validating filters
- normalizing filters
- detecting redundant filters
* Detecting dead domains
* etc.

The library provides the low-level implementation to fetch, parse and match filters; which makes it possible to manipulate the lists at a high level.
The library provides abstractions to manipulate filters at a low level.

## Development
## Getting Started

This package can be installed directly from `npm`:

Install dependencies:
```sh
$ npm install
$ npm install @cliqz/adblocker
```

Build:
Or you can install it from sources directly:
```sh
$ npm ci
$ npm pack
$ npm run test
```

Test:
```sh
$ npm run test
Multiple bundles are provided in the `dist` folder.

## Usage


### Network Filters

Here is how one can parse and match individual *network* filters:

```javascript
const { NetworkFilter, Request } = require('@cliqz/adblocker');

// 1. Parsing
const filter = NetworkFilter.parse('||domain.com/ads.js$script');

// 2. Matching
filter.match(new Request({
  type: 'script',

  domain: 'domain.com',
  hostname: 'domain.com',
  url: 'https://domain.com/ads.js?param=42',

  sourceUrl: 'https://domain.com/',
  sourceHostname: 'domain.com',
  sourceDomain: 'domain.com',
}));
```

You can use the following bundle: `adblocker.umd.js`.
Matching requires you to provide an instance of `Request` which knows
about the type of the request (e.g.: `main_frame`, `script`, etc.) as
well as the URL, hostname and domain of the request and *source URL*.
To make things a bit easier, the library exposes a `makeRequest` helper
which can be used alongside a library like `tldts` (or another library
providing parsing of hostnames and domains) to provide the parsing:

```javascript
const tldts = require('tldts');
const { NetworkFilter, makeRequest } = require('@cliqz/adblocker');

// 1. Parsing
const filter = NetworkFilter.parse('||domain.com/ads.js$script');

// 2. Matching
filter.match(makeRequest({
  type: 'script',
  url: 'https://domain.com/ads.js',
}, tldts)); // true
```
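
To make the fields of a request more concrete, here is a rough sketch of how they could be derived from URLs. `buildRequestFields` and `naiveDomain` are hypothetical names and are not the library's `makeRequest`; the "last two labels" rule and the default of `sourceUrl` to `url` are assumptions of this sketch, which is exactly why a public-suffix-aware parser such as `tldts` is needed in practice.

```javascript
// Hypothetical helpers, NOT the library's `makeRequest`: they only show which
// fields a request carries and how they can be derived from URLs.
function naiveDomain(hostname) {
  // Assumption: keep the last two labels. This is wrong for suffixes such as
  // `co.uk`; a public-suffix-aware parser is required for real use.
  return hostname.split('.').slice(-2).join('.');
}

function buildRequestFields({ type, url, sourceUrl = url }) {
  // The WHATWG `URL` class is a global in Node.js and in browsers
  const { hostname } = new URL(url);
  const { hostname: sourceHostname } = new URL(sourceUrl);
  return {
    type,
    url,
    hostname,
    domain: naiveDomain(hostname),
    sourceUrl,
    sourceHostname,
    sourceDomain: naiveDomain(sourceHostname),
  };
}

const fields = buildRequestFields({
  type: 'script',
  url: 'https://sub.domain.com/ads.js?param=42',
  sourceUrl: 'https://domain.com/',
});
console.log(fields.hostname, fields.domain, fields.sourceDomain);
```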

### Cosmetic Filters

Here is how one can parse and match individual *cosmetic* filters:

```javascript
const { CosmeticFilter } = require('@cliqz/adblocker');

// 1. Parsing
const filter = CosmeticFilter.parse('domain.*,domain2.com###selector');

// 2. The `match` method expects a hostname and domain as arguments
filter.match('sub.domain.com', 'domain.com'); // true
```

### Filters Engine

Manipulating filters at a low level is useful for building tooling or debugging, but
to perform efficient matching we need to use `FiltersEngine` which can be seen
as a "container" for both network and cosmetic filters. The filters are
organized in a very compact way and allow fast matching against requests.

```javascript
const { FiltersEngine, Request } = require('@cliqz/adblocker');

const engine = FiltersEngine.parse(`
! This is a custom list
||domain.com/ads.js$script
###selector
domain.com,entity.*##+js(script,args1)
`);

// It is possible to serialize the full engine to a typed array for caching
const serialized = engine.serialize();
const deserialized = FiltersEngine.deserialize(serialized);

// Matching network filters
const {
  match, // `true` if there is a match
  redirect, // data url to redirect to if any
  exception, // instance of NetworkFilter exception if any
  filter, // instance of NetworkFilter which matched
} = engine.match(new Request(...));

// Matching CSP (content security policy) filters
const directives = engine.getCSPDirectives(new Request(...));

// Matching cosmetic filters
const {
  styles, // stylesheet to inject in the page
  scripts, // Array of scriptlets to inject in the page
} = engine.getCosmeticFilters('sub.domain.com', 'domain.com');
```
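
As an illustration of what organizing filters "in a very compact way" can mean, the sketch below packs a token-to-filter-id index into two typed arrays instead of a map of arrays. It is a generic bucketed layout written under assumptions of our own (`buildCompactIndex`, `lookup`, the `[token, id]` pair encoding and the bucket count are all hypothetical); it is not the actual layout used by `FiltersEngine` or `ReverseIndex`.

```javascript
// Hypothetical sketch, NOT the library's ReverseIndex: a token -> filter-id
// index stored in flat typed arrays.
function buildCompactIndex(filters, numBuckets = 8) {
  // `filters` is an array of { id, token }, with `token` a non-negative integer
  const offsets = new Uint32Array(numBuckets + 1);
  for (const { token } of filters) {
    offsets[(token % numBuckets) + 1] += 1;
  }
  // In-place prefix sum: bucket b spans offsets[b] .. offsets[b + 1]
  for (let b = 0; b < numBuckets; b += 1) {
    offsets[b + 1] += offsets[b];
  }

  const entries = new Uint32Array(filters.length * 2); // flat [token, id] pairs
  const cursor = offsets.slice(0, numBuckets); // next free slot per bucket
  for (const { id, token } of filters) {
    const bucket = token % numBuckets;
    const slot = cursor[bucket];
    cursor[bucket] += 1;
    entries[slot * 2] = token;
    entries[slot * 2 + 1] = id;
  }
  return { offsets, entries, numBuckets };
}

function lookup(index, token) {
  const bucket = token % index.numBuckets;
  const ids = [];
  for (let i = index.offsets[bucket]; i < index.offsets[bucket + 1]; i += 1) {
    // Several tokens can share a bucket, so the stored token is compared too
    if (index.entries[i * 2] === token) {
      ids.push(index.entries[i * 2 + 1]);
    }
  }
  return ids;
}

const index = buildCompactIndex([
  { id: 0, token: 42 },
  { id: 1, token: 7 },
  { id: 2, token: 42 },
  { id: 3, token: 50 }, // lands in the same bucket as 42 when numBuckets is 8
]);
console.log(lookup(index, 42));
```

Because the whole index lives in typed arrays, it can be copied to and from a cache as raw bytes without rebuilding per-filter objects, which is the general idea behind combining a compact representation with lazy loading.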

## Releasing Checklist
## Release Checklist

To publish a new version:

1. Update `version` in [package.json](./package.json)
2. Update [CHANGELOG.md](./CHANGELOG.md)
3. New commit on local `master` branch (e.g.: `Release vx.y.z`)
5. Make release PR with your commit
6. Merge and create new Release on GitHub
6. Travis takes care of the rest!
1. Create a new branch (e.g.: `release-x.y.z`)
2. Update `version` in [package.json](./package.json)
3. Update [CHANGELOG.md](./CHANGELOG.md)
4. Create a release commit (e.g.: "Release vx.y.z")
5. Create a PR for the release
6. Merge and create a new Release on GitHub
7. Travis takes care of the rest!

## License

12 changes: 3 additions & 9 deletions bench/micro.js
@@ -10,16 +10,12 @@ function benchEngineCreation({ lists, resources }) {
});
}

function benchEngineOptimization({ engine }) {
return engine.optimize();
}

function benchEngineSerialization({ engine }) {
return engine.serialize();
}

function benchEngineDeserialization({ serialized }) {
return adblocker.deserializeEngine(serialized, 1);
return adblocker.FiltersEngine.deserialize(serialized);
}

function benchStringHashing({ filters }) {
@@ -42,11 +38,10 @@ function benchParsingImpl(lists, { loadNetworkFilters, loadCosmeticFilters }) {
let dummy = 0;

for (let i = 0; i < lists.length; i += 1) {
dummy = (dummy + adblocker.parseList(lists[i], {
dummy = (dummy + adblocker.parseFilters(lists[i], {
loadNetworkFilters,
loadCosmeticFilters,
debug: false,
}).length) % 100000;
}).networkFilters.length) % 100000;
}

return dummy;
@@ -70,7 +65,6 @@ function benchNetworkFiltersParsing({ lists }) {
module.exports = {
benchCosmeticsFiltersParsing,
benchEngineCreation,
benchEngineOptimization,
benchEngineSerialization,
benchEngineDeserialization,
benchNetworkFiltersParsing,
5 changes: 1 addition & 4 deletions bench/run_benchmark.js
@@ -29,7 +29,6 @@ const {

const {
benchEngineCreation,
benchEngineOptimization,
benchEngineSerialization,
benchEngineDeserialization,
benchNetworkFiltersParsing,
@@ -68,7 +67,7 @@ function triggerGC() {

function getMemoryConsumption() {
triggerGC();
return process.memoryUsage().heapUsed / 1024 / 1024;
return process.memoryUsage().heapUsed;
}


@@ -136,7 +135,6 @@ function runMicroBenchmarks(lists, resources) {
};

[
benchEngineOptimization,
benchStringHashing,
benchCosmeticsFiltersParsing,
benchStringTokenize,
@@ -176,7 +174,6 @@ function runMemoryBench(lists, resources) {
const { engine, serialized } = createEngine(lists, resources, {
loadCosmeticFilters: true,
loadNetworkFilters: true,
optimizeAOT: true,
}, true /* Also serialize engine */);
const engineMemory = getMemoryConsumption() - baseMemory;

17 changes: 6 additions & 11 deletions bench/utils.js
@@ -2,17 +2,12 @@ const fs = require('fs');
const adblocker = require('../dist/adblocker.cjs.js');

function createEngine(lists, resources, options = {}, serialize = false) {
const engine = new adblocker.FiltersEngine({
...options,
version: 1,
});

engine.onUpdateResource([{ filters: resources, checksum: '' }]);
engine.onUpdateFilters(lists.map((list, i) => ({
asset: `${i}`,
checksum: '',
filters: lists[i],
})), new Set());
const engine = adblocker.FiltersEngine.parse(
lists.join('\n'),
options,
);

engine.updateResources(resources, '');

return {
engine,
37 changes: 21 additions & 16 deletions example/background.ts
@@ -7,13 +7,6 @@ import * as adblocker from '../index';
* should be blocked or altered.
*/
function loadAdblocker() {
const engine = new adblocker.FiltersEngine({
enableOptimizations: true,
loadCosmeticFilters: true,
loadNetworkFilters: true,
optimizeAOT: true,
});

console.log('Fetching resources...');
return Promise.all([adblocker.fetchLists(), adblocker.fetchResources()]).then(
([responses, resources]) => {
@@ -26,16 +19,28 @@ function loadAdblocker() {
}
}

engine.onUpdateResource([{ filters: resources, checksum: '' }]);
engine.onUpdateFilters([
{
asset: 'filters',
checksum: '',
filters: [...deduplicatedLines].join('\n'),
},
]);
let t0 = Date.now();
const engine = adblocker.FiltersEngine.parse([...deduplicatedLines].join('\n'));
let total = Date.now() - t0;
console.log('parsing filters', total);

t0 = Date.now();
engine.updateResources(resources, '' + adblocker.fastHash(resources));
total = Date.now() - t0;
console.log('parsing resources', total);

t0 = Date.now();
const serialized = engine.serialize();
total = Date.now() - t0;
console.log('serialization', total);
console.log('size', serialized.byteLength);

t0 = Date.now();
const deserialized = adblocker.FiltersEngine.deserialize(serialized);
total = Date.now() - t0;
console.log('deserialization', total);

return adblocker.deserializeEngine(engine.serialize());
return deserialized;
},
);
}
12 changes: 4 additions & 8 deletions index.ts
@@ -1,20 +1,16 @@
// Cosmetic injection
export { default as injectCosmetics, IMessageFromBackground } from './src/cosmetics-injection';

// Blocking
export { default as FiltersEngine } from './src/engine/engine';
export { default as ReverseIndex } from './src/engine/reverse-index';
export { default as Request, makeRequest } from './src/request';
export { deserializeEngine } from './src/serialization';
export { default as CosmeticFilter } from './src/filters/cosmetic';
export { default as NetworkFilter } from './src/filters/network';

export { default as matchCosmeticFilter } from './src/matching/cosmetics';
export { default as matchNetworkFilter } from './src/matching/network';

export { parseCosmeticFilter } from './src/parsing/cosmetic-filter';
export { parseNetworkFilter } from './src/parsing/network-filter';
export { f, parseList } from './src/parsing/list';
export { f, List, default as Lists, parseFilters } from './src/lists';

export { compactTokens, hasEmptyIntersection, mergeCompactSets } from './src/compact-set';

export { fetchLists, fetchResources } from './src/fetch';
export { tokenize, fastHash, updateResponseHeadersWithCSP } from './src/utils';
export { default as StaticDataView } from './src/data-view';