Content analysis for Hacklab Jyväskylä website, used for enhancing content structure.

HacklabJKL/jkl.hacklab.fi_analysis

Dataset description

The WordPress eXtended RSS (WXR) file hacklabjyvskyl.WordPress.2021-12-26.xml was exported from the WordPress admin panel (Tools > Export) with all post data included. load_posts.py converts it into /data/hacklabjkl-blog.json, a TinyDB JSON database file. The original XML file is not included in the repository because it contains irrelevant and possibly private data, such as blog author email addresses.
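The conversion step can be sketched roughly as follows. This is not the actual load_posts.py: the field selection, the `extract_posts` name, the `wp` namespace version and the sample XML are all assumptions, and the sketch uses only the standard library so that it runs without the original export.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly seen in WXR exports. The wp version (1.2) is an
# assumption and may differ in the real export file.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.2/",
}

# Made-up stand-in for the real export, which is not in the repository.
SAMPLE_WXR = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Työpaja</title>
      <link>https://example.org/tyopaja</link>
      <content:encoded><![CDATA[<p>Tervetuloa!</p>]]></content:encoded>
      <wp:post_type>post</wp:post_type>
    </item>
    <item>
      <title>Logo</title>
      <link>https://example.org/logo</link>
      <content:encoded><![CDATA[]]></content:encoded>
      <wp:post_type>attachment</wp:post_type>
    </item>
  </channel>
</rss>
"""


def extract_posts(wxr_text):
    """Return one dict per blog post, skipping attachments and pages."""
    root = ET.fromstring(wxr_text.encode("utf-8"))
    posts = []
    for item in root.iter("item"):
        if item.findtext("wp:post_type", default="", namespaces=NS) != "post":
            continue
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "content": item.findtext("content:encoded", default="", namespaces=NS),
        })
    return posts


posts = extract_posts(SAMPLE_WXR)
print(posts)
```

The resulting list of dicts could then be stored with TinyDB, for example via `insert_multiple`. Leaving out author fields at this stage is one way to keep email addresses out of the JSON file.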

process_posts.py sends the posts in the database file to the Neuwo.ai API endpoint /getAiTopics, which provides Finnish/English NLP keyword analysis. The results are stored under the neuwo_data key of each post.
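The enrichment loop might look something like the sketch below. The request format of /getAiTopics is not described here, so the API call is represented by a hypothetical `fetch_topics` callable that the caller supplies. `enrich_posts`, `fake_fetch_topics` and the choice to skip already-analysed posts are likewise assumptions, not the behaviour of process_posts.py.

```python
def enrich_posts(posts, fetch_topics):
    """Store analysis results under the neuwo_data key of each post.

    fetch_topics is a hypothetical callable taking the post text and
    returning the parsed API response as a dict.
    Returns the number of posts that were updated.
    """
    updated = 0
    for post in posts:
        if "neuwo_data" in post:
            continue  # assumed design choice: do not re-query analysed posts
        post["neuwo_data"] = fetch_topics(post["content"])
        updated += 1
    return updated


def fake_fetch_topics(text):
    """Stand-in for the real API call; returns made-up data."""
    return {"tags": [{"value": "esimerkki", "score": 0.5}]}


posts = [
    {"title": "A", "content": "teksti", "neuwo_data": {"tags": []}},
    {"title": "B", "content": "toinen teksti"},
]
updated = enrich_posts(posts, fake_fetch_topics)
print(updated, posts[1]["neuwo_data"])
```

Passing the fetcher in as a parameter keeps the loop testable without network access or an API key.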

There were Unicode (UTF-8) encoding problems on Mac: the escape sequences \u00f6 and \u00e4 were replaced with the characters ö and ä in a text editor.
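If the escape sequences came from JSON serialization (an assumption; the exact cause is not recorded here), the effect can be reproduced with Python's standard json module, which escapes non-ASCII characters by default. The sketch below shows both forms and that they decode to the same data. The record contents are made up.

```python
import json

record = {"title": "Jyväskylä", "tag": "sähkö"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is. When saving to
# a file, open it with encoding="utf-8" so the result does not depend on
# the platform default.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings are valid JSON and decode to the same dict.
same = json.loads(escaped) == json.loads(readable)
print(same)
```

Because both forms are equivalent to a JSON parser, the manual replacement mainly improves readability of the file in an editor.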

Neuwo API results

The neuwo_data JSON key contains:

  • tags: keywords in Finnish
    • value: tag name
    • score: relevance score as a decimal between 0 and 1
    • URI: Neuwo-side URL for the tag, not useful for now
  • marketing_categories: IAB 2.2 content taxonomy, an international standard format for categorizing content in two levels (tiers), see https://iabtechlab.com/standards/content-taxonomy/
    • label: IAB taxonomy label
    • ID: IAB taxonomy ID
    • relevance: relevance score as a decimal between 0 and 1
  • brand_safety: used for marketing purposes, roughly an indicator of content such as NSFW material
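To make the structure above concrete, here is a made-up neuwo_data record and a small helper that picks the most relevant tags. The values, the flat list layout of marketing_categories and the `top_tags` helper are illustrative assumptions. Only the key names come from the description above.

```python
# Hypothetical record; all values are invented for illustration.
example = {
    "tags": [
        {"value": "elektroniikka", "score": 0.82, "URI": "https://example.org/tag/1"},
        {"value": "harrastus", "score": 0.35, "URI": "https://example.org/tag/2"},
        {"value": "juottaminen", "score": 0.61, "URI": "https://example.org/tag/3"},
    ],
    # Assumed to be a flat list; the real response may group entries by tier.
    "marketing_categories": [
        {"label": "Technology & Computing", "ID": "596", "relevance": 0.7},
    ],
}


def top_tags(neuwo_data, min_score=0.5):
    """Return tag names with score >= min_score, most relevant first."""
    kept = [t for t in neuwo_data.get("tags", []) if t["score"] >= min_score]
    kept.sort(key=lambda t: t["score"], reverse=True)
    return [t["value"] for t in kept]


selected = top_tags(example)
print(selected)
```

The 0.5 threshold is arbitrary; a suitable cutoff would need to be chosen by looking at the real score distribution.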

Demonstrative image

Ideas

  • Creating summary statistics and visualizations (tag cloud etc.) from the Neuwo data
  • Defining a useful taxonomy structure for the Hacklab Jkl blog
  • Mapping certain Neuwo keyword suggestions into the chosen Hacklab Jkl taxonomy structure
  • Finding a way to utilise https://taxopress.com or other tag tools on WordPress when creating blog content in the future
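As a starting point for the first and third ideas, tag frequencies can be counted and then folded into a hand-made taxonomy. This is a minimal sketch under assumptions: the `TAXONOMY` mapping, the category names, the sample posts and both function names are hypothetical, and the posts are plain dicts shaped like the records described above.

```python
from collections import Counter


def tag_counts(posts, min_score=0.0):
    """Count in how many tag entries each keyword appears across posts."""
    counts = Counter()
    for post in posts:
        for tag in post.get("neuwo_data", {}).get("tags", []):
            if tag["score"] >= min_score:
                counts[tag["value"]] += 1
    return counts


# Hypothetical mapping from Neuwo keywords to blog taxonomy terms.
TAXONOMY = {
    "elektroniikka": "Electronics",
    "juottaminen": "Electronics",
    "3d-tulostus": "Fabrication",
}


def map_to_taxonomy(counts, taxonomy):
    """Aggregate keyword counts per taxonomy term; unknown keywords go to 'Unmapped'."""
    mapped = Counter()
    for tag, n in counts.items():
        mapped[taxonomy.get(tag, "Unmapped")] += n
    return mapped


# Made-up sample posts.
posts = [
    {"neuwo_data": {"tags": [{"value": "elektroniikka", "score": 0.9},
                             {"value": "juottaminen", "score": 0.6}]}},
    {"neuwo_data": {"tags": [{"value": "elektroniikka", "score": 0.4},
                             {"value": "3d-tulostus", "score": 0.8}]}},
    {"title": "not analysed yet"},
]

counts = tag_counts(posts)
mapped = map_to_taxonomy(counts, TAXONOMY)
print(counts.most_common())
print(mapped.most_common())
```

The counts could feed a tag cloud directly, and the size of the "Unmapped" bucket shows how much of the keyword vocabulary the taxonomy does not yet cover.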

A simple example script, statistics_test.py, is included. It accesses the JSON data file via TinyDB.
