visrecall/WebInterface/Front-end/generate-experiment-files/main.ipynb


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experimental set-up: ##\n",
"\n",
"This code generates experiment files that can either be hosted on your own website and run with recruited participants, or used to launch Amazon Mechanical Turk (MTurk) HITs via our [MTurk iPython notebook](https://github.com/a-newman/mturk-api-notebook).\n",
"\n",
"An experiment is composed of different sets of images:\n",
"* **target images** are the images you want to collect attention data on; you provide these in the `sourcedir` directory below\n",
"* **tutorial images** are shown to participants at the beginning of the experiment to familiarize them with the codecharts set-up (you can reuse the tutorial image we provide, or provide your own in the `tutorial_source_dir` directory below)\n",
" * *hint: if your images are very different in content from the images in our set, you may want to train your participants on your own images, to avoid a context switch between the tutorial and main experiment*\n",
"* **sentinel images** are interspersed throughout the experiment and guide the participant's attention to a very specific point on the screen; they serve as validation/calibration images to ensure that participants are actually moving their eyes and looking where they are supposed to. The code below intersperses images from the `sentinel_target_images` directory we provide throughout your experimental sequence\n",
" * sentinel images can be interspersed throughout both the tutorial and the target images, or excluded from the tutorial (via the `add_sentinels_to_tutorial` flag below); we recommend including sentinel images in the tutorial so that participants become familiar with them as well\n",
" \n",
"The code below populates the `rootdir` task directory with `num_subject_files` subject files, where each subject file corresponds to an experiment you can run on a single participant. For each subject file, `num_images_per_sf` images are randomly sampled from the `sourcedir` image directory. In addition, `num_sentinels_per_sf` sentinel images are sampled from the `sentinel_imdir` image directory and distributed throughout the experiment. A tutorial is generated at the beginning of the experiment with `num_imgs_per_tutorial` images randomly sampled from the `tutorial_source_dir` image directory, along with an additional `num_sentinels_per_tutorial` sentinel images distributed throughout the tutorial (if the `add_sentinels_to_tutorial` flag is set to `True`)."
]
},
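{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sequence described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the code this notebook uses to build subject files: the names `intersperse` and `build_subject_sequence` are made up here, and sentinels are assumed to be inserted at uniformly random positions.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def intersperse(items, sentinels, rng):\n",
"    # insert each sentinel at a uniformly random position in the sequence\n",
"    seq = list(items)\n",
"    for s in sentinels:\n",
"        seq.insert(rng.randint(0, len(seq)), s)\n",
"    return seq\n",
"\n",
"def build_subject_sequence(target_pool, sentinel_pool, tutorial_pool,\n",
"                           n_targets, n_sentinels, n_tutorial, n_tutorial_sentinels,\n",
"                           add_sentinels_to_tutorial, seed=None):\n",
"    # hypothetical helper: the tutorial comes first, then the main experiment\n",
"    rng = random.Random(seed)\n",
"    n_tut_sent = n_tutorial_sentinels if add_sentinels_to_tutorial else 0\n",
"    # draw all sentinels at once so tutorial and main sentinels never repeat\n",
"    sentinels = rng.sample(sentinel_pool, n_sentinels + n_tut_sent)\n",
"    tutorial = intersperse(rng.sample(tutorial_pool, n_tutorial), sentinels[:n_tut_sent], rng)\n",
"    main = intersperse(rng.sample(target_pool, n_targets), sentinels[n_tut_sent:], rng)\n",
"    return tutorial + main\n",
"\n",
"# with the parameters set below: 20 targets, no tutorial, no sentinels\n",
"seq = build_subject_sequence([f'img{i}.png' for i in range(20)], [], [], 20, 0, 0, 0, False, seed=0)\n",
"print(len(seq))  # 20\n",
"```"
]
},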
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import string\n",
"import random\n",
"import json \n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import base64 \n",
"import glob"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"sourcedir = '10' # replace this with your own directory of experiment images\n",
"filldir = '1' # filler images for the false alarm and correct rejection trials\n",
"blurdir = '10_blur' # blurry images"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# PARAMETERS for generating subject files\n",
"num_subject_files = 1 # number of subject files to generate (i.e., # of mturk assignments that will be put up) \n",
"num_images_per_sf = 20 # number of target images per subject file \n",
"num_imgs_per_tutorial = 0 # number of tutorial images per subject file\n",
"num_sentinels_per_sf = 0 # number of sentinel images to distribute throughout the experiment (excluding the tutorial)\n",
"add_sentinels_to_tutorial = False # whether to have sentinel images as part of the tutorial\n",
"num_sentinels_per_tutorial = 0 # number of sentinel images to distribute throughout the tutorial"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another bit of terminology and experimental logistics involves **buckets**, which are a way to distribute experiment stimuli so that multiple experiments can be run in parallel (and participants can be reused for different subsets of images). If you have many images to collect data on, and each participant sees only a sample of `num_images_per_sf` images, you may need to generate a large `num_subject_files` to get enough data points per image. To speed up data collection, you can split all the target images into `num_buckets` disjoint buckets and then generate subject files per bucket. Because subject files from different buckets are guaranteed to contain disjoint sets of images, the same participant can be run on multiple subject files from different buckets, and MTurk HITs corresponding to different buckets can be launched all at once. In summary, in MTurk terms, you generate as many HITs as the `num_buckets` specified below, and as many assignments per HIT as `num_subject_files`.\n",
"\n",
"In the codecharts methodology, a jittered grid of alphanumeric triplets appears after every image presentation (target, sentinel, or tutorial image), because the participant needs to indicate where on the preceding image they looked by reporting a triplet. To avoid generating an excessive number of codecharts (which bulks up the subject files), we reuse codecharts across buckets: we pre-generate `ncodecharts` codecharts and then randomly sample from these when generating the individual subject files."
]
},
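{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of splitting images into disjoint buckets and drawing from a shared pool of pre-generated codecharts. It does not reproduce `distribute_images` from `distribute_image_files_by_buckets`, whose implementation is not shown here; `split_into_buckets` and `sample_codecharts` are hypothetical names, and round-robin assignment after a shuffle is an assumed strategy.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def split_into_buckets(files, num_buckets, seed=None):\n",
"    # shuffle, then deal files round-robin so buckets are disjoint and differ in size by at most one\n",
"    files = list(files)\n",
"    random.Random(seed).shuffle(files)\n",
"    return [files[b::num_buckets] for b in range(num_buckets)]\n",
"\n",
"def sample_codecharts(codechart_pool, n_needed, rng):\n",
"    # draw n_needed distinct codecharts for one subject file; other subject files reuse the same pool\n",
"    return rng.sample(codechart_pool, n_needed)\n",
"\n",
"buckets = split_into_buckets([f'img{i}.png' for i in range(10)], 3, seed=0)\n",
"print([len(b) for b in buckets])  # [4, 3, 3]\n",
"# no image appears in more than one bucket\n",
"assert len(set().union(*buckets)) == sum(len(b) for b in buckets)\n",
"\n",
"rng = random.Random(1)\n",
"pool = [f'cc{i}.jpg' for i in range(20)]  # e.g. ncodecharts = 20\n",
"charts = sample_codecharts(pool, 5, rng)  # one codechart per image shown\n",
"```"
]
},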
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# we pre-generate some codecharts and sentinel images so that we can reuse them across participants and buckets\n",
"# and potentially not have to generate as many files; these can be set to any number, and the corresponding code\n",
"# will just sample as many images as needed per subject file\n",
"\n",
"ncodecharts = num_subject_files*num_images_per_sf # number of codecharts to generate; can be changed\n",
"sentinel_images_per_bucket = num_subject_files*num_sentinels_per_sf # can be changed\n",
"\n",
"# set these parameters\n",
"num_buckets = 1 # number of disjoint sets of subject files to create (for running multiple parallel HITs)\n",
"start_bucket_at = 0 # you can use this and the next parameter to generate more buckets if running the code later\n",
"which_buckets = [0] # a list of specific buckets e.g., [4,5,6] to generate experiment data for\n",
"\n",
"rootdir = '../assets/task_data' # where all the experiment data will be stored\n",
"if not os.path.exists(rootdir):\n",
" print('Creating directory %s'%(rootdir))\n",
" os.makedirs(rootdir)\n",
"\n",
"real_image_dir = os.path.join(rootdir,'real_images') # target images, distributed by buckets\n",
"sentinel_image_dir = os.path.join(rootdir,'sentinel_images') # sentinel images, distributed by buckets\n",
"sentinel_CC_dir = os.path.join(rootdir,'sentinel_CC') # codecharts corresponding to the sentinel images\n",
" # (shared across buckets)\n",
"#sentinel_targetim_dir = os.path.join(rootdir, 'sentinel_target') \n",
"real_blurred_dir = os.path.join(rootdir,'real_blurred') # blurred images, distributed by buckets\n",
"real_filler_dir = os.path.join(rootdir,'real_filler') # filler images, distributed by buckets"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"20 files copied from 10 to ../assets/task_data/all_images\n",
"Padding 20 image files to dimensions: [1734,1340]...\n",
"Done!\n"
]
}
],
"source": [
"# this cell creates an `all_images` directory, copies images from sourcedir, and pads them to the required dimensions\n",
"\n",
"import create_padded_image_dir\n",
"\n",
"all_image_dir = os.path.join(rootdir,'all_images')\n",
"if not os.path.exists(all_image_dir):\n",
" print('Creating directory %s'%(all_image_dir))\n",
" os.makedirs(all_image_dir)\n",
" \n",
"allfiles = []\n",
"for ext in ('*.jpeg', '*.png', '*.jpg'):\n",
" allfiles.extend(glob.glob(os.path.join(sourcedir, ext)))\n",
"print(\"%d files copied from %s to %s\"%(len(allfiles),sourcedir,all_image_dir))\n",
" \n",
"image_width,image_height = create_padded_image_dir.save_padded_images(all_image_dir,allfiles)"
]
},
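{
"cell_type": "markdown",
"metadata": {},
"source": [
"`create_padded_image_dir.save_padded_images` is not shown in this notebook. As a rough, hypothetical sketch of what padding images to common dimensions can involve, the snippet below centers each image on a canvas as large as the largest image; the white fill value and the centering are assumptions, not the module's documented behavior.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def common_size(images):\n",
"    # smallest canvas that fits every image: (max height, max width)\n",
"    return max(im.shape[0] for im in images), max(im.shape[1] for im in images)\n",
"\n",
"def pad_to_size(img, target_h, target_w, fill=255):\n",
"    # center img on a target_h x target_w canvas filled with the value fill\n",
"    h, w = img.shape[:2]\n",
"    if h > target_h or w > target_w:\n",
"        raise ValueError('image is larger than the target size')\n",
"    canvas = np.full((target_h, target_w) + img.shape[2:], fill, dtype=img.dtype)\n",
"    top, left = (target_h - h) // 2, (target_w - w) // 2\n",
"    canvas[top:top + h, left:left + w] = img\n",
"    return canvas\n",
"\n",
"imgs = [np.zeros((100, 80, 3), np.uint8), np.zeros((60, 120, 3), np.uint8)]\n",
"H, W = common_size(imgs)\n",
"padded = [pad_to_size(im, H, W) for im in imgs]\n",
"print([p.shape for p in padded])  # [(100, 120, 3), (100, 120, 3)]\n",
"```"
]
},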
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"20 files copied from 10_blur to ../assets/task_data/blurred_images\n",
"Padding 20 image files to dimensions: [1734,1340]...\n",
"Done!\n"
]
}
],
"source": [
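"# this cell creates a `blurred_images` directory, copies images from `blurdir`, and pads them to the required dimensions\n",
"\n",
"blurred_image_dir = os.path.join(rootdir,'blurred_images')\n",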
"\n",
"if not os.path.exists(blurred_image_dir):\n",
" print('Creating directory %s'%(blurred_image_dir))\n",
" os.makedirs(blurred_image_dir)\n",
" \n",
"blurfiles = []\n",
"for ext in ('*.jpeg', '*.png', '*.jpg'):\n",
" blurfiles.extend(glob.glob(os.path.join(blurdir, ext)))\n",
"print(\"%d files copied from %s to %s\"%(len(blurfiles),blurdir,blurred_image_dir))\n",
" \n",
"image_width,image_height = create_padded_image_dir.save_padded_images(blurred_image_dir,blurfiles)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"20 files copied from 1 to ../assets/task_data/filler_images\n",
"Padding 20 image files to dimensions: [1786,1340]...\n",
"Done!\n"
]
}
],
"source": [
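"# this cell creates a `filler_images` directory, copies images from `filldir`, and pads them to the required dimensions\n",
"\n",
"filler_image_dir = os.path.join(rootdir,'filler_images')\n",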
"\n",
"if not os.path.exists(filler_image_dir):\n",
" print('Creating directory %s'%(filler_image_dir))\n",
" os.makedirs(filler_image_dir)\n",
" \n",
"fillerfiles = []\n",
"for ext in ('*.jpeg', '*.png', '*.jpg'):\n",
" fillerfiles.extend(glob.glob(os.path.join(filldir, ext)))\n",
"print(\"%d files copied from %s to %s\"%(len(fillerfiles),filldir,filler_image_dir))\n",
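"\n",
"# note: this reassigns image_width, image_height from the earlier padding cells (the filler images may be padded to different dimensions)\n",
"image_width,image_height = create_padded_image_dir.save_padded_images(filler_image_dir,fillerfiles)"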
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Distributing images across 1 buckets\n",
"Populating ../assets/task_data/real_images/bucket0 with 20 images\n"
]
}
],
"source": [
"# this cell creates the requested number of buckets and distributes images from `all_image_dir` to the corresponding\n",
"# bucket directories inside `real_image_dir`\n",
"\n",
"from distribute_image_files_by_buckets import distribute_images\n",
"\n",
"distribute_images(all_image_dir,real_image_dir,num_buckets,start_bucket_at)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Distributing images across 1 buckets\n",
"Populating ../assets/task_data/real_blurred/bucket0 with 20 images\n"
]
}
],
"source": [
"distribute_images(blurred_image_dir,real_blurred_dir,num_buckets,start_bucket_at)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Distributing images across 1 buckets\n",
"Populating ../assets/task_data/real_filler/bucket0 with 20 images\n"
]
}
],
"source": [
"distribute_images(filler_image_dir,real_filler_dir,num_buckets,start_bucket_at)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We create sentinel images by taking a small object (a fixation cross, a red dot, or an image of a face) and placing it at a random location on a blank image, at least `border_padding` pixels away from the image boundaries. The code below creates `sentinel_images_per_bucket` such sentinel images in each bucket."
]
},
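{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the sentinel construction described above, assuming the object is an RGB NumPy array and the blank image is white. `make_sentinel` is a hypothetical name, and the default `border_padding` value and canvas size are illustrative only; the notebook's own sentinel code is not included here. Returning the object's center is one way to record where participants should have looked.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def make_sentinel(canvas_h, canvas_w, obj, rng, border_padding=100):\n",
"    # place obj at a random position at least border_padding px from every image boundary\n",
"    oh, ow = obj.shape[:2]\n",
"    top = rng.integers(border_padding, canvas_h - border_padding - oh + 1)\n",
"    left = rng.integers(border_padding, canvas_w - border_padding - ow + 1)\n",
"    canvas = np.full((canvas_h, canvas_w, 3), 255, dtype=np.uint8)\n",
"    canvas[top:top + oh, left:left + ow] = obj\n",
"    # (x, y) center of the object, i.e. the expected gaze location\n",
"    return canvas, (int(left + ow // 2), int(top + oh // 2))\n",
"\n",
"rng = np.random.default_rng(0)\n",
"red_dot = np.zeros((20, 20, 3), np.uint8)\n",
"red_dot[..., 0] = 255  # a red square standing in for the red dot\n",
"img, (x, y) = make_sentinel(1340, 1734, red_dot, rng)\n",
"```"
]
},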
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that all the previous cells have generated the requisite image, codechart, sentinel, and tutorial files, the following code will generate `num_subject_files` individual subject files by sampling from the appropriate image directories and creating an experimental sequence. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Only run this if you want to generate new subject JSON files"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['../assets/task_data/real_blurred/bucket0/wsj265.png', '../assets/task_data/real_blurred/bucket0/treasuryD07_3.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_242.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_257.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_243.png', '../assets/task_data/real_blurred/bucket0/whoJ43_1.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_150.png', '../assets/task_data/real_blurred/bucket0/3iRYXLvZ8oVQDMLR-CebnQ==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_194.png', '../assets/task_data/real_blurred/bucket0/whoQ12_2.png', '../assets/task_data/real_blurred/bucket0/whoF03.png', '../assets/task_data/real_blurred/bucket0/wsj3.png', '../assets/task_data/real_blurred/bucket0/whoB10_1.png', '../assets/task_data/real_blurred/bucket0/3LY3OX8bU7uKhgcRPgDRxw==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_262.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_260.png', '../assets/task_data/real_blurred/bucket0/0BmWZbQdEukHi79Lit01oQ==.0.png', '../assets/task_data/real_blurred/bucket0/wsj86.png', '../assets/task_data/real_blurred/bucket0/whoJ36_2.png', '../assets/task_data/real_blurred/bucket0/whoK21.png']\n",
"../assets/task_data/real_blurred/bucket0/*.jpg\n",
"Generating 1 subject files in bucket 0\n",
"wsj265.json\n",
"treasuryD07_3.json\n",
"economist_daily_chart_242.json\n",
"economist_daily_chart_257.json\n",
"economist_daily_chart_243.json\n",
"whoJ43_1.json\n",
"economist_daily_chart_150.json\n",
"3iRYXLvZ8oVQDMLR-CebnQ==.0.json\n",
"economist_daily_chart_194.json\n",
"whoQ12_2.json\n",
"whoF03.json\n",
"wsj3.json\n",
"whoB10_1.json\n",
"3LY3OX8bU7uKhgcRPgDRxw==.0.json\n",
"['../assets/task_data/real_blurred/bucket0/wsj265.png', '../assets/task_data/real_blurred/bucket0/treasuryD07_3.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_242.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_257.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_243.png', '../assets/task_data/real_blurred/bucket0/whoJ43_1.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_150.png', '../assets/task_data/real_blurred/bucket0/3iRYXLvZ8oVQDMLR-CebnQ==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_194.png', '../assets/task_data/real_blurred/bucket0/whoQ12_2.png', '../assets/task_data/real_blurred/bucket0/whoF03.png', '../assets/task_data/real_blurred/bucket0/wsj3.png', '../assets/task_data/real_blurred/bucket0/whoB10_1.png', '../assets/task_data/real_blurred/bucket0/3LY3OX8bU7uKhgcRPgDRxw==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_262.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_260.png', '../assets/task_data/real_blurred/bucket0/0BmWZbQdEukHi79Lit01oQ==.0.png', '../assets/task_data/real_blurred/bucket0/wsj86.png', '../assets/task_data/real_blurred/bucket0/whoJ36_2.png', '../assets/task_data/real_blurred/bucket0/whoK21.png']\n",
"economist_daily_chart_262.json\n",
"['../assets/task_data/real_blurred/bucket0/wsj265.png', '../assets/task_data/real_blurred/bucket0/treasuryD07_3.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_242.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_257.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_243.png', '../assets/task_data/real_blurred/bucket0/whoJ43_1.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_150.png', '../assets/task_data/real_blurred/bucket0/3iRYXLvZ8oVQDMLR-CebnQ==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_194.png', '../assets/task_data/real_blurred/bucket0/whoQ12_2.png', '../assets/task_data/real_blurred/bucket0/whoF03.png', '../assets/task_data/real_blurred/bucket0/wsj3.png', '../assets/task_data/real_blurred/bucket0/whoB10_1.png', '../assets/task_data/real_blurred/bucket0/3LY3OX8bU7uKhgcRPgDRxw==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_262.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_260.png', '../assets/task_data/real_blurred/bucket0/0BmWZbQdEukHi79Lit01oQ==.0.png', '../assets/task_data/real_blurred/bucket0/wsj86.png', '../assets/task_data/real_blurred/bucket0/whoJ36_2.png', '../assets/task_data/real_blurred/bucket0/whoK21.png']\n",
"economist_daily_chart_260.json\n",
"['../assets/task_data/real_blurred/bucket0/wsj265.png', '../assets/task_data/real_blurred/bucket0/treasuryD07_3.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_242.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_257.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_243.png', '../assets/task_data/real_blurred/bucket0/whoJ43_1.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_150.png', '../assets/task_data/real_blurred/bucket0/3iRYXLvZ8oVQDMLR-CebnQ==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_194.png', '../assets/task_data/real_blurred/bucket0/whoQ12_2.png', '../assets/task_data/real_blurred/bucket0/whoF03.png', '../assets/task_data/real_blurred/bucket0/wsj3.png', '../assets/task_data/real_blurred/bucket0/whoB10_1.png', '../assets/task_data/real_blurred/bucket0/3LY3OX8bU7uKhgcRPgDRxw==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_262.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_260.png', '../assets/task_data/real_blurred/bucket0/0BmWZbQdEukHi79Lit01oQ==.0.png', '../assets/task_data/real_blurred/bucket0/wsj86.png', '../assets/task_data/real_blurred/bucket0/whoJ36_2.png', '../assets/task_data/real_blurred/bucket0/whoK21.png']\n",
"0BmWZbQdEukHi79Lit01oQ==.0.json\n",
"['../assets/task_data/real_blurred/bucket0/wsj265.png', '../assets/task_data/real_blurred/bucket0/treasuryD07_3.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_242.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_257.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_243.png', '../assets/task_data/real_blurred/bucket0/whoJ43_1.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_150.png', '../assets/task_data/real_blurred/bucket0/3iRYXLvZ8oVQDMLR-CebnQ==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_194.png', '../assets/task_data/real_blurred/bucket0/whoQ12_2.png', '../assets/task_data/real_blurred/bucket0/whoF03.png', '../assets/task_data/real_blurred/bucket0/wsj3.png', '../assets/task_data/real_blurred/bucket0/whoB10_1.png', '../assets/task_data/real_blurred/bucket0/3LY3OX8bU7uKhgcRPgDRxw==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_262.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_260.png', '../assets/task_data/real_blurred/bucket0/0BmWZbQdEukHi79Lit01oQ==.0.png', '../assets/task_data/real_blurred/bucket0/wsj86.png', '../assets/task_data/real_blurred/bucket0/whoJ36_2.png', '../assets/task_data/real_blurred/bucket0/whoK21.png']\n",
"wsj86.json\n",
"['../assets/task_data/real_blurred/bucket0/wsj265.png', '../assets/task_data/real_blurred/bucket0/treasuryD07_3.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_242.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_257.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_243.png', '../assets/task_data/real_blurred/bucket0/whoJ43_1.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_150.png', '../assets/task_data/real_blurred/bucket0/3iRYXLvZ8oVQDMLR-CebnQ==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_194.png', '../assets/task_data/real_blurred/bucket0/whoQ12_2.png', '../assets/task_data/real_blurred/bucket0/whoF03.png', '../assets/task_data/real_blurred/bucket0/wsj3.png', '../assets/task_data/real_blurred/bucket0/whoB10_1.png', '../assets/task_data/real_blurred/bucket0/3LY3OX8bU7uKhgcRPgDRxw==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_262.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_260.png', '../assets/task_data/real_blurred/bucket0/0BmWZbQdEukHi79Lit01oQ==.0.png', '../assets/task_data/real_blurred/bucket0/wsj86.png', '../assets/task_data/real_blurred/bucket0/whoJ36_2.png', '../assets/task_data/real_blurred/bucket0/whoK21.png']\n",
"whoJ36_2.json\n",
"['../assets/task_data/real_blurred/bucket0/wsj265.png', '../assets/task_data/real_blurred/bucket0/treasuryD07_3.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_242.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_257.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_243.png', '../assets/task_data/real_blurred/bucket0/whoJ43_1.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_150.png', '../assets/task_data/real_blurred/bucket0/3iRYXLvZ8oVQDMLR-CebnQ==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_194.png', '../assets/task_data/real_blurred/bucket0/whoQ12_2.png', '../assets/task_data/real_blurred/bucket0/whoF03.png', '../assets/task_data/real_blurred/bucket0/wsj3.png', '../assets/task_data/real_blurred/bucket0/whoB10_1.png', '../assets/task_data/real_blurred/bucket0/3LY3OX8bU7uKhgcRPgDRxw==.0.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_262.png', '../assets/task_data/real_blurred/bucket0/economist_daily_chart_260.png', '../assets/task_data/real_blurred/bucket0/0BmWZbQdEukHi79Lit01oQ==.0.png', '../assets/task_data/real_blurred/bucket0/wsj86.png', '../assets/task_data/real_blurred/bucket0/whoJ36_2.png', '../assets/task_data/real_blurred/bucket0/whoK21.png']\n",
"whoK21.json\n",
"Subject file ../assets/task_data/subject_files/bucket0/subject_file_0.json DONE\n"
]
}
],
"source": [
"gt_answers = []\n",
"reco_answers = []\n",
"start_subjects_at = 0 # index at which to start numbering subject files (if subject files were created previously)\n",
"#if os.path.exists(os.path.join(rootdir,'subject_files/bucket0')):\n",
"# subjfiles = glob.glob(os.path.join(rootdir,'subject_files/bucket0/*.json'))\n",
"# start_subjects_at = len(subjfiles)\n",
"\n",
"\n",
"\n",
"## GENERATING SUBJECT FILES \n",
"subjdir = os.path.join(rootdir,'subject_files')\n",
"if not os.path.exists(subjdir):\n",
" os.makedirs(subjdir)\n",
" #os.makedirs(os.path.join(rootdir,'full_subject_files'))\n",
"\n",
"\n",
" \n",
"# iterate over all buckets \n",
"for b in range(len(which_buckets)): \n",
"\n",
" bucket = 'bucket%d'%(which_buckets[b])\n",
" img_bucket_dir = os.path.join(real_image_dir,bucket)\n",
" blur_bucket_dir = os.path.join(real_blurred_dir,bucket)\n",
" filler_bucket_dir = os.path.join(real_filler_dir,bucket)\n",
" img_files = []\n",
" blur_files = []\n",
" filler_files = []\n",
" for ext in ('*.jpeg', '*.png', '*.jpg'):\n",
" img_files.extend(glob.glob(os.path.join(img_bucket_dir, ext)))\n",
" \n",
" for ext in ('*.jpeg', '*.png', '*.jpg'):\n",
" blur_files.extend(glob.glob(os.path.join(blur_bucket_dir, ext)))\n",
" print(blur_files)\n",
" for ext in ('*.jpeg', '*.png', '*.jpg'):\n",
" filler_files.extend(glob.glob(os.path.join(filler_bucket_dir, ext)))\n",
" \n",
" filler_files.extend(img_files) #40 images for recognition task\n",
" random.shuffle(filler_files)\n",
" \n",
" #sentinel_bucket_dir = os.path.join(sentinel_image_dir,bucket)\n",
" #sentinel_files = glob.glob(os.path.join(sentinel_bucket_dir,'*.jpg'))\n",
" \n",
" #with open(os.path.join(sentinel_bucket_dir,'sentinel_codes_full.json')) as f:\n",
" # sentinel_codes_data = json.load(f) # contains mapping of image path to valid codes\n",
" \n",
"    subjdir = os.path.join(rootdir,'subject_files',bucket)\n",
"    os.makedirs(subjdir, exist_ok=True)\n",
"    # the json.dump into full_subject_files/<bucket> below needs this directory to exist\n",
"    os.makedirs(os.path.join(rootdir,'full_subject_files',bucket), exist_ok=True)\n",
" \n",
"    print('Generating %d subject files in bucket %d'%(num_subject_files,which_buckets[b]))\n",
" # for each bucket, generate subject files \n",
" for i in range(num_subject_files):\n",
" \n",
" #random.shuffle(img_files)\n",
" #random.shuffle(sentinel_files)\n",
" #random.shuffle(real_codecharts)\n",
"\n",
"        # for each subject file, add the real images\n",
" sf_data = []\n",
" full_sf_data = []\n",
"\n",
" \n",
"        # temporary arrays (not used below); intended for shuffling real & sentinel tutorial images\n",
"        # before adding them to the final subject files\n",
" sf_data_temp = []\n",
" full_sf_data_temp = []\n",
" \n",
" \n",
" \n",
" \n",
" # ADDING REAL IMAGES \n",
"        for j in range(num_images_per_sf // 2):\n",
" for k in range(2):\n",
" image_data = {}\n",
" image_data[\"image\"] = img_files[j*2+k] # stores image path \n",
"\n",
" # select a code chart\n",
" #pathname = real_codecharts[j*2+k] # since shuffled, will pick up first set of random codecharts\n",
"\n",
" #image_data[\"codechart\"] = pathname # stores codechart path \n",
" #image_data[\"codes\"] = real_codes_data[pathname]['valid_codes'] # stores valid codes \n",
" image_data[\"flag\"] = 'real' # stores flag of whether we have real or sentinel image\n",
"\n",
"                full_image_data = image_data.copy() # copy of image_data; reserved for extra keys such as coordinates\n",
" #full_image_data[\"coordinates\"] = real_codes_data[pathname]['coordinates'] # store locations - (x, y) coordinate of each triplet \n",
"\n",
" sf_data.append(image_data)\n",
" full_sf_data.append(full_image_data)\n",
" \n",
" for w in range(2):\n",
" blur_data = {}\n",
" blur_data['image'] = blur_files[j*2+w]\n",
"                # basename/splitext use the OS-native path separator and handle any extension length (e.g. .jpeg)\n",
"                QA_file_name = os.path.splitext(os.path.basename(blur_files[j*2+w]))[0]+'.json'\n",
" \n",
" with open(os.path.join(sourcedir,QA_file_name)) as f:\n",
" print(QA_file_name)\n",
" blur_data['QA'] = json.load(f)\n",
" #print(blur_data['QA'])\n",
" for item in blur_data['QA']:\n",
" if 'answer' in blur_data['QA'][item]:\n",
" gt_answers.append(blur_data['QA'][item]['answer'])\n",
" #pathname = real_codecharts[j*2+w] # since shuffled, will pick up first set of random codecharts\n",
"\n",
" #blur_data[\"codechart\"] = pathname # stores codechart path \n",
" #blur_data[\"codes\"] = real_codes_data[pathname]['valid_codes'] # stores valid codes \n",
" blur_data[\"flag\"] = 'blur'\n",
" \n",
" full_blur_data = blur_data.copy()\n",
" #full_blur_data[\"coordinates\"] = real_codes_data[pathname]['coordinates']\n",
" \n",
" sf_data.append(blur_data)\n",
" full_sf_data.append(full_blur_data)\n",
" \n",
" for img in filler_files:\n",
" filler_data = {}\n",
" filler_data[\"image\"] = img # stores image path \n",
"\n",
" # select a code chart\n",
" #pathname = real_codecharts[1] # since shuffled, will pick up first set of random codecharts\n",
"\n",
" #filler_data[\"codechart\"] = pathname # stores codechart path \n",
" #filler_data[\"codes\"] = real_codes_data[pathname]['valid_codes'] # stores valid codes \n",
"            filler_data[\"flag\"] = 'fill' # flag marking an image shown in the recognition task\n",
" filler_data[\"showed\"] = \"real_images\" in img\n",
"            # expected recognition answer: '1' if the image is one of the real images shown earlier, '2' otherwise\n",
"            reco_answers.append('1' if filler_data[\"showed\"] else '2')\n",
"\n",
"            full_filler_data = filler_data.copy() # copy of filler_data; reserved for extra keys such as coordinates\n",
" #full_filler_data[\"coordinates\"] = real_codes_data[pathname]['coordinates'] # store locations - (x, y) coordinate of each triplet \n",
"\n",
" sf_data.append(filler_data)\n",
" full_sf_data.append(full_filler_data)\n",
" \n",
" \n",
"\n",
" \n",
"\n",
"        # Add an 'index' field (position of the image within the subject file) to each entry\n",
"        for idx, (entry, full_entry) in enumerate(zip(sf_data, full_sf_data)):\n",
"            entry['index'] = idx\n",
"            full_entry['index'] = idx\n",
"\n",
" subj_num = start_subjects_at+i\n",
" with open(os.path.join(rootdir,'subject_files',bucket,'subject_file_%d.json'%(subj_num)), 'w') as outfile: \n",
" print('Subject file %s DONE'%(outfile.name))\n",
" json.dump(sf_data, outfile)\n",
" with open(os.path.join(rootdir,'full_subject_files',bucket,'subject_file_%d.json'%(subj_num)), 'w') as outfile: \n",
" json.dump(full_sf_data, outfile)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"100"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(gt_answers)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"40"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(reco_answers)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"#import numpy as np\n",
"#np.save('gt_answers2',gt_answers)\n",
"#np.save('recogt_answers2',reco_answers)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}