Published October 12, 2023 | Version v4
DSSGx 2023 - NRW Bebauungspläne (nrw-bplan-scrape)
Description
This dataset contains all inputs needed to run the full pipeline that creates the NRW land sealing dataset, as well as the pipeline's outputs. The results can be reproduced by running this notebook.
Dataset Structure
- nrw
  - bplan
    - features
      - keywords
        - exact_search
          - baunvo_keywords.csv: Yes/no results indicating whether keywords relating to the BauNVO and Article 13b were found in each document.
        - fuzzy_search: Contains 7 CSV files with the results of the fuzzy keyword search. Each file name indicates the keyword searched for, and the text surrounding that keyword is extracted into one row per document.
          - keyword_dict_hochwasser.json: Results for keywords relating to "hochwasser" (flooding) found in the documents, e.g. hqhäufig and hq100.
    - raw
      - images: The images available here can be added to this folder.
      - links:
        - NRW_BP.geojson: The file downloaded from the NRW Geoportal, containing all raw data on the URLs of the land parcel bplans.
        - land_parcels.geojson: A processed version of NRW_BP.geojson.
        - NRW_BP_parsed_links.csv: A CSV-formatted version of NRW_BP.geojson.
      - text:
        - bp_text.json: Raw output of the text extraction for each PDF. Contains only the filename and extracted-text columns.
        - document_texts.json: An enriched version of bp_text.json, with columns describing each document appended.
      - pdfs: The PDFs extracted from the NRW Geoportal are available here and can be added to this folder.
    - knowledge_extraction_agent: Contains 6 JSON files. Each filename corresponds to the key searched for in the fuzzy keyword search (e.g. fh.json corresponds to firsthöhe.csv, gfz.json corresponds to geschossflächenzahl.csv). More information can be found here.
    - knowledge_agent_output.json: A toy example of the knowledge agent pipeline output for 10 files (a merge of the results in nrw/bplan/knowledge_extraction_agent).
  - clean
    - document_texts.xlsx: See here for more information.
    - exact_keyword.xlsx: Corresponds to baunvo_keywords.csv.
    - fuzzy_keyword.xlsx: The merged version of the files found in nrw/bplan/fuzzy_search.
    - knowledge_agent.xlsx: The .xlsx version of nrw/bplan/knowledge_agent_output.json.
    - land_parcels.xlsx: See here for more information.
    - regional_plans.xlsx: The .xlsx version of the data table found here.
  - rplan
    - features: Contains regional_plan_sections.json, the output of the pipeline. A more detailed description can be found here.
    - raw
      - geo: Contains regions_map.geojson, with information on the geolocations of the regional plans.
      - pdfs: Contains PDFs of the regional plans for NRW, used as input to run the pipeline.
      - text: Contains the text extracted with Tika from all regional plan PDFs.
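The keyword files above come from two kinds of search: an exact search that records a yes/no flag per document (as in baunvo_keywords.csv), and a fuzzy search that stores the text around each match (as in the fuzzy_search CSV files). The sketch below illustrates both ideas on a toy input; it is not the pipeline's own code. The input dictionary (filename to extracted text, loosely mirroring bp_text.json), the function names, the similarity cutoff and the context window are all hypothetical, and difflib's similarity ratio is used as an assumed stand-in for whatever fuzzy matcher the pipeline actually uses.

```python
import difflib
import re

# Hypothetical input: document filename -> extracted text, loosely mirroring
# the filename/text columns described for bp_text.json.
TEXTS = {
    "plan_a.pdf": "Die Firsthöhe beträgt maximal 9,5 m. Es gilt die BauNVO.",
    "plan_b.pdf": "Die Geschoßflächenzahl beträgt 0,8.",
}


def exact_keyword_flags(texts, keywords):
    """Return {filename: {keyword: "y" or "n"}} using case-insensitive substring matching."""
    flags = {}
    for filename, text in texts.items():
        lowered = text.lower()
        flags[filename] = {kw: "y" if kw.lower() in lowered else "n" for kw in keywords}
    return flags


def fuzzy_keyword_context(texts, keyword, cutoff=0.8, window=30):
    """Return one row per document holding the text around the best fuzzy match.

    Similarity is difflib's SequenceMatcher ratio between the keyword and each
    word token (an assumed stand-in for the pipeline's matcher). Documents with
    no token scoring at least `cutoff` produce no row.
    """
    target = keyword.lower()
    rows = []
    for filename, text in texts.items():
        best = None  # (ratio, start, end) of the best-scoring token so far
        for token in re.finditer(r"\w+", text):
            ratio = difflib.SequenceMatcher(None, target, token.group().lower()).ratio()
            if ratio >= cutoff and (best is None or ratio > best[0]):
                best = (ratio, token.start(), token.end())
        if best is not None:
            _, start, end = best
            rows.append({
                "filename": filename,
                "keyword": keyword,
                "context": text[max(0, start - window):end + window],
            })
    return rows


if __name__ == "__main__":
    print(exact_keyword_flags(TEXTS, ["BauNVO", "13b"]))
    # "Geschoßflächenzahl" still matches the key "geschossflächenzahl" despite the ß spelling.
    for row in fuzzy_keyword_context(TEXTS, "geschossflächenzahl"):
        print(row)
```

The fuzzy variant is what makes spelling variants such as "Geschoßflächenzahl" (ß instead of ss) still land in the same keyword file, which a plain substring check would miss.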
Contact
- Homepage: DSSGx Munich organization page.
Details
| Field | Value |
|---|---|
| Resource type | Funded research project dataset |
| Title | DSSGx 2023 - NRW Bebauungspläne (nrw-bplan-scrape) |
| Creators | |
| Research Fields | Other |
| Size | 802 MB |
| Formats | ZIP archive (.zip) |
| External Resource | https://huggingface.co/datasets/DSSGxMunich/nrw-bplan-scrape/blob/main/data.zip |
| Countries | Germany |
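To work with the files locally, the data.zip archive linked under External Resource has to be downloaded first. The sketch below uses only Python's standard zipfile and json modules to list the files under a folder such as nrw/bplan and to parse a JSON file such as bp_text.json directly from the archive. The helper names and the demo archive are hypothetical, and because the exact layout inside data.zip (for example, whether everything sits under a top-level data/ folder) is not stated here, folders are matched as path segments anywhere in a member name.

```python
import io
import json
import zipfile


def list_members(archive_file, folder):
    """Return sorted file names in the archive that lie under `folder` (e.g. "nrw/bplan").

    The folder is matched as a path segment anywhere in the member name, because the
    archive's top-level layout is an assumption here.
    """
    needle = "/" + folder.strip("/") + "/"
    with zipfile.ZipFile(archive_file) as archive:
        return sorted(
            name for name in archive.namelist()
            if needle in "/" + name and not name.endswith("/")  # skip directory entries
        )


def load_json_member(archive_file, filename):
    """Parse the first archive member whose base name equals `filename`."""
    with zipfile.ZipFile(archive_file) as archive:
        for name in archive.namelist():
            if name.rsplit("/", 1)[-1] == filename:
                with archive.open(name) as handle:
                    return json.load(handle)  # json.load accepts UTF-8 encoded bytes
    raise FileNotFoundError(f"{filename} not found in archive")


def build_demo_archive():
    """Build a tiny in-memory stand-in for data.zip (synthetic content, hypothetical layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "data/nrw/bplan/raw/text/bp_text.json",
            json.dumps([{"filename": "plan_a.pdf", "text": "Die Firsthöhe beträgt 9,5 m."}]),
        )
        archive.writestr("data/nrw/rplan/features/regional_plan_sections.json", "[]")
    return buffer


if __name__ == "__main__":
    # With the real download, pass the path "data.zip" instead of the demo buffer.
    demo = build_demo_archive()
    print(list_members(demo, "nrw/bplan"))
    print(load_json_member(demo, "bp_text.json"))
```

Reading members straight from the archive avoids unpacking all 802 MB when only a few JSON files are needed; zipfile.ZipFile accepts either a file path or a binary file object, so the same helpers work on the downloaded data.zip and on the in-memory demo.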