

That’s an interesting read. I’ll definitely give json a try too.
That’s good to know.
Gonna be honest, I’ll need to research a bit more about what validating against a schema means, but I get the general idea, and I like it.
For initial testing and prototypes, I probably won’t worry about validation, but once I get to the point of refining the system, validation like that would be a good idea.
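For when I get to that point, here's a minimal sketch of what a hand-rolled check could look like. It assumes each scraped story is loaded as a dict with `title`, `author`, `tags` and `text` fields. Those names and `validate_story` are hypothetical, and the real scraper output may look different. Once the format settles, a library such as `jsonschema` can do the same kind of check declaratively.

```python
# Hypothetical required fields for one scraped story record; the real
# scraper output may use different names or types.
REQUIRED_FIELDS = {"title": str, "author": str, "tags": list, "text": str}


def validate_story(record: dict) -> list[str]:
    """Return a list of problems found in one story record (empty if it passes)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field} should be {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems


# "tags" is a plain string here instead of a list, so the check flags it.
story = {"title": "The Last Castle", "author": "anon", "tags": "fantasy", "text": "..."}
print(validate_story(story))  # ['tags should be list, got str']
```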
One concern I’m seeing from other comments is that I may have more data than SQLite is ideal for. I have thousands of stories (my estimate is between 10,000 and 40,000), and many of them can be several pages long.
Gotcha. I think I’m aiming for something that runs off a single program. I want to be able to start it up whenever or even transfer it to a drive and use it on something like my laptop. Your idea sounds like it may work, but I’ll have to give it a deeper look.
I’m not entirely sure yet, but probably yes to both. The story text itself will likely stay unchanged, but I’ll want to experiment with various ways to analyze the stories.
The main idea I want to try is assigning stories “likely tags” based on the frequency of keywords. So castle and sword could indicate fantasy while robot and ship could indicate sci-fi. There are a lot of stories missing tags, so something like this would be helpful.
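A rough sketch of the keyword-frequency idea, just to make it concrete: count how often each genre's keywords appear in a story and suggest the genres that pass a threshold. The keyword lists, the `min_hits` threshold and the `likely_tags` name are all made up for illustration; real lists would need tuning against stories that already have tags.

```python
import re
from collections import Counter

# Hypothetical keyword lists per genre; these would need tuning.
GENRE_KEYWORDS = {
    "fantasy": {"castle", "sword", "dragon", "wizard"},
    "sci-fi": {"robot", "ship", "spaceship", "laser"},
}


def likely_tags(text: str, min_hits: int = 2) -> list[str]:
    """Suggest genres whose keywords appear at least min_hits times in text."""
    # Lowercase and split into plain words, then count each word.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        genre: sum(words[kw] for kw in keywords)
        for genre, keywords in GENRE_KEYWORDS.items()
    }
    # Highest-scoring genres first; drop genres below the threshold.
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [genre for genre, score in ranked if score >= min_hits]


sample = "The knight drew his sword and rode to the castle. The castle gates were open."
print(likely_tags(sample))  # ['fantasy']
```

Requiring at least two hits keeps a single stray word (one "ship" in a pirate story, say) from tagging a story on its own, though it is still only a rough guess.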
What’s your reasoning for that?
At this point, I think I’ll only use yaml as the scraper output and then create a database tool to convert that into whatever data format I end up using.
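As a sketch of that conversion step, assuming the final format ends up being SQLite (still undecided): SQLite keeps everything in a single file and needs no server, and Python ships with the `sqlite3` module. The table layout and the `load_stories` name are hypothetical. The scraper output is assumed to already be loaded as a list of dicts, for example with PyYAML's `yaml.safe_load`.

```python
import sqlite3


def load_stories(conn: sqlite3.Connection, stories: list[dict]) -> None:
    """Create the (hypothetical) tables if needed and insert scraped stories."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS stories (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            text TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS story_tags (
            story_id INTEGER REFERENCES stories(id),
            tag TEXT NOT NULL
        );
        """
    )
    with conn:  # commit all inserts together as one transaction
        for story in stories:
            cur = conn.execute(
                "INSERT INTO stories (title, text) VALUES (?, ?)",
                (story["title"], story["text"]),
            )
            # Stories without tags simply get no rows in story_tags.
            conn.executemany(
                "INSERT INTO story_tags (story_id, tag) VALUES (?, ?)",
                [(cur.lastrowid, tag) for tag in story.get("tags", [])],
            )


example_stories = [
    {"title": "The Last Castle", "text": "...", "tags": ["fantasy"]},
    {"title": "Untitled", "text": "..."},
]
# ":memory:" keeps this demo self-contained; a path like "stories.db"
# would write a single file that can be copied to a drive or laptop.
conn = sqlite3.connect(":memory:")
load_stories(conn, example_stories)
```

Keeping tags in their own table means a story can have any number of them, which leaves room for the tags to get more complex later.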
A few keywords in there I’ll have to look up, but I get the majority of it.
Yeah, I’m not too sure yet how complex the tags will be in the end. They are basically genres at the start, but I may make them more complex as I go.
After reading some of the other comments, I doubt I’ll use yaml as the main storage method. I do like the idea of using yaml for the scraper output, though. It would give me a nice way to organize the data elements for each story so they can be easily read when needed.
Is this something that can be run locally without a server? I’m aiming for something as simple as opening the notes app on your phone and selecting a story.
That’s a good idea! Would yaml be alright for this too? I like its readability and Python-style syntax compared to json.
Did not know that. I’ll keep that in mind.
I don’t know the limits of yaml, especially for large chunks of data, but I do like its easy readability and similarity to Python. I’ll probably try out a bit of yaml as well as some of the other recommendations others have given me.
I do like the sound of that.
I’m not too worried about performance, since once everything is running, most of the operations will only be run every few weeks or so. I definitely don’t want it slowing to a crawl, though.
The text search looks promising. I’ve had the idea of automating “likely tags” that look for keywords (sword = fantasy while spaceship = sci-fi). It’s not perfect, but it could be useful to roughly categorize all the stories that are missing tags.
For a little context on the Scoville scale:
After reading some of the other comments, I’m definitely going to separate the systems. I’ll use something like json or yaml as the output for the raw scraped data, and some sort of database for the final program.