Stock-Market-Prediction
This repository began as a 7th-semester minor project and evolved into our 8th-semester major project, "Advanced Stock Price Forecasting Using a Hybrid Model of Numerical and Textual Analysis." It utilizes Python, NLP (NLTK, spaCy), ML models, Grafana, InfluxDB, and Streamlit for data analysis and visualization.
Project Description
The project "Advanced Stock Price Forecasting Using a Hybrid Model of Numerical and Textual Analysis" takes a comprehensive approach to predicting stock prices, combining numerical data with textual analysis. Its components include:
Data Collection and Storage: We gathered historical stock data of major companies and stored it in an InfluxDB database to efficiently handle large-scale time-series data.
Data Visualization: A Grafana dashboard has been set up for real-time visualization of stock prices and analysis results, enhancing data interpretation and decision-making processes.
Textual Analysis for Enhanced Forecasting: We utilized Natural Language Processing (NLP) libraries, such as NLTK…
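To make the data collection and storage step concrete, here is a minimal sketch of fetching daily prices with yfinance and writing them to InfluxDB. The URL, token, org, and bucket values are placeholders, not the project's actual configuration:

# Minimal sketch: fetch daily prices with yfinance and store them in InfluxDB.
# The url/token/org/bucket values below are placeholders, not the project's real config.
import yfinance as yf
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

history = yf.Ticker("AAPL").history(period="1mo")  # daily OHLCV data as a DataFrame

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

for timestamp, row in history.iterrows():
    point = (
        Point("stock_price")
        .tag("ticker", "AAPL")
        .field("close", float(row["Close"]))
        .time(timestamp)
    )
    write_api.write(bucket="stocks", record=point)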
For my college major project, I built a Stock Market Prediction app using Streamlit. The app includes interactive visualizations, model evaluation, live predictions, and dashboards powered by tools like Grafana and Power BI.
Initially, the Streamlit app had over 2,500 lines of code, all in a single file 😵. I kept separate versions for the local and deployed apps because some file paths broke in the production environment.
This made even small changes (like adding a new function) a complete nightmare.
It had all the functions (data handling, UI logic, visualizations, predictions) bundled into one giant file. This made debugging, testing, and adding new features challenging and error-prone.
While the code was already modular in terms of function structure, everything lived in a single place. So I decided to properly refactor it, extracting each functional block into its own dedicated file. That's why the term "refactoring" is used in the cover image, and "modularization steps" are outlined throughout the article.
So I took the modular functions and placed them in separate files, organizing the code for better structure and clarity.
Why Modularization? 💡
Codebase Modularization means breaking a large, messy file into independent, reusable, and manageable modules.
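To make that concrete, here is a toy sketch of the same app before and after modularization. The module and function names are illustrative, not the app's real ones:

# Before: app.py holds every concern in one file
# def load_data(): ...
# def plot_prices(df): ...
# def predict(df): ...
# def main(): ...

# After: one module per concern, stitched together by main()
# data_handling.py  ->  def load_data(): ...
# visualizations.py ->  def plot_prices(df): ...
# predictions.py    ->  def predict(df): ...

from data_handling import load_data
from predictions import predict
from visualizations import plot_prices


def main():
    df = load_data()   # data concern
    plot_prices(df)    # visualization concern
    predict(df)        # modeling concern


if __name__ == "__main__":
    main()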
🏢 Visualizing the Idea
Imagine your codebase as a large building, with a central hall and multiple rooms where each person (function) lives. Initially, everything is connected, but it's a bit noisy, and the boundaries are blurry.
Now, you refactor (transform) the setup:
You redesign the structure into separate flats. Each flat is self-contained, cleaner, and still linked through a central lobby (like a main() function). People (functions) still collaborate, but now with clearer separation, more peace, and organized communication.
This transformation reflects how we modularize code: splitting logic into individual function files, cleaning dependencies, and improving maintainability, all while preserving the original connections.
🖼️ This is what I'm trying to visualize in my cover image: transforming a shared hall into peaceful, connected flats. 🏢➡️🏘️
Curious if the cover made sense and looked good to you! ✨
Pros and Cons ⚖️
✅ Pros (in My App's Context)
🧪 Easier Debugging and Testing: With functions and modules isolated, testing becomes straightforward, and bugs can be identified and resolved in specific areas without impacting other parts of the app.
🧹 Better Readability and Code Structure: Modularization keeps the code clean and well-organized. Each module is purpose-driven, making it simple for anyone, including future developers, to understand the logic and flow.
Independent Development of Modules: Each module can be developed and updated independently, so changes in one part of the project don't affect the rest of the app. This separation streamlines the workflow.
Scalable for Future Improvements: The modular structure is built for growth, making it easy to add new features without disturbing the existing code.
❌ Cons of Modularization (in My App's Context)
While modularization comes with a lot of benefits, there are a few challenges I encountered during this refactor:
⏱️ Initial setup time: Setting up the modular structure took quite a bit of effort up front, especially extracting functions, handling dependencies, and preparing import aggregators.
🧵 Managing imports and dependencies: Every individual function file needs its own required imports, which adds complexity, especially when working with libraries like streamlit, pandas, and plotly.
Local vs. deployed sync: I had to maintain two separate sets of files, one for local use and one for deployment, because some paths and behaviors differ in production (as I explained in this GitHub issue).
Circular dependencies: Refactoring functions into separate modules means being extra cautious to avoid circular imports (e.g., two modules importing from each other), which can break the app; see the sketch after this list.
🗂️ Managing directory structures: Maintaining a clean and logical directory hierarchy (like feature_functions_local/ and feature_functions_deployed/) is necessary but requires planning, otherwise it gets messy fast.
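Here is a toy illustration of the circular-import trap and the usual way out. The module names a.py and b.py are made up for the example; the point is to keep imports one-directional and let main() coordinate:

# The trap: two modules importing from each other fails at startup.
# a.py
#   from b import helper_b   # a needs b...
# b.py
#   from a import helper_a   # ...and b needs a -> ImportError (circular import)

# The way out: neither module imports the other; main() wires them together.
# main.py
from a import helper_a
from b import helper_b


def main():
    result = helper_a()
    helper_b(result)


if __name__ == "__main__":
    main()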
Even though this was from my project app's perspective, it applies to most projects too!
"""
Author: Madhurima Rawat
Script to split top-level functions (excluding 'main') from a large Streamlit app file
into separate Python files, preserving:
- Any comment immediately above the function
- Any global variable used within the function
Each function is saved as a standalone `.py` file inside the configured output directory.
"""importastimportosimportastunparse# === CONFIGURATION ===
INPUT_FILE="Streamlit_app_local_combined.py"# The source Python file to extract functions from
# For Local Running functions
# OUTPUT_DIR = "feature_functions_local" # Directory to store individual function files
# For Deployed Functions
OUTPUT_DIR=("feature_functions_deployed"# Directory to store individual function files
)# Ensure the output directory exists
os.makedirs(OUTPUT_DIR,exist_ok=True)# Read the entire source file content
withopen(INPUT_FILE,"r",encoding="utf-8")asf:source=f.read()# Parse the source code into an abstract syntax tree (AST)
tree=ast.parse(source)lines=source.splitlines()# Containers to hold relevant nodes
function_nodes=[]# Functions to extract
global_vars=[]# Global variables (assignments) at top level
import_lines=[]# All top-level import statements
# Classify top-level elements in the AST
fornodeintree.body:ifisinstance(node,(ast.Import,ast.ImportFrom)):import_lines.append(node)# Collect import lines
elifisinstance(node,ast.Assign):global_vars.append(node)# Track top-level assignments
elifisinstance(node,ast.FunctionDef)andnode.name!="main":function_nodes.append(node)# Collect functions except 'main'
# Extract comment immediately above a function, if present
defget_leading_comment(node):lineno=node.lineno-2whilelineno>=0andlines[lineno].strip().startswith("#"):returnlines[lineno].strip()return""# Get all variable names used in a function
defget_used_names(node):return{n.idforninast.walk(node)ifisinstance(n,ast.Name)}# Prepare a mapping of global variable name -> source code
all_globals={t.targets[0].id:astunparse.unparse(t).strip()fortinglobal_varsifisinstance(t.targets[0],ast.Name)}# Create a separate file for each function
forfunc_nodeinfunction_nodes:func_name=func_node.namefunc_code=ast.get_source_segment(source,func_node)# Full source code of the function
used_names=get_used_names(func_node)# Variables used in the function
leading_comment=get_leading_comment(func_node)# Optional comment above the function
globals_needed=[all_globals[name]fornameinused_namesifnameinall_globals]file_path=os.path.join(OUTPUT_DIR,f"{func_name}.py")withopen(file_path,"w",encoding="utf-8")asout_file:ifleading_comment:out_file.write(leading_comment+"\n")# Write leading comment if present
forginglobals_needed:out_file.write(g+"\n")# Write any required globals
out_file.write("\n"+func_code+"\n")# Finally, write the function code
print(f"β Done! Functions split into {OUTPUT_DIR}")
Step 2: Dependency_Adder.py
Uses the AST (Abstract Syntax Tree) module to detect and inject the right import statements into each function file.
"""
Author: Madhurima Rawat
Script to analyze individual Python function files inside the configured functions folder
and prepend necessary import statements based on used modules, functions, or objects.
The goal is to ensure each function file is self-contained by including all relevant
imports at the top. This script uses a heuristic approach, looking for known identifiers
to determine which modules are needed.
"""importosimportastfrompathlibimportPath# === CONFIGURATION ===
# Path to the folder containing the split function files(Local)
# functions_folder = Path("feature_functions_local")
# Path to the folder containing the split function files (Deployed)
functions_folder=Path("feature_functions_deployed")# Mapping of commonly used identifiers to their respective import statements
# Includes one-line comments categorized for clarity and maintainability
import_suggestions={# --- STREAMLIT APP and VISUALIZATION FRAMEWORK ---
"st":"# Importing Streamlit for building the web-based interactive application framework\nimport streamlit as st","plt":"# Importing Matplotlib for generating static plots and charts\nimport matplotlib.pyplot as plt","go":"# Importing Plotly for creating interactive and dynamic visual plots\nimport plotly.graph_objects as go","sns":"# Importing Seaborn for enhanced data visualizations\nimport seaborn as sns",# --- DATA HANDLING and MANIPULATION ---
"pd":"# Importing Pandas for data manipulation and analysis\nimport pandas as pd","np":"# Importing NumPy for numerical computations and array operations\nimport numpy as np","os":"# Importing OS module for handling file and directory paths\nimport os","datetime":"# Importing datetime for working with timestamps and date ranges\nfrom datetime import datetime, timedelta","base64":"# Importing base64 for encoding and decoding binary data\nimport base64",# --- MACHINE LEARNING and MODELING ---
"pickle":"# Importing Pickle for loading/saving pre-trained machine learning models\nimport pickle","LinearRegression":"# Linear Regression model\nfrom sklearn.linear_model import LinearRegression","RandomForestRegressor":"# Random Forest Regressor\nfrom sklearn.ensemble import RandomForestRegressor","SVR":"# Support Vector Machine Regressor\nfrom sklearn.svm import SVR","mean_squared_error":"# Importing evaluation metrics from Scikit-learn\nfrom sklearn.metrics import mean_squared_error, r2_score, precision_score, recall_score, f1_score","MinMaxScaler":"# For scaling data to a 0β1 range\nfrom sklearn.preprocessing import MinMaxScaler","TfidfVectorizer":"# Text feature extraction\nfrom sklearn.feature_extraction.text import TfidfVectorizer",# --- DEEP LEARNING (PyTorch) ---
"torch":"# Importing PyTorch for building and training deep learning models\nimport torch","nn":"# Importing PyTorch's neural network module\nimport torch.nn as nn",# --- NATURAL LANGUAGE PROCESSING (NLP) ---
"TextBlob":"# Importing TextBlob for basic natural language processing tasks\nfrom textblob import TextBlob",# --- FINANCIAL DATA and UTILITIES ---
"yf":"# Importing yfinance for fetching historical stock data from Yahoo Finance\nimport yfinance as yf","webbrowser":"# Importing webbrowser module to open URLs in the default browser\nimport webbrowser","openpyxl":"# Importing openpyxl to enable writing Excel files (.xlsx)\nimport openpyxl",}defget_required_imports(source_code):"""
Analyze source code of a function to detect required imports
based on the presence of known identifiers.
"""tree=ast.parse(source_code)imports=set()fornodeinast.walk(tree):# Detect simple names like 'pd', 'st', etc.
ifisinstance(node,ast.Name):ifnode.idinimport_suggestions:imports.add(import_suggestions[node.id])# Detect attribute access like 'plt.plot'
elifisinstance(node,ast.Attribute):value_id=getattr(node.value,"id",None)ifvalue_idinimport_suggestions:imports.add(import_suggestions[value_id])returnsorted(imports)# === MAIN PROCESS ===
changed_files_count=0# Counter for changed files
changed_lines_count=0# Counter for changed lines
# Iterate through each Python file in the target folder
forfilepathinfunctions_folder.glob("*.py"):withopen(filepath,"r",encoding="utf-8")asfile:original_code=file.read()# Determine needed imports based on code analysis
needed_imports=get_required_imports(original_code)# Combine imports and original content
updated_code="\n".join(needed_imports)+"\n\n"+original_code# Only write back to the file if changes were made
ifupdated_code!=original_code:withopen(filepath,"w",encoding="utf-8")asfile:file.write(updated_code)# Count the changes
changed_files_count+=1changed_lines_count+=updated_code.count("\n")# Output results
print(f"β Done! {changed_files_count} file(s) updated. {changed_lines_count} lines changed.")
Step 3: Function_Importing.py
📦 Generates a unified import file to easily access all modular functions in one place.
"""
Author: Madhurima Rawat
Script to generate a single import file (Import_Functions.py) that aggregates
function imports from individual .py files inside the configured function directory.
Each file is assumed to define a function with the same name as the filename.
The output file includes:
- A docstring at the top explaining the purpose
- Individual import statements
- A summary comment with total number of imported functions
"""importos# === CONFIGURATION ===
# For Local
# FUNCTION_DIR = "feature_functions_local"
# IMPORT_FILE = "Import_Functions_Local.py"
# For Deployment
FUNCTION_DIR="feature_functions_deployed"IMPORT_FILE="Import_Functions_Deployed.py"# List all Python files in the function directory
function_files=[fforfinos.listdir(FUNCTION_DIR)iff.endswith(".py")]total_imports=len(function_files)# === GENERATE THE IMPORT FILE ===
withopen(IMPORT_FILE,"w",encoding="utf-8")asf:# Write top-level docstring to the output file
f.write('"""\n')f.write(f"This file was auto-generated to import all functions from '{FUNCTION_DIR}/'.\n")f.write(f"Each function file is expected to define a function named after the filename.\n")f.write(f"Total functions imported: {total_imports}\n")f.write('"""\n\n')f.write("# === FUNCTION IMPORTS ===\n")# Write each import statement
forfilenameinfunction_files:module_name=filename[:-3]# Remove .py extension
f.write(f"from {FUNCTION_DIR}.{module_name} import {module_name}\n")# Footer summary comment
f.write(f"\n# β Total functions imported: {total_imports}\n")print(f"β '{IMPORT_FILE}' created with {total_imports} function imports from '{FUNCTION_DIR}/'")
Step 4: Split_Clean_Main_Code.py
🧼 Cleans the main app file by extracting embedded functions and retaining only the core logic. Also removes unused dependencies for better clarity and maintainability.
"""
Author: Madhurima Rawat
Script to clean a Streamlit app by removing all top-level functions except 'main'.
Also removes unused import statements.
Preserves:
- All used import statements
- Global variables
- Top-level code (outside functions)
- The 'main' function (if present)
Saves the result as the configured CLEANED_FILE (e.g., 'app_cleaned.py').
"""importastimportblack# For Deployed
INPUT_FILE="Streamlit_app_combined.py"CLEANED_FILE="Streamlit_app.py"# Read source
withopen(INPUT_FILE,"r",encoding="utf-8")asf:source=f.read()lines=source.splitlines()tree=ast.parse(source)# Step 1: Remove non-main top-level functions and their header comments
lines_to_remove=set()fornodeintree.body:ifisinstance(node,ast.FunctionDef)andnode.name!="main":comment_line=node.lineno-2if0<=comment_line<len(lines):lines_to_remove.add(comment_line)foriinrange(node.lineno-1,node.end_lineno):lines_to_remove.add(i)cleaned_lines=[linefori,lineinenumerate(lines)ifinotinlines_to_remove]cleaned_code="".join(line+"\n"forlineincleaned_lines)# Step 2: Parse cleaned code to remove unused imports
classImportUsageAnalyzer(ast.NodeVisitor):def__init__(self):self.imports={}self.used_names=set()defvisit_Import(self,node):foraliasinnode.names:self.imports[alias.asnameoralias.name]=node.linenodefvisit_ImportFrom(self,node):foraliasinnode.names:name=alias.asnameoralias.nameself.imports[name]=node.linenodefvisit_Name(self,node):self.used_names.add(node.id)# Analyze the cleaned code
tree=ast.parse(cleaned_code)analyzer=ImportUsageAnalyzer()analyzer.visit(tree)# Identify unused imports
unused_import_lines=set()forname,linenoinanalyzer.imports.items():ifnamenotinanalyzer.used_names:unused_import_lines.add(lineno-1)# Convert to 0-based
# Final clean-up: remove unused imports
final_lines=[linefori,lineinenumerate(cleaned_code.splitlines())ifinotinunused_import_lines]final_code="".join(line+"\n"forlineinfinal_lines)formatted_code=black.format_str(final_code,mode=black.FileMode())# Write the cleaned, formatted file
withopen(CLEANED_FILE,"w",encoding="utf-8")asf:f.write(formatted_code)print(f"β Cleaned and formatted file saved as '{CLEANED_FILE}'""(only 'main' retained, unused imports and comments removed).")
Step 5: Run Final Modular App
🎯 Fully cleaned and modularized version, ready for production deployment!
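Putting it together, the deployed entry point ends up shaped roughly like this. This is a sketch using the file names from the steps above; the body of main() is elided:

# Streamlit_app.py -- cleaned entry point (sketch)

# Pull in every extracted function via the auto-generated aggregator
from Import_Functions_Deployed import *


def main():
    # Only the coordinating logic lives here; every feature
    # function now comes from feature_functions_deployed/
    ...


if __name__ == "__main__":
    main()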
The current monolithic codebase needs to be modularized to improve readability, maintainability, and deployment workflow. This issue involves splitting out individual functions into logically separated files and preparing the cleaned main app for deployment.
✅ Tasks to be Completed
🔧 Function Splitting
Run Function_Splitting.py to break down the monolithic app into smaller reusable function files saved in feature_functions_local/ (or feature_functions_deployed/ for the deployed build).
📦 Add Dependencies
Use Dependency_Adder.py to automatically inject all necessary import statements at the top of each function file.
📥 Create Aggregated Import File
Execute Function_Importing/Import_Functions.py to generate a single file for importing all extracted functions efficiently.
🧹 Clean the Main File
Run Split_Clean_Main_Code.py to generate app_cleaned.py, which contains only the cleaned main() logic without function definitions.
🔧 Insert Custom Class
Add the following class to the display_real_time_stock_prediction.py file in both the local and deployed folders:
# Class for real time stock data fetching and prediction
# --- CLASS DEFINITION STARTS ---
class StockPricePredictor:
    ...
Keep Directory Paths Consistent
In the deployed version, make sure these constants are preserved:

# Directory containing the preprocessed datasets
DATASET_DIR = "Codes/Historical_Data_Analysis/Preprocessed_Dataset"
# Directory containing the original historical datasets
DATASET_DIR_1 = "Codes/Historical_Data_Analysis"

This structure modularizes the codebase while retaining the original functionality.
Folder Naming Convention
The function components have been cleanly separated and organized using two key directories:
feature_functions_local: for local development and testing.
feature_functions_deployed: for the final cleaned and deployable code.
⚙️ Step-by-Step Breakdown of the Refactoring Workflow
🔧 Function Extraction
Run Function_Splitting.py
➤ Extracts all functions from the original app into individual Python files inside the respective feature_functions_* folders.
Dependency Injection
Run Dependency_Adder.py
➤ Automatically prepends the necessary import statements to each function file by analyzing its contents with ast.
📥 Import Organizer
Navigate to Function_Importing/Import_Functions.py
➤ Generates an aggregated import script from all modularized functions for use in the final build.
🧹 Main Code Cleaner
Run Split_Clean_Main_Code.py
➤ Extracts and saves the core main() logic as app_cleaned.py, removing previously defined function bodies.
🔧 Post-Cleaning Instructions
After completing the steps above:
✅ Copy and insert this block into display_real_time_stock_prediction.py inside both the feature_functions_local and feature_functions_deployed folders:
# Class for real time stock data fetching and prediction
# --- CLASS DEFINITION STARTS ---
class StockPricePredictor:
    ...
ℹ️ Ensure this is appended after the function definitions inside the same file, or integrated in a logically correct place based on usage.
✅ In the deployed app, ensure the following paths are preserved:
# Directory containing the preprocessed datasets
DATASET_DIR = "Codes/Historical_Data_Analysis/Preprocessed_Dataset"
# Directory containing the original historical datasets
DATASET_DIR_1 = "Codes/Historical_Data_Analysis"
✅ Finally, execute the split main() logic in the cleaned app_cleaned.py within the deployed directory.
✅ Outcome:
We now have a fully modular, self-contained, and deployment-ready Streamlit application with clearly separated concerns for data handling, UI logic, visualizations, and predictions.
🛠️ Customize for Your Workflow: Using This for Your Own Project
This was originally tailored to my own project setup, but you can easily adapt it to fit yours. Here's how:
1. Extract Functions
Use the first step to extract all functions and store them in any directory of your choice. Clean them up as needed.
2. Add All Dependencies
In the Dependency_Adder.py file, list all your required libraries in the following format for clear and automatic management of dependencies:
Example Format:
import_suggestions = {
    # --- STREAMLIT APP & VISUALIZATION FRAMEWORK ---
    "st": "# Importing Streamlit for building the web-based interactive application framework\nimport streamlit as st",
    "plt": "# Importing Matplotlib for generating static plots and charts\nimport matplotlib.pyplot as plt",
    "go": "# Importing Plotly for creating interactive and dynamic visual plots\nimport plotly.graph_objects as go",
    "sns": "# Importing Seaborn for enhanced data visualizations\nimport seaborn as sns",
    # --- DATA HANDLING & MANIPULATION ---
    "pd": "# Importing Pandas for data manipulation and analysis\nimport pandas as pd",
    "np": "# Importing NumPy for numerical computations and array operations\nimport numpy as np",
    "os": "# Importing OS module for handling file and directory paths\nimport os",
    "datetime": "# Importing datetime for working with timestamps and date ranges\nfrom datetime import datetime, timedelta",
    "base64": "# Importing base64 for encoding and decoding binary data\nimport base64",
    # --- MACHINE LEARNING & MODELING ---
    # Continue with any additional dependencies...
}
How It Works:
🛠️ Organize Dependencies: Define each library with a user-friendly alias and a short description in the import_suggestions dictionary.
Auto-Search & Add: Dependency_Adder.py scans each function file for these identifiers and automatically prepends the matching imports.
3. Import Functions & Organize Main Logic
Once your dependencies are added:
Run Function_Importing.py: this generates a single aggregated import file so your main script can pull in every extracted function at once.
🧹 Clean up your main logic with Split_Clean_Main_Code.py: this strips the extracted function definitions out of the main file, leaving only the coordinating main() logic.
# Importing all functions
from Import_Functions_Deployed import *
This way, your main file stays clean, and all logic is neatly modularized and connected!
4. Design Considerations
Ensure you have a central main() function to coordinate the workflow. This helps prevent circular dependencies.
Your code should already be modular, meaning key logic lives inside functions. If it doesn't, this process won't be effective.
This method doesn't extract classes, and only simple top-level variable assignments are copied alongside the functions that use them. For anything else, you'll need to either:
add it manually, or
ensure it exists within the main file, or is defined as needed, before calling the extracted functions. (One common workaround is sketched below.)
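For that globals/classes caveat, one simple pattern (my suggestion, not something the scripts generate for you) is a small shared config module that both the main file and the extracted functions import:

# config.py -- shared constants collected in one importable place
DATASET_DIR = "Codes/Historical_Data_Analysis/Preprocessed_Dataset"
DEFAULT_TICKERS = ["AAPL", "GOOG", "MSFT"]  # example values, not the app's actual list

# Any extracted function file can then declare what it needs explicitly:
# from config import DATASET_DIR, DEFAULT_TICKERS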
5. Run the Main File
Once the functions are organized, you can run your project from the main file just like in my setup.
Final Thoughts 💬
So this was my whole code flow, from my app's point of view.
Let me know if you found this helpful! 💬
Explore the Project
I've added comments and documentation to my code, so the article doesn't become a novel. Each file could have its own article! Let me know if you'd like a detailed breakdown of any part.
I've also added a dedicated section above, "Customize for Your Workflow," to help you adapt this setup to your own codebase.
If you have any questions or need help implementing it, feel free to reach out; happy to help!
I'm also in the process of finalizing my thesis. Once it's complete, I'll publish a full series on this stock market prediction project, covering the concept, execution, my teammates, challenges, and more. Let me know if you're excited for it!