I'm excited to share some new features and improvements in my custom search engine project. This search engine is designed to work seamlessly with my web crawler, providing efficient and accurate search results. Let's dive into the latest updates!
@aminnairi has already asked why I don't use a NoSQL database. The search engine and the web crawler now use MongoDB as a NoSQL database, which leads to faster search results.
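As a rough illustration of why this helps, here is a minimal sketch of a MongoDB text index over the crawled pages; the collection name `pages` and the indexed fields are assumptions for this example, not confirmed details of the project.

```python
from pymongo import MongoClient, TEXT

# Connect to the local MongoDB instance the crawler writes to.
client = MongoClient("localhost", 27017)
db = client["search_engine"]

# Hypothetical collection name; a text index over title and description
# lets MongoDB answer keyword queries without scanning every document.
db.pages.create_index([("title", TEXT), ("description", TEXT)])

# A keyword search then becomes a single indexed query.
results = db.pages.find({"$text": {"$search": "python tutorial"}})
```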
The AI is still a Llama model, now llama-3.3-70b.
In the search results, you can right-click to display a preview of the website. In addition, favicons are only loaded once all search results have been loaded successfully. They are cached locally for a short time so that they do not have to be retrieved again on every search.
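A minimal sketch of what such a temporary favicon cache could look like; the cache directory and the direct /favicon.ico fetch are assumptions for illustration, not the project's actual implementation.

```python
import os
import requests
from urllib.parse import urlparse

CACHE_DIR = "favicon_cache"  # hypothetical local cache directory

def get_favicon(page_url: str) -> bytes:
    """Return the favicon for a page, fetching it only on a cache miss."""
    domain = urlparse(page_url).netloc
    path = os.path.join(CACHE_DIR, f"{domain}.ico")
    if os.path.exists(path):  # already cached: no network request needed
        with open(path, "rb") as f:
            return f.read()
    os.makedirs(CACHE_DIR, exist_ok=True)
    resp = requests.get(f"https://{domain}/favicon.ico", timeout=5)
    resp.raise_for_status()
    with open(path, "wb") as f:  # store temporarily for later searches
        f.write(resp.content)
    return resp.content
```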
The databases can now be managed via the settings page, and there is now also the option to add multiple databases at the same time. When a search is made, the system checks whether a website is saved in more than one database so that the same website is not displayed more than once.
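A minimal sketch of how such cross-database deduplication could work, assuming each result is a dictionary with a url field; the field name is an assumption for this example.

```python
def deduplicate(results):
    """Keep only the first occurrence of each URL across all databases."""
    seen = set()
    unique = []
    for result in results:
        url = result["url"]  # assumed field name for the page address
        if url not in seen:
            seen.add(url)
            unique.append(result)
    return unique

# Example: the same site stored in two databases appears only once.
merged = deduplicate([
    {"url": "https://example.com", "title": "Example"},
    {"url": "https://example.com", "title": "Example"},
])
```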
This brings us to the filter functions:
The metadata is used to retrieve the various website types, which can then be used for filtering. However, since the same type may appear under variant spellings (for example "website" and "Website"), these can be combined into an "all websites" type.
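A minimal sketch of how variant type labels could be folded into one filter category; the normalization rule here is an illustrative assumption, not the project's exact logic.

```python
def normalize_type(page_type: str) -> str:
    """Fold variant spellings (e.g. "website", "Website") into one category."""
    if page_type.strip().lower() == "website":
        return "all websites"
    return page_type
```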
web-crawler
A simple web crawler using Python that stores the metadata and main content of each web page in a database.
Purpose and Functionality
The web crawler is designed to crawl web pages starting from a base URL, extract metadata such as title, description, image, locale, and type, along with the main content, and store this information in a MongoDB database. The crawler can handle multiple levels of depth and respects the robots.txt rules of the websites it visits.
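To make the flow concrete, here is a minimal sketch of a single crawl step under these assumptions: the collection is named pages, and the metadata comes from Open Graph tags with a plain title-tag fallback; the actual crawler may differ in its details.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
pages = client["search_engine"]["pages"]  # assumed collection name

def allowed_by_robots(url: str) -> bool:
    """Check the site's robots.txt before fetching the page."""
    parts = urlparse(url)
    rp = RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch("*", url)

def crawl_page(url: str) -> None:
    """Fetch one page, extract its metadata and main content, store both."""
    if not allowed_by_robots(url):
        return
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    def meta(prop: str):
        tag = soup.find("meta", property=prop)
        return tag["content"] if tag and tag.has_attr("content") else None

    pages.update_one(
        {"url": url},  # upsert so re-crawls refresh instead of duplicating
        {"$set": {
            "url": url,
            "title": meta("og:title") or (soup.title.string if soup.title else None),
            "description": meta("og:description"),
            "image": meta("og:image"),
            "locale": meta("og:locale"),
            "type": meta("og:type"),
            "content": soup.get_text(" ", strip=True),
        }},
        upsert=True,
    )
```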
Dependencies
The project requires the following dependencies:
requests
beautifulsoup4
pymongo
You can install the dependencies using the following command:
pip install -r requirements.txt
Setting Up and Running the Web Crawler
Clone the repository:
git clone https://github.com/schBenedikt/web-crawler.git
cd web-crawler
Install the dependencies:
pip install -r requirements.txt
Ensure that MongoDB is running on your local machine. The web crawler connects to MongoDB at localhost:27017 and uses a database named search_engine.
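To confirm that this connection works before the first crawl, you can run a short pymongo snippet like the following (a sketch, assuming a default unauthenticated local MongoDB):

```python
from pymongo import MongoClient

# Matches the connection the crawler expects: localhost:27017, db "search_engine".
client = MongoClient("localhost", 27017, serverSelectionTimeoutMS=2000)
client.admin.command("ping")  # raises an error if MongoDB is not reachable
print(client["search_engine"].list_collection_names())
```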