PostgreSQL CSV Errors-lowCalAlt_update3
Kazi Priom

Kazi Priom @itsizakb

About: Student at the University of Oklahoma. Open to software job opportunities. Will be posting my process of developing software.

Location:
Oklahoma City, Oklahoma
Joined:
Nov 21, 2024

PostgreSQL CSV Errors-lowCalAlt_update3

Publish Date: Nov 26 '24
0 0

Image description

I've had a hard time recently working with the CSV files that Nutrionix published for their database. They are quite large, and that makes the files hard to open. However, the main issue was deleting the correct columns and having correct punctuation around items. This caused PostgreSQL to give me various errors. Here are the main ones.

  1. The column names in the CSV were not the same as database
  2. I had an empty column which was represented by a ','
  3. I didn't have quotations around items
  4. single quotations (') needed another single quotation, (').

I initially tried solving these issues using Excel, but Excel has a problem of deleting quotation marks in the file. I discovered that is a common issue for individuals, so I used pandas. Here is the Python script I used for one of the CSV files:


import pandas as pd

#read
file = pd.read_csv('food_insertion3(Done).csv')
#rename columns
file.rename(columns={'fdc_id' :
     'item_id', "description": "item_description", "food_category_id" :
    "food_category"}, inplace= True)
#drop unneeded columns
file.drop(columns='data_type', inplace=True)
#apply quotations
file = file.applymap(lambda x: f'"{x}"' if isinstance(x,str) else x)



file.to_csv('food_insertion3(Done).csv', index= False)
#deal with single quotes
import csv

with open('food_insertion3(Done).csv', 'r', newline='', encoding='utf-8') as infile:
    reader = csv.reader(infile)
    rows = []

    for row in reader:
        cor_row = []
        for field in row:
            cor_field = field.replace("'", "''")
            cor_row.append(cor_field)
        rows.append(cor_row)

with open('food_insertion3(Done).csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(rows)
Enter fullscreen mode Exit fullscreen mode

Comments 0 total

    Add comment