COPY [ BINARY ] table [ WITH OIDS ] FROM { 'filename' | stdin } [ [USING] DELIMITERS 'delimiter' ] [ WITH NULL AS 'null string' ] COPY [ BINARY ] table [ WITH OIDS ] TO { 'filename' | stdout } [ [USING] DELIMITERS 'delimiter' ] [ WITH NULL AS 'null string' ]
By passing the COPY command the BINARY parameter, you are stating that you wish for it to store and read data in binary format (instead of text). When using binary format, the WITH NULL and DELIMITERS options are not applicable.
Use this parameter to supply the name of an existing table.
Use this keyword to save the unique object ID (OID) of each row (whether data is being copied out to a file or read in from one).
This parameter takes the absolute path to the file selected for input or output (i.e., /usr/local/pgsql/data/employeetable).
Use this keyword to indicate that data will be entered from the client application. Stdin stands for standard input; if you are using psql to enter data, you will be prompted to type in text.
Use this keyword to indicate that output will be sent to the client application.
Enter the symbol or character you wish to use for separating fields in a row.
Single character | |
---|---|
The delimiter you choose must only be one character; if you enter something longer than this, only the first character will be used as the delimiter. |
Enter in the characters that designate a NULL value; by default, "\N" designates NULL, but you can change it to something more suitable by using this parameter. Also remember that when data is copied into the database, any strings that match your specified NULL string will be interpreted as NULL values, so make sure to use the same string when you copy data out of the database.
Use the COPY command to transfer data between tables in a PostgreSQL database and files within a filesystem. There are two ways to use COPY: COPY TO and COPY FROM. Use COPY TO when you want to output the contents of a table in your database to a file, or to standard output (i.e., the client connected to the database). Use COPY FROM when you wish to import data from a standard file, or from client input.
COPY Isn't \copy | |
---|---|
The SQL COPY command should not be confused with the psql \copy command. This command essentially performs a COPY FROM stdin or COPY TO stdout, then stores the acquired data in a psql-accessible file. This means that the file access rights are controlled by the client (frontend), instead of the backend. |
The SQL COPY command is server-side, meaning file reading and writing are both controlled by the database server. Because of this, you must make sure that files you specify are be visible and readable or writable, depending on whether you are using COPY FROM or COPY TO) to the PostgreSQL backend. This is why it is important that you use absolute paths to files, instead of relative paths.
Binary or Text? | |
---|---|
You can either use normal text for transfering data, or you can use binary format (when specified with the BINARY keyword). Using binary format will speed up COPY commands significantly; however, binary formatting decreases the portability of your application. |
There are a few fairly obvious limitations to the COPY command. In order for COPY to read from the tables specified, your user account must have select access to them. If you are directing COPY to insert values into a table, your account must also have insert or update access.
Using the COPY TO command will check constraints and any triggers you may have setup, but it will not invoke rules or act on column defaults.
Error Handling | |
---|---|
COPY will stop operation upon reaching the first error. This should not lead to problems in the event of a COPY FROM, but the target relation will be partially modified in a COPY TO. VACUUM should be used to clean up after a failed copy. |
If you choose to use normal text formatting instead of binary, the file COPY creates will be formatted as such: each row will appear on a single line and columns will be separated by the delimiter character you chose. Any embedded characters located in the file have a preceding backslash (\) and column values are output as text strings.
Each instance will follow this format:
<attr1><separator><attr2><separator>...<separator><attrn><newline>If WITH OIDS is specified, the oid will be placed at the beginning of the line.
When COPY sends its results to standard output (stdout), it will end the transfer with the following format: a backslash (\), a period (.), and a newline to mark the end of the file. If an EOF (end of file) is reached before the normal end-format, it will automatically terminate.
Due to the backslash character having multiple definitions, you'll need to use an escape sequence to represent an actual backslash character. Do this by typing two consecutive backslashes ("\\"). The other characters that require this method to display correctly are as follow. The tab character is represented as a backslash and a tab. To make a newline, use a backslash and a newline.
Pre-format! | |
---|---|
Remember to pre-format any non-PostgreSQL text data you are loading into the database by changing backslashes to double-backslashes. |
In addition to saving data in text format, PostgreSQL can also save things in binary format. This is the format compiled programs are stored in and operate using.
The file header contains 24 bytes of fixed fields and a variable length header extension area. The fixed fields are as follow:
Example signature: "PGBCOPY\n\377\r\n\0"
The signature is used to identify files that were not processed through a clean 8-bit transfer; it is changed by dropped nulls, parity changes, newline transaction filters, and dropped high bits.
An example is: "Ox01020304"
This is a 32 byte integer constant in the source byte order. Byte-flipping could occur on the fields remaining if the incorrect byte order was detected.
The flags field is an int32 bit mask which is the main storage point for file formatting information. Within this field, bits are ordered from 0 to 31. To hold backwards-compatibility formatting information, about half of these bits are sectioned off (0 to 15). Bits 16 through 31 are used to flag normal file formatting information. Anything unexpected set in this range are ignored. The only bit here that has a definition within is bit 16.
If bit 16 is set to 1, object IDs are included in the file.
If bit 16 is set to 0, object IDs are included in the file.
The remainder of the header is int32 bytes in length; in earlier versions, this was set to zero and the first tuple immediately followed.
The structure of tuples within the binary file is as follows: an int16 count of the fields within the tuple (this is the same within every tuple), an int16 typlen word, then the field data for each field. The possible options for the typlen field are as follow:
NULL; this field would contain no data.
This would set the field to be a fixed-length datatype. N number bytes of data follow the typlen word.
This sets the field to be a varlena datatype. The next four bytes are the varlena header, consisting of a value's total length.
Reserved for future implementation.
To create a convenient way for users to check data for their expected results, all non-NULL fields have a matching typlen with the expected typlen destination column.
A few formatting options were left un-implemented to improve portability of binary file dumps. Mainly, extra data between fields is not possible (i.e., alignment data), and there is no distinguishment between datatypes based on passes by reference or passes by value.
If OIDs have been included in the file output, they will be placed following the field count word. This will not be included in the field count.
The example below copies the employees table to the emp_table file, using an exclamation point ("!") as the field delimiter.
COPY employees TO /usr/local/pgsql/emp_table USING DELIMITERS '!';
The following example copies data from a UNIX file into the publishers table.
COPY employees FROM '/user/home/pma/sql/employee_data';