How to split and search in comma-separated string in MySQL

Split comma-separated values and retrieve a value at certain position

Surprisingly MySQL doesn’t have a dedicated function for this operations as opposed to split_part in PostgreSQL. Luckily it has SUBSTRING_INDEX() function that does almost what we need.

From the official documentation:

SUBSTRING_INDEX(str,delim,count)

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.

mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
        -> 'www.mysql'
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
        -> 'mysql.com'

This function is multibyte safe.

Definitions

CREATE TABLE employees (
   id SERIAL,
   name VARCHAR(255),
   address TEXT,
   PRIMARY KEY (id)
);

INSERT INTO employees (id, name, address) VALUES 
(1, 'John Doe', '4225 Collins Street,Apt. A,Erie,PA,16510'),
(2, 'Bob Smith', '234 Main Street,,Erie,PA,16512');

Solution

Suppose we want to find all employees working in a certain state (for example, PA) which is a fourth field in a column holding address as comma-separated values.

With SUBSTRING_INDEX(address, ',', 4) function we will be able to extract everything up to the fourth column containing state

SELECT SUBSTRING_INDEX(address, ',', 4) FROM employees;
+-----------+------------------------------------+
| name      | SUBSTRING_INDEX(address, ',', 4)   |
+-----------+------------------------------------+
| John Doe  | 4225 Collins Street,Apt. A,Erie,PA | 
| Bob Smith | 234 Main Street,,Erie,PA           | 
+-----------+------------------------------------+

Now we know that state would always be the last field in the resulting value. Because SUBSTRING_INDEX() function allows to use negative values to extract fields counting from the right, we will use this ability to extract the rightmost field containing state abbreviation.

SELECT name, SUBSTRING_INDEX(SUBSTRING_INDEX(address, ',', 4), ',', -1) AS state FROM employees;
+-----------+-------+
| name      | state |
+-----------+-------+
| John Doe  | PA    | 
| Bob Smith | PA    | 
+-----------+-------+

Final solution to get all employees working in a certain state is shown below.

SELECT name FROM employees WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(address, ',', 4), ',', -1)='PA';
+-----------+
| name      |
+-----------+
| John Doe  | 
| Bob Smith | 
+-----------+

Caveats

One thing to be aware about SUBSTRING_INDEX() is that the function returns the whole string if the requested field doesn’t exist.

For example, SUBSTRING_INDEX(address, ',', 6) returns full address.

SELECT name, SUBSTRING_INDEX(address, ',', 6) FROM employees;
+-----------+------------------------------------------+
| name      | SUBSTRING_INDEX(address, ',', 6)         |
+-----------+------------------------------------------+
| John Doe  | 4225 Collins Street,Apt. A,Erie,PA,16510 | 
| Bob Smith | 234 Main Street,,Erie,PA,16510           | 
+-----------+------------------------------------------+

Then attempt to extract sixth field with SUBSTRING_INDEX(SUBSTRING_INDEX(address, ',', 6), ',', -1) will give us zip code (fifth field) which is wrong.

SELECT name, SUBSTRING_INDEX(SUBSTRING_INDEX(address, ',', 6), ',', -1) AS country FROM employees;
+-----------+---------+
| name      | country |
+-----------+---------+
| John Doe  | 16510   | 
| Bob Smith | 16510   | 
+-----------+---------+

One possible workaround to this problem would be to check if the number of fields is greater than or equal to requested index and then the code the above, or return empty string, if the field index is invalid.

SELECT 
   name, 
   IF(
      LENGTH(address) - LENGTH(REPLACE(address, ',', '')) + 1 >= 6,
      SUBSTRING_INDEX(SUBSTRING_INDEX(address, ',', 6), ',', -1),
      ''
   ) AS country 
FROM employees;
+-----------+---------+
| name      | country |
+-----------+---------+
| John Doe  |         | 
| Bob Smith |         | 
+-----------+---------+

Possible uses

In most cases it is better to use a separate lookup table or a column for each field instead of storing data as comma-separated values for later lookup.

However, there could be rare cases where the trick above could be useful. For example, recently I needed to return a list of values from user-defined function and the only way to do it would be to return a string containing comma-separated values. I ended up using SUBSTRING_INDEX() to order the results by one of the fields.

Search in a column containing comma-separated values

MySQL has a dedicated function FIND_IN_SET() that returns field index if the value is found in a string containing comma-separated values.

For example, the following statement returns one-based index of value C in string A,B,C,D.

SELECT FIND_IN_SET('C', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
|      3 | 
+--------+

If the given value is not found, FIND_IN_SET() function returns 0.

SELECT FIND_IN_SET('Z', 'A,B,C,D') AS result;
+--------+
| result |
+--------+
|      0 | 
+--------+

Comments

  1. select distinct
      SUBSTRING_INDEX(SUBSTRING_INDEX('1,2,3,4', ',', numbers.n), ',', -1) name
    from (
       select @rownum := @rownum + 1 as n
       from YourTable
       cross join (select @rownum := 0) r
    ) numbers 
    order by name
    
  2. How to search data in column with comma-separated values?

    Suppose I have column with data like:

    [data]
    a,b,c
    g,h,k
    

    IF i search a,c,h I want SQL to return both rows.

Leave a Reply

You may use simple HTML to add links or lists to your comment. Also use <pre><code class="language-*">...</code></pre> to mark up code snippets. We support language-js, language-markup and language-css for comments.
(Optional)